Vol. 22 No. 1 (2023): Mapana Journal of Sciences
Research Articles

Survey on Feature Selection for Data Mining and its Application in Opinion Mining

Sumanth S
HOD, Department of Computer Science, Smt. V. H. D. Central Institute of Home Science, Bengaluru

Published 2023-01-14

Keywords

  • Sentiment Analysis (SA)
  • Opinion Mining
  • Feature Selection
  • Machine Learning
  • Meta-Heuristics Methods

Abstract

Sentiment Analysis (SA) and opinion mining are used in business intelligence systems to analyze public opinion toward brands and to shape market strategies. Machine learning aims to develop algorithms whose performance is optimized using past data or experience. In SA, feature selection serves several goals: reducing computational cost, avoiding over-fitting, and improving the classification accuracy of the model. Feature selection methods shrink the original feature set by removing features that are irrelevant to the classification of text sentiment. This paper surveys the feature selection techniques available in the literature. The study shows that feature selection significantly improves sentiment classification, but the improvement depends on the technique adopted and the number of features selected.
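As a rough illustration of how a filter-style feature selection step removes irrelevant terms before sentiment classification, the sketch below scores each term with a chi-square statistic and keeps the top k. The chi-square criterion, the toy reviews, and the function names `chi_square_scores` and `select_top_k` are assumptions made for this example; the abstract does not single out any one technique.

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Chi-square score of each term's presence against a binary label (1 = positive)."""
    n = len(docs)
    n_pos = sum(labels)
    n_neg = n - n_pos
    pos_df, neg_df = Counter(), Counter()
    for text, label in zip(docs, labels):
        terms = set(text.lower().split())  # presence/absence, not raw counts
        (pos_df if label == 1 else neg_df).update(terms)

    scores = {}
    for term in set(pos_df) | set(neg_df):
        a = pos_df[term]   # positive docs containing the term
        b = neg_df[term]   # negative docs containing the term
        c = n_pos - a      # positive docs without the term
        d = n_neg - b      # negative docs without the term
        denom = (a + b) * (c + d) * (a + c) * (b + d)
        scores[term] = n * (a * d - b * c) ** 2 / denom if denom else 0.0
    return scores


def select_top_k(scores, k):
    """Keep the k highest-scoring terms; ties are broken alphabetically."""
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical toy corpus: two positive and two negative reviews.
docs = [
    "great phone and great battery",
    "great screen love the phone",
    "poor battery and poor screen",
    "poor phone hate the battery",
]
labels = [1, 1, 0, 0]

scores = chi_square_scores(docs, labels)
selected = select_top_k(scores, 2)
print(selected)  # ['great', 'poor']
```

Terms such as "screen" and "the" occur equally often in both classes, so they score zero and are dropped. Changing `k` or replacing the scoring function changes which features survive, which is the dependence on technique and feature count that the abstract describes.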
