A Study of Stock Market Prediction through Sentiment Analysis


  • SANJAY SONI J.E.C Jabalpur


Bombay Stock Exchange (BSE), Logistic Regression, KNN-LR Hybrid Classifier, Naïve Bayes Algorithm, Particle Swarm Optimization (PSO), Sentiment Analysis, Stock Market, Support Vector Machine, Social Media, Twitter


The state and course of economic development and growth are shaped by the fortunes and vagaries of the stock markets. This study presents a model intended to support more reliable prediction of stock market trends. The approach combines sentiment analysis of financial news with historical stock market patterns to provide more precise results. The model works in two steps: first, the Naive Bayes algorithm determines the polarity of the news text; second, future stock values are forecast from the text polarity results together with historical stock price movements. A novel KNN-LR Hybrid algorithm is also introduced to achieve better outcomes, and its accuracy and efficacy are evaluated against those of other machine learning algorithms.
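The two-step process can be illustrated with a minimal Python sketch. It is not the authors' implementation: the toy headlines, the `history` rows, and the helper names `train_naive_bayes`, `polarity`, and `knn_trend` are all assumptions made for illustration. Because the abstract does not specify how the KNN-LR Hybrid combines its two parts, the second step uses a plain k-nearest-neighbour vote as a stand-in.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(labeled_texts):
    """Fit a multinomial Naive Bayes model (counts only) from (text, label) pairs."""
    word_counts = defaultdict(Counter)
    class_counts = Counter()
    vocab = set()
    for text, label in labeled_texts:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return word_counts, class_counts, vocab

def polarity(text, model):
    """Step 1: return the most probable polarity label for a headline."""
    word_counts, class_counts, vocab = model
    total_docs = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        log_prob = math.log(class_counts[label] / total_docs)
        denom = sum(word_counts[label].values()) + len(vocab)
        for token in text.lower().split():
            if token in vocab:  # skip words never seen in training
                # Laplace (add-one) smoothing avoids zero probabilities
                log_prob += math.log((word_counts[label][token] + 1) / denom)
        scores[label] = log_prob
    return max(scores, key=scores.get)

def knn_trend(query, history, k=3):
    """Step 2 (stand-in): majority vote among the k nearest historical points."""
    nearest = sorted(history, key=lambda row: math.dist(query, row[0]))[:k]
    votes = Counter(trend for _, trend in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training headlines with hand-assigned polarity labels.
news = [
    ("profits surge as sales beat forecast", "pos"),
    ("shares rally on strong earnings", "pos"),
    ("company reports record growth", "pos"),
    ("stock plunges after weak earnings", "neg"),
    ("losses widen as sales fall", "neg"),
    ("regulator probe hits shares", "neg"),
]
model = train_naive_bayes(news)

# Hypothetical history: ((polarity as +1/-1, previous-day return in %), next-day trend).
history = [
    ((1.0, 0.8), "up"), ((1.0, 0.3), "up"), ((1.0, -0.2), "up"),
    ((-1.0, -0.9), "down"), ((-1.0, -0.4), "down"), ((-1.0, 0.1), "down"),
]

headline = "earnings beat forecast and shares rally"
label = polarity(headline, model)
query = (1.0 if label == "pos" else -1.0, 0.5)  # 0.5 is an assumed prior return
prediction = knn_trend(query, history)
print(label, prediction)  # prints: pos up
```

In this sketch the polarity label from step one becomes a numeric feature alongside the previous day's return, so news sentiment and past price movement jointly drive the trend forecast.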

