Fake News Detection in Low Resource Language Using Machine Learning Techniques and SMOTE
Published 2024-12-23
Keywords
- Fake news classification
- Low resource languages
- Machine learning
- Machine learning models
- Comparative analysis
- SMOTE
Copyright (c) 2024
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Abstract
Fake content dissemination is a significant challenge in the era of digital information. This paper addresses the problem of detecting fake content in news articles written in low-resource languages, focusing on Tamil, where labelled data and advanced natural language processing tools are scarce. We apply traditional machine learning models to detect and classify fake and real content in Tamil news. Our study evaluates logistic regression, support vector machines (SVM), naive Bayes, k-nearest neighbours (KNN), decision trees, random forests and passive-aggressive classifiers. Through a comparative analysis of these models in the challenging linguistic setting of Tamil, we aim to provide insight into their suitability for detecting fake content in low-resource languages.
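The title and keywords name SMOTE (Synthetic Minority Over-sampling Technique), which the abstract itself does not describe. The following is a minimal sketch of the general technique, not the authors' implementation: it assumes articles have already been converted to numeric feature vectors, and the function name `smote`, its parameters, and the toy data are hypothetical. Each synthetic point is placed on the line segment between a minority-class sample and one of its nearest minority-class neighbours.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between minority samples.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base point.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour.
        gap = rng.random()
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical toy features: 6 "real" articles versus 3 "fake" ones.
real = [[0.9, 0.1], [0.8, 0.2], [0.85, 0.15], [0.7, 0.1], [0.95, 0.3], [0.75, 0.25]]
fake = [[0.1, 0.9], [0.2, 0.8], [0.15, 0.7]]

new_fake = smote(fake, n_new=len(real) - len(fake))
balanced_fake = fake + new_fake
print(len(real), len(balanced_fake))  # prints "6 6"
```

In practice, oversampling of this kind is usually applied to the training split only, so that synthetic points do not leak into the data used to compare the classifiers.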