Turkish Clickbait News Detection using Explainable Artificial Intelligence

Alper Celal Akgün; Tülin İnkaya

Research Article

Year 2024, Volume: 7 Issue: 1, 68 - 80, 21.11.2024

Abstract

Internet users frequently turn to digital journalism for information. However, content produced by malicious news sources causes various problems for users. One of these is the clickbait headline, which is designed to capture users' attention and direct them to specific content. Clickbait headlines exploit users' curiosity, leading them to navigate to the targeted content and spend more time on it. Such content, which can be malicious, is one of the main problems facing today's internet users. In the literature, artificial intelligence-based approaches using machine learning and deep learning models have been developed for clickbait detection. However, studies on the explainability of the models developed in this field are still needed. Explainable artificial intelligence (XAI) aims to make machine learning models and their decision-making processes transparent and understandable. This study aims to develop XAI-based models for the clickbait detection problem. To this end, a Turkish dataset compiled from different news sources was used. First, data preprocessing was performed, including feature engineering, missing data handling, stemming, normalization and term frequency-inverse document frequency (TF-IDF) transformation. Next, k-nearest neighbors, Naive Bayes, logistic regression, decision tree, random forest, extreme gradient boosting (XGBoost), support vector machine and multi-layer perceptron (MLP) models were trained on the dataset. Hyperparameter optimization was applied to determine the most suitable parameter values for each model, and the performances of the models were compared. Finally, to make the models' clickbait predictions explainable, the SHAP method was used to identify the factors affecting the classification results.
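As a rough illustration of two of the preprocessing steps named in the abstract, the sketch below computes TF-IDF weights and min-max normalization in plain Python. It is not the authors' implementation: the function names `tf_idf` and `min_max`, the toy headlines, and the choice of headline length as the normalized feature are all assumptions made for this example, and the unsmoothed `log(N / df)` form is only one common IDF variant (libraries often use smoothed versions).

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document.

    Uses relative term frequency and an unsmoothed IDF, log(N / df).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def min_max(values):
    """Rescale numbers to the range [0, 1]; a constant list maps to zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical, already lower-cased and tokenized headlines
# (illustrative only, not taken from the study's dataset).
headlines = [
    ["bunu", "gören", "şaşırdı"],
    ["merkez", "bankası", "faiz", "kararı"],
    ["bunu", "duyan", "inanamadı"],
]

weights = tf_idf(headlines)
# Assumed engineered feature: headline length in tokens, scaled to [0, 1].
lengths = min_max([len(h) for h in headlines])
```

In this toy corpus, "bunu" occurs in two of the three headlines, so it receives a lower weight than a term that occurs in only one, which is the behavior TF-IDF is meant to provide.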

Keywords

Clickbait Detection, Natural Language Processing, SHAP, Explainable Artificial Intelligence

There are 37 citations in total.

Details

Primary Language	English
Subjects	Natural Language Processing
Journal Section	Research Article
Authors	Alper Celal Akgün (0009-0006-5286-0020); Tülin İnkaya (0000-0002-6260-0162)
Publication Date	November 21, 2024
Submission Date	October 13, 2024
Acceptance Date	November 21, 2024
Published in Issue	Year 2024 Volume: 7 Issue: 1

Cite

IEEE	A. C. Akgün and T. İnkaya, “Turkish Clickbait News Detection using Explainable Artificial Intelligence”, International Journal of Data Science and Applications, vol. 7, no. 1, pp. 68–80, 2024.

AI Research and Application Center, Sakarya University of Applied Sciences, Sakarya, Türkiye.