Research Article

Cyberbullying detection in Turkish texts using machine learning methods

Year 2022, Volume 12, Issue 2, 443 - 453, 15.04.2022
https://doi.org/10.17714/gumusfenbil.935448

Abstract

The spread of internet use and the growing popularity of social media platforms have led to the rapid spread of acts referred to as cyberbullying. The number of people exposed to cyberbullying worldwide is increasing day by day, and this has serious effects on the victims. Detecting these acts is of great importance both for preventing new victims and for keeping existing victims from further exposure. Many studies on cyberbullying detection have been reported in the literature, but very few of them address Turkish texts. In this study, cyberbullying detection was carried out using natural language processing methods on a ready-made, manually created Turkish dataset of 3000 sentences obtained from the sharing site Kaggle. Because the dataset used in the study is new and, to the best of our knowledge, such a large number of algorithms has not previously been tested in the literature, this study is expected to contribute to the literature. The Bagging, Boosting, C4.5, Gradient Boosting, K-Means, KNN, LR, NB, ANN, Random Forest (RO), Support Vector Machine (DVM), Stochastic Gradient Descent and XGBoost algorithms were used comparatively on this dataset for the first time.


Abstract

The widespread use of the internet and the increasing popularity of social media platforms have caused the rapid spread of the acts known as cyberbullying. The number of people subjected to cyberbullying throughout the world is increasing day by day, and it has a great impact on the victims. Identifying these acts is of great importance both for preventing the emergence of new victims and for protecting existing victims from further exposure. Many studies on the detection of cyberbullying have been carried out in the literature, but the number of studies on Turkish texts is very low. This study is expected to contribute to the literature because the dataset used is new and, to the best of our knowledge, such a large number of algorithms has not previously been tested in the literature. In the study, the Bagging, Boosting, C4.5, Gradient Boosting, K-Means, KNN, LR, NB, ANN, Random Forest (RO), Support Vector Machine (DVM), Stochastic Gradient Descent and XGBoost algorithms were used comparatively for the first time on this dataset.

There are 33 citations in total.

Details

Primary Language: Turkish
Subjects: Engineering
Journal Section: Articles
Authors:

Enver Yazğılı (ORCID 0000-0001-8459-3488)

Muhammet Baykara (ORCID 0000-0001-5223-1343)

Publication Date: April 15, 2022
Submission Date: May 10, 2021
Acceptance Date: February 1, 2022
Published in Issue: Year 2022

Cite

APA Yazğılı, E., & Baykara, M. (2022). Türkçe metinlerde makine öğrenmesi yöntemleri ile siber zorbalık tespiti. Gümüşhane Üniversitesi Fen Bilimleri Dergisi, 12(2), 443-453. https://doi.org/10.17714/gumusfenbil.935448