Creating a New Dataset for the Classification of Cyber Bullying

Çilem Koçak; Tuncay Yiğit; Mehmet Bilen

doi:10.54569/aair.1206144

Research Article

Year 2023, Volume: 3 Issue: 2, 45 - 53, 29.10.2023


Abstract

Young and old alike, people have quickly entered the online world through today's communication technologies such as phones, tablets, computers and smart devices. As the Internet takes up a larger place in people's lives, social media platforms are diversifying and more users want to take part in them. This growth in the number of social media users brings problems with it, the most important of which is cyber bullying. Although cyber bullying can look like an everyday exchange between social media users or groups, it is encountered more often each day as the information, content and topics shared on social media grow more diverse. As technology develops, a platform that detects bullying with artificial intelligence is needed. One of the biggest difficulties in the text classification problems that arise while developing such platforms is that the artificial intelligence algorithm must be trained with labeled data. In this study, 21 people who actively use social media, including journalists, athletes, scientists, doctors, politicians, comedians, social media influencers and artists, were selected in order to create the dataset needed to train models for detecting cyber bullying. The public Twitter messages (mentions) of these 21 people were compiled. After repetitive and meaningless messages sent by bot accounts were filtered out of the 10,500 compiled tweets, 7,706 messages remained in the dataset.
The labeling needed to use the dataset for training and testing in classification was carried out by three independent people who had been given preliminary information about cyber bullying (1 = includes cyber bullying, 0 = does not include cyber bullying). Each message was read and tagged by all three, and the tag chosen by the majority was accepted as the final class of that message. The dataset was then preprocessed in accordance with the principles of natural language processing and made suitable for classification algorithms. The findings obtained from classification with basic classification algorithms are reported. These findings indicate that the dataset is suitable for use in the detection and prevention of cyber bullying. In this context, it is predicted that training specially developed and optimized artificial intelligence algorithms on this dataset will greatly increase the success rate of cyber bullying detection.
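
The majority-vote labeling described above can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `majority_label` and the sample annotations are hypothetical, and the only assumptions are that each message receives an odd number of binary tags (1 = includes cyber bullying, 0 = does not).

```python
from collections import Counter


def majority_label(votes):
    """Return the tag chosen by most annotators.

    Assumes binary tags and an odd number of votes, so a tie cannot occur.
    """
    if len(votes) % 2 == 0:
        raise ValueError("an odd number of votes is needed to avoid ties")
    label, _count = Counter(votes).most_common(1)[0]
    return label


# Hypothetical annotations: one tuple per message, one tag per annotator.
annotations = [
    (1, 1, 0),
    (0, 0, 0),
    (0, 1, 1),
    (1, 0, 0),
]

final_labels = [majority_label(votes) for votes in annotations]
print(final_labels)  # [1, 0, 1, 0]
```

With three annotators and two classes, at least two tags always agree, which is why an odd number of annotators gives every message a well-defined final class.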

Keywords

Cyber bullying, Twitter, Artificial intelligence



Details

Primary Language	English
Subjects	Artificial Intelligence
Section	Research Article
Authors	Çilem Koçak 0000-0002-4516-2076 Tuncay Yiğit 0000-0001-7397-7224 Mehmet Bilen 0000-0002-6016-2349
Early View Date	October 23, 2023
Publication Date	October 29, 2023
Acceptance Date	April 22, 2023
Published in Issue	Year 2023 Volume: 3 Issue: 2

Cite

IEEE	Ç. Koçak, T. Yiğit, and M. Bilen, “Creating a New Dataset for the Classification of Cyber Bullying”, Adv. Artif. Intell. Res., vol. 3, no. 2, pp. 45–53, 2023, doi: 10.54569/aair.1206144.

Advances in Artificial Intelligence Research is an open access journal which means that the content is freely available without charge to the user or his/her institution. All papers are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License, which allows users to distribute, remix, adapt, and build upon the material in any medium or format for non-commercial purposes only, and only so long as attribution is given to the creator.
