STACKOVERFLOW'DA "BIG DATA" İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI

Adile Genç; Ayça Yurtseven; Hacer Özyurt; Özcan Özyurt

doi:10.31796/ogummf.1375611

Research Article

EXTRACTING FEATURES OF "BIG DATA" RELATED POSTS ON STACKOVERFLOW WITH TOPIC MODELING AND ASSOCIATION ANALYSIS

Year 2024, Volume: 32 Issue: 1, 1257 - 1268, 22.04.2024

Abstract

With the growth of internet use, the emergence of the concept of "Big Data" has become inevitable. StackOverflow, which hosts more than 23 million questions and nearly 35 million answers, itself contributes to big data, and analyzing the information shared there can yield important insights into current issues and trends. Because discussions in such a large and scattered dataset cannot be analyzed manually, methods that can perform the analysis automatically are needed, and topic modeling approaches have been used to meet this need. Latent Dirichlet Allocation (LDA) has been widely preferred in topic modeling studies and has proven successful. In this study, LDA was used to semantically analyze the questions tagged "Big Data" on StackOverflow together with their answers; the most discussed topics were found to be machine learning/data science and memory management, at a rate of 16%. A separate dataset was built from the tags used in StackOverflow posts, and association analysis was performed on it. The main purpose of this stage was to uncover hidden relationships using the Apriori algorithm. The results showed that the bigdata and hadoop tags were used together in 25 out of every 100 questions, the highest rate observed. In addition, a user who applies the hive tag has a 60% probability of also using both the hadoop and bigdata tags, and the presence of the hive tag increases the usage rate of these tags by a factor of 2.39.
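The three figures quoted for the tag analysis correspond to the usual association-rule measures: support (how often tags occur together), confidence (how often the consequent appears given the antecedent), and lift (confidence divided by the consequent's support). The sketch below is only an illustration of how these measures relate to one another. The 20 tag sets are hypothetical toy data chosen to give values close to those in the abstract; they are not the study's dataset, and the helper names `support`, `confidence`, and `lift` are assumptions made for this example rather than the authors' code.

```python
from typing import FrozenSet, Iterable, List

# Hypothetical toy data (NOT the study's data): 20 posts, each a set of tags.
posts: List[FrozenSet[str]] = (
    [frozenset({"bigdata", "hadoop", "hive"})] * 3
    + [frozenset({"bigdata", "hive"})] * 2
    + [frozenset({"bigdata", "hadoop"})] * 2
    + [frozenset({"bigdata", "apache-spark"})] * 13
)


def support(itemset: Iterable[str]) -> float:
    """Fraction of posts whose tag set contains every tag in `itemset`."""
    items = frozenset(itemset)
    return sum(items <= post for post in posts) / len(posts)


def confidence(antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Share of posts with the antecedent tags that also carry the consequent tags."""
    antecedent = frozenset(antecedent)
    return support(antecedent | frozenset(consequent)) / support(antecedent)


def lift(antecedent: Iterable[str], consequent: Iterable[str]) -> float:
    """Confidence relative to the consequent's baseline support (1.0 means no association)."""
    return confidence(antecedent, consequent) / support(consequent)


pair = {"hadoop", "bigdata"}
print(f"support(hadoop, bigdata)         = {support(pair):.2f}")
print(f"confidence(hive -> hadoop, bigdata) = {confidence({'hive'}, pair):.2f}")
print(f"lift(hive -> hadoop, bigdata)       = {lift({'hive'}, pair):.2f}")
```

On this toy data the pair support is 0.25, the confidence of the hive rule is 0.60, and the lift is 0.60 / 0.25 = 2.40, which shows how a 60% confidence combined with a co-occurrence rate of about 25 in 100 leads to a lift near the reported 2.39. An Apriori implementation would additionally prune candidate tag sets whose support falls below a chosen threshold before rules are scored this way.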

Keywords

Topic Modeling, LDA, Association Analysis, Big Data, StackOverflow Posts

Details

Primary Language	Turkish
Subjects	Software Engineering (Other)
Journal Section	Research Articles
Authors	Adile Genç 0009-0001-6520-8596 Ayça Yurtseven 0009-0002-6361-6796 Hacer Özyurt 0000-0001-8621-2335 Özcan Özyurt 0000-0002-0047-6813
Early Pub Date	April 22, 2024
Publication Date	April 22, 2024
Submission Date	October 16, 2023
Acceptance Date	December 19, 2023
Published in Issue	Year 2024 Volume: 32 Issue: 1

Cite

APA	Genç, A., Yurtseven, A., Özyurt, H., Özyurt, Ö. (2024). STACKOVERFLOW’DA "BIG DATA" İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI. Eskişehir Osmangazi Üniversitesi Mühendislik Ve Mimarlık Fakültesi Dergisi, 32(1), 1257-1268. https://doi.org/10.31796/ogummf.1375611
AMA	Genç A, Yurtseven A, Özyurt H, Özyurt Ö. STACKOVERFLOW’DA "BIG DATA" İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI. ESOGÜ Müh Mim Fak Derg. April 2024;32(1):1257-1268. doi:10.31796/ogummf.1375611
Chicago	Genç, Adile, Ayça Yurtseven, Hacer Özyurt, and Özcan Özyurt. “STACKOVERFLOW’DA ‘BIG DATA’ İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI”. Eskişehir Osmangazi Üniversitesi Mühendislik Ve Mimarlık Fakültesi Dergisi 32, no. 1 (April 2024): 1257-68. https://doi.org/10.31796/ogummf.1375611.
EndNote	Genç A, Yurtseven A, Özyurt H, Özyurt Ö (April 1, 2024) STACKOVERFLOW’DA "BIG DATA" İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI. Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi 32 1 1257–1268.
IEEE	A. Genç, A. Yurtseven, H. Özyurt, and Ö. Özyurt, “STACKOVERFLOW’DA ‘BIG DATA’ İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI”, ESOGÜ Müh Mim Fak Derg, vol. 32, no. 1, pp. 1257–1268, 2024, doi: 10.31796/ogummf.1375611.
ISNAD	Genç, Adile et al. “STACKOVERFLOW’DA ‘BIG DATA’ İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI”. Eskişehir Osmangazi Üniversitesi Mühendislik ve Mimarlık Fakültesi Dergisi 32/1 (April 2024), 1257-1268. https://doi.org/10.31796/ogummf.1375611.
JAMA	Genç A, Yurtseven A, Özyurt H, Özyurt Ö. STACKOVERFLOW’DA "BIG DATA" İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI. ESOGÜ Müh Mim Fak Derg. 2024;32:1257–1268.
MLA	Genç, Adile et al. “STACKOVERFLOW’DA ‘BIG DATA’ İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI”. Eskişehir Osmangazi Üniversitesi Mühendislik Ve Mimarlık Fakültesi Dergisi, vol. 32, no. 1, 2024, pp. 1257-68, doi:10.31796/ogummf.1375611.
Vancouver	Genç A, Yurtseven A, Özyurt H, Özyurt Ö. STACKOVERFLOW’DA "BIG DATA" İLE İLGİLİ GÖNDERİLERİN KONU MODELLEME VE BİRLİKTELİK ANALİZİ İLE ÖZELLİKLERİNİN ÇIKARILMASI. ESOGÜ Müh Mim Fak Derg. 2024;32(1):1257-68.
