Performance Evaluation of Transformer-Based Pre-Trained Language Models for Turkish Question-Answering

Mert İncidelen; Murat Aydoğan

doi:10.34248/bsengineering.1596832

Research Article

Year 2025, Volume: 8 Issue: 2, 323 - 329, 15.03.2025

Abstract

Natural language processing (NLP) has made significant progress with the introduction of Transformer-based architectures, which have revolutionized tasks such as question-answering (QA). While English remains a primary focus of NLP research because of its high-resource datasets, low-resource languages such as Turkish pose distinct challenges, including linguistic complexity and limited data availability. This study evaluates the performance of Transformer-based pre-trained language models on Turkish QA tasks and provides insights into their strengths and limitations to guide future improvements. Variants of the mBERT, BERTurk, ConvBERTurk, DistilBERTurk, and ELECTRA Turkish pre-trained models were fine-tuned on the SQuAD-TR dataset, the machine-translated Turkish version of the SQuAD 2.0 dataset. The fine-tuned models were then tested on the XQuAD-TR dataset and evaluated using the Exact Match (EM) rate and F1 score metrics. Among the tested models, ConvBERTurk Base (cased) performed best, achieving an EM rate of 57.81512% and an F1 score of 71.58769%. In contrast, the DistilBERTurk Base (cased) and ELECTRA TR Small (cased) models performed poorly due to their smaller size and fewer parameters. The results indicate that case-sensitive models generally perform better than case-insensitive models, because they discriminate proper names and abbreviations more effectively. Moreover, models specifically adapted for Turkish performed better on QA tasks than the multilingual mBERT model.
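
The abstract reports results as an Exact Match (EM) rate and an F1 score but does not show how they are computed. The following is a minimal Python sketch of the SQuAD-style definitions commonly used for these metrics, not the authors' evaluation code. The function names, the normalization steps (lowercasing, ASCII punctuation removal, whitespace collapsing), and the max-over-gold-answers aggregation are assumptions made for illustration.

```python
from collections import Counter
import string


def normalize(text: str) -> str:
    """Assumed normalization: lowercase, drop ASCII punctuation, collapse whitespace.

    Caveat: str.lower() does not apply Turkish casing rules for dotted and
    dotless I, so a locale-aware lowercasing step may be preferable in practice.
    """
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    return " ".join(text.split())


def exact_match(prediction: str, reference: str) -> float:
    """1.0 if the normalized strings are identical, otherwise 0.0."""
    return float(normalize(prediction) == normalize(reference))


def f1_score(prediction: str, reference: str) -> float:
    """Token-level F1 between a predicted answer and one gold answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    # Unanswerable questions (SQuAD 2.0 style): an empty prediction only
    # scores when the gold answer is empty as well.
    if not pred_tokens or not ref_tokens:
        return float(pred_tokens == ref_tokens)
    # Multiset intersection counts each shared token at most as often as it
    # appears in both answers.
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def evaluate(predictions, references):
    """Average EM and F1 over all examples, reported as percentages.

    predictions: list of predicted answer strings.
    references: list of lists of gold answers, one list per example.
    Each example takes its best score over the available gold answers.
    """
    n = len(predictions)
    em = sum(max(exact_match(p, r) for r in refs)
             for p, refs in zip(predictions, references))
    f1 = sum(max(f1_score(p, r) for r in refs)
             for p, refs in zip(predictions, references))
    return {"exact_match": 100.0 * em / n, "f1": 100.0 * f1 / n}
```

For example, the prediction "Mustafa Kemal Atatürk" against the gold answer "Kemal Atatürk" is not an exact match, but two of its three tokens overlap, giving a precision of 2/3, a recall of 1, and an F1 of about 0.8. This is why F1 scores are typically higher than EM rates on the same predictions.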

Keywords

Natural language processing, Question-answering, Transformers, BERT, ELECTRA

Ethical Statement

Ethics committee approval was not required for this study because it did not involve animals or humans.

Details

Primary Language	English
Subjects	Information Systems Development Methodologies and Practice
Journal Section	Research Articles
Authors	Mert İncidelen (ORCID: 0009-0002-1975-8332), Murat Aydoğan (ORCID: 0000-0002-6876-6454)
Publication Date	March 15, 2025
Submission Date	December 5, 2024
Acceptance Date	January 15, 2025
Published in Issue	Year 2025 Volume: 8 Issue: 2

Cite

APA	İncidelen, M., & Aydoğan, M. (2025). Performance Evaluation of Transformer-Based Pre-Trained Language Models for Turkish Question-Answering. Black Sea Journal of Engineering and Science, 8(2), 323-329. https://doi.org/10.34248/bsengineering.1596832