Research Article

Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP

Year 2025, Volume 37, Issue 1, 83-92, 27.03.2025
https://doi.org/10.35234/fumbd.1469178

Abstract

Cognitive distortions are errors in thinking that lead individuals to perceive reality in a misleading way, and they are strongly associated with psychopathologies. Accurately identifying and classifying these distortions can therefore improve the effectiveness of cognitive behavioral therapy (CBT). This study examines the effectiveness of deep learning and NLP techniques for the automatic detection of cognitive distortions. A RoBERTa model was trained on English synthetic data generated with GPT-4 (2000 examples) and on the dataset of Shreevastava and Foltz (1590 examples containing a cognitive distortion and 933 examples containing none). Three scenarios were tested: the original dataset, the synthetic dataset, and the combination of the two. The results indicate that synthetic data is a strong resource. Accuracy was 60.67% (original), 94.51% (synthetic), and 77.18% (combined). The GPT-4-based dataset yielded nearly perfect F1 scores in some categories. ROC curve analyses showed that the GPT-4 dataset had the highest AUC value (0.80). The study shows that synthetic data expands the potential of artificial intelligence applications in clinical psychology and offers a way to develop effective models while protecting patient privacy. For future research, testing synthetic data with different models and comparing it against real clinical data is recommended.
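The accuracy and F1 figures reported in the abstract follow standard definitions. As a minimal sketch, assuming single-label predictions, the Python below computes accuracy and per-class F1 from lists of true and predicted labels. The label names and the six toy examples are hypothetical and are not taken from either dataset; the functions are illustrative helpers, not the authors' evaluation code.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of examples whose predicted label equals the true label."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal in length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def per_class_f1(y_true, y_pred):
    """Per-class F1, computed as 2*TP / (2*TP + FP + FN) for each label."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted label p was wrong
            fn[t] += 1  # true label t was missed
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


# Hypothetical toy labels, for illustration only.
y_true = ["mind_reading", "labeling", "no_distortion",
          "labeling", "no_distortion", "mind_reading"]
y_pred = ["mind_reading", "labeling", "no_distortion",
          "no_distortion", "no_distortion", "labeling"]

acc = accuracy(y_true, y_pred)
f1 = per_class_f1(y_true, y_pred)
macro_f1 = sum(f1.values()) / len(f1)  # unweighted mean over classes

print(f"accuracy = {acc:.4f}")
for label, score in f1.items():
    print(f"F1[{label}] = {score:.4f}")
print(f"macro F1 = {macro_f1:.4f}")
```

Per-class F1 is worth reporting alongside accuracy here because the original dataset is imbalanced (1590 distorted versus 933 non-distorted examples), so a single accuracy figure can hide weak performance on individual categories.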

References

  • Beck AT. Cognitive therapy and emotional disorders. New York: New American Library; 1976.
  • Ellis A. Reason and emotion in psychotherapy. New York: Lyle Stuart; 1962.
  • Burns DD. Feeling good: The new mood therapy. New York: New American Library; 1980.
  • Beck AT. Cognitive therapy: Nature and relation to behavior therapy. Behav Ther. 1970;1(2):184-200.
  • Rnic K, Dozois DJ, Martin RA. Cognitive distortions, humor styles, and depression. Eur J Psychol. 2016;12(3):348-62.
  • Marton P, Kutcher S. The prevalence of cognitive distortion in depressed adolescents. J Psychiatry Neurosci. 1995;20(1):33.
  • Attia E, Schroeder L. Pharmacologic treatment of anorexia nervosa: where do we go from here? Int J Eat Disord. 2005;37(S1):S60-S63.
  • Altuntaş Y, Söyler HÇ, Kula H. Kumar bağımlılarıyla sağlıklı kontrollerin bilişsel çarpıtmaları, psikopatolojileri ve aile ilişkilerinin karşılaştırılması. Sosyal Beşeri ve İdari Bilimler Dergisi. 2023;6(1):68-84.
  • Buga A, Kaya İ. The role of cognitive distortions related to academic achievement in predicting the depression, stress, and anxiety levels of adolescents. Int J Contemp Educ Res. 2022;9(1):103-14.
  • Zhou M, Duan N, Liu S, Shum H. Progress in neural NLP: Modeling, learning, and reasoning. Engineering. 2020.
  • Wang Y. Basic methodologies used in NLP area. 2020 IEEE 3rd International Conference on Automation, Electronics and Electrical Engineering (AUTEEE). 2020:505-11.
  • Tenney I, Das D, Pavlick E. BERT rediscovers the classical NLP pipeline. 2019:4593-601. https://doi.org/10.18653/v1/P19-1452.
  • Wallace E, Gardner M, Singh S. Interpreting predictions of NLP models. 2020:20-23. https://doi.org/10.18653/v1/2020.emnlp-tutorials.3.
  • Harrigian K, Aguirre C, Dredze M. On the state of social media data for mental health research. In: 7th Workshop on Computational Linguistics and Clinical Psychology: Improving Access, CLPsych 2021; 2021:15-24.
  • Shickel B, Siegel S, Heesacker M, Benton S, Rashidi P. Automatic detection and classification of cognitive distortions in mental health text. 2020 IEEE 20th International Conference on Bioinformatics and Bioengineering (BIBE). 2020:275-80.
  • Kaggle. Therapist QA dataset [Internet]. Available from: https://www.kaggle.com/datasets/arnmaud/therapist-qa. Accessed 30 Oct 2024.
  • Devlin J, Chang MW, Lee K, Toutanova K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.
  • Shreevastava S, Foltz PW. Detecting cognitive distortions from patient-therapist interactions. NAACL HLT. 2021;151.
  • Kaggle. Cognitive Distortion Detection Dataset [Internet]. Available from: https://www.kaggle.com/datasets/sagarikashreevastava/cognitive-distortion-detetction-dataset?select=Annotated_data.csv. Accessed 30 Oct 2024.
  • OpenAI. GPT-4 technical report. arXiv preprint arXiv:2303.08774.
  • Brown T, et al. Language models are few-shot learners. Adv Neural Inf Process Syst. 2020;33:1877-901.
  • Gao L, Zhang L, Zhang L, Huang J. RSVN: A RoBERTa sentence vector normalization scheme for short texts to extract semantic information. Appl Sci. 2022;12(21):11278.
  • ChatGPT. Shared conversation [Internet]. Available from: https://chatgpt.com/share/e/e7f0600a-68c3-482d-8647-a40cc4b8ee7a. Accessed 30 Oct 2024.
  • HuggingFace. Dataset-1 [Internet]. Available from: https://doi.org/10.57967/hf/2858. Accessed 30 Oct 2024.
  • HuggingFace. Dataset-2 [Internet]. Available from: https://doi.org/10.57967/hf/2857. Accessed 30 Oct 2024.
  • Liu Y, Ott M, Goyal N, Du J, Joshi M, Chen D, et al. RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  • Chung J, Gulcehre C, Cho K, Bengio Y. Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555.
  • Zhang X, Zhao J, LeCun Y. A novel RoBERTa-GRU model for sentiment analysis. Appl Sci. 2023;13(6):3915.
  • HuggingFace. Model-1 [Internet]. Available from: https://doi.org/10.57967/hf/2859. Accessed 30 Oct 2024.
  • HuggingFace. Model-2 [Internet]. Available from: https://doi.org/10.57967/hf/2832. Accessed 30 Oct 2024.

There are 30 references in total.

Details

Primary Language: English
Subjects: Natural Language Processing
Section: MBD
Authors

Hakkı Halil Babacan 0000-0001-9609-5128

Ramazan Oğuz 0000-0002-7297-4141

Yahya Kemal Beyitoğlu 0000-0001-6421-8939

Publication Date: 27 March 2025
Submission Date: 16 April 2024
Acceptance Date: 16 September 2024
Published in Issue: Year 2025

Cite

APA: Babacan, H. H., Oğuz, R., & Beyitoğlu, Y. K. (2025). Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP. Fırat Üniversitesi Mühendislik Bilimleri Dergisi, 37(1), 83-92. https://doi.org/10.35234/fumbd.1469178
AMA: Babacan HH, Oğuz R, Beyitoğlu YK. Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP. Fırat Üniversitesi Mühendislik Bilimleri Dergisi. March 2025;37(1):83-92. doi:10.35234/fumbd.1469178
Chicago: Babacan, Hakkı Halil, Ramazan Oğuz, and Yahya Kemal Beyitoğlu. “Creating a Clinical Psychology Dataset With Synthetic Data: Automatic Detection of Cognitive Distortions Classified With NLP”. Fırat Üniversitesi Mühendislik Bilimleri Dergisi 37, no. 1 (March 2025): 83-92. https://doi.org/10.35234/fumbd.1469178.
EndNote: Babacan HH, Oğuz R, Beyitoğlu YK (01 March 2025) Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP. Fırat Üniversitesi Mühendislik Bilimleri Dergisi 37 1 83–92.
IEEE: H. H. Babacan, R. Oğuz, and Y. K. Beyitoğlu, “Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP”, Fırat Üniversitesi Mühendislik Bilimleri Dergisi, vol. 37, no. 1, pp. 83–92, 2025, doi: 10.35234/fumbd.1469178.
ISNAD: Babacan, Hakkı Halil et al. “Creating a Clinical Psychology Dataset With Synthetic Data: Automatic Detection of Cognitive Distortions Classified With NLP”. Fırat Üniversitesi Mühendislik Bilimleri Dergisi 37/1 (March 2025), 83-92. https://doi.org/10.35234/fumbd.1469178.
JAMA: Babacan HH, Oğuz R, Beyitoğlu YK. Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP. Fırat Üniversitesi Mühendislik Bilimleri Dergisi. 2025;37:83–92.
MLA: Babacan, Hakkı Halil et al. “Creating a Clinical Psychology Dataset With Synthetic Data: Automatic Detection of Cognitive Distortions Classified With NLP”. Fırat Üniversitesi Mühendislik Bilimleri Dergisi, vol. 37, no. 1, 2025, pp. 83-92, doi:10.35234/fumbd.1469178.
Vancouver: Babacan HH, Oğuz R, Beyitoğlu YK. Creating a Clinical Psychology Dataset with Synthetic Data: Automatic Detection of Cognitive Distortions Classified with NLP. Fırat Üniversitesi Mühendislik Bilimleri Dergisi. 2025;37(1):83-92.