Research Article

COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS

Year 2023, Volume: 25 Issue: 3, 482 - 521, 26.12.2023
https://doi.org/10.24938/kutfd.1369468

Abstract

Objective: Because they are publicly available, easy to use, and continuously evolving, next-generation chatbots have the potential to be used in triage, one of the most critical functions of an Emergency Department. This study aimed to assess the performance of Generative Pre-trained Transformer 4 (GPT-4), Bard, and Claude in Emergency Department triage decision-making.
Material and Methods: This was a preliminary cross-sectional study conducted with 50 case scenarios. Emergency Medicine specialists determined the reference Emergency Severity Index triage category of each scenario. Each case scenario was then submitted to the three chatbots. Chatbot classifications that disagreed with the reference were defined as over-triage (false positive) or under-triage (false negative). The primary outcome was the predictive performance of the chatbots, and the secondary outcome was the difference between them in predicting high acuity triage.
Results: The F1 scores of GPT-4, Bard, and Claude for predicting Emergency Severity Index 1 and 2 were 0.899, 0.791, and 0.865, respectively. For high acuity predictions, the receiver operating characteristic (ROC) curve of GPT-4 showed an area under the curve (AUC) of 0.911 (95% CI: 0.814-1; p<0.001), while Bard showed an AUC of 0.819 (95% CI: 0.692-0.945; p<0.001) and Claude an AUC of 0.881 (95% CI: 0.768-0.994; p<0.001).
Conclusion: GPT-4, in its current form, was able to detect high acuity Emergency Severity Index scores in our case set and had close agreement with Emergency Medicine specialists, followed by Claude, while Bard's agreement was relatively lower. GPT-4 and Claude provided better results than Bard in case management recommendations. We believe that studies evaluating the effectiveness and limitations of chatbots in triage are important because of their future potential.
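
As an illustration of how an F1 score for high acuity triage can be derived from over-triage and under-triage counts, the following is a minimal sketch and not the authors' analysis code. It assumes that "high acuity" means Emergency Severity Index 1 or 2, that a false positive is a scenario the chatbot rated high acuity while the reference did not, and that a false negative is the reverse. The function names and the example ESI levels are hypothetical and are not data from the study.

```python
from typing import Dict, Sequence

# Assumption: high acuity corresponds to ESI levels 1 and 2.
HIGH_ACUITY = {1, 2}


def high_acuity_counts(reference: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count agreement and disagreement on the binary high-acuity label."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    counts = {"tp": 0, "fp": 0, "fn": 0, "tn": 0}
    for ref, pred in zip(reference, predicted):
        ref_high = ref in HIGH_ACUITY
        pred_high = pred in HIGH_ACUITY
        if ref_high and pred_high:
            counts["tp"] += 1
        elif pred_high:
            counts["fp"] += 1  # over-triage at the high-acuity threshold
        elif ref_high:
            counts["fn"] += 1  # under-triage at the high-acuity threshold
        else:
            counts["tn"] += 1
    return counts


def f1_score(counts: Dict[str, int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denominator = 2 * counts["tp"] + counts["fp"] + counts["fn"]
    return 2 * counts["tp"] / denominator if denominator else 0.0


# Hypothetical ESI levels for eight scenarios (illustrative only).
reference_esi = [1, 2, 3, 4, 2, 5, 3, 1]
chatbot_esi = [1, 3, 2, 4, 2, 5, 3, 2]

counts = high_acuity_counts(reference_esi, chatbot_esi)
print(counts)                      # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
print(f"{f1_score(counts):.3f}")   # 0.750
```

Because F1 ignores true negatives, it summarizes how well high acuity cases are detected without being inflated by the easier low acuity scenarios.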

Ethical Statement

Institutional review board approval was obtained for this study on 06.04.2023 (Kocaeli University Non-Interventional Clinical Research Ethics Committee - GOKAEK-2023/07.10).

Thanks

The authors would like to thank Prof. Elif Yaka for her valuable insights.

References

  • Lee P, Bubeck S, Petro J. Benefits, limits, and risks of GPT-4 as an AI chatbot for medicine. New England Journal of Medicine. 2023;388(13):1233-9.
  • OpenAI. GPT-4 technical report. ArXiv. Accessed date: September 29, 2023: https://arxiv.org/abs/2303.08774.
  • Katz DM, Bommarito MJ, Gao S, Arredondo P. GPT-4 passes the bar exam. SSRN Electronic Journal. Published online 2023.
  • Google. Bard FAQ. Accessed date: April 21, 2023: https://bard.google.com/faq?hl=en
  • Anthropic. Introducing Claude. Accessed date: April 21, 2023: https://www.anthropic.com/index/introducing-claude
  • Kuriyama A, Urushidani S, Nakayama T. Five-level emergency triage systems: Variation in assessment of validity. Emergency Medicine Journal. 2017;34(11):703-10.
  • McHugh M, Tanabe P, McClelland M, Khare RK. More patients are triaged using the emergency severity index than any other triage acuity system in the United States. Academic Emergency Medicine. 2012;19(1):106-9.
  • Gilboy N, Tanabe P, Travers D, Rosenau A, Eitel D. Emergency Severity Index, Version 4: Implementation Handbook. 2005. Accessed date: September 29, 2023: https://www.sgnor.ch/fileadmin/user_upload/Dokumente/Downloads/Esi_Handbook.pdf.
  • Sánchez-Salmerón R, Gómez-Urquiza JL, Albendín-García L, Correa-Rodríguez M, Martos-Cabrera MB, Velando-Soriano A et al. Machine learning methods applied to triage in emergency services: A systematic review. Int Emerg Nurs. 2022;60:101109.
  • Greenbaum NR, Jernite Y, Halpern Y, Calder S, Nathanson LA, Sontag DA et al. Improving documentation of presenting problems in the emergency department using a domain-specific ontology and machine learning-driven user interfaces. Int J Med Inform. 2019;132:103981.
  • Sterling NW, Patzer RE, Di M, Schrager JD. Prediction of emergency department patient disposition based on natural language processing of triage notes. Int J Med Inform. 2019;129:184-8.
  • Sterling NW, Brann F, Patzer RE, Di M, Koebbe M, Burke M et al. Prediction of emergency department resource requirements during triage: An application of current natural language processing techniques. J Am Coll Emerg Physicians Open. 2020;1(6):1676-83.
  • Tootooni MS, Pasupathy KS, Heaton HA, Clements CM, Sir MY. CCMapper: An adaptive NLP-based free-text chief complaint mapping algorithm. Comput Biol Med. 2019;113:103398.
  • Stewart J, Lu J, Goudie A, Arendts G, Meka SA, Freeman S et al. Applications of natural language processing at emergency department triage: A systematic review. MedRxiv. Published online December 21, 2022. Accessed date: April 21, 2023: https://doi.org/10.1101/2022.12.20.22283735.
  • Ivanov O, Wolf L, Brecher D, Lewis E, Masek K, Montgomery K et al. Improving ED emergency severity index acuity assignment using machine learning and clinical natural language processing. J Emerg Nurs. 2021;47(2):265-278.e7.
  • Ganjali R, Golmakani R, Ebrahimi M, Eslami S, Bolvardi E. Accuracy of the emergency department triage system using the emergency severity index for predicting patient outcome: A single center experience. Bull Emerg Trauma. 2020;8(2):115-20.
  • Chang D, Hong WS, Taylor RA. Generating contextual embeddings for emergency department chief complaints. JAMIA Open. 2020;3(2):160-6.
  • Arora A, Arora A. The promise of large language models in health care. The Lancet. 2023;401(10377):641.
  • Iftikhar L, Iftikhar MF, I Hanif M. DocGPT: Impact of ChatGPT-3 on health services as a virtual doctor. EC Paediatrics. 2023;12(3):45-55. Accessed date: April 21, 2023: https://ecronicon.org/assets/ecpe/pdf/ECPE-12-01277.pdf
  • Chen W, Linthicum B, Argon NT, Bohrmann T, Lopiano K, Mehrotra A et al. The effects of emergency department crowding on triage and hospital admission decisions. Am J Emerg Med. 2020;38(4):774-9.
  • Rashid K, Ullah M, Ahmed ST, Sajid MZ, Hayat MA, Nawaz B et al. Accuracy of emergency room triage using emergency severity index (ESI): Independent predictor of under and over triage. Cureus. 2021;13(12):e20229.
  • Takaoka K, Ooya K, Ono M, Kakeda T. Utility of the emergency severity index by accuracy of interrater agreement by expert triage nurses in a simulated scenario in Japan: A randomized controlled trial. J Emerg Nurs. 2021;47(4):669-74.
  • Wang G, Liu X, Xie K, Chen N, Chen T. DeepTriager: A neural attention model for emergency triage with electronic health records. In: 2019 IEEE International Conference on Bioinformatics and Biomedicine (BIBM). IEEE; 2019:978-82.
  • Tahayori B, Chini‐Foroush N, Akhlaghi H. Advanced natural language processing technique to predict patient disposition based on emergency triage notes. Emergency Medicine Australasia. 2021;33(3):480-4.
  • Passi S, Vorvoreanu M. Overreliance on AI: Literature review. 2022. Accessed date: April 21, 2023: https://www.microsoft.com/en-us/research/uploads/prod/2022/06/Aether-Overreliance-on-AI-Review-Final-6.21.22.pdf

There are 25 citations in total.

Details

Primary Language: English
Subjects: Health Services and Systems (Other)
Journal Section: Original Research
Authors

İbrahim Sarbay 0000-0001-8804-2501

Göksu Bozdereli Berikol 0000-0002-4529-3578

İbrahim Ulaş Özturan 0000-0002-1364-5292

Keith Grimes 0000-0002-4906-6612

Publication Date: December 26, 2023
Submission Date: October 1, 2023
Published in Issue: Year 2023, Volume: 25, Issue: 3

Cite

APA Sarbay, İ., Bozdereli Berikol, G., Özturan, İ. U., Grimes, K. (2023). COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. The Journal of Kırıkkale University Faculty of Medicine, 25(3), 482-521. https://doi.org/10.24938/kutfd.1369468
AMA Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K. COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. Kırıkkale Uni Med J. December 2023;25(3):482-521. doi:10.24938/kutfd.1369468
Chicago Sarbay, İbrahim, Göksu Bozdereli Berikol, İbrahim Ulaş Özturan, and Keith Grimes. “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”. The Journal of Kırıkkale University Faculty of Medicine 25, no. 3 (December 2023): 482-521. https://doi.org/10.24938/kutfd.1369468.
EndNote Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K (December 1, 2023) COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. The Journal of Kırıkkale University Faculty of Medicine 25 3 482–521.
IEEE İ. Sarbay, G. Bozdereli Berikol, İ. U. Özturan, and K. Grimes, “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”, Kırıkkale Uni Med J, vol. 25, no. 3, pp. 482–521, 2023, doi: 10.24938/kutfd.1369468.
ISNAD Sarbay, İbrahim et al. “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”. The Journal of Kırıkkale University Faculty of Medicine 25/3 (December 2023), 482-521. https://doi.org/10.24938/kutfd.1369468.
JAMA Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K. COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. Kırıkkale Uni Med J. 2023;25:482–521.
MLA Sarbay, İbrahim et al. “COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS”. The Journal of Kırıkkale University Faculty of Medicine, vol. 25, no. 3, 2023, pp. 482-21, doi:10.24938/kutfd.1369468.
Vancouver Sarbay İ, Bozdereli Berikol G, Özturan İU, Grimes K. COMPARISON OF PERFORMANCES OF OPEN ACCESS NATURAL LANGUAGE PROCESSING BASED CHATBOT APPLICATIONS IN TRIAGE DECISIONS. Kırıkkale Uni Med J. 2023;25(3):482-521.

This Journal is a Publication of Kırıkkale University Faculty of Medicine.