Research Article
Year 2024, Volume 37, Issue 3, 323-326, 30.10.2024
https://doi.org/10.5472/marumj.1571218

ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability


Abstract

Objective: Artificial intelligence-based chatbots are widely used by patients, and this study aims to determine their utility and limitations for questions about strabismus. Answers to common questions about the management of strabismus provided by Chat Generative Pre-trained Transformer (ChatGPT)-3.5, an artificial intelligence-powered chatbot, were compared with the answers of a strabismus specialist (The Specialist) in terms of appropriateness and readability.
Patients and Methods: In this descriptive, cross-sectional study, a list of questions commonly asked by strabismus patients or their caregivers in outpatient clinics, covering treatment, prognosis, postoperative care, and complications, was posed to both ChatGPT and The Specialist. The answers of ChatGPT were classified as appropriate or inappropriate, with the answers of The Specialist taken as the reference. The readability of all answers was assessed using the parameters of the Readable online toolkit.
Results: All answers provided by ChatGPT were classified as appropriate. The mean Flesch-Kincaid Grade Levels of the answers given by ChatGPT and The Specialist were 13.75±1.55 and 10.17±2.17, respectively (p<0.001), with higher levels indicating greater complexity; the mean Flesch Reading Ease Scores, for which higher scores indicate easier reading, were 23.86±9.38 and 44.54±14.66 (p=0.002). The mean reading times were 15.6±2.85 and 10.17±2.17 seconds for ChatGPT and The Specialist, respectively (p=0.003). The overall reach of the answers by ChatGPT and The Specialist was 56.87±11.67 and 81.67±12.80 (p<0.001).
Conclusion: Although ChatGPT provided appropriate answers to all of the compiled strabismus questions, the answers were complex or very difficult for an average person to read. The readability scores indicated that a college-level education would be required to understand the answers provided by ChatGPT, whereas The Specialist conveyed similar information in a more readable form. Physicians and patients should therefore consider the limitations of such platforms for ocular health-related questions.
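For context, the two Flesch metrics compared in the abstract are simple functions of average sentence length and average syllables per word. The following is a minimal Python sketch of how they are computed — not the Readable toolkit used in the study, and with a rough heuristic syllable counter rather than a dictionary-based one:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, with a silent-e adjustment."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # drop a likely silent trailing 'e'
    return max(n, 1)

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level) for a text."""
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / sentences
    syllables_per_word = syllables / len(words)
    # Flesch Reading Ease: higher = easier (Flesch, 1948)
    fres = 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
    # Flesch-Kincaid Grade Level: higher = more complex (Kincaid et al., 1975)
    fkgl = 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59
    return fres, fkgl
```

A FKGL around 13-14, as reported for ChatGPT's answers, corresponds to college-level reading; a FRES in the 20s falls in the "very difficult" band of Flesch's scale.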

References

  • Korngiebel DM, Mooney SD. Considering the possibilities and pitfalls of Generative Pre-trained Transformer 3 (GPT-3) in healthcare delivery. NPJ Digit Med 2021; 4: 93. doi:10.1038/s41746-021-00464-x.
  • Momenaei B, Wakabayashi T, Shahlaee A, et al. Appropriateness and readability of ChatGPT-4-generated responses for surgical treatment of retinal diseases. Ophthalmol Retina 2023; 7: 862-8. doi:10.1016/j.oret.2023.05.022.
  • Sarraju A, Bruemmer D, Van Iterson E, et al. Appropriateness of cardiovascular disease prevention recommendations obtained from a popular online chat-based artificial intelligence model. JAMA 2023; 329: 842-4. doi:10.1001/jama.2023.1044.
  • OpenAI. ChatGPT. Computer software. 2022. https://openai.com/blog/ChatGPT. Accessed on 03 December, 2023.
  • Teebagy S, Colwell L, Wood E, et al. Improved performance of ChatGPT-4 on the OKAP examination: A comparative study with ChatGPT-3.5. J Acad Ophthalmol (2017) 2023; 15: e184-e187. doi:10.1055/s-0043-1774399.
  • Fitzsimmons PR, Michael BD, Hulley JL, et al. A readability assessment of online Parkinson's disease information. J R Coll Physicians Edinb 2010; 40: 292-6. doi:10.4997/JRCPE.2010.401.
  • Kloosterboer A, Yannuzzi NA, Patel NA, et al. Assessment of the quality, content, and readability of freely available online information for patients regarding diabetic retinopathy. JAMA Ophthalmol 2019; 137: 1240-5. doi:10.1001/jamaophthalmol.2019.3116.
  • Patel AJ, Kloosterboer A, Yannuzzi NA, et al. Evaluation of the content, quality, and readability of patient accessible online resources regarding cataracts. Semin Ophthalmol 2021; 36: 384-91. doi:10.1080/08820538.2021.1893758.
  • AddedBytes. Readable. Computer software. 2011-2023.
  • Flesch R. A new readability yardstick. J Appl Psychol 1948; 32: 221-33. doi:10.1037/h0057532.
  • Kincaid P, Fishburne RP, Rogers RL, Chissom BS. Derivation of New Readability Formulas (Automated Readability Index, Fog Count and Flesch Reading Ease Formula) for Navy Enlisted Personnel. 1975. Institute for Simulation and Training. 56. https://stars.library.ucf.edu/istlibrary/56 Accessed on 10 January, 2024.
  • Jindal P, MacDermid JC. Assessing reading levels of health information: uses and limitations of Flesch formula. Educ Health (Abingdon) 2017; 30: 84-8. doi:10.4103/1357-6283.210517.
  • McLaughlin GH. SMOG grading: A new readability formula. J Read 1969; 12: 639-46.
  • Nath S, Marie A, Ellershaw S, et al. New meaning for NLP: the trials and tribulations of natural language processing with GPT-3 in ophthalmology. Br J Ophthalmol 2022; 106: 889-92. doi:10.1136/bjophthalmol-2022-321141.
  • Kung TH, Cheatham M, Medenilla A, et al. Performance of ChatGPT on USMLE: Potential for AI-assisted medical education using large language models. PLOS Digit Health 2023; 2: e0000198. doi:10.1371/journal.pdig.0000198.
  • Flesch RF. Art of readable writing. Pennsylvania: The Haddon Craftsmen, 1949.
  • Hamat A, Jaludin A, Mohd-Dom TN, et al. Diabetes in the news: readability analysis of Malaysian diabetes corpus. Int J Environ Res Public Health 2022; 19: 6802. doi:10.3390/ijerph19116802.
There are 17 references in total.

Details

Primary Language English
Subjects Surgery (Other)
Section Original Research
Authors

Didem Dizdar Yigit 0000-0001-7309-3293

Aslan Aykut 0000-0001-5426-1992

Mehmet Orkun Sevik 0000-0001-7130-4798

Eren Çerman 0000-0002-8681-9214

Publication Date October 30, 2024
Submission Date April 20, 2024
Acceptance Date May 22, 2024
Published in Issue Year 2024

Cite

APA Dizdar Yigit, D., Aykut, A., Sevik, M. O., & Çerman, E. (2024). ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability. Marmara Medical Journal, 37(3), 323-326. https://doi.org/10.5472/marumj.1571218
AMA Dizdar Yigit D, Aykut A, Sevik MO, Çerman E. ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability. Marmara Med J. October 2024;37(3):323-326. doi:10.5472/marumj.1571218
Chicago Dizdar Yigit, Didem, Aslan Aykut, Mehmet Orkun Sevik, and Eren Çerman. “ChatGPT Versus Strabismus Specialist on Common Questions about Strabismus Management: A Comparative Analysis of Appropriateness and Readability”. Marmara Medical Journal 37, no. 3 (October 2024): 323-26. https://doi.org/10.5472/marumj.1571218.
EndNote Dizdar Yigit D, Aykut A, Sevik MO, Çerman E (01 October 2024) ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability. Marmara Medical Journal 37 3 323–326.
IEEE D. Dizdar Yigit, A. Aykut, M. O. Sevik, and E. Çerman, “ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability”, Marmara Med J, vol. 37, no. 3, pp. 323–326, 2024, doi: 10.5472/marumj.1571218.
ISNAD Dizdar Yigit, Didem et al. “ChatGPT Versus Strabismus Specialist on Common Questions about Strabismus Management: A Comparative Analysis of Appropriateness and Readability”. Marmara Medical Journal 37/3 (October 2024), 323-326. https://doi.org/10.5472/marumj.1571218.
JAMA Dizdar Yigit D, Aykut A, Sevik MO, Çerman E. ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability. Marmara Med J. 2024;37:323–326.
MLA Dizdar Yigit, Didem, et al. “ChatGPT Versus Strabismus Specialist on Common Questions about Strabismus Management: A Comparative Analysis of Appropriateness and Readability”. Marmara Medical Journal, vol. 37, no. 3, 2024, pp. 323-6, doi:10.5472/marumj.1571218.
Vancouver Dizdar Yigit D, Aykut A, Sevik MO, Çerman E. ChatGPT versus strabismus specialist on common questions about strabismus management: a comparative analysis of appropriateness and readability. Marmara Med J. 2024;37(3):323-6.