Text Mining Method in the Field of Health

Selçuk Toplu; Şengül Cangür

doi:10.18521/ktd.700789

Research Article

Year 2020, Volume 12, Issue 2, 236 - 246, 04.06.2020

Abstract

Objective: Text mining, which converts textual data into numerical form so that data mining algorithms can be applied to it, has an important place in today’s world. The aim of this study was to introduce the text mining method and to demonstrate its application to a selected topic in the field of health.
Methods: In the application stage, documents were retrieved separately from PubMed, the most commonly used database, under two topic headings, “human-and-cancer” and “mouse-and-cancer”. The text mining method was applied with the Knime program, first to each document set individually and then to the merged set, and document classification was performed using the K nearest neighbor (K-NN) algorithm.
Results: The prominent words in the tag cloud graphs were “cell” and “cancer”. Words with high frequency values in both document sets, such as “cell”, “cancer”, “tumor”, and “patient”, also appeared at high rates in the analysis performed after the data were merged. Of the 600 test documents, 255 belonged to the human-and-cancer class and the remaining 345 to the mouse-and-cancer class. According to the F-measure, the correct classification rate was 56.6% for the human-and-cancer documents and 62.6% for the mouse-and-cancer documents. The K-NN algorithm produced a partially successful document classification, at a rate of 59.8%; however, Cohen’s kappa was 19.7%, indicating only slight agreement.
Conclusion: It is recommended that the text mining method be used, and its use made more widespread, in order to obtain information quickly and reliably in the health field, where digital and printed documents are very numerous.
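
The abstract reports only percentages, not the underlying confusion matrix. As a rough sketch of how the F-measure, the overall rate, and Cohen’s kappa relate to one another, the Python snippet below uses a hypothetical 2x2 table. The counts, including the split of 300 test documents per actual class, are an assumption chosen to be consistent with the reported figures; they are not taken from the article, whose actual table may differ.

```python
# Hypothetical 2x2 confusion matrix, keyed by (actual class, predicted class).
# These counts are NOT taken from the article; they are an assumed
# reconstruction chosen to be consistent with the abstract's percentages.
confusion = {
    ("human", "human"): 157,
    ("human", "mouse"): 143,
    ("mouse", "human"): 98,
    ("mouse", "mouse"): 202,
}
classes = ["human", "mouse"]
total = sum(confusion.values())


def actual_count(c):
    """Number of test documents whose true class is c."""
    return sum(confusion[(c, p)] for p in classes)


def predicted_count(c):
    """Number of test documents the classifier assigned to class c."""
    return sum(confusion[(a, c)] for a in classes)


def f_measure(c):
    """F1 for class c: 2 * TP / (actual count + predicted count)."""
    return 2 * confusion[(c, c)] / (actual_count(c) + predicted_count(c))


# Observed agreement: share of documents on the diagonal.
accuracy = sum(confusion[(c, c)] for c in classes) / total

# Agreement expected by chance, from the row and column totals.
expected = sum(actual_count(c) * predicted_count(c) for c in classes) / total**2

# Cohen's kappa rescales accuracy so that chance-level agreement scores 0.
kappa = (accuracy - expected) / (1 - expected)

print(f"predicted human: {predicted_count('human')} of {total}")
print(f"F-measure human: {f_measure('human'):.1%}")
print(f"F-measure mouse: {f_measure('mouse'):.1%}")
print(f"accuracy: {accuracy:.1%}, kappa: {kappa:.1%}")
```

Under these assumed counts, chance agreement is 0.5, so a 59.8% rate of correct classification lies only about 0.098 above chance. That is why kappa stays near 0.20 even though roughly six in ten documents are classified correctly.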

Keywords

Text Mining, Classification, Natural Language Processing, Pubmed
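
The study carried out its analysis in Knime; the snippet below is not that workflow. It is a minimal standard-library Python sketch of the general idea named in the abstract: text is turned into numerical vectors, and a K-NN classifier labels a new document by majority vote among its most similar neighbors. Raw term frequencies, cosine similarity, k = 3, and the miniature training set are all assumptions made for illustration, since the abstract does not state the weighting scheme, the distance measure, or the value of k.

```python
import math
from collections import Counter


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_classify(query, training, k=3):
    """Label `query` by majority vote among its k most similar training documents."""
    q = vectorize(query)
    ranked = sorted(
        training,
        key=lambda item: cosine(q, vectorize(item[0])),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Made-up miniature training set (hypothetical sentences, not PubMed data).
training = [
    ("tumor growth in patient cohort after therapy", "human-and-cancer"),
    ("patient survival and cancer staging in clinic", "human-and-cancer"),
    ("clinical trial of cancer drug in adult patient group", "human-and-cancer"),
    ("tumor cell xenograft in mouse model", "mouse-and-cancer"),
    ("knockout mouse shows reduced cancer cell growth", "mouse-and-cancer"),
    ("mouse model of tumor cell migration", "mouse-and-cancer"),
]

label = knn_classify("cancer cell line injected into mouse", training, k=3)
print(label)  # prints "mouse-and-cancer"
```

Words such as “cell”, “cancer”, and “tumor” occur in both classes of this toy set, much as the abstract reports for the real corpora. Shared high-frequency vocabulary of this kind gives a similarity-based classifier little to separate the classes with.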

There are 27 citations in total.

Details

Primary Language	English
Subjects	Health Care Administration
Journal Section	Articles
Authors	Selçuk Toplu 0000-0003-0446-0226; Şengül Cangür 0000-0002-0732-8952
Publication Date	June 4, 2020
Acceptance Date	April 3, 2020
Published in Issue	Year 2020, Volume 12, Issue 2

Cite

APA	Toplu, S., & Cangür, Ş. (2020). Text Mining Method in the Field of Health. Konuralp Medical Journal, 12(2), 236-246. https://doi.org/10.18521/ktd.700789
AMA	Toplu S, Cangür Ş. Text Mining Method in the Field of Health. Konuralp Medical Journal. June 2020;12(2):236-246. doi:10.18521/ktd.700789
Chicago	Toplu, Selçuk, and Şengül Cangür. “Text Mining Method in the Field of Health”. Konuralp Medical Journal 12, no. 2 (June 2020): 236-46. https://doi.org/10.18521/ktd.700789.
EndNote	Toplu S, Cangür Ş (June 1, 2020) Text Mining Method in the Field of Health. Konuralp Medical Journal 12(2): 236–246.
IEEE	S. Toplu and Ş. Cangür, “Text Mining Method in the Field of Health”, Konuralp Medical Journal, vol. 12, no. 2, pp. 236–246, 2020, doi: 10.18521/ktd.700789.
ISNAD	Toplu, Selçuk - Cangür, Şengül. “Text Mining Method in the Field of Health”. Konuralp Medical Journal 12/2 (June 2020), 236-246. https://doi.org/10.18521/ktd.700789.
JAMA	Toplu S, Cangür Ş. Text Mining Method in the Field of Health. Konuralp Medical Journal. 2020;12(2):236–246.
MLA	Toplu, Selçuk and Şengül Cangür. “Text Mining Method in the Field of Health”. Konuralp Medical Journal, vol. 12, no. 2, 2020, pp. 236-46, doi:10.18521/ktd.700789.
Vancouver	Toplu S, Cangür Ş. Text Mining Method in the Field of Health. Konuralp Medical Journal. 2020;12(2):236-46.
