Research Article

MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI

Year 2020, Volume: 6 Issue: 2, 275 - 286, 29.08.2020

Abstract

Objective: This study aims to compare the success of machine learning techniques in correctly diagnosing diseases, with a view to ensuring effectiveness in health management.
Data Set and Method: The study uses a publicly available data set of 15 variables for 390 patients, collected by Vanderbilt University to understand the prevalence of risk factors for various diseases. To train and test the models, the data set was split into a training set (70%) and a test set (30%). Classification performance was compared across the Random Forest (RF), K-Nearest Neighbour (KNN) and AdaBoost algorithms.
Results: The RF and KNN algorithms each achieved a classification success rate of 92.30%, while the AdaBoost algorithm achieved 90.59%.
Conclusion: The use of artificial intelligence and machine learning methods in health management and health services is growing steadily. In this study, the algorithms used for disease diagnosis achieved correct classification rates above 90%. This suggests that machine learning techniques can be used to reduce human error in diagnosis and treatment and to support medical decision making.

DISEASE CLASSIFICATION BY MACHINE LEARNING TECHNIQUES: RANDOM FOREST, K-NEAREST NEIGHBOR AND ADABOOST ALGORITHMS APPLICATIONS

Abstract

Objective: This study compares the correct classification rates of different machine learning algorithms in disease detection, with the aim of supporting accurate diagnosis and effective health management.
Data Set and Method: We used data from a study conducted by Vanderbilt University to understand the prevalence of risk factors for various diseases. The open-access data set from that study covers 390 patients and 15 variables. To train and test the models, the data set was split into a training set (70%) and a test set (30%). Classification performance was then compared across the Random Forest (RF), K-Nearest Neighbor (KNN) and AdaBoost algorithms.
Results: The RF and KNN algorithms both reached a correct classification rate of 92.30%, and the AdaBoost algorithm reached 90.59%.
Conclusion: Artificial intelligence and machine learning methods are increasingly used in health management and health services. In our study, all three algorithms achieved a correct classification rate above 90% on the test set. This indicates that machine learning techniques can help reduce human error in diagnosis and treatment and support medical decision making.
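
The evaluation protocol summarized in the abstract (a 70/30 split followed by a comparison of RF, KNN and AdaBoost) could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' code: the abstract does not name a software library, so scikit-learn is assumed; the data here is synthetic (`make_classification`) and only mimics the shape of the real data set, treating one of the 15 variables as the class label; the stratified split, the feature scaling before KNN, and all hyperparameters are illustrative choices. The accuracies it prints will therefore not match the reported figures.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the real data set.
# 390 rows; 14 predictors plus 1 label approximates "15 variables".
X, y = make_classification(
    n_samples=390, n_features=14, n_informative=6, random_state=0
)

# 70% training / 30% test, as stated in the abstract.
# ASSUMPTION: stratifying on the label is an illustrative choice.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.30, stratify=y, random_state=0
)

# ASSUMPTION: hyperparameters below are illustrative, not from the paper.
models = {
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
    # KNN is distance-based, so features are standardized first.
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "AdaBoost": AdaBoostClassifier(n_estimators=50, random_state=0),
}

# Fit each model on the training set and score it on the held-out test set.
accuracies = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    accuracies[name] = accuracy_score(y_test, model.predict(X_test))

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2%}")
```

With 390 records, a 30% test split in this sketch holds out 117 records and trains on 273, so each misclassified test record moves the accuracy by a little under one percentage point. Differences of a point or two between algorithms on a test set of this size should be read with that granularity in mind.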

There are 41 citations in total.

Details

Primary Language: Turkish
Journal Section: Articles
Authors:

Ülkü Veranyurt (ORCID: 0000-0003-4838-3373)

Ahmet Deveci

M. Fevzi Esen (ORCID: 0000-0002-3044-8397)

Ozan Veranyurt (ORCID: 0000-0003-3652-2356)

Publication Date: August 29, 2020
Acceptance Date: August 12, 2020
Published in Issue: Year 2020, Volume: 6, Issue: 2

Cite

APA Veranyurt, Ü., Deveci, A., Esen, M. F., Veranyurt, O. (2020). MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI. Uluslararası Sağlık Yönetimi Ve Stratejileri Araştırma Dergisi, 6(2), 275-286.
AMA Veranyurt Ü, Deveci A, Esen MF, Veranyurt O. MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI. USAYSAD. August 2020;6(2):275-286.
Chicago Veranyurt, Ülkü, Ahmet Deveci, M. Fevzi Esen, and Ozan Veranyurt. “MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI”. Uluslararası Sağlık Yönetimi Ve Stratejileri Araştırma Dergisi 6, no. 2 (August 2020): 275-86.
EndNote Veranyurt Ü, Deveci A, Esen MF, Veranyurt O (August 1, 2020) MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI. Uluslararası Sağlık Yönetimi ve Stratejileri Araştırma Dergisi 6 2 275–286.
IEEE Ü. Veranyurt, A. Deveci, M. F. Esen, and O. Veranyurt, “MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI”, USAYSAD, vol. 6, no. 2, pp. 275–286, 2020.
ISNAD Veranyurt, Ülkü et al. “MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI”. Uluslararası Sağlık Yönetimi ve Stratejileri Araştırma Dergisi 6/2 (August 2020), 275-286.
JAMA Veranyurt Ü, Deveci A, Esen MF, Veranyurt O. MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI. USAYSAD. 2020;6:275–286.
MLA Veranyurt, Ülkü et al. “MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI”. Uluslararası Sağlık Yönetimi Ve Stratejileri Araştırma Dergisi, vol. 6, no. 2, 2020, pp. 275-86.
Vancouver Veranyurt Ü, Deveci A, Esen MF, Veranyurt O. MAKİNE ÖĞRENMESİ TEKNİKLERİYLE HASTALIK SINIFLANDIRMASI: RANDOM FOREST, K-NEAREST NEIGHBOUR VE ADABOOST ALGORİTMALARI UYGULAMASI. USAYSAD. 2020;6(2):275-86.