Predicting COVID-19 Infection Using Machine Learning Methods Combined with Feature Selection

Umut Ahmet Çetin; Fatih Abut

doi:10.31590/ejosat.1132337

Research Article

Turkish title: COVID-19 Enfeksiyonunun Nitelik Seçme ile Birleştirilmiş Makine Öğrenmesi Yöntemleriyle Tahmin Edilmesi

Year 2022, Issue 37, pp. 52-58, 15.07.2022

Abstract

COVID-19 is an infection that has affected the world since December 31, 2019, and was declared a pandemic by the WHO in March 2020. In this study, Multi-Layer Perceptron (MLP), Tree Boost (TB), Radial Basis Function Network (RBF), Support Vector Machine (SVM), and K-Means Clustering (kMC) are each combined with the minimum redundancy maximum relevance (mRMR) and Relief-F feature selectors to build new feature selection-based COVID-19 prediction models and to identify the variables that are influential in predicting COVID-19 infection. The dataset contains information on 20,000 patients (10,000 positive and 10,000 negative) and includes several personal, symptomatic, and non-symptomatic variables. Model performance is assessed with the accuracy, recall, and F1-score metrics, and the generalization error of the models is estimated using 10-fold cross-validation. The results show that, on average, models built with mRMR perform slightly better than those built with Relief-F in predicting whether a patient is infected with COVID-19. In addition, mRMR is more successful than Relief-F in ranking the COVID-19 predictors by relative relevance. mRMR emphasizes symptomatic variables such as fever and cough, whereas Relief-F highlights non-symptomatic variables such as age and race. Overall, MLP was also observed to outperform all other classifiers in predicting COVID-19 infection.
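The workflow summarized in the abstract (rank features with mRMR, train a classifier on the top-ranked ones, and score it with 10-fold cross-validation) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration: the use of scikit-learn, the difference form of the mRMR criterion (relevance minus mean redundancy), the MLP hyperparameters, and the synthetic binary features standing in for variables such as fever and cough. It does not reproduce the study's tooling, dataset, or results.

```python
import numpy as np
from sklearn.metrics import accuracy_score, f1_score, mutual_info_score, recall_score
from sklearn.model_selection import StratifiedKFold
from sklearn.neural_network import MLPClassifier


def mrmr_select(X, y, k):
    """Greedy mRMR (difference form) for discrete features; returns k column indices."""
    n_features = X.shape[1]
    # Relevance: mutual information between each feature and the class label.
    relevance = np.array([mutual_info_score(X[:, j], y) for j in range(n_features)])
    selected = [int(np.argmax(relevance))]
    while len(selected) < k:
        candidates = [j for j in range(n_features) if j not in selected]
        # Score = relevance minus mean mutual information with already chosen features.
        scores = [
            relevance[j] - np.mean([mutual_info_score(X[:, j], X[:, s]) for s in selected])
            for j in candidates
        ]
        selected.append(candidates[int(np.argmax(scores))])
    return selected


def cross_validate_mlp(X, y, k, n_splits=10, seed=0):
    """Mean accuracy, recall and F1 over stratified folds.

    Feature selection is repeated on each training fold so that the
    held-out fold never influences which features are chosen.
    """
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    results = {"accuracy": [], "recall": [], "f1": []}
    for train_idx, test_idx in folds.split(X, y):
        cols = mrmr_select(X[train_idx], y[train_idx], k)
        # Hyperparameters are arbitrary placeholders, not the study's settings.
        model = MLPClassifier(hidden_layer_sizes=(8,), max_iter=300, random_state=seed)
        model.fit(X[train_idx][:, cols], y[train_idx])
        pred = model.predict(X[test_idx][:, cols])
        results["accuracy"].append(accuracy_score(y[test_idx], pred))
        results["recall"].append(recall_score(y[test_idx], pred))
        results["f1"].append(f1_score(y[test_idx], pred))
    return {name: float(np.mean(values)) for name, values in results.items()}


def noisy_copy(labels, flip_prob, rng):
    """Return the binary labels with a fraction flip_prob of entries flipped."""
    return labels ^ (rng.random(labels.size) < flip_prob).astype(labels.dtype)


# Synthetic, balanced toy data (NOT the study's dataset); column names are hypothetical.
rng = np.random.default_rng(0)
n = 2000
y = np.repeat([0, 1], n // 2)
fever = noisy_copy(y, 0.1, rng)   # strongly related to the label
cough = noisy_copy(y, 0.2, rng)   # moderately related to the label
X = np.column_stack([
    fever,
    fever,                        # exact duplicate: relevant but fully redundant
    cough,
    rng.integers(0, 2, n),        # pure noise
    rng.integers(0, 2, n),        # pure noise
])

selected = mrmr_select(X, y, k=2)
scores = cross_validate_mlp(X, y, k=2)
print("selected columns:", selected)
print("mean 10-fold scores:", scores)
```

In this toy setup the duplicated column is as relevant as the original but adds no new information, so the redundancy term is what keeps it from being chosen second. A relevance-only ranking would not make that distinction.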

Keywords

Relief-F, mRMR, machine learning, prediction, COVID-19, coronavirus

Supporting Institution

Çukurova University Scientific Research Projects Center

Project Number

FYL-2021-14257

Details

Primary Language	English
Subjects	Engineering
Section	Articles
Authors	Umut Ahmet Çetin (ORCID 0000-0001-8755-4417), Fatih Abut (ORCID 0000-0001-5876-4116)
Project Number	FYL-2021-14257
Early Publication Date	30 June 2022
Publication Date	15 July 2022
Published in Issue	Year 2022, Issue 37

Cite

APA	Çetin, U. A., & Abut, F. (2022). Predicting COVID-19 Infection Using Machine Learning Methods Combined with Feature Selection. Avrupa Bilim ve Teknoloji Dergisi, (37), 52-58. https://doi.org/10.31590/ejosat.1132337
