YSA Sınıflandırma Modellerinde Korelasyon-Hipotez Testi Tabanlı Filtreleme Yoluyla Girdi Seçimi

Meryem Uluskan; Halil Derya Şenli

doi:10.51541/nicel.1372774

Research Article

Input Selection Through Correlation-Hypothesis Testing Based Filtering in ANN Classification Models

Year 2024, Volume 6, Issue 1, pp. 68-102, 30.06.2024


Cited By: 1

Abstract

The main goal of this study is to obtain high-performing Artificial Neural Network (ANN) classification models by reducing a large number of potential input variables on the basis of the correlations among these variables. To this end, a breast cancer diagnosis problem with 30 potential input variables was considered, and an ANN model was built after the number of input variables had been reduced with the proposed correlation-hypothesis-test-based filtering method. The effectiveness of the proposed model was compared with that of six ANN models containing different sets of input variables. These six models include the model containing all input variables and models whose inputs were selected with the model-based selection methods stepwise regression, forward selection and backward elimination. While building the models, the data set was split into training and test sets at different percentages, and different numbers of neurons were tried in the hidden layer. Accuracy, recall, precision and F1-score were used to compare the classification performance of the models. For the models with the nine input variables selected by the proposed correlation-based filtering method, accuracy was between 0.93 and 0.95, which is high. Recall ranged from 0.85 to 0.88, which is adequate. Precision was very high, in the range 0.98 to 0.988. The F1-score of the proposed model was between 0.907 and 0.931, which is sufficiently high. Because the proposed model uses the fewest input variables of all the compared models, that is, it is the simplest model, and because it classifies well even with only 10 neurons in the hidden layer, it can be used to build comprehensible classification models quickly, leanly and efficiently at low cost, especially in comparison with models whose inputs are chosen by model-based methods.
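The abstract does not spell out the filtering rules, so the following is only a minimal Python sketch of one plausible correlation-hypothesis-test filter, not the authors' procedure. All of its rules are assumptions: an input is first kept only if its Pearson correlation with the 0/1 class label is significant at level `alpha`; the survivors are then visited from most to least label-correlated, and an input is dropped when its correlation with an already-kept input is both significant and at least `r_max` in absolute value. As a stdlib-only stand-in for the usual t-test on Pearson's r, the test of H0: ρ = 0 uses the Fisher z-transformation, which gives close p-values for samples of a few hundred. The names `filter_inputs`, `alpha`, `r_max` and the synthetic columns are hypothetical; `classification_metrics` shows how the four reported metrics follow from confusion-matrix counts.

```python
import math
import random
from statistics import NormalDist, fmean


def pearson_r(x, y):
    """Sample Pearson correlation of two equal-length numeric sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no correlation information
    return sxy / math.sqrt(sxx * syy)


def corr_p_value(r, n):
    """Two-sided p-value for H0: rho = 0 via the Fisher z-transformation.

    Assumption: a stdlib-only approximation standing in for the exact t-test.
    """
    r = max(min(r, 0.999999), -0.999999)  # keep atanh finite when |r| == 1
    z = math.atanh(r) * math.sqrt(n - 3)
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


def filter_inputs(columns, label, alpha=0.05, r_max=0.9):
    """Hypothetical correlation-hypothesis-test filter (not the paper's exact rules).

    columns: dict mapping input name -> list of floats
    label:   list of 0/1 class labels
    """
    n = len(label)
    # Step 1 (assumed): keep inputs significantly correlated with the label.
    relevance = {}
    for name, col in columns.items():
        r = pearson_r(col, label)
        if corr_p_value(r, n) < alpha:
            relevance[name] = abs(r)
    # Step 2 (assumed): strongest first, drop inputs redundant with a kept one.
    kept = []
    for name in sorted(relevance, key=relevance.get, reverse=True):
        redundant = False
        for other in kept:
            r = pearson_r(columns[name], columns[other])
            if abs(r) >= r_max and corr_p_value(r, n) < alpha:
                redundant = True
                break
        if not redundant:
            kept.append(name)
    return kept


def classification_metrics(tp, fp, fn, tn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Synthetic demo data (hypothetical, not the breast cancer data set).
random.seed(0)
n = 300
label = [random.randint(0, 1) for _ in range(n)]
signal = [y + random.gauss(0, 0.5) for y in label]
columns = {
    "signal": signal,
    "near_copy": [s + random.gauss(0, 0.05) for s in signal],  # redundant input
    "noise": [random.gauss(0, 1) for _ in range(n)],  # unrelated to the label
}
kept = filter_inputs(columns, label)
print("kept inputs:", kept)
# Illustrative confusion-matrix counts only.
print("metrics:", classification_metrics(tp=45, fp=1, fn=7, tn=90))
```

In the synthetic data `near_copy` is almost a duplicate of `signal` (their correlation is close to 1), so only one of the two survives, which is the redundancy removal such a filter is meant to perform. As an arithmetic check on the reported figures, F1 = 2PR/(P + R): the abstract's lower-end precision and recall (0.98, 0.85) give F1 ≈ 0.910 and the upper ends (0.988, 0.88) give F1 ≈ 0.931, in line with the reported 0.907 to 0.931 range (the extremes need not come from the same model).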

Keywords

Input variable selection, Filtering method, Artificial Neural Networks, Classification, Correlation, Hypothesis tests



The article's reference list contains 39 entries.

Details

Primary Language: Turkish
Subjects: Biostatistics, Statistics (Other)
Journal Section: Articles
Authors: Meryem Uluskan (ORCID 0000-0003-1287-8286), Halil Derya Şenli (ORCID 0000-0001-6966-3388)
Publication Date: June 30, 2024
Published in Issue: Year 2024, Volume 6, Issue 1

Cite

APA	Uluskan, M., & Şenli, H. D. (2024). YSA Sınıflandırma Modellerinde Korelasyon-Hipotez Testi Tabanlı Filtreleme Yoluyla Girdi Seçimi. Nicel Bilimler Dergisi, 6(1), 68-102. https://doi.org/10.51541/nicel.1372774

Cited By

Assessing the Effectiveness of Machine Learning Techniques for Silver Price Prediction: A Comparative Study

Bitlis Eren Üniversitesi Fen Bilimleri Dergisi

https://doi.org/10.17798/bitlisfen.1556171
