Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions

İlhan Koyuncu; Selahattin Gelbal

doi:10.21031/epod.696664

Research Article

Year 2020, Volume: 11 Issue: 4, 325 - 345, 30.12.2020

Cited By: 4

Abstract

The purpose of this study was to examine the performance of Naive Bayes, k-nearest neighbor, neural networks, and logistic regression analysis in classifying students according to their mathematics performance under different sample sizes and test data ratios. The target population was the 62,728 15-year-old students from Organisation for Economic Co-operation and Development (OECD) countries who participated in the Programme for International Student Assessment (PISA) in 2012. The performance of each algorithm was tested by using 11%, 22%, 33%, 44%, and 55% of each dataset as test data for small (500 students), medium (1000 students), and large (5000 students) samples. One hundred replications were performed for each analysis. Accuracy rates, RMSE values, and total elapsed time were used as the evaluation criteria. RMSE values for each algorithm were compared statistically by using Friedman and Wilcoxon tests. The results revealed that while the classification performance of the methods increased as the sample size increased, increasing the training data ratio affected the algorithms differently. Naive Bayes performed well even in small samples, ran the analyses very quickly, and was not affected by changes in the training data ratio. Logistic regression analysis was the most effective method in large samples but performed poorly in small samples. Neural networks showed a similar tendency, but their overall performance was lower than that of Naive Bayes and logistic regression. The k-nearest neighbor algorithm yielded the lowest performance in all conditions.

Keywords

Artificial neural networks, educational data mining, k-nearest neighborhood, logistic regression, naive Bayes
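
The evaluation design summarized in the abstract (three sample sizes, five test data ratios, repeated replications, and accuracy, RMSE, and elapsed time as criteria) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the authors' procedure: the data are synthetic rather than PISA 2012 records, the classifier is a small hand-written Gaussian Naive Bayes, the percentages are read as test-set proportions, RMSE is taken between the predicted probability of class 1 and the 0/1 label, and all function names (`make_synthetic`, `fit_gaussian_nb`, `predict_proba`, `evaluate_once`, `run_design`) are hypothetical. The Friedman and Wilcoxon comparisons are omitted.

```python
import math
import random
import time


def make_synthetic(n, rng):
    """Hypothetical stand-in data: two Gaussian features, binary label."""
    rows = []
    for _ in range(n):
        y = rng.randint(0, 1)
        # Class 1 is shifted by one unit on both features.
        x = [rng.gauss(y, 1.0), rng.gauss(y, 1.0)]
        rows.append((x, y))
    return rows


def fit_gaussian_nb(train):
    """Estimate the prior and per-feature mean/variance for each class."""
    model = {}
    for label in (0, 1):
        xs = [x for x, y in train if y == label]
        prior = len(xs) / len(train)
        stats = []
        for j in range(len(xs[0])):
            col = [x[j] for x in xs]
            mean = sum(col) / len(col)
            # Small constant keeps the variance strictly positive.
            var = sum((v - mean) ** 2 for v in col) / len(col) + 1e-9
            stats.append((mean, var))
        model[label] = (prior, stats)
    return model


def predict_proba(model, x):
    """Posterior probability of class 1 under the fitted model."""
    log_post = {}
    for label, (prior, stats) in model.items():
        lp = math.log(prior)
        for v, (mean, var) in zip(x, stats):
            lp += -0.5 * math.log(2 * math.pi * var) - (v - mean) ** 2 / (2 * var)
        log_post[label] = lp
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(log_post.values())
    e0 = math.exp(log_post[0] - m)
    e1 = math.exp(log_post[1] - m)
    return e1 / (e0 + e1)


def evaluate_once(data, test_ratio, rng):
    """One random holdout split; returns (accuracy, rmse, seconds)."""
    shuffled = data[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_ratio)
    test, train = shuffled[:n_test], shuffled[n_test:]
    start = time.perf_counter()
    model = fit_gaussian_nb(train)
    probs = [predict_proba(model, x) for x, _ in test]
    elapsed = time.perf_counter() - start
    acc = sum((p >= 0.5) == (y == 1) for p, (_, y) in zip(probs, test)) / n_test
    rmse = math.sqrt(sum((p - y) ** 2 for p, (_, y) in zip(probs, test)) / n_test)
    return acc, rmse, elapsed


def run_design(sample_sizes=(500, 1000, 5000),
               test_ratios=(0.11, 0.22, 0.33, 0.44, 0.55),
               replications=100, seed=0):
    """Mean accuracy, mean RMSE, and total time for every condition."""
    rng = random.Random(seed)
    results = {}
    for n in sample_sizes:
        data = make_synthetic(n, rng)
        for ratio in test_ratios:
            runs = [evaluate_once(data, ratio, rng) for _ in range(replications)]
            results[(n, ratio)] = (
                sum(r[0] for r in runs) / replications,
                sum(r[1] for r in runs) / replications,
                sum(r[2] for r in runs),
            )
    return results


if __name__ == "__main__":
    for (n, ratio), (acc, rmse, secs) in run_design(replications=10).items():
        print(f"n={n:5d} test={ratio:.0%} acc={acc:.3f} rmse={rmse:.3f} time={secs:.3f}s")
```

Each condition is keyed by `(sample size, test ratio)`, so results for further classifiers could be collected in the same structure and then compared across conditions; the numbers produced here reflect only the synthetic data and say nothing about the study's findings.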

There are 75 citations in total.

Details

Primary Language	English
Journal Section	Articles
Authors	İlhan Koyuncu (ORCID: 0000-0002-0009-5279); Selahattin Gelbal (ORCID: 0000-0001-5181-7262)
Publication Date	December 30, 2020
Acceptance Date	November 12, 2020
Published in Issue	Year 2020 Volume: 11 Issue: 4

Cite

APA	Koyuncu, İ., & Gelbal, S. (2020). Comparison of Data Mining Classification Algorithms on Educational Data under Different Conditions. Journal of Measurement and Evaluation in Education and Psychology, 11(4), 325-345. https://doi.org/10.21031/epod.696664

Cited By

Yapay Zekâ Teknikleriyle Yükseköğretim Kurumları Sınavı (YKS) Puanlarının Tahmini

Gazi Üniversitesi Fen Bilimleri Dergisi Part C: Tasarım ve Teknoloji

https://doi.org/10.29109/gujsc.1509217

Stacking: An ensemble learning approach to predict student performance in PISA 2022

Education and Information Technologies

https://doi.org/10.1007/s10639-024-13110-2

Classification of Students’ Mathematical Literacy Score Using Educational Data Mining: PISA 2015 Turkey Application

Cumhuriyet Science Journal

https://doi.org/10.17776/csj.1136733

Classification of Scale Items with Exploratory Graph Analysis and Machine Learning Methods

International Journal of Assessment Tools in Education

https://doi.org/10.21449/ijate.880914
