Educational data mining: A tutorial for the rattle package in R
Year 2019, 20 - 36, 30.12.2019
Okan Bulut, Hatice Cigdem Yavuz
Abstract
Educational data mining (EDM) has grown rapidly as a research field over the last decade and has enabled researchers to discover patterns and trends in education with more sophisticated methods. EDM offers promising solutions to complex educational problems. Given the rapid increase in the availability of big data in education and of software programs to analyze such data, the demand for user-friendly, free software programs to implement EDM methods also continues to increase. The R programming language has become a popular environment for data mining due to its availability and flexibility. The rattle package in R contains a set of functions to implement data mining with a graphical user interface. This study demonstrates three widely used data mining algorithms (classification and regression tree, random forest, and support vector machine) in EDM using real data from the 2015 administration of the Programme for International Student Assessment (PISA). First, a brief introduction to EDM is provided along with a description of the selected data mining algorithms. Then, the study demonstrates how to perform data mining analysis using rattle's graphical user interface. The study concludes by comparing the results of the selected data mining algorithms and highlighting how those algorithms can be used in educational research.