Investigation of Classification Conditions For Multicategorical Classification in Computerized Adaptive Classification Test

Demet Alkan; Nuri Doğan

doi:10.19171/uefad.1357800

Research Article

Year 2024, Volume: 37 Issue: 1, 63-85, 30.04.2024

Abstract

This study used the R programming language to investigate how test effectiveness and measurement accuracy change with classification criteria, item selection methods, ability estimation methods, and the number of classification categories (two, three, and four) when the Computerized Adaptive Classification Test (CACT) is used for multi-category classification. Two-category, unidimensional data for 500 items and 1,000 examinees were generated by simulation, and 36 conditions were defined. For every condition, results were averaged over 25 replications. The results showed that as the number of classification categories increased, the Average Test Length (ATL) increased and the Average Classification Accuracy (ACA) decreased. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), bias, and the correlation (r) between true and estimated abilities were also found to decrease. The Confidence Interval (CI) classification criterion performed more effectively for ATL, while the Sequential Probability Ratio Test (SPRT) classification criterion performed more effectively for ACA, bias, correlation, and MAE. The Generalized Likelihood Ratio (GLR) classification criterion produced results similar to the CI criterion in terms of ATL and similar to the SPRT criterion in terms of absolute error. Ability estimation methods performed similarly in terms of ACA and ATL. Cutscore-Based (CB) item selection methods performed more effectively than Estimated Ability-Based (EB) item selection methods in terms of both test effectiveness and measurement accuracy.
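The outcome measures named in the abstract can be made concrete with a small sketch. The Python function below is illustrative only (the study itself used R, and its code is not reproduced here). It assumes that bias is the mean of estimated minus true ability, that cutscores are sorted in ascending order, and that an examinee sitting exactly on a cutscore belongs to the higher category. The function name `summarize_cact` and all argument names are hypothetical.

```python
import math
from bisect import bisect_right


def true_category(theta, cutscores):
    """0-based category of theta, given cutscores sorted in ascending order."""
    return bisect_right(cutscores, theta)


def summarize_cact(true_theta, est_theta, decisions, test_lengths, cutscores):
    """Summarize one simulated condition (hypothetical helper).

    decisions    : category assigned to each examinee by the classification test
    test_lengths : number of items each examinee answered before termination
    """
    n = len(true_theta)
    # Assumed sign convention: error = estimated ability minus true ability.
    errors = [e - t for t, e in zip(true_theta, est_theta)]
    bias = sum(errors) / n
    mae = sum(abs(d) for d in errors) / n
    rmse = math.sqrt(sum(d * d for d in errors) / n)

    # Pearson correlation between true and estimated abilities.
    mean_t = sum(true_theta) / n
    mean_e = sum(est_theta) / n
    cov = sum((t - mean_t) * (e - mean_e) for t, e in zip(true_theta, est_theta))
    var_t = sum((t - mean_t) ** 2 for t in true_theta)
    var_e = sum((e - mean_e) ** 2 for e in est_theta)
    r = cov / math.sqrt(var_t * var_e)

    # Classification accuracy: share of examinees placed in their true category.
    hits = sum(d == true_category(t, cutscores)
               for d, t in zip(decisions, true_theta))

    return {
        "bias": bias,
        "mae": mae,
        "rmse": rmse,
        "r": r,
        "accuracy": hits / n,
        "mean_test_length": sum(test_lengths) / n,
    }
```

In a replicated design such as the one described, a function like this would be called once per replication and the resulting values averaged within each condition.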

Keywords

Computerized adaptive classification test, Item selection method, Measurement accuracy, Number of classification categories, Classification criterion, Test effectiveness

Investigation of Classification Conditions For Multicategorical Classification in Computerized Adaptive Classification Test

Abstract

This study used the R programming language to investigate how test effectiveness and measurement accuracy change with classification criteria, item selection methods, ability estimation methods, and the number of classification categories (two, three, and four) when the Computerized Adaptive Classification Test (CACT) is used for multi-category classification. Two-category, unidimensional data for 500 items and 1,000 examinees were simulated, 36 conditions were defined, and results were averaged over 25 replications for each condition. The results showed that as the number of classification categories increased, the Average Test Length (ATL) increased and the Average Classification Accuracy (ACA) decreased. The Root Mean Square Error (RMSE), Mean Absolute Error (MAE), bias, and the correlation (r) between true and estimated thetas were found to decrease. The Confidence Interval (CI) classification criterion was found to be more effective for ATL, ACA, bias, and correlation, and the Sequential Probability Ratio Test (SPRT) classification criterion for MAE. The Generalized Likelihood Ratio (GLR) classification criterion produced results similar to the CI criterion in terms of ATL and to the SPRT classification criterion in terms of absolute error. Ability estimation methods performed similarly in terms of ACA and ATL. Cutscore-Based (CB) item selection methods were more effective than Estimated Ability-Based (EB) item selection methods in terms of test effectiveness and measurement accuracy.
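As a rough illustration of how an SPRT classification criterion can be extended to several cutscores, the Python sketch below runs a Wald SPRT at each cutscore and assigns a category once the decisions at the adjacent cutscores agree. It is not the authors' implementation (the study used R). The 2PL response model, the indifference half-width `delta`, the nominal error rates `alpha` and `beta`, and all function names are assumptions made for the example.

```python
import math


def p_correct(theta, a, b):
    """2PL probability of a correct response (model assumed for illustration)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta, items, responses):
    """Log-likelihood of scored responses (1 = correct, 0 = incorrect) at theta."""
    total = 0.0
    for (a, b), u in zip(items, responses):
        p = p_correct(theta, a, b)
        total += math.log(p) if u == 1 else math.log(1.0 - p)
    return total


def sprt_decision(items, responses, cut, delta=0.2, alpha=0.05, beta=0.05):
    """Wald SPRT at one cutscore: 'above', 'below', or None (keep testing)."""
    # Compare the two points bounding the indifference region around the cut.
    log_lr = (log_likelihood(cut + delta, items, responses)
              - log_likelihood(cut - delta, items, responses))
    upper = math.log((1.0 - beta) / alpha)
    lower = math.log(beta / (1.0 - alpha))
    if log_lr >= upper:
        return "above"
    if log_lr <= lower:
        return "below"
    return None


def classify_sprt(items, responses, cutscores, **kwargs):
    """Return a 0-based category, or None while no category can be decided.

    cutscores must be sorted in ascending order. A category is assigned when
    the examinee is judged above its lower cutscore (if any) and below its
    upper cutscore (if any).
    """
    decisions = [sprt_decision(items, responses, c, **kwargs) for c in cutscores]
    k = len(cutscores)
    for category in range(k + 1):
        above_lower = category == 0 or decisions[category - 1] == "above"
        below_upper = category == k or decisions[category] == "below"
        if above_lower and below_upper:
            return category
    return None
```

In a simulated test, `classify_sprt` would be called after each administered item, and the test would stop as soon as it returns a category or a maximum length is reached. Each added cutscore introduces another decision that can stay open, which is one way to see why more categories would tend to need longer tests.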

Keywords

Computerized adaptive classification test, Classification criteria, Measurement accuracy, Number of classification categories, Item selection method, Test efficiency

There are 24 references in total.

Details

Primary Language	Turkish
Subjects	Computer-Based Exam Applications
Section	Articles
Authors	Demet Alkan 0000-0002-1478-9183 Nuri Doğan 0000-0001-6274-2016
Publication Date	April 30, 2024
Submission Date	September 9, 2023
Published in Issue	Year 2024 Volume: 37 Issue: 1

Cite

APA	Alkan, D., & Doğan, N. (2024). Bilgisayarda Bireyselleştirilmiş Sınıflama Testinde Çok Kategorili Sınıflama İçin Sınıflama Koşullarının İncelenmesi. Journal of Uludag University Faculty of Education, 37(1), 63-85. https://doi.org/10.19171/uefad.1357800

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.