Research Article

Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması

A Comparison of Computerized Adaptive Classification Test Criteria in Terms of Test Efficiency and Measurement Precision

Year 2018, Volume: 9, Issue: 2, 161-177, 30.06.2018
https://doi.org/10.21031/epod.401077

Abstract

This study aimed to determine how the efficiency of Computerized Adaptive Classification Testing (CACT) changes according to the classification criterion, the item selection method, and the ability estimation method. For this purpose, a 500-item pool based on the 3-Parameter Logistic Model was generated so as to provide high information at and around the specified cut-point; abilities for 3,000 examinees were drawn from a standard normal distribution, N(0,1); and the examinees' item response patterns were generated randomly in R via Monte Carlo simulation. The Sequential Probability Ratio Test (SPRT), Generalized Likelihood Ratio (GLR), and Confidence Interval (CI) classification criteria; the Expected a Posteriori (EAP) and Weighted Likelihood Estimation (WLE) ability estimation methods; and the Maximum Fisher Information (MFI) and Kullback-Leibler Information (KLI) item selection methods, each applied either at the cut-point (CP) or at the current ability estimate (EA), were crossed to form 48 conditions. At the end of the CACT simulations in R, Average Test Length (ATL), Average Classification Accuracy (ACA), the correlation between true and estimated abilities (r), bias, Root Mean Square Error (RMSE), and Mean Absolute Error (MAE) were averaged over 25 replications. According to the results, the GLR and CI criteria performed better in terms of test efficiency, whereas the SPRT performed better in terms of measurement precision; test efficiency increased as the indifference region of the classification criteria widened or as the nominal error rate decreased; and all classification criteria reached considerably high classification accuracy under all conditions. Both ability estimation methods produced successful estimates in terms of the correlation between true and estimated abilities, but EAP performed better in terms of measurement precision. All item selection methods behaved similarly; however, MFI-EA performed better under all conditions on all dependent variables.
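
As an illustration of the design summarized above, the R sketch below generates a 3PL item pool that is most informative near a cut-point, draws N(0,1) abilities, simulates random response patterns, applies Wald's SPRT decision rule for a single examinee, and defines the recovery summaries (r, bias, RMSE, MAE). It is a minimal sketch under assumed settings, not the authors' simulation code: the item-parameter ranges, the cut-point of 0, the indifference-region half-width, and the nominal error rates are illustrative choices, and ability estimation (EAP/WLE) and adaptive item selection (MFI/KLI) are omitted. A full implementation of such simulations is available in the catIrt R package listed in the references.

```r
# Minimal sketch of the data-generation, SPRT-decision, and recovery-summary
# steps described in the abstract; all numeric settings are illustrative
# assumptions, not the values used in the study.
set.seed(1)

n_items  <- 500
n_people <- 3000
cut      <- 0      # assumed cut-point on the theta scale
delta    <- 0.2    # assumed half-width of the indifference region
alpha    <- 0.05   # assumed nominal error rates
beta     <- 0.05

# Item pool: difficulties concentrated around the cut-point so the pool is
# most informative where the classification decision is made.
a_par <- runif(n_items, 0.8, 2.0)              # discrimination
b_par <- rnorm(n_items, mean = cut, sd = 0.5)  # difficulty
c_par <- runif(n_items, 0.05, 0.25)            # pseudo-guessing

theta <- rnorm(n_people)                       # true abilities ~ N(0, 1)

# 3PL response probability
p3pl <- function(th, a, b, c) c + (1 - c) / (1 + exp(-a * (th - b)))

# Random (Monte Carlo) response patterns: examinees in rows, items in columns
resp <- t(sapply(theta, function(th) rbinom(n_items, 1, p3pl(th, a_par, b_par, c_par))))

# SPRT decision for one examinee after administering the items in `idx`:
# the log-likelihood ratio of theta = cut + delta versus theta = cut - delta
# is compared against Wald's (1947) bounds.
sprt_decision <- function(u, idx) {
  loglik <- function(th) {
    p <- p3pl(th, a_par[idx], b_par[idx], c_par[idx])
    sum(u[idx] * log(p) + (1 - u[idx]) * log(1 - p))
  }
  lr    <- loglik(cut + delta) - loglik(cut - delta)
  upper <- log((1 - beta) / alpha)
  lower <- log(beta / (1 - alpha))
  if (lr >= upper) {
    "classify above the cut-point"
  } else if (lr <= lower) {
    "classify below the cut-point"
  } else {
    "administer another item"
  }
}

# Example: decision for examinee 1 after the first 20 items of the pool
sprt_decision(resp[1, ], 1:20)

# Recovery summaries used as dependent variables (theta_hat would come from
# EAP or WLE estimation, which is not shown in this sketch):
summarize_recovery <- function(theta, theta_hat) {
  c(r    = cor(theta, theta_hat),
    bias = mean(theta_hat - theta),
    RMSE = sqrt(mean((theta_hat - theta)^2)),
    MAE  = mean(abs(theta_hat - theta)))
}
```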

References

  • Boyd, A. M. (2003). Strategies for controlling testlet exposure rates in computerized adaptive testing systems (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3110732)
  • Cheng, P. E. & Liou, M. (2000). Estimation of trait level in computerized adaptive testing. Applied Psychological Measurement, 24(3), 257-265.
  • Dooley, K. (2002). Simulation research methods. In J. Baum (Ed.). Companion to organizations. London: Blackwell.
  • Eggen, T. J. H. M. (1999). Item selection in adaptive testing with the sequential probability ratio test. Applied Psychological Measurement, 23(3), 249-261.
  • Eggen, T. J. H. M. & Straetmans, G. J. J. M. (2000). Computerized adaptive testing for classifying examinees into three categories. Educational and Psychological Measurement, 60(5), 713-734.
  • Embretson, S. E. & Reise, S. P. (2000). Item response theory for psychologists. London: Lawrence Erlbaum Associates Publishers.
  • Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: Principles and applications. Boston: Kluwer Nijhoff Publishing.
  • Lau, C. A. & Wang, T. (1998, April). Comparing and combining dichotomous and polytomous items with SPRT procedure in computerized classification testing. Paper presented at the annual meeting of the American Educational Research Association, San Diego, CA.
  • Lau, C. A. & Wang, T. (1999, April). Computerized classification testing under practical constraints with a polytomous model. Paper presented at the annual meeting of the American Educational Research Association, Montreal, Canada.
  • Lin, C. J. & Spray, J. (2000). Effects of item-selection criteria on classification testing with the sequential probability ratio test. ACT Research Report Series 2000-8. [Online: https://eric.ed.gov/?id=ED445066, Accessed date: 26.2.2014.]
  • McBride, J. R. (1985). Computerized adaptive testing. Educational Leadership, 43(2), 25-28.
  • Miller, I. & Miller, M. (2004). John E. Freund's mathematical statistics with applications. New Jersey: Prentice Hall.
  • Nydick, S. W., Nozawa, Y. & Zhu, R. (2012, April). Accuracy and efficiency in classifying examinees using computerized adaptive tests: An application to a large scale test. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
  • Nydick, S. W. (2013). Multidimensional mastery testing with CAT (Doctoral dissertation). Available from ProQuest Dissertations and Theses database. (UMI No. 3607925)
  • Nydick, S. W. (2014). catIrt: An R package for simulating IRT-based computerized adaptive tests. [Online: https://cran.r-project.org/web/packages/catIrt/catIrt.pdf, Accessed date: 20.5.2015.]
  • R Core Team. (2013). R: A language and environment for statistical computing (Version 3.0.1). Vienna, Austria: R Foundation for Statistical Computing. [Online: http://www.R-project.org/]
  • Reckase, M. D. (1983). A procedure for decision making using tailored testing. In D. J. Weiss (Ed.). New horizons in testing: Latent trait theory and computerized adaptive testing. New York: Academic Press.
  • Spray, J. A. & Reckase, M. D. (1994, April). The selection of test items for decision making with a computer adaptive test. Paper presented at the annual meeting of the National Council on Measurement in Education, New Orleans, LA.
  • Şencan, H. (2005). Sosyal ve davranışsal ölçümlerde güvenirlilik ve geçerlilik. Ankara: Seçkin Yayıncılık.
  • Thompson, N. A. & Ro, S. (2007). Computerized classification testing with composite hypotheses. In D. J. Weiss (Ed.). Proceedings of the 2007 GMAC Conference on Computerized Adaptive Testing. Retrieved [22.3.2014] from www.psych.umn.edu/psylabs/CATCentral/
  • Thompson, N. A. (2007b). A practitioner's guide for variable-length computerized classification testing. Practical Assessment, Research & Evaluation, 12(1), 1-13.
  • Thompson, N. A. (2009). Item selection in computerized classification testing. Educational and Psychological Measurement, 69(5), 778-793.
  • Thompson, N. A. (2011). Termination criteria for computerized classification testing. Practical Assessment, Research & Evaluation, 16(4), 1-7.
  • van der Linden, W. J. (1990). Applications of decision theory to test-based decision making. In R. K. Hambleton & J. N. Zaal (Eds.). Advances in educational and psychological measurement. Massachusetts: Kluwer-Nijhoff.
  • Wainer, H. (2000). Computerized adaptive testing: A primer. New Jersey: Lawrence Erlbaum Associates.
  • Wald, A. (1947). Sequential analysis. New York: John Wiley.
  • Wang, T., Hanson, B. A. & Lau, C. A. (1999). Reducing bias in CAT trait estimation: A comparison of approaches. Applied Psychological Measurement, 23(3), 263-278.
  • Wang, S. & Wang, T. (2001). Precision of Warm's weighted likelihood estimates for a polytomous model in computerized adaptive testing. Applied Psychological Measurement, 25(4), 317-331.
  • Warm, T. A. (1989). Weighted likelihood estimation of ability in item response theory. Psychometrika, 54(3), 427-450.
  • Weiss, D. J. (1982). Improving measurement quality and efficiency with adaptive testing. Applied Psychological Measurement, 6(4), 473-492.
  • Weiss, D. J. & Kingsbury, G. G. (1984). Application of computerized adaptive testing to educational problems. Journal of Educational Measurement, 21(4), 361-375.
  • Yang, X., Poggio, J. C. & Glasnapp, D. R. (2006). Effects of estimation bias on multiple category classification with an IRT-based adaptive classification procedure. Educational and Psychological Measurement, 66(4), 545-564.
  • Yi, Q., Wang, T. & Ban, J. (2000). Effects of scale transformation and test termination rule on the precision of ability estimates in CAT. ACT Research Report Series, 2000-2. [Online: http://onlinelibrary.wiley.com/doi/10.1111/j.1745-3984.2001.tb01127.x/full, Accessed date: 7.12.2015.]

Details

Primary Language: Turkish
Journal Section: Articles
Authors

Ceylan Gündeğer (ORCID: 0000-0003-3572-1708)

Nuri Doğan (ORCID: 0000-0001-6274-2016)

Publication Date: June 30, 2018
Acceptance Date: May 22, 2018
Published in Issue: Year 2018, Volume: 9, Issue: 2

Cite

APA Gündeğer, C., & Doğan, N. (2018). Bireyselleştirilmiş Bilgisayarlı Sınıflama Testi Kriterlerinin Test Etkililiği ve Ölçme Kesinliği Açısından Karşılaştırılması. Journal of Measurement and Evaluation in Education and Psychology, 9(2), 161-177. https://doi.org/10.21031/epod.401077