A practical guide to item bank calibration with multiple matrix sampling

Eren Can Aybek; Serkan Arıkan; Güneş Ertaş

doi:10.21449/ijate.1440316

Research Article

Year 2024, Volume 11, Issue 4, 647–659, 15.11.2024

Abstract

When item parameters must be estimated for a large item bank, a Multiple Matrix Sampling (MMS) design offers an efficient approach that minimizes the test burden on students. The current study explains, with a worked example, how to calibrate a large item pool using an MMS design for purposes such as developing a computerized adaptive testing (CAT) administration. Two functions of the mirt package, mirt() and multipleGroup(), were compared using real data. The results showed that the standard mirt() function is more practical and yields more precise estimates than the multipleGroup() function.
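
To make the data layout behind an MMS design concrete, the following is a minimal sketch in plain Python, not the authors' procedure and not a call to mirt. It assumes a hypothetical pool of 60 items split into 6 blocks, with each booklet chaining 2 adjacent blocks so that neighbouring booklets overlap. The function names `build_booklets` and `simulate_responses` are illustrative, and the responses are random placeholders rather than output of an IRT model.

```python
import random


def build_booklets(n_items, n_blocks, blocks_per_booklet):
    """Split the item pool into equal blocks and chain adjacent blocks into booklets."""
    if n_items % n_blocks != 0:
        raise ValueError("n_items must be divisible by n_blocks")
    size = n_items // n_blocks
    blocks = [list(range(b * size, (b + 1) * size)) for b in range(n_blocks)]
    booklets = []
    for b in range(n_blocks):
        # Booklet b holds blocks b, b+1, ... (wrapping around), so neighbouring
        # booklets share a block and every item stays linked to the rest of the pool.
        items = []
        for k in range(blocks_per_booklet):
            items.extend(blocks[(b + k) % n_blocks])
        booklets.append(sorted(items))
    return booklets


def simulate_responses(booklets, n_items, n_students, seed=0):
    """Rotate booklets across students; items not administered stay None."""
    rng = random.Random(seed)
    data, assignment = [], []
    for s in range(n_students):
        b = s % len(booklets)
        row = [None] * n_items  # None marks "missing by design"
        for i in booklets[b]:
            row[i] = rng.randint(0, 1)  # placeholder 0/1 response (assumption)
        data.append(row)
        assignment.append(b)
    return data, assignment


booklets = build_booklets(n_items=60, n_blocks=6, blocks_per_booklet=2)
data, assignment = simulate_responses(booklets, n_items=60, n_students=600)

# Each student answers only part of the pool, yet every item collects responses.
items_per_student = [sum(v is not None for v in row) for row in data]
responses_per_item = [sum(row[i] is not None for row in data) for i in range(60)]
print(set(items_per_student), set(responses_per_item))
```

Under these assumptions each student sees 20 of the 60 items while every item still receives 200 responses. A sparse matrix of this shape, with missing-by-design cells coded as NA, is the kind of input a single-group calibration would take, and the booklet index in `assignment` is the kind of variable a multiple-group calibration would use for grouping. That mapping onto mirt() and multipleGroup() is an assumption here, since this page does not show the calls.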

Keywords

Multiple matrix sampling, Item bank development, Item response theory

Supporting Institution

Bogazici University

Project Number

BAP-SUP 17002

References

Blötner, C. (2024). Package ‘diffcor’. https://cran.r-project.org/web/packages/diffcor/diffcor.pdf
Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of Modern Item Response Theory (pp. 433–448). Springer. https://doi.org/10.1007/978-1-4757-2691-6_25
Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Chalmers, R.P. (2023). Package “mirt”. https://cran.r-project.org/web/packages/mirt/mirt.pdf
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265–289. https://doi.org/10.2307/1165285
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587
Gonzalez, E., & Rutkowski, L. (2010). Principles of Multiple Matrix Booklet Designs and Parameter Recovery in Large-Scale Assessments (pp. 125–156). IERI.
Gressard, R.P., & Loyd, B.H. (1991). A comparison of item sampling plans in the application of multiple matrix sampling. Journal of Educational Measurement, 28(2), 119–130.
Kaplan, D., & Su, D. (2016). On matrix sampling and imputation of context questionnaires with implications for the generation of plausible values in large-scale assessments. Journal of Educational and Behavioral Statistics, 41(1), 57–80.
Lord, F.M. (1962). Estimating norms by item-sampling. Educational and Psychological Measurement, 22(2), 259–267. https://doi.org/10.1177/001316446202200202
Lord, F.M. (1965). Item sampling in test theory and in research design. ETS Research Bulletin Series, 1965(2), i–39. https://doi.org/10.1002/j.2333-8504.1965.tb00968.x
Macdonald, P., & Paunonen, S.V. (2002). A Monte Carlo comparison of item and person statistics based on item response theory versus classical test theory. Educational and Psychological Measurement, 62(6), 921–943. https://doi.org/10.1177/0013164402238082
Munger, G.F., & Loyd, B.H. (1988). The use of multiple matrix sampling for survey research. The Journal of Experimental Education, 56(4), 187–191.
OECD. (2020). PISA 2018 Technical Report-PISA. OECD Publishing, Paris. Retrieved from https://www.oecd.org/pisa/data/pisa2018technicalreport/
OECD. (2023). PISA 2022 Technical Report-PISA. OECD Publishing, Paris. Retrieved from https://www.oecd.org/pisa/data/pisa2022technicalreport/
O’Neill, T.R., Gregg, J.L., & Peabody, M.R. (2020). Effect of sample size on common item equating using the dichotomous Rasch model. Applied Measurement in Education, 33(1), 10–23. https://doi.org/10.1080/08957347.2019.1674309
Rubin, D.B. (2009). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
R Core Team (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Rutkowski, L. (2014). Sensitivity of achievement estimation to conditioning model misclassification. Applied Measurement in Education, 27(2), 115–132. https://doi.org/10.1080/08957347.2014.880440
Rutkowski, L., Gonzalez, E., von Davier, M., & Zhou, Y. (2013). Assessment design for international large-scale assessments. In Handbook of International Large-Scale Assessment. Chapman and Hall/CRC.
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in Secondary Analysis and Reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
Shoemaker, D.M. (1973). Principles and Procedures of Multiple Matrix Sampling. Ballinger Publishing Company.
Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4).
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science. O’Reilly Media, Inc.
Yin, L., & Foy, P. (2023). TIMSS 2023 Assessment Design. In I.V.S. Mullis, M.O. Martin, & M. von Davier (Eds.), TIMSS 2023 Assessment Frameworks. Boston College, TIMSS & PIRLS International Study Center.
Zhou, Y. (2021). Improving Multiple Matrix Sampling Design for Questionnaires. Indiana University.

There are 26 references in total.

Details

Primary Language	English
Subjects	Measurement Theories and Applications in Education and Psychology
Section	Articles
Authors	Eren Can Aybek (ORCID 0000-0003-3040-2337); Serkan Arıkan (ORCID 0000-0001-9610-5496); Güneş Ertaş (ORCID 0000-0001-8785-7768)
Project Number	BAP-SUP 17002
Early Pub Date	October 21, 2024
Publication Date	November 15, 2024
Submission Date	February 20, 2024
Acceptance Date	August 12, 2024
Published in Issue	Year 2024

Cite

APA	Aybek, E. C., Arıkan, S., & Ertaş, G. (2024). A practical guide to item bank calibration with multiple matrix sampling. International Journal of Assessment Tools in Education, 11(4), 647-659. https://doi.org/10.21449/ijate.1440316
