A practical guide to item bank calibration with multiple matrix sampling

Eren Can Aybek; Serkan Arıkan; Güneş Ertaş

doi:10.21449/ijate.1440316

Research Article

Year 2024, Volume 11, Issue 4, pp. 647–659, 15.11.2024

Abstract

When item parameters must be estimated for a large item bank, a Multiple Matrix Sampling (MMS) design offers an efficient approach while minimizing the testing burden on students. The purpose of this study is to explain, with a worked example, how an MMS design can be used to calibrate a large item pool for purposes such as developing a computerized adaptive testing (CAT) administration. Using real data, two functions of the mirt package, mirt() and multipleGroup(), were compared. The results showed that the standard mirt() function is more practical and yields more precise estimates than the multipleGroup() function.
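
The core idea of an MMS design can be illustrated with a minimal Python sketch. It assumes a simple rotated (cyclic) block design, which is not necessarily the design used in this study: the item pool is split into near-equal blocks, booklet b holds blocks b and b + 1 (wrapping around), and booklets are spiraled across students. Each student answers only part of the pool, every item is answered by a subset of students, and neighbouring booklets share a block so that all items remain linked. The function names (make_blocks, rotated_booklets, administration_matrix) and the numbers in the example are hypothetical.

```python
from itertools import cycle

# Hypothetical helpers sketching a rotated block design; not the study's actual design.


def make_blocks(n_items, n_blocks):
    """Split item indices 0..n_items-1 into n_blocks contiguous blocks of near-equal size."""
    size, extra = divmod(n_items, n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(list(range(start, end)))
        start = end
    return blocks


def rotated_booklets(n_blocks):
    """Booklet b holds blocks b and (b + 1) mod n_blocks, so neighbouring booklets share one block."""
    return [(b, (b + 1) % n_blocks) for b in range(n_blocks)]


def administration_matrix(n_students, n_items, n_blocks):
    """Return a students x items matrix: True if the item is administered, False if missing by design."""
    blocks = make_blocks(n_items, n_blocks)
    booklets = rotated_booklets(n_blocks)
    matrix = []
    # Spiral the booklets: student i receives booklet i mod n_blocks.
    for _, (first, second) in zip(range(n_students), cycle(booklets)):
        seen = set(blocks[first]) | set(blocks[second])
        matrix.append([item in seen for item in range(n_items)])
    return matrix


if __name__ == "__main__":
    # Hypothetical example: 120 items in 6 blocks of 20, administered to 600 students.
    design = administration_matrix(n_students=600, n_items=120, n_blocks=6)
    per_student = {sum(row) for row in design}
    per_item = {sum(row[j] for row in design) for j in range(120)}
    print(per_student, per_item)  # prints {40} {200}
```

In this example each student answers a third of the pool (40 of 120 items), and each item is still answered by 200 students. Cells marked False would be left as NA in the response matrix; this missing-by-design data is the kind of sparse input that a concurrent calibration, such as the mirt() approach compared in the study, would receive.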

Keywords

Multiple matrix sampling, Item bank development, Item response theory

Supporting Institution

Bogazici University

Project Number

BAP-SUP 17002

References

Blötner, C. (2024). Package “diffcor”. https://cran.r-project.org/web/packages/diffcor/diffcor.pdf
Bock, R.D., & Zimowski, M.F. (1997). Multiple group IRT. In W.J. van der Linden & R.K. Hambleton (Eds.), Handbook of Modern Item Response Theory (pp. 433–448). Springer. https://doi.org/10.1007/978-1-4757-2691-6_25
Chalmers, R.P. (2012). mirt: A multidimensional item response theory package for the R environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
Chalmers, R.P. (2023). Package “mirt”. https://cran.r-project.org/web/packages/mirt/mirt.pdf
Chen, W.-H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265–289. https://doi.org/10.2307/1165285
Cohen, J. (1988). Statistical Power Analysis for the Behavioral Sciences (2nd ed.). Routledge. https://doi.org/10.4324/9780203771587
Gonzalez, E., & Rutkowski, L. (2010). Principles of Multiple Matrix Booklet Designs and Parameter Recovery in Large-Scale Assessments (pp. 125–156). IERI.
Gressard, R.P., & Loyd, B.H. (1991). A comparison of item sampling plans in the application of multiple matrix sampling. Journal of Educational Measurement, 28(2), 119–130.
Kaplan, D., & Su, D. (2016). On matrix sampling and imputation of context questionnaires with implications for the generation of plausible values in large-scale assessments. Journal of Educational and Behavioral Statistics, 41(1), 57–80.
Lord, F.M. (1962). Estimating norms by item-sampling. Educational and Psychological Measurement, 22(2), 259–267. https://doi.org/10.1177/001316446202200202
Lord, F.M. (1965). Item sampling in test theory and in research design. ETS Research Bulletin Series, 1965(2), i–39. https://doi.org/10.1002/j.2333-8504.1965.tb00968.x
Macdonald, P., & Paunonen, S.V. (2002). A Monte Carlo comparison of item and person statistics based on item response theory versus classical test theory. Educational and Psychological Measurement, 62(6), 921–943. https://doi.org/10.1177/0013164402238082
Munger, G.F., & Loyd, B.H. (1988). The use of multiple matrix sampling for survey research. The Journal of Experimental Education, 56(4), 187–191.
OECD. (2020). PISA 2018 Technical Report-PISA. OECD Publishing, Paris. Retrieved from https://www.oecd.org/pisa/data/pisa2018technicalreport/
OECD. (2023). PISA 2022 Technical Report-PISA. OECD Publishing, Paris. Retrieved from https://www.oecd.org/pisa/data/pisa2022technicalreport/
O’Neill, T.R., Gregg, J.L., & Peabody, M.R. (2020). Effect of sample size on common item equating using the dichotomous Rasch model. Applied Measurement in Education, 33(1), 10–23. https://doi.org/10.1080/08957347.2019.1674309
Rubin, D.B. (2009). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.
R Core Team. (2023). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
Rutkowski, L. (2014). Sensitivity of achievement estimation to conditioning model misclassification. Applied Measurement in Education, 27(2), 115–132. https://doi.org/10.1080/08957347.2014.880440
Rutkowski, L., Gonzalez, E., von Davier, M., & Zhou, Y. (2013). Assessment design for international large-scale assessments. In Handbook of International Large-Scale Assessment. Chapman and Hall/CRC.
Rutkowski, L., Gonzalez, E., Joncas, M., & von Davier, M. (2010). International large-scale assessment data: Issues in Secondary Analysis and Reporting. Educational Researcher, 39(2), 142–151. https://doi.org/10.3102/0013189X10363170
Shoemaker, D.M. (1973). Principles and Procedures of Multiple Matrix Sampling. Ballinger Publishing Company.
Thissen, D., & Wainer, H. (1982). Some standard errors in item response theory. Psychometrika, 47(4).
Wickham, H., Çetinkaya-Rundel, M., & Grolemund, G. (2023). R for Data Science. O’Reilly Media, Inc.
Yin, L., & Foy, P. (2023). TIMSS 2023 Assessment Design. In I.V.S. Mullis, M.O. Martin, & M. von Davier (Eds.), TIMSS 2023 Assessment Frameworks. Boston College, TIMSS & PIRLS International Study Center.
Zhou, Y. (2021). Improving Multiple Matrix Sampling Design for Questionnaires. Indiana University.

There are 26 citations in total.

Details

Primary Language	English
Subjects	Measurement Theories and Applications in Education and Psychology
Journal Section	Articles
Authors	Eren Can Aybek (ORCID 0000-0003-3040-2337); Serkan Arıkan (ORCID 0000-0001-9610-5496); Güneş Ertaş (ORCID 0000-0001-8785-7768)
Project Number	BAP-SUP 17002
Early Pub Date	October 21, 2024
Publication Date	November 15, 2024
Submission Date	February 20, 2024
Acceptance Date	August 12, 2024
Published in Issue	Year 2024 Volume: 11 Issue: 4

Cite

APA	Aybek, E. C., Arıkan, S., & Ertaş, G. (2024). A practical guide to item bank calibration with multiple matrix sampling. International Journal of Assessment Tools in Education, 11(4), 647-659. https://doi.org/10.21449/ijate.1440316
