Year 2022,
Volume: 13 Issue: 4, 305 - 327, 25.12.2022
Mustafa Gökcan, Derya Çobanoğlu Aktan
References
- Ackerman, T. A. (1987, April). The robustness of LOGIST and BILOG IRT estimation programs to violations of local independence. Paper presented at the annual meeting of the American Educational Research Association. Washington, DC.
- Alderson, J. C. (2005). Diagnosing foreign language proficiency: The interface between learning and assessment. Continuum. https://doi.org/10.5040/9781474212151
- American Educational Research Association (AERA), American Psychological Association (APA), & National Council on Measurement in Education (NCME). (2014). Standards for educational and psychological testing. American Educational Research Association.
- Aryadoust, V., Ng, L. Y., & Sayama, H. (2021). A comprehensive review of Rasch measurement in language assessment: Recommendations and guidelines for research. Language Testing, 38(1), 6–40. https://doi.org/10.1177/0265532220927487
- Baker, F. (2001). The basics of Item response theory. ERIC Clearinghouse on Assessment and Evaluation.
- Beglar, D. (2010). A Rasch-based validation of the vocabulary size test. Language Testing, 27(1), 101–118. https://doi.org/10.1177/0265532209340194
- Birnbaum, A. (1968). Some latent trait models. In F. M. Lord & M. R. Novick (Eds.), Statistical theories of mental test scores. Addison-Wesley.
- Camilli, G. (1994). Origin of the Scaling Constant d = 1.7 in Item Response Theory. Journal of Educational and Behavioral Statistics, 19(3), 293-295. https://doi.org/10.2307/1165298
- Cattell, R. B. (1966). The Scree Test for The Number of Factors. Multivariate Behavioral Research, 1(2), 245–276. https://doi.org/10.1207/s15327906mbr0102_10
- Chalmers, R. P. (2012). mirt: A Multidimensional Item Response Theory Package for the R Environment. Journal of Statistical Software, 48(6), 1–29. https://doi.org/10.18637/jss.v048.i06
- Cho, S.-J., Li, F., & Bandalos, D. (2009). Accuracy of the Parallel Analysis Procedure With Polychoric Correlations. Educational and Psychological Measurement, 69(5), 748–759. https://doi.org/10.1177/0013164409332229
- Cohen, J. (1992). A power primer. Psychological Bulletin, 112(1), 155-159. https://doi.org/10.1037//0033-2909.112.1.155
- Çepni, Z. & Kelecioğlu, H. (2021). Detecting Differential Item Functioning Using SIBTEST, MH, LR and IRT Methods. Journal of Measurement and Evaluation in Education and Psychology, 12(3), 267-285. https://doi.org/10.21031/epod.988879
- Daller, H., Milton, J., & Treffers-Daller, J. (2007). Modelling and assessing vocabulary knowledge. Cambridge University Press. https://doi.org/10.1017/CBO9780511667268
- de Ayala, R. J. (2009). The theory and practice of item response theory. Guilford Press.
- DeMars, C. (2010). Item response theory. Oxford University Press.
- Elgort, I. (2013). Effects of L1 definitions and cognate status of test items on the vocabulary size test. Language Testing, 30(2), 253–272. https://doi.org/10.1177/0265532212459028
- Hambleton, R. K. & Swaminathan, H. (1985). Item response theory: principles and applications. Kluwer-Nijhoff.
- Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of item response theory. Sage.
- Hattie, J. (1985). Methodology review: assessing unidimensionality of tests and items. Applied Psychological Measurement, 9(2), 139–164. https://doi.org/10.1177/014662168500900204
- Hertzman, M. (1936). The effects of the relative difficulty of mental tests on patterns of mental organization. Archives of Psychology, 197.
- Holland, P. W. & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test Validity (pp. 129-145). Lawrence Erlbaum.
- Horn, J. L. (1965). A Rationale and Test for The Number of Factors in Factor Analysis. Psychometrika, 30, 179–185. https://doi.org/10.1007/BF02289447
- Kaiser, H. F. (1960). The Application of Electronic Computers to Factor Analysis. Educational and Psychological Measurement, 20(1), 141–151. https://doi.org/10.1177/001316446002000116
- Karami, H. (2012). The development and validation of a bilingual version of the vocabulary size test. RELC Journal, 43(1), 53–67. https://doi.org/10.1177/0033688212439359
- Kıbrıslıoğlu Uysal, N., & Atalay Kabasakal, K. (2017). The effect of background variables on gender related differential item functioning. Journal of Measurement and Evaluation in Education and Psychology, 8(4), 373-390. https://doi.org/10.21031/epod.333451
- Koyuncu, İ., & Kılıç, A. (2019). The use of exploratory and confirmatory factor analyses: A document analysis. Education and Science, 44(198). http://dx.doi.org/10.15390/EB.2019.7665
- Li, C.-H. (2016). Confirmatory factor analysis with ordinal data: Comparing robust maximum likelihood and diagonally weighted least squares. Behavior Research Methods, 48, 936–949. https://doi.org/10.3758/s13428-015-0619-7
- Li, C. H. (2019). Using a Listening Vocabulary Levels Test to explore the effect of vocabulary knowledge on GEPT listening comprehension performance. Language Assessment Quarterly, 16(3), 328–344. https://doi.org/10.1080/15434303.2019.1648474
- Magis, D., Beland, S., Tuerlinckx, F., & De Boeck, P. (2010). A general framework and an R package for the detection of dichotomous differential item functioning. Behavior Research Methods, 42(3), 847–862. https://doi.org/10.3758/BRM.42.3.847
- McDonald, R. P., & Ahlawat, K. S. (1974). Difficulty factors in binary data. British Journal of Mathematical and Statistical Psychology, 27(1), 82–99. https://doi.org/10.1111/j.2044-8317.1974.tb00530.x
- McNamara, T., & Knoch, U. (2012). The Rasch wars: The emergence of Rasch measurement in language testing. Language Testing, 29(4), 555–576. https://doi.org/10.1177/0265532211430367
- Meara, P. (1980). Vocabulary acquisition: A neglected aspect of language learning. Language Teaching, 13(3-4), 221-246. https://doi.org/10.1017/S0261444800008879
- Milton, J. (2009). Measuring second language vocabulary acquisition. Multilingual Matters. https://doi.org/10.21832/9781847692092
- Milton, J. (2013). Measuring the contribution of vocabulary knowledge to proficiency in the four skills. In C. Bardel, C. Lindqvist, & B. Laufer. (Eds.), L2 vocabulary acquisition, knowledge, and use: New perspectives on assessment and corpus analysis (pp. 57-78). Eurosla Monographs Series. https://www.eurosla.org/monographs/EM02/Milton.pdf
- Miralpeix, I. & Muñoz, C. (2018). Receptive vocabulary size and its relationship to EFL language skills. International Review of Applied Linguistics in Language Teaching, 56(1), 1-24. https://doi.org/10.1515/iral-2017-0016
- Mizumoto, A., Sasao, Y., & Webb, S. A. (2019). Developing and evaluating a computerized adaptive testing version of the word part levels test. Language Testing, 36(1), 101–123. https://doi.org/10.1177/0265532217725776
- Muthén, B., du Toit, S.H.C. & Spisic, D. (1997). Robust inference using weighted least squares and quadratic estimating equations in latent variable modeling with categorical and continuous outcomes. Unpublished technical report. https://www.statmodel.com/download/Article_075.pdf
- Nagelkerke, N. J. D. (1991). A note on the general definition of the coefficient of determination. Biometrika, 78(3), 691-692. https://doi.org/10.1093/biomet/78.3.691
- Nation, I. S. P., & Beglar, D. (2007). A vocabulary size test. The Language Teacher, 31(7), 9–13. https://jalt-publications.org/tlt/issues/2007-07_31.7
- Nation, I. S. P. (2013). Learning vocabulary in another language (2nd ed.). Cambridge University Press. https://doi.org/10.1017/CBO9781139524759
- Nguyen, L. T. C., & Nation, P. (2011). A bilingual vocabulary size test of English for Vietnamese learners. RELC Journal, 42(1), 86–99. https://doi.org/10.1177/0033688210390264
- Noreillie, A. S., Kestemont, B., Heylen, K., Desmet, P., & Peters, E. (2018). Vocabulary knowledge and listening comprehension at an intermediate level in English and French as foreign languages. ITL - International Journal of Applied Linguistics, 169(1), 212-231. https://doi.org/10.1075/itl.00013.nor
- Ockey, G. J., & Choi, I. (2015). Item response theory. The Encyclopedia of Applied Linguistics, 1-8. https://doi.org/10.1002/9781405198431.wbeal1476
- Paek, I., & Cole, K. (2020). Using R for item response theory model applications. Routledge.
- Rasch, G. (1980). Probabilistic models for some intelligence and attainment tests. University of Chicago Press.
- Reckase, M. D., Ackerman, T. A., & Carlson, J. E. (1988). Building a unidimensional test using multidimensional items. Journal of Educational Measurement, 25(3), 193–203. https://doi.org/10.1111/j.1745-3984.1988.tb00302.x
- Reckase, M. D. (2009). Multidimensional item response theory. Springer. https://doi.org/10.1007/978-0-387-89976-3
- Schmitt, N., Nation, P., & Kremmel, B. (2020). Moving the field of vocabulary assessment forward: The need for more rigorous test development and validation. Language Teaching, 53(1), 109-120. https://doi.org/10.1017/S0261444819000326
- Spearman, C. (1927). The abilities of man: Their nature and measurement. Macmillan.
- Stæhr, L. S. (2008). Vocabulary size and the skills of listening, reading and writing. Language Learning Journal, 36(2), 139–152. https://doi.org/10.1080/09571730802389975
- Stewart, J. (2014). Do multiple-choice options inflate estimates of vocabulary size on the VST? Language Assessment Quarterly, 11(3), 271–282. https://doi.org/10.1080/15434303.2014.922977
- Tran, U. S., & Formann, A. K. (2009). Performance of Parallel Analysis in Retrieving Unidimensionality in the Presence of Binary Data. Educational and Psychological Measurement, 69(1), 50–61. https://doi.org/10.1177/0013164408318761
- Uysal, İ., Ertuna, L., Ertaş, F. G., & Kelecioğlu, H. (2019). Performances based on ability estimation of the methods of detecting differential item functioning: A simulation study. Journal of Measurement and Evaluation in Education and Psychology, 10(2), 133-148. https://doi.org/10.21031/epod.534312
- van der Linden, W. J., & Hambleton, R. K. (1997). Handbook of modern item response theory. Springer.
- Weng, L.-J., & Cheng, C.-P. (2005). Parallel Analysis with Unidimensional Binary Data. Educational and Psychological Measurement, 65(5), 697–716. https://doi.org/10.1177/0013164404273941
- Yang, Y., & Xia, Y. (2015). On the number of factors to retain in exploratory factor analysis for ordered categorical data. Behavior Research Methods, 47(3), 756–772. https://doi.org/10.3758/s13428-014-0499-2
- Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213. https://doi.org/10.1111/j.1745-3984.1993.tb00423.x
- Zhang, X. (2013). The “I don’t know” option in the vocabulary size test. TESOL Quarterly, 47(4), 790–811. https://doi.org/10.1002/tesq.98
- Zhang, S., & Zhang, X. (2020). The relationship between vocabulary knowledge and L2 reading/listening comprehension: A meta-analysis. Language Teaching Research, 26(4), 696–725. https://doi.org/10.1177/1362168820913998
- Zhao, P., & Ji, X. (2018). Validation of the Mandarin version of the vocabulary size test. RELC Journal, 49(3), 308–321. https://doi.org/10.1177/0033688216639761
- Zumbo, B. D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223–233. https://doi.org/10.1080/15434300701375832
Validation of the Vocabulary Size Test
Abstract
The Vocabulary Size Test (VST) is one of the most commonly used tools for measuring English vocabulary size in language testing. Despite its widespread use, only a limited number of validity and reliability studies have been carried out on the VST, and most of them were based on the Rasch model. This validation study sought evidence of construct validity for the VST; to this end, item response theory (IRT) analyses were performed based on the three-parameter logistic model (3PLM). The assumptions of IRT were investigated via factor analysis (unidimensionality) and Yen’s Q3 statistic (local independence). Differential item functioning (DIF) analyses were conducted with the Mantel-Haenszel, Lord's chi-square, and logistic regression methods to add evidence based on internal structure and to check fairness, understood as the absence of measurement bias. The IRT results showed that the 3PLM fitted the data better than the one- and two-parameter logistic models. The DIF results indicated that 10 items exhibited large DIF (seven favoring males and three favoring females). The results further showed that the guessing effect was not negligible for the VST.
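To make the role of the guessing parameter concrete, the following minimal sketch evaluates the standard 3PLM item response function, P(θ) = c + (1 − c) / (1 + exp(−Da(θ − b))). It is an illustration only and not the authors' analysis code: the function name `p_correct_3pl` and the item parameters (a = 1.2, b = 0.5, c = 0.25) are hypothetical, and the scaling constant D is assumed to be 1.0 here.

```python
import math


def p_correct_3pl(theta, a, b, c, D=1.0):
    """Probability of a correct response under the 3PL model.

    theta: examinee ability; a: discrimination; b: difficulty;
    c: lower asymptote (pseudo-guessing); D: scaling constant (assumed 1.0).
    """
    return c + (1.0 - c) / (1.0 + math.exp(-D * a * (theta - b)))


# Hypothetical item parameters, chosen only for illustration.
# c = 0.25 is what blind guessing among four options would give.
a, b, c = 1.2, 0.5, 0.25

for theta in (-3.0, 0.0, 0.5, 3.0):
    p = p_correct_3pl(theta, a, b, c)
    print(f"theta={theta:+.1f}  P(correct)={p:.3f}")
```

With c = 0 the expression reduces to the two-parameter logistic model, and additionally fixing a to a common value gives the one-parameter case. A non-zero c keeps the curve from dropping below c even for very low ability, which is the behaviour a guessing effect produces.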