Research Article

Examination of differential item functioning in PISA through univariate and multivariate matching differential item functioning

Year 2024, Volume: 11 Issue: 4, 774 - 786, 15.11.2024

Abstract

The present research examines whether items in the Programme for International Student Assessment (PISA) 2009 reading literacy instrument display differential item functioning (DIF) across the Turkish, French, and American samples, using univariate and multivariate matching techniques, both before and after the matching variable (the total score) is purified of items flagged for DIF. The study employs a correlational survey design, and its participants are 4459 Turkish, French, and American students who took booklets 1, 3, 4, and 6 of the PISA 2009 reading literacy measure. Univariate and multivariate (bivariate, trivariate, and quadrivariate) DIF analyses were performed through logistic regression before and after the matching variable was purified of the items displaying DIF. Additional matching variables were identified from the literature, and multiple linear regression analysis was carried out. The analyses showed that using additional matching variables beyond the total score reduces Type I errors. It was also found that excluding DIF items when calculating the total score changed both the number of items flagged for DIF and their DIF levels, although it did not yield consistent results.
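The analysis rests on the binary logistic regression DIF framework (Zumbo, 1999, in the references below) with purification of the total-score matching variable. The Python sketch below illustrates the general two-pass idea on simulated data; the data, the dif_flags helper, and the single purification pass are illustrative assumptions, not the authors' actual procedure or code.

```python
# A minimal sketch, not the authors' code: binary logistic regression DIF
# testing (cf. Zumbo, 1999) with one purification pass on the matching score.
import numpy as np
from scipy.stats import chi2
import statsmodels.api as sm

def dif_flags(responses, group, matching, alpha=0.05):
    """Flag each item via a likelihood-ratio test comparing a matching-only
    logistic model against one that adds the group effect (uniform DIF) and
    group-by-matching interactions (non-uniform DIF)."""
    M = np.column_stack([matching])        # (n, k); k > 1 = multivariate matching
    df = 1 + M.shape[1]                    # group term + k interaction terms
    flags = []
    for j in range(responses.shape[1]):
        y = responses[:, j]
        base = sm.Logit(y, sm.add_constant(M)).fit(disp=0)
        full = sm.Logit(y, sm.add_constant(
            np.column_stack([M, group, M * group[:, None]]))).fit(disp=0)
        lr = 2 * (full.llf - base.llf)     # likelihood-ratio statistic
        flags.append(chi2.sf(lr, df=df) < alpha)
    return np.array(flags)

# Hypothetical data: 1000 examinees, 10 dichotomous items, two groups.
rng = np.random.default_rng(0)
theta = rng.normal(size=1000)
group = rng.integers(0, 2, size=1000).astype(float)
p = 1.0 / (1.0 + np.exp(-(theta[:, None] - rng.normal(size=10))))
responses = (rng.random((1000, 10)) < p).astype(int)

# Pass 1: raw total score as the matching variable.
flags = dif_flags(responses, group, responses.sum(axis=1).astype(float))

# Pass 2: purify the total score by dropping flagged items, then re-test.
purified = responses[:, ~flags].sum(axis=1).astype(float)
flags_after = dif_flags(responses, group, purified)
print("flagged before purification:", int(flags.sum()),
      "| after:", int(flags_after.sum()))
```

In the study's bivariate to quadrivariate conditions, the extra matching variables identified from the literature would enter as additional columns of matching; grading flagged items into DIF levels by an effect-size measure (cf. Jodoin, 1999) is omitted from this sketch.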

Ethical Statement

This research is derived from the thesis prepared by the first author under the supervision of the second author. Since the research was conducted on existing data and documents, and no research was carried out on living beings or human subjects, ethics committee approval was not required. The researchers declare that they observed ethical principles throughout this research and committed no ethical violations.

References

  • Allalouf, A., Hambleton, R.K., & Sireci, S.G. (1999). Identifying the causes of DIF in translated verbal items. Journal of Educational Measurement, 36(3), 185-198.
  • Allalouf, A., & Sireci, S.G. (1998, April). Detecting sources of DIF in translated verbal items [Paper presentation]. American Educational Research Association 1998, San Diego.
  • American Educational Research Association. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Arıkan, S., Van de Vijver, F.J.R., & Kutlay, Y. (2018). Propensity score matching helps to understand sources of DIF and mathematics performance differences of Indonesian, Turkish, Australian, and Dutch students in PISA. International Journal of Research in Education and Science, 4(1), 69-81.
  • Arffman, I. (2010, August). Identifying translation-related sources of differential item functioning in international reading literacy assessments [Paper presentation]. European Conference on Educational Research 2010, Helsinki.
  • Boughton, K.A., Gierl, M.J., & Khaliq, S.N. (2000, May). Differential bundle functioning on mathematics and science achievement tests: A small step toward understanding differential performance [Paper presentation]. Canadian Society for Studies in Education. Alberta.
  • Braun, M., & Harkness, J.A. (2005). Text and context: Challenges to comparability in survey questions. In J.H.P. Zlotnik & J. Harkness (Eds.), Methodological aspects in cross-national research (pp. 95-107). Mannheim: Zuma.
  • Camilli, G. (1992). A conceptual analysis of differential item functioning in terms of a multidimensional item response model. Applied Psychological Measurement, 16(2), 129-147.
  • Clauser, B.E., & Mazor, K.M. (1998). Using statistical procedures to identify differentially functioning items. Educational Measurement: Issues and Practice, 17, 31-44.
  • Crane, P.K., Gibbons, L.E., Jolley, L., & van Belle, G. (2006). Differential item functioning analysis with ordinal logistic regression techniques. Medical Care, 44(11), 115-123.
  • Çet, S. (2006). A multivariate analysis in detecting differentially functioning items through the use of programme for international student assessment (PISA) 2003 mathematics literacy items [Unpublished doctoral dissertation, Orta Doğu Teknik Üniversitesi]. Ankara.
  • Dorans, N.J., & Holland, P.W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P.W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 35-66). Hillsdale, NJ: Lawrence Erlbaum.
  • Ercikan, K., Gierl, M.J., McCreith, T., Gautam, P., & Koh, K. (2004). Comparability of bilingual versions of assessments: Sources of incomparability of English and French versions of Canada’s national achievement tests. Applied Measurement in Education, 17(3), 301-321.
  • French, B.F., & Maller, S.J. (2007). Iterative purification and effect size use with logistic regression for differential item functioning detection. Educational and Psychological Measurement, 67(3), 373-393.
  • Furlow, C.F., Ross, T.R., & Gagné, P. (2009). The impact of multidimensionality on the detection of differential bundle functioning using simultaneous item bias test. Applied Psychological Measurement, 33(6), 441-464.
  • Gierl, M.J. (2000). Construct equivalence on translated achievement tests. Canadian Journal of Education, 25(4), 280-296.
  • Gierl, M.J. (2004, April). Using a multidimensionality-based framework to identify and interpret the construct-related dimensions that elicit group differences [Paper presentation]. American Educational Research Association. San Diego.
  • Gierl, M.J., Jodoin, M.G., & Ackerman, T.A. (2000, April). Performance of Mantel-Haenszel, simultaneous item bias test, and logistic regression when the proportion of DIF items is large [Paper presentation]. American Educational Research Association 2000. New Orleans.
  • Gierl, M.J., & Khaliq, S.N. (2000, April). Identifying sources of differential item functioning on translated achievement tests: A confirmatory analysis [Paper presentation]. National Council on Measurement in Education 2000. Louisiana, New Orleans.
  • Gierl, M.J., Rogers, W.T., & Klinger, D. (1999, April). Using statistical and judgmental reviews to identify and interpret translation DIF [Paper presentation]. National Council on Measurement in Education 1999. Montréal, Quebec.
  • Gradshtein, M.F., Mead, A.D. & Gibby, R.E. (2010). Making cognitive ability selection tests indifferent across cultures: The role of translation vs. national culture in measurement equivalence. Retrieved October 20, 2015, from http://mypages.iit.edu/~mead/GradshteinMeadGibby-2010-10-01.pdf
  • Grisay, A., Gonzalez, E., & Monseur, C. (2009). Equivalence of item difficulties across national versions of the PIRLS and PISA reading assessments. IERI Monograph Series: Issues and Methodologies in Large-Scale Assessments, 2, 63-83.
  • Hambleton, R.K. (1993). Translating achievement tests for use in cross-national studies. Retrieved December 21, 2016, from http://files.eric.ed.gov/fulltext/ED358128.pdf
  • Hambleton, R.K., Clauser, B.E., Mazor, K.M., & Jones, R.W. (1993). Advances in the detection of differentially functioning test items. Retrieved October 20, 2016, from http://files.eric.ed.gov/fulltext/ED356264.pdf
  • Hambleton, R.K., & Patsula, L. (1999). Increasing the validity of adapted tests: Myths to be avoided and guidelines for improving test adaptation practices. Journal of Applied Testing Technology, 1(1), 1-30.
  • Hambleton, R.K., Swaminathan, H., & Rogers, H.J. (1991). Fundamentals of item response theory. California: Sage Publications.
  • Hidalgo, M.D., & Lopez-Pina, J.A. (2004). Differential item functioning detection and effect size: a comparison between logistic regression and Mantel-Haenszel procedures. Educational and Psychological Measurement, 64(6), 903-915.
  • Hu, L.T., & Bentler, P.M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: Conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6(1), 1-55.
  • International Association for the Evaluation of Educational Achievement. (2012). TIMSS 2011 international results in mathematics. Lynch School of Education, Boston College.
  • Jodoin, M.G. (1999). Reducing Type I error rates using an effect size measure with the logistic regression procedure for DIF detection [Unpublished Master’s Thesis, University of Alberta]. Alberta.
  • Karasar, N. (2011). Bilimsel araştırma yöntemleri [Scientific research methods]. Ankara: Nobel Publishing.
  • Khalid, M.N., & Glas, C.A.W. (2013). A step-wise method for evaluation of differential item functioning. Journal of Quantitative Methods, 8(2), 25-47.
  • Lee, H., & Geisinger, K.F. (2016). The matching criterion purification for differential item functioning analyses in a large-scale assessment. Educational and Psychological Measurement, 76(1), 141-163.
  • McDonald, R.P., & Ho, M.-H.R. (2002). Principles and practice in reporting structural equation analyses. Psychological Methods, 7(1), 64-82.
  • Organisation for Economic Co-operation and Development. (2014). PISA 2012 technical report. OECD Publishing.
  • Organisation for Economic Co-operation and Development. (2015). International large-scale assessments: Origins, growth and why countries participate in PISA. OECD Publishing.
  • Perrone, M. (2006). Differential item functioning and item bias: Critical considerations in test fairness. Teachers College, Columbia University Working Papers in TESOL and Applied Linguistics, 6(2), 1-3.
  • Roussos, L., & Stout, W. (1996). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20(4), 355-371.
  • Sireci, S.G., & Allalouf, A. (2003). Appraising item equivalence across multiple languages and cultures. Language Testing, 20(2), 148-166.
  • Sireci, S.G., & Rios, J.A. (2013). Decisions that make a difference in detecting differential item functioning. Educational Research and Evaluation: An International Journal on Theory and Practice, 19(2-3), 170-187.
  • Sireci, S.G., & Swaminathan, H. (1996). Evaluating translation equivalence: So what’s the big DIF? Retrieved October 19, 2015, from http://files.eric.ed.gov/fulltext/ED428119.pdf
  • Svetina, D., & Rutkowski, L. (2014). Detecting differential item functioning using generalized logistic regression in the context of large-scale assessments. Retrieved October 20, 2015, from http://www.largescaleassessmentsineducation.com/content/pdf/s40536-014-0004-5.pdf
  • Tabachnick, B., & Fidell, L. (2013). Using multivariate statistics (5th Edition). Allyn & Bacon/Pearson Education.
  • Van de Vijver, F., & Tanzer, N.K. (2004). Bias and equivalence in cross-cultural assessment: An overview. Retrieved October 21, 2015, from http://resilienceresearch.org/files/article-vandevijver_tanzer.pdf
  • Wen, Y. (2014). DIF analyses in multilevel data: Identification and effects on ability estimates [Unpublished doctoral dissertation, University of Wisconsin-Milwaukee]. Wisconsin.
  • Yıldırım, H.H., & Yıldırım, S. (2011). Correlates of communalities as matching variables in differential item functioning analyses. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 40, 386-396.
  • Yılmaz, M. (2021). Eğilim puanları kullanılarak ABİDE çalışmasındaki maddelerin değişen madde fonksiyonu açısından incelenmesi [Examination of the items in the ABİDE study in terms of differential item functioning using propensity scores] [Unpublished Master’s Thesis, Hacettepe University]. Ankara.
  • Zheng, Y., Gierl, M.J., & Cui, Y. (2007). Using real data to compare DIF detection and effect size measures among Mantel-Haenszel, SIBTEST, and logistic regression procedures [Paper presentation]. National Council on Measurement in Education 2007. Chicago.
  • Zumbo, B.D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B.D. (2007). Three generations of DIF analyses: Considering where it has been, where it is now, and where it is going. Language Assessment Quarterly, 4(2), 223-233.

Details

Primary Language English
Subjects National and International Success Comparisons, Cross-Cultural Comparisons of Education: International Examinations
Journal Section Articles
Authors

Ahmet Yıldırım (ORCID: 0000-0002-0856-9678)

Nizamettin Koç (ORCID: 0000-0001-5412-0727)

Early Pub Date October 21, 2024
Publication Date November 15, 2024
Submission Date June 10, 2024
Acceptance Date September 2, 2024
Published in Issue Year 2024 Volume: 11 Issue: 4

Cite

APA Yıldırım, A., & Koç, N. (2024). Examination of differential item functioning in PISA through univariate and multivariate matching differential item functioning. International Journal of Assessment Tools in Education, 11(4), 774-786.
