Research Article

A Comparison of the efficacies of differential item functioning detection methods

Year 2023, Volume: 10 Issue: 1, 145 - 159, 20.03.2023
https://doi.org/10.21449/ijate.1135368

Abstract

One way to ensure the validity of a test is to check that all items yield similar results across different groups of individuals. Differential item functioning (DIF) occurs when individuals with equal ability levels from different groups perform differently on the same test item. Several methods for identifying items that show DIF, each with its own advantages and limitations, have been developed within Item Response Theory and Classical Test Theory. This study compares the performance of five DIF detection methods: Mantel-Haenszel (MH), Logistic Regression (LR), the crossing simultaneous item bias test (CSIBTEST), Lord's chi-square (LORD), and Raju's area measure (RAJU), under varying conditions of sample size, DIF ratio, and test length. Power and Type I error rates are evaluated in a simulation study with 100 replications per condition. The results show that LR and MH have the lowest Type I error rates and the highest power in detecting uniform DIF, and that CSIBTEST has power similar to that of MH and LR. Under DIF conditions, sample size, DIF ratio, test length, and their interactions affect Type I error and power rates.
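As a rough illustration of how one of the compared methods flags an item and how flags are summarized across replications, the sketch below computes the Mantel-Haenszel chi-square (with continuity correction) for a single dichotomous item and then derives Type I error and power rates from per-replication flags. It is not the study's implementation: the function names, the 'R'/'F' group coding, the use of a total score as the matching variable, and the 3.841 cutoff (chi-square, 1 df, alpha = .05) are assumptions made for illustration only.

```python
from collections import defaultdict

CHI2_CRIT_1DF_05 = 3.841  # assumed cutoff: chi-square, 1 df, alpha = .05


def mh_chi_square(item, group, score):
    """Mantel-Haenszel chi-square with continuity correction for one item.

    item  : 0/1 responses to the studied item
    group : 'R' (reference) or 'F' (focal) for each examinee
    score : matching variable (for example, total test score)
    """
    # One 2x2 table per score stratum: [A, B, C, D] =
    # [ref correct, ref incorrect, focal correct, focal incorrect]
    tables = defaultdict(lambda: [0, 0, 0, 0])
    for u, g, s in zip(item, group, score):
        idx = (0 if u == 1 else 1) + (0 if g == "R" else 2)
        tables[s][idx] += 1

    sum_a = sum_e = sum_v = 0.0
    for a, b, c, d in tables.values():
        t = a + b + c + d
        n_r, n_f, m1, m0 = a + b, c + d, a + c, b + d
        if t < 2 or n_r == 0 or n_f == 0:
            continue  # stratum carries no information about group differences
        sum_a += a
        sum_e += n_r * m1 / t                            # expected A under no DIF
        sum_v += n_r * n_f * m1 * m0 / (t * t * (t - 1))  # variance of A
    if sum_v == 0:
        return 0.0
    return max(abs(sum_a - sum_e) - 0.5, 0.0) ** 2 / sum_v


def rejection_rates(flags, dif_items):
    """Type I error and power from flags[replication][item] booleans.

    Type I error: share of flags among items simulated without DIF.
    Power       : share of flags among items simulated with DIF.
    """
    n_rep, n_items = len(flags), len(flags[0])
    dif = set(dif_items)
    non_dif = [j for j in range(n_items) if j not in dif]
    type1 = sum(rep[j] for rep in flags for j in non_dif) / (n_rep * len(non_dif))
    power = sum(rep[j] for rep in flags for j in dif) / (n_rep * len(dif))
    return type1, power


# Toy data: one score stratum, 10 reference and 10 focal examinees.
item = [1] * 8 + [0] * 2 + [1] * 2 + [0] * 8
group = ["R"] * 10 + ["F"] * 10
score = [5] * 20
chi2 = mh_chi_square(item, group, score)
print(chi2, chi2 > CHI2_CRIT_1DF_05)

# Two replications, three items, item index 2 simulated with DIF.
print(rejection_rates([[True, False, True], [False, False, True]], [2]))
```

In a full simulation, the flag matrix would be filled by applying each detection method to every item in each of the replications for a given condition, so that the resulting rates can be compared across methods.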


There are 58 citations in total.

Details

Primary Language English
Subjects Other Fields of Education
Journal Section Articles
Authors

Münevver Başman 0000-0003-3572-7982

Publication Date March 20, 2023
Submission Date June 24, 2022
Published in Issue Year 2023 Volume: 10 Issue: 1

Cite

APA Başman, M. (2023). A Comparison of the efficacies of differential item functioning detection methods. International Journal of Assessment Tools in Education, 10(1), 145-159. https://doi.org/10.21449/ijate.1135368
