Research Article
Year 2021, Volume: 12 Issue: 3, 267 - 285, 29.09.2021
https://doi.org/10.21031/epod.988879

Detecting Differential Item Functioning Using SIBTEST, MH, LR and IRT Methods

Abstract

In this study, differential item functioning (DIF) and differential bundle functioning (DBF) analyses of the Academic Staff and Postgraduate Education Entrance Examination Quantitative Ability Tests were carried out. The Mantel-Haenszel, logistic regression, SIBTEST, Item Response Theory-Likelihood Ratio, and BILOG-MG DIF Algorithm methods were used for the DIF analyses, and SIBTEST was used for the DBF analyses. The data sets came from an earlier administration of the examination. The gender DIF analyses flagged eleven items: four favored male applicants, whereas seven favored female applicants. To investigate the sources of DIF, we consulted experts. In general, items that could be solved using routine algorithmic operations and that were presented in an algebraic, abstract format showed DIF in favor of females, while the “real-life” word problems favored males. According to the DBF analyses, the operations item group favored females and the word problems item group favored males.
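
As an illustration of one of the methods named above, the following is a minimal sketch of the Mantel-Haenszel common odds ratio and its delta-scale transformation for a single dichotomous item. It is not the authors' implementation: the function name `mantel_haenszel_dif`, the group labels "ref" and "focal", and the use of the total score as the matching variable are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def mantel_haenszel_dif(responses, groups, total_scores):
    """Return (alpha_MH, delta_MH) for one dichotomous item.

    responses    -- 0/1 scores on the studied item
    groups       -- "ref" or "focal" for each examinee (hypothetical labels)
    total_scores -- matching variable used to stratify examinees
    """
    # For each score level k keep [A, B, C, D]:
    # A = reference correct, B = reference incorrect,
    # C = focal correct,     D = focal incorrect.
    strata = defaultdict(lambda: [0, 0, 0, 0])
    for y, g, s in zip(responses, groups, total_scores):
        idx = (0 if y == 1 else 1) + (0 if g == "ref" else 2)
        strata[s][idx] += 1

    num = 0.0
    den = 0.0
    for a, b, c, d in strata.values():
        n = a + b + c + d
        num += a * d / n
        den += b * c / n
    if num == 0.0 or den == 0.0:
        raise ValueError("odds ratio is undefined or degenerate for these data")

    alpha = num / den                # common odds ratio across strata
    delta = -2.35 * math.log(alpha)  # delta scale; negative values favor the reference group
    return alpha, delta
```

An alpha above 1 (negative delta) indicates that, among examinees matched on total score, the reference group has higher odds of answering the item correctly; an alpha below 1 indicates the opposite.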

References

  • Ackerman, T. A. (1992). A didactic explanation of item bias, item impact, and item validity from a multidimensional perspective. Journal of Educational Measurement, 29(1), 67-91.
  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Asil, M. (2010). Uluslararası Öğrenci Değerlendirme Programı (PISA) 2006 öğrenci anketinin kültürler arası eşdeğerliğinin incelenmesi. Unpublished doctoral dissertation, Hacettepe University, Ankara.
  • Bakan Kalaycıoğlu, D., & Berberoğlu, G. (2010). Differential item functioning analysis of the science and mathematics items in the university entrance examinations in Turkey. Journal of Psychoeducational Assessment, 20(5), 1-12.
  • Bakan Kalaycıoğlu, D., & Kelecioğlu, H. (2011). Öğrenci Seçme Sınavı’nın madde yanlılığı açısından incelenmesi. Eğitim ve Bilim, 36(161), 3-13.
  • Camilli, G. (2006). Test fairness. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 221-256). Westport: American Council on Education & Praeger Publishers.
  • Clauser, B. E., & Mazor, K. M. (1998). Using statistical procedures to identify differentially functioning test items. Educational Measurement: Issues and Practice, 17(1), 31-47.
  • Cohen, A., & Ibarra, R. A. (2005). Examining gender-related differential item functioning using insights from psychometric and multicontext theory. In A. M. Gallagher & J. C. Kaufman (Eds.), Gender differences in mathematics: An integrative psychological approach (pp. 143-171). New York, NY: Cambridge University Press.
  • Doğan, N., & Öğretmen, T. (2008). Değişen madde fonksiyonunu belirlemede Mantel-Haenszel, ki-kare ve lojistik regresyon tekniklerinin karşılaştırılması. Eğitim ve Bilim, 33(148), 100-112.
  • Doolittle, A. E., & Cleary, T. A. (1987). Gender-based differential item performance in mathematics achievement items. Journal of Educational Measurement, 24(2), 157-166.
  • du Toit, M. (Ed.). (2003). IRT from SSI: BILOG-MG, MULTILOG, PARSCALE, TESTFACT. Lincolnwood, IL: Scientific Software International, Inc.
  • Ercikan, K. (1998). Translation effects in international assessments. International Journal of Educational Research, 29, 543-553.
  • Educational Testing Service. (2007). The GRE® Analytical Writing Measure: An asset in admissions decisions. Downloaded from www.ets.org/Media/Tests/GRE/pdf/gre_aw_an_asset.pdf
  • Fennema, E., & Sherman, J. (1977). Sex-related differences in mathematics achievement, spatial visualization and affective factors. American Educational Research Journal, 14(1), 51-71.
  • Gierl, M. J. (2005). Using dimensionality-based DIF analyses to identify and interpret constructs that elicit group differences. Educational Measurement: Issues and Practice, 24(1), 3-14.
  • Gierl, M. J., Bisanz, J., Bisanz, G. L., & Boughton, K. A. (2003). Identifying content and cognitive skills that produce gender differences in mathematics: A demonstration of the multidimensionality-based DIF analysis paradigm. Journal of Educational Measurement, 40(4), 281-306.
  • Gierl, M., Bisanz, J., Bisanz, G., Boughton, K., & Khaliq, S. (2001). Illustrating the utility of differential bundle functioning analyses to identify and interpret group differences on achievement tests. Educational Measurement: Issues and Practice, 20(2), 26-36.
  • Gök, B., Kelecioğlu, H., & Doğan, N. (2010). Değişen madde fonksiyonunu belirlemede Mantel-Haenszel ve lojistik regresyon tekniklerinin karşılaştırılması. Eğitim ve Bilim, 35(156), 3-16.
  • Grisay, A., de Jong, J. H. A. L., Gebhardt, E., Berenzer, A., & Halleux-Monseur, B. (2007). Translation equivalence across PISA countries. Journal of Applied Measurement, 8(3), 249-266.
  • Hambleton, R. K. (2006). Good practices for identifying differential item functioning. Medical Care, 44(11), 182-188.
  • Hambleton, R. K., Merenda, P. F., & Spielberger, C. D. (Eds.). (2005). Adapting educational and psychological tests for cross-cultural assessment. Mahwah, NJ: Lawrence Erlbaum.
  • Hambleton, R. K., Swaminathan, H., & Rogers, H. J. (1991). Fundamentals of Item Response Theory. Sage Publications: California.
  • Harris, A. M., & Carlton, S. T. (1993). Patterns of gender differences on mathematics items on the Scholastic Aptitude Test. Applied Measurement in Education, 6(2), 137-151.
  • Hidalgo, M. D., & Lopez-Pina, J. A. (2004). Differential item functioning detection and effect size: A comparison between logistic regression and Mantel-Haenszel procedures. Educational and Psychological Measurement, 64, 903-915.
  • Holland, P. W., & Thayer, D. T. (1988). Differential item performance and the Mantel-Haenszel procedure. In H. Wainer & H. I. Braun (Eds.), Test validity (pp. 129-145). Hillsdale, NJ: Erlbaum.
  • Jodoin, M. G., & Gierl, M. J. (2001). Evaluating type I error and power rates using an effect size measure with logistic regression procedure for DIF detection. Applied Measurement in Education, 14(4), 329-349.
  • Joint Committee on Testing Practices. (2004). Code of fair testing practices in education. Downloaded from http://www.apa.org/science/programs/testing/fair-code.aspx
  • Mendes-Barnett, S., & Ercikan, K. (2006). Examining sources of gender DIF in mathematics assessments using a confirmatory multidimensional model approach. Applied Measurement in Education, 19(4), 289-304.
  • Millsap, R. E., & Everson, H. T. (1993). Methodology review: Statistical approaches for assessing measurement bias. Applied Psychological Measurement, 17(4), 297-334.
  • Nandakumar, R. (1993). Simultaneous DIF amplification and cancellation: Shealy–Stout’s test for DIF. Journal of Educational Measurement, 30(4), 293–312.
  • Ong, Y.M., Williams, J. S., & Lamprianou, I. (2011). Exploration of the validity of gender differences in mathematics assessment using differential bundle functioning. International Journal of Testing, 11(3), 271-293.
  • Oort, F. (1992). Using restricted factor analysis to detect item bias. Methodika, 6(2), 150–166.
  • ÖSYM. (2008). 2008 Akademik Personel ve Lisansüstü Eğitimi Giriş Sınavı (ALES) Sonbahar Dönemi Kılavuzu. Downloaded from www.osym.gov.tr
  • Roussos, L., & Stout, W. (1996a). A multidimensionality-based DIF analysis paradigm. Applied Psychological Measurement, 20(4), 355-371.
  • Roussos, L. A., & Stout, W. F. (1996b). Simulation studies of the effects of small sample size and studied item parameters on SIBTEST and Mantel-Haenszel type I error performance. Journal of Educational Measurement, 33(2), 215-230.
  • Ryan, K. E., & Chiu, S. (2001). An examination of item context effects, DIF, and gender DIF. Applied Measurement in Education, 14(1), 73-90.
  • Scheunemann, J. D., & Grima, A. (1997). Characteristics of quantitative word items associated with differential performance for female and Black examinees. Applied Measurement in Education, 10(4), 299-320.
  • Shealy, R., & Stout, W. F. (1993). A model-based standardization approach that separates true bias/DIF from group ability differences and detects test bias/DTF as well as item bias/DIF. Psychometrika, 58, 159–194.
  • Smith, L. L., & Reise, S. P. (1998). Gender differences on negative affectivity: An IRT study of differential item functioning on the Multidimensional Personality Questionnaire Stress Reaction scale. Journal of Personality and Social Psychology, 75(5), 1350-1362.
  • Stout, W., & Roussos, L. (1995). SIBTEST user manual. Urbana: University of Illinois.
  • Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27(4), 361-370.
  • Thissen, D. (2001). IRTLRDIF v.2.0.b: Software for the computation of the statistics involved in Item Response Theory Likelihood-Ratio tests for differential item functioning. Downloaded from http://www.unc.edu/~dthissen/dl.html
  • Thissen, D., Steinberg, L., & Wainer, H. (1993). Detection of differential item functioning using the parameters of item response models. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 67-113). Hillsdale, NJ: Erlbaum.
  • Waller, N. G. (1998). EZDIF: Detection of uniform and non-uniform differential item functioning with the Mantel-Haenszel and logistic regression procedures. Applied Psychological Measurement, 22(4), 391.
  • Wang, W., & Yeh, L. Y. (2003). Effects of anchor item methods on differential item functioning detection with the likelihood ratio test. Applied Psychological Measurement, 27(6), 479-498.
  • Yıldırım, H. H., & Berberoğlu, G. (2009). Judgemental and statistical DIF analyses of the PISA-2003 Mathematics Literacy items. International Journal of Testing, 9(2), 108-121.
  • Zieky, M. (1993). Practical questions in the use of DIF statistics in test development. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 337-347). Hillsdale, NJ: Erlbaum.
  • Zimowski, M. F., Muraki, E., Mislevy, R. J., & Bock, R. D. (1996). BILOG-MG: Multiple-group IRT analysis and test maintenance for binary items. Chicago, IL: Scientific Software International.
  • Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.
  • Zumbo, B. D., & Gelin, M. N. (2005). A matter of test bias in educational policy research: Bringing the context into picture by investigating sociological community moderated (or mediated) test and item bias. Journal of Educational Research and Policy Studies, 5, 1-23.
  • Zumbo, B. D., & Thomas, D. R. (1996, October). A measure of DIF effect size using logistic regression procedures. Paper presented at the National Board of Medical Examiners, Philadelphia, PA.
There are 51 citations in total.

Details

Primary Language English
Journal Section Articles
Authors

Zafer Çepni 0000-0002-8033-905X

Hülya Kelecioğlu 0000-0002-0741-9934

Publication Date September 29, 2021
Acceptance Date September 24, 2021
Published in Issue Year 2021 Volume: 12 Issue: 3

Cite

APA Çepni, Z., & Kelecioğlu, H. (2021). Detecting Differential Item Functioning Using SIBTEST, MH, LR and IRT Methods. Journal of Measurement and Evaluation in Education and Psychology, 12(3), 267-285. https://doi.org/10.21031/epod.988879