Investigation of Two-Facet Designs with Generalizability in Item Response Modeling
Year 2018, Volume: 9, Issue: 1, pp. 17-32, 31.03.2018
Gülden Kaya Uyanık, Selahattin Gelbal
Abstract
In this study, the Generalizability in Item Response Modeling (GIRM) approach was
examined with a two-facet s×(i:t) design, and its results were compared with those
obtained from Generalizability Theory (GT). Simulated data were used: the data sets
were generated under the Generalizability Theory linear model for a balanced random
s×(i:t) design. The generated data differ in testlet effect, testlet length, and
number of testlets. In total, the data comprise two universes, each with four
different conditions. The results show no difference between the variance component
estimates obtained with the GIRM approach and those obtained with GT for the
conditions of any universe. This finding is supported by the study of Briggs and
Wilson, who introduced the GIRM approach and examined it with a one-facet design.
Although the estimates from GIRM and GT do not differ, the GIRM approach allows the
error variance to be observed separately from the interaction variance. The study
also examined the reliability of testlets under different conditions. Reliability
was higher when the person-testlet interaction was small than when it was large. In
addition, reliability decreased as the testlet effect increased. Finally, across the
conditions of all universes, the reliability of testlets increased as the number of
items increased.
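The data-generation and variance-estimation steps described above can be illustrated with a minimal Python sketch. It assumes continuous, normally distributed effects under the GT linear model X = μ + ν_s + ν_t + ν_i:t + ν_st + ν_si:t,e for a balanced, fully random s×(i:t) design, and estimates the five variance components with the classical ANOVA expected-mean-square equations (Brennan, 2001). The function names simulate_s_it and anova_s_it, the variance values, and the sample sizes are hypothetical illustrations, not the authors' simulation settings, and the GIRM side of the comparison (fitting an item response model) is not shown.

```python
import numpy as np


def simulate_s_it(n_s, n_t, n_i, var, rng):
    """Simulate scores X[s, t, i] for a balanced, fully random s x (i:t) design.

    `var` holds hypothetical variance components keyed by 's', 't', 'i:t',
    'st' and 'si:t,e'. Every effect is an independent normal deviate with
    mean 0, and the grand mean is fixed at 0.
    """
    nu_s = rng.normal(0.0, np.sqrt(var["s"]), size=(n_s, 1, 1))
    nu_t = rng.normal(0.0, np.sqrt(var["t"]), size=(1, n_t, 1))
    nu_it = rng.normal(0.0, np.sqrt(var["i:t"]), size=(1, n_t, n_i))
    nu_st = rng.normal(0.0, np.sqrt(var["st"]), size=(n_s, n_t, 1))
    nu_e = rng.normal(0.0, np.sqrt(var["si:t,e"]), size=(n_s, n_t, n_i))
    # Broadcasting expands every term to shape (n_s, n_t, n_i).
    return nu_s + nu_t + nu_it + nu_st + nu_e


def anova_s_it(x):
    """Return sums of squares, degrees of freedom and ANOVA variance-component estimates."""
    n_s, n_t, n_i = x.shape
    m = x.mean()
    m_s = x.mean(axis=(1, 2))  # person means, shape (n_s,)
    m_t = x.mean(axis=(0, 2))  # testlet means, shape (n_t,)
    m_ti = x.mean(axis=0)      # item-within-testlet means, shape (n_t, n_i)
    m_st = x.mean(axis=2)      # person-by-testlet means, shape (n_s, n_t)

    ss = {
        "s": n_t * n_i * np.sum((m_s - m) ** 2),
        "t": n_s * n_i * np.sum((m_t - m) ** 2),
        "i:t": n_s * np.sum((m_ti - m_t[:, None]) ** 2),
        "st": n_i * np.sum((m_st - m_s[:, None] - m_t[None, :] + m) ** 2),
        "si:t,e": np.sum((x - m_st[:, :, None] - m_ti[None, :, :] + m_t[None, :, None]) ** 2),
    }
    df = {
        "s": n_s - 1,
        "t": n_t - 1,
        "i:t": n_t * (n_i - 1),
        "st": (n_s - 1) * (n_t - 1),
        "si:t,e": (n_s - 1) * n_t * (n_i - 1),
    }
    ms = {k: ss[k] / df[k] for k in ss}
    # Solve the expected-mean-square equations of the random s x (i:t) design.
    est = {
        "s": (ms["s"] - ms["st"]) / (n_i * n_t),
        "t": (ms["t"] - ms["i:t"] - ms["st"] + ms["si:t,e"]) / (n_s * n_i),
        "i:t": (ms["i:t"] - ms["si:t,e"]) / n_s,
        "st": (ms["st"] - ms["si:t,e"]) / n_i,
        "si:t,e": ms["si:t,e"],
    }
    return ss, df, est


# Hypothetical generating values and sample sizes (not the study's conditions).
true_var = {"s": 1.0, "t": 0.2, "i:t": 0.3, "st": 0.4, "si:t,e": 1.5}
rng = np.random.default_rng(2018)
x = simulate_s_it(n_s=2000, n_t=10, n_i=10, var=true_var, rng=rng)
ss, df, est = anova_s_it(x)
for name, value in est.items():
    print(f"{name:7s} true={true_var[name]:.2f} estimated={value:.3f}")
```

With these sample sizes the estimates of σ²(s), σ²(st), σ²(i:t) and σ²(si:t,e) should land close to the generating values, while σ²(t) is noisier because it rests on only n_t - 1 = 9 degrees of freedom. Note that in this plain GT decomposition the residual σ²(si:t,e) confounds the highest-order interaction with error.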
References
- Alkahtani, S. F. (2012). Oral performance scoring using generalizability theory and many-facet Rasch measurement: A comparison study. Unpublished doctoral dissertation, The Pennsylvania State University.
- Bock, R. D., Brennan, R. L., & Muraki, E. (2002). The information in multiple ratings. Applied Psychological Measurement, 26, 364-375.
- Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64, 153-168.
- Brennan, R. L. (2001). Generalizability theory. New York: Springer-Verlag.
- Briggs, D. C., & Wilson, M. (2004). Generalizability theory in item response modeling. Presentation at the International Meeting of the Psychometric Society, Pacific Grove, CA.
- Briggs, D. C., & Wilson, M. (2007). Generalizability in item response modeling. Journal of Educational Measurement, 44(2), 131-155.
- Chien, Y. M. (2008). An investigation of testlet-based item response models with a random facets design in generalizability theory. Unpublished doctoral dissertation, University of Iowa.
- Cronbach, L. J., Linn, R. L., Brennan, R. L., & Haertel, E. (1995). Generalizability analysis for educational assessments. Evaluation Comment. Los Angeles: UCLA's Center for the Study of Evaluation and The National Center for Research on Evaluation, Standards and Student Testing. http://www.cse.ucla.edu
- DeMars, C. E. (2006). Application of the bi-factor multidimensional item response theory model to testlet-based tests. Journal of Educational Measurement, 43(2), 145-168.
- Dimitrov, D. M. (2003). Marginal true-score measures and reliability for binary items as a function of their IRT parameters. Applied Psychological Measurement, 27(6), 440-458.
- Dresher, A. R. (2004). An empirical investigation of LID using the testlet model: A further look. Paper presented at the Annual Meeting of the National Council on Measurement in Education, San Diego, CA.
- Feldt, L. S., & Quails, A. L. (1989). Reliability. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 105-146). New York: American Council on Education and Macmillan.
- Ferrara, S., Huynh, F. L., & Bagli, H. (1997). Contextual characteristics of locally dependent open-ended item clusters on a large-scale performance assessment. Applied Measurement in Education, 12, 123-144.
- Ferrara, S., Huynh, F. L., & Michaels, H. (1999). Contextual explanations of local dependence in item clusters in a large-scale hands-on science performance assessment. Journal of Educational Measurement, 36, 119-140.
- Fox, J. P., & Glas, C. A. W. (2001). Bayesian estimation of a multilevel IRT model using Gibbs sampling. Psychometrika, 66, 271-288.
- Glas, C. A. W. (1989). Contributions to estimating and testing Rasch models. Unpublished doctoral dissertation, University of Twente, Enschede.
- Güler, N., Kaya Uyanık, G., & Taşdelen Teker, G. (2012). Genellenebilirlik kuramı [Generalizability theory]. Ankara: Pegem Akademi Yayıncılık.
- Hendrickson, A. B. (2001). Reliability of scores from tests composed of testlets: A comparison of methods. Paper presented at the Annual Meeting of the National Council on Measurement in Education (Seattle, WA, April 11-13, 2001).
- Jiao, H., Kamata, A., Wang, S., & Jin, Y. (2012). A multilevel testlet model for dual local dependence. Journal of Educational Measurement, 49(1), 82-100.
- Karasar, N. (2004). Bilimsel araştırma yöntemi [Scientific research method] (13th ed.). Ankara: Nobel Yayınları.
- Kim, S. C., & Wilson, M. (2008). A comparative analysis of the ratings in performance assessment using generalizability theory and the many-facet Rasch model. Journal of Applied Measurement, 10(4), 408-423.
- Kolen, M., & Harris, D. (1987). A multivariate test theory model based on item response theory and generalizability theory. Paper presented at the American Educational Research Association, Washington, DC.
- Lee, G., & Park, I. Y. (2012). A comparison of the approaches of generalizability theory and item response theory in estimating the reliability of test scores for testlet-composed tests. Asia Pacific Education Review, 13(1), 47-54.
- Lee, G., Brennan, R. L., & Frisbie, D. A. (2000). Incorporating the testlet concept in test score analyses. Educational Measurement: Issues and Practice, 19(4), 9-15.
- Lee, G., & Frisbie, D. A. (1999). Estimating reliability under a generalizability theory model for test scores composed of testlets. Applied Measurement in Education, 12(3), 237-255.
- Li, Y., Bolt, D. M., & Fu, J. (2006). A comparison of alternative models for testlets. Applied Psychological Measurement, 30(1), 3-21.
- Linacre, J. M. (1989). Many-facet Rasch measurement. Chicago: MESA Press.
- Linacre, J. M. (1999). FACETS (Version 3.17) [Computer software]. Chicago: MESA Press.
- Lord, F. M. (1983). Unbiased estimation of ability parameters, of their variance, and of their parallel forms reliability. Psychometrika, 48, 233-245.
- Patz, R., Junker, B., Johnson, M. S., & Mariano, L. (2002). The hierarchical rater model for rated test items and its application to large-scale educational assessment data. Journal of Educational and Behavioral Statistics, 27, 341-384.
- Raju, N. S., & Oshima, T. C. (2005). Two prophecy formulas for assessing the reliability of item response theory-based ability estimates. Educational and Psychological Measurement, 65(3), 361-375.
- Rasch, G. (1960). Probabilistic models for some intelligence and attainment tests. Copenhagen: Danish Institute for Educational Research.
- Rosenbaum, P. R. (1988). Item bundles. Psychometrika, 53(3), 349-359.
- Samejima, F. (1977). A use of the information function in tailored testing. Applied Psychological Measurement, 1, 233-247.
- Samejima, F. (1994). Estimation of reliability coefficients using the test information function and its modifications. Applied Psychological Measurement, 18, 229-244.
- Shavelson, R. J., & Webb, N. M. (1991). Generalizability theory: A primer. USA: SAGE Publications.
- Sireci, S. G., Thissen, D., & Wainer, H. (1991). On the reliability of testlet-based tests. Journal of Educational Measurement, 28, 237-247.
- Thissen, D., Steinberg, L., & Mooney, J. (1989). Trace lines for testlets: A use of multiple-categorical response models. Journal of Educational Measurement, 26, 247-260.
- Verhelst, N. D., & Verstralen, H. H. F. M. (2001). IRT models for multiple raters. In A. Boomsma, T. Snijders, & M. Van Duijn (Eds.), Essays in item response modeling (pp. 89-108). New York: Springer-Verlag.
- Wainer, H. (1995). Precision and differential item functioning on a testlet-based test: The 1991 Law School Admissions Test as an example. Applied Measurement in Education, 8, 157-186.
- Wainer, H., & Kiely, G. L. (1987). Item clusters and computerized adaptive testing: A case for testlets. Journal of Educational Measurement, 24(3), 185-201.
- Wainer, H., & Lewis, C. (1990). Toward a psychometrics for testlets. Journal of Educational Measurement, 27(1), 1-14.
- Wainer, H., & Thissen, D. (1996). How is reliability related to the quality of test scores? What is the effect of local dependence on reliability? Educational Measurement: Issues and Practice, 15(1), 22-29.
- Wainer, H., & Wang, C. (2000). Using a new statistical model for testlets to score TOEFL. Journal of Educational Measurement, 37, 203-220.
- Wainer, H., Bradlow, E. T., & Du, Z. (2000). Testlet response theory: An analog for the 3PL model useful in testlet-based adaptive testing. Dordrecht: Kluwer Academic Publishers.
- Wang, X., Bradlow, E. T., & Wainer, H. (2002). A general Bayesian model for testlets: Theory and application. Applied Psychological Measurement, 26(1), 109-128.
- Wilson, M., & Hoskens, M. (2001). The rater bundle model. Journal of Educational and Behavioral Statistics, 26, 283-306.
- Yen, W. M. (1993). Scaling performance assessments: Strategies for managing local item dependence. Journal of Educational Measurement, 30, 187-213.
- Zhang, X., & Roberts, W. L. (2013). Investigation of standardized patient ratings of humanistic competence on a medical licensure examination using Many-Facet Rasch Measurement and generalizability theory. Advances in Health Sciences Education, 18(5), 929-944.
- Zwinderman, A. H. (1991). A generalized Rasch model for manifest predictors. Psychometrika, 56, 589-600.
Investigation of Two-Facet Design with Generalizability in Item Response Modeling
Abstract
This study investigates an approach called generalizability in item response
modeling (GIRM) with a two-facet s×(i:t) design and compares its results with those
of generalizability theory (GT). Simulated data are used. The data are generated
under the Generalizability Theory linear model for a balanced random-facets s×(i:t)
design and differ in three factors: testlet effect, testlet length, and number of
testlets. The generated data comprise two different universes, each with four
different conditions. According to the results, the variance component estimates
obtained with the GIRM approach are generally quite similar to those obtained with
the GT approach, a result consistent with Briggs and Wilson's study. Although the
results of GIRM and GT do not differ, GIRM allows the error variance to be separated
from the residual variance. The study also examines the reliability of testlets
under different conditions. Testlets are more reliable when the person-item variance
is smaller, and reliability decreases as the testlet effect increases. Across the
conditions of all universes, increasing the number of items is an effective way to
increase reliability.
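One way to see why these reliability patterns are plausible is through the usual GT coefficients for a random s×(I:T) decision study with n_t testlets and n_i items per testlet (Brennan, 2001): the relative error variance is σ²(δ) = σ²(st)/n_t + σ²(si:t,e)/(n_i n_t), the generalizability coefficient is Eρ² = σ²(s)/(σ²(s) + σ²(δ)), and the dependability coefficient Φ additionally adds σ²(t)/n_t + σ²(i:t)/(n_i n_t) to the error term. The sketch below is a minimal illustration, assuming hypothetical variance components and treating the testlet effect as the person-testlet interaction σ²(st); it is not necessarily the reliability index the authors computed, and the function names are hypothetical.

```python
def error_variances(var, n_t, n_i):
    """Relative (delta) and absolute (Delta) error variances for a random s x (I:T) D study."""
    rel = var["st"] / n_t + var["si:t,e"] / (n_i * n_t)
    abs_ = rel + var["t"] / n_t + var["i:t"] / (n_i * n_t)
    return rel, abs_


def g_and_phi(var, n_t, n_i):
    """Generalizability (E rho^2) and dependability (Phi) coefficients."""
    rel, abs_ = error_variances(var, n_t, n_i)
    return var["s"] / (var["s"] + rel), var["s"] / (var["s"] + abs_)


# Hypothetical G-study variance components (not the study's values).
var = {"s": 1.0, "t": 0.2, "i:t": 0.3, "st": 0.4, "si:t,e": 1.5}

g, phi = g_and_phi(var, n_t=4, n_i=5)
print(round(g, 3), round(phi, 3))  # 0.851 0.806

# More items per testlet raises E rho^2 ...
for n_i in (5, 10, 20):
    print(n_i, round(g_and_phi(var, n_t=4, n_i=n_i)[0], 3))

# ... while a larger person-testlet interaction lowers it.
for st in (0.1, 0.4, 0.8):
    print(st, round(g_and_phi({**var, "st": st}, n_t=4, n_i=5)[0], 3))
```

Because σ²(si:t,e) is divided by n_i n_t, adding items per testlet shrinks the relative error, whereas σ²(st) is divided only by n_t, so a larger person-testlet interaction cannot be offset by lengthening the testlets.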