Research Article

Comparison of Estimation of Total Score and Subscores with Hierarchical Item Response Theory Models

Year 2018, Volume: 9 Issue: 2, 178 - 201, 30.06.2018
https://doi.org/10.21031/epod.404089

Abstract

In this study, the relationship between the subtests and the total test was investigated using hierarchical item response theory models, in order to contribute to reliable subtest and total test score estimation. The RMSE and reliability of the total test score and the subtest scores estimated by the Higher Order, Bi-factor, and Hierarchical MIRT models were compared under conditions defined by the size of the correlations between the subtests, the subtest length, and the number of subtests. In addition, the performance of the three models was examined on TEOG 2015 data. As a result of the study, in almost all conditions, as the correlation between the subtests and the subtest length increased, the RMSE of the ability parameters for the total test score obtained from the three estimation models decreased and the reliability increased. Under all conditions, the Hierarchical MIRT model yielded the lowest RMSE values and the highest reliability values for both subtest and total test score recovery. In addition, at the 0.8 level of correlation, all models produced RMSE and reliability values close to one another for total test score recovery. For the subtest scores in two- and three-dimensional data, the RMSE values of the ability parameters decreased as the subtest length increased but were not affected by the level of correlation between the subtests in the Hierarchical MIRT model; decreased as the subtest length and the correlation between the subtests increased in the Higher Order model; and decreased as the subtest length increased but increased markedly as the correlation between the subtests increased in the Bi-factor model.
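
For orientation, a minimal sketch of the standard two-parameter logistic forms usually associated with these three model families follows (after Gibbons & Hedeker, 1992; de la Torre & Song, 2009; Sheng & Wikle, 2008), together with the RMSE criterion; the article's exact parameterizations may differ:

    % Bi-factor model: each item j loads on the general trait and on
    % exactly one specific trait s(j); all traits are orthogonal.
    P(X_{ij} = 1) = \frac{1}{1 + \exp\{-[a_{jg}\theta_{ig} + a_{js}\theta_{i,s(j)} - b_j]\}}

    % Higher-order model: each item loads only on its domain trait, and
    % the domain traits are regressed on a single general trait.
    P(X_{ij} = 1) = \frac{1}{1 + \exp[-(a_j\theta_{i,d(j)} - b_j)]}, \qquad
    \theta_{id} = \lambda_d\,\theta_{ig} + \varepsilon_{id}

    % Hierarchical MIRT model: correlated domain traits, with the overall
    % ability taken as a weighted composite of the domain abilities.
    \boldsymbol{\theta}_i \sim N(\mathbf{0}, \boldsymbol{\Sigma}), \qquad
    \theta_{ig} = \sum_{d} w_d\,\theta_{id}

    % Score recovery criterion over N simulees:
    \mathrm{RMSE}(\hat{\theta}) = \sqrt{\tfrac{1}{N}\sum_{i=1}^{N}(\hat{\theta}_i - \theta_i)^2}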

References

  • American Educational Research Association, American Psychological Association, & National Council on Measurement in Education. (1999). Standards for educational and psychological testing. Washington, DC: American Educational Research Association.
  • Bradlow, E. T., Wainer, H., & Wang, X. (1999). A Bayesian random effects model for testlets. Psychometrika, 64(2), 153–168, doi: 10.1007/BF02294533
  • Brandt, S., & Duckor, B. (2013). Increasing unidimensional measurement precision using a multidimensional item response model approach. Psychological Test and Assessment Modeling, 55(2), 148-161.
  • Brennan, R. L. (2012). Utility indexes for decisions about subscores (No. 33). Center for Advanced Studies in Measurement and Assessment (CASMA). Retrieved from https://education.uiowa.edu/sites/education.uiowa.edu/files/documents/centers/casma/publications/casma-research-report-33.pdf
  • Bulut, O. (2013). Between-person and within-person subscore reliability: Comparison of unidimensional and multidimensional IRT models (Doctoral dissertation). Retrieved from https://conservancy.umn.edu/bitstream/handle/11299/155592/Bulut_umn_0130E_13879.pdf?sequence=1&isAllowed=y
  • Chang, Y. F. (2015). A restricted bi-factor model of subdomain relative strengths and weaknesses (Doctoral dissertation). Retrieved from https://conservancy.umn.edu/bitstream/handle/11299/175551/CHANG_umn_0130E_16452.pdf?sequence=1&isAllowed=y
  • Çakıcı Eser, D. (2015). Çok boyutlu madde tepki kuramının farklı modellerinden çeşitli koşullar altında kestirilen parametrelerin incelenmesi [Examination of parameters estimated from different multidimensional item response theory models under various conditions] (Doctoral dissertation). Retrieved from http://tez2.yok.gov.tr/
  • de la Torre, J., & Patz, R.J. (2005). Making the most of what we have: A practical application of multidimensional IRT in test scoring. Journal of Educational and Behavioral Statistics, 30(3), 295–311, doi: 10.3102/10769986030003295
  • de la Torre, J. (2009). Improving the quality of ability estimates through multidimensional scoring and incorporation of ancillary variables. Applied Psychological Measurement, 33(6), 465–485, doi: 10.1177/0146621608329890
  • de la Torre, J., & Song, H. (2009). Simultaneous estimation of overall and domain abilities: A higher-order IRT model approach. Applied Psychological Measurement, 33(8), 620-639, doi: 10.1177/0146621608326423
  • de la Torre, J., Song, H., & Hong, Y. (2011). A comparison of four methods of IRT subscoring. Applied Psychological Measurement, 35(4), 296-316, doi: 10.1177/0146621610378653
  • Edwards, M. C., & Vevea, J. L. (2006). An empirical Bayes approach to subscore augmentation: How much strength can we borrow?. Journal of Educational and Behavioral Statistics, 31(3), 241-259, doi: 10.3102/10769986031003241
  • ETS. (2014). ETS standards for quality and fairness. Educational Testing Service. Retrieved from https://www.ets.org/s/about/pdf/standards.pdf
  • Ferrara, S., & DeMauro, G. E. (2007). Standardized assessment of individual achievement in K–12. In R. L. Brennan (Ed.), Educational measurement (pp. 579–622). Westport, CT: Praeger.
  • Fraenkel, J. R., Wallen, N. E., & Hyun, H. H. (2011). How to design and evaluate research in education (8th ed.). Boston, MA: McGraw-Hill.
  • Gall, M. D., Gall, J. P., & Borg, W. R. (2003). Educational research: An introduction (7th ed.). Pearson Education.
  • Gibbons, R. D., & Hedeker, D. (1992). Full-information item Bi-factor analysis. Psychometrika, 57, 423–436.
  • Haberman, S. J. (2008). When can subscores have value? Journal of Educational and Behavioral Statistics, 33(2), 204–229, doi: 10.3102/1076998607302636
  • Haberman, S., Sinharay, S., & Puhan, G. (2009). Reporting subscores for institutions. British Journal of Mathematical and Statistical Psychology, 62(1), 79–95, doi: 10.1348/000711007X248875
  • Haladyna, T. M., & Kramer, G. A. (2004). The validity of subscores for a credentialing test. Evaluation & the Health Professions, 27(4), 349–368, doi: 10.1177/0163278704270010
  • Harwell, M., Stone, C. A., Hsu, T. C., & Kirisci, L. (1996). Monte Carlo studies in item response theory. Applied Psychological Measurement, 20(2), 101-125, doi: 10.1177/014662169602000201
  • Huang, H. Y., Wang, W. C., Chen, P. H., & Su, C. M. (2013). Higher-order item response models for hierarchical latent traits. Applied Psychological Measurement, 37(8), 619-637, doi: 10.1177/0146621613488819
  • Jiang, S., Wang, C., & Weiss, D. J. (2016). Sample size requirements for estimation of item parameters in the multidimensional graded response model. Frontiers in Psychology, 7(109), 1–10, doi: 10.3389/fpsyg.2016.00109
  • Kelley, T. L. (1927). The interpretation of educational measurements. New York: World Book.
  • Kelley, T. L. (1947). Fundamentals of statistics. Cambridge, MA: Harvard University Press.
  • Kerlinger, F. N. (1973). Foundations of behavioral research. New York, NY: Holt, Rinehart and Winston.
  • Köse, İ. A. (2010). Madde tepki kuramına dayalı tek boyutlu ve çok boyutlu modellerin test uzunluğu ve örneklem büyüklüğü açısından karşılaştırılması [Comparison of unidimensional and multidimensional models based on item response theory in terms of test length and sample size] (Doctoral dissertation). Retrieved from http://tez2.yok.gov.tr/
  • Lee, J. (2012). Multidimensional item response theory: An investigation of interaction effects between factors on item parameter recovery using Markov Chain Monte Carlo (Doctoral dissertation). Retrieved from https://d.lib.msu.edu/islandora/object/etd:1577/datastream/OBJ/download/Multidimensional_item_response_theory__an_investigation_of_interaction_effects_between_factors_on_item_parameter_recovery_using_Markov_Chain_Monte_Carlo.pdf
  • Ling, G. (2012). Why the major field test in business does not report subscores: Reliability and construct validity evidence (No. RR-12-11). ETS Research Report. Retrieved from https://www.ets.org/Media/Research/pdf/RR-12-11.pdf
  • Lorenzo-Seva, U., & Ferrando, P. J. (2006). FACTOR: A computer program to fit the exploratory factor analysis model. Behavior Research Methods, 38(1), 88–91.
  • Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (pp. 13–103). New York, NY: Macmillan.
  • Monaghan, W. (2006). The facts about subscores (No. RDC-04). ETS Research Report. Retrieved from https://www.ets.org/research/policy_research_reports/rdc-04
  • Özkan, Y. Ö. (2012). Öğrenci başarılarının belirlenmesi sınavından (ÖBBS) klasik test kuramı, tek boyutlu ve çok boyutlu madde tepki kuramı modelleri ile kestirilen başarı puanlarının karşılaştırılması [Comparison of achievement scores estimated from the Student Achievement Determination Exam (ÖBBS) with classical test theory and unidimensional and multidimensional item response theory models] (Doctoral dissertation). Retrieved from http://tez2.yok.gov.tr/
  • Reckase, M. D. (1997). The past and future of multidimensional item response theory. Applied Psychological Measurement, 21, 25–36, doi: 10.1177/0146621697211002
  • Schmid, J., & Leiman, J. M. (1957). The development of hierarchical factor solutions. Psychometrika, 22(1), 53-61.
  • Sheng, Y., & Wikle, C. K. (2007). Comparing multiunidimensional and unidimensional item response theory models. Educational and Psychological Measurement, 67(6), 899–919, doi: 10.1177/0013164406296977
  • Sheng, Y., & Wikle, C. K. (2008). Bayesian multidimensional IRT models with a hierarchical structure. Educational and Psychological Measurement, 68(3), 413–430, doi: 10.1177/0013164407308512
  • Shin, D. (2007). A comparison of methods of estimating subscale scores for mixed-format tests. Report for Pearson Educational Measurement. Retrieved from https://images.pearsonassessments.com/images/tmrs/tmrs_rg/EstimatingSubscaleScoresforMixedFormatItemsforPEMreportfinal.pdf?WT.mc_id=TMRS_A_Comparison_of_Methods_of_Estimating
  • Shin, C. D., Ansley, T., Tsai, T., & Mao, X. (2005, April). A comparison of methods of estimating objective scores. Paper presented at the annual meeting of the National Council on Measurement in Education, Montreal, Canada.
  • Sinharay, S. (2010). How often do subscores have added value? Results from operational and simulated data. Journal of Educational Measurement, 47(2), 150–174.
  • Skorupski, W. P., & Carvajal, J. (2010). A comparison of approaches for improving the reliability of objective level scores. Educational and Psychological Measurement, 70(3), 357-375, doi: 10.1177/0013164409355694
  • Wainer, H., Vevea, J. L., Camacho, F., Reeve, B. B., Rosa, K., Nelson, L., Swygert, K. A., & Thissen, D. (2001). Augmented scores: "Borrowing strength" to compute scores based on small numbers of items. In D. Thissen & H. Wainer (Eds.), Test scoring (pp. 343–387). Mahwah, NJ: Lawrence Erlbaum Associates.
  • Wang, W. C., & Wilson, M. (2005). The Rasch testlet model. Applied Psychological Measurement, 29(2), 126–149, doi: 10.1177/0146621604271053
  • Wang, W. C., Chen, P. H., & Cheng, Y. Y. (2004). Improving measurement precision of test batteries using multidimensional item response models. Psychological Methods, 9(1), 116, doi: 10.1037/1082-989X.9.1.116
  • Yao, L. (2003). SimuMIRT [Software]. Monterey, CA: Defense Manpower Data Center. Retrieved from http://www.bmirt.com
  • Yao, L. (2010). Reporting valid and reliable overall scores and domain scores. Journal of Educational Measurement, 47(3), 339-360, doi: 10.1111/j.1745-3984.2010.00117.x
  • Yao, L. (2017). Comparing methods for estimating the abilities for the multidimensional models of mixed item types. Communications in Statistics-Simulation and Computation, 1-18, doi: 10.1080/03610918.2016.1277749
  • Yao, L., & Boughton, K. A. (2007). A multidimensional item response modeling approach for improving subscale proficiency estimation and classification. Applied Psychological Measurement, 31(2), 83-105, doi: 10.1177/0146621606291559
  • Yao, L., & Schwarz, R. D. (2006). A multidimensional partial credit model with associated item and test statistics: An application to mixed-format tests. Applied Psychological Measurement, 30(6), 469–492, doi: 10.1177/0146621605284537
  • Yen, W. M. (1980). The extent, causes and importance of context effects on item parameters for 2 latent trait models. Journal of Educational Measurement, 17(4), 297–311, doi: 10.1111/j.1745-3984.1980.tb00833.x
  • Yen, W. M. (1987, June). A Bayesian/IRT index of objective performance. Paper presented at the annual meeting of the Psychometric Society, Montreal, Quebec, Canada.

Toplam Test ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması


Abstract

In this study, the relationship between the subtests and the total test was investigated with hierarchical item response theory models in order to contribute to reliable subtest and total test score estimation. The RMSE and reliability values of the total test score and the subtest scores estimated with the Higher Order, Bi-factor, and hierarchical multidimensional item response theory (MIRT) models were compared under the conditions of the number of subtests, the subtest length, and the size of the correlations between the subtests. In addition, the performance of the three estimation models used in the study was examined on TEOG 2015 data. As a result of the study, it was found that in two- and three-dimensional data, under almost all conditions, as the subtest length and the correlation between the subtests increased, the estimation error of the ability parameters for the total test score obtained from the three estimation models decreased and the estimation reliability increased. For the total test scores, the Hierarchical MIRT model produced the lowest RMSE value and the highest reliability value under all conditions. In addition, at the 0.8 correlation level, all models yielded RMSE and reliability values close to one another for the total test score. In two- and three-dimensional data, the RMSE values of the ability parameters estimated for the subtest scores decreased as the subtest length increased but were not affected by the level of correlation between the subtests in the Hierarchical MIRT model; decreased as the subtest length and the correlation between the subtests increased in the Higher Order model; and, in the Bi-factor model, decreased as the subtest length increased but increased considerably as the correlation between the subtests increased.
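
The two evaluation criteria used throughout, the RMSE and the reliability of the ability estimates, can be computed from pairs of true and estimated abilities. The sketch below is a minimal illustration in Python assuming the conventional simulation-study definitions (RMSE as defined above, and the squared true-estimate correlation as an empirical reliability proxy); the article does not state its exact reliability formula, so that choice, and the synthetic data standing in for model output, are assumptions:

    import numpy as np

    def rmse(theta_true, theta_hat):
        """Root mean squared error between true and estimated abilities."""
        diff = np.asarray(theta_hat, float) - np.asarray(theta_true, float)
        return float(np.sqrt(np.mean(diff ** 2)))

    def empirical_reliability(theta_true, theta_hat):
        """Squared true-estimate correlation, a common simulation-study
        proxy for the reliability of estimated scores (an assumption here)."""
        r = np.corrcoef(theta_true, theta_hat)[0, 1]
        return float(r ** 2)

    # Illustrative synthetic data in place of actual model output:
    rng = np.random.default_rng(0)
    theta = rng.standard_normal(1000)                # "true" abilities
    theta_hat = theta + rng.normal(0.0, 0.4, 1000)   # noisy estimates
    print(rmse(theta, theta_hat))                    # ~0.4
    print(empirical_reliability(theta, theta_hat))   # ~0.86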


Details

Primary Language Turkish
Journal Section Articles
Authors

Sümeyra Soysal 0000-0002-7304-1722

Hülya Kelecioğlu 0000-0002-0741-9934

Publication Date June 30, 2018
Acceptance Date June 13, 2018
Published in Issue Year 2018 Volume: 9 Issue: 2

Cite

APA Soysal, S., & Kelecioğlu, H. (2018). Toplam Test ve Alt Test Puanlarının Kestiriminin Hiyerarşik Madde Tepki Kuramı Modelleri ile Karşılaştırılması. Journal of Measurement and Evaluation in Education and Psychology, 9(2), 178-201. https://doi.org/10.21031/epod.404089