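Comparing Multi-Stage Tests under Different Conditions in Terms of Panel Design, Module Length, Sample Size and Ability Parameter Estimation Methods

(Original Turkish title: Çok Aşamalı Testlerin Panel Deseni, Modül Uzunluğu, Örneklem Büyüklüğü ve Yetenek Parametresi Kestirim Yöntemleri Açısından Farklı Koşullar Altında Karşılaştırılması)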

Serap Büyükkıdık; Fatma Gökçen Ayva Yörü

doi:10.52597/buje.1329338

Research Article


Bogazici University Journal of Education, Year 2024, Volume 41, Issue 2, pp. 9-27, published 31.08.2024


Cited By: 1

Abstract

In this study, the performance of multi-stage tests under various simulation conditions was compared using four evaluation criteria: root mean square error (RMSE), standard error of estimate (SEE), bias (BIAS), and mean absolute error (MAE). The simulation crossed four factors, each with three levels: panel design (1-3, 1-2-3, 1-3-3), module length (6, 12, 18), sample size (300, 1000, 3000), and ability parameter estimation method (expected a posteriori [EAP], maximum a posteriori [MAP], and maximum likelihood estimation with fences [MLEF]). This yielded 81 conditions (3x3x3x3). The results show that RMSE and MAE generally led to similar conclusions and that measurement accuracy increased with module length. In addition, RMSE, SEE, and MAE took their highest values in the 1-3 panel design and their lowest values in the 1-3-3 design. Researchers are advised to use a 1-3-3 panel design, a module length of at least 12, and the EAP method.
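The four evaluation criteria above can be computed from the true (generating) ability and the estimated ability of each simulee. The sketch below uses the usual simulation-study definitions of bias, RMSE, and MAE. For SEE it assumes the mean of the per-simulee standard errors (for example, posterior SDs under EAP). That is an assumption, and the authors' exact definition may differ. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def _errors(true_theta: Sequence[float], est_theta: Sequence[float]) -> List[float]:
    """Estimation errors (estimate minus true value), one per simulee."""
    if len(true_theta) != len(est_theta) or not true_theta:
        raise ValueError("need two non-empty sequences of equal length")
    return [e - t for t, e in zip(true_theta, est_theta)]


def bias(true_theta: Sequence[float], est_theta: Sequence[float]) -> float:
    """Mean signed error; a positive value means abilities are overestimated on average."""
    errs = _errors(true_theta, est_theta)
    return sum(errs) / len(errs)


def rmse(true_theta: Sequence[float], est_theta: Sequence[float]) -> float:
    """Root mean square error: square root of the mean squared error."""
    errs = _errors(true_theta, est_theta)
    return math.sqrt(sum(e * e for e in errs) / len(errs))


def mae(true_theta: Sequence[float], est_theta: Sequence[float]) -> float:
    """Mean absolute error: average size of the error, ignoring its sign."""
    errs = _errors(true_theta, est_theta)
    return sum(abs(e) for e in errs) / len(errs)


def see(standard_errors: Sequence[float]) -> float:
    """ASSUMPTION: SEE taken as the mean of the per-simulee standard errors
    reported by the estimator; the study's exact definition may differ."""
    if not standard_errors:
        raise ValueError("need at least one standard error")
    return sum(standard_errors) / len(standard_errors)


if __name__ == "__main__":
    true_theta = [-1.0, 0.0, 1.0, 2.0]
    est_theta = [-0.8, 0.1, 0.7, 2.2]
    print(round(bias(true_theta, est_theta), 4))  # 0.05
    print(round(rmse(true_theta, est_theta), 4))  # 0.2121
    print(round(mae(true_theta, est_theta), 4))   # 0.2
    print(round(see([0.30, 0.25, 0.35, 0.50]), 4))  # 0.35
```

Because a quadratic mean is never smaller than an arithmetic mean, RMSE is at least as large as MAE for any set of errors. In addition, RMSE squared equals bias squared plus the (population) variance of the errors. This is one plausible reason the two criteria tend to rank conditions the same way.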

Keywords

Multi-stage test, panel design, module length, sample size, ability parameter estimation method
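The abstract recommends EAP over MAP and MLEF. The sketch below contrasts two of the Bayesian estimators it names. EAP is the posterior mean over a quadrature grid, and MAP is the posterior mode found by Newton-Raphson. Both score a response vector collected from the administered modules. Several settings are illustrative assumptions rather than the study's settings: a 2PL model without the 1.7 scaling constant, a standard normal prior, an 81-point grid from -4 to 4, and all function names. MLEF is not shown.

```python
import math
from typing import Sequence, Tuple

Item = Tuple[float, float]  # (a, b): discrimination, difficulty; 2PL is an ASSUMED model


def p_correct(theta: float, a: float, b: float) -> float:
    """2PL probability of a correct response (logistic, no D = 1.7 constant)."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def log_likelihood(theta: float, responses: Sequence[int], items: Sequence[Item]) -> float:
    """Log-likelihood of 0/1 responses, using log-sigmoid forms for stability."""
    total = 0.0
    for u, (a, b) in zip(responses, items):
        z = a * (theta - b)
        # log P = -log(1 + e^-z); log(1 - P) = -log(1 + e^z)
        total += -math.log1p(math.exp(-z)) if u == 1 else -math.log1p(math.exp(z))
    return total


def eap(responses: Sequence[int], items: Sequence[Item],
        n_points: int = 81, lo: float = -4.0, hi: float = 4.0) -> Tuple[float, float]:
    """EAP: posterior mean and posterior SD on an equally spaced grid,
    with an ASSUMED N(0, 1) prior."""
    step = (hi - lo) / (n_points - 1)
    grid = [lo + i * step for i in range(n_points)]
    log_post = [log_likelihood(t, responses, items) - 0.5 * t * t for t in grid]
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # rescale to avoid underflow
    total = sum(weights)
    mean = sum(t * w for t, w in zip(grid, weights)) / total
    var = sum((t - mean) ** 2 * w for t, w in zip(grid, weights)) / total
    return mean, math.sqrt(var)


def map_estimate(responses: Sequence[int], items: Sequence[Item], theta: float = 0.0,
                 max_iter: int = 50, tol: float = 1e-8) -> Tuple[float, float]:
    """MAP: posterior mode under the same N(0, 1) prior via Newton-Raphson.
    The returned SE uses the posterior information from the last iteration."""
    info = 1.0
    for _ in range(max_iter):
        grad, info = -theta, 1.0  # the N(0, 1) prior contributes -theta and 1
        for u, (a, b) in zip(responses, items):
            p = p_correct(theta, a, b)
            grad += a * (u - p)              # first derivative of the 2PL log-likelihood
            info += a * a * p * (1.0 - p)    # minus its second derivative
        step = grad / info
        theta += step
        if abs(step) < tol:
            break
    return theta, 1.0 / math.sqrt(info)


if __name__ == "__main__":
    items = [(1.0, -1.0), (1.0, -0.5), (1.0, 0.5), (1.0, 1.0)]
    print(eap([1, 1, 1, 0], items))
    print(map_estimate([1, 1, 1, 0], items))
```

The two estimators differ only in how they summarize the same posterior (mean versus mode). Both are pulled toward the prior mean when few items have been administered. That pull is one reason module length matters for the accuracy criteria.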






The article lists 71 references in total.

Details

Primary Language	Turkish
Subjects	Measurement and Evaluation in Education (Other)
Section	Original Research
Authors	Serap Büyükkıdık (ORCID 0000-0003-4335-2949), Fatma Gökçen Ayva Yörü (ORCID 0000-0002-4555-1987)
Publication Date	31 August 2024
Issue	Year 2024, Volume 41, Issue 2

How to Cite

APA	Büyükkıdık, S., & Ayva Yörü, F. G. (2024). Çok Aşamalı Testlerin Panel Deseni, Modül Uzunluğu, Örneklem Büyüklüğü ve Yetenek Parametresi Kestirim Yöntemleri Açısından Farklı Koşullar Altında Karşılaştırılması. Bogazici University Journal of Education, 41(2), 9-27. https://doi.org/10.52597/buje.1329338

Cited By

Çok Aşamalı Testlerin Panel Deseni, Modül Uzunluğu, Örneklem Büyüklüğü ve Yetenek Parametresi Kestirim Yöntemleri Açısından Farklı Koşullar Altında Karşılaştırılması

Boğaziçi Üniversitesi Eğitim Dergisi

https://doi.org/10.52597/buje.1329338


This work is licensed under a Creative Commons Attribution 4.0 International License.