Research Article

Using Rasch analysis to examine raters’ expertise Turkish teacher candidates’ competency levels in writing different types of test items

Year 2022, Volume: 9 Issue: 4, 998 - 1012, 22.12.2022
https://doi.org/10.21449/ijate.1058300

Abstract

The aim of the present study was to examine Turkish teacher candidates’ competency levels in writing different types of test items by means of Rasch analysis. In addition, the effect of the expertise of the raters who scored the items written by the teacher candidates was examined. A total of 84 Turkish teacher candidates participated in the study, which was designed as a relational survey, one of the quantitative research approaches. Three experts took part in the rating process: an expert in Turkish education, an expert in measurement and evaluation, and an expert in both Turkish education and measurement and evaluation. The teacher candidates wrote true-false, short-response, multiple-choice, and open-ended items in accordance with the Test Item Development Form, and the raters scored each item type by assigning a score between 1 and 5 based on the item evaluation scoring rubric prepared for that item type. The study revealed that the Turkish teacher candidates showed the highest level of competency in writing true-false items and the lowest in writing multiple-choice items. Moreover, raters’ expertise was found to affect the estimates of the teacher candidates’ competencies in writing the different item types. Finally, the rater with expertise in both Turkish education and measurement and evaluation showed the highest scoring reliability, whereas the rater with expertise only in measurement and evaluation showed the relatively lowest scoring reliability.
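For context, a brief sketch of the measurement model behind this type of analysis may be helpful: the many-facet Rasch model (as implemented in the FACETS software cited in the references) is commonly written, in standard notation rather than the article’s own, as

\[
\log\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

where \(P_{nijk}\) is the probability that teacher candidate \(n\) receives score \(k\) (rather than \(k-1\)) from rater \(j\) on item type \(i\), \(B_n\) is the candidate’s item-writing competency, \(D_i\) is the difficulty of the item type, \(C_j\) is the severity of the rater, and \(F_k\) is the threshold of score category \(k\) on the 1–5 rubric.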

References

  • Anthony, C.J., Styck, K.M., Volpe, R.J., & Robert, C.R. (2022). Using many-facet Rasch measurement and generalizability theory to explore rater effects for direct behavior rating–multi-item scales. School Psychology. Advance online publication. https://doi.org/10.1037/spq0000518
  • Asim, A.E., Ekuri, E.E., & Eni, E.I. (2013). A diagnostic study of pre-service teachers’ competency in multiple-choice item development. Research in Education, 89(1), 13–22. https://doi.org/10.7227/RIE.89.1.2
  • Atılgan, H., & Tezbaşaran, A. (2005). Genellenebilirlik kuramı alternatif karar çalışmaları ile senaryolar ve gerçek durumlar için elde edilen g ve phi katsayılarının tutarlılığının incelenmesi. Eğitim Araştırmaları, 18(1), 28-40.
  • Barkaoui, K. (2010). Do ESL essay raters’ evaluation criteria change with experience? A mixed-methods, cross-sectional study. TESOL Quarterly, 44(1), 31–57.
  • Baykul, Y. (2000). Eğitimde ve psikolojide ölçme. ÖSYM Yayınları.
  • Bıkmaz Bilgen, Ö., & Doğan, N. (2017). Puanlayıcılar Arası Güvenirlik Belirleme Tekniklerinin Karşılaştırılması. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 8(1), 63-78. https://doi.org/10.21031/epod.294847
  • Büyüköztürk, Ş., Kılıç Çakmak, E., Akgün, Ö. E., Karadeniz, Ş., & Demirel, F. (2018). Eğitimde bilimsel araştırma yöntemleri. Pegem Akademi. https://doi.org/10.14527/9789944919289
  • Crocker, L.M., & Algina, J. (2008). Introduction to classical and modern test theory. Holt, Rinehart and Winston.
  • Davis, L. (2016). The influence of training and experience on rater performance in scoring spoken language. Language Testing, 33(1), 117–135.
  • Erguvan, I.D., & Aksu Dünya, B. (2021). Gathering evidence on e-rubrics: Perspectives and many facet Rasch analysis of rating behavior. International Journal of Assessment Tools in Education, 8(2), 454-474. https://doi.org/10.21449/ijate.818151
  • Erman Aslanoğlu, A., & Şata, M. (2021). Examining the differential rater functioning in the process of assessing writing skills of middle school 7th grade students. Participatory Educational Research (PER), 8(4), 239-252. https://doi.org/10.17275/per.21.88.8.4
  • Farrokhi, F., Esfandiari, R., & Schaefer, E. (2012). A many-facet Rasch measurement of differential rater severity/leniency in three types of assessment. JALT Journal, 34(1), 79-101. https://doi.org/10.37546/JALTJJ34.1-3
  • Farrokhi, F., Esfandiari, R., & Vaez Dalili, M. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. Theory and Practice in Language Studies, 1(11), 1531-1540. https://doi.org/10.4304/tpls.1.11.1531-1540
  • Fuhrman, M. (1996). Developing good multiple-choice tests and test items. Journal of Geoscience Education, 44(4), 379-384. https://doi.org/10.5408/1089-9995-44.4.379
  • Gierl, M.J., Bulut, O., Guo, Q., & Zhang, X. (2017). Developing, analyzing, and using distractors for multiple-choice tests in education: A comprehensive review. Review of Educational Research, 87(6), 1082-1116. https://doi.org/10.3102/0034654317726529
  • Goodwin, S. (2016). A Many-Facet Rasch analysis comparing essay rater behavior on an academic English reading/writing test used for two purposes. Assessing Writing, 30(1), 21-31. https://doi.org/10.1016/j.asw.2016.07.004
  • Gorin, J.S. (2007). Reconsidering issues in validity theory. Educational Researcher, 36(8), 456-462. https://doi.org/10.3102/0013189X07311607
  • Haladyna, T.M., Downing, S.M., & Rodriguez, M.C. (2002). A review of multiple-choice item-writing guidelines for classroom assessment. Applied Measurement in Education, 15(3), 309-333. https://doi.org/10.1207/S15324818AME1503_5
  • Jones, E., & Bergin, C. (2019). Evaluating teacher effectiveness using classroom observations: A Rasch analysis of the rater effects of principals. Educational Assessment, 24(2), 91-118. https://doi.org/10.1080/10627197.2018.1564272
  • Kamış, Ö. & Doğan, C.D. (2017). How consistent are decision studies in G theory?. Gazi University Journal of Gazi Educational Faculty, 37(2), 591-610.
  • Kara, Y., & Kelecioğlu, H. (2015). Puanlayıcı Niteliklerinin Kesme Puanlarının Belirlenmesine Etkisinin Genellenebilirlik Kuramı’yla İncelenmesi. Eğitimde ve Psikolojide Ölçme ve Değerlendirme Dergisi, 6(1), 58-71. https://doi.org/10.21031/epod.47997
  • Karasar, N. (2018). Bilimsel araştırma yöntemi (33rd ed.). Ankara: Nobel Yayıncılık.
  • Kim, H. (2020). Effects of rating criteria order on the halo effect in L2 writing assessment: A many-facet Rasch measurement analysis. Language Testing in Asia, 10(16), 1-23. https://doi.org/10.1186/s40468-020-00115-0
  • Leckie, G., & Baird, J.A. (2011). Rater effects on essay scoring: A multilevel analysis of severity drift, central tendency, and rater experience. Journal of Educational Measurement, 48(4), 399-418. https://doi.org/10.1111/j.1745-3984.2011.00152.x
  • Li, W. (2022). Scoring rubric reliability and internal validity in rater-mediated EFL writing assessment: Insights from many-facet Rasch measurement. Reading and Writing. https://doi.org/10.1007/s11145-022-10279-1
  • Linacre, J.M. (1993). Rasch-based generalizability theory. Rasch Measurement Transactions, 7(1), 283-284.
  • Linacre, J.M. (2012). FACETS (Version 3.70.1) [Computer Software]. MESA Press.
  • Linacre, J.M. (2017). FACETS (Version 3.80.0) [Computer Software]. MESA Press.
  • Linn, R.L., & Gronlund, N.E. (2000). Measurement and assessment in teaching (8th ed.). Merrill/Prentice Hall.
  • Marais, I., & Andrich, D. (2008). Formalizing dimension and response violations of local independence in the unidimensional Rasch model. Journal of Applied Measurement, 9(3), 200-215.
  • McDonald, R.P. (1999). Test theory: A unified approach. Lawrence Erlbaum.
  • Meadows, M., & Billington, L. (2010). The effect of marker background and training on the quality of marking in GCSE English. AQA Education.
  • Milli Eğitim Bakanlığı (2019). Türkçe Dersi Öğretim Programı (İlkokul ve Ortaokul 1, 2, 3, 4, 5, 6, 7 ve 8. Sınıflar). MEB Yayınları.
  • Myford, C.M., & Wolfe, E.W. (2003). Detecting and measuring rater effects using many-facet Rasch measurement: Part I. Journal of Applied Measurement, 4(4), 386-422.
  • Myford, C.M., & Wolfe, E.W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
  • Osburn, H.G. (2000). Coefficient alpha and related internal consistency reliability coefficients. Psychological Methods, 5(3), 343-355.
  • Özçelik, D.A. (2010a). Ölçme ve değerlendirme. Pegem Akademi.
  • Özçelik, D.A. (2010b). Test geliştirme kılavuzu. Pegem Akademi.
  • Primi, R., Silvia, P.J., Jauk, E., & Benedek, M. (2019). Applying many-facet Rasch modeling in the assessment of creativity. Psychology of Aesthetics, Creativity, and the Arts, 13(2), 176–186. https://doi.org/10.1037/aca0000230
  • Sayın, A., & Kahraman, N. (2020). A measurement tool for repeated measurement of assessment of university students’ writing skill: development and evaluation. Journal of Measurement and Evaluation in Education and Psychology, 11(2), 113-130. https://doi.org/10.21031/epod.639148
  • Sayın, A., & Takıl, N.B. (2017). Opinions of the Turkish teacher candidates for change in the reading skills of the students in the 15 year old group. International Journal of Language Academy, 5(2), 266-284. http://dx.doi.org/10.18033/ijla.3561
  • Sireci, S.G. (2007). On validity theory and test validation. Educational Researcher, 36(8), 477-481. https://doi.org/10.3102/0013189X07311609
  • Song, T., Wolfe, E.W., Hahn, L., Less-Petersen, M., Sanders, R., & Vickers, D. (2014). Relationship between rater background and rater performance. Pearson.
  • Tan, Ş. (2012). Öğretimde ölçme ve değerlendirme KPSS el kitabı. Ankara: Pegem Akademi.
  • Tekin, H. (2004). Eğitimde ölçme ve değerlendirme. Yargı Yayınevi.
  • Walsh, W.B., & Betz, N.E. (1995). Tests and assessment. Prentice-Hall, Inc.
  • Wiseman, C.S. (2012). Rater effects: Ego engagement in rater decision-making. Assessing Writing, 17(3), 150-173. https://doi.org/10.1016/j.asw.2011.12.001


Details

Primary Language English
Subjects Other Fields of Education
Journal Section Articles
Authors

Ayfer Sayın 0000-0003-1357-5674

Mehmet Şata 0000-0003-2683-4997

Publication Date December 22, 2022
Submission Date January 15, 2022
Published in Issue Year 2022 Volume: 9 Issue: 4

Cite

APA: Sayın, A., & Şata, M. (2022). Using Rasch analysis to examine raters’ expertise Turkish teacher candidates’ competency levels in writing different types of test items. International Journal of Assessment Tools in Education, 9(4), 998-1012. https://doi.org/10.21449/ijate.1058300
