Examining Rater Biases of Peer Assessors in Different Assessment Environments

Year 2021, Volume: 8 Issue: 4, 136 - 151, 31.10.2021

Abstract

The current study employed many-facet Rasch measurement (MFRM) to explain the rater bias patterns of EFL student teachers (hereafter, students) when rating the teaching performance of their peers in three assessment environments: online, face-to-face, and anonymous. Twenty-four students and two instructors rated 72 micro-teachings performed by senior Turkish students. The performances were assessed with a five-category analytic rubric developed by the researchers (Lesson Presentation, Classroom Management, Communication, Material, and Instructional Feedback). MFRM revealed severity and leniency biases in all three assessment environments at both the group and individual levels, with biases occurring least often in the anonymous environment. Central tendency and halo effects were observed only at the individual level, and they were of similar magnitude across the three environments. Semi-structured interviews with the peer raters (n = 24) documented their perspectives on how anonymous assessment affected severity, leniency, central tendency, and halo effects. Moreover, the findings indicated that concealing peers' identities improves the reliability and validity of the measurements obtained through peer assessment.
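For orientation, the rater effects reported above (severity/leniency, central tendency, and halo) are typically estimated with a three-facet extension of the Rasch rating scale model, in which examinee ability, rater severity, and criterion difficulty are placed on a common logit scale. The formulation below is a minimal sketch of that standard model (cf. Linacre, 2017; Masters, 1982), not necessarily the exact specification estimated in this study; the facet labels simply mirror the design described in the abstract.

\[
\ln\!\left(\frac{P_{nijk}}{P_{nij(k-1)}}\right) = B_n - D_i - C_j - F_k
\]

Here \(P_{nijk}\) is the probability that examinee \(n\) receives a rating in category \(k\) from rater \(j\) on rubric criterion \(i\), \(P_{nij(k-1)}\) is the probability of the adjacent lower category, \(B_n\) is the examinee's ability, \(D_i\) is the difficulty of criterion \(i\), \(C_j\) is the severity of rater \(j\), and \(F_k\) is the threshold of category \(k\) relative to category \(k-1\). Rater severity/leniency is read from the \(C_j\) estimates, while central tendency and halo effects are diagnosed from rater fit statistics and bias/interaction analyses.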

References

  • Abu Kassim, N. L. (2007). Exploring rater judging behaviour using the many-facet Rasch model. Paper presented at the Second Biennial International Conference on Teaching and Learning of English in Asia: Exploring New Frontiers (TELiA2), Universiti Utara Malaysia. http://repo.uum.edu.my/3212/
  • Akpınar, M. (2019). The effect of peer assessment on pre-service teachers' teaching practices. Education & Science, 44(200), 269-290. https://doi.org/10.15390/EB.2019.8077
  • Anastasi, A. (1976). Psychological testing (4th ed.). Macmillan.
  • Baird, J. A., Hayes, M., Johnson, R., Johnson, S., & Lamprianou, I. (2013). Marker effects and examination reliability: A comparative exploration from the perspectives of generalisability theory, Rasch model and multilevel modelling. Oxford University Centre for Educational Assessment. Retrieved from https://dera.ioe.ac.uk/17683/1/2013-01-21-marker-effects-and-examination-reliability.pdf
  • Barkaoui, K. (2013). Multifaceted Rasch analysis for test evaluation. The companion to language assessment, 3, 1301-1322. https://doi.org/10.1002/9781118411360.wbcla070
  • Bennett, J. (1998). Human resources management. Prentice Hall.
  • Bond, T., & Fox, C. M. (2015). Applying the Rasch model: Fundamental measurement in the human sciences. Routledge. https://doi.org/10.4324/9781315814698
  • Bonk, W. J., & Ockey, G. J. (2003). A many-facet Rasch analysis of the second language group oral discussion task. Language Testing, 20(1), 89-110. https://doi.org/10.1191/0265532203lt245oa
  • Boud, D., & Soler, R. (2016). Sustainable assessment revisited. Assessment & Evaluation in Higher Education, 41(3), 400-413. https://doi.org/10.1080/02602938.2015.1018133
  • Brown, J. D., & Hudson, T. (1998). The alternatives in language assessment. TESOL Quarterly, 32(4), 653-675. https://doi.org/10.2307/3587999
  • Cetin, B., & Ilhan, M. (2017). An analysis of rater severity and leniency in open-ended mathematic questions rated through standard rubrics and rubrics based on the SOLO taxonomy. Education and Science, 42(189), 217-247. https://doi.org/10.15390/EB.2017.5082
  • Chen, W. H., & Thissen, D. (1997). Local dependence indexes for item pairs using item response theory. Journal of Educational and Behavioral Statistics, 22(3), 265-289. https://doi.org/10.2307/1165285
  • Cheng, K. H., & Tsai, C. C. (2012). Students' interpersonal perspectives on, conceptions of and approaches to learning in online peer assessment. Australasian Journal of Educational Technology, 28(4), 599-618. https://doi.org/10.14742/ajet.830
  • Chester, A., & Gwynne, G. (1998). Online teaching: encouraging collaboration through anonymity. Journal of Computer-Mediated Communication, 4(2), JCMC424. https://doi.org/10.1111/j.1083-6101.1998.tb00096.x
  • Cho, K., & MacArthur, C. (2010). Student revision with peer and expert reviewing. Learning and Instruction, 20(4), 328-338. https://doi.org/10.1016/j.learninstruc.2009.08.006
  • Cronbach, L. J. (1990). Essentials of psychological testing. Harper & Row.
  • Engelhard, G. (1994). Examining rater errors in the assessment of written composition with a many‐faceted Rasch model. Journal of Educational Measurement, 31(2), 93-112. https://doi.org/10.1111/j.1745-3984.1994.tb00436.x
  • Esfandiari, R. (2015). Rater errors among peer-assessors: applying the many-facet Rasch measurement model. Iranian Journal of Applied Linguistics, 18(2), 77-107. https://doi.org/10.18869/acadpub.ijal.18.2.77
  • Farrokhi, F., & Esfandiari, R. (2011). A many-facet Rasch model to detect halo effect in three types of raters. Theory & Practice in Language Studies, 1(11), 1531-1540. https://doi.org/10.4304/tpls.1.11.1531-1540
  • Farrokhi, F., Esfandiari, R., & Vaez Dalili, M. (2011). Applying the many-facet Rasch model to detect centrality in self-assessment, peer-assessment and teacher assessment. World Applied Sciences Journal, 15(11), 76-83.
  • Freeman, M., & McKenzie, J. (2000). Self and peer assessment of student teamwork: Designing, implementing and evaluating SPARK, a confidential, web-based system. Flexible learning for a flexible society. Retrieved from https://ascilite.org/archived-journals/aset/confs/aset-herdsa2000/procs/freeman.html
  • Goodrich, H. (1997). Understanding rubrics: The dictionary may define "rubric," but these models provide more clarity. Educational Leadership, 54(4), 14-17.
  • Güneş, P., & Kiliç, D. (2016). Dereceli puanlama anahtarı ile öz, akran ve öğretmen değerlendirmesi [Self, peer and teacher assessment with a scoring rubric]. Mehmet Akif Ersoy Üniversitesi Eğitim Fakültesi Dergisi, 1(39), 58-69. https://doi.org/10.21764/efd.93792
  • Haiyang, S. (2010). An application of classical test theory and many facet Rasch measurement in analyzing the reliability of an English test for non-English major graduates. Chinese Journal of Applied Linguistics, 33(2), 87-102.
  • Haladyna, T. M. (1997). Writing test items to evaluate higher order thinking. Allyn & Bacon.
  • Hansen, K. (2003). Writing in the social sciences: A rhetoric with readings. Pearson Custom.
  • Hosack, I. (2004). The effects of anonymous feedback on Japanese university students’ attitudes towards peer review. In R. Hogaku (Ed.), Language and its universe (pp. 297–322). Ritsumeikan Hogaku.
  • Ilhan, M. (2016). A comparison of the ability estimations of classical test theory and the many facet Rasch model in measurements with open-ended questions. Hacettepe Üniversitesi Eğitim Fakültesi Dergisi, 31(2), 346-368.
  • Ivankova, N. V., Creswell, J. W., & Stick, S. L. (2006). Using mixed-methods sequential explanatory design: From theory to practice. Field Methods, 18(1), 3-20. https://doi.org/10.1177/1525822X05282260
  • Kane, J., Bernardin, H., Villanueva, J., & Peyrefitte, J. (1995). Stability of rater leniency: Three studies. Academy of Management Journal, 38, 1036-1051. https://doi.org/10.2307/256619
  • Khattri, N., Kane, M. B., & Reeve, A. L. (1995). How performance assessments affect teaching and learning. Educational Leadership, 53(3), 80-83.
  • Kim, Y., Park, I., & Kang, M. (2012). Examining rater effects of the TGMD-2 on children with intellectual disability. Adapted Physical Activity Quarterly, 29(4), 346-365. https://doi.org/10.1123/apaq.29.4.346
  • Kingsbury, F. A. (1922). Analyzing ratings and training raters. Journal of Personnel Research, 1, 377–383.
  • Knoch, U., Fairbairn, J., Myford, C., & Huisman, A. (2018). Evaluating the relative effectiveness of online and face-to-face training for new writing raters. Papers in Language Testing and Assessment, 7(1), 61-86.
  • Kutlu, Ö., Doğan, C.D., & Karaya, İ. (2014). Öğrenci başarısının belirlenmesi: Performansa ve portfolyoya dayalı durum belirleme [Determining student success: Determination based on performance and portfolio]. Pegem Akademi Yayıncılık.
  • Lawshe, C. H. (1975). A quantitative approach to content validity. Personnel Psychology, 28(4), 563-575. https://doi.org/10.1111/j.1744-6570.1975.tb01393.x
  • Lawshe, C. H. (1985). Inferences from personnel tests and their validity. Journal of Applied Psychology, 70(1), 237-238. https://doi.org/10.1037/0021-9010.70.1.237
  • Li, L. (2017). The role of anonymity in peer assessment. Assessment & Evaluation in Higher Education, 42(4), 645-656. https://doi.org/10.1080/02602938.2016.1174766
  • Li, L., & Gao, F. (2016). The effect of peer assessment on project performance of students at different learning levels. Assessment & Evaluation in Higher Education, 41(6), 885-900. https://doi.org/10.1080/02602938.2015.1048185
  • Li, L., Liu, X., & Zhou, Y. (2012). Give and take: A re‐analysis of assessor and assessee's roles in technology‐facilitated peer assessment. British Journal of Educational Technology, 43(3), 376-384. https://doi.org/10.1111/j.1467-8535.2011.01180.x
  • Linacre, J. M. (2017). A user’s guide to FACETS: Rasch-model computer programs. MESA Press.
  • Mackey, A., & Gass, S. M. (2005). Second language research: Methodology and design. Lawrence Erlbaum Associates.
  • Masters, G. N. (1982). A Rasch model for partial credit scoring. Psychometrika, 47(2), 149-174. https://doi.org/10.1007/BF02296272
  • May, G. L. (2008). The effect of rater training on reducing social style bias in peer evaluation. Business Communication Quarterly, 71(3), 297-313. https://doi.org/10.1177/1080569908321431
  • Messick, S. (1996). Validity and washback in language testing. Language Testing, 13(3), 241-256. https://doi.org/10.1177/026553229601300302
  • Miyazoe, T., & Anderson, T. (2011). Anonymity in blended learning: who would you like to be? Journal of Educational Technology & Society, 14(2), 175–187.
  • Moore, B.B. (2009). Consideration of rater effects and rater design via signal detection theory. (Unpublished Doctoral dissertation). Retrieved from http://www.proquest.com/
  • Moskal, B. M. (2000). Scoring rubrics: What, when and how? Practical Assessment, Research, and Evaluation, 7(1), 3.
  • Myford, C. M. (2002). Investigating design features of descriptive graphic rating scales. Applied Measurement in Education, 15(2), 187–215. https://doi.org/10.1207/S15324818AME1502_04
  • Myford, C. M., & Wolfe, E. W. (2004). Detecting and measuring rater effects using many-facet Rasch measurement: Part II. Journal of Applied Measurement, 5(2), 189-227.
  • Newby, D., Allan, R., Fenner, A. B., Jones, B., Komorowska, H., & Soghikyan, K. (2007). European Portfolio for Student Teachers of Languages: A reflection tool for language teacher education. Council of Europe.
  • Özdemir, O., & Erdem, D. (2017). Sunum becerilerinin akran değerlendirmesine arkadaşlığın etkisi [The effect of friendship on peer assessment of presentation skills]. Turkish Journal of Educational Studies, 4(1), 21-43.
  • Panadero, E. (2016). Is it safe? Social, interpersonal, and human effects of peer assessment: a review and future directions. In G. T. L. Brown & L. R. Harris (Eds.), Human factors and social conditions of assessment (pp. 1–39). Routledge.
  • Panadero, E., Romero, M., & Strijbos, J-W (2013). The impact of a rubric and friendship on peer assessment: Effects on construct validity, performance, and perceptions of fairness and comfort. Studies in Educational Evaluation, 39(4), 195–203. https://doi.org/10.1016/j.stueduc.2013.10.005
  • Papinczak, T., Young, L., Groves, M., & Haynes, M. (2007). An analysis of peer, self, and tutor assessment in problem-based learning tutorials. Medical Teacher, 29(5), e122-e132. https://doi.org/10.1080/01421590701294323
  • Pope, N. K. L. (2005). The impact of stress in self- and peer assessment. Assessment & Evaluation in Higher Education, 30(1), 51-63. https://doi.org/10.1080/0260293042003243896
  • Romagnano, L. (2001). The myth of objectivity in mathematics assessment. Mathematics Teacher, 94(1), 31-37. https://doi.org/10.5951/MT.94.1.0031
  • Rotsaert, T., Panadero, E., & Schellens, T. (2018). Anonymity as an instructional scaffold in peer assessment: its effects on peer feedback quality and evolution in students’ perceptions about peer assessment skills. European Journal of Psychology of Education, 33(1), 75-99. https://doi.org/10.1007/s10212-017-0339-8
  • Royal, K. D., & Hecker, K. G. (2016). Rater errors in clinical performance assessments. Journal of Veterinary Medical Education, 43(1), 5-8. https://doi.org/10.3138/jvme.0715-112R
  • Chesterfield County Public Schools. (2015). Performance evaluation handbook for teachers. https://www.nctq.org/dmsView/70-07
  • Schoonenboom, J., & Johnson, R. B. (2017). How to construct a mixed methods research design. KZfSS Kölner Zeitschrift für Soziologie und Sozialpsychologie, 69(2), 107-131. https://doi.org/10.1007/s11577-017-0454-1
  • Sudweeks, R. R., Reeve, S., & Bradshaw, W. S. (2005). A comparison of generalizability theory and many-facet Rasch measurement in an analysis of college sophomore writing. Assessing Writing, 9, 239-261. https://doi.org/10.1016/j.asw.2004.11.001
  • Sung, Y. T., Chang, K. E., Chang, T. H., & Yu, W. C. (2010). How many heads are better than one? The reliability and validity of teenagers' self- and peer assessments. Journal of Adolescence, 33(1), 135-145. https://doi.org/10.1016/j.adolescence.2009.04.004
  • Trace, J., Janssen, G., & Meier, V. (2017). Measuring the impact of rater negotiation in writing performance assessment. Language Testing, 34(1), 3–22. https://doi.org/10.1177/0265532215594830
  • Vanderhoven, E., Raes, A., Montrieux, H., Rotsaert, T., & Schellens, T. (2015). What if pupils can assess their peers anonymously? A quasi-experimental study. Computers & Education, 81, 123–132. https://doi.org/10.1016/j.compedu.2014.10.001
  • Vickerman, P. (2009). Student perspectives on formative peer assessment: an attempt to deepen learning? Assessment & Evaluation in Higher Education, 34(2), 221-230. https://doi.org/10.1080/02602930801955986
  • Weigle, S. C. (1998). Using FACETS to model rater training effects. Language Testing, 15(2), 263-287. https://doi.org/10.1177/026553229801500205
  • Welsh, E. (2002, May). Dealing with data: Using NVivo in the qualitative data analysis process. Forum Qualitative Sozialforschung/Forum: Qualitative Social Research, 3(2). Retrieved from http://www.qualitative-research.net/index.php/fqs/article/view/865/1881
  • Wilson, F. R., Pan, W., & Schumsky, D. A. (2012). Recalculation of the critical values for Lawshe’s content validity ratio. Measurement and Evaluation in Counseling and Development, 45(3), 197-210. https://doi.org/10.1177/0748175612440286
  • Yu, F. Y., & Liu, Y. H. (2009). Creating a psychologically safe online space for a student‐generated questions learning activity via different identity revelation modes. British Journal of Educational Technology, 40(6), 1109-1123. https://doi.org/10.1111/j.1467-8535.2008.00905.x
  • Yu, F. Y., & Sung, S. (2016). A mixed methods approach to the assessor's targeting behavior during online peer assessment: effects of anonymity and underlying reasons. Interactive Learning Environments, 24(7), 1674-1691. https://doi.org/10.1080/10494820.2015.1041405
There are 71 citations in total.

Details

Primary Language English
Subjects Other Fields of Education
Journal Section Research Article
Authors

Sabahattin Yeşilçınar 0000-0001-6457-0211

Mehmet Şata 0000-0003-2683-4997

Publication Date October 31, 2021
Published in Issue Year 2021 Volume: 8 Issue: 4

Cite

APA Yeşilçınar, S., & Şata, M. (2021). Examining Rater Biases of Peer Assessors in Different Assessment Environments. International Journal of Psychology and Educational Studies, 8(4), 136-151.