Measurement Invariance of Teaching Evaluation Scores at the University Level
Year 2019,
Volume: 34 Issue: 2, 402 - 417, 30.04.2019
İlker Kalender, Giray Berberoğlu
Abstract
This study examined the measurement invariance of scores from a
university-level teaching evaluation questionnaire across high- and
low-achieving groups. Achievement levels were determined by two criteria:
students' expected grades and the end-of-year mean achievement of the course.
The data consisted of 625 courses. The invariance of the seven-item
questionnaire's (i) factor structure, (ii) factor loadings, (iii) item means,
and (iv) error variances was analyzed across the two achievement groups. Based
on students' expected grades, the high- and low-achieving groups showed
invariant scale properties, except for the invariance of error variances. On
the other hand, the analysis based on end-of-year grades did not show full
invariance in item means and error variances. The results indicate that
comparing instructor evaluation scores across classes and courses without
regard to achievement levels may lead to misleading conclusions. Because the
scale origins of the high- and low-achieving groups differ, this matters most
when high-stakes decisions are being made.
The Measurement Invariance of University Students’ Ratings of Instruction
Abstract
The measurement invariance of student ratings of instruction was studied
across high- and low-achieving classrooms. Achievement levels were determined
by two criteria: self-reported expected grades and end-of-semester grades.
The data included 625 classrooms. The equality of (i) factorial structure, (ii)
factor loadings, (iii) item intercepts, and (iv) error variances of the 7-item
rating scale was studied across these groups. With respect to self-reported
expected grades, high- and low-achieving classes produced invariant scale
characteristics, except for strict invariance. On the other hand, with respect
to end-of-semester grades, full equality in item intercepts and error variances
was not achieved. Comparing rating results across classrooms and courses
without regard to students' achievement levels may therefore be misleading,
especially for high-stakes decisions, since the origin of the scale is not the
same across high- and low-achieving groups.
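The four equality conditions (i) to (iv) correspond to the nested levels usually tested in multi-group confirmatory factor analysis. The following is a sketch in conventional notation; the symbols are standard textbook choices, not taken from the abstract, and the number of factors is left unspecified because the abstract does not state it. Here $g \in \{H, L\}$ indexes the high- and low-achieving groups, and $x^{(g)}$ is the vector of the seven item scores.

```latex
\begin{aligned}
x^{(g)} &= \tau^{(g)} + \Lambda^{(g)} \xi^{(g)} + \delta^{(g)},
  \qquad \operatorname{Cov}\!\left(\delta^{(g)}\right) = \Theta^{(g)} \\[4pt]
\text{(i) configural:} &\quad \text{same pattern of fixed and free loadings in } \Lambda^{(H)} \text{ and } \Lambda^{(L)} \\
\text{(ii) metric (weak):} &\quad \Lambda^{(H)} = \Lambda^{(L)} \\
\text{(iii) scalar (strong):} &\quad \Lambda^{(H)} = \Lambda^{(L)}, \;\; \tau^{(H)} = \tau^{(L)} \\
\text{(iv) strict:} &\quad \Lambda^{(H)} = \Lambda^{(L)}, \;\; \tau^{(H)} = \tau^{(L)}, \;\; \Theta^{(H)} = \Theta^{(L)}
\end{aligned}
```

In this notation, the intercept vector $\tau^{(g)}$ sets the origin of the scale. If $\tau^{(H)} \neq \tau^{(L)}$, two classrooms with the same latent level of perceived teaching quality can still have different expected item scores, which is why raw ratings cannot be compared directly across the groups.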