Research Article

The Continuity of Students’ Disengaged Responding in Low-stakes Assessments: Evidence from Response Times

Year 2021, Volume: 8, Issue: 3, 527–541, 05.09.2021
https://doi.org/10.21449/ijate.789212

Abstract

Several studies have examined disengaged test respondents, while others have analyzed disengaged survey respondents separately. In many large-scale assessments, however, students answer test and questionnaire items in succession. This study examines the percentage of students who continue their disengaged responding behavior across sections of a low-stakes assessment. The effects of filtering students based on their responding behavior on the calculated scores are also analyzed. The data came from the 2015 administration of PISA. For data analysis, the frequencies and percentages of engaged students in each session were first calculated using students' response times. To investigate the impact of filtering disengaged respondents on parameter estimation, three groups were created: students engaged in both measures, students engaged only in the test, and students engaged only in the questionnaire. Next, several validity checks were performed on each group to verify the accuracy of the classifications and to examine the impact of filtering student groups based on their responding behavior. The results indicate that students who are disengaged on the test tend to continue this behavior when responding to the questionnaire items in PISA. Moreover, the effect sizes show that the rate at which disengaged responding continues is non-negligible. On the other hand, removing students who were disengaged in both measures led to higher or nearly the same performance estimates compared with the other groups. Researchers analyzing datasets that include both achievement test and survey items are advised to review disengaged responses and to filter out students who continuously show disengaged responding before performing further statistical analyses.
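As context for the response-time-based classification summarized above, the following R sketch illustrates one common way such motivation filtering can be implemented. It is a minimal illustration only, not the procedure used in the article: the simulated data frame, the fixed 5-second rapid-response threshold, and the 10% cutoff for classifying a student as engaged are hypothetical assumptions; operational studies typically derive item-specific thresholds (e.g., Wise & Ma, 2012) and work with the response-time effort index (Wise, 2006), i.e., the proportion of non-rapid responses per student.

# A minimal sketch (base R), assuming hypothetical data and thresholds.
set.seed(1)
n_students <- 200; n_items <- 20
resp <- data.frame(
  student = rep(seq_len(n_students), each = n_items),
  rt      = rexp(n_students * n_items, rate = 1 / 30),  # response time in seconds
  score   = rbinom(n_students * n_items, 1, 0.6)        # scored response (0/1)
)

rt_threshold <- 5                       # assumed rapid-guessing cutoff in seconds
resp$rapid <- resp$rt < rt_threshold    # TRUE = suspiciously fast response

# Per-student proportion of rapid responses and raw score;
# 1 - prop_rapid corresponds to a response-time effort index
by_student <- aggregate(cbind(prop_rapid = rapid, raw_score = score) ~ student,
                        data = resp, FUN = mean)

# Classify a student as engaged if at most 10% of responses were rapid (assumed cutoff)
by_student$engaged <- by_student$prop_rapid <= 0.10

# Motivation filtering: compare mean performance with and without flagged students
mean(by_student$raw_score)
mean(by_student$raw_score[by_student$engaged])

In practice, the same per-student flags would be computed separately for the test and the questionnaire sections, allowing the three groups described in the abstract (engaged in both, only in the test, only in the questionnaire) to be formed before re-estimating scores.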

Acknowledgments

The author would like to thank O B for his many insightful comments on earlier drafts of this paper and the anonymous reviewers for their helpful suggestions and comments.

References

  • Birnbaum, A. (1968). Some latent trait models and their use in inferring a student’s ability. In F. M. Lord and M.R. Novick (eds.), Statistical theories of mental test scores. Addison-Wesley.
  • Buchanan, E. M., & Scofield, J. E. (2018). Methods to detect low-quality data and its implication for psychological research. Behavior Research Methods, 50, 2586–2596. https://doi.org/10.3758/s13428-018-1035-6
  • Curran, P. G. (2016). Methods for the detection of carelessly invalid responses in survey data. Journal of Experimental Social Psychology, 66, 4–19. https://doi.org/10.1016/j.jesp.2015.07.006
  • DeMars, C. E. (2007). Changes in rapid-guessing behavior over a series of assessments. Educational Assessment, 12(1), 23–45. https://doi.org/10.1080/10627190709336946
  • Eccles, J. S., & Wigfield, A. (2002). Motivational beliefs, values, and goals. Annual review of psychology, 53(1), 109-132. https://doi.org/10.1146/annurev.psych.53.100901.135153
  • Eklöf, H. (2006). Development and validation of scores from an instrument measuring student test-taking motivation. Educational and Psychological Measurement, 66, 643–656. https://doi.org/10.1177/0013164405278574
  • Eklöf, H., Pavešič, B. J., & Grønmo, L. S. (2014). A cross-national comparison of reported effort and mathematics performance in TIMSS Advanced. Applied Measurement in Education, 27(1), 31–45. https://doi.org/10.1080/08957347.2013.853070
  • Goldhammer, F., Martens, T., Christoph, G., & Lüdtke, O. (2016). Test-taking engagement in PIAAC (OECD Education Working Papers, No. 133). OECD Publishing.
  • Guo, H., Rios, J. A., Haberman, S., Liu, O. L., Wang, J., & Paek, I. (2016). A new procedure for detection of students’ rapid guessing responses using response time. Applied Measurement in Education, 29, 173–183. https://doi.org/10.1080/08957347.2016.1171766
  • Huang, J. L., Curran, P. G., Keeney, J., Poposki, E. M., & DeShon, R. P. (2012). Detecting and deterring insufficient effort responding to surveys. Journal of Business and Psychology, 27(1), 99–114. https://doi.org/10.1007/s10869-011-9231-8
  • Huang, J.L., Bowling, N.A., Liu, M., & Li, Y. (2015). Detecting insufficient effort responding with an infrequency scale: Evaluating validity and participant reactions. Journal of Business and Psychology, 30, 299–311. https://doi.org/10.1007/s10869-014-9357-6
  • Johnson, J. A. (2005). Ascertaining the validity of individual protocols from web-based personality inventories. Journal of Research in Personality, 39, 103–129. https://doi.org/10.1016/j.jrp.2004.09.009
  • Karabatsos, G. (2003). Comparing the aberrant response detection performance of thirty-six person-fit statistics. Applied Measurement in Education, 16, 277–298. https://doi.org/10.1207/S15324818AME1604
  • Maniaci, M. R., & Rogge, R. D. (2014). Caring about carelessness: Participant inattention and its effects on research. Journal of Research in Personality, 48, 61–83. https://doi.org/10.1016/j.jrp.2013.09.008
  • Martinková, P., Drabinová, A., Leder, O., & Houdek, J. (2017). ShinyItemAnalysis: Test and item analysis via shiny [Computer software manual]. https://CRAN.R-project.org/package=ShinyItemAnalysis
  • Meade, A. W., & Craig, S. B. (2012). Identifying careless responses in survey data. Psychological Methods, 17, 437–455. https://doi.org/10.1037/a0028085
  • Meyer, P. J. (2010). A mixture Rasch model with response time components. Applied Psychological Measurement, 34, 521–538. https://doi.org/10.1177/0146621609355451
  • Muraki, E. (1992). A generalized partial credit model: Application of an EM algorithm. Applied Psychological Measurement, 16, 159–176. https://doi.org/10.1002/j.2333-8504.1992.tb01436.x
  • Niessen, A. S. M., Meijer, R. R., & Tendeiro, J. N. (2016). Detecting careless respondents in web-based questionnaires: Which method to use?. Journal of Research in Personality, 63, 1–11. https://doi.org/10.1016/j.jrp.2016.04.010
  • OECD. (2017). PISA 2015 assessment and analytical framework: Science, reading, mathematic, financial literacy and collaborative problem solving. OECD Publishing. https://doi.org/10.1787/9789264281820-en
  • Palaniappan, K., & Kum, I. Y. S. (2019). Underlying Causes behind Research Study Participants’ Careless and Biased Responses in the Field of Sciences. Current Psychology, 38(6), 1737–1747. https://doi.org/10.1007/s12144-017-9733-2
  • R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. URL, https://www.R-project.org/.
  • Rizopoulos, D. (2006). ltm: An R package for latent variable modeling and item response theory analyses. Journal of Statistical Software, 17(5), 1–25.
  • Rosseel, Y. (2011). lavaan: An R package for structural equation modeling and more (Version 0.4-10 beta).
  • Setzer, J. C., Wise, S. L., van den Heuvel, J. R., & Ling, G. (2013). An investigation of examinee test-taking effort on a low-stakes assessment. Applied Measurement in Education, 26(1), 34–49. https://doi.org/10.1080/08957347.2013.739453
  • Sundre, D. L., & Moore, D. L. (2002). The Student Opinion Scale: A measure of examinee motivation. Assessment Update, 14(1), 8–9.
  • Sundre, D. L., & Wise, S. L. (2003, April). ‘Motivation filtering’: An exploration of the impact of low examinee motivation on the psychometric quality of tests. Paper presented at the annual meeting of the National Council on Measurement in Education, Chicago.
  • van der Linden, W. J. (2009). Conceptual issues in response-time modeling. Journal of Educational Measurement, 46(3), 247–272. https://doi.org/10.1111/j.1745-3984.2009.00080.x
  • Wang, C., & Xu, G. (2015). A mixture hierarchical model for response times and response accuracy. British Journal of Mathematical and Statistical Psychology, 68(3), 456–477. https://doi.org/10.1111/bmsp.12054
  • Wise, S. L. (2006). An investigation of the differential effort received by items on a low-stakes, computer-based test. Applied Measurement in Education, 19, 95–114. https://doi.org/10.1207/s15324818ame1902_2
  • Wise, S. L. (2017). Rapid-guessing behavior: Its identification, interpretations, and implications. Educational Measurement: Issues and Practice, 36(4), 52–61. https://doi.org/10.1111/emip.12165
  • Wise, S. L. (2019). An Information-Based Approach to Identifying Rapid-Guessing Thresholds. Applied Measurement in Education, 32(4), 325–336. https://doi.org/10.1080/08957347.2019.1660350
  • Wise, S. L., & DeMars, C. E. (2005). Examinee motivation in low-stakes assessment: Problems and potential solutions. Educational Assessment, 10, 1–18. https://doi.org/10.1207/s15326977ea1001_1
  • Wise, S. L., & DeMars, C. E. (2006). An application of item response time: The effort-moderated IRT model. Journal of Educational Measurement, 43, 19–38. https://doi.org/10.1111/j.1745-3984.2006.00002.x
  • Wise, S. L., & Gao, L. (2017). A general approach to measuring test-taking effort on computer-based tests. Applied Measurement in Education, 30(4), 343–354. https://doi.org/10.1080/08957347.2017.1353992
  • Wise, S. L., & Kingsbury, G. G. (2016). Modeling student test-taking motivation in the context of an adaptive achievement test. Journal of Educational Measurement, 53, 86–105. https://doi.org/10.1111/jedm.2016.53.issue-1.
  • Wise, S. L., & Ma, L. (2012, April). Setting response time thresholds for a CAT item pool: The normative threshold method. Paper presented at the annual meeting of the National Council on Measurement in Education, Vancouver, Canada.
  • Wise, S. L., Soland, J., & Bo, Y. (2019). The (Non) Impact of Differential Test Taker Engagement on Aggregated Scores. International Journal of Testing, 1–21. https://doi.org/10.1080/15305058.2019.1605999
  • Woods, C. M. (2006). Careless responding to reverse-worded items: Implications for confirmatory factor analysis. Journal of Psychopathology and Behavioral Assessment, 28, 189–194. https://doi.org/10.1007/s10862-005-9004-7
  • Zamarro, G., Hitt, C., & Mendez, I. (2019). When students don’t care: Reexamining international differences in achievement and student effort. Journal of Human Capital, 13(4), 519–552. https://doi.org/10.1086/705799
  • Zhang, C., & Conrad, F. (2014). Speeding in web surveys: The tendency to answer very fast and its association with straightlining. Survey Research Methods, 8(2), 127–135. https://doi.org/10.18148/srm/2014.v8i2.5453


Details

Primary Language: English
Subjects: Studies on Education
Section: Articles
Authors

Hatice Cigdem Bulut 0000-0003-2585-3686

Publication Date: September 5, 2021
Submission Date: September 1, 2020
Published in Issue: Year 2021, Volume: 8, Issue: 3

How to Cite

APA Bulut, H. C. (2021). The Continuity of Students’ Disengaged Responding in Low-stakes Assessments: Evidence from Response Times. International Journal of Assessment Tools in Education, 8(3), 527-541. https://doi.org/10.21449/ijate.789212
