Impact of the Number of Scale Points on Data Characteristics and Respondents’ Evaluations: An Experimental Design Approach Using 5-Point and 7-Point Likert-type Scales
Year 2016,
Issue: 55, 1 - 20, 30.10.2016
Oylum Korkut Altuna, F. Müge Arslan
Abstract
A great deal of social research is based on data collected with Likert-type scales. The optimal number of response categories in Likert-type scales has been the subject of academic debate for years. This article studies the differences between 5- and 7-point Likert-type scales using the SERVPERF scale, developed by Cronin and Taylor in 1992, as the measuring instrument. A pretest-posttest control group experimental design was used to test whether the number of response categories leads to statistical differences in data characteristics, the dimensional structure of the scale, and data fit. The results show no statistically significant differences in normality or reliability, whereas Exploratory Factor Analysis yields different dimensional structures for the 5- and 7-point formats of SERVPERF. ANCOVA results reveal that the number of response categories has no effect on participants’ evaluations of SERVPERF. The results of confirmatory factor analysis show that the best fit is achieved for the 7-point SERVPERF.
References
- Alford, W.K., Malouff, J.M. & Osland, K.S. (2005). Written Emotional Expression as a
Coping Method in Child Protective Services Officers. International Journal of Stress
Management, 12(2), 182-183.
- Babakus, E. & Boller, G.W. (1992). An Empirical Assessment of the SERVQUAL Scale.
Journal of Business Research, 24(3), 253-268. doi: 10.1016/0148-2963(92)90022-4
- Bearden, W.O., Netemeyer, R.G. & Mobley, M. (1993). Handbook of Marketing Scales:
Multi-item Measures for Marketing and Consumer Behavior Research. Newbury Park,
CA: Sage.
- Bendig, A.W. (1953). The Reliability of Self-ratings as a Function of the Amount of
Verbal Anchoring and the Number of Categories on the Scale. The Journal of Applied
Psychology, 37, 38-41. doi: 10.1037/h0055647
- Bendig, A.W. (1954). Reliability and the Number of Rating Scale Categories. The Journal
of Applied Psychology, 38, 38-40.
- Brady, M.K., Cronin, J.J.Jr. & Brand, R.R. (2002). Performance-only Measurement of
Service Quality: A Replication and Extension. Journal of Business Research, 55(1),
17-31.
- Brown, G., Wilding, R.E. & Coulter, R.L. (1991). Customer Evaluation of Retail
Salespeople Using the SOCO Scale: A Replication Extension and Application. Journal
of the Academy of Marketing Science, 9, 347-351.
- Carillat, F.A., Jaramillo, F. & Mulki, J.P. (2007). The Validity of the SERVQUAL and
SERVPERF Scales: A Meta-analytical View of 17 Years of Research Across Five
Continents. International Journal of Service Industry Management, 18(5), 472-490.
doi: 10.1108/09564230710826250
- Chang, L. (1994). A Psychometric Evaluation of Four-point and Six-point Likert-type
Scales in Relation to Reliability and Validity. Applied Psychological Measurement,
18, 205-215.
- Choudhury, S. & Bhattacharjee, D. (2014). Optimal Number of Scale Points in Likert Type
Scales for Quantifying Compulsive Buying Behaviour. Asian Journal of Management
Research, 4(3), 431-440.
- Cicchetti, D.V., Showalter, D. & Tyrer, P.J. (1985). The Effect of Number of Rating Scale
Categories on Levels of Inter-rater Reliability: A Monte-Carlo Investigation. Applied
Psychological Measurement, 9, 31-36.
- Cohen, J. (1988). Statistical Power Analysis for the Behavioural Sciences. Second Edition,
New York, NY: Academic Press.
- Colman, A.M., Norris, C.E. & Preston, C.C. (1997). Comparing Rating Scales of Different
Lengths: Equivalence of Scores from 5-point and 7-point Scales. Psychological
Reports, 80, 355-362. doi: 10.2466/pr0.1997.80.2.355
- Cortina, J.M. (1993). What is Coefficient Alpha? An Examination of Theory and
Applications. Journal of Applied Psychology, 78, 98-104.
doi: 10.1037/0021-9010.78.1.98
- Cox, E.P. (1980). The Optimal Number of Response Alternatives for a Scale: A Review.
Journal of Marketing Research, 17, 407-422.
- Crocker, L. & Algina, J. (1986). Introduction to Classical and Modern Test Theory. New
York, NY: Holt, Rinehart & Winston.
- Cronin, J.J.Jr. & Taylor, A.S. (1992). Measuring Service Quality: A Reexamination and an
Extension. Journal of Marketing, 56(3), 243-253.
- Dawes, J. (2008). Do Data Characteristics Change According to the Number of Scale Points
Used? International Journal of Market Research, 50(1), 61-77.
http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.417.9488&rep=rep1&type=pdf
- Dimitrov, D.M. & Rumrill, Jr., P.D. (2003). Pretest-Posttest Designs and Measurement of
Change. Work, 20, 159-165. http://iospress.metapress.com/content/7x9hgpq885t2yttq/
- Doğan, V., Özkara, B.Y., Yılmaz, C. and Torlak, Ö. (2014). Katılım Düzeyi Seçenek
Sayısının Veri Karakteristiği ve Veri Kalitesi Kapsamında İncelenmesi: Optimal
Katılım Düzeyi Seçenek Sayısına İlişkin Bir Çıkarım (An Examination of the Optimal
Number of Response Categories in terms of Data Characteristics and Data Quality: An
Inference Regarding the Optimal Number of Response Categories). In the Proceedings
of the 19th Annual Turkish National Marketing Congress, Gaziantep, Turkey.
- Field, A. (2012). Discovering Statistics Using IBM SPSS Statistics. Fourth Edition,
London: Sage Publications.
- Finn, R.H. (1972). Effects of Some Variations in Rating Scale Characteristics on the Means
and Reliabilities of Ratings. Educational and Psychological Measurement, 32(7),
255-265.
- Garner, W.R. (1960). Rating Scales, Discriminability and Information Transmission.
Psychological Review, 67, 343-352.
- Green, J.A. & Rao, V.R. (1970). Rating Scales and Information Recovery: How Many
Scales and Response Categories to Use? Journal of Marketing, 34, 33-39.
- Howell, D.C. (1992). Statistical Methods for Psychology. Boston, MA: Duxbury Press.
- Huck, S.W. (2008). Reading Statistics and Research. Fifth Edition, Boston, MA: Pearson
Education, Inc.
- Jain, S.K. & Gupta, G. (2004). Measuring Service Quality: SERVQUAL vs SERVPERF
Scales. The Journal for Decision Makers, 29(2), 25-37.
http://www.vikalpa.com/pdf/articles/2004/2004_apr_jun_25_37.pdf
- Janssens, W., Wijnen, K., Pelsmacker, P.D. & Van Kenhove, P. (2008). Marketing Research
with SPSS. London: Pearson Education Limited.
- Jones, R.R. (1968). Differences in Response Consistency and Subjects’ Preferences for
Three Personality Inventory Response Formats. In Proceedings of the 76th Annual
Convention of the American Psychological Association, 247-248.
- Jöreskog, K.G. and Sörbom, D. (1993). Lisrel 8: Structural Equation Modeling with
Simplis Command Language. Scientific Software International.
- Lai, M., Li, Yongjian & Liu, Y. (2010). Determining the Optimal Scale Width for a Rating
Scale Using an Integrated Discrimination Function. Measurement, 43, 1458-1471. doi:
10.1016/j.measurement.2010.08.012
- Leung, S.O. (2011). A Comparison of Psychometric Properties and Normality in 4-, 5-, 6-
and 11-Point Likert Scales. Journal of Social Service Research, 37, 412-421.
doi: 10.1080/01488376.2011.580697
- Loken, B., Pirie, P., Virnig, K.A., Hinkle, R.L. & Salmon, C.T. (1987). The Use of 0-10
Scales in Telephone Surveys. Journal of the Market Research Society, 29(3), 353-362.
- Lozano, L.M., Garcia-Cueto, E. & Muniz, J. (2008). Effect of the Number of Response
Categories on the Reliability and Validity of Rating Scales. Methodology, 4(2), 73-79.
doi: 10.1027/1614-2241.4.2.73
- Malhotra, N. K. (2010). Marketing Research: An Applied Orientation. Sixth Edition,
Boston, MA: Pearson.
- Marlow, L., Inman, D. & Shwery, C. (2005). To What Extent are Literacy Initiatives Being
Supported: Important Questions for Administrators. Reading Improvement, 42(3), 179.
http://eric.ed.gov/?id=EJ725388
- Matell, M.S. & Jacoby, J. (1971). Is There an Optimal Number of Alternatives for
Likert Scale Items? Study 1: Reliability and Validity. Educational and Psychological
Measurement, 31, 657-674. http://psycnet.apa.org/journals/apl/56/6/506/
- Morris, S.B. (2008). Estimating Effect Sizes from Pretest – Posttest – Control Group
Designs. Organizational Research Methods, 11(2), 364-386.
- Oaster, T.R.F. (1989). Number of Alternatives per Choice Point and Stability of Likert-type
Scales. Perceptual and Motor Skills, 68, 549-550. doi: 10.2466/pms.1989.68.2.549
- Osteras, N., Gulbrandsen, P., Garratt, A., Benth, J.S., Dahl, F.A., Natvig, B. & Brage, S.
(2008). A Randomised Comparison of a Four and a Five-Point Scale Version of the
Norwegian Function Assessment Scale. Health and Quality of Life Outcomes, 6(14),
1-9. doi: http://www.hqlo.com/content/6/1/14
- Preston, C.C. & Colman, A.M. (2000). Optimal Number of Response Categories in Rating
Scales: Reliability, Validity, Discriminating Power and Respondent Preferences. Acta
Psychologica, 104, 1-15. doi: 10.1016/S0001-6918(99)00050-5
- Qin, H., Prybutok, V.R. & Zhao, Q. (2010). Perceived Service Quality in Fast-food
Restaurants: Empirical Evidence from China. International Journal of Quality &
Reliability Management, 27(4), 424-437. doi: 10.1108/02656711011035129
- Ramsay, J.O. (1973). The Effect of Number of Categories in Rating Scales on
Precision of Estimation of Scale Values. Psychometrika, 38, 513-533. doi:
10.1177/014662168500900103
- Sallot, L.M. & Lyon, L.J. (2003). Investigating Effects of Tolerance – Intolerance of Ambiguity
and the Teaching of Public Relations Writing: A Quasi-Experiment. Journalism &
Mass Communication Educator, 58(3), 251-272. doi: 10.1177/107769580305800304
- Symonds, P.M. (1924). On the Loss of Reliability in Ratings Due to Coarseness of the Scale.
Journal of Experimental Psychology, 7, 456-461. doi: 10.1177/014662168500900103
- Viswanathan, M., Sudman, S. & Johnson, M. (2004). Maximum versus Meaningful
Discrimination in Scale Response: Implications for Validity of Measurement of
Consumer Perceptions About Products. Journal of Business Research, 57, 108-124.
doi: 10.1016/S0148-2963(01)00296-X
- Weathers, D., Sharma, S. & Niedrich, R.W. (2005). The Impact of the Number of Scale
Points, Dispositional Factors and the Status Quo Decision Heuristic on Scale Reliability
and Response Accuracy. Journal of Business Research, 58, 1516-1524.
doi: 10.1016/j.jbusres.2004.08.002
- Weijters, B., Cabooter, E. & Schillewaert, N. (2010). The Effect of Rating Scale Format
on Response Styles: The Number of Response Categories and Response Category
Labels. International Journal of Research in Marketing, 27, 236-247.
doi: 10.1016/j.ijresmar.2010.02.004
- Weng, L.J. (2004). Impact of the Number of Response Categories and Anchor Labels
on Coefficient Alpha and Test-retest Reliability. Educational and Psychological
Measurement, 64(6), 956-972. doi: 10.1177/0013164404268674
- Woodruff, D.J. & Feldt, L.S. (1986). Tests for Equality of Several Alpha Coefficients
When Their Sample Estimates are Dependent. Psychometrika, 51, 393-413.
http://link.springer.com/article/10.1007/BF02294063
- Zhou, L. (2004). A Dimension-specific Analysis of Performance-Only Measurement of
Service Quality and Satisfaction in China’s Retail Banking. The Journal of Services
Marketing, 18(6/7), 534-546. doi: 10.1108/08876040410561866