Determination of the Number of Bins/Classes Used in Histograms and Frequency Tables: A Short Bibliography
Year 2010,
Volume: 7 Issue: 2, 77 - 86, 15.12.2010
Nurhan Doğan, İsmet Doğan
Abstract
The histogram is the oldest and most popular tool for the graphical display of a univariate data set. An important parameter that must be specified when constructing a histogram is the bin width (also called the interval width or class width): the length of the subintervals of the real line, called “bins”, “intervals” or “classes”, on which the histogram is based. Frequency distributions facilitate the organization and presentation of data. A major issue with all classification techniques is how to select the number of classes. There is no single “correct” answer for every data set; each case must be treated separately, and each frequency table must be designed individually. The number of bins/classes increases as the sample size increases. For n > 300, Larson’s (1975) formula gave the lowest number of bins/classes and Ishikawa’s (1986) formula the highest. The aim of this study is to give a short bibliography on determining the number of bins/classes.
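As a rough illustration of how such rules tie the bin count to the sample size, the sketch below implements three widely used rules from the works listed in the references: Sturges (1926), Scott (1979), and Freedman and Diaconis (1981). It is a minimal sketch, not the comparison carried out in the study: the function names are hypothetical, rounding up to a whole number of bins is an assumption, and the formulas of Larson (1975) and Ishikawa (1986) are not reproduced because they are not stated in this abstract.

```python
import math
import statistics


def sturges_bins(n: int) -> int:
    """Sturges (1926): k = 1 + log2(n). Rounding up is an assumption."""
    return math.ceil(1 + math.log2(n))


def scott_width(data) -> float:
    """Scott (1979): h = 3.49 * s * n^(-1/3), s = sample standard deviation."""
    return 3.49 * statistics.stdev(data) * len(data) ** (-1 / 3)


def freedman_diaconis_width(data) -> float:
    """Freedman and Diaconis (1981): h = 2 * IQR * n^(-1/3)."""
    # statistics.quantiles with n=4 returns the three quartile cut points.
    q1, _, q3 = statistics.quantiles(data, n=4)
    return 2 * (q3 - q1) * len(data) ** (-1 / 3)


def bins_from_width(data, width: float) -> int:
    """Convert a bin width into a bin count that covers the data range."""
    return max(1, math.ceil((max(data) - min(data)) / width))


if __name__ == "__main__":
    sample = list(range(1, 101))  # toy data, for illustration only
    print("Sturges:", sturges_bins(len(sample)))
    print("Scott:", bins_from_width(sample, scott_width(sample)))
    print("Freedman-Diaconis:",
          bins_from_width(sample, freedman_diaconis_width(sample)))
```

Sturges' rule depends only on n, whereas the Scott and Freedman-Diaconis rules also use the spread of the data, so two samples of the same size can receive different bin counts under the latter two.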
References
- Bendat, J. S., Piersol, A.G., 1966. Measurements and Analysis of Random Data. John Wiley & Sons, Inc., New York.
- Berenson, M. L., Levine, D.M., 1992. Basic Business Statistics: Concepts and Applications. Prentice Hall, Englewood Cliffs, New Jersey.
- Brown, L.D., Hwang, J.T., 1993. How to Approximate a Histogram by a Normal Density. The American Statistician, 47 (4), 251-255.
- Cencov, N.N., 1962. Evaluation of an Unknown Distribution Density From Observations. Soviet Mathematics, 3, 1559-1562.
- Cochran, W.G., 1954. Some Methods for Strengthening the Common Chi Square Test. Biometrics, 10 (4), 417-451.
- Denby, L., Mallows, C., 2009. Variations on the Histogram. Journal of Computational and Graphical Statistics, 18(1), 21-31.
- Davies, G.R., 1929. The Analysis of Frequency Distributions. Journal of the American Statistical Association, 24 (168), 349-366.
- Doane, D.P., 1976. Aesthetic Frequency Classifications. The American Statistician, 30 (4), 181-183.
- Freedman, D., Diaconis, P., 1981. On the Histogram as a Density Estimator: L2 Theory. Zeitschrift für Wahrscheinlichkeitstheorie und Verwandte Gebiete, 57, 453-476.
- Gislason, E. A., Goldfield, E. M., 1984. Determination of the Optimum Number of Bins to Use in a Histogrammic Representation of a Probability Density Function. Journal of Chemical Physics, 80(2), 701-704.
- He, K., Meeden, G., 1997. Selecting the Number of Bins in a Histogram: A Decision Theoretic Approach. Journal of Statistical Planning and Inference, 61, 59-69.
- Hirai, Y., 2009. Some Remarks on Class Interval of Histograms. http://eprints.lib.okayama-u.ac.jp/9696/1/082_0113_0117.pdf.
- Hyndman, R. J., 1995. The Problem with Sturges’ Rule for Constructing Histograms. Monash University. www.robjhyndman.com/papers/sturges.pdf.
- Ishikawa, K., 1986. Guide to Quality Control. White Plains, New York: Unipub, Kraus International.
- Knuth, K.H., 2006. Optimal Data-Based Binning for Histograms. Draft Paper. http://www.huginn.com/knuth/papers/knuth-histo-draft-060221.pdf.
- Larson, H.J., 1975. Statistics: An Introduction. John Wiley & Sons, Inc., New York.
- Lohaka, H.O., 2007. Making a Grouped Data Frequency Table: Development and Examination of the Iteration Algorithm. PhD Thesis, College of Education, Ohio University (unpublished).
- Mann, H. B., Wald, A., 1942. On the Choice of the Number of Class Intervals in the Application of the Chi Square Test. The Annals of Mathematical Statistics, 13 (3), 306-317.
- Mori, T., 1974. An Optimal Length of Class Interval for Histogram. Japan Journal of Applied Statistics, 4(1), 17-24.
- Mosteller, F., Tukey, J.W., 1977. Data Analysis and Regression, A Second Course in Statistics. Addison-Wesley, Reading, MA.
- Plane, D. R., Oppermann, E.B., 1981. Business and Economic Statistics. Business Publications, Inc., Plano, Texas.
- Rissanen, J., 1992. Stochastic Complexity in Statistical Inquiry. World Scientific, Singapore.
- Rudemo, M., 1982. Empirical Choice of Histograms and Kernel Density Estimators. Scandinavian Journal of Statistics, 9 (2), 65-78.
- Scott, D. W., 1979. On Optimal and Data-Based Histograms. Biometrika, 66(3), 605-610.
- Scott, D. W., 1992. Multivariate Density Estimation: Theory, Practice and Visualization. John Wiley & Sons, Inc., New York.
- Sturges, H.A., 1926. The Choice of a Class Interval. Journal of the American Statistical Association, 21 (153), 65-66.
- Suzuki, G., 1985. Effective Use of Graphical Representation in Statistics. Japan Journal of Applied Statistics, 14(1), 27-37.
- Terrell, G.R., Scott, D.W., 1985. Oversmoothed Nonparametric Density Estimates. Journal of the American Statistical Association, 80 (389), 209-214.
- Velleman, P.F., 1976. Interactive Computing for Exploratory Data Analysis I: Display Algorithms. 1975 Proceedings of the Statistical Computing Section, 142-147, Washington DC: American Statistical Association.
- Wand, M.P., 1997. Data-Based Choice of Histogram Bin Width. The American Statistician, 51 (1), 59-64.
Turkish title: Histogramlarda ve Sıklık Tablolarında Kullanılan Sütun / Sınıf Sayılarının Belirlenmesi: Kısa Bir Bibliyografya