Research Article

Regresyon Çözümlemesinde Kayıp Veri Sorunu

Year 2005, Volume 4, Issue 3, pp. 52-62, 15.12.2005

Abstract

Missing data analysis deals with the situation in which some values in the data matrix are not observed. It is one of the most important topics, particularly in applied statistics. Ignoring missing data can destroy the randomness of the sample and lead to biased parameter estimates. Regression analysis is one of the foremost multivariate statistical methods used for estimation. This study therefore carries out a simulation study on two separate data sets, one in which the assumptions of regression analysis hold and one in which they are violated, with the missing-data mechanism in the independent variables being missing at random (MAR). Under ignorable missingness, two model-based methods, the EM algorithm and multiple imputation, are examined comparatively. The results indicate that the EM algorithm is not affected by violations of the regression assumptions, whereas multiple imputation requires the number of imputations to be increased.
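The mechanism summarised above (values of an independent variable missing at random given the observed data, then recovered by multiple imputation) can be illustrated with a minimal Python sketch. It is not the authors' simulation design: it assumes a single covariate x ~ N(0, 1), a hypothetical true model y = 1 + 2x + e, and a logistic missingness probability that depends only on the always-observed y, which makes the mechanism MAR. The helper names `simulate`, `ols` and `multiple_imputation` are illustrative. Imputations are drawn from the posterior of the regression of x on y under a flat prior, and the m slope estimates are pooled with Rubin's rules.

```python
import math
import random
import statistics


def simulate(n, beta0=1.0, beta1=2.0, seed=1):
    """Hypothetical data: y = beta0 + beta1*x + e, with x missing MAR given y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, 1.0)
        # Missingness depends only on the always-observed y, so it is MAR.
        p_missing = 1.0 / (1.0 + math.exp(-(y - beta0)))
        xs.append(None if rng.random() < p_missing else x)
        ys.append(y)
    return xs, ys


def ols(u, v):
    """Simple regression of v on u: (intercept, slope, var(slope), RSS)."""
    mu, mv = statistics.fmean(u), statistics.fmean(v)
    suu = sum((a - mu) ** 2 for a in u)
    suv = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    slope = suv / suu
    intercept = mv - slope * mu
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(u, v))
    return intercept, slope, rss / (len(u) - 2) / suu, rss


def multiple_imputation(xs, ys, m=20, seed=2):
    """Impute x from its regression on y, fit y ~ x per data set, pool with Rubin's rules."""
    rng = random.Random(seed)
    xo = [x for x in xs if x is not None]
    yo = [y for x, y in zip(xs, ys) if x is not None]
    n_obs = len(xo)
    y_bar = statistics.fmean(yo)
    yc = [y - y_bar for y in yo]  # centred y: intercept and slope posteriors are independent
    a_hat, b_hat, _, rss = ols(yc, xo)
    syy = sum(v * v for v in yc)
    slopes, within = [], []
    for _ in range(m):
        # Posterior draws under a flat prior: sigma^2 = RSS / chi2(n_obs - 2), then (a, b).
        sigma2 = rss / rng.gammavariate((n_obs - 2) / 2.0, 2.0)
        a = rng.gauss(a_hat, math.sqrt(sigma2 / n_obs))
        b = rng.gauss(b_hat, math.sqrt(sigma2 / syy))
        filled = [
            x if x is not None else a + b * (y - y_bar) + rng.gauss(0.0, math.sqrt(sigma2))
            for x, y in zip(xs, ys)
        ]
        _, slope, var_slope, _ = ols(filled, ys)
        slopes.append(slope)
        within.append(var_slope)
    q_bar = statistics.fmean(slopes)      # pooled point estimate
    w = statistics.fmean(within)          # within-imputation variance
    b_var = statistics.variance(slopes)   # between-imputation variance
    return q_bar, w + (1.0 + 1.0 / m) * b_var


if __name__ == "__main__":
    xs, ys = simulate(2000)
    cc = [(x, y) for x, y in zip(xs, ys) if x is not None]
    _, cc_slope, _, _ = ols([x for x, _ in cc], [y for _, y in cc])
    mi_slope, mi_var = multiple_imputation(xs, ys, m=20)
    print(f"complete-case slope: {cc_slope:.3f}")
    print(f"multiple-imputation slope: {mi_slope:.3f} (SE {math.sqrt(mi_var):.3f})")
```

Because missingness here depends on y, simply dropping incomplete rows attenuates the slope of y on x, while the pooled imputation estimate should lie close to the assumed true value of 2. The factor (1 + 1/m) in the total variance shows why the number of imputations m matters: with few imputations the between-imputation component is both noisy and inflated.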

References

  • AFIFI, A.A. and ELASHOFF, R.M. (1966). Missing Observations in Multivariate Statistics: Review of the Literature, J. Am. Statist. Assoc. 61, 595-604.
  • ALLISON, P.D. (2002). Missing Data, Sage Publications, USA.
  • ATKINSON, A.C. and CHENG, T.-C. (2000). On Robust Linear Regression with Incomplete Data, Computational Statistics & Data Analysis 33, 361.
  • DEMPSTER, A.P., LAIRD, N.M. and RUBIN, D.B. (1977). Maximum Likelihood from Incomplete Data via the EM Algorithm (with discussion), J. Roy. Statist. Soc. B 39, 1-38.
  • HARTLEY, H.O. and HOCKING, R.R. (1971). The Analysis of Incomplete Data, Biometrics 27, 783-823.
  • LITTLE, R.J.A. (1997). Biostatistical Analysis with Missing Data, in Encyclopedia of Biostatistics (P. Armitage and T. Colton, eds.), London: Wiley.
  • LITTLE, R.J.A. and RUBIN, D.B. (1983a). Incomplete Data, in Encyclopedia of Biostatistics (P. Armitage and T. Colton, eds.), London: Wiley.
  • LITTLE, R.J.A. and RUBIN, D.B. (2002). Statistical Analysis with Missing Data, 2nd Ed., John Wiley & Sons, Inc., USA.
  • LITTLE, R.J.A. and SCHENKER, N. (1994). Missing Data, in Handbook for Statistical Modeling in the Social and Behavioral Sciences (G. Arminger, C.C. Clogg, and M.E. Sobel, eds.), pp. 39-75, New York: Plenum.
  • MCLACHLAN, G.J. and KRISHNAN, T. (1997). The EM Algorithm and Extensions, New York: Wiley.
  • ORCHARD, T. and WOODBURY, M.A. (1972). A Missing Information Principle: Theory and Applications, Proceedings of the 6th Berkeley Symposium on Mathematical Statistics and Probability, Volume 1, 697-715.
  • RUBIN, D.B. (1976). Inference and Missing Data, Biometrika 63, 581-592.
  • SCHAFER, J.L. (1997). Analysis of Incomplete Multivariate Data, Chapman & Hall, USA.

The Problem of Missing Data in Regression Analysis


Abstract

Missing data analysis concerns a data matrix in which some of the values are not observed. It is one of the most important topics in applied statistics. Ignoring missing data destroys the randomness of the sample and can cause serious bias in the parameter estimates. Regression analysis is one of the most important multivariate statistical procedures used for estimation. For this reason, this study carries out a simulation on two different data sets, one that satisfies and one that violates the regression assumptions, with values of the independent variables made missing under a missing at random (MAR) mechanism. Assuming the missing data are ignorable, two model-based methods, the EM algorithm and multiple imputation, are compared. The EM algorithm is not affected by violations of the regression assumptions, whereas multiple imputation requires a larger number of imputations.
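For comparison, the EM approach mentioned in the abstract can be sketched for the simplest case. This is a minimal illustration under hypothetical assumptions, not the study's setup: (x, y) is taken to be bivariate normal, only x has missing values, and missingness is MAR given the fully observed y. The E-step replaces each missing x and x² by their conditional expectations given y under the current parameters; the M-step recomputes the means and covariances; the regression coefficients of y on x then follow from the converged moments. The names `simulate` and `em_regression` and all parameter values are illustrative.

```python
import math
import random
import statistics


def simulate(n, beta0=1.0, beta1=2.0, seed=1):
    """Hypothetical data: y = beta0 + beta1*x + e, with x missing MAR given y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        y = beta0 + beta1 * x + rng.gauss(0.0, 1.0)
        p_missing = 1.0 / (1.0 + math.exp(-(y - beta0)))  # depends on observed y only
        xs.append(None if rng.random() < p_missing else x)
        ys.append(y)
    return xs, ys


def em_regression(xs, ys, tol=1e-10, max_iter=1000):
    """EM for a bivariate normal (x, y) with x partly missing; returns (beta0, beta1) of y on x."""
    n = len(ys)
    mu_y = statistics.fmean(ys)             # y is fully observed, so its ML moments are fixed
    s_yy = statistics.pvariance(ys, mu_y)
    observed = [x for x in xs if x is not None]
    # Start from complete-case moments for x and zero covariance.
    mu_x, s_xx, s_xy = statistics.fmean(observed), statistics.pvariance(observed), 0.0
    for _ in range(max_iter):
        b_xy = s_xy / s_yy                  # slope of x on y under current parameters
        resid = s_xx - s_xy ** 2 / s_yy     # Var(x | y) under current parameters
        sum_x = sum_xx = sum_xy = 0.0
        for x, y in zip(xs, ys):
            if x is None:                   # E-step: expected sufficient statistics
                ex = mu_x + b_xy * (y - mu_y)
                exx = ex * ex + resid
            else:
                ex, exx = x, x * x
            sum_x += ex
            sum_xx += exx
            sum_xy += ex * y
        # M-step: maximum-likelihood moments from the completed sufficient statistics.
        new = (sum_x / n, sum_xx / n - (sum_x / n) ** 2, sum_xy / n - (sum_x / n) * mu_y)
        delta = max(abs(a - b) for a, b in zip(new, (mu_x, s_xx, s_xy)))
        mu_x, s_xx, s_xy = new
        if delta < tol:
            break
    beta1 = s_xy / s_xx
    return mu_y - beta1 * mu_x, beta1


if __name__ == "__main__":
    xs, ys = simulate(2000)
    b0, b1 = em_regression(xs, ys)
    print(f"EM estimates: intercept {b0:.3f}, slope {b1:.3f}")
```

With y fully observed, the EM fixed point coincides with the maximum-likelihood estimates obtained by factoring the likelihood into the marginal distribution of y and the regression of x on y, so the loop mainly serves to make the E- and M-steps explicit. With several incompletely observed covariates, the same scheme applies with conditional means and covariances taken from the full multivariate normal model.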

There are 13 citations in total.

Details

Primary Language: Turkish
Subjects: Statistics
Journal Section: Research Articles
Authors

Neslihan Demirel

Serdar Kurt

Publication Date: December 15, 2005
Published in Issue: Year 2005, Volume 4, Issue 3

Cite

APA Demirel, N., & Kurt, S. (2005). Regresyon Çözümlemesinde Kayıp Veri Sorunu. İstatistik Araştırma Dergisi, 4(3), 52-62.