Research Article

Comparison of Penalized Regression Methods through a Simulation Study

Year 2022, Volume: 9 Issue: 1, 80 - 91, 30.06.2022
https://doi.org/10.35193/bseufbd.994181

Abstract

Penalized regression methods are often used to obtain stable coefficient estimates when the dataset exhibits multicollinearity. In addition, depending on the form of the penalty term, these methods can perform automatic variable selection. This study compares in detail, through simulation studies, the performance of the ridge, LASSO, elastic net and adaptive LASSO penalized regression methods, which are widely used in the literature, as a function of the structure of the true coefficient vector. The comparison criteria are the mean squared error on the test set, the misclassification rate, the false positive rate and the active set size. The simulations show that the structure of the true coefficient vector has an important effect on the performance of the models the methods produce.
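To make the selection-based criteria concrete, the following is a minimal sketch, not the procedure used in the article. It assumes an orthonormal design, in which the LASSO estimate reduces to soft-thresholding of the least-squares coefficients and the ridge estimate to proportional shrinkage (with the penalty scaled so that the threshold equals `lam`). The definitions of the misclassification rate and false positive rate below are the usual support-recovery ones and are an assumption here; the function names and all numbers are hypothetical.

```python
def soft_threshold(z, lam):
    """LASSO estimate of one coefficient under an orthonormal design (assumed)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def ridge_shrink(z, lam):
    """Ridge estimate of one coefficient under an orthonormal design (assumed)."""
    return z / (1.0 + lam)


def selection_metrics(beta_true, beta_hat, tol=1e-12):
    """Compare the estimated support (nonzero coefficients) with the true support.

    Assumed definitions:
      misclassification rate = share of coefficients whose zero/nonzero
                               status is identified wrongly,
      false positive rate    = share of truly zero coefficients that are
                               estimated as nonzero,
      active set size        = number of nonzero estimated coefficients.
    """
    true_active = [abs(b) > tol for b in beta_true]
    est_active = [abs(b) > tol for b in beta_hat]
    p = len(beta_true)
    wrong = sum(t != e for t, e in zip(true_active, est_active))
    false_pos = sum((not t) and e for t, e in zip(true_active, est_active))
    n_true_zero = sum(not t for t in true_active)
    return {
        "misclassification_rate": wrong / p,
        "false_positive_rate": false_pos / n_true_zero if n_true_zero else 0.0,
        "active_set_size": sum(est_active),
    }


# Hypothetical sparse true vector and made-up least-squares estimates.
beta_true = [3.0, -2.0, 0.0, 0.0, 1.5, 0.0]
beta_ols = [2.8, -2.3, 0.4, -0.1, 1.1, 0.7]
lam = 0.5  # illustrative penalty level, not tuned

beta_lasso = [soft_threshold(z, lam) for z in beta_ols]
beta_ridge = [ridge_shrink(z, lam) for z in beta_ols]

lasso_metrics = selection_metrics(beta_true, beta_lasso)
ridge_metrics = selection_metrics(beta_true, beta_ridge)
print("lasso:", lasso_metrics)
print("ridge:", ridge_metrics)
```

In this toy case the soft-thresholded vector contains exact zeros, so its active set is smaller than the full coefficient vector, whereas the ridge estimate keeps every coefficient nonzero. This is the sense in which variable selection depends on the form of the penalty.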


Original title (Turkish): Bir Simülasyon Çalışması ile Cezalı Regresyon Yöntemlerinin Karşılaştırılması




Details

Primary Language: Turkish
Subjects: Engineering
Section: Articles
Authors

Murat Genç 0000-0002-6335-3044

Publication Date: June 30, 2022
Submission Date: September 13, 2021
Acceptance Date: March 7, 2022
Published in Issue: Year 2022, Volume: 9 Issue: 1

Cite

APA Genç, M. (2022). Bir Simülasyon Çalışması ile Cezalı Regresyon Yöntemlerinin Karşılaştırılması. Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, 9(1), 80-91. https://doi.org/10.35193/bseufbd.994181