Quantile Regression with Count Data: An Example of Overdispersed Data (original Turkish title: Sayma Verileri ile Kantil Regresyon: Aşırı Yayılım Veri Örneği)

Burcu Durmuş; Öznur İşçi Güneri; Aynur İncekirik

doi:10.35193/bseufbd.1018339

Research Article

Year 2022, Volume: 9 Issue: 1, 286 - 303, 30.06.2022

Abstract

The assumptions of classical regression cannot be satisfied in count models. For this reason, Poisson and negative binomial regression are the best-known methods for count data. The Poisson model can be used under equidispersion, and the negative binomial model under overdispersion. In practice, data are usually overdispersed. If the count data contain excess zeros, zero-inflated Poisson models under equidispersion, zero-inflated negative binomial models under overdispersion, Poisson hurdle and negative binomial hurdle models, or generalized versions of these models may be preferred. These models generally focus on modeling the conditional mean of the dependent variable. However, conditional mean regression models can be sensitive to outliers in the dependent variable and may provide no information about other features of the conditional distribution. In such cases, quantile regression, one of the robust methods for count data, can be used. Quantile regression offers the advantage of robust estimation in the presence of outliers. In this article, the dependent variable is the number of articles, which is count data. The independent variables are gender, marital status, number of children under the age of 5, prestige of the doctoral program, and the number of articles published by the advisor in the last 3 years. After Poisson and negative binomial regression were applied, quantile regression estimates were obtained at the 25%, 50%, 75%, and 90% quantiles.
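The distinction between equidispersion and overdispersion can be checked informally by comparing the sample variance of the counts with their sample mean. The following is a minimal sketch in Python under stated assumptions: the `articles` list is made-up illustrative data rather than the study's data, `dispersion_ratio` is a hypothetical helper name, and the ratio is computed marginally, without conditioning on covariates as a regression-based test would.

```python
from statistics import mean, variance


def dispersion_ratio(counts):
    """Return the sample variance divided by the sample mean of the counts."""
    m = mean(counts)
    if m == 0:
        raise ValueError("dispersion ratio is undefined when the mean is zero")
    return variance(counts) / m


# Hypothetical article counts (illustrative only, not the study's data).
articles = [0, 0, 0, 1, 1, 1, 2, 2, 3, 4, 7, 12]

ratio = dispersion_ratio(articles)
print(f"mean={mean(articles):.2f}, variance={variance(articles):.2f}, ratio={ratio:.2f}")

# A Poisson variable has variance equal to its mean, so a ratio near 1 is
# consistent with equidispersion and a ratio well above 1 with overdispersion.
if ratio > 1:
    print("variance exceeds the mean: the sample looks overdispersed")
```

A ratio well above 1 suggests that a negative binomial model may fit better than a Poisson model, while a value near 1 is consistent with equidispersion.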

Keywords

Count Data, Quantile Regression, Poisson Regression, Negative Binomial Regression

Quantile Regression with Count Data: Example of Overdispersion Data

Abstract

The assumptions of classical regression do not hold for count models. Poisson and negative binomial regression are therefore the most common methods for count data. The Poisson model can be used when the data are equidispersed, and the negative binomial model when they are overdispersed. In practice, data are often overdispersed. If the count data contain too many zeros, zero-inflated Poisson models can be preferred under equidispersion and zero-inflated negative binomial models under overdispersion; Poisson hurdle and negative binomial hurdle models, or generalizations of these models, are further options. These models generally focus on the conditional mean of the dependent variable. However, conditional mean regression models may be sensitive to outliers in the dependent variable and may provide no information about other properties of the conditional distribution. In this case, quantile regression, which is one of the robust methods for count data, can be used. Quantile regression has the advantage of robust estimation in the presence of outliers. In this study, the dependent variable is the number of articles, which is count data. The independent variables are gender, marital status, number of children under the age of 5, prestige of the doctoral program, and the number of articles published by the advisor in the last 3 years. After Poisson and negative binomial regression were applied, quantile regression estimates were obtained at the 25%, 50%, 75%, and 90% quantiles.
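To illustrate why quantile-based estimates are less sensitive to outliers than mean-based ones, the sketch below compares the mean of a small count sample with sample quantiles obtained by minimizing the check loss that underlies quantile regression. It is an unconditional illustration only, with no covariates; the data and the helper names `check_loss` and `sample_quantile` are assumptions made for this example, not the study's data or code.

```python
from statistics import mean


def check_loss(residual, tau):
    """Check (pinball) loss of a single residual at quantile level tau."""
    return residual * (tau - (1 if residual < 0 else 0))


def sample_quantile(values, tau):
    """Return an observed value that minimizes the total check loss at tau."""
    candidates = sorted(values)
    return min(candidates, key=lambda q: sum(check_loss(v - q, tau) for v in values))


# Hypothetical counts (illustrative only); the second sample replaces the
# largest observation with an extreme value.
counts = [0, 0, 1, 1, 2, 2, 3, 4, 5]
with_outlier = counts[:-1] + [50]

for label, sample in (("original", counts), ("with outlier", with_outlier)):
    quantiles = {tau: sample_quantile(sample, tau) for tau in (0.25, 0.50, 0.75, 0.90)}
    print(label, "mean:", round(mean(sample), 2), "quantiles:", quantiles)
```

In this sample the mean shifts when the extreme value is introduced, while the 25%, 50%, and 75% quantiles stay where they were; only the 90% quantile, which sits in the upper tail, moves to the outlier.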

Keywords

Count Data, Quantile Regression, Poisson Regression, Negative Binomial Regression

There are 50 citations in total.

Details

Primary Language	Turkish
Journal Section	Articles
Authors	Burcu Durmuş (ORCID: 0000-0002-0298-0802); Öznur İşçi Güneri (ORCID: 0000-0003-3677-7121); Aynur İncekirik (ORCID: 0000-0002-5029-6036)
Publication Date	June 30, 2022
Submission Date	November 3, 2021
Acceptance Date	March 21, 2022
Published in Issue	Year 2022 Volume: 9 Issue: 1

Cite

APA	Durmuş, B., İşçi Güneri, Ö., & İncekirik, A. (2022). Sayma Verileri ile Kantil Regresyon: Aşırı Yayılım Veri Örneği. Bilecik Şeyh Edebali Üniversitesi Fen Bilimleri Dergisi, 9(1), 286-303. https://doi.org/10.35193/bseufbd.1018339
