Research Article

Regression Analyses or Decision Trees?

Year 2020, Volume: 18 Issue: 4, 251 - 260, 28.12.2020
https://doi.org/10.18026/cbayarsos.796172

Abstract

The decision tree algorithm is an important classification method in data mining. A decision tree builds classification and regression models in the form of a tree with a root node, branches, and leaf nodes. Logistic regression, an alternative to regression analysis when the dependent variable is dichotomous, is another technique used for classification. In this study, logistic regression, linear regression, a classification tree, and a regression tree were applied to the same data set. The study uses these four methods to identify the most important variables determining house price. The models' performance and predictive power were compared, and the best model was determined. The comparison used 414 real estate records with 5 independent variables and house price as the dependent variable. The findings show that the classification tree model performs better than the standard approaches on the real estate valuation data.
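The four-way comparison described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration: the data are synthetic (not the 414 records used in the study), a single hypothetical predictor stands in for the 5 independent variables, the trees are limited to one split, and the price is dichotomized at roughly the training median, a cut-off the abstract does not specify. The helper names `fit_line`, `fit_stump`, and `fit_logistic` are likewise hypothetical.

```python
import math
import random

def fit_line(xs, ys):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope

def fit_stump(xs, ys):
    """One-split tree: choose the threshold with the smallest summed squared error.

    With 0/1 labels this criterion ranks splits the same way as weighted
    Gini impurity, so the function serves both the regression and the
    classification task. Returns (threshold, left_mean, right_mean)."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # equal x values cannot be separated by a threshold
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        sse = sum((y - ml) ** 2 for y in left) + sum((y - mr) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, (pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    return best[1:]

def fit_logistic(xs, labels, steps=2000, lr=0.1):
    """One-predictor logistic regression fitted by batch gradient descent."""
    n = len(xs)
    mx = sum(xs) / n
    sd = math.sqrt(sum((x - mx) ** 2 for x in xs) / n)
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0
        for x, t in zip(xs, labels):
            z = (x - mx) / sd  # standardise so a fixed step size behaves well
            p = 1 / (1 + math.exp(-(b0 + b1 * z)))
            g0 += p - t
            g1 += (p - t) * z
        b0 -= lr * g0 / n
        b1 -= lr * g1 / n
    return lambda x: 1 / (1 + math.exp(-(b0 + b1 * (x - mx) / sd)))

random.seed(0)
# Synthetic stand-in for the data set: one predictor, noisy linear "price".
xs = [random.uniform(0, 10) for _ in range(414)]
ys = [50 - 3 * x + random.gauss(0, 5) for x in xs]
train, test = slice(0, 290), slice(290, 414)
n_test = len(xs[test])

# Regression task: continuous price, compared by mean squared error on the test part.
a, b = fit_line(xs[train], ys[train])
thr, lo, hi = fit_stump(xs[train], ys[train])
mse_line = sum((y - (a + b * x)) ** 2 for x, y in zip(xs[test], ys[test])) / n_test
mse_tree = sum((y - (lo if x <= thr else hi)) ** 2
               for x, y in zip(xs[test], ys[test])) / n_test

# Classification task: price dichotomised at an assumed cut-off (about the training median).
cut = sorted(ys[train])[145]
labels = [1 if y > cut else 0 for y in ys]
cthr, clo, chi = fit_stump(xs[train], labels[train])
prob = fit_logistic(xs[train], labels[train])
acc_tree = sum(((clo if x <= cthr else chi) >= 0.5) == (t == 1)
               for x, t in zip(xs[test], labels[test])) / n_test
acc_logit = sum((prob(x) >= 0.5) == (t == 1)
                for x, t in zip(xs[test], labels[test])) / n_test

print(f"test MSE: linear regression {mse_line:.2f}, regression tree {mse_tree:.2f}")
print(f"test accuracy: logistic regression {acc_logit:.3f}, classification tree {acc_tree:.3f}")
```

The numbers this prints depend entirely on the synthetic data and say nothing about the study's findings; the sketch only shows how the two regression models can be scored on one criterion (test error) and the two classifiers on another (test accuracy) using the same records.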


Regression Analyses or Decision Trees? (Turkish title: Regresyon Analizleri mi Karar Ağaçları mı?)


Abstract

The decision tree algorithm is an important classification method among data mining techniques. A decision tree builds classification and regression models in a tree structure with a root node, branches, and leaf nodes. Logistic regression analysis, preferred as an alternative to regression analysis when the dependent variable has two categories, is another technique used for classification. In this study, logistic regression, linear regression, classification tree, and regression tree methods were applied to the same data set. Using these four methods, the most important variables determining house price were identified. The models' performance and predictive power were compared, and an attempt was made to determine the model that classifies best. The comparison used 414 real estate records with 5 independent variables and house price as the dependent variable. The findings show that, for the real estate valuation data, the classification tree model performs better than the standard approaches.


Details

Primary Language: English
Journal Section: Articles
Authors:

Burcu Kocarık Gacar (ORCID: 0000-0001-5944-4456)

İpek Deveci Kocakoç (ORCID: 0000-0001-9155-8269)

Publication Date: December 28, 2020
Published in Issue: Year 2020, Volume: 18, Issue: 4

Cite

APA: Kocarık Gacar, B., & Deveci Kocakoç, İ. (2020). Regression Analyses or Decision Trees? Manisa Celal Bayar Üniversitesi Sosyal Bilimler Dergisi, 18(4), 251-260. https://doi.org/10.18026/cbayarsos.796172