DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ

Nihan Ankara; Hulya Sahinturk

doi:10.17261/Pressacademia.2019.1089

Research Article


Year 2019, pp. 180–185, 30.07.2019


Abstract

Purpose- Because the samples in the data sets used to develop credit scoring models are unevenly distributed across the classes, the accuracy of the resulting models tends to be low. In this study, we compared the performance of models obtained by combining ensemble (collective) learning algorithms with cost-sensitive learning in order to identify the most effective ones.

Methodology- To this end, the Bagging and AdaBoost ensemble learning methods were run on two different credit data sets, using decision trees, support vector machines and k-NN as base classifiers. In addition, cost-sensitive learning was applied to both Bagging and AdaBoost to increase the misclassification penalty of the minority class. All of these combinations were compared.
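The Bagging procedure named in the methodology can be illustrated with a short standard-library Python sketch. It is not the authors' implementation: `fit_stump`, `bagging_fit` and `bagging_predict` are hypothetical names, the base learner is assumed to be a one-level decision tree (a decision stump) on a single numeric feature, and the toy data are invented. Each base model is trained on a bootstrap sample drawn with replacement, and the ensemble predicts by unweighted majority vote.

```python
import random
from collections import Counter

def fit_stump(X, y):
    """Fit a decision stump (one-level decision tree) on a single numeric feature.

    Every observed value is tried as a threshold, with either label on the left,
    and the split with the fewest training errors is kept.
    """
    best = None
    for thr in sorted(set(X)):
        for left in (0, 1):
            right = 1 - left
            errors = sum((left if x <= thr else right) != t for x, t in zip(X, y))
            if best is None or errors < best[0]:
                best = (errors, thr, left, right)
    _, thr, left, right = best
    return lambda x: left if x <= thr else right

def bagging_fit(X, y, fit_base, n_estimators=25, seed=0):
    """Train n_estimators base models, each on a bootstrap sample of size len(X)."""
    rng = random.Random(seed)
    n = len(X)
    models = []
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # sampling with replacement
        models.append(fit_base([X[i] for i in idx], [y[i] for i in idx]))
    return models

def bagging_predict(models, x):
    """Combine the base models by unweighted majority vote."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

# Hypothetical toy data: one score per applicant, 1 = bad (default), 0 = good.
X = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1, 1]
models = bagging_fit(X, y, fit_stump, n_estimators=25, seed=42)
print([bagging_predict(models, x) for x in (2, 9)])  # [0, 1]
```

Any function that takes training data and returns a predictor, for example a k-NN or SVM wrapper, could be passed as `fit_base` in place of `fit_stump`; this is how a single Bagging loop can be paired with different base classifiers.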

Findings- Using cost-sensitive learning produced better results, in terms of the AUC performance measure, for both AdaBoost and Bagging. As the class imbalance ratio of the data increased, Bagging with decision trees as the base classifier was observed to perform better than AdaBoost.
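AUC, the measure used in the findings, can be read as the probability that a randomly chosen minority-class (bad) applicant receives a higher risk score than a randomly chosen majority-class (good) applicant, with ties counted as one half. Because it is built from within-class rates, it does not depend on the class proportions, which makes it more informative than raw accuracy on imbalanced data. The sketch below computes it directly from that pairwise definition; the function name `auc` and the scores are hypothetical.

```python
def auc(y_true, scores):
    """Area under the ROC curve via its pairwise (Mann-Whitney) interpretation.

    Returns the fraction of (positive, negative) pairs in which the positive
    example has the higher score; tied pairs count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Hypothetical risk scores; 1 = bad (minority) applicant, 0 = good applicant.
y_true = [0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.8, 0.7, 0.9]
print(auc(y_true, scores))  # 11 of 12 pairs ordered correctly -> 0.9166666666666666
```

The double loop needs one comparison per positive-negative pair, which is fine for illustration; for large data sets a rank-based formula or a library routine such as scikit-learn's `roc_auc_score` is the practical choice.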

Conclusion- Although developing a highly effective credit scoring method remains an open problem, models built with ensemble learning were observed to perform better than models built with individual classifiers. This is consistent with the findings of other studies in the literature [Zięba et al., 2012].
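A standard idealized argument shows why an ensemble can outperform its members. If n binary classifiers each predict correctly with probability p > 0.5 and make independent errors, the majority vote is correct with probability equal to the sum over k > n/2 of C(n, k) p^k (1 - p)^(n - k), which grows with n. Real ensembles only approximate independence (Bagging and AdaBoost try to create diverse base models), so the figures below are an optimistic illustration with an assumed p = 0.7, not results from this study; `majority_vote_accuracy` is a hypothetical helper.

```python
from math import comb

def majority_vote_accuracy(n, p):
    """Probability that a majority vote of n independent binary classifiers,
    each correct with probability p, is correct (n must be odd to avoid ties)."""
    if n % 2 == 0:
        raise ValueError("n must be odd so that the vote cannot tie")
    # Sum the binomial probabilities of more than n/2 members being correct.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n // 2 + 1, n + 1))

# Assumed individual accuracy p = 0.7 (illustrative only).
for n in (1, 5, 11, 25):
    print(n, round(majority_vote_accuracy(n, 0.7), 3))
# n = 1 gives 0.7 and n = 5 gives about 0.837; the value keeps rising with n.
```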

Keywords

Credit scoring, ensemble learning, imbalanced data set

References

Paleologo, G. et al. (2010). Subagging for credit scoring models. European Journal of Operational Research, 201, 490–499.
Wang, G. et al. (2011). A comparative assessment of ensemble learning for credit scoring. Expert Systems with Applications, 38, 223–230.
Marqués, A. I. et al. (2012). Exploring the behaviour of base classifiers in credit scoring ensembles. Expert Systems with Applications, 39, 10244–10250.
Wang, G. et al. (2012). A hybrid ensemble approach for enterprise credit risk assessment based on Support Vector Machine. Expert Systems with Applications, 39, 5325–5331.
Xiao, J. et al. (2012). Dynamic classifier ensemble model for customer classification with imbalanced class distribution. Expert Systems with Applications, 39, 3668–3675.
Zięba, M. et al. (2012). Ensemble Classifier for Solving Credit Scoring Problems. IFIP International Federation for Information Processing, pp. 59–66.
Tsai, C.-F. et al. (2014). A comparative study of classifier ensembles for bankruptcy prediction. Applied Soft Computing, 24, 977–984.
Tsai, C.-F. et al. (2014). Modeling credit scoring using neural network ensembles. Kybernetes, 43(7), 1114–1123.
Chen, N. et al. (2015). Comparative study of classifier ensembles for cost-sensitive credit risk assessment. Intelligent Data Analysis, 19, 127–144.
Lessmann, S. et al. (2015). Benchmarking state-of-the-art classification algorithms for credit scoring: An update of research. European Journal of Operational Research, 247, 124–136.
Dahiya, S. et al. (2015). Credit Scoring Using Ensemble of Various Classifiers on Reduced Feature Set. Industrija, 43(4).
Wang, H. et al. (2015). Large Unbalanced Credit Scoring Using Lasso-Logistic Regression Ensemble. PLOS ONE.
Durga Devi, C. R. et al. (2016). A Relative Evaluation of the Performance of Ensemble Learning in Credit Scoring. IEEE International Conference on Advances in Computer Applications (ICACA).
Salunkhe, U. R., & Mali, S. N. (2016). Classifier Ensemble Design for Imbalanced Data Classification: A Hybrid Approach. International Conference on Computational Modeling and Security (CMS 2016), Procedia Computer Science, 85, 725–732.
Zhu, Y. et al. (2017). Comparison of individual, ensemble and integrated ensemble machine learning methods to predict China's SME credit risk in supply chain finance. Neural Computing and Applications, 28(Suppl 1), S41–S50.
Galar, M. et al. (2012). A Review on Ensembles for the Class Imbalance Problem: Bagging-, Boosting-, and Hybrid-Based Approaches. IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, 42(4), 463–485.
Ling, C. X., & Sheng, V. S. (2008). Cost-Sensitive Learning and the Class Imbalance Problem. Encyclopedia of Machine Learning.
Zhou, Z.-H. (2012). Ensemble Methods: Foundations and Algorithms. Cambridge, UK: CRC Press.
ETS Asset Management Factory (2016, April 20). What is the difference between Bagging and Boosting? Retrieved from https://quantdare.com/what-is-the-difference-between-bagging-and-boosting/
Schapire, R. (2013, October 9). Explaining AdaBoost. Retrieved from http://rob.schapire.net/papers/explaining-adaboost.pdf
Polikar, R. (2008, December 22). Ensemble Learning. Retrieved from http://www.scholarpedia.org/article/Ensemble_learning
Lutins, E. (2017, August 1). Ensemble Methods in Machine Learning: What are They and Why Use Them? Retrieved from https://towardsdatascience.com/ensemble-methods-in-machine-learning-what-are-they-and-why-use-them-68ec3f9fef5f

PERFORMANCE EVALUATION OF ENSEMBLE LEARNING ALGORITHMS ON UNBALANCED CREDIT SCORING DATA SETS


Abstract

Purpose- When credit scoring models are developed, their accuracy is low because the samples are unevenly distributed across the classes. In this study, we tried to determine the most effective models by comparing the performance of models obtained by using ensemble (collective) learning algorithms together with cost-sensitive learning.

Methodology- For this purpose, the Bagging and AdaBoost ensemble learning methods were run on two different credit data sets with decision trees, support vector machines and k-NN as base classifiers. In addition, the misclassification penalty of the minority class was increased by applying cost-sensitive learning to Bagging and AdaBoost. All these combinations were compared.
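How the minority-class penalty was built into AdaBoost is not described here. One simple option, shown below as an assumption rather than as the authors' method, is to start boosting from sample weights proportional to a per-class misclassification cost, so that errors on the minority class weigh more from the first round; dedicated cost-sensitive boosting variants instead change the weight-update rule itself. The sketch implements discrete AdaBoost with decision stumps on one numeric feature and labels +1 (bad, minority) and -1 (good). The names `cost_weights`, `stump_predict`, `fit_weighted_stump`, `cost_sensitive_adaboost` and `adaboost_predict`, the cost values and the toy data are all hypothetical.

```python
import math

def cost_weights(y, cost_pos, cost_neg):
    """Starting weights proportional to each sample's misclassification cost."""
    costs = [cost_pos if yi == 1 else cost_neg for yi in y]
    total = sum(costs)
    return [c / total for c in costs]

def stump_predict(x, thr, sign):
    """Decision stump: predict `sign` above the threshold, `-sign` otherwise."""
    return sign if x > thr else -sign

def fit_weighted_stump(X, y, w):
    """Pick the threshold and orientation with the lowest *weighted* error."""
    best = None
    for thr in sorted(set(X)):
        for sign in (1, -1):
            err = sum(wi for xi, yi, wi in zip(X, y, w) if stump_predict(xi, thr, sign) != yi)
            if best is None or err < best[0]:
                best = (err, thr, sign)
    return best

def cost_sensitive_adaboost(X, y, cost_pos, cost_neg, rounds=10):
    """Discrete AdaBoost started from cost-proportional sample weights."""
    w = cost_weights(y, cost_pos, cost_neg)
    ensemble = []  # (alpha, thr, sign) triples
    for _ in range(rounds):
        err, thr, sign = fit_weighted_stump(X, y, w)
        if err >= 0.5:
            break  # weak learner is no better than chance on the weighted data
        err = max(err, 1e-10)  # avoid dividing by zero for a perfect stump
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, thr, sign))
        # Up-weight misclassified samples, down-weight correct ones, renormalise.
        w = [wi * math.exp(-alpha * yi * stump_predict(xi, thr, sign))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def adaboost_predict(ensemble, x):
    """Sign of the alpha-weighted vote of the stumps."""
    score = sum(alpha * stump_predict(x, thr, sign) for alpha, thr, sign in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: +1 = bad (minority) applicant, -1 = good applicant.
X = [1, 2, 3, 4, 5, 6, 7, 8]
y = [-1, -1, -1, -1, -1, -1, 1, 1]
# Misclassifying a bad applicant is assumed to cost 5 times as much as a good one.
model = cost_sensitive_adaboost(X, y, cost_pos=5.0, cost_neg=1.0, rounds=5)
print([adaboost_predict(model, x) for x in (2, 7)])  # [-1, 1]
```

Setting `cost_pos` equal to `cost_neg` recovers standard AdaBoost with uniform starting weights, which makes the effect of the cost ratio easy to compare.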

Findings- The use of cost-sensitive learning led to better results in terms of AUC for both AdaBoost and Bagging. It was observed that, as the class imbalance ratio in the data increased, Bagging with decision trees as the base classifier achieved higher performance than AdaBoost.

Conclusion- Although developing a highly effective credit scoring method is still an open problem, it was observed that models built with ensemble learning perform better than models built with individual classifiers. This is consistent with the findings of other studies in the literature [Zięba et al., 2012].

Keywords

Credit scoring, ensemble learning, imbalanced data sets


There are 22 citations in total.

Details

Primary Language	Turkish
Subjects	Finance, Business Administration
Journal Section	Articles
Authors	Nihan Ankara (ORCID 0000-0003-2160-210X), Hulya Sahinturk (ORCID 0000-0002-3936-9441)
Publication Date	July 30, 2019
Published in Issue	Year 2019

Cite

APA	Ankara, N., & Sahinturk, H. (2019). DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ. PressAcademia Procedia, 9(1), 180-185. https://doi.org/10.17261/Pressacademia.2019.1089
AMA	Ankara N, Sahinturk H. DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ. PAP. July 2019;9(1):180-185. doi:10.17261/Pressacademia.2019.1089
Chicago	Ankara, Nihan, and Hulya Sahinturk. “DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ”. PressAcademia Procedia 9, no. 1 (July 2019): 180-85. https://doi.org/10.17261/Pressacademia.2019.1089.
EndNote	Ankara N, Sahinturk H (July 1, 2019) DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ. PressAcademia Procedia 9 1 180–185.
IEEE	N. Ankara and H. Sahinturk, “DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ”, PAP, vol. 9, no. 1, pp. 180–185, 2019, doi: 10.17261/Pressacademia.2019.1089.
ISNAD	Ankara, Nihan - Sahinturk, Hulya. “DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ”. PressAcademia Procedia 9/1 (July 2019), 180-185. https://doi.org/10.17261/Pressacademia.2019.1089.
JAMA	Ankara N, Sahinturk H. DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ. PAP. 2019;9:180–185.
MLA	Ankara, Nihan and Hulya Sahinturk. “DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ”. PressAcademia Procedia, vol. 9, no. 1, 2019, pp. 180-5, doi:10.17261/Pressacademia.2019.1089.
Vancouver	Ankara N, Sahinturk H. DENGESİZ KREDİ SKORLAMA VERİ SETLERİNDE KOLEKTİF ÖĞRENME ALGORİTMALARININ PERFORMANS DEĞERLENDİRMESİ. PAP. 2019;9(1):180-5.

