Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines

Sihem Khemakhem (Faculty of Economics and Management of Sfax, Sfax University, Tunisia)
Fatma Ben Said (National Engineering School of Sfax (ENIS), Sfax University, Tunisia)
Younes Boujelbene (Faculty of Economics and Management of Sfax, Sfax University, Tunisia)

Journal of Modelling in Management

ISSN: 1746-5664

Article publication date: 22 October 2018

Issue publication date: 7 November 2018

Abstract

Purpose

Credit scoring datasets are generally unbalanced: repaid loans outnumber defaulted ones. Classifiers trained on such data are therefore biased toward the majority class, which in practice means they tend to assign a mistaken “good borrower” status even to “very risky borrowers”. Beyond applying statistical and machine learning classifiers, this paper explores the relevance and performance of sampling methods combined with statistical prediction and artificial intelligence techniques for predicting and quantifying default probability on real-world credit data.
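The majority-class bias described above can be illustrated with a small sketch. The label counts below (95 repaid, 5 defaulted) are hypothetical and are not taken from the paper's dataset; the point is only that a classifier labelling every borrower as "good" can score high accuracy while detecting no defaults.

```python
# Hypothetical labels: 1 = defaulted loan, 0 = repaid loan.
# The 95/5 split is an illustrative assumption, not the paper's data.
y_true = [0] * 95 + [1] * 5

# A degenerate classifier that assigns "good borrower" to everyone.
y_pred = [0] * len(y_true)


def confusion_counts(y_true, y_pred):
    """Return (tp, fp, tn, fn), treating class 1 (default) as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, fp, tn, fn


tp, fp, tn, fn = confusion_counts(y_true, y_pred)
accuracy = (tp + tn) / len(y_true)
# Share of actual defaults that the classifier catches.
recall_default = tp / (tp + fn) if (tp + fn) else 0.0

print(f"accuracy={accuracy:.2f}, recall on defaults={recall_default:.2f}")
```

In this toy setting overall accuracy looks strong even though every risky borrower is misclassified, which is why accuracy alone is a poor guide on unbalanced credit data.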

Design/methodology/approach

A real database from a Tunisian commercial bank was used, and the class imbalance was addressed with random over-sampling (ROS) and the synthetic minority over-sampling technique (SMOTE). Performance was evaluated using the confusion matrix and the receiver operating characteristic (ROC) curve.
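As a rough illustration of the two re-sampling ideas named above, the following standard-library sketch balances a toy two-class dataset. It is not the authors' implementation: the function names, the toy feature values, the neighbour count k and the choice to balance the classes exactly are all assumptions made for the example. ROS duplicates existing minority rows, whereas the SMOTE-style routine creates new rows by interpolating between a minority row and one of its nearest minority neighbours.

```python
import random


def random_over_sample(X, y, minority=1, seed=0):
    """ROS sketch: duplicate random minority rows until classes are balanced."""
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    extra = [list(rng.choice(minority_rows)) for _ in range(max(deficit, 0))]
    return X + extra, y + [minority] * len(extra)


def smote_like(X, y, minority=1, k=2, seed=0):
    """SMOTE-style sketch: interpolate between minority rows and neighbours.

    Assumes binary labels and at least two minority rows.
    """
    rng = random.Random(seed)
    minority_rows = [x for x, label in zip(X, y) if label == minority]
    deficit = (len(y) - len(minority_rows)) - len(minority_rows)
    synthetic = []
    for _ in range(max(deficit, 0)):
        base = rng.choice(minority_rows)
        others = [r for r in minority_rows if r is not base]
        # k nearest minority neighbours by squared Euclidean distance.
        neighbours = sorted(
            others, key=lambda r: sum((a - b) ** 2 for a, b in zip(base, r))
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return X + synthetic, y + [minority] * len(synthetic)


# Hypothetical toy data: two features per borrower, 1 = default, 0 = repaid.
X = [[0.1, 0.2], [0.2, 0.1], [0.3, 0.3], [0.2, 0.4], [0.4, 0.2], [0.3, 0.1],
     [0.8, 0.9], [0.9, 0.8], [0.7, 0.7]]
y = [0, 0, 0, 0, 0, 0, 1, 1, 1]

X_ros, y_ros = random_over_sample(X, y)
X_smote, y_smote = smote_like(X, y)
print("ROS class counts:  ", y_ros.count(0), y_ros.count(1))
print("SMOTE class counts:", y_smote.count(0), y_smote.count(1))
```

In practice, re-sampling of this kind is normally applied to the training split only, so that the confusion matrix and ROC curve are computed on data that retains the original class proportions.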

Findings

The results indicated that combining intelligent and statistical techniques with re-sampling approaches is promising for default rate management and provides accurate credit risk estimates.

Originality/value

This paper empirically investigates the effectiveness of ROS and SMOTE in combination with logistic regression, artificial neural networks and support vector machines. The authors address the role of sampling strategies in the Tunisian credit market and their impact on credit risk. These sampling strategies may help financial institutions reduce misclassification costs relative to the unbalanced original data and may serve as a means of improving the bank’s performance and competitiveness.

Acknowledgements

The authors are grateful to Kamel MAALOUL, translator and English professor, for having proofread the manuscript.

Citation

Khemakhem, S., Ben Said, F. and Boujelbene, Y. (2018), "Credit risk assessment for unbalanced datasets based on data mining, artificial neural network and support vector machines", Journal of Modelling in Management, Vol. 13 No. 4, pp. 932-951. https://doi.org/10.1108/JM2-01-2017-0002

Publisher: Emerald Publishing Limited

Copyright © 2018, Emerald Publishing Limited
