Article

Credit Decision Support Based on Real Set of Cash Loans Using Integrated Machine Learning Algorithms

1 Institute of Management, University of Szczecin, Aleja Papieża Jana Pawła II 22A, 70-453 Szczecin, Poland
2 Faculty of Technology, The Jacob of Paradies University, Chopina 52, 66-400 Gorzów Wielkopolski, Poland
3 Faculty of Economics, West Pomeranian University of Technology, Janickiego 31, 71-210 Szczecin, Poland
4 Faculty of Mathematics and Information Science, Informatics, Warsaw University of Technology, Koszykowa 75, 00-662 Warsaw, Poland
5 Faculty of Economic Sciences, University of Warsaw, Długa 44/50, 00-241 Warsaw, Poland
* Authors to whom correspondence should be addressed.
Electronics 2021, 10(17), 2099; https://doi.org/10.3390/electronics10172099
Submission received: 5 August 2021 / Revised: 26 August 2021 / Accepted: 26 August 2021 / Published: 30 August 2021
(This article belongs to the Special Issue Knowledge Engineering and Data Mining)

Abstract

One of the important research problems in the context of financial institutions is the assessment of credit risk and the decision whether to grant or refuse a loan. Recently, machine learning-based methods have been increasingly employed to solve such problems. However, the selection of an appropriate feature selection technique, sampling mechanism, and/or classifier for credit decision support is very challenging and can affect the quality of the loan recommendations. To address this challenging task, this article examines the effectiveness of various data science techniques for credit decision support. In particular, a processing pipeline was designed, which consists of methods for data resampling, feature discretization, feature selection, and binary classification. We suggest building appropriate decision models leveraging pertinent methods for binary classification, feature selection, as well as data resampling and feature discretization. The feasibility of the selected models was analyzed through rigorous experiments on real data describing clients’ ability to repay loans. During the experiments, we analyzed the impact of feature selection on the results of binary classification, and the impact of data resampling with feature discretization on the results of feature selection and binary classification. After experimental evaluation, we found that the correlation-based feature selection technique and the random forest classifier yield superior performance in solving the underlying problem.

1. Introduction

Nowadays, banks and financial institutions carefully analyze the credit risk of their clients [1]. The current world situation, i.e., the COVID-19 pandemic, affects not only people’s lives but also has a negative impact on economic factors, especially those related to the repayment of liabilities by potential borrowers [2]. For this reason, such organizations need credit scoring systems [1] in order to select the most promising clients to work with and to offer well-tailored services to them. These models are particularly suited for financial institutions due to their ability to assess a numerical score for individual customers, which determines their loan repayment probability [3]. Based on this score, the final decision is made as to whether granting the loan is justified or not. Most often, credit risk is assessed on the basis of historical data, using mainly statistical or machine learning methods [4], among them, e.g., rough sets [5], usually combined with probability theory [6], fuzzy sets [7], decision trees [8], neural networks and support vector machines [9], or genetic algorithms [10].
Of particular importance in credit scoring problems are classification models that play the role of decision models [11], usually supported by feature selection, data resampling, and feature discretization methods [12]. Numerous publications apply the above techniques [1,2,3,4,13,14,15,16,17,18]. Selecting a relevant feature subset reduces the computational burden and significantly improves model efficiency and understandability [19]. Moreover, credit scoring models may be sensitive to dataset imbalance, i.e., when the numbers of positive and negative cases are not equally distributed; in that situation, their overall performance may be improved by data resampling [20]. The use of discretization may also have a positive impact on credit scoring models by increasing the efficiency of certain classification algorithms [21]. Unfortunately, the literature on credit scoring contains little research in which all the indicated techniques (feature selection, resampling, discretization, classification) are used in a single process of processing a dataset and building a classification model. In connection with the identified research gap, the question arises whether the combined use of the indicated methods and techniques in dataset processing will increase the effectiveness of classification models.
The aim of this article is to analyze the effectiveness of various classification models in supporting credit decisions. The contribution includes:
  • creation of decision models using different binary classifiers, feature selection methods, as well as data resampling and feature discretization methods;
  • evaluation of models on dataset containing real data of cash loans.
It is important to note that the presented research is a significant extension of our earlier works, in which we examined only selected classifiers and feature selection methods [22] as well as a rough set approach [23].
Section 2 discusses the problem of credit risk assessment and reviews the literature on the subject. Section 3 presents a review of useful methods for classification task, feature selection, data resampling, and feature discretization incorporated in the study, as well as proven measures for assessment of classification models. Section 4 contains a description and explanation of the adopted test procedure. The general results of the research carried out are included in Section 5, while the more detailed results are included in the Appendix A, Appendix B, Appendix C, Appendix D, Appendix E, Appendix F and Appendix G. The paper is summarized with conclusions and proposals for further research presented in Section 6.

2. Literature Review

The subject of interest of authors dealing with financial issues is often credit risk, generally defined as the risk that a business partner will not fully meet its obligations on time or will avoid meeting them altogether [24]. Credit risk can also be understood as the risk of changes in the value of the company’s equity as a result of changes in the creditworthiness of its debtors. It is noted that in recent years a lot of attention has been paid to methods and algorithms for assessing financial credit risk. This was due, among other things, to the occurrence of global financial crises, but also to the need for a thorough assessment of such threats and for forecasting business failures. It should be added that the above-mentioned factors have an impact on the functioning of the economy and on the financial decisions made by societies [25].
Because financial credit risk indicates a risk related to financing, its assessment is aimed at solving the following two categories of problems: credit rating or scoring, and predicting bankruptcy or forecasting a financial crisis of enterprises. Historically, research on financial credit risk assessment was initiated in the 1930s [26] and continued over the years, with considerable success in the 1960s [27]. Nowadays, apart from taking into account the achievements obtained with the use of traditional statistical methods, research focuses primarily on the use of advanced machine learning methods. This approach, without the need to follow strict assumptions, improves the accuracy of the results obtained in a conventional manner. At the same time, it is impossible to indicate a single effective method that is superior to the others. The most recently used intelligent techniques include: artificial neural networks (ANNs), fuzzy set theory (FST), decision trees (DTRs), case-based reasoning (CBR), support vector machines (SVMs), rough set theory (RST), genetic programming (GP), hybrid learning, and ensemble computing [25].
The traditional approach to credit risk assessment focuses on obtaining the optimal linear combination of the input explanatory variables. It is expected that, thanks to these variables, it will be possible to model, analyze, and predict the risk of corporate insolvency. Their use is determined by popularity, but attention is drawn, for example, to the fact that they do not take into account complex relationships between variables. To assess credit risk using statistical models, among others, linear discriminant analysis (LDA), logistic regression (LR), multivariate discriminant analysis (MDA), quadratic discriminant analysis (QDA), factor analysis (FA), risk index models, and conditional probability models are used [25]. Among the works pointing to the domination of statistical methods over other approaches are [28,29].
A group of methods that combines the traditional and intelligent approaches are semi-parametric methods, which are characterized by greater flexibility of the model structure, a clear interpretation of the modelled process, and greater accuracy. More information on this can be found in [30,31]. In the literature on the subject, there are many interesting combinations of parametric, non-parametric, and semi-parametric models, for example, the Klein and Spady model [32] and the combination of the Logit model and the CART model [33]. Another proposal is the integration of a parametric binary logistic regression model (BLRM) and non-parametric models (e.g., SVM, DTR) [34].
Many publications report good results obtained with the use of artificial neural networks [35,36,37]. The feature of networks that makes them useful for the assessment of credit risk is the ability to process non-linear data and approximate most functions. In this way, internal patterns can be found in complex financial data [38]. There are also some limitations to their use, such as the difficulty of explaining the black-box algorithm, time-consuming learning, not providing optimal solutions, and overfitting to the training data.
Another proposal for credit risk assessment are SVMs, which transform non-linear input vectors into a multidimensional feature space. This is possible with the use of kernel functions, which means that the data can be separated by linear models. The interest in SVMs is due to their good performance and the possibility of generalizing from a small set of high-value data [39]. Their effectiveness is noticeable when the input data are non-linear and non-stationary, which results in models supporting credit decisions [40].
The classical classification approach is represented by decision trees. In the case of credit risk, their usefulness results from: easy interpretation of the obtained results, non-linear estimation, non-parametric form, accuracy, possibility of application in the case of continuous and categorical variables, as well as the indication of significant variables. In the discussed field, for example, ID3, C4.5, CART, CHAID, MARS, ADTree [33] can be used.
In the literature on the subject [25], it is possible to note the use of CBR in the subject of credit risk. This approach makes it possible to propose problem-solving by recalling similar experiences. All activities are based on the principle of k-nearest neighbors (kNN), which in the case of classification includes the identified object in the class to which most of its k-nearest neighbors belong. It is suggested to use CBR in the case of small data sets, although it is less precise in relation to other methods used in this type of problem and its improvement is proposed [41].
There have been many interesting publications on credit risk assessment recently. In their work, Wang et al. (2020) [42] presented the results of a study on the assessment of credit risk in the supply chain of commercial banks online. The authors used the literature induction method, the non-linear LS-SVM model and compared the obtained results with the results of the logistic regression model. They found that the LS-SVM evaluation model had a higher classification accuracy than the logistic regression model. In addition, they found that it has a strong generalization capacity and can comprehensively identify credit risk and provide sound, scientific analysis, and is an effective tool supporting the credit risk assessment of small and medium-sized enterprises.
The article by Arora and Kaur (2020) [43], which confirmed the usefulness of modern data mining and machine learning techniques, is also worth mentioning. According to the authors, these methods show precision in predicting credit risk and support taking appropriate decisions. Bolasso (Bootstrap-Lasso) was used in the research. In order to test the predictive accuracy, the features obtained by Bolasso were applied to the following classification algorithms: Random Forest (RF), Support Vector Machine (SVM), Naïve Bayes (NB), and kNN. The authors concluded that the Bolasso-enabled Random Forest algorithm (BS-RF) provides the best credit risk assessment results.
Other conclusions were reached by Froelich and Hajek (2019) [44], who proposed in their previous studies to automate credit risk assessment by using systems based on machine learning methods. The authors concluded that the obtained results are difficult to interpret and do not fully take into account the expert knowledge. In the next step, they applied multi-criteria group decision making methods (MCGDM) to simulate the assessment process performed by a team of credit risk experts. According to the authors, standard MCGDM methods do not take into account high uncertainty and are not effective in the case of a significant impact of the assessed credit risk criteria. Therefore, they proposed an MCGDM model that combines fuzzy sets and fuzzy cognitive maps with the traditional TOPSIS approach. In turn, Heidary Dahooie et al. (2021) [45] proposed a combination of Data Envelopment Analysis (DEA) with the dynamic multi-attribute decision-making method (DMADM), considering it an innovative dynamic decision-making method for assessing loan applications. The credit performance criteria were distinguished on the basis of a literature review and expert opinion. In contrast, the criteria weights were calculated using the dynamic approach to the common set of DEA weights. Then, candidates were prioritized using five Gray MADM methods (including SAW-G, VIKOR-G, TOPSIS-G, ARAS-G and COPRAS-G). In the final study, a new method called the correlation coefficient and standard deviation (CCSD) was used to determine the aggregate rank.
In the summary of the review of credit risk assessment methods, it should be added that in recent years, in line with the observations of Bellacosa (2018) [46], efforts to improve the traditional approach to credit scoring have not always been successful. Compared to traditional credit models, the data used in the new credit models are much more precise, comprehensive, and holistic. These data, combined with modern machine learning (ML) algorithms and artificial intelligence (AI), provide much better calibrated risk assessment models. On the other hand, when comparing ML and AI methods with expert credit risk assessment, it should be noted that modern methods take into account many more decision-making factors than a human can. The expert has knowledge based on their previous experience, but classification models have much more knowledge. The knowledge of classifiers is also based on previous experiences, in this case written as a set of training cases, but their ability to process information is much greater than that of an expert, who has limited perception. Moreover, ML methods, unlike humans, do not get tired, do not get sick, etc. Additionally, in the literature, the advantage of machine learning and data mining methods over expert assessment in complex problems requiring the processing of large amounts of data is noted [47]. On the other hand, there are still areas where the expert outweighs ML and AI methods [48].
The banking sector already has some characteristics such as: advanced computerization (available computing power, modern analytical tools), large amounts of transaction data, financial history of customers, which make it the preferred field for implementing credit risk assessment models based on machine learning and artificial intelligence. The content of the Digital Banking report (2021) [49] presenting current trends and priorities in retail banking shows that most banking institutions know what is needed, and many of them even know how to face the current challenges. The problem, however, is that current banking standards keep organizations from doing this. In the area of credit decisions, this applies to solutions with a very complicated, difficult or even impossible explanation mechanism. An example is neural networks seen as black boxes. What is happening inside such a network cannot be fully explained. Banks in Poland refuse to use such tools, as it is difficult to justify a specific credit decision made on their basis before the Polish Financial Supervision Authority (PFSA). PFSA is sympathetic to traditional scoring and other methods whose results are intuitive, easily interpreted, and easy to argue and explain.

3. Materials and Methods

3.1. Classification Methods

Machine learning can be used for various tasks, among others in classification problems, which consist in predicting whether an object belongs to a certain class on the basis of well-defined characteristics of that object. Usually, discrimination of a selected object is based on earlier training of the classifier, during which the classification algorithm attempts to “learn” the real classes of the training objects and which features determine whether the objects belong to specific classes [47,50]. Methods for the classification task are, e.g., the C4.5 decision tree (C4.5), random forest (RF), decision table (DT), naive Bayes (NB) classifier, logistic regression (LR), or the k-nearest neighbors (kNN) algorithm. The characteristics of the selected classification methods are presented in Table 1.
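To make the setup concrete, the sketch below shows how such a set of classifiers could be instantiated in Python with scikit-learn. This is only an illustrative assumption: the original experiments may have used different tooling, there is no direct scikit-learn counterpart of the decision table (DT) classifier, and the ORF parameters repeat the values reported later in Section 4.

```python
# A minimal sketch of the classifier families listed above, assuming scikit-learn.
from sklearn.tree import DecisionTreeClassifier        # stands in for C4.5
from sklearn.ensemble import RandomForestClassifier    # RF / ORF
from sklearn.naive_bayes import GaussianNB             # NB
from sklearn.linear_model import LogisticRegression    # LR
from sklearn.neighbors import KNeighborsClassifier     # kNN

classifiers = {
    "C4.5-like tree": DecisionTreeClassifier(criterion="entropy"),
    "kNN": KNeighborsClassifier(n_neighbors=5),
    "LR": LogisticRegression(max_iter=1000),
    "NB": GaussianNB(),
    "RF": RandomForestClassifier(n_estimators=100, random_state=0),
    # ORF: optimized settings reported in Section 4 (239 iterations, max tree depth 13)
    "ORF": RandomForestClassifier(n_estimators=239, max_depth=13, random_state=0),
    # Note: a decision table (DT) classifier has no direct scikit-learn equivalent.
}
```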

3.2. Feature Selection Methods

One of the basic issues in the classification task is the multidimensionality of the object to be assigned to a specific class. This is a serious obstacle that decreases the accuracy of classification algorithms, known as the “curse of dimensionality” [66]. Dimensionality reduction of the feature space lowers computational and data collection costs, which eventually improves predictions [67]. Tools that can be used for this task are called feature selection methods.
The feature selection process focuses on identifying relevant features in the dataset and rejecting redundant ones [68]. For this purpose, various algorithms are used to assess the importance of particular features in the classification task. Feature selection methods are divided into three categories: filters, wrappers, and embedded methods [69]. Filters and wrappers are usually composed of four elements (steps): generation of a feature subset, evaluation of the subset, a stopping criterion, and result validation [70]. By describing individual elements of the feature selection methods, it is possible to point out significant differences between these groups of methods.
Filters are based on an independent evaluation of features using general data characteristics. For example, Pearson correlation coefficients between each input and the selected output can be used. The feature subset is then determined by defining a threshold for the minimum correlation value, or a particular number of features to be selected, before training the machine learning algorithm [71].
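For illustration, a correlation filter of this kind could be sketched as below; the function name and the threshold value are hypothetical, chosen only to show the mechanism.

```python
# A minimal sketch of a Pearson-correlation filter, assuming numeric features in a
# pandas DataFrame X and a binary target y; the threshold of 0.1 is illustrative.
import numpy as np
import pandas as pd

def pearson_filter(X: pd.DataFrame, y: pd.Series, threshold: float = 0.1) -> list:
    """Keep features whose absolute Pearson correlation with y exceeds the threshold."""
    scores = X.apply(lambda col: abs(np.corrcoef(col, y)[0, 1]))
    return scores[scores > threshold].index.tolist()

# selected = pearson_filter(X_train, y_train)
# X_train, X_test = X_train[selected], X_test[selected]
```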
Wrappers evaluate individual feature subsets using the machine learning algorithm that will eventually be used in the classification or regression task. In this case, the training algorithm is included in the feature selection procedure; therefore, cross-validation based on the set of training cases is usually used to estimate the accuracy of the classifier built on a specific feature subset [72].
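A wrapper of this kind can be sketched, for example, with scikit-learn’s SequentialFeatureSelector; the choice of kNN as the wrapped classifier and the target subset size are assumptions made only for illustration.

```python
# A minimal sketch of a wrapper-style feature selector, assuming scikit-learn.
from sklearn.feature_selection import SequentialFeatureSelector
from sklearn.neighbors import KNeighborsClassifier

wrapper = SequentialFeatureSelector(
    KNeighborsClassifier(n_neighbors=5),  # the classifier "wrapped" by the search
    n_features_to_select=10,              # illustrative target subset size
    direction="forward",
    cv=5,                                 # accuracy estimated by cross-validation
)
# wrapper.fit(X_train, y_train)
# X_train_reduced = wrapper.transform(X_train)
```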
Embedded methods are similar to wrappers in that they use classification to perform the task of feature selection. The main difference between wrappers and embedded methods is the “embedding” of the selection procedure into the selected classifier. In other words, the dimensionality of the training objects subject to classification is reduced while the classifier model is being built [73]. For instance, in decision trees unnecessary features are eliminated by pruning and by defining the minimum number of objects in a node.
Wrappers differ only in the applied machine learning algorithms, so, as in the case of embedded methods, the results obtained using them depend solely on the quality of the machine learning algorithm and how well the algorithm fits a specific classification task. Wrappers and embedded methods analyze the features of the objects contained in the training set only in terms of obtaining the maximum number of correct classifications, omitting other characteristics of the features. Meanwhile, the general characteristics of the features seem important enough that they should affect the selection of the individual features that describe the training and test cases. Therefore, filtration procedures that determine the significance of individual attributes using measures other than the classifier’s accuracy seem more interesting. Filter methods use various measures to assess the relevance of each feature, e.g., distance functions and different correlation measures.
A popular filter technique that uses a distance function is ReliefF [74]. On the other hand, the most numerous group of filters are correlation procedures, among which the most promising are: Symmetrical Uncertainty (SU) [75], Correlation-based Feature Selection (CFS) [76], Fast Correlation-Based Filter (FCBF) [77], and Significance Attribute (SA) [78]. The basic characteristics of each method are presented in Table 2.
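As an example of such a correlation measure, the Symmetrical Uncertainty score for two discrete (or already discretized) variables can be computed as in the sketch below; this is a didactic version, not the exact implementation used in the study.

```python
# A minimal sketch of Symmetrical Uncertainty: SU = 2 * I(X;Y) / (H(X) + H(Y)).
from collections import Counter
import math

def entropy(values) -> float:
    counts, n = Counter(values), len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def symmetrical_uncertainty(x, y) -> float:
    h_x, h_y = entropy(x), entropy(y)
    h_xy = entropy(list(zip(x, y)))     # joint entropy H(X, Y)
    info_gain = h_x + h_y - h_xy        # mutual information I(X; Y)
    return 2.0 * info_gain / (h_x + h_y) if (h_x + h_y) > 0 else 0.0

# SU is 1.0 for perfectly dependent variables and 0.0 for independent ones:
print(symmetrical_uncertainty([0, 0, 1, 1], [0, 0, 1, 1]))  # -> 1.0
```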

3.3. Resampling Methods

In binary classification, when the class distribution in the training set is imbalanced, i.e., strongly skewed, conventional classifiers maximizing their accuracy usually build models that tend to classify all objects as belonging to the majority class. This results in low accuracy for the minority class, whose objects are underrepresented in the training set, whereas this class is often of utmost importance [84]. To overcome this issue, resampling methods are commonly applied to the training set. The two most popular, yet very simple, techniques in machine learning are random undersampling and random oversampling [20]. In addition to the aforementioned resampling methods, another interesting approach is the Synthetic Minority Over-sampling Technique (SMOTE) [85]. Table 3 lists the main advantages and disadvantages of each of these approaches.
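In code, the resampling step could look like the sketch below, assuming the imbalanced-learn package; as in the procedure of Section 4, only the training set would be resampled.

```python
# A minimal sketch of the resampling step, assuming the imbalanced-learn package.
from imblearn.under_sampling import RandomUnderSampler
from imblearn.over_sampling import SMOTE

def resample(X_train, y_train, strategy="undersample", seed=0):
    if strategy == "undersample":
        sampler = RandomUnderSampler(random_state=seed)
    elif strategy == "smote":
        sampler = SMOTE(random_state=seed)
    else:
        return X_train, y_train          # "none": keep the original class distribution
    return sampler.fit_resample(X_train, y_train)

# X_res, y_res = resample(X_train, y_train, strategy="smote")
```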

3.4. Discretization Methods

Some classification algorithms improve their performance when feature discretization is used; moreover, certain classifiers cannot work without it. Such methods bin continuous features, dividing them into ranges or intervals, and thus convert numerical data into nominal data. The main issue in feature discretization is the appropriate choice of cutpoints, because continuous data can be discretized in an infinite number of ways. A perfect discretization method should find a relatively small number of cutpoints dividing the data into relevant bins. Discretization techniques can be supervised or unsupervised. The results of the first group are superior to those of the second, because supervised methods use the class distribution of the objects as additional information. A great number of methods perform discretization based on class entropy, which is a measure of uncertainty over a finite set of classes. The entropy is calculated for different splits and compared to the entropy of the dataset without splits; this is run recursively until the search stop criterion is met [86]. For instance, the heuristic Minimal Description Length Principle (MDLP) method can be used here. This technique determines whether or not to accept the current cut-off point candidate, stopping the recursion if the specified condition is not met [87]. Entropy-based discretization with the MDLP stop criterion is considered to be one of the best supervised discretization methods [71]. It measures the information gain of a possible cutpoint by comparing entropy values: for each considered cutpoint, the entropy of the input interval is compared to the weighted sum of the entropies of the two output intervals. There are several different criteria for the MDLP stopping condition, including the Fayyad criterion [88] and the Kononenko criterion [89].
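A simplified sketch of such a recursive, entropy-based cutpoint search with the Fayyad MDLP stopping rule is given below; it is a didactic reconstruction under the stated assumptions, not the exact code used in the experiments.

```python
# A simplified sketch of entropy-based discretization with the Fayyad MDLP stop criterion.
import math
from collections import Counter

def _entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def _mdlp_accepts(parent, left, right):
    # Fayyad-Irani MDL test: accept the split only if the information gain exceeds its cost.
    n, k = len(parent), len(set(parent))
    k1, k2 = len(set(left)), len(set(right))
    gain = _entropy(parent) - (len(left) / n) * _entropy(left) - (len(right) / n) * _entropy(right)
    delta = math.log2(3 ** k - 2) - (k * _entropy(parent) - k1 * _entropy(left) - k2 * _entropy(right))
    return gain > (math.log2(n - 1) + delta) / n

def mdlp_cutpoints(values, labels):
    """Return accepted cutpoints for one continuous feature, found recursively."""
    pairs = sorted(zip(values, labels))
    xs, ys = [p[0] for p in pairs], [p[1] for p in pairs]
    if len(set(ys)) < 2 or len(set(xs)) < 2:
        return []
    best, best_gain = None, -1.0
    for i in range(1, len(xs)):                      # candidate boundaries between sorted values
        if xs[i] == xs[i - 1]:
            continue
        gain = _entropy(ys) - (i / len(ys)) * _entropy(ys[:i]) - ((len(ys) - i) / len(ys)) * _entropy(ys[i:])
        if gain > best_gain:
            best, best_gain = i, gain
    if best is None or not _mdlp_accepts(ys, ys[:best], ys[best:]):
        return []
    cut = (xs[best - 1] + xs[best]) / 2.0
    return sorted(mdlp_cutpoints(xs[:best], ys[:best]) + [cut] + mdlp_cutpoints(xs[best:], ys[best:]))
```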

3.5. Classification Evaluation Metrics

The quality of the classification can be evaluated by, e.g., Receiver Operating Characteristic curve (ROC), Area Under Receiver Operating Characteristic curve (AUROC) and Gini coefficient (GC). Another interesting measure is Precision-Recall Curve (PRC).
The ROC is a graphical representation of the effectiveness of a predictive model, obtained by plotting the quantitative characteristics of the binary classifiers derived from such a model over a variety of cut-off points. It shows the relationship between the True Positive Rate (TPR) and the False Positive Rate (FPR). TPR is calculated by Equation (1) [85]:
TPR = TP / (TP + FN)    (1)
where TP indicates the number of true positives, i.e., cases where the model predicts the positive class correctly, and FN indicates the number of false negatives, i.e., cases where the model predicts the negative class incorrectly. In turn, FPR is defined by Equation (2) [85]:
FPR = FP / (FP + TN)    (2)
where FP indicates the number of false positives, i.e., cases where the model predicts the positive class incorrectly, and TN indicates the number of true negatives, i.e., cases where the model predicts the negative class correctly.
AUROC measures the classifier’s accuracy and can be interpreted as the probability that the model ranks a randomly chosen positive case above a randomly chosen negative one. Geometrically, it is the area below the ROC. The higher the value of AUROC, the better the classification results of the model, where AUROC < 0.5 means an invalid classifier, i.e., worse than random, AUROC = 0.5 means a random classifier, and AUROC = 1 means an ideal classifier [85].
GC is a measure of the model’s quality, interpreted as the degree to which the classifier approaches the ideal one. GC is calculated from Equation (3):
GC = 2 · AUROC − 1    (3)
The higher the value of GC, the better the classifier, where GC = 0 means a random classifier and GC = 1 means an ideal classifier [90].
The PRC shows the dependence between precision (Positive Predictive Value, PPV) and recall (TPR) for the classifier, where the former is calculated by Equation (4) [91]:
PPV = TP / (TP + FP)    (4)
A large area under the PRC (AUPRC) represents both high precision and high recall, where high precision corresponds to a low false positive rate and high recall corresponds to a low false negative rate. High scores for precision and recall indicate that the classifier returns accurate results and that most of the returned results are positive [91]. PRCs are often zigzag curves with oscillations; because of this, they tend to cross each other much more often than ROCs, which makes comparison difficult for the researcher. It is therefore recommended to use PRCs in addition to ROCs to obtain a complete overview when evaluating and comparing classifier models [92].
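Under the assumption of a scikit-learn setup with a fitted model exposing predict_proba and a 0/1 numpy target vector, the above measures could be computed as in the sketch below.

```python
# A minimal sketch of computing AUROC, GC and AUPRC, assuming scikit-learn
# and y_test given as a 0/1 numpy array.
from sklearn.metrics import roc_auc_score, average_precision_score

def evaluate(model, X_test, y_test):
    scores = model.predict_proba(X_test)[:, 1]                    # probability of the positive class
    auroc = roc_auc_score(y_test, scores)
    gc = 2 * auroc - 1                                            # Gini coefficient, Equation (3)
    auprc_pos = average_precision_score(y_test, scores)           # AUPRC for the positive class
    auprc_neg = average_precision_score(1 - y_test, 1 - scores)   # AUPRC for the negative class
    return {"AUROC": auroc, "GC": gc, "AUPRC positive": auprc_pos, "AUPRC negative": auprc_neg}
```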

4. Research Procedure

The dataset on which the experiment was conducted contains anonymized data about loan repayment and borrowers. The set consists of 91,759 records described by 272 conditional attributes (features) and a decision attribute. It was split in a 70/30% proportion into a training set (64,230 records) and a testing set (27,529 records) [93].
The final research was preceded by a series of preliminary tests, during which the following were selected:
  • the most promising and various filter methods for feature selection;
  • different classifiers, bearing in mind their core algorithm, way of knowledge representation and ability to explain classification of cases.
During the preliminary tests, it was noticed that one of the models with outstanding classification results can be the random forest; therefore, its more detailed examination allowed us to select optimal parameters, i.e., number of iterations = 239 and maximum tree depth = 13 [22].
In this research study it was assumed that various combinations would be tested, consisting of filter methods (SU, FCBF, CFS, SA, ReliefF), classifier models (C4.5, DT, kNN, LR, NB, RF, optimized random forest (ORF)), resampling methods (without resampling, random undersampling, SMOTE), and feature discretization (without discretization, Fayyad criterion, Kononenko criterion). Taking into account the number of methodological approaches considered in each group, this gives 315 different scenarios and the same number of classification models supporting credit decisions. In practice, this number was smaller, because the number of conducted scenarios was limited by omitting selected resampling and discretization algorithms. Here, the following heuristic was used: if a specific preprocessing method, i.e., resampling or discretization, does not give satisfactory results, then there is no reason to include it in a subsequent scenario. Moreover, due to the high computational complexity, some scenarios did not use ReliefF. It should be noted that in the case of a large training dataset, this method generally required time-consuming calculations and did not yield acceptable results. Therefore, all scenarios included at least four filter methods (SU, FCBF, CFS, SA) and all seven classifiers. Additionally, it should be clarified that in the case of random undersampling, each scenario was repeated three times, building three different classification models and eventually averaging the results. This approach was followed in order to minimize the impact of the random selection of training cases on the classification results. The research study was divided into four general scenarios in which the following combinations of methods were applied:
  • without resampling, without discretization, feature selection, classification method;
  • resampling, without discretization, feature selection, classification method;
  • without resampling, discretization, feature selection, classification method;
  • resampling, discretization, feature selection, classification method.
Furthermore, at the beginning, classification was performed without using filter methods, i.e., scenario 0. The results of this study served as a reference for the subsequent scenarios in which filter methods were used. With this approach, all research scenarios allowed us to determine:
  • the effect of feature selection on classification;
  • the effect of data resampling on classification with feature selection;
  • the effect of feature discretization on classification with feature selection;
  • the effect of data resampling with feature discretization on classification with feature selection.
Figure 1 depicts the research study that was carried out. It shows that the processing techniques, including feature discretization and feature selection, were applied to the training set and their results were then used on the testing set. This was a necessary step to ensure full consistency between the training set and the testing set. For instance, binning of the training data was performed and then the same bins were applied to the testing data. Likewise, the selection of relevant features was done on the basis of the training set and the redundant features were removed from the testing set. The only processing method applied to the training cases but not to the testing cases was data resampling.
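To tie the steps of Figure 1 together, the sketch below outlines one scenario (random undersampling, a simple correlation filter standing in for CFS, and the ORF settings from this section); all names and parameters are illustrative assumptions, not the original implementation.

```python
# A sketch of one research scenario, assuming numpy arrays, scikit-learn and imbalanced-learn.
import numpy as np
from imblearn.under_sampling import RandomUnderSampler
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score

def run_scenario(X_train, y_train, X_test, y_test, n_features=35, seed=0):
    # 1. Resampling: applied to the training set only.
    X_res, y_res = RandomUnderSampler(random_state=seed).fit_resample(X_train, y_train)

    # 2. Feature selection on the training set; the same columns are then kept in the test set.
    corr = np.array([abs(np.corrcoef(X_res[:, j], y_res)[0, 1]) for j in range(X_res.shape[1])])
    keep = np.argsort(corr)[::-1][:n_features]

    # 3. Classification with the optimized random forest (ORF) parameters reported above.
    model = RandomForestClassifier(n_estimators=239, max_depth=13, random_state=seed)
    model.fit(X_res[:, keep], y_res)

    scores = model.predict_proba(X_test[:, keep])[:, 1]
    return roc_auc_score(y_test, scores)
```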

5. Results and Discussion

Full results of the conducted research study are presented in Appendix A, Appendix B, Appendix C, Appendix D, Appendix E, Appendix F and Appendix G, while this section shows only the best results from each considered scenario. Table 4 depicts the four top classification results from each scenario. From Table 4 it can be stated that the best classification results are obtained by the RF model, possibly with optimization, and the feature selection method that yields the top classification results is mainly CFS. It should also be noted that an overall outstanding result was achieved by RF on the full dataset of 272 features. Obviously, dimensionality reduction of such data is necessary due to the lack of ability to explain the classification and the need to collect a great amount of information in order to classify a new case. Assuming feature selection is performed without resampling or discretization, the best classification results were obtained by ORF. However, if both feature selection and classification accuracy are important, then the RF model should be supported by data resampling, which balances the class distribution. Moreover, in the case of RF, as well as LR and DT, undersampling provides better classification results than discretization (cf. Appendix C, Appendix E and Appendix F). The opposite is true for NB, kNN, and C4.5. Furthermore, RF and LR, both with undersampling, yield superior results compared to the combination of undersampling and discretization. On the other hand, this combination improves the quality of classification for NB. Additionally, in order to obtain acceptable results using LR or NB, it is necessary to employ the previously mentioned methods, while for the RF model they can be entirely omitted. Moreover, the randomness of the applied undersampling algorithm also plays a vital role; it has a serious impact on the obtained feature sets and, thus, on the classification results. Nevertheless, the conclusions drawn here hold for each research case performed during the study. It should be noted that in order to maximize classification accuracy, it is recommended to carry out several draws and select the set of training cases that allows the best results to be obtained for the classification of the testing cases.
On the other hand, if the selection of the smallest possible feature set is of great importance, then FCBF should be used. Table 5 depicts the four top classification results from each scenario where feature sets were obtained by the above method. From Table 5 it can be stated that feature sets consisting of five or six features do not provide acceptable classification results. Bearing in mind that the minimum number of features and the maximum accuracy are essential, the results of RF in scenario 2 and NB in scenario 4 are worth noting. DT also achieves relatively good classification results compared to other models. The main reason is its built-in feature selection, i.e., DT automatically reduces the feature space. When the input feature set is relatively large, this can cause a deterioration of classification compared to other models, but with a low number of features no additional reduction is performed, so there is no negative impact on the final results.

6. Conclusions

The article deals with the problem of credit decisions based on machine learning methods. In particular, the effects of applying classifiers together with other machine learning methods in the processing of the credit dataset were verified. Summarizing the results of the conducted research study, it is possible to indicate premises related to the use of the individual methods, i.e., feature selection, binary classification, data resampling, and feature discretization:
  • if classification result is important, then RF will return good results over a full set of data;
  • if both feature selection and classification accuracy are important, then acceptable results will be obtained by undersampling with CFS and RF;
  • if both minimum number of features and classification accuracy are important, then fair results will be achieved by following approaches: (1) CFS with RF, (2) undersampling with FCBF and RF, (3) discretization with CFS and LR or NB, (4) undersampling with discretization, FCBF and NB.
Of course, the above heuristics do not exhaust the topic of choosing an appropriate approach to the credit scoring problem. In some business cases, apart from the classification result and the size of the feature set, the ability to explain the classification may also be important, which gives a certain advantage. Moreover, restricting oneself only to classification accuracy, it is not possible to clearly determine whether it is better to use AUROC, AUPRC, or GC. Basically, the selection of a classification model will consist in seeking a trade-off between the inherent features of the classifiers. Therefore, further research is targeted at the selection of a specific approach using a classifier for credit decisions in support of stakeholders (e.g., banks), depending on their individual needs (i.e., actual requirements and preferences). The assessment of various approaches is a multi-criteria decision problem; thus, multi-criteria decision analysis [94] will be involved.

Author Contributions

Conceptualization, P.Z. and A.B.; methodology, P.Z.; validation, J.B.; formal analysis, A.B.; investigation, P.Z.; resources, A.R.-Z.; data curation, M.P.; writing—original draft preparation, P.Z.; writing—review and editing, J.B. and A.B.; supervision, P.Z. and J.B.; project administration, A.R.-Z.; funding acquisition, D.W. All authors have read and agreed to the published version of the manuscript.

Funding

The research is partially financed through the National Centre for Research and Development, Poland (grant no. POIR.01.01.01-00-0322/18-00).

Data Availability Statement

Data available on request.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Results of Scenario 0

Table A1. Classification results for complete feature set.

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.604 | 0.858 | 0.677 | 0.850 | 0.811 | 0.880 | 0.914 |
| GC | 0.208 | 0.716 | 0.354 | 0.700 | 0.622 | 0.760 | 0.828 |
| AUPRC negative | 0.991 | 0.998 | 0.993 | 0.997 | 0.997 | 0.998 | 0.999 |
| AUPRC positive | 0.048 | 0.087 | 0.053 | 0.096 | 0.013 | 0.276 | 0.275 |
| AUPRC mean | 0.982 | 0.989 | 0.984 | 0.988 | 0.987 | 0.991 | 0.991 |

Appendix B. Results of Scenario 1

Table A2. Classification results for feature subset selected by CFS (13 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.795 | 0.828 | 0.640 | 0.834 | 0.831 | 0.852 | 0.881 |
| GC | 0.590 | 0.656 | 0.280 | 0.668 | 0.662 | 0.704 | 0.762 |
| AUPRC negative | 0.996 | 0.997 | 0.993 | 0.998 | 0.998 | 0.998 | 0.998 |
| AUPRC positive | 0.073 | 0.063 | 0.029 | 0.072 | 0.046 | 0.137 | 0.147 |
| AUPRC mean | 0.987 | 0.988 | 0.983 | 0.988 | 0.988 | 0.989 | 0.990 |
Table A3. Classification results for feature subset selected by FCBF (six features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.730 | 0.812 | 0.658 | 0.786 | 0.792 | 0.740 | 0.813 |
| GC | 0.460 | 0.624 | 0.316 | 0.572 | 0.584 | 0.480 | 0.626 |
| AUPRC negative | 0.995 | 0.997 | 0.993 | 0.997 | 0.997 | 0.995 | 0.997 |
| AUPRC positive | 0.067 | 0.074 | 0.037 | 0.044 | 0.035 | 0.071 | 0.086 |
| AUPRC mean | 0.985 | 0.987 | 0.983 | 0.987 | 0.987 | 0.985 | 0.988 |
Table A4. Classification results for feature subset selected by SU (13 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.657 | 0.732 | 0.568 | 0.742 | 0.719 | 0.660 | 0.729 |
| GC | 0.314 | 0.464 | 0.136 | 0.484 | 0.438 | 0.320 | 0.458 |
| AUPRC negative | 0.993 | 0.995 | 0.991 | 0.995 | 0.995 | 0.993 | 0.995 |
| AUPRC positive | 0.050 | 0.060 | 0.039 | 0.042 | 0.032 | 0.062 | 0.079 |
| AUPRC mean | 0.983 | 0.985 | 0.982 | 0.986 | 0.985 | 0.983 | 0.985 |
Table A5. Classification results for feature subset selected by SA (13 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.657 | 0.734 | 0.606 | 0.723 | 0.724 | 0.716 | 0.756 |
| GC | 0.314 | 0.468 | 0.212 | 0.446 | 0.448 | 0.432 | 0.512 |
| AUPRC negative | 0.993 | 0.995 | 0.992 | 0.995 | 0.995 | 0.994 | 0.996 |
| AUPRC positive | 0.056 | 0.046 | 0.047 | 0.044 | 0.041 | 0.106 | 0.113 |
| AUPRC mean | 0.983 | 0.985 | 0.982 | 0.985 | 0.985 | 0.985 | 0.987 |

Appendix C. Results of Scenario 2—Random Undersampling

Table A6. Classification results for feature subset selected by CFS (35/27/37 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.768 | 0.852 | 0.744 | 0.891 | 0.849 | 0.901 | 0.902 |
| GC | 0.537 | 0.704 | 0.488 | 0.781 | 0.699 | 0.802 | 0.805 |
| AUPRC negative | 0.995 | 0.998 | 0.996 | 0.999 | 0.998 | 0.999 | 0.999 |
| AUPRC positive | 0.031 | 0.057 | 0.022 | 0.082 | 0.049 | 0.111 | 0.118 |
| AUPRC mean | 0.986 | 0.988 | 0.986 | 0.989 | 0.988 | 0.990 | 0.990 |
Table A7. Classification results for feature subset selected by FCBF (12/14/11 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.780 | 0.849 | 0.740 | 0.848 | 0.819 | 0.872 | 0.874 |
| GC | 0.560 | 0.699 | 0.481 | 0.696 | 0.637 | 0.743 | 0.749 |
| AUPRC negative | 0.996 | 0.998 | 0.996 | 0.998 | 0.997 | 0.998 | 0.998 |
| AUPRC positive | 0.030 | 0.057 | 0.028 | 0.064 | 0.048 | 0.089 | 0.094 |
| AUPRC mean | 0.986 | 0.988 | 0.986 | 0.988 | 0.988 | 0.989 | 0.989 |
Table A8. Classification results for feature subset selected by SU (35/27/37 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.777 | 0.850 | 0.760 | 0.885 | 0.847 | 0.890 | 0.893 |
| GC | 0.555 | 0.701 | 0.519 | 0.769 | 0.693 | 0.781 | 0.786 |
| AUPRC negative | 0.996 | 0.998 | 0.996 | 0.999 | 0.998 | 0.999 | 0.999 |
| AUPRC positive | 0.032 | 0.061 | 0.024 | 0.081 | 0.048 | 0.105 | 0.111 |
| AUPRC mean | 0.986 | 0.988 | 0.986 | 0.989 | 0.988 | 0.989 | 0.989 |
Table A9. Classification results for feature subset selected by SA (35/27/37 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.797 | 0.843 | 0.758 | 0.867 | 0.829 | 0.879 | 0.886 |
| GC | 0.594 | 0.687 | 0.515 | 0.735 | 0.657 | 0.758 | 0.772 |
| AUPRC negative | 0.997 | 0.998 | 0.996 | 0.998 | 0.997 | 0.999 | 0.999 |
| AUPRC positive | 0.033 | 0.061 | 0.025 | 0.075 | 0.045 | 0.095 | 0.100 |
| AUPRC mean | 0.987 | 0.988 | 0.986 | 0.992 | 0.988 | 0.989 | 0.989 |
Table A10. Classification results for feature subset selected by ReliefF (35/27/37 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.767 | 0.843 | 0.725 | 0.734 | 0.838 | 0.852 | 0.853 |
| GC | 0.535 | 0.687 | 0.450 | 0.469 | 0.675 | 0.703 | 0.705 |
| AUPRC negative | 0.996 | 0.998 | 0.995 | 0.995 | 0.998 | 0.998 | 0.998 |
| AUPRC positive | 0.030 | 0.056 | 0.028 | 0.031 | 0.049 | 0.077 | 0.078 |
| AUPRC mean | 0.986 | 0.988 | 0.985 | 0.985 | 0.988 | 0.989 | 0.989 |

Appendix D. Results of Scenario 2—SMOTE

Table A11. Classification results for feature subset selected by CFS (42 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.582 | 0.653 | 0.686 | 0.727 | 0.765 | 0.883 | 0.875 |
| GC | 0.164 | 0.306 | 0.372 | 0.454 | 0.530 | 0.766 | 0.750 |
| AUPRC negative | 0.990 | 0.994 | 0.994 | 0.995 | 0.996 | 0.998 | 0.998 |
| AUPRC positive | 0.045 | 0.021 | 0.026 | 0.049 | 0.040 | 0.158 | 0.111 |
| AUPRC mean | 0.980 | 0.984 | 0.984 | 0.986 | 0.986 | 0.990 | 0.989 |
Table A12. Classification results for feature subset selected by FCBF (28 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.573 | 0.653 | 0.682 | 0.757 | 0.754 | 0.869 | 0.869 |
| GC | 0.146 | 0.306 | 0.364 | 0.514 | 0.508 | 0.738 | 0.738 |
| AUPRC negative | 0.990 | 0.994 | 0.994 | 0.996 | 0.995 | 0.998 | 0.998 |
| AUPRC positive | 0.034 | 0.021 | 0.027 | 0.054 | 0.040 | 0.131 | 0.083 |
| AUPRC mean | 0.980 | 0.984 | 0.984 | 0.986 | 0.986 | 0.989 | 0.989 |
Table A13. Classification results for feature subset selected by SU (42 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.684 | 0.633 | 0.751 | 0.863 | 0.775 | 0.846 | 0.849 |
| GC | 0.368 | 0.266 | 0.502 | 0.726 | 0.550 | 0.692 | 0.698 |
| AUPRC negative | 0.993 | 0.994 | 0.995 | 0.998 | 0.996 | 0.997 | 0.998 |
| AUPRC positive | 0.061 | 0.021 | 0.055 | 0.073 | 0.043 | 0.116 | 0.092 |
| AUPRC mean | 0.983 | 0.984 | 0.985 | 0.989 | 0.986 | 0.988 | 0.989 |
Table A14. Classification results for feature subset selected by SA (42 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.639 | 0.623 | 0.772 | 0.851 | 0.776 | 0.839 | 0.839 |
| GC | 0.278 | 0.246 | 0.544 | 0.702 | 0.552 | 0.678 | 0.678 |
| AUPRC negative | 0.992 | 0.993 | 0.996 | 0.998 | 0.996 | 0.997 | 0.998 |
| AUPRC positive | 0.042 | 0.019 | 0.066 | 0.072 | 0.048 | 0.117 | 0.099 |
| AUPRC mean | 0.982 | 0.983 | 0.986 | 0.989 | 0.986 | 0.988 | 0.988 |

Appendix E. Results of Scenario 3—Fayyad Criterion

Table A15. Classification results for feature subset selected by CFS (14 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.801 | 0.848 | 0.746 | 0.879 | 0.877 | 0.761 | 0.771 |
| GC | 0.602 | 0.696 | 0.492 | 0.758 | 0.754 | 0.522 | 0.542 |
| AUPRC negative | 0.996 | 0.998 | 0.996 | 0.998 | 0.998 | 0.995 | 0.995 |
| AUPRC positive | 0.084 | 0.082 | 0.083 | 0.117 | 0.101 | 0.084 | 0.085 |
| AUPRC mean | 0.987 | 0.988 | 0.986 | 0.989 | 0.989 | 0.985 | 0.986 |
Table A16. Classification results for feature subset selected by FCBF (five features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.656 | 0.826 | 0.813 | 0.822 | 0.824 | 0.810 | 0.810 |
| GC | 0.312 | 0.652 | 0.626 | 0.644 | 0.648 | 0.620 | 0.620 |
| AUPRC negative | 0.993 | 0.997 | 0.997 | 0.997 | 0.997 | 0.997 | 0.996 |
| AUPRC positive | 0.045 | 0.069 | 0.074 | 0.081 | 0.082 | 0.070 | 0.071 |
| AUPRC mean | 0.983 | 0.987 | 0.987 | 0.987 | 0.988 | 0.987 | 0.987 |
Table A17. Classification results for feature subset selected by SU (14 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.649 | 0.732 | 0.678 | 0.717 | 0.720 | 0.653 | 0.647 |
| GC | 0.298 | 0.464 | 0.356 | 0.434 | 0.440 | 0.306 | 0.294 |
| AUPRC negative | 0.993 | 0.995 | 0.993 | 0.994 | 0.994 | 0.992 | 0.992 |
| AUPRC positive | 0.052 | 0.060 | 0.062 | 0.072 | 0.067 | 0.052 | 0.053 |
| AUPRC mean | 0.983 | 0.985 | 0.984 | 0.985 | 0.985 | 0.983 | 0.983 |
Table A18. Classification results for feature subset selected by SA (14 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.634 | 0.707 | 0.651 | 0.721 | 0.717 | 0.609 | 0.600 |
| GC | 0.268 | 0.414 | 0.302 | 0.442 | 0.434 | 0.218 | 0.200 |
| AUPRC negative | 0.993 | 0.994 | 0.993 | 0.994 | 0.994 | 0.992 | 0.991 |
| AUPRC positive | 0.056 | 0.060 | 0.071 | 0.072 | 0.071 | 0.052 | 0.053 |
| AUPRC mean | 0.983 | 0.985 | 0.984 | 0.985 | 0.985 | 0.982 | 0.982 |

Appendix F. Results of Scenario 3—Kononenko Criterion

Table A19. Classification results for feature subset selected by CFS (13 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.814 | 0.847 | 0.744 | 0.876 | 0.880 | 0.787 | 0.791 |
| GC | 0.628 | 0.694 | 0.488 | 0.752 | 0.760 | 0.574 | 0.582 |
| AUPRC negative | 0.997 | 0.997 | 0.995 | 0.998 | 0.998 | 0.995 | 0.996 |
| AUPRC positive | 0.086 | 0.083 | 0.087 | 0.116 | 0.102 | 0.089 | 0.089 |
| AUPRC mean | 0.987 | 0.988 | 0.986 | 0.989 | 0.989 | 0.986 | 0.986 |
Table A20. Classification results for feature subset selected by FCBF (five features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.656 | 0.826 | 0.813 | 0.822 | 0.824 | 0.810 | 0.810 |
| GC | 0.312 | 0.652 | 0.626 | 0.644 | 0.648 | 0.620 | 0.620 |
| AUPRC negative | 0.993 | 0.997 | 0.997 | 0.997 | 0.997 | 0.997 | 0.996 |
| AUPRC positive | 0.045 | 0.069 | 0.074 | 0.081 | 0.082 | 0.070 | 0.071 |
| AUPRC mean | 0.983 | 0.987 | 0.987 | 0.987 | 0.988 | 0.987 | 0.987 |
Table A21. Classification results for feature subset selected by SU (13 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.646 | 0.727 | 0.658 | 0.725 | 0.728 | 0.644 | 0.643 |
| GC | 0.292 | 0.454 | 0.316 | 0.450 | 0.456 | 0.288 | 0.286 |
| AUPRC negative | 0.993 | 0.995 | 0.993 | 0.995 | 0.995 | 0.992 | 0.992 |
| AUPRC positive | 0.049 | 0.047 | 0.060 | 0.069 | 0.065 | 0.052 | 0.052 |
| AUPRC mean | 0.983 | 0.985 | 0.984 | 0.985 | 0.985 | 0.983 | 0.983 |
Table A22. Classification results for feature subset selected by SA (13 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.817 | 0.825 | 0.703 | 0.793 | 0.814 | 0.713 | 0.714 |
| GC | 0.634 | 0.650 | 0.406 | 0.586 | 0.628 | 0.426 | 0.428 |
| AUPRC negative | 0.997 | 0.997 | 0.994 | 0.996 | 0.997 | 0.994 | 0.994 |
| AUPRC positive | 0.084 | 0.063 | 0.067 | 0.075 | 0.079 | 0.066 | 0.066 |
| AUPRC mean | 0.987 | 0.988 | 0.985 | 0.987 | 0.988 | 0.984 | 0.984 |

Appendix G. Results of Scenario 4—Random Undersampling, Kononenko Criterion

Table A23. Classification results for feature subset selected by CFS (35 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.821 | 0.852 | 0.871 | 0.883 | 0.884 | 0.895 | 0.897 |
| GC | 0.642 | 0.704 | 0.742 | 0.766 | 0.768 | 0.790 | 0.794 |
| AUPRC negative | 0.997 | 0.998 | 0.998 | 0.998 | 0.999 | 0.999 | 0.999 |
| AUPRC positive | 0.036 | 0.049 | 0.091 | 0.098 | 0.112 | 0.116 | 0.121 |
| AUPRC mean | 0.987 | 0.988 | 0.989 | 0.989 | 0.989 | 0.990 | 0.990 |
Table A24. Classification results for feature subset selected by FCBF (10 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.840 | 0.843 | 0.861 | 0.866 | 0.878 | 0.861 | 0.861 |
| GC | 0.680 | 0.686 | 0.722 | 0.732 | 0.756 | 0.722 | 0.722 |
| AUPRC negative | 0.997 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 |
| AUPRC positive | 0.045 | 0.054 | 0.063 | 0.081 | 0.084 | 0.065 | 0.065 |
| AUPRC mean | 0.988 | 0.988 | 0.989 | 0.989 | 0.989 | 0.989 | 0.989 |
Table A25. Classification results for feature subset selected by SU (35 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.830 | 0.851 | 0.860 | 0.884 | 0.860 | 0.865 | 0.870 |
| GC | 0.660 | 0.702 | 0.720 | 0.768 | 0.720 | 0.730 | 0.740 |
| AUPRC negative | 0.997 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 |
| AUPRC positive | 0.039 | 0.056 | 0.072 | 0.094 | 0.079 | 0.085 | 0.094 |
| AUPRC mean | 0.988 | 0.988 | 0.989 | 0.989 | 0.989 | 0.989 | 0.989 |
Table A26. Classification results for feature subset selected by SA (35 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.823 | 0.852 | 0.861 | 0.872 | 0.854 | 0.873 | 0.877 |
| GC | 0.646 | 0.704 | 0.722 | 0.744 | 0.708 | 0.746 | 0.754 |
| AUPRC negative | 0.997 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 |
| AUPRC positive | 0.041 | 0.049 | 0.079 | 0.092 | 0.083 | 0.086 | 0.091 |
| AUPRC mean | 0.987 | 0.988 | 0.989 | 0.989 | 0.989 | 0.989 | 0.989 |
Table A27. Classification results for feature subset selected by ReliefF (35 features).

| Measure | C4.5 | DT | kNN | LR | NB | RF | ORF |
|---|---|---|---|---|---|---|---|
| AUROC | 0.784 | 0.851 | 0.842 | 0.828 | 0.844 | 0.876 | 0.878 |
| GC | 0.568 | 0.702 | 0.684 | 0.656 | 0.688 | 0.752 | 0.756 |
| AUPRC negative | 0.996 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 | 0.998 |
| AUPRC positive | 0.030 | 0.056 | 0.070 | 0.067 | 0.080 | 0.105 | 0.104 |
| AUPRC mean | 0.986 | 0.988 | 0.988 | 0.988 | 0.989 | 0.989 | 0.989 |

References

  1. Koutanaei, F.N.; Sajedi, H.; Khanbabaei, M. A Hybrid Data Mining Model of Feature Selection Algorithms and Ensemble Learning Classifiers for Credit Scoring. J. Retail. Consum. Serv. 2015, 27, 11–23. [Google Scholar] [CrossRef]
  2. Wang, D.; Zhang, Z.; Bai, R.; Mao, Y. A Hybrid System with Filter Approach and Multiple Population Genetic Algorithm for Feature Selection in Credit Scoring. J. Comput. Appl. Math. 2018, 329, 307–321. [Google Scholar] [CrossRef]
  3. Tunç, A. Feature Selection in Credibility Study for Finance Sector. Procedia Comput. Sci. 2019, 158, 254–259. [Google Scholar] [CrossRef]
  4. Tripathi, D.; Edla, D.R.; Kuppili, V.; Bablani, A.; Dharavath, R. Credit Scoring Model Based on Weighted Voting and Cluster Based Feature Selection. Procedia Comput. Sci. 2018, 132, 22–31. [Google Scholar] [CrossRef]
  5. Pawlak, Z. Rough Sets and Fuzzy Sets. Fuzzy Sets Syst. 1985, 17, 99–102. [Google Scholar] [CrossRef]
  6. Maldonado, S.; Peters, G.; Weber, R. Credit Scoring using Three-Way Decisions with Probabilistic Rough Sets. Inf. Sci. 2020, 507, 700–714. [Google Scholar] [CrossRef]
  7. Capotorti, A.; Barbanera, E. Credit Scoring Analysis using a Fuzzy Probabilistic Rough Set Model. Comput. Stat. Data Anal. 2012, 56, 981–994. [Google Scholar] [CrossRef]
  8. Zhou, X.; Zhang, D.; Jiang, Y. A New Credit Scoring Method Based on Rough Sets and Decision Tree. In Advances in Knowledge Discovery and Data Mining; Washio, T., Suzuki, E., Ting, K.M., Inokuchi, A., Eds.; Springer: Berlin/Heidelberg, Germany, 2008; pp. 1081–1089. [Google Scholar]
  9. Zhou, J.; Tian, J. Credit Risk Assessment Based on Rough Set Theory and Fuzzy Support Vector Machine; Atlantis Press: Paris, France, 2007. [Google Scholar]
  10. Zhou, J.; Bai, T. Credit Risk Assessment using Rough Set Theory and GA-Based SVM. In Proceedings of the 2008 the 3rd International Conference on Grid and Pervasive Computing—Workshops, Kunming, China, 25–28 May 2008; pp. 320–325. [Google Scholar] [CrossRef]
  11. Ziemba, P. Multi-Criteria Fuzzy Evaluation of the Planned Offshore Wind Farm Investments in Poland. Energies 2021, 14, 978. [Google Scholar] [CrossRef]
  12. López, J.; Maldonado, S. Profit-Based Credit Scoring Based on Robust Optimization and Feature Selection. Inf. Sci. 2019, 500, 190–202. [Google Scholar] [CrossRef]
  13. Liu, Y.; Schumann, M. Data Mining Feature Selection for Credit Scoring Models. J. Oper. Res. Soc. 2005, 56, 1099–1108. [Google Scholar] [CrossRef]
  14. Somol, P.; Baesens, B.; Pudil, P.; Vanthienen, J. Filter-versus Wrapper-Based Feature Selection for Credit Scoring. Int. J. Intell. Syst. 2005, 20, 985–999. [Google Scholar] [CrossRef]
  15. Ha, S.; Nguyen, H.-N. Credit Scoring with a Feature Selection Approach Based Deep Learning. In MATEC Web of Conferences; EDP Sciences: Les ulis, France, 2016; Volume 54, p. 05004. [Google Scholar] [CrossRef] [Green Version]
  16. Aryuni, M.; Madyatmadja, E. Feature Selection in Credit Scoring Model for Credit Card Applicants in XYZ Bank: A Comparative Study. Int. J. Multimed. Ubiquitous Eng. 2015, 10, 17–24. [Google Scholar] [CrossRef] [Green Version]
  17. Boughaci, D.; Alkhawaldeh, A.A. Three Local Search-Based Methods for Feature Selection in Credit Scoring. Vietnam J. Comput. Sci. 2018, 5, 107–121. [Google Scholar] [CrossRef] [Green Version]
  18. Van, S.H.; Ha, N.N.; Bao, H.N.T. A Hybrid Feature Selection Method for Credit Scoring. EAI Endorsed Trans. Context-Aware Syst. Appl. 2017, 4, e2. [Google Scholar]
  19. Kozodoi, N.; Lessmann, S.; Papakonstantinou, K.; Gatsoulis, Y.; Baesens, B. A Multi-Objective Approach for Profit-Driven Feature Selection in Credit Scoring. Decis. Support Syst. 2019, 120, 106–117. [Google Scholar] [CrossRef]
  20. Guo, X.; Yin, Y.; Dong, C.; Yang, G.; Zhou, G. On the Class Imbalance Problem. In Proceedings of the Fourth International Conference on Natural Computation, Jinan, China, 18–20 October 2008; Volume 4. [Google Scholar] [CrossRef]
  21. García, S.; Luengo, J.; Sáez, J.A.; López, V.; Herrera, F. A Survey of Discretization Techniques: Taxonomy and Empirical Analysis in Supervised Learning. IEEE Trans. Knowl. Data Eng. 2013, 25, 734–750. [Google Scholar] [CrossRef]
  22. Ziemba, P.; Radomska-Zalas, A.; Becker, J. Client Evaluation Decision Models in the Credit Scoring Tasks. Procedia Comput. Sci. 2020, 176, 3301–3309. [Google Scholar] [CrossRef]
  23. Becker, J.; Radomska-Zalas, A.; Ziemba, P. Rough Set Theory in the Classification of Loan Applications. Procedia Comput. Sci. 2020, 176, 3235–3244. [Google Scholar] [CrossRef]
  24. Andersson, F.; Mausser, H.; Rosen, D.; Uryasev, S. Credit Risk Optimization with Conditional Value-at Risk Criterion. Math. Program. 2001, 89, 273–291. [Google Scholar] [CrossRef]
  25. Chen, N.; Ribeiro, B.; Chen, A. Financial Credit Risk Assessment: A Recent Review. Artif. Intell. Rev. 2016, 45, 1–23. [Google Scholar] [CrossRef]
  26. Shen, G.; Jia, W. The Prediction Model of Financial Crisis Based on the Combination of Principle Component Analysis and Support Vector Machine. Open J. Soc. Sci. 2014, 2, 204–212. [Google Scholar] [CrossRef] [Green Version]
  27. Altman, E.I. Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. J. Financ. 1968, 23, 589–609. [Google Scholar] [CrossRef]
  28. Kouki, M.; Elkhaldi, A. Toward a Predicting Model of Firm Bankruptcy: Evidence from the Tunisian Context. Middle East. Financ. Econ. 2011, 14, 26–43. [Google Scholar]
  29. Kwak, W.; Shi, Y.; Kou, G. Bankruptcy Prediction for Korean Firms after the 1997 Financial Crisis: Using a Multiple Criteria Linear Programming Data Mining Approach. Rev. Quant. Financ. Account. 2012, 38, 441–453. [Google Scholar] [CrossRef]
  30. Cheng, K.F.; Chu, C.K.; Hwang, R.-C. Predicting Bankruptcy using the Discrete-Time Semiparametric Hazard Model. Quant. Financ. 2010, 10, 1055–1066. [Google Scholar] [CrossRef]
  31. Hwang, R.-C.; Chung, H.; Chu, C.K. Predicting Issuer Credit Ratings using a Semiparametric Method. J. Empir. Financ. 2010, 17, 120–137. [Google Scholar] [CrossRef]
  32. Klein, R.; Spady, R.H. An Efficient Semiparametric Estimator for Binary Response Models. Econometrica 1993, 61, 387–421. [Google Scholar] [CrossRef] [Green Version]
  33. Brezigar-Masten, A.; Masten, I. CART-Based Selection of Bankruptcy Predictors for the Logit Model. Expert Syst. Appl. 2012, 39, 10153–10159. [Google Scholar] [CrossRef]
  34. Li, J.; Pan, L.; Chen, M.; Yang, X. Parametric and Non-Parametric Combination Model to Enhance Overall Performance on Default Prediction. J. Syst. Sci. Complex. 2014, 27, 950–969. [Google Scholar] [CrossRef]
  35. Mokhatab Rafiei, F.; Manzari, S.M.; Bostanian, S. Financial Health Prediction Models using Artificial Neural Networks, Genetic Algorithm and Multivariate Discriminant Analysis: Iranian Evidence. Expert Syst. Appl. 2011, 38, 10210–10217. [Google Scholar] [CrossRef]
  36. Chen, N.; Vieira, A.; Ribeiro, B.; Duarte, J.; Neves, J. A Stable Credit Rating Model Based on Learning Vector Quantization. Intell. Data Anal. 2011, 15, 237–250. [Google Scholar] [CrossRef]
  37. Blanco, A.; Pino-Mejías, R.; Lara, J.; Rayo, S. Credit Scoring Models for the Microfinance Industry using Neural Networks: Evidence from Peru. Expert Syst. Appl. 2013, 40, 356–364. [Google Scholar] [CrossRef]
  38. Huang, F. A Genetic Fuzzy Neural Network for Bankruptcy Prediction in Chinese Corporations. In Proceedings of the 2008 International Conference on Risk Management & Engineering Management, Beijing, China, 4–6 November 2008; pp. 542–546. [Google Scholar]
  39. Yang, Z.; You, W.; Ji, G. Using Partial Least Squares and Support Vector Machines for Bankruptcy Prediction. Expert Syst. Appl. 2011, 38, 8336–8342. [Google Scholar] [CrossRef]
  40. Jeganathan, J.; Joseph, K.S.; Vaishnavi, J. Bankruptcy Prediction using Svm and Hybrid Svm Survey. Int. J. Comput. Appl. 2011, 34, 39–45. [Google Scholar]
  41. Li, H.; Adeli, H.; Sun, J.; Han, J.-G. Hybridizing Principles of TOPSIS with Case-Based Reasoning for Business Failure Prediction. Comput. Oper. Res. 2011, 38, 409–419. [Google Scholar] [CrossRef]
  42. Wang, F.; Ding, L.; Yu, H.; Zhao, Y. Big Data Analytics on Enterprise Credit Risk Evaluation of E-Business Platform. Inf. Syst. E-Bus. Manag. 2020, 18, 311–350. [Google Scholar] [CrossRef]
  43. Arora, N.; Kaur, P.D. A Bolasso Based Consistent Feature Selection Enabled Random Forest Classification Algorithm: An Application to Credit Risk Assessment. Appl. Soft Comput. 2020, 86, 105936. [Google Scholar] [CrossRef]
  44. Froelich, W.; Hajek, P. IVIFCM-TOPSIS for Bank Credit Risk Assessment. In Intelligent Decision Technologies 2019; Czarnowski, I., Howlett, R.J., Jain, L.C., Eds.; Springer: Singapore, 2020; pp. 99–108. [Google Scholar]
  45. Heidary Dahooie, J.; Razavi Hajiagha, S.H.; Farazmehr, S.; Zavadskas, E.K.; Antucheviciene, J. A Novel Dynamic Credit Risk Evaluation Method using Data Envelopment Analysis with Common Weights and Combination of Multi-Attribute Decision-Making Methods. Comput. Oper. Res. 2021, 129, 105223. [Google Scholar] [CrossRef]
  46. Bellacosa, M. AI Can Transform Trade Finance through Better SME Credit Scoring. Available online: https://www.theglobaltreasurer.com/2018/06/08/ai-can-transform-trade-finance-through-better-sme-credit-scoring/ (accessed on 19 August 2021).
  47. Ziemba, P.; Jankowski, J.; Wątróbski, J.; Piwowarski, M. Web Projects Evaluation using the Method of Significant Website Assessment Criteria Detection. In Transactions on Computational Collective Intelligence XXII; Nguyen, N.T., Kowalczyk, R., Eds.; Springer: Berlin/Heidelberg, Germany, 2016; pp. 167–188. [Google Scholar]
  48. Ärje, J.; Raitoharju, J.; Iosifidis, A.; Tirronen, V.; Meissner, K.; Gabbouj, M.; Kiranyaz, S.; Kärkkäinen, S. Human Experts vs. Machines in Taxa Recognition. Signal Process. Image Commun. 2020, 87, 115917. [Google Scholar] [CrossRef]
  49. Marous, J. Retail Banking Trends and Priorities; Temenos: Geneva, Switzerland, 2021; p. 119. [Google Scholar]
  50. Sulikowski, P.; Zdziebko, T. Deep Learning-Enhanced Framework for Performance Evaluation of a Recommending Interface with Varied Recommendation Position and Intensity Based on Eye-Tracking Equipment Data Processing. Electronics 2020, 9, 266. [Google Scholar] [CrossRef] [Green Version]
  51. Quinlan, J.R. C4.5: Programs for Machine Learning; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1993; ISBN 978-1-55860-238-0. [Google Scholar]
  52. Wang, X.; Zhou, C.; Xu, X. Application of C4.5 Decision Tree for Scholarship Evaluations. Procedia Comput. Sci. 2019, 151, 179–184. [Google Scholar] [CrossRef]
  53. Breiman, L. Random Forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef] [Green Version]
  54. Sulikowski, P.; Zdziebko, T.; Turzyński, D. Modeling Online User Product Interest for Recommender Systems and Ergonomics Studies. Concurr. Comput. Pract. Exp. 2019, 31, e4301. [Google Scholar] [CrossRef]
  55. Demski, T. Od Pojedynczych Drzew do Losowego Lasu [From Single Trees to a Random Forest]; StatSoft Polska: Kraków, Poland, 2011. (In Polish) [Google Scholar]
  56. Breiman, L. Bagging Predictors. Mach. Learn. 1996, 24, 123–140. [Google Scholar] [CrossRef] [Green Version]
  57. Kohavi, R. The Power of Decision Tables. In European Conference on Machine Learning; Springer: Berlin/Heidelberg, Germany, 1995; pp. 174–189. [Google Scholar]
  58. Kalmegh, S.R. Comparative Analysis of the WEKA Classifiers Rules Conjunctiverule & Decisiontable on Indian News Dataset by using Different Test Mode. Int. J. Eng. Sci. Invent. 2018, 7, 2319–6734. [Google Scholar]
  59. Perzyk, M.; Biernacki, R. Zaawansowane metody statystyczne w sterowaniu procesami produkcyjnymi [Advanced Statistical Methods in the Control of Production Processes]. Arch. Odlew. 2004, 4, 19–28. (In Polish) [Google Scholar]
  60. John, G.H.; Langley, P. Estimating Continuous Distributions in Bayesian Classifiers. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–20 August 1995; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; pp. 338–345. [Google Scholar]
  61. StatSoft. Available online: https://www.statsoft.pl (accessed on 28 April 2021).
  62. Hastie, T.; Tibshirani, R.; Friedman, J. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed.; Springer Series in Statistics; Springer: New York, NY, USA, 2009; ISBN 978-0-387-84857-0. [Google Scholar]
  63. Le Cessie, S.; Van Houwelingen, J.C. Ridge Estimators in Logistic Regression. J. R. Stat. Soc. Ser. C (Appl. Stat.) 1992, 41, 191–201. [Google Scholar] [CrossRef]
  64. Guo, G.; Wang, H.; Bell, D.; Bi, Y.; Greer, K. KNN Model-Based Approach in Classification. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE; Meersman, R., Tari, Z., Schmidt, D.C., Eds.; Springer: Berlin/Heidelberg, Germany, 2003; pp. 986–996. [Google Scholar]
  65. Marques de Sá, J.P. Pattern Recognition: Concepts, Methods and Applications; Springer: Berlin/Heidelberg, Germany, 2001; ISBN 978-3-642-62677-7. [Google Scholar]
  66. Chizi, B.; Maimon, O. Dimension Reduction and Feature Selection. In Data Mining and Knowledge Discovery Handbook; Maimon, O., Rokach, L., Eds.; Springer: Boston, MA, USA, 2005; pp. 93–111. ISBN 978-0-387-25465-4. [Google Scholar]
  67. Guyon, I. Practical Feature Selection: From Correlation to Causality. In Mining Massive Data Sets for Security—Advances in Data Mining, Search, Social Networks and Text Mining, and Their Applications to Security; IOS Press: Amsterdam, The Netherlands, 2008; pp. 27–43. [Google Scholar]
  68. Ziemba, P.; Piwowarski, M.; Jankowski, J.; Wątróbski, J. Method of Criteria Selection and Weights Calculation in the Process of Web Projects Evaluation. In Computational Collective Intelligence; Technologies and Applications; Hwang, D., Jung, J.J., Nguyen, N.-T., Eds.; Springer International Publishing: Cham, Switzerland, 2014; pp. 684–693. [Google Scholar]
  69. Biswas, S.; Bordoloi, M.; Purkayastha, B. Review on Feature Selection and Classification using Neuro-Fuzzy Approaches. Int. J. Appl. Evol. Comput. 2016, 7, 28–44. [Google Scholar] [CrossRef]
  70. Liu, H.; Yu, L.; Motoda, H. Feature Extraction, Selection, and Construction. In The Handbook of Data Mining; Lawrence Erlbaum Associates: Mahwah, NJ, USA, 2003; pp. 409–424. [Google Scholar]
  71. Witten, I.H.; Frank, E.; Hall, M.A. Data Mining: Practical Machine Learning Tools and Techniques; Elsevier: Amsterdam, The Netherlands, 2011; ISBN 978-0-12-374856-0. [Google Scholar]
  72. Hall, M.A.; Holmes, G. Benchmarking Attribute Selection Techniques for Discrete Class Data Mining. IEEE Trans. Knowl. Data Eng. 2003, 15, 1437–1447. [Google Scholar] [CrossRef] [Green Version]
  73. Chandrashekar, G.; Sahin, F. A Survey on Feature Selection Methods. Comput. Electr. Eng. 2014, 40, 16–28. [Google Scholar] [CrossRef]
  74. Bins, J.; Draper, B. Evaluating Feature Relevance: Reducing Bias in Relief. In Proceedings of the 6th Joint Conference on Information Science, Research Triangle Park, NC, USA, 8–13 March 2002; pp. 757–760. [Google Scholar]
  75. Yang, Q.; Shao, J.; Scholz, M.; Plant, C. Feature Selection Methods for Characterizing and Classifying Adaptive Sustainable Flood Retention Basins. Water Res. 2011, 45, 993–1004. [Google Scholar] [CrossRef] [PubMed]
  76. Hall, M.A.; Smith, L.A. Feature Selection for Machine Learning: Comparing a Correlation-Based Filter Approach to the Wrapper. In Proceedings of the Twelfth International Florida Artificial Intelligence Research Society Conference, Orlando, FL, USA, 1–5 May 1999; AAAI Press: Palo Alto, CA, USA, 1999; pp. 235–239. [Google Scholar]
  77. Yu, L.; Liu, H. Feature Selection for High-Dimensional Data: A Fast Correlation-Based Filter Solution. In Proceedings of the 20th International Conference on Machine Learning, Washington, DC, USA, 21–24 August 2003; Volume 2, pp. 856–863. [Google Scholar]
  78. Ahmad, A.; Dey, L. A Feature Selection Technique for Classificatory Analysis. Pattern Recognit. Lett. 2005, 26, 43–56. [Google Scholar] [CrossRef]
  79. Chang, C.-C. Generalized Iterative RELIEF for Supervised Distance Metric Learning. Pattern Recognit. 2010, 43, 2971–2981. [Google Scholar] [CrossRef]
  80. Kononenko, I.; Hong, S.J. Attribute Selection for Modelling. Future Gener. Comput. Syst. 1997, 13, 181–195. [Google Scholar] [CrossRef]
  81. Kononenko, I. Estimating Attributes: Analysis and Extensions of RELIEF. In Machine Learning: ECML-94; Bergadano, F., De Raedt, L., Eds.; Springer: Berlin/Heidelberg, Germany, 1994; pp. 171–182. [Google Scholar]
  82. Senthamarai Kannan, S.; Ramaraj, N. A Novel Hybrid Feature Selection via Symmetrical Uncertainty Ranking Based Local Memetic Search Algorithm. Knowl.-Based Syst. 2010, 23, 580–585. [Google Scholar] [CrossRef]
  83. Hall, M.A. Correlation-Based Feature Selection for Discrete and Numeric Class Machine Learning. In Proceedings of the Seventeenth International Conference on Machine Learning, Stanford, CA, USA, 29 June–2 July 2000; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 2000; pp. 359–366. [Google Scholar]
  84. Pozzolo, A.D.; Caelen, O.; Johnson, R.A.; Bontempi, G. Calibrating Probability with Undersampling for Unbalanced Classification. In Proceedings of the 2015 IEEE Symposium Series on Computational Intelligence, Cape Town, South Africa, 7–10 December 2015; pp. 159–166. [Google Scholar]
  85. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic Minority Over-Sampling Technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  86. De Sá, C.R.; Soares, C.; Knobbe, A.; Azevedo, P.; Jorge, A.M. Multi-Interval Discretization of Continuous Attributes for Label Ranking. In Discovery Science; Fürnkranz, J., Hüllermeier, E., Higuchi, T., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 155–169. [Google Scholar]
  87. Zhu, Q.; Lin, L.; Shyu, M.-L.; Chen, S.-C. Effective Supervised Discretization for Classification Based on Correlation Maximization. In Proceedings of the 2011 IEEE International Conference on Information Reuse Integration, Las Vegas, NV, USA, 3–5 August 2011; pp. 390–395. [Google Scholar]
  88. Fayyad, U.M.; Irani, K.B. Multi-Interval Discretization of Continuous-Valued Attributes for Classification Learning. In Proceedings of the 13th International Joint Conference on Artificial Intelligence (IJCAI-93), Chambéry, France, 28 August–3 September 1993; pp. 1022–1027. [Google Scholar]
  89. Kononenko, I. On Biases in Estimating Multi-Valued Attributes. In Proceedings of the 14th International Joint Conference on Artificial Intelligence, Montreal, QC, Canada, 20–25 August 1995; Morgan Kaufmann Publishers Inc.: San Francisco, CA, USA, 1995; Volume 2, pp. 1034–1040. [Google Scholar]
  90. Duda, R.O.; Hart, P.E.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: Hoboken, NJ, USA, 2012; Available online: https://www.wiley.com/en-us/Pattern+Classification%2C+2nd+Edition-p-9781118586006 (accessed on 12 November 2019).
  91. Boyd, K.; Eng, K.H.; Page, C.D. Area under the Precision-Recall Curve: Point Estimates and Confidence Intervals. In Machine Learning and Knowledge Discovery in Databases; Blockeel, H., Kersting, K., Nijssen, S., Železný, F., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 451–466. [Google Scholar]
  92. Saito, T.; Rehmsmeier, M. The Precision-Recall Plot is More Informative than the ROC Plot When Evaluating Binary Classifiers on Imbalanced Datasets. PLoS ONE 2015, 10, e0118432. [Google Scholar] [CrossRef] [Green Version]
  93. Wierzba, D.; Ziemba, P.; Becker, J. Mendeley Data—Anonymized Data about Loan Repayment and Borrowers. Available online: http://dx.doi.org/10.17632/fr99jcnkxg.2 (accessed on 27 August 2021).
  94. Ziemba, P. Multi-Criteria Approach to Stochastic and Fuzzy Uncertainty in the Selection of Electric Vehicles with High Social Acceptance. Expert Syst. Appl. 2021, 173, 114686. [Google Scholar] [CrossRef]
Figure 1. Scenario-based research study. Abbreviations: RU—Random undersampling, SMOTE—Synthetic Minority Over-sampling Technique, FC—Fayyad criterion-based discretization, KC—Kononenko criterion-based discretization, CFS—Correlation-based Feature Selection, SA—Significance Attribute, SU—Symmetrical Uncertainty, FCBF—Fast Correlation-Based Filter, DT—Decision table, LR—Logistic regression, NB—Naïve Bayes, RF—Random forest, C4.5—C4.5 decision tree, kNN—k-nearest neighbors, ORF—Optimized random forest.
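Figure 1 outlines the scenario-based processing pipeline: data resampling, feature discretization, feature selection, and binary classification. The experiments in the paper were run with WEKA implementations; purely as an illustration, the sketch below wires analogous open-source components together in Python (scikit-learn and imbalanced-learn). Mutual-information ranking and quantile binning are used as stand-ins for the CFS/SU selectors and the Fayyad/Kononenko MDL discretizers, which have no direct scikit-learn equivalents, and the synthetic dataset only mimics the strong class imbalance of the loan data.

```python
# Minimal sketch of one Figure 1 scenario: resampling -> discretization ->
# feature selection -> classification. Stand-ins are used where the WEKA
# methods from the paper (CFS/SU selection, FC/KC discretization) have no
# direct scikit-learn equivalent.
from imblearn.pipeline import Pipeline              # pipeline aware of resamplers
from imblearn.under_sampling import RandomUnderSampler
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.model_selection import cross_val_score
from sklearn.preprocessing import KBinsDiscretizer

# Synthetic, strongly imbalanced data standing in for the cash-loan dataset.
X, y = make_classification(n_samples=5000, n_features=40, n_informative=8,
                           weights=[0.97, 0.03], random_state=0)

pipeline = Pipeline(steps=[
    ("resample", RandomUnderSampler(random_state=0)),             # RU step
    ("discretize", KBinsDiscretizer(n_bins=5, encode="ordinal",
                                    strategy="quantile")),        # stand-in for FC/KC
    ("select", SelectKBest(mutual_info_classif, k=13)),           # stand-in for CFS/SU
    ("classify", RandomForestClassifier(n_estimators=100,
                                        random_state=0)),         # RF classifier
])

scores = cross_val_score(pipeline, X, y, cv=5, scoring="average_precision")
print("AUPRC (positive class), 5-fold CV:", scores.mean())
```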
Table 1. Characteristics of selected classification methods.
Method | Essence of the Method | Advantages | Disadvantages | Ref.
C4.5
Essence: The C4.5 algorithm builds a tree by recursively splitting the dataset on individual variables, visiting each decision node and selecting the split that is optimal according to the chosen criterion.
Advantages:
  • C4.5 is not limited to binary splits, so trees of varied shape can be obtained.
  • When a categorical variable is analyzed, a branch is created for each level of the attribute; as a result, once all possible splits are made, the tree is deeper.
Disadvantages:
  • Only a single value is assigned to the dependent variable.
  • The predicted value can change significantly when the value of one feature changes only slightly.
Ref.: [51,52]
RF
Essence: RF is an ensemble classifier consisting of multiple decision trees. Each tree is grown on a subset of the training data drawn randomly with replacement, and a decision tree is built for that subset. Training ends when the number of trees reaches its maximum or the error on the testing set stops decreasing.
Advantages:
  • Computation can be parallelized, because the trees are grown independently of one another.
  • The ensemble is more stable than a single decision tree and provides improved classification accuracy.
  • Random forest copes with several frequent issues: incomplete data, irrelevant and redundant explanatory variables, and large, complex dependency structures among features.
Disadvantages:
  • The main disadvantage is the loss of interpretability of the trained model.
  • High computational complexity.
Ref.: [53,54,55,56]
DT
Essence: A decision table is an ordered set of if-then rules that can be more compact, and therefore more understandable, than a decision tree. Learning a DT consists mainly of selecting the right attributes to include; usually this is done by measuring the cross-validation performance of tables built on different subsets of attributes and choosing the best-performing subset.
Advantages:
  • DT is one of the simplest possible hypothesis spaces and is usually easy to understand.
  • It is a simpler and less compute-intensive algorithm than decision-tree-based approaches.
  • Leave-one-out cross-validation is very cheap for this kind of classifier.
Disadvantages:
  • The DT algorithm very rarely achieves above-average classification accuracy.
  • The decision table always contains the same number of evaluation conditions and actions to be performed.
  • DT does not depict the flow of logic leading to the solution of a given problem.
Ref.: [57,58]
NB
Essence: NB is a family of algorithms based on a common principle: the value of a given feature is assumed to be independent of the value of any other feature, given the class variable. The purpose of the NB algorithm is to estimate the conditional probability of events.
Advantages:
  • The NB classifier is considered a relatively simple and effective algorithm.
  • NB can analyze any number of independent continuous and categorical variables.
  • It can be used for tasks with two or more output classes, assuming complete independence of the individual variables.
  • It requires only a small amount of training data to estimate the parameters necessary for classification.
  • It is not sensitive to insignificant features.
Disadvantages:
  • NB assumes that all features are independent, which rarely holds in real data and limits the applicability of the algorithm.
  • NB suffers from the 'zero frequency' problem: it assigns zero probability to a categorical variable whose category did not appear in the training dataset.
Ref.: [59,60,61]
LR
Essence: LR is a classification method used when each sample is assigned to one of two classes (binary classification). The model estimates the probability that the dependent variable equals 1.
Advantages:
  • LR takes into account all significant variables and excludes irrelevant features from the model.
  • The resulting model is easy to interpret, because each feature is assigned a single weight.
Disadvantages:
  • The LR model does not capture interactions between independent variables, and the data must not be collinear.
  • The performance of the LR model deteriorates considerably in the presence of outliers, so they should be removed before the analysis.
Ref.: [62,63]
kNN
Essence: kNN is a nonparametric method. The algorithm assumes that similar objects belong to the same class, and the class of a new object is predicted by comparing it with a set of prototype objects.
Advantages:
  • kNN can be used for both regression and classification tasks.
  • It does not require a training phase, as it relies on stored prototype objects.
  • No parameter optimization is needed.
  • It can handle a very large number of classes.
  • Very fast evaluation of new samples.
  • Ease of implementation.
Disadvantages:
  • kNN treats all attributes of the feature space as equally important, which increases the risk that irrelevant or redundant features dominate the significant ones, leading to inferior classification. To avoid this, an appropriate set of features should be selected [39].
Ref.: [64,65]
Table 2. Characteristics of selected feature selection methods.
Method | Group of Methods | Methodological Basics | Applied Heuristics | Essence of the Method | Ref.
ReliefF | distance-based | k-nearest neighbors | good attributes should discriminate between objects belonging to different classes and should have the same value for similar objects belonging to the same class | introduces the concepts of hits and misses, which improve or deteriorate the classifier's accuracy | [79,80,81,74]
SU | correlation-based | entropy, information gain | '1' means that the attribute fully informs us, allowing the class of the object to be predicted; '0' means that the attribute carries no information and prediction is not possible | compensates for the bias of information gain towards multi-valued attributes and normalizes the final score to the range [0, 1] | [75,82]
CFS | correlation-based | SU, Pearson linear correlation | a good subset of features contains attributes that are strongly correlated with a specific class of objects and not correlated with other classes and attributes | a matrix of mutual correlations between attributes, and of correlations between attributes and object classes, is computed first; a forward search is then performed using the "Best First" algorithm | [76,83]
FCBF | correlation-based | SU | only the attributes whose SU values are above a defined threshold are selected for further consideration | the procedure identifies sets of redundant features separately for each feature; the selected attributes are sorted in descending order of SU score and the feature set is examined for redundancy | [77]
SA | correlation-based | probability theory | if an attribute is significant, then there is a high chance that elements with complementary sets of values for this attribute belong to complementary sets of classes; if the class decisions for two sets of elements differ, then the values of a significant attribute should also differ for these two sets | the significance of each attribute is calculated as the average of two associations: of the attribute with the classes and of the classes with the attribute; the attribute is relevant when both association values are high | [78]
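Most of the selectors in Table 2 are built on symmetrical uncertainty (SU), which normalizes the information gain between a (discretized) feature X and the class Y to the range [0, 1]: SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)). The numpy sketch below computes this quantity and applies an FCBF-style relevance threshold; the full FCBF redundancy analysis and the CFS subset search are omitted for brevity, and the threshold value is arbitrary.

```python
# Symmetrical uncertainty SU(X, Y) = 2 * IG(X; Y) / (H(X) + H(Y)) for
# discrete-valued features, plus an FCBF-style relevance cut-off.
import numpy as np

def entropy(values):
    """Shannon entropy (in bits) of a discrete variable."""
    _, counts = np.unique(values, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def symmetrical_uncertainty(x, y):
    """SU between two discrete variables; 0 = no information, 1 = fully informative."""
    # IG(X; Y) = H(X) + H(Y) - H(X, Y)
    joint = np.array([f"{a}|{b}" for a, b in zip(x, y)])
    info_gain = entropy(x) + entropy(y) - entropy(joint)
    denom = entropy(x) + entropy(y)
    return 0.0 if denom == 0 else 2.0 * info_gain / denom

# Tiny example: feature x1 determines the class, x2 is random noise.
rng = np.random.default_rng(0)
y  = rng.integers(0, 2, size=1000)
x1 = y                                   # perfectly informative feature
x2 = rng.integers(0, 3, size=1000)       # irrelevant feature

threshold = 0.1                          # FCBF-style relevance threshold (illustrative)
for name, x in {"x1": x1, "x2": x2}.items():
    su = symmetrical_uncertainty(x, y)
    print(f"SU({name}, class) = {su:.3f}", "-> kept" if su > threshold else "-> dropped")
```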
Table 3. Characteristics of selected resampling methods.
Method | Essence of the Method | Advantages | Disadvantages | Ref.
Random undersampling | assumes that many objects of the majority class are redundant and that randomly deleting them will not significantly change the data distribution | reduces the representation of the majority class by removing random objects of that class until the classes are balanced | there is a risk of removing objects that have a positive impact on the classifier's accuracy | [84]
Random oversampling | increases the size of the minority class by replicating objects belonging to that class | - | risks overfitting the classifier by shifting the model towards the minority class; does not add any new valuable objects of the minority class; classifier training is significantly prolonged because the training set grows | [20,84]
SMOTE oversampling | the minority class is oversampled by generating synthetic objects in the neighborhood of the real objects; among the k nearest neighbors, n ≤ k neighbors are randomly selected and one synthetic object is generated near each of them | by using interpolation instead of replication, as opposed to random oversampling, SMOTE avoids the problem of overfitting | shifts the decision boundaries of the minority class towards the space of the majority class | [20,84,85]
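To make the two resampling mechanisms used later in the experiments concrete, the numpy sketch below illustrates random undersampling and SMOTE-style interpolation on toy data. It is a minimal, self-contained illustration only; in practice a library implementation such as imbalanced-learn would normally be preferred.

```python
# Tiny numpy illustration of random undersampling and SMOTE-style interpolation.
import numpy as np

rng = np.random.default_rng(0)

def random_undersample(X_maj, X_min):
    """Randomly drop majority objects until both classes are equally sized."""
    idx = rng.choice(len(X_maj), size=len(X_min), replace=False)
    return X_maj[idx]

def smote_like(X_min, n_new, k=5):
    """Generate synthetic minority objects between a sample and one of its k neighbors."""
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(len(X_min))
        # distances from X_min[i] to all minority samples (itself excluded below)
        d = np.linalg.norm(X_min - X_min[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]
        j = rng.choice(neighbors)
        gap = rng.random()                       # interpolation factor in [0, 1)
        synthetic.append(X_min[i] + gap * (X_min[j] - X_min[i]))
    return np.array(synthetic)

X_majority = rng.normal(0.0, 1.0, size=(970, 4))   # e.g., repaid loans
X_minority = rng.normal(2.0, 1.0, size=(30, 4))    # e.g., defaulted loans

X_under = random_undersample(X_majority, X_minority)
X_synth = smote_like(X_minority, n_new=100)
print(X_under.shape, X_synth.shape)                # (30, 4) (100, 4)
```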
Table 4. The best classification results from each research scenario.
Scenario | Rank (GC-Based) | Feature Selection | No. of Features | Resampling | Discretization | Classifier | GC | AUPRC Negative | AUPRC Positive
0 | 1 | - | 272 | - | - | ORF | 0.828 | 0.999 | 0.275
0 | 2 | - | 272 | - | - | RF | 0.760 | 0.998 | 0.276
0 | 3 | - | 272 | - | - | DT | 0.716 | 0.998 | 0.087
0 | 4 | - | 272 | - | - | LR | 0.700 | 0.997 | 0.096
1 | 1 | CFS | 13 | - | - | ORF | 0.762 | 0.998 | 0.147
1 | 2 | CFS | 13 | - | - | RF | 0.704 | 0.998 | 0.137
1 | 3 | CFS | 13 | - | - | LR | 0.668 | 0.998 | 0.072
1 | 4 | CFS | 13 | - | - | NB | 0.662 | 0.998 | 0.046
2 | 1 | CFS | 35/27/37 | RU | - | ORF | 0.805 | 0.999 | 0.118
2 | 2 | CFS | 35/27/37 | RU | - | RF | 0.802 | 0.999 | 0.111
2 | 3 | SU | 35/27/37 | RU | - | ORF | 0.786 | 0.999 | 0.111
2 | 4 | SU | 35/27/37 | RU | - | RF | 0.781 | 0.999 | 0.105
3 | 1 | CFS | 13 | - | KC | NB | 0.760 | 0.998 | 0.102
3 | 2 | CFS | 14 | - | FC | LR | 0.758 | 0.998 | 0.117
3 | 3 | CFS | 14 | - | FC | NB | 0.754 | 0.998 | 0.101
3 | 4 | CFS | 13 | - | KC | LR | 0.752 | 0.998 | 0.116
4 | 1 | CFS | 35 | RU | KC | ORF | 0.794 | 0.999 | 0.121
4 | 2 | CFS | 35 | RU | KC | RF | 0.790 | 0.999 | 0.116
4 | 3 | CFS | 35 | RU | KC | NB | 0.768 | 0.999 | 0.112
4 | 4 | SU | 35 | RU | KC | LR | 0.768 | 0.998 | 0.094
Classifier: NB—Naive Bayes, RF—Random Forest, DT—Decision Table, LR—Logistic Regression, ORF—Optimized Random Forest; Resampling: RU—Random Undersampling; Discretization: KC—Kononenko Criterion, FC—Fayyad Criterion; Feature selection: CFS—Correlation-based Feature Selection, SU—Symmetrical Uncertainty.
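Tables 4 and 5 report GC together with the area under the precision-recall curve (AUPRC) computed separately for the negative (majority) and positive (minority) class. Purely as an illustration, and under the assumption that GC denotes the Gini coefficient commonly used in credit scoring (GC = 2 * AUC − 1; see the paper's methodology for the exact definition), the snippet below shows how such figures can be obtained from predicted class probabilities with scikit-learn on synthetic data.

```python
# Illustrative computation of per-class AUPRC and a Gini coefficient.
# GC = 2 * AUC - 1 is an assumption made for this sketch only.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=4000, n_features=30, weights=[0.95, 0.05],
                           random_state=1)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=1)

clf = RandomForestClassifier(n_estimators=200, random_state=1).fit(X_tr, y_tr)
proba_pos = clf.predict_proba(X_te)[:, 1]           # P(class = 1)

auprc_positive = average_precision_score(y_te, proba_pos)
auprc_negative = average_precision_score(1 - y_te, 1 - proba_pos)
gini = 2 * roc_auc_score(y_te, proba_pos) - 1       # assumed definition of GC

print(f"AUPRC positive = {auprc_positive:.3f}, "
      f"AUPRC negative = {auprc_negative:.3f}, GC = {gini:.3f}")
```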
Table 5. The best classification results from each research scenario using FCBF.
Scenario | Rank (GC-Based) | Feature Selection | No. of Features | Resampling | Discretization | Classifier | GC | AUPRC Negative | AUPRC Positive
1 | 1 | FCBF | 6 | - | - | ORF | 0.626 | 0.997 | 0.086
1 | 2 | FCBF | 6 | - | - | DT | 0.624 | 0.997 | 0.074
1 | 3 | FCBF | 6 | - | - | NB | 0.584 | 0.997 | 0.035
1 | 4 | FCBF | 6 | - | - | LR | 0.572 | 0.997 | 0.044
2 | 1 | FCBF | 12 | RU | - | ORF | 0.749 | 0.998 | 0.094
2 | 2 | FCBF | 12 | RU | - | RF | 0.743 | 0.998 | 0.089
2 | 3 | FCBF | 12 | RU | - | DT | 0.699 | 0.998 | 0.057
2 | 4 | FCBF | 12 | RU | - | LR | 0.696 | 0.998 | 0.064
3 | 1 | FCBF | 5 | - | FC/KC | DT | 0.652 | 0.997 | 0.069
3 | 2 | FCBF | 5 | - | FC/KC | NB | 0.648 | 0.997 | 0.082
3 | 3 | FCBF | 5 | - | FC/KC | LR | 0.644 | 0.997 | 0.081
3 | 4 | FCBF | 5 | - | FC/KC | kNN | 0.626 | 0.997 | 0.074
4 | 1 | FCBF | 10 | RU | KC | NB | 0.756 | 0.998 | 0.084
4 | 2 | FCBF | 10 | RU | KC | LR | 0.732 | 0.998 | 0.081
4 | 3 | FCBF | 10 | RU | KC | ORF/RF | 0.722 | 0.998 | 0.065
4 | 4 | FCBF | 10 | RU | KC | kNN | 0.722 | 0.998 | 0.063
Classifier: NB—Naive Bayes, RF—Random Forest, DT—Decision Table, LR—Logistic Regression, ORF—Optimized Random Forest; Resampling: RU—Random Undersampling; Discretization: KC—Kononenko Criterion, FC—Fayyad Criterion; Feature selection: FCBF—Fast Correlation-Based Filter.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
