Article

Methodology for Smooth Transition from Experience-Based to Data-Driven Credit Risk Assessment Modeling under Data Scarcity

1 School of Economics and Management, Shaoyang University, Shaoyang 422000, China
2 Hunan Key Laboratory of Data Science & Blockchain, Business School of Hunan University, Changsha 410082, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(15), 2410; https://doi.org/10.3390/math12152410
Submission received: 12 July 2024 / Revised: 26 July 2024 / Accepted: 31 July 2024 / Published: 2 August 2024
(This article belongs to the Special Issue Machine Learning and Finance)

Abstract

Credit risk refers to the possibility of borrower default, and its assessment is crucial for maintaining financial stability. However, credit risk data usually accumulate gradually, so machine learning techniques may not be readily applicable at the initial stage of the data accumulation process. This article proposes a credit risk modeling methodology, TED-NN, that first constructs an indicator system based on expert experience, assigns initial weights to the indicator system using the Analytic Hierarchy Process (AHP), and then builds a neural network model on top of the indicator system to achieve a smooth transition from an empirical model to a data-driven model. TED-NN automatically adapts to the gradual accumulation of data, which effectively solves the problem of risk modeling and its smooth transition from no data to sufficient data. The effectiveness of this methodology is validated through a specific case of credit risk assessment. Experimental results on a real-world dataset demonstrate that, in the absence of data, the performance of TED-NN is equivalent to the AHP and better than an untrained neural network. As the amount of data increases, TED-NN gradually improves and then surpasses the AHP. When sufficient data are available, its performance approaches that of a fully data-driven neural network model.

1. Introduction

Credit risk assessment is a significant topic in the realm of finance, serving as a critical foundation for decision-making in loan approvals and credit card issuance. According to the Basel II Accord, credit risk is one of the risks that financial institutions face when allocating resources. It is defined as the probability of loss for a lender when a borrower fails to meet the terms of a loan or credit agreement [1]. Each loan default record represents a monetary loss to the lender. For instance, a lending institution that has been in operation for about two years has around 2000 customers, a loan balance of about CNY 1 billion (USD 150 million), and a bad debt loss of about CNY 5 million, with fewer than 20 default records. On average, each default record incurs a loss of approximately CNY 250,000. In this context, enabling financial institutions to accurately assess the likelihood of borrowers defaulting on their obligations is the most effective way to safeguard them against potential losses.
Risk assessment models are used to identify, quantify, and evaluate risks, which can help organizations understand potential risks and take appropriate risk management measures [2]. In the context of finance, these assessment models are essential tools for managing and mitigating the credit risks inherent in lending activities. Assessment models can be broadly categorized into qualitative and quantitative methodologies. Qualitative assessments leverage expert judgment to evaluate risks through observation, investigation, and analysis, which is especially effective for risks that are difficult to quantify. Common qualitative assessment techniques encompass brainstorming sessions, Delphi consultations, in-depth interviews with experts, and comprehensive scenario analyses [2,3].
In contrast, quantitative assessments rely on numerical methods to measure risk by quantifying the risk factors and potential for loss. This characteristic makes quantitative assessments suitable for situations where precise numerical analysis is required. Typical quantitative risk assessment methods include Monte Carlo simulation, sensitivity analysis, probabilistic risk assessment, fault tree analysis, decision tree analysis, risk matrices, etc. [4,5]. In practical applications, hybrid methodologies that integrate both qualitative and quantitative risk assessment methods have gained significant popularity. Identifying risk factors is crucial in credit risk assessment. Widely used methodologies begin by employing qualitative techniques, such as brainstorming or expert interviews, to identify and describe potential risk factors. With the insights of experts, these risks are preliminarily prioritized. Following this, probabilistic analysis and various mathematical techniques are applied to quantify risk factors, allowing for the detailed analysis and comprehensive computation of potential risks [5,6]. In these methods, risk assessment factors are usually organized into a hierarchical risk indicator system based on expert judgment. Determining the appropriate weights for various risk indicators is key to constructing a credit risk assessment model. The Analytic Hierarchy Process (AHP), introduced by Thomas L. Saaty, is a classic technique for prioritizing indicator systems, which harnesses the expertise of professionals to evaluate the relative significance of diverse risk indicators. By constructing a judgment matrix and applying mathematical algorithms, the AHP ascertains the weightage of each indicator. This methodology offers greater precision and consistency compared to the direct assignment of weights by experts [7].
The development and validation of credit risk assessment models is an important and complex process. To better quantify financial risks and allocate economic capital, financial institutions have invested substantial resources to develop internal credit risk assessment models over the past decades [8]. Traditional credit assessment modeling approaches have relied on the expertise of professionals [9,10,11,12]. For instance, Roy et al. employed a hybrid Analytic Hierarchy Process to construct a credit scoring model [12]. Wu et al. proposed a multi-criteria decision-making method that enhances the validity of the assessment results by considering the cognitive levels of various experts and the degree of gray relation [10]. Habachi et al. introduced a novel method combining linear discriminant analysis with expert opinions, using Bayesian logic to determine the probability of default. This approach leverages expert knowledge to complement the deficiencies of statistical models, potentially increasing the accuracy and reliability of the model when dealing with complex credit assessment tasks [11]. Credit assessment modeling methods based on expert judgment are capable of handling complex situations where data are insufficient or difficult to quantify, and the decision-making process is usually more interpretable. However, there are limitations to this type of approach, including the possibility of being limited by personal bias and experience, inconsistency in results, poor scalability, and difficulty in quickly adapting to changes in the application environment.
With the dawn of the big data era, the data-driven model-building paradigm has gained widespread recognition, and the integration of credit risk assessment modeling with machine learning has become a significant research direction in the financial industry. The existing research on credit risk assessment modeling mostly focuses on the development and improvement of models, such as the development of credit assessment models based on different machine learning algorithms [13,14,15] or the fusion of data from different sources to improve predictive performance [16,17]. Some researchers have attempted to improve classification accuracy by enhancing data preprocessing strategies, such as developing solutions to address the data imbalance issue, which is a common challenge in credit risk assessment. Previous works have primarily employed techniques like random oversampling, random undersampling, improved oversampling techniques, and methods that combine Random Forest with Recursive Feature Elimination and random oversampling to address the data imbalance issue, thereby improving the model’s generalization capabilities and predictive performance [18,19,20,21]. Some researchers have also optimized models by applying feature selection techniques [22,23,24]. For example, Trivedi explored various combinations of different feature selection techniques (such as Information Gain, Gain Ratio, and Chi-Square) and machine learning classifiers to enhance the robustness and accuracy of the assessment model [22]. Arora et al. introduced a novel Bolasso-enabled Random Forest algorithm to provide greater feature stability, which improves credit risk assessment performance [23]. Despite the excellent predictive capabilities of machine learning models, their “black box” nature often raises concerns among regulatory agencies and users. Therefore, enhancing the interpretability and transparency of models is also an important direction for research. Some studies have proposed frameworks and technologies to make machine learning models more transparent and interpretable [25,26,27]. For instance, Lundberg and Lee introduced SHAP (SHapley Additive exPlanations), offering a unified framework for interpreting the predictions of complex models [26]. By assigning an importance value to each feature’s contribution to a specific prediction, it helps users better understand the model’s decision-making process.
Data-driven modeling approaches diminish the impact of human bias and enhance the accuracy of assessments by learning complex patterns within the data. However, sufficient training data are a necessary prerequisite for adopting this type of approach. Unfortunately, for many newly launched projects, the available sample size is quite limited due to the short operation time. As we know, developing customized credit risk assessment models for different customer groups is crucial for lending institutions. Before or in the early stages of launching a new business, lending institutions may not have enough historical information about this customer segment. As shown in Figure 1, with the development of the business, repayment data gradually accumulate. Data will go through different stages, from no data and limited data to sufficient data. However, achieving a sufficient data volume phase for lending institutions may require several years. Consequently, in the absence of substantial data, the establishment of machine learning-based risk assessment models is not feasible without alternative data sources. Therefore, businesses often commence operations by inviting experts to construct qualitative risk assessment models. Nonetheless, this approach precludes the integration of subsequent operational data into the models. Financial institutions’ business data are often derived from costly lessons learned, such as loan defaults. The inability to leverage such valuable data is indeed regrettable.
This raises the following challenges: How can these valuable data be effectively utilized to improve the assessment models established during the data-scarce phase? How can the model smoothly transition from an empirical model to a data-driven model as data gradually accumulate? Current research seldom focuses on the transition process of credit risk assessment models from a data-scarce state to a data-accumulating state. To address the above-mentioned challenges, this paper proposes a novel credit risk assessment modeling methodology called TED-NN (Transition from Empirical model to Data-driven model). Based on TED-NN, indicators and weights are determined based on expert experience during the no-data phase. As the business develops, a small amount of data can be fully utilized to improve the accuracy of the assessment model. When data are abundant, the model naturally evolves into a fully data-driven model. The entire process does not require model reconstruction; it only needs to iteratively optimize the model using newly generated business data. The main contributions of this paper are as follows:
  • Although the AHP method can construct models using expert experience during the initial stage of business without relying on historical data, the limitations of expert experience and human cognitive biases lead to the subjectivity of the AHP model. The methodology proposed in this paper combines the advantages of the AHP and neural networks. It not only utilizes expert experience to build models in the early stages of business but also incorporates real data to continuously and dynamically update the model.
  • Our method directly constructs a neural network model by referring to the AHP model structure, which avoids the problem of determining the nodes in the neural network design and also gives a specific meaning to each node, thereby improving the comprehensibility of the model.
  • The establishment of our model is relatively straightforward, requiring only slight modifications to the original BP neural network learning algorithm. By directly inheriting the initial weight from the AHP, we ensure that the neural network starts from a favorable position and effectively avoids premature convergence to a local optimum. Furthermore, this model does not require a large amount of training data; even a small amount of data can be used for the model to learn and improve.
The remainder of this paper is organized as follows: Section 2 briefly reviews the preliminary knowledge of the indicator system, the Analytic Hierarchy Process, and the BP algorithm of the neural network. Section 3 describes the main ideas, technical points, and key algorithms of TED-NN in detail. Section 4 provides a specific case to illustrate the application of TED-NN in credit evaluation. Section 5 presents the comparison results between TED-NN, an empirical model, and a data-driven model for the three stages of data accumulation. Finally, Section 6 concludes the paper and gives an outlook on future research work.

2. Preliminaries

2.1. Indicator System of Risk Assessment Model

The establishment of the indicator system is the prerequisite and basis for the risk evaluation model. Constructing an indicator system is the process of decomposing the abstract object into a behavioral, operable structure according to its essential attributes and the identity of certain features [28]. Typically, an indicator system exhibits a multi-layer structure, including an objective indicator, sub-objective indicators, and operational indicators, and it manifests as a tree-like or net-like structure (as shown in Figure 2). The objective indicator is located at the top level of the indicator system and represents the comprehensive evaluation results; operational indicators are at the bottom level and comprise multiple indicators that make it easy to directly obtain quantitative results; and sub-objective indicators are located between the two and can contain multiple layers, reflecting the relationship and mechanism of action between the objective indicator and operational indicators.
Establishing an evaluation indicator system usually includes the following steps: (1) clarify the evaluation purpose; (2) determine the evaluation object; (3) collect and evaluate information related to the object, including expert opinions, stakeholder needs, etc.; (4) in a layered manner, determine the different dimensions of indicators based on the evaluation purpose; (5) select specific operational indicators, which should be quantifiable, comparable, and actionable; (6) assign weights to each indicator based on their importance and contribution to the evaluation objectives. In this process, qualitative methods such as discussion and judgment are usually used to obtain results based on expert experience and knowledge. Assigning weights to various indicators in the system is a crucial and challenging task. In modeling practice, it is very common to adopt subjective experiential methods.
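For illustration, the hierarchy produced by these steps can be represented as a simple nested structure. The sketch below shows a hypothetical three-level system (the indicator names and weights are invented for illustration, not the system used later in this paper), with an objective indicator at the root, weighted sub-objective indicators below it, and weighted operational indicators at the leaves.

```python
# A minimal, hypothetical three-level indicator system. Weights within each
# level are assumed to sum to 1, as produced by step (6) above.
indicator_system = {
    "objective": "credit_score",
    "sub_objectives": [
        {
            "name": "repayment_history",
            "weight": 0.5,
            "operational": [
                {"name": "overdue_count", "weight": 0.7},
                {"name": "months_since_last_default", "weight": 0.3},
            ],
        },
        {
            "name": "financial_status",
            "weight": 0.5,
            "operational": [
                {"name": "income", "weight": 0.6},
                {"name": "liabilities", "weight": 0.4},
            ],
        },
    ],
}

def check_weights(nodes):
    """Verify that the weights at one level sum (approximately) to 1."""
    total = sum(n["weight"] for n in nodes)
    assert abs(total - 1.0) < 1e-6, f"weights sum to {total}, expected 1.0"

check_weights(indicator_system["sub_objectives"])
for sub in indicator_system["sub_objectives"]:
    check_weights(sub["operational"])
```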

2.2. Analytic Hierarchy Process

The AHP was proposed by the American operations researcher Professor T.L. Saaty of the University of Pittsburgh in the early 1970s. It quantifies the differences between judgment elements (indicators) and converts experts' subjective judgments into weight parameters using a mathematical method [29]. It is well suited for assigning weights to the indicators of a hierarchical indicator system. The main process is as follows:
(1)
A set of pairwise comparison matrices (size n × n ) is constructed for each of the lower levels using the relative scale measurement shown in Table 1, and a matrix is built for each element in the higher level immediately following it. The judgment matrices were developed through a structured Delphi process involving a panel of experts in the field of credit risk assessment. Each expert was asked to provide pairwise comparisons of the criteria based on their experience and understanding of the domain. These comparisons were then synthesized into a collective judgment matrix using geometric mean aggregation, which helps to mitigate the impact of potential outliers and biases in individual judgments.
(2)
Pairwise comparisons are conducted based on element dominance. Since reciprocals are automatically assigned in each pairwise comparison, $n(n-1)/2$ judgments are required to develop the set of matrices.
(3)
Hierarchical synthesis is then used to weight the eigenvectors by the criteria weights, and the sum is taken over all weighted eigenvector entries corresponding to those in the next lower level of the hierarchy.
(4)
Having made all pairwise comparisons, the consistency is determined by using the eigenvalue $\lambda_{max}$ to calculate the consistency index $CI$ as follows: $CI = (\lambda_{max} - n)/(n - 1)$, where $n$ represents the size of the matrix. Judgment consistency can be checked by taking the consistency ratio (CR) of $CI$ with the appropriate value in Table 2. The CR is accepted if it does not exceed 0.10. If it is more, the judgment matrix is inconsistent. To obtain a consistent matrix, judgments should be reviewed and improved.
(5)
Steps (1)–(4) are performed for all layers in the hierarchy.
The main advantage of the AHP is that it organically combines qualitative methods with quantitative methods. It utilizes human subjective experience to judge the relative importance of each indicator. The judgment matrix in this method is the core, but it requires significant cost and effort to construct. In practice, due to the limitation of human cognitive ability, it is not easy to construct a judgment matrix that satisfies the consistency requirement on the first attempt, and it is often necessary to repeatedly modify it to achieve consistency. It can be seen that the indicator weights determined by the AHP method are essentially experience-based.
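As a concrete illustration, the minimal sketch below reproduces steps (1)–(4) in Python for the top-level judgment matrix of Table A1 (Appendix A): the weights are taken from the principal eigenvector, and the consistency ratio uses the random index RI = 1.12 for n = 5 (Saaty's standard value, assumed here since Table 2 is not reproduced in the text). The printed values should agree with the weights and the consistency figures reported in the note under Table A1.

```python
import numpy as np

# Pairwise comparison matrix from Table A1 (B1..B5 against the objective A).
A = np.array([
    [1,   1/3, 1/5, 1/7, 1/2],
    [3,   1,   1/3, 1/5, 1  ],
    [5,   3,   1,   1/2, 5  ],
    [7,   5,   2,   1,   7  ],
    [2,   1,   1/5, 1/7, 1  ],
])

eigvals, eigvecs = np.linalg.eig(A)
k = np.argmax(eigvals.real)            # index of the principal eigenvalue
lam_max = eigvals[k].real
w = np.abs(eigvecs[:, k].real)
w = w / w.sum()                        # normalized priority weights

n = A.shape[0]
CI = (lam_max - n) / (n - 1)           # consistency index
RI = 1.12                              # random index for n = 5 (assumed Saaty value)
CR = CI / RI                           # consistency ratio; accepted if <= 0.10

print("weights:", np.round(w, 4))      # ~ [0.0480 0.1014 0.2888 0.4829 0.0789]
print(f"lambda_max={lam_max:.4f}  CI={CI:.4f}  CR={CR:.4f}")
```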

2.3. Back-Propagation Neural Network

A neural network is a machine learning model built from many simple processing units connected in a massively parallel fashion, mimicking the biological nervous system and its neurons. It functions as a sophisticated, adaptive, and nonlinear system capable of acquiring and learning knowledge from its environment through its internal neurons. An artificial neuron generally serves as a nonlinear processing unit that receives multiple inputs and generates a single output; its structure is shown in Figure 3.
The model can be expressed in mathematical language as
$v_k = \sum_i w_{ik} x_i + b_k, \qquad y_k = \phi(v_k)$
where $x_i$ ($i = 1, 2, \ldots, n$) is the input value, $w_{ik}$ is the weight of the $i$-th input connection of the $k$-th neuron, $b_k$ is the bias of neuron $k$, $v_k$ is the weighted sum of the inputs and bias of neuron $k$, $\phi(\cdot)$ is the activation function, which is usually a nonlinear function, and $y_k$ is the output value of the neuron.
The process of improving a neural network’s performance is known as training, which involves optimizing the network’s weights and biases. The back-propagation (BP) algorithm, introduced by Rumelhart et al. in 1986 [30], stands as a foundational learning algorithm for neural networks. The algorithm is divided into two phases. The first phase is the forward process, in which the output of each neural layer is determined iteratively, layer by layer. The second phase is the reverse process, which calculates the error of each hidden layer node layer by layer and thus corrects the connection weight with the front layer. Given a predefined network architecture and initial weights, the BP algorithm iterates through the following processes until convergence:
(1)
Calculate the output of each node from front to back:
$y_k = \phi(v_k)$
(2)
Calculate the corresponding delta value for each node k of the output layer:
$\delta_k = (t_k - y_k)\,\phi'(v_k)$
where $t_k$ is the supervisory signal (supervised output), and $\phi'(\cdot)$ is the derivative of the activation function.
(3)
Calculate the delta value of the hidden layer from back to front:
$\delta_i = \Big(\sum_k \delta_k w_{ik}\Big)\,\phi'(v_i)$
(4)
Calculate and save the weight correction for each node:
$\Delta w_{ik} = \eta\,\delta_k\,x_i$
Here, η is the learning rate used to adjust the rate of learning.
(5)
Correct the connection weight between each node:
$w_{ik} = w_{ik} + \Delta w_{ik}$
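To make the five steps concrete, the following minimal sketch runs one BP iteration for a small fully connected network with one hidden layer; sigmoid activations, the layer sizes, and the learning rate are arbitrary illustrative choices, not the configuration used later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

# Toy network: 3 inputs -> 4 hidden -> 1 output, random initial weights.
x = rng.random(3)                    # one training sample
t = np.array([1.0])                  # supervisory signal
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
eta = 0.5                            # learning rate

# (1) forward pass: compute the output of each node, front to back
v1 = x @ W1 + b1;  y1 = sigmoid(v1)
v2 = y1 @ W2 + b2; y2 = sigmoid(v2)

# (2) delta of the output layer: (t - y) * phi'(v), with phi'(v) = y(1 - y)
d2 = (t - y2) * y2 * (1 - y2)

# (3) delta of the hidden layer, back to front
d1 = (d2 @ W2.T) * y1 * (1 - y1)

# (4) weight corrections and (5) weight (and bias) updates
W2 += eta * np.outer(y1, d2)
W1 += eta * np.outer(x, d1)
b2 += eta * d2
b1 += eta * d1
```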
Although a neural network has many advantages, such as strong robustness, fault tolerance, parallel processing capability, and self-learning, its disadvantages are also obvious. Generally, the structure of a neural network, including the network depth and the number of nodes in each layer, must be established before the training phase. In particular, the configuration of the hidden layers often lacks concrete theoretical guidance, resulting in significant unpredictability. When the activation function is nonlinear, the BP learning algorithm is a local optimization method, and the global optimal solution is not necessarily obtained. Since the neural network is a local optimization method, different initial values of weights may lead to different local minimum points. Therefore, when the initial value is far from the global advantage, the final optimization result may still be far from the global optimal value. The neural network model is like a black box. The weight values seen by opening the black box are not as clear as the weights in the AHP model. And, each node does not have any practical meaning. In addition, a neural network is a supervised machine learning model that requires enough labeled sample data for training, in contrast to the AHP. For the construction of the credit evaluation model, since default data are scarce compared with normal transaction data, it is difficult to collect enough default samples to build the model in advance.

3. Methodology

3.1. Framework and Model Structure

The basic idea of TED-NN is straightforward. Since there are no sample data in the initial stage of the business, TED-NN constructs an indicator system based on expert experience and allocates weights to each indicator through the AHP method (referred to below as the AHP model). As sample data accumulate, a neural network is constructed with reference to the indicator system, and the weights of the AHP model are used as the initial weights of the neural network (referred to as the TED-NN model). Based on the training algorithm, the weights are dynamically updated, and a more accurate and objective machine learning-based evaluation model is gradually obtained. This idea is shown in Figure 4.
An important feature of this framework is that the structure of the neural network mirrors the multi-layer indicator system structure of the empirical model, and its initial connection weights are the same as the weights assigned by the AHP method. That is, the structure of TED-NN resembles the tree or net shown in Figure 2. Moreover, nodes in adjacent levels are not fully connected, and certain nodes may not even be aligned with the levels of other nodes, such as the gray node in Figure 5. Hence, the structure of TED-NN differs significantly from that of a general BP neural network (GNN).
Two advantages arising from this are as follows:
(1)
The neural network model is no longer a black box, and each node corresponds to each indicator in the empirical model, which has a clear meaning. This makes neural network models easier to understand and more interpretable. This is because the architecture of TED-NN mirrors the multi-layer structure of the empirical model’s indicator system. This correspondence is not only structural but also semantic. Each node in TED-NN represents a specific business indicator and is connected according to its logical relationship in the empirical model. This one-to-one correspondence between nodes and indicators allows for tracking how much each indicator contributes to the final decision when the model makes a credit risk assessment, enhancing the interpretability of the model. For example, if the model gives a business a low credit rating, we can check which indicators have the most negative impact, such as discovering that the weights of the “net profit margin” and “asset-liability ratio” are large but the values are low, and the joint influence of the two leads to a low credit rating. This traceability enhances the intelligibility of the decision logic behind credit risk assessment, thus significantly improving the interpretability of the model.
(2)
Due to the local optimization drawback of the BP algorithm, random weight initialization may trap the neural network in a poor solution. The initial weights of the neural network in TED-NN are not randomly assigned but inherited from the AHP model, which effectively mitigates this problem because these weights are expected to be close to the global optimum.
Moreover, considering that the AHP model has a linear structure, the ReLU function is used as the activation function of the hidden layer to fit the output structure of the AHP. In addition, in order to endow the model with nonlinear mapping ability, the output layer nodes adopt a sigmoid activation function.
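For reference, the two activation functions and the intuition behind the choice are sketched below. The indicator values and weights are hypothetical; the point is that, ignoring the bias terms (which the smooth-transition algorithm of Section 3.2.1 later adjusts), ReLU leaves the non-negative weighted sums of normalized indicators unchanged, so the hidden layer initially behaves like the AHP's linear aggregation, while the sigmoid output node supplies the nonlinearity.

```python
import numpy as np

relu = lambda z: np.maximum(z, 0.0)          # hidden-layer activation
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z)) # output-layer activation

x = np.array([0.6, 0.2, 0.9])   # hypothetical normalized indicator values
w = np.array([0.5, 0.3, 0.2])   # hypothetical AHP weights (non-negative, sum to 1)

# The weighted sum is non-negative, so ReLU acts as the identity on it.
assert relu(x @ w) == x @ w

score = sigmoid(x @ w)          # the sigmoid node adds the nonlinear mapping
```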

3.2. Algorithms

As mentioned above, the model structure in TED-NN is different from a GNN. How should the model be trained? Additionally, how can a smooth transition from the AHP model to the neural network model be achieved? These are two key issues that need to be addressed.

3.2.1. Smooth-Transition Algorithm

When the TED-NN model is first established, although its structure and connection weights are the same as those of the AHP model, its input–output mapping is not. That is, for the same input sample, there may be a large difference between the output of the AHP model and the output of the TED-NN model. The reason is that the output node of the AHP model is a weighted sum of the input nodes, which is a simple linear operation, whereas each node in the TED-NN model applies an activation function after weighting and summing the nodes in the previous layer. In addition, each node of the TED-NN model has a bias input term, which also contributes to the inconsistency between the initial TED-NN and AHP outputs. Hence, a smooth-transition algorithm is necessary to make the output of the TED-NN model consistent with that of the AHP model for the same input samples. Figure 6 shows the process of the algorithm.
Firstly, some sample data generated during business execution are input into both the AHP and TED-NN models simultaneously, and the output errors of both are calculated.
Then, in order to reduce the output error value, the bias value (treated as a virtual bias input node) of each node is adjusted according to the gradient descent method by means of back-propagation. The iterative loop is repeated until these errors satisfy the accuracy requirement. It should be noted that the connection weights of nodes in the smoothing process are fixed.
It must be noted that training the smoothing algorithm does not require actual business data. We can generate simulated data and input them into the AHP model to produce output results. The simulated data and the corresponding AHP outputs are then used as the input and output variables, respectively, of the training data for the smoothing algorithm. The pseudocode in Algorithm 1 provides a structured representation of the steps of the smooth-transition algorithm.
Algorithm 1 Smooth-transition algorithm for TED-NN.
1:
Set the initial structure of TED-NN based on the indicator system.
2:
Set the node connection weights $W$ based on the AHP model.
3:
Set the biases $B = (b_1, b_2, \ldots, b_s)$ for the non-input layer nodes, where $s$ is the number of non-input nodes.
4:
Set the activation function to ReLU for each hidden node and sigmoid for the output node.
5:
$k$ = the number of input-layer indicators.
6:
Generate a simulation dataset $X$ of size $n \times k$ as the input dataset, where $X[i] = (x_{i1}, x_{i2}, \ldots, x_{ik})$ for $i = 1, 2, \ldots, n$.
7:
for $i = 1$ to $n$ do
  • $y[i] = \mathrm{AHP}(X[i])$
8:
end for
9:
$Trainset = (X, y)$
10:
Adjust the biases $B$ but fix $W$ to train TED-NN using $Trainset$.
11:
Output $(W, B)$ as the initial TED-NN model.
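The sketch below is one possible NumPy rendering of Algorithm 1, assuming the three-level structure used later in the paper (20 operational indicators grouped under 5 sub-objectives, as in Figure 7 and Table A10). The AHP weights are replaced by random stand-ins; in a real run they would be read from Tables A7 and A8. Only the biases are updated by gradient descent, while the connection weights stay fixed, so the biases bring the TED-NN output into approximate agreement with the AHP output on the simulated data.

```python
import numpy as np

rng = np.random.default_rng(42)
n_in, n_hid = 20, 5

# Connection mask: C1-C7 -> B1, C8-C10 -> B2, C11-C15 -> B3,
# C16-C18 -> B4, C19-C20 -> B5 (the grouping of Figure 7 / Table A10).
groups = [range(0, 7), range(7, 10), range(10, 15), range(15, 18), range(18, 20)]
E1 = np.zeros((n_in, n_hid))
for j, idx in enumerate(groups):
    E1[list(idx), j] = 1.0

W1 = E1 * rng.random((n_in, n_hid))   # stand-in for the local AHP weights (Table A8)
W1 /= W1.sum(axis=0, keepdims=True)   # local weights under each sub-objective sum to 1
W2 = rng.random((n_hid, 1))           # stand-in for the level-1 AHP weights (Table A7)
W2 /= W2.sum()

def forward(X, b1, b2):
    z1 = X @ (E1 * W1) + b1           # masked connections, weights fixed
    h = np.maximum(z1, 0.0)           # ReLU hidden layer
    return 1.0 / (1.0 + np.exp(-(h @ W2 + b2))), z1

# Steps 6-9: simulated inputs and their AHP outputs (no real business data needed).
X = rng.random((500, n_in))
t = X @ ((E1 * W1) @ W2)              # linear AHP score: global weight = local x level weight

# Step 10: adjust only the biases by gradient descent; W1 and W2 stay fixed.
b1, b2, lr = np.zeros(n_hid), np.zeros(1), 0.5
for _ in range(5000):
    y, z1 = forward(X, b1, b2)
    d2 = (y - t) * y * (1 - y)        # delta at the sigmoid output node (MSE loss)
    d1 = (d2 @ W2.T) * (z1 > 0)       # delta at the ReLU hidden nodes
    b2 -= lr * d2.mean(axis=0)
    b1 -= lr * d1.mean(axis=0)

print("mean |TED-NN - AHP| on the simulated data:",
      float(np.abs(forward(X, b1, b2)[0] - t).mean()))
```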

3.2.2. Modification of the BP Algorithm

Because the TED-NN model is not fully connected, its learning algorithm cannot directly copy the traditional BP algorithm. To adapt to these differences, we propose a modification algorithm based on the BP algorithm (MBP). The most important measure we have taken is to introduce a connection matrix E between each level, in which element $E_{ik}$ represents the connection between node $i$ and node $k$:
$E_{ik} = \begin{cases} 1, & \text{if nodes } i \text{ and } k \text{ are connected} \\ 0, & \text{if nodes } i \text{ and } k \text{ are not connected} \end{cases}$
The connection matrix E characterizes the connection relationship between nodes in different layers. Its introduction is intended to ensure that the structure and indicator system of TED-NN are consistent. If all of its element values are set to 1, TED-NN is equivalent to the GNN.
The key steps of the MBP algorithm are shown in Algorithm 2. Obviously, compared to the classical BP algorithm, MBP only makes a few modifications, which adds access to the connection matrix. There are no significant changes in the time or space complexity.
Algorithm 2 Modification algorithm based on the BP algorithm.
1:
Calculate the output of each layer node from front to back:
$v_k = \sum_i E_{ik} w_{ik} x_i + b_k, \qquad y_k = \phi(v_k)$
2:
Calculate the corresponding delta value for each node $k$ of the output layer:
$\delta_k = (t_k - y_k)\,\phi'(v_k)$
3:
Calculate the hidden layer delta values from back to front:
$\delta_j = \Big(\sum_k \delta_k E_{jk} w_{jk}\Big)\,\phi'(v_j)$
4:
Calculate the weight correction for each node:
$\Delta w_{ij} = \eta\,\delta_j\,x_i$
5:
Update the connection weights between all nodes:
$w_{ij} = w_{ij} + \Delta w_{ij}$
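The sketch below illustrates the only substantive change relative to the classical BP update: every use of the weight matrix is multiplied elementwise by the connection matrix E, and the weight correction is masked in the same way so that non-existent connections remain zero. The single masked layer with a sigmoid output and the sizes shown are illustrative assumptions, not the full TED-NN network.

```python
import numpy as np

def mbp_step(x, t, W, b, E, eta=0.1):
    """One MBP update for one masked layer (inputs x -> sigmoid outputs y).

    Setting every entry of E to 1 recovers the classical BP step.
    """
    Wm = E * W                            # step 1: only existing connections contribute
    v = x @ Wm + b
    y = 1.0 / (1.0 + np.exp(-v))
    delta = (t - y) * y * (1 - y)         # step 2: delta at the output nodes
    dW = E * np.outer(x, delta)           # step 4: masked weight correction
    return W + eta * dW, b + eta * delta  # step 5: update (bias handled analogously)

# Tiny usage example: 4 inputs, 2 outputs, and a sparse connection pattern.
rng = np.random.default_rng(1)
E = np.array([[1, 0], [1, 0], [0, 1], [0, 1]], dtype=float)
W, b = E * rng.random((4, 2)), np.zeros(2)
W, b = mbp_step(rng.random(4), np.array([1.0, 0.0]), W, b, E)
```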

4. A Case of TED-NN Modeling

Next, we demonstrate the operational process of TED-NN through a case of credit risk modeling. The well-known German credit evaluation benchmark dataset from the UCI Machine Learning Repository is used in this case. It can be downloaded from https://archive.ics.uci.edu/datasets (accessed on 20 January 2024). The dataset was generated from a credit card business in a German bank and contains twenty attribute variables and one label variable. The label variable has two states that classify customers into two types: good credit and bad credit. The dataset contains 1000 customer samples, including 700 “good” samples and 300 “bad” samples. The variables in this dataset are described in detail in Table 3.

4.1. Construction of Indicator System

Based on the meaning of each variable in the dataset and referring to the famous FICO credit rating indicator system from the United States [31], we constructed a personal credit risk assessment indicator system based on this dataset, as shown in Figure 7.

4.2. Construction of AHP Model

Then, we use the AHP method to assign weights to the above indicator system. Here, Saaty's 1–9 scale is applied to construct the judgment matrices. This method does not compare all factors together but uses a relative scale to compare them with each other, which can reduce the difficulty of comparing factors with different natures so as to improve the accuracy and consistency of judgments. Furthermore, the 1–9 scale provides fine granularity and is capable of comprehensively covering various situations, thereby better capturing the subtle differences in the relative importance of the assessed indicators and enhancing judgment accuracy.
The constructed judgment matrices are shown in Table A1, Table A2, Table A3, Table A4, Table A5 and Table A6 in Appendix A. Subsequently, the maximum eigenvalue of each judgment matrix and its corresponding feature vector are calculated. The feature vector is then normalized to obtain the weights of the credit indicators. Upon examination, the CR value of each judgment matrix is less than 0.10. Finally, a linear AHP model is produced as follows:
$CreditScore = 0.0036\,C_1 + 0.0068\,C_2 + 0.0079\,C_3 + 0.0024\,C_4 + 0.0036\,C_5 + 0.0094\,C_6 + 0.0143\,C_7 + 0.0112\,C_8 + 0.0588\,C_9 + 0.0314\,C_{10} + 0.0751\,C_{11} + 0.0751\,C_{12} + 0.0751\,C_{13} + 0.0237\,C_{14} + 0.0398\,C_{15} + 0.3284\,C_{16} + 0.0918\,C_{17} + 0.0628\,C_{18} + 0.0526\,C_{19} + 0.0263\,C_{20}$
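As a quick usage sketch, the linear AHP model can be evaluated directly once the twenty indicators have been normalized to [0, 1]; the indicator values below are hypothetical, while the weights are copied from the formula above.

```python
# Global AHP weights for C1..C20, taken from the linear model above.
weights = [0.0036, 0.0068, 0.0079, 0.0024, 0.0036, 0.0094, 0.0143, 0.0112,
           0.0588, 0.0314, 0.0751, 0.0751, 0.0751, 0.0237, 0.0398, 0.3284,
           0.0918, 0.0628, 0.0526, 0.0263]

# Hypothetical normalized indicator values for one applicant (C1..C20 in [0, 1]).
c = [0.8, 0.5, 0.6, 0.4, 0.7, 0.9, 0.5, 0.3, 0.6, 0.7,
     0.8, 0.6, 0.5, 0.4, 0.7, 0.9, 0.6, 0.5, 0.8, 0.4]

credit_score = sum(w * x for w, x in zip(weights, c))
print(round(credit_score, 4))   # a higher score indicates better credit
```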

4.3. Construction of TED-NN Model

Figure 8 shows the structure of the TED-NN model, which is obtained by referring to the indicator system structure shown in Figure 7. The initial weight of each connection in the TED-NN network refers to the corresponding weight in the AHP model (see weight columns of judgment matrices in Appendix A).
Table A7 in Appendix B shows the initial weight values between the output layer and the hidden layer, Table A8 shows the initial weight matrix values between the hidden layer and the input layer, and Table A9 and Table A10 in Appendix B show the corresponding connection matrices between the output layer, the hidden layer, and the input layer.
After initializing the weights, the smoothing algorithm is applied: the connection weights are kept fixed while the bias values of the hidden and output nodes of TED-NN are adjusted to ensure that its output is consistent with that of the AHP model.
From this point on, whenever new data arrive, the MBP algorithm proposed in the previous section can be applied to train and update the model. As data accumulate and the model is gradually trained, its performance will approach that of a data-driven model.

5. Experiments

5.1. Experimental Preparation

To verify the effectiveness of the method proposed in this article, we conducted the following experiments. The following preparations were made before the experiment:
(1)
A general neural network model (GNN) with the same node layout as TED-NN but with full connections between layers and a random initialization of weights was constructed. Here, the GNN model is regarded as a typical representative of data-driven models, while the AHP model can be seen as a representative of empirical models. The TED-NN proposed in this paper is a transitional model from empirical models to fully data-driven models. The experiment compared these three models.
(2)
All data variables were normalized to the [0, 1] interval. Specifically, continuous variables include age, checking account balance, credit limit, etc., which are beneficial indicators, that is, the larger the indicator, the higher the credit score. Therefore, the following formula is used for the normalization calculation:
$x^* = \dfrac{x - x_{min}}{x_{max} - x_{min}}$
where $x^*$ is the normalized value of the indicator, and $x_{min}$ and $x_{max}$ are the minimum and maximum values of the indicator, respectively. Categorical variables, such as gender and marital status, job type, and length of service, were converted into values within [0, 1] by combining literature references with expert opinions.
(3)
Then, 90% of the dataset was randomly allocated as the training set and the remaining 10% as the testing set.
(4)
The output of the three models is a continuous number in [0, 1], while the label variable of the German dataset is binary. It is therefore necessary to set a threshold to convert the continuous output into a binary decision. The Youden index is used to determine this threshold. The Youden index, also known as Youden's J statistic, is a method for evaluating the validity of screening tests. It is calculated as follows:
$\text{Youden index} = \text{sensitivity} - (1 - \text{specificity})$
Here, sensitivity represents the probability of judging a positive instance as positive, and specificity represents the probability of judging a negative instance as negative. The Youden index reflects the total ability of the screening method to find positive and negative samples. The magnitude of the index is directly proportional to the effectiveness and authenticity of the screening experiment. By maximizing the difference between the true-positive rate (sensitivity) and the false-positive rate (1-specificity), the Youden index can identify the threshold that best discriminates between classes. However, the Youden index also has one limitation. It assumes that the output is binary, and therefore, it may not be well suited for multi-class classification problems. A detailed explanation of the Youden index can be found in [32].
In our experiment, the Youden index of each model is calculated on the training set, and the point that maximizes the Youden index is used as the classification threshold of each model. The threshold values for the AHP, GNN, and TED-NN are 0.406, 0.375, and 0.47, respectively. Taking the AHP as an example, when the credit score of a sample is greater than 0.406, the customer is classified as good, and otherwise as bad. The samples of the test set are thus classified into $\{0, 1\}$ based on these threshold values and compared with the real credit categories.
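One common way to obtain such a threshold (a sketch, not necessarily the authors' implementation) is from the ROC curve, where the Youden index equals TPR − FPR at each candidate threshold. The example below uses scikit-learn; `y_train` and `scores` are hypothetical names for the binary training labels and the continuous model outputs on the training set.

```python
import numpy as np
from sklearn.metrics import roc_curve

def youden_threshold(y_true, scores):
    """Return the score threshold that maximizes the Youden index (TPR - FPR)."""
    fpr, tpr, thresholds = roc_curve(y_true, scores)
    j = tpr - fpr                      # Youden index at every candidate threshold
    return thresholds[np.argmax(j)]

# Hypothetical usage: continuous credit scores vs. binary labels on the training set.
rng = np.random.default_rng(0)
y_train = rng.integers(0, 2, size=200)
scores = np.clip(0.5 * y_train + 0.3 * rng.random(200), 0, 1)
print("classification threshold:", youden_threshold(y_train, scores))
```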

5.2. Performance Comparison at No-Data Stage

To simulate the situation of the no-data stage, we compared the performance of three models based on the test dataset: the AHP model, the untrained TED-NN model, and the untrained GNN model.
Four classic metrics in the field of machine learning are used for performance comparison: Accuracy, Precision, Recall, and F1-Score. Accuracy measures the overall performance of the model by considering both the correctly classified non-events (true negatives) and events (true positives). It is calculated as the ratio of the number of correct predictions (both true positives and true negatives) to the total number of predictions made:
$\text{Accuracy} = \dfrac{TP + TN}{\text{Total number of instances}}$
Precision is the ratio of correctly predicted positive observations to the total number of predicted positives. In credit risk assessment, Precision measures the model’s ability to avoid false positives, i.e., reflects the accuracy of the model in not granting loans to applicants who are likely to default. A high precision rate means that the majority of the loans approved by the bank are to applicants who are likely to repay their loans. Precision is calculated as follows:
$\text{Precision} = \dfrac{TP}{TP + FP}$
Recall is the ratio of correctly predicted positive observations to all observations in the actual class. In credit risk assessment, it measures the proportion of all actual good loans that are successfully identified and approved by the model. A high Recall rate indicates that the model is effective in capturing as many potentially good customers as possible, thus minimizing the loss of business opportunities. Recall is calculated as follows:
$\text{Recall} = \dfrac{TP}{TP + FN}$
And, F1-Score is the weighted average of Precision and Recall, and it tries to find a balance between the two. F1-Score is calculated as follows:
$\text{F1-Score} = \dfrac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$
where
  • TP = true positives;
  • TN = true negatives;
  • FP = false positives;
  • FN = false negatives.
The values of these metrics are all between 0 and 1, with larger values indicating the better performance of the model. Here, “Bad” customers are considered positive examples because the most important aspect of a risk assessment model is to identify bad customers. Table 4 shows the results.
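Since "Bad" customers are treated as the positive class, the four metrics can be computed directly from the confusion counts. In the sketch below, `y_true` and `y_pred` are hypothetical binary arrays in which 1 denotes a "Bad" customer.

```python
import numpy as np

def classification_metrics(y_true, y_pred):
    """Accuracy, Precision, Recall, and F1-Score with label 1 as the positive class."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    tp = np.sum((y_pred == 1) & (y_true == 1))
    tn = np.sum((y_pred == 0) & (y_true == 0))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# Hypothetical example: 1 = "Bad" customer (positive), 0 = "Good" customer.
print(classification_metrics([1, 0, 1, 1, 0, 0], [1, 0, 0, 1, 0, 1]))
```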
From this table, it can be seen that the accuracy of the AHP model reaches 0.7600, and the F1-Score is also the highest among the three models, which clearly indicates that the empirical model is effective. In addition, regardless of which metric is used, the AHP and TED-NN are better than the untrained GNN model, indicating that although the TED-NN model has not been trained, it adopts a structure and weight similar to the AHP, and after bias correction, it already possesses most of the capabilities of the AHP model.

5.3. Performance Comparison at Sufficient-Data Stage

To simulate the situation of the sufficient-data stage, we used the full training dataset to train the TED-NN and GNN models and then evaluated them on the test dataset; the AHP model remains unchanged. Table 5 shows the results.
From these results, it can be seen that at the sufficient-data stage, the AHP is no longer the best-performing model. The GNN and TED-NN models are very close and significantly better than the AHP model. This indicates that after sufficient training with data, data-driven models will surpass empirical models. This also indicates that the performance of the TED-NN model might not surpass that of the GNN model, but the two could be very close after sufficient training.
Figure 9 shows the ROC (receiver operating characteristic) curves and AUC (area under the curve) values of the three models on the training set and the test set. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied. The false-positive rate can be calculated as 1 − specificity. In general, if the probability distributions of both sensitivity and the false-positive rate are known, the ROC curve can be generated by plotting the cumulative distribution function of sensitivity on the y-axis against the cumulative distribution function of the false-positive rate on the x-axis. AUC is defined as the area under the ROC curve, and this area obviously cannot exceed 1. Since the ROC curve generally lies above the line y = x, the AUC value ranges between 0.5 and 1. The AUC value is used as a criterion for the classification performance of a classifier: the larger the AUC, the better the classification. On the training set, the GNN has a high degree of fit, with an AUC value of 0.821, while the fit of TED-NN is close to that of the AHP, with an AUC value of about 0.711. It is understandable that the GNN obtains the highest AUC value on the training set; the focus should be on the results on the test set. As can be seen in Figure 9, the AUC value of the AHP is 0.771, that of the GNN is 0.795, and that of TED-NN is 0.804. It can be seen that TED-NN fits best on the test set.

5.4. Performance Comparison at Limited-Data Stage

In order to simulate the process of data accumulation as the business develops, we designed the following experiment to observe the performance of the TED-NN model during the limited-data stage.
Firstly, 10% of the training data are selected, and the model starts training from a completely untrained state. And, the test dataset is used to test the model and obtain performance evaluation metrics.
Subsequently, we add new training data to reach 20% of the entire training set, continue training the model, and test the model with test data.
Then, the above process is repeated to increase the training data to 30%, 40%, …, 90%, and the performance metrics of the model for each round are obtained. The results are shown in Table 6.
Figure 10 provides a more intuitive representation of the changes in F1-Score with increasing training data. It can be clearly seen that the performance of the TED-NN model improves with the gradual increase in training data, approaching the fully trained GNN model.
Figure 11, which illustrates the changes in Accuracy for the TED-NN model as training data increase, reinforces the observation made in Figure 10. Similar to the F1-Score, the Accuracy of the TED-NN model consistently improves, eventually converging with that of the fully trained GNN model.
From the above experimental results, it can be observed that TED-NN demonstrates significant advantages in accuracy over the AHP model or traditional neural networks. Since TED-NN is essentially a neural network with trimmed network nodes, it still conforms to the general characteristics of a GNN. Therefore, factors such as the representativeness of training samples, the size of training samples, training methods, and activation functions can affect TED-NN’s performance. When there are no data available for training it, it is equivalent to an AHP model built based on experience, which performs better than an untrained GNN. When there is a small amount of training data, the TED-NN model can be improved and its performance can surpass that of the AHP model. When there are enough data, the performance of TED-NN can approach that of the fully trained GNN. However, we speculate that even with substantial data, TED-NN is unlikely to surpass the GNN due to the inherent complexity and stronger expressive power of fully connected GNNs.
As mentioned earlier, in reality, data are gradually accumulated, and this characteristic of the TED-NN model makes it of great practical significance. That is, enterprises can continuously collect relevant data in the process of conducting business to improve evaluation models and fully utilize the value of data.

6. Conclusions

Credit risk assessment modeling is a widely discussed topic across various domains. This paper focuses on the long data accumulation process that may exist in credit risk modeling. The multi-layer indicator system model based on expert experience is a modeling scheme that does not require risk data. In addition to the indicator system itself, the weights also reflect the subjectivity and empiricism of experts. In the initial stage of business, with only a small amount of data or even no data, the empirical model is a viable solution. However, as business develops and risk data gradually accumulate, it is necessary to apply the burgeoning business data to improve the model and bolster its accuracy.
The transition methodology proposed in this work can effectively solve the smooth-transition problem between empirical models and data-driven models. With the proposed methodology, the structure and connection weights of the TED-NN model are derived from the multi-indicator system of the empirical model. After being adjusted by smoothing algorithms, TED-NN can effectively inherit most of the abilities and characteristics of the empirical model.
While the proposed TED-NN can assimilate expert experience through the AHP and enhance its performance with real data, there are some potential limitations. Specifically, the construction of the initial empirical model is subject to the quality of the expert’s judgment and experience. The quality of the initial expert input is important, as it directly influences the AHP model’s efficacy, which, in turn, sets the stage for TED-NN’s initial performance. Moreover, the presence of noisy data could also impact the predictive outcomes, underscoring the need for the careful curation of expert input and data preprocessing.
Furthermore, it should be pointed out that the weight adjustment algorithm used in this paper is still a gradient descent learning algorithm. In fact, there are many globally oriented weight adjustment algorithms for neural networks, such as learning algorithms combined with genetic algorithms, simulated annealing, or particle swarm optimization. The TED-NN model could also adopt these algorithms to further improve its performance. In addition, the ANP (Analytic Network Process), an extension of the AHP, could be used in combination with the neural network, which would make the methodology applicable to many fields beyond credit evaluation. Moreover, the AHP model used in this paper has only three layers, whereas in practice an AHP model may have more layers. Deeper networks, however, introduce problems such as vanishing gradients. Hence, applying deep learning mechanisms and choosing appropriate activation functions to handle such cases is also a research direction worth considering in the future.

Author Contributions

Conceptualization, Q.X. and H.L.; methodology, H.L.; validation, Q.L. and H.L.; formal analysis, Q.L.; writing—original draft preparation, H.L. and Q.L.; writing—review and editing, Q.X. and Q.L.; visualization, Q.L. and Q.X.; supervision, Q.X. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science Foundation of the Ministry of Education of China (18YJAZH038), the Hunan Provincial Education Scientific Research Project (23C0266), and the Research Project of Shaoyang University (24KYQD17).

Data Availability Statement

The original data presented in the study are openly available in the UCI machine learning repository at 10.24432/C5NC77.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Table A1. Judgment matrix A.

| A  | B1 | B2  | B3  | B4  | B5  | Weights |
| B1 | 1  | 1/3 | 1/5 | 1/7 | 1/2 | 0.0480  |
| B2 | 3  | 1   | 1/3 | 1/5 | 1   | 0.1014  |
| B3 | 5  | 3   | 1   | 1/2 | 5   | 0.2888  |
| B4 | 7  | 5   | 2   | 1   | 7   | 0.4829  |
| B5 | 2  | 1   | 1/5 | 1/7 | 1   | 0.0789  |

Note: $\lambda_{max} = 5.1094$, $CI = 0.0273$, $CR = 0.0244 < 0.1$.
Table A2. Judgment matrix B1–C.

| B1 | C1  | C2  | C3  | C4 | C5  | C6  | C7  | Weights |
| C1 | 1   | 1/3 | 1/3 | 2  | 1   | 1/2 | 1/3 | 0.0755  |
| C2 | 3   | 1   | 1   | 3  | 2   | 1/2 | 1/3 | 0.1415  |
| C3 | 3   | 1   | 1   | 3  | 2   | 1   | 1/2 | 0.1656  |
| C4 | 1/2 | 1/3 | 1/3 | 1  | 1/2 | 1/3 | 1/5 | 0.0492  |
| C5 | 1   | 1/2 | 1/2 | 2  | 1   | 1/5 | 1/3 | 0.0744  |
| C6 | 2   | 2   | 1   | 3  | 5   | 1   | 1/2 | 0.1966  |
| C7 | 3   | 3   | 2   | 5  | 3   | 2   | 1   | 0.2971  |

Note: $\lambda_{max} = 7.2395$, $CI = 0.0399$, $CR = 0.0294 < 0.1$.
Table A3. Judgment matrix B2–C.

| B2  | C8 | C9  | C10 | Weights |
| C8  | 1  | 1/5 | 1/3 | 0.1095  |
| C9  | 5  | 1   | 2   | 0.5816  |
| C10 | 3  | 1/2 | 1   | 0.3090  |

Note: $\lambda_{max} = 3.0037$, $CI = 0.0018$, $CR = 0.0032 < 0.1$.
Table A4. Judgment matrix B3–C.

| B3  | C11 | C12 | C13 | C14 | C15 | Weights |
| C11 | 1   | 1   | 1   | 3   | 2   | 0.2601  |
| C12 | 1   | 1   | 1   | 3   | 2   | 0.2601  |
| C13 | 1   | 1   | 1   | 3   | 2   | 0.2601  |
| C14 | 1/3 | 1/3 | 1/3 | 1   | 1/2 | 0.0819  |
| C15 | 1/2 | 1/2 | 1/2 | 2   | 1   | 0.1378  |

Note: $\lambda_{max} = 5.0099$, $CI = 0.0025$, $CR = 0.0022 < 0.1$.
Table A5. Judgment matrix B4–C.

| B4  | C16 | C17 | C18 | Weights |
| C16 | 1   | 5   | 4   | 0.6870  |
| C17 | 1/5 | 1   | 2   | 0.1865  |
| C18 | 1/4 | 1/2 | 1   | 0.1265  |

Note: $\lambda_{max} = 3.0940$, $CI = 0.0470$, $CR = 0.0810 < 0.1$.
Table A6. Judgment matrix B5–C.

| B5  | C19 | C20 | Weights |
| C19 | 1   | 2   | 0.6667  |
| C20 | 1/2 | 1   | 0.3333  |

Note: $\lambda_{max} = 2$, $CI = 0$, $CR = 0 < 0.1$.

Appendix B

Table A7. Initial weight matrix between output layer and hidden layer of TED-NN.

|    | A      |
| B1 | 0.0480 |
| B2 | 0.1014 |
| B3 | 0.2888 |
| B4 | 0.4829 |
| B5 | 0.0789 |
Table A8. Initial weight matrix between hidden layer and input layer of TED-NN.

|     | B1     | B2     | B3     | B4     | B5     |
| C1  | 0.0755 | 0      | 0      | 0      | 0      |
| C2  | 0.1415 | 0      | 0      | 0      | 0      |
| C3  | 0.1656 | 0      | 0      | 0      | 0      |
| C4  | 0.0492 | 0      | 0      | 0      | 0      |
| C5  | 0.0744 | 0      | 0      | 0      | 0      |
| C6  | 0.1966 | 0      | 0      | 0      | 0      |
| C7  | 0.2971 | 0      | 0      | 0      | 0      |
| C8  | 0      | 0.1095 | 0      | 0      | 0      |
| C9  | 0      | 0.5816 | 0      | 0      | 0      |
| C10 | 0      | 0.3090 | 0      | 0      | 0      |
| C11 | 0      | 0      | 0.2601 | 0      | 0      |
| C12 | 0      | 0      | 0.2601 | 0      | 0      |
| C13 | 0      | 0      | 0.2601 | 0      | 0      |
| C14 | 0      | 0      | 0.0819 | 0      | 0      |
| C15 | 0      | 0      | 0.1378 | 0      | 0      |
| C16 | 0      | 0      | 0      | 0.6870 | 0      |
| C17 | 0      | 0      | 0      | 0.1865 | 0      |
| C18 | 0      | 0      | 0      | 0.1265 | 0      |
| C19 | 0      | 0      | 0      | 0      | 0.6667 |
| C20 | 0      | 0      | 0      | 0      | 0.3333 |
Table A9. Connection matrix between output layer and hidden layer of TED-NN.

|    | A |
| B1 | 1 |
| B2 | 1 |
| B3 | 1 |
| B4 | 1 |
| B5 | 1 |
Table A10. Connection matrix between hidden layer and input layer of TED-NN.

|     | B1 | B2 | B3 | B4 | B5 |
| C1  | 1  | 0  | 0  | 0  | 0  |
| C2  | 1  | 0  | 0  | 0  | 0  |
| C3  | 1  | 0  | 0  | 0  | 0  |
| C4  | 1  | 0  | 0  | 0  | 0  |
| C5  | 1  | 0  | 0  | 0  | 0  |
| C6  | 1  | 0  | 0  | 0  | 0  |
| C7  | 1  | 0  | 0  | 0  | 0  |
| C8  | 0  | 1  | 0  | 0  | 0  |
| C9  | 0  | 1  | 0  | 0  | 0  |
| C10 | 0  | 1  | 0  | 0  | 0  |
| C11 | 0  | 0  | 1  | 0  | 0  |
| C12 | 0  | 0  | 1  | 0  | 0  |
| C13 | 0  | 0  | 1  | 0  | 0  |
| C14 | 0  | 0  | 1  | 0  | 0  |
| C15 | 0  | 0  | 1  | 0  | 0  |
| C16 | 0  | 0  | 0  | 1  | 0  |
| C17 | 0  | 0  | 0  | 1  | 0  |
| C18 | 0  | 0  | 0  | 1  | 0  |
| C19 | 0  | 0  | 0  | 0  | 1  |
| C20 | 0  | 0  | 0  | 0  | 1  |

References

  1. Rakhaev, V. Developing credit risk assessment methods to make loss provisions for potential loans. Financ. Theory Pract. 2020, 24, 82–91. [Google Scholar] [CrossRef]
  2. Bostrom, A.; French, S.; Gottlieb, S. Risk Assessment, Modeling and Decision Support; Springer: Berlin/Heidelberg, Germany, 2008. [Google Scholar]
  3. Scarlat, E.; Chirita, N.; Bradea, I.A. Indicators and metrics used in the enterprise risk management (ERM). Econ. Comput. Econ. Cybern. Stud. Res. J. 2012, 46, 5–18. [Google Scholar]
Figure 1. Stages of data accumulation.
Figure 2. Indicator system structure of risk assessment model.
Figure 3. Neuron model.
Figure 4. Framework of TED-NN.
Figure 5. Difference in network structure between TED-NN and GNN.
Figure 6. Smoothing algorithm.
Figure 7. German credit assessment indicator system.
Figure 8. The structure of TED-NN for the German credit dataset. Note: Due to the large number of values, only the weights in Table A1 and the first weights in Table A2, Table A3, Table A4, Table A5 and Table A6 are presented.
Figure 9. ROC curves and AUC values. Note: The upper part of the figure corresponds to the training set, and the lower part of the figure corresponds to the test set.
Figure 10. F1-Score of TED-NN with increasing training data.
Figure 11. Accuracy of TED-NN with increasing training data.
Mathematics 12 02410 g011
Table 1. Pairwise comparison scale for AHP preferences.
Numerical Rating | Verbal Judgment of Preference
9 | Extremely preferred
8 | Very strongly to extremely preferred
7 | Very strongly preferred
6 | Strongly to very strongly preferred
5 | Strongly preferred
4 | Moderately to strongly preferred
3 | Moderately preferred
2 | Equally to moderately preferred
1 | Equally preferred
Table 2. Average random consistency.
Size of Matrix | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10
Random Consistency | 0 | 0 | 0.58 | 0.90 | 1.12 | 1.24 | 1.32 | 1.41 | 1.45 | 1.49
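Table 1 gives the 1-9 comparison scale and Table 2 the average random consistency index (RI) used in the AHP consistency check. The sketch below follows the standard AHP convention of taking the principal eigenvector of a pairwise comparison matrix as the weight vector and accepting the matrix when CR = CI/RI < 0.1; the 3x3 judgment matrix is a made-up example for illustration, not one of the paper's actual matrices.

```python
import numpy as np

# Average random consistency index (RI) from Table 2, indexed by matrix size.
RI = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12, 6: 1.24,
      7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}

def ahp_weights(A: np.ndarray):
    """Return the AHP weight vector and consistency ratio of a pairwise matrix."""
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = np.argmax(eigvals.real)                        # principal eigenvalue index
    w = np.abs(eigvecs[:, k].real)
    w /= w.sum()                                       # normalize weights to sum to 1
    lam_max = eigvals[k].real
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0     # consistency index
    cr = ci / RI[n] if RI[n] > 0 else 0.0              # consistency ratio
    return w, cr

# Hypothetical 3x3 judgment matrix filled in with the 1-9 scale of Table 1.
A = np.array([[1.0, 3.0, 5.0],
              [1/3, 1.0, 2.0],
              [1/5, 1/2, 1.0]])
w, cr = ahp_weights(A)
print("weights:", np.round(w, 4), "CR:", round(cr, 4))  # CR < 0.1 => acceptable
```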
Table 3. German credit dataset description.
Variable | Description
status | Status of the debtor's checking account with the bank
duration | Credit duration in months
credit_history | History of compliance with previous or concurrent credit contracts
purpose | Purpose for which the credit is needed
amount | Credit amount in DM
savings | Debtor's savings
employment_duration | Duration of debtor's employment with current employer
installment_rate | Credit installments as a percentage of debtor's disposable income
personal_status_sex | Combined information on sex and marital status
other_debtors | Is there another debtor or a guarantor for the credit?
present_residence | Length of time the debtor has lived in the present residence
property | The debtor's most valuable property
age | Age in years
other_installment_plans | Installment plans from providers other than the credit-giving bank
housing | Type of housing the debtor lives in
number_credits | Number of credits, including the current one, the debtor has at this bank
job | Quality of debtor's job
people_liable | Number of persons who financially depend on the debtor
telephone | Is there a telephone landline registered in the debtor's name?
foreign_worker | Is the debtor a foreign worker?
credit_risk | Has the credit contract been complied with (good) or not (bad)?
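Table 3 lists the 20 attributes plus the credit_risk label of the German credit dataset used in the experiments. A minimal preprocessing sketch is given below; the file name german_credit.csv, the good/bad coding of the label, and the 90/10 train/test split are assumptions for illustration, not the authors' exact pipeline.

```python
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

# Hypothetical local copy of the German credit data using the column names
# of Table 3; adjust the path and the label coding to your own copy.
df = pd.read_csv("german_credit.csv")

X = df.drop(columns=["credit_risk"])
y = (df["credit_risk"] == "bad").astype(int)   # 1 = default, 0 = compliant

# One-hot encode categorical attributes, then scale all features to [0, 1].
X = pd.get_dummies(X)
X = pd.DataFrame(MinMaxScaler().fit_transform(X), columns=X.columns)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.1, random_state=42, stratify=y)
print(X_train.shape, X_test.shape)
```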
Table 4. The performance comparison of the three models at the no-data stage.
Model | Accuracy | Precision | Recall | F1-Score
AHP | 0.7600 | 0.7222 | 0.4063 | 0.5200
GNN | 0.5300 | 0.2857 | 0.3125 | 0.2985
TED-NN | 0.6600 | 0.4737 | 0.5625 | 0.5143
Each metric's highest score in the table is highlighted in bold.
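The four metrics in Tables 4-6 follow the usual confusion-matrix definitions: accuracy = (TP+TN)/(TP+TN+FP+FN), precision = TP/(TP+FP), recall = TP/(TP+FN), and F1 = 2PR/(P+R). The counts in the worked example below are back-calculated from the AHP row of Table 4 under the assumption of a 100-sample test set containing 32 defaults; they are illustrative, not figures reported by the authors.

```python
# Confusion-matrix counts inferred from Table 4's AHP row (assumption:
# 100 test samples, 32 of them defaults): TP=13, FP=5, FN=19, TN=63.
tp, fp, fn, tn = 13, 5, 19, 63

accuracy = (tp + tn) / (tp + tn + fp + fn)          # 0.76
precision = tp / (tp + fp)                          # about 0.7222
recall = tp / (tp + fn)                             # 0.40625 (reported as 0.4063)
f1 = 2 * precision * recall / (precision + recall)  # 0.52

print(f"accuracy={accuracy:.4f} precision={precision:.4f} "
      f"recall={recall:.4f} f1={f1:.4f}")
```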
Table 5. Performance comparison of the three models at the sufficient-data stage.
Model | Accuracy | Precision | Recall | F1-Score
AHP | 0.7600 | 0.7222 | 0.4063 | 0.5200
GNN | 0.8900 | 0.8621 | 0.7813 | 0.8197
TED-NN | 0.8900 | 0.9200 | 0.7188 | 0.8070
Each metric's highest score in the table is highlighted in bold.
Table 6. Performance of TED-NN model with increased training data.
Training Data Used | Accuracy | Precision | Recall | F1-Score
0% | 0.6600 | 0.4737 | 0.5625 | 0.5143
10% | 0.6700 | 0.4872 | 0.5938 | 0.5352
20% | 0.6800 | 0.5000 | 0.6250 | 0.5556
30% | 0.7000 | 0.5278 | 0.5938 | 0.5588
40% | 0.7200 | 0.5526 | 0.6563 | 0.6000
50% | 0.7100 | 0.5366 | 0.6875 | 0.6027
60% | 0.7500 | 0.5946 | 0.6875 | 0.6377
70% | 0.7800 | 0.6389 | 0.7188 | 0.6765
80% | 0.8100 | 0.7097 | 0.6875 | 0.6984
90% | 0.8400 | 0.7500 | 0.7500 | 0.7500
100% | 0.8900 | 0.8889 | 0.7500 | 0.8136
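Table 6 traces how TED-NN improves as more training data become available. The loop below sketches such an evaluation protocol in generic form, using scikit-learn's MLPClassifier merely as a stand-in model; the network size, the 10% step, and the fixed test set are assumptions for illustration, and the 0% row of Table 6 corresponds to the untrained, experience-initialized model, which an off-the-shelf classifier cannot reproduce.

```python
import numpy as np
from sklearn.neural_network import MLPClassifier
from sklearn.metrics import accuracy_score, f1_score

def incremental_evaluation(X_train, y_train, X_test, y_test, seed=42):
    """Train on growing fractions of the training data and report test metrics."""
    X_train, y_train = np.asarray(X_train), np.asarray(y_train)
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X_train))        # fixed shuffling of the samples
    results = []
    for frac in np.arange(0.1, 1.01, 0.1):
        idx = order[: int(round(frac * len(X_train)))]
        model = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000,
                              random_state=seed)
        model.fit(X_train[idx], y_train[idx])
        pred = model.predict(X_test)
        results.append((round(float(frac), 1),
                        accuracy_score(y_test, pred),
                        f1_score(y_test, pred)))
    return results

# Usage, e.g. with the split from the preprocessing sketch above:
# for frac, acc, f1 in incremental_evaluation(X_train, y_train, X_test, y_test):
#     print(f"{int(frac * 100)}% of data: accuracy={acc:.4f}, F1={f1:.4f}")
```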