1. Introduction
Credit risk assessment is a significant topic in finance, serving as a critical foundation for decision-making in loan approvals and credit card issuance. According to the Basel II Accord, credit risk is one of the risks that financial institutions face when allocating resources. It is defined as the probability of loss for a lender when a borrower fails to meet the terms of a loan or credit agreement [1]. Every default record represents a monetary loss. For instance, a lending institution that has been operating for about two years has around 2000 customers, a loan balance of about CNY 1 billion (USD 150 million), and a bad debt loss of about CNY 5 million from fewer than 20 default records. On average, each default record therefore incurs a loss of at least CNY 250,000. In this context, enabling financial institutions to accurately assess the likelihood of borrowers defaulting on their obligations is one of the most effective ways to safeguard them against potential losses.
Risk assessment models are used to identify, quantify, and evaluate risks, helping organizations understand potential risks and take appropriate risk management measures [2]. In finance, these models are essential tools for managing and mitigating the credit risks inherent in lending activities. Assessment models can be broadly categorized into qualitative and quantitative methodologies. Qualitative assessments rely on expert judgment to evaluate risks through observation, investigation, and analysis, which makes them especially effective for risks that are difficult to quantify. Common qualitative techniques include brainstorming sessions, Delphi consultations, in-depth expert interviews, and scenario analysis [2,3].
In contrast, quantitative assessments rely on numerical methods, measuring risk by quantifying risk factors and the potential for loss. This makes them suitable for situations that require precise numerical analysis. Typical quantitative risk assessment methods include Monte Carlo simulation, sensitivity analysis, probabilistic risk assessment, fault tree analysis, decision tree analysis, and risk matrices [4,5]. In practice, hybrid methodologies that integrate qualitative and quantitative risk assessment have become widely used. Identifying risk factors is crucial in credit risk assessment. Widely used methodologies first employ qualitative techniques, such as brainstorming or expert interviews, to identify and describe potential risk factors, which experts then prioritize in a preliminary ranking. Probabilistic analysis and other mathematical techniques are then applied to quantify these factors, allowing a detailed analysis and comprehensive computation of potential risks [5,6]. In these methods, risk factors are usually organized into a hierarchical risk indicator system based on expert judgment, and determining appropriate weights for the risk indicators is key to constructing a credit risk assessment model. The Analytic Hierarchy Process (AHP), introduced by Thomas L. Saaty, is a classic technique for prioritizing indicator systems; it harnesses professional expertise to evaluate the relative significance of diverse risk indicators. By constructing judgment matrices and applying mathematical algorithms, the AHP derives the weight of each indicator. This methodology offers greater precision and consistency than the direct assignment of weights by experts [7].
The development and validation of credit risk assessment models is an important and complex process. To better quantify financial risks and allocate economic capital, financial institutions have invested substantial resources over the past decades in developing internal credit risk assessment models [8]. Traditional credit assessment modeling approaches have relied on the expertise of professionals [9,10,11,12]. For instance, Roy et al. employed a hybrid Analytic Hierarchy Process to construct a credit scoring model [12]. Wu et al. proposed a multi-criteria decision-making method that enhances the validity of assessment results by considering the cognitive levels of different experts and the degree of gray relation [10]. Habachi et al. introduced a method combining linear discriminant analysis with expert opinions, using Bayesian logic to determine the probability of default. This approach leverages expert knowledge to compensate for the deficiencies of statistical models, potentially increasing the accuracy and reliability of the model in complex credit assessment tasks [11]. Credit assessment methods based on expert judgment can handle complex situations in which data are insufficient or difficult to quantify, and their decision-making process is usually more interpretable. However, such approaches have limitations: they can be affected by personal bias and limited experience, produce inconsistent results, scale poorly, and adapt slowly to changes in the application environment.
With the dawn of the big data era, the data-driven model-building paradigm has gained widespread recognition, and integrating credit risk assessment modeling with machine learning has become a significant research direction in the financial industry. Existing research on credit risk assessment modeling focuses mostly on developing and improving models, for example, by building credit assessment models on different machine learning algorithms [13,14,15] or by fusing data from different sources to improve predictive performance [16,17]. Some researchers have tried to improve classification accuracy through better data preprocessing, such as solutions to the data imbalance problem, a common challenge in credit risk assessment. Previous works have mainly used random oversampling, random undersampling, improved oversampling techniques, and methods that combine Random Forest with Recursive Feature Elimination and random oversampling to address data imbalance, thereby improving the model’s generalization and predictive performance [18,19,20,21]. Others have optimized models by applying feature selection techniques [22,23,24]. For example, Shrawan et al. explored combinations of feature selection techniques (such as Information Gain, Gain Ratio, and Chi-Square) and machine learning classifiers to enhance the robustness and accuracy of the assessment model [22]. Arora et al. introduced a Bolasso-enabled Random Forest algorithm that improves the stability of the selected features and thereby improves credit risk assessment performance [23]. Despite the excellent predictive capabilities of machine learning models, their “black box” nature often raises concerns among regulators and users. Enhancing the interpretability and transparency of models is therefore also an important research direction, and several studies have proposed frameworks and techniques to make machine learning models more transparent and interpretable [25,26,27]. For instance, Lundberg et al. introduced SHAP (SHapley Additive exPlanations), a unified framework for interpreting the predictions of complex models [26]. By assigning each feature an importance value for a specific prediction, SHAP helps users better understand the model’s decision-making process.
Data-driven modeling approaches reduce the impact of human bias and improve assessment accuracy by learning complex patterns in the data. However, they require sufficient training data. Unfortunately, for many newly launched projects, the available sample size is quite limited because of the short operating time. Developing customized credit risk assessment models for different customer groups is crucial for lending institutions, yet before or in the early stages of launching a new business, an institution may have little historical information about the target customer segment. As shown in Figure 1, repayment data accumulate gradually as the business develops, passing through stages from no data to limited data to sufficient data. Reaching the sufficient-data stage may take a lending institution several years. Consequently, without alternative data sources, a machine learning-based risk assessment model cannot be built in the absence of substantial data. Businesses therefore often begin operations by inviting experts to construct qualitative risk assessment models. However, such models cannot incorporate the operational data generated afterwards. Financial institutions’ business data are often derived from costly lessons, such as loan defaults, and being unable to use such valuable data is a significant loss.
This raises two questions: How can these valuable data be used effectively to improve an assessment model built during the data-scarce phase? And how can the model transition smoothly from an empirical model to a data-driven model as data gradually accumulate? Current research seldom addresses the transition of credit risk assessment models from a data-scarce state to a data-accumulating state. To address these challenges, this paper proposes a novel credit risk assessment modeling methodology called TED-NN (Transition from Empirical model to Data-driven model). With TED-NN, indicators and weights are determined from expert experience during the no-data phase. As the business develops, even a small amount of data can be used to improve the accuracy of the assessment model, and once data are abundant, the model evolves naturally into a fully data-driven model. The entire process requires no model reconstruction; the model only needs to be optimized iteratively with newly generated business data. The main contributions of this paper are as follows:
Although the AHP method can construct models using expert experience during the initial stage of business without relying on historical data, the limitations of expert experience and human cognitive biases lead to the subjectivity of the AHP model. The methodology proposed in this paper combines the advantages of the AHP and neural networks. It not only utilizes expert experience to build models in the early stages of business but also incorporates real data to continuously and dynamically update the model.
Our method constructs the neural network directly from the structure of the AHP model. This avoids having to choose the number of nodes when designing the network and gives each node a specific meaning, thereby improving the comprehensibility of the model.
Our model is relatively simple to build, requiring only slight modifications to the original BP neural network learning algorithm. Because the initial weights are inherited directly from the AHP, the neural network starts from a favorable position, which helps it avoid premature convergence to a poor local optimum. Furthermore, the model does not require a large amount of training data; it can learn and improve even from a small amount of data.
The remainder of this paper is organized as follows:
Section 2 briefly reviews the preliminary knowledge of the indicator system, the Analytic Hierarchy Process, and the BP algorithm of the neural network.
Section 3 describes the main ideas, technical points, and key algorithms of TED-NN in detail.
Section 4 provides a specific case to illustrate the application of TED-NN in credit evaluation.
Section 5 presents the comparison results between TED-NN, an empirical model, and a data-driven model for the three stages of data accumulation.
Finally, Section 6 concludes the paper and gives an outlook on future research work.
2. Preliminaries
2.1. Indicator System of Risk Assessment Model
The indicator system is the prerequisite and basis of a risk evaluation model. Constructing an indicator system means decomposing an abstract evaluation object into an operable structure of concrete, measurable indicators according to its essential attributes and characteristic features [28]. Typically, an indicator system has a multi-layer structure comprising an objective indicator, sub-objective indicators, and operational indicators, arranged in a tree-like or net-like structure (as shown in Figure 2). The objective indicator sits at the top of the system and represents the comprehensive evaluation result; operational indicators sit at the bottom and consist of indicators whose quantitative values can be obtained directly; and sub-objective indicators lie between the two, possibly across several layers, reflecting the relationships and mechanisms linking the objective indicator to the operational indicators.
Establishing an evaluation indicator system usually includes the following steps: (1) clarify the evaluation purpose; (2) determine the evaluation object; (3) collect and evaluate information related to the object, including expert opinions, stakeholder needs, etc.; (4) in a layered manner, determine the different dimensions of indicators based on the evaluation purpose; (5) select specific operational indicators, which should be quantifiable, comparable, and actionable; (6) assign weights to each indicator based on their importance and contribution to the evaluation objectives. In this process, qualitative methods such as discussion and judgment are usually used to obtain results based on expert experience and knowledge. Assigning weights to various indicators in the system is a crucial and challenging task. In modeling practice, it is very common to adopt subjective experiential methods.
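To make the layered structure concrete, the following Python sketch represents an indicator system as a tree and aggregates normalized operational indicators into the objective score by weighted sums, bottom up. It also multiplies local weights along each root-to-leaf path to obtain the global weight of every operational indicator. The indicator names, the weights, and the linear weighted-sum aggregation are illustrative assumptions, not the paper's actual model.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Indicator:
    """One node of a hierarchical indicator system (illustrative structure)."""
    name: str
    weight: float = 1.0  # local weight relative to sibling indicators
    children: List["Indicator"] = field(default_factory=list)

    def is_operational(self) -> bool:
        # Operational indicators are the leaves of the tree.
        return not self.children


def global_weights(node: Indicator, parent_weight: float = 1.0) -> Dict[str, float]:
    """Multiply local weights along each root-to-leaf path."""
    w = parent_weight * node.weight
    if node.is_operational():
        return {node.name: w}
    result: Dict[str, float] = {}
    for child in node.children:
        result.update(global_weights(child, w))
    return result


def score(node: Indicator, values: Dict[str, float]) -> float:
    """Bottom-up weighted sum; leaf values are assumed normalized to [0, 1]."""
    if node.is_operational():
        return values[node.name]
    return sum(c.weight * score(c, values) for c in node.children)


# Hypothetical two-level system: names and weights are made up for illustration.
root = Indicator("credit_score", 1.0, [
    Indicator("personal", 0.4, [Indicator("age", 0.3), Indicator("job", 0.7)]),
    Indicator("financial", 0.6, [Indicator("balance", 0.5), Indicator("credit_limit", 0.5)]),
])
values = {"age": 0.5, "job": 1.0, "balance": 0.2, "credit_limit": 0.8}
total = score(root, values)  # approximately 0.64
gw = global_weights(root)    # global leaf weights sum to 1
```

Because each layer's local weights sum to 1, the objective score equals the sum of global weights times leaf values, which is one reason hierarchical weights can later be read as connection strengths.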
2.2. Analytic Hierarchy Process
The AHP was proposed by the American operations researcher Professor T.L. Saaty of the University of Pittsburgh in the early 1970s. It quantifies the differences between judgment elements (indicators) and converts the experts' subjective judgments into weight parameters by mathematical methods [29]. It is well suited to assigning weights to the indicators of a hierarchical indicator system. The main process is as follows:
- (1)
A set of pairwise comparison matrices (size $n \times n$) is constructed for each of the lower levels using the relative measurement scale shown in Table 1, with one matrix for each element in the level immediately above. The judgment matrices were developed through a structured Delphi process involving a panel of experts in credit risk assessment. Each expert provided pairwise comparisons of the criteria based on their experience and understanding of the domain. These comparisons were then synthesized into a collective judgment matrix using geometric mean aggregation, which helps to mitigate the impact of outliers and biases in individual judgments.
- (2)
Pairwise comparisons are conducted based on element dominance. Each $n \times n$ matrix requires $n(n-1)/2$ judgments; the reciprocals are assigned automatically in each pairwise comparison, i.e., $a_{ji} = 1/a_{ij}$.
- (3)
Hierarchical synthesis is used to weight the eigenvectors by the weights of the criteria, and the weighted eigenvector entries corresponding to each element in the next lower level of the hierarchy are summed.
- (4)
Having made all pairwise comparisons, consistency is determined by using the principal eigenvalue $\lambda_{\max}$ of the judgment matrix to calculate the consistency index (CI):
$$CI = \frac{\lambda_{\max} - n}{n - 1},$$
where n represents the size of the matrix. Judgment consistency is checked with the consistency ratio, $CR = CI/RI$, where RI is the appropriate random index value in Table 2. The CR is accepted if it does not exceed 0.10; if it exceeds 0.10, the judgment matrix is inconsistent, and the judgments should be reviewed and revised to obtain a consistent matrix.
- (5)
Steps (1)–(4) are performed for all layers in the hierarchy.
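The steps above can be sketched in Python with NumPy, assuming the standard principal-eigenvector method for deriving priorities and element-wise geometric-mean pooling of expert matrices. The helper names `aggregate_geometric` and `ahp_weights` are hypothetical, and the random index values are Saaty's commonly cited ones rather than a copy of Table 2, which may differ slightly.

```python
import numpy as np

# Saaty's commonly cited random index (RI) values for n = 1..10.
# Assumption: these may differ slightly from the paper's Table 2.
RANDOM_INDEX = {1: 0.0, 2: 0.0, 3: 0.58, 4: 0.90, 5: 1.12,
                6: 1.24, 7: 1.32, 8: 1.41, 9: 1.45, 10: 1.49}


def aggregate_geometric(matrices):
    """Element-wise geometric mean of several experts' judgment matrices."""
    stack = np.asarray(matrices, dtype=float)
    return np.exp(np.log(stack).mean(axis=0))


def ahp_weights(A):
    """Return (weights, lambda_max, CI, CR) for a reciprocal judgment matrix A."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    eigvals, eigvecs = np.linalg.eig(A)
    k = int(np.argmax(eigvals.real))      # index of the principal eigenvalue
    lam_max = float(eigvals[k].real)
    w = np.abs(eigvecs[:, k].real)
    w = w / w.sum()                       # priorities sum to 1
    ci = (lam_max - n) / (n - 1) if n > 1 else 0.0
    ri = RANDOM_INDEX[n]
    cr = ci / ri if ri > 0 else 0.0       # matrices with n <= 2 are always consistent
    return w, lam_max, ci, cr


# Two hypothetical experts compare criteria A and B; pooling gives 4 and 1/4.
pooled = aggregate_geometric([[[1, 2], [1 / 2, 1]], [[1, 8], [1 / 8, 1]]])

# A hypothetical 3 x 3 judgment matrix on Saaty's 1-9 scale.
J = [[1, 3, 5],
     [1 / 3, 1, 3],
     [1 / 5, 1 / 3, 1]]
w, lam_max, ci, cr = ahp_weights(J)  # CR is well below the 0.10 cutoff
```

The geometric mean is used for pooling because it preserves reciprocity: if every expert has $a_{ji} = 1/a_{ij}$, the pooled matrix does too.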
The main advantage of the AHP is that it organically combines qualitative methods with quantitative methods. It utilizes human subjective experience to judge the relative importance of each indicator. The judgment matrix in this method is the core, but it requires significant cost and effort to construct. In practice, due to the limitation of human cognitive ability, it is not easy to construct a judgment matrix that satisfies the consistency requirement on the first attempt, and it is often necessary to repeatedly modify it to achieve consistency. It can be seen that the indicator weights determined by the AHP method are essentially experience-based.
2.3. Back-Propagation Neural Network
A neural network is a machine learning method, inspired by biological neurons and nervous systems, that consists of many simple processing units connected in a massively parallel fashion. It functions as a sophisticated, adaptive, nonlinear system that can acquire knowledge from its environment through its internal neurons. An artificial neuron is generally a nonlinear processing unit that receives multiple inputs and produces a single output; its structure is shown in Figure 3.
For a neuron with m inputs, the model can be expressed mathematically as
$$v_k = \sum_{i=1}^{m} w_{ki} x_i + b_k, \qquad y_k = \varphi(v_k),$$
where $x_i$ is the i-th input value, $w_{ki}$ is the weight of the i-th input connection of the k-th neuron, $b_k$ is the bias of neuron k, $v_k$ is the weighted sum of the inputs and bias of neuron k, $\varphi(\cdot)$ is the activation function, which is usually nonlinear, and $y_k$ is the output value of the neuron.
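A single neuron of this form takes only a few lines of Python. The logistic sigmoid below is an assumed choice of $\varphi$; the text only states that the activation is usually nonlinear.

```python
import numpy as np


def sigmoid(v):
    """Logistic activation; one common choice of nonlinear phi."""
    return 1.0 / (1.0 + np.exp(-v))


def neuron_output(x, w_k, b_k, phi=sigmoid):
    """Single artificial neuron: y_k = phi(sum_i w_ki * x_i + b_k)."""
    v_k = float(np.dot(w_k, x)) + b_k  # weighted sum of inputs plus bias
    return phi(v_k)


y = neuron_output(x=[1.0, 2.0], w_k=[0.5, -0.25], b_k=0.0)  # v_k = 0, so y = 0.5
```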
The process of improving a neural network’s performance is called training, which optimizes the network’s weights and biases. The back-propagation (BP) algorithm, introduced by Rumelhart et al. in 1986 [30], is a foundational learning algorithm for neural networks. It consists of two phases. The first is the forward pass, in which the output of each layer is computed layer by layer. The second is the backward pass, which computes the error of each hidden node layer by layer, from the output back toward the input, and uses it to correct the connection weights to the preceding layer. Given a predefined network architecture and initial weights, the BP algorithm repeats the following steps until convergence:
- (1)
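Calculate the output of each node from front to back:
$$v_j = \sum_i w_{ji} y_i + b_j, \qquad y_j = \varphi(v_j),$$
where $y_i$ denotes the outputs of the preceding layer (the network inputs for the first hidden layer).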
- (2)
Calculate the corresponding delta value for each node k of the output layer:
$$\delta_k = (d_k - y_k)\,\varphi'(v_k),$$
where $d_k$ is the supervisory signal (target output) and $\varphi'(\cdot)$ is the derivative of the activation function.
- (3)
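Calculate the delta value of each hidden node j from back to front:
$$\delta_j = \varphi'(v_j) \sum_k \delta_k w_{kj},$$
where the sum runs over the nodes k of the following layer.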
- (4)
Calculate and save the weight correction for each connection:
$$\Delta w_{ji} = \eta\, \delta_j\, y_i,$$
where $\eta$ is the learning rate, which controls the step size of learning.
- (5)
Correct the connection weight between each pair of connected nodes:
$$w_{ji} \leftarrow w_{ji} + \Delta w_{ji}.$$
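The five BP steps can be sketched for a network with one hidden layer, sigmoid units, squared error, and a single training sample. This is plain back-propagation as described above, not the modified TED-NN learning algorithm; the layer sizes, learning rate, and random initialization are illustrative assumptions.

```python
import numpy as np


def sigmoid(v):
    return 1.0 / (1.0 + np.exp(-v))


def bp_step(x, d, W1, b1, W2, b2, eta=0.5):
    """One BP update for a single sample (sigmoid units, squared error).

    W1 has shape (hidden, inputs) and W2 has shape (outputs, hidden).
    Weights and biases are updated in place; the error before the update is returned.
    """
    # (1) Forward pass, front to back.
    y1 = sigmoid(W1 @ x + b1)                 # hidden-layer outputs
    y2 = sigmoid(W2 @ y1 + b2)                # output-layer outputs
    # (2) Output deltas: (d_k - y_k) * phi'(v_k), with phi'(v) = y(1 - y) for the sigmoid.
    delta2 = (d - y2) * y2 * (1.0 - y2)
    # (3) Hidden deltas, back to front: phi'(v_j) * sum_k delta_k * w_kj (old weights).
    delta1 = y1 * (1.0 - y1) * (W2.T @ delta2)
    # (4) Weight corrections: eta * delta_j * y_i; biases act as weights on a constant 1.
    dW2, db2 = eta * np.outer(delta2, y1), eta * delta2
    dW1, db1 = eta * np.outer(delta1, x), eta * delta1
    # (5) Apply the corrections.
    W2 += dW2
    b2 += db2
    W1 += dW1
    b1 += db1
    return 0.5 * float(np.sum((d - y2) ** 2))


rng = np.random.default_rng(0)
W1, b1 = rng.normal(scale=0.5, size=(3, 2)), np.zeros(3)
W2, b2 = rng.normal(scale=0.5, size=(1, 3)), np.zeros(1)
x, d = np.array([0.2, 0.9]), np.array([1.0])
errors = [bp_step(x, d, W1, b1, W2, b2) for _ in range(200)]
```

Step (3) is computed before step (5), so the hidden deltas use the output weights from the current iteration, as the derivation requires.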
Although neural networks have many advantages, such as robustness, fault tolerance, parallel processing capability, and self-learning, their disadvantages are also evident. Generally, the network structure, including the network depth and the number of nodes in each layer, must be fixed before training. In particular, configuring the hidden layers often lacks concrete theoretical guidance, which makes the outcome quite unpredictable. When the activation function is nonlinear, the BP learning algorithm is a local optimization method and does not necessarily reach the global optimum; different initial weights may lead to different local minima. Therefore, when the initial weights are far from the global optimum, the final result may also remain far from it. A neural network is also like a black box: the weight values inside it are not as clear as the weights of an AHP model, and its hidden nodes have no practical meaning. In addition, unlike the AHP, a neural network is a supervised machine learning model that requires enough labeled samples for training. For credit evaluation models, default data are scarce compared with normal transaction data, so it is difficult to collect enough default samples in advance to build the model.
5. Experiments
5.1. Experimental Preparation
To verify the effectiveness of the proposed method, we conducted a series of experiments, with the following preparations:
- (1)
A general neural network model (GNN) with the same node layout as TED-NN but with full connections between layers and a random initialization of weights was constructed. Here, the GNN model is regarded as a typical representative of data-driven models, while the AHP model can be seen as a representative of empirical models. The TED-NN proposed in this paper is a transitional model from empirical models to fully data-driven models. The experiment compared these three models.
- (2)
All data variables were normalized to the [0, 1] interval. Continuous variables, such as age, checking account balance, and credit limit, are benefit-type indicators; that is, the larger the indicator, the higher the credit score. They are therefore normalized with the following formula:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}},$$
where $x'$ is the normalized value of the indicator, and $x_{\min}$ and $x_{\max}$ are the minimum and maximum values of the indicator, respectively. Categorical variables, such as gender, marital status, job type, and length of service, were converted into values within [0, 1] by combining references with expert opinions.
- (3)
Then, 90% of the dataset was randomly allocated as the training set and the remaining 10% as the testing set.
- (4)
The output of each of the three models is a continuous number in [0, 1], whereas the label variable of the German dataset is binary, so a threshold is needed to convert the continuous output into a binary class. The threshold is determined with the Youden index (also known as Youden's J statistic), a measure originally used to evaluate the validity of screening tests. It is calculated as follows:
$$J = \text{sensitivity} + \text{specificity} - 1.$$
Here, sensitivity is the probability that a positive instance is judged positive, and specificity is the probability that a negative instance is judged negative. The Youden index thus reflects the overall ability of a classifier to identify both positive and negative samples: the larger the index, the better the test discriminates between them. Because J equals the true-positive rate (sensitivity) minus the false-positive rate (1 − specificity), the threshold that maximizes it gives the best discrimination between the classes by this criterion. However, the Youden index has one limitation: it assumes a binary outcome and may therefore not be well suited to multi-class classification problems. A detailed explanation of the Youden index can be found in [32].
In our experiment, the Youden index of each model is computed on the training set, and the threshold at which it reaches its maximum is used as the model's classification threshold. The thresholds for the AHP, GNN, and TED-NN are 0.406, 0.375, and 0.47, respectively. Taking the AHP as an example, a sample with a credit score greater than 0.406 is classified as a good customer, and otherwise as a bad customer. The test samples are classified with these thresholds, and the predicted classes are compared with the true credit categories.
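Preparation steps (2) to (4) can be sketched in Python with NumPy as follows. The function names are hypothetical, the candidate cutoffs are simply the observed scores, and the expert-based mapping of categorical variables is not reproduced. Following the text, "Bad" is the positive class and a sample is predicted good when its score exceeds the cutoff.

```python
import numpy as np


def min_max(column):
    """Min-max normalization of a benefit-type indicator to [0, 1] (assumes max > min)."""
    col = np.asarray(column, dtype=float)
    lo, hi = col.min(), col.max()
    return (col - lo) / (hi - lo)


def split_indices(n, train_frac=0.9, seed=0):
    """Random train/test split of sample indices (90% / 10% by default)."""
    perm = np.random.default_rng(seed).permutation(n)
    n_train = int(round(train_frac * n))
    return perm[:n_train], perm[n_train:]


def youden_threshold(scores, is_good):
    """Cutoff maximizing J = sensitivity + specificity - 1.

    'Bad' is the positive class and a sample is predicted good when score > cutoff.
    Both classes are assumed to be present.
    """
    scores = np.asarray(scores, dtype=float)
    is_bad = ~np.asarray(is_good, dtype=bool)
    best_t, best_j = None, -np.inf
    for t in np.unique(scores):                               # candidate cutoffs
        pred_bad = scores <= t
        sens = np.sum(pred_bad & is_bad) / np.sum(is_bad)     # true-positive rate
        spec = np.sum(~pred_bad & ~is_bad) / np.sum(~is_bad)  # true-negative rate
        j = sens + spec - 1.0
        if j > best_j:
            best_t, best_j = float(t), float(j)
    return best_t, best_j


ages = min_max([20, 40, 60])               # [0.0, 0.5, 1.0]
train_idx, test_idx = split_indices(1000)  # 900 training and 100 test indices
t, j = youden_threshold([0.1, 0.2, 0.3, 0.7, 0.8, 0.9],
                        [False, False, False, True, True, True])  # t = 0.3, J = 1.0
```

As in the text, the threshold should be chosen on the training set only and then applied unchanged to the test set.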
5.2. Performance Comparison at No-Data Stage
To simulate the situation of the no-data stage, we compared the performance of three models based on the test dataset: the AHP model, the untrained TED-NN model, and the untrained GNN model.
Four classic machine learning metrics are used for the performance comparison: Accuracy, Precision, Recall, and F1-Score. Accuracy measures the overall performance of the model by counting both correctly classified non-events (true negatives) and events (true positives). It is the ratio of the number of correct predictions (both true positives and true negatives) to the total number of predictions made:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
Precision is the ratio of correctly predicted positive observations to all predicted positive observations. Because “Bad” customers are treated as the positive class, Precision in credit risk assessment measures how many of the applicants flagged as likely defaulters actually default; a high Precision means that few creditworthy applicants are wrongly rejected. Precision is calculated as follows:
$$\text{Precision} = \frac{TP}{TP + FP}.$$
Recall is the ratio of correctly predicted positive observations to all observations in the actual positive class. In credit risk assessment, it measures the proportion of actual bad customers that the model successfully identifies; a high Recall means that few potential defaulters are approved, which limits credit losses. Recall is calculated as follows:
$$\text{Recall} = \frac{TP}{TP + FN}.$$
The F1-Score is the harmonic mean of Precision and Recall and balances the two. It is calculated as follows:
$$F_1 = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}},$$
where
TP = true positives;
TN = true negatives;
FP = false positives;
FN = false negatives.
The values of these metrics all lie between 0 and 1, with larger values indicating better model performance. Here, “Bad” customers are treated as positive examples because the main purpose of a risk assessment model is to identify bad customers.
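Because the positive class is "Bad", it is easy to compute these metrics for the wrong class by accident. The following Python sketch computes all four from binary labels with `True`/`1` meaning bad; the function name and the convention of returning 0.0 when a denominator is zero are assumptions.

```python
def classification_metrics(y_true_bad, y_pred_bad):
    """Accuracy, Precision, Recall, and F1 with 'Bad' (True/1) as the positive class."""
    pairs = list(zip(y_true_bad, y_pred_bad))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for 3 bad and 5 good customers.
m = classification_metrics([1, 1, 1, 0, 0, 0, 0, 0],
                           [1, 1, 0, 1, 0, 0, 0, 0])
# TP = 2, FN = 1, FP = 1, TN = 4: accuracy 0.75, precision = recall = F1 = 2/3
```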
Table 4 shows the results.
The table shows that the AHP model reaches an accuracy of 0.7600 and also has the highest F1-Score of the three models, which indicates that the empirical model is effective. In addition, the AHP and TED-NN outperform the untrained GNN on every metric. This suggests that, even before training, TED-NN, which adopts a structure and weights similar to those of the AHP, already possesses most of the AHP model's capabilities after bias correction.
5.3. Performance Comparison at Sufficient-Data Stage
To simulate the sufficient-data stage, we trained the TED-NN and GNN models on the full training set and then evaluated them on the test set; the AHP model remained unchanged.
Table 5 shows the results.
These results show that at the sufficient-data stage, the AHP is no longer the best-performing model: the GNN and TED-NN perform very similarly, and both are clearly better than the AHP. This indicates that, after sufficient training, data-driven models can surpass empirical models. It also suggests that TED-NN may not surpass the GNN, although the two can come very close after sufficient training.
Figure 9 shows the ROC (receiver operating characteristic) curves and AUC (area under the curve) values of the three models on the training set and the test set. The ROC curve is a graphical plot that illustrates the diagnostic ability of a binary classifier as its discrimination threshold is varied, plotting the true-positive rate (sensitivity) on the y-axis against the false-positive rate (1 − specificity) on the x-axis. In general, if the score distributions of both classes are known, the ROC curve can be generated by plotting the cumulative distribution function of the detection probability on the y-axis versus the cumulative distribution function of the false-alarm probability on the x-axis. AUC is the area under the ROC curve; since both axes range from 0 to 1, its value cannot exceed 1. Because the ROC curve of any classifier that performs better than random guessing lies above the line y = x, its AUC value lies between 0.5 and 1. The AUC is used as a criterion of classification performance: the larger the AUC, the better the classifier. On the training set, the GNN fits the data most closely, with an AUC of 0.821, whereas TED-NN fits about as closely as the AHP, with an AUC of about 0.711. It is understandable that the GNN obtains the highest AUC on the training set; the focus, however, should be on the results on the test set. As Figure 9 shows, the test-set AUC values of the AHP, GNN, and TED-NN are 0.771, 0.795, and 0.804, respectively, so TED-NN achieves the best discrimination on unseen data.
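The ROC curve and AUC described above can be sketched in plain Python. The sketch treats "Bad" as the positive class and predicts bad when the score is at or below the threshold, consistent with the classification rule used here. It computes the AUC in two ways, by the trapezoidal rule over ROC points and as the probability that a randomly chosen good customer outscores a randomly chosen bad one (the Mann-Whitney form), which agree. The function names and example scores are hypothetical.

```python
def roc_points(scores, is_good):
    """ROC points (FPR, TPR) with 'Bad' as positive; predict bad when score <= t."""
    n_bad = sum(1 for g in is_good if not g)
    n_good = len(is_good) - n_bad
    points = [(0.0, 0.0)]
    for t in sorted(set(scores)):
        tp = sum(1 for s, g in zip(scores, is_good) if not g and s <= t)
        fp = sum(1 for s, g in zip(scores, is_good) if g and s <= t)
        points.append((fp / n_good, tp / n_bad))
    return points


def auc_trapezoid(points):
    """Area under a piecewise-linear ROC curve."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))


def auc_pairwise(scores, is_good):
    """Probability that a random good customer outscores a random bad one (ties count 1/2)."""
    good = [s for s, g in zip(scores, is_good) if g]
    bad = [s for s, g in zip(scores, is_good) if not g]
    wins = sum(1.0 if sg > sb else 0.5 if sg == sb else 0.0 for sg in good for sb in bad)
    return wins / (len(good) * len(bad))


scores = [0.1, 0.4, 0.35, 0.8]
is_good = [False, False, True, True]
pts = roc_points(scores, is_good)
area_trap = auc_trapezoid(pts)               # 0.75
area_pair = auc_pairwise(scores, is_good)    # 0.75
```

The pairwise form is O(n²) and is shown only for clarity; it also makes explicit why an AUC of 0.5 corresponds to random ranking.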
5.4. Performance Comparison at Limited-Data Stage
In order to simulate the process of data accumulation as the business develops, we designed the following experiment to observe the performance of the TED-NN model during the limited-data stage.
First, 10% of the training data are selected, the model is trained starting from its completely untrained state, and the test set is used to obtain performance evaluation metrics.
Next, new training data are added until 20% of the entire training set is used; training continues from the current model, which is then evaluated on the test set.
This process is repeated with 30%, 40%, …, 90% of the training data, yielding the performance metrics of the model for each round. The results are shown in Table 6.
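The limited-data protocol can be sketched as a schedule of nested, cumulative subsets on which one model is trained continuously. In the sketch below, `train_fn` and `eval_fn` are hypothetical placeholders for the TED-NN training and test-set evaluation routines, which are not shown in this section; the fractions follow the text.

```python
import numpy as np


def incremental_rounds(n_train, fractions, seed=0):
    """Yield (fraction, indices) with nested, cumulative subsets of the training set."""
    order = np.random.default_rng(seed).permutation(n_train)
    for f in fractions:
        yield f, order[: int(round(f * n_train))]


def run_schedule(n_train, fractions, train_fn, eval_fn, state, seed=0):
    """Continue training the same model on each larger subset and record test metrics."""
    history = []
    for f, idx in incremental_rounds(n_train, fractions, seed):
        state = train_fn(state, idx)        # warm start: reuse the previous round's model
        history.append((f, eval_fn(state)))
    return history


fractions = [round(0.1 * k, 1) for k in range(1, 10)]   # 0.1, 0.2, ..., 0.9
# Placeholder callables: the "model" just records how many samples it has seen.
history = run_schedule(900, fractions,
                       train_fn=lambda seen, idx: seen + [len(idx)],
                       eval_fn=lambda seen: seen[-1],
                       state=[])
```

Using a single fixed permutation guarantees that each round's data contain all earlier rounds' data, which mirrors how repayment records accumulate in practice.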
Figure 10 provides a more intuitive representation of the changes in F1-Score with increasing training data. It can be clearly seen that the performance of the TED-NN model improves with the gradual increase in training data, approaching the fully trained GNN model.
Figure 11, which illustrates the change in Accuracy of the TED-NN model as training data increase, reinforces the observation made in Figure 10. Like the F1-Score, the Accuracy of the TED-NN model improves consistently and eventually converges to that of the fully trained GNN model.
These experimental results suggest that TED-NN offers clear advantages over both the AHP model and a conventional neural network across the stages of data accumulation. Since TED-NN is essentially a neural network with pruned connections, it still shares the general characteristics of a GNN; factors such as the representativeness and size of the training samples, the training method, and the activation function can therefore affect its performance. When no training data are available, it is equivalent to the experience-based AHP model, which performs better than an untrained GNN. With a small amount of training data, TED-NN can be improved until its performance surpasses that of the AHP model. With enough data, its performance approaches that of the fully trained GNN. We speculate, however, that even with substantial data, TED-NN is unlikely to surpass the GNN in general, because the fully connected GNN is more complex and has greater expressive power.
As mentioned earlier, in reality, data are gradually accumulated, and this characteristic of the TED-NN model makes it of great practical significance. That is, enterprises can continuously collect relevant data in the process of conducting business to improve evaluation models and fully utilize the value of data.
6. Conclusions
Credit risk assessment modeling is widely discussed across many domains. This paper focuses on the lengthy data accumulation process that credit risk modeling may face. A multi-layer indicator system model based on expert experience is a modeling scheme that requires no risk data; both the indicator system and its weights reflect the subjectivity and experience of the experts. In the initial stage of a business, with little or no data, such an empirical model is a viable solution. However, as the business develops and risk data gradually accumulate, the new business data should be used to improve the model and increase its accuracy.
The transition methodology proposed in this work can effectively solve the smooth-transition problem between empirical models and data-driven models. With the proposed methodology, the structure and connection weights of the TED-NN model are derived from the multi-indicator system of the empirical model. After being adjusted by smoothing algorithms, TED-NN can effectively inherit most of the abilities and characteristics of the empirical model.
While the proposed TED-NN can assimilate expert experience through the AHP and enhance its performance with real data, there are some potential limitations. Specifically, the construction of the initial empirical model is subject to the quality of the expert’s judgment and experience. The quality of the initial expert input is important, as it directly influences the AHP model’s efficacy, which, in turn, sets the stage for TED-NN’s initial performance. Moreover, the presence of noisy data could also impact the predictive outcomes, underscoring the need for the careful curation of expert input and data preprocessing.
Furthermore, the learning algorithm used in this paper to adjust the weights is still a gradient descent algorithm. Many global optimization algorithms also exist for adjusting neural network weights, such as learning algorithms combined with genetic algorithms, simulated annealing, and particle swarm optimization. Applying these algorithms to TED-NN could, in theory, further improve its performance. In addition, the Analytic Network Process (ANP), an extension of the AHP, could be combined with the neural network, which might make the approach applicable in many fields beyond credit evaluation. Finally, the AHP model used in this paper has only three layers, whereas real-world AHP models may have more. Deeper networks, however, raise problems such as more severe vanishing gradients, so applying deep learning mechanisms and choosing appropriate activation functions to handle this situation is another direction for future research.