Article

An FTwNB Shield: A Credit Risk Assessment Model for Data Uncertainty and Privacy Protection

1
College of Science, North China University of Science and Technology, Tangshan 063210, China
2
Hebei Engineering Research Center for the Intelligentization of Iron Ore Optimization and Ironmaking Raw Materials Preparation Processes, North China University of Science and Technology, Tangshan 063210, China
3
The Key Laboratory of Engineering Computing in Tangshan City, North China University of Science and Technology, Tangshan 063210, China
4
Hebei Key Laboratory of Data Science and Application, North China University of Science and Technology, Tangshan 063210, China
*
Author to whom correspondence should be addressed.
Mathematics 2024, 12(11), 1695; https://doi.org/10.3390/math12111695
Submission received: 17 April 2024 / Revised: 20 May 2024 / Accepted: 24 May 2024 / Published: 29 May 2024

Abstract

Credit risk assessment is an important process in bank financial risk management. Traditional machine-learning methods cannot solve the problem of data islands and the high error rate of two-way decisions, which is not conducive to banks’ accurate credit risk assessment of users. To this end, this paper establishes a federated three-way decision incremental naive Bayes bank user credit risk assessment model (FTwNB) that supports asymmetric encryption, uses federated learning to break down data barriers between banks, and uses asymmetric encryption to protect data security for federated processes. At the same time, the model combines the three-way decision methods to realize the three-way classification of user credit (good, bad and delayed judgment), so as to avoid the loss of bank interests caused by the forced division of uncertain users. In addition, the model also incorporates incremental learning steps to eliminate training samples with poor data quality to further improve the model performance. This paper takes German Credit data and Default of Credit Card Clients data as examples to conduct simulation experiments. The result shows that the performance of the FTwNB model has been greatly improved, which verifies that it has good credit risk assessment capabilities.

1. Introduction

With socioeconomic development, more and more individual users apply to banks for credit loans. Not all applicants meet the loan requirements, so banks must assess the credit risk of each user. Credit risk management and assessment therefore plays a vital role in a bank's operation and in its loan decision-making.
The early traditional credit risk rating methods mainly relied on manual judgment and experience accumulation, which had the problems of strong subjectivity and instability, and the judgment results could not convince most people. In 1968, Altman [1] turned the research on credit risk assessment into a quantitative analysis combined with financial indicators and discriminant analysis methods to construct a discriminant function for identifying corporate financial risks and constructed a Z-Score model. After that, Altman et al. [2] continued to improve and build the ZETA model on the basis of the Z-Score. The method of quantitative analysis is accepted to a certain extent, so it is widely used. Later, with the advent of the era of big data and the development of artificial intelligence technology, the huge amount of data provided data support for the training of machine-learning models, and the task method of credit risk assessment also began to change and machine-learning models began to be widely used in risk assessment. The application of machine learning eliminates the obvious subjective and human factors, so as to obtain more objective and effective evaluation results. Therefore, many scholars focus on research concerning machine-learning methods. For example, Zhang et al. [3] combined particle swarm optimization (PSO) and the genetic algorithm (GA) to design a new PSO-GA-BP personal credit risk assessment model. Liu [4] constructed a neural network model BPNN with self-learning ability. Rao et al. [5] studied the credit risk of personal auto loans, combining particle swarm optimization (PSO) with the extreme gradient-boosting (XGBoost) model. Wen et al. [6] used the logistic regression model for credit rating prediction. Zhou et al. [7] proposed a random forest model based on the XGBoost algorithm, where XGBoost is used for feature selection and random forest is used for classification. Chen et al. [8] used the MLP neural network to evaluate the financial credit risk of the supply chain of small and medium-sized enterprises. These studies have achieved very good results, but if they are to be truly applied in real life, there are still some shortcomings, which are specifically reflected in the following aspects:
(1)
Using artificial intelligence technology to solve bank credit risk assessment still faces two major challenges. One is that in the banking field, the data of different banks exist in the form of isolated islands; the other is that banks need to strengthen their data confidentiality and security. In real life, each bank has less local data, so the performance of the trained model cannot meet the requirements of the credit evaluation task. If the data of various banks are integrated and then model training is carried out, although the performance of the model can be improved, it will also bring about another problem, that is, data privacy leakage.
(2)
Nowadays, the quality of internet data is uneven, and the same is true for the data of various banks, which contain many invalid records. Most credit risk assessment models do not evaluate and screen the quality of user data during model training, so poor-quality user data enter the training process and model performance declines.
(3)
At present, most credit risk assessment models still make two-way decisions: the assessment result is either good credit (positive domain) or bad credit (negative domain). Uncertain samples are forcibly assigned to the positive domain or the negative domain, where the probability of misjudgment is high, and this causes losses to the bank.
The above problems need to be urgently improved. To this end, we designed a federated three-way decision incremental naive Bayes bank user credit risk assessment model (FTwNB) that supports asymmetric encryption for bank user credit risk assessment, which effectively addresses the shortcomings of current research. The main contributions of this paper are as follows:
(1)
Use federated learning to break the status of each bank’s data islands, realize data information exchange, and at the same time, ensure the accuracy of the model and the privacy of the data. Moreover, the federation strategy under FTwNB is a non-destructive federation strategy, that is, the federation calculation result will not decrease compared with the result of first integrating data and then training.
(2)
Introduce an asymmetric encryption algorithm to further strengthen the privacy protection capabilities of each bank during data interaction, adding another layer of protection for data security.
(3)
Use the three-way incremental naive Bayes classifier (3WD-INB) to realize the three-way classification of the credit of the user to be evaluated, which solves the problem that the two-way classification is difficult to deal with uncertain samples, and the three-way classification process is more in line with the human thought process and practical application, that is, for certain samples, a clear answer (good or bad credit risk) is given directly, and for uncertain samples, further manual review is required to reduce the error rate.
(4)
With the help of the incremental learning feature of the 3WD-INB, the quality of the data held by the bank can be evaluated and screened to ensure the high quality of the data, thereby improving the performance of the model.
The structure of this paper is as follows. Section 2 introduces related work on federated learning and three-way decisions in bank user credit risk assessment and other fields. Section 3 describes the algorithm principle and establishment process of the FTwNB model in detail. Section 4 presents the simulation of the FTwNB model and analyzes its parameters. Finally, Section 5 concludes the paper and proposes future work.

2. Related Work

2.1. Federated Learning and Its Applications

Federated learning (FL) was proposed by Google [9] in 2016. Because of its potential to protect private data, federated learning has attracted extensive attention from experts and scholars at home and abroad, and it has been widely used in many fields.
Federated learning has achieved good results, especially in the medical and financial fields. For example, in the medical field, in order to solve the problem of performance degradation in medical image classification, Zhao et al. [10] proposed a novel federated-learning method for distributed information sharing (FedDIS) and verified the performance of the algorithm using the Alzheimer’s disease MRI dataset. In order to detect COVID-19, Ho et al. [11] used chest X-ray images and symptom information combined with a convolutional neural network to build a federated-learning system, which successfully improved the detection accuracy while ensuring that the data were not shared. Also, in order to detect COVID-19, Kandati et al. [12] proposed a novel hybrid algorithm named the genetic clustered FL (Genetic CFL) and proved that the Genetic CFL method is superior to the traditional AI method. In the financial field, Tian et al. [13] established a federated XGBoost model to predict credit card defaults in order to reduce the occurrence of credit card defaults, and they proposed an optional split extraction model for unbalanced datasets. In the research field of credit-scoring models, He et al. [14] proposed a decentralized multi-party method based on logistic regression to implement multi-party collaborative model training in order to use multi-source information for credit scoring while ensuring data privacy. In addition, federated learning is also widely used in other fields. For example, Qin et al. [15] combined federated learning with wireless communication and used the communication between the central server and distributed local clients to train and optimize the model, and they achieved good results. Jin et al. [16] used the federated-learning framework for network traffic classification. Kanagavelu et al. [17] used the federated UNet model to semantically segment satellite images and street view images, and they finally realized a land use classification. Lv et al. [18] constructed a federated random forest algorithm to realize the classification of ship AIS trajectory data and added a homomorphic encryption process to protect the data in the federated process.
Overall, federated learning has been applied in many fields. It reasonably balances cooperation and competition, as well as data use and privacy, in current social development, and it is a machine-learning method worth promoting. In the field of user credit risk assessment, federated learning is likewise needed to break down the islands of bank data. However, its penetration in this field is still shallow: the work is at an initial stage of development, and there are few related studies.

2.2. Three-Way Decision and Its Application

The three-way decision is a decision-making method summarized by Yao [19] in the course of research on rough set theory. The general idea is to extend the division of the domain of discourse from the traditional two domains (positive domain and negative domain) to three domains (positive domain, negative domain and boundary domain) and to adopt a different decision-making method for each domain, that is, to divide and conquer. It is a new theory in line with the human cognitive mode. For the credit risk assessment task of bank users, the application mode of the three-way decision is shown in Figure 1.
Like federated learning, the three-way decision has achieved good research results in many fields, such as text classification and medical diagnosis. For example, Zhang et al. [20] and Ma et al. [21] both applied the idea of three-way decision to text emotion classification, allowing the samples that are easier to judge to be judged directly and the samples that are harder to judge to undergo delayed judgment, which well solved the problem of the high misjudgment rate in traditional two-way decisions. Yue et al. [22] added the three-way decision in the field of medical image diagnosis, combined with a convolutional neural network to construct the EviDCNN-3WC model to realize a three-way classification. The three-way classification method can effectively identify uncertain images and reduce the risk of image classification. Li et al. [23] also combined three-way decision ideas for software defect prediction and transformed the software defect prediction problem into three types of decisions: determining defective software, determining non-defective software and uncertain software, which has better accuracy and reliability. Yuan et al. [24] designed a three-way decision spam-filtering method based on mail header information. In addition to dividing mail into normal mail and spam, it allows users to further check uncertain mail. Experiments show that it reduces the misclassification rate. Zhou et al. [25] also applied the three-way decision to mail filtering. In addition to the three-way classification field, the three-way decision is also widely used in the recommendation system. For example, Zhang et al. [26] combined random forest and the three-way decision to construct a three-way recommendation system and verified that the performance of the three-way recommendation system is better than that of the traditional system. Zhang et al. [27] integrated the naive Bayes, three-way decision and collaborative filtering algorithm, and they proposed a three-way naive Bayes collaborative filtering recommendation model (3NBCFR), which was used for film recommendation, effectively reducing the recommendation cost and improving the recommendation quality. In research related to credit risk assessment, Maldonado et al. [28] constructed a three-way decision credit-scoring model based on probabilistic rough sets and conducted extensive case studies on the credit applications of more than 7000 micro-enterprises in Chile.
Overall, because the three-way decision conforms to the human cognitive model and can be widely accepted in production and life, the three-way decision has become the focus of research by many scholars. However, judging from the current research situation, although the three-way decision can improve the classification performance, there are few related studies in the field of credit risk assessment, and the current research does not combine the three-way decision and federated learning at the same time. Therefore, it is necessary to establish a three-way credit risk assessment model based on federated learning.

3. Model Building

3.1. Data Description and Data Preprocessing

The datasets used in the simulation experiment in this paper are the German Credit dataset and the Default of Credit Card Clients dataset from UCI. The German Credit dataset is a dataset that predicts the tendency of loan default according to the bank loan information of individuals and the occurrence of loan delinquencies of applicants. The dataset contains 1000 data items in 19 dimensions, and the data are labeled “good” or “bad”. The Default of Credit Card Clients dataset contains the default payment information of credit card customers in Taiwan. The dataset contains 30,000 pieces of data in 23 dimensions, labeled “paid” or “not paid”.
When conducting simulation experiments, the original dataset is split according to the ratio of the training set/test set = 3:1, and the training set is split to simulate that different banks (clients) have different data. In order to avoid chance, we designed five simulation situations, and each time we used different random seeds to disrupt the original data, as shown in Table 1.
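The splitting procedure described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_for_clients`, the round-robin assignment of training samples to clients and the use of integer sample indices are all assumptions, since the actual client sizes are those listed in Table 1.

```python
import random

def split_for_clients(samples, n_clients, seed, test_fraction=0.25):
    """Shuffle with a fixed seed, hold out a test set (train:test = 3:1),
    then deal the remaining training samples out to the bank clients."""
    rng = random.Random(seed)          # a different seed gives a different simulation case
    shuffled = list(samples)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled[:n_test], shuffled[n_test:]
    # Assumption: round-robin split; the paper's Table 1 defines the real partition.
    clients = [train[i::n_clients] for i in range(n_clients)]
    return clients, test

# Hypothetical usage with 1000 sample indices and 4 clients.
clients, test = split_for_clients(range(1000), n_clients=4, seed=0)
print([len(c) for c in clients], len(test))
```

Fixing the seed per simulation case keeps each case reproducible while still varying which users each client holds.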

3.2. FTwNB Local Model

The local model under the framework of the FTwNB algorithm is the 3WD-INB. The 3WD-INB was proposed by Yang et al. [29]. The algorithm uses the idea of the three-way decision to realize the three-way classification of samples, making the classification process more in line with people’s thought process. At the same time, combined with incremental learning, it filters the samples with poor data quality in the training data and further improves the classification performance of the model. The model process of the 3WD-INB is shown in Figure 2.

3.2.1. Initial Training Process

First, the 3WD-INB model splits the training dataset into a training set and an incremental training set, and the split ratio is generally 1:1. The training set is used for the initial training of the model, and the incremental training set is used for the incremental training of the model. The initial training stage mainly uses the Bayesian principle, as follows:
Suppose a local bank dataset contains $N$ users, $U = \{x_1, x_2, \ldots, x_N\}$, the training set has 19 attributes, $A = \{a_1, a_2, \ldots, a_{19}\}$, and the data labels fall into two categories, $C_1$ and $C_2$ (indicating good or bad credit). The training sample $x_h$ is represented as a 19-dimensional feature vector $x_h = \{v_1^h, v_2^h, \ldots, v_{19}^h\}$, where $v_i^h$ is the value of sample $x_h$ on attribute $a_i$. Then, according to Bayes' theorem, the posterior probability $P(C_c \mid x_h)$ can be obtained, as shown in (1).
$$P(C_c \mid x_h) = \frac{P(C_c)\, P(x_h \mid C_c)}{P(x_h)} \tag{1}$$
In order to avoid $|m(a_i, v_i^h) \cap C_c| = 0$ caused by values in the test samples that do not appear in the training samples, the Laplacian smoothing operation is used, and the calculation formulas of $P(C_c)$ and $P(v_i^h \mid C_c)$ after smoothing are shown in (2) and (3).
$$P(C_c) = \frac{|C_c| + 1}{|U| + 2}, \quad c = 1, 2 \tag{2}$$
$$P(v_i^h \mid C_c) = \frac{|m(a_i, v_i^h) \cap C_c| + 1}{|C_c| + |a_i|}, \quad c = 1, 2 \tag{3}$$
Among them, $|U|$ is the total number of training samples, $|C_c|$ is the number of samples of category $C_c$ in the training samples, and $m(a_i, v_i^h) \cap C_c$ is the set of objects in $C_c$ whose value on the $i$-th attribute is $v_i^h$.
Finally, the smoothed $P(C_c)$ and $P(v_i^h \mid C_c)$ are used to classify the samples, and the probability that the credit rating of the users to be predicted is good or bad can be obtained by substituting them into Formula (1).
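The initial training stage can be sketched as a count-based naive Bayes with Laplacian smoothing. This is a minimal sketch under stated assumptions rather than the 3WD-INB implementation: the function names and the toy two-attribute data are hypothetical, the size of each attribute's value domain is taken to be the number of distinct values seen in training, and the denominator of Formula (1) is computed by summing the joint probabilities over the categories.

```python
from collections import Counter, defaultdict

def fit_counts(X, y):
    """Collect the counts needed by the smoothed estimates."""
    class_counts = Counter(y)
    value_counts = defaultdict(Counter)   # (attribute index, class) -> value counts
    for row, label in zip(X, y):
        for i, v in enumerate(row):
            value_counts[(i, label)][v] += 1
    # Assumption: the domain size of an attribute is its number of distinct training values.
    domain_sizes = [len({row[i] for row in X}) for i in range(len(X[0]))]
    return class_counts, value_counts, domain_sizes

def posterior(model, x, classes):
    """Posterior over the classes for one sample x, using smoothed estimates."""
    class_counts, value_counts, domain_sizes = model
    total = sum(class_counts.values())
    joint = {}
    for c in classes:
        p = (class_counts[c] + 1) / (total + len(classes))            # smoothed prior
        for i, v in enumerate(x):
            p *= (value_counts[(i, c)][v] + 1) / (class_counts[c] + domain_sizes[i])
        joint[c] = p
    evidence = sum(joint.values())        # P(x) as the sum of the joint terms
    return {c: p / evidence for c, p in joint.items()}

# Hypothetical toy data: (housing, income) -> credit label.
X = [("own", "high"), ("own", "low"), ("rent", "low"), ("rent", "low")]
y = ["good", "good", "bad", "bad"]
model = fit_counts(X, y)
post = posterior(model, ("own", "high"), ["good", "bad"])
print(post)
```

Because of the added one in each numerator, a value that never occurs with a category still receives a small non-zero probability instead of forcing the whole product to zero.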

3.2.2. Incremental Learning Process

The incremental learning process utilizes the incremental characteristics of naive Bayes. In this process, the confidence degree ( θ ) is introduced to filter the poor samples and ensure the high quality of the training data.
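Suppose the training set is $U = \{x_1, x_2, \ldots, x_N\}$ and the incremental training set is $E = \{e_1, e_2, \ldots, e_M\}$. The essence of the 3WD-INB incremental learning process is to add the samples in the incremental training set that have a higher confidence degree ($\theta$) to the training set, to discard the samples with lower confidence, and to update $P(C_c)$, $P(\bar{C}_c)$, $P(v_i^h \mid C_c)$ and $P(v_i^h \mid \bar{C}_c)$ accordingly. The process of incremental learning is as follows.
For each user $e_j$ in the incremental training set, a confidence degree $\theta_j$ is computed as in (4), and when $\theta_j$ satisfies (5), the user data are added to the training set $U$.
$$\theta_j = \max_{c} P(C_c) \prod_{i=1}^{n} P(v_i^h \mid C_c), \quad 1 \le j \le M \tag{4}$$
$$\theta_j \ge \gamma \sum_{i=1}^{l} \theta_i, \quad 1 \le l \le M \tag{5}$$
Among them, $\gamma$ is the confidence coefficient, generally $\gamma \in [0.5, 1]$.
When the incremental training sample $e_j$ is added to the training set $U$, the update formulas of $P(C_c)$ and $P(\bar{C}_c)$ are:
$$P(C_c) = \begin{cases} \dfrac{N + K}{1 + N + K} P(C_c), & C_b \ne C_c \\[2ex] \dfrac{N + K}{1 + N + K} P(C_c) + \dfrac{1}{1 + N + K}, & C_b = C_c \end{cases} \qquad P(\bar{C}_c) = \begin{cases} \dfrac{N + K}{1 + N + K} P(\bar{C}_c), & C_b = C_c \\[2ex] \dfrac{N + K}{1 + N + K} P(\bar{C}_c) + \dfrac{1}{1 + N + K}, & C_b \ne C_c \end{cases} \tag{6}$$
The update formulas of $P(v_i^h \mid C_c)$ and $P(v_i^h \mid \bar{C}_c)$ are as follows:
$$P(v_i^h \mid C_c) = \begin{cases} \dfrac{\lambda}{1 + \lambda} P(v_i^h \mid C_c), & C_b = C_c \text{ and } v_{ci} \ne v_i^h \\[2ex] \dfrac{\lambda}{1 + \lambda} P(v_i^h \mid C_c) + \dfrac{1}{1 + \lambda}, & C_b = C_c \text{ and } v_{ci} = v_i^h \\[2ex] P(v_i^h \mid C_c), & C_b \ne C_c \end{cases} \qquad P(v_i^h \mid \bar{C}_c) = \begin{cases} \dfrac{\lambda}{1 + \lambda} P(v_i^h \mid \bar{C}_c), & C_b \ne C_c \text{ and } v_{ci} \ne v_i^h \\[2ex] \dfrac{\lambda}{1 + \lambda} P(v_i^h \mid \bar{C}_c) + \dfrac{1}{1 + \lambda}, & C_b \ne C_c \text{ and } v_{ci} = v_i^h \\[2ex] P(v_i^h \mid \bar{C}_c), & C_b = C_c \end{cases} \tag{7}$$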
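The update rules for the prior and the conditional probability can be checked numerically. The sketch below is illustrative only and rests on labeled assumptions: it reads $N + K$ as the smoothed denominator of the prior before the new sample is added (training-set size plus the number of categories) and $\lambda$ as the smoothed denominator of the conditional probability (category size plus the number of attribute values). Under that reading, each one-step update gives the same value as recomputing the smoothed estimate from the enlarged counts. The function names and toy counts are hypothetical.

```python
from fractions import Fraction

def update_prior(p_old, n, k, sample_in_class):
    """One-sample update of a smoothed prior: rescale, and add 1/(1+n+k)
    only when the new sample belongs to this category."""
    bump = Fraction(1, 1 + n + k) if sample_in_class else 0
    return Fraction(n + k, 1 + n + k) * p_old + bump

def update_likelihood(p_old, lam, sample_in_class, same_value):
    """One-sample update of a smoothed conditional probability.
    Samples of another category leave the estimate unchanged."""
    if not sample_in_class:
        return p_old
    bump = Fraction(1, 1 + lam) if same_value else 0
    return Fraction(lam, 1 + lam) * p_old + bump

# Hypothetical toy counts: 10 training samples, 2 categories, 6 good and 4 bad.
n, k = 10, 2
p_good = Fraction(6 + 1, n + k)
p_bad = Fraction(4 + 1, n + k)

# A new "good" sample arrives.
new_good = update_prior(p_good, n, k, sample_in_class=True)
new_bad = update_prior(p_bad, n, k, sample_in_class=False)

# Conditional probability: 3 of the 6 good samples share the value, 4 possible values.
lam = 6 + 4
p_value = Fraction(3 + 1, lam)
new_value = update_likelihood(p_value, lam, sample_in_class=True, same_value=True)
print(new_good, new_bad, new_value)
```

Exact fractions are used so that the comparison with a full recount is not blurred by floating-point rounding.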

3.2.3. Three-Way Classification Process

The three-way classification is completed using the parameters $\alpha$ and $\beta$ (generally $0 \le \beta < \alpha \le 1$) together with the $P(C_c)$, $P(\bar{C}_c)$, $P(v_i^h \mid C_c)$ and $P(v_i^h \mid \bar{C}_c)$ obtained from the model training.
According to the derivation by Yang et al. [29], the expressions of the POS domain, NEG domain and BND domain of the 3WD-INB are shown in (8). According to this classification rule, the three-way classification of sample $x_h$ can be realized. The POS domain, NEG domain and BND domain indicate that the credit evaluation result of the user is good, bad and delayed judgment, respectively. For a user assigned to delayed judgment, the bank can conduct a further review and then determine the credit evaluation result.
$$POS_{(\alpha', \beta')}(C_c) = \{x_h \mid P' \ge \alpha'\}, \quad NEG_{(\alpha', \beta')}(C_c) = \{x_h \mid P' \le \beta'\}, \quad BND_{(\alpha', \beta')}(C_c) = \{x_h \mid \beta' < P' < \alpha'\} \tag{8}$$
In (8), the calculation of $P'$ is similar to the calculation of the sample probability in the Bayesian principle. The calculation process is shown in (9). Taking logarithms of both sides turns the product of probabilities into a sum, which improves the calculation efficiency.
$$P' = \sum_{i=1}^{n} \log \frac{P(v_i^h \mid C_c)}{P(v_i^h \mid \bar{C}_c)} \tag{9}$$
$\alpha'$ and $\beta'$ are calculated from the parameters $\alpha$ and $\beta$, and the calculation process is shown in (10). In practical applications, the client can reasonably adjust the parameters $\alpha$ and $\beta$ according to its own situation, so as to achieve the most efficient classification.
$$\alpha' = \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\alpha}{1 - \alpha}, \quad \beta' = \log \frac{P(\bar{C}_c)}{P(C_c)} + \log \frac{\beta}{1 - \beta} \tag{10}$$
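The classification rule can be sketched in a few lines: compute the log-ratio score of a sample, convert $\alpha$ and $\beta$ into log-domain thresholds, and compare. This is a minimal sketch, assuming the POS domain uses "greater than or equal", the NEG domain uses "less than or equal", and both parameters lie strictly between 0 and 1 so the logarithms are defined. The function names and the probability values in the usage lines are hypothetical.

```python
import math

def log_thresholds(p_c, p_not_c, alpha, beta):
    """Log-domain thresholds derived from the priors and from alpha, beta."""
    base = math.log(p_not_c / p_c)
    return base + math.log(alpha / (1 - alpha)), base + math.log(beta / (1 - beta))

def log_ratio_score(lik_c, lik_not_c):
    """Sum of log-ratios of the per-attribute conditional probabilities."""
    return sum(math.log(a / b) for a, b in zip(lik_c, lik_not_c))

def three_way(score, alpha_log, beta_log):
    """POS: accept as good credit; NEG: reject; BND: defer to manual review."""
    if score >= alpha_log:
        return "POS"
    if score <= beta_log:
        return "NEG"
    return "BND"

# Hypothetical values: equal priors, alpha = 0.60, beta = 0.05.
alpha_log, beta_log = log_thresholds(0.5, 0.5, alpha=0.60, beta=0.05)
clear_good = three_way(log_ratio_score([0.75, 0.5], [0.25, 0.25]), alpha_log, beta_log)
uncertain = three_way(log_ratio_score([0.25, 0.25], [0.75, 0.5]), alpha_log, beta_log)
clear_bad = three_way(log_ratio_score([0.05, 0.1], [0.9, 0.8]), alpha_log, beta_log)
print(clear_good, uncertain, clear_bad)
```

With a low $\beta$ such as 0.05, a sample is rejected outright only when the evidence against it is strong, and moderately unfavorable samples fall into the boundary domain for review.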

3.3. Federation Strategy

The federated process under the FTwNB algorithm framework is reflected in the incremental learning process, as shown in Figure 3, which is very similar to the local model in Section 3.2. The difference is that after the local client completes the incremental learning, the results of the incremental learning are uploaded to the federated server. The federated server is mainly divided into two parts: the calculation server and the parameter server. The calculation server performs federated calculations on the results from multiple bank clients, and the parameter server is responsible for storing and recording global parameters and returning them to each bank client.
Assuming that the updated training set based on incremental learning is $U = \{x_1, x_2, \ldots, x_N\}$, the parameters to be federated are extracted as $\eta = \{\eta_1, \eta_2, \ldots, \eta_{19}\}$. Let $m = |v_i^h|$ be the number of distinct feature values that appear under feature $a_i$; then $\eta_i$ is a matrix with 2 rows and $m$ columns, and the value at each position of the matrix is the number of samples with the corresponding feature value and category. In simple terms, $\eta$ is the statistical result of the training set $U$ according to the features and categories.
When the calculation server receives the federated parameters $\eta$ from the different bank clients, it sums them to obtain the global parameter $\eta$, as shown in (11).
$$\eta = \eta^{1} + \eta^{2} + \cdots + \eta^{r} \tag{11}$$
where $\eta^{r}$ represents the federated parameters from the $r$-th bank client.
Then, $P(C_c)$ and $P(v_i^h \mid C_c)$ can be recalculated according to $\eta$, as shown in (12) and (13).
$$P(C_c) = \frac{|C_c|}{|U|} \tag{12}$$
$$P(v_i^h \mid C_c) = \frac{|m(a_i, v_i^h) \cap C_c|}{|C_c|} \tag{13}$$
Among them, $|U|$ is the total number of training samples, $|C_c|$ is the number of samples of category $C_c$ in the training samples, and $m(a_i, v_i^h) \cap C_c$ is the set of objects in $C_c$ whose value on the $i$-th attribute is $v_i^h$; all of these quantities can be obtained from $\eta$.
Finally, the calculation server passes the global parameters $P(C_c)$ and $P(v_i^h \mid C_c)$ to the parameter server, the parameter server returns them to each bank client, and each bank client can use the new global parameters to carry out the three-way classification.
The global parameters calculated using the FTwNB algorithm framework’s federation strategy are consistent with the results obtained through direct data integration calculations. However, this strategy only transmits the characteristic statistical results of the data. Therefore, the FTwNB algorithm framework’s federation strategy is lossless. It is worth noting that the classifier used in this paper includes an incremental learning step. Therefore, in the case of the federation, the dataset eliminated by each bank client may differ from the dataset eliminated by the 3WD-INB classifier integrated with data training. Even though the FTwNB algorithm framework employs a lossless federation strategy, the global model of the FTwNB and the 3WD-INB model trained with integrated data will produce different results.
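The lossless property of the count-based aggregation can be illustrated with a small sketch. It is not the authors' implementation: instead of one 2-by-m matrix per feature, the counts are kept in a single `Counter` keyed by (attribute index, value, category), which carries the same information. The function names and the two toy clients are hypothetical, and the incremental filtering step is left out, so only the summation itself is demonstrated.

```python
from collections import Counter

def local_counts(X, y):
    """Statistics one client uploads: class counts and per-attribute value counts."""
    eta = Counter()
    for row, label in zip(X, y):
        eta[("class", label)] += 1
        for i, v in enumerate(row):
            eta[(i, v, label)] += 1
    return eta

def aggregate(etas):
    """Server side: element-wise sum of the clients' count statistics."""
    total = Counter()
    for eta in etas:
        total.update(eta)
    return total

def global_params(eta, classes):
    """Recompute the prior and conditional probabilities from the summed counts."""
    n = sum(eta[("class", c)] for c in classes)
    prior = {c: eta[("class", c)] / n for c in classes}
    cond = {key: count / eta[("class", key[2])]
            for key, count in eta.items() if key[0] != "class"}
    return prior, cond

# Two hypothetical bank clients with (housing, income) attributes.
bank_a = ([("own", "high"), ("rent", "low")], ["good", "bad"])
bank_b = ([("own", "low"), ("rent", "low")], ["good", "bad"])

federated = aggregate([local_counts(*bank_a), local_counts(*bank_b)])
pooled = local_counts(bank_a[0] + bank_b[0], bank_a[1] + bank_b[1])
prior, cond = global_params(federated, ["good", "bad"])
print(federated == pooled, prior)
```

Only counts leave each client, yet the summed counts equal those of the pooled data, which is why the global parameters match the result of training on integrated data when the same samples are retained.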

3.4. Asymmetric Encryption

Asymmetric encryption is an encryption method that uses a pair of keys (a public key and a private key) for encryption and decryption. Its security rests on mathematical problems that are considered hard to solve, so encrypted data cannot feasibly be recovered without the private key. The public key can be released openly and is used to encrypt data, while the private key must remain secret and is used to decrypt the encrypted data.
In this paper, with the help of asymmetric encryption technology, the local parameters of each bank are first encrypted and then uploaded. The process is shown in Figure 4. By using asymmetric encryption technology, it is ensured that even if the local parameters are maliciously stolen during the upload, the problem of privacy leakage will not occur due to the fact that the thief does not have the corresponding private key for the local parameter. This effectively protects the privacy of the data, which is of great significance in practical applications. This article uses the Cryptography library to implement the asymmetric encryption, which provides complete cryptographic functionality.
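The encrypt-then-upload step can be sketched with the `cryptography` package. The paper does not state which asymmetric algorithm or padding it uses, so RSA with OAEP padding, the 2048-bit key size, the JSON serialization and the tiny parameter dictionary below are all assumptions made for illustration.

```python
import json

from cryptography.hazmat.primitives import hashes
from cryptography.hazmat.primitives.asymmetric import padding, rsa

# Key pair: the private key stays with the decrypting party, the public key is shared.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

# Assumption: RSA-OAEP with SHA-256; the paper does not name the scheme.
oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Hypothetical local parameter: one 2-by-2 count matrix for a single attribute.
local_parameters = {"a1": [[3, 1], [2, 4]]}
plaintext = json.dumps(local_parameters).encode("utf-8")

ciphertext = public_key.encrypt(plaintext, oaep)                 # client side
recovered = json.loads(private_key.decrypt(ciphertext, oaep))    # server side
print(recovered == local_parameters)
```

RSA-OAEP with a 2048-bit key and SHA-256 can encrypt at most 190 bytes in one call, so a full set of count matrices would normally require a hybrid scheme in which RSA protects a symmetric key that encrypts the payload.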

3.5. Establishment of FTwNB

Under the FTwNB algorithm framework, the bank credit prediction is mainly carried out according to the process shown in Figure 5.
The specific steps and explanations of the FTwNB are as follows:
Step 1: The calculation server configures an independent private key and public key for each bank client, which are used for data encryption by the bank client.
Step 2: Each bank client uses its existing data for local FTwNB model training according to Formulas (1)–(7) in Section 3.2, obtains the required federated parameter $\eta$ after incremental learning and encrypts the parameter with the public key.
Step 3: Each bank client uploads the encrypted parameters to the calculation server, and the calculation server decrypts them with the corresponding private key and calculates the global parameters using Formulas (11)–(13).
Step 4: The calculation server transmits the global parameters to the parameter server, and the parameter server records and saves the global parameters and returns them to each bank client.
Step 5: The bank client re-updates the parameters of the local model according to the global parameters, and obtains the global model according to Formulas (8)–(10).
Step 6: Each bank client uses the global model to evaluate the credit risk of new customers.

4. Simulation Results and Analysis

All the simulation results in this paper were obtained by programming in Python on a machine with an Intel(R) Core(TM) i7-10750H CPU @ 2.60 GHz and 16 GB of RAM.

4.1. Evaluation Indicators

For the evaluation indicators under the three-way decision system, drawing on the evaluation methods of Yang et al. [29] and Jia et al. [30], the F1 score ($F1$) and precision ($Precision$) are used to evaluate the classification performance. The confusion matrix of the three-way decision is shown in Table 2. In the table, $n_{xy}$ represents the number of samples that the classifier assigns to domain $x$ and whose actual category is $y$.
The accuracy rate ($ACC$) describes the overall classification accuracy rate, and the calculation formula is shown in (14).
$$ACC = \frac{n_{PP} + n_{NN}}{n_{PP} + n_{PN} + n_{NP} + n_{NN}} \tag{14}$$
The recall rate ($Recall$) is the ability of the classifier to find all the positive samples, and its value is 1 at best and 0 at worst. For the three-way decision, the positive samples assigned to the BND domain and the NEG domain must also be counted, and the calculation formula is shown in (15).
$$Recall = \frac{n_{PP}}{n_{PP} + n_{BP} + n_{NP}} \tag{15}$$
The F1 score ($F1$) is computed here as the harmonic mean of the accuracy rate and the recall rate, and its maximum value is 1 and its minimum value is 0. The calculation formula is shown in (16).
$$F1 = \frac{2 \times ACC \times Recall}{ACC + Recall} \tag{16}$$
Precision ($Precision$) is the ability of the classifier not to mark negative examples as positive examples. The $Precision$ value is 1 at best and 0 at worst. The calculation formula is shown in (17).
$$Precision = \frac{n_{PP}}{n_{PP} + n_{PN}} \tag{17}$$
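The four indicators can be computed directly from the three-way confusion counts. The sketch below follows the formulas as they are written in this section, including an F1 score built from the accuracy rate and the recall rate. The indexing convention (first index is the assigned domain, second index is the actual category) is an assumption read off the recall and precision formulas, and the counts in the example are invented for illustration.

```python
def three_way_metrics(n):
    """n[(x, y)]: number of samples assigned to domain x ('P', 'B' or 'N')
    whose actual category is y ('P' or 'N')."""
    acc = (n["P", "P"] + n["N", "N"]) / (
        n["P", "P"] + n["P", "N"] + n["N", "P"] + n["N", "N"]
    )
    # Positives deferred to the boundary domain still count against recall.
    recall = n["P", "P"] / (n["P", "P"] + n["B", "P"] + n["N", "P"])
    f1 = 2 * acc * recall / (acc + recall)
    precision = n["P", "P"] / (n["P", "P"] + n["P", "N"])
    return {"ACC": acc, "Recall": recall, "F1": f1, "Precision": precision}

# Hypothetical confusion counts, not taken from the paper's tables.
counts = {
    ("P", "P"): 40, ("P", "N"): 10,
    ("B", "P"): 30, ("B", "N"): 20,
    ("N", "P"): 10, ("N", "N"): 40,
}
metrics = three_way_metrics(counts)
print(metrics)
```

In this example the boundary domain holds 30 actual positives, which lowers the recall to 0.5 even though the accuracy over the decided samples is 0.8.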

4.2. Parameter Setting and Analysis of Experimental Results

4.2.1. Parameter Setting Method

The important parameters in the FTwNB model are α and β , which directly affect the performance of the credit classification. Because the number of parameters is small, this paper uses a relatively simple grid search method to find the optimal combination of α and β .
Firstly, a parameter grid is constructed to define the parameters. Both α and β take 0.05 as the step size, increasing from 0.05 to 0.95.
Using the average value of the F 1 and P r e c i s i o n as the evaluation function, calculate the evaluation function values corresponding to all the grids, and the parameter combination ( α , β ) corresponding to the maximum value is the optimal parameter.
Taking the FTwNB model in simulation case 5 as an example, the gridded search results are shown in Figure 6. It can be seen from the figure that the optimal parameter value combination ( α , β ) is (0.60, 0.05).
Then, we searched for the optimal parameter combination of all the models in all the simulation situations, and the results are shown in Table 3 and Table 4. The follow-up experimental results were carried out under the optimal parameter combination.
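The grid search described in this subsection can be sketched as follows. The `evaluate` callback stands in for training and scoring the model at a given parameter pair, and the toy callback in the usage lines is purely synthetic, not a result from the paper. Skipping pairs where $\beta$ is not smaller than $\alpha$ is an assumption based on the usual ordering of the two thresholds.

```python
def grid_search(evaluate, low=0.05, high=0.95, step=0.05):
    """Try every (alpha, beta) pair on the grid and keep the one that
    maximizes the average of F1 and Precision."""
    n_steps = round((high - low) / step)
    grid = [round(low + k * step, 2) for k in range(n_steps + 1)]
    best = None
    for alpha in grid:
        for beta in grid:
            if beta >= alpha:          # assumption: keep beta below alpha
                continue
            f1, precision = evaluate(alpha, beta)
            score = (f1 + precision) / 2
            if best is None or score > best[0]:
                best = (score, alpha, beta)
    return best

def toy_evaluate(alpha, beta):
    """Synthetic stand-in for model evaluation, peaking at (0.7, 0.3)."""
    return 1 - abs(alpha - 0.7), 1 - abs(beta - 0.3)

best_score, best_alpha, best_beta = grid_search(toy_evaluate)
print(best_score, best_alpha, best_beta)
```

With a step of 0.05 the grid has 19 values per parameter, so an exhaustive search stays cheap even though the model is re-evaluated at every admissible pair.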

4.2.2. Analysis of Experimental Results

After the simulation experiments on the five simulation situations, the final results of the FTwNB obtained by using the German Credit dataset are shown in Table 5 and Table 6, which represent the results of the cases where the number of clients is 4 and 5, respectively.
On the German Credit dataset, compared with the local client model, for simulation case 1, the FTwNB has a more obvious improvement in the P r e c i s i o n , the highest is increased from the original 0.8133 to 0.9044, an increase of 11.2% from the original. For the F 1 , compared to the optimal value of 0.8072 for optimal client 3, the FTwNB decreased by 0.0043, but it brought an increase of 0.0823 in the P r e c i s i o n for client 3. Overall, the performance of the FTwNB model still improved. For simulation case 2, the F 1 and P r e c i s i o n indicators of the FTwNB are higher than those of four local clients, indicating that the FTwNB has achieved a good performance improvement. For simulation case 3, both the F 1 and P r e c i s i o n indicators are improved, and the P r e c i s i o n improvement is relatively large. For simulation case 4, the maximum values of the F 1 and P r e c i s i o n improvements are 0.0635 and 0.0917, respectively. For simulation case 5, the improvement in the F 1 is small, and the improvement in the P r e c i s i o n is relatively large. In order to show the performance improvement of the FTwNB model more clearly, we draw a comparison chart of the model performance improvement, as shown in Figure 7. It can be seen from Figure 7 that the performance improvement of the FTwNB model is relatively stable compared with the local client model, indicating that the FTwNB model can help banks improve the accuracy and stability of credit ratings.
After running the five simulation cases, the final results of the FTwNB on the Default of Credit Card Clients dataset are shown in Table 7 and Table 8, which give the results for 4 and 5 clients, respectively.
On the Default of Credit Card Clients dataset, compared with the local client models, for simulation case 2 the Precision and F1 of the FTwNB are higher than those of all four local clients, and the F1 is close to that of integrated training. For simulation cases 1, 3, 4, and 5, both Precision and F1 improve.
The comparison with the integrated training model is shown in Table 9 and Table 10. These tables show that the federation strategy of the FTwNB does not lose performance in most cases. Although performance drops in a few cases, the FTwNB improves data and model security, which is more meaningful for practical applications.
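The upgrade and decline labels in Table 9 follow directly from the scores reported in Table 5 and Table 6. As a rough sketch of how such labels can be derived (the function names and the rule that a tie counts as a decline are assumptions made here for illustration), the German Credit comparison can be reproduced in Python:

```python
# (F1, Precision) pairs copied from Tables 5 and 6 (German Credit dataset).
INTEGRATED = {1: (0.8009, 0.9037), 2: (0.8230, 0.7711), 3: (0.8391, 0.8030),
              4: (0.7862, 0.8865), 5: (0.8584, 0.7632)}
FTWNB = {1: (0.8029, 0.9044), 2: (0.8145, 0.8258), 3: (0.8454, 0.8187),
         4: (0.8007, 0.8784), 5: (0.8560, 0.7915)}


def label(federated, integrated):
    """Return 'upgrade' if the federated score is strictly higher, else 'decline'.

    Treating a tie as 'decline' is an assumption of this sketch.
    """
    return "upgrade" if federated > integrated else "decline"


def compare(ftwnb, integrated):
    """Return {case: (F1 label, Precision label)} for every simulation case."""
    return {case: tuple(label(f, i) for f, i in zip(ftwnb[case], integrated[case]))
            for case in sorted(ftwnb)}


comparison = compare(FTWNB, INTEGRATED)
for case, (f1_label, precision_label) in comparison.items():
    print(f"Case {case}: F1 {f1_label}, Precision {precision_label}")
```

The same function applied to the scores in Table 7 and Table 8 would give the labels for the Default of Credit Card Clients dataset.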

4.2.3. Analysis of Experimental Results of Parameter Sensitivity

The Precision and F1 of the FTwNB were measured on the German Credit dataset while the confidence parameter gamma was varied from 0.50 to 1.00 in steps of 0.05. The experimental results are shown in Table 11 and Table 12.
The confidence parameter gamma plays a crucial role in the model. It screens the quality of bank data so that only high-quality data are used for model training. If gamma is set too low, poor-quality data are incorporated, which lowers the overall model performance. Conversely, if gamma is set too high, some high-quality data that should have been included may be filtered out by mistake, which also hurts the overall performance. It is therefore critical to find a gamma value that balances data quality against model performance.
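The trade-off described for gamma can be illustrated with a small Python sketch. It assumes that each candidate training sample carries a confidence score in [0, 1] and that a sample is kept when its score reaches gamma; this screening rule, the function names, and the sample scores are hypothetical and only stand in for the actual 3WD-INB procedure.

```python
def gamma_grid(start=0.50, stop=1.00, step=0.05):
    """Candidate gamma values, matching the 0.05 step used in the sweep."""
    count = round((stop - start) / step)
    return [round(start + i * step, 2) for i in range(count + 1)]


def screen_samples(samples, confidences, gamma):
    """Keep samples whose confidence score reaches gamma (assumed rule)."""
    return [s for s, c in zip(samples, confidences) if c >= gamma]


# Hypothetical confidence scores for five incremental samples.
samples = ["s1", "s2", "s3", "s4", "s5"]
confidences = [0.95, 0.62, 0.81, 0.48, 0.73]

for gamma in (0.50, 0.75, 0.90):
    kept = screen_samples(samples, confidences, gamma)
    print(gamma, kept)
```

With these made-up scores, a low gamma admits almost every sample, while a high gamma keeps only one, which mirrors the two failure modes described above.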

5. Conclusions and Future Work

In this paper, we propose a federated three-way decision incremental naive Bayes bank user credit risk assessment model (FTwNB) that supports asymmetric encryption, aimed at the user credit evaluation task in bank lending. We design a non-destructive federation strategy that breaks the data islands between banks through federated learning. We also use the 3WD-INB classifier, which applies incremental learning to filter poor data out of the training set and applies the three-way decision to classify samples three ways. As a result, new users are not only divided into those with and those without credit risk; banks can also conduct a further manual review of uncertain users, which reduces the losses caused by the forced division of traditional two-way classification. The feasibility of the model is verified through five groups of simulations in different situations.
The advantage of this paper is that it combines federated learning with the three-way decision, both of which are of great significance for solving practical problems. The FTwNB model constructed in this paper has also achieved satisfactory results in improving classification performance and protecting data privacy. The limitation of this paper is that the 3WD-INB classifier assumes conditional independence among attributes. In real applications, if datasets that violate this assumption are not handled properly, the performance of the model may be reduced.
In the future, we will consider using the semi-naive Bayesian method or the Bayesian network method to relax the attribute conditional independence requirement, so that the FTwNB model can accept a wider range of data. Data imbalance also remains a challenge in credit risk assessment, especially when data are integrated across banks. Future work will focus on further improving the model's ability to handle unbalanced data, such as using diverse enhanced federated learning or deep-learning methods.

Author Contributions

Methodology, S.H.; project administration, L.W. and G.Y.; resources, C.Z.; software, J.F.; supervision, L.W.; validation, J.R.; visualization, Z.Y.; writing—original draft, S.H. and Z.Y.; writing—review and editing, L.W., C.Z. and G.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Basic Scientific Research Business Expenses of Hebei Provincial Universities (JST2022001), the Tangshan Science and Technology Project (22130225G) and the Tangshan Science and Technology Bureau (21130211D).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Altman, E.I. Financial Ratios, Discriminant Analysis and the Prediction of Corporate Bankruptcy. J. Financ. 1968, 23, 589–609.
  2. Altman, E.I.; Haldeman, R.G.; Narayanan, P. ZETA™ Analysis: A New Model to Identify Bankruptcy Risk of Corporations. J. Bank. Financ. 1977, 1, 29–54.
  3. Zhang, Q. Personal Credit Risk Assessment of BP Neural Network Commercial Banks Based on PSO-GA Algorithm Optimization. Agro Food Ind. Hi-Tech 2017, 28, 2580–2584.
  4. Liu, L. A Self-Learning BP Neural Network Assessment Algorithm for Credit Risk of Commercial Bank. Wirel. Commun. Mob. Comput. 2022, 2022, 9650934.
  5. Rao, C.; Liu, Y.; Goh, M. Credit Risk Assessment Mechanism of Personal Auto Loan Based on PSO-XGBoost Model. Complex Intell. Syst. 2023, 9, 1391–1414.
  6. Wen, H.; Sui, X.; Lu, S. Study on Effect of Consumer Information in Personal Credit Risk Evaluation. Complexity 2022, 2022, 7340010.
  7. Zhou, Y.; Cui, J.; Zhou, L.; Sun, H.; Liu, S. Study on the Evaluation of Personal Credit Risk Based on the Improved Random Forest Model. Credit. Ref. 2020, 38, 28–32.
  8. Chen, X.; Tao, L. Credit Risk Assessment of SME Supply Chain Finance Based on MLP Neural Network. J. Hunan Univ. Sci. Technol. (Nat. Sci. Ed.) 2021, 36, 91–99.
  9. Konečný, J.; McMahan, H.B.; Yu, F.X.; Richtárik, P.; Suresh, A.T.; Bacon, D. Federated Learning: Strategies for Improving Communication Efficiency. arXiv 2017, arXiv:1610.05492.
  10. Zhao, L.; Huang, J. A Distribution Information Sharing Federated Learning Approach for Medical Image Data. Complex Intell. Syst. 2023, 9, 5625–5636.
  11. Ho, T.-T.; Tran, K.-D.; Huang, Y. FedSGDCOVID: Federated SGD COVID-19 Detection under Local Differential Privacy Using Chest X-Ray Images and Symptom Information. Sensors 2022, 22, 3728.
  12. Kandati, D.R.; Gadekallu, T.R. Genetic Clustered Federated Learning for COVID-19 Detection. Electronics 2022, 11, 2714.
  13. Tian, J.; Tsai, P.-W.; Wang, F.; Zhang, K.; Xiao, H.; Chen, J. An Optional Splitting Extraction Based Gain-AUPRC Balanced Strategy in Federated XGBoost for Mitigating Imbalanced Credit Card Fraud Detection. Int. J. Bio-Inspired Comput. 2022, 20, 82–93.
  14. He, H.; Wang, Z.; Jain, H.; Jiang, C.; Yang, S. A Privacy-Preserving Decentralized Credit Scoring Method Based on Multi-Party Information. Decis. Support. Syst. 2023, 166, 113910.
  15. Qin, Z.; Li, G.Y.; Ye, H. Federated Learning and Wireless Communications. IEEE Wirel. Commun. 2021, 28, 134–140.
  16. Jin, Z.; Liang, Z.; He, M.; Peng, Y.; Xue, H.; Wang, Y. A Federated Semi-Supervised Learning Approach for Network Traffic Classification. Int. J. Netw. Manag. 2023, 33, e2222.
  17. Kanagavelu, R.; Dua, K.; Garai, P.; Thomas, N.; Elias, S.; Elias, S.; Wei, Q.; Yong, L.; Rick, G.S.M. FedUKD: Federated UNet Model with Knowledge Distillation for Land Use Classification from Satellite and Street Views. Electronics 2023, 12, 896.
  18. Lu, G.; Hu, X.; Yang, M.; Xu, M. Ship AIS Trajectory Classification Algorithm Based on Federated Random Forest. Netinfo Secur. 2022, 22, 67–76.
  19. Zhou, B.; Yao, Y.; Luo, J. A Three-Way Decision Approach to Email Spam Filtering. In Proceedings of the Advances in Artificial Intelligence: 23rd Canadian Conference on Artificial Intelligence, Canadian AI 2010, Ottawa, ON, Canada, 31 May–2 June 2010; Proceedings 23. Springer: Berlin/Heidelberg, Germany, 2010; pp. 28–39.
  20. Zhang, Y.; Miao, D.; Zhang, Z. Multi-granularity Text Sentiment Classification Model Based on Three-way Decisions. Comput. Sci. 2017, 44, 188–193+215.
  21. Ma, X.; Huang, C.; Jiang, C. Progressive text classification based on three-way decision and KNN algorithm. Appl. Res. Comput. 2023, 40, 1065–1069.
  22. Yue, X.; Chen, Y.; Yuan, B.; Lv, Y. Three-Way Image Classification with Evidential Deep Convolutional Neural Networks. Cogn. Comput. 2022, 14, 2074–2086.
  23. Li, W.; Huang, Z.; Li, Q. Three-Way Decisions Based Software Defect Prediction. Knowl.-Based Syst. 2016, 91, 263–274.
  24. Yuan, G.; Yu, H. Method of Three-way Decision Spam Filtering Based on Head Information of E-mail. Comput. Sci. 2017, 44, 74–77+114.
  25. Zhou, B.; Yao, Y.; Luo, J. Cost-Sensitive Three-Way Email Spam Filtering. J. Intell. Inf. Syst. 2014, 42, 19–45.
  26. Zhang, H.-R.; Min, F. Three-Way Recommender Systems Based on Random Forests. Knowl.-Based Syst. 2016, 91, 275–286.
  27. Zhang, C.; Duan, X.; Liu, F.; Li, X.; Liu, S. Three-Way Naive Bayesian Collaborative Filtering Recommendation Model for Smart City. Sustain. Cities Soc. 2022, 76, 103373.
  28. Maldonado, S.; Peters, G.; Weber, R. Credit Scoring Using Three-Way Decisions with Probabilistic Rough Sets. Inf. Sci. 2020, 507, 700–714.
  29. Yang, Z.; Ren, J.; Zhang, Z.; Sun, Y.; Zhang, C.; Wang, M.; Wang, L. A New Three-Way Incremental Naive Bayes Classifier. Electronics 2023, 12, 1730.
  30. Jia, X.; Shang, L. How to Evaluate Three-Way Decisions Based Binary Classification? In Proceedings of the Rough Sets, Fuzzy Sets, Data Mining, and Granular Computing: 15th International Conference, RSFDGrC 2015, Tianjin, China, 20–23 November 2015; Proceedings. Springer: Berlin/Heidelberg, Germany, 2015; pp. 366–375.
Figure 1. The three-way decision model diagram of this article.
Figure 2. The 3WD-INB model schematic diagram.
Figure 3. Federal process.
Figure 4. Encryption process.
Figure 5. The flow of the FTwNB algorithm.
Figure 6. Parametric search results for the FTwNB.
Figure 7. Model performance improvement comparison chart.
Table 1. Description of simulated data.
| Case Number | Number of Clients | Percentage of Data Owned |
| --- | --- | --- |
| 1 | 4 | 4:4:3:4 |
| 2 | 4 | 188:188:187:187 |
| 3 | 5 | 1:1:1:1:1 |
| 4 | 5 | 10:8:20:12:25 |
| 5 | 5 | 2:3:2:3:5 |
Table 2. The confusion matrix of the three-way decision.
| | Actually a Positive Example | Actually a Negative Example |
| --- | --- | --- |
| Predicted as POS domain | n_PP | n_PN |
| Predicted as BND domain | n_BP | n_BN |
| Predicted as NEG domain | n_NP | n_NN |
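To make the six cells of Table 2 concrete, the following Python sketch counts them from true labels and predicted probabilities. It assumes the common three-way rule in which a sample goes to the POS domain when its positive-class probability is at least alpha, to the NEG domain when it is at most beta, and to the BND domain (delayed judgment) otherwise. The threshold values 0.7 and 0.3, the function names, and the sample data are placeholders chosen for illustration, not values from this study.

```python
from collections import Counter


def three_way_region(p_positive, alpha, beta):
    """Map a positive-class probability to POS, BND, or NEG (needs beta < alpha)."""
    if p_positive >= alpha:
        return "POS"
    if p_positive <= beta:
        return "NEG"
    return "BND"


def three_way_confusion(labels, probabilities, alpha=0.7, beta=0.3):
    """Count the cells n_PP, n_PN, n_BP, n_BN, n_NP, n_NN of Table 2.

    The first letter after 'n_' is the predicted domain (P, B, or N),
    the second is the actual class (P for positive, N for negative).
    """
    counts = Counter()
    for is_positive, p in zip(labels, probabilities):
        region = three_way_region(p, alpha, beta)
        key = "n_" + region[0] + ("P" if is_positive else "N")
        counts[key] += 1
    return counts


# Hypothetical labels (True = positive example) and probabilities.
labels = [True, True, False, False, True, False]
probabilities = [0.9, 0.5, 0.8, 0.1, 0.2, 0.4]

matrix = three_way_confusion(labels, probabilities)
for cell in ("n_PP", "n_PN", "n_BP", "n_BN", "n_NP", "n_NN"):
    print(cell, matrix[cell])
```

Samples that land in the BND row are the ones a bank would hold back for manual review rather than forcing into the good or bad class.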
Table 3. Optimal parameter combination for the German Credit dataset.
| Model | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
| --- | --- | --- | --- | --- | --- |
| Client 1 | (0.65, 0.05) | (0.70, 0.05) | (0.20, 0.05) | (0.75, 0.05) | (0.15, 0.05) |
| Client 2 | (0.55, 0.05) | (0.40, 0.10) | (0.35, 0.10) | (0.65, 0.05) | (0.30, 0.10) |
| Client 3 | (0.70, 0.05) | (0.80, 0.05) | (0.25, 0.05) | (0.80, 0.10) | (0.25, 0.05) |
| Client 4 | (0.75, 0.05) | (0.60, 0.05) | (0.10, 0.05) | (0.60, 0.05) | (0.15, 0.10) |
| Client 5 | - | - | (0.10, 0.05) | (0.80, 0.10) | (0.25, 0.05) |
| Integrating Training ¹ | (0.75, 0.05) | (0.40, 0.05) | (0.40, 0.05) | (0.75, 0.05) | (0.25, 0.15) |
| FTwNB | (0.65, 0.05) | (0.30, 0.05) | (0.35, 0.05) | (0.60, 0.05) | (0.60, 0.05) |
¹ Integrating Training: the data of all the clients are first integrated into one large dataset and then modeled using the 3WD-INB.
Table 4. Optimal parameter combination for the Default of Credit Card Clients dataset.
| Model | Case 1 | Case 2 | Case 3 | Case 4 | Case 5 |
| --- | --- | --- | --- | --- | --- |
| Client 1 | (0.65, 0.05) | (0.30, 0.05) | (0.10, 0.05) | (0.20, 0.05) | (0.25, 0.05) |
| Client 2 | (0.75, 0.05) | (0.75, 0.05) | (0.10, 0.05) | (0.10, 0.05) | (0.20, 0.05) |
| Client 3 | (0.75, 0.05) | (0.45, 0.05) | (0.10, 0.05) | (0.15, 0.05) | (0.35, 0.15) |
| Client 4 | (0.65, 0.05) | (0.30, 0.05) | (0.15, 0.05) | (0.10, 0.05) | (0.30, 0.10) |
| Client 5 | - | - | (0.15, 0.05) | (0.10, 0.05) | (0.30, 0.10) |
| Integrating Training | (0.65, 0.05) | (0.35, 0.05) | (0.10, 0.05) | (0.10, 0.05) | (0.30, 0.10) |
| FTwNB | (0.65, 0.05) | (0.20, 0.05) | (0.10, 0.05) | (0.10, 0.05) | (0.30, 0.10) |
Table 5. Results with 4 clients (German Credit dataset).
| Model | Case 1 F1 | Case 1 Precision | Case 2 F1 | Case 2 Precision |
| --- | --- | --- | --- | --- |
| Client 1 | 0.7780 | 0.8367 | 0.7781 | 0.7688 |
| Client 2 | 0.8046 | 0.8133 | 0.8071 | 0.7653 |
| Client 3 | 0.8072 | 0.8221 | 0.7578 | 0.8158 |
| Client 4 | 0.7446 | 0.8214 | 0.7742 | 0.7904 |
| Integrating Training | 0.8009 | 0.9037 | 0.8230 | 0.7711 |
| FTwNB | 0.8029 | 0.9044 | 0.8145 | 0.8258 |
Table 6. Results with 5 clients (German Credit dataset).
| Model | Case 3 F1 | Case 3 Precision | Case 4 F1 | Case 4 Precision | Case 5 F1 | Case 5 Precision |
| --- | --- | --- | --- | --- | --- | --- |
| Client 1 | 0.8203 | 0.7678 | 0.7372 | 0.8133 | 0.8482 | 0.7447 |
| Client 2 | 0.8444 | 0.7810 | 0.7791 | 0.8280 | 0.8490 | 0.7682 |
| Client 3 | 0.8364 | 0.7636 | 0.7598 | 0.8750 | 0.8384 | 0.7799 |
| Client 4 | 0.8329 | 0.7195 | 0.7917 | 0.7901 | 0.8516 | 0.7489 |
| Client 5 | 0.8263 | 0.7154 | 0.7700 | 0.8601 | 0.8402 | 0.7522 |
| Integrating Training | 0.8391 | 0.8030 | 0.7862 | 0.8865 | 0.8584 | 0.7632 |
| FTwNB | 0.8454 | 0.8187 | 0.8007 | 0.8784 | 0.8560 | 0.7915 |
Table 7. Results with 4 clients (Default of Credit Card Clients dataset).
| Model | Case 1 F1 | Case 1 Precision | Case 2 F1 | Case 2 Precision |
| --- | --- | --- | --- | --- |
| Client 1 | 0.8593 | 0.8592 | 0.8680 | 0.8383 |
| Client 2 | 0.8534 | 0.8646 | 0.8483 | 0.8585 |
| Client 3 | 0.8518 | 0.8691 | 0.8634 | 0.8435 |
| Client 4 | 0.8597 | 0.8603 | 0.8661 | 0.8371 |
| Integrating Training | 0.8594 | 0.8703 | 0.8758 | 0.8598 |
| FTwNB | 0.8643 | 0.8712 | 0.8754 | 0.8696 |
Table 8. Results with 5 clients (Default of Credit Card Clients dataset).
| Model | Case 3 F1 | Case 3 Precision | Case 4 F1 | Case 4 Precision | Case 5 F1 | Case 5 Precision |
| --- | --- | --- | --- | --- | --- | --- |
| Client 1 | 0.8703 | 0.8252 | 0.8769 | 0.8365 | 0.8805 | 0.8287 |
| Client 2 | 0.8708 | 0.8252 | 0.8766 | 0.8276 | 0.8800 | 0.8313 |
| Client 3 | 0.8707 | 0.8233 | 0.8764 | 0.8359 | 0.8810 | 0.8282 |
| Client 4 | 0.8705 | 0.8288 | 0.8772 | 0.8270 | 0.8804 | 0.8279 |
| Client 5 | 0.8695 | 0.8314 | 0.8769 | 0.8275 | 0.8799 | 0.8306 |
| Integrating Training | 0.8751 | 0.8347 | 0.8865 | 0.8478 | 0.8804 | 0.8500 |
| FTwNB | 0.8784 | 0.8418 | 0.8947 | 0.8563 | 0.8873 | 0.8566 |
Table 9. Comparison of the FTwNB and integrated training model on the German Credit dataset.
| | F1 | Precision |
| --- | --- | --- |
| Case 1 | upgrade | upgrade |
| Case 2 | decline | upgrade |
| Case 3 | upgrade | upgrade |
| Case 4 | upgrade | decline |
| Case 5 | decline | upgrade |
Table 10. Comparison of the FTwNB and integrated training model on the Default of Credit Card Clients dataset.
| | F1 | Precision |
| --- | --- | --- |
| Case 1 | upgrade | upgrade |
| Case 2 | decline | upgrade |
| Case 3 | upgrade | upgrade |
| Case 4 | upgrade | upgrade |
| Case 5 | upgrade | upgrade |
Table 11. The F1 index change of the FTwNB.
| Gamma Parameter | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Case 1 | 0.8001 | 0.8029 | 0.8010 | 0.8012 | 0.7910 | 0.7884 | 0.7952 | 0.7751 | 0.7815 | 0.7611 | 0.7559 |
| Case 2 | 0.8111 | 0.8120 | 0.8145 | 0.8101 | 0.8015 | 0.7991 | 0.7850 | 0.7880 | 0.7915 | 0.7774 | 0.7615 |
| Case 3 | 0.7916 | 0.8080 | 0.8119 | 0.8216 | 0.8369 | 0.8454 | 0.8415 | 0.8310 | 0.8215 | 0.8220 | 0.8181 |
| Case 4 | 0.7419 | 0.7622 | 0.7882 | 0.8007 | 0.7915 | 0.7889 | 0.7905 | 0.7873 | 0.7801 | 0.7799 | 0.7703 |
| Case 5 | 0.7973 | 0.8005 | 0.8219 | 0.8415 | 0.8553 | 0.8560 | 0.8459 | 0.8410 | 0.8316 | 0.8244 | 0.8133 |
Table 12. The Precision index change of the FTwNB.
| Gamma Parameter | 0.50 | 0.55 | 0.60 | 0.65 | 0.70 | 0.75 | 0.80 | 0.85 | 0.90 | 0.95 | 1.00 |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Case 1 | 0.8913 | 0.9044 | 0.8971 | 0.8815 | 0.8805 | 0.8795 | 0.8663 | 0.8585 | 0.8399 | 0.8215 | 0.8110 |
| Case 2 | 0.8110 | 0.8217 | 0.8258 | 0.8116 | 0.8098 | 0.8045 | 0.7993 | 0.7878 | 0.7515 | 0.7751 | 0.7601 |
| Case 3 | 0.7599 | 0.7752 | 0.7951 | 0.8093 | 0.8105 | 0.8187 | 0.8059 | 0.8001 | 0.7951 | 0.7859 | 0.7751 |
| Case 4 | 0.8335 | 0.8419 | 0.8551 | 0.8784 | 0.8709 | 0.8589 | 0.8452 | 0.8343 | 0.8005 | 0.7985 | 0.7746 |
| Case 5 | 0.7221 | 0.7339 | 0.7551 | 0.7641 | 0.7886 | 0.7915 | 0.7878 | 0.7759 | 0.7651 | 0.7555 | 0.7335 |

Hua, S.; Zhang, C.; Yang, G.; Fu, J.; Yang, Z.; Wang, L.; Ren, J. An FTwNB Shield: A Credit Risk Assessment Model for Data Uncertainty and Privacy Protection. Mathematics 2024, 12, 1695. https://doi.org/10.3390/math12111695