4.3.1. Experimental Validation for Chinese SMEs
The data processing workflow began with FRDFS, an algorithm that identified and removed 86 redundant attributes. After streamlining the dataset, the DBSCAN clustering algorithm was applied, which revealed the structural distribution of the financial entities: 9537 core objects, 3650 boundary objects, and 3600 outlier objects. The identification of core objects is particularly important because these objects support model decisions and reflect typical financial behavior. In contrast, outlier objects are usually equated with noise, which may lead to overfitting, and were therefore removed as noise points.
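The core/border/noise partition described above can be sketched with scikit-learn's DBSCAN; the synthetic data and the eps/min_samples values below are illustrative placeholders, not the study's actual settings:

```python
# Illustrative sketch of how DBSCAN separates core, border, and noise points.
# The dataset and parameter values are placeholders, not the paper's settings.
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs

# Synthetic stand-in for the streamlined SME feature matrix
X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.8, random_state=42)

db = DBSCAN(eps=0.5, min_samples=5).fit(X)

core_mask = np.zeros(len(X), dtype=bool)
core_mask[db.core_sample_indices_] = True
noise_mask = db.labels_ == -1
border_mask = ~core_mask & ~noise_mask  # clustered, but not dense enough to be core

print(f"core: {core_mask.sum()}, border: {border_mask.sum()}, noise: {noise_mask.sum()}")
```

Downstream models would then be trained on `X[~noise_mask]`, i.e., with the outlier objects dropped, mirroring the noise-removal step described above.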
To visually substantiate the effectiveness of FRDFS, we utilized t-SNE for dimensionality reduction, projecting the multi-dimensional data onto a two-dimensional plane. The resulting visualization is shown in Figure 3 below:
The visualization effectively demonstrates the clustering tendencies within the dataset post t-SNE reduction, as discerned by the DBSCAN algorithm. Cores are grouped densely, denoting well-defined clusters with sufficient local density to comply with the specified eps and min_samples parameters. The border points form peripheries around these core regions, suggesting a gradation in point density, indicative of the algorithm’s sensitivity in identifying cluster edges. Outliers appear scattered and isolated, reinforcing the robustness of DBSCAN in distinguishing between in-cluster data and noise.
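The projection step behind this visualization can be sketched as follows; the digits dataset stands in for the high-dimensional SME features, and the perplexity value is an assumption rather than the study's setting:

```python
# Illustrative sketch of the t-SNE projection step used for the 2-D view.
# load_digits is a stand-in for the high-dimensional financial features.
from sklearn.manifold import TSNE
from sklearn.datasets import load_digits

X = load_digits().data[:300]  # subsample to keep the sketch fast

emb = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

print(emb.shape)  # each row is a 2-D coordinate ready for scatter plotting
```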
The distribution and separation of clusters provide insight into the dataset’s intrinsic structure and the clustering algorithm’s performance. The clusters vary in size and density, with some clusters being tightly connected, implying a high degree of similarity within these clusters, while others are more dispersed, suggesting variability within the clusters. The presence of outliers reflects the natural disorder within the dataset, emphasizing the need for noise filtering for accurate data modeling. The top 20 features by attribute importance, as screened by FRDFS, are shown in the table below:
Table 3 highlights the top 20 attributes crucial for financial analysis of Chinese SMEs, selected through the 3-LRP framework. Attributes such as “Number of Corporate Enforcements” and “Number of Second-Level Corporate Cancellations” offer insights into the companies’ compliance and stability. The “Tencent Lingkun Regulatory Index” and enforcement-related metrics such as “Number of Shareholder Enforcements” emphasize the framework’s depth in capturing regulatory and governance risks. In particular, the inclusion of secondary and third level penetration indicators highlights the efficacy of the 3-LRP in profiling the financial behavior of SMEs and highlights its role in enhancing fraud detection models.
Regarding whether the accuracy of our model is general or specific to the dataset, we acknowledge that model performance is often influenced by the characteristics of the dataset used. The dataset in this study consists of records from Chinese SMEs, and the reported accuracy reflects this specific dataset. Variations in data characteristics, such as feature distributions, data quality, and noise levels, can affect model performance. For the task of enterprise risk identification on data with high noise and high dimensionality, FRDFS is proposed as a feature-engineering step that selects core features and deletes noise samples. To verify the effectiveness of FRDFS, prediction models with and without FRDFS feature engineering are compared. The specific experimental results are shown in the following table.
In Table 4, the impact of FRDFS on classification performance is evident. All models exhibit an increase in Recall 1 upon incorporating FRDFS, demonstrating its effectiveness in boosting the detection of high-risk companies (labeled as ‘1’). LR, for instance, sees an uplift in Recall 1 from 0.66 to 0.71, and a similar trend is observed with KNN and SVM, where Recall 1 increases from 0.68 to 0.82 and 0.61 to 0.82, respectively. This consistent improvement across models underscores the robustness of FRDFS in enhancing model performance, particularly in detecting high-risk companies, which is a critical aspect of financial fraud detection.
Advanced models, such as TabPFN and LGBM, also show significant improvements when augmented with FRDFS, with TabPFN + FRDFS achieving a Recall 1 of 0.93 and LGBM + FRDFS reaching 0.85. These results highlight the capability of FRDFS in leveraging the strengths of advanced models, further enhancing its ability to identify positive cases more accurately. The FDSV model, in particular, stands out with the highest Accuracy of 0.96 and a strong Recall 1 of 0.93, suggesting that FDSV is highly effective in discerning fraudulent behavior. This is central to the objective of our experiment, indicating that the integration of FRDFS substantially boosts the model’s fraud detection capabilities.
The consistent increase in performance metrics, such as Recall 1 and Accuracy, across various models when incorporating FRDFS points to its robustness and adaptability. FRDFS not only improves the detection of high-risk companies but also enhances the overall classification performance, making it a valuable component in financial fraud detection systems. The ability to maintain high performance across different models and datasets further attests to the reliability and effectiveness of our proposed methodology.
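The with/without comparison described above follows a standard pattern, sketched below with a generic univariate filter standing in for FRDFS; the data, selector, and classifier are illustrative assumptions, not the paper's components:

```python
# Hedged sketch of the with/without-feature-selection comparison.
# SelectKBest is a generic stand-in for FRDFS; data are synthetic.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import recall_score

X, y = make_classification(n_samples=2000, n_features=60, n_informative=10,
                           n_redundant=30, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

# Baseline: all 60 features, including the redundant ones
base = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
r_base = recall_score(y_te, base.predict(X_te))

# With selection: keep the 10 strongest features (FRDFS analogue)
sel = SelectKBest(f_classif, k=10).fit(X_tr, y_tr)
slim = LogisticRegression(max_iter=1000).fit(sel.transform(X_tr), y_tr)
r_sel = recall_score(y_te, slim.predict(sel.transform(X_te)))

print(f"Recall 1 without selection: {r_base:.2f}, with selection: {r_sel:.2f}")
```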
To further assess the impact of feature selection on model performance, we conducted a sensitivity analysis focusing on the number of features used in the models, as shown in Figure 4.
Figure 4 illustrates the effect of varying the number of features on the Accuracy, Recall, and F1-Score of the models.
As shown in Figure 4, model performance improves as the number of features increases, reaching an optimal point at around 30 features. Beyond this point, additional features result in a gradual decline in performance. This suggests that while more features can provide more information, there is a threshold beyond which the inclusion of additional features introduces noise and redundancy, negatively impacting model performance.
The sensitivity analysis demonstrates the importance of selecting an optimal number of features to achieve the best model performance. It highlights the robustness of our feature selection methodology, which effectively balances the inclusion of informative features with the exclusion of irrelevant ones. The results further validate the effectiveness of FRDFS in enhancing model performance by ensuring that the most relevant features are retained.
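A sensitivity sweep of this kind can be sketched as below; the data are synthetic and the scoring setup is a simplification (the selector is fit on the full data, which leaks slightly and would be folded into the CV loop in a careful study), so the peak will not match the paper's ~30 features:

```python
# Minimal sketch of a feature-count sensitivity sweep on synthetic data.
# Note: fitting SelectKBest on all data before CV leaks slightly; a
# rigorous version would put selection inside a Pipeline.
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = make_classification(n_samples=1000, n_features=50, n_informative=15,
                           n_redundant=20, random_state=1)

scores = {}
for k in (5, 15, 30, 50):
    Xk = SelectKBest(f_classif, k=k).fit_transform(X, y)
    scores[k] = cross_val_score(LogisticRegression(max_iter=1000), Xk, y,
                                cv=5, scoring="accuracy").mean()

for k, acc in scores.items():
    print(f"{k:2d} features -> accuracy {acc:.3f}")
```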
Subsequently, to better compare the effects of the different models, the ROC curves after feature selection with FRDFS are shown for all models in Figure 5 below:
As shown in Figure 5, the ROC curves quantitatively describe the classification performance of the different models in distinguishing legitimate and suspicious activities of Chinese SMEs. The AUC of the FDSV model is 0.97, which indicates that the model has a strong capability to distinguish between fraudulent and genuine cases. The AUC of the traditional classifier LR is 0.77, and the AUC of SVM is 0.85, which highlights the stronger differentiation ability of the FDSV model.
The ROC curves in Figure 5 further illustrate the superior performance of our proposed methodology. The FDSV model achieves an AUC of 0.97, significantly higher than the traditional classifiers LR and SVM, which have AUCs of 0.77 and 0.85, respectively. This high AUC value indicates that FDSV has a better capability of distinguishing between fraudulent and genuine cases, which is crucial for reliable fraud detection. The improvement in AUC values across all models when incorporating FRDFS also underscores its effectiveness in enhancing model performance. The robustness and reliability of our proposed methodology are evident from its ability to consistently deliver high performance in fraud detection tasks, making it a highly effective tool for financial fraud detection systems.
4.3.2. Experimental Validation of 3-LRP
Various attributes of financial transactions, such as the number of previous suspicious activities linked to an account or the frequency of high-value transactions, play a critical role in assessing potential fraud risks. By integrating these attributes, the FDSV model efficiently predicts financial fraud. Pursuing this methodology, a comparative experiment was carried out using the 3-LRP model. The results of this experiment are delineated in the table below.
Table 5, Table 6 and Table 7 illustrate the performance differences between single-layer and double-layer penetration methods in assessing financial risks. In the single-layer penetration, a foundational assessment approach is utilized, focusing on a singular risk dimension. Despite its simplicity, this method demonstrates considerable effectiveness, with all key metrics, i.e., accuracy, precision, recall, and F1-score, consistently registering at 0.79. This outcome underscores the value of even basic assessment methods in providing meaningful insights into financial risks.
In contrast, the double-layer penetration method broadens the risk assessment scope by incorporating additional risk dimensions. This expanded approach leads to a notable improvement in accuracy, which rises to 0.87. The precision, recall, and F1-score of the double-layer method, measured at 0.83, 0.75, and 0.86 respectively, also show marked enhancements. These figures indicate a superior ability of the double-layer penetration method in detecting and evaluating risks, compared to its single-layer counterpart. Such an increase in performance metrics highlights the benefits of a more nuanced and comprehensive risk assessment approach in financial fraud detection. The model’s ability to capture risk is further enhanced when the penetration relationship is extended to the third layer. It is worth noting that the Recall 1 of FDSV improves substantially to 0.95, indicating that 3-LRP is able to model the risks that firms may face in a more complete manner.
The results demonstrate the effectiveness of our proposed method. It is important to note that the accuracy reported in this study is specific to the dataset used, which consists of records from Chinese SMEs. The performance of the model may vary when applied to different datasets due to variations in data characteristics, such as feature distributions, data quality, and the presence of noise. To further validate the robustness and generalizability of our model, future work will involve testing the model on diverse datasets.
4.3.3. Experimental Validation of Credit Risk Assessment Dataset
In the following sections, we delve into the experimental results of different models on publicly available financial credit datasets from three different countries: the German GM credit dataset, the Australian credit dataset, and the Japanese CRX credit dataset, again as a binary classification task. The data were collected from Kaggle, the UCI Machine Learning Repository, and other sources. The performance of the FDSV model is validated on these publicly available datasets. The following are the detailed results of these experiments.
As shown in Figure 6, Figure 7, Figure 8, Figure 9, Figure 10 and Figure 11, various models have been evaluated under different datasets to obtain a comprehensive view of the performance of these models in financial fraud detection. By analyzing the precision, recall and F1-scores for both the risk-free and risky categories, together with specific data points, one can draw meaningful conclusions. In the GM dataset, when considering the risk-free category, models such as LGBM and TabPFN exhibit impressive precision, indicating their ability to accurately identify risk-free instances. However, Tabnet stands out with a higher recall of 0.82, emphasizing its effectiveness in capturing true risk-free cases, albeit with a slightly lower precision of 0.76. The balance between precision and recall is evident in their similar F1-scores of 0.79, highlighting the inherent trade-offs in model performance within this category.
The risk categories shown in Figure 6 are critical in detecting real cases of fraud, and FDSV’s high precision of 0.75 is critical in minimizing false positives and avoiding the mislabeling of normal transactions as fraudulent. FDSV also has a respectable recall of 0.76, which highlights its ability to identify many real cases of fraud. FDSV’s advantage in precision and recall results in a high F1-score of 0.75, demonstrating that it is both comprehensive and effective in detecting financial fraud.
In the Australian dataset, within the Risk-Free category, LGBM achieves a precision of 0.77, while TabPFN achieves a precision of 0.79. However, Tabnet stands out with a higher recall of 0.86, highlighting its proficiency in capturing true risk-free cases. This balance between precision and recall is reflected in their similar F1-scores, emphasizing the inherent trade-offs in model performance. Transitioning to the Risk category in the Australian dataset, TabPFN continues to excel with high precision (0.91), essential for minimizing false positives in fraud detection. RF also demonstrates notable improvement in the Risk category, suggesting its effectiveness in identifying risk instances with a precision of 0.86. Once again, FDSV showcases a balanced performance in both precision and recall, maintaining high precision in this category, achieving a precision of 0.82 and an impressive recall of 0.93.
Moving to the CRX dataset, in the Risk-Free category, LGBM achieves a precision of 0.77, and TabPFN achieves a precision of 0.79. However, TabPFN stands out with a higher recall of 0.91, indicating its proficiency in capturing true risk-free cases. As observed in previous datasets, FDSV maintains a balanced performance in both precision and recall, resulting in a competitive F1-score, with a precision of 0.82 and a remarkable recall of 0.93. In the Risk category of the CRX dataset, TabPFN once again excels with high precision (0.85), crucial for reducing false positives in fraud detection. RF maintains its trend of improvement in the risk category, highlighting its suitability for identifying risk instances with a precision of 0.82. Consistently, FDSV demonstrates strength in maintaining high precision in the Risk category, achieving a precision of 0.82.
Then, a case study was conducted to illustrate the FDSV methodology, as shown in the table below.
Table 8 shows the classification probabilities of several samples through different classifiers and the FDSV model. As can be seen from the table, multiple classifiers sit near an uncertain classification boundary. For example, for sample 1, the SVM model assigns a probability of 0.45 to class 1 and 0.55 to class 0, leading to a final classification error. However, TabPFN achieves higher classification confidence on this sample. Therefore, by using the proposed FDSV model, we can ensure that classifiers with higher confidence are given greater ensemble weights, thereby correcting this tendency towards error. In other words, models with greater certainty make a larger contribution to the final prediction outcome.
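The confidence-weighted combination described above can be sketched as follows; the weighting rule (distance of each model's probability from 0.5) is our illustration of the idea, not the paper's exact FDSV formulation:

```python
# Hedged sketch of confidence-weighted soft voting: classifiers that are
# more certain about a sample contribute more to the combined probability.
# The weighting rule is illustrative, not the paper's exact FDSV scheme.
import numpy as np

def confidence_weighted_vote(probas):
    """probas: (n_models, 2) array of one sample's predicted probabilities.
    Weight each model by how far its class-1 probability is from 50/50."""
    probas = np.asarray(probas, dtype=float)
    conf = np.abs(probas[:, 1] - 0.5)  # certainty of each model
    if conf.sum() == 0:
        w = np.full(len(probas), 1.0 / len(probas))
    else:
        w = conf / conf.sum()
    return w @ probas  # weighted average class distribution

# Sample 1 from the discussion: the SVM sits near the boundary (0.55/0.45)
# on the wrong side, while a more confident model leans strongly to class 1.
combined = confidence_weighted_vote([[0.55, 0.45],   # uncertain, wrong side
                                     [0.10, 0.90]])  # confident, correct
print(combined, "->", combined.argmax())
```

With these illustrative numbers the confident model receives most of the weight, so the combined prediction lands on class 1, correcting the uncertain classifier's tendency towards error.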
The experiments conducted across different datasets demonstrate varying model performances. However, amidst this variability, FDSV consistently stands out by delivering competitive and robust results. FDSV’s unique strength lies in its exceptional ability to strike a balance between precision and recall. This crucial balance makes it an ideal choice for comprehensive financial fraud detection, as it excels in correctly identifying fraud cases while effectively minimizing false alarms. FDSV’s dependable performance underscores its potential to enhance the accuracy and reliability of fraud detection systems in the realm of financial security.