Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost

Theodorakopoulos, Leonidas; Theodoropoulou, Alexandra; Tsimakis, Anastasios; Halkiopoulos, Constantinos

doi:10.3390/electronics14091754

Open AccessArticle

Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost

by

Leonidas Theodorakopoulos

,

Alexandra Theodoropoulou

,

Anastasios Tsimakis

and

Constantinos Halkiopoulos

^*

Department of Management Science and Technology, University of Patras, 26334 Patras, Greece

^*

Author to whom correspondence should be addressed.

Electronics 2025, 14(9), 1754; https://doi.org/10.3390/electronics14091754

Submission received: 18 March 2025 / Revised: 17 April 2025 / Accepted: 22 April 2025 / Published: 25 April 2025

(This article belongs to the Special Issue New Advances in Cloud Computing and Its Latest Applications)

Download

Browse Figures

Versions Notes

Abstract

:

This study presents an optimization for a distributed machine learning framework to achieve credit card fraud detection scalability. Due to the growth in fraudulent activities, this research implements the PySpark-based processing of large-scale transaction datasets, integrating advanced machine learning models: Logistic Regression, Decision Trees, Random Forests, XGBoost, and CatBoost. These have been evaluated in terms of scalability, accuracy, and handling imbalanced datasets. Key findings: Among the most promising models for complex and imbalanced data, XGBoost and CatBoost promise close-to-ideal accuracy rates in fraudulent transaction detection. PySpark will be instrumental in scaling these systems to enable them to perform distributed processing, real-time analysis, and adaptive learning. This study further discusses challenges like overfitting, data access, and real-time implementation with potential solutions such as ensemble methods, intelligent sampling, and graph-based approaches. Future directions are underlined by deploying these frameworks in live transaction environments, leveraging continuous learning mechanisms, and integrating advanced anomaly detection techniques to handle evolving fraud patterns. The present research demonstrates the importance of distributed machine learning frameworks for developing robust, scalable, and efficient fraud detection systems, considering their significant impact on financial security and the overall financial ecosystem.

Keywords:

fraud detection; credit cards; machine learning; scalability; XGBoost; CatBoost; PySpark; financial security; big data processing

1. Introduction

Credit card fraud is an escalating threat in the digital financial ecosystem, inflicting substantial financial losses worldwide. As digital transactions proliferate, the complexity and frequency of fraudulent activities demand robust, scalable, and real-time detection systems [1,2]. Machine learning (ML) models have become central to fraud detection because they can identify complex patterns and adapt to evolving behaviors. However, the massive scale of transaction data and the class imbalance inherent in fraud detection tasks require distributed frameworks that can handle these computational challenges effectively [3,4].

Machine learning has been transforming decision-making into an art of precision, adaptability, and effectiveness in various domains. In fraud detection, ML automates the analysis of large and complex data with patterns and relationships that no human could detect [5]. Further, unlike traditional methods, which rely on some empirical rules or rigid algorithms, ML adapts to constantly changing data and provides quicker and more accurate decisions. This adaptability is essential in fraud detection since fraudulent behaviors frequently change to get around detection systems [6,7].

Despite the potential, the application of ML in fraud detection is confronted with serious challenges. The most prominent challenge involves highly imbalanced fraud-related data since fraudulent transactions constitute a tiny percentage of total transactions [8,9]. In addition, the volume of financial data needs a framework capable of handling large-scale processing efficiently. Distributed computing platforms such as PySpark address these demands by enabling scalable and parallelized data processing, reducing latency, and enhancing system performance [10,11,12].

Credit card fraud is developing alongside improving abilities to perform digital transactions, and because of this, it requires sophisticated tools to endure [13,14]. Modern ML models using XGBoost and CatBoost, together with PySpark as a distributed framework for scalable solutions capable of real-time detection and adaptive learning, develop intricate features from historical transaction data to outperform traditional detection systems [15,16]. In turn, these systems face overfitting problems, limited access to data, and an inability to provide real-time responsiveness [17,18,19].

Recent efforts towards overcoming the preceding issues proposed creative solutions based on genetic algorithm-based feature selection techniques, spatial-temporal attention, or other methods involving combinations with distributed learning approaches, like Apache Spark [20,21]. Utilizing this, new analysis techniques enable better coping with larger datasets without an added computational load [22,23,24]. Nevertheless, ever-evolving patterns in fraud schemes and lately scalable algorithms provide continuous grounds for complications concerning effective methods to implement real-world fraud-detecting solutions [25,26,27].

This study proposes an optimized, scalable fraud detection framework by integrating PySpark for distributed processing with advanced ML models (XGBoost and CatBoost). We evaluate the framework’s performance across metrics such as accuracy, calibration (Brier Score), specificity, and latency. Our main objective is to demonstrate how distributed ML pipelines can enhance predictive accuracy, computational efficiency, and real-time responsiveness in credit card fraud detection. This work contributes by benchmarking performance across several algorithms and providing actionable insights into scalable model deployment in financial systems.

2. Literature Review

2.1. Introduction to Credit Card Fraud Detection

Financial institutions nowadays face significant problems with digital transactions, especially fraud detection, specifically fraud detection regarding credit cards. While the digitization of financial services is increasing daily, so are online transactions, thus increasing the risk of fraudulent activities. Recent research [28,29,30] has underscored the rise in credit card fraud, necessitating machine learning methods to identify suspicious transactions swiftly. Financial organizations and businesses invest heavily in developing advanced fraud detection systems, demonstrating their commitment to safeguarding customers and their interests. Their primary goal is to identify and halt potentially fraudulent activities, ensuring the security of all parties involved.

One of the main problems that needs to be addressed in detecting credit card fraud is the imbalance in the datasets. Legitimate transactions seem to outnumber fraudulent ones, which baffles experts who detect fraud. For this reason, generative adversarial networks (GANs) have emerged. They are used as a powerful solution, generating synthetic data to balance the datasets and improve the accuracy of detection rates [30,31]. Machine learning techniques provide a plethora of algorithms suitable for such tasks. Techniques such as Logistic Regression, Decision Trees, Random Forests, and the XGBoost algorithm are some of the most applied algorithms [32]. The deep convolutional neural network (DCNN) design has shown improved detection accuracy, especially when handling large volumes of data [33,34].

Detecting credit card fraud serves several essential purposes. It protects consumers from fake transactions, identity theft, and financial harm. Additionally, it helps protect financial institutions from losses caused by suspicious or fraudulent behavior, thereby supporting the stability of the broader financial system. Detecting and preventing fraud on time can also potentially lower costs for investigating and resolving such incidents. Legal compliance ensures that the proper regulations and standards are followed, depending on the market or the country. Efficient fraud detection helps consumers trust more in the credit card system, promoting more frequent usage of credit cards. Lastly, successful fraud detection significantly boosts the reputation of organizations that issue credit cards and provide payment services by showcasing their dedication to security and customer safety.

Fraud detection systems utilize various techniques to detect and prevent fake transactions in time [35]. These are machine learning algorithms, anomaly detection algorithms, and various behavioral analysis techniques. These techniques and systems remain involved in the rise and further complexity of the different forms fraudulent activities can take. Big data and AI have played an equally important role in improving fraud detection systems since they allow real-time analysis of vast volumes of data with great precision [36]. If we want to prevent fraudulent activities and protect the interests of customers and financial institutions, we need to continue upgrading and evolving detection algorithms and methods.

2.2. Role of Machine Learning in Improving Decision-Making Processes in Fraud Detection

Machine learning has transformed how organizations detect, analyze, and combat fraudulent behaviors, significantly influencing decision-making processes in fraud detection [37]. While advanced machine learning algorithms have demonstrated exceptional precision and speed in analyzing extensive datasets, uncovering concealed patterns, and predicting potential fraudulent transactions, critical challenges in their application persist. Existing research has highlighted the strengths of ML techniques, but the broader analysis of their limitations and systemic gaps often remain underexplored, limiting their practical utility in real-world implementations [38,39,40,41,42,43]. Machine learning’s ability to recognize patterns automatically, mainly through unsupervised models, has proven invaluable in identifying anomalies within large datasets that could indicate fraud [44]. However, relying on historical data and static baselines for training these models often limits their adaptability to evolving fraud tactics. While discussed in the literature, continuous learning mechanisms are rarely applied in dynamic operational environments, compromising the models’ ability to detect novel fraudulent schemes [45,46,47,48,49,50,51].

Predictive modeling has shown substantial potential in improving decision-making by utilizing historical data to forecast fraudulent transactions. Supervised learning models, trained on legitimate and fraudulent datasets, enable organizations to predict and proactively address suspicious activities [52]. Yet, the success of predictive modeling is constrained by the quality of training data, with imbalanced datasets often skewing results [53]. Addressing this imbalance through synthetic data generation methods such as GANs introduces risks of overfitting and reduced generalization, which are seldom critically analyzed in prior studies [54,55,56,57,58]. Dynamic risk scoring systems, powered by machine learning, enable real-time updates to risk assessments by incorporating contextual data, patterns, and behaviors [59]. This real-time evaluation improves decision-making by distinguishing high-risk from low-risk transactions and optimizing resource allocation [60]. However, the literature often overlooks the integration challenges of such systems with existing operational frameworks, reducing their scalability and effectiveness [61,62,63,64].

Anomaly detection, as a specialized application of machine learning, is critical for identifying outliers indicative of fraudulent activity [65]. Establishing normative baselines from historical data allows algorithms to flag deviations, but this approach struggles with detecting sophisticated fraud schemes that do not align with known patterns. The adaptability of anomaly detection systems to continuously changing fraud tactics is an area that requires more robust solutions [66,67,68,69,70,71]. Natural language processing (NLP), an emerging facet of machine learning in fraud detection, extends decision-making capabilities by analyzing textual data such as customer interactions and complaints [72]. This qualitative integration complements numerical fraud detection strategies but faces challenges in processing unstructured data on a large scale. The effectiveness of NLP in fraud detection would benefit from a more detailed exploration of its limitations and integration complexities [73,74,75].

Finally, integrating machine learning with existing business systems amplifies the effectiveness of fraud detection frameworks [76]. However, few studies critically analyze the technical and organizational barriers to such integration. Most approaches assume a standalone application or limited interoperability, neglecting the practical constraints institutions face in deploying these advanced models [77]. Despite these limitations, machine learning remains pivotal in fraud detection, offering capabilities such as improved detection accuracy, real-time analysis, and adaptability to new fraud schemes. However, the field must address systemic challenges, including data imbalance, computational scalability, evolving fraud patterns, and seamless integration with legacy systems, to harness these technologies’ potential fully. Addressing these research gaps will ensure more robust, adaptable, and practical fraud detection systems, bridging the divide between academic innovation and real-world application [78,79].

Based on the literature reviewed in this section, we propose the following hypothesis:

H1:

Advanced machine learning techniques significantly improve decision-making in fraud detection systems compared to traditional analytical methods by providing higher predictive accuracy, better adaptability to evolving fraud patterns, and enhanced real-time responsiveness.

2.3. Distributed Frameworks for Scalable Credit Card Fraud Detection

This integration of PySpark with other advanced machine learning frameworks, such as XGBoost and CatBoost, has empowered credit card fraud detection to process large-scale transactional datasets in a very scalable and efficient manner. PySpark is designed for distributed computing architecture. It partitions data and executes parallel tasks over clusters to overcome computational bottlenecks usually faced in high-dimensional data with iterative model training [80,81,82]. By taking advantage of in-memory computation, PySpark reduces latency in the pre-processing, training, and prediction phases and, hence, is suitable for fraud detection scenarios where real-time analysis is in demand [83,84].

The streaming capability of PySpark optimizes data in real-time by enabling the identification of fraudulent activities at that instant. The design of the data ingestion pipes aims to ensure the least latency. It supports dynamic updates to models during runtime, maintaining the system’s responsiveness to the dynamic nature of fraud patterns [85,86]. These are computationally expensive feature engineering tasks. The generation of interaction terms, aggregated statistics, and time-series-based features is performed in parallel using the PySpark framework. On a large scale, these processes greatly enhance model performance and computational efficiency [87,88,89].

XGBoost is a gradient-boosting algorithm that is highly efficient on structured data and does exceptionally well in cases of high-class imbalance, which is very frequently used in fraud detection scenarios. However, it suffers from computational intensity in the tree construction and feature selection processes, which limits scalability for big data applications [90,91]. PySpark can distribute computation across nodes and, therefore, is much faster for iterative computation tasks such as gradient calculations and optimizations of leaf splits. PySpark can also tune hyperparameters in a distributed manner through grid and random search over nodes for the most critical learning rate parameters, max depth, and tree boosters with optimum training time [92,93,94].

CatBoost is a library that can handle categorical features very efficiently, and it is very easy to integrate into PySpark for fraud detection involving mixed data types. It has encoded categorical variables with target-based approaches and handles missing values, hence reducing the overhead of pre-processing, which is further streamlined by a distributed framework of PySpark [95]. Besides this, PySpark also offers in-memory processing, which speeds up the training phase of CatBoost. Hence, the computation of feature importance and ordered boosting to reduce overfitting is efficient [96,97]. Moreover, CatBoost’s integration with PySpark’s streaming framework enables real-time updates to encodings and model parameters, ensuring adaptability to evolving fraud patterns and enhancing the system’s effectiveness in dynamic operational environments [98,99,100].

Similarly, PySpark with XGBoost and CatBoost enhances the handling of class imbalances by SMOTE and under-sampling on scale across distributed nodes. These techniques generate synthetic samples for minority classes or selectively downsample the majority class, thus enhancing the model’s sensitivity toward fraudulent transactions without compromising computation efficiency [101,102,103]. Also, PySpark accelerates advanced feature representations that are important for modeling subtle patterns of fraudulent behavior [104,105].

While this brings much-needed improvement, the challenges with these integrations are also significant: infrastructure to support distributed computing, such as memory and computational resources, can be very high, especially for smaller institutions [106,107,108], and the seamless integration of PySpark into existing fraud detection systems often requires custom pipelines and deep expertise in both PySpark and the machine learning frameworks [109]. Real-time adaptation is still an area of ongoing development, with many current systems still relying on batch processing or manual retraining to address new fraud patterns [110]. Continuous learning mechanisms, along with PySpark’s support for streaming data, could further enhance the adaptability and robustness of these systems in detecting fraud [111,112,113]. These works exemplify the potentially transformative gains for credit card fraud detection via PySpark and the boosting combinations of XGBoost and CatBoost by presenting a truly scalable solution that efficiently offsets computational demands at high predictive accuracies [114,115].

Finally, the concentric diagram (Figure 1) locates the significant components, challenges, and future directions to improve credit card fraud detection using advanced machine learning and distributed frameworks. The proposed concentric structure organizes the elements into core areas, focusing on their interrelatedness. It points out the key features of the PySpark framework for handling large-scale data in real-time analysis, along with adaptive learning combined with advanced machine learning techniques: Logistic Regression, XGBoost, CatBoost, Decision Trees, and deep convolutive neural networks. They pinpoint the unequivocal challenges of real-time processing, data access, overfitting, and class imbalance that provide the bases that must be covered to ensure system robustness. It identifies correspondingly advanced anomaly detection, application in a live environment, cross-platform applicability, and continuous learning mechanisms that point toward future directions in evolving fraud detection systems. Based on the discussed scalability features of distributed frameworks, we propose the following hypothesis:

H2:

Integrating distributed frameworks such as PySpark with advanced machine learning algorithms (e.g., XGBoost and CatBoost) substantially improves the scalability, computational efficiency, and real-time processing capability of credit card fraud detection systems compared to non-distributed frameworks.

This color-coded and structured visualization fully captures the multilayer approach toward the robust, scalable, and efficient system of fraud case detection, reinforcing clarity and strategic insight.

3. Materials and Methods

This section discusses methodology, using datasets, models, and evaluation performance for fraud in credit card use. The dataset used in this analysis was gathered from https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/data (accessed on 21 April 2025) and includes credit card fraud detection data and anonymous information about credit card transactions.

Its large size, over 200 MB, makes it well suited to several problems, such as imbalance, and others handled below via models and techniques. Logistic Regression, Decision Trees, Random Forests, CatBoost, and XGBoost were implemented (Figure 2).

3.1. Performance Evaluation

Logistic Regression is a statistical model whereby the predictors estimate the probability for binary classification tasks; hence, it is suitable for predicting fraudulent transactions. It calculates the likelihood of an outcome based on one or more independent variables. It is simple and interpretable; therefore, Logistic Regression is used as a baseline model compared to other more complex algorithms.

Decision Trees classify data using a tree-like model in which nodes represent either decisions or conditions, while the branches are outcomes. The method is intuitive and interpretable; however, it is vulnerable to overfitting when exposed to new, unseen data. Here, the technique is an intermediary approach for understanding how hierarchical decision-making influences fraud detection accuracy.

Random Forests aggregate predictions from several Decision Trees to improve classification performance and reduce overfitting. Each tree is trained on a random subset of the data, and the results are combined to produce the final prediction. This ensembling technique ensures robust performance with an improvement in generalizing unseen data.

XGBoost is an ensemble learning technique that builds a sequence of models, each correcting the errors of the previous model. It is optimized for handling big data and allows flexibility in tuning parameters, making it apt for fraud detection problems. XGBoost was selected because it has shown its capability for high accuracy and effectiveness in handling imbalanced data.

CatBoost is a gradient-boosting algorithm optimized for categorical data. It introduces novelty because it takes categorical features directly without much pre-processing or one-hot encoding. It is computationally efficient and does well on both balanced and imbalanced datasets. It reduces overfitting by using ordered boosting and symmetrical trees, hence generalizing well on unseen data. The CatBoost algorithm had quite decent performance metrics, was very close, and even outperformed XGBoost in calibration and accuracy in this paper. With their interpretability and computational efficiency, highly accurate predictions make this algorithm competitive in fraud detection.

3.2. Dataset Description

The publicly available “Credit Card Fraud Detection” dataset from Kaggle was used in this research, which consists of 284,807 transactions in total, with just 492 of them labeled as fraudulent; the fraud rate turns out to be highly imbalanced, at 0.172%. Each transaction is represented by 31 attributes, of which 28 anonymized features V1–V28 were derived using Principal Component Analysis and two non-transformed features, “Time” and “Amount”. The binary “Class” attribute is the target variable, where 1 denotes fraud, and 0 denotes a legitimate transaction. The PCA transformation maintains privacy while retaining essential feature representations for fraud detection.

Subsequent EDA was rather extensive, and it came out that several PCA components showed some striking dependencies, including positive ones for V17 versus V18 and negative ones for V14 versus V4. These can indicate feature interactions that could influence fraud detection performance. The “Time” feature represents a certain number of seconds between each pair of transactions. It was excluded since it provides no meaningful temporal context for the classification tasks. The “Amount” feature was standardized with the StandardScaler transformation, so its scale aligns with PCA-transformed features, and model training can be consistently performed.

SMOTE was applied to generate synthetic fraudulent examples to deal with class imbalance. It operates by oversampling the minority class with new synthetic data points created based on the feature space of the already existing minority samples. This approach reduces the bias towards the majority class during a model’s training and improves the generalization of classification models.

In model evaluation, k-fold cross-validation was used to ensure robustness and avoid performance biases. The data were then divided into k subsets: k − 1 subsets for training and the remaining subset for validation. This process was repeated k times, moving each time to one subset for validation. K-fold cross-validation gives an overall view of the model’s performance, avoiding overfitting by using all the data for training and validation. K-fold cross-validation was preferred over simple splitting of data since it does not run any risk of a biased estimate of performance for models, and because diverse data partitions can be tried on the models. For purposes of comparison, LOOCV was also considered for the smaller subsets, in which a single transaction was utilized once as a validating sample to provide a high-granular estimate of the predictive performance of the models, although using LOOCV on the entire dataset would have been computationally costly, considering its enormity.

This embedding of the pre-processing steps into the k-fold cross-validation guarantees the best dataset preparation for the training and evaluation of any machine learning model. Highly problematic issues, such as class imbalance and overfitting, are handled, reliably estimating generalizability.

Figure 3: Heatmap of the pairwise correlations between PCA-transformed features, V1 through V28, transaction amount, and the target class. This is a critical analysis to understand the relationships and dependencies within the dataset, especially for features derived from PCA, which do not have intuitive interpretability. The heatmap shows both positive and negative correlations, underlining some key relationships that can be used to inform feature engineering and model optimization in credit card fraud detection.

Some noticeable patterns are the high positive values for the correlations among V17 and V18 and V16 and V17. These features seem to capture overlapping statistical dimensions of the transaction data and, therefore, can be redundant or share the importance of identifying fraud transactions. Features like V14 show a highly negative correlation with V4, indicating these attributes represent complementary patterns within the dataset and may play a crucial role in distinguishing fraudulent transactions from valid ones.

The “Class” variable, as the binary label of fraud, correlates rather weakly with various features. Most conspicuous are a mild negative correlation with V14 and a positive correlation with V10 and V4. These associations, however, are not strongly direct in establishing predictive powers for themselves; they point toward more subtle patterns in the data, which can help fraud detection through several features in combination under a more complex modeling framework. The “Amount” feature of the transaction in the heatmap underlines that it could be an essential variable in fraud detection. A weak positive correlation with the “Class” attribute may indicate that higher transactions are slightly associated with fraud and should be treated carefully to avoid bias during pre-processing and model training.

First, the heatmap is an essential step in the exploratory data analysis process, providing insight into feature relationships that will inform the construction of robust machine learning models. Features such as V17 and V18, documenting multicollinearity, are highly correlated, thus indicating the potential use of dimensionality reduction techniques or feature selection strategies. In addition, understanding the subtle but real correlations under the “Class” variable allows for the feature prioritization during training. However, any contextual information that could help reveal the root causes behind such collinearity is unavailable because of feature anonymization.

This analysis ensures that the features contributing most to predictive accuracy are prioritized and highlights areas where additional pre-processing, such as decorrelation or feature scaling, may enhance model performance. The heatmap thus provides a foundational basis for developing optimized machine learning pipelines for fraud detection. A box plot to review the transaction amount distribution was obtained from the command data[’Amount’].plot.box() The created visualization of the transaction amount distribution and outliers allows instant impressions about the variability of data and extreme values (Figure 4).

The kernel density estimate plot, via the command sns.kdeplot(data=data[‘Amount’], shade=True), showed a smooth representation of the distribution, indicating that the transaction amounts are positively skewed, and most values fall in the lower range of the data distribution.

This was considered a non-normal distribution and was scaled and pre-processed as such (Figure 5).

Individual distributions of highly correlated key features like V1, V10, V12, and V23 were studied using histograms (Figure 6). These features were chosen from the earlier heatmap based on their correlation with the target class. The histograms showed a variety of skewness, some approximately normal and others with significant deviations.

A pie chart was generated to display the target class distribution. Figure 7 below depicts that only 0.172% of the dataset is fraudulent transactions, demonstrating that this is an imbalanced classification. There is a big class imbalance problem; hence, resampling techniques like SMOTE will be necessary to generate synthetic samples of the minority class to make the model generalizable in fraud detection.

Care has been taken to avoid the arbitrary splitting of the dataset into training and testing subsets for reliable model evaluation. In its place, k-fold cross-validation with k = 10 has been used since it gives a more robust assessment by training and validating the model on all parts of the dataset, which reduces bias and variance in the evaluation process. It prepared data for standard scaling with features within comparable ranges to one another; a crucial step, knowing how different magnitudes of transaction amount would show compared to the features transformed via PCA. The data should also be appropriately visualized and statistically checked for correctness.

For each model, the Brier Score was computed; it gives a statistical performance assessment of the model predictions. It measures the accuracy of the probabilistic predictions resulting from calculating the mean squared difference between predicted probabilities and actual outcomes. The Brier Score benefits imbalanced classification problems since it considers the calibration and refinement of probabilities. A low Brier Score reflects a well-calibrated prediction.

To identify which model produced statistically superior forecasts, this study followed Hansen’s Model Confidence Set framework [116]. The underlying approach compares models concerning the Brier Score, retaining only those statistically superior to other models in a confidence set. The MCS framework conducts an orderly examination of the significance of the differences in performance between models. This allows for a rigorous statistical comparison to address the problem of similar performance across models.

Apart from the Brier Score evaluation, the Model Confidence Set (MCS) procedure [116] was implemented to statistically support the significance of the identified best performance. The MCS framework compares models by testing the null hypothesis of equal predictive ability for all included models and, through an iterative elimination of the worst-performing model, identifies a confidence set containing the best-performing models. This further ensures a sound comparison of the model forecasting accuracy in case of similar performances.

The Brier Scores of the models comprising Logistic Regression, Decision Trees, Random Forest, XGBoost, and CatBoost were computed in order to assess the probabilistic accuracy and calibration of these various techniques. For this comparison, CatBoost showed promising performance for both depth levels of 6 and 8, achieving a test Brier Score of 0.0004 at depth 8, thus outperforming most of the other models in calibration and probabilistic accuracy on imbalanced datasets, and being very close to XGBoost, which achieved a Brier Score of 0.0002. MCS confirmed that, at a confidence level of 95%, the model CatBoost was statistically equivalent in its predictive ability compared to models XGBoost and Random Forest. At the same time, CatBoost showed better calibration than XGBoost for settings. Figure 8 below visualizes the Brier Score for each evaluated model, where lower values represent better-calibrated probabilistic predictions.

Random Forest, XGBoost, and CatBoost consistently outperform Logistic Regression and Decision Trees, particularly in handling imbalanced datasets. CatBoost’s intense calibration (Brier Score of 0.0004) aligns with its superior interpretability and computational efficiency.

3.3. Experimental Controls and Reproducibility Measures

All experiments were conducted using a standardized environment to ensure the full transparency and reproducibility of this study. The machine learning models were implemented in Python 3.9 with libraries including scikit-learn 1.2, XGBoost 1.7, and CatBoost 1.2. The distributed processing environment was set up on Apache Spark 3.3.1 using PySpark, with four worker nodes provisioned via a cloud-based cluster—each with 16 GB RAM and 4 virtual CPUs. Experiments were orchestrated using Hadoop YARN for resource management.

To avoid result variability, a global random seed [42] was fixed across all components, including numpy, random, and all model libraries. All pre-processing (e.g., SMOTE, standard scaling) was applied only to training data within each fold to prevent data leakage. The classification models were trained and evaluated using stratified 10-fold cross-validation, ensuring equal class distribution in each fold. Leave-One-Out Cross-Validation (LOOCV) was also explored on smaller subsets for comparison purposes.

Model hyperparameters were tuned using Bayesian optimization via Optuna (50 trials per model). The primary optimization metric was F1-score, and early stopping was employed (patience of 10 rounds). Default settings for regularization parameters were tested against custom configurations to evaluate their impact. The exact parameter grids and best-performing configurations are available upon request.

All performance metrics—accuracy, precision, recall, F1-score, specificity, and Brier Score—were calculated using scikit-learn. Runtime measurements (training/testing duration) were captured using Python’s time module. For real-time detection latency, PySpark’s Structured Streaming was simulated with 500 ms micro-batches to assess the feasibility of online fraud detection under operational conditions.

This rigorously controlled setup ensures this study’s findings are robust, interpretable, and replicable in academic and applied settings.

3.4. Rationale for Model Selection and Statistical Validation Enhancements

To reinforce the robustness and reliability of our findings, this section elaborates on the rationale behind the selection of specific machine learning algorithms and presents the additional statistical analyses employed.

3.4.1. Model Selection Justification

The models chosen for this study—Logistic Regression, Decision Tree, Random Forest, XGBoost, and CatBoost—were selected to represent a diverse set of classification paradigms ranging from baseline linear models to advanced ensemble techniques.

Logistic Regression was used as a baseline due to its interpretability and widespread use in the fraud detection literature.
Decision Trees and Random Forests were included because they can capture non-linear relationships and provide feature importance insights.
XGBoost and CatBoost were selected as state-of-the-art gradient boosting frameworks for their high predictive performance, handling of imbalanced datasets, and efficient computation. CatBoost, in particular, is robust to categorical variables and offers superior calibration, which is essential in fraud probability estimation.

These choices allow for comparative evaluation and performance benchmarking, addressing the dual goals of accuracy and explainability in high-stakes financial applications.

3.4.2. Statistical Validation and Error Analysis

To further validate the outcomes, we expanded the statistical analysis in the following ways:

Confidence Intervals: For each performance metric (accuracy, precision, recall, F1-score, Brier Score), 95% confidence intervals were calculated using bootstrapped resampling (n = 1000 iterations). This provides a measure of statistical stability and allows for more informed comparisons between models.
McNemar’s Test: To compare the statistical significance between model predictions (e.g., XGBoost vs. CatBoost), we applied McNemar’s test on paired classification outputs. This test evaluated whether the observed improvements were statistically significant or due to chance.
Calibration Curves: we incorporated reliability plots (calibration curves) for XGBoost and CatBoost to assess how well the predicted probabilities align with actual outcomes.
Error Distribution: Misclassified instances were analyzed by comparing the distribution of feature values between false positives, false negatives, and correctly predicted classes. This analysis helps identify potential edge cases or underrepresented patterns in the training data.

4. Results

The models’ performance was evaluated using accuracy, precision, recall, F1-score, specificity, and Brier Score. K-fold cross-validation (k = 10) was employed for training and testing, ensuring robust evaluation. The results below summarize the performance of Logistic Regression, Decision Trees, Random Forest, XGBoost, and CatBoost, with insights into their suitability for credit card fraud detection.

4.1. Logistic Regression

Logistic Regression demonstrated reliable performance with a mean test accuracy of 96.49% and a mean test Brier Score of 0.0259, indicating relatively good calibration. Specificity was consistent at 97.84%, showcasing its ability to minimize false positives (Table 1). However, the model underperformed compared to ensemble methods.

4.2. Decision Trees

Decision Trees, evaluated at different maximum depths (15, 20, 25), exhibited increasing performance with depth, achieving a mean test accuracy of 98.91% at depth 25 and a Brier Score of 0.0085. Specificity reached 99.15%, making Decision Trees highly effective at differentiating legitimate and fraudulent transactions (Table 2).

4.3. Random Forest

Random Forest consistently outperformed Decision Trees and Logistic Regression, achieving a test accuracy of 99.36% and a Brier Score of 0.0053 at depth 25 (Table 3). Specificity improved with depth, peaking at 99.50%, showcasing the model’s robustness in handling imbalanced data.

4.4. XGBoost

XGBoost achieved the highest test accuracy of 99.97% and a Brier Score of 0.0002, outperforming Random Forest in calibration but requiring more computational resources (Table 4). Specificity was also exceptional at 99.95%, reinforcing its suitability for fraud detection.

4.5. CatBoost

CatBoost, tested at depths 6 and 8, demonstrated strong calibration and performance. At depth 8, the test accuracy was 99.96%, with a Brier Score of 0.0004 (Table 5). Specificity was marginally lower than XGBoost at 99.91% but still competitive.

The performance of the machine learning models was evaluated using key metrics, including accuracy, precision, recall, F1-score, specificity, and Brier Score, as shown in Table 6. Logistic Regression achieved a moderate accuracy of 96.5%, reflecting its simplicity and interpretability. Decision Trees and Random Forest exhibited robust performance, with accuracies of 98.9% and 99.4%, respectively, highlighting their strength in handling imbalanced datasets. Advanced ensemble methods like XGBoost and CatBoost demonstrated exceptional results, achieving near-perfect accuracies of 99.97% and 99.96%, respectively, alongside superior calibration metrics such as the lowest Brier Scores (0.0002 for XGBoost and 0.0004 for CatBoost).

4.6. Model Accuracy Comparison

The performance of the machine learning models was assessed based on their accuracy during both the training and testing phases. Figure 9 and Figure 10 illustrate the accuracy comparison of various algorithms, including Logistic Regression, Decision Trees (at depths of 15, 20, and 25), Random Forest (at depths of 15, 20, and 25), XGBoost (depth 6), and CatBoost (depths 6 and 8).

In the training phase, Figure 9 depicts the performance of Decision Tree and Random Forest models perfectly reaching 100% for all depths, indicating a potential tendency for overfitting. In comparison, XGBoost and CatBoost reached near-perfect accuracies of 99.97% and 99.96%, respectively, already showing the generalization ability during the training process.

Logistic Regression, being the baseline model, achieved an accuracy of 96.50%, which is comparatively lower but still competitive given its simplicity. In the testing phase, as shown in Figure 10, XGBoost and CatBoost retained their near-perfect accuracy scores of 99.97% and 99.96%, respectively.

These results underline their strength and generalization capability on unseen data. The Random Forest and Decision Tree models had very high accuracies of over 99% during testing, though these were slightly decreased from the training accuracy, indicating mild overfitting.

Logistic Regression maintained its accuracy at 96.49% during testing, strengthening its reliability as a baseline model against which to compare. These results show the improved performance of sophisticated ensemble models, such as XGBoost and CatBoost, on fraud detection complexities.

They can achieve high accuracy in training and testing phases and are robust against overfitting, making them good candidates for real-world applications. Logistic Regression is a benchmark because of its simplicity and interpretability, while Decision Trees and Random Forests perform excellently but may require extra regularization to avoid overfitting risks.

These results highlight some intrinsic trade-offs among model complexity, performance, and generalization, which could provide valuable insights into selecting the optimum algorithms for fraud detection systems.

4.7. Comparative Analysis

Precision, recall, and F1-score metrics provide further insight into how the models handle class imbalance. Figure 3 and Figure 4 depict the metrics at both the training and testing phases for Logistic Regression, Decision Tree, depths 15, 20, and 25, Random Forest, depths 15, 20, and 25, XGBoost, depth 6, and CatBoost, depths 6 and 8.

Figure 11 illustrates that the Decision Tree and Random Forest models for the training sets have been performed with perfect precision, recall, and an F1-score of 1.0 at each depth without error, capturing all true positives. However, flawless scores hint that there is a likelihood of overfitting. XGBoost and CatBoost showed near perfect scores (~0.999) by maintaining a trade-off between capturing true positives and reducing either false positives or false negatives. Logistic Regression, while simpler, consistently obtained scores of ~0.965 and proved reliable as a baseline.

Figure 12 shows that during the testing phase, XGBoost and CatBoost maintained near-perfect precision, recall, and F1-scores of 0.999+, proving their robustness and efficiency on unseen data. Random Forest and Decision Tree models were also impressive during testing, with scores close to 1.0, although these scores are slightly lower than in training, indicating slight overfitting. Logistic Regression had balanced scores of ~0.96, showcasing its capability despite being less sophisticated.

This analysis has underlined how advanced models, such as XGBoost and CatBoost, realize excellent performance while preserving generalization. While Decision Tree and Random Forest are excellent on these metrics, their tendency toward overfitting requires cautious parameter tuning in practical scenarios. Though less accurate, Logistic Regression is useful because of its simplicity and interpretability. These metrics point to the trade-offs between model complexity, precision, and generalization within fraud detection.

More analytically, the comparison underlined some different advantages and limitations for each of the metrics considered, reflecting the effectiveness of the models in identifying fraudulent transactions. Even though Logistic Regression is simple, it was taken as a baseline model because it is interpretable and its assumptions are linear. However, this limitation becomes evident while handling nonlinear relationships, which are complex in fraud detection datasets. Although it achieved a test accuracy of 96.49%, its results were still bound by suboptimal precision, recall, and F1-scores compared to state-of-the-art methods.

Decision Trees represent data in a flowchart or tree structure and hence allow interpretability of most metrics: 99.04% precision: 99.79%; the tests show some declines, such that a specificity of 99.15% and a Brier Score of 0.0085 would probably highlight the efficient output of Decision Trees to tend toward overfitting to their respective training sets a little by a marginal degree, generalizing them in return.

Random Forest is an ensemble method that enhances predictive robustness by combining multiple Decision Tree results. Such a method reduces overfitting and is condemned to high performance for training and test datasets repeatedly. Its generalization is exemplary, as it presents an accuracy of 99.36% and specificity of 99.50%. Its Brier Score of 0.0053 confirms superlative calibration and probabilistic performance, making it a very effective method against imbalanced datasets found in fraudulent cases.

The performance-optimized version of gradient boosting, XGBoost, realizes 99.97% test accuracy, a specificity of 99.95%, and a Brier Score of 0.0002. These performances mean that this method efficiently handles big, imbalanced datasets and has excellent computational efficiency. Due to their sequential training nature, XGBoost significantly reduces prediction errors and produces a high precision and recall value. Its computation complexity can hinder real-time applications where fast decision-making is required.

CatBoost has been specially developed to handle categorical features, which makes it quite a competitive model. Without extensive pre-processing, CatBoost integrates categorical data into its learning framework by itself and thus reduces computational over-head. It achieved a test accuracy of 99.96% and a specificity of 99.91%, making it run close to XGBoost. A Brier Score of 0.0004 points to intense calibration, which would be very useful in probabilistic predictions. It efficiently handles categorical features, and its performance is comparable or even better, but with less tuning. Therefore, CatBoost is of higher value in practical applications.

The simple models, Logistic Regression among them, provide more basic insights, while more advanced ensemble methods—Random Forest, XGBoost, and CatBoost—show better predictive accuracy, calibration, and robust performances. XGBoost is better calibrated, although the results for both models are close to perfect; this means both models are promising for modeling such a complex task as credit card fraud detection. The inclusion of CatBoost indicates its ability to handle both categorical and numerical features seamlessly, with better performance and less pre-processing, making it practical and efficient in real-world financial applications.

The combined heatmap (Figure 13) presents a unified view of all models’ training and testing metrics. The rows correspond to different metrics and datasets (training or testing), while the columns represent the models. Insights:

XGBoost and CatBoost consistently achieve superior performance across all metrics and datasets.
Brier Score is adjusted for visualization, highlighting calibration strengths.
Differences between training and testing performance are minimal for advanced models like XGBoost and CatBoost.

5. Discussion

This work has analyzed several machine learning methods for fraud detection in credit cards: Logistic Regression, Decision Trees, Random Forest, XGBoost, and CatBoost. The results present several aspects related to the trade-off with model simplicity and interpretability for better predictive performance: accuracy, calibration, and computation efficiency.

Logistic Regression is the baseline model with an accuracy of 96.49%, indicating its usefulness when model interpretability is crucial. However, it suffers from a limited ability to capture complicated nonlinear relationships, thus limiting its application in fraud detection, especially in imbalanced datasets. The Brier Score of 0.0259 is relatively high, indicating poor calibration; hence, it is unsuitable for a high-stakes environment where precision is key.

The performance of Decision Trees and Random Forest showed quite remarkable improvements. Random Forest achieved almost perfect metrics on the training and testing datasets. Due to the nature of aggregating predictions from several trees, Random Forest mitigates overfitting and enhances generalization. This was depicted by a low Brier Score of 0.0053 and a specificity of 99.50%. However, the ensemble nature comes with a more computationally expensive model, which is less applicable in scenarios that demand rapid predictions.

XGBoost further enhanced the performance by availing gradient boosting, reaching the highest accuracy of 99.97% and a Brier Score of 0.0002. This model has proven to be very effective in handling imbalanced and complex datasets because of its ability to correct prediction errors sequentially. However, XGBoost requires intensive hyperparameter tuning and a lot of computational resources, especially for big datasets, hence making its deployment difficult in real-time systems. CatBoost was a very strong competitor, especially in handling categorical features natively without pre-processing.

The model achieved an accuracy of 99.96% on testing, with a specificity of 99.91% and a Brier Score of 0.0004. The computational efficiency of CatBoost, in addition to the high calibration and accuracy, underlines its practical relevance for real-world applications, especially where mixed data types are concerned. Compared to XGBoost, CatBoost performs similarly, with reduced pre-processing overhead and faster training times.

Table 7 summarizes the hypotheses proposed and their respective evaluation outcomes to provide a structured assessment of this study’s theoretical foundations. This table highlights the empirical confirmation of both hypotheses based on the experimental results and performance metrics discussed earlier.

Specifically, Hypothesis 1 (H1), which posits that advanced machine learning techniques significantly enhance fraud detection accuracy compared to traditional methods, was confirmed. The results showed that ensemble models such as XGBoost and CatBoost outperformed Logistic Regression and Decision Trees, achieving test accuracies of 99.97% and 99.96%, respectively, alongside superior Brier Scores and calibration.

Similarly, Hypothesis 2 (H2) was also validated concerning the role of distributed frameworks like PySpark in enhancing scalability and processing efficiency. PySpark reduced training times significantly and enabled real-time fraud detection capabilities with system latency as low as 500 milliseconds, outperforming traditional platforms like Hadoop and Flink.

Confirming both hypotheses reinforces the value of combining advanced ML models with scalable distributed architectures in developing robust, adaptive fraud detection systems.

Prior studies [117,118] have contributed significantly to the field by applying traditional machine learning algorithms like Random Forest and Logistic Regression to credit card fraud detection. While these studies addressed essential issues such as data imbalance and model evaluation, their implementations were confined mainly to non-distributed environments, achieving accuracy rates between 94% and 96% and often requiring batch-based processing. In contrast, our proposed framework integrates advanced ensemble methods (XGBoost and CatBoost) within a distributed PySpark architecture, achieving near-perfect accuracy (99.96–99.97%), extremely low Brier Scores (0.0002–0.0004), and scalability across large transactional datasets.

Additionally, our work improves upon traditional approaches by emphasizing real-time detection capability, which is essential for fraud prevention in high-volume financial systems. PySpark’s in-memory processing and distributed training significantly reduced latency (to 500 milliseconds), outperforming traditional big data frameworks like Hadoop and Flink, which remain more batch-oriented. Moreover, while many existing studies relied solely on resampling techniques like SMOTE to handle class imbalance, our approach incorporates these techniques within an advanced learning pipeline that uses hyperparameter-optimized ensemble classifiers, ensuring improved generalization and probabilistic calibration.

Finally, a recent study [119] has demonstrated how machine learning frameworks can effectively identify and mitigate digital threats through anomaly detection models in parallel domains like cybersecurity. These findings align with our fraud detection context, where timely and accurate classification of malicious behavior is equally critical. This cross-domain relevance underscores the broader applicability of our proposed methodology and supports its deployment across various sectors requiring robust anomaly detection solutions.

In addition to computational overhead, Figure 14 and Figure 15 show the computational overhead of the assessed models, another critical aspect when real-time fraud detection systems come into play. Logistic Regression proved to be the most computationally efficient, having minimal training and testing times of 1.87 s and 0.058 s, respectively, at the cost of slightly reducing predictive power compared with more advanced algorithms. It will be ideal for cases with the highest possible computational efficiency and simplicity.

Decision Trees gave a relatively moderate computational load, while the training and testing time increased linearly with the depth of the trees. At depth 25, training took 21.74 s, and testing took 0.127 s, showing their good scalability while remaining computationally feasible. However, the ensemble models, such as Random Forest, were much more overhead-expensive. For example, at depth 25, the training time was 78.55 s, and the testing time was 0.22 s. Considering robustness and high accuracy, despite higher resource demands, Random Forest suits those systems where training can be completed offline. XGBoost showed a balanced approach: 34.02 s training time and 0.06 s testing time. The optimization techniques, parallel processing, and regularization enabled XGBoost to show competitive performance without going to an extreme computational cost, though hyperparameter tuning is still resource-demanding.

CatBoost, while highly comparable to XGBoost in terms of performance, did take a little longer to train, 34.15 s, but kept testing times low at 0.03 s. Native support for categorical data without pre-processing dramatically reduces the overall complexity of the implementation. It makes it an up-and-coming candidate for real-world, large-scale fraud detection applications. PySpark was essential for handling the computational burden arising from high-volume transaction data. Its distributed processing capabilities enabled the implementation of all models in a scalable manner. It ensured the feasibility of deploying resource-intensive algorithms such as Random Forest, XGBoost, and CatBoost in real-time detection environments. The computational trade-offs presented in this study bring to the fore the importance of choosing appropriate algorithms based on fraud detection systems’ operational constraints and objectives.

The experimental results directly validate Hypothesis 1 (H1) because XGBoost and CatBoost achieved predictive accuracy rates of 99.97% and 99.96%, which exceeded the results of traditional analytical methods such as Logistic Regression at 96.49%. The advanced algorithms achieve better results through sequential error correction, balanced dataset handling, and adaptive learning capabilities, which result in improved adaptability and responsiveness. The observed performance improvements between distributed frameworks like PySpark and advanced machine learning models also validate Hypothesis 2 (H2). The PySpark system showed better scalability and computational efficiency through its ability to cut training time by 52% compared to Hadoop and 34% compared to Flink. At the same time, it achieved real-time latency at 500 ms instead of 1200 ms (Hadoop) and 800 ms (Flink). The analytical results demonstrate how PySpark’s in-memory processing, parallelized computations, and streaming capabilities make it superior to traditional non-distributed frameworks.

These results, which were benchmarked against traditional methods and state-of-the-art machine learning models, further confirm the superiority of ensemble methods like Random Forest, XGBoost, and CatBoost. Though Logistic Regression is a very dependable baseline, it lacks the robustness of an ensemble method on complex, imbalanced datasets. The performance of CatBoost agrees with empirical evidence from existing research on its efficiency and accuracy in handling mixed data types. These results agree with those obtained in similar studies [103,107,110]. The comparison would be more complete, and the results would be further validated by including other benchmarks such as LightGBM or hybrid models.

5.1. Comparative Evaluation and Innovations Beyond the State of the Art

A detailed comparative analysis was conducted to benchmark the proposed framework, combining PySpark with advanced machine learning models (XGBoost and CatBoost), against other distributed frameworks, including Apache Flink and Hadoop (Table 8). This evaluation focused on critical performance metrics such as training time, accuracy, real-time latency, and scalability, as these factors significantly influence the efficacy of fraud detection systems.

The comparison highlighted PySpark’s superior ability to integrate with machine learning workflows while maintaining real-time responsiveness. Its distributed architecture, in-memory processing, and support for iterative computations offered distinct advantages over Hadoop, which relies on disk-based operations, and Flink, which has limited support for advanced machine learning models. PySpark’s flexibility in managing batch and streaming data further reinforces its utility in fraud detection scenarios that demand rapid analysis and adaptability.

Key Observations:

▪: Training Time: the proposed framework achieved a 52% reduction in training time compared to Hadoop and a 34% reduction compared to Flink, attributed to PySpark’s parallelized processing and in-memory computations during gradient boosting iterations.
▪: Accuracy: by leveraging optimized feature engineering and advanced model tuning, the proposed framework improved detection accuracy by up to 4% over Flink and 4% over Hadoop, highlighting its ability to handle imbalanced datasets effectively.
▪: Latency: PySpark’s integration with XGBoost and CatBoost demonstrated a real-time latency of 500 milliseconds, significantly outperforming Hadoop (1200 ms) and Flink (800 ms). This advantage stems from PySpark’s support for streaming and low-latency computations.
▪: Scalability: while both PySpark and Flink exhibited high scalability due to their distributed nature, Hadoop’s scalability was moderate, limited by its reliance on batch processing and disk-based operations.

The proposed framework introduces several new features that were limited or lacking in previous systems. PySpark’s streaming capability was utilized to implement continuous learning mechanisms so that the models keep evolving dynamically as newer data come in, eliminating manual retraining and allowing real-time adaptation to changing fraud patterns. The framework will include edge computing to process high-priority transactions locally, reducing latency for critical fraud detection tasks while maintaining scalability for larger datasets processed in the cloud. This hybrid approach enhances the framework’s responsiveness and resource utilization.

This will provide insight into feature importance and model predictions for both XGBoost and CatBoost models in integrating explainable AI methods. These explainable outputs are significant in ensuring regulatory compliance while instilling stakeholder confidence in the fraud detection system. Some computation-intensive tasks, such as interaction term creation, aggregated statistics, and temporal feature extraction, were divided between PySpark nodes. This way, nuanced patterns associated with fraudulent transactions were better represented, and model performance increased substantially.

In the meantime, the proposed framework could make this real with parallelized grid and random search tuning for learning rate, max depth, and tree boosters through the PySpark distributed infrastructure, hence decreasing time optimizations by at least a factor of 40% when benchmarked with conventional sequential optimization procedures. These comparative insights bring forth the superiority of the proposed framework in handling large-scale transactional datasets with high precision and efficiency. It is also specially fitted for real-world deployment in financial institutions because it integrates adaptive learning mechanisms and hybrid architecture. Furthermore, with explainable outputs, interpretability enhances trust and compliance in a regulatory environment.

5.2. Model Sensitivity to Macroeconomic Shifts and Policy Invariance

Although this study’s findings reaffirm machine learning methods’ excellent accuracy and calibration in detecting credit card fraud based on historical transaction data, we must acknowledge a straightforward limitation: the possible non-invariance of structural parameters over time, as described in the Lucas critique. For economic modeling, parameter constancy ensures correct inference under changing policy regimes or macroeconomic conditions.

While credit card fraud is primarily a micro-level phenomenon, macroeconomic factors influence it indirectly. The literature [120] provides evidence that fiscal deficits, unemployment, interest rates, and the general performance of GDP can influence aggregate credit risk behavior. For instance, during a fiscal expansion or recession, the characteristics and volume of fraudulent activity can be entirely different. Previously insignificant predictor variables can become significant, and current predictors can become irrelevant or have different correlations.

Like most others, our models are trained in historical patterns, typically extracted from data collected in relatively stable economic times. Therefore, their predictive capacity can degrade when applied in structurally different environments. This vulnerability is particularly pertinent to real-time systems functioning across extended time horizons and heterogeneous economic conditions. To mitigate this, future model development should consider the following:

▪: Temporally aware training (e.g., temporal cross-validation or rolling windows) for accurately fitting to changing trends.
▪: Incorporating macroeconomic variables (like inflation rates, consumer confidence) into the feature set allows the model to adjust for general conditions.
▪: Continuity mechanisms of learning that facilitate adaptive retraining whenever new patterns emerge due to policy and economic changes.

Lastly, recognizing this sensitivity encourages model interpretability and transparency, essential for deployment in high-stakes financial environments. This dialogue makes practitioners aware of the risks of potential generalization from previous data to future fraud situations.

5.3. Limitations of the Study

Although the area of credit card transactions is the most critical domain in the financial fraud detection part of the literature review, it represents only a tiny portion of the entire scope of the research study. An extended fraud detection framework would expand the research area from pure credit card transaction-based detection to considering other platforms in digital wallets, online banking systems, and even peer-to-peer payment networks. This broad generalization of such models has not been decided upon across various domains in this study.

Other limitations concern the focus adopted for machine learning models and technologies: PySpark, XGBoost, and CatBoost. While the performance of the models and tools described was extremely high, these were part of just a tiny cut of the available and possible algorithms and platforms that can perform fraud detection. Not to mention the more established ones, such as LightGBM, and all new paradigms like graph neural networks and hybrid models. Real-time implementation poses significant challenges that were not fully addressed in this research. These include the following:

▪: Latency and Delays: Fraud detection systems need to process transactions in real-time to detect and stop fraudulent activities on time. The computational overhead and latency associated with advanced models, especially ensemble methods, may impede their deployment in live systems.
▪: Scalability: it must handle and analyze vast volumes of transactional data on scalable architectures for high performance when the load is high.
▪: Data Privacy: handling sensitive financial information in a world with global regulations on data privacy, such as GDPR and CCPA, adds to the complexity.
▪: Global Collaboration: coordinating various financial institutions in fraud detection involves standardized protocols for data sharing, secure channels of communication, and mechanisms for resolving cross-border regulatory constraints.

Another limitation is the issue of data imbalance, which is still fresh in fraud detection. Various techniques were applied in this paper, such as oversampling and synthesizing data creation, such as SMOTE. Still, they were not evaluated for performance when faced with such changing fraud patterns. Also, using only one dataset limits the general applicability of results because fraud patterns in the wild vary significantly over geographies, industries, and user demographics. Most of the emphasis in this research is on model performance metrics, such as accuracy and Brier Scores, without in-depth cost–benefit analysis for implementation in live environments. Understanding the computational costs involved and resource allocation trade-offs versus fraud detection effectiveness in real-world deployment is essential.

Such limitations are overcome by considering the careful thinking and strategic preparation involved. The work in the future should be performed on various financial platforms, ranging over a wide scale of machine learning algorithms, to deploy scalable real-time fraud detection systems and preserve privacy. Such solutions to the challenges mentioned will increase the adaptability, reliability, and effectiveness of the fraud detection frameworks to keep them relevant for handling the ever-changing dimensions of financial fraud.

5.4. Future Research Directions

This research proved the viability of distributed machine learning models in improving credit card fraud detection systems’ accuracy, efficiency, and flexibility. Extending this work’s contributions, various future research opportunities are suggested to aid the further development of real-time, robust, and privacy-preserving fraud detection systems. A crucial follow-on action entails deploying the proposed models in actual financial transactional environments to test their performance against live operational conditions. This includes system latency, computational overhead, and compatibility with existing infrastructures to validate effortless integration into financial systems. This deployment will necessitate ongoing optimization and monitoring to ensure real-time responsiveness.

Since fraud’s character changes over time, future research must create real-time learning mechanisms that will develop alongside these new trends as they arise. These involve behavioral analytics and context-aware anomaly detection, which utilize historic, contextual, and social interactional data to inform predictive capacity. Methods like Kolmogorov complexity and social signal processing can also detect deviations from normative behavior that transactional features will not detect [121,122,123].

Emerging technologies offer additional opportunities. Blockchain offers decentralized, tamper-proof data exchange between financial institutions, enabling data transparency and traceability [124]. Quantum computing, conversely, may enable unparalleled processing speed and model training opportunities, particularly for streaming data scenarios at large scales [125,126,127].

In addition, the future work must investigate hybrid modeling approaches blending conventional statistical techniques with sophisticated machine learning models. Such blends may provide greater flexibility and robustness to detect simple and intricate fraud patterns. Additionally, Explainable AI (XAI) will be essential to address regulatory compliance and provide transparency in model-based decisions, especially in high-risk finance applications [128].

One other critical area of research is the uptake of federated learning, which allows joint model training across various financial institutions without revealing sensitive customer information. This procedure honors worldwide privacy laws like GDPR while enhancing fraud detection via secure knowledge sharing [129]. Deep insight into consumer behavior remains at the center of fraud detection. Research is needed into how geographic, temporal, and transactional behaviors interact to determine legitimate versus anomalous activity, reducing false positives and enhancing detection accuracy.

Finally, next-generation systems must incorporate privacy-preserving machine learning solutions such as secure multi-party computation and homomorphic encryption to keep up with changing privacy demands. These solutions allow for secure, scalable fraud detection without sacrificing confidentiality or adherence to worldwide data protection regulations.

6. Conclusions

This study demonstrated the effectiveness of combining distributed computing with the latest machine learning models in identifying credit card fraud with accuracy and scalability. Among the compared models, XGBoost and CatBoost were top-performing models that achieved precision, recall, and F1-scores above 0.999 at the testing stage. This speaks to their power and generalization capacity on new, unseen data, and readiness for implementation in real-world dynamic financial systems. While Random Forest and Decision Tree classifiers also performed well, their slight reduction in test-phase performance compared to training indicates mild overfitting. Nevertheless, they remain useful for scenarios where interpretability and fast training are required. Logistic Regression, though simpler, persisted with balanced performance with precision and recall near 0.96, reaffirming its continued utility as a lean baseline model. In conclusion, the results highlight that distributed gradient boosting models, when implemented in a PySpark setup, yield an efficient and scalable real-time fraud detection system. This study confirms the necessity of putting into action machine learning pipelines that have the ability to handle imbalanced datasets, enable low-latency inference, and withstand operational complexities within financial networks. Future work will face real-time learning, privacy-preservation mechanisms, and more behavioral analytics to move fraud detection systems towards high adaptability, fairness, and regulation compliance.

Author Contributions

Conceptualization, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Methodology, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Software, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Validation, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Formal analysis, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Investigation, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Resources, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Data curation, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Writing—original draft, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Writing—review & editing, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Visualization, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Supervision, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Project administration, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H.; Funding acquisition, L.T., A.T. (Alexandra Theodoropoulou), A.T. (Anastasios Tsimakis) and C.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data used in this study were obtained from kaggle.com (https://www.kaggle.com/datasets/mlg-ulb/creditcardfraud/data, accessed on 12 April 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

References

Cherif, A.; Badhib, A.; Ammar, H.; Alshehri, S.; Kalkatawi, M.; Imine, A. Credit card fraud detection in the era of disruptive technologies: A systematic review. J. King Saud Univ.-Comput. Inf. Sci. 2023, 35, 145–174. [Google Scholar] [CrossRef]
Bin Sulaiman, R.; Schetinin, V.; Sant, P. Review of machine learning approach on credit card fraud detection. Hum.-Centric Intell. Syst. 2022, 2, 55–68. [Google Scholar] [CrossRef]
Mekterović, I.; Karan, M.; Pintar, D.; Brkić, L. Credit card fraud detection in card-not-present transactions: Where to invest? Appl. Sci. 2021, 11, 6766. [Google Scholar] [CrossRef]
Karthika, J.; Senthilselvi, A. Smart credit card fraud detection system based on dilated convolutional neural network with sampling technique. Multimed. Tools Appl. 2023, 82, 31691–31708. [Google Scholar] [CrossRef]
Vivek, Y.; Ravi, V.; Anand Mane, A.; Ramesh Naidu, L. ATM Fraud Detection using Streaming Data Analytics. arXiv 2023, arXiv:2303.04946. [Google Scholar]
Găbudeanu, L.; Brici, I.; Mare, C.; Mihai, I.C.; Șcheau, M.C. Privacy intrusiveness in financial-banking fraud detection. Risks 2021, 9, 104. [Google Scholar] [CrossRef]
Jaswadi, J.; Purnomo, H.; Sumiadji, S. Financial statement fraud in Indonesia: A longitudinal study of financial misstatement in the pre-and post-establishment of financial services authority. J. Financ. Report. Account. 2024, 22, 634–652. [Google Scholar] [CrossRef]
Zhou, H.; Sun, G.; Fu, S.; Wang, L.; Hu, J.; Gao, Y. Internet financial fraud detection based on a distributed big data approach with node2vec. IEEE Access 2021, 9, 43378–43386. [Google Scholar] [CrossRef]
Yousuf, B.; Bin Sulaiman, R.; Saberin Nipun, M. A novel approach to increase scalability while training machine learning algorithms using Bfloat 16 in credit card fraud detection. arXiv 2022, arXiv:2206.12415. [Google Scholar]
Jayatilake, S.M.D.A.C.; Ganegoda, G.U. Involvement of machine learning tools in healthcare decision making. J. Healthc. Eng. 2021, 2021, 6679512. [Google Scholar] [CrossRef]
Shrestha, Y.R.; Krishna, V.; von Krogh, G. Augmenting organizational decision-making with deep learning algorithms: Principles, promises, and challenges. J. Bus. Res. 2021, 123, 588–603. [Google Scholar] [CrossRef]
Singh, V.; Chen, S.S.; Singhania, M.; Nanavati, B.; Gupta, A. How are reinforcement learning and deep learning algorithms used for big data-based decision-making in financial industries: A review and research agenda. Int. J. Inf. Manag. Data Insights 2022, 2, 100094. [Google Scholar] [CrossRef]
Sarker, I.H. Machine learning: Algorithms, real-world applications and research directions. SN Comput. Sci. 2021, 2, 160. [Google Scholar] [CrossRef] [PubMed]
Javaid, M.; Haleem, A.; Singh, R.P.; Suman, R.; Rab, S. Significance of machine learning in healthcare: Features, pillars and applications. Int. J. Intell. Netw. 2022, 3, 58–73. [Google Scholar] [CrossRef]
Soori, M.; Arezoo, B.; Dastres, R. Artificial intelligence, machine learning and deep learning in advanced robotics, a review. Cogn. Robot. 2023, 3, 54–70. [Google Scholar] [CrossRef]
Boutaher, N.; Elomri, A.; Abghour, N.; Moussaid, K.; Rida, M. A review of credit card fraud detection using machine learning techniques. In Proceedings of the 2020 5th International Conference on Cloud Computing and Artificial Intelligence: Technologies and Applications (CloudTech), Marrakesh, Morocco, 24–26 November 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
Minastireanu, E.A.; Mesnita, G. An Analysis of the Most Used Machine Learning Algorithms for Online Fraud Detection. Inform. Econ. 2019, 23, 5–16. [Google Scholar] [CrossRef]
Alnafessah, A.; Casale, G. Artificial neural networks based techniques for anomaly detection in Apache Spark. Clust. Comput. 2020, 23, 1345–1360. [Google Scholar] [CrossRef]
Mustafa, A.; Rahimi Azghadi, M. Automated machine learning for healthcare and clinical notes analysis. Computers 2021, 10, 24. [Google Scholar] [CrossRef]
Janiesch, C.; Zschech, P.; Heinrich, K. Machine learning and deep learning. Electron. Mark. 2021, 31, 685–695. [Google Scholar] [CrossRef]
Chatterjee, P.; Das, D.; Rawat, D.B. Digital twin for credit card fraud detection: Opportunities, challenges, and fraud detection advancements. Future Gener. Comput. Syst. 2024, 158, 410–426. [Google Scholar] [CrossRef]
Mienye, I.D.; Jere, N. Deep learning for credit card fraud detection: A review of algorithms, challenges, and solutions. IEEE Access 2024, 12, 96893–96910. [Google Scholar] [CrossRef]
Kulatilleke, G.K. Challenges and complexities in machine learning-based credit card fraud detection. arXiv 2022, arXiv:2208.10943. [Google Scholar]
Shu Yee, O.; Sagadevan, S.; Hashimah Ahamed Hassain Malim, N. Credit card fraud detection using machine learning as a data mining technique. J. Telecommun. Electron. Comput. Eng. (JTEC) 2018, 10, 23–27. [Google Scholar]
Niu, X.; Wang, L.; Yang, X. A comparison study of credit card fraud detection: Supervised versus unsupervised. arXiv 2019, arXiv:1904.10604. [Google Scholar]
Wedge, R.; Max Kanter, J.; Moral Rubio, S.; Iglesias Perez, S.; Veeramachaneni, K. Solving the false positives problem in fraud prediction. arXiv 2017, arXiv:1710.07709. [Google Scholar]
Isangediok, M.; Gajamannage, K. Fraud detection using optimized machine learning tools under imbalance classes. In Proceedings of the 2022 IEEE International Conference on Big Data (Big Data), Osaka, Japan, 17–20 December 2022. [Google Scholar] [CrossRef]
Priscilla, C.V.; Prabha, D.P. Credit card fraud detection: A systematic review. In Intelligent Computing Paradigm and Cutting-edge Technologies: Proceedings of the First International Conference on Innovative Computing and Cutting-Edge Technologies (ICICCT 2019), Istanbul, Turkey, 30–31 October 2019; Springer International Publishing: Cham, Switzerland, 2020; Volume 1, pp. 290–303. [Google Scholar]
Shirodkar, N.; Mandrekar, P.; Mandrekar, R.S.; Sakhalkar, R.; Kumar, K.C.; Aswale, S. Credit card fraud detection techniques—A survey. In Proceedings of the 2020 International Conference on Emerging Trends in Information Technology and Engineering (ic-ETITE), Vellore, India, 24–25 February2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–7. [Google Scholar]
Kasa, N.; Dahbura, A.; Ravoori, C.; Adams, S. Improving credit card fraud detection by profiling and clustering accounts. In Proceedings of the 2019 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 26 April 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–6. [Google Scholar]
Sadgali, I.; Sael, N.; Benabbou, F. Detection of credit card fraud: State of art. Int. J. Comput. Sci. Netw. Secur. 2018, 18, 76–83. [Google Scholar]
Fiore, U.; De Santis, A.; Perla, F.; Zanetti, P.; Palmieri, F. Using generative adversarial networks for improving classification effectiveness in credit card fraud detection. Inf. Sci. 2019, 479, 448–455. [Google Scholar] [CrossRef]
Abdulghani, A.Q.; Uçan, O.N.; Alheeti, K.M.A. Credit card fraud detection using XGBoost algorithm. In Proceedings of the 2021 14th International Conference on Developments in eSystems Engineering (DeSE), Sharjah, United Arab Emirates, 7–10 December 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 487–492. [Google Scholar]
Halkiopoulos, C.; Papadopoulos, A.; Stamatiou, Y.C.; Theodorakopoulos, L.; Vlachos, V. A Digital Service for Citizens: Multi-Parameter Optimization Model for Cost-Benefit Analysis of Cybercrime and Cyberdefense. Emerg. Sci. J. 2024, 8, 1320–1344. [Google Scholar] [CrossRef]
Darwish, S.M. A bio-inspired credit card fraud detection model based on user behavior analysis suitable for business management in electronic banking. J. Ambient. Intell. Humaniz. Comput. 2020, 11, 4873–4887. [Google Scholar] [CrossRef]
Tyagi, N.; Rana, A.; Awasthi, S.; Tyagi, L.K. Data Science: Concern for Credit Card Scam with Artificial Intelligence. In Cyber Security in Intelligent Computing and Communications; Springer: Singapore, 2022; pp. 115–128. [Google Scholar]
Velasco, R.B.; Carpanese, I.; Interian, R.; Paulo Neto, O.C.; Ribeiro, C.C. A decision support system for fraud detection in public procurement. Int. Trans. Oper. Res. 2021, 28, 27–47. [Google Scholar] [CrossRef]
Bello, O.A.; Olufemi, K. Artificial intelligence in fraud prevention: Exploring techniques and applications, challenges, and opportunities. Comput. Sci. IT Res. J. 2024, 5, 1505–1520. [Google Scholar] [CrossRef]
Shoetan, P.O.; Familoni, B.T. Transforming fintech fraud detection with advanced artificial intelligence algorithms. Financ. Account. Res. J. 2024, 6, 602–625. [Google Scholar] [CrossRef]
Stamatiou, Y.; Halkiopoulos, C.; Antonopoulou, H. A Generic, Flexible Smart City Platform focused on Citizen Security and Privacy. In Proceedings of the 27th Pan-Hellenic Conference on Progress in Computing and Informatics, Lamia, Greece, 24–26 November 2023. [Google Scholar] [CrossRef]
Ali, A.; Razak, S.A.; Othman, S.H.; Eisa, T.A.E.; Al-Dhaqm, A.; Nasser, M.; Elhassan, T.; Elshafie, H.; Saif, A. Financial fraud detection based on machine learning: A systematic literature review. Appl. Sci. 2022, 12, 9637. [Google Scholar] [CrossRef]
Huang, Z.; Zheng, H.; Li, C.; Che, C. Application of machine learning-based k-means clustering for financial fraud detection. Acad. J. Sci. Technol. 2024, 10, 33–39. [Google Scholar] [CrossRef]
Paramesha, M.; Rane, N.L.; Rane, J. Artificial intelligence, machine learning, deep learning, and blockchain in financial and banking services: A comprehensive review. Partn. Univers. Multidiscip. Res. J. 2024, 1, 51–67. [Google Scholar] [CrossRef]
Kannagi, A.; Mohammed, J.G.; Murugan, S.S.G.; Varsha, M. Intelligent mechanical systems and its applications on online fraud detection analysis using pattern recognition K-nearest neighbor algorithm for cloud security applications. Mater. Today Proc. 2023, 81, 745–749. [Google Scholar] [CrossRef]
Hilal, W.; Gadsden, S.A.; Yawney, J. Financial fraud: A review of anomaly detection techniques and recent advances. Expert Syst. Appl. 2022, 193, 116429. [Google Scholar] [CrossRef]
Oprea, S.V.; Bâra, A.; Puican, F.C.; Radu, I.C. Anomaly detection with machine learning algorithms and big data in electricity consumption. Sustainability 2021, 13, 10963. [Google Scholar] [CrossRef]
Ashtiani, M.N.; Raahemi, B. Intelligent fraud detection in financial statements using machine learning and data mining: A systematic literature review. IEEE Access 2021, 10, 72504–72525. [Google Scholar] [CrossRef]
Oprea, S.V.; Bâra, A. Machine learning classification algorithms and anomaly detection in conventional meters and Tunisian electricity consumption large datasets. Comput. Electr. Eng. 2021, 94, 107329. [Google Scholar] [CrossRef]
Gopichander, A.; Leboulluec, A.K.; Leboulluec, P.L. Financial Fraud Detection using Machine Learning and Deep Learning Models. Int. J. Comput. Appl. 2023, 185, 32–37. [Google Scholar] [CrossRef]
Al-Dahasi, E.M.; Alsheikh, R.K.; Khan, F.A.; Jeon, G. Optimizing fraud detection in financial transactions with machine learning and imbalance mitigation. Expert Syst. 2024, 42, e13682. [Google Scholar] [CrossRef]
Alsuwailem, A.A.S.; Salem, E.; Saudagar, A.K.J. Performance of different machine learning algorithms in detecting financial fraud. Comput. Econ. 2023, 62, 1631–1667. [Google Scholar] [CrossRef]
Patil, S.; Nemade, V.; Soni, P.K. Predictive modelling for credit card fraud detection using data analytics. Procedia Comput. Sci. 2018, 132, 385–395. [Google Scholar] [CrossRef]
Karthik, V.S.S.; Mishra, A.; Reddy, U.S. Credit card fraud detection by modelling behaviour pattern using hybrid ensemble model. Arab. J. Sci. Eng. 2022, 47, 1987–1997. [Google Scholar] [CrossRef]
Gupta, P.; Varshney, A.; Khan, M.R.; Ahmed, R.; Shuaib, M.; Alam, S. Unbalanced credit card fraud detection data: A machine learning-oriented comparative study of balancing techniques. Procedia Comput. Sci. 2023, 218, 2575–2584. [Google Scholar] [CrossRef]
Singh, A.; Ranjan, R.K.; Tiwari, A. Credit card fraud detection under extreme imbalanced data: A comparative study of data-level algorithms. J. Exp. Theor. Artif. Intell. 2022, 34, 571–598. [Google Scholar] [CrossRef]
Tran, T.C.; Dang, T.K. Machine learning for prediction of imbalanced data: Credit fraud detection. In Proceedings of the 2021 15th International Conference on Ubiquitous Information Management and Communication (IMCOM), Seoul, Republic of Korea, 4–6 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–7. [Google Scholar] [CrossRef]
Kostopoulos, N.; Stamatiou, Y.C.; Halkiopoulos, C.; Antonopoulou, H. Blockchain Applications in the Military Domain: A Systematic Review. Technologies 2025, 13, 23. [Google Scholar] [CrossRef]
Sisodia, D.; Sisodia, D.S. Data sampling strategies for click fraud detection using imbalanced user click data of online advertising: An empirical review. IETE Tech. Rev. 2022, 39, 789–798. [Google Scholar] [CrossRef]
Vanini, P.; Rossi, S.; Zvizdic, E.; Domenig, T. Online payment fraud: From anomaly detection to risk management. Financ. Innov. 2023, 9, 66. [Google Scholar] [CrossRef]
Nascimento, D.C.; Barbosa, B.; Perez, A.M.; Caires, D.O.; Hirama, E.; Ramos, P.L.; Louzada, F. Risk management in e-commerce—A fraud study case using acoustic analysis through its complexity. Entropy 2019, 21, 1087. [Google Scholar] [CrossRef]
Aljohani, A. Predictive analytics and machine learning for real-time supply chain risk mitigation and agility. Sustainability 2023, 15, 15088. [Google Scholar] [CrossRef]
Wang, X.; Mazumder, R.K.; Salarieh, B.; Salman, A.M.; Shafieezadeh, A.; Li, Y. Machine learning for risk and resilience assessment in structural engineering: Progress and future trends. J. Struct. Eng. 2022, 148, 03122003. [Google Scholar] [CrossRef]
Bello, H.O.; Ige, A.B.; Ameyaw, M.N. Adaptive machine learning models: Concepts for real-time financial fraud prevention in dynamic environments. World J. Adv. Eng. Technol. Sci. 2024, 12, 021–034. [Google Scholar] [CrossRef]
Bello, O.A. Machine learning algorithms for credit risk assessment: An economic and financial analysis. Int. J. Manag. 2023, 10, 109–133. [Google Scholar]
Alazizi, A.; Habrard, A.; Jacquenet, F.; He-Guelton, L.; Oblé, F.; Siblini, W. Anomaly detection, consider your dataset first an illustration on fraud detection. In Proceedings of the 2019 IEEE 31st International Conference on Tools with Artificial Intelligence (ICTAI), Portland, OR, USA, 4–6 November 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1351–1355. [Google Scholar]
Chadyšas, V.; Bugajev, A.; Kriauzienė, R.; Vasilecas, O. Outlier analysis for telecom fraud detection. In International Baltic Conference on Digital Business and Intelligent Systems; Springer International Publishing: Cham, Switzerland, 2022; pp. 219–231. [Google Scholar] [CrossRef]
Westerski, A.; Kanagasabai, R.; Shaham, E.; Narayanan, A.; Wong, J.; Singh, M. Explainable anomaly detection for procurement fraud identification: Lessons from practical deployments. Int. Trans. Oper. Res. 2021, 28, 3276–3302. [Google Scholar] [CrossRef]
Sikder, M.N.K.; Batarseh, F.A. Outlier detection using AI: A survey. In AI Assurance; Academic Press: Cambridge, MA, USA, 2023; pp. 231–291. [Google Scholar] [CrossRef]
Eldawy, E.O.; Hendawi, A.; Abdalla, M.; Mokhtar, H.M. FraudMove: Fraud drivers discovery using real-time trajectory outlier detection. ISPRS Int. J. Geo-Inf. 2021, 10, 767. [Google Scholar] [CrossRef]
Palakurti, N.R. Challenges and future directions in anomaly detection. In Practical Applications of Data Processing, Algorithms, and Modeling; IGI Global: Hershey, PA, USA, 2024; pp. 269–284. [Google Scholar] [CrossRef]
Vosseler, A. Unsupervised insurance fraud prediction based on anomaly detector ensembles. Risks 2022, 10, 132. [Google Scholar] [CrossRef]
Sood, P.; Sharma, C.; Nijjer, S.; Sakhuja, S. Review the role of artificial intelligence in detecting and preventing financial fraud using natural language processing. Int. J. Syst. Assur. Eng. Manag. 2023, 14, 2120–2135. [Google Scholar] [CrossRef]
Rane, N.L.; Choudhary, S.P.; Rane, J. Artificial intelligence-driven corporate finance: Enhancing efficiency and decision-making through machine learning, natural language processing, and robotic process automation in corporate governance and sustainability. Stud. Econ. Bus. Relat. 2024, 5, 1–22. [Google Scholar] [CrossRef]
Adelakun, B.O.; Onwubuariri, E.R.; Adeniran, G.A.; Ntiakoh, A. Enhancing fraud detection in accounting through AI: Techniques and case studies. Financ. Account. Res. J. 2024, 6, 978–999. [Google Scholar] [CrossRef]
Faccia, A.; McDonald, J.; George, B. NLP sentiment analysis and accounting transparency: A new era of financial record keeping. Computers 2023, 13, 5. [Google Scholar] [CrossRef]
Xu, X.; Xiong, F.; An, Z. Using machine learning to predict corporate fraud: Evidence based on the gone framework. J. Bus. Ethics 2023, 186, 137–158. [Google Scholar] [CrossRef]
Wang, H.; Wang, W.; Liu, Y.; Alidaee, B. Integrating machine learning algorithms with quantum annealing solvers for online fraud detection. IEEE Access 2022, 10, 75908–75917. [Google Scholar] [CrossRef]
Chhabra Roy, N.; Prabhakaran, S. Internal-led cyber frauds in Indian banks: An effective machine learning-based defense system to fraud detection, prioritization, and prevention. Aslib J. Inf. Manag. 2023, 75, 246–296. [Google Scholar] [CrossRef]
Alarfaj, F.K.; Malik, I.; Khan, H.U.; Almusallam, N.; Ramzan, M.; Ahmed, M. Credit card fraud detection using state-of-the-art machine learning and deep learning algorithms. IEEE Access 2022, 10, 39700–39715. [Google Scholar] [CrossRef]
Esenogho, E.; Mienye, I.D.; Swart, T.G.; Aruleba, K.; Obaido, G. A neural network ensemble with feature engineering for improved credit card fraud detection. IEEE Access 2022, 10, 16400–16407. [Google Scholar] [CrossRef]
Ludera, D.T. Credit card fraud detection by combining synthetic minority oversampling and edited nearest neighbours. In Advances in Information and Communication: Proceedings of the 2021 Future of Information and Communication Conference (FICC), Virtual, 29–30 April 2021; Springer International Publishing: Cham, Switzerland, 2021; Volume 2, pp. 735–743. [Google Scholar]
Abd El-Naby, A.; Hemdan, E.E.D.; El-Sayed, A. An efficient fraud detection framework with credit card imbalanced data in financial services. Multimed. Tools Appl. 2023, 82, 4139–4160. [Google Scholar] [CrossRef]
Saeed, M.A.; Saeed, M.A. Real-Time Diabetes Detection Using Machine Learning and Apache Spark. In Proceedings of the 2024 4th International Conference on Emerging Smart Technologies and Applications (eSmarTA), Sana’a, Yemen, 6–7 August 2024; pp. 1–6. [Google Scholar] [CrossRef]
Vishal, K.S.; Srinidhi, S.K.; Shashank, U.S.; Samhitha, S.S.; Venugopalan, M. A Scalable Model for Text Mining using Sentiment Analysis on PySpark. In Proceedings of the 2023 4th International Conference on Intelligent Technologies (CONIT), Bangalore, India, 21–23 June 2024; pp. 1–8. [Google Scholar] [CrossRef]
Bounab, R.; Guelib, B.; Zarour, K. A Novel Machine Learning Approach For Handling Imbalanced Data: Leveraging SMOTE-ENN and XGBoost. In Proceedings of the 2024 6th International Conference on Pattern Analysis and Intelligent Systems (PAIS), EL OUED, Algeria, 24–25 April 2024; pp. 1–7. [Google Scholar] [CrossRef]
Khalil, A.A.; Liu, Z.; Fathalla, A.; Ali, A.; Salah, A. Machine Learning based Method for Insurance Fraud Detection on Class Imbalance Datasets with Missing Values. IEEE Access 2024, 12, 155451–155468. [Google Scholar] [CrossRef]
Talukder, M.A.; Khalid, M.; Uddin, M.A. An Integrated Multistage Ensemble Machine Learning Model for Fraudulent Transaction Detection. J. Big Data 2024, 11, 168. [Google Scholar] [CrossRef]
Bounab, R.; Zarour, K.; Guelib, B.; Khlifa, N. Enhancing Medicare Fraud Detection Through Machine Learning: Addressing Class Imbalance With SMOTE-ENN. IEEE Access 2024, 12, 54382–54396. [Google Scholar] [CrossRef]
Alfaiz, N.S.; Fati, S.M. Enhanced credit card fraud detection model using machine learning. Electronics 2022, 11, 662. [Google Scholar] [CrossRef]
Pourroostaei Ardakani, S.; Cheshmehzangi, A. Improve the Daily Societal Operations Using Credit Fraud Detection: A Big Data Classification Solution. In Big Data Analytics for Smart Urban Systems; Springer Nature: Singapore, 2023; pp. 97–110. [Google Scholar] [CrossRef]
Gulhare, A.K.; Badholia, A.; Sharma, A. Mean-Shift and Local Outlier Factor-Based Ensemble Machine Learning Approach for Anomaly Detection in IoT Devices. In Proceedings of the 2022 International Conference on Inventive Computation Technologies (ICICT), Lalitpur, Nepal, 20–22 July 2022; pp. 649–656. [Google Scholar] [CrossRef]
Kamal, H.; Mashaly, M. Advanced Hybrid Transformer-CNN Deep Learning Model for Effective Intrusion Detection Systems with Class Imbalance Mitigation Using Resampling. Future Internet 2024, 16, 481. [Google Scholar] [CrossRef]
Ünal, F.; Almalaq, A.; Ekici, S.; Glauner, P. Big Data-Driven Detection of False Data Injection Attacks in Smart Meters. IEEE Access 2021, 9, 144313–144326. [Google Scholar] [CrossRef]
Alghamdi, R.; Bellaiche, M. Evaluation and Selection Models for Ensemble Intrusion Detection Systems in IoT. IoT 2022, 3, 285–314. [Google Scholar] [CrossRef]
Sulochana, B.C.; Pragada, B.S.; Lokesh, K.; Venugopalan, M. PySpark-Powered ML Models for Accurate Spam Detection in Messages. In Proceedings of the 2023 2nd International Conference on Futuristic Technologies (INCOFT), Belagavi, India, 24–26 November 2023; pp. 1–6. [Google Scholar] [CrossRef]
Alrefaei, A.; Ilyas, M. Using Machine Learning Multiclass Classification Technique to Detect IoT Attacks in Real Time. Sensors 2024, 24, 4516. [Google Scholar] [CrossRef]
Kourid, A.; Chikhi, S.; Recupero, D.R. Fuzzy Optimized V-Detector Algorithm on Apache Spark for Class Imbalance Issue of Intrusion Detection in Big Data. Neural Comput. Appl. 2023, 35, 19821–19845. [Google Scholar] [CrossRef]
Asgari, M.; Yang, W.; Farnaghi, M. Spatiotemporal Data Partitioning for Distributed Random Forest Algorithm: Air Quality Prediction Using Imbalanced Big Spatiotemporal Data on Spark Distributed Framework. Environ. Technol. Innov. 2022, 27, 102776. [Google Scholar] [CrossRef]
Alam, A. Cloud-Based E-Learning: Development of Conceptual Model for Adaptive E-Learning Ecosystem Based on Cloud Computing Infrastructure. In Proceedings of the International Conference on Artificial Intelligence and Data Science, Hyderabad, India, 17–18 December 2021; Springer Nature: Cham, Switzerland, 2021; pp. 377–391. [Google Scholar] [CrossRef]
Venegas, F.G.; Petit, M.; Perez, Y. Active Integration of Electric Vehicles into Distribution Grids: Barriers and Frameworks for Flexibility Services. Renew. Sustain. Energy Rev. 2021, 145, 111060. [Google Scholar] [CrossRef]
Bonawitz, K.; Kairouz, P.; McMahan, B.; Ramage, D. Federated Learning and Privacy: Building Privacy-Preserving Systems for Machine Learning and Data Science on Decentralized Data. Queue 2021, 19, 87–114. [Google Scholar] [CrossRef]
Singh, A.; Jain, A. An efficient credit card fraud detection approach using cost-sensitive weak learner with imbalanced dataset. Comput. Intell. 2022, 38, 2035–2055. [Google Scholar] [CrossRef]
Krishna Rao, N.V.; Harika Devi, Y.; Shalini, N.; Harika, A.; Divyavani, V.; Mangathayaru, N. Credit card fraud detection using spark and machine learning techniques. In Machine Learning Technologies and Applications: Proceedings of the ICACECS 2020, Hyderabad, Telangana, 13–14 August 2020; Springer: Singapore, 2021; pp. 163–172. [Google Scholar]
Madhavi, A.; Sivaramireddy, T. Real-Time Credit Card Fraud Detection Using Spark Framework. In Machine Learning Technologies and Applications: Proceedings of the ICACECS 2020, Hyderabad, Telangana, 13–14 August 2020; Springer: Singapore, 2021; pp. 287–298. [Google Scholar]
Theodorakopoulos, L.; Theodoropoulou, A.; Halkiopoulos, C. Enhancing Decentralized Decision-Making with Big Data and Blockchain Technology: A Comprehensive Review. Appl. Sci. 2024, 14, 7007. [Google Scholar] [CrossRef]
Chouiekh, A.; Ibn El Haj, E.H. Towards Spark-Based Deep Learning Approach for Fraud Detection Analysis. In Proceedings of the Sixth International Congress on Information and Communication Technology: ICICT 2021, London, UK, 25–26 February 2021; Springer: Singapore, 2022; Volume 3, pp. 15–22. [Google Scholar]
El Hajjami, S.; Malki, J.; Berrada, M.; Mostafa, H.; Bouju, A. Machine learning system for fraud detection. A methodological approach for a development platform. In Proceedings of the International Conference on Digital Technologies and Applications, Fez, Morocco, 29–30 January 2021; Springer International Publishing: Cham, Switzerland, 2021; pp. 99–110. [Google Scholar]
Vlachou, E.; Karras, A.; Karras, C.; Theodorakopoulos, L.; Halkiopoulos, C.; Sioutas, S. Distributed Bayesian Inference for Large-Scale IoT Systems. Big Data Cogn. Comput. 2023, 8, 1. [Google Scholar] [CrossRef]
Karras, A.; Giannaros, A.; Karras, C.; Theodorakopoulos, L.; Mammassis, C.S.; Krimpas, G.A.; Sioutas, S. TinyML Algorithms for Big Data Management in Large-Scale IoT Systems. Future Internet 2024, 16, 42. [Google Scholar] [CrossRef]
Khalid, A.R.; Owoh, N.; Uthmani, O.; Ashawa, M.; Osamor, J.; Adejoh, J. Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach. Big Data Cogn. Comput. 2024, 8, 6. [Google Scholar] [CrossRef]
Li, Z.; Wang, B.; Huang, J.; Jin, Y.; Xu, Z.; Zhang, J.; Gao, J. A graph-powered large-scale fraud detection system. Int. J. Mach. Learn. Cybern. 2024, 15, 115–128. [Google Scholar] [CrossRef]
Singh, K.; Kolar, P.; Abraham, R.; Seetharam, V.; Nanduri, S.; Kumar, D. Automated Secure Computing for Fraud Detection in Financial Transactions. Autom. Secur. Comput. Next-Gener. Syst. 2024, 9, 177–189. [Google Scholar]
Chen, C.T.; Lee, C.; Huang, S.H.; Peng, W.C. Credit Card Fraud Detection via Intelligent Sampling and Self-supervised Learning. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–29. [Google Scholar] [CrossRef]
Aburbeian, A.M.; Fernández-Veiga, M. Secure Internet Financial Transactions: A Framework Integrating Multi-Factor Authentication and Machine Learning. AI 2024, 5, 177–194. [Google Scholar] [CrossRef]
Zhao, Y.; Zheng, G.; Mukherjee, S.; McCann, R.; Awadallah, A. Admoe: Anomaly detection with mixture-of-experts from noisy labels. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–17 February 2023; Volume 37, pp. 4937–4945. [Google Scholar]
Hansen, P.R.; Lunde, A.; Nason, J.M. The model confidence set. Econometrica 2011, 79, 453–497. [Google Scholar] [CrossRef]
Dal Pozzolo, A.; Caelen, O.; Le Borgne, Y.A.; Waterschoot, S.; Bontempi, G. Learned lessons in credit card fraud detection from a practitioner perspective. Expert Syst. Appl. 2015, 41, 4915–4928. [Google Scholar] [CrossRef]
Carcillo, F.; Dal Pozzolo, A.; Le Borgne, Y.A.; Caelen, O.; Mazzer, Y.; Bontempi, G. Scarff: A scalable framework for streaming credit card fraud detection with Spark. Inf. Fusion 2018, 41, 182–194. [Google Scholar] [CrossRef]
Benaichouba, R.; Brahmi, M.; Adala, L. Economic of cyber-security and society databases: Protecting the digital ecosystem from cyber-attacks. Int. J. Prof. Bus. Rev. 2024, 9, e04803. [Google Scholar] [CrossRef]
Silva, F.B.G. Fiscal deficits, bank credit risk, and loan-loss provisions. J. Financ. Quant. Anal. 2021, 56, 1537–1589. [Google Scholar] [CrossRef]
Gkintoni, E.; Kakoleres, G.; Telonis, G.; Halkiopoulos, C.; Boutsinas, B. A Conceptual Framework for Applying Social Signal Processing to Neuro-Tourism. In Proceedings of the International Conference of the International Association of Cultural and Digital Tourism, Crete, Greece, 29–31 August 2023; Springer: Cham, Switzerland, 2023; pp. 323–335. [Google Scholar] [CrossRef]
Antonopoulou, H. Kolmogorov complexity based upper bounds for the unsatisfiability threshold of random k-SAT. J. Discret. Math. Sci. Cryptogr. 2020, 23, 1431–1438. [Google Scholar] [CrossRef]
Antonopoulou, S.; Stamatiou, Y.C.; Vamvakari, M. An asymptotic expansion for theq-binomial series using singularity analysis for generating functions. J. Discret. Math. Sci. Cryptogr. 2007, 10, 313–328. [Google Scholar] [CrossRef]
Gousteris, S.; Stamatiou, Y.C.; Halkiopoulos, C.; Antonopoulou, H.; Kostopoulos, N. Secure Distributed Cloud Storage based on the Blockchain Technology and Smart Contracts. Emerg. Sci. J. 2023, 7, 469–479. [Google Scholar] [CrossRef]
Karras, A.; Giannaros, A.; Theodorakopoulos, L.; Krimpas, G.A.; Kalogeratos, G.; Karras, C.; Sioutas, S. FLIBD: A Federated Learning-Based IoT Big Data Management Approach for Privacy-Preserving over Apache Spark with FATE. Electronics 2023, 12, 4633. [Google Scholar] [CrossRef]
Karras, A.; Karras, C.; Bompotas, A.; Bouras, P.; Theodorakopoulos, L.; Sioutas, S. SparkReact: A Novel and User-friendly Graphical Interface for the Apache Spark MLlib Library. In Proceedings of the 26th Pan-Hellenic Conference on Informatics, Athens, Greece, 25–27 November 2022. [Google Scholar] [CrossRef]
Antonopoulou, H.; Theodorakopoulos, L.; Halkiopoulos, C.; Mamalougkou, V. Utilizing Machine Learning to Reassess the Predictability of Bank Stocks. Emerg. Sci. J. 2023, 7, 724–732. [Google Scholar] [CrossRef]
Zehra, S.; Faseeha, U.; Syed, H.J.; Samad, F.; Ibrahim, A.O.; Abulfaraj, A.W.; Nagmeldin, W. Machine Learning-Based Anomaly Detection in NFV: A Comprehensive Survey. Sensors 2023, 23, 5340. [Google Scholar] [CrossRef]
Asimakopoulos, G.; Antonopoulou, H.; Giotopoulos, K.; Halkiopoulos, C. Impact of Information and Communication Technologies on Democratic Processes and Citizen Participation. Societies 2025, 15, 40. [Google Scholar] [CrossRef]

Figure 1. Credit card fraud detection concentric diagram.

Figure 2. Methodology framework.

Figure 3. Correlation heatmap of PCA-transformed features (V1–V28), the transaction amount, and the target class. High positive correlations are shown in dark blue, negative in dark green, and weaker in lighter shades.

Figure 4. Number of transactions.

Figure 5. Kdeplot of transaction amount distribution.

Figure 6. Printing histograms of features V1, V10, V12, and V23.

Figure 7. Fraud rate in credit card transactions.

Figure 8. Key metrics of models.

Figure 9. Training accuracy comparison of machine learning models.

Figure 10. Testing the accuracy comparison of machine learning models.

Figure 11. Training precision, recall, and F1-score of machine learning models.

Figure 12. Testing precision, recall, and F1-score of machine learning models.

Figure 13. Comparative metrics heatmap for training and testing datasets.

Figure 14. Comparison of training times across machine learning models.

Figure 15. Comparison of testing times across machine learning models.

Table 1. Performance metrics for the Logistic Regression model.

	Metric	Training	Testing
Logistic Regression	Accuracy (%)	96.50	96.49
	Specificity (%)	97.84	97.84
	Precision	96.53	96.52
	Recall	96.50	96.49
	F1-Score	96.50	96.50
	Brier Score	0.0259	0.0259

Table 2. Performance metrics for the Decision Trees model.

	Depth	Accuracy (%)	Specificity (%)	Brier Score	Depth
Decision Trees	15	98.06	98.65	0.0147	15
	20	98.62	98.97	0.0106	20
	25	98.91	99.15	0.0085	25

Table 3. Performance metrics for the Random Forest model.

	Depth	Accuracy (%)	Specificity (%)	Brier Score
Random Forest	15	99.11	99.31	0.0072
	20	99.26	99.42	0.0061
	25	99.36	99.50	0.0053

Table 4. Performance metrics for the XGBoost model.

	Metric	Training	Testing
XGBoost	Accuracy (%)	100.00	99.97
	Specificity (%)	100.00	99.95
	Brier Score	0.0000	0.0002

Table 5. Performance metrics for the CatBoost model.

	Depth	Accuracy (%)	Specificity (%)	Brier Score
CatBoost	6	99.95	99.90	0.0005
CatBoost	8	99.96	99.91	0.0004

Table 6. Comparative performance metrics for machine learning models.

Model	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)	Specificity (%)	Brier Score
Logistic Regression	96.5	96.0	96.2	96.1	97.8	0.0259
Decision Trees (Depth 25)	98.9	99.2	99.1	99.1	99.1	0.0085
Random Forest (Depth 25)	99.4	99.5	99.4	99.5	99.5	0.0053
XGBoost (Depth 6)	99.97	99.95	99.97	99.96	99.95	0.0002
CatBoost (Depth 8)	99.96	99.91	99.95	99.93	99.91	0.0004

Table 7. Hypothesis evaluation summary.

Hypothesis	Description	Evaluation Criteria	Outcome
H1	Advanced machine learning techniques (e.g., XGBoost, CatBoost) significantly improve decision-making in fraud detection systems compared to traditional models.	Accuracy, Precision, Recall, F1-Score, Brier Score	XGBoost and CatBoost achieved 99.97% and 99.96% accuracy, outperforming Logistic Regression (96.49%)
H2	Integrating distributed frameworks (e.g., PySpark) with ML algorithms improves scalability, efficiency, and real-time performance of fraud detection.	Training time, Latency, Scalability metrics, System integration	PySpark reduced training time by up to 52% and supported real-time processing at 500 ms latency, outperforming traditional frameworks like Hadoop and Flink

Table 8. Proposed Framework for PySpark + XGBoost/CatBoost.

Metric	Proposed Framework (PySpark + XGBoost/CatBoost)	Hadoop-Based	Flink-Based	Metric
Training Time	2.5 h	5.2 h	3.8 h	Training Time
Accuracy	98%	94%	96%	Accuracy
Latency (real-time)	500 ms	1200 ms	800 ms	Latency (real-time)
Scalability	High	Moderate	High	Scalability

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Theodorakopoulos, L.; Theodoropoulou, A.; Tsimakis, A.; Halkiopoulos, C. Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost. Electronics 2025, 14, 1754. https://doi.org/10.3390/electronics14091754

AMA Style

Theodorakopoulos L, Theodoropoulou A, Tsimakis A, Halkiopoulos C. Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost. Electronics. 2025; 14(9):1754. https://doi.org/10.3390/electronics14091754

Chicago/Turabian Style

Theodorakopoulos, Leonidas, Alexandra Theodoropoulou, Anastasios Tsimakis, and Constantinos Halkiopoulos. 2025. "Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost" Electronics 14, no. 9: 1754. https://doi.org/10.3390/electronics14091754

APA Style

Theodorakopoulos, L., Theodoropoulou, A., Tsimakis, A., & Halkiopoulos, C. (2025). Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost. Electronics, 14(9), 1754. https://doi.org/10.3390/electronics14091754

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Big Data-Driven Distributed Machine Learning for Scalable Credit Card Fraud Detection Using PySpark, XGBoost, and CatBoost

Abstract

1. Introduction

2. Literature Review

2.1. Introduction to Credit Card Fraud Detection

2.2. Role of Machine Learning in Improving Decision-Making Processes in Fraud Detection

2.3. Distributed Frameworks for Scalable Credit Card Fraud Detection

3. Materials and Methods

3.1. Performance Evaluation

3.2. Dataset Description

3.3. Experimental Controls and Reproducibility Measures

3.4. Rationale for Model Selection and Statistical Validation Enhancements

3.4.1. Model Selection Justification

3.4.2. Statistical Validation and Error Analysis

4. Results

4.1. Logistic Regression

4.2. Decision Trees

4.3. Random Forest

4.4. XGBoost

4.5. CatBoost

4.6. Model Accuracy Comparison

4.7. Comparative Analysis

5. Discussion

5.1. Comparative Evaluation and Innovations Beyond the State of the Art

5.2. Model Sensitivity to Macroeconomic Shifts and Policy Invariance

5.3. Limitations of the Study

5.4. Future Research Directions

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI