Article

Credit Card Fraud: Analysis of Feature Extraction Techniques for Ensemble Hidden Markov Model Prediction Approach

1 Department of Computer Science, Tai Solarin University of Education, Ijagun, Ijebu-Ode 2118, Nigeria
2 French-South African Institute of Technology, Department of Electrical, Electronic, and Computer Engineering, Cape Peninsula University of Technology, Bellville Campus, Bellville 7535, South Africa
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7389; https://doi.org/10.3390/app14167389
Submission received: 22 July 2024 / Revised: 17 August 2024 / Accepted: 20 August 2024 / Published: 21 August 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract:
In the face of escalating credit card fraud due to the surge in e-commerce activities, effectively distinguishing between legitimate and fraudulent transactions has become increasingly challenging. To address this, various machine learning (ML) techniques have been employed to safeguard cardholders and financial institutions. This article explores the use of the Ensemble Hidden Markov Model (EHMM) combined with two distinct feature extraction methods: principal component analysis (PCA) and a proposed statistical feature set termed MRE, comprising Mean, Relative Amplitude, and Entropy. Both the PCA-EHMM and MRE-EHMM approaches were evaluated using a dataset of European cardholders and demonstrated comparable performance in terms of recall (sensitivity), specificity, precision, and F1-score. Notably, the MRE-EHMM method exhibited significantly reduced computational complexity, making it more suitable for real-time credit card fraud detection. Results also demonstrated that the PCA and MRE approaches perform significantly better when integrated with the EHMM in contrast to the conventional HMM approach. In addition, the proposed MRE-EHMM and PCA-EHMM techniques outperform other classic ML models, including random forest (RF), linear regression (LR), decision trees (DT) and K-nearest neighbour (KNN).

1. Introduction

The progress in online banking systems has motivated the use of credit cards to ease the payment for products and services. Credit cards are payment cards issued to cardholders to purchase products and services against accumulated debt. In most cases, before a credit card payment is approved, vital information is requested from the cardholder, such as the personal identification number (PIN), the card verification value (CVV), and the expiration date. This information helps to validate the authenticity of the cardholder and to prevent fraud; yet, cases of credit card fraud are recorded daily. Nonetheless, credit cards are becoming favoured and widely employed for online banking payments, which implies that the number of credit card transactions has increased exponentially despite the theft risk involved.
This growth in the acceptability of credit cards has made it difficult to differentiate between fraudulent and non-fraudulent credit card transactions. Accordingly, credit card theft is on the rise irrespective of the security information (such as the PIN and CVV) required before a transaction is approved. Although security checks such as tokenization and data encryption are used to prevent credit card theft [1], they cannot completely prevent fraudulent credit card transactions. In particular, credit card fraud can occur remotely, where the basic card information is all that is required: such transactions call for no PIN, card imprint, or handwritten signature. Most of the time, the victims of fraud are oblivious that the perpetrators have access to their credit card information, especially when these credit cards are used for payment on phishing websites [2,3]. The quickest way to spot credit card fraud is to examine each card's spending patterns and look for any deviation from regular spending behaviour. However, this is difficult to achieve because the daily number of credit card transactions is huge, resulting in very large transactional records. As a result, there has been considerable recent research into the accurate, rapid, and efficient prediction of credit card fraud.
In recent times, machine learning (ML) tools have been deployed in the literature to efficiently predict credit card frauds. ML tools enable computers to improve their forecasting abilities by learning from previous datasets. Different ML tools, such as hidden Markov models (HMMs), decision trees, K-nearest neighbour, logistic regression, and so on, have been deployed for the prediction of credit card frauds [4,5,6,7,8]. However, work is being carried out to improve the predictive ability of these ML tools.
HMM is a popular and flexible ML tool that can easily model randomly changing datasets. HMMs can easily predict fraudulent transactions from a sequence of defined observations. Nonetheless, the ability of an HMM to effectively predict credit card fraud depends on the adopted feature extraction technique. This implies that the more reliable the output of the feature extraction technique, the better the prediction performance of the HMM [9,10,11]. Likewise, the length of the output feature vector, which doubles as the dimension of the HMM, determines the computational time complexity imposed on the HMM: the longer the feature vector, the greater this complexity [12]. Therefore, it is paramount to carefully select the feature extraction technique that will be combined with the HMM so as to balance the trade-off between its performance gain and its computational time complexity.
Accordingly, this article analyses two feature extraction techniques that can be combined with HMM for the prediction of credit card fraud. First, the principal component analysis (PCA) [13,14] technique is combined with HMM. In addition to employing PCA, this article computationally determines the ‘optimal’ feature vector length that balances the trade-off between the PCA performance gain and the computational time complexity it imposes on HMM. It was discovered that even this ‘optimal’ feature vector length is not computationally time-efficient when PCA is combined with HMM. Therefore, the features derived using PCA are converted to statistical features to reduce the computational time complexity imposed on HMM. The proposed robust but simple statistical features, denoted as MRE (Mean, Relative Amplitude, and Entropy), are merged to form a feature vector that can be combined with HMM to effectively predict credit card fraud.
Furthermore, as highlighted in [12], the Gaussian emission distribution parameters of HMM are sensitive to a flat start or random values, which impedes prediction performance. Hence, the K-means clustering (K-MC) technique [15,16] and the Gaussian mixture model (GMM) [17,18] are sequentially used to initialise the HMM process. Since the K-MC and GMM techniques are embedded in the Gaussian emission process of HMM, the latter is referred to as an ensemble hidden Markov model (EHMM) in this article. Therefore, this article evaluates the performance of PCA and MRE when combined with EHMM (PCA-EHMM and MRE-EHMM). Likewise, the performance of PCA and MRE is verified when combined with the conventional HMM (PCA-HMM and MRE-HMM), demonstrating the necessity of the initialisation. Moreover, the performance of the proposed MRE-EHMM and PCA-EHMM techniques is validated against other classic ML models such as random forest (RF), linear regression (LR), decision trees (DT), and K-nearest neighbour (KNN). The performances are documented for different performance metrics such as recall/sensitivity ($S_e$), specificity ($S_p$), precision ($P$), and F1-score ($F_1$) using the credit card transactions dataset of European cardholders gathered within two days in September 2013 (https://www.kaggle.com/mlg-ulb/creditcardfraud, accessed on 21 July 2024).
The key contributions and significance of this article are outlined as follows. While various studies have proposed the use of HMM for predicting credit card fraud, these works often lack clarity regarding the feature extraction techniques employed and their integration with the HMM. Feature extraction is crucial for enhancing the model’s predictive accuracy [11]. In this article, two feature extraction techniques are proposed, which are tailored for use with the EHMM to effectively predict fraudulent credit card transactions. The EHMM approach presented here distinguishes itself from the conventional HMM models used in credit card fraud detection in two significant ways: (1) HMM is systematically integrated with K-MC and GMM techniques to form an EHMM, enhancing the model’s predictive capability, and (2) two feature extraction techniques (PCA and MRE) are introduced and can be seamlessly integrated with the EHMM to further improve its performance. Moreover, the results obtained from applying these techniques demonstrate high reliability and are reproducible for real-time credit card transaction monitoring. This article is expected to contribute to reducing financial losses for cardholders and institutions while bolstering confidence in online banking involving credit card transactions.
The remaining part of this article is organised as follows. Section 2 briefly reviews some of the recent works on the prediction of credit card frauds using HMMs. In Section 3, the dataset used for result verification is discussed. The PCA and MRE feature extraction techniques are explained in Section 4. Section 5 discusses the EHMM in detail while also explaining its training process. In Section 6, the metrics used in documenting the performances of the proposed feature extraction techniques are briefly explained. Section 7 discusses the obtained results from the PCA-EHMM and MRE-EHMM prediction techniques. In addition, this section highlights the performance difference between the EHMM and conventional HMM approaches, and also compares the PCA-EHMM and MRE-EHMM techniques to other existing ML techniques in the literature. The article is concluded with some final remarks in Section 8.

2. Related Work

The widespread use of credit cards for online purchases has also increased the possibility of fraud. According to the Nigerian Deposit Insurance Corporation (NDIC) annual report of 2018, between 2016 and 2018, the number of credit card fraud incidents in Nigeria increased by 33%, while the actual amount lost to credit card fraud climbed by 84% [19]. Likewise, the Federal Trade Commission (FTC) affirmed that there were around 1579 data breaches, totalling 179 million data points, with credit card fraud being the most widespread [6,20]. Therefore, a wide range of ML and data mining techniques have been deployed in the literature to proffer solutions to this menace. This section reviews some recent and related credit card fraud prediction techniques.
A weighted average ensemble algorithm for credit card fraud detection was proposed in [21]. This algorithm integrated multiple models, including logistic regression (LR), K-nearest neighbour (KNN), random forest (RF), AdaBoost, and Bagging, to enhance detection accuracy. The study utilized a dataset from a European credit card company and achieved an accuracy of 99%, surpassing the baseline models (LR, AdaBoost, RF, KNN, and Bagging) in performance. The authors demonstrated the potential of utilizing an ensemble model for real-time credit card fraud detection. However, the lack of details regarding the feature extraction process limits the reproducibility of their work.
In [22], a comparative study on four ML techniques, namely, DT, RF, LR, and Naive Bayes (NB), for the detection of credit card frauds was carried out, with attention on imbalanced datasets. Their analysis revealed that RF was effective at detecting credit card fraud, while LR and NB performed comparably, and DT had the worst performance. However, similar to [21], the study did not provide insight into the feature extraction process, raising concerns about the reproducibility of their results.
The authors in [23] conducted an extensive review of various machine learning techniques used for detecting credit card fraud. The review encompassed methods such as NB, support vector machines (SVMs), multilayer feed forward (MLFF) neural networks, Bayesian belief networks (BBNs), genetic algorithms (GAs), and classification and regression trees (CARTs). As this article was a critical analysis of prior studies, no specific dataset was employed. The findings highlighted the superior performance of NB, which achieved the highest accuracy, followed by SVM and GA.
In [24], the sequence of operations of credit card transactions was modelled using HMMs. The paper also described how HMMs can be used to detect fraudulent credit card transactions. Yet, it did not document any clear results to support the performance of the developed HMM. More importantly, it did not provide information on the feature extraction technique used or combined with the developed HMM. As mentioned earlier, the feature extraction technique used with any ML technique, including HMMs, determines the performance of that technique; hence the focus of this article.
The authors in [25] modelled a fraud detection system that attempts to detect credit card fraud as accurately as possible by producing clusters and analysing the clusters formed by the dataset for anomalies. Their work examined the detection accuracy of two hybrid techniques: the K-MC technique with a multilayer perceptron (MLP) and the K-MC technique with HMM. The authors showed that the detection accuracy of the two models examined was fairly similar. Nonetheless, the paper was silent on the adopted feature extraction technique, which is a major factor in analysing the performance of their proposed models.
The process of fraud detection using HMMs was described in [26]. In their work, the researchers combined the HMM process with the K-MC technique to form what this article described as an EHMM. The K-MC technique was used to initialise the HMM process to improve its performance. Their model’s performance was presented in terms of recall and precision. Although their model training flowchart displayed the importance of the feature extraction step, there was no discussion on their adopted feature extraction technique. This article combines the K-MC technique and GMM sequentially with the HMM to form an EHMM and focuses on the adopted feature extraction technique, which determines the performance of the EHMM and any other ML tool.
A credit card fraud detection model was developed using a multiple-perspective HMM-based approach in [4]. The study developed eight HMMs to model sequences of credit card transactions. They employed history-based features with HMM based on three perspectives that were aided with different assumptions. Their result was documented in terms of recall and precision. On the other hand, this article derives the feature vector using PCA and a simple statistical method termed MRE. More so, an EHMM is used in this article, which demonstrates significant performance improvement in comparison to the conventional HMM method.

3. Dataset

A secondary dataset for credit card fraud was obtained for this article from an internet repository made available by Kaggle.com (https://www.kaggle.com/mlg-ulb/creditcardfraud, accessed on 21 July 2024). This dataset, dating from September 2013, contains credit card transactions made by cardholders in Europe. It includes 492 fraud cases out of 284,807 transactions recorded within two days, meaning that 0.172% of the card transactions were fraudulent. Each transaction contains 31 numerical features (31 columns), 28 of which ($V_1$–$V_{28}$) were transformed using PCA to preserve confidentiality [6,20]. The three remaining columns are the time, the amount, and the class of the transaction. The time column gives the time difference between each transaction in the dataset and the first one, whereas the class column, which is key to this study, indicates whether the transaction is fraudulent (‘1’) or non-fraudulent (‘0’). Table 1 provides a summary of some statistical properties (SP) of the dataset.
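As a quick arithmetic check on the quoted imbalance, the fraud rate can be recomputed directly from the counts stated above; the sketch below uses only those figures.

```python
# Class imbalance of the European cardholders dataset, recomputed from the
# counts quoted in this section (492 frauds out of 284,807 transactions).
n_total = 284_807
n_fraud = 492

fraud_rate = 100 * n_fraud / n_total   # percentage of fraudulent rows (~0.172%)
legit_per_fraud = n_total / n_fraud    # roughly one fraud per ~579 transactions
```

This severe imbalance is why recall, specificity, precision, and F1-score, rather than raw accuracy, are reported later in this article.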

4. Feature Extraction Techniques

4.1. Principal Component Analysis (PCA)

PCA is a technique used for reducing the dimension of a dataset, by converting the variables in the dataset into a smaller set while preserving the information in the original dataset [12,13]. Consequently, this reduced dataset, called features, can be seamlessly visualised and evaluated using different ML tools. PCA uses basic equations to derive variables, called principal components, from the original dataset. PCA is a very robust feature extraction technique, especially when the sampling points in the dataset to be analysed carry both negative and positive values [12,13]. The operation of PCA can be sequentially broken down into three steps: (I) standardisation, (II) computation of covariance matrix, and (III) eigenvector and eigenvalue computation [12].

4.1.1. Standardisation

The standardisation step is a scaling process that allows all the variables or sampling points in a continuous signal S i to evenly participate in the feature analysis. This step limits any bias in the feature analysis process, in case there are large differences between the dataset variables. Given a continuous signal, S i = ( s 1 , s 2 , , s i ) , the standardisation step is achieved as:
$$\bar{S}_i = \left[ \frac{s_1 - M(S_i)}{D(S_i)}, \; \frac{s_2 - M(S_i)}{D(S_i)}, \; \ldots, \; \frac{s_i - M(S_i)}{D(S_i)} \right] = (\bar{s}_1, \bar{s}_2, \ldots, \bar{s}_i), \tag{1}$$
where S ¯ i is the standardised continuous signal, M is the mean of S i , and D is the standard deviation of S i .
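As a minimal NumPy illustration of this step, the toy signal below (values invented for the example) is standardised exactly as in Equation (1):

```python
import numpy as np

def standardise(S):
    """Z-score standardisation of a signal S_i, as in Eq. (1):
    subtract the mean M(S_i) and divide by the standard deviation D(S_i)."""
    return (S - S.mean()) / S.std()

S = np.array([2.0, 4.0, 6.0, 8.0])   # toy signal, invented for illustration
S_bar = standardise(S)               # zero mean, unit standard deviation
```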

4.1.2. Covariance Matrix Computation, ∑

The covariance matrix ∑ is computed to identify the variation between each variable in S ¯ i and the mean of S ¯ i . This step is used to eliminate redundant points or information since the variables are presumed to be highly correlated. ∑ is a symmetric d × d matrix, where its entries are covariances linked to all the possible pairs of the initial variables. For example, given a d-dimensional matrix of the form:
$$G = \begin{bmatrix} G_1 \\ G_2 \\ \vdots \\ G_d \end{bmatrix}, \tag{2}$$
where G_1, G_2, …, G_d are defined as:
$$G_1 = \begin{bmatrix} g_1^1 & g_1^2 & \cdots & g_1^a \end{bmatrix}, \quad G_2 = \begin{bmatrix} g_2^1 & g_2^2 & \cdots & g_2^a \end{bmatrix}, \quad \ldots, \quad G_d = \begin{bmatrix} g_d^1 & g_d^2 & \cdots & g_d^a \end{bmatrix}.$$
Then, the covariance of matrix G can be computed as:
$$\bar{G} = \mathrm{cov}(G) = \begin{bmatrix} \frac{G_1^2}{a} & \frac{G_1 G_2}{a} & \cdots & \frac{G_1 G_d}{a} \\ \frac{G_2 G_1}{a} & \frac{G_2^2}{a} & \cdots & \frac{G_2 G_d}{a} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{G_d G_1}{a} & \frac{G_d G_2}{a} & \cdots & \frac{G_d^2}{a} \end{bmatrix}, \tag{3}$$
where the variance of G_1, G_2, …, G_d lies on the diagonal of Equation (3). Also, it should be noted from Equation (3) that $\frac{G_1 G_2}{a} = \frac{G_2 G_1}{a}$ is the covariance of two vectors, which is defined as:
$$\mathrm{cov}(G_1, G_2) = \mathrm{cov}(G_2, G_1) = \frac{(G_1 - M(G_1))(G_2 - M(G_2))}{a - 1}. \tag{4}$$

4.1.3. Eigenvector and Eigenvalue Computation

The covariance matrix ∑ is used to compute the eigenvectors U and eigenvalues V to derive the principal components of the continuous signal S i . The principal components are computed by finding the solution to Equation (5).
$$\Sigma U = V U, \tag{5}$$
where V are scalar values and U are non-zero vectors called principal components. The direction of the PCA space is depicted by U while the corresponding V shows the length and scaling factor of U [12,27]. The principal components are arranged according to the variance of V from the highest to the lowest. Thus, the U with the highest V is the first principal component because it has the highest variance and carries the maximum possible information. This arrangement enables a reduction in the dimension of the principal components with no or minimal information lost. It should be noted that these principal components can be reconstructed linearly as a combination of the original signal. Also, the number of derived principal components is a function of the dimension of the analysed signal. That is, the d-dimension matrix of Equation (2) will produce d principal components after complete decomposition.
To solve for U and V , Equation (5) is decomposed using the singular vector decomposition (SVD) technique into three matrices given as:
$$\bar{G} = X Y Z, \tag{6}$$
where Y is the a × a diagonal matrix, X is the a × d matrix called the left singular vector, and Z is the d × d matrix referred to as the right singular vector. The elements on the diagonal of Y arranged from highest to lowest are the V while the principal components ( U ) are the columns of the right singular vector, Z. The principal components after complete SVD are usually of the form:
$$Z = \begin{bmatrix} z_{1,1} & z_{1,2} & \cdots & z_{1,d} \\ z_{2,1} & z_{2,2} & \cdots & z_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ z_{d,1} & z_{d,2} & \cdots & z_{d,d} \end{bmatrix}. \tag{7}$$
Therefore, to derive a suitable feature vector f of dimensions 1 × d that can be combined with HMM, Equation (7) is transformed as:
$$f_d = \left[ \frac{1}{d}\sum_{r=1}^{d} |z_r|_1, \; \frac{1}{d}\sum_{r=1}^{d} |z_r|_2, \; \ldots, \; \frac{1}{d}\sum_{r=1}^{d} |z_r|_d \right] = [f_1, f_2, \ldots, f_d]. \tag{8}$$
Hence, given α credit card transactions used for training HMM, the feature vector can be represented in the form of a matrix as:
$$f_{\alpha,d} = \begin{bmatrix} f_{1,1} & f_{1,2} & \cdots & f_{1,d} \\ f_{2,1} & f_{2,2} & \cdots & f_{2,d} \\ \vdots & \vdots & \ddots & \vdots \\ f_{\alpha,1} & f_{\alpha,2} & \cdots & f_{\alpha,d} \end{bmatrix}. \tag{9}$$
It should be noted that because of privacy concerns, each transaction in the dataset has been converted to principal components with dimensions 1 × d , where d = 28 . Nonetheless, if given the original dataset, the principal components which serve as the feature vector can be derived using these three sequential steps. This proposed PCA feature extraction technique is summarised in Algorithm 1.
Algorithm 1: PCA Feature Extraction
Input: $S_i$
Output: f α , d
1: Compute S ¯ i as in (1)
2: Compute G ¯ as in (3)
3: Compute V and U as in (6)
4: Derive f d and f α , d as in (8) and (9)
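The three steps of Algorithm 1 can be sketched in a few lines of NumPy. This is an illustrative reimplementation under stated assumptions, not the authors' code: the random matrix X stands in for a batch of transactions, and the feature map follows Equation (8) (the column-wise mean of the absolute entries of Z).

```python
import numpy as np

def pca_features(X):
    """Sketch of Algorithm 1: standardise, form the covariance matrix,
    decompose it with SVD, and reduce Z to a 1 x d feature vector (Eq. (8))."""
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)   # step 1: standardisation
    G_bar = np.cov(X_std, rowvar=False)            # step 2: covariance matrix
    _, _, Zt = np.linalg.svd(G_bar)                # step 3: SVD of the covariance
    Z = Zt.T                                       # principal components as columns of Z
    return np.abs(Z).mean(axis=0)                  # Eq. (8): f_j = (1/d) * sum_r |z_{r,j}|

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 4))   # 50 synthetic 'transactions' with d = 4 attributes
f = pca_features(X)            # 1 x d feature vector
```

Stacking one such vector per group of α transactions yields the α × d feature matrix of Equation (9).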

4.2. Statistical Features: MRE

This statistical feature extraction method, termed MRE (Mean, Relative Amplitude, and Entropy), is based on observation of the output of PCA, that is, the principal components. First, it is observed that the principal components are of dimensions $1 \times 28$, which automatically increases the complexity of the EHMM. However, as mentioned earlier, the principal components are arranged with respect to the eigenvalues V from the highest to the lowest. This implies that the dimension of the principal components can be reduced to $1 \times k$ (k is the ‘optimal’ length of the principal components) without necessarily losing information, as shown in Section 7.1. Yet, the value of k is still large enough to increase the computational burden of the EHMM. Therefore, the proposed MRE is used to statistically reduce the dimension k so as to limit the computational time complexity of the EHMM. Accordingly, given that the principal component is of $1 \times k$ dimensions ($f = [f_1, f_2, \ldots, f_k]$), the MRE feature vector can be constructed by merging the following statistical parameters in no specified order.

4.2.1. Mean, M

The mean M is an important statistical tool used to average a collection of numbers. The mean is selected as a component of the MRE because it is noticed that the 1 × k principal components fall within a defined range. Therefore, given a single card transaction with 1 × k principal components, M is computed as:
$$M = \frac{1}{k}\sum_{1}^{k} f_k. \tag{10}$$
Consequently, for α credit card transactions, the mean can be represented as:
$$\mathbf{M} = \begin{bmatrix} M_1 \\ M_2 \\ \vdots \\ M_\alpha \end{bmatrix} = \begin{bmatrix} \frac{1}{k}\sum_{1}^{k} f_k^{1} \\ \frac{1}{k}\sum_{1}^{k} f_k^{2} \\ \vdots \\ \frac{1}{k}\sum_{1}^{k} f_k^{\alpha} \end{bmatrix}. \tag{11}$$

4.2.2. Relative Amplitude

The relative amplitude R is also a statistical tool used in this article to reduce the dimension of the principal components. It is selected because it determines the bands of the principal components. Thus, given a single card transaction with 1 × k principal components, R is computed as:
$$R = [f_{max}, f_{min}, f_{diff}], \tag{12}$$
where f m a x is the maximum relative amplitude, f m i n is the minimum relative amplitude, and f d i f f is the difference between the maximum and minimum relative amplitude, computed as:
$$f_{max} = \max(f), \quad f_{min} = \min(f), \quad f_{diff} = |f_{max} - f_{min}|. \tag{13}$$
For α credit card transactions, the relative amplitude can be represented as:
$$\mathbf{R} = \begin{bmatrix} f_{1,max} & f_{1,min} & f_{1,diff} \\ f_{2,max} & f_{2,min} & f_{2,diff} \\ \vdots & \vdots & \vdots \\ f_{\alpha,max} & f_{\alpha,min} & f_{\alpha,diff} \end{bmatrix}. \tag{14}$$

4.2.3. Entropy, E

The theory of entropy plays a key part in the description of many intricate processes such as communications, statistics, thermodynamics, and so on. Entropy is scientifically associated with the terms disorder, uncertainty, or randomness [28]. Different types of entropy have been used for feature extraction such as approximate entropy (ApEn), sample entropy (SampEn), permutation entropy (PeEn), and so on [29,30]. However, the Shannon entropy E is used in this article because it can easily measure the mean or average amount of information conveyed in each data point. Therefore, given a single card transaction with 1 × k principal components, E is estimated as:
$$E = \sum_{1}^{k} P_k \log_2 \frac{1}{P_k}, \tag{15}$$
where P k is the principal component at point k. Accordingly, for α credit card transactions, E will be formulated as:
$$\mathbf{E} = \begin{bmatrix} E_1 \\ E_2 \\ \vdots \\ E_\alpha \end{bmatrix} = \begin{bmatrix} \sum_{1}^{k} P_k^{1} \log_2 \frac{1}{P_k^{1}} \\ \sum_{1}^{k} P_k^{2} \log_2 \frac{1}{P_k^{2}} \\ \vdots \\ \sum_{1}^{k} P_k^{\alpha} \log_2 \frac{1}{P_k^{\alpha}} \end{bmatrix}. \tag{16}$$
Thus, the mean, relative amplitude, and entropy are combined in no particular order to form a feature vector that can be adapted with HMM as shown:
$$f_{MRE} = [M, R, E] = \begin{bmatrix} M_1 & R_1 & E_1 \\ M_2 & R_2 & E_2 \\ \vdots & \vdots & \vdots \\ M_\alpha & R_\alpha & E_\alpha \end{bmatrix}. \tag{17}$$
It should be noted that the f M R E is h-dimensional (where h = 5), which is a significant reduction from the d-dimension ( d = 28 ) or k-dimension (k is determined in Section 7.1). However, with HMM, the computational complexity increases with an increase in the dimension of the feature vector. This implies that MRE-HMM simplifies the HMM process in comparison to PCA-HMM. This proposed MRE feature extraction technique is summarised in Algorithm 2.
Algorithm 2: MRE Feature Extraction
Input: f
Output: f M R E
1: Compute M as in (10) and (11)
2: Compute R as in (12)–(14)
3: Compute E as in (15) and (16)
4: Derive f M R E as in (17)
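A compact NumPy sketch of Algorithm 2 is given below. One detail is our assumption rather than the paper's: since raw principal components may be negative or zero, the entropy of Equation (15) is computed over the normalised absolute components so that $P_k$ behaves like a probability. The input vector is invented for the example.

```python
import numpy as np

def mre_features(f):
    """Sketch of Algorithm 2: build the 1 x 5 MRE vector
    [M, f_max, f_min, f_diff, E] from a 1 x k principal-component vector."""
    M = f.mean()                            # mean, Eq. (10)
    f_max, f_min = f.max(), f.min()         # relative amplitudes, Eq. (13)
    f_diff = abs(f_max - f_min)
    P = np.abs(f) / np.abs(f).sum()         # assumption: normalise before entropy
    E = np.sum(P * np.log2(1.0 / P))        # Shannon entropy, Eq. (15)
    return np.array([M, f_max, f_min, f_diff, E])

f = np.array([0.1, 0.2, 0.3, 0.4])   # toy 1 x k principal components (k = 4)
f_mre = mre_features(f)              # h = 5 dimensional MRE feature vector
```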

5. Ensemble Hidden Markov Model (EHMM)

HMM is a probabilistic ranking classifier that assigns a tag to each observational segment in a sequence. As a result, it determines the probability distribution over the set of observations and produces the sequence of observations that is most probable [10,12]. Because of this, it is able to model and categorise the set of observations derived from the credit card transactions with ease. There are various forms of HMMs; however, this article employs the ergodic HMM, which permits transitions between states without any transition probability being zero [12]. In general, an HMM consists of the following components.
  • The number of the transition states, s.
  • The sequence of observations, $Q = \{q_1, q_2, \ldots, q_z\}$, where z is the number of observations, which determines the output of the modelled system.
  • The transition probability matrix, ρ : The transition probability between two states, s 1 and s 2 , is denoted as ρ = ρ s 1 , s 2 . Thus, ρ is the probability of moving from state s 1 to s 2 . It is important to note that the likelihood of transitioning from state s 1 to s 2 is only determined by the present state s 1 , not by any previous state. Mathematically, ρ is of the form of an s × s matrix given as:
    $$\rho = \begin{bmatrix} \rho_{1,1} & \rho_{1,2} & \cdots & \rho_{1,s} \\ \rho_{2,1} & \rho_{2,2} & \cdots & \rho_{2,s} \\ \vdots & \vdots & \ddots & \vdots \\ \rho_{s,1} & \rho_{s,2} & \cdots & \rho_{s,s} \end{bmatrix}. \tag{18}$$
  • Gaussian emission distribution parameters, ϕ : The probability of emitting an observation from a particular state s is determined by ϕ . Mathematically, ϕ is of the form of an s × z matrix given as
    $$\phi = \begin{bmatrix} \phi_{1,1} & \phi_{1,2} & \cdots & \phi_{1,z} \\ \phi_{2,1} & \phi_{2,2} & \cdots & \phi_{2,z} \\ \vdots & \vdots & \ddots & \vdots \\ \phi_{s,1} & \phi_{s,2} & \cdots & \phi_{s,z} \end{bmatrix}. \tag{19}$$
    ϕ consists of three important parameters, namely, the mean M, the covariance matrix ∑, and the mixture weight β, represented as ϕ = {M, ∑, β}.
  • The start probability, ν = { ν 1 , ν 2 , , ν z } : ν indicates the beginning of the Gaussian distribution process. ν always adds up to 1 at every point of the distribution process. That is,
    $$\sum_{1}^{z} \nu_z = 1. \tag{20}$$
As a result, a given HMM can be represented using these three parameters as:
$$H = \{\nu, \rho, \phi\}. \tag{21}$$
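For concreteness, a toy parameter set for an ergodic HMM with s = 2 states is written out below. All numbers are invented, and a discrete emission matrix is used as a simple stand-in for the paper's Gaussian emission parameters ϕ:

```python
import numpy as np

# Toy ergodic HMM, H = {nu, rho, phi}, with s = 2 states and z = 3 observation
# symbols. Values are illustrative only; the paper uses Gaussian emissions.
rho = np.array([[0.7, 0.3],          # transition probability matrix (s x s):
                [0.4, 0.6]])         # every entry non-zero, hence ergodic
phi = np.array([[0.5, 0.4, 0.1],     # emission matrix (s x z), discrete stand-in
                [0.1, 0.3, 0.6]])
nu = np.array([0.6, 0.4])            # start probabilities, summing to 1
```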
In applying HMMs, three primary problems are typically addressed to find the optimal sequence of observations:
  • Computing the observation probability: The HMM uses an observation sequence, $Q = \{q_1, q_2, \ldots, q_z\}$, and a known model, $H = \{\nu, \rho, \phi\}$, to estimate the likelihood of a specific sequence of observations.
  • Computing the model parameters (Training): During this phase, the HMM selects the model that best fits the input data. The HMM parameters, $H = \{\nu, \rho, \phi\}$, are continually re-estimated to maximise the likelihood of each observation sequence. Maximum likelihood estimation (MLE) techniques such as the Baum–Welch algorithm [31] can be used for this process: the algorithm iteratively optimises $H$ based on the Gaussian emission distribution parameters, $\phi = \{M, \Sigma, \beta\}$, until a convergence condition is satisfied. These Gaussian parameters assume random or flat values at the start of the MLE process. However, HMMs are sensitive to flat-start or random values of M, ∑, and β because such values limit the prediction performance of the model. Accordingly, the K-MC [15,16] and GMM [17,18] techniques are sequentially used to initialise the Gaussian emission parameters. As a result, this article refers to the resulting HMM as an ensemble hidden Markov model (EHMM) because the K-MC and GMM techniques are embedded in the Gaussian emission process, as shown in Figure 1.
  • Decoding: HMM identifies the hidden states or path based on the observation, Q. The Viterbi algorithm ( V a l g ) [32] is used to complete this phase. The V a l g is an error-correcting algorithm that searches for a new path using the knowledge from a previous path. It subsequently outputs the path with optimal probability by calculating all probable hidden paths.
This conventional HMM approach, utilizing MLE and the V a l g , has been applied in prior studies to predict credit card fraud. However, the reliability of the extracted feature vector is crucial to the performance of EHMM. In fact, the more reliable the feature vector, the better the performance of EHMM. Therefore, this article places significant emphasis on the feature extraction technique.
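The initialisation stage that turns the HMM into an EHMM can be sketched as follows. This is a simplified illustration, not the authors' implementation: a plain Lloyd's K-means pass supplies the starting means, and per-cluster statistics supply the covariances and mixture weights β of ϕ = {M, ∑, β}; the subsequent GMM refinement and Baum–Welch re-estimation are omitted.

```python
import numpy as np

def init_gaussian_emissions(X, n_states, n_iter=20):
    """K-means-based initialisation of the Gaussian emission parameters
    phi = {M, Sigma, beta}: cluster means, per-cluster covariances, and
    cluster proportions as mixture weights."""
    idx = np.linspace(0, len(X) - 1, n_states).astype(int)
    means = X[idx].copy()                          # deterministic seeding
    for _ in range(n_iter):                        # Lloyd's K-means iterations
        dist = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=-1)
        labels = dist.argmin(axis=1)
        for j in range(n_states):
            if (labels == j).any():
                means[j] = X[labels == j].mean(axis=0)
    covs = np.stack([
        np.cov(X[labels == j], rowvar=False) + 1e-6 * np.eye(X.shape[1])
        if (labels == j).sum() > 1 else np.eye(X.shape[1])
        for j in range(n_states)
    ])
    beta = np.bincount(labels, minlength=n_states) / len(X)
    return means, covs, beta

# Two well-separated synthetic clusters standing in for feature vectors
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0.0, 0.1, (30, 2)), rng.normal(5.0, 0.1, (30, 2))])
means, covs, beta = init_gaussian_emissions(X, n_states=2)
```

In the full EHMM, these values would seed a GMM (further refined by EM) whose parameters then initialise the Baum–Welch training in place of a flat or random start.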

EHMM Training

The credit card dataset, which includes 284,807 transactions, is divided into two portions. The model is tested on a small subset of the dataset, while the majority is used for training ( χ = 70–80%). The training portion is divided into two groups: fraudulent transactions and non-fraudulent transactions. As indicated in the dataset, fraudulent transactions are represented as ’1’ while non-fraudulent transactions are designated with ’0’. As a result, two HMMs are created to represent fraudulent transactions ( ν 1 , ρ 1 , ϕ 1 ) and non-fraudulent transactions ( ν 2 , ρ 2 , ϕ 2 ). In each scenario, a four-state ergodic HMM with two mixture weights is used. The dimension of EHMM is dictated by the resulting feature extraction vector dimension. MRE-EHMM assumes five dimensions, whereas PCA-EHMM assumes k dimensions.
During testing, the two HMMs are merged and processed through the $V_{alg}$. This combined HMM (defined by $\nu = \{\nu_1, \nu_2\}$, $\rho = \{\rho_1, \rho_2\}$, and $\phi = \{\phi_1, \phi_2\}$) forms an eight-state model with four mixture weights. States 1–4 represent the fraudulent transactions, while states 5–8 correspond to the non-fraudulent transactions. The $\phi = \{\phi_1, \phi_2\}$ parameter is used to fine-tune the resulting feature vector ($f_{MRE}$ or $f_{\alpha,d}$) for the unknown card transaction to be predicted. The fine-tuned feature vector ($f_{MRE}^{\zeta}$ or $f_{\alpha,d}^{\zeta}$) is subsequently fed into the $V_{alg}$, which uses $\nu = \{\nu_1, \nu_2\}$ and $\rho = \{\rho_1, \rho_2\}$ to predict the sequence of states from the fine-tuned feature vector, thereby predicting whether the unknown card transaction is fraudulent or not. Also, the $V_{alg}$ transitions between the two groups of states (states 1–4 and 5–8) with equal probability, as defined in the transition matrix.
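The decoding step can be illustrated with a log-space Viterbi pass. The sketch below is generic, not the paper's code: it takes precomputed per-step emission log-likelihoods (a discrete stand-in for the Gaussian case), and in the paper's setting the decoded path over the eight merged states would indicate whether the transaction is predicted as fraudulent (states 1–4) or not (states 5–8). All numbers in the demo are invented.

```python
import numpy as np

def viterbi(nu, rho, log_emis):
    """Log-space Viterbi decoding: most probable hidden state path given
    start probabilities nu, transition matrix rho, and a T x s matrix of
    per-step emission log-likelihoods."""
    T, s = log_emis.shape
    delta = np.log(nu) + log_emis[0]           # best log-probability per state
    psi = np.zeros((T, s), dtype=int)          # back-pointers
    for t in range(1, T):
        scores = delta[:, None] + np.log(rho)  # s x s: previous state -> next state
        psi[t] = scores.argmax(axis=0)
        delta = scores.max(axis=0) + log_emis[t]
    path = np.empty(T, dtype=int)
    path[-1] = delta.argmax()
    for t in range(T - 2, -1, -1):             # trace the back-pointers
        path[t] = psi[t + 1, path[t + 1]]
    return path

# Toy 2-state demo: evidence favours state 0 twice, then state 1
nu = np.array([0.5, 0.5])
rho = np.array([[0.9, 0.1], [0.1, 0.9]])
emis = np.array([[0.9, 0.1], [0.9, 0.1], [0.05, 0.95]])
path = viterbi(nu, rho, np.log(emis))
```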

6. Performance Parameters

The performance of the two feature vectors is verified using the following metrics, which are explained in the context of this study.
  • Recall/Sensitivity, S_e: S_e measures the ability of the models to correctly predict the non-fraudulent transactions, that is, class '0'. It is expressed as:
    S_e = TP / (TP + FN),
    where true positives (TP) refer to the number of accurately predicted non-fraudulent transactions and false negatives (FN) refer to the number of times the models miss the manually identified non-fraudulent transactions.
  • Specificity, S_p: S_p measures the ability of the models to correctly predict the fraudulent transactions, that is, class '1'. It is expressed as:
    S_p = TN / (TN + FP),
    where true negatives (TN) refer to the number of accurately predicted fraudulent transactions and false positives (FP) refer to the number of times the models miss the manually identified fraudulent transactions.
  • Precision, P: P is defined as the capacity of the model to accurately predict the class of the card transaction, that is, '0' or '1'. It is expressed as:
    P = TP / (TP + FP),
    where true positives (TP) refer to the number of accurately predicted transactions and false positives (FP) refer to the number of times the models mislabel a manually identified transaction class. A high value of P indicates good model performance.
  • F1-score, F_1: F_1 combines precision and recall and is commonly known as their harmonic mean. It is expressed as:
    F_1 = 2 · P · S_e / (P + S_e).
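The four metrics can be computed directly from the confusion counts. A small sketch, using hypothetical counts for illustration only (note the study's convention above: TP/FN are counted on the non-fraudulent class '0', TN/FP on the fraudulent class '1'):

```python
def fraud_metrics(tp, fn, tn, fp):
    """Recall/sensitivity, specificity, precision, and F1-score as
    defined above; tp/fn count the non-fraudulent class '0' and
    tn/fp the fraudulent class '1'."""
    se = tp / (tp + fn)          # recall / sensitivity
    sp = tn / (tn + fp)          # specificity
    p = tp / (tp + fp)           # precision
    f1 = 2 * p * se / (p + se)   # harmonic mean of P and Se
    return se, sp, p, f1

# Hypothetical confusion counts, for illustration only.
se, sp, p, f1 = fraud_metrics(tp=9900, fn=100, tn=480, fp=20)
print(f"Se={se:.4f}  Sp={sp:.4f}  P={p:.4f}  F1={f1:.4f}")
```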

7. Results and Discussion

7.1. PCA-EHMM

Table 2 shows the PCA-EHMM prediction performance for specific values (k = 7–12). As previously stated, the principal components after complete decomposition are ordered from left to right by importance; consequently, discarding components from the right removes the least informative ones first. In this vein, the performance of PCA-EHMM was verified for k = 7–12 (rather than the full d = 28), which reduces the computational cost that PCA imposes on the EHMM, since k sets the dimension of the EHMM and the higher the dimension, the more complex the EHMM process.
From Table 2, one can notice that the performance of PCA-EHMM improves as the value of k increases from 7 to 9. For example, at χ = 80, there is a performance improvement of 0.20%, 0.21%, 0.20%, and 0.21% in S_e, S_p, P, and F_1, respectively, as k increases from 7 to 8. Likewise, at χ = 80, there is a performance improvement of 0.21%, 0.23%, 0.22%, and 0.22% in S_e, S_p, P, and F_1, respectively, as k increases from 8 to 9. Beyond k = 9, however, increasing k yields no significant improvement in the prediction performance of PCA-EHMM; it only increases the computing cost of the EHMM process. Thus, k = 9 is chosen as the ideal value because it strikes a suitable compromise between performance gain and computational time complexity. Notwithstanding, this value of k is still large enough to impose a high computational burden on the EHMM; hence, MRE-EHMM is introduced in this article, as discussed earlier.
Figure 2, Figure 3, Figure 4 and Figure 5 depict the PCA-EHMM performance as k increases. One can notice how the curves of each figure flatten as k grows, which clearly shows that there is no significant increase in the performance of PCA-EHMM even if all derived principal components are used (that is, d = 28). It is also important to mention that the performance of PCA-EHMM improves as the training size χ increases from 70% to 80%; this improvement is evident in all four performance metrics considered.
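The k-dimensional reduction itself can be sketched with a plain SVD. This is an illustrative reimplementation, not the paper's code; the random matrix merely stands in for the 28 anonymised transaction features.

```python
import numpy as np

def pca_reduce(X, k):
    """Project X (n_samples x d) onto its first k principal components,
    ordered by decreasing explained variance."""
    Xc = X - X.mean(axis=0)                       # centre the data
    # SVD of the centred data; rows of Vt are the principal axes,
    # already sorted by decreasing singular value.
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = (S ** 2) / (S ** 2).sum()         # explained-variance ratios
    return Xc @ Vt[:k].T, explained[:k]

rng = np.random.default_rng(1)
X = rng.normal(size=(500, 28))    # stand-in for the 28 dataset features
Z, ratio = pca_reduce(X, k=9)     # k = 9, the value chosen above
print(Z.shape)                    # → (500, 9)
```

Because the components are sorted, truncating to k = 9 keeps the most informative directions and drops the rest, which is exactly the trade-off discussed above.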

7.2. MRE-EHMM

Table 3 shows the performance of the introduced MRE-EHMM in comparison to PCA-EHMM at k = 9. From the table, MRE-EHMM and PCA-EHMM exhibit approximately the same S_e, S_p, P, and F_1 performance, even as χ increases. However, one should keep in mind that MRE-EHMM is five-dimensional while PCA-EHMM is nine-dimensional, and the computational time complexity of the EHMM grows with the dimension. Thus, compared to the PCA approach, the MRE approach requires significantly less processing time from the EHMM and can serve as a more efficient alternative for real-time credit card fraud prediction systems.
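For intuition, the three named MRE statistics can be sketched as below. This is only an illustration: the paper's actual f_MRE is five-dimensional and its exact definitions appear in the earlier sections not reproduced here; the relative-amplitude normalisation and the histogram-based Shannon entropy estimator used in this sketch are assumptions.

```python
import numpy as np

def mre_features(x, bins=16):
    """Illustrative Mean / Relative-Amplitude / Entropy statistics for
    one transaction feature window x. Both the relative-amplitude
    formula and the entropy estimator are assumptions, not the
    paper's definitions."""
    mean = x.mean()
    # peak-to-peak range scaled by the largest magnitude (assumed form)
    rel_amp = (x.max() - x.min()) / (np.abs(x).max() + 1e-12)
    # Shannon entropy (bits) of a normalised histogram of the window
    hist, _ = np.histogram(x, bins=bins)
    p = hist / hist.sum()
    p = p[p > 0]
    entropy = -(p * np.log2(p)).sum()
    return np.array([mean, rel_amp, entropy])

x = np.linspace(-1.0, 1.0, 256)   # toy feature window
f = mre_features(x)
print(f.shape)                    # → (3,)
```

The key point carried by the text stands regardless of the exact definitions: these statistics cost O(n) per window, so the resulting low-dimensional feature vector keeps the EHMM far cheaper than the nine-dimensional PCA projection.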

7.3. PCA and MRE Performance Comparison with HMM and Proposed EHMM

In this section, the performance of PCA and MRE is verified for the proposed EHMM and conventional HMM (that is, when the K-MC and GMM components are excluded from the HMM process). The prediction models’ performance is shown in Table 4. The table shows that PCA-EHMM and MRE-EHMM outperform PCA-HMM and MRE-HMM, respectively. For example, at χ = 80 % , PCA-EHMM outperforms PCA-HMM by 28.01%, 28.16%, 28.22%, and 28.12% in terms of S e , S p , P , and F 1 , respectively. Likewise, at χ = 80 % , MRE-EHMM outperforms MRE-HMM by 28.02%, 28.14%, 28.22%, and 28.13% in terms of S e , S p , P , and F 1 , respectively. This indicates that sequentially initialising the HMM process with the K-MC and GMM techniques is vital for improving its performance and the prediction system as a whole.
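The sequential K-MC → GMM initialisation whose importance is shown above can be sketched as follows: k-means clusters one state's training observations, and the clusters seed the Gaussian-mixture weights, means, and (diagonal) variances. Function names, the deterministic initialisation, and the toy data are illustrative assumptions, not the paper's notation.

```python
import numpy as np

def kmeans(X, k, iters=50):
    """Plain k-means (Lloyd's algorithm) with a deterministic
    farthest-point initialisation."""
    C = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in C], axis=0)
        C.append(X[np.argmax(d)])                 # farthest point so far
    C = np.array(C)
    for _ in range(iters):
        labels = np.argmin(((X[:, None] - C[None]) ** 2).sum(axis=-1), axis=1)
        for j in range(k):
            if np.any(labels == j):
                C[j] = X[labels == j].mean(axis=0)
    return C, labels

def init_gmm_from_kmeans(X, n_mix=2):
    """Seed one state's Gaussian mixture (weights, means, diagonal
    variances) from the k-means clusters."""
    means, labels = kmeans(X, n_mix)
    weights = np.array([(labels == j).mean() for j in range(n_mix)])
    variances = np.array([X[labels == j].var(axis=0) + 1e-6
                          for j in range(n_mix)])
    return weights, means, variances

rng = np.random.default_rng(2)
# Two well-separated clusters standing in for one state's observations.
X = np.vstack([rng.normal(-3.0, 0.3, (100, 5)),
               rng.normal(3.0, 0.3, (100, 5))])
w, m, v = init_gmm_from_kmeans(X, n_mix=2)
print(np.round(w, 2))   # → [0.5 0.5]
```

Starting the HMM's emission densities from such data-driven estimates, rather than from arbitrary values, is what lets the subsequent MLE re-estimation converge to the much stronger EHMM solutions reported in Table 4.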

PCA-EHMM and MRE-EHMM Comparison with other Models

This section compares the performance of PCA-EHMM and MRE-EHMM with other relevant ML models in the literature. As highlighted in Section 2, RF, LR, DT, and KNN have been extensively used for detecting credit card fraud. RF combines different decision trees to limit overfitting [33,34], LR predicts class probabilities from a set of explanatory variables [22], DT utilises a tree-like model for feature classification [35], and KNN classifies transactions based on the K nearest data points [36]. Table 5 shows the performance of PCA-EHMM and MRE-EHMM in comparison to the RF, LR, DT, and KNN models.
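As a rough illustration of how the KNN baseline classifies a transaction, here is a from-scratch sketch (not the implementation evaluated in Table 5; the toy data and parameters are assumptions):

```python
import numpy as np

def knn_predict(X_train, y_train, X_test, k=5):
    """Minimal K-nearest-neighbour classifier: each test transaction
    takes the majority label of its k nearest training points
    (Euclidean distance)."""
    preds = []
    for x in X_test:
        d = ((X_train - x) ** 2).sum(axis=1)      # squared distances
        nearest = y_train[np.argsort(d)[:k]]      # labels of k nearest
        preds.append(np.bincount(nearest).argmax())
    return np.array(preds)

rng = np.random.default_rng(3)
# Toy stand-in: class 0 (legitimate) around the origin,
# class 1 (fraudulent) shifted away from it, and heavily outnumbered.
X0 = rng.normal(0.0, 0.5, (200, 4))
X1 = rng.normal(2.5, 0.5, (40, 4))
X = np.vstack([X0, X1])
y = np.array([0] * 200 + [1] * 40)
preds = knn_predict(X, y, np.array([[0, 0, 0, 0], [2.5, 2.5, 2.5, 2.5]]))
print(preds)        # → [0 1]
```

The purely local, distance-based decision rule is one reason KNN trails the other models in Table 5: on heavily imbalanced fraud data, the neighbourhood of a borderline transaction is dominated by the majority class.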
From Table 5, the proposed MRE-EHMM and PCA-EHMM techniques achieve better S_e, S_p, P, and F_1 performance than the RF, LR, DT, and KNN models, and this improvement holds regardless of changes in χ. Thus, the proposed MRE-EHMM and PCA-EHMM techniques are the best suited to detecting fraudulent online banking transactions, followed by RF, LR, and DT, with KNN performing worst.

8. Conclusions

This article analyses the performance of two feature extraction techniques (PCA and MRE) that can be combined with the EHMM for the prediction of credit card fraud. The PCA method offered good S_e, S_p, P, and F_1 performance but imposed a high computational burden on the entire prediction system. The MRE technique was introduced in the article as an efficient alternative: although it offered the same S_e, S_p, P, and F_1 performance as the PCA technique, it required significantly less processing time, making it more suitable for real-time deployment. Both approaches help mitigate the losses that cardholders and financial institutions suffer from online transactions that require credit card details. Moreover, the results demonstrated that the two feature extraction techniques perform significantly better when integrated with the EHMM than with the conventional HMM approach (that is, when the K-MC and GMM components are excluded). In addition, the article demonstrated that the proposed MRE-EHMM and PCA-EHMM techniques are suitable for credit card fraud detection, outperforming the RF, LR, DT, and KNN models. Future research could compare these models with other ML and deep learning techniques to highlight their relative advantages and potential limitations.

Author Contributions

O.O. (Olayinka Ogundile) and O.B. conceptualized the study. O.O. (Olayinka Ogundile), A.O. and O.O. (Olabisi Ogundile) completed the theoretical derivation. O.O. (Olayinka Ogundile) and O.B. established the model, carried out the simulation, and finished the writing of this manuscript. V.B. proofread and supervised the writing of the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

This study analyzed publicly available datasets. The data are available at https://www.kaggle.com/mlg-ulb/creditcardfraud (accessed on 1 June 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Iwasokun, G.B.; Omomule, T.G.; Akinyede, R.O. Encryption and Tokenization-Based System for Credit Card Information Security. Int. J. Cyber Secur. Digit. Forensics 2018, 7, 283–293. [Google Scholar]
  2. Bhasin, M.L. The Role of Technology in Combating Bank Frauds: Perspectives and Prospects. Ecoforum J. 2016, 5, 200–212. [Google Scholar]
  3. Rushin, G.; Stancil, C.; Sun, S.; Adam, S.; Beling, P. Horse Race Analysis in Credit Card Fraud—Deep Learning, Logistic Regression, and Gradient Boosted Tree. In Proceedings of the 2017 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, 28 April 2017; IEEE: New York, NY, USA, 2017; pp. 117–121. [Google Scholar]
  4. Lucas, Y.; Portier, P.-E.; Laporte, L.; He-Guelton, L.; Caelen, O.; Granitzer, M.; Calabr, S. Towards Automated Feature Engineering for Credit Card Fraud Detection Using Multi-Perspective HMMs. Future Gener. Comput. Syst. 2020, 102, 393–402. [Google Scholar]
  5. Robinson, W.N.; Aria, A. Sequential Fraud Detection for Prepaid Cards Using Hidden Markov Model Divergence. Expert Syst. Appl. 2018, 91, 235–251. [Google Scholar]
  6. Ileberi, E.; Sun, Y.; Wang, Z. A Machine Learning Based Credit Card Fraud Detection using the GA Algorithm for Feature Selection. J. Big Data 2022, 9, 24. [Google Scholar]
  7. Khare, N.; Sait, S.Y. Credit Card Fraud Detection using Machine Learning Models and Collating Machine Learning Models. Int. J. Pure Appl. Math. 2018, 118, 825–837. [Google Scholar]
  8. Xuan, S.; Liu, G.; Li, Z.; Zheng, L.; Wang, S.; Jiang, C. Random Forest for Credit Card Fraud Detection. In Proceedings of the 2018 IEEE 15th International Conference on Networking, Sensing and Control (ICNSC), Zhuhai, China, 27–29 March 2018; IEEE: Zhuhai, China, 2018; pp. 1–6. [Google Scholar]
  9. Ogundile, O.M.; Owoade, A.A.; Ogundile, O.O.; Babalola, O.P. Linear Discriminant Analysis Based Hidden Markov Model for Detection of Mysticetes’ Vocalisations. Sci. Afr. 2024, 24, e02128. [Google Scholar]
  10. Ogundile, O.O.; Usman, A.M.; Versfeld, D.J.J. An Empirical Mode Decomposition Based Hidden Markov Model Approach for Detection of Bryde’s Whale Pulse Calls. J. Acoust. Soc. Am. 2020, 147, EL125–EL131. [Google Scholar] [PubMed]
  11. Babalola, O.P.; Usman, A.M.; Ogundile, O.O.; Versfeld, D.J.J. Detection of Bryde’s Whale Short Pulse Calls using Time Domain Features with Hidden Markov Models. S. Afr. Inst. Electr. Eng. 2021, 112, 15–23. [Google Scholar]
  12. Ogundile, O.O.; Babalola, O.P.; Odeyemi, S.G.; Rufai, K.I. Hidden Markov Models for Detection of Mysticetes Vocalisations Based on Principal Component Analysis. Bioacoustics 2022, 31, 710–738. [Google Scholar]
  13. Alkarkhi, A.F.M.; Alqaraghuli, W.A.A. Chapter 8—Principal Components Analysis. In Easy Statistics for Food Science with R; Alkarkhi, A.F.M., Alqaraghuli, W.A.A., Eds.; Academic Press: London, UK, 2019; pp. 125–141. [Google Scholar]
  14. Syms, C. Principal Components Analysis. In Encyclopedia of Ecology; Jørgensen, S.E., Fath, B.D., Eds.; Elsevier: London, UK, 2008; pp. 2940–2949. [Google Scholar]
  15. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability, Volume 1: Statistics; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297. [Google Scholar]
  16. Forgy, E.W. Cluster Analysis of Multivariate Data: Efficiency versus Interpretability of Classification. Biometrics 1965, 21, 768–780. [Google Scholar]
  17. Duda, R.; Hart, P.; Stork, D.G. Pattern Classification, 2nd ed.; Wiley: New York, NY, USA, 2001. [Google Scholar]
  18. Reynolds, D. Gaussian Mixture Models. In Encyclopedia of Biometrics; Li, S.Z., Jain, A., Eds.; Springer: Boston, MA, USA, 2009; pp. 659–663. [Google Scholar]
  19. Ololade, B.M.; Salawu, M.K.; Adekanmi, A.D. E-Fraud in Nigerian Banks: Why and How? J. Financ. Risk Manag. 2020, 9, 211–228. [Google Scholar]
  20. Dornadula, V.N.; Geetha, S. Credit Card Fraud Detection using Machine Learning Algorithms. Proc. Comput. Sci. 2019, 165, 631–641. [Google Scholar]
  21. Sahithi, G.L.; Roshmi, V.; Sameera, Y.V.; Pradeepini, G. Credit Card Fraud Detection using Ensemble Methods in Machine Learning. In Proceedings of the 2022 6th International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 28–30 April 2022; pp. 1237–1241. [Google Scholar]
  22. Tanouz, D.; Subramanian, R.R.; Eswar, D.; Reddy, G.V.P.; Kumar, A.R.; Praneeth, C.H.V.N.M. Credit Card Fraud Detection Using Machine Learning. In Proceedings of the 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, 6–8 May 2021; pp. 967–972. [Google Scholar]
  23. Sadgali, I.; Sael, N.; Benabbou, F. Performance of machine learning techniques in the detection of financial frauds. Procedia Comput. Sci. 2019, 148, 45–54. [Google Scholar]
  24. Khan, A.; Singh, T.; Sinhal, A. Observation Probability in Hidden Markov Model for Credit Card Fraudulent Detection System. In Proceedings of the Second International Conference on Soft Computing for Problem Solving (SocProS 2012), Jaipur, India, 28–30 December 2012; Springer: New Delhi, India, 2014; Volume 236, pp. 751–760. [Google Scholar]
  25. Fashoto, S.G.; Owolabi, O.; Adeleye, O.; Wandera, J. Hybrid Methods for Credit Card Fraud Detection Using K-means Clustering with Hidden Markov Model and Multilayer Perceptron Algorithm. Br. J. Appl. Sci. Technol. 2016, 13, 1–11. [Google Scholar]
  26. Wang, X.; Wu, H.; Yi, Z. Research on Bank Anti-fraud Model Based on K-means and Hidden Markov Model. In Proceedings of the 2018 3rd IEEE International Conference on Image, Vision and Computing, Chongqing, China, 27–29 June 2018; IEEE: New York, NY, USA, 2018; pp. 780–784. [Google Scholar]
  27. Tharwat, A. Principal Component Analysis—A Tutorial. Int. J. Appl. Sci. Eng. 2009, 7, 41–61. [Google Scholar]
  28. Aristov, V.V.; Buchelnikov, A.S.; Nechipurenko, Y.D. The Use of the Statistical Entropy in Some New Approaches for the Description of Biosystems. Entropy 2022, 24, 172. [Google Scholar] [CrossRef] [PubMed]
  29. Nalband, S.; Prince, A.; Agrawal, A. Entropy-Based Feature Extraction and Classification of Vibroarthographic Signal Using Complete Ensemble Empirical Mode Decomposition with Adaptive Noise. IET Sci. Meas. Technol. 2018, 12, 350–359. [Google Scholar]
  30. Richman, J.S.; Lake, D.E.; Moorman, J.R. Sample Entropy. In Numerical Computer Methods, Part E; Walker, J.M., Ed.; Academic Press: London, UK, 2004; Volume 384, pp. 172–184. [Google Scholar]
  31. Baum, L.E.; Petrie, T.; Soules, G.; Weiss, N. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains. Ann. Math. Stat. 1970, 41, 164–171. [Google Scholar]
  32. Viterbi, A. Error Bounds for Convolutional Codes and an Asymptotically Optimum Decoding Algorithm. IEEE Trans. Inf. Theory 1967, 13, 260–269. [Google Scholar]
  33. Randhawa, K.; Loo, C.K.; Seera, M.; Lim, C.P.; Nandi, A.K. Credit card fraud detection using AdaBoost and majority voting. IEEE Access 2018, 6, 14277–14284. [Google Scholar]
  34. Khalid, A.R.; Owoh, N.; Uthmani, O.; Ashawa, M.; Osamor, J.; Adejoh, J. Enhancing Credit Card Fraud Detection: An Ensemble Machine Learning Approach. Big Data Cogn. Comput. 2024, 8, 6. [Google Scholar] [CrossRef]
  35. Jadhav, S.D.; Channe, H.P. Comparative study of K-NN, naive Bayes and decision tree classification techniques. Int. J. Sci. Res. 2016, 5, 1842–1845. [Google Scholar]
  36. Ogundile, O.O.; Owoade, A.A.; Emeka, P.B.; Olaniyan, O.S. Animals’ Classification: A Review of Different Machine Learning Classifiers. J. Sci. Logics ICT Res. 2023, 9, 106–114. [Google Scholar]
Figure 1. EHMM credit card prediction model.
Figure 2. PCA-EHMM sensitivity versus dimension performance.
Figure 3. PCA-EHMM specificity versus dimension performance.
Figure 4. PCA-EHMM precision versus dimension performance.
Figure 5. PCA-EHMM F1-score versus dimension performance.
Table 1. Summary of some statistical properties of the dataset.

Statistical Property | Time | V1 | V2 | V3 | V4 | V5
Minimum | 0 | −56.4075 | −72.7157 | −48.3256 | −5.6832 | −113.7433
Maximum | 172792 | 2.4549 | 22.0577 | 9.3826 | 16.8753 | 34.8017
Mean | 94814 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Median | 84692 | 0.0181 | 0.0655 | 0.1799 | −0.0199 | −0.5434
1st quartile | 54202 | −0.9204 | −5.5986 | −0.8904 | −0.8486 | −0.6916
3rd quartile | 139321 | 1.3156 | 0.8037 | 1.0272 | 0.7433 | 0.6119

Statistical Property | V6 | V7 | V8 | V9 | V10 | V11
Minimum | −26.1605 | −43.5572 | −73.2167 | −13.4341 | −24.5883 | −4.7975
Maximum | 73.3016 | 120.5895 | 20.0072 | 15.5950 | 23.7451 | 12.0189
Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Median | −0.2742 | 0.0401 | 0.0224 | −0.0514 | −0.0929 | −0.0328
1st quartile | −0.7683 | −0.5541 | −0.2086 | −0.6431 | −0.5354 | −0.7625
3rd quartile | 0.3986 | 0.5704 | 0.32735 | 0.5971 | 0.45392 | 0.7396

Statistical Property | V12 | V13 | V14 | V15 | V16 | V17
Minimum | −12.6837 | −5.7919 | −19.2143 | −4.4989 | −14.1295 | −25.1628
Maximum | 7.8484 | 7.1268 | 10.5268 | 8.8777 | 17.3151 | 9.2535
Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Median | 0.1400 | −0.0136 | 0.0506 | 0.0481 | 0.0664 | −0.6568
1st quartile | −0.4056 | −0.6485 | −0.4256 | −0.5829 | −0.4680 | −0.4838
3rd quartile | 0.6182 | 0.6625 | 0.4931 | 0.6488 | 0.5233 | 0.3997

Statistical Property | V18 | V19 | V20 | V21 | V22 | V23
Minimum | −9.4987 | −7.2135 | −54.4977 | −34.8304 | −10.9331 | −44.8077
Maximum | 5.0411 | 5.5920 | 39.4209 | 27.2028 | 10.5030 | 22.5284
Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000
Median | −0.0036 | 0.0037 | −0.0625 | −0.0295 | 0.0068 | −0.0112
1st quartile | −0.4989 | −0.4563 | −0.2117 | −0.2284 | −0.5424 | −0.1619
3rd quartile | 0.5008 | 0.4589 | 0.1330 | 0.1864 | 0.5286 | 0.1476

Statistical Property | V24 | V25 | V26 | V27 | V28 | Amount
Minimum | −2.8366 | −10.2954 | −2.6046 | −22.5657 | −15.4301 | 0.0000
Maximum | 4.5846 | 7.5196 | 3.5173 | 31.6122 | 33.8478 | 25,691.1600
Mean | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 0.0000 | 88.3500
Median | 0.0410 | 0.0166 | −0.0521 | 0.0013 | 0.0112 | 22.0000
1st quartile | −0.3546 | −0.3172 | −0.3270 | −0.0708 | −0.0530 | 5.6000
3rd quartile | 0.4395 | 0.3507 | 0.2410 | 0.0910 | 0.0783 | 77.1700

Statistical Property | Class
Minimum | 0.0000
Maximum | 1.0000
Mean | 0.0017
Median | 0.0000
1st quartile | 0.0000
3rd quartile | 0.0000
Table 2. PCA-EHMM performance as a function of k.

k = 7
χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.33 | 99.27 | 99.19 | 99.26
75 | 99.39 | 99.39 | 99.31 | 99.35
80 | 99.40 | 99.38 | 99.30 | 99.35

k = 8
χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.58 | 99.49 | 99.41 | 99.49
75 | 99.60 | 99.61 | 99.52 | 99.56
80 | 99.60 | 99.59 | 99.53 | 99.57

k = 9
χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.69 | 99.70 | 99.63 | 99.66
75 | 99.79 | 99.82 | 99.71 | 99.75
80 | 99.81 | 99.82 | 99.72 | 99.77

k = 10
χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.69 | 99.71 | 99.63 | 99.66
75 | 99.80 | 99.81 | 99.70 | 99.75
80 | 99.80 | 99.82 | 99.72 | 99.76

k = 11
χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.68 | 99.70 | 99.61 | 99.65
75 | 99.79 | 99.79 | 99.71 | 99.75
80 | 99.80 | 99.81 | 99.72 | 99.76

k = 12
χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.69 | 99.71 | 99.61 | 99.65
75 | 99.79 | 99.80 | 99.72 | 99.75
80 | 99.79 | 99.81 | 99.71 | 99.75
Table 3. MRE-EHMM and PCA-EHMM performance comparison; h = 5 and k = 9. Values are given as MRE-EHMM / PCA-EHMM.

χ (%) | Se (%) | Sp (%) | P (%) | F1 (%)
70 | 99.68 / 99.69 | 99.70 / 99.70 | 99.62 / 99.63 | 99.65 / 99.66
75 | 99.80 / 99.79 | 99.80 / 99.82 | 99.72 / 99.71 | 99.76 / 99.75
80 | 99.81 / 99.81 | 99.81 / 99.82 | 99.73 / 99.72 | 99.77 / 99.77
Table 4. MRE and PCA performance comparison with EHMM and HMM; h = 5 and k = 9.

χ (%) | Predictor | Se (%) | Sp (%) | P (%) | F1 (%)
70 | MRE-EHMM | 99.68 | 99.70 | 99.62 | 99.65
70 | MRE-HMM | 71.22 | 70.91 | 71.05 | 71.13
70 | PCA-EHMM | 99.69 | 99.70 | 99.63 | 99.66
70 | PCA-HMM | 71.20 | 70.96 | 71.07 | 71.13
75 | MRE-EHMM | 99.80 | 99.80 | 99.72 | 99.76
75 | MRE-HMM | 71.65 | 71.61 | 71.44 | 71.54
75 | PCA-EHMM | 99.79 | 99.82 | 99.71 | 99.75
75 | PCA-HMM | 71.66 | 71.60 | 71.45 | 71.56
80 | MRE-EHMM | 99.81 | 99.81 | 99.73 | 99.77
80 | MRE-HMM | 71.79 | 71.67 | 71.51 | 71.64
80 | PCA-EHMM | 99.81 | 99.82 | 99.72 | 99.77
80 | PCA-HMM | 71.80 | 71.66 | 71.50 | 71.65
Table 5. MRE-EHMM and PCA-EHMM performance comparison with other ML models.

χ (%) | Predictor | Se (%) | Sp (%) | P (%) | F1 (%)
70 | MRE-EHMM | 99.68 | 99.70 | 99.62 | 99.65
70 | PCA-EHMM | 99.69 | 99.70 | 99.63 | 99.66
70 | RF | 97.01 | 97.09 | 96.98 | 97.00
70 | LR | 96.77 | 96.79 | 96.65 | 96.71
70 | DT | 96.23 | 96.25 | 96.10 | 96.17
70 | KNN | 95.87 | 95.86 | 95.62 | 95.75
75 | MRE-EHMM | 99.80 | 99.80 | 99.72 | 99.76
75 | PCA-EHMM | 99.79 | 99.82 | 99.71 | 99.75
75 | RF | 97.22 | 97.24 | 97.19 | 97.16
75 | LR | 96.99 | 96.97 | 96.88 | 96.94
75 | DT | 96.44 | 96.43 | 96.21 | 96.32
75 | KNN | 96.04 | 96.03 | 95.89 | 95.96
80 | MRE-EHMM | 99.81 | 99.81 | 99.73 | 99.77
80 | PCA-EHMM | 99.81 | 99.82 | 99.72 | 99.77
80 | RF | 97.56 | 97.54 | 97.29 | 97.42
80 | LR | 97.11 | 97.09 | 97.00 | 97.06
80 | DT | 96.88 | 96.89 | 96.78 | 96.83
80 | KNN | 96.54 | 96.57 | 96.48 | 96.38

Ogundile, O.; Babalola, O.; Ogunbanwo, A.; Ogundile, O.; Balyan, V. Credit Card Fraud: Analysis of Feature Extraction Techniques for Ensemble Hidden Markov Model Prediction Approach. Appl. Sci. 2024, 14, 7389. https://doi.org/10.3390/app14167389
