1. Introduction
Each year, the Saudi Zakat, Tax and Customs Authority (ZATCA) [1] receives millions of tax declarations from individuals, government agencies, financial institutions, companies, and various other entities. These electronic forms include financial data relevant to taxpayer activities. Additionally, ZATCA collects third-party data from several government agencies to consolidate taxpayer profiles and cross-check their financial information. Typically, ZATCA relies on big data for its tax enforcement activities. In particular, Zakat under-reporting represents a major risk faced by the Saudi tax authority. In fact, the Zakat base represents the net worth of the entity as calculated for Zakat purposes [2]. Zakat is then charged on the company's Zakat base at 2.5%. To bridge the widening Zakat gap, ZATCA enforces various legal penalties and regulations [2]. In fact, Zakat non-compliance is perceived as a delinquent act. Thus, taxpayers are required to prove their compliance in order to avoid legal consequences.
The earliest solutions considered to improve tax compliance relied exclusively on auditor efforts. However, this strategy is costly and constrained by the large number of taxpayers and the limited audit capacity of the tax administration. Moreover, the big data collected by tax authorities and stored in their databases are not efficiently exploited to enhance the detection rate of tax under-reporting. In other words, most case selection strategies rely on the intuition, domain knowledge, and experience of auditors, with no intelligent mining of the existing data [3]. Further, taxpayers have been continuously developing new tax evasion techniques that are relatively difficult to detect, requiring the deployment of advanced, robust fraud detection methods [4,5].
The recent advances in Artificial Intelligence and its application to data science have prompted tax administrations around the world to design intelligent solutions that support their conventional approaches to determine fraudulent behavior and optimize the management of the available auditing resources and data collection capabilities. In particular, the rich tax data collected by tax administrations triggered the development of advanced analytics models intended to investigate and mine tax fraud patterns [6]. Namely, machine learning (ML) techniques have been adapted and associated with large tax datasets to improve risk description and detection performance [7]. Particularly, Value Added Tax (VAT) under-reporting detection has been formulated as a supervised learning task. In other words, historical VAT data have been used to train classification models able to map unseen VAT declarations into one of the predefined classes: under-reporting or actual declaration [8].
Despite the promising solutions introduced to address VAT under-reporting risk [9], to the best of our knowledge, no intelligent approaches have been proposed to alleviate Zakat fraud concerns. Moreover, although existing supervised machine learning techniques [10] solve the categorization problem, the challenge for tax administrations remains the prioritization of cases to be audited. For instance, out of 100,000 declarations positively categorized by a classification model as under-reporting risks, and given the audit capacity of a tax administration, only a subset of the flagged samples will be selected and sent to the operations department for audit. Naturally, the classification confidence or probability is used to determine the riskiest instances. However, this priority measure does not take into consideration the revenue at risk, which is among the main key performance indicators for tax administrations. In other words, a highly accurate supervised learning model may not yield more tax income for the government.
This research aims at addressing this challenge, as well as enhancing the overall under-reporting detection performance, through the design and development of a supervised machine learning model that predicts Zakat under-reporting and the revenue at risk. Specifically, a deep neural network is designed to classify Zakat declarations into the “under-reporting” or “actual declaration” classes and predict the expected tax gap. Moreover, the proposed model supports administration efforts to pinpoint the declarations and/or taxpayers that were assigned to the “under-reporting” class and that correspond to a higher revenue at risk. Specifically, the proposed model generates a confidence value that encloses the likelihood of belonging to the “under-reporting” class as well as the corresponding revenue at risk. Thus, a high confidence value is associated with both under-reporting risk and high revenue at risk. Accordingly, the main contributions of this research can be summarized as: (i) designing a deep neural network that performs simultaneous classification and regression, (ii) using Zakat declarations to categorize taxpayers as compliant or non-compliant, and (iii) determining the shortlist of Zakat payers that should be audited first in order to maximize the expected Zakat income. Further, the model hyper-parameters are investigated to determine the optimal settings. Namely, the activation function, the batch size, the number of epochs, and the number of layers are investigated during the fine-tuning phase.
The rest of this manuscript is organized as follows: Section 2 surveys the related works relevant to tax fraud prediction using machine learning techniques. In Section 3, the proposed solution is depicted, while the experiment settings, findings, and discussion are outlined in Section 4. Finally, the research conclusions and future work are presented in Section 5.
2. Related Works
To the best of our knowledge, to date, no research has dealt with Zakat fraud detection using machine learning techniques. Accordingly, this section covers relevant tax fraud detection techniques introduced by researchers and/or adopted by fiscal administrations around the globe.
Typically, rule-based systems have been designed to address various tax fraud detection challenges. However, the resulting solutions proved to be limited by the millions of taxpayers (individual and business) to be investigated, in addition to the subjective intuition and knowledge of auditors when selecting suspicious cases [11]. This alternative exhibits two main drawbacks: (i) expensive maintenance and update costs of knowledge-based approaches, and (ii) dependence on previous experience, which affects its ability to recognize recent fraudulent behavior. On the other hand, fraudsters keep developing tactics to evade paying taxes, which makes auditor intuition and experience insufficient to track them.
Recently, data science and Artificial Intelligence emerged as the most promising alternatives for addressing complex analytics challenges. Specifically, they have been used to leverage the machine's ability to learn from available data without explicit programming. The works in [11,12] introduced fraud-focused advanced data analytics and machine learning. Although they were not focused on tax fraud detection, multiple studies [13,14] adapted supervised and unsupervised machine learning techniques. One should note that more contributions relied on unsupervised machine learning due to the scarcity of labeled data. Particularly, the authors in [13] relied on unsupervised learning to group similarly valued tax declarations into homogeneous clusters. Then, they adjusted the resulting probability distribution of each obtained cluster. Finally, the detection of suspicious patterns was achieved based on the quantiles of the cluster-adjusted distribution. Despite the reported promising results, the main limitations of the work were data scarcity and the reduced number of features used to represent the data. Moreover, the absence of labeled gold data affected the trustworthiness of the performance achieved by the model.
The pioneering deployment of supervised machine learning techniques to address fiscal fraud detection was achieved in [12]. Specifically, the authors investigated the C5.0 decision tree algorithm to build a predictive model for tax evasion detection in Italy. In addition, random forests [15], rule-based classification [16], and Bayesian networks were also considered to resolve fraudulent tax behavior detection. The researchers in [17] depicted a Value Added Tax (VAT) screening framework to determine non-compliant VAT declarations. In particular, the Apriori algorithm was employed to mine association rules from historical data of business entities with confirmed fraudulent behavior. The resulting model was assessed using non-compliant VAT declarations collected in Taiwan from 2003 to 2004. Similarly, the Apriori algorithm was adapted in [18,19] to mine hidden patterns underlying fraudulent tax behaviors. Specifically, it was associated with Principal Component Analysis (PCA) [20] and Singular Value Decomposition (SVD) [21], as dimensionality reduction techniques, in order to determine the relevant fraud indicators and learn a fraud scale to rank Brazilian taxpayers based on the risk they represent for the tax administration. In [22], unsupervised and supervised machine learning techniques were coupled to detect taxpayers' suspicious behavior. In particular, two unsupervised learning techniques were adopted to discover clusters of business entities that exhibit similar tax-related behavior. Namely, Neural Gas [23] and Self-Organizing Maps [24] were investigated and associated with two datasets including tax declarations of micro to small and medium to large companies collected in Chile from 2005 to 2007. Additionally, three supervised learning techniques, namely neural networks, decision trees, and Bayesian networks, were used to build classification models able to detect fraudulent taxpayer behavior. Similarly, the research in [21] included a comparison between Artificial Neural Networks (ANN) [25], Support Vector Machines (SVM) [26], and K-Nearest Neighbors in the context of credit card fraud detection. The reported results showed that ANN outperforms the other models. In [27], the authors introduced a fraud detection approach for Mellon Bank. ANN proved to be more accurate and to enhance the timeliness and overall detection performance; it also overtook decision trees and Naive Bayes classifiers when associated with financial data to detect fraud. In summary, the multilayer perceptron (MLP) model has been recommended for fraud detection tasks [28].
In [29], the researchers introduced a multilayer perceptron neural network model to detect fraud in personal income tax forms. The reported findings show that the multilayer perceptron can be considered an efficient classifier to predict fraudulent taxpayers and estimate a taxpayer's likelihood of cheating on tax. One should note that the latter approach can be generalized to recognize fraud patterns for other types of taxes. A Hybrid Unsupervised Outlier Detection (HUNOD) model was presented in [22] to mine risky tax behaviors. In particular, user knowledge was fed into a combination of representational learning and clustering to detect outliers in personal income tax data. The authors claim that the interpretability of the detected outliers is achieved through the training of explainable-by-design surrogate models over internally validated outliers. Recently, in [30], the authors coupled Artificial Neural Networks with a real dataset to detect factors related to income tax fraud. Their approach was designed to reduce the time, effort, and cost incurred by auditors in the manual identification of cases to be audited. This was the first study to adopt Artificial Neural Networks for income tax fraud detection in Rwanda. Similarly, financial prediction was tackled in [31] using deep convolutional neural networks (DCNN) and multilayer perceptrons (MLP). In particular, the authors used an 8-layer MLP and a 13-layer DCNN for their credit scoring model. The models were assessed using Australian and German credit scoring data. The reported experiments proved that the DCNN achieved considerably higher performance compared to the MLP. The researchers in [32] outlined a transfer learning approach to build a tax evasion detection model. Specifically, they exploited conditional adversarial networks to encode a collection of labelled tax evasion records by extracting the relevant features. Transfer learning is then conducted by fine-tuning the trained model using five tax datasets collected in five Chinese regions. In [33], a large-scale dataset of electronic records of taxable transactions collected in Mexico was analyzed. The authors concluded that the interaction patterns of evaders differ from those corresponding to typical taxpayer behavior. Based on this finding, they built deep neural network and random forest [15] models to classify unseen records as suspicious or evasion-free cases.
Semi-supervised learning was also used to tackle tax evasion detection. In [34], a semi-supervised approach was introduced for VAT audit case selection. Precisely, a gated mixture variational autoencoder network [35] was adapted to extract relevant features and map them into predefined classes. Another solution based on positive and unlabeled (PU) learning techniques was depicted in [36]. One should note that PU techniques are suitable for data collections in which a small subset of records is positively annotated while the remaining records are unlabeled. The method uses: (i) one-class probabilistic classification to generate pseudo-labels and assign them to unlabeled data, (ii) random forest [15] to determine relevant features, and (iii) LightGBM [37] as a predictive model to classify unseen records. Additionally, the authors in [38] investigated PU learning for tax evasion detection. Their method integrated features obtained by embedding a transaction graph into a Euclidean space. The work was then extended by the researchers in [39], who introduced a graph-embedding algorithm for transaction graphs that extracts network-based features prior to generating pseudo-labels for unlabeled records. Finally, a multilayer perceptron (MLP) neural network is built using pseudo-annotated data to detect unseen tax evasion instances.
As outlined above, neither rule-based solutions [10] nor unsupervised-learning-based approaches [13,14] yielded satisfactory achievements when used to address tax fraud detection. The reported results were typically constrained by the expensive maintenance and update costs of knowledge-based rules as well as the NP-hardness of the clustering problem. Alternatively, the supervised-learning-based solutions proved to be promising despite the scarcity of labeled data [12]. In particular, machine learning techniques such as decision trees, random forests, Bayesian networks, Support Vector Machines (SVM), and K-Nearest Neighbors were investigated to mine fraudulent tax behavior [15,26,35]. Moreover, techniques such as Principal Component Analysis (PCA) [20] and Singular Value Decomposition (SVD) [40] were deployed as dimensionality reduction methods in order to identify relevant fraud indicators/attributes. More recently, ANN-based approaches proved to be more effective than shallow-model-based solutions in detecting tax fraud cases [21,27,31]. One should note that semi-supervised learning along with outlier detection techniques were also investigated to enhance VAT audit case selection [34,35,36,37,38,39]. However, the obtained results do not show drastic improvements in terms of detection performance.
3. Proposed Approach
Given the criticality of the detection task in the context of tax under-reporting risk, this research introduces an under-reporting detection approach based on deep learning techniques. Specifically, it formulates the prediction of Zakat under-reporting cases as a classification task. Moreover, it estimates the revenue at risk through a regression task, which yields an objective prioritization of the auditing operations to be conducted by tax administrations. In other words, the proposed model classifies Zakat declarations into the “under-reporting” and “actual declaration” categories, which represent the positive class and the negative class, respectively. Furthermore, it determines which cases among those assigned to the positive class should be given higher priority for auditing by the tax administration. In fact, the priority sorting is conducted based on the confidence degree generated by the proposed model. Specifically, the proposed model assigns a high confidence value to under-reporting cases that correspond to an expected high revenue at risk.
In the following, we depict the design details of the proposed system.
Figure 1 overviews the proposed network, which is designed to extract relevant low-level features and learn their mapping into the predefined classes. As such, a collection of Zakat declarations including the applicable attributes is fed into the system for the training phase. Note that these training instances are labelled, which makes them suitable for training the proposed deep neural network, and that the supervision information (labels) represents previous auditing results. Precisely, each training instance is associated with a ground truth class label as well as a Zakat revenue calculated as the difference between the pre-audit and post-audit Zakat amounts. The considered deep neural network consists of an input layer, a set of hidden layers, and two output layers. The latter are designed to simultaneously perform: (i) a regression task to predict the expected revenue at risk, and (ii) a classification task to assign each declaration to the “under-reporting” or “actual declaration” category using a sigmoid layer fed with the same input. The proposed dual classification and regression task yields an objective prioritization of the auditing effort based on the confidence value generated by the designed model.
The training of the proposed model relies on the optimization of the following loss function:

$$\mathcal{L} = \alpha \, \mathcal{L}_{reg} + \beta \, \mathcal{L}_{class}$$

where $\alpha$ represents the coefficient of the loss function that controls the tradeoff between the linear regression layer and the sigmoid layer loss functions, and $\beta = 1 - \alpha$. Particularly, the loss function corresponding to the regression layer is formulated as:

$$\mathcal{L}_{reg} = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$

where $N$ is the number of instances, $\hat{y}_i$ expresses the predicted value for input $x_i$, and $y_i$ represents the actual value corresponding to the input $x_i$.

Further, the binary cross entropy is exploited for the optimization of the classification model:

$$\mathcal{L}_{class} = -\frac{1}{N} \sum_{i=1}^{N} \left[ c_i \log(p_i) + (1 - c_i) \log(1 - p_i) \right]$$

where $c_i$ denotes the ground truth class label of instance $i$, $p_i$ represents the probability of the first category (“under-reporting”), and $1 - p_i$ represents the probability of the second category (“actual declaration”).
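To make the combined objective concrete, the following minimal NumPy sketch evaluates the weighted sum of the two loss terms above; the function name, toy arrays, and default weight are illustrative assumptions rather than the paper's actual implementation.

import numpy as np

def joint_loss(y_reg, y_reg_hat, c, p, alpha=0.7):
    # alpha weighs the regression (MSE) term; beta = 1 - alpha weighs the
    # classification (binary cross entropy) term, as in the equations above.
    mse = np.mean((y_reg - y_reg_hat) ** 2)
    eps = 1e-12  # numerical guard against log(0)
    bce = -np.mean(c * np.log(p + eps) + (1 - c) * np.log(1 - p + eps))
    return alpha * mse + (1 - alpha) * bce

# Toy example: two declarations with revenue targets and class labels
loss = joint_loss(y_reg=np.array([100.0, 0.0]),     # actual revenue at risk
                  y_reg_hat=np.array([90.0, 5.0]),  # predicted revenue at risk
                  c=np.array([1.0, 0.0]),           # 1 = under-reporting
                  p=np.array([0.9, 0.2]))           # predicted probabilities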
The backpropagation algorithm considered for optimizing the loss function while training the proposed model and updating the network weights is detailed in Algorithm 1.
Algorithm 1: Backpropagation
for i in reversed(network):
    Layer ← network[i]
    if i is the last layer then:
        for j in Layer:
            compute the output error of neuron j from the expected and predicted values
    else:
        for j in Layer:
            accumulate the error of neuron j from the weighted errors of layer i + 1
update the network weights
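As an illustration only, the following self-contained NumPy sketch runs the same reverse pass on a toy two-layer network; the layer sizes, learning rate, and synthetic data are assumptions made for the example.

import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 4))                           # toy inputs: 32 samples, 4 features
y = (X.sum(axis=1, keepdims=True) > 0).astype(float)   # toy binary labels
W1, b1 = 0.1 * rng.normal(size=(4, 8)), np.zeros(8)
W2, b2 = 0.1 * rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for epoch in range(200):
    h = sigmoid(X @ W1 + b1)                 # forward pass
    p = sigmoid(h @ W2 + b2)
    d2 = (p - y) / len(X)                    # error at the last layer (BCE + sigmoid)
    d1 = (d2 @ W2.T) * h * (1.0 - h)         # error propagated back to the hidden layer
    W2 -= 0.5 * (h.T @ d2); b2 -= 0.5 * d2.sum(axis=0)   # update the network weights
    W1 -= 0.5 * (X.T @ d1); b1 -= 0.5 * d1.sum(axis=0)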
For the testing phase, an unseen Zakat declarations dataset is conveyed to the trained model to classify cases into “under-reporting” or “actual declaration” and predict each declaration's revenue at risk using the regression layer.
Further, one can note that the designed architecture was fine-tuned empirically. In other words, the final network architecture as well as the hyper-parameter settings were optimized through comprehensive experiments. In particular, the appropriate number of layers along with the number of neurons per layer were determined empirically. Additionally, the network hyper-parameters, such as the batch size, the optimizer, the number of epochs, and the learning rate, were investigated during the training phase. Accordingly, the architecture proposed in Figure 1 encloses an input layer and five hidden layers. The first four hidden layers are each followed by a dropout layer. Dropout is a regularization technique that prevents the proposed neural network from overfitting by randomly modifying the network itself during training. On the other hand, the last hidden layer is followed by two output layers: the first performs the regression task, while the second is dedicated to classification.
Figure 2 illustrates the structure of the proposed architecture.
Accordingly, Table 1 details each layer of the proposed architecture. As can be seen, the input layer is fed with 94-dimensional data and conveys it to the first hidden layer, which is also composed of 94 neurons. Then, a dropout layer randomly omits 10% of the neurons in order to increase the resulting model's generalization capability and avoid overfitting. The next hidden layer encloses 1000 neurons, followed by a 10% dropout layer, while the third hidden layer receives the 1000-dimensional features and yields 500-dimensional features through 500 neurons prior to a dropout layer. Similarly, the 250-neuron fourth hidden layer is coupled with the last 10% dropout layer and yields 250-dimensional features to be processed by 50 neurons. Note that an L2 regularizer was associated with a ReLU activation function for all hidden layers. Further, the prediction is synchronously performed through the two output layers. In particular, as illustrated in Figure 2, a linear “reg_output” layer is dedicated to the revenue at risk prediction, while a “class_output” layer that consists of a sigmoid function is intended to classify Zakat declarations into the “under-reporting” and “actual declaration” categories.
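A minimal Keras sketch of this dual-output architecture, following Table 1, is given below; the L2 factor, the optimizer, and all names other than “reg_output” and “class_output” are assumptions, since the exact settings are not fully disclosed.

from tensorflow import keras
from tensorflow.keras import layers, regularizers

def hidden(width):
    # every hidden layer uses ReLU with an L2 kernel regularizer (factor assumed)
    return layers.Dense(width, activation="relu",
                        kernel_regularizer=regularizers.l2(1e-4))

inputs = keras.Input(shape=(94,))                 # 94 Zakat declaration features
x = inputs
for width in (94, 1000, 500, 250):                # first four hidden layers
    x = hidden(width)(x)
    x = layers.Dropout(0.10)(x)                   # 10% dropout after each of them
x = hidden(50)(x)                                 # fifth hidden layer, no dropout
reg_out = layers.Dense(1, activation="linear", name="reg_output")(x)     # revenue at risk
cls_out = layers.Dense(1, activation="sigmoid", name="class_output")(x)  # under-reporting probability
model = keras.Model(inputs, [reg_out, cls_out])

beta = 0.3  # classification-loss weight; alpha = 1 - beta weighs the regression loss
model.compile(optimizer=keras.optimizers.Adam(learning_rate=1e-3),
              loss={"reg_output": "mse", "class_output": "binary_crossentropy"},
              loss_weights={"reg_output": 1.0 - beta, "class_output": beta})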
4. Experiments
This section outlines the experiment settings and the dataset used to develop the proposed approach. Moreover, the data preparation and pre-processing techniques considered for this research, in addition to the training strategy adopted to build the model, are revealed. Furthermore, the standard performance measures used to assess the regression and classification results achieved by the model are defined.
In this research, the Keras library [41] was coupled with the TensorFlow platform [42] and the Spyder [43] open-source environment to implement the proposed work. The hardware specifications include 16 GB of dual-channel RAM, an Intel i7 9700 CPU, and an Intel UHD 630 GPU. The Zakat declarations used in this research were provided by the Saudi Zakat, Tax, and Customs Authority (ZATCA), subject to releasing only information related to data and research findings that are considered non-sensitive by ZATCA. The rationale behind this restriction is to avoid publishing information, relevant to ZATCA strategies, that could be exploited by some taxpayers to adapt their fraudulent behavior. Specifically, the data used in this research consist of Zakat filings randomly selected from the Zakat declarations collected by ZATCA between January 2018 and April 2022.
Table 2 presents a high-level description of this data collection. The attributes enclosed in the ZATCA [1] dataset consist of relevant fields extracted from Zakat forms. Moreover, derived features were designed to enrich the considered dataset. However, due to confidentiality, the authors of this research can share only a limited amount of detail about the variables that were engineered based on the Zakat declaration forms. One should note that this does not affect the effectiveness of the proposed approach, which can be exploited by other researchers and fed with a different set of variables.
In order to handle the outliers that may affect the model performance, the Winsorizing [44] function was applied to the training data. Additionally, the StandardScaler function was employed to normalize the data attributes, which exhibit highly variant scales. Moreover, since the dataset exhibits considerable class imbalance, an oversampling technique, namely the Synthetic Minority Oversampling Technique (SMOTE) [45], was employed to generate instances based on their closest neighbors from the minority class. Similarly, a random under-sampling of the majority class was performed. This yielded equally balanced class distributions.
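The sketch below reproduces this pre-processing pipeline on synthetic data; the winsorizing limits and the intermediate sampling ratio are assumptions, since the paper does not disclose them.

import numpy as np
from scipy.stats.mstats import winsorize
from sklearn.preprocessing import StandardScaler
from imblearn.over_sampling import SMOTE
from imblearn.under_sampling import RandomUnderSampler

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 94))            # stand-in for the 94 declaration features
y = (rng.random(1000) < 0.1).astype(int)   # imbalanced labels, ~10% positive

# 1. Cap extreme values column by column (1%/99% limits assumed)
X = np.asarray(winsorize(X, limits=(0.01, 0.01), axis=0))

# 2. Normalize attributes that exhibit highly variant scales
X = StandardScaler().fit_transform(X)

# 3. Hybrid resampling: SMOTE grows the minority class to half the majority
#    size, then random under-sampling balances the two classes
X_res, y_res = SMOTE(sampling_strategy=0.5, random_state=0).fit_resample(X, y)
X_res, y_res = RandomUnderSampler(sampling_strategy=1.0, random_state=0).fit_resample(X_res, y_res)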
To build the proposed model, we split the ZATCA dataset into a 60% training set and a 20% validation set used to adjust the hyper-parameters, while the remaining 20% was dedicated to testing the model's performance. Further, for the conducted experiments, several measures were used to evaluate the classification performance. Namely, the accuracy, recall, precision, and F1-measure were calculated based on the confusion matrix shown in Table 3, where the rows correspond to actual values and the columns report the predicted values.
In particular, TP (True Positive) represents the number of under-reporting cases that the model classified correctly. On the other hand, TN (True Negative) reports the number of correctly predicted actual reporting cases, while FP (False Positive) and FN (False Negative) refer to the number of misclassified cases for each class. Accordingly, the accuracy is defined as the ratio of correctly categorized instances. It is obtained using:

$$Accuracy = \frac{TP + TN}{TP + TN + FP + FN}$$

In addition, the recall represents the ratio of correctly classified under-reporting declarations over all records from this class. It is calculated as follows:

$$Recall = \frac{TP}{TP + FN}$$

Similarly, the precision reports the ratio of correctly categorized declarations over all instances assigned to the under-reporting class. It is obtained as follows:

$$Precision = \frac{TP}{TP + FP}$$

Finally, the F1-measure is obtained as a combination of precision and recall using:

$$F1 = \frac{2 \times Precision \times Recall}{Precision + Recall}$$
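These four measures can be reproduced directly from the confusion matrix, as in the short sketch below (the toy labels are illustrative only):

from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = under-reporting, 0 = actual declaration
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

accuracy  = (tp + tn) / (tp + tn + fp + fn)                 # 6/8 = 0.75
recall    = tp / (tp + fn)                                  # 3/4 = 0.75
precision = tp / (tp + fp)                                  # 3/4 = 0.75
f1        = 2 * precision * recall / (precision + recall)   # 0.75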
To assess the regression performance, appropriate metrics such as the Mean Square Error (MSE), the Root Mean Square Error (RMSE), and the Mean Absolute Error (MAE) were used in this research's experiments. Specifically, the MSE is defined as an absolute measure of the model's goodness of fit. It is calculated using:

$$MSE = \frac{1}{N} \sum_{i=1}^{N} \left( y_i - \hat{y}_i \right)^2$$

where $N$ represents the number of instances in the dataset, while $\hat{y}_i$ and $y_i$ correspond to the predicted value for the input $x_i$ and the actual value, respectively. Additionally, the RMSE measures the performance of the proposed model as the square root of the MSE value:

$$RMSE = \sqrt{MSE}$$

Finally, the MAE is calculated as the mean of the error absolute values:

$$MAE = \frac{1}{N} \sum_{i=1}^{N} \left| y_i - \hat{y}_i \right|$$
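Equivalently, the three regression metrics can be computed in a few NumPy lines (the revenue values are toy figures):

import numpy as np

y_true = np.array([120.0, 80.0, 0.0, 45.0])   # actual revenue at risk (toy values)
y_pred = np.array([110.0, 95.0, 5.0, 40.0])   # model predictions

mse  = np.mean((y_true - y_pred) ** 2)        # 93.75
rmse = np.sqrt(mse)                           # ~9.68
mae  = np.mean(np.abs(y_true - y_pred))       # 8.75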
Since the proposed model contains two output layers, we set a weight for the loss of each output layer. In other words, the value α is set as the weight of the regression layer loss and β = 1 − α as the weight of the classification layer loss. Therefore, different values were assigned to β in order to adjust the model and obtain the best performance. Table 4 shows that the proposed model achieved the best performance on the dataset without resampling when β was set to 0.4. On the other hand, when the considered dataset was associated with SMOTE and with hybrid SMOTE plus random under-sampling (RU), the proposed model achieved the best performance for β equal to 0.3.
Table 5 shows that the proposed model trained on the dataset without resampling achieved its lowest performance with a batch size of 512 and a learning rate of 1 × 10⁻⁴. In contrast, the model trained on the dataset after applying SMOTE achieved better precision, recall, and F1-score, and yielded a lower regression error, when using a batch size of 256 and a learning rate of 1 × 10⁻³. Additionally, as can be seen, the proposed model yielded better performance for a batch size of 256 when applying hybrid SMOTE along with RU. One should note that, while validating the model, the ReduceLROnPlateau method was used to reduce the learning rate when the loss function stops improving: the 'patience' parameter was set to 15, meaning that the learning rate is reduced if no improvement is recorded for 15 epochs, and min_lr was set to 1 × 10⁻⁷. Moreover, to prevent overfitting, EarlyStopping was used with 'patience' set to 100. In other words, training stops when the validation loss does not improve for 100 epochs. One should note that, for $N_t$ training instances and a batch size of $b_s$, $N_t/b_s$ iterations are needed to complete one epoch.
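In Keras, this learning-rate schedule and early-stopping policy correspond to the callback configuration sketched below; the reduction factor and the restore_best_weights flag are assumptions, as the paper does not state them.

from tensorflow import keras

callbacks = [
    keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.1,  # factor assumed
                                      patience=15, min_lr=1e-7),
    keras.callbacks.EarlyStopping(monitor="val_loss", patience=100,
                                  restore_best_weights=True),          # flag assumed
]
# model.fit(X_train, {"reg_output": r_train, "class_output": c_train},
#           validation_data=(X_val, {"reg_output": r_val, "class_output": c_val}),
#           epochs=1000, batch_size=256, callbacks=callbacks)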
To further tune the performance of the proposed model after applying SMOTE and hybrid SMOTE with RU to the entire dataset, we investigated the model with different values of the resampling rate and recorded the attained performance, as depicted in Table 6. As such, the proposed model achieved better regression performance when resampling the dataset. In particular, associating a 15% resampling rate with SMOTE and RU yielded the best classification and regression performance. The classification performance, which was initially good, did not improve further. This proves that the additional data improved the generalization of the regression model without affecting the classification capability of the model. The experiment results reported above show that training the proposed model on a larger and more balanced dataset improves the regression performance. In other words, data resampling improved model generalization.
As the first contribution of this research consists of classifying Zakat declarations into the “under-reporting” and “actual declaration” classes, recall is more important than precision for the classification task. On the other hand, for the prioritization of auditing achieved through the regression task, the revenue at risk represents the main performance metric for this supervised learning task.
Table 6 reports the results obtained using the best proposed model. Namely, the ROC curve in Figure 3a and the expected revenue at risk vs. the auditing rate in Figure 3b were achieved using the proposed model with SMOTE + RU and a sampling rate of 15%. Note that the learning rate, the batch size, and β were set to 1 × 10⁻³, 256, and 0.3, respectively. As can be seen, auditing 40% of the positive cases yields 79% of the expected revenue. In other words, the proposed system requires a 40% auditing coverage to collect 79% of the Zakat revenue at risk. This proves that a tradeoff between the performance of the classifier and the prioritization of auditing tasks has been successfully established. In fact, this meets the objective of the proposed model, which not only classifies Zakat declarations but also prioritizes the under-reporting risk based on the expected revenue at risk for better governance of the auditing resources. Further investigation of the results reported in Figure 3b showed that the leveled segment of the graph is caused by a considerable subset of test instances that were correctly assigned to the “under-reporting” class with high probability confidences, while the associated expected revenue at risk was relatively small.
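The prioritization behind Figure 3b can be emulated by ranking the flagged declarations by model confidence and accumulating their revenue at risk, as in the following hypothetical sketch with toy values:

import numpy as np

def revenue_capture(confidence, revenue):
    # Share of total revenue at risk recovered as audit coverage grows,
    # auditing flagged cases in decreasing order of model confidence.
    order = np.argsort(-confidence)
    captured = np.cumsum(revenue[order]) / revenue.sum()
    coverage = np.arange(1, len(revenue) + 1) / len(revenue)
    return coverage, captured

conf = np.array([0.99, 0.95, 0.90, 0.80, 0.70])    # toy confidences
rev  = np.array([500.0, 20.0, 300.0, 10.0, 5.0])   # toy revenue at risk
cov, cap = revenue_capture(conf, rev)  # cap[2] ≈ 0.98: 60% coverage recovers ~98%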
Finally, in Figure 4, we compare the performance of the best proposed model with five typical machine learning models, namely K-Nearest Neighbors (KNN), the Naïve Bayes (NB) classifier, Logistic Regression (LR) [46], the CART decision tree, and Linear Discriminant Analysis (LDA). In particular, the proposed model outperforms CART with increases of 6% and 5% in terms of accuracy and F1-score, respectively. Moreover, it overtakes the LDA model with improvements of 17% and 11% in terms of accuracy and F1-score, respectively. One should mention that, in addition to this improvement recorded at the classification level, the proposed model allows effective prioritization of the detected under-reporting cases, as illustrated in Figure 3.
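For reference, the five baselines of Figure 4 can be instantiated with scikit-learn defaults as sketched below; the synthetic data, the cross-validation protocol, and the hyper-parameters are assumptions rather than the paper's exact setup.

from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=2000, n_features=94, random_state=0)
baselines = {
    "KNN": KNeighborsClassifier(),
    "NB": GaussianNB(),
    "LR": LogisticRegression(max_iter=1000),
    "CART": DecisionTreeClassifier(random_state=0),
    "LDA": LinearDiscriminantAnalysis(),
}
for name, clf in baselines.items():
    f1 = cross_val_score(clf, X, y, cv=5, scoring="f1").mean()
    print(f"{name}: F1 = {f1:.3f}")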
5. Conclusions
Recently, the Kingdom of Saudi Arabia (KSA) has started exploiting tax revenue to increase government investments in ambitious new initiatives and prevent drastic budget cuts. In particular, KSA has raised more taxes from non-oil activities to support fiscal consolidation. Moreover, it has modernized the technological infrastructure of its tax administration in order to improve tax collection efficiency. In KSA, non-Saudi investors are liable for income tax, while Saudi citizen investors (and citizens of the GCC countries) are liable for Zakat, an Islamic assessment. Typically, taxpayers are in charge of preparing and accurately reporting their Zakat declarations, which allows tax authorities to oversee and audit their business activities.
Despite government efforts to increase taxpayer compliance, considerable revenue remains at risk of under-reporting. Therefore, in this research, we outlined an intelligent approach to support tax authority efforts in detecting under-reporting among Zakat payer declarations. In particular, the proposed solution aims at improving detection accuracy and determining the fraud cases that correspond to a higher revenue at risk. Specifically, we formulated Zakat under-reporting detection as a supervised machine learning task. Consequently, we designed a deep neural network that simultaneously classifies Zakat declarations into the under-reporting or actual declaration classes and predicts the revenue at risk caused by this fraud, if any. In particular, the proposed network contains an input layer, five hidden layers, and two output layers for the classification and regression tasks. This enables the proposed model to prioritize the auditing of specific taxpayers based on the predicted revenue at risk. The proposed model was validated and assessed using a real dataset including 51,919 Zakat declarations and standard performance metrics. Further, SMOTE improved the proposed model's performance, yielding a classification accuracy of 99% and an MAE of 26% for the regression task. Moreover, SMOTE enabled the proposed model to outperform relevant state-of-the-art supervised machine learning models.
As future work, we plan to expand the collection of Zakat declarations by considering relevant third-party data. Moreover, a dynamic approach to determining the best network architecture can be integrated into the solution, which would make the proposed work relevant to other fraud detection datasets and applications. Furthermore, additional label classification techniques as well as regression frameworks will be considered for a thorough empirical comparison of the proposed approach with relevant state-of-the-art solutions.