Article

A New Approach for Deepfake Detection with the Choquet Fuzzy Integral

by Mehmet Karaköse 1, İsmail İlhan 2, Hasan Yetiş 1,* and Serhat Ataş 1

1 Computer Engineering Department, Firat University, Elazig 23200, Turkey
2 Vocational School of Technical Sciences, Adiyaman University, Adiyaman 02040, Turkey
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7216; https://doi.org/10.3390/app14167216
Submission received: 29 July 2024 / Revised: 13 August 2024 / Accepted: 13 August 2024 / Published: 16 August 2024
(This article belongs to the Section Computing and Artificial Intelligence)

Abstract
Deepfakes have become widespread and have continued to develop rapidly in recent years. In addition to their use in movies and for humorous purposes, deepfakes have also begun to pose a threat to many companies and politicians. Deepfake detection is critical to preventing this threat. In this study, a Choquet fuzzy integral-based deepfake detection method is proposed to increase overall performance by combining the results obtained from different deepfake detection methods. Three deepfake detection models were used in the study: XceptionNet, which performs better in detecting real images/videos; EfficientNet, which performs better in detecting fake videos; and a model based on their hybrid use. The proposed Choquet fuzzy integral-based method aims to eliminate the shortcomings of each of these methods by drawing on the others. As a result, higher performance was achieved with the proposed method than with any of the three methods used individually. In testing and validation studies carried out on the FaceForensics++, DFDC, Celeb-DF, and DeepFake-TIMIT datasets, the individual performance levels of the algorithms were 81.34%, 82.78%, and 79.15% on average in terms of AUC, while the proposed method reached 97.79%. Considering that the average performance of the three methods across all datasets is 81.09%, an improvement of approximately 16.7 percentage points is achieved. On the FaceForensics++ dataset, where the individual algorithms are most successful, the proposed method reaches the highest AUC value, 99.8%. The performance rates could be increased further by changing the individual methods used within the proposed method. We believe that the proposed method will inspire researchers and will be developed further.

1. Introduction

With the rapid advancement of deep learning technology in recent years, its use in multimedia has become widespread. Tasks that previously required expertise with image/video-processing tools and took a long time to complete have become fast and practical with artificial intelligence (AI) technology. With the development of this technology, AI tools have begun to achieve very successful results. Images and videos produced by deep learning-based methods such as autoencoders and generative adversarial networks (GANs) are called deepfakes [1]. Deepfake technology is used in the creative arts, advertising, the film industry, and video games [2]. Another example of the integration of deepfakes with today's technologies is their use in the field of virtual reality [3]. In addition to these beneficial uses, deepfake technology can also be used maliciously. Convincing deepfake images/videos can now be produced easily by anyone, which has led to the spread of deepfake content in the media. Examples of misuse of this technology have received wide media coverage and have been the subject of films. Deepfakes have thus become an everyday concern.
Deepfake videos are often created by changing head positions in human images, swapping faces, and manipulating speech with methods such as lip syncing. Because the results are realistic and convincing, prominent companies such as Kaggle and Facebook have organized competitions to combat them [4]. The number of studies on deepfake detection has increased, focusing on weaknesses of data created with deepfake methods, such as body position, head position, blink frequency relative to gender and age, color differences between images, temporal and spatial incompatibilities, and voice and noise inconsistencies [5]. Deepfake detection methods are also the subject of much recent work. The field has developed over the years in step with deepfake creation methods and has recently begun to be widely discussed in the context of security. Many different datasets have emerged through the notable efforts of researchers in this field.
Afchar et al. developed MesoNet, a compact network for detecting facial video forgery [6]. Korshunov et al. analyzed methods used in deepfake detection and presented the first publicly available deepfake video dataset [7]. Ismail et al. developed a new methodology for deepfake detection using XGBoost on top of a CNN-based feature extractor [8]. Mitra et al. employed a detection technique that identifies visual artifacts in images on social media to detect forgery [9]. Gong et al. proposed a deepfake detection algorithm called DeepfakeNet, which consists of 20 network layers, and studied its performance using the FF+, Kaggle, and TIMIT datasets [10]. Almars extensively reviewed the work on deepfake detection and compared existing studies in the field [11]. Zhang provided information about current deepfake creation and detection techniques and compared various datasets [12]. İlhan and Karaköse conducted a comparative analysis of deepfake detection methods, explaining their methodologies and presenting performance results in tables [13]. Factors affecting detection methods include training data, image quality and resolution, preprocessing, the neural architecture, and the classification method. In another study, the authors presented a more efficient method by combining the EfficientNet and Xception models [14]. Dey et al. proposed a method for detecting COVID-19 disease using X-ray images of patients; their method uses the Choquet fuzzy integral to combine the InceptionV3, DenseNet121, and VGG19 models, achieving a test accuracy of 99.05%, superior to commonly used classifier ensembles as well as many state-of-the-art techniques [15].
To increase the accuracy of fuzzy integral-based classification, Taha et al. used a multi-classifier ensemble for Android malware classification. Their approach, based on the Choquet fuzzy integral technique, outperformed the single classifiers, achieving the highest accuracy of 95.08% [16].
This study proposes a method based on the Choquet fuzzy integral to improve the overall accuracy of deepfake video classification. The proposed approach aggregates the prediction results of multiple classifiers into a more meaningful result using adaptive fuzzy measures. The rest of this paper is organized as follows: Section 2 explains the methods for creating deepfake videos, the available datasets, and detection approaches. Section 3 presents the model of the proposed method, the algorithms used, and the key elements. Section 4 includes the experimental results and comparisons with state-of-the-art models. Finally, a discussion is provided in Section 5, and the paper is concluded in Section 6.

2. Materials and Methods

In this section, we present how fake videos are created and compare various fake-video creation methods. Moreover, we examine fake-video detection methods and provide a comparison of these techniques.

2.1. Deepfake Video Creation

“Deepfake” is a combination of the words “deep” and “fake”, signifying the integration of deep learning and forgery. The first deepfake video was created by a Reddit user with the username “deepfake”, and the term has been used to describe such videos ever since [17]. Deepfake technology generates fake images within videos by training a neural network on a dataset. These fake images can be created through techniques such as face swapping, lip-syncing, and voice swapping. Deepfake creation techniques can be broadly categorized into two main groups: learning-based methods and approaches inspired by computer graphics.
Examples of learning-based methods include Deepfakes, Faceswap, and Faceswap-GAN [18]. Although there are various neural network architectures, deepfake videos are typically created using structures composed of variations and combinations of convolutional networks and encoder–decoder networks. Commonly used networks include Convolutional Autoencoders, Generative Adversarial Networks (GANs), Convolutional Neural Networks (CNNs), and Recurrent Neural Networks (RNNs).
To create a deepfake with autoencoders, two components are needed: an encoder and a decoder. Both are used in the training and creation of fake videos. A shared encoder is trained to compress faces from the source and the target into a common latent representation, while a separate decoder is trained for each identity to reconstruct the corresponding face. As shown in Figure 1, to generate a fake image, the input face is fed into the encoder, and the resulting latent features are passed to the decoder trained on the other identity, which reconstructs that identity's face with the expression and pose of the input.
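To make the shared-latent-space idea concrete, the following is a minimal PyTorch sketch of this scheme: one encoder shared by both identities and one decoder per identity. The layer sizes, image resolution, and training details are illustrative assumptions, not the configuration of any specific deepfake tool.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Shared encoder: compresses a 64x64 RGB face into a latent vector.
    def __init__(self, latent_dim=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 4, stride=2, padding=1), nn.ReLU(),   # 64 -> 32
            nn.Conv2d(32, 64, 4, stride=2, padding=1), nn.ReLU(),  # 32 -> 16
            nn.Conv2d(64, 128, 4, stride=2, padding=1), nn.ReLU(), # 16 -> 8
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, latent_dim),
        )

    def forward(self, x):
        return self.net(x)

class Decoder(nn.Module):
    # One decoder per identity: reconstructs a face from the shared latent space.
    def __init__(self, latent_dim=256):
        super().__init__()
        self.fc = nn.Linear(latent_dim, 128 * 8 * 8)
        self.net = nn.Sequential(
            nn.ConvTranspose2d(128, 64, 4, stride=2, padding=1), nn.ReLU(),  # 8 -> 16
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),   # 16 -> 32
            nn.ConvTranspose2d(32, 3, 4, stride=2, padding=1), nn.Sigmoid(), # 32 -> 64
        )

    def forward(self, z):
        return self.net(self.fc(z).view(-1, 128, 8, 8))

encoder = Encoder()
decoder_a, decoder_b = Decoder(), Decoder()  # trained on identities A and B

# Training: each decoder learns to reconstruct its own identity from the
# shared latent space. Generation (the swap): encode a face of A, then
# decode with B's decoder.
face_a = torch.rand(1, 3, 64, 64)    # placeholder input frame
fake_b = decoder_b(encoder(face_a))  # B's identity with A's pose/expression
```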
Another method for creating deepfakes is through GANs (Generative Adversarial Networks), which consist of two neural networks working in opposition to each other. While the generator aims to produce realistic images (as illustrated in Figure 2), the discriminator attempts to differentiate between real and fake images. If the generator successfully deceives the discriminator, the discriminator adjusts its decision-making process based on the received feedback. Conversely, if the discriminator identifies the generated image as fake, the generator updates itself to generate more convincing images.
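The adversarial feedback loop described above can be summarized in a few lines of PyTorch. The toy generator and discriminator below are illustrative stand-ins; real deepfake GANs use much deeper convolutional architectures.

```python
import torch
import torch.nn as nn

# Toy generator (noise -> flattened image) and discriminator (image -> real/fake).
G = nn.Sequential(nn.Linear(100, 256), nn.ReLU(),
                  nn.Linear(256, 64 * 64 * 3), nn.Tanh())
D = nn.Sequential(nn.Linear(64 * 64 * 3, 256), nn.LeakyReLU(0.2),
                  nn.Linear(256, 1), nn.Sigmoid())

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCELoss()

real = torch.rand(16, 64 * 64 * 3)  # placeholder batch of real face images

# Discriminator step: learn to separate real samples from generated ones.
fake = G(torch.randn(16, 100)).detach()
loss_d = bce(D(real), torch.ones(16, 1)) + bce(D(fake), torch.zeros(16, 1))
opt_d.zero_grad(); loss_d.backward(); opt_d.step()

# Generator step: update weights so that D labels generated samples as real.
fake = G(torch.randn(16, 100))
loss_g = bce(D(fake), torch.ones(16, 1))
opt_g.zero_grad(); loss_g.backward(); opt_g.step()
```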
Another technology employed in deepfake generation is the CNN (Convolutional Neural Network). CNNs learn hierarchical patterns from images and can be utilized to detect artifacts within images. Additionally, RNNs (Recurrent Neural Networks) are used in deepfake generation. RNNs are a type of neural network capable of processing sequential and variable-length data, essentially providing a form of memory within the network. This facilitates the exchange of information between previous and current states. Deepfake video generation can be classified into categories such as reenactment, replacement, editing, and synthesis [18].
A list of notable creation methods is provided in Table 1. It is evident from the table that these methods generate fake images by manipulating multiple features, such as the mouth, expression, pose, gaze, and body.

2.1.1. Reenactment

Deepfake reenactment uses a source image (xs) to drive the mouth, pose, gaze, expression, or body of a target image (xt). Mouth reenactment drives the mouth of xt based on the mouth of xs, or based on voice or text. Lip-syncing changes the lip movements of the person in the target image according to those in the source image, making it appear as if the target person is speaking the source's words; the goal is to synchronize the target's mouth movements with another recording's speech. Gaze reenactment guides the orientation of xt's eyes and the position of its eyelids based on xs. Pose reenactment controls the head position of xt based on xs. Expression reenactment generates xt's expression based on xs, driving the target's mouth and pose and providing ample flexibility. Body reenactment involves human pose synthesis, driving a body pose of xt in a manner analogous to face reenactment. Full-body puppetry transfers one person's body position and movements to another, integrating the body movements of the person in the source image with those of the target person [20].

2.1.2. Replacement

Deepfake replacement is the process in which the content of xt is replaced with that of xs while the identity of xs is preserved. In a transfer, the content of xt is directly replaced with the content of xs. In a swap, the content transferred from xs is driven by xt, so that the identity of xs appears with the pose and expression of xt, as in face swapping.

2.1.3. Editing

Deepfake editing is the process of adding, changing, or removing properties from the xt image. For instance, this may involve altering clothes, beard, age, hair, weight, image aesthetics, and ethnicity. While primarily utilized for entertainment purposes, it can also be employed to modify ages and genders, enabling the creation of diverse online profiles.

2.1.4. Synthesis

Deepfake synthesis refers to the creation of a new image without specific targeting. Techniques for synthesizing human faces and bodies can be employed to generate new characters for movies and games, or to produce fake-person images.
Comparing the deepfake methods, each is used for different purposes. While all of these methods can be harmful to society, they can also benefit many platforms; each has its own purposes and uses. For example, lip-syncing can place a famous person's face in useful videos for commercials and instructional content, but it can also be used maliciously and threateningly, as was the case with the Barack Obama video [21].

2.2. Deepfake Video Detection

The detection of deepfake videos employs various methods to identify fraudulent content. Numerous techniques have been developed for deepfake detection, leading to significant advancements in this field. The rapid development of deepfake technology in recent years has raised significant concerns, particularly regarding the lag in deepfake detection capabilities. In response to these concerns, major technology companies such as Facebook and Microsoft have initiated studies and provided support for research into deepfake detection. Additionally, competitions have been organized in this field, with the Deepfake Detection Challenge standing out as a prominent initiative. This challenge is a collaborative effort involving DARPA (the United States Defense Advanced Research Projects Agency), Facebook, Microsoft, and The Partnership on AI [13,22].
Deepfake detection operates in a manner similar to the creation of deepfakes themselves. Neural networks, such as CNNs, RNNs, and DNNs, are commonly employed for this purpose. These networks are trained on datasets to recognize facial features and patterns indicative of manipulation. Detection methods encompass a range of techniques, including analyzing body language, detecting blinking patterns, assessing head position, and scrutinizing mimic movements.
In deepfake detection, various factors are considered, such as biological signals, spatial and temporal feature analysis, head position, body language, and color inconsistencies. For instance, one approach detects forgery by analyzing blinking patterns. Li et al., in their forgery detection studies, successfully identified manipulation by analyzing eye blinks [23]. This method compares the blink count of the subject in a video with that of a normal individual; irregularities in blinking behavior are detected and treated as indicative of forgery.
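As a schematic illustration of the blink-rate cue, the check below flags a clip whose blink frequency falls outside a plausible human range. The range used here is an indicative assumption (healthy adults typically blink roughly 15–20 times per minute), and the blink count itself would come from an eye-state detector such as the LRCN of [23]; the helper shown is hypothetical, not part of any published pipeline.

```python
def looks_suspicious(blink_count: int, duration_sec: float,
                     normal_range=(8.0, 30.0)) -> bool:
    """Flag a video whose blink rate falls outside a plausible human range.

    blink_count would come from an eye-state detector run over the frames;
    normal_range is an indicative assumption, not a calibrated threshold.
    """
    blinks_per_min = blink_count / (duration_sec / 60.0)
    low, high = normal_range
    return not (low <= blinks_per_min <= high)

# Example: 1 blink in a 60-second clip is far below the typical human rate,
# so the clip is flagged for closer inspection.
print(looks_suspicious(blink_count=1, duration_sec=60.0))  # True
```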
Many deepfake detection methods rely on the analysis of individual video frames [24]. When deepfake videos are created, the focus is often on a single frame, causing inconsistencies between frames. By performing detection on many frames instead of a single frame, forgery can be revealed through the temporal and spatial differences between frames; in this approach, a temporal sequence is created and analyzed. Table 2 compares the deepfake detection approaches in wide use today. According to the table, most detection applications use CNNs and LSTMs in their models, rely on features in the content, and use videos as their input source. Architectures such as CNN, RNN, LSTM, RCN, LRCN, DNN, Capsule CNN, and AE each have specific strengths, which has led to many mixed architectures in applications, each bringing its unique features to the fore. Looking at the comparisons, some applications are more accurate, some are faster, and some perform better within their particular area: each architecture is designed around the detection cue it focuses on and performs best there, much as a treatment is chosen according to the type of disease.
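A minimal sketch of such frame-level analysis is shown below, using OpenCV to sample frames. Here, score_frame is a hypothetical stand-in for any single-frame classifier returning a fakeness probability, and the mean absolute difference between consecutive scores serves as a simple temporal-inconsistency cue; neither is the paper's own implementation.

```python
import cv2
import numpy as np

def frame_scores(video_path: str, step: int = 5):
    """Sample every `step`-th frame and score it with a per-frame detector.

    score_frame() is a hypothetical stand-in for any single-frame deepfake
    classifier returning a probability in [0, 1].
    """
    cap = cv2.VideoCapture(video_path)
    scores, idx = [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            scores.append(score_frame(frame))  # hypothetical per-frame model
        idx += 1
    cap.release()
    return np.array(scores)

# Temporal cue: large frame-to-frame jumps in the score sequence can signal
# the between-frame inconsistencies described above.
# scores = frame_scores("input.mp4")
# temporal_inconsistency = np.abs(np.diff(scores)).mean()
```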
In addition, with the developments in deepfake detection in recent years, many comprehensive datasets have emerged; today, Celeb-DF, FaceForensics++, UADFV, DeeperForensics, DF-TIMIT, and many private datasets are easily accessible [7,41,42,43]. With the development of these datasets, model training has become more reliable and robust. The datasets used for deepfake detection are highly effective for model training, and a well-trained model can then make detections with high accuracy.

3. Proposed Method

In our proposed method, three different deepfake detection algorithms process a video input, each producing a numerical output between 0 and 1. These values are then fed into the Choquet fuzzy integral, which combines the three outputs while taking into account both the results and their assigned importance levels. If the output of the Choquet fuzzy integral is below 0.50, the video is classified as fake; if it is above 0.50, it is considered real. The goal is to enhance deepfake detection by combining the outputs of the three algorithms. The steps of the proposed method are illustrated in Figure 3.
Unlike the traditional Choquet fuzzy integral, we determined the fuzzy measures g with the help of genetic algorithms. This allows the combinatorial fuzzy measures (g_{1,2}, g_{1,3}, etc.) to be calculated independently of the λ value used in the classical Choquet fuzzy integral method. In fuzzy measure notation, g_{1,2} = g({x1, x2}), where each x corresponds to one of the DDAs in our study. Since g(∅) = 0 and g({x1, x2, x3}) = 1 by definition, these values are not included in the calculation. The constraint used to calculate the fuzzy measure values with genetic algorithms is given in Equation (1). The super-additive interaction type was selected because having more than one model produce the same result should increase the confidence in that result [15].
$$ g(\{x_i, x_j\}) > g(\{x_i\}) + g(\{x_j\}), \quad \text{for all } i, j \text{ where } i \neq j \quad (1) $$
The parameters pop_size = 16, max_iter = 50, and early_stopping = 10, which gave the best results through trial and error, were used to run the genetic algorithms. The crossover probability was set to 0.7 and the mutation probability to 0.3. Crossover and mutation operations were carried out as given in Equations (2) and (3), respectively, where {C1, C2, C3, …, Cn} is the population and r is a randomly generated number. The correct classification rate was used as the fitness function. The calculation of fuzzy measures with genetic algorithms is illustrated in Figure 4.
$$ h_1 = 0.5\,(C_1 + C_2), \qquad h_2 = 1.5\,C_1 - 0.5\,C_2 \quad (2) $$
$$ C_m = h_i \pm r/4 \quad (3) $$
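The following Python sketch illustrates how such a genetic algorithm could search for the six free fuzzy measures under the constraint of Equation (1), using the crossover and mutation operators of Equations (2) and (3) and the parameter values stated above. The fitness function, which in our study is the correct classification rate on the training videos, is supplied by the caller; details such as the sampling ranges are illustrative assumptions.

```python
import random

# A chromosome holds the six free fuzzy measures:
# [g1, g2, g3, g12, g13, g23]; g({}) = 0 and g({x1,x2,x3}) = 1 are fixed.

def random_chromosome():
    g1, g2, g3 = (random.uniform(0.0, 0.4) for _ in range(3))
    # Enforce the super-additivity constraint of Equation (1): g_ij > g_i + g_j.
    g12 = random.uniform(g1 + g2, 1.0)
    g13 = random.uniform(g1 + g3, 1.0)
    g23 = random.uniform(g2 + g3, 1.0)
    return [g1, g2, g3, g12, g13, g23]

def feasible(c):
    g1, g2, g3, g12, g13, g23 = c
    return (g12 > g1 + g2 and g13 > g1 + g3 and g23 > g2 + g3
            and all(0.0 <= v <= 1.0 for v in c))

def crossover(c1, c2):
    # Equation (2): h1 = 0.5 (C1 + C2), h2 = 1.5 C1 - 0.5 C2.
    h1 = [0.5 * (a + b) for a, b in zip(c1, c2)]
    h2 = [1.5 * a - 0.5 * b for a, b in zip(c1, c2)]
    return h1, h2

def mutate(h):
    # Equation (3): Cm = h_i +/- r/4, with r a random number.
    r = random.random()
    return [v + random.choice((-1, 1)) * r / 4 for v in h]

def evolve(fitness, pop_size=16, max_iter=50, early_stopping=10,
           p_cross=0.7, p_mut=0.3):
    """fitness(chromosome) -> correct classification rate; supplied by the
    caller (in the paper, computed from the recorded DDA outputs)."""
    pop = [random_chromosome() for _ in range(pop_size)]
    best, best_fit, stale = None, -1.0, 0
    for _ in range(max_iter):
        children = []
        while len(children) < pop_size:
            c1, c2 = random.sample(pop, 2)
            if random.random() < p_cross:
                c1, c2 = crossover(c1, c2)
            if random.random() < p_mut:
                c1 = mutate(c1)
            children += [c for c in (c1, c2) if feasible(c)]
        # Keep the fittest individuals of parents and children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        if fitness(pop[0]) > best_fit:
            best, best_fit, stale = pop[0], fitness(pop[0]), 0
        else:
            stale += 1
            if stale >= early_stopping:
                break
    return best
```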
Because detection performance depends on the extracted faces and the models, we employed three different deepfake detection algorithms (DDAs), as summarized in Figure 5. We chose these algorithms because the Xception model has been observed to give better results in detecting real videos, while the EfficientNet model gives better results in detecting fake videos [14]. The main purpose of the study is to demonstrate the usability of the Choquet fuzzy integral for increasing overall success. Therefore, we employed different face-extraction algorithms to obtain output variety, as sketched below. For all models, a batch size of 64 was chosen, and default parameter values were used.
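A minimal runnable skeleton of this three-branch pipeline is given below. The extractor and classifier callables are hypothetical stand-ins for the BlazeFace/RetinaFace/MTCNN detectors and the trained Xception/EfficientNet models described in the following subsections; the fusion step is the Choquet integral of Section 3.4 (a plain mean in the demo).

```python
def run_pipeline(video_path, branches, fuse):
    """Run each DDA branch (face extractor + classifier) and fuse the scores.

    branches: list of (extract_faces, classify) callables, one pair per DDA.
    fuse:     aggregation over the per-branch scores (the Choquet fuzzy
              integral in the proposed method; a plain mean in the demo).
    """
    scores = []
    for extract_faces, classify in branches:
        faces = extract_faces(video_path)              # BlazeFace / RetinaFace / MTCNN
        per_face = [classify(face) for face in faces]  # Xception / EfficientNet / hybrid
        scores.append(sum(per_face) / max(len(per_face), 1))
    return fuse(scores)

# Toy stand-ins so the skeleton runs end to end; a real run plugs in the
# face detectors and trained classifiers named above (hypothetical here).
toy_extractor = lambda path: ["face_crop_1", "face_crop_2"]
toy_classifier = lambda face: 0.7
score = run_pipeline("clip.mp4", [(toy_extractor, toy_classifier)] * 3,
                     fuse=lambda s: sum(s) / len(s))
print(score)  # ~0.7 -> classified as real under the 0.50 threshold
```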

3.1. Deepfake Detection Algorithm 1 (DDA 1)

The first algorithm we used for deepfake detection processes the input video after preprocessing it. In the preprocessing step, we perform conversion operations on the video; once the conversion is finished, we extract the frames from the converted video and perform face extraction on these frames using the BlazeFace model. BlazeFace is a lightweight face detector developed by Google that can run even on mobile phones. After face extraction, we perform deepfake detection using our Xception model trained on the DFDC dataset. The BlazeFace and Xception models in this algorithm were used with transfer learning, and the model weights were taken from shared platforms [14]. The architecture of XceptionNet is given in Figure 6 [44].

3.2. Deepfake Detection Algorithm 2 (DDA 2)

In the second algorithm we used for deepfake detection, we again perform video conversion operations. In this algorithm, we use the RetinaFace detector with a ResNet backbone for face extraction. After face extraction, we use two models, EfficientNet and Xception, to produce a detection result at the same time. The EfficientNet and Xception models in this algorithm were used with transfer learning, and the model weights were taken from shared platforms [14,36,45,46].

3.3. Deepfake Detection Algorithm 3 (DDA 3)

The third algorithm we used for deepfake detection performs preprocessing similar to the other two algorithms. In this algorithm, we perform face extraction with the help of the MTCNN model during preprocessing. After face extraction is completed, we perform fake detection using the EfficientNet model. The MTCNN and EfficientNet models in this algorithm were used with transfer learning, and the model weights were taken from shared platforms [45,46,47,48]. The architecture of EfficientNetB4 is given in Figure 7 [49].

3.4. The Choquet Fuzzy Integral

The Choquet fuzzy integral method has been used in many pattern recognition problems. Its advantage over a weighted average is that it uses the degree of uncertainty found in the decision scores: it generalizes aggregation operators over a set of confidence values known as fuzzy measures, which are weights assigned to each classifier and each subset of classifiers [15]. The advantage of the Choquet fuzzy integral is that it can change the weight of each classifier based on the prediction results provided by the other classifiers. This makes the system dynamic, unlike majority voting or other weighted-average aggregation methods, where fixed parametric values make the system completely static [15].
The Choquet integral begins with an alternative representation of the area under the function f in the additive case; intuitively, this corresponds to the horizontal combination of the input algorithms, as shown in Figure 8. The operation is carried out over a permutation of {1, 2, …, n} [15,50]. Let g(·) denote the fuzzy measure of a set of algorithms A = {a1, a2, …, an}, and let X = {x1, x2, …, xn} be the performance scores of the individual algorithms in A. To calculate the fuzzy membership values of algorithm combinations, we first need to determine the value of λ using Equation (4).
$$ \lambda + 1 = \prod_{i=1}^{n} \left(1 + \lambda\, g^{i}\right), \quad \lambda > -1 \quad (4) $$
where the scores are ordered such that $f(x_1) \geq f(x_2) \geq \cdots \geq f(x_n) \geq 0$, and the nested subsets are defined as $F_1 = \{x_1\}$, $F_2 = \{x_1, x_2\}$, …, $F_n = \{x_1, x_2, \dots, x_n\}$.
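For reference, Equation (4) can be solved numerically with simple bisection. The sketch below illustrates the classical λ-measure route (which our genetic-algorithm approach replaces) and uses the individual measures of Table 4 only as example densities.

```python
def solve_lambda(densities, tol=1e-10):
    """Solve prod(1 + lam * g_i) = 1 + lam for lam in (-1, inf), lam != 0.

    densities: the individual fuzzy densities g^i of the classifiers.
    If the densities already sum to 1, lam = 0 (additive measure).
    """
    def f(lam):
        prod = 1.0
        for g in densities:
            prod *= 1.0 + lam * g
        return prod - (1.0 + lam)

    s = sum(densities)
    if abs(s - 1.0) < tol:
        return 0.0
    # If sum(g_i) > 1 the root lies in (-1, 0); if sum(g_i) < 1, in (0, inf).
    lo, hi = (-1.0 + tol, -tol) if s > 1.0 else (tol, 1.0)
    if s < 1.0:
        while f(hi) < 0.0:        # expand until the root is bracketed
            hi *= 2.0
    for _ in range(200):          # bisection
        mid = 0.5 * (lo + hi)
        if f(lo) * f(mid) <= 0.0:
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

# Example with the individual measures of Table 4 as densities:
lam = solve_lambda([0.0833336, 0.0175454, 0.3000000])
# Pairwise measure under the lambda-measure, g(A u B) = gA + gB + lam*gA*gB;
# compare with the independently GA-learned g({x1, x2}) in Table 4.
g12 = 0.0833336 + 0.0175454 + lam * 0.0833336 * 0.0175454
```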
The process of evaluating criteria using the Choquet fuzzy function is described in Equations (5) and (6) [51].
$$ C = \int f \, dg = f(x_n)\, g_{\lambda}(F_n) + \left[f(x_{n-1}) - f(x_n)\right] g_{\lambda}(F_{n-1}) + \cdots + \left[f(x_1) - f(x_2)\right] g_{\lambda}(F_1) \quad (5) $$
$$ C = \int f \, dg = f(x_n)\left[g_{\lambda}(F_n) - g_{\lambda}(F_{n-1})\right] + f(x_{n-1})\left[g_{\lambda}(F_{n-1}) - g_{\lambda}(F_{n-2})\right] + \cdots + f(x_1)\, g_{\lambda}(F_1) \quad (6) $$
The Choquet fuzzy integral takes the output values of the three deepfake detection algorithms described above as its inputs. Each of the three algorithms is assigned a degree of importance between 0 and 1, such that the importance degrees sum to 1. The result of the integral is again a numerical value between 0 and 1, which indicates whether the input video is fake or real. By combining the three algorithms, the Choquet fuzzy integral yields better results than any of the algorithms used alone. The combinations of the Choquet fuzzy integral subsets of the three algorithms are given in Figure 9. For i = 1, 2, …, n, the fuzzy membership value of each classifier (i.e., g(xi)) is set experimentally in the classical Choquet fuzzy integral.
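The sketch below implements the accumulation of Equation (6) for three classifiers, using the learned fuzzy measures of Table 4. The DDA output scores shown are illustrative values, and the 0.50 threshold is the decision rule described above.

```python
def choquet(scores, g):
    """Choquet fuzzy integral of Equation (6).

    scores: dict mapping classifier name ('x1', 'x2', 'x3') to its output f(x_i).
    g:      dict mapping frozensets of classifier names to fuzzy measures,
            e.g. the learned values of Table 4.
    """
    # Sort classifiers by descending score: f(x_(1)) >= f(x_(2)) >= ...
    ordered = sorted(scores, key=scores.get, reverse=True)
    total, prev_set = 0.0, frozenset()
    for name in ordered:
        cur_set = prev_set | {name}
        total += scores[name] * (g[cur_set] - g.get(prev_set, 0.0))
        prev_set = cur_set
    return total

# Fuzzy measures from Table 4 (learned by the genetic algorithm).
g = {
    frozenset(): 0.0,
    frozenset({"x1"}): 0.0833336,
    frozenset({"x2"}): 0.0175454,
    frozenset({"x3"}): 0.3000000,
    frozenset({"x1", "x2"}): 0.7173912,
    frozenset({"x1", "x3"}): 0.9444430,
    frozenset({"x2", "x3"}): 0.8823529,
    frozenset({"x1", "x2", "x3"}): 1.0,
}

# Illustrative DDA outputs for one video (closer to 1 = more likely real).
scores = {"x1": 0.62, "x2": 0.35, "x3": 0.71}
value = choquet(scores, g)
label = "real" if value > 0.50 else "fake"
print(value, label)
```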

4. Experimental Results

The characteristics of the datasets used for testing are briefly described in Table 3. These are the datasets most commonly used for training and test data in other methods. As seen in Table 3, the number of videos in each dataset, the durations of the videos, the numbers of frames, the resolutions, and the file formats all affect the training performance of an application, as well as its speed, accuracy, and output quality. In addition, the dataset should be suitable for the algorithm that the detection method focuses on. For example, for an application that detects inconsistencies in head poses, a dataset created for artificial blinks should not be used in the training process.
The effect of datasets is also clearly seen in the accuracy of applications. Table 3 shows that the test accuracy obtained with the dataset used for training differs from that obtained with another dataset. Some applications deal with neural network feature extraction or with temporal inconsistencies between frames, while others use both [25,26,35].
We randomly selected 100 videos from the DFDC, CELEB-DF, and FF-DF datasets; this selection is called the “custom dataset” in the rest of the paper and is used to test the algorithms. We first ran each algorithm alone on these videos and recorded the detection results; we then ran all three algorithms on the same videos and gave their detection results to the Choquet fuzzy integral as input values, assigning the algorithms importance degrees according to their detection results. We converted the resulting values into accuracy rates and recorded them as experimental results. After the individual weight, or fuzzy density, of each detection algorithm was obtained, the fuzzy measures representing the degrees of interdependence for all subsets of the deepfake detection algorithms were calculated using Equation (1). Since three inputs are considered, there are eight subsets. Table 4 shows the measures for all subsets; x1, x2, and x3 represent the outputs of algorithms 1, 2, and 3, respectively. These fuzzy measures are required by the Choquet fuzzy integral module. The test results of each algorithm were recorded and used as a training set for the Choquet fuzzy integral network.
First, we ran the deepfake detection algorithms (DDAs) separately; then, they were run in combination using the Choquet fuzzy integral. The confusion matrices for the experiments on the custom dataset are given in Figure 10, and the accuracy values are given in Table 5. As can be seen from Table 5, the Choquet fuzzy integral has better performance.
To demonstrate the computational efficiency of the proposed method, we ran the inference process for inputs of different sizes. The elapsed inference times for DDA 1, DDA 2, DDA 3, and the proposed method are given in Table 6. Because DDA 2 is the hybrid of the models used in DDA 1 and DDA 3, the proposed method never needs to compute an additional instance of a given model's output; as the table shows, its run time is no longer than two instances of each method. Because the inputs are resized before inference, only the resizing step depends on the input resolution. The experiments in Table 6 were performed on a computer equipped with an Intel Core i7 processor, an Asus RTX 3060 GPU, and 32 GB of RAM.
In order to compare the results of the study, DDA 1, DDA 2, DDA 3, and the proposed method were tested on the DFDC, CELEB-DF, FF-DF, and DF-TIMIT datasets. The AUC values of the methods are given in Table 7, in comparison with studies in the literature.

5. Discussions

Studies in the literature that aim to combine the results of different sub-methods are listed in Table 7. In study [57], a combination of the Google and FF-DF datasets was used, and a single result was obtained with ensemble learning from six different ResNet versions (50, 50V2, 101, 101V2, 152, and 152V2), yielding a 95.5% AUC score.
In study [58], the outputs of seven different networks, namely XceptionNet, InceptionV3, Inception v2, MobileNet, ResNet101, DenseNet121, and DenseNet169, were given to a CNN classifier. Among the individual networks, AUC values ranged from a minimum of 86.6% for InceptionV3 to a maximum of 97.6% for Xception, while an AUC value of 100% was reached with the proposed ensemble. Results on other datasets were not included, and no information about running times was given. In study [59], the results of different versions of EfficientNet were combined. On the FF-DF and DFDC datasets, AUC values of 94.4% and 87.8%, respectively, were obtained when the B4, B4ST, B4Att, and B4AttST models were used together. It is noteworthy that a higher AUC value of 88.1% was obtained for the DFDC dataset when only B4, B4ST, and B4Att were used, without B4AttST. In study [60], an ensemble of Xception, Xception with Attention, and EfficientNet with Attention models is proposed. Metrics are reported for the methods run separately on the DFDC and CELEB-DF datasets: the best individual results were obtained with Xception with Attention, at 97.2% for DFDC and 97.8% for CELEB-DF, while the proposed ensemble reached AUC values of 97.5% for DFDC and 98.4% for CELEB-DF. The attention layer thus makes an important contribution to the success of the method. Findings on running time are not included. In study [61], 12 different ensemble rules were applied to the outputs of six models (EfficientNet AttB4ST, AttB4, B4ST, B4, Xception, and ResNet). In training and testing on the FF-DF dataset, the highest individual performance was obtained with EfficientNet AttB4 and B4ST, at 95.9%. Among the ensemble rules, the lowest performance was obtained with the Max rule, at 76.2%, and the highest with a Multi-Layer Perceptron, at 98.4%. In the reported runtimes, provided without specifying the hardware, the high runtime of the SVM-based rule is striking; moreover, since only the fusion step is timed (in microseconds), no impression can be formed of the load the individual methods place on the system.
In the studies mentioned, the individual methods already perform well, so the gains obtained with the ensemble are generally limited. In this respect, the improvements obtained with the help of the Choquet fuzzy integral are more substantial. In addition, up to seven different methods have been used in these studies; although comparisons cannot be made in terms of running times, using a large number of methods can be expected to increase the required resources.

6. Conclusions

Images/videos produced with deepfake technology have started to become more realistic in recent years. Today, deepfake technologies are used in sectors such as movies and games, as well as for malicious purposes. Damage can be prevented by detecting deepfakes used for malicious purposes such as blackmail, fraud, and humiliation.
Algorithms generally used in deepfake detection focus on the shortcomings of deepfake production methods. Some methods are better at detecting real inputs, and some are better at detecting fake inputs; XceptionNet and EfficientNet are respective examples. A Choquet fuzzy integral-based method is proposed in this study to increase overall performance by using the advantages of both algorithms for deepfake detection. In the proposed method, the outputs obtained from running Xception, EfficientNet, and their hybrid are given as input to the Choquet fuzzy integral.
The FaceForensics++, DFDC, Celeb-DF, and DeepFake-TIMIT datasets were used in the study. Firstly, the accuracy and F1-score metrics presented above confirm that the method increases performance on the custom dataset compiled as a subset of these datasets. Then, for comparison, all datasets were tested separately, and comparisons were made using the AUC values generally reported in the literature. For each dataset, results are given for the proposed method as well as for DDA 1 (XceptionNet based), DDA 2 (hybrid use), and DDA 3 (EfficientNet based). While the best individual result for the DFDC dataset was obtained with the DDA 3 model, at 87.98% AUC, 95.3% AUC was obtained with the proposed Choquet fuzzy integral method. While the maximum CELEB-DF AUC value is 77.45%, with the DDA 1 algorithm, it increases to 92.6% with the proposed method. Similarly, for FF-DF, the best individual AUC, 95.50%, was achieved by the DDA 1 model, and this rate was increased to 99.8% with the proposed method. Finally, DDA 2 gave the highest individual AUC value for the DF-TIMIT dataset, at 74.1%; with the proposed method, this value was increased to 89.7%.
The results show that the performance on each dataset is improved compared to using the methods separately. When compared to studies in the literature, the results are superior, with 99.8% success on the FaceForensics++ dataset. Regarding inference time, the proposed method requires less than twice the time of an individual method.
Our study shows that performance levels higher than the best individual performance can be achieved by combining different algorithms with the Choquet fuzzy integral. It should be possible to increase performance further by changing the methods combined within the Choquet fuzzy integral. In future studies, we will investigate scenarios that increase the number of models and use generalized Choquet integrals.

Author Contributions

Conceptualization, M.K.; Methodology, M.K.; Software, İ.İ. and S.A.; Formal analysis, H.Y.; Investigation, H.Y.; Writing—original draft, İ.İ. and S.A.; Writing—review and editing, H.Y.; Visualization, H.Y.; Project administration, M.K. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the Scientific and Technological Research Council of Turkey (TUBITAK) under Grant Number 122E676. The authors thank TUBITAK for their support. This study was also supported by the Firat University Scientific Research Projects Unit (FUBAP) under Grant Number MF.24.44. The authors thank FUBAP for their support.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data is contained within the article.

Acknowledgments

The work for this article was carried out within the scope of İsmail İlhan's doctoral dissertation, “Real-Time Detection of Deepfake Videos Using Machine Learning”, supervised by Mehmet Karaköse.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Berrahal, M.; Boukabous, M.; Idrissi, I. A Comparative Analysis of Fake Image Detection in Generative Adversarial Networks and Variational Autoencoders. In Proceedings of the 2023 International Conference on Decision Aid Sciences and Applications (DASA), Annaba, Algeria, 16–17 September 2023; pp. 223–230. [Google Scholar]
  2. Singh, H.; Kaur, K.; Mohan Nahak, F.; Singh, S.K.; Kumar, S. Deepfake as an Artificial Intelligence tool for VFX Films. In Proceedings of the 2023 7th International Conference on Computation System and Information Technology for Sustainable Solutions (CSITSS), Bangalore, India, 2–4 November 2023; pp. 1–5. [Google Scholar]
  3. Seymour, M.; Riemer, K.; Yuan, L.; Dennis, A.R. Beyond Deep Fakes. Commun. ACM 2023, 66, 56–67. [Google Scholar] [CrossRef]
  4. Arshed, M.A.; Mumtaz, S.; Ibrahim, M.; Dewi, C.; Tanveer, M.; Ahmed, S. Multiclass AI-Generated Deepfake Face Detection Using Patch-Wise Deep Learning Model. Computers 2024, 13, 31. [Google Scholar] [CrossRef]
  5. Raza, A.; Munir, K.; Almutairi, M. A Novel Deep Learning Approach for Deepfake Image Detection. Appl. Sci. 2022, 12, 9820. [Google Scholar] [CrossRef]
  6. Afchar, D.; Nozick, V.; Yamagishi, J.; Echizen, I. Mesonet: A compact facial video forgery detection network. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. Available online: https://ieeexplore.ieee.org/abstract/document/8630761/ (accessed on 27 February 2024).
  7. Korshunov, P.; Marcel, S. DeepFakes: A New Threat to Face Recognition? Assessment and Detection. arXiv 2018, arXiv:1812.08685. Available online: http://arxiv.org/abs/1812.08685 (accessed on 23 February 2024).
  8. Ismail, A.; Elpeltagy, M.; Zaki, M.S.; Eldahshan, K. A new deep learning-based methodology for video deepfake detection using XGBoost. Sensors 2021, 21, 5413. [Google Scholar] [CrossRef] [PubMed]
  9. Mitra, A.; Mohanty, S.P.; Corcoran, P.; Kougianos, E. A Machine Learning Based Approach for Deepfake Detection in Social Media Through Key Video Frame Extraction. SN Comput. Sci. 2021, 2, 98. [Google Scholar] [CrossRef]
  10. Gong, D.; Kumar, Y.J.; Goh, O.S.; Ye, Z.; Chi, W. DeepfakeNet, an efficient deepfake detection method. Int. J. Adv. Comput. Sci. Appl. 2021, 12, 201–207. [Google Scholar] [CrossRef]
  11. Almars, A.M. Deepfakes detection techniques using deep learning: A survey. J. Comput. Commun. 2021, 9, 20–35. [Google Scholar] [CrossRef]
  12. Zhang, T. Deepfake generation and detection, a survey. Multimed. Tools Appl. 2022, 81, 6259–6276. [Google Scholar] [CrossRef]
  13. İlhan, İ.; Karaköse, M. A Comparison Study for The Detection and Applications of Deepfake Videos. Adıyaman Üniversitesi Mühendislik Bilim. Derg. 2021, 8, 47–60. [Google Scholar]
  14. Ataş, S.; İlhan, İ.; Karaköse, M. An Efficient Deepfake Video Detection Approach with Combination of EfficientNet and Xception Models Using Deep Learning. In Proceedings of the 2022 26th International Conference on Information Technology (IT), Zabljak, Montenegro, 16–19 February 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–4. Available online: https://ieeexplore.ieee.org/abstract/document/9743542/ (accessed on 27 February 2024).
  15. Dey, S.; Bhattacharya, R.; Malakar, S.; Mirjalili, S.; Sarkar, R. Choquet fuzzy integral-based classifier ensemble technique for COVID-19 detection. Comput. Biol. Med. 2021, 135, 104585. [Google Scholar] [CrossRef] [PubMed]
  16. Taha, A.; Barukab, O.; Malebary, S. Fuzzy integral-based multi-classifiers ensemble for android malware classification. Mathematics 2021, 9, 2880. [Google Scholar] [CrossRef]
  17. Chadha, A.; Kumar, V.; Kashyap, S.; Gupta, M. Deepfake: An Overview. In Proceedings of the Second International Conference on Computing, Communications, and Cyber-Security, Ghaziabad, India, 3–4 October 2020; Singh, P.K., Wierzchoń, S.T., Tanwar, S., Ganzha, M., Rodrigues, J.J.P.C., Eds.; Lecture Notes in Networks and Systems. Springer: Singapore, 2021; Volume 203, pp. 557–566, ISBN 9789811607325. [Google Scholar] [CrossRef]
  18. Mirsky, Y.; Lee, W. The Creation and Detection of Deepfakes: A Survey. ACM Comput. Surv. 2022, 54, 1–41. [Google Scholar] [CrossRef]
  19. Xu, R.; Zhou, Z.; Zhang, W.; Yu, Y. Face Transfer with Generative Adversarial Network. arXiv 2017, arXiv:1710.06090. Available online: http://arxiv.org/abs/1710.06090 (accessed on 25 May 2024).
  20. Kietzmann, J.; Lee, L.W.; McCarthy, I.P.; Kietzmann, T.C. Deepfakes: Trick or treat? Bus. Horiz. 2020, 63, 135–146. [Google Scholar] [CrossRef]
  21. Thies, J.; Zollhofer, M.; Stamminger, M.; Theobalt, C.; Nießner, M. Face2face: Real-time face capture and reenactment of rgb videos. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 2387–2395. Available online: http://openaccess.thecvf.com/content_cvpr_2016/html/Thies_Face2Face_Real-Time_Face_CVPR_2016_paper.html (accessed on 27 February 2024).
  22. Nguyen, T.T.; Nguyen, Q.V.H.; Nguyen, D.T.; Nguyen, D.T.; Huynh-The, T.; Nahavandi, S.; Nguyen, T.T.; Pham, Q.-V.; Nguyen, C.M. Deep learning for deepfakes creation and detection: A survey. Comput. Vis. Image Underst. 2022, 223, 103525. [Google Scholar] [CrossRef]
  23. Li, Y.; Chang, M.-C.; Lyu, S. In ictu oculi: Exposing ai created fake videos by detecting eye blinking. In Proceedings of the 2018 IEEE International Workshop on Information Forensics and Security (WIFS), Hong Kong, China, 11–13 December 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. Available online: https://ieeexplore.ieee.org/abstract/document/8630787/ (accessed on 27 February 2024).
  24. de Lima, O.; Franklin, S.; Basu, S.; Karwoski, B.; George, A. Deepfake Detection using Spatiotemporal Convolutional Networks. arXiv 2020, arXiv:2006.14749. Available online: http://arxiv.org/abs/2006.14749 (accessed on 23 February 2024).
  25. Sabir, E.; Cheng, J.; Jaiswal, A.; AbdAlmageed, W.; Masi, I.; Natarajan, P. Recurrent convolutional strategies for face manipulation detection in videos. Interfaces 2019, 3, 80–87. [Google Scholar]
  26. Güera, D.; Delp, E.J. Deepfake video detection using recurrent neural networks. In Proceedings of the 2018 15th IEEE International Conference on Advanced Video and Signal Based Surveillance (AVSS), Auckland, New Zealand, 27–30 November 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–6. Available online: https://ieeexplore.ieee.org/abstract/document/8639163/ (accessed on 25 May 2024).
  27. Li, Y.; Lyu, S. Exposing deepfake videos by detecting face warping artifacts. arXiv 2018, arXiv:1811.00656. Available online: http://openaccess.thecvf.com/content_CVPRW_2019/papers/Media%20Forensics/Li_Exposing_DeepFake_Videos_By_Detecting_Face_Warping_Artifacts_CVPRW_2019_paper.pdf (accessed on 25 May 2024).
  28. Khalil, S.S.; Youssef, S.M.; Saleh, S.N. iCaps-Dfake: An integrated capsule-based model for deepfake image and video detection. Future Internet 2021, 13, 93. [Google Scholar] [CrossRef]
  29. Yang, X.; Li, Y.; Lyu, S. Exposing deep fakes using inconsistent head poses. In Proceedings of the ICASSP 2019–2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Brighton, UK, 12–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 8261–8265. Available online: https://ieeexplore.ieee.org/abstract/document/8683164/ (accessed on 27 February 2024).
  30. Dang, H.; Liu, F.; Stehouwer, J.; Liu, X.; Jain, A.K. On the detection of digital face manipulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 5781–5790. Available online: http://openaccess.thecvf.com/content_CVPR_2020/html/Dang_On_the_Detection_of_Digital_Face_Manipulation_CVPR_2020_paper.html (accessed on 23 February 2024).
  31. Nguyen, H.H.; Fang, F.; Yamagishi, J.; Echizen, I. Multi-task learning for detecting and segmenting manipulated facial images and videos. In Proceedings of the 2019 IEEE 10th International Conference on Biometrics Theory, Applications and Systems (BTAS), Tampa, FL, USA, 23–26 September 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 1–8. Available online: https://ieeexplore.ieee.org/abstract/document/9185974/ (accessed on 25 May 2024).
  32. Korshunov, P.; Marcel, S. Speaker inconsistency detection in tampered video. In Proceedings of the 2018 26th European Signal Processing Conference (EUSIPCO), Roma, Italy, 3–7 September 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 2375–2379. Available online: https://ieeexplore.ieee.org/abstract/document/8553270/ (accessed on 25 May 2024).
  33. Korshunov, P.; Halstead, M.; Castan, D.; Graciarena, M.; McLaren, M.; Burns, B.; Lawson, A.; Marcel, S. Tampered speaker inconsistency detection with phonetically aware audio-visual features. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; Available online: https://publications.idiap.ch/attachments/papers/2019/Korshunov_AVFAKESICML_2019.pdf (accessed on 25 May 2024).
  34. Chen, T.; Kumar, A.; Nagarsheth, P.; Sivaraman, G.; Khoury, E. Generalization of Audio Deepfake Detection. In Proceedings of the Odyssey, Tokyo, Japan, 1–5 November 2020; pp. 132–137. Available online: https://www.researchgate.net/profile/Avrosh-Kumar/publication/345141913_Generalization_of_Audio_Deepfake_Detection/links/600cb38945851553a0678e07/Generalization-of-Audio-Deepfake-Detection.pdf (accessed on 25 May 2024).
  35. Chintha, A.; Thai, B.; Sohrawardi, S.J.; Bhatt, K.; Hickerson, A.; Wright, M.; Ptucha, R. Recurrent convolutional structures for audio spoof and video deepfake detection. IEEE J. Sel. Top. Signal Process. 2020, 14, 1024–1037. [Google Scholar] [CrossRef]
  36. Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. Available online: http://openaccess.thecvf.com/content_cvpr_2017/html/Chollet_Xception_Deep_Learning_CVPR_2017_paper.html (accessed on 25 May 2024).
  37. Yasrab, R.; Jiang, W.; Riaz, A. Fighting deepfakes using body language analysis. Forecasting 2021, 3, 303–321. [Google Scholar] [CrossRef]
  38. Yadav, D.; Salmani, S. Deepfake: A survey on facial forgery technique using generative adversarial network. In Proceedings of the 2019 International Conference on Intelligent Computing and Control Systems (ICCS), Madurai, India, 15–17 May 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 852–857. Available online: https://ieeexplore.ieee.org/abstract/document/9065881/ (accessed on 25 May 2024).
  39. Matern, F.; Riess, C.; Stamminger, M. Exploiting visual artifacts to expose deepfakes and face manipulations. In Proceedings of the 2019 IEEE Winter Applications of Computer Vision Workshops (WACVW), Waikoloa Village, HI, USA, 7–11 January 2019; IEEE: Piscataway, NJ, USA, 2019; pp. 83–92. Available online: https://ieeexplore.ieee.org/abstract/document/8638330/ (accessed on 25 May 2024).
  40. Amerini, I.; Galteri, L.; Caldelli, R.; Del Bimbo, A. Deepfake video detection through optical flow based cnn. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 1205–1207. Available online: http://openaccess.thecvf.com/content_ICCVW_2019/html/HBU/Amerini_Deepfake_Video_Detection_through_Optical_Flow_Based_CNN_ICCVW_2019_paper.html?ref=https://githubhelp.com (accessed on 25 May 2024).
  41. Rossler, A.; Cozzolino, D.; Verdoliva, L.; Riess, C.; Thies, J.; Nießner, M. Faceforensics++: Learning to detect manipulated facial images. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 1–11. Available online: http://openaccess.thecvf.com/content_ICCV_2019/html/Rossler_FaceForensics_Learning_to_Detect_Manipulated_Facial_Images_ICCV_2019_paper.html (accessed on 23 February 2024).
  42. Dolhansky, B.; Bitton, J.; Pflaum, B.; Lu, J.; Howes, R.; Wang, M.; Ferrer, C.C. The DeepFake Detection Challenge (DFDC) Dataset. arXiv 2020, arXiv:2006.07397. Available online: http://arxiv.org/abs/2006.07397 (accessed on 25 May 2024).
  43. Li, Y.; Yang, X.; Sun, P.; Qi, H.; Lyu, S. Celeb-df: A large-scale challenging dataset for deepfake forensics. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 3207–3216. Available online: http://openaccess.thecvf.com/content_CVPR_2020/html/Li_Celeb-DF_A_Large-Scale_Challenging_Dataset_for_DeepFake_Forensics_CVPR_2020_paper.html (accessed on 23 February 2024).
  44. Lu, X.; Firoozeh Abolhasani Zadeh, Y.A. Deep Learning-Based Classification for Melanoma Detection Using XceptionNet. J. Healthc. Eng. 2022, 2022, e2196096. [Google Scholar] [CrossRef] [PubMed]
  45. Remi Cadene/Pretrained-Models. Pytorch 2024. Available online: https://github.com/Cadene/pretrained-models.pytorch (accessed on 25 May 2024).
  46. Tan, M.; Le, Q. Efficientnet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; PMLR: New York, NY, USA, 2019; pp. 6105–6114. Available online: http://proceedings.mlr.press/v97/tan19a.html?ref=jina-ai-gmbh.ghost.io (accessed on 25 May 2024).
  47. GitHub—AITTSMD/MTCNN-Tensorflow: Reproduce MTCNN Using Tensorflow. Available online: https://github.com/AITTSMD/MTCNN-Tensorflow (accessed on 25 May 2024).
  48. Xiang, J.; Zhu, G. Joint face detection and facial expression recognition with MTCNN. In Proceedings of the 2017 4th International Conference on Information Science and Control Engineering (ICISCE), Changsha, China, 21–23 July 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 424–427. Available online: https://ieeexplore.ieee.org/abstract/document/8110322/ (accessed on 25 May 2024).
  49. Geetha, A.; Prakash, N. Classification of Glaucoma in Retinal Images Using EfficientnetB4 Deep Learning Model. CSSE 2022, 43, 1041–1055. [Google Scholar] [CrossRef]
  50. Fallah Tehrani, A.; Cheng, W.; Dembczyński, K.; Hüllermeier, E. Learning monotone nonlinear models using the Choquet integral. Mach Learn. 2012, 89, 183–211. [Google Scholar] [CrossRef]
  51. Goztepe, K. A Study on OS Selection Using ANP Based Choquet Integral in Terms of Cyber Threats. IJISS 2012, 1, 67–78. [Google Scholar]
  52. Wei, P.; Ball, J.E.; Anderson, D.T. Fusion of an Ensemble of Augmented Image Detectors for Robust Object Detection. Sensors 2018, 18, 894. [Google Scholar] [CrossRef]
  53. Lu, Y.; Ebrahimi, T. Assessment framework for deepfake detection in real-world situations. J. Image Video Process. 2024, 2024, 6. [Google Scholar] [CrossRef]
  54. Lin, H.; Luo, W.; Wei, K.; Liu, M. Improved Xception with Dual Attention Mechanism and Feature Fusion for Face Forgery Detection. In Proceedings of the 2022 4th International Conference on Data Intelligence and Security (ICDIS), Shenzhen, China, 24–26 August 2022; pp. 208–212. [Google Scholar] [CrossRef]
  55. Kayadibi, I.; Güraksın, G.E.; Ergün, U.; Özmen Süzme, N. An Eye State Recognition System Using Transfer Learning: AlexNet-Based Deep Convolutional Neural Network. Int. J. Comput. Intell. Syst. 2022, 15, 49. [Google Scholar] [CrossRef]
  56. A Hybrid CNN-LSTM Model for Video Deepfake Detection by Leveraging Optical Flow Features. IEEE Conference Publication. Available online: https://ieeexplore.ieee.org/abstract/document/9892905 (accessed on 25 May 2024).
  57. Kshirsagar, M.; Suratkar, S.; Kazi, F. Deepfake video detection methods using deep neural networks. In Proceedings of the 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), Kannur, India, 11–12 August 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 27–34. Available online: https://ieeexplore.ieee.org/abstract/document/9917701/ (accessed on 13 August 2024).
  58. Rana, M.S.; Sung, A.H. Deepfakestack: A deep ensemble-based learning technique for deepfake detection. In Proceedings of the 2020 7th IEEE International Conference on Cyber Security and Cloud Computing (CSCloud)/2020 6th IEEE International Conference on Edge Computing and Scalable Cloud (EdgeCom), New York, NY, USA, 1–3 August 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 70–75. Available online: https://ieeexplore.ieee.org/abstract/document/9171002/ (accessed on 13 August 2024).
  59. Bonettini, N.; Cannas, E.D.; Mandelli, S.; Bondi, L.; Bestagini, P.; Tubaro, S. Video face manipulation detection through ensemble of cnns. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 5012–5019. Available online: https://ieeexplore.ieee.org/abstract/document/9412711/ (accessed on 13 August 2024).
  60. Silva, S.H.; Bethany, M.; Votto, A.M.; Scarff, I.H.; Beebe, N.; Najafirad, P. Deepfake forensics analysis: An explainable hierarchical ensemble of weakly supervised models. Forensic Sci. Int. Synerg. 2022, 4, 100217. [Google Scholar] [CrossRef] [PubMed]
  61. Concas, S.; La Cava, S.M.; Orrù, G.; Cuccu, C.; Gao, J.; Feng, X.; Marcialis, G.L.; Roli, F. Analysis of score-level fusion rules for deepfake detection. Appl. Sci. 2022, 12, 7365. [Google Scholar] [CrossRef]
Figure 1. General structure of autoencoders used to generate deepfake images [12].
Figure 2. Generating deepfakes with GAN-based methods [19].
Figure 3. General diagram of the proposed method.
Figure 4. Calculating fuzzy measures (g values) by genetic algorithm.
Figure 5. Block diagram for the steps in each deepfake detection algorithm (DDA).
Figure 6. Architecture of the Xception model [44].
Figure 7. Architecture of the EfficientNetB4 model [49].
Figure 8. Area union for the Choquet fuzzy integral: x = classifier/expert, g = Choquet fuzzy measures, F = the score of classifier/expert x [50].
Figure 9. Calculating combinational g values for the classical Choquet fuzzy integral; x = expert/classifier, g = Choquet fuzzy measure values. The sample values are shown in black [52].
Figure 10. Confusion matrices for the custom dataset: (a) DDA 1, (b) DDA 2, (c) DDA 3, and (d) proposed Choquet fuzzy integral.
Table 1. Comparisons of deepfake generation methods. Method types: face reenactment (FR), identity synthesis (IS), face swap (FS), synthesis of human poses (SP). Sources and targets: portrait (P), body (G), sound (S), video (V). Capability marks cover mouth, expression, pose, gaze, and body; +: capable, *: partially capable.

| Year | Method Name | Neural Network Model | Method Type | Capabilities | Source | Target | Dataset |
|---|---|---|---|---|---|---|---|
| 2017 | FT-GAN | LSTM, CNN, GAN | FR | + + + * | P | P | SCUText2face |
| 2018 | Recycle-GAN | GAN | IS | + + + | P | - | Viper |
| 2018 | DeepFaceLab | GAN | FS | + + + | PV | - | FaceForensics++ |
| 2017 | Syth. Obama | LSTM | FR | + + | S | PV | Obama video |
| 2018 | ReenactGAN | GAN | FR | + + + | P | P | Celebrity Video/Boundary Estimation Dataset, DISFA |
| 2018 | Vid2vid | Autoencoders | SP | + + + * + | PV | - | YouTube dancing videos, street-scene videos, face videos |
| 2019 | Everybody DN | GAN | SP | + + + | G | - | YouTube short videos |
| 2019 | Few-shot Vid2Vid | - | SP | + + + * + | PG | PG | YouTube dancing videos, street-scene videos, face videos |
| 2018 | paGAN | GAN | FR (real-time) | + + + + | P | P | Chicago Face Dataset, Compound Facial Expressions (CFE), Radbound Faces |
| 2018 | X2Face | U-Net, pix2pix | FR | + + + | P | P | VoxCeleb video |
| 2018 | FaceID-GAN | GAN | FR | + + + | P | P | CASIA-WebFace, CelebA, IJB-A, LFW |
| 2019 | wg-GAN | GAN | FR (real-time) | + + | P | P | MMI Facial Expression, MUG |
| 2019 | FSGAN | GAN, CNN, U-Net, Pix2pixHD | FR/FS | + + + | P | P | IJB-C |
| 2019 | FaceSwapNet | pix2pix | FR | + + | P | P | RaFD |
| 2019 | FusionNet | U-Net | FR | + + + * | P | P | EOTT, CelebA, RAF-DB, FFHQ |
| 2019 | Speech2Vid | CNN | FR | + | S | PV | VGG Face, VoxCeleb2, LRS2 |
| 2020 | MarioNETte | Autoencoders | FR | + + + | P | P | VoxCeleb1, CelebV |
| 2016 | Face2Face | Graphical based | FS (real-time) | | P | P | YouTube |
| 2018 | FaceSwap GAN | GAN | FS | | - | P | YouTube |
| 2018 | DeepFaceLab | GAN, TrueFace | FS | | - | P | FaceForensics++ |
| 2017 | Fast Face Swap | - | FS | | P | P | FaceForensics |
| 2018 | RSGAN | GAN, separator networks | FS | | P | P | CelebA |
| 2019 | FS Face Trans. | GAN | FS | | P | P | CelebA |
| 2019 | FaceShifter | GAN | FS | | P | P | FaceForensics++ |
Table 2. Comparison of deepfake detection methods in the literature. Reported values are the ACC/EER/AUC scores given on the FF-DF, UADFV, Celeb-DF, DF-TIMIT, DFD, DFDC, and other datasets.

| Method | Model | Reported Results |
|---|---|---|
| Eye blinking [23] | LRCN (LSTM + CNN) | 0.99 |
| Using space–temporal properties [25] | RCN (CNN + RNN) | 96.9 |
| In-frame and temporal inconsistencies [26] | CNN + RNN | 97.1 |
| Face warping artifacts [27] | CNN | 93.0 / 97.7 / 64.6 / 99.9 / 93.0 / 75.5 (FF-DF / UADFV / Celeb-DF / DF-TIMIT / DFD / DFDC) |
| MesoNet [6] | CNN | 95.23 / 82.1 / 53.6 / 87.8 |
| Capsule forensics [28] | Capsule CNN | 99.33 |
| Head poses [29] | SVM | 47.3 / 89.0 / 54.6 / 53.2 / 55.9 (FF-DF / UADFV / Celeb-DF / DF-TIMIT / DFDC) |
| Face manipulation [30] | CNN | 98.4 |
| Multi-task learning [31] | CNN + DE | 36.5 / 62.2 |
| Voice inconsistency [32] | LSTM | 24.74 |
| Voice inconsistency [33] | LSTM + DNN | 17.6 |
| Audio features [34] | DNN | 1.26 |
| Video and audio features [35] | LSTM | 100 / 99.6 |
| XceptionNet [36] | CNN | 99.26 / 38.7 / 56.7 |
| Forgery detection by body analysis [37] | DNN-LSTM | 94.39 |
| Head pose estimation [24] | LBP + CNN | 91.70 |
| Time and space analysis [25] | CNN + RNN | 96.90 |
| Using a f2f counterfeiting technique [38] | CNN + LSTM | 95 |
| Use of visual artifacts [39] | MLP | 84 / 84 |
| Optical flow [40] | CNN | 81.61 |
Table 3. Comparison of datasets.

| Dataset | Real Videos | Real Frames | Fake Videos | Fake Frames | Year | Resolution | Method | Source | Actors |
|---|---|---|---|---|---|---|---|---|---|
| DeepFake-TIMIT-LQ (DF-TIMIT) [7] | 320 | 34.0 k | 320 | 34.0 k | 2018 | 64 | faceswap-GAN | VidTIMIT dataset | 32 |
| DeepFake-TIMIT-HQ [7] | 320 | 34.0 k | 320 | 34.0 k | 2018 | 128 | faceswap-GAN | VidTIMIT dataset | 32 |
| FaceForensics++ (FF-DF) [41] | 1000 | 509.9 k | 1000 | 509.9 k | 2019 | 480, 720, 1080 | Deepfakes, Face2Face, FaceSwap, NeuralTextures | YouTube | 977 |
| Facebook DeepFake Detection Challenge Dataset (DFDC) [42] | 1131 | 488.4 k | 4119 | 1783.3 k | 2019 | 480, 720, 1080 | 8 different methods: FSGAN, StyleGAN, MM/NN, DF-256 | Volunteer actors | 960 |
| CELEB-DF [43] | 590 | 225.4 k | 5639 | 2116.8 k | 2019 | 256 | DeepFake synthesis algorithm | YouTube | 59 |
Table 4. Fuzzy measure values.

| Set | Fuzzy Measure Value | Set | Fuzzy Measure Value |
|---|---|---|---|
| g({}) | 0 | g({x1, x2}) | 0.7173912 |
| g({x1}) | 0.0833336 | g({x1, x3}) | 0.9444430 |
| g({x2}) | 0.0175454 | g({x2, x3}) | 0.8823529 |
| g({x3}) | 0.3000000 | g({x1, x2, x3}) | 1.0 |
Table 5. Accuracy values and F1 scores of the algorithms on the custom dataset.

| Dataset | DDA 1 ACC | DDA 1 F1 | DDA 2 ACC | DDA 2 F1 | DDA 3 ACC | DDA 3 F1 | Proposed ACC | Proposed F1 |
|---|---|---|---|---|---|---|---|---|
| Custom dataset | 65.08 | 71.67 | 56.91 | 65.56 | 72.91 | 84.15 | 75.91 | 85.28 |
Table 6. Elapsed times for inference.

| Resolution | Length (s) | FPS | File Size | DDA 1 (s) | DDA 2 (s) | DDA 3 (s) | Proposed Method (s) |
|---|---|---|---|---|---|---|---|
| 1080 × 1920 | 10.02 | 30 | 13 MB | 2.42 | 4.13 | 2.57 | 4.52 |
| 942 × 500 | 15.47 | 30 | 4.4 MB | 3.25 | 5.54 | 3.21 | 5.95 |
| 432 × 500 | 13.63 | 30 | 592 kB | 2.73 | 4.86 | 2.82 | 5.03 |
| 942 × 500 | 10.47 | 30 | 1.68 MB | 2.20 | 3.61 | 2.28 | 3.98 |
| 512 × 384 | 5.40 | 25 | 290 kB | 0.82 | 1.52 | 0.85 | 1.80 |
Table 7. Comparison of the proposed method with other methods.

| Study | Method | Dataset | AUC |
|---|---|---|---|
| DDA (Model) 1 | XceptionNet | DFDC-P | 78.64 |
| | | CELEB-DF | 77.45 |
| | | FF-DF | 95.50 |
| | | DF-TIMIT | 73.75 |
| DDA (Model) 2 | EfficientNet + XceptionNet | DFDC-P | 86.5 |
| | | CELEB-DF | 75.2 |
| | | FF-DF | 95.3 |
| | | DF-TIMIT | 74.1 |
| DDA (Model) 3 | EfficientNet | DFDC-P | 87.98 |
| | | CELEB-DF | 72.09 |
| | | FF-DF | 93.4 |
| | | DF-TIMIT | 63.47 |
| [39] | MLP | FF-DF | 86.6 |
| [39] | LogReg | FF-DF | 82.3 |
| [28] | iCaps-Dfake | CELEB-DF | 96.9 |
| | | DFDC-P | 87.8 |
| [53] | CapsuleNet | FF-DF | 94.52 |
| | | CELEB-DF | 99.14 |
| [54] | XceptionNet variant | FF-DF (LQ) | 98.1 |
| [55] | DCNN | ZJU | 99.4 |
| | | CEW | 99.69 |
| [56] | OF + RNN + CNN combinations | FF-DF | 91.0 |
| | | DFDC | 68.0 |
| | | CELEB-DF | 83.0 |
| [57] | Combination of 6 different ResNet versions (50, 50V2, 101, 101V2, 152, and 152V2) | Google + FF-DF | 95.5 |
| [58] | DFC—combination of 7 different classifiers (XceptionNet, InceptionV3, Inception v2, MobileNet, ResNet101, DenseNet121, and DenseNet169) | FF-DF | 100.0 |
| [59] | Combination of EfficientNet B4, B4ST, B4Att, B4AttST | FF-DF | 94.4 |
| | | DFDC | 87.8 |
| [60] | Combination of 3 different classifiers (Xception, Xception with Attention, and EfficientNetB3 with Attention) | DFDC | 97.5 |
| | | CELEB-DF | 98.4 |
| [61] | MLP—combination of 6 different classifiers (EfficientNet AttB4ST, AttB4, B4ST, B4, Xception, and ResNet) | FF-DF | 98.4 |
| Proposed Method | Choquet fuzzy integral | DFDC-P | 95.3 |
| | | CELEB-DF | 92.6 |
| | | FF-DF | 99.8 |
| | | DF-TIMIT | 89.7 |
