Article

A Novel Adversarial Example Detection Method Based on Frequency Domain Reconstruction for Image Sensors

1 Information Engineering College, Henan University of Science and Technology, Luoyang 471023, China
2 Henan International Joint Laboratory of Cyberspace Security Applications, Henan University of Science and Technology, Luoyang 471023, China
3 Henan Intelligent Manufacturing Big Data Development Innovation Laboratory, Henan University of Science and Technology, Luoyang 471023, China
* Author to whom correspondence should be addressed.
Sensors 2024, 24(17), 5507; https://doi.org/10.3390/s24175507
Submission received: 29 July 2024 / Revised: 22 August 2024 / Accepted: 23 August 2024 / Published: 25 August 2024
(This article belongs to the Section Sensing and Imaging)

Abstract

Convolutional neural networks (CNNs) have been extensively used in numerous remote sensing image detection tasks owing to their exceptional performance. Nevertheless, CNNs are often vulnerable to adversarial examples, limiting their use in safety-critical scenarios. Recently, how to efficiently detect adversarial examples and improve the robustness of CNNs has drawn considerable attention. Existing adversarial example detection methods require modifying the CNN, which not only affects model performance but also greatly increases the training cost. To solve these problems, this study proposes a detection algorithm for adversarial examples that does not require modifying the CNN model and simultaneously retains the classification accuracy on normal examples. Specifically, we design a method to detect adversarial examples using frequency domain reconstruction. After converting the input adversarial examples into the frequency domain by the Fourier transform, the adversarial disturbance introduced by adversarial attacks can be eliminated by modifying the frequency components of the example. The inverse Fourier transform is then used to maximize the recovery of the original example. Firstly, we train a CNN to reconstruct input examples. Then, we apply the Fourier transform, convolution operations, and the inverse Fourier transform to the features of the input examples to automatically filter out adversarial frequencies. We refer to the proposed method as FDR (frequency domain reconstruction): it removes adversarial interference by converting input samples into the frequency domain and reconstructing them back into the spatial domain to restore the image. In addition, we introduce gradient masking into the proposed FDR method to enhance the detection accuracy for complex adversarial examples. We conduct extensive experiments on five mainstream adversarial attacks on three benchmark datasets, and the experimental results show that FDR outperforms state-of-the-art solutions in detecting adversarial examples. Additionally, FDR does not require any modification of the detector and can be integrated with other adversarial example detection methods and installed in sensing devices to ensure detection safety.

1. Introduction

Deep learning techniques [1,2], have achieved significant success in multiple fields including computer vision [3], autonomous driving [4], motion tracking [5], and speech recognition [6]. However, Szegedy et al. [7] have identified that these networks are susceptible to adversarial examples (AEs), which poses a substantial barrier to their broader adoption. Adversarial examples are created when an attacker introduces subtle, carefully crafted perturbations to a correctly classified image, leading the CNNs to classify it incorrectly with high confidence. With the widespread adoption of deep learning technology, developing effective defenses against such adversarial attacks is vital for enhancing the security of neural network models [8,9].
As CNNs can be susceptible to adversarial attacks, extensive research has been conducted on such attacks [10,11,12,13]. These attack methods include FGSM [10], the Carlini and Wagner (C&W) [11] method, the Basic Iterative Method (BIM) [12], etc. Given that adversarial attacks typically involve the addition of perturbations to the input, causing misclassification errors in deep learning models, strategic input transformations and model reconstructions can serve as defenses against these attacks [14,15,16].
Adversarial defense strategies can be primarily classified into two categories: model-based defenses [17,18] and input sample-based defenses [19,20]. Model-based defenses modify the parameters of CNNs for better robustness. Employed techniques include adversarial training [21], gradient regularization [22], and model distillation [23]. To help the model adapt to sample sets with varied attributes, Goodfellow et al. [10] proposed adversarial training, wherein both adversarial and normal samples are applied in model training to improve classification accuracy. Adversarially trained DNNs classify clean examples with considerably lower accuracy than normally trained DNNs, limiting their large-scale use in real-world scenarios. To deal with the existing problems, Papernot et al. [23] obtained a distillation model with an architecture similar to the original model by training with input samples and their probability distributions. The method trains the teacher network at a distillation temperature of T to acquire prior knowledge. This prior knowledge is then transferred to the student network, which is also trained at the same distillation temperature T, to produce the final output. The defensive distillation approach enhances model robustness without requiring smoothing of the neural network’s output. Input sample-based defense involves designing algorithms to remove perturbations or to detect adversarial examples by analyzing their characteristics. Techniques in this domain include data feature squeezing [24], adversarial feature analysis [25,26], and input reconstruction [27]. While model-based defense techniques bolster model robustness, they may fail to recognize adversarial samples and potentially diminish accuracy on standard inputs. Defense based on input samples does not affect the classification accuracy of the model on normal samples. After adversarial examples are detected, we can perform pre-processing operations such as denoising to neutralize the disturbance and make the defense more fine-grained.
Although the current detection methods have made progress, there are still problems of poor detection performance. For instance, setting thresholds for methods that rely on data feature squeezing involves considerable uncertainty, leading to significant deviations in practical applications. Utilizing the Conditional Pixel CNN [28] model, multiple input images are reconstructed to discern differences and detect adversarial samples. Although this model demonstrates robust generalization capabilities, the image generation process, which sequentially constructs each pixel, results in prolonged training and inference times. Moreover, detection strategies based on Local Intrinsic Dimensionality (LID) [29] necessitate an acute understanding of the targeted model’s hyperparameters and structures, which can be challenging to ascertain. Such strategies also exhibit limited efficacy in identifying adversarial examples produced by alternative attack models.
To overcome the existing issues, this study attempts to introduce a novel detection method based on frequency domain reconfiguration to effectively prevent the most advanced adversarial attacks. Our approach innovatively utilizes frequency domain analysis as a distinct measure. We observe that most adversarial perturbations manifest as high-frequency signals or are superimposed onto high-frequency components. Neural networks, during training, tend to overfit to low-frequency signals, making them vulnerable to being deceived by the high-frequency signals within adversarial samples. To counteract this, we apply neural networks in conjunction with discrete Fourier transform to decompose the input image into its frequency domain components, followed by convolutional filtering to attenuate noise. Subsequently, we reconstruct the denoised samples back into the spatial domain to restore the image and eliminate adversarial distortions. Additionally, we incorporate gradient masking to strengthen the intrinsic link between prediction confidence and the authenticity of the original image. Finally, a consistency check is implemented to identify AEs. In summary, the following contributions are presented:
  • This study indicates that adversarial examples are easily eliminated in the frequency domain, while normal examples are often immune to this operation in the frequency domain, providing a novel research direction for AE detection.
  • This study proposes a new detection method that distinguishes AEs from normal samples through training a frequency domain reconstructor. Specifically, we transform adversarial examples into the frequency domain to eliminate adversarial attacks and restore images.
  • Gradient masking is introduced into the proposed FDR method. This method not only strengthens the internal relationship between the prediction confidence and the original image authenticity but also enhances the recognition accuracy for complex examples.

2. Related Work

The main goal of this paper is to improve the accuracy of AE detection. Therefore, the first part of this section introduces the research status in the field of generating AEs. The current popular adversarial attack algorithms are analyzed and reviewed. We use these attack algorithms to generate adversarial examples for detection. The second part analyzes and summarizes adversarial example detection methods from recent years and identifies the key problems to be overcome.

2.1. Adversarial Examples Generation

Szegedy et al. [7] first found that convolutional neural networks have blind spots in recognizing adversarial examples. This refers to inputs that have been perturbed in a subtle manner, imperceptible to humans but leading a target model to make high-confidence incorrect predictions. Researchers refer to the process of generating such subtle perturbations as adversarial attacks. The images with perturbations produced by adversarial attacks are referred to as adversarial examples (AEs) [30]. Based on the attacker’s knowledge of the target network’s architecture or parameters, adversarial attacks are divided into two types: white-box and black-box. White-box attacks require the attacker to have prior knowledge of the target network, including its hidden layers, hyperparameters, gradient information, and the outputs from each layer. The principal methodologies for implementing such attacks include optimization-based attacks, gradient-based attacks, and generation-based attacks. Black-box attacks are conducted without detailed knowledge of the target network. Attackers usually indirectly obtain model information based on output labels and confidence levels or exploit the transferability of AEs to launch attacks. The primary implementation algorithms are transfer-based attacks and query-based attacks. This section introduces several representative adversarial attack methods. The AEs generated by these methods will be used in this study for experimental testing.
Goodfellow et al. [10] initially put forward a non-targeted attack algorithm called the fast gradient sign method (FGSM), which generates AEs under an $L_\infty$ norm constraint on the original clean samples. The principle of FGSM is to add a disturbance along the direction of the loss gradient with respect to the input, which maximizes the model’s loss and thus reduces the classification confidence. FGSM represents a one-step attack process conducted within a white-box setting. The gradient of the classification model with respect to the input x is calculated. The attack direction is determined with the use of a sign function, which is then scaled by a specified step size to compute the adversarial perturbation. This perturbation is added to the initial input x to obtain the adversarial sample $\hat{x}$ under the FGSM attack. FGSM can generate adversarial examples quickly, but its attack success rate depends on the selected hyperparameters. The generated AEs can be written as follows:
$$\hat{x} = x + \alpha \cdot \mathrm{sign}\left( \nabla_x L(x, y) \right)$$
The FGSM method does not require altering the network’s model structure or specific pixels in the input sample. Instead, it focuses on the direction of the gradient perturbations, calculating the gradient of each input image to modify its structure.
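For concreteness, the following is a minimal PyTorch sketch of the FGSM step defined above; the function name fgsm_attack, the model handle, and the default step size are illustrative assumptions, not the authors' code.

```python
# A minimal FGSM sketch, assuming a pretrained classifier `model`,
# an input batch `x` with pixels in [0, 1], true labels `y`, and step size `alpha`.
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, alpha=0.3):
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)      # L(x, y)
    loss.backward()                          # gradient of the loss w.r.t. the input
    x_adv = x + alpha * x.grad.sign()        # x_hat = x + alpha * sign(grad)
    return x_adv.clamp(0.0, 1.0).detach()    # keep pixels in a valid range
```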
Carlini and Wagner [11] put forward the C&W attack method, which takes into account both high attack accuracy and low disturbance. The algorithm optimizes perturbations by limiting the $L_0$, $L_2$, or $L_\infty$ norms, and its objective function combines maximizing the loss function with minimizing the perturbation. In the C&W algorithm, AEs can be acquired by solving the following optimization problem:
$$\min_{\delta} \; D(x, t) + c \cdot f(t), \quad t = x + \delta, \quad \mathrm{s.t.} \;\; t \in [0, 1]^n$$
where $D(\cdot)$ is a distance measurement function, $x$ is the normal sample, $t$ is the adversarial sample, and $\delta$ denotes the disturbance added in each iteration. The parameter $c$ controls the balance between the magnitude of the added disturbance and the confidence of the incorrect classification. $f(\cdot)$ is the designed objective function:
$$f(x) = \max\left( \max\{ Z(x)_i : i \neq t \} - Z(x)_t, \; -k \right)$$
where $Z(\cdot)$ denotes the logits of the layer preceding the softmax, $i$ is the label category, $t$ is the target label of the attack, and the parameter $k$ controls the confidence with which the adversarial sample is misclassified. The larger the value of $k$, the higher the success rate of the adversarial attack. The method achieves a strong attack effect but at a high computational cost.
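As a sketch of the objective term $f(\cdot)$ above (variable names are ours, not the authors' implementation), the margin between the largest non-target logit and the target logit can be computed as follows.

```python
# Hedged sketch of the C&W margin term: f = max(max_{i != t} Z_i - Z_t, -k).
# `logits` are assumed to be the pre-softmax values Z(x), `target` the attack class t.
import torch

def cw_margin_loss(logits, target, k=0.0):
    # Z(x)_t for the target class of each sample
    target_logit = logits.gather(1, target.unsqueeze(1)).squeeze(1)
    # max_{i != t} Z(x)_i: mask out the target column before taking the max
    masked = logits.clone()
    masked.scatter_(1, target.unsqueeze(1), float("-inf"))
    other_max = masked.max(dim=1).values
    return torch.clamp(other_max - target_logit, min=-k)
```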
Kurakin et al. [12] introduced the BIM method, which improves on FGSM through multiple iterations with small step sizes. BIM follows the gradient direction of the input loss function, performs FGSM with smaller steps, and clips the updated adversarial example into a valid range. The AEs generated by the BIM method can be described as follows:
$$x_{i+1} = \mathrm{Clip}_{\varepsilon}\left( x_i + \alpha \cdot \mathrm{sign}\left( \nabla_x L(x_i, y) \right) \right)$$
where $i$ is the iteration index, and $\mathrm{Clip}_{\varepsilon}$ is a clipping function that limits the perturbation to the $\varepsilon$-neighborhood of the original pixels. The BIM method is an iterative attack based on FGSM in which the pixel values are adjusted by $\alpha$ at each iteration. While it is more effective than FGSM in white-box attacks, it is also more computationally intensive.
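A minimal sketch of the iteration above is given below, reusing the same assumed naming as the FGSM sketch: small signed-gradient steps followed by clipping into the $\varepsilon$-ball of the original image.

```python
# BIM sketch (assumed names): repeated FGSM steps plus projection into the eps-ball.
import torch
import torch.nn.functional as F

def bim_attack(model, x, y, eps=0.3, alpha=0.01, iters=10):
    x_adv = x.clone().detach()
    for _ in range(iters):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Clip_eps: project back into the eps-neighbourhood of x, then into [0, 1]
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0.0, 1.0)
    return x_adv
```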
Moosavi-Dezfooli et al. [31] presented the DeepFool method, which crafts perturbations based on the minimum distance between normal samples and the model’s decision boundary. After several iterations, the normal examples may cross the decision boundary, thereby achieving the attack’s objective of causing the target model to misclassify. Compared to the FGSM attack, the DeepFool attack requires adding only small perturbations to create highly effective adversarial examples. However, its drawback lies in the higher computational cost, as it requires gradually calculating the magnitude of the perturbations. Papernot et al. [32] introduced an effective attack method termed the Jacobian-based Saliency Map Attack (JSMA). Initially, the Jacobian matrix of the target model is obtained by calculating the forward derivative. Subsequently, the Jacobian matrix is utilized to construct the adversarial saliency map, reflecting the extent to which input features influence the output of the target network. Ultimately, the pixel with the highest adversarial saliency value is selected to craft the perturbation.

2.2. Detection of Adversarial Examples

Adversarial example detection methods [33,34,35] are usually based on the following setting: given a $K$-class neural network classifier with an original training set $D = \{x_i \in \mathbb{R}^d\}_{i=1}^{N}$, we construct an adversarial sample set $D' = \{x'_j \in \mathbb{R}^d\}_{j=1}^{N'}$ and design a detection method to distinguish $D$ from $D'$. In recent years, there has been much meaningful research in the field of adversarial defense: a GAN was used to defend against adversarial attacks in [36]; an autoencoder was employed for adversarial detection [37]; and a diffusion model was used for adversarial purification [38]. Currently, methods for detecting adversarial examples predominantly encompass those based on data feature transformation, adversarial feature distribution statistics, and neural network characteristics. Methods based on data feature transformation typically investigate the discrepancies between normal and adversarial examples through noise reduction and reconstruction of input data. These methods implement adversarial example detection via thresholding or prediction inconsistency analysis. Methods based on adversarial feature statistics construct detectors by quantifying the example’s features at various levels. Detection methods leveraging neural network characteristics primarily build detectors by monitoring the example’s performance or behavior within the neural network, thus utilizing the network’s inherent properties [39,40,41].
For data feature transformation-based methods, Xu et al. [24] proposed two feature compression methods. They pointed out that CNNs do not require a large input feature space, and that a larger feature space makes it easier for malicious attackers to generate adversarial examples. To address this, feature squeezing techniques, such as reducing image color depth and applying smoothing, are employed to create squeezed images from the initial input. Both the squeezed and original images are then individually predicted. AEs are detected by measuring the distance between the prediction vectors of the original input and each squeezed image. If any of these distances exceed a predefined threshold, the original input is classified as adversarial. While this method achieves good detection success rates against various adversarial attacks, its effectiveness relies heavily on the quality of the feature squeezing module, and setting the hyperparameter threshold remains a challenge. Gupta et al. [42] developed the CIIDefence method, which reconstructs the image regions that most significantly influence the current classification outcome. The adversarial examples are then identified by analyzing the classification results of the original and reconstructed examples. CIIDefence does not require retraining or modification of the classifier. Liang et al. [43] proposed scalar quantization and smooth spatial filtering to reduce noise. Using image entropy as the metric, adaptive noise reduction is performed for different types of images. Adversarial examples are detected by judging the change in the predicted label of the examples before and after denoising. This defense method can protect against adversarial attacks such as FGSM and C&W, but its performance is poor against larger adversarial disturbances.
In the context of methods based on adversarial feature distribution statistics, Hendrycks et al. [44] used statistical analysis to observe differences between normal and adversarial examples in the variance of the input coefficients after principal component whitening and in the softmax distribution. Ma et al. [29] suggested using Local Intrinsic Dimensionality to characterize the internal dimensions of disturbed regions and to distinguish adversarial examples by analyzing the distance distribution among examples and their neighbors. Zhang et al. [45] developed an adversarial example detection method that leverages label sequence differences. The algorithm employs a reference value to mask the pixels and utilizes a random forest classifier to compute label sequences and LSD (label sequence difference) scores for both the original and transformed examples. These metrics are then applied to train the classifier to identify adversarial examples. Feinman et al. [46] investigated the disparities in data distribution subspaces between normal and adversarial examples, applying kernel density estimation to identify adversarial examples distant from a manifold and Bayesian uncertainty estimation to recognize adversarial samples in proximity to a manifold. These approaches highlight the diverse strategies for detecting subtle manipulations in data, which are essential for bolstering the robustness of machine learning models against adversarial threats. However, they share a common defect: when attackers understand the detection principle of the target model, they can adjust their attack strategy accordingly to bypass the detection mechanism and mount a successful attack.
Regarding methods based on neural network characteristics, Lu et al. [47] proposed the SafetyNet detection algorithm, where discrete codes derived from the input examples in the last ReLU layer are utilized as inputs for the support vector machine, aiming to construct an adversarial examples detector. Carrara et al. [48] encoded the activation position within the network for specific examples in the DNN feature space and proposed a feature distance space detection algorithm. The activation position of AEs deviates from the reference position of the normal example, enabling detection of the adversarial example according to the difference in the activation trajectory of the input example. This suite of approaches highlights the diversity in strategies employed to identify and counter adversarial attacks, leveraging the unique characteristics and behaviors within neural network models. Ma et al. [49] analyzed the internal structure of CNNs under various types of attacks and identified channels through which adversarial examples exploit vulnerabilities: the source channel and the activation distribution channel. They proposed a detection algorithm based on CNN channel invariants, known as Neural-network Invariant Checking (NIC). When testing input samples, NIC examines potentially adversarial inputs by running them through all invariant models to obtain independent predictions. These predictions indicate whether the input might cause a state that violates the invariant distribution. The final outcome of the NIC detection method is a joint decision based on these predictions. However, extracting invariants involves determining probability distributions, which requires a large number of models, leading to significant additional overhead. Additionally, the deployment of this detection framework is relatively complex.
Although current adversarial example detection methods can identify a subset of AEs generated by adversarial attacks, there are still some key issues to overcome. These problems include the low detection accuracy of adversarial examples, insufficient recognition ability of emerging adversarial examples, and how to eliminate adversarial interference. These issues underscore the need for ongoing research and development to enhance the robustness and adaptability of detection methodologies against a wider array of adversarial tactics. This paper focuses on these problems to enhance the detection accuracy of multi-class adversarial examples.

3. Adversarial Examples Detection Based on Frequency Domain Reconstruction

The fundamental principle of the proposed method is to consider disturbances as a form of noise leading to image distortion. Our strategy involves frequency domain reconstruction and gradient masking, aiming to significantly reduce the effects of adversarial noise. By introducing a subtle perturbation $\Delta x$ to the input example $x$ to obtain a new input $x'$, the DNN model can be made to give an incorrect output with high confidence, and $x'$ is called an adversarial example. The adversarial example $x'$ satisfies $f(x') \neq f(x)$ and $\|\Delta x\| \leq \varepsilon$, where $\Delta x \in \mathbb{R}^n$ and $n$ denotes the dimension of the vector. The smaller the magnitude of the added perturbation, the less perceptible it becomes to the human eye. When $x_{adv}$ is fed into a deep neural network (DNN), the output at each layer changes. To illustrate, taking the first layer as an example, $\varphi_1(x) = w_1(x) \cdot x + b_1$, and the weights of this layer can be denoted as $w_1(x + \Delta x) = w_1(x) + \Delta w$. After inputting the adversarial example, the representation of the neural network’s first layer becomes:
$$\varphi_1(x_{adv}) = w_1(x + \Delta x) \cdot (x + \Delta x) + b_1 = w_1(x) \cdot x + w_1(x) \cdot \Delta x + \Delta w \cdot x + \Delta w \cdot \Delta x + b_1 = \varphi_1(x) + w_1(x) \cdot \Delta x + \Delta w \cdot x + \Delta w \cdot \Delta x$$
$\varphi_1(x_{adv})$ and $\varphi_1(x)$ represent the output of the first layer when the model is fed with the adversarial and original examples, respectively. From the equation, it is clear that the output of this layer is influenced by the last three terms, which are collectively denoted as $\Delta\varphi_1(x, \Delta x)$. These minor changes are amplified during forward propagation between layers and are represented by the term:
$$f(x_{adv}) = \varphi_l \circ \varphi_{l-1} \circ \cdots \circ \varphi_2 \circ \varphi_1(x_{adv}) = \varphi_l \circ \varphi_{l-1} \circ \cdots \circ \varphi_2\left( \varphi_1(x) + \Delta\varphi_1(x, \Delta x) \right)$$
The predicted label $l_{adv} = \arg\max_{l} f(x_{adv})_l$ is significantly influenced by these perturbations. Although initially subtle, the perturbations are progressively magnified in the deeper layers as they propagate through the network, eventually leading to misclassification.

3.1. Analysis of Adversarial Examples in the Frequency Domain

Song et al. [50] identified that a predominant portion of adversarial perturbations either constitute high-frequency signals or are superimposed onto high-frequency signals. This distribution bias originates from the propensity of neural networks to overfit low-frequency signals during the training phase, rendering them susceptible to the deceptive nature of high-frequency signals inherent in adversarial examples. In contrast, Zhang et al. [23] advocated for a methodology that entails the extraction of high-frequency components from adversarial examples through the process of adversarial training. A Fourier transform [51] on the spatial dimensions generates global statistical information, which enhances the distinguishability of various global representations and simplifies their learning process. The current work introduces a novel Fourier transform-based learning mechanism to mitigate or attenuate adversarial perturbations.
The physical significance of the Fourier transform is to convert the image’s grayscale distribution function into its frequency distribution function. In essence, the spectrum map obtained by performing a two-dimensional Fourier transform on an image is a gradient distribution map of the image. The varying brightness of spots on the Fourier spectral map reflects the intensity of differences between a point in the image and its neighborhood, indicating the frequency magnitude at that point. Therefore, applying a Fourier transform to features provides an effective space for filtering out high-frequency perturbations and suppressing low-frequency signals, aiding in the reconstruction of clean samples from adversarial ones. As shown in Figure 1, from the frequency domain perspective, adversarial attacks add noise in the opposite direction of the neural network gradient, resulting in more bright and dark spots. Convolutional filtering and decoding reconstruction filter out or weaken the adversarial perturbations on the image. Intuitively, after the adversarial perturbations are weakened or eliminated, the transition of bright and dark spots in the image’s frequency domain becomes smoother.
In an ideal scenario, if we could reconstruct the initial input $f(m, n)$ from an adversarial sample $g(m, n)$, adversarial samples would be immediately detectable. However, due to the absence of essential knowledge regarding the noise $\mathrm{sign}(\nabla_x J(\theta, x, y))$, this is quite challenging to achieve. Therefore, we aim to eliminate perturbations in a classification sense and reconstruct the original image. That is to say, we transform $g(m, n)$ into a new image $f'(m, n)$ and then observe whether its predicted class $L(f'(m, n))$ matches $L(f(m, n))$.
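The frequency-domain claim above can be probed with a small illustrative check (under our own naming assumptions, not the authors' code): compare how much spectral energy of a clean image and of the adversarial residual lies outside a central low-frequency block.

```python
# Illustrative sketch: fraction of spectral energy outside a central low-frequency block.
# `img` is assumed to be a (C, H, W) float tensor; `cutoff` is an illustrative parameter.
import torch

def high_freq_energy(img, cutoff=0.25):
    # Shift the 2D spectrum so that low frequencies sit at the centre
    spec = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1)).abs()
    _, H, W = spec.shape
    h0, h1 = int(H * cutoff), int(H * (1 - cutoff))
    w0, w1 = int(W * cutoff), int(W * (1 - cutoff))
    low = spec[:, h0:h1, w0:w1].sum()          # energy of the central (low-frequency) block
    return (spec.sum() - low) / spec.sum()     # fraction of energy outside it

# Expectation under the analysis above: high_freq_energy(x_adv - x) is noticeably
# larger than high_freq_energy(x) for a natural image x of the same size.
```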

3.2. The Proposed Model

The Frequency Domain Reconstruction Network (FDR-Net) is elucidated in this section, encompassing its comprehensive framework, the mechanism for detecting adversarial samples, and the details of its implementation. The FDR-Net utilizes convolutional neural networks as the backbone architecture. We first encode the input image. After the encoding phase, the Fast Fourier Transform is employed to convert the image’s grayscale distribution function into its frequency distribution function. Through convolution operations, adversarial perturbations in the frequency domain are eliminated and low-frequency perturbations are suppressed. In the decoding phase, the inverse Fourier Transform and multi-layer convolution operations convert the frequency domain features back into a standard image. AEs are detected by comparing the classification results of the initial image and the image reconstructed from the frequency domain.
The methodology put forward in this work primarily encompasses three components, frequency domain reconstruction, gradient masking, and consistency verification, as illustrated in Figure 2. In the frequency domain reconstruction phase, initially, the original examples are sampled to obtain frequency domain samples based on the Discrete Fourier Series. Subsequently, convolutional filtering and concatenation are performed on the amplitude and phase components of the frequency domain samples. Thereafter, the image is transformed from the frequency domain back into the spatial domain through inverse Fourier Transformation, followed by multi-layer convolutional local filtering for noise reduction. The filtered frequency domain examples are then reverse-normalized and mapped back to the RGB color space to obtain reconstructed examples. These reconstructed examples, along with the original adversarial examples, are inputted into the model to obtain classification results, which are then subjected to consistency verification.

3.3. Frequency Domain Reconstruction

The Fourier transform is widely acknowledged for its utility in analyzing the frequency components of images. CNNs are particularly susceptible to the influence of high-frequency signals in images, which are often the carriers of misleading signals from adversarial examples. Moreover, the skewed distribution arises from the tendency of neural networks to fit low-frequency signals preferentially while insufficiently learning the distribution of high-frequency signals. Building upon this understanding, this study designs a frequency domain reconstruction mechanism to mitigate the vulnerability caused by high-frequency signals. In a Fourier spectral map, the varying brightness of the spots indicates the strength of the difference between a point in the image and its neighboring points, essentially representing the magnitude of the gradient. The low-frequency parts of an image correspond to areas with low gradients, while the high-frequency parts are the opposite: larger gradients mean brighter intensity at a point, and smaller gradients mean weaker intensity. Given an image $x \in \mathbb{R}^{H \times W \times C}$, the Fourier transform $F(\cdot)$ converts it to Fourier space, obtaining the complex component $F(x)$, which is expressed as:
$$F(x)(u, v) = \sum_{h=0}^{H-1} \sum_{w=0}^{W-1} x(h, w) \, e^{-j 2\pi \left( \frac{h}{H} u + \frac{w}{W} v \right)}$$
Both the Fourier transform and its inverse $F^{-1}(\cdot)$ can be effectively performed by the FFT and IFFT algorithms. The amplitude component $A(x)(u, v)$ and phase component $P(x)(u, v)$ are expressed as:
$$A(x)(u, v) = \sqrt{R^2(x)(u, v) + I^2(x)(u, v)}, \qquad P(x)(u, v) = \arctan\left( \frac{I(x)(u, v)}{R(x)(u, v)} \right)$$
where $R(x)(u, v)$ and $I(x)(u, v)$ represent the real and imaginary parts, respectively. Consequently, $A(x)(u, v)$ and $P(x)(u, v)$ denote the amplitude and phase of the image frequency components, respectively.
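For reference, the amplitude and phase equations translate directly to torch.fft calls; the tensor names below are ours, and the orthogonal normalization matches the setting reported later in Section 4.1.

```python
# Amplitude/phase sketch via torch.fft (assumed tensor names, dummy input).
import torch

x = torch.rand(1, 3, 32, 32)                        # dummy image batch
Fx = torch.fft.fft2(x, dim=(-2, -1), norm="ortho")  # F(x)(u, v)
R, I = Fx.real, Fx.imag                             # real and imaginary parts
amplitude = torch.sqrt(R ** 2 + I ** 2)             # A(x)(u, v)
phase = torch.atan2(I, R)                           # P(x)(u, v), full-quadrant arctan
```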
Secondly, we introduce a 1 × 1 convolutional layer to perform convolution operations on the real part $R(x)$ and the imaginary part $I(x)$, respectively, eliminating adversarial perturbations in the frequency domain and ultimately obtaining a new real part $R'(x)$ and imaginary part $I'(x)$:
$$R'(x)(u, v) = w_1\left( R(x)(u, v) \right) + b_1 = w_1\left( A(x)(f) \cos\left( P(x)(f) \right) \right) + b_1$$
$$I'(x)(u, v) = w_2\left( I(x)(u, v) \right) + b_2 = w_2\left( A(x)(f) \sin\left( P(x)(f) \right) \right) + b_2$$
Finally, we convert the processed Fourier domain features $R'(x)$ and $I'(x)$ back to the original space by employing the inverse Fourier transform:
$$x' = F^{-1}\left( \mathrm{cat}\left( R'(x), I'(x) \right) \right) = \frac{1}{HW} \sum_{u=0}^{H-1} \sum_{v=0}^{W-1} F(u, v) \, e^{j 2\pi \left( \frac{u}{H} h + \frac{v}{W} w \right)}$$
$$\hat{x} = \mathrm{conv}\left( \mathrm{conv}\left( \mathrm{conv}\left( x' \right) \right) \right)$$
The inverse Fourier transform can be applied to the image features. Through decoding operations, they are transformed into a reconstructed image. Through a series of encoding, decoding, Fourier transformation, and convolutional filtering operations, the adversarial noise contained in the reconstructed image is reduced.
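A minimal PyTorch sketch of this frequency-domain block is given below: FFT, 1 × 1 convolutions on the real and imaginary parts, inverse FFT, and a few spatial convolutions. The layer widths, activations, and strides are our assumptions, not the authors' exact architecture.

```python
# Sketch of the frequency-domain reconstruction block (assumed layer sizes).
import torch
import torch.nn as nn

class FrequencyReconstruction(nn.Module):
    def __init__(self, channels=3):
        super().__init__()
        self.conv_real = nn.Conv2d(channels, channels, kernel_size=1)  # w1, b1
        self.conv_imag = nn.Conv2d(channels, channels, kernel_size=1)  # w2, b2
        self.decode = nn.Sequential(                                   # conv(conv(conv(.)))
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        Fx = torch.fft.fft2(x, dim=(-2, -1), norm="ortho")
        R = self.conv_real(Fx.real)                      # filter the real part
        I = self.conv_imag(Fx.imag)                      # filter the imaginary part
        x_rec = torch.fft.ifft2(torch.complex(R, I),     # back to the spatial domain
                                dim=(-2, -1), norm="ortho").real
        return self.decode(x_rec)                        # spatial convolutions -> x_hat
```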

3.4. Gradient Masking

Adversarial attacks change the specific statistics of the input images, thus altering the model’s predictions. Mainstream adversarial attack algorithms seek out key pixels for minor perturbations that cause the classifier to make incorrect judgments. Thus, erasing these key pixels and reconstructing can reduce the perturbation’s effectiveness. In reality, adversarial perturbations have a specific structure. Therefore, we introduce the method of gradient masking, combined with Fourier transform, and design and experiment with image transformations that change these perturbation structures. Owing to the weak generalizability of iterative attacks, low-level image transformations such as resizing, padding, and compression might destroy the specific structure of adversarial perturbations, which can make them a good defense. If random transformations are applied, they can effectively defend against white-box attacks. The reason is that each image undergoes a random transformation, and the malicious attackers do not know the specific transformation during the process of generating adversarial noise.
Gradient masking employs a randomization layer to randomly adjust the input size, resizing original images of dimensions $W \times H \times 3$ to a new size of $W' \times H' \times 3$. It is noteworthy that $|W' - W|$ and $|H' - H|$ need to be within a reasonably small range; otherwise, the performance on non-adversarial inputs degrades noticeably. The randomly resized image is then passed to the convolution part, followed by frequency domain reconstruction and decoding to yield the reconstructed image.
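The random resizing step can be sketched as follows; the bound on the size change and the interpolation mode are illustrative assumptions (the text also mentions padding and compression as candidate transformations, which are omitted here).

```python
# Sketch of the gradient-masking randomization layer: resize to a random nearby size.
import random
import torch
import torch.nn.functional as F

def random_resize(x, max_shift=4):
    # x: (N, 3, H, W); keep |W' - W| and |H' - H| small so clean accuracy is preserved
    _, _, H, W = x.shape
    new_h = H + random.randint(-max_shift, max_shift)
    new_w = W + random.randint(-max_shift, max_shift)
    return F.interpolate(x, size=(new_h, new_w), mode="bilinear", align_corners=False)
```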

3.5. Consistency Check

After the above steps, the features reconstructed from the frequency domain are decoded and mapped back to the RGB color space, resulting in a filtered sample. Then, the reconstructed sample $x'$ and the input example $x$ (which may be adversarial) are fed into the discriminator, yielding discrimination results $C(x')$ and $C(x)$, respectively, and a consistency verification is performed (as shown in Figure 2). If the discrimination results are consistent, $C(x') = C(x)$, the example is judged to be normal, $y_x = 1$; otherwise, it is determined to be an adversarial sample, $y_x = 0$. The decision criterion can be mathematically represented as follows:
$$y_x = \begin{cases} 1, & C(x') = C(x) \\ 0, & C(x') \neq C(x) \end{cases}$$
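A minimal sketch of this decision rule is shown below; the classifier and reconstructor handles are assumed names tying together the pieces described above.

```python
# Consistency check sketch: flag an input as adversarial when the classifier
# disagrees on the original and reconstructed versions (assumed variable names).
import torch

@torch.no_grad()
def detect(classifier, reconstructor, x):
    pred_orig = classifier(x).argmax(dim=1)                 # C(x)
    pred_rec = classifier(reconstructor(x)).argmax(dim=1)   # C(x')
    return (pred_orig == pred_rec).long()                   # 1 = normal, 0 = adversarial
```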

4. Experimental Analysis

4.1. Experimental Setup

Hardware and Software Environment: The hardware configuration used in the current experiment consisted of an Intel Xeon E5-2673 v4 CPU and 8 NVIDIA GeForce RTX 2080Ti GPUs (with 256 GB memory and 8 GB VRAM). The experiments were conducted on the Ubuntu operating system, using the Python 3.9 programming language and the PyTorch 1.11 deep learning framework to implement all the detection methods. The convolutional filters used 3 × 3 and 1 × 1 kernels with a stride of 2 and zero padding. The default FFT function in PyTorch was used for the fast Fourier transform, applied along the last two axes (dim = (−2, −1)), and orthogonal normalization was adopted.
Datasets: In this study, we selected the MNIST, SVHN, and CIFAR-10 datasets, which are extensively recognized in the field of adversarial defense. The MNIST dataset refers to a collection of handwritten digit images with a training set of 60,000 samples and a test set of 10,000 samples. Each image is labeled into one of ten categories representing digits from 0 to 9, with each sample image having dimensions of 28 × 28 pixels. The SVHN dataset is a real-world image set of house numbers. It includes a training set with 73,257 samples and a test set with 26,032 samples. Each image is represented by pixel information at a resolution of 32 × 32 pixels. The CIFAR-10 dataset consists of color images categorized into ten classes. It comprises a total of 60,000 images with each class containing approximately equal samples (6000 per class). There are separate sets for training (50k) and testing (10k) purposes. The CIFAR-10 images have dimensions of size 32 × 32 pixels. These datasets provide diverse representations suitable for evaluating adversarial defenses, as they cover various everyday life categories such as aviation objects, vehicles, birds, cats, dogs, fish, horses and ships. The use of colored imagery in the CIFAR-10 dataset accurately reflects the real-world objects’ appearance, making it more relevant for applications involving adversarial samples.
Network Settings: We train a deep convolutional neural network based on LeNet, which consists of three parts: convolutional layers, pooling layers, and fully connected layers. After each convolutional layer, a pooling operation is executed, using either average pooling or max pooling. Upon training, the network demonstrates robust performance across the three benchmark datasets, achieving accuracies of 99.14% on MNIST, 81.42% on CIFAR-10, and 92.68% on SVHN. For the optimization process, the Adam algorithm is employed, and the network’s learning is guided by minimizing the cross-entropy loss function.
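A LeNet-style classifier consistent with this description is sketched below for 32 × 32 RGB inputs (CIFAR-10/SVHN); the exact channel widths and the hidden layer size are our assumptions, and MNIST would require in_channels=1 and an adjusted flattened size.

```python
# LeNet-like classifier sketch (assumed widths), trained with Adam + cross-entropy.
import torch
import torch.nn as nn

class LeNetLike(nn.Module):
    def __init__(self, in_channels=3, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 32, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 5, padding=2), nn.ReLU(), nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(), nn.Linear(64 * 8 * 8, 256), nn.ReLU(),  # 32x32 input -> 8x8 maps
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Training setup as in Section 4.1 (sketch):
# model = LeNetLike(); optimizer = torch.optim.Adam(model.parameters())
# criterion = nn.CrossEntropyLoss()
```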

4.2. Adversarial Examples Generation

We select five attack algorithms for experiments on three datasets, and also select three detection algorithms for comparison. Our chosen adversarial attacks fall into two categories: targeted and non-targeted attacks. Targeted attacks use two target classes: the next class (the index of the true class plus one) and the least likely class (the class with the lowest predicted probability by the model). These attacks cover various norm constraints to ensure a comprehensive and reliable evaluation. Specifically, FGSM is an $L_\infty$ attack, C&W and DeepFool are $L_2$ attacks, BIM is based on an $L_1$ norm attack, and MJSMA is based on an $L_0$ norm attack.
We comprehensively evaluate the performance of the targeted FDR detection methods against five mainstream adversarial attack algorithms. Table 1 details the specific parameter settings of these five attacks on the three datasets and provides their attack success rates, numbers of iterations, and perturbation magnitudes. To facilitate the quantification and comparison of perturbation magnitudes, we normalize the pixel values of the images to the range $[0, 1]$ and then measure the perturbation magnitudes using the $L_2$ norm. As shown in Table 1, these mainstream attack algorithms keep a balance between maintaining high attack success rates and controlling perturbation magnitudes at a relatively low level, achieving a trade-off between attack effectiveness and visual quality. ASR denotes the success rate of the adversarial attack, and PSNR is the logarithm of the ratio of the squared signal maximum to the mean square error between clean and adversarial samples. When the disturbance values of the FGSM attack are 0.3, 0.1, and 0.3, respectively, the attack success rate is the highest on the three datasets. When the disturbance value of the BIM attack is 0.3, the best attack success rate is achieved on each dataset. When the numbers of iterations of C&W, DeepFool, and MJSMA are 100, 10, and 100, respectively, the ASR reaches its best value. At the same time, the PSNR values differ little, and the image quality is stable. It should be noted that the PSNR score does not always match the visual quality perceived by the human eye, since human visual sensitivity to error is not uniform. Using unified attack parameters and disturbance amplitudes when carrying out the attacks makes the evaluation results more objective.

4.3. Detection of Adversarial Examples

We select five attack algorithms on three datasets and three detection algorithms for comparison. From Table 2, we can see that under the FGSM, DeepFool, CW, and BIM attacks, our FDR and FDR GM methods maintain good results on the MNIST dataset. Under the FGSM attack, the accuracy of the proposed method is more than 5% higher than that of the ANR method [43]. ANR uses smooth spatial filtering to denoise, but adversarial interference mostly exists in the frequency domain of the image. Both our FDR and FDR GM methods denoise in the frequency domain and use convolutional kernels for denoising, so their detection accuracy is better. Under the DeepFool attack, the accuracy of the proposed method is more than 30% higher than that of the RAN method [52]. RAN uses random resizing and random padding to mitigate the adversarial effect. The method has a limited effect on frequency domain interference, so its detection accuracy is lower than that of the method proposed in this paper. Among all the attack methods, the accuracy of the detection method in this paper is higher than that of TVM [53]. The reason may be that TVM uses Bernoulli random variables to reconstruct some pixels, and this Bernoulli randomization process loses some feature information. Under the MJSMA attack, the RAN detection algorithm is more advantageous. MJSMA uses an adversarial saliency map for its attack, while the images in the MNIST dataset are grayscale. Therefore, the positions of the two pixels with the largest absolute values found by the MJSMA attack may change greatly, and RAN uses random resizing to mitigate the adversarial effect, a process that easily removes some of the disturbance points introduced by MJSMA. However, for the MJSMA attack, as shown in Figure 3, we compare all the detection algorithms on the three datasets together. The images in the CIFAR10 and SVHN datasets are three-channel color RGB images. The FDR GM algorithm has advantages over the RAN detection algorithm on the CIFAR-10 and SVHN datasets, and its defense performance is more stable. Our FDR GM method performs well on the other two datasets, CIFAR10 and SVHN, indicating that FDR GM is more suitable for images with higher information content, such as RGB images. In contrast, the FDR method is more applicable to images with lower information content, like the MNIST dataset. We speculate that this is because the GM module may lose some information, leading to poorer performance on single-channel datasets. On the other hand, RGB datasets contain richer information content, which can be utilized by the GM module. Furthermore, we can observe that our proposed FDR and FDR GM algorithms outperform the three compared algorithms overall, achieving a maximum defense success rate of 93.58% against the CW attack. This demonstrates that our proposed methods can effectively defend against attack algorithms across different datasets.
To more intuitively demonstrate the impact of our proposed algorithm, we attack the models on the three datasets using four different attack algorithms. According to the optimal disturbance values and numbers of iterations calculated in the previous section, the number of iterations for the DeepFool attack algorithm increases from 1 to 10, the number of iterations for the C&W attack algorithm increases from 10 to 100, and the disturbance value $\varepsilon$ increases from 0.03 to 0.30. The accuracy curves of our proposed FDR GM method and the other three defense algorithms are illustrated in Figure 4, Figure 5 and Figure 6. From the figures, we can see that the defense effect of our proposed method under attack algorithms with different numbers of iterations is generally better than that of the other defense algorithms, which is consistent with the analysis of the experimental results in Table 2. On the MNIST dataset, the ANR and RAN detection algorithms sometimes perform slightly better than our algorithms under the FGSM and BIM attacks, but overall, the performance of our algorithm is more stable, and for different $\varepsilon$ values, the average detection accuracy is higher. On the other two datasets, some algorithms occasionally perform better; however, our algorithm has more robust performance and a smaller average detection variance under different attack algorithms. On the CIFAR10 and SVHN datasets, FDR GM performs better than FDR. This shows that for three-channel RGB images with richer information content and greater gradient variation, adding gradient masking is more beneficial for eliminating the adversarial disturbance.
Meanwhile, to more intuitively show the effect difference, we visualize the attack samples after the recovery of the detection algorithm in Figure 7. Clearly, the figure shows that our proposed FDR is closer to the original sample and contains less attack noise. The proposed FDR GM algorithm can maintain the rough outline of the original sample and filter out almost all noise. The goal of the adversarial defense is to detect AEs and maintain the stability of the classification network. Compared with the other three detection algorithms, the proposed algorithm balances the detection accuracy and the sample fidelity after filtering under five kinds of attacks.
To assess the impact of each module in our proposed FDR GM method, we select five attack algorithms for ablation experiments on three datasets and compare different module configurations. The experimental results are presented in Table 3, Table 4 and Table 5. Based on Table 3, it can be seen that under the five attacks, the combination of the Conv and GM modules has advantages in terms of average detection accuracy and variance, and the detection robustness of the combined model is stronger. Under the DeepFool and C&W attacks, the detection accuracy of the Conv module alone is better than that of the other two configurations, but the advantage is small when comparing the 15 sets of data. The reason for this advantage may be that the GM module can lose information from low-resolution grayscale images, and because MNIST involves only digit recognition, even limited interference can cause misclassification, which leaves more room for the Conv denoising module to act. In Table 4, the joint use of the Conv and GM modules achieves better detection results overall. The Conv filter module performs 6% better under BIM attacks, but considering the real-world deployment of detection algorithms, adversaries will not adopt only one attack method. Therefore, the combined use of the Conv and GM modules brings higher defense benefits. As shown in Table 5, the combined use of the Conv and GM modules achieves better detection accuracy than the other combinations on the SVHN dataset. Compared with the Conv module, the GM module achieves better detection accuracy. This may be because SVHN consists of house numbers from real street views, with more complex backgrounds and more diverse digit variations, so Conv filtering is of limited use. Finally, we find that the combination of the Conv and GM modules can effectively deal with various adversarial attacks on complex images and achieve a good defense effect.

5. Conclusions

To improve the capability of CNN classification networks to defend against AEs, this study puts forward a novel detection method based on frequency domain reconstruction that is universal across different kinds of attack methods. In the detection process, the Fourier transform is used to transform the image from the spatial domain to the frequency domain, and a convolutional neural network is used to filter and denoise the frequency domain representation. The gradient masking algorithm is used for multi-feature input. Adversarial examples are detected via the classification consistency of the reconstructed examples and the original samples under the deep convolutional neural network. Five attack algorithms and three detection algorithms are compared on three datasets. Numerous experiments prove that the proposed FDR GM and FDR methods have high detection accuracy. The proposed method does not need to modify the original network and has the advantage of simple application. Relative to other defense methods, the proposed method has strong generalization ability among models and can defend against cross-model attacks on samples, effectively ensuring the credibility of the classification results. In future research, we will focus on exploring integrated robustness enhancement methods for detection models and target models to lower the likelihood of attackers spoofing target classifiers and adversarial detection models. Meanwhile, the adversarial example detection method proposed in the current work can be transferred to other fields to detect adversarial examples, such as in remote sensing image detection, video processing, and natural language processing.

Author Contributions

S.H. designed the method architecture and wrote the paper. Z.Z. provided guidance and device support. B.S. and S.H. conceived and conducted the experiments. B.S., S.H. and Z.Z. explored the experiments results. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by National Natural Science Foundation of China Grant No. 61972133, Project of Leading Talents in Science and Technology Innovation in Henan Province Grant No. 204200510021, Program for Henan Province Key Science and Technology under Grant Nos. 222102210177, 242102211077, Henan Province University Key Scientific Research Project under Grant No. 23A520008.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets applied in this paper are all open-source datasets. The open source URLs are shown below. https://yann.lecun.com/exdb/mnist/, https://www.cs.toronto.edu/~kriz/cifar.html, http://ufldl.stanford.edu/housenumbers/, accessed on 22 August 2024.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Dogan, A.; Birant, D. Machine learning and data mining in manufacturing. Expert Syst. Appl. 2021, 166, 114060. [Google Scholar] [CrossRef]
  2. Tang, K.; Ma, Y.; Miao, D.; Song, P.; Gu, Z.; Tian, Z.; Wang, W. Decision fusion networks for image classification. IEEE Trans. Neural Netw. Learn. Syst. 2022, 1–14. [Google Scholar] [CrossRef]
  3. Farooque, G.; Liu, Q.; Sargano, A.B.; Xiao, L. Swin transformer with multiscale 3D atrous convolution for hyperspectral image classification. Eng. Appl. Artif. Intell. 2023, 126, 107070. [Google Scholar] [CrossRef]
  4. Xu, Y.; Yang, X.; Gong, L.; Lin, H.-C.; Wu, T.-Y.; Li, Y.; Vasconcelos, N. Explainable object-induced action decision for autonomous vehicles. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 9523–9532. [Google Scholar]
  5. Dongjie, L.; Dongge, L.; Yang, L. Application of convolutional neural network in dynamic gesture tracking. J. Front. Comput. Sci. Technol. 2020, 14, 841. [Google Scholar]
  6. Otter, D.W.; Medina, J.R.; Kalita, J.K. A survey of the usages of deep learning for natural language processing. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 604–624. [Google Scholar] [CrossRef]
  7. Szegedy, C.; Zaremba, W.; Sutskever, I.; Bruna, J.; Erhan, D.; Goodfellow, I.; Fergus, R. Intriguing properties of neural networks. arXiv 2013, arXiv:1312.6199. [Google Scholar]
  8. Guo, S.; Li, X.; Zhu, P.; Mu, Z. ADS-detector: An attention-based dual stream adversarial example detection method. Knowl.-Based Syst. 2023, 265, 110388. [Google Scholar] [CrossRef]
  9. Zhang, S.; Chen, S.; Liu, X.; Hua, C.; Wang, W.; Chen, K.; Zhang, J.; Wang, J. Detecting adversarial samples for deep learning models: A comparative study. IEEE Trans. Netw. Sci. Eng. 2021, 9, 231–244. [Google Scholar] [CrossRef]
  10. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  11. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
Figure 1. Comparison of the frequency-domain representations of the original example, the adversarial example, and the reconstructed example. The first column shows the spatial-domain image of the original example; the remaining columns show the corresponding frequency-domain representations.
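Spectra such as those in Figure 1 can be produced with a two-dimensional Fourier transform. The following is a minimal NumPy sketch, assuming a single-channel image with values in [0, 1]; the exact preprocessing and color handling used for the figure are not specified here.

```python
import numpy as np

def magnitude_spectrum(image: np.ndarray) -> np.ndarray:
    """Log-scaled, centered magnitude spectrum of a grayscale image."""
    freq = np.fft.fft2(image)      # 2-D discrete Fourier transform of the spatial image
    freq = np.fft.fftshift(freq)   # move the zero-frequency component to the center
    return np.log1p(np.abs(freq))  # log scale compresses the dynamic range for display
```

Comparing such spectra for a clean example, its adversarial counterpart, and the reconstructed example is how Figure 1 visualizes the frequency components introduced by the attack and removed by the reconstruction.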
Figure 2. (a) The FDR structure first extracts features from the input adversarial example, transforms them to the frequency domain with the Fourier transform, removes the adversarial components there, and reconstructs the original image. (b) The FDR structure with gradient masking first resizes the adversarial example and extracts its features, then transforms them to the frequency domain, removes the adversarial components, reconstructs the image, and finally resizes the output back to the original input size.
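To make the pipeline in Figure 2 concrete, the sketch below follows the described order of operations (feature extraction, Fourier transform, frequency-domain convolution, inverse transform, reconstruction) together with the resize-based gradient-masking variant. It is an illustrative PyTorch sketch, not the authors' implementation: the layer widths, the `FDRBlock` and `fdr_with_gm` names, the real/imaginary stacking, and the resize factor are all assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FDRBlock(nn.Module):
    """Illustrative FDR-style block: filter image features in the frequency domain."""

    def __init__(self, channels: int = 16):
        super().__init__()
        # Shallow feature extractor and reconstructor (sizes are assumptions).
        self.extractor = nn.Sequential(nn.Conv2d(3, channels, 3, padding=1), nn.ReLU())
        # 1x1 convolution applied to stacked real/imaginary parts of the spectrum.
        self.freq_filter = nn.Conv2d(2 * channels, 2 * channels, kernel_size=1)
        self.reconstructor = nn.Conv2d(channels, 3, 3, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.extractor(x)
        freq = torch.fft.fft2(feats)                       # to the frequency domain
        stacked = torch.cat([freq.real, freq.imag], dim=1)
        filtered = self.freq_filter(stacked)               # convolution over frequencies
        real, imag = torch.chunk(filtered, 2, dim=1)
        cleaned = torch.fft.ifft2(torch.complex(real, imag)).real  # back to the spatial domain
        return self.reconstructor(cleaned)                 # reconstruct the image

def fdr_with_gm(block: FDRBlock, x: torch.Tensor, scale: float = 1.25) -> torch.Tensor:
    """Gradient-masking variant: resize before FDR, restore the input size afterwards."""
    h, w = x.shape[-2:]
    resized = F.interpolate(x, scale_factor=scale, mode="bilinear", align_corners=False)
    out = block(resized)
    return F.interpolate(out, size=(h, w), mode="bilinear", align_corners=False)
```

Whether the gradient-masking (GM) resize step is applied is the component ablated in Tables 3–5.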
Figure 3. We select the MJSMA attack algorithm to conduct experiments on the three datasets, with the number of iterations ranging from 10 to 100.
Figure 4. We select four attack algorithms to conduct experiments on the MNIST dataset; the number of iterations for the DeepFool and C&W attack algorithms is 10 or 100, respectively. We compare the proposed algorithm with three other detection algorithms.
Figure 5. We select four attack algorithms to conduct experiments on the CIFAR10 dataset; the perturbation magnitude of the FGSM and BIM attack algorithms ranges from 0.03 to 0.30. We compare the proposed algorithm with three other detection algorithms.
Figure 6. We select four attack algorithms to conduct experiments on the SVHN dataset; the perturbation magnitude of the FGSM and BIM attack algorithms ranges from 0.03 to 0.30. We compare the proposed algorithm with three other detection algorithms.
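For reference, the FGSM and BIM perturbations swept in Figures 5 and 6 (ε from 0.03 to 0.30) can be generated along the following lines. This is a generic PyTorch sketch that assumes a trained classifier `model`, labels `y`, and inputs normalized to [0, 1]; it is not the exact attack configuration used in the experiments.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Single-step FGSM: move each pixel by eps in the sign of the loss gradient."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()
    return (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

def bim(model, x, y, eps, steps=10):
    """BIM: iterate small FGSM steps, projecting back into the eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, eps / steps)
        # Keep the accumulated perturbation within the L-infinity budget eps.
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv
```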
Figure 7. We visualize the recovery results of the FDR GM method and other adversarial example detection methods under different attack algorithms.
Table 1. Comparison of adversarial attack performance.
| Attack | Parameters Description | Dataset | Parameter Settings | ASR (%) | PSNR (dB) |
|---|---|---|---|---|---|
| FGSM | ε: perturbation | MNIST | ε = 0.1 | 34.42 | 70.74 |
| | | | ε = 0.2 | 84.49 | 64.80 |
| | | | ε = 0.3 | 95.23 | 61.34 |
| | | CIFAR10 | ε = 0.1 | 95.59 | 68.31 |
| | | | ε = 0.2 | 93.94 | 62.50 |
| | | | ε = 0.3 | 92.00 | 59.27 |
| | | SVHN | ε = 0.1 | 80.00 | 68.23 |
| | | | ε = 0.2 | 84.69 | 62.39 |
| | | | ε = 0.3 | 86.53 | 59.14 |
| DeepFool | max_ite: attack iteration | MNIST | max_ite = 1 | 64.06 | 74.02 |
| | | | max_ite = 10 | 99.99 | 68.04 |
| | | | max_ite = 50 | 99.99 | 68.04 |
| | | CIFAR10 | max_ite = 1 | 33.59 | 96.28 |
| | | | max_ite = 10 | 99.99 | 89.56 |
| | | | max_ite = 50 | 99.99 | 89.56 |
| | | SVHN | max_ite = 1 | 17.53 | 91.95 |
| | | | max_ite = 10 | 99.99 | 84.50 |
| | | | max_ite = 50 | 99.99 | 84.50 |
| C&W | max_ite: attack iteration | MNIST | max_ite = 10 | 93.24 | 62.79 |
| | | | max_ite = 100 | 99.99 | 63.27 |
| | | | max_ite = 1000 | 99.99 | 63.24 |
| | | CIFAR10 | max_ite = 10 | 98.48 | 63.27 |
| | | | max_ite = 100 | 99.99 | 63.19 |
| | | | max_ite = 1000 | 99.99 | 63.19 |
| | | SVHN | max_ite = 10 | 98.77 | 63.05 |
| | | | max_ite = 100 | 99.79 | 62.96 |
| | | | max_ite = 1000 | 99.99 | 62.61 |
| BIM | ε: perturbation | MNIST | ε = 0.1 | 86.31 | 59.03 |
| | | | ε = 0.2 | 89.55 | 54.43 |
| | | | ε = 0.3 | 90.11 | 50.44 |
| | | CIFAR10 | ε = 0.1 | 89.59 | 60.58 |
| | | | ε = 0.2 | 89.53 | 56.86 |
| | | | ε = 0.3 | 89.72 | 55.80 |
| | | SVHN | ε = 0.1 | 89.49 | 60.21 |
| | | | ε = 0.2 | 89.62 | 56.86 |
| | | | ε = 0.3 | 89.72 | 55.80 |
| MJSMA | max_ite: attack iteration | MNIST | max_ite = 10 | 98.29 | 79.03 |
| | | | max_ite = 100 | 99.99 | 65.39 |
| | | | max_ite = 1000 | 99.99 | 65.39 |
| | | CIFAR10 | max_ite = 10 | 99.12 | 75.18 |
| | | | max_ite = 100 | 99.99 | 75.70 |
| | | | max_ite = 1000 | 99.99 | 75.70 |
| | | SVHN | max_ite = 10 | 99.87 | 74.54 |
| | | | max_ite = 100 | 99.99 | 73.83 |
| | | | max_ite = 1000 | 99.99 | 73.83 |
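The two metrics reported in Table 1 are standard. A minimal NumPy sketch is given below; it assumes pixel values in [0, 1] and, for ASR, the common convention of counting only examples that were correctly classified before the attack, which may differ from the exact protocol used for the table.

```python
import numpy as np

def attack_success_rate(clean_preds, adv_preds, labels):
    """ASR (%): fraction of correctly classified examples that the attack fools."""
    clean_preds, adv_preds, labels = map(np.asarray, (clean_preds, adv_preds, labels))
    correct = clean_preds == labels
    fooled = correct & (adv_preds != labels)
    return 100.0 * fooled.sum() / max(correct.sum(), 1)

def psnr(clean, adv, peak=1.0):
    """PSNR (dB) between a clean image and its adversarial counterpart."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(adv, float)) ** 2)
    return float("inf") if mse == 0 else 10.0 * np.log10(peak**2 / mse)
```

Higher ASR indicates a stronger attack, while higher PSNR indicates a less visible perturbation, which matches the trends in Table 1 (e.g., DeepFool reaches near-100% ASR while keeping PSNR higher than FGSM).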
Table 2. Comparison of detection accuracy results on three datasets.
Values are detection accuracy (%) under each attack method.
| Dataset | Method | FGSM | DeepFool | C&W | BIM | MJSMA |
|---|---|---|---|---|---|---|
| MNIST | ANR | 59.36 | 47.50 | 37.44 | 29.01 | 49.07 |
| | RAN | 43.17 | 55.66 | 64.25 | 56.78 | 73.51 |
| | TVM | 13.33 | 27.02 | 26.32 | 13.65 | 34.48 |
| | FDR | 64.63 | 92.46 | 93.58 | 45.70 | 56.43 |
| | FDR GM | 67.47 | 85.43 | 88.23 | 61.08 | 63.51 |
| CIFAR10 | ANR | 13.27 | 30.39 | 12.75 | 23.24 | 28.86 |
| | RAN | 43.21 | 50.93 | 40.70 | 40.55 | 53.82 |
| | TVM | 15.91 | 22.55 | 12.42 | 17.69 | 29.73 |
| | FDR | 43.87 | 44.55 | 47.58 | 77.05 | 49.36 |
| | FDR GM | 48.24 | 54.08 | 51.05 | 71.15 | 60.39 |
| SVHN | ANR | 14.29 | 36.64 | 13.08 | 28.57 | 36.64 |
| | RAN | 62.76 | 68.63 | 68.39 | 67.30 | 67.91 |
| | TVM | 29.94 | 26.62 | 25.13 | 31.03 | 27.52 |
| | FDR | 27.52 | 15.70 | 18.80 | 33.30 | 19.53 |
| | FDR GM | 75.18 | 81.55 | 77.52 | 82.42 | 74.06 |
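The detection accuracies in Table 2 depend on a decision rule for flagging an input as adversarial. A common rule for reconstruction-based detectors, shown below as a hedged sketch rather than the paper's exact criterion, is to flag inputs whose predicted label changes after reconstruction; `classifier` and `reconstructor` (e.g., the FDR or FDR GM pipeline) are assumed to be given.

```python
import torch

def flag_adversarial(classifier, reconstructor, x: torch.Tensor) -> torch.Tensor:
    """Return a boolean mask marking inputs whose label flips after reconstruction."""
    with torch.no_grad():
        pred_raw = classifier(x).argmax(dim=1)
        pred_rec = classifier(reconstructor(x)).argmax(dim=1)
    return pred_raw != pred_rec  # a label flip suggests an adversarial input
```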
Table 3. Ablation results of the FDR GM method under different attack algorithms on the MNIST dataset.
| Conv | GM | FGSM | DeepFool | C&W | BIM | MJSMA |
|---|---|---|---|---|---|---|
| ✓ | – | 64.63 | 92.46 | 93.58 | 45.70 | 56.43 |
| – | ✓ | 38.54 | 54.85 | 68.59 | 27.48 | 37.59 |
| ✓ | ✓ | 67.47 | 85.43 | 88.23 | 61.08 | 63.51 |
✓ indicates that the component is enabled; values are detection accuracy (%).
Table 4. Ablation results of the FDR GM method under different attack algorithms on the CIFAR10 dataset.
| Conv | GM | FGSM | DeepFool | C&W | BIM | MJSMA |
|---|---|---|---|---|---|---|
| ✓ | – | 43.87 | 44.55 | 47.58 | 77.05 | 49.36 |
| – | ✓ | 27.48 | 37.59 | 18.65 | 25.86 | 37.85 |
| ✓ | ✓ | 48.24 | 54.08 | 51.05 | 71.15 | 61.39 |
Table 5. Ablation results of the FDR GM method under different attack algorithms on the SVHN dataset.
| Conv | GM | FGSM | DeepFool | C&W | BIM | MJSMA |
|---|---|---|---|---|---|---|
| ✓ | – | 27.52 | 15.70 | 18.80 | 33.30 | 19.53 |
| – | ✓ | 48.74 | 58.65 | 46.47 | 58.52 | 56.57 |
| ✓ | ✓ | 75.18 | 81.55 | 77.52 | 82.42 | 74.06 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
