Article

Defense against Adversarial Attacks in Image Recognition Based on Multilayer Filters

Computer Information Application Research Center, School of Computer Science and Technology, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8119; https://doi.org/10.3390/app14188119
Submission received: 27 June 2024 / Revised: 5 September 2024 / Accepted: 7 September 2024 / Published: 10 September 2024
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)

Abstract

Security and privacy are urgent concerns in building secure and efficient learning-based systems. Recent studies have shown that such systems are susceptible to subtle adversarial perturbations applied to their inputs. Although these perturbations are difficult for humans to detect, they can easily mislead deep learning classifiers. Noise injection, as a defense mechanism, can offer a provable defense against adversarial attacks by reducing sensitivity to subtle input changes; however, existing noise injection methods suffer from high computational complexity and limited adaptability. We propose a multilayer filter defense model, drawing inspiration from filter-based image denoising techniques. The model inserts a filtering layer between the input layer and the first convolutional layer and incorporates noise injection during training, substantially enhancing the resilience of image classification systems to adversarial attacks. We also investigate the impact of different filter combinations, filter area sizes, standard deviations, and numbers of filter layers on defense effectiveness. The experimental results indicate that, across the MNIST, CIFAR10, and CIFAR100 datasets, the multilayer filter defense model achieves the highest average accuracy when employing a double-layer Gaussian filter (filter area size of 3 × 3, standard deviation of 1). Compared with two other filter-based defense models, our method attains an average accuracy of 71.9%, effectively enhancing the robustness of image recognition classifiers against adversarial attacks. The method performs well not only on small-scale datasets but also on a larger dataset (miniImageNet) and modern models (EfficientNet and WideResNet).

1. Introduction

System security and privacy are central to building secure and efficient learning-based systems. Deep learning networks perform particularly well on many artificial intelligence tasks, including security-sensitive applications such as object recognition [1], self-driving cars [2], computational vision [3], and image classification [4]. However, inputs carefully crafted by adversaries seriously threaten these systems [5,6,7]. Such inputs are called adversarial examples [8]: the adversary generates adversarial images by adding subtle adversarial perturbations to legitimate input images [9,10]. These carefully crafted perturbations are visually imperceptible to the human cognitive system, yet they can cause deep learning models to misclassify the input images. The FGSM [11] attack is a relatively basic but very effective adversarial attack: it generates adversarial perturbations from the sign of the gradient of the loss function with respect to the input image. The VNIFGSM [12] attack enhances iterative gradient-based attacks through variance tuning to improve their transferability.
Defense system designers often prioritize the modification of models. For instance, they may streamline the architecture of a classifier [13] or incorporate pre/post-processing layers into the classifier’s architecture [14,15]. Dynamic network routing (DNR) [16] stands out by producing pruned DNNs that boast both high robustness and standard accuracy. It achieves this through a unified constraint optimization formula and a mixed loss function, effectively merging ultra-high model compression with robust adversarial training. On the other hand, low-temperature distillation (LTD) [17] employs a distillation framework to generate labels, utilizing lower temperatures in the teacher model and distinct fixed temperatures for both the teacher and student models. The integration of parameter-shifted sigmoid linear units (PSSiLUs) with learnable parameter activation functions (PAFs), coupled with adversarial training, further enhances robustness. SENSEI [18] adopts a strategy that either substitutes each data point with suitable variants or retains it as is. These methods, however, alter specific deep learning model architectures, potentially leading to reduced adaptability and portability.
Other researchers have designed independent modules that identify adversarial samples in order to enhance the robustness of the classifier [19]. For example, MagNet [20] consists of one or more independent detector networks and a reformer network. The reformer network guides adversarial samples onto the manifold of normal samples, which is particularly effective for accurately classifying adversarial samples under small perturbations. The class activation feature-based denoiser (CAFD) [21] is a self-supervised method trained on examples generated by class activation feature-based attacks (CAFAs) [21] to remove noise from adversarial examples. The detector graph (DG) [22] constructs a latent neighborhood graph (LNG) for each original instance and uses graph neural networks (GNNs) [23] to model the relationship between original and adversarial instances on the graph, thereby detecting adversarial instances. The adversarial defense with local implicit functions (DISCO) [24] serves as an adjunct network to the classifier, employing local manifold projections to mitigate adversarial perturbations. It receives adversarial images and a query pixel position; its encoder creates per-pixel depth features, and a local implicit module uses these features to predict clean RGB values. These methods require significant overhead to maintain an additional module.
As a defense mechanism, noise injection [25,26,27] has been demonstrated in many studies [28,29,30] to provide provable defense against adversarial attacks in certain situations. This mechanism is achieved by inserting noise layers into the classification network. The noise layer strengthens the image classifier by injecting noise into the deep learning network using different noise generation mechanisms, such as the Laplacian and Gaussian mechanisms. The adversarial purification-guided diffusion model (GDMAP) [31] receives pure Gaussian noise as initial input and gradually denoises it under the guidance of the adversarial image. DensePure [32] iteratively denoises input images with different random seeds to obtain multiple reverse samples, which are fed to the classifier; the final prediction is based on majority voting. In addition, Wang et al. [33] used the latest diffusion model [34], demonstrating that diffusion models with higher efficiency and image quality can be directly translated into better robust accuracy. However, noise injection methods require a compromise between natural and adversarial accuracy, making their performance unsatisfactory.
Before training the classifier, denoising the training data can improve the robustness of the classifier. Common methods include data quantization [35,36] and image filters [37,38]. Faiq Khalid et al. proposed a quantization-based defense mechanism to protect deep neural networks (DNNs) from adversarial attacks (QuSecNets) [36]. However, QuSecNets obtains control parameters by training neural networks, which requires complex computational resources and raw data. Yuyang X. et al. proposed a defense based on digital image processing (DIP) [39], which uses spatial filters and thresholds for image preprocessing to remove adversarial disturbances. This method can easily and effectively resist adversarial attacks. While effective in grayscale imagery, this method exhibits limitations in its application to color images, as the utilization of threshold techniques may result in partial loss of critical image information. Vadim Z. and Maxim T. proposed a low-pass image filtering (LPIF) [40] technique to reduce the impact of high-frequency noise on cellular neural networks. This approach offers advantages in terms of resource efficiency and ease of implementation. However, it may be ineffective against low-frequency adversarial attacks.
Taking inspiration from the above methods, we propose a robustness enhancement strategy for image classifiers based on multilayer filtering techniques. This strategy establishes a robust defense model capable of withstanding a variety of attacks by integrating filtering layers into susceptible classifier networks. Additionally, we incorporate a noise injection mechanism during the classifier’s training phase to further bolster the model’s defensive capabilities. The filtering layer, functioning as a processing unit, enhances the classifier’s resilience to minor alterations in input data by exploiting information from neighboring pixels. We also study the effects of different filter combinations, filter area sizes, standard deviations, and numbers of filter layers on defense effectiveness. A comparative analysis of filters with different numbers of layers revealed that a double-layer Gaussian filter with a filter area size of 3 × 3 and a standard deviation of 1 yields the highest average accuracy for the multilayer filter defense model on the MNIST, CIFAR10, and CIFAR100 datasets. Furthermore, comparative results against two alternative filter-based defense models indicate that our approach outperforms both, demonstrating that this model significantly enhances the robustness of image recognition classifiers. We also confirm the efficacy of our method on a larger dataset (miniImageNet) and on modern models such as EfficientNet and WideResNet. Finally, we analyze the time cost of model training and explore the balance between model robustness and training efficiency.

2. Background

2.1. Adversarial Samples

Adversarial samples are a type of attack data targeting machine learning models, especially convolutional neural networks (CNNs) [41] for multi-class image classification. This attack is highly threatening: the attacker adds visually imperceptible interference to the original image, which may cause the classification to fail. Consider the deep learning model $F: \mathbb{R}^n \rightarrow Y$ as a function that maps n-dimensional inputs to the set of possible labels $Y = \{1, \ldots, k\}$. In practical applications, a test image $x$ is used as the model input, and its classification label $y$ is obtained through the function $F_\omega$, where $\omega$ is the set of parameters that the deep learning model learns during training. If instance $x$ is correctly classified, we obtain $y = F_\omega(x)$.
However, the attacker’s goal is to introduce a small perturbation $\delta \in \mathbb{R}^n$ to $x$ to obtain $x' = x + \delta$ that causes the classifier to err. Specifically, an adversarial attack is defined as Equation (1), as follows:
$$\min_{\delta} \; \|\delta\|_p \quad \text{s.t.} \quad F_\omega(x + \delta) \neq y, \quad F_\omega(x) = y, \quad \|\delta\|_p \leq \epsilon \tag{1}$$
In this study, it is generally presumed that the attacker’s actions are bounded by a p-norm, which guarantees that $\|\delta\|_p \leq \epsilon$. Here, $\epsilon$ acts as a threshold that restricts the attacker’s ability to alter the input. All attack methodologies employed within this research are founded upon the $\ell_\infty$-norm, adhering to the fundamental principle that the variation in pixel values, represented by $\delta$, must not surpass the predetermined $\epsilon$ limit.

2.2. Adversarial Attack

Based on how adversarial samples are generated, we consider two types of adversarial attacks: iterative attacks and optimization-based attacks. FGSM (the fast gradient sign method) is currently one of the most popular gradient-based adversarial attack methods [11] and forms the single-step basis of the iterative attacks considered here. Its core idea is to add an adversarial vector to the original image, with the direction of the vector matching the gradient of the loss function. The FGSM perturbation can be expressed as Formula (2), as follows:
$$x_{adv} = x + \alpha \cdot \mathrm{sign}\big(\nabla_x \mathrm{Loss}(x, y)\big) \tag{2}$$
$\mathrm{Loss}(x, y)$ is the loss function used to train the classifier, $\nabla_x$ denotes the gradient with respect to the image space, $\alpha$ is the iteration step size (set to $\alpha = \epsilon$), $x$ is the input vector (image), $y$ is the true class label, and $x_{adv}$ is the adversarial example.
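As a concrete illustration, the following is a minimal TensorFlow sketch of the single-step FGSM perturbation in Formula (2). The choice of sparse categorical cross-entropy as the loss and the [0, 1] pixel range are assumptions of the sketch, not details taken from the original implementation.

```python
import tensorflow as tf

def fgsm_attack(model, x, y, epsilon=8 / 255):
    """Single-step FGSM: x_adv = x + epsilon * sign(grad_x Loss(x, y))."""
    x = tf.convert_to_tensor(x, dtype=tf.float32)
    loss_fn = tf.keras.losses.SparseCategoricalCrossentropy()
    with tf.GradientTape() as tape:
        tape.watch(x)                               # track the input so we can differentiate w.r.t. it
        loss = loss_fn(y, model(x, training=False))
    grad = tape.gradient(loss, x)                   # gradient of the loss in image space
    x_adv = x + epsilon * tf.sign(grad)             # step along the sign of the gradient
    return tf.clip_by_value(x_adv, 0.0, 1.0)        # keep pixels in the assumed [0, 1] range
```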
The basic iterative method (BIM) [42] is an iterative attack that takes multiple small steps, determining the direction of the perturbation from the decision boundary of the classifier at each iteration. Projected gradient descent (PGD) [43] is a stronger iterative attack that extends the fast gradient sign method by finding the gradient direction with the highest loss and projecting the perturbed sample back onto the allowed set. VNIFGSM [12] uses variance tuning to enhance iterative gradient-based attacks and improve their transferability. Specifically, at each iteration the current gradient is not used directly for momentum accumulation; instead, the gradient variance from the previous iteration is used to adjust it, stabilizing the update direction and escaping poor local optima.
Intuitively, the perturbations generated by iterative methods can effectively fool the network, but they also tend to overfit specific network parameters, so their transferability is poor, whereas single-step methods are weaker attacks. We therefore conduct comprehensive experiments to test the effectiveness of the proposed defense against the three iterative attack methods under different attack scenarios.
In addition, optimization-based methods directly optimize the distance between the real and adversarial examples. Instead of leveraging the training loss, Carlini and Wagner (C&W) [44] designed an L2-regularized loss function based on the logit-layer representation for generating adversarial examples. AutoAttack [45] integrates multiple advanced attack algorithms, including parameter-free PGD variants, into a comprehensive, automated, and gradually strengthened ensemble of attack strategies. This means it can adapt and optimize attacks through continuous experimentation to maximally expose the model’s vulnerabilities.

2.3. Image Recognition Classifiers

In this research, we employed six distinct image classification network architectures: convolutional neural network (CNN) [41], deep neural network (DNN) [46], VGG16 [47], ResNet50 [48], EfficientNet [49,50], and WideResNet [51]. Figure 1 illustrates the architectural layout of the CNN. A convolutional neural network (CNN) comprises three convolutional layers and a single fully connected layer. Each convolutional layer is succeeded by a pooling layer, designed to diminish feature dimensionality and distill essential information. To mitigate overfitting, we integrated a Dropout layer with a dropout rate of 0.2 following the last convolutional layer. Subsequently, a fully connected layer was established, and the final classification outcome was produced via a softmax activation function. Figure 2 depicts the architectural layout of the DNN. A deep neural network (DNN) is composed of an input layer, an output layer, and two hidden layers. Considering the generalization limitations of CNNs and DNNs, the primary evaluation was performed on smaller datasets, specifically MNIST and CIFAR-10.
Furthermore, both CNN and DNN employ ReLU (rectified linear unit) as their activation function. The rationale behind selecting ReLU is its ability to expedite the training process and mitigate the issue of gradient vanishing, thus facilitating a more stable and efficient learning experience.
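As an illustration of the CNN described above, a minimal Keras sketch is shown below; the filter counts (32, 64, 128) and the 32 × 32 × 3 input shape are illustrative assumptions, since the paper does not list them.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn(input_shape=(32, 32, 3), num_classes=10):
    """Three conv+pooling blocks, Dropout(0.2) after the last block, one dense softmax layer, ReLU throughout."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(32, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Dropout(0.2),        # mitigates overfitting after the last convolutional block
        layers.Flatten(),
        layers.Dense(num_classes, activation="softmax"),
    ])
```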
The VGG16 network has garnered significant attention within the realm of image classification tasks, owing to its deep architecture and strong performance. The network comprises 16 weight layers (13 convolutional and 3 fully connected), which progressively distill high-level image features through the repeated application of 3 × 3 convolutional kernels and 2 × 2 pooling layers. The ResNet50 network, in turn, addresses the gradient vanishing issue inherent in deep network training by incorporating residual connections. This 50-layer network employs skip connections that pass the input directly to subsequent layers, enabling the network to capture deeper representations.
To ascertain the efficacy of this approach within modern architectural frameworks, we chose two distinct models for evaluation: EfficientNet and WideResNet. The EfficientNet model employs a set of composite coefficients to dynamically adjust the network’s width, depth, and resolution, thus enhancing the model’s accuracy without escalating computational demands. Conversely, the WideResNet model boosts performance by broadening the network’s architecture and incorporating Dropout layers to mitigate overfitting. This model augments its capacity by expanding the number of channels within the convolutional layers, all the while preserving computational efficiency.

3. Materials and Methods

3.1. Robustness Defense

Image recognition algorithms occupy an important position in the field of artificial intelligence, among which convolutional neural networks (CNNs) and their variants are widely used. However, there are differences between CNN and human visual systems in processing image information, which pose potential risks for adversarial attacks.
Specifically, the human visual system has strong adaptability to high-frequency components such as details and noise in images, and the high-frequency information has a limited impact on human visual perception [52]. In contrast, CNN processes all types of information equally when processing images, regardless of whether they belong to high-frequency or low-frequency components. Therefore, when facing images containing complex high-frequency noise or edge information, the recognition accuracy of CNN may be significantly affected.
Given this difference, attackers can construct adversarial samples; that is, by fine-tuning pixel values while maintaining the visual perception of the image, CNN produces erroneous recognition results. Due to the relatively fragile processing of high-frequency information by CNN, attackers can successfully carry out adversarial attacks by cleverly introducing or modifying high-frequency noise or edge information.
The use of Gaussian filters for preprocessing training images can effectively suppress high-frequency noise components in the images, thereby improving the robustness of image recognition models against adversarial attacks [40]. Based on this, we further investigate and propose a defense strategy based on multilayer filters. This strategy adds a filtering layer after the input layer to achieve multilayer filtering of the input image. The method does not rely on a specific network architecture; it therefore has broad applicability and can be flexibly applied to various existing defense models. In addition, we introduce a noise injection mechanism during the model training phase. This not only further suppresses image noise and improves image quality, but also significantly enhances the robustness of the image recognition model against adversarial attacks, thereby improving its recognition accuracy and stability. Through this mechanism, we can further enhance the reliability and robustness of the model in practical applications while preserving its performance.
As shown in Figure 3, the proposed defense model inserts a filtering layer between the input layer and the first convolutional layer. The traditional image classification approach (shown in black in Figure 3) is to train a classification system based on machine learning algorithms that can distinguish images with different labels; this approach is easily fooled by adversarial attacks and loses effectiveness. Typically, deep learning models consist of multiple convolutional layers and a fully connected layer. $S$ denotes the scoring function of the original classifier: a deterministic mapping from the image $x$ to a probability distribution over the $k$ labels, with $Y(x) = \arg\max(S(x))$. The main reason for vulnerability to adversarial examples is the unbounded sensitivity of $S$ to p-norm changes in the input. By adding calibration noise, $S$ is converted into a random function $F_S$ whose expected output has limited sensitivity to p-norm changes in the input. We achieve this by introducing a filter layer (shown in blue in Figure 3), which first performs a preliminary Gaussian filtering operation. This step weakens high-frequency noise interference in the image and reduces its adverse impact on the model training process. The preliminarily filtered image mainly retains low-frequency information, namely the core structure and key features of the image, ensuring that the model can more accurately extract effective information from it. Subsequently, we apply Gaussian filtering again to the preliminarily filtered image, forming a double-layer filtering mechanism. This step further strengthens the image against complex and ever-changing adversarial attacks. Through double-layer filtering, we can further suppress potential noise interference and improve the robustness of the model while preserving the core information of the image. It should be noted that although the double-layer Gaussian filter strategy can significantly improve the robustness of the model, in practical applications we also need to pay attention to its potential impact on recognition accuracy. When applying this strategy, the trade-off between defense effectiveness and recognition accuracy should therefore be balanced according to the specific requirements and scenario, in order to optimize overall model performance.
Training a robust network with a filter layer is similar to training the original network: we use the original loss and optimizers, such as stochastic gradient descent (SGD). The major difference is that we change the input of the network to limit its sensitivity to p-norm changes in the input. We use image filtering techniques to convert $S$ into another function $F_S$ with limited sensitivity to p-norm changes in the input. Note that during training, we use $F_S$ in a single optimization to predict the true label for a training example $x$. The prediction output $Y(x)$ is robust only when the true-label score is significantly higher than the other label scores. By pushing the network to give high scores to the true label $k$ at points around $x$, the goal of training is to increase the expected prediction score $\mathbb{E}\big(F_{S_{i=k}}(x)\big)$ of the true label and reduce the expected prediction scores $\mathbb{E}\big(F_{S_{i \neq k}}(x)\big)$ of the other labels.
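To make the training procedure concrete, the sketch below compiles and fits a defended classifier exactly as one would the original network. Here, `defended_model` is assumed to be any Keras classifier whose input path already contains the filtering layers (see the layer sketch in Section 3.3); the SGD settings, batch size, and 8:2 split follow the configuration reported later in the paper.

```python
import tensorflow as tf

def train_defended(defended_model, x_train, y_train, epochs=100):
    """Train a classifier whose input path already contains the filtering layers.

    The loss and optimizer are the same as for the undefended network; only the
    input processing differs, limiting sensitivity to p-norm changes in the input.
    """
    defended_model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.01, momentum=0.9),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return defended_model.fit(
        x_train, y_train,
        batch_size=16,            # batch size used in the experiments
        epochs=epochs,
        validation_split=0.2,     # 8:2 training/validation split
    )
```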

3.2. Filtering Area

The sensitivity of the expected output to changes in the input layer depends on the location of the filter layer in the network. In fact, it is feasible to insert the filter layer anywhere in the network. The most straightforward placement of the filter layer is just after the input layer, which is equivalent to adding noise to individual pixels of the image.
Image filtering technology is mainly used to suppress noise in the target image while preserving image details as much as possible. This is an indispensable operation in image preprocessing, and the quality of its processing directly affects the effectiveness and reliability of subsequent image processing and analysis. Common image filtering methods include linear filtering (such as mean filtering and Gaussian filtering) and nonlinear filtering (such as median filtering and bilateral filtering). Each filtering method has its specific application scenarios and advantages. For example, median filtering can effectively remove salt and pepper noise, while bilateral filtering can preserve edge information while smoothing the image.
The area where image filters calculate pixels is usually referred to as the “filtering area” or “neighborhood”. The filter will perform operations on the pixels in this area, and the result of the operation is used to determine the pixel value of the new image. The filtering area is designed as an a × b rectangle. The filter scans the entire image in order from left to right and from top to bottom, as shown in Figure 4.
By using a filtering layer, the output image has the same size as the input image, and each pixel of the output image is influenced by its neighboring pixels. Given an input image of size $n \times n$ and a filtering area of size $a \times b$, the output image of size $n \times n$ is calculated from all pixels within the filtering area. Let $S_{x,y}$ denote the set of coordinates of the input image covered by the filtering area of the filtered image at point $(x, y)$. Specifically, the filtering area of the image filter is defined as Formula (3), as follows:
$$S_{x,y} = \left\{ (u, v) \,\middle|\, u \in \left[ x - \tfrac{a}{2},\, x + \tfrac{a}{2} \right) \cap [0, n),\ v \in \left[ y - \tfrac{b}{2},\, y + \tfrac{b}{2} \right) \cap [0, n),\ u, v \in \mathbb{N} \right\} \tag{3}$$
where $(u, v)$ are the coordinates within the filtering area. The filtering area is a rectangle of size $a \times b$ formed by the center pixel $(x, y)$ and its neighboring pixels.
When processing the edges of the input image, it is necessary to expand the image boundary to ensure the completeness of the filtering area. The padding method of the image boundary affects the effect of the image filter. Ideally, we expect the image filter to be unaffected by the padding value. During the experiment, it is feasible to fill the border of the input image with 0.
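A minimal Python sketch of the neighborhood in Formula (3) is given below; coordinates outside the image are simply dropped, which matches the intersection with [0, n) in the formula (in the convolutional implementation, the corresponding border pixels are zero-padded instead).

```python
def filtering_area(x, y, a, b, n):
    """Coordinates S_{x,y} of an a-by-b filtering area centred at (x, y) in an n-by-n image (Formula (3))."""
    return [
        (u, v)
        for u in range(max(x - a // 2, 0), min(x + (a + 1) // 2, n))  # rows inside both the window and [0, n)
        for v in range(max(y - b // 2, 0), min(y + (b + 1) // 2, n))  # columns inside both the window and [0, n)
    ]

# Example: a 3 x 3 area centred on a corner pixel of a 28 x 28 image keeps only the 4 valid neighbours.
print(filtering_area(0, 0, 3, 3, 28))  # [(0, 0), (0, 1), (1, 0), (1, 1)]
```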

3.3. Filter Layer

An image filter works by moving through the image pixel by pixel and replacing each pixel value with a function of all pixels in its receptive field, computed by a feature mapping function [53,54]. The Gaussian filter is a widely used method. We first apply a Gaussian filter in the filtering layer to weaken high-frequency noise interference in the image, where the operation of the image filter can be expressed as Formula (4), as follows:
$$Q(x, y) = \underset{(u,v) \in S_{x,y}}{\mathrm{Gaussian}} \big( P(u, v) \big) \tag{4}$$
where $S_{x,y}$ is the coordinate set of the filtering area of the input image at point $(x, y)$, and $P(u, v)$ is the pixel value of the input image at point $(u, v)$.
A Gaussian filter can then be applied again to effectively smooth remaining noise in the image while maintaining natural transitions at the edges, thereby improving the overall visual quality of the image, as shown in Formula (5):
$$\mathrm{Filter}(x, y) = \underset{(u,v) \in S_{x,y}}{\mathrm{Gaussian}} \big( Q(u, v) \big) \tag{5}$$
where $\mathrm{Filter}(x, y)$ is the pixel value of the filtered image produced by the filter layer. Our method not only limits the influence of extreme pixel values but also preserves information from the original image to prevent distortion.
The double-layer Gaussian filter strategy has shown significant effectiveness in improving model robustness. However, in practical applications, we still need to deeply weigh the subtle relationship between defense effectiveness and recognition accuracy to ensure that the optimization and balance of model performance can be effectively achieved.
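The Keras layer below is one possible realization of Formulas (4) and (5): a fixed, non-trainable Gaussian kernel applied to every channel through a depthwise convolution with zero (“SAME”) padding, so that stacking two such layers immediately after the input yields the double-layer Gaussian filter. The default kernel size of 3 and standard deviation of 1 mirror the best configuration reported in the experiments, but the class itself is a sketch rather than the authors’ exact implementation.

```python
import numpy as np
import tensorflow as tf

def gaussian_kernel(size=3, sigma=1.0):
    """2-D Gaussian kernel, normalized to sum to 1."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return (kernel / kernel.sum()).astype(np.float32)

class GaussianFilterLayer(tf.keras.layers.Layer):
    """Fixed Gaussian filter on each channel (Formula (4)); two stacked layers give Formula (5)."""

    def __init__(self, size=3, sigma=1.0, **kwargs):
        super().__init__(**kwargs)
        self.size, self.sigma = size, sigma

    def build(self, input_shape):
        channels = int(input_shape[-1])
        k = gaussian_kernel(self.size, self.sigma)
        # depthwise_conv2d expects shape (height, width, in_channels, channel_multiplier).
        k = np.tile(k[:, :, None, None], (1, 1, channels, 1))
        self.kernel = tf.constant(k)  # fixed, non-trainable weights

    def call(self, inputs):
        # "SAME" padding zero-pads the image borders, as described in Section 3.2.
        return tf.nn.depthwise_conv2d(inputs, self.kernel, strides=[1, 1, 1, 1], padding="SAME")
```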

3.4. Noise Injection

Adversarial samples present a significant threat to the resilience of classifier models. To bolster the model’s robustness, researchers are diligently investigating a range of defense strategies, with noise injection [40] technology emerging as a particularly effective method. Drawing inspiration from this approach [40], we too have implemented a noise injection mechanism.
Figure 5 depicts the process of incorporating noise injection during the training data phase. The core idea of the noise injection mechanism is to introduce an appropriate amount of noise during the training phase of the classifier model, so that the model adapts to a noisy environment during training and thereby gains robustness in practical applications. During training, we first inject Gaussian noise into a copy of the original training data, yielding two parts: the noiseless data and the noisy data. These two parts are then merged to form a new training set. In this way, the classifier model is exposed to data with different noise levels during training, improving its robustness in practical applications.
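A minimal sketch of this augmentation step is shown below. The clipping to [0, 1] and the interpretation of the noise intensity σ in 8-bit pixel units (hence the division by 255) are assumptions of the sketch.

```python
import numpy as np

def inject_noise(x_train, y_train, sigma=3.0):
    """Augment the training set with a Gaussian-noised copy of the data (Figure 5)."""
    noisy = x_train + np.random.normal(0.0, sigma / 255.0, size=x_train.shape)
    noisy = np.clip(noisy, 0.0, 1.0).astype(x_train.dtype)
    # Merge the noiseless and noisy parts into one new training set (doubling its size).
    x_aug = np.concatenate([x_train, noisy], axis=0)
    y_aug = np.concatenate([y_train, y_train], axis=0)
    return x_aug, y_aug
```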

3.5. Black Box Attack Environment Settings

In real attack scenarios, attackers usually do not know the specific structure and parameters of the classifier, which behaves like a black box, completely opaque to the attacker. Because the architecture and parameters of the classifier are unknown to attackers, they usually rely on a substitute model to directly generate adversarial samples against the target model [55]. Among deep learning models, convolutional neural networks perform well in image recognition and classification tasks, with high transferability and scalability [56,57,58]. Therefore, in the experiments, we used a convolutional neural network architecture as the substitute model for the MNIST dataset [59]; for the CIFAR10 and CIFAR100 datasets [60], we chose the VGG16 [47] network architecture as the substitute model.

3.6. Dataset

This paper validates the effectiveness of the proposed defense model through experiments conducted on publicly available image datasets, namely MNIST [59], CIFAR10, and CIFAR100 [60]. The MNIST dataset consists of a training set of 60,000 samples and a test set of 10,000 samples. The CIFAR10 dataset consists of 60,000 color images from 10 categories, each containing 6000 images. The CIFAR100 dataset is similar to CIFAR10 but contains 100 categories, each with 600 images.
The ImageNet [61] dataset is an extensive repository of images, predominantly utilized for research in computer vision and machine learning. This database was established in 2009 by Professor Li of Stanford University, alongside other contributors, with the goal of fostering progress in image recognition technology. ImageNet encompasses millions of images, each meticulously annotated and representing thousands of distinct categories, with hundreds of images per category. MiniImageNet [62], a derivative subset of the renowned ImageNet dataset, is scaled down to facilitate the study of algorithms that learn from limited samples. The miniImageNet dataset comprises 100 categories of images, with 600 images per category, amounting to a total of 60,000 images. These images are partitioned into training, validation, and testing sets, enabling researchers to assess the generalization capabilities of their models across various tasks. Our intention is to verify the efficacy of our method using the miniImageNet dataset.

3.7. Hardware and Methods Used in the Simulation

In this paper, all experiments were conducted with the publicly available TensorFlow 2.10 framework. The experiments were run on a machine with the Windows 11 operating system, a 14-core 2.30 GHz CPU, an NVIDIA GeForce RTX 3060 Laptop GPU, and 16 GB of memory. On the MNIST dataset, we trained two different classifiers: a convolutional neural network (CNN) [41] and a deep neural network (DNN) [46], tuning the parameters of each layer with the Adam optimizer at a fixed learning rate of 0.01. On the CIFAR-10 dataset, we trained three different classifiers: the deep neural network, VGG16 [47], and a residual network (ResNet50) [48]. On the CIFAR-100 dataset, we trained two classifiers (VGG16 and the residual network). Except for the deep neural network, whose parameters are tuned with the Adam optimizer at a fixed learning rate of 0.001, all other classifiers are trained with stochastic gradient descent (SGD) with a momentum of 0.9 and a learning rate of 0.01. Except for the final classification layer, all activation units use the rectified linear unit (ReLU) activation function, and the batch size is set to 16. To generate adversarial images, the black-box model for the MNIST dataset uses a convolutional neural network architecture, while the black-box models for the CIFAR-10 and CIFAR-100 datasets use the VGG16 architecture.

4. Experimental Results

4.1. Experimental Setup

In this experimental study of adversarial attacks, we establish a prerequisite assumption: the attacker is completely unaware of the existence of the filtering layer. The study reproduces six typical adversarial attack methods: FGSM, BIM, PGD, C&W, AutoAttack, and VNIFGSM. In selecting attack samples, we adopted a rigorous approach to ensure that all generated test samples effectively cause the black-box classifier to misclassify. Table 1 details the number of test samples used in the different attack scenarios.
To ensure that the perturbations are visually imperceptible, the attack strength is set to $\epsilon = 8/255$ for all attacks. This value maintains the integrity of the image information as far as possible while introducing minimal disturbances that are difficult to detect with the naked eye. We then used the six attack methods to generate adversarial samples for all test samples in each dataset and classified these adversarial samples with the trained black-box classifier. Finally, we gathered all adversarial samples that were misclassified into one dataset, which serves as the test set for the subsequent experiments, as sketched below.
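The selection procedure can be sketched as follows; `attack_fn` stands for any of the six attack implementations (for example, the FGSM sketch in Section 2.2), and only samples that actually fool the black-box substitute are kept.

```python
import tensorflow as tf

def build_adversarial_test_set(attack_fn, black_box, x_test, y_test, epsilon=8 / 255):
    """Keep only the adversarial samples that the black-box substitute misclassifies."""
    x_adv = attack_fn(black_box, x_test, y_test, epsilon)
    preds = tf.argmax(black_box(x_adv, training=False), axis=-1)
    fooled = tf.not_equal(preds, tf.cast(y_test, preds.dtype))  # successful attacks only
    return tf.boolean_mask(x_adv, fooled), tf.boolean_mask(y_test, fooled)
```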
To evaluate the performance of classifiers in the face of adversarial attacks, we usually use test datasets that have not participated in training to generate adversarial samples. The difficulty level of generating adversarial samples varies among different datasets. The MNIST dataset focuses on handwritten digit recognition, with smaller image sizes and relatively single categories. In this context, the accuracy of the black box classifier can reach up to 98.17%. Therefore, for the MNIST dataset, when using adversarial attack methods such as FGSM and BIM, the number of generated adversarial samples is relatively small. In contrast, the CIFAR100 dataset contains larger and more diverse types of image data. On this dataset, the accuracy of the black box classifier reached 70.25%. Due to the relatively limited performance of the classifier, in this case, applying a small degree of disturbance to the image may induce misjudgment by the classifier. Therefore, on the CIFAR100 dataset, using attack methods such as FGSM and BIM can more easily generate a large number of adversarial samples.

4.2. Combination of Multilayer Image Filters

In the field of image processing, the role of filters is crucial. They assist us in removing noise from images, improving image clarity, and reducing the impact of adversarial noise generated by adversarial attacks. In this experiment, we selected five common filters for comparative testing. These five types of filters include average filters, Gaussian filters, maximum filters, minimum filters, and median filters. These filters have their own characteristics in handling image noise, edge preservation, and other aspects.
In order to comprehensively evaluate the performance of these five filters, this experiment tested all possible one- and two-layer filter combinations. During the experiment, the image classifier was a CNN, and the filtering area size of all image filters was 5 × 5, which can effectively handle noise in the image. In addition, the standard deviation of the Gaussian filter was set to 1. For methods using only one filter layer, the second-layer filter is marked as “null”. By comparing and experimentally verifying the different filters, we can gain a deeper understanding of their advantages and disadvantages.
Table 2 and Table 3 show the average accuracy of each image filter combination defense model in six adversarial attack scenarios on the MNIST and CIFAR10 datasets, respectively. The experimental results show that the “Gaussian-Gaussian” image filter defense model performs best, with an average accuracy of 66.7%. This configuration uses a two-layer Gaussian filter. Gaussian filters are very effective at suppressing the adversarial perturbations generated by adversarial attacks, and they smooth the entire image by reducing the intensity variation between neighboring pixels.

4.3. Performance of the Proposed Defense

To ascertain the efficacy of the proposed methodology, this study chose two filter-based defense strategies (DIP [39] and LPIF [40]) for a comparative analysis. To guarantee the thoroughness and dependability of the experimental outcomes, six prevalent adversarial attack techniques were selected: FGSM, PGD, BIM, C&W attack, AutoAttack, and VNIFGSM. These attack methods are capable of providing a comprehensive assessment of the performance of various defense strategies across diverse attack scenarios.
Table 4 depicts the accuracy performance of image classifiers utilizing diverse defense strategies within six unique attack scenarios. Our research findings suggest that our method surpasses the other two defense approaches, exhibiting enhanced accuracy and robustness. Notably, DIP experiences a considerable drop in performance when handling color images, such as those in the CIFAR-10 and CIFAR-100 datasets, due to the reliance on fixed threshold techniques that frequently lead to the loss of critical details in color images. Furthermore, we subjected our defense technology to rigorous testing across multiple datasets and various deep learning architectures, and the results confirmed its robustness in practical deployment.

4.4. Evaluation of the Impact of Noise Injection on the Robustness of Classifiers

The noise injection mechanism, as an effective defense strategy, makes the model robust in the face of adversarial attacks. We tested the performance of the noise injection mechanism in defense strategies on the CIFAR10 dataset and analyzed its effectiveness and applicability through experiments. Throughout all testing scenarios, the CNN served as the image classification model, while the Adam optimizer was employed, utilizing a consistent learning rate of 0.01.
In order to improve the performance of the classifier model under noise interference, we added different intensities of noise to the training set. Figure 6 shows the detection results of the classifier model after adding noise to the training set. The black dashed line represents defense methods that have not adopted noise injection mechanisms. Through comparison, it can be found that the defense method using noise injection mechanism has significantly improved performance under noise interference. This indicates that injecting noise during the training process can significantly improve the robustness of the classifier model.
To further analyze the impact of noise injection intensity on defense effectiveness, we tested defense effectiveness for noise intensities σ from 1 to 10. The experimental results show that as the noise intensity increases, the performance of the defense model improves further, indicating that within a certain range, increasing the noise intensity can improve defense effectiveness.
However, when the noise intensity exceeds a certain level, the defense performance will actually decrease. This is because excessive noise intensity can interfere with the useful information learned by the model, leading to performance degradation. In the CIFAR10 dataset, we found that the optimal noise intensity is 3, at which point the defense performance reaches its optimal level.

4.5. The Size of the Filtering Area and Standard Deviation

Gaussian filters can effectively smooth images and suppress adversarial noise. In the design of a Gaussian filter, two key parameters are the size of the filtering area and the standard deviation; these largely determine the filter’s effect, so we analyze both in this experiment. Throughout all testing scenarios, the CNN served as the image classification model, and the Adam optimizer was employed with a consistent learning rate of 0.01.
The size of the filtering area refers to the size of the area covered when filtering an image. Obviously, the larger the filtering area, the larger the area covered by the filter on the image. This also means that the range of filtering effects will be affected accordingly. In order to analyze the impact of filtering area size on defense models, five different filtering area sizes were used in this experiment, namely 3 × 3, 5 × 5, 9 × 9, 13 × 13, and 17 × 17.
The standard deviation is an important parameter of the Gaussian function, determining its degree of spread. Adjusting the standard deviation adjusts the influence of surrounding pixels on the current pixel: increasing it increases the influence of distant pixels on the central pixel. In other words, the larger the standard deviation, the more pronounced the spread of the Gaussian function. In practical applications, the standard deviation can therefore be adjusted as needed to achieve the desired filtering effect. In this experiment, we used four different standard deviation values σ, namely 1, 2, 5, and 10.
Table 5 details the average accuracy of the model under different combinations of filtering area size and standard deviation. Through comparative analysis of multiple sets of experimental data, we found that when the filtering area size is 3 × 3 and the standard deviation is 1, the average accuracy of the multilayer filter defense model reaches its highest value of 77.6%. A reasonable combination of filtering area size and standard deviation helps the multilayer filter defense model achieve better defense effectiveness.

4.6. Number of Layers for Multilayer Image Filters

The multilayer filter defense model uses Gaussian filters to suppress the impact of adversarial noise on the model. As the number of filter layers increases, the robustness of the model is improved. This is because each layer of the filter can further clean the input data, remove noise, and make the model more stable. However, we need to note that increasing the number of layers is not entirely an advantage. When there are too many layers, the generalization ability of the model to normal samples will be affected, resulting in poor performance of the model in practical applications. Therefore, balancing the robustness and generalization ability of the model has become a key issue.
In order to further investigate this issue, this study conducted a comparative analysis on multiple models with layers ranging from 1 to 5. In every testing scenario, CNN served as the image classification framework for both MNIST and CIFAR10 datasets, whereas ResNet50 was utilized for the CIFAR100 dataset. All methodologies employed a 3 × 3 dimensional filtering area. The Gaussian filter’s standard deviation was established at 1, and a noise injection strategy was incorporated into the model’s training process. As illustrated in Table 6, the average accuracy comparison results for each layer model on the three datasets are displayed across six distinct adversarial attack scenarios. Based on the data observed, we deduce that the model’s performance attains its peak when the quantity of filter layers is 2. This finding indicates that within a specific range, augmenting the number of filtering layers can indeed enhance the model’s robustness; however, an excessive number of layers can lead to a decline in the model’s generalization capabilities.

4.7. The Effectiveness of the Method

To evaluate the effectiveness of our proposed method in modern network environments, we not only tested it on the CIFAR10 dataset but also extended the experiments to more complex image classification tasks, such as the miniImageNet dataset. On miniImageNet, we used the EfficientNet model. Within the EfficientNet family, ranging from B0 to B7, there is a progressive increase in model parameters, which is accompanied by a corresponding enhancement in accuracy. Nevertheless, due to constraints related to device capabilities and graphics card memory resources, we ultimately opted for the EfficientNet-B0 model.
In addition, to further validate the performance of our method, we compared it with two defense methods that utilize diffusion models. We adhere to their original experimental setup and use the PyTorch implementations provided by Wang et al. (https://github.com/wzekai99/DM-Improves-AT, accessed on 1 September 2024) and DiffPure (https://github.com/NVlabs/DiffPure, accessed on 1 September 2024). The classification model employed is WideResNet [51]. Given the constraints of our device’s performance and available graphics card memory, we limited ourselves to 1M generated data for producing denoised images and trained a WRN-28-10 model on the CIFAR10 dataset with this 1M generated data. For optimization, we used the SGD optimizer with Nesterov momentum, setting the momentum factor to 0.9 and the weight decay to $5 \times 10^{-4}$. We implemented a cyclic learning rate schedule with cosine annealing, starting with an initial learning rate of 0.1, and set the dropout rate to 0.3. To evaluate performance, we consider two metrics: clean accuracy and robust accuracy. Clean accuracy evaluates defense methods on clean data, measured across the entire test set, while robust accuracy evaluates the model’s resistance to AutoAttack.
Table 7 shows the testing accuracy under clean images and AutoAttack on the CIFAR-10 and ImageNet datasets. The experimental results show that our method has comparable accuracy on clean data to existing advanced methods, while significantly improving robustness on adversarial samples. In particular, on the ImageNet dataset, our method shows a significantly smaller decrease in accuracy compared to other comparison methods when subjected to AutoAttack attacks. This indicates that our method not only performs well on small-scale datasets but also demonstrates strong robustness on large-scale datasets and in the face of complex attacks.

4.8. Analysis of Time Consumption in Model Training

The time consumption of model training, which encompasses the total time expended to complete the entire training process, is a pivotal metric for gauging the efficiency of machine learning and deep learning models. We assessed classifier performance across seven distinct scenarios. Specifically, we recorded the time consumption of the baseline classifier, the classifier equipped solely with multilayer filters, the classifier with only the noise injection mechanism, and our proposed classifier. In the experimental configuration, the CNN and DNN models were trained for 20 iterations on the MNIST dataset; on the CIFAR10 dataset, the CNN and VGG16 were trained for 100 iterations and ResNet50 for 30 iterations; on the CIFAR100 dataset, VGG16 was trained for 200 iterations and ResNet50 for 18 iterations. We employed the three datasets, splitting each into training and validation subsets at an 8:2 ratio. To mitigate experimental inconsistencies, we repeated model training three times for each classification methodology.
Table 8 provides an overview of the time expenses required to train the model across seven distinct scenarios. A comparative analysis reveals that the additional time overhead resulting from the incorporation of multilayer filters is relatively minor, with a growth rate of less than 10%. However, the majority of the time is spent during the noise injection phase, where the mechanism doubles the size of the training dataset. Although the noise injection mechanism is essential for enhancing the model’s robustness, it does lead to considerably longer training durations. Future research could investigate more efficient noise injection strategies or refined training methodologies to reduce this time overhead, aiming to achieve a more favorable balance between model robustness and training efficiency.

5. Discussion

We propose a method using multilayer filter technology aimed at significantly improving the robustness of the image classifier model. Specifically, we embedded filtering layers in the vulnerable classifier network, thus constructing a powerful defense model that can effectively resist various types of attacks. At the same time, during the training phase of the classifier, we also introduced a noise injection mechanism to further strengthen the model’s defense ability. Through multiple experimental comparisons, we found that the multilayer filter defense model using a double-layer Gaussian filter with a filtering area of 3 × 3 and a standard deviation of 1 achieved average accuracies of 92.7%, 70.5%, and 64.9% on the MNIST, CIFAR10, and CIFAR100 datasets, respectively. To verify the practical effectiveness of this method, we designed six adversarial attack scenarios on the MNIST, CIFAR10, and CIFAR100 datasets, and conducted comprehensive comparative tests between our model and two filter-based defense models DIP and LPIF. The experimental outcomes reveal that our integrated method maintains an average accuracy of 71.9% across the six adversarial attack scenarios (FGSM, BIM, PGD, C&W, AutoAttack, and VNIFGSM). The results indicate that by introducing additional filtering layers and combining noise injection mechanisms, we have successfully achieved stronger robustness in learning-based image classification systems under resource constraints. Notably, this method excels not only on small-scale datasets but also exhibits robustness on large-scale datasets (miniImageNet) and modern models (EfficientNet and WideResNet). Upon assessing the time required for model training, we determined that the additional time incurred by the inclusion of multi-layer filters is relatively minor, with an increase of under 10%. Nonetheless, the primary time expenditure during training stems from the noise injection phase, which results in a doubling of the dataset size. While noise injection enhances model robustness, it also notably extends the training duration. Consequently, future research may wish to investigate more efficient noise injection methodologies or refine training techniques to mitigate time costs and attain a harmonious balance between model robustness and training efficiency.

Author Contributions

Conceptualization, M.W. and Z.L.; methodology, M.W. and Z.L.; software, M.W.; validation, M.W. and Z.L.; formal analysis, M.W.; investigation, M.W.; resources, M.W.; data curation, M.W.; writing—original draft preparation, M.W.; writing—review and editing, Z.L.; visualization, Z.L.; supervision, Z.L.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MNIST data presented in this study are openly available at http://yann.lecun.com/exdb/mnist/ (accessed on 1 September 2024) [59]. The CIFAR-10 and CIFAR-100 data presented in this study are openly available at http://www.cs.toronto.edu/~kriz/cifar.html (accessed on 1 September 2024) [60].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ahmed, M.W.; Jalal, A. Robust Object Recognition with Genetic Algorithm and Composite Saliency Map. In Proceedings of the 2024 5th International Conference on Advancements in Computational Sciences (ICACS), Lahore, Pakistan, 19–20 February 2024; pp. 1–7. [Google Scholar]
  2. Dhaif, Z.S.; El Abbadi, N.K. A Review of Machine Learning Techniques Utilised in Self-Driving Cars. Iraqi J. Comput. Sci. Math. 2024, 5, 205–219. [Google Scholar]
  3. Tian, Y.; Fan, L.; Chen, K.; Katabi, D.; Krishnan, D.; Isola, P. Learning vision from models rivals learning vision from data. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 17–21 July 2024; pp. 15887–15898. [Google Scholar]
  4. Wang, W.; Sun, Y.; Li, W.; Yang, Y. Transhp: Image classification with hierarchical prompting. Adv. Neural Inf. Process. Syst. 2024, 36, 28187–28200. Available online: https://openreview.net/forum?id=vpQuCsZXz2 (accessed on 1 September 2024).
  5. Costa, J.C.; Roxo, T.; Proença, H.; Inácio, P.R.M. How Deep Learning Sees the World: A Survey on Adversarial Attacks and Defenses. IEEE Access 2024, 12, 61113–61136. [Google Scholar] [CrossRef]
  6. Liang, H.; He, E.; Zhao, Y.; Jia, Z.; Li, H. Adversarial attack and defense: A survey. Electronics 2022, 11, 1283. [Google Scholar] [CrossRef]
  7. Zhou, S.; Liu, C.; Ye, D.; Zhu, T.; Zhou, W.; Yu, P.S. Adversarial attacks and defenses in deep learning: From a perspective of cybersecurity. ACM Comput. Surv. 2022, 55, 1–39. [Google Scholar] [CrossRef]
  8. Pinot, R.; Ettedgui, R.; Rizk, G.; Chevaleyre, Y.; Atif, J. Randomization matters How to defend against strong adversarial attacks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 7717–7727. [Google Scholar]
  9. Wang, Z.; Guo, H.; Zhang, Z.; Liu, W.; Qin, Z.; Ren, K. Feature importance-aware transferable adversarial attacks. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual Event, 11–17 October 2021; pp. 7639–7648. [Google Scholar]
  10. Yuan, Z.; Zhang, J.; Jia, Y.; Tan, C.; Xue, T.; Shan, S. Meta gradient adversarial attack. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Virtual Event, 11–17 October 2021; pp. 7748–7757. [Google Scholar]
  11. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. Available online: http://arxiv.org/abs/1412.6572 (accessed on 1 September 2024).
  12. Wang, X.; He, K. Enhancing the transferability of adversarial attacks through variance tuning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Virtual Event, 11–17 October 2021; pp. 1924–1933. [Google Scholar]
  13. Taran, O.; Rezaeifar, S.; Holotyak, T.; Voloshynovskiy, S. Defending against adversarial attacks by randomized diversification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11226–11233. [Google Scholar]
  14. Melis, M.; Demontis, A.; Biggio, B.; Brown, G.; Fumera, G.; Roli, F. Is deep learning safe for robot vision? adversarial examples against the icub humanoid. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 751–759. [Google Scholar]
  15. Smutz, C.; Stavrou, A. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors. In Proceedings of the NDSS, San Diego, CA, USA, 21–24 February 2016. [Google Scholar]
  16. Kundu, S.; Nazemi, M.; Beerel, P.A.; Pedram, M. DNR: A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs. In Proceedings of the 26th Asia and South Pacific Design Automation Conference, ASPDAC ’21, Tokyo, Japan, 18–21 January 2021; pp. 344–350. [Google Scholar] [CrossRef]
  17. Chen, E.; Lee, C. LTD: Low Temperature Distillation for Robust Adversarial Training. arXiv 2021, arXiv:2111.02331. Available online: http://arxiv.org/abs/2111.02331 (accessed on 1 September 2024).
  18. Gao, X.; Saha, R.K.; Prasad, M.R.; Roychoudhury, A. Fuzz testing based data augmentation to improve robustness of deep neural networks. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 1147–1158. [Google Scholar]
  19. Theagarajan, R.; Bhanu, B. Defending Black Box Facial Recognition Classifiers Against Adversarial Attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 812–813. [Google Scholar]
20. Meng, D.; Chen, H. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 135–147. [Google Scholar]
  21. Zhou, D.; Wang, N.; Peng, C.; Gao, X.; Wang, X.; Yu, J.; Liu, T. Removing Adversarial Noise in Class Activation Feature Space. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual Event, 11–17 October 2021; pp. 7878–7887. [Google Scholar]
  22. Abusnaina, A.; Wu, Y.; Arora, S.; Wang, Y.; Wang, F.; Yang, H.; Mohaisen, D. Adversarial Example Detection Using Latent Neighborhood Graph. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Virtual Event, 11–17 October 2021; pp. 7687–7696. [Google Scholar]
  23. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
  24. Ho, C.H.; Vasconcelos, N. DISCO: Adversarial defense with local implicit functions. Adv. Neural Inf. Process. Syst. 2022, 35, 23818–23837. [Google Scholar]
  25. Dhillon, G.S.; Azizzadenesheli, K.; Lipton, Z.C.; Bernstein, J.D.; Kossaifi, J.; Khanna, A.; Anandkumar, A. Stochastic Activation Pruning for Robust Adversarial Defense. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  26. Xie, C.; Wang, J.; Zhang, Z.; Ren, Z.; Yuille, A. Mitigating Adversarial Effects Through Randomization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  27. Lecuyer, M.; Atlidakis, V.; Geambasu, R.; Hsu, D.; Jana, S. Certified robustness to adversarial examples with differential privacy. In Proceedings of the 2019 IEEE Symposium on Security and Privacy (SP), San Francisco, CA, USA, 19–23 May 2019; pp. 656–672. [Google Scholar]
  28. Cohen, J.; Rosenfeld, E.; Kolter, Z. Certified adversarial robustness via randomized smoothing. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 1310–1320. [Google Scholar]
29. Li, B.; Chen, C.; Wang, W.; Carin, L. Certified adversarial robustness with additive noise. Adv. Neural Inf. Process. Syst. 2019, 32, 9459–9469. [Google Scholar]
30. Wang, B.; Yuan, B.; Shi, Z.; Osher, S.J. ResNets ensemble via the Feynman-Kac formalism to improve natural and robust accuracies. In Proceedings of the 33rd Conference on Neural Information Processing Systems (NeurIPS 2019), Vancouver, BC, Canada, 8–14 December 2019. [Google Scholar]
  31. Wu, Q.; Ye, H.; Gu, Y. Guided diffusion model for adversarial purification from random noise. arXiv 2022, arXiv:2206.10875. Available online: http://arxiv.org/abs/2206.10875 (accessed on 1 September 2024).
  32. Xiao, C.; Chen, Z.; Jin, K.; Wang, J.; Nie, W.; Liu, M.; Anandkumar, A.; Li, B.; Song, D. DensePure: Understanding Diffusion Models towards Adversarial Robustness. In Proceedings of the Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022, New Orleans, LA, USA, 28 November–9 December 2022. [Google Scholar]
  33. Wang, Z.; Pang, T.; Du, C.; Lin, M.; Liu, W.; Yan, S. Better Diffusion Models Further Improve Adversarial Training. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Krause, A., Brunskill, E., Cho, K., Engelhardt, B., Sabato, S., Scarlett, J., Eds.; Volume 202, pp. 36246–36263. [Google Scholar]
  34. Karras, T.; Aittala, M.; Aila, T.; Laine, S. Elucidating the design space of diffusion-based generative models. Adv. Neural Inf. Process. Syst. 2022, 35, 26565–26577. [Google Scholar]
  35. Rakin, A.S.; Yi, J.; Gong, B.; Fan, D. Defend deep neural networks against adversarial examples via fixed and dynamic quantized activation functions. arXiv 2018, arXiv:1807.06714. Available online: http://arxiv.org/abs/1807.06714 (accessed on 1 September 2024).
  36. Khalid, F.; Ali, H.; Tariq, H.; Hanif, M.A.; Rehman, S.; Ahmed, R.; Shafique, M. QuSecNets: Quantization-based Defense Mechanism for Securing Deep Neural Network against Adversarial Attacks. In Proceedings of the 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS), Rhodes, Greece, 1–3 July 2019; pp. 182–187. [Google Scholar]
  37. Wu, F.; Yang, W.; Xiao, L.; Zhu, J. Adaptive Wiener Filter and Natural Noise to Eliminate Adversarial Perturbation. Electronics 2020, 9, 1634. [Google Scholar] [CrossRef]
  38. Zhang, H.; Yao, Z.; Sakurai, K. Versatile Defense Against Adversarial Attacks on Image Recognition. arXiv 2024, arXiv:2403.08170. Available online: http://arxiv.org/abs/2403.08170 (accessed on 1 September 2024).
  39. Xiao, Y.; Deng, X.; Yu, Z. Defending against Adversarial Attacks using Digital Image Processing. J. Phys. Conf. Ser. 2023, 2577, 012016. [Google Scholar] [CrossRef]
  40. Ziyadinov, V.; Tereshonok, M. Low-Pass Image Filtering to Achieve Adversarial Robustness. Sensors 2023, 23, 9032. [Google Scholar] [CrossRef]
  41. Dumoulin, V.; Visin, F. A guide to convolution arithmetic for deep learning. arXiv 2016, arXiv:1603.07285. Available online: http://arxiv.org/abs/1603.07285 (accessed on 1 September 2024).
  42. Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533. Available online: http://arxiv.org/abs/1607.02533 (accessed on 1 September 2024).
  43. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083. Available online: http://arxiv.org/abs/1706.06083 (accessed on 1 September 2024).
  44. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the 2017 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
  45. Croce, F.; Hein, M. Reliable evaluation of adversarial robustness with an ensemble of diverse parameter-free attacks. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual Event, 13–18 July 2020; pp. 2206–2216. [Google Scholar]
46. Moosavi-Dezfooli, S.M.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582. [Google Scholar]
  47. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556. Available online: http://arxiv.org/abs/1409.1556 (accessed on 1 September 2024).
  48. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  49. Tan, M.; Le, Q.V. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks. arXiv 2019, arXiv:1905.11946. Available online: http://arxiv.org/abs/1905.11946 (accessed on 1 September 2024).
  50. Tan, M.; Le, Q.V. EfficientNetV2: Smaller Models and Faster Training. arXiv 2021, arXiv:2104.00298. Available online: http://arxiv.org/abs/2104.00298 (accessed on 1 September 2024).
  51. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference, York, UK, 19–22 September 2016; British Machine Vision Association: Durham, UK, 2016. [Google Scholar]
  52. Wang, H.; Wu, X.; Huang, Z.; Xing, E.P. High-frequency component helps explain the generalization of convolutional neural networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8684–8694. [Google Scholar]
  53. Goossens, B.; Pizurica, A.; Philips, W. Image denoising using mixtures of projected Gaussian scale mixtures. IEEE Trans. Image Process. 2009, 18, 1689–1702. [Google Scholar] [CrossRef] [PubMed]
  54. Lysaker, M.; Lundervold, A.; Tai, X.C. Noise removal using fourth-order partial differential equation with applications to medical magnetic resonance images in space and time. IEEE Trans. Image Process. 2003, 12, 1579–1590. [Google Scholar] [CrossRef]
  55. Chung, I.; Park, S.; Kim, J.; Kwak, N. Feature-map-level Online Adversarial Knowledge Distillation. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020; Volume 119, pp. 2006–2015. [Google Scholar]
  56. Chen, G.; Choi, W.; Yu, X.; Han, T.; Chandraker, M. Learning efficient object detection models with knowledge distillation. Adv. Neural Inf. Process. Syst. 2017, 30, 742–751. [Google Scholar]
  57. Furlanello, T.; Lipton, Z.C.; Tschannen, M.; Itti, L.; Anandkumar, A. Born again neural networks. arXiv 2018, arXiv:1805.04770. Available online: http://arxiv.org/abs/1805.04770 (accessed on 1 September 2024).
  58. Orekondy, T.; Schiele, B.; Fritz, M. Knockoff nets: Stealing functionality of black-box models. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 4954–4963. [Google Scholar]
  59. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
60. Krizhevsky, A. Learning Multiple Layers of Features from Tiny Images. Master’s Thesis, University of Toronto, Toronto, ON, Canada, 2009. [Google Scholar]
  61. Russakovsky, O.; Deng, J.; Su, H.; Krause, J.; Satheesh, S.; Ma, S.; Huang, Z.; Karpathy, A.; Khosla, A.; Bernstein, M.; et al. ImageNet Large Scale Visual Recognition Challenge. Int. J. Comput. Vis. (IJCV) 2015, 115, 211–252. [Google Scholar] [CrossRef]
  62. Vinyals, O.; Blundell, C.; Lillicrap, T.; Wierstra, D. Matching networks for one shot learning. Adv. Neural Inf. Process. Syst. 2016, 29, 3637–3645. [Google Scholar]
  63. Nie, W.; Guo, B.; Huang, Y.; Xiao, C.; Vahdat, A.; Anandkumar, A. Diffusion models for adversarial purification. arXiv 2022, arXiv:2205.07460. Available online: http://arxiv.org/abs/2205.07460 (accessed on 1 September 2024).
Figure 1. The architectural layout of the CNN.
Figure 2. The architectural layout of the DNN.
Figure 3. The overall architecture of image recognition defense model with filter layer against adversarial examples.
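To make the pipeline of Figure 3 concrete, the sketch below (our own illustration, not the authors' released code) inserts a fixed, non-trainable Gaussian filtering layer between the input and an arbitrary classifier backbone. The 3 × 3 kernel with σ = 1 and the two-pass ("double-layer") application follow the best setting reported in the paper; the class and function names and the PyTorch framing are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def gaussian_kernel2d(size: int = 3, sigma: float = 1.0) -> torch.Tensor:
    """Return a normalized size x size Gaussian kernel."""
    ax = torch.arange(size, dtype=torch.float32) - (size - 1) / 2
    xx, yy = torch.meshgrid(ax, ax, indexing="ij")
    kernel = torch.exp(-(xx ** 2 + yy ** 2) / (2 * sigma ** 2))
    return kernel / kernel.sum()

class GaussianFilterLayer(nn.Module):
    """Fixed (non-trainable) depthwise Gaussian smoothing applied to the input image."""
    def __init__(self, channels: int, size: int = 3, sigma: float = 1.0, n_layers: int = 2):
        super().__init__()
        weight = gaussian_kernel2d(size, sigma).repeat(channels, 1, 1, 1)  # shape (C, 1, k, k)
        self.register_buffer("weight", weight)
        self.n_layers = n_layers
        self.padding = size // 2

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        for _ in range(self.n_layers):  # "double-layer" filtering = two passes
            x = F.conv2d(x, self.weight, padding=self.padding, groups=x.shape[1])
        return x

class FilteredClassifier(nn.Module):
    """Filtering layer sits between the input and the classifier backbone (Figure 3)."""
    def __init__(self, backbone: nn.Module, channels: int = 3):
        super().__init__()
        self.filter = GaussianFilterLayer(channels)
        self.backbone = backbone  # e.g., a CNN, VGG16, or ResNet50 classifier

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.backbone(self.filter(x))
```

Training the backbone through this wrapper, together with the noise injection sketched after Figure 5, approximates the MLFD pipeline at a schematic level.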
Figure 4. Scanning methods of image filters: the filtering area scans the entire picture in the order from left to right and top to bottom.
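The scan order in Figure 4 can be illustrated with a toy sliding-window filter (a minimal NumPy sketch, not taken from the paper's code): the window visits pixels left to right, then top to bottom, and replaces each one with a statistic of its local neighborhood, here the mean.

```python
import numpy as np

def scan_filter(image: np.ndarray, size: int = 3) -> np.ndarray:
    """Slide a size x size window over a single-channel image (zero padding at the border)."""
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="constant")
    out = np.zeros_like(image, dtype=np.float64)
    rows, cols = image.shape
    for i in range(rows):          # top to bottom
        for j in range(cols):      # left to right
            window = padded[i:i + size, j:j + size]
            out[i, j] = window.mean()  # replace the center pixel with the window statistic
    return out
```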
Figure 5. The process of incorporating noise injection during the training data phase.
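A hedged sketch of the noise-injection step in Figure 5: zero-mean Gaussian noise is added to each normalized training image before it enters the filtered classifier. The function name and the noise standard deviation of 0.05 are illustrative assumptions, not values taken from the paper.

```python
import numpy as np

def inject_noise(batch: np.ndarray, std: float = 0.05, rng=None) -> np.ndarray:
    """Add i.i.d. zero-mean Gaussian noise to a batch of images in [0, 1] and clip back to range."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = batch + rng.normal(0.0, std, size=batch.shape)
    return np.clip(noisy, 0.0, 1.0)

# Example: augment the training set before fitting the filtered classifier.
# x_train_noisy = inject_noise(x_train, std=0.05)
```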
Figure 6. The detection results of the classifier model after adding noise to the training set. The black dashed line represents defense methods that have not adopted noise injection mechanisms.
Table 1. The number of test samples used in different attack scenarios.
Datasets | FGSM | BIM | PGD | CW | AutoAttack | VNIFGSM
MNIST | 1080 | 3013 | 4420 | 5915 | 5876 | 5620
CIFAR-10 | 4509 | 5223 | 5212 | 2293 | 5619 | 5081
CIFAR-100 | 6239 | 6839 | 6825 | 6855 | 6875 | 6866
Table 2. Average accuracy of defense models with different image filter combinations on the MNIST dataset.
First Layer \ Second Layer | Null | Max | Min | Midpoint | Average | Gaussian
Max | 0.86349 | 0.80559 | 0.79002 | 0.23039 | 0.81714 | 0.85185
Min | 0.24844 | 0.23592 | 0.03955 | 0.01234 | 0.23602 | 0.23961
Midpoint | 0.02344 | 0.01972 | 0.01234 | 0.01327 | 0.01921 | 0.01810
Average | 0.85433 | 0.88133 | 0.81258 | 0.07882 | 0.84395 | 0.86244
Gaussian | 0.89178 | 0.86613 | 0.63420 | 0.01234 | 0.85877 | 0.91329
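The five candidate filters compared in Tables 2 and 3 can be prototyped with scipy.ndimage as below (our interpretation, not the authors' code). "Midpoint" is taken as the mean of the local maximum and minimum, the usual image-processing definition, and the Null column corresponds to applying no second-layer filter.

```python
import numpy as np
from scipy import ndimage

# Candidate filters; the Gaussian uses sigma = 1 with truncate = size // 2, so a 3 x 3 window.
FILTERS = {
    "max":      lambda x, s: ndimage.maximum_filter(x, size=s),
    "min":      lambda x, s: ndimage.minimum_filter(x, size=s),
    "midpoint": lambda x, s: 0.5 * (ndimage.maximum_filter(x, size=s)
                                    + ndimage.minimum_filter(x, size=s)),
    "average":  lambda x, s: ndimage.uniform_filter(x, size=s),
    "gaussian": lambda x, s: ndimage.gaussian_filter(x, sigma=1.0, truncate=s // 2),
}

def two_layer_filter(image: np.ndarray, first: str, second=None, size: int = 3) -> np.ndarray:
    """Apply the first-layer filter, then optionally a second one (second=None mirrors the Null column)."""
    out = FILTERS[first](image.astype(np.float64), size)
    if second is not None:
        out = FILTERS[second](out, size)
    return out

# Best MNIST combination in Table 2: Gaussian followed by Gaussian.
# denoised = two_layer_filter(img, "gaussian", "gaussian")
```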
Table 3. Average accuracy of defense models with different image filter combinations on the CIFAR10 dataset.
First Layer \ Second Layer | Null | Max | Min | Midpoint | Average | Gaussian
Max | 0.56602 | 0.47965 | 0.58282 | 0.06860 | 0.55302 | 0.54232
Min | 0.59879 | 0.60677 | 0.50553 | 0.08838 | 0.56375 | 0.57001
Midpoint | 0.07436 | 0.20019 | 0.08563 | 0.05761 | 0.05438 | 0.14775
Average | 0.62835 | 0.58955 | 0.59644 | 0.08838 | 0.61992 | 0.62959
Gaussian | 0.63335 | 0.59730 | 0.62383 | 0.06546 | 0.62801 | 0.66676
Table 4. Accuracy of the image classifier with the DIP, LPIF, and our (MLFD) defenses on the MNIST, CIFAR-10, and CIFAR-100 datasets, using FGSM, BIM, PGD, C&W, AutoAttack, and VNIFGSM as baseline attacks. The image recognition classifier adopts the CNN, DNN, VGG16, and ResNet50 architectures, respectively.
Attack | Defense | MNIST (CNN) | MNIST (DNN) | CIFAR10 (DNN) | CIFAR10 (VGG16) | CIFAR10 (ResNet50) | CIFAR100 (VGG16) | CIFAR100 (ResNet50)
FGSM | DIP (2023) [39] | 0.83222 | 0.60648 | 0.61143 | 0.37918 | 0.35497 | 0.24902 | 0.35442
FGSM | LPIF (2023) [40] | 0.87256 | 0.65574 | 0.70605 | 0.76355 | 0.82319 | 0.43198 | 0.52629
FGSM | MLFD (Ours) | 0.94256 | 0.67222 | 0.76297 | 0.77154 | 0.84574 | 0.44723 | 0.54943
BIM | DIP (2023) [39] | 0.87321 | 0.68469 | 0.38390 | 0.38534 | 0.45134 | 0.26314 | 0.47384
BIM | LPIF (2023) [40] | 0.91545 | 0.80672 | 0.58750 | 0.63211 | 0.70308 | 0.37452 | 0.67570
BIM | MLFD (Ours) | 0.94664 | 0.84135 | 0.59862 | 0.63529 | 0.70470 | 0.42786 | 0.69665
PGD | DIP (2023) [39] | 0.89276 | 0.73076 | 0.54229 | 0.38658 | 0.55810 | 0.26150 | 0.47556
PGD | LPIF (2023) [40] | 0.91908 | 0.84484 | 0.59477 | 0.59051 | 0.66361 | 0.39228 | 0.65303
PGD | MLFD (Ours) | 0.94184 | 0.86873 | 0.59925 | 0.63494 | 0.70250 | 0.43419 | 0.70332
C&W | DIP (2023) [39] | 0.90580 | 0.81568 | 0.59861 | 0.42256 | 0.74056 | 0.38984 | 0.59375
C&W | LPIF (2023) [40] | 0.92153 | 0.90213 | 0.73496 | 0.77202 | 0.83548 | 0.50473 | 0.73226
C&W | MLFD (Ours) | 0.94324 | 0.94235 | 0.79007 | 0.77570 | 0.88081 | 0.54676 | 0.77240
AutoAttack | DIP (2023) [39] | 0.72425 | 0.77357 | 0.59860 | 0.41551 | 0.45827 | 0.38652 | 0.59107
AutoAttack | LPIF (2023) [40] | 0.91857 | 0.89419 | 0.56649 | 0.50486 | 0.62317 | 0.45371 | 0.66516
AutoAttack | MLFD (Ours) | 0.93976 | 0.93262 | 0.60243 | 0.52458 | 0.63503 | 0.50223 | 0.68746
VNIFGSM | DIP (2023) [39] | 0.84227 | 0.73079 | 0.55705 | 0.40158 | 0.50637 | 0.32980 | 0.51549
VNIFGSM | LPIF (2023) [40] | 0.91254 | 0.83871 | 0.63081 | 0.62862 | 0.70782 | 0.44808 | 0.65922
VNIFGSM | MLFD (Ours) | 0.94264 | 0.86498 | 0.65929 | 0.64444 | 0.73397 | 0.48867 | 0.68821
Average | DIP (2023) [39] | 0.84508 | 0.72366 | 0.54864 | 0.39845 | 0.51160 | 0.31330 | 0.50068
Average | LPIF (2023) [40] | 0.90995 | 0.82372 | 0.63676 | 0.64861 | 0.72605 | 0.43421 | 0.65194
Average | MLFD (Ours) | 0.94278 | 0.85370 | 0.66877 | 0.66441 | 0.75045 | 0.47449 | 0.68291
Table 5. The performance of the model in terms of average accuracy under different filtering area sizes and standard deviation combinations.
Size | σ = 1 | σ = 2 | σ = 5 | σ = 10
3 × 3 | 0.70451 | 0.69819 | 0.68600 | 0.68296
5 × 5 | 0.68220 | 0.65230 | 0.62666 | 0.61941
9 × 9 | 0.67396 | 0.60564 | 0.57471 | 0.56534
13 × 13 | 0.66691 | 0.61677 | 0.54001 | 0.54274
17 × 17 | 0.67881 | 0.61373 | 0.51017 | 0.50090
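For reference, the Gaussian weights underlying Table 5 can be reproduced with a few lines (a worked sketch using the standard definition of a normalized Gaussian kernel, not code from the paper). For the best-performing setting (3 × 3, σ = 1), the center weight is about 0.2042 and the corner weights about 0.0751.

```python
import numpy as np

def gaussian_kernel(size: int = 3, sigma: float = 1.0) -> np.ndarray:
    """Normalized size x size Gaussian kernel: w(x, y) proportional to exp(-(x^2 + y^2) / (2*sigma^2))."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx ** 2 + yy ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()

print(np.round(gaussian_kernel(3, 1.0), 4))
# [[0.0751 0.1238 0.0751]
#  [0.1238 0.2042 0.1238]
#  [0.0751 0.1238 0.0751]]
```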
Table 6. Average accuracy of the defense model with different numbers of filter layers on the three datasets under six adversarial attack scenarios.
Number of Layers | Dataset | FGSM | BIM | PGD | CW | AA | VNIFGSM | Average Accuracy
1 | MNIST | 0.870 | 0.930 | 0.935 | 0.931 | 0.943 | 0.940 | 0.925
1 | CIFAR10 | 0.770 | 0.603 | 0.606 | 0.802 | 0.650 | 0.651 | 0.680
1 | CIFAR100 | 0.383 | 0.682 | 0.687 | 0.749 | 0.662 | 0.668 | 0.638
2 | MNIST | 0.891 | 0.935 | 0.945 | 0.943 | 0.926 | 0.922 | 0.927
2 | CIFAR10 | 0.789 | 0.632 | 0.632 | 0.812 | 0.681 | 0.683 | 0.705
2 | CIFAR100 | 0.384 | 0.712 | 0.723 | 0.787 | 0.641 | 0.644 | 0.649
3 | MNIST | 0.828 | 0.912 | 0.919 | 0.911 | 0.912 | 0.918 | 0.900
3 | CIFAR10 | 0.765 | 0.604 | 0.603 | 0.782 | 0.631 | 0.637 | 0.677
3 | CIFAR100 | 0.389 | 0.679 | 0.688 | 0.747 | 0.613 | 0.618 | 0.622
4 | MNIST | 0.800 | 0.898 | 0.914 | 0.910 | 0.899 | 0.892 | 0.884
4 | CIFAR10 | 0.746 | 0.591 | 0.590 | 0.775 | 0.612 | 0.619 | 0.656
4 | CIFAR100 | 0.399 | 0.648 | 0.661 | 0.736 | 0.602 | 0.609 | 0.609
5 | MNIST | 0.841 | 0.893 | 0.893 | 0.892 | 0.896 | 0.890 | 0.883
5 | CIFAR10 | 0.737 | 0.598 | 0.597 | 0.772 | 0.603 | 0.607 | 0.653
5 | CIFAR100 | 0.374 | 0.667 | 0.666 | 0.734 | 0.591 | 0.601 | 0.605
Table 7. The testing accuracy (%) on clean images and under AutoAttack (AA) on the CIFAR-10 and miniImageNet datasets.
Dataset | Method | Architecture | Generated | Params | Clean | AA
CIFAR10 | DiffPure (2022) [63] | WRN-28-10 | 1M | - | 89.37 | 75.68
CIFAR10 | Wang et al. (2023) [33] | WRN-28-10 | 1M | - | 90.24 | 76.34
CIFAR10 | MLFD (Ours) | WRN-28-10 | - | - | 90.61 | 78.93
miniImageNet | LPIF (2023) [40] | EfficientNet-B0 | - | 5.3M | 72.64 | 57.38
miniImageNet | MLFD (Ours) | EfficientNet-B0 | - | 5.3M | 74.18 | 60.63
Table 8. Overview of the time required to train the model across seven scenarios. The unit is hours.
Dataset | Classifier | Original | With Multilayer Filter (Time / Rate) | With Noise Injection (Time / Rate) | MLFD (Time / Rate)
MNIST | DNN | 0.020 | 0.021 / 5.39% | 0.041 / 105.87% | 0.046 / 128.82%
MNIST | CNN | 0.030 | 0.031 / 10.60% | 0.062 / 102.15% | 0.064 / 109.78%
CIFAR10 | CNN | 0.146 | 0.149 / 2.60% | 0.304 / 108.15% | 0.307 / 110.39%
CIFAR10 | VGG16 | 0.261 | 0.272 / 4.08% | 0.478 / 82.65% | 0.525 / 100.74%
CIFAR10 | ResNet50 | 0.499 | 0.503 / 0.79% | 0.932 / 86.54% | 1.255 / 151.02%
CIFAR100 | VGG16 | 0.612 | 0.668 / 9.06% | 1.249 / 103.84% | 1.280 / 108.87%
CIFAR100 | ResNet50 | 0.302 | 0.325 / 7.64% | 0.565 / 87.23% | 0.656 / 117.18%