Article

Lightweight Robust Image Classifier Using Non-Overlapping Image Compression Filters

Computer Information Application Research Center, School of Computer Science and Technology, Xidian University, Xi’an 710071, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(19), 8636; https://doi.org/10.3390/app14198636
Submission received: 6 August 2024 / Revised: 18 September 2024 / Accepted: 21 September 2024 / Published: 25 September 2024
(This article belongs to the Special Issue Deep Learning for Image Recognition and Processing)

Abstract

Machine learning systems, particularly in the domain of image recognition, are susceptible to adversarial perturbations applied to input data. These perturbations, while imperceptible to humans, can easily deceive deep learning classifiers. Current defense methods for image recognition focus on diffusion models and their variants. Because diffusion models are deep and each inference pass involves a large amount of computation, they place extremely high demands on the GPU and storage of the device. To address this problem, we propose a new defense based on non-overlapping image compression filters that protects image recognition classifiers against adversarial attacks. This method inserts a non-overlapping image compression filter before the classifier so that the classifier's output remains invariant under subtle changes in the input images. The method does not weaken the adversarial robustness of the model and reduces the computational cost of training the image classification model. In addition, our method can be easily integrated with existing image classification training frameworks with only minor adjustments. We validate our results by performing a series of experiments under three different convolutional neural network architectures (VGG16, ResNet34, and Inception-ResNet-v2) and on different datasets (CIFAR10 and CIFAR100). The experimental results show that under the Inception-ResNet-v2 architecture, our method achieves an average accuracy of up to 81.15% on the CIFAR10 dataset, demonstrating its effectiveness in mitigating adversarial attacks. In addition, under the WRN-28-10 architecture, our method achieves not only 91.28% standard accuracy on the CIFAR10 dataset but also 76.46% average robust accuracy. A test of model training time consumption shows that our defense method has an advantage in time cost, proving that it is a lightweight and efficient defense strategy.

1. Introduction

In image classification tasks, deep learning has demonstrated remarkable and impressive performance. However, recent research on adversarial attacks has found that deep learning models are vulnerable in the presence of adversarial images. Costa et al. [1] provided a survey article on the latest adversarial attacks, grouped by attack capability, and modern defenses, grouped by protection strategies. Adversarial images are generated by adding subtle adversarial perturbations to legitimate input images. Such carefully crafted perturbations are visually imperceptible to the human cognitive system, but they can cause deep learning models to misclassify the input images [2,3,4]. As depicted in Figure 1, adversarial perturbations have been intentionally introduced into the handwritten image of the numeral “3” to create an adversarial image. While human observers can effortlessly discern the original numeral “3”, the image classification model incorrectly identifies it as the numeral “8”. This phenomenon highlights the susceptibility of machine learning models to adversarial examples.
At first, researchers adopted a strategy of building one or more adversarial sample detectors to identify whether the input image was an adversarial example. Once such samples are identified, they are reset to the structure of normal samples through the transformation of the reformer network. It is worth noting that this detector is essentially a classifier, and it also carries the risk of being attacked, which may result in normal samples being misjudged as adversarial samples.
The diffusion model, as a powerful generative model, surpasses GANs in image generation [5]. The diffusion model consists of two processes: (1) a forward diffusion process, which converts data into noise by gradually adding noise; and (2) a reverse generation process, which starts with noise, gradually removes it, and generates data. In the generation process, the diffusion model purifies noisy samples, which is similar to a purification model. DiffPure [6] uses the Denoising Diffusion Probabilistic Model (DDPM) [7] for adversarial purification. Wang et al. [8] demonstrated that a diffusion model with higher efficiency and image quality, namely the elucidated diffusion model (EDM) [9], can improve the robustness of image classifiers. However, diffusion models are sensitive to image colors, and as the image resolution and the diffusion length T increase, the inference speed becomes unbearable.
To address this issue, recent research has proposed defense strategies based on image filters. This type of filter is simple to execute and preserves the original color information of the image well, providing an effective and practical solution for processing adversarial samples. Xiao et al. proposed a defense based on digital image processing (DIP) [10], which uses spatial filters and thresholds for image preprocessing to remove adversarial disturbances. This method can easily and effectively resist adversarial attacks. While effective on grayscale imagery, it exhibits limitations when applied to color images, as the thresholding techniques may cause partial loss of critical image information. Ziyadinov and Tereshonok proposed a low-pass image-filtering (LPIF) [11] technique to reduce the impact of high-frequency noise on cellular neural networks. This approach offers advantages in terms of resource efficiency and ease of implementation. However, it may be ineffective against low-frequency adversarial attacks.
Mao et al. [12] introduced the Universal Defense Filter (UDFilter) to counter adversarial patch attacks. The UDFilter employs defense filters of equal size to overlay the input image, thereby mitigating the adverse effects of adversarial patches. However, the generation of these defense filters relies on an adaptive learning algorithm, necessitating the training of a model with a substantial number of parameters, which results in high computational costs. Chen and Lee [13] posited that not all adversarial examples in adversarial training are equally significant. They proposed a method to eliminate these examples through data filtering without compromising robustness. Nonetheless, this approach requires the preparation of adversarial samples. While it enhances resistance to adversarial examples, it may inadvertently impair the model’s classification ability for normal samples.
Inspired by image-filtering technology, this paper proposes a defense strategy based on non-overlapping image compression filters, aimed at reducing the computational cost during the training process of image classification models without sacrificing robustness. In addition, our method can easily integrate with existing image classification training frameworks with only minor adjustments. Non-overlapping image compression filters can capture extreme information that runs through the entire image and output low-resolution images. This process effectively ignores most of the added adversarial noise. Its core lies in imitating the human visual system to perform dimensionality reduction on data. At the same time, this method also reduces the feature dimensions output by the convolutional layer, thereby reducing network parameters and computational costs. We evaluate the performance of image recognition classifiers by assessing the recognition accuracy of adversarial images generated by various image adversarial attacks. The experimental results show that the defense strategy using non-overlapping image compression filters performs better compared to the latest defense measures. It should be emphasized that the goal of this article is not to design a robust image recognition classifier but to develop an effective defense mechanism to protect existing image recognition classifiers from adversarial attacks, simultaneously reducing the time cost of model training.

2. Related Works

Table 1 summarizes the latest advancements in adversarial defense research within the field of image recognition. We categorize adversarial defense into three types based on their motivations. The first motivation is altering the model to enhance the performance of the defense system. Most defense system designers prioritize improving the model’s performance, which is the most direct approach to enhancing defense capabilities. Typically, modifying the model is given precedence, which involves pruning the architecture of the classifier [14,15], using a distillation framework [16], implementing learnable parametric activation functions [17], applying a data variant strategy [18], or adding pre/post-processing layers to the classifier’s architecture [2,3]. For example, Dynamic Network Rewiring (DNR) [15] generates pruned DNNs with high robustness and standard accuracy by employing a unified constrained optimization formula and combining ultra-high model compression with robust adversarial training using a mixed loss function. These methods modify specific deep learning model architectures, affecting their adaptability and portability.
Using auxiliary tools involves having an independent module that is able to process the input before it is passed to the classifier [19]. For example, MagNet [20] includes one or more separate detector networks and a reformer network. The reformer network moves adversarial examples toward the manifold of normal examples, which is effective for correctly classifying adversarial examples with small perturbations. In addition, the independent modules include a class activation feature-based denoiser (CAFD) [21], a latent neighborhood graph (LNG) [22], and an auxiliary network for the classifier [23]. However, these methods require a significant amount of overhead to maintain the additional module.
The second motivation lies in the utilization of adversarial examples. Adversarial training serves as a crucial approach to enhancing the robustness of deep learning models. During the adversarial training process, minor perturbations are introduced into the samples, enabling the deep learning model to adapt to such changes and thereby demonstrate robustness against adversarial examples [14,24,25]. However, adversarial training increases the cost and time of model training. Adversarial purification methods utilize generative models to purify adversarial examples before feeding them into the classification model [26,27]. Common methods include DiffPure [6], the guided diffusion model for adversarial purification (GDMAP) [28], and DensePure [29]. Furthermore, Wang et al. [8] leveraged the latest diffusion model [9], demonstrating that diffusion models with higher efficiency and image quality can directly translate into better robust accuracy. However, adversarial purification methods often need to strike a balance between natural and adversarial accuracy, resulting in unsatisfactory performance. This implies that while resistance to adversarial examples is improved, it may potentially compromise the classification capabilities of the model for ordinary samples.
Table 1. Recent studies of adversarial defense in the domain of image recognition.
| Method | Year | Motivation | Innovative Approach | Inadequacy |
| DNR [15] | 2021 | Improving performance | Combines ultra-high model compression with robust adversarial training to generate a pruned DNN. | Lower adaptability and portability |
| LTD [16] | 2021 | Improving performance | Uses a distillation framework with different fixed temperatures in the teacher and student models. | Lower adaptability and portability |
| PSSiLU [17] | 2022 | Improving performance | Uses learnable parametric activation functions (PAFs) to improve performance. | Lower adaptability and portability |
| SENSEI [18] | 2020 | Improving performance | Replaces each data point with a suitable variant or keeps it unchanged. | Lower adaptability and portability |
| MagNet [20] | 2017 | Improving performance | Uses one or more separate detector networks and a reformer network. | Overhead to maintain an additional module |
| CAFD [21] | 2021 | Improving performance | Employs a class activation feature-based denoiser to remove adversarial noise in the class activation feature space. | Overhead to maintain an additional module |
| LNG [22] | 2021 | Improving performance | Uses a Graph Neural Network (GNN) [30] to construct a Latent Neighborhood Graph (LNG) for each original example. | Overhead to maintain an additional module |
| DISCO [23] | 2022 | Improving performance | Adversarial Defense with Local Implicit Functions (DISCO) serves as an adjunct network to the classifier. | Overhead to maintain an additional module |
| HAT [24] | 2021 | Adversarial training | Additional mislabeled examples are added during the training process. | Reduced generalization ability |
| FAT [25] | 2021 | Adversarial training | Performs stochastic smoothing to effectively optimize the internal maximization problem. | Reduced generalization ability |
| DiffPure [6] | 2022 | Adversarial training | Uses the denoising diffusion probabilistic model (DDPM) [7] for adversarial purification. | Cost of maintaining adversarial images |
| GDMAP [28] | 2022 | Adversarial training | Takes pure Gaussian noise as the initial input and gradually denoises it into the adversarial image. | Cost of maintaining adversarial images |
| DensePure [29] | 2022 | Adversarial training | Iteratively denoises input images using different random seeds to obtain multiple reverse samples. | Cost of maintaining adversarial images |
| Wang et al. [8] | 2023 | Adversarial training | Uses an elucidated diffusion model [9]. | Cost of maintaining adversarial images |
| QuSecNets [31] | 2019 | Increase insensitivity | Uses a quantization-based defense mechanism to secure deep neural networks against adversarial attacks. | Complex computing resources |
| DIP [10] | 2023 | Increase insensitivity | Uses spatial filters and thresholds for image preprocessing to remove adversarial disturbances. | Limitations for color images |
| LPIF [11] | 2023 | Increase insensitivity | Uses the low-pass image-filtering technique to reduce the impact of high-frequency noise on cellular neural networks. | Poor effect against low-frequency attacks |
The third motivation is to increase insensitivity to subtle changes in input. Increasing insensitivity is another effective defense method. The attacker generates adversarial samples by adding subtle perturbations that are invisible to the human eye. If the classifier model maps this subtle range of the input to the same output, the attacker needs to increase the attack intensity to achieve the purpose of the attack. In the field of image recognition and classification, attack intensity refers to the degree of modification of the pixel values, to the point that the adversarial image would be noticed by the human eye [32,33]. Common methods include data quantification [31,34] and image filtering [35]. Faiq Khalid et al. proposed a quantization-based defense mechanism for securing deep neural networks (DNNs) against adversarial attacks (QuSecNets) [31]. QuSecNets uses data quantization to convert the input into an output containing only 2^n discrete levels ranging from −1 to +1. For example, for a grayscale image, the range of each pixel value is [0, 255]. When n = 2, the output will be quantized into four discrete levels: −1, −0.33, 0.33, and 1. For a robust model, the selection of the control parameter n and the range parameter R of each quantization level is crucial. QuSecNets obtains these parameters by training a neural network, which requires complex computing resources and raw data.
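As a minimal numerical sketch of the quantization idea only (not the QuSecNets training procedure, which learns n and the level ranges), the following NumPy snippet maps pixel values in [0, 255] onto 2^n evenly spaced levels in [−1, 1]; the function name and the fixed level placement are our own illustration.

```python
import numpy as np

def quantize_pixels(image, n=2):
    """Map pixel values in [0, 255] onto 2**n evenly spaced levels in [-1, 1].

    Illustration of the quantization idea only; QuSecNets learns the number of
    levels and their ranges with a neural network rather than fixing them.
    """
    levels = np.linspace(-1.0, 1.0, 2 ** n)             # e.g. [-1, -0.33, 0.33, 1] for n = 2
    scaled = image.astype(np.float32) / 255.0 * 2 - 1   # rescale [0, 255] -> [-1, 1]
    # Snap every pixel to the nearest discrete level.
    idx = np.abs(scaled[..., None] - levels).argmin(axis=-1)
    return levels[idx]

example = np.array([[0, 64, 128, 255]], dtype=np.uint8)
print(quantize_pixels(example, n=2))  # approximately [[-1. -0.33  0.33  1.]]
```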
Based on the motivation of improving insensitivity, we propose a non-overlapping image compression filter strategy aimed at mining significant features of receptive fields. Adversarial attacks rely on minimal tampering with the input image to disrupt the classifier’s decision-making process. To address this issue, we introduce a ‘non-overlapping’ mechanism that effectively captures and stabilizes extreme value points in the receptive field, significantly reducing the impact of adversarial noise on the model. The implementation of this strategy aims to significantly enhance the robustness of the model against potential adversarial attacks.

3. Materials and Methods

In this section, we define the threat models used for evaluating our defense. Then, image adversarial attacks and substitute models are introduced. In this paper, we focus on image recognition classification tasks. We describe how we adapt an image filter into a classifier model to address the problem of vulnerability to adversarial perturbations.

3.1. Threat Models

Given a test image–label pair (x, y), the goal of the deep learning network is to learn the mapping function f_θ(x) from the input image x ∈ ℝ^d to the associated target label y, where θ is a set of parameters to be learned during training. If the instance x is correctly classified such that y = f_θ(x), the adversary's goal is to introduce a small engineered perturbation δ ∈ ℝ^d to x, such that x′ = x + δ, while still maintaining the perceptual similarity of x′ to x, to deceive the classifier into making a mistake. This paper models the adversarial attack using the following Equation (1):
\min_{\delta} \; f_{\theta}(x + \delta) \neq y \quad \text{s.t.} \quad f_{\theta}(x) = y, \quad \|\delta\|_{l_p} \leq \epsilon
The adversary is usually assumed to be constrained by an l_p-norm, so that ‖δ‖_{l_p} ≤ ε, where ε bounds the adversary's freedom to alter the input. All attacks in this study employed the l_∞-norm strategy, with the core principle being to ensure that the change in each pixel value (i.e., δ) did not exceed the set ε limit.

3.2. Adversarial Attacks

Adversarial attacks can be categorized into two types based on the access level granted to the attacker: white-box and black-box attacks [36]. White-box attacks, such as the Fast Gradient Sign Method (FGSM) [37], Basic Iterative Method (BIM) [38], and Projected Gradient Descent (PGD) [39], necessitate comprehensive knowledge of the target model, including its architecture and parameters. Consequently, these attacks require access to the model’s gradients and direct interaction with it. On the other hand, black-box attacks only require the attacker to interact with the target model to obtain its output information, without needing full details about its internal workings [40]. It is evident that black-box attacks more closely resemble actual attack scenarios in the real world; as such, research into this type of attack is considered more practical.
The Fast Gradient Sign Method (FGSM) and its multiple variations produce adversarial examples that fool a model with high confidence while requiring only a small perturbation. The aforementioned white-box attacks are described as follows:
  • Fast Gradient Sign Method [37]: This attack uses the sign of the gradients at every pixel to determine the direction of perturbation. Adversarial samples x^{adv} are generated according to Equation (2):
    x^{adv} = x + \epsilon \times \mathrm{sign}\big(\nabla_{x} \mathrm{Loss}(x, y)\big)
  • Basic Iterative Method (BIM) [38]: This attack extends the FGSM attack [37] by iterating it multiple times with a small step size. Adversarial samples are generated according to Equation (3):
    x_{n+1}^{adv} = \mathrm{Clip}_{\epsilon}\big(x_{n}^{adv} + \alpha \times \mathrm{sign}(\nabla_{x} \mathrm{Loss}(x_{n}^{adv}, y))\big)
  • Projected Gradient Descent (PGD) [39]: This attack computes the gradient in the direction of the highest loss and projects it back to the l p norm around the sample, according to Equation (4):
    x_{n+1}^{adv} = \Pi_{\epsilon}\big(x_{n}^{adv} + \alpha \times \mathrm{sign}(\nabla_{x} \mathrm{Loss}(x_{n}^{adv}, y))\big)
In Equations (2)–(4), Loss(x, y) is the loss function used to train the classifier, ∇_x denotes its gradient with respect to the input, α is the iterative step size, and Clip(·) and Π(·) are the clipping and projection functions, respectively.
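For concreteness, the following is a minimal sketch of the FGSM step of Equation (2) and an iterative variant in the spirit of Equations (3) and (4), written for TensorFlow 2 (the platform used in our experiments); it assumes a trained Keras model that outputs class probabilities and integer labels, and the helper names are our own.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=False)

def fgsm(model, x, y, epsilon=8.0 / 256.0):
    """One-step FGSM (Equation (2)): perturb x along the sign of the loss gradient."""
    x = tf.convert_to_tensor(x)
    with tf.GradientTape() as tape:
        tape.watch(x)
        loss = loss_fn(y, model(x))
    grad = tape.gradient(loss, x)
    x_adv = x + epsilon * tf.sign(grad)
    return tf.clip_by_value(x_adv, 0.0, 1.0)  # keep pixels in the valid range

def pgd(model, x, y, epsilon=8.0 / 256.0, alpha=2.0 / 256.0, steps=10):
    """Iterative variant (Equations (3) and (4)): repeat the gradient-sign step
    and project the accumulated perturbation back into the epsilon ball."""
    x = tf.convert_to_tensor(x)
    x_adv = tf.identity(x)
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, alpha)
        x_adv = tf.clip_by_value(x_adv, x - epsilon, x + epsilon)
    return tf.clip_by_value(x_adv, 0.0, 1.0)
```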
The Square attack [40] is a score-based black-box adversarial attack that does not depend on local gradient information, thereby circumventing the effects of gradient masking. This method employs a strategy of random search, selecting local square updates at random positions, and ensuring that the perturbation in each iteration is roughly situated at the boundary of the feasible set.

3.3. The Proposed Method

Armed with the background knowledge of image recognition classification in the adversarial environment, we now introduce a defense approach to mitigate the threat of exposing classifiers to adversarial images. As shown in Figure 2, the general process of the traditional image classification method to solve the problem is as follows: based on the collected image data, we analyze the differences between images with different labels (such as the objects that appear in the images, their shapes and positions, etc.) and then train a classification system that can distinguish these images based on a machine learning algorithm. This method is vulnerable to adversarial attacks and loses its effectiveness. The non-overlapping image compression filter scans the entire image through the receptive field and then calculates the feature mapping function of the receptive field to generate a low-resolution image. The non-overlapping image compression filter significantly eliminates adversarial disturbances in the training images. The classifier trained on the processed images becomes more robust and achieves better performance and efficiency compared to other defenses.

3.4. Receptive Field

The receptive field is defined as the region in the input space that a particular feature of a convolutional neural network is examining [41]. Since an image is two-dimensional, the receptive field is also two-dimensional and is designed as an a × b rectangle. The receptive field scans the entire image from left to right and top to bottom. There are two scanning methods—overlapping and non-overlapping—as shown in Figure 3. In adversarial settings, non-overlapping filters perform better.
Figure 3a shows the scanning method of the receptive field of the overlapping image filter. The overlapping image filter outputs a filtered image with the same size as the input image. Each pixel of the filtered image is affected by its neighbors. Each pixel of the input image is reused by the receptive field. Given the input image size n × n and the receptive field size a × b, a filtered image of size n × n is calculated from all pixels covered by the receptive field. Let S_{x,y} denote the set of coordinates of the input image affected by the receptive field of the filtered image at point (x, y), as shown in the following Equation (5):
S_{x,y} = \{ (u, v) \mid u \in [x - \tfrac{a}{2},\ x + \tfrac{a}{2}) \cap [0, n),\ v \in [y - \tfrac{b}{2},\ y + \tfrac{b}{2}) \cap [0, n),\ u, v \in \mathbb{N} \}
Figure 3c shows the scanning method of the receptive field of the non-overlapping image filter. The non-overlapping image filter outputs a low-resolution image smaller than the input image. Like the overlapping image filter, each pixel of the filtered image is affected by its neighbors. However, each pixel of the input image is only used once. Similarly, let S_{x,y} denote the set of coordinates of the input image affected by the receptive field of the filtered image at point (x, y), as shown in the following Equation (6):
S_{x,y} = \{ (u, v) \mid u \in [a x,\ a x + a) \cap [0, n),\ v \in [b y,\ b y + b) \cap [0, n),\ u, v \in \mathbb{N} \}
where (u, v) is a coordinate of the input image affected by the receptive field, and its components must be natural numbers. The region of the receptive field is a rectangle of size a × b formed by the center pixel (x, y) and its neighboring pixels.
When processing the edges of the input image, it is necessary to expand the image boundary to ensure the completeness of the receptive field. The padding method used for the image boundary affects the performance of the image filter. Ideally, the result of the image filter should be completely unaffected by the padding value. In this paper, the input image boundary is filled with 0.

3.5. Overlapping and Non-Overlapping Processes

The data transmission methods of the two different image filters are shown in Figure 3b,d. In the overlapping image filter, a pixel x in the input image contributes to the calculation of three pixels l_1, l_2, and l_3 in the filter layer if the size of the receptive field is 1 × 3; in other words, pixel x can affect three pixels in the filter layer. In the non-overlapping image filter, however, pixel x only affects one pixel l_1 in the filter layer. If adversarial noise is carefully added to pixel x, the noise is likely to be transmitted to the filter layer in the overlapping image filter, but it is ignored in the non-overlapping image filter.
In the overlapping filtering operation, the size of the output image is the same as that of the input image, whereas the non-overlapping filtering method reduces the size of the output image. In either case, whether processed by an "overlapping" or a "non-overlapping" filter, the images always retain their original labels. The size of the receptive field used here is 2 × 2.
Figure 4 presents a comparison of the valid and adversarial sample sets of the overlapping and non-overlapping filters on the MNIST dataset. In undisturbed images, both filters can extract the shape of the original image. However, in adversarial images, the situation is different. The image processed by the overlapping filter not only retains the shape of the adversarial image but also amplifies the adversarial noise. In contrast, images processed by the non-overlapping filter effectively eliminate most of the adversarial noise. We have chosen to use the non-overlapping image filter for our research.

3.6. Feature Mapping Function

The image filter works by moving over the image and replacing each value with the value computed by a feature mapping function over all pixels in the receptive field. It is essential that the feature mapping function eliminates adversarial noise. The function typically aggregates all pixel values in the receptive field into a single representative value, such as the maximum, median, or average. Our image filter works in a non-overlapping manner, moving over the image and replacing each receptive field with the maximum value of all pixels within it. The operation of this image filter can be expressed as shown in Equation (7):
\mathrm{Filter}(x, y) = \max_{(u, v) \in S_{x,y}} P(u, v)
where Filter(x, y) is the pixel value of the filtered image produced by the image filter, S_{x,y} is the set of coordinates of the input image affected by the receptive field of the filtered image at point (x, y), and P(u, v) is the pixel value of the point (u, v) in the input image.
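Because the stride of the receptive field equals its size, Equation (7) can be realized directly as a strided max-pooling operation. The sketch below is a minimal illustration assuming TensorFlow 2 and a batch of images in NHWC layout; the function name is our own.

```python
import tensorflow as tf

def non_overlapping_max_filter(images, a=2, b=2):
    """Non-overlapping image compression filter (Equation (7)).

    Each a x b block of the input is replaced by its maximum value, so the
    stride equals the receptive-field size and every input pixel is used once.
    For a 2 x 2 receptive field, a 32 x 32 image is compressed to 16 x 16.
    """
    # 'SAME' padding extends the border so incomplete blocks at the edges are
    # still covered; for non-negative pixel values the padded positions do not
    # dominate the maximum, matching the zero padding described above.
    return tf.nn.max_pool2d(images, ksize=[a, b], strides=[a, b], padding="SAME")

# Example: a batch of CIFAR-sized images scaled to [0, 1].
x = tf.random.uniform([8, 32, 32, 3])
print(non_overlapping_max_filter(x).shape)  # (8, 16, 16, 3)
```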

3.7. Convolutional Neural Network

To validate the effectiveness of our method in convolutional neural network (CNN) configurations, we selected three different CNN architectures: VGGNet [42], ResNet [43], and GoogLeNet [44]. VGGNet has attracted widespread attention in image classification tasks due to its profound architectural design and excellent performance. ResNet effectively alleviates the problem of vanishing gradients during deep network training by introducing residual connections. The design goal of GoogLeNet is to address the problems that deep networks may encounter during training, including overfitting, gradient vanishing, and gradient explosion. Table 2 shows the CNN architectures we adopted and the specific configuration of each module.
In [45], the authors tested the performance of various CNN architectures in detecting broken rotor bar faults in induction motors, and the results showed that VGG19 performed excellently in terms of accuracy and precision. In particular, after domain-specific training, the effectiveness of VGG19 in fault detection is particularly outstanding. Given that the test data in our study consisted of low-resolution images (32 × 32 pixels), the VGG16 architecture was adopted. ResNet34 is an architecture in the ResNet series, and Barrera et al. [46] used classifiers with the ResNet34 structure as discriminators for Generative Adversarial Networks (GANs). They evaluated the ability of a system composed of two structurally similar GANs to process digitally stained images, testing whether it could modify colors without altering cell morphology. The Inception structure is the core subnetwork of GoogLeNet. Inception-ResNet-v2 [47] combines the Inception structure with ResNet, further enhancing the network's feature extraction capability and prediction accuracy.
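To illustrate how little the training pipeline changes, the following Keras sketch prepends the non-overlapping filter as a parameter-free first layer in front of an otherwise ordinary classifier; the small convolutional backbone shown here is only a placeholder for the VGG16, ResNet34, or Inception-ResNet-v2 configurations of Table 2, and all layer choices beyond the first pooling layer are our own assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_defended_classifier(num_classes=10, receptive_field=(2, 2)):
    """Prepend the non-overlapping compression filter to an ordinary CNN classifier."""
    return models.Sequential([
        tf.keras.Input(shape=(32, 32, 3)),
        # Parameter-free non-overlapping max filter: pool size == stride, 32x32 -> 16x16.
        layers.MaxPooling2D(pool_size=receptive_field, strides=receptive_field),
        # Illustrative backbone; in the paper this would be one of the
        # architectures listed in Table 2 (VGG16, ResNet34, Inception-ResNet-v2).
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.Conv2D(64, 3, padding="same", activation="relu"),
        layers.MaxPooling2D(),
        layers.Conv2D(128, 3, padding="same", activation="relu"),
        layers.GlobalAveragePooling2D(),
        layers.Dense(num_classes, activation="softmax"),
    ])

model = build_defended_classifier()
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```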

3.8. Performance Metrics

The accuracy metric is defined as the quantitative measure of the number of samples correctly predicted by the model, as shown in the following Equation (8):
\mathrm{accuracy} = \frac{TP + TN}{TP + TN + FP + FN}
where TP refers to True Positive, TN to True Negative, FP to False Positive, and FN to False Negative. In scenarios involving adversarial images, this metric is appropriately termed “robust accuracy”. The utilization of accuracy as an evaluation criterion enables a direct comparison with previously established methodologies and offers a clear indication of the model’s performance under the conditions of an adversarial attack.
The Inception Score (IS) [48] serves as a metric to gauge the disparity between an original image and its corresponding adversarial counterpart. The IS relies on the carefully designed convolutional network model Inception-V3. This model accepts an image tensor as its input and yields a 1000-dimensional vector as output. Each dimension within this vector corresponds to the likelihood of the image being classified into a particular category, thereby allowing the entire vector to be interpreted as a probability distribution. The IS is defined by the following equation:
IS(x^{adv}) = \exp\Big( \mathbb{E}\big[ \mathrm{KL}\big( p(y \mid x^{adv}) \,\|\, p(y) \big) \big] \Big)
where p(y | x^{adv}) represents the output distribution of the Inception model when inputting image x^{adv}, p(y) is the marginal label distribution over the image set, and p(y_i | x^{adv}) represents the probability that the Inception model predicts adversarial image x^{adv} belongs to class i. KL(p(y | x^{adv}) || p(y)) represents the KL divergence (relative entropy) between these two distributions. The calculation formula is as follows:
\mathrm{KL}(P \,\|\, Q) = \sum_{i} P(i) \log \frac{P(i)}{Q(i)}
Based on this definition, a higher IS value correlates with a superior quality of the adversarial image.
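As an illustration, a minimal NumPy sketch of this computation from an array of per-image class probabilities (assumed to have been produced by a pretrained Inception-v3-style classifier) is shown below; the variable names and the random example inputs are our own.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, C) array of per-image class probabilities.

    IS = exp( E_x [ KL( p(y|x) || p(y) ) ] ), where p(y) is the marginal
    distribution obtained by averaging p(y|x) over the image set.
    """
    p_y = probs.mean(axis=0, keepdims=True)                              # marginal p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(p_y + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Example with random softmax outputs for 100 "images" and 1000 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(100, 1000))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
print(inception_score(probs))
```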
Learned Perceptual Image Patch Similarity (LPIPS) [49] is a sophisticated metric designed to assess the dissimilarity between two images. Unlike straightforward mathematical computations, LPIPS employs deep neural networks to facilitate its evaluations. Typically, the LPIPS model accepts two images as inputs and yields a score that quantifies their perceived similarity. The model’s architecture and parameters are derived from extensive training, with the goal of discerning the perceptual attributes of images.
Initially, we loaded a pretrained LPIPS model, specifically the AlexNet-based [50] variant. Subsequently, we imported two images and performed the preprocessing steps needed to conform to the LPIPS model's input specifications. Finally, we used the LPIPS model to compute the perceived similarity score for the pair of images.
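A minimal usage sketch with the publicly available lpips Python package (PyTorch-based) is given below; the file names are placeholders, and the rescaling of images to the [−1, 1] range follows the package's documented expectation rather than any detail stated in this paper.

```python
import lpips
import torch
import numpy as np
from PIL import Image

# Pretrained LPIPS model with the AlexNet backbone.
loss_fn = lpips.LPIPS(net="alex")

def to_tensor(path):
    """Load an image and rescale it to the [-1, 1] range expected by LPIPS."""
    img = np.asarray(Image.open(path).convert("RGB"), dtype=np.float32) / 255.0
    img = img * 2.0 - 1.0
    return torch.from_numpy(img).permute(2, 0, 1).unsqueeze(0)  # 1 x 3 x H x W

# Placeholder file names for an original image and its filtered counterpart.
score = loss_fn(to_tensor("original.png"), to_tensor("filtered.png"))
print(float(score))
```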

4. Experimental Results

In this section, we verify the effectiveness of our approach on the publicly available CIFAR10 and CIFAR100 [51] datasets. We performed all tests on a machine running Windows 11, equipped with a 14-core 2.30 GHz CPU, an NVIDIA GeForce RTX 3060 GPU, and 16 GB of memory. Our code was compiled and run using Python 3.8 with the TensorFlow 2 [52] platform.

4.1. Datasets and Substitute Model

The CIFAR10 [51] dataset consists of 60,000 color images in 10 classes, with 6000 images per class. There are 50,000 training images and 10,000 test images. All the images in the CIFAR10 dataset are sized at 32 × 32 pixels. The CIFAR100 dataset [51] is an extended version of the CIFAR10 dataset, designed to provide more challenging tasks. CIFAR100 contains 100 different categories, each containing 600 32 × 32 pixel color images, of which 500 are training images and the remaining 100 are test images.
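For reference, both datasets can be loaded directly through the Keras dataset API under the TensorFlow 2 setup described above; the rescaling to [0, 1] shown in this sketch is our own assumption rather than a statement of the exact preprocessing used.

```python
import tensorflow as tf

# CIFAR-10 as shipped with Keras: 50,000 training and 10,000 test images, 32 x 32 x 3.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train = x_train.astype("float32") / 255.0
x_test = x_test.astype("float32") / 255.0
print(x_train.shape, x_test.shape)  # (50000, 32, 32, 3) (10000, 32, 32, 3)

# CIFAR-100 is loaded analogously via tf.keras.datasets.cifar100.load_data().
```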

4.2. Evaluation of Image Transformation in Adversarial Defense

When training convolutional neural network (CNN) models, image transformation techniques, such as rotation, adjusting brightness, or adding noise, are typically used to enhance the network’s robustness to various disturbances. To evaluate the impact of different image transformation techniques on the robustness of the network, we selected the VGG16 architecture as the basis for the image classification model. In the white-box attack scenario, we employed FGSM, BIM, and PGD attacks, which can access the gradient information of the target model and obtain all the details of the model. Specifically, the number of iterations for BIM and PGD attacks was set to 10. The attack factor ϵ used for all attacks was set to 8/256, meaning that on the CIFAR10 and CIFAR100 datasets, the value of each pixel can change by up to 8 units. As a representative of black-box attacks, Square attacks can only access the target model and obtain its output results but cannot obtain the gradient information of the model.
Table 3 reveals the impact of different image transformations on classification accuracy under adversarial attacks for the VGG16 architecture. Introducing noise has proven to be an effective strategy for improving the robustness of networks against disturbances. Compared to this, our method demonstrates superior performance. Although adding noise improves the handling of adversarial samples, it also slightly reduces the recognition accuracy of normal samples. UDFilter employs an adaptive learning algorithm that uses a universal defense filter to process adversarial and normal images with varying intensities. However, it also faces the issue of applying a relatively strong defense filter to normal images while using a milder version for adversarial ones.

4.3. Performance of Our Defense Techniques

To evaluate defense performance, we compared the effectiveness of DIP, LPIF, Data Filtering, UDFilter, and our own defense techniques on the CIFAR10 and CIFAR100 datasets. We adopted three architectures: VGG16, ResNet34, and Inception ResNet-v2. As shown in Table 4, we selected white-box attacks (FGSM, BIM, and PGD) and a black-box attack (Square attack) as baseline adversarial attack methods, applying these five defense measures on the two datasets to test the accuracy of the image classifier when faced with adversarial samples.
The results reveal that our method exhibited higher accuracy compared to the other four defense strategies, indicating its robustness. However, when attempting to use the DIP method, due to its fixed output image size (659 × 659), it exceeded the performance limit of our device during the training of the Inception-ResNet-v2 architecture, resulting in an inability to train. Whether facing white-box attacks or black-box attacks, our method not only demonstrates robustness against adversarial perturbations but also does not introduce an additional computational burden during the training process of the image classification model.

4.4. Comparison of Overlapping and Non-Overlapping Scanning Methods

In order to verify the performance of the non-overlapping method applied in the receptive field, we compared two different methods. In image filters, the processing method of the receptive field is related to how the added perturbation is handled. Table 5 shows the accuracy of using the overlapping and non-overlapping methods in the face of white-box and black-box attacks on the CIFAR10 dataset. The network architecture used was Inception-ResNet-v2. The results show that both methods can enhance defense capabilities against adversarial attacks, but the non-overlapping method outperforms the overlapping method. In the non-overlapping mode, the influence range of a pixel is reduced when subtle perturbations are added.

4.5. Quantitative Evaluation of Image Preprocessing

We quantitatively evaluated the preprocessed images to verify the effectiveness of our defense measures. For this purpose, we first constructed test sets for CIFAR10 and CIFAR100 images. Given that Data Filtering does not apply to raw image data but rather excludes irrelevant adversarial samples during adversarial training, this process was not considered.
Two commonly used indicators in image processing quality assessment were calculated: IS and LPIPS. Table 6 shows the results of each method for these indicators.
The IS metric is an indicator used to evaluate the quality and diversity of processed images. It calculates the KL divergence between the conditional label distribution and the marginal label distribution of the processed image using a pretrained Inception-v3 classification model. In this study, although our method involves data dimensionality reduction, which often leads to a decrease in image quality, as shown in Table 6, our IS score was superior to that of the DIP and LPIF methods and similar to that of the UDFilter method. This indicates that our method effectively maintains the diversity of the image without compromising its overall quality.
The LPIPS metric is used to evaluate the visual perception of image sets, similar to the human perceptual experience. During testing, we measured the LPIPS metric on both the original image set and the processed image set. In Table 6, it can be seen that the LPIPS scores of all methods were quite similar, indicating that the original image and the processed image share similarities in visual perception.

4.6. Time Consumption of Model Training

The time consumption of model training, which refers to the total time spent completing the entire model training process, is one of the core parameters for evaluating model efficiency in the fields of machine learning and deep learning. Multiple factors collectively affect the time cost of model training, among which the architectural design and complexity of the model are particularly crucial. For structurally complex models, such as deep neural networks, parameter optimization and model adjustments often require more computational resources and time consumption.
In our experiment, all five defense strategies used three CNN architectures as classification models after processing the data, with the main differences reflected in the data processing strategies. In order to compare the time cost differences among the five different defense strategies, we adopted a unified network architecture as a benchmark. Specifically, the number of iterations for the ResNet34 architecture was set to 100, while the number of iterations for VGG16 and Inception-ResNet-v2 was set to 20. In addition, we used two datasets and divided them into training and validation sets in an 8:2 ratio. In the experiment, DIP adopted optimized parameter settings, with a THRESHOLD value of 87, and a PIXELS value of 659, while LPIF used an optimal low-pass filter size of 7. To reduce experimental errors, we repeated the model training process for each defense method three times.
Table 7 provides a detailed list of the time costs required for the five different defense methods during model training. Through comparative analysis, our defense method showed significant advantages in terms of time consumption. Due to the use of a larger image filter size (659 × 659), the DIP method significantly increased the number and complexity of model parameters, thereby prolonging the training time. Data Filtering increased the size of the training data by incorporating adversarial samples into the training process. UDFilter required additional computing resources and utilized adaptive learning algorithms to generate a universal defense filter. In contrast, our method effectively reduced the size of the processed image by optimizing the image-filtering process, thereby reducing the number and complexity of model parameters and achieving more efficient model training. This demonstrates that our defense method reduces the computational overhead during the training process of image classification models without sacrificing robustness, making it a lightweight and efficient defense strategy.

4.7. Robustness of Our Defense Techniques

The robustness of a defense is directly affected by the strength of the attack and is closely related to the success rate of that attack. We evaluated the success rates of various defense strategies under different attack factors. In terms of countering attacks, we adopted the Square attack method from black-box attacks. As for the CNN architecture, we chose the Inception ResNet-v2 model. Figure 5 reveals the attack success rates of Square attacks against five different defense models on the CIFAR10 test dataset. The attack factor ϵ ranges from 0.1 to 0.9.
When the attack factor ϵ ≥ 0.4, the attack success rate of Square attacks exceeded 50% across all defense models. At this point, the maximum modification of pixel values in the adversarial image was 106, and humans can perceive the adversarial perturbations in the image. When the attack factor ϵ ≤ 0.2, it is clear that the attack success rate against our defense model met the condition rate < 0.25. Under this setting, the added perturbations are imperceptible to the human eye. Thus, the proposed method is efficient and robust against adversarial attacks.

4.8. Comparison Results with Other Methods

We compared our method with two defense methods based on diffusion models. We followed the basic setup and used the PyTorch implementation of Wang et al. (https://github.com/wzekai99/DM-Improves-AT, accessed on 23 September 2024) and DiffPure (https://github.com/NVlabs/DiffPure, accessed on 23 September 2024). The classification model employed was WideResNet [53]. Due to limitations in device performance and graphics card memory size, we used only 1M generated denoised images. We trained a WRN-28-10 model on CIFAR10 using the 1M generated data. We used the SGD optimizer with Nesterov momentum, where the momentum factor and weight decay were set to 0.9 and 5 × 10^{−4}, respectively. We used a cyclic learning rate schedule with cosine annealing, where the initial learning rate was set to 0.1. The dropout rate was set to 0.3. We considered two metrics for evaluation: standard accuracy and robust accuracy. The standard accuracy measures the performance of the defense methods on clean data and is evaluated on the entire test set. The robust accuracy measures the performance on adversarial instances generated by three different attacks.
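A minimal sketch of this optimizer configuration, using the PyTorch APIs of the reference implementations cited above, is shown below; the stand-in model, the number of epochs, and the choice of CosineAnnealingWarmRestarts as the cyclic cosine schedule are our own assumptions, not the exact training script.

```python
import torch
import torch.nn as nn

num_epochs = 100                      # illustrative value, not stated in the paper
model = nn.Linear(3 * 32 * 32, 10)    # stand-in for the WRN-28-10 classifier

# SGD with Nesterov momentum, following the stated values:
# momentum 0.9, weight decay 5e-4, initial learning rate 0.1.
optimizer = torch.optim.SGD(
    model.parameters(), lr=0.1, momentum=0.9, weight_decay=5e-4, nesterov=True
)
# One possible realization of a cyclic schedule with cosine annealing.
scheduler = torch.optim.lr_scheduler.CosineAnnealingWarmRestarts(optimizer, T_0=num_epochs)

for epoch in range(num_epochs):
    # ... training loop over the generated data and CIFAR10 would go here ...
    scheduler.step()
```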
Table 8 shows the standard accuracy and robust accuracy of the two diffusion model-based defense methods and our proposed method on the CIFAR10 (l_∞, ϵ = 8/255) test set. It is evident that our method performed better in both standard accuracy and robust accuracy when using the WRN-28-10 architecture. The results demonstrate the effectiveness of using the non-overlapping method to preprocess images in enhancing the robustness of the model.

5. Discussion and Conclusions

In this paper, we introduced a simple and practical strategy for defending against adversarial attacks. This strategy aims to enhance the robustness of deep convolutional neural networks against adversarial attacks. We pointed out the shortcomings of current adversarial defense techniques based on image filters in terms of time efficiency. In order to shorten the training time of the model without reducing adversarial robustness, we proposed a new method that uses non-overlapping image compression filters to protect the image recognition classifier from adversarial attacks. This filter was validated on multiple datasets and under different CNN architectures. The experimental results showed that our method achieved an average accuracy of 75.48% on the CIFAR10 dataset. This proves that the method can effectively resist adversarial attacks and maintain good performance under various learning architectures.
First, we compared our method with traditional image conversion techniques, including rotation, brightness adjustment, and noise addition, to evaluate its effectiveness in enhancing the robustness of the network against various disturbances. The results showed that our method performed better in improving the robustness of the network against these disturbances.
After experimental verification, the results showed that using non-overlapping image filters achieved higher efficiency in eliminating adversarial disturbances. Akin to the overlapping image filter, the non-overlapping image filter is influenced by neighboring pixels during image processing. Nevertheless, it is imperative to emphasize that the non-overlapping image filter ensures that each pixel within the input image is utilized only once, thereby effectively minimizing the influence of adversarial disturbances.
The proposed method demonstrates significant advantages in computational efficiency, characterized by the need for only one simple filtering preprocessing step for subsequent image recognition tasks. The time required for filtering mainly depends on the resolution of the image. Among the three different CNN architectures, the shortest training time for the entire image classification network was only 0.169 h.
Then, we determined that our method demonstrates robustness in resisting adversarial attacks. Specifically, when the value of the attack factor ϵ remains below 0.2, the attack success rate is firmly constrained within the limit of rate < 0.25. Within this parameter range, the perturbations introduced are imperceptible to the human eye.
Finally, we tested two diffusion model-based defense methods and evaluated the standard and robust accuracies of our proposed method on the CIFAR10 test set. Under the WRN-28-10 architecture, our method achieved a standard accuracy of 91.28% and an average robust accuracy of 76.46%. These results demonstrate the effectiveness of our method in enhancing the robustness of the model.
Given the demonstrated efficiency and low-complexity advantages of the proposed method, it stands as a potentially effective tool for diverse image recognition applications, particularly those operating on hardware platforms constrained by limited computing resources. However, we must be cautious about the potential limitations of this method, particularly those pertaining to its performance in addressing non-gradient-based adversarial attacks. Therefore, in future research, we will delve into image classification models for adversarial attacks from the perspectives of the following two types of image preprocessing:
(1) Before applying the receptive field algorithm, performing an RoI analysis can significantly reduce recognition time. It can effectively minimize interference from image backgrounds and has high practicality and universality. At the same time, it can transfer single-object classification models to multi-object classification tasks.
(2) The "non-overlapping" mechanism reduces adversarial noise but also lowers the resolution of the image. In future research, we will consider adding a super-resolution network to improve the resolution of denoised images.
Overall, although current methods have shown strong application prospects in the field of image recognition, there are still challenges that need to be addressed. By delving into the behavioral characteristics of convolutional neural networks, combined with RoI operations and noise suppression mechanisms, we have the potential to develop more powerful and secure image recognition technologies to address increasingly complex computing environments and potential adversarial threats.

Author Contributions

Conceptualization, M.W. and Z.L.; methodology, M.W. and Z.L.; software, M.W.; validation, M.W. and Z.L.; formal analysis, M.W.; investigation, M.W.; resources, M.W.; data curation, M.W.; writing—original draft preparation, M.W.; writing—review and editing, Z.L.; visualization, Z.L.; supervision, Z.L.; project administration, Z.L.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The MNIST data used in this study are openly available at http://yann.lecun.com/exdb/mnist/ (accessed on 23 September 2024) [54]. The CIFAR-10 and CIFAR-100 datasets used in this study are openly available at http://www.cs.toronto.edu/~kriz/cifar.html (accessed on 23 September 2024) [51].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Costa, J.C.; Roxo, T.; Proença, H.; Inácio, P.R. How deep learning sees the world: A survey on adversarial attacks & defenses. IEEE Access 2024, 12, 61113–61136. [Google Scholar]
  2. Melis, M.; Demontis, A.; Biggio, B.; Brown, G.; Fumera, G.; Roli, F. Is deep learning safe for robot vision? adversarial examples against the icub humanoid. In Proceedings of the IEEE International Conference on Computer Vision Workshops, Venice, Italy, 22–29 October 2017; pp. 751–759. [Google Scholar]
  3. Smutz, C.; Stavrou, A. When a Tree Falls: Using Diversity in Ensemble Classifiers to Identify Evasion in Malware Detectors; NDSS: San Diego, CA, USA, 2016. [Google Scholar]
  4. Rakin, A.S.; Fan, D. Defense-net: Defend against a wide range of adversarial attacks through adversarial detector. In Proceedings of the 2019 IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Miami, FL, USA, 15–17 July 2019; pp. 332–337. [Google Scholar]
  5. Dhariwal, P.; Nichol, A. Diffusion models beat gans on image synthesis. Adv. Neural Inf. Process. Syst. 2021, 34, 8780–8794. [Google Scholar]
  6. Nie, W.; Guo, B.; Huang, Y.; Xiao, C.; Vahdat, A.; Anandkumar, A. Diffusion models for adversarial purification. arXiv 2022, arXiv:2205.07460. [Google Scholar]
  7. Ho, J.; Jain, A.; Abbeel, P. Denoising diffusion probabilistic models. Adv. Neural Inf. Process. Syst. 2020, 33, 6840–6851. [Google Scholar]
  8. Wang, Z.; Pang, T.; Du, C.; Lin, M.; Liu, W.; Yan, S. Better Diffusion Models Further Improve Adversarial Training. In Proceedings of the 40th International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; Volume 202, pp. 36246–36263. [Google Scholar]
  9. Karras, T.; Aittala, M.; Aila, T.; Laine, S. Elucidating the design space of diffusion-based generative models. Adv. Neural Inf. Process. Syst. 2022, 35, 26565–26577. [Google Scholar]
  10. Xiao, Y.; Deng, X.; Yu, Z. Defending against Adversarial Attacks using Digital Image Processing. J. Physics Conf. Ser. 2023, 2577, 012016. [Google Scholar] [CrossRef]
  11. Ziyadinov, V.; Tereshonok, M. Low-Pass Image Filtering to Achieve Adversarial Robustness. Sensors 2023, 23, 9032. [Google Scholar] [CrossRef] [PubMed]
  12. Mao, Z.; Chen, S.; Miao, Z.; Li, H.; Xia, B.; Cai, J.; Yuan, W.; You, X. Enhancing robustness of person detection: A universal defense filter against adversarial patch attacks. Comput. Secur. 2024, 146, 104066. [Google Scholar] [CrossRef]
  13. Chen, E.C.; Lee, C.R. Data filtering for efficient adversarial training. Pattern Recognit. 2024, 151, 110394. [Google Scholar] [CrossRef]
  14. Taran, O.; Rezaeifar, S.; Holotyak, T.; Voloshynovskiy, S. Defending against adversarial attacks by randomized diversification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 11226–11233. [Google Scholar]
  15. Kundu, S.; Nazemi, M.; Beerel, P.A.; Pedram, M. DNR: A Tunable Robust Pruning Framework Through Dynamic Network Rewiring of DNNs. In Proceedings of the 26th Asia and South Pacific Design Automation Conference, Tokyo, Japan, 18–21 January 2021; ASPDAC ’21. pp. 344–350. [Google Scholar] [CrossRef]
  16. Chen, E.C.; Lee, C.R. Ltd: Low temperature distillation for robust adversarial training. arXiv 2021, arXiv:2111.02331. [Google Scholar]
  17. Dai, S.; Mahloujifar, S.; Mittal, P. Parameterizing activation functions for adversarial robustness. In Proceedings of the 2022 IEEE Security and Privacy Workshops (SPW), San Francisco, CA, USA, 22–26 May 2022; pp. 80–87. [Google Scholar]
18. Gao, X.; Saha, R.K.; Prasad, M.R.; Roychoudhury, A. Fuzz testing based data augmentation to improve robustness of deep neural networks. In Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, Seoul, Republic of Korea, 27 June–19 July 2020; pp. 1147–1158.
19. Theagarajan, R.; Bhanu, B. Defending Black Box Facial Recognition Classifiers Against Adversarial Attacks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops, Seattle, WA, USA, 14–19 June 2020; pp. 812–813.
20. Meng, D.; Chen, H. MagNet: A two-pronged defense against adversarial examples. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 30 October–3 November 2017; pp. 135–147.
21. Zhou, D.; Wang, N.; Peng, C.; Gao, X.; Wang, X.; Yu, J.; Liu, T. Removing Adversarial Noise in Class Activation Feature Space. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 7878–7887.
22. Abusnaina, A.; Wu, Y.; Arora, S.; Wang, Y.; Wang, F.; Yang, H.; Mohaisen, D. Adversarial Example Detection Using Latent Neighborhood Graph. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 11–17 October 2021; pp. 7687–7696.
23. Ho, C.H.; Vasconcelos, N. DISCO: Adversarial Defense with Local Implicit Functions. In Advances in Neural Information Processing Systems; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates, Inc.: New Orleans, LA, USA, 2022; Volume 35, pp. 23818–23837.
24. Rade, R.; Moosavi-Dezfooli, S.M. Helper-based Adversarial Training: Reducing Excessive Margin to Achieve a Better Accuracy vs. Robustness Trade-off. In Proceedings of the ICML 2021 Workshop on Adversarial Machine Learning, Vienna, Austria, 18–24 July 2021.
25. Chen, J.; Cheng, Y.; Gan, Z.; Gu, Q.; Liu, J. Efficient robust training via backward smoothing. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22 February–1 March 2022; Volume 36, pp. 6222–6230.
26. Shi, C.; Holtz, C.; Mishne, G. Online Adversarial Purification based on Self-Supervision. arXiv 2021, arXiv:2101.09387.
27. Yoon, J.; Hwang, S.J.; Lee, J. Adversarial purification with score-based generative models. In Proceedings of the International Conference on Machine Learning, Virtual Event, 18–24 July 2021; pp. 12062–12072.
28. Wu, Q.; Ye, H.; Gu, Y. Guided diffusion model for adversarial purification from random noise. arXiv 2022, arXiv:2206.10875.
29. Xiao, C.; Chen, Z.; Jin, K.; Wang, J.; Nie, W.; Liu, M.; Anandkumar, A.; Li, B.; Song, D. DensePure: Understanding Diffusion Models towards Adversarial Robustness. In Proceedings of the Workshop on Trustworthy and Socially Responsible Machine Learning, NeurIPS 2022, New Orleans, LA, USA, 28 November–9 December 2022.
30. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80.
31. Khalid, F.; Ali, H.; Tariq, H.; Hanif, M.A.; Rehman, S.; Ahmed, R.; Shafique, M. QuSecNets: Quantization-based Defense Mechanism for Securing Deep Neural Network against Adversarial Attacks. In Proceedings of the 2019 IEEE 25th International Symposium on On-Line Testing and Robust System Design (IOLTS), Rhodes, Greece, 1–3 July 2019; pp. 182–187.
32. Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the 2016 IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–26 May 2016; pp. 582–597.
33. Rahnama, A.; Nguyen, A.T.; Raff, E. Robust Design of Deep Neural Networks against Adversarial Attacks based on Lyapunov Theory. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 8178–8187.
34. Rakin, A.S.; Yi, J.; Gong, B.; Fan, D. Defend deep neural networks against adversarial examples via fixed and dynamic quantized activation functions. arXiv 2018, arXiv:1807.06714.
35. Zhang, H.; Yao, Z.; Sakurai, K. Versatile Defense Against Adversarial Attacks on Image Recognition. arXiv 2024, arXiv:2403.08170.
36. Zhu, Y.; Zhao, Y.; Hu, Z.; Luo, T.; He, L. A review of black-box adversarial attacks on image classification. Neurocomputing 2024, 610, 128512.
37. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572.
38. Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. arXiv 2016, arXiv:1607.02533.
39. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. arXiv 2017, arXiv:1706.06083.
40. Andriushchenko, M.; Croce, F.; Flammarion, N.; Hein, M. Square attack: A query-efficient black-box adversarial attack via random search. In European Conference on Computer Vision; Springer: Cham, Switzerland, 2020; pp. 484–501.
41. Dumoulin, V.; Visin, F. A guide to convolution arithmetic for deep learning. arXiv 2016, arXiv:1603.07285.
42. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
43. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778.
44. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
45. Barrera-Llanga, K.; Burriel-Valencia, J.; Sapena-Bañó, Á.; Martínez-Román, J. A comparative analysis of deep learning convolutional neural network architectures for fault diagnosis of broken rotor bars in induction motors. Sensors 2023, 23, 8196.
46. Barrera, K.; Rodellar, J.; Alférez, S.; Merino, A. Automatic normalized digital color staining in the recognition of abnormal blood cells using generative adversarial networks. Comput. Methods Programs Biomed. 2023, 240, 107629.
47. Szegedy, C.; Ioffe, S.; Vanhoucke, V.; Alemi, A. Inception-v4, Inception-ResNet and the impact of residual connections on learning. In Proceedings of the AAAI Conference on Artificial Intelligence, San Francisco, CA, USA, 4–9 February 2017; Volume 31.
48. Salimans, T.; Goodfellow, I.; Zaremba, W.; Cheung, V.; Radford, A.; Chen, X. Improved techniques for training GANs. Adv. Neural Inf. Process. Syst. 2016, 29, 2234–2242.
49. Zhang, R.; Isola, P.; Efros, A.A.; Shechtman, E.; Wang, O. The unreasonable effectiveness of deep features as a perceptual metric. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 586–595.
50. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2012, 25.
51. Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; University of Toronto: Toronto, ON, Canada, 2009.
52. Abadi, M.; Agarwal, A.; Barham, P.; Brevdo, E.; Chen, Z.; Citro, C.; Corrado, G.S.; Davis, A.; Dean, J.; Devin, M.; et al. TensorFlow: Large-Scale Machine Learning on Heterogeneous Systems. 2015. Available online: https://www.tensorflow.org/ (accessed on 23 September 2024).
53. Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference (BMVC), York, UK, 19–22 September 2016.
54. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
Figure 1. Adversarial perturbations have been intentionally introduced into the handwritten image of the numeral “3” to create an adversarial image.
Figure 2. The detailed process of applying image filters to a classifier model to solve the vulnerability of adversarial perturbations.
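To make the pipeline in Figure 2 concrete, the following minimal Python sketch shows how a pre-classification filter can be wrapped around an existing classifier; the names `classifier` and `compression_filter` are placeholders for illustration, not the exact interfaces used in this work.

```python
def defended_predict(classifier, compression_filter, images):
    """Classify images after passing them through a pre-classification filter.

    classifier         : callable mapping a batch of images to class scores
    compression_filter : callable mapping a batch of images to filtered images
    images             : array of shape (N, H, W, C) with values in [0, 1]
    """
    filtered = compression_filter(images)  # suppress small perturbations first
    return classifier(filtered)            # the classifier never sees raw pixels
```

Because the filter sits in front of the classifier, the defense can be attached to an already trained model without changing its weights.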
Figure 3. Scanning methods of two different image filters: (a,c) Scanning mode diagrams of the receptive fields of overlapping and non-overlapping image filters, respectively. (b,d) Diagrams of data transmission methods for the corresponding methods. The red dots represent the filter layer pixels that can be influenced by the input layer pixel x.
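The distinction in Figure 3 can be illustrated with a simple block-mean filter: an overlapping filter slides its window one pixel at a time, so a single input pixel influences many filtered pixels, whereas a non-overlapping filter steps by the full window size, so each input pixel influences exactly one output. The sketch below is a minimal NumPy illustration under the assumption of a plain k × k mean filter; it is not the exact compression filter proposed in the paper.

```python
import numpy as np

def mean_filter(image, k=2, overlapping=False):
    """Average a (H, W) image over k x k windows.

    overlapping=True  -> stride 1, windows share pixels (Figure 3a,b)
    overlapping=False -> stride k, disjoint windows      (Figure 3c,d)
    """
    stride = 1 if overlapping else k
    h, w = image.shape
    out_h = (h - k) // stride + 1
    out_w = (w - k) // stride + 1
    out = np.empty((out_h, out_w), dtype=image.dtype)
    for i in range(out_h):
        for j in range(out_w):
            block = image[i * stride:i * stride + k, j * stride:j * stride + k]
            out[i, j] = block.mean()
    return out

x = np.random.rand(8, 8).astype(np.float32)
print(mean_filter(x, k=2, overlapping=True).shape)   # (7, 7): overlapping windows
print(mean_filter(x, k=2, overlapping=False).shape)  # (4, 4): disjoint windows
```

The non-overlapping case also compresses the image, since each k × k block is reduced to a single value.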
Figure 4. Set of legitimate and adversarial samples of overlapping and non-overlapping image filters on MNIST. The set of legitimate samples can be seen in the top section, which were correctly classified by the CNN [42]. The corresponding set of adversarial samples (crafted using the FGSM [37]), which were misclassified by the CNN, is shown in the bottom section. For each sample set, the unprocessed images are in the top row, the images processed by the overlapping image filter are in the middle row, and the images processed by the non-overlapping image filter are in the bottom row. The label predicted by the CNN is shown above each image.
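The adversarial samples in Figure 4 were crafted with the FGSM [37], which perturbs each pixel by ϵ in the direction of the sign of the loss gradient: x_adv = x + ϵ · sign(∇_x J(x, y)). A minimal TensorFlow 2 sketch of this single-step attack, assuming a Keras classifier `model` with softmax outputs and integer labels, is shown below; it illustrates the attack rather than the exact configuration used in the experiments.

```python
import tensorflow as tf

def fgsm_attack(model, images, labels, eps=0.1):
    """One-step FGSM: x_adv = x + eps * sign(dJ/dx), clipped to [0, 1]."""
    images = tf.convert_to_tensor(images, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(images)
        predictions = model(images, training=False)
        loss = tf.keras.losses.sparse_categorical_crossentropy(labels, predictions)
    gradients = tape.gradient(loss, images)
    adversarial = images + eps * tf.sign(gradients)
    return tf.clip_by_value(adversarial, 0.0, 1.0)
```

Iterative attacks such as BIM and PGD repeat this gradient step with a small step size and project the result back into the ϵ-ball around the original image.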
Figure 5. The attack success rates of Square attacks against five different defense models on the CIFAR10 test dataset. The attack factor ϵ ranges from 0.1 to 0.9.
Table 2. The CNN architecture we adopted and the specific configuration of each module.

CNN family | VGGNet | ResNet | GoogLeNet
Architecture | VGG16 | ResNet34 | Inception-ResNet-v2
Input | 32 × 32 RGB image | 32 × 32 RGB image | 32 × 32 RGB image
Name | Layer → output size | Layer → output size | Layer → output size
Stem | – | stem, strides = 1 → 32 × 32 | stem, strides = 1 → 32 × 32
Block-A | [3 × 3, 64] × 2, maxpool → 16 × 16 | [3 × 3, 64; 3 × 3, 64] × 3 → 16 × 16 | Inception-resnet-A × 5 → 32 × 32; ReductionA → 16 × 16
Block-B | [3 × 3, 128] × 2, maxpool → 8 × 8 | [3 × 3, 128; 3 × 3, 128] × 4 → 8 × 8 | Inception-resnet-B × 10 → 16 × 16; ReductionB → 8 × 8
Block-C | [3 × 3, 256] × 3, maxpool → 4 × 4 | [3 × 3, 256; 3 × 3, 256] × 6 → 4 × 4 | Inception-resnet-C × 5 → 8 × 8
Block-D | [3 × 3, 512] × 3, maxpool → 2 × 2 | [3 × 3, 512; 3 × 3, 512] × 3 → 2 × 2 | –
Block-E | [3 × 3, 512] × 3, maxpool → 1 × 1 | – | –
Block-F | FullConnect 256; FullConnect 128; FullConnect num_classes | AveragePooling, strides = 1; FullConnect num_classes | AveragePooling, strides = 1; Dropout(0.8); FullConnect num_classes
Output | softmax | softmax | softmax
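As a rough illustration of how the VGG16 column of Table 2 maps onto code, the following Keras sketch stacks the listed convolution blocks and the fully connected head; choices the table does not specify (ReLU activations, "same" padding) are assumptions, so this is a sketch of the configuration rather than the exact training setup.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_vgg16_cifar(num_classes=10):
    """VGG16-style classifier for 32 x 32 RGB inputs, following Table 2 (Blocks A-F)."""
    model = tf.keras.Sequential()
    model.add(tf.keras.Input(shape=(32, 32, 3)))
    # Blocks A-E: (filters, number of 3x3 convolutions); each block ends with max pooling.
    for filters, repeats in [(64, 2), (128, 2), (256, 3), (512, 3), (512, 3)]:
        for _ in range(repeats):
            model.add(layers.Conv2D(filters, 3, padding="same", activation="relu"))
        model.add(layers.MaxPooling2D(2))
    # Block F: fully connected head (256 -> 128 -> num_classes with softmax output).
    model.add(layers.Flatten())
    model.add(layers.Dense(256, activation="relu"))
    model.add(layers.Dense(128, activation="relu"))
    model.add(layers.Dense(num_classes, activation="softmax"))
    return model
```

Five pooling stages reduce the 32 × 32 input to the 1 × 1 feature map listed for Block-E before the fully connected layers.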
Table 3. The impact of different image transformations on the accuracy of adversarial attacks under the VGG16 architecture.

Dataset | Attack | Origin | Rotation | Brightness | Contrast | Noise | UDFilter | Our
CIFAR10 | FGSM | 54.9% | 63.0% | 65.1% | 66.4% | 68.7% | 72.3% | 73.1%
CIFAR10 | BIM | 47.7% | 56.8% | 57.3% | 59.4% | 60.5% | 63.8% | 65.4%
CIFAR10 | PGD | 47.9% | 54.2% | 58.0% | 56.2% | 59.1% | 64.6% | 64.8%
CIFAR10 | Square | 51.9% | 56.9% | 58.0% | 59.1% | 61.4% | 67.6% | 68.1%
CIFAR100 | FGSM | 37.6% | 37.7% | 41.6% | 58.0% | 58.9% | 60.5% | 61.0%
CIFAR100 | BIM | 31.6% | 33.5% | 32.2% | 45.2% | 47.1% | 52.2% | 53.5%
CIFAR100 | PGD | 31.7% | 33.8% | 33.2% | 48.1% | 49.8% | 54.8% | 55.1%
CIFAR100 | Square | 33.7% | 36.1% | 35.4% | 51.3% | 52.1% | 58.3% | 58.7%
Table 4. The accuracy of DIP, LPIF, Data Filtering, UDFilter, and our defense techniques on the CIFAR10 and CIFAR100 datasets.

Attack | Defense | VGG16 CIFAR10 | VGG16 CIFAR100 | ResNet34 CIFAR10 | ResNet34 CIFAR100 | Inception-ResNet-v2 CIFAR10 | Inception-ResNet-v2 CIFAR100
FGSM | DIP (2023) [10] | 0.618 | 0.506 | 0.677 | 0.541 | – | –
FGSM | LPIF (2023) [11] | 0.704 | 0.589 | 0.771 | 0.629 | 0.800 | 0.695
FGSM | Data Filtering (2024) [12] | 0.729 | 0.608 | 0.798 | 0.650 | 0.828 | 0.718
FGSM | UDFilter (2024) [13] | 0.723 | 0.605 | 0.792 | 0.647 | 0.822 | 0.715
FGSM | Our | 0.731 | 0.610 | 0.801 | 0.653 | 0.831 | 0.721
BIM | DIP (2023) [10] | 0.551 | 0.477 | 0.643 | 0.557 | – | –
BIM | LPIF (2023) [11] | 0.625 | 0.531 | 0.729 | 0.621 | 0.761 | 0.696
BIM | Data Filtering (2024) [12] | 0.641 | 0.530 | 0.748 | 0.619 | 0.781 | 0.695
BIM | UDFilter (2024) [13] | 0.638 | 0.522 | 0.744 | 0.610 | 0.777 | 0.685
BIM | Our | 0.654 | 0.535 | 0.763 | 0.625 | 0.796 | 0.701
PGD | DIP (2023) [10] | 0.548 | 0.468 | 0.642 | 0.528 | – | –
PGD | LPIF (2023) [11] | 0.628 | 0.521 | 0.736 | 0.588 | 0.777 | 0.661
PGD | Data Filtering (2024) [12] | 0.643 | 0.551 | 0.753 | 0.621 | 0.795 | 0.698
PGD | UDFilter (2024) [13] | 0.646 | 0.548 | 0.756 | 0.618 | 0.798 | 0.694
PGD | Our | 0.648 | 0.551 | 0.759 | 0.622 | 0.802 | 0.698
Square | DIP (2023) [10] | 0.572 | 0.481 | 0.650 | 0.524 | – | –
Square | LPIF (2023) [11] | 0.638 | 0.564 | 0.725 | 0.614 | 0.765 | 0.681
Square | Data Filtering (2024) [12] | 0.670 | 0.582 | 0.761 | 0.634 | 0.803 | 0.702
Square | UDFilter (2024) [13] | 0.676 | 0.583 | 0.768 | 0.635 | 0.811 | 0.703
Square | Our | 0.681 | 0.587 | 0.774 | 0.639 | 0.816 | 0.708
Table 5. Accuracy of the overlapping and non-overlapping methods under white-box and black-box attacks on the CIFAR10 dataset. The CNN architecture used is Inception-ResNet-v2.

Processing Method | FGSM | BIM | PGD | Square
Unprocessed | 0.526 | 0.452 | 0.458 | 0.507
Overlapping | 0.827 | 0.787 | 0.796 | 0.812
Non-overlapping | 0.831 | 0.796 | 0.802 | 0.816
Table 6. IS and LPIPS image quality assessment results for each defense method.

Defense | IS | LPIPS | IS | LPIPS
DIP | 4.035 ± 0.162 | 0.207 | 4.397 ± 0.080 | 0.244
LPIF | 4.528 ± 0.173 | 0.206 | 4.978 ± 0.165 | 0.242
UDFilter | 4.877 ± 0.106 | 0.202 | 5.929 ± 0.092 | 0.239
Our | 4.863 ± 0.115 | 0.201 | 5.915 ± 0.127 | 0.237
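For reference, the Inception Score [48] in Table 6 is defined as IS = exp(E_x[KL(p(y|x) || p(y))]), computed from the class posteriors an Inception network assigns to the processed images, while LPIPS [49] is a learned perceptual distance between image pairs. The following NumPy sketch shows the IS computation under the assumption that `probs` already holds the softmax outputs for a set of images; it is an illustration of the metric, not the exact evaluation script used here.

```python
import numpy as np

def inception_score(probs, eps=1e-12):
    """Inception Score from an (N, num_classes) matrix of softmax outputs.

    IS = exp( mean over x of KL( p(y|x) || p(y) ) ),
    where p(y) is the marginal distribution over the whole image set.
    """
    marginal = probs.mean(axis=0, keepdims=True)                              # p(y)
    kl = np.sum(probs * (np.log(probs + eps) - np.log(marginal + eps)), axis=1)
    return float(np.exp(kl.mean()))

# Example: sharp, diverse class predictions yield a higher score.
rng = np.random.default_rng(0)
probs = rng.dirichlet(alpha=np.full(10, 0.1), size=1000)  # peaked distributions
print(inception_score(probs))
```

In practice the score is computed over several splits of the data, which is what produces the mean ± standard deviation values reported in Table 6.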
Table 7. The time cost required to train models with five different defense methods. The unit is hours.

Defense | VGG16 CIFAR10 | VGG16 CIFAR100 | ResNet34 CIFAR10 | ResNet34 CIFAR100 | Inception-ResNet-v2 CIFAR10 | Inception-ResNet-v2 CIFAR100
DIP | 2.265 | 2.363 | 10.641 | 9.843 | – | –
LPIF | 0.253 | 0.301 | 1.415 | 1.772 | 7.133 | 7.185
Data Filtering | 0.527 | 0.573 | 2.287 | 2.594 | 10.347 | 10.647
UDFilter | 0.245 | 0.308 | 1.427 | 1.520 | 6.649 | 6.823
Our | 0.169 | 0.177 | 0.868 | 0.809 | 3.731 | 3.623
Table 8. The standard accuracy and robust accuracy of two diffusion model-based defense methods and our proposed method on the CIFAR10 (ℓ∞, ϵ = 8/255) test set.

Method | Architecture | Generated | Standard Accuracy | FGSM | BIM | PGD | Square
DiffPure (2023) [6] | WRN-28-10 | 1M | 90.83 | 76.36 | 73.87 | 74.18 | 75.07
Wang et al. (2023) [8] | WRN-28-10 | 1M | 91.12 | 77.27 | 74.28 | 74.87 | 76.26
Our | WRN-28-10 | – | 91.28 | 78.35 | 75.04 | 75.52 | 76.92