Article

Using High-Level Representation Difference Constraint and Relative Reconstruction Constraint for Defending against Adversarial Attacks

1 National Pilot School of Software, Yunnan University, Kunming 650504, China
2 School of Information Science and Technology, Yunnan Normal University, Kunming 650504, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(9), 2017; https://doi.org/10.3390/electronics12092017
Submission received: 27 March 2023 / Revised: 19 April 2023 / Accepted: 24 April 2023 / Published: 26 April 2023

Abstract

Adversarial examples, in which imperceptible perturbations to the input can easily subvert a well-trained model's prediction, pose huge potential security threats to deep neural networks (DNNs). As an effective way to resist adversarial samples, input reconstruction can eliminate the antagonism of adversarial examples in the inference process without involving modifications to the target model's structure and parameters. However, preprocessing inputs often results in some loss of the protected model's prediction accuracy. In this paper, we introduce a new input reconstruction method that imposes the high-level representation difference constraint and the relative reconstruction constraint on a dual autoencoder to advance the prediction accuracy of the protected model. The high-level representation difference constraint uses the gap between the protected model's high-level representations activated by clean images and by their adversarial examples to guide the training of the dual autoencoder. Additionally, the relative reconstruction constraint is imposed on latent representations and their noisy versions to advance the robustness of the dual autoencoder to tiny perturbations. Extensive empirical experiments on two real datasets, CIFAR-10 and ImageNet, show that the presented approach demonstrates exceptional performance in resisting different types of attacks.

1. Introduction

Although deep neural networks (DNNs) have achieved excellent performance in solving many perceptual tasks [1,2,3], they are plagued by adversarial examples, which pose immense challenges to DNNs, especially in safety-critical applications. Existing studies have introduced a number of defense strategies that attempt to effectively resist adversarial examples, such as gradient regularization [4,5], defensive distillation [6,7] and adversarial training [8,9,10,11]. However, these strategies often involve modifying the cost function, augmenting the training data, or distilling the protected model. As a result, they require changes to the details of the protected model and introduce considerable computational cost.
As an alternative solution, input reconstruction denoises adversarial samples before passing them to the target model. Input reconstruction approaches are flexible: they can protect a model in the inference phase and can be conveniently integrated with other defenses. For example, Gu and Rigazio introduced DAE [12], which leverages the plain autoencoder to reconstruct input images. Autoencoders usually rely on the $L_1$ or $L_2$ norm for a pixel-by-pixel comparison. However, adversarial perturbations are very small, and a benign sample and its corresponding adversarial sample are similar. This means that using only the difference between pixel pairs makes it difficult to effectively distinguish clean images from their adversarial versions. Similarly, Jia et al. [13] aimed to remove adversarial perturbations by compressing images on a large scale. They employed an encoder to project images into a very small space and then used a decoder to reconstruct images from that space. However, reconstructing images from a very small space increases reconstruction difficulty and reduces reconstruction quality. Liao et al. [14] proposed the HGD, which takes advantage of the amplified difference between benign samples and their adversarial versions. Although the HGD is very effective, it still loses some of the target model's prediction accuracy, which is a common problem faced by input reconstruction methods.
In this paper, we take advantage of the dual autoencoder architecture as a denoiser and enforce the high-level representation difference constraint and the relative reconstruction constraint on the denoiser to advance the prediction performance of the target model. Let $f$ be a well-trained classifier, $x$ be a clean example, and $r$ be an adversarial perturbation. The adversarial version of $x$ can be defined as $x_{adv} = x + r$. For a good denoiser that consists of an encoder $D_{enc}$ and a decoder $D_{dec}$, we expect that $f(D_{dec}(D_{enc}(x))) = f(D_{dec}(D_{enc}(x + r)))$ and disregard the similarity between the input $x$ and $D_{dec}(D_{enc}(x))$. Therefore, instead of directly measuring the gap between $x$ and $D_{dec}(D_{enc}(x))$, we calculate the difference between $f(D_{dec}(D_{enc}(x)))$ and $f(D_{dec}(D_{enc}(x + r)))$. Adversarial perturbations are tiny but can cause a well-trained classifier to output incorrect results. This means that the gap between $f(x)$ and $f(x + r)$ is greater than the gap between $x$ and $x + r$, and using this greater difference to guide the training of a denoiser can effectively remove adversarial perturbations. To further advance the performance of the protected model on reconstructed images, the relative reconstruction constraint is utilized to measure the gap between $D_{dec}(D_{enc}(x))$ and $D_{dec}(D_{enc}(x) + \xi)$; here, $\xi$ is random Gaussian noise. A benign sample and its adversarial sample are very similar; therefore, under the same supervision information, the latent representations of a clean example and its adversarial sample generated by $D_{enc}$ are also similar. We can regard the latent representation of an adversarial sample as the latent representation of its clean version plus some tiny noise. The relative reconstruction constraint can effectively eliminate the effect of this subtle noise on the decoder, thereby advancing the prediction performance of the protected model. We test the proposed approach on CIFAR-10 and ImageNet, and the experimental results show that, compared with several state-of-the-art input reconstruction approaches, our approach achieves outstanding performance in resisting different kinds of adversarial attacks.
In summary, the key contributions of this work are as follows:
(1)
We introduce a new input reconstruction defense for resisting adversarial examples, which denoises inputs before they are passed to the protected model. This allows us to protect a deployed model without modifying its architecture or parameters.
(2)
We utilize the high-level representation difference constraint and the relative reconstruction constraint to guide the training of the denoiser. The high-level representation difference constraint is in charge of effectively removing adversarial perturbations, and the relative reconstruction constraint guarantees that tiny perturbations will not interfere with our denoiser.
(3)
Extensive experiments using two real datasets verify that the presented method achieves outstanding performance in resisting adversarial attacks.
In the remaining sections of this paper, we introduce adversarial attacks and defenses in Section 2, and we present the proposed input reconstruction approach in Section 3. We show the experimental settings, results, and analysis in Section 4. Finally, Section 5 presents the conclusions of this work.

2. Related Work

2.1. Crafting Adversarial Examples

For a well-trained classifier $f$ with parameters $\theta$ that maps $x \in \mathbb{R}^m$ to $y$, the objective of adversarial attacks is:
$$\min \| r \| \quad \mathrm{s.t.} \quad f(x + r, \theta) \neq y. \tag{1}$$
FGSM [15] is a very simple adversarial attack that only calculates the gradient once and is formulated as
$$x_{adv} = x + \varepsilon \cdot \mathrm{sign}(\nabla_x L(f(x, \theta), y)), \tag{2}$$
where $\varepsilon$ represents a constant, $L(\cdot)$ is the loss function, and $\mathrm{sign}(\cdot)$ is the sign function. BIM [16] is an extended version of FGSM, which uses multiple small steps to replace the single step of FGSM. PGD [17] is a variant of BIM. It first adds random perturbations to $x$ and then iteratively produces the adversarial example as BIM does. Dong et al. [18] added a momentum term into the iteration process and proposed MIM. Papernot et al. [19] adopted the $L_0$ norm to limit the generated perturbations and produced adversarial samples by modifying small numbers of pixels in clean images. Su et al. [20] presented the one-pixel attack, which changes only one pixel in an image to fool classifiers. DeepFool [21] finds smaller perturbations than FGSM and leverages a linear approximation to perform an iterative attack. Carlini and Wagner [22] presented CW against defensive distillation. CW contains three attacks, CW0, CW2, and CW∞, which use the $L_0$, $L_2$, and $L_\infty$ norms, respectively. There are also some attacks based on generative adversarial networks. Xiao et al. [23] proposed AdvGAN, which maps clean samples into adversarial perturbations through well-trained generators. Jandial et al. [24] extended AdvGAN and proposed AdvGAN++, which adds a feature extractor on the basis of AdvGAN. Zhao et al. [25] proposed Natural GAN to generate more natural adversarial images. Ref. [26] introduces a physical-world attack. Xu et al. [27] designed T-shirts with special patterns so that the wearer can successfully evade camera recognition algorithms. Sharif et al. [28] developed adversarial glasses to deceive face recognition systems. As the focus of this work is defense, classic attacks with high attack success rates such as BIM, PGD, and CW2 are adopted to produce adversarial samples to test our method.
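As a concrete illustration of Equation (2), the following PyTorch sketch implements single-step FGSM; the function name `fgsm`, the default ε of 8/255 (matching the setting in Section 4.1.3), and the clamping to [0, 1] are our additions rather than details taken from the cited works.

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps=8/255):
    """Single-step FGSM: x_adv = x + eps * sign(grad_x L(f(x), y))."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    # Step in the direction of the sign of the input gradient and keep pixels valid.
    x_adv = x + eps * x.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```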

2.2. Defenses against Adversarial Attacks

Previous studies have proposed a number of defense strategies to try to resist adversarial attacks. These defenses can be roughly divided into proactive defenses and reactive defenses. Proactive defenses, such as randomized models [29,30], defensive distillation [6,7], distributional smoothing [31], adversarial training [8,9,10,11], verifiable defense [32,33] and gradient regularization [4,5], aim to refine the target model. Proactive defense approaches often require modifications to the target model's architecture or training process and demand more training samples or a higher computational cost. Unfortunately, once a deep model is trained, there are always more powerful attacks that can successfully fool it.
Reactive defenses advance the robustness of the protected model by indirect means without changing the trained parameters or the network structure. For example, Ref. [34] trains DNN-based binary classifiers to distinguish adversarial samples. Metzen et al. [35] used the high-level outputs of the protected model as input features of an additional auxiliary model to identify adversarial samples. Ref. [36] utilized the representation difference in inputs of different models to identify adversarial examples. These methods only detect adversarial samples without further processing them. In addition, some studies focus on preprocessing inputs before they are passed to the protected model. Ref. [37] rescales the input example to a random size to weaken the effect of adversarial perturbations. Ref. [38] used JPEG compression to preprocess input images to remove high-frequency signal components and selectively blurred input images to remove adversarial perturbations. These methods mainly leverage compression technologies to destroy the structure of adversarial perturbations; this makes significant changes to the input images and greatly affects the prediction performance of the target model. DAE [12] and ComDefend [13] utilize the plain autoencoder to reconstruct input images. HGD [14] and TD [39] adopt the difference over high-level representations activated by clean images and their adversarial versions to train denoisers. Compared with input compression methods, input reconstruction methods obtain better preprocessing results but still cause the target model to lose some prediction accuracy. To further advance the performance of input reconstruction, we combine the high-level representation difference constraint with the relative reconstruction constraint to enhance the prediction performance of the protected model on reconstructed examples.

3. Methodology

Our proposed approach, the dual autoencoder denoiser (DAED), contains a denoiser with a dual autoencoder structure and a well-trained classifier for magnifying the gap between benign examples and their adversarial examples (see Figure 1). We first describe the reconstruction module and then present the training objectives of our method.

3.1. Reconstruction Module

Our method adopts the dual autoencoder architecture as the reconstruction module. Structurally, a plain autoencoder contains an encoder and a decoder, while a dual autoencoder contains an encoder and two decoders. Suppose $x$ is an input image. The encoder $D_{enc}$ projects $x$ into a latent space:
$$z = D_{enc}(x), \tag{3}$$
where $z$ represents the latent representation of $x$. The decoder $D_{dec}$ then reconstructs $x$ from $z$:
$$x' = D_{dec}(z), \tag{4}$$
where $x'$ represents the reconstructed image of $x$. The entire encoding-decoding process is:
$$x' = D_{dec}(D_{enc}(x)). \tag{5}$$
The dual autoencoder adds a second decoder to the plain autoencoder, which reconstructs $x$ from the noisy latent representation:
$$x'' = D_{dec}(D_{enc}(x) + \xi), \tag{6}$$
where $\xi$ represents Gaussian noise with $\mu = 0$ and $\sigma = 0.1$. In this work, we designed two dual autoencoders for CIFAR-10 and ImageNet. The structural details of the dual autoencoder for CIFAR-10 are shown in Table 1, and those of the dual autoencoder for ImageNet are shown in Table 2. Since the two decoders of a dual autoencoder have the same structure and share parameters, we only display the structure of one decoder.
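The following PyTorch sketch illustrates this structure under our reading of Equations (3)-(6): a single encoder and a shared decoder that is applied once to $z$ and once to $z + \xi$. The layer sizes here are illustrative placeholders, not the exact configurations of Table 1 and Table 2.

```python
import torch
import torch.nn as nn

class DualAutoencoder(nn.Module):
    """Encoder plus a shared decoder applied to z and to z + Gaussian noise (sigma = 0.1)."""
    def __init__(self, sigma=0.1):
        super().__init__()
        self.sigma = sigma
        self.encoder = nn.Sequential(            # placeholder layers, not Table 1/2
            nn.Conv2d(3, 64, 3, padding=1), nn.BatchNorm2d(64), nn.LeakyReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.BatchNorm2d(128), nn.LeakyReLU(),
            nn.MaxPool2d(2),
        )
        self.decoder = nn.Sequential(
            nn.Upsample(scale_factor=2), nn.Conv2d(128, 64, 3, padding=1),
            nn.BatchNorm2d(64), nn.ReLU(),
            nn.Upsample(scale_factor=2), nn.Conv2d(64, 3, 5, padding=2), nn.Tanh(),
        )

    def forward(self, x):
        z = self.encoder(x)                                            # Equation (3)
        x_rec = self.decoder(z)                                        # Equation (5)
        x_noisy = self.decoder(z + self.sigma * torch.randn_like(z))   # Equation (6)
        return x_rec, x_noisy
```

Because the two decoders share parameters, a single decoder module applied twice is equivalent to the two-decoder description above.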

3.2. Training Formulation

Let $x_c$ be a clean image, and let $x_{adv}$ be the adversarial image of $x_c$. For a target classifier $f$ with parameters $\theta$, we hope that the outputs of $f$ with $x_c$, the reconstructed $x_c$ (denoted $x'_c$), and the reconstructed $x_{adv}$ (denoted $x'_{adv}$) as inputs are consistent. That is,
$$f(x_c, \theta) = f(x'_{adv}, \theta) = f(x'_c, \theta). \tag{7}$$
We aim to eliminate the antagonism of adversarial samples while maintaining the performance of the target model, regardless of whether $x_c$, $x'_c$, and $x'_{adv}$ are similar. Therefore, we abandon the absolute reconstruction constraint to reduce the impact on the prediction accuracy and adopt the relative reconstruction constraint to close the gap between $x'$ and $x''$. The relative reconstruction loss is:
$$\mathcal{L}_{rec} = \| x' - x'' \|_2 = \| D_{dec}(D_{enc}(x)) - D_{dec}(D_{enc}(x) + \xi) \|_2. \tag{8}$$
As described above, $x'$ is the reconstruction of $x$ from the latent representation $z$, and $x''$ is the reconstruction of $x$ from the noisy latent representation $z + \xi$. The relative reconstruction loss shortens the gap between $x'$ and $x''$, which means that adding tiny noise to the latent representation cannot significantly change the reconstruction result. A clean image is very similar to its adversarial version; therefore, under the same supervision information, their latent representations will also be similar. The latent representation of an adversarial example can thus be regarded as the latent representation of its clean version plus tiny noise. The relative reconstruction loss can effectively eliminate the adverse effect caused by adversarial perturbations, thus improving the ability of our method to resist adversarial examples.
In addition to the relative reconstruction loss, we adopt the high-level representation difference constraint to encourage the classifier to correctly recognize reconstructed images. In this work, we adopt the logits as the high-level representations to highlight the gap between benign samples and their corresponding adversarial samples, i.e., the logits pairing loss (see Figure 1). The logits pairing loss enables the reconstructed examples to be classified correctly by the target model:
$$\mathcal{L}_{logits} = \| Z(x_c) - Z(x'_{adv}) \|_2, \tag{9}$$
where $Z(\cdot)$ represents the well-trained classifier with the final Softmax layer removed. By fusing the relative reconstruction loss and the logits pairing loss, we obtain the overall objective of our approach,
$$\min \ \mathcal{L}_{total} = \mathcal{L}_{rec} + \alpha \cdot \mathcal{L}_{logits}, \tag{10}$$
where $\alpha$ is a hyperparameter that balances the two losses.
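A minimal sketch of this objective is given below, assuming `denoiser` is the dual autoencoder above and `logits_fn` is the frozen target classifier with its Softmax layer removed. Feeding the adversarial example to the denoiser and pairing its logits with those of the clean image is our interpretation of Equations (8)-(10), and mean squared error is used here as a stand-in for the $L_2$ norm.

```python
import torch.nn.functional as F

def daed_loss(denoiser, logits_fn, x_clean, x_adv, alpha=1.0):
    # Relative reconstruction loss: decode z and z + noise, compare the two outputs.
    x_rec, x_noisy = denoiser(x_adv)
    loss_rec = F.mse_loss(x_rec, x_noisy)          # surrogate for ||x' - x''||_2
    # Logits pairing loss: logits of the reconstructed adversarial image
    # versus the (detached) logits of its clean counterpart.
    loss_logits = F.mse_loss(logits_fn(x_rec), logits_fn(x_clean).detach())
    return loss_rec + alpha * loss_logits
```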

4. Experiments

We first present the experimental settings in this section and then compare the proposed approach with some state-of-the-art input reconstruction approaches on two real datasets.

4.1. Experimental Settings

4.1.1. Datasets

We test the proposed approach on two real datasets, CIFAR-10 and ImageNet. CIFAR-10 contains 60,000 color images of shape (32 × 32). For ImageNet, limited by computing resources, we select ten categories from ILSVRC2012, namely, teapot, hummingbird, rapeseed, violin, ice cream, admiral, goldfish, axolotl, ostrich, and chameleon. Each class contains 1300 training samples and 50 test samples. The images in ImageNet have different sizes, so we crop the long side of each image to the same length as the short side and then resize all images to (224 × 224). Since our task is not classification, we do not apply any data augmentation to the training data for simplicity.
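A minimal torchvision transform matching this preprocessing might look as follows; using a center crop for the long side is our assumption, since the exact cropping policy is not specified beyond the description above.

```python
from torchvision import transforms

imagenet_preprocess = transforms.Compose([
    # CenterCrop with side min(W, H) crops a square, removing the excess
    # along the longer side of the image.
    transforms.Lambda(lambda img: transforms.CenterCrop(min(img.size))(img)),
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])
```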

4.1.2. Classifiers

We design a classifier called C-1 for CIFAR-10. The detailed architecture of this classifier is (C(64, 3, 1) + ReLU, C(64, 3, 1) + ReLU, Max Pooling 2 × 2, C(128, 3, 1) + ReLU, C(128, 3, 1) + ReLU, Max Pooling 2 × 2, Fully Connected 256, Fully Connected 256, Softmax 10). The other classifiers adopted in our work are listed in Table 3. For each classifier, we replace its fully connected layers with a single fully connected layer that has 10 output neurons. Each classifier is trained with Adam ($\beta_1 = 0.9$, $\beta_2 = 0.999$) for 50 epochs, with a batch size of 128 and an initial learning rate of 0.001. Under these settings, all classifiers achieve satisfactory classification accuracy.
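For reference, a direct PyTorch rendering of C-1 could look like the sketch below. The 'same' padding, the ReLU activations after the fully connected layers, and the flattened feature size of 128 · 8 · 8 for 32 × 32 inputs are our assumptions; the final Softmax is left to the cross-entropy loss, so the network outputs logits.

```python
import torch.nn as nn

class C1(nn.Module):
    """Sketch of the C-1 classifier described in Section 4.1.2."""
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(64, 64, 3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, stride=1, padding=1), nn.ReLU(),
            nn.Conv2d(128, 128, 3, stride=1, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(128 * 8 * 8, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, num_classes),  # Softmax applied by the loss at training time
        )

    def forward(self, x):
        return self.classifier(self.features(x))
```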

4.1.3. Attack Techniques

We adopt five attacks, i.e., FGSM [15], PGD [17], MIM [18], BIM [16], and CW2 [22], to test the performance of the proposed method. For FGSM, PGD, MIM, and BIM, $\varepsilon$ is set to 8/255 (see Equation (2)).

4.1.4. Parameter Settings

The presented method is trained with Adam ($\beta_1 = 0.5$, $\beta_2 = 0.999$) for 100 epochs, with a batch size of 128 and a learning rate of 0.0001. The hyperparameter $\alpha$ in Equation (10) is set to 1.0. Each batch contains 64 benign samples and their corresponding adversarial samples. The training adversarial samples are produced by C-1 for CIFAR-10 and by VGG-16 for ImageNet.
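Putting the pieces together, one training step might look like the following sketch, which reuses the hypothetical `C1`, `DualAutoencoder`, `fgsm`, and `daed_loss` definitions from the earlier sketches; the paper's exact batching, attack choice, and normalization details may differ.

```python
import torch

# Frozen, well-trained target classifier (outputs logits, so it doubles as Z(.)).
classifier = C1()
classifier.eval()
for p in classifier.parameters():
    p.requires_grad_(False)

denoiser = DualAutoencoder()
optimizer = torch.optim.Adam(denoiser.parameters(), lr=1e-4, betas=(0.5, 0.999))

def train_step(x_clean, y):
    # Each batch pairs 64 benign images with their adversarial counterparts.
    x_adv = fgsm(classifier, x_clean, y)   # or PGD/BIM/MIM, as in Section 4.1.3
    loss = daed_loss(denoiser, classifier, x_clean, x_adv, alpha=1.0)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```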

4.2. Experimental Results Evaluation

We compare the presented approach with DAE [12], HGD [14], and TD [39]. The experimental results of these input reconstruction approaches are compared in terms of resisting white-box attacks and defending against black-box attacks. Beyond that, we also analyze the defense performance of the presented method on different classifiers.

4.2.1. Resisting White-Box Attacks

In the experimental setting of resisting white-box attacks, C-1 is the protected model for CIFAR-10 and VGG-16 is the protected model for ImageNet. All adversarial samples, including training adversarial samples and testing adversarial samples, are produced by these two classifiers. Table 4 shows the classification accuracy of C-1 and VGG-16 on testing sets obtained by different defense approaches. NA means no assistance attack and no defense. On CIFAR-10, with the help of PGD (adversarial examples produced by PGD participated in the training of the defenses), the accuracy of the target classifier on adversarial samples is significantly improved under the protection of the different defenses. Among these defenses, DAE performs the worst, HGD and TD perform significantly better than DAE, and our method performs the best. DAE achieves an accuracy of 0.636 on clean images but only a classification accuracy of slightly more than 0.4 on adversarial images. This means that although DAE can effectively reconstruct inputs, it has difficulty removing tiny perturbations from the inputs. DAE uses the gap between benign images and their corresponding adversarial samples to guide the training of a denoiser. As we know, a clean image is very similar to its adversarial version; therefore, it is difficult to highlight the gap between a clean sample and its adversarial sample using only the $L_2$ norm. HGD and TD use the gap between the high-level representations activated by clean samples and their adversarial samples to guide the training of a denoiser. The high-level representations can highlight the gap between benign examples and their corresponding adversarial examples, so HGD and TD can effectively remove adversarial perturbations. In addition to the high-level representation difference constraint, our method imposes the relative reconstruction constraint on the denoiser, which guarantees that tiny perturbations will not interfere with the decoder, thus improving the classification performance of the protected model on the reconstructed examples. With the help of BIM and MIM, our approach still performs the best. HGD and TD perform similarly, ranking second and third, and DAE's performance is still the worst. On ImageNet, HGD and DAED perform similarly with the help of PGD, followed by TD, and DAE performs the worst. With the help of BIM and MIM, DAED has the best defense performance, followed by HGD; TD ranks third, and DAE ranks fourth.
It is worth noting that these defenses perform better with the help of MIM. The adversarial examples generated by MIM have better transferability than the adversarial examples generated by PGD and BIM; therefore, these defenses have good generalization with the help of MIM. In the following experiments, we leverage MIM to assist in training all denoisers.

4.2.2. Resisting Black-Box Attacks

Since black-box attacks have a poor effect on ImageNet under the experimental settings in this work, the different input reconstruction methods are only compared and analyzed on CIFAR-10. C-1 is the target classifier, and all training adversarial images are produced by MIM with C-1. The testing adversarial samples are generated by VGG-16, MobileNet, and ResNet-50, respectively. Table 5 displays the performances of the different defenses against black-box attacks. Whether resisting adversarial samples produced by VGG-16, MobileNet, or ResNet-50, the presented approach outperforms the other input reconstruction approaches. It is worth noting that when resisting the adversarial examples produced by CW2, the accuracy of the protected classifier on the reconstructed images is lower than that on the adversarial images. CW2 has strong attack power under the white-box setting, but its attack success rate is low under the black-box setting because the perturbations generated by CW2 are too small. In Figure 2, we show the histograms of the classification accuracy of the protected classifier under the protection of the different defenses. We can intuitively see that regardless of whether VGG-16, MobileNet, or ResNet-50 is used as the surrogate classifier to produce adversarial samples, our method is ahead of the other input reconstruction defenses. Taken together, our approach can effectively advance the classification performance of the protected classifier against both white-box attacks and black-box attacks.

4.2.3. Protecting Different Classifiers

Table 6 displays the defense performances of different defenses when protecting different classifiers on CIFAR-10 and ImageNet. We select ResNet-50 and MobileNet for CIFAR-10, and MobileNet for ImageNet. All training adversarial samples and testing adversarial samples are produced by different attacks against the protected model. On the CIFAR-10 dataset, when MobileNet is used as the target model, our method is inferior to HGD when reconstructing adversarial examples generated by PGD, and it is superior to the other defense methods in all other cases. When ResNet-50 is used as the protected classifier, the presented approach is superior to the other defenses in most cases. Although our approach is worse than TD in reconstructing clean images and worse than HGD in reconstructing adversarial images generated by CW2, the gaps are only 0.007 and 0.001, respectively. On ImageNet, our method is inferior to HGD only when defending against PGD. Figure 3 intuitively shows the performances of the different defense methods when protecting different classifiers. Overall, when protecting more complex models, DAED can still provide good protection. Our method protects models in the inference phase, does not require changing the target classifier, and can be easily combined with other defenses.

5. Conclusions

In this work, we introduce a new input reconstruction method called DAED to resist adversarial samples. The presented method leverages the high-level representation difference constraint to guide a denoiser to effectively remove adversarial perturbations and utilizes the relative reconstruction constraint to further advance the prediction performance of the protected model on the reconstructed images. We evaluate our method in three scenarios: resisting white-box attacks, resisting black-box attacks, and protecting different classifiers. The experimental results show that the presented method is very stable and, after being trained with the assistance of a specific attack, can be reused to provide effective protection for different classifiers against different attacks. Although the presented approach has good generalization, its transferability is limited. We find that our method introduces unexpected noise into the reconstructed images, so the reconstructed images can only be correctly recognized by the protected classifier. The advantage is that this unexpected noise can effectively destroy the adversarial perturbations in images; the cost is a reduction in the transferability of our method. In our future work, we will explore ways to balance the generalization and transferability of our method and further advance its defense performance.

Author Contributions

Conceptualization, S.G.; methodology, S.G. and X.W.; software, S.G. and X.W.; validation, S.G.; formal analysis, X.W.; investigation, Y.D.; data curation, Y.D.; writing—original draft preparation, S.G.; writing—review and editing, X.W.; supervision, S.Y.; funding acquisition, S.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the National Natural Science Foundation of China under grant numbers 62101480 and 62162067, in part by the Research and Application of Object Detection Based on Artificial Intelligence, and in part by the Yunnan Province Science Foundation under grant Nos. 202001BB050076 and 202005AC160007.

Data Availability Statement

Data available in a publicly accessible repository that does not issue DOIs. Publicly available datasets were analyzed in this study. This data can be found here: [http://www.cs.toronto.edu/~kriz/cifar.html; https://image-net.org/challenges/LSVRC/index.php].

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 7132–7141. [Google Scholar]
  2. Jing, L.; Chen, Y.; Tian, Y. Coarse-to-Fine semantic segmentation from image-level labels. IEEE Trans. Image Process. 2020, 29, 225–236. [Google Scholar] [CrossRef] [PubMed]
  3. Fan, C.; Yi, J.; Tao, J.; Tian, Z.; Liu, B.; Wen, Z. Gated recurrent fusion with joint training framework for robust end-to-end speech recognition. IEEE/ACM Trans. Audio Speech Lang. Process. 2021, 29, 198–209. [Google Scholar] [CrossRef]
  4. Lyu, C.; Huang, K.; Liang, H.-N. A unified gradient regularization family for adversarial examples. In Proceedings of the IEEE International Conference on Data Mining, Atlantic City, NJ, USA, 14–17 November 2015; pp. 301–309. [Google Scholar]
  5. Ross, A.; Doshi-Velez, F. Improving the adversarial robustness and interpretability of deep neural networks by regularizing their input gradients. In Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018; pp. 1660–1669. [Google Scholar]
  6. Papernot, N.; McDaniel, P.; Wu, X.; Jha, S.; Swami, A. Distillation as a defense to adversarial perturbations against deep neural networks. In Proceedings of the IEEE Symposium on Security and Privacy, San Jose, CA, USA, 22–26 May 2016; pp. 582–597. [Google Scholar]
  7. Papernot, N.; McDaniel, P. Extending defensive distillation. arXiv 2017, arXiv:1705.05264. [Google Scholar]
  8. Xie, C.; Wu, Y.; Maaten, L.; Yuille, A.; He, K. Feature denoising for improving adversarial robustness. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 501–509. [Google Scholar]
  9. Song, C.; He, K.; Lin, J.; Wang, L.; Hopcroft, J. Robust local features for improving the generalization of adversarial training. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  10. Huang, R.; Xu, B.; Schuurmans, D.; Szepesvari, C. Learning with a strong adversary. In Proceedings of the International Conference on Learning Representations, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  11. Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial machine learning at scale. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  12. Gu, S.; Rigazio, L. Towards deep neural network architectures robust to adversarial examples. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  13. Jia, X.; Wei, X.; Cao, X.; Foroosh, H. ComDefend: An efficient image compression model to defend adversarial examples. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 16–20 June 2019; pp. 6084–6092. [Google Scholar]
  14. Liao, F.; Liang, M.; Dong, Y.; Pang, T.; Hu, X.; Zhu, J. Defense against adversarial attacks using high-level representation guided denoiser. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1778–1787. [Google Scholar]
  15. Goodfellow, I.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. In Proceedings of the International Conference on Learning Representations, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
  16. Kurakin, A.; Goodfellow, I.; Bengio, S. Adversarial examples in the physical world. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  17. Madry, A.; Makelov, A.; Schmidt, L.; Tsipras, D.; Vladu, A. Towards deep learning models resistant to adversarial attacks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  18. Dong, Y.; Liao, F.; Pang, T.; Su, H.; Zhu, J.; Hu, X.; Li, J. Boosting adversarial attacks with momentum. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 9185–9193. [Google Scholar]
  19. Papernot, N.; McDaniel, P.; Jha, S.; Fredrikson, M.; Celik, Z.; Swami, A. The limitations of deep learning in adversarial settings. In Proceedings of the IEEE European Symposium on Security and Privacy, Saarbrücken, Germany, 21–24 March 2016; pp. 372–387. [Google Scholar]
  20. Su, J.; Vargas, D.V.; Sakurai, K. One Pixel Attack for Fooling Deep Neural Networks. IEEE Trans. Evol. Comput. 2019, 23, 828–841. [Google Scholar] [CrossRef]
  21. Moosavi-Dezfooli, S.-M.; Fawzi, A.; Frossard, P. DeepFool: A simple and accurate method to fool deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 2574–2582. [Google Scholar]
  22. Carlini, N.; Wagner, D. Towards evaluating the robustness of neural networks. In Proceedings of the IEEE Symposium on Security and Privacy, San Jose, CA, USA, 22–26 May 2017; pp. 39–57. [Google Scholar]
  23. Xiao, C.; Li, B.; Zhu, J.; He, W.; Liu, M.; Song, D. Generating adversarial examples with adversarial networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 3905–3911. [Google Scholar]
  24. Jandial, S.; Mangla, P.; Varshney, S.; Balasubramanian, V. AdvGAN++: Harnessing latent layers for adversary generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision Workshops, Seoul, Republic of Korea, 27–28 October 2019; pp. 2045–2048. [Google Scholar]
  25. Zhao, Z.; Dua, D.; Singh, S. Generating natural adversarial examples. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  26. Eykholt, K.; Evtimov, I.; Fernandes, E.; Li, B.; Rahmati, A.; Xiao, C.; Prakash, A.; Kohno, T.; Song, D. Robust physical-world attacks on deep learning visual classification. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1625–1634. [Google Scholar]
  27. Xu, K.; Zhang, G.; Liu, S.; Fan, Q.; Sun, M.; Chen, H.; Chen, P.; Wang, Y.; Lin, X. Adversarial T-shirt! Evading person detectors in a physical world. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 665–681. [Google Scholar]
  28. Sharif, M.; Bhagavatula, S.; Bauer, L.; Reiter, M. Accessorize to a crime: Real and stealthy attacks on state-of-the-art face recognition. In Proceedings of the ACM Conference on Computer and Communications Security, Vienna, Austria, 24–28 October 2016; pp. 1528–1540. [Google Scholar]
  29. Lecuyer, M.; Atlidakis, V.; Geambasu, R.; Hsu, D.; Jana, S. Certified robustness to adversarial examples with differential privacy. In Proceedings of the IEEE Symposium on Security and Privacy, San Francisco, CA, USA, 19–23 May 2019; pp. 656–672. [Google Scholar]
  30. Liu, X.; Cheng, M.; Zhang, H.; Hsieh, C. Towards robust neural networks via random self-ensemble. In Proceedings of the European Conference, Munich, Germany, 8–14 September 2018; pp. 381–397. [Google Scholar]
  31. Miyato, T.; Maeda, S.; Koyama, M.; Nakae, K.; Ishii, S. Distributional smoothing with virtual adversarial training. In Proceedings of the International Conference on Learning Representations, San Juan, PR, USA, 2–4 May 2016. [Google Scholar]
  32. Wong, E.; Kolter, J. Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the International Conference on Machine Learning, Stockholmsmässan, Stockholm, Sweden, 10–15 July 2018; pp. 5283–5292. [Google Scholar]
  33. Dvijotham, K.; Gowal, S.; Stanforth, R.; Arandjelovic, R.; O’Donoghue, B.; Uesato, J.; Kohli, P. Training verified learners with learned verifiers. arXiv 2018, arXiv:1805.10265. [Google Scholar]
  34. Gao, S.; Yu, S.; Wu, L.; Yao, S.; Zhou, X. Detecting adversarial examples by additional evidence from noise domain. IET Image Process. 2022, 16, 378–392. [Google Scholar] [CrossRef]
  35. Metzen, J.; Genewein, T.; Fischer, V.; Bischoff, B. On detecting adversarial perturbations. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  36. Gao, S.; Wang, R.; Wang, X.; Yu, S.; Dong, Y.; Yao, S.; Zhou, W. Detecting adversarial examples on deep neural networks with mutual information neural estimation. IEEE Trans. Depend. Secure Comput. 2023. [Google Scholar] [CrossRef]
  37. Xie, C.; Wang, J.; Zhang, Z.; Ren, Z.; Yuille, A. Mitigating adversarial effects through randomization. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  38. Das, N.; Shanbhogue, M.; Chen, S.; Hohman, F.; Chen, L.; Kounavis, M.; Chau, D. Keeping the bad guys out: Protecting and vaccinating deep learning with JPEG compression. arXiv 2017, arXiv:1705.02900. [Google Scholar]
  39. Gao, S.; Yao, S.; Li, R. Transferable adversarial defense by fusing reconstruction learning and denoising learning. In Proceedings of the IEEE Conference on Computer Communications Workshops, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–6. [Google Scholar]
Figure 1. Illustration of our approach. The logits pairing loss measures the distance between the high-level representations (logits in this work) of benign examples and their adversarial versions.
Figure 2. Histograms of the classification performances of the protected classifier (C-1) under the protection of different defenses when facing black-box attacks. (a) Adversarial samples are produced by different attacks with VGG-16. (b) Adversarial samples are produced by different attacks with MobileNet. (c) Adversarial samples are produced by different attacks with ResNet-50.
Figure 3. Defense effect of the proposed method when protecting different classifiers. For CIFAR-10, C-1 is adopted as the auxiliary model, and MobileNet and ResNet50 are used as the protected models. For ImageNet, VGG-16 is adopted as the auxiliary model, and MobileNet is used as the protected model. (a) MobileNet is adopted as the protected model on CIFAR-10. (b) ResNet-50 is adopted as the protected model on CIFAR-10. (c) MobileNet is adopted as the protected model on ImageNet.
Table 1. Dual autoencoder architecture for CIFAR-10.
CIFAR-10
Encoder (D_enc)             | Decoder (D_dec)
C(64, 3, 1), BN, LeakyReLU  | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(256, 3, 1), BN, ReLU
C(128, 3, 1), BN, LeakyReLU | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(128, 3, 1), BN, ReLU
C(256, 3, 1), BN, LeakyReLU | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(64, 3, 1), BN, ReLU
                            | C(3, 5, 1), Tanh
C(d, k, s) represents the convolutional layer with d as the dimension, k as the kernel size and s as the stride. BN is batch normalization.
Table 2. Dual autoencoder architecture for ImageNet.
ImageNet
Encoder (D_enc)             | Decoder (D_dec)
C(64, 3, 1), BN, LeakyReLU  | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(512, 3, 1), BN, ReLU
C(128, 3, 1), BN, LeakyReLU | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(512, 3, 1), BN, ReLU
C(256, 3, 1), BN, LeakyReLU | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(256, 3, 1), BN, ReLU
C(512, 3, 1), BN, LeakyReLU | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(128, 3, 1), BN, ReLU
C(512, 3, 1), BN, LeakyReLU | Up Sampling 2 × 2
Max Pooling 2 × 2           | C(64, 3, 1), BN, ReLU
                            | C(3, 5, 1), Tanh
C(d, k, s) represents the convolutional layer with d as the dimension, k as the kernel size and s as the stride. BN is batch normalization.
Table 3. Classifiers for CIFAR-10 and ImageNet.
CIFAR-10  | ImageNet
VGG-16    | VGG-16
MobileNet | MobileNet
ResNet-50 |
Table 4. The defense capability of input reconstruction approaches in resisting white-box attacks on CIFAR-10 and ImageNet. For CIFAR-10, C-1 is used as the protected classifier, and all adversarial samples are produced by C-1. For ImageNet, VGG-16 is adopted as the target classifier, and all adversarial samples are generated by VGG-16.
Dataset  | Assistant Attack | Method | Clean | FGSM  | BIM   | PGD   | MIM   | CW2
CIFAR-10 | NA               | NA     | 0.786 | 0.178 | 0.011 | 0.016 | 0.017 | 0.121
         | PGD              | DAE    | 0.636 | 0.461 | 0.489 | 0.513 | 0.432 | 0.598
         |                  | HGD    | 0.693 | 0.608 | 0.641 | 0.647 | 0.621 | 0.688
         |                  | TD     | 0.694 | 0.606 | 0.639 | 0.645 | 0.619 | 0.687
         |                  | Ours   | 0.699 | 0.617 | 0.653 | 0.659 | 0.628 | 0.693
         | BIM              | DAE    | 0.629 | 0.461 | 0.503 | 0.518 | 0.437 | 0.596
         |                  | HGD    | 0.676 | 0.589 | 0.627 | 0.628 | 0.603 | 0.670
         |                  | TD     | 0.678 | 0.597 | 0.633 | 0.636 | 0.606 | 0.671
         |                  | Ours   | 0.698 | 0.608 | 0.648 | 0.656 | 0.626 | 0.694
         | MIM              | DAE    | 0.632 | 0.459 | 0.501 | 0.533 | 0.441 | 0.594
         |                  | HGD    | 0.691 | 0.619 | 0.649 | 0.650 | 0.621 | 0.687
         |                  | TD     | 0.671 | 0.601 | 0.636 | 0.638 | 0.614 | 0.667
         |                  | Ours   | 0.705 | 0.641 | 0.669 | 0.674 | 0.652 | 0.701
ImageNet | NA               | NA     | 0.926 | 0.036 | 0     | 0     | 0     | 0.044
         | PGD              | DAE    | 0.442 | 0.366 | 0.402 | 0.396 | 0.376 | 0.432
         |                  | HGD    | 0.812 | 0.764 | 0.800 | 0.790 | 0.770 | 0.812
         |                  | TD     | 0.796 | 0.762 | 0.784 | 0.784 | 0.770 | 0.794
         |                  | Ours   | 0.810 | 0.770 | 0.790 | 0.790 | 0.776 | 0.816
         | BIM              | DAE    | 0.434 | 0.362 | 0.384 | 0.378 | 0.346 | 0.432
         |                  | HGD    | 0.796 | 0.774 | 0.784 | 0.774 | 0.772 | 0.790
         |                  | TD     | 0.778 | 0.748 | 0.764 | 0.768 | 0.752 | 0.774
         |                  | Ours   | 0.804 | 0.784 | 0.778 | 0.784 | 0.776 | 0.790
         | MIM              | DAE    | 0.450 | 0.378 | 0.402 | 0.410 | 0.370 | 0.446
         |                  | HGD    | 0.814 | 0.762 | 0.780 | 0.782 | 0.770 | 0.806
         |                  | TD     | 0.792 | 0.752 | 0.778 | 0.772 | 0.760 | 0.786
         |                  | Ours   | 0.822 | 0.762 | 0.806 | 0.808 | 0.790 | 0.824
Table 5. The defense capabilities of different input reconstruction approaches in defending against black-box attacks on CIFAR-10. The C-1 is used as the target classifier, and adversarial images are produced by VGG-16, MobileNet, and ResNet-50.
Model     | Method | Clean | FGSM  | BIM   | PGD   | MIM   | CW2
VGG-16    | NA     | 0.786 | 0.573 | 0.674 | 0.681 | 0.586 | 0.765
          | DAE    | 0.612 | 0.572 | 0.590 | 0.594 | 0.582 | 0.602
          | HGD    | 0.691 | 0.635 | 0.672 | 0.669 | 0.651 | 0.684
          | TD     | 0.691 | 0.641 | 0.671 | 0.678 | 0.657 | 0.687
          | Ours   | 0.705 | 0.651 | 0.685 | 0.688 | 0.665 | 0.700
MobileNet | NA     | 0.786 | 0.596 | 0.704 | 0.710 | 0.620 | 0.778
          | DAE    | 0.612 | 0.572 | 0.591 | 0.592 | 0.580 | 0.608
          | HGD    | 0.691 | 0.625 | 0.662 | 0.662 | 0.639 | 0.688
          | TD     | 0.691 | 0.642 | 0.672 | 0.671 | 0.651 | 0.688
          | Ours   | 0.705 | 0.648 | 0.683 | 0.683 | 0.661 | 0.702
ResNet-50 | NA     | 0.786 | 0.554 | 0.605 | 0.624 | 0.545 | 0.774
          | DAE    | 0.612 | 0.548 | 0.564 | 0.571 | 0.558 | 0.604
          | HGD    | 0.691 | 0.605 | 0.629 | 0.631 | 0.612 | 0.686
          | TD     | 0.692 | 0.618 | 0.638 | 0.647 | 0.621 | 0.685
          | Ours   | 0.709 | 0.625 | 0.649 | 0.658 | 0.633 | 0.702
Table 6. The defense capabilities of different methods in protecting different models. All training adversarial examples and testing adversarial samples are produced by different attacks with the protected model.
Dataset  | Model     | Method | Clean | FGSM  | BIM   | PGD   | MIM   | CW2
CIFAR-10 | MobileNet | NA     | 0.825 | 0.100 | 0.036 | 0.038 | 0.025 | 0.145
         |           | DAE    | 0.676 | 0.456 | 0.562 | 0.579 | 0.470 | 0.664
         |           | HGD    | 0.667 | 0.626 | 0.649 | 0.657 | 0.633 | 0.667
         |           | TD     | 0.673 | 0.633 | 0.655 | 0.650 | 0.640 | 0.671
         |           | Ours   | 0.688 | 0.648 | 0.673 | 0.652 | 0.647 | 0.686
         | ResNet-50 | NA     | 0.806 | 0.081 | 0.024 | 0.026 | 0.027 | 0.111
         |           | DAE    | 0.678 | 0.437 | 0.488 | 0.518 | 0.434 | 0.679
         |           | HGD    | 0.692 | 0.611 | 0.639 | 0.633 | 0.620 | 0.688
         |           | TD     | 0.704 | 0.632 | 0.649 | 0.655 | 0.635 | 0.680
         |           | Ours   | 0.697 | 0.649 | 0.651 | 0.655 | 0.640 | 0.687
ImageNet | MobileNet | NA     | 0.954 | 0.428 | 0.134 | 0.140 | 0.134 | 0.026
         |           | DAE    | 0.678 | 0.670 | 0.676 | 0.676 | 0.660 | 0.678
         |           | HGD    | 0.834 | 0.818 | 0.830 | 0.842 | 0.814 | 0.834
         |           | TD     | 0.776 | 0.758 | 0.772 | 0.772 | 0.760 | 0.772
         |           | Ours   | 0.854 | 0.846 | 0.848 | 0.840 | 0.851 | 0.858
