Article
Peer-Review Record

On the Effectiveness of Adversarial Training in Defending against Adversarial Example Attacks for Image Classification

Appl. Sci. 2020, 10(22), 8079; https://doi.org/10.3390/app10228079
by Sanglee Park and Jungmin So *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Reviewer 4: Anonymous
Reviewer 5: Anonymous
Submission received: 18 September 2020 / Revised: 3 November 2020 / Accepted: 11 November 2020 / Published: 14 November 2020

Round 1

Reviewer 1 Report

The article gives a clear illustration, based on well-documented experiments, of the results achievable through adversarial training for improving the robustness of neural networks against attacks aimed at degrading their performance. The main attack-generation techniques are used to define adversarial training models whose performance is compared and evaluated. The results show that deep learning methods retain a considerable margin of vulnerability to such attacks, despite the progress made through the development of adversarial training. The article is clearly written despite the considerable complexity of the technical aspects considered, and it can be useful to anyone who wants to apply deep learning to a classification task whose training data are exposed to attack through misleading samples.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

It is a clear and nicely written paper. The adversary strategies are well explained. The conclusions could be expanded with elaborated future work and relevant discussion. The paper will suit readers of different levels and experience.

Some minor remarks for textual corrections:


58-60: Training a model with adversarial images created from multiple distance metrics "could actually improve the robustness, but will improve the cost of training"

188: noise is added "whether" at the background or near the digit.

200: a typical technique is "to results" from an ensemble of multiple networks

341: the results in Table 2 and Table 3, we "could" see that adding adversarial samples in the training data

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper conducts an experiment on the MNIST data set by training a neural network with adversarial examples generated from three adversarial attack algorithms (and combinations of those). The authors then checked the resulting network for resistance to those adversarial attacks.

Technically, the experiments are sound. However, there is little novelty in this paper, since the principal messages are already widely known in the community. While adversarial training was initially proposed as a defense against adversarial examples, it was quickly discovered that such a defense only works against the specific known attacks. It cannot cover unknown adversarial attacks, and it is easily broken when the attacker knows that adversarial training has been applied against known attacks. The experimental results simply confirm the futility of trying to use adversarial training to resist all adversarial attacks.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 4 Report

Dear Authors

Your research is well organized, the idea is clear (a good strategy for the training method), and the results show the best option.

Please explain some real-world cases where your proposal could be used.

Regards.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 5 Report

In this paper the robustness of deep neural networks (DNNs) against adversarial attacks is studied. The authors perform experiments using the LeNet5 and ResNet networks, trained on the MNIST dataset. The experiments included training with FGSM, JSMA, and C&W adversarial examples and with combinations of the three adversarial techniques. The authors found that training with adversarial examples provides robustness against an attack using the same algorithm and some protection against other algorithms. In particular, they found that training with FGSM or JSMA provides robustness against C&W and vice versa; however, training with FGSM does not provide protection against JSMA, or vice versa. Additionally, the authors analyzed whether data augmentation has any effect on the robustness of the network against attack. They show that training with data augmentation causes some loss of robustness, so there is a trade-off to be made when deciding whether or not to use data augmentation when training a DNN. The paper is well written, the experiments are well analyzed, and the methodology is clearly stated and easy to understand. However, there are two comments which should be addressed to improve the paper.

  1. Section 2.1, paragraph 2 (Lines 70-75): this paragraph needs a reference to support the validity of the statement.
  2. There is a prior publication on this topic from 2019 (Tramèr and Boneh) which should be cited and compared against. The prior work is cited below.

Tramèr, Florian, and Dan Boneh. "Adversarial training and robustness for multiple perturbations." Advances in Neural Information Processing Systems. 2019.
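
For readers who want to see the setup in concrete terms, the following is a minimal PyTorch sketch of FGSM-based adversarial training on MNIST, along the lines summarized in the report above. It is an illustrative reconstruction, not the authors' code: the simple fully connected model, the epsilon value, and the mixing of clean and adversarial batches in each step are assumptions.

    # Minimal sketch of FGSM adversarial training on MNIST (illustrative only;
    # the model, epsilon, and clean/adversarial mixing are assumptions, not the
    # authors' exact setup).
    import torch
    import torch.nn as nn
    import torch.nn.functional as F
    from torchvision import datasets, transforms

    def fgsm_examples(model, x, y, eps=0.3):
        # FGSM: x_adv = x + eps * sign(grad_x loss), clipped to the valid pixel range.
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

    def train_adversarial(model, loader, epochs=1, eps=0.3, device="cpu"):
        opt = torch.optim.Adam(model.parameters(), lr=1e-3)
        model.to(device).train()
        for _ in range(epochs):
            for x, y in loader:
                x, y = x.to(device), y.to(device)
                x_adv = fgsm_examples(model, x, y, eps)  # attack the current model
                opt.zero_grad()
                # Train on both clean and adversarial samples (one common choice).
                loss = F.cross_entropy(model(x), y) + F.cross_entropy(model(x_adv), y)
                loss.backward()
                opt.step()
        return model

    if __name__ == "__main__":
        loader = torch.utils.data.DataLoader(
            datasets.MNIST(".", train=True, download=True,
                           transform=transforms.ToTensor()),
            batch_size=128, shuffle=True)
        # Placeholder classifier; the paper trains LeNet5 and ResNet instead.
        model = nn.Sequential(nn.Flatten(), nn.Linear(784, 256), nn.ReLU(),
                              nn.Linear(256, 10))
        train_adversarial(model, loader)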

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 3 Report

The authors recast their work as studying the effects of adversarial training on black-box adversarial attacks. However, this mere rewording does NOT address my main concern that the results in this paper are not novel, since the results still do not give researchers in this community important information that we do not already know. I am including my original comments and will explain afterwards why the revision does not change my opinion.

================================

This paper conducts an experiment on the MNIST data set by training a neural network with adversarial examples generated from three adversarial attack algorithms (and combinations of those). The authors then checked the resulting network for resistance to those adversarial attacks.

Technically, the experiments are sound. However, there is little novelty in this paper, since the principal messages are already widely known in the community. While adversarial training was initially proposed as a defense against adversarial examples, it was quickly discovered that such a defense only works against the specific known attacks. It cannot cover unknown adversarial attacks, and it is easily broken when the attacker knows that adversarial training has been applied against known attacks. The experimental results simply confirm the futility of trying to use adversarial training to resist all adversarial attacks.

==================

The authors acknowledged in the response that, while adversarial training protects against the specific adversarial attacks it was designed to protect against, the value of its defense against other adversarial attacks has never been proven. However, they then argue that previous literature has only shown how to overcome adversarially trained networks with white-box adversarial attacks, and that they are conducting experiments to study the effects of black-box adversarial attacks. This rephrasing of the purpose does not address the fundamental issue that the results in this paper do not contribute important conclusive information.

The white-box attack examples in the literature show an achievable limit on the capability of an adversary with access to the network parameters, and this limit is very insecure for networks protected using adversarial training. Simple experiments on how your adversarially trained network performs against KNOWN adversarial attacks (which are not specifically designed to overcome your protection) cannot show that this limit is unachievable by a black-box attacker. They only show that the limit was not achieved by the three adversarial attacks you simulated.

If you want to claim that the upper limits of protection against white-box attacks and against black-box attacks are different, you need some theoretical proof of how knowledge of the network parameters changes the upper limit. From an information-theoretic point of view, the capability of a black-box attacker is the same as that of a white-box attacker, since any black-box attacker can judge whether an example is adversarial exactly as a white-box attacker can. So there exists a black-box attack (with infinite computing resources) that can achieve the same negative results reported in the previous literature on white-box attacks. The difference lies only in the practical convenience of searching for that adversarial example using knowledge of the network parameters, which previous white-box attackers used to demonstrate, with real implemented attacks, that this limit is low. There is no theory showing that computational restrictions reduce the achievable capability of a black-box attacker compared to a white-box attacker.

Basically, your simulation shows that adversarial training against the three known adversarial attacks sometimes provides some protection across attacks, but that protection is very weak. I was saying that this negative result is not surprising, since the white-box attacks in the literature showed even stronger negative results. Now, if you are claiming that this less negative result is your contribution on the differences between white-box and black-box attacker capabilities, then it is not a reliable conclusion. You have not shown that adversarial training can protect against other potential black-box attacks. The limited simulation of only three existing attacks (which were not even designed to overcome adversarial training) does not reliably show what is achievable by a cleverer black-box attacker. A principle of cybersecurity is that your protection scheme cannot simply rely on the ignorance of your adversary.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 3

Reviewer 3 Report

The authors have clarified what their experimental results imply. These are good improvements to the manuscript. My criticism from the beginning, however, was that such implications should have been expected, and that the contribution of experimentally confirming these conclusions is minimal.

Let me summarize quickly what their experiments have shown:

For three known (black-box) attacks, they conducted adversarial training with samples from each attack and from combinations of these attacks. The resulting network shows robustness against each of the three specific attacks when the training is aimed at that attack. Against the other attacks it was not specifically trained against, the resulting network may have some robustness but may also be quite vulnerable. This generally confirms that adversarial training works against a known specific attack but not necessarily against other unknown attacks.

We know from the literature that, for each network adversarially trained against a specific attack, there exists a specially designed new white-box attack that renders it vulnerable again. Thus we already knew the resulting network cannot be robust against other unknown attacks. The authors' contribution basically confirms this for black-box attacks: networks adversarially trained against a specific attack may not be robust against another black-box attack. However, I do not see why this result needs confirmation, since it is implied by the fact that there are other unknown attacks which break down networks trained for a specific attack. Since such attacks exist to break down an adversarially trained network, there must also exist black-box attacks that can do the same. The only question is whether we can find such black-box attacks.

The experimental results do not really advance our understanding of this problem much. From the experiments, we do learn that networks adversarially trained against a specific attack may be robust against some other black-box attacks and may not be robust against others. In principle, this is easy to understand: if two attacks are similar enough, a defense against one works against the other; if they are not similar enough, they require different defenses. But the experiments provide no real clue as to how similar, and in what respects, two attacks must be for the defense to transfer across them. We cannot see from the experiments which aspects of the three known attacks are similar enough for adversarial training to work across attacks, and which differences make them resist adversarial training across attacks.
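
To make the cross-attack comparison discussed above concrete, the sketch below shows one hypothetical way to tabulate the robustness of each adversarially trained model against each attack. Only FGSM is implemented inline; the JSMA and C&W entries would have to come from an attack library (for example Foolbox or ART), and all names here are illustrative rather than taken from the paper.

    # Hypothetical cross-evaluation sketch: measure each defended model's accuracy
    # on adversarial examples from each attack. Only FGSM is implemented; JSMA and
    # C&W would come from an attack library and are left as placeholders.
    import torch
    import torch.nn.functional as F

    def fgsm_attack(model, x, y, eps=0.3):
        # FGSM adversarial examples, clipped to the valid pixel range.
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

    def robust_accuracy(model, attack, loader, device="cpu"):
        model.to(device).eval()
        correct, total = 0, 0
        for x, y in loader:
            x, y = x.to(device), y.to(device)
            x_adv = attack(model, x, y)  # fixed attack recipe, not adapted to the defense
            with torch.no_grad():
                correct += (model(x_adv).argmax(dim=1) == y).sum().item()
            total += y.numel()
        return correct / total

    def cross_evaluate(models, attacks, loader):
        # models: {"trained_vs_fgsm": net, ...}; attacks: {"fgsm": fgsm_attack, ...}
        return {(m, a): robust_accuracy(net, atk, loader)
                for m, net in models.items()
                for a, atk in attacks.items()}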
