In this section, we describe the evasion attack, which is the adversarial attack targeted in this paper, and image reconstruction, which we use as a defense against evasion attacks.
2.2. Deep Image Prior (DIP)
Deep Image Prior (DIP) [19] is an image reconstruction method that starts from random noise and reconstructs an image iteratively, gradually bringing it closer to the target image. Moreover, DIP exploits the architecture of a Convolutional Neural Network (ConvNet) itself to capture low-level image features, without any prior training on a dataset.
Figure 1 compares the results of several image reconstruction methods on a noise-added image. Even by visual inspection, DIP reconstructs the details (high-frequency features) of the image better than the other methods.
Figure 2 illustrates the iterative procedure of DIP. It starts from a randomly initialized ConvNet that generates an image from randomly generated noise z. DIP then iteratively updates the ConvNet weights so that the generated image becomes as close as possible to the target image.
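The procedure can be written as an optimization problem. The symbols below are our own labels, following the notation commonly used for DIP, since this section's symbols did not survive: f_θ is the ConvNet with weights θ, z is the fixed noise input, x_0 is the target image, and η is a step size. The update shows a plain gradient-descent step for illustration; DIP implementations typically use an adaptive optimizer such as Adam instead.

```latex
\theta^{*} = \arg\min_{\theta} \,\lVert f_{\theta}(z) - x_0 \rVert_2^2,
\qquad \hat{x} = f_{\theta^{*}}(z),
\qquad
\theta_{k+1} = \theta_k - \eta \,\nabla_{\theta} \lVert f_{\theta_k}(z) - x_0 \rVert_2^2 .
```

In practice the iteration is stopped after a chosen number of steps rather than run to the minimizer, so the reconstruction is f_{θ_k}(z) for some finite k.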
Figure 3 shows example images reconstructed by DIP after different numbers of iterations. Comparing the image reconstructed after 700 iterations (Figure 3e) with the target image (Figure 3h) shows that the image under reconstruction contains far fewer high-frequency features (noise). This means that DIP can also be used to remove high-frequency features (noise). Thus, if one can determine the number of iterations at which only the robust features have been reconstructed, DIP can be used to filter out the perturbation injected into adversarial examples.
Figure 4 shows how DIP can be used to defend against adversarial attacks. The image to be reconstructed in Figure 4 is an adversarial example generated by an adversarial attack. The iterations of DIP can be divided into three stages. In the first stage, the image is reconstructed starting from random noise and is still unrelated to both the adversarial example and the original image. In the second stage, the reconstructed image resembles the original image, because only the robust features of the adversarial example have been reconstructed. In the last stage, the generated image contains abundant non-robust features, in which the adversarial perturbation is hidden. Therefore, to successfully filter out the perturbation from an adversarial example, it is important to identify the second stage, during which only robust features are reconstructed.
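To make the three stages concrete, the sketch below uses a deliberately simplified 1-D toy model, not DIP itself. As a stand-in for the bias that makes DIP reconstruct low-frequency content first, we assume that each Fourier component of the output approaches the adversarial example at a rate that shrinks with frequency (0.5/(1+f)^2, an arbitrary choice). The signals, rates, and function names are hypothetical. Against the original, PSNR rises, peaks, and then falls as the high-frequency perturbation is fitted; against the adversarial example, it only rises.

```python
import cmath
import math
import random

N = 32  # length of the toy 1-D "image"


def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(X):
    """Inverse DFT; keeps the real part (our spectra are conjugate-symmetric)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def psnr(a, b, max_val=1.0):
    """PSNR in dB for pixel values in [0, max_val]."""
    mse = sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)
    return math.inf if mse == 0 else 10 * math.log10(max_val ** 2 / mse)


# Hypothetical data: a smooth "original" plus a high-frequency "perturbation".
original = [0.5 + 0.3 * math.cos(2 * math.pi * t / N) for t in range(N)]
adversarial = [x + 0.05 * (-1) ** t for t, x in enumerate(original)]

rng = random.Random(0)
start = [rng.uniform(-0.01, 0.01) for _ in range(N)]  # random initial output

target_spec, start_spec = dft(adversarial), dft(start)
# Assumed stand-in for DIP's bias: frequency f is fitted at rate 0.5/(1+f)^2,
# so low frequencies converge far sooner than high ones.
rates = [0.5 / (1 + min(k, N - k)) ** 2 for k in range(N)]


def reconstruct(step):
    """Closed-form output after `step` gradient steps on ||r - adversarial||^2,
    using a separate step size per frequency (the rates above)."""
    spec = [a + (s - a) * (1 - eta) ** step
            for a, s, eta in zip(target_spec, start_spec, rates)]
    return idft(spec)


steps = list(range(0, 2001, 5))
recons = [reconstruct(s) for s in steps]
psnr_orig = [psnr(r, original) for r in recons]    # needs ground truth: analysis only
psnr_adv = [psnr(r, adversarial) for r in recons]  # what a defender can measure
best = max(range(len(steps)), key=psnr_orig.__getitem__)
print(f"PSNR vs. original peaks at step {steps[best]} ({psnr_orig[best]:.1f} dB) "
      f"and falls to {psnr_orig[-1]:.1f} dB by step {steps[-1]}")
```

Only `psnr_adv` is observable in a real defense, which is why a stopping rule has to work from its trend rather than from `psnr_orig`.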
DIPDefend [17] uses Second-order Exponential Smoothing (SES)-PSNR, a method that tracks the trend of the PSNR of reconstructed images, to determine the second stage. The method exploits the fact that PSNR increases rapidly while the robust features are being reconstructed, because most of an image's energy is concentrated at low frequencies. Therefore, the point at which PSNR stops rising rapidly and suddenly levels off most likely lies within the second stage, and this is the point at which DIP should stop iterating.
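PSNR itself follows the standard definition, 10·log10(MAX²/MSE). A minimal stdlib-only sketch, assuming images are flattened into equal-length sequences of pixel values in [0, max_val]; the function name and signature are ours. In the setting described here, `reference` would be the adversarial example, since the original image is not available during defense.

```python
import math


def psnr(image, reference, max_val=1.0):
    """Peak signal-to-noise ratio (dB) between two equal-length pixel sequences.

    Pixel values are assumed to lie in [0, max_val]; identical inputs give +inf.
    """
    if len(image) != len(reference):
        raise ValueError("images must have the same number of pixels")
    mse = sum((a - b) ** 2 for a, b in zip(image, reference)) / len(reference)
    if mse == 0:
        return math.inf
    return 10 * math.log10(max_val ** 2 / mse)


# A uniform error of 0.1 gives MSE = 0.01, i.e. 20 dB.
print(round(psnr([0.5, 0.5, 0.5, 0.5], [0.6, 0.6, 0.6, 0.6]), 6))  # prints 20.0
```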
Figure 5 shows how SES-PSNR and PSNR change as the number of DIP iterations increases. In Figure 5, PSNR increases while the robust features are reconstructed in the early stages of DIP iteration and then tends to converge. Based on this, DIPDefend stops reconstructing the image when it finds the peak of SES-PSNR, which lies within the second stage. SES-PSNR can be obtained using Equation (1). Since no ground-truth image (i.e., the original, unattacked image) is available, the PSNR used in Equation (1) is calculated against the adversarial example. The trend term used in Equation (1) represents the change in tendency from the previous cycle and can be obtained using Equation (2).
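For concreteness, the textbook second-order (Holt's linear) exponential smoothing recursion is shown below. We assume, without confirmation from this section, that Equations (1) and (2) take this form; S_t (SES-PSNR at iteration t), P_t (PSNR against the adversarial example), b_t (trend), and α, β (fitting coefficients) are our own labels.

```latex
S_t = \alpha P_t + (1 - \alpha)\,(S_{t-1} + b_{t-1}),
\qquad
b_t = \beta\,(S_t - S_{t-1}) + (1 - \beta)\, b_{t-1},
\qquad 0 < \alpha, \beta < 1 .
```

Because b_{t-1} still carries the earlier slope when PSNR suddenly levels off, S_t briefly overshoots and then turns down, which produces a detectable peak.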
Equations (1) and (2) contain two fitting coefficients. Note that the most appropriate values for these coefficients may differ from one adversarial example to another. In DIPDefend, these values are determined empirically, and the same values are used for all images.
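A minimal sketch of the stopping rule, assuming that Equations (1) and (2) are the standard Holt (second-order) exponential smoothing recursion, that smoothing is initialised with the first PSNR value and a zero trend, and that "finding the peak" means stopping at the first iteration where SES-PSNR decreases. The coefficient defaults (alpha = beta = 0.5), the function names, and the synthetic PSNR curve are hypothetical; DIPDefend's actual criterion and coefficient values may differ.

```python
def holt_update(level, trend, value, alpha, beta):
    """One step of second-order (Holt's linear) exponential smoothing."""
    new_level = alpha * value + (1 - alpha) * (level + trend)
    new_trend = beta * (new_level - level) + (1 - beta) * trend
    return new_level, new_trend


def find_stop_iteration(psnr_stream, alpha=0.5, beta=0.5):
    """Return the iteration at which smoothed PSNR peaks, or None if it never does.

    PSNR values (measured against the adversarial example) are consumed one at
    a time, as DIP would produce them, so only past values are ever used.
    """
    level = trend = None
    for t, value in enumerate(psnr_stream):
        if level is None:
            level, trend = value, 0.0  # initialise with the first observation
            continue
        new_level, trend = holt_update(level, trend, value, alpha, beta)
        if new_level < level:
            return t - 1  # smoothed PSNR just turned down: previous step was the peak
        level = new_level
    return None


# Hypothetical PSNR curve: rises linearly, then flattens from iteration 40 on.
curve = [min(10 + 0.5 * t, 30.0) for t in range(200)]
print(find_stop_iteration(curve))
```

On this synthetic curve the trend term still carries the earlier slope when the rise stops, so the smoothed value overshoots briefly and the detected peak falls a couple of iterations after the knee at 40.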