3.1. Problem Definition
The objective of a targeted adversarial attack is to generate, from a given source image, an adversarial example that the classifier misclassifies into a specified target class, while the perturbation remains small enough to preserve visual similarity. Let the source image be $x_s \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ denote the image height and width, and 3 corresponds to the number of RGB channels. The target class is denoted by $y_t \in \{1, \dots, K\}$, where $K$ denotes the total number of classes. The classifier is defined as $f_\theta : \mathbb{R}^{H \times W \times 3} \rightarrow [0, 1]^K$, parameterized by $\theta$, and outputs a probability distribution over the $K$ classes. The objective is to generate an adversarial example $x_{adv}$ by solving the following optimization problem:

$$\min_{x_{adv}} \ \mathcal{L}\big(f_\theta(x_{adv}), y_t\big) \quad \text{s.t.} \quad \|x_{adv} - x_s\|_p \leq \epsilon,$$
where $\mathcal{L}(\cdot, \cdot)$ denotes the loss function, typically cross-entropy, which measures the discrepancy between the classifier output and the target label $y_t$; $\|\cdot\|_p$ denotes the $\ell_p$ norm used to constrain the perturbation, with common choices such as $\ell_\infty$ for bounding the maximum per-pixel change and $\ell_2$ for bounding the overall perturbation norm; and $\epsilon$ is a small positive constant representing the perturbation budget. Under the $\ell_\infty$ norm, a small $\epsilon$ is commonly adopted to ensure visual imperceptibility.
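To make the constraint concrete, the following minimal PyTorch sketch enforces an $\ell_\infty$ budget by projection (clamping); the function name and the clamp to a $[0, 1]$ pixel range are illustrative assumptions, not details taken from MAAG.

```python
import torch

def project_linf(x_adv: torch.Tensor, x_src: torch.Tensor, eps: float) -> torch.Tensor:
    """Project x_adv onto the l-inf ball of radius eps around x_src,
    then clamp to the valid pixel range [0, 1] (assumed normalization)."""
    delta = torch.clamp(x_adv - x_src, min=-eps, max=eps)  # ||delta||_inf <= eps
    return torch.clamp(x_src + delta, min=0.0, max=1.0)
```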
Unlike existing methods that generate adversarial examples only for known target classes, MAAG also performs well on unknown classes, enhancing cross-model transferability while preserving visual imperceptibility. This definition guides MAAG by emphasizing the joint optimization of attack effectiveness, generalization, and visual consistency.
3.2. Model Architecture
The proposed generative model architecture is presented in Figure 1. The adaptive attention generator (AAG) is the core component, which accepts paired inputs to generate high-quality adversarial examples. It mainly consists of a channel attention gate (CAG) [39], a spatial attention module (SA) [39], a dynamic infection mechanism, an adversarial decoder, and a lightweight feature extractor. The design of each module is described in detail below. Specifically, the lightweight feature extractor provides semantic representations of source and target images, which are adaptively modulated by CAG and SA to emphasize informative channels and regions. The dynamic infection mechanism then fuses source and target features with an infection factor $\alpha$ predicted from their mean–variance statistics, ensuring adaptive and input-specific blending. Finally, the adversarial decoder reconstructs adversarial examples by progressively upsampling the fused features, with CAG and SA inserted after each deconvolutional block to refine feature quality.
Channel Attention Gate. The channel attention gate enhances feature representations by adaptively reweighting channels, as illustrated in Figure 2. It applies global average pooling to extract channel statistics, which are then processed through two lightweight convolutional blocks: the channel compression block (CCB) with a Conv-ReLU structure and the channel expansion block (CEB) with a Conv-Sigmoid structure. This yields a channel attention vector that refines the input feature map via channel-wise multiplication, emphasizing informative channels and suppressing less relevant ones.
Given an input feature map $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels and $H$, $W$ denote the spatial dimensions, the global average pooling computes channel descriptors as follows:

$$z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} F_c(i, j),$$

where $F_c$ denotes the feature map of the $c$-th channel, $z_c$ is the average activation of that channel, and $i$, $j$ are spatial indices. The resulting vector $z = [z_1, \dots, z_C] \in \mathbb{R}^C$ serves as the input for generating the channel attention weights.
CAG processes this vector using a lightweight convolutional network to produce the final attention scores:

$$a = \sigma\big(W_2 \, \delta(W_1 z)\big),$$

where $W_1 \in \mathbb{R}^{(C/r) \times C}$ reduces the channel dimension using a reduction ratio $r$, and $W_2 \in \mathbb{R}^{C \times (C/r)}$ restores the original dimension. The activation function $\delta(\cdot)$ introduces non-linearity, and $\sigma(\cdot)$ is the Sigmoid function used to normalize the attention weights to $(0, 1)$. The resulting vector $a \in \mathbb{R}^C$ encodes the attention weights across channels.
The attention-weighted feature map is computed via element-wise multiplication:

$$F' = a \odot F,$$

where $a$ is broadcast over the spatial dimensions $H \times W$, and $F'$ denotes the modulated output. This operation emphasizes informative channels and suppresses less relevant ones, guiding the generator to focus on class-relevant features for more effective adversarial perturbations.
In MAAG, CAG enhances the quality of feature representations, providing a strong input for the subsequent spatial attention module and feature fusion process.
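For concreteness, a minimal PyTorch sketch of CAG follows, assuming $1 \times 1$ convolutions inside the CCB and CEB; the class and argument names are ours, and the reduction ratio of 16 is an illustrative default rather than a value from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttentionGate(nn.Module):
    """CAG sketch: global average pooling, a Conv-ReLU compression block
    (CCB) and a Conv-Sigmoid expansion block (CEB), then channel reweighting."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ccb = nn.Sequential(  # channel compression block
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.ceb = nn.Sequential(  # channel expansion block
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        z = f.mean(dim=(2, 3), keepdim=True)  # global average pooling: (B, C, 1, 1)
        a = self.ceb(self.ccb(z))             # channel attention weights in (0, 1)
        return f * a                          # broadcast over spatial dimensions
```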
Spatial Attention Module. The spatial attention module refines the feature map by emphasizing informative spatial regions, as illustrated in Figure 3. SA processes the attention-weighted feature map $F'$ from the CAG and extracts spatial context using both max pooling and average pooling along the channel dimension:

$$F_{max} = \mathrm{MaxPool}(F'), \quad F_{avg} = \mathrm{AvgPool}(F'),$$

where $\mathrm{MaxPool}(\cdot)$ selects the maximum activation across channels at each spatial location, and $\mathrm{AvgPool}(\cdot)$ computes the average activation. The two results are concatenated along the channel axis to form a $2 \times H \times W$ feature map, which is passed through a convolutional layer followed by a sigmoid function $\sigma(\cdot)$ to produce the spatial attention map $M \in \mathbb{R}^{1 \times H \times W}$. The attention-weighted feature map is obtained by element-wise multiplication:

$$F'' = M \odot F',$$

where $M$ is broadcast across all channels, and $F''$ denotes the final output feature map. SA enhances spatial focus by identifying informative regions, such as the locations of target objects, and guides the generator to produce spatially focused adversarial perturbations.
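A corresponding sketch of SA, in the same hedged spirit: the $7 \times 7$ convolution kernel is a common choice for this design but is an assumption here.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """SA sketch: channel-wise max and average pooling, concatenation into a
    2-channel map, one convolution, and a sigmoid gate over spatial locations."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f_max, _ = f.max(dim=1, keepdim=True)  # (B, 1, H, W) max over channels
        f_avg = f.mean(dim=1, keepdim=True)    # (B, 1, H, W) mean over channels
        m = self.sigmoid(self.conv(torch.cat([f_max, f_avg], dim=1)))  # (B, 1, H, W)
        return f * m                           # broadcast across channels
```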
In MAAG, CAG and SA play complementary roles: CAG emphasizes global channel-level information, while SA focuses on local spatial features. Within the dynamic infection module, CAG is applied twice: first before feeding the source and target features into the mean–variance generator (MVG) to enhance the representations extracted from the feature extractor, and then after their concatenation in the infection predictor, redistributing channel weights to emphasize important dimensions and suppress less relevant ones. In the adversarial decoder, each transposed convolutional block is sequentially followed by CAG and SA, which progressively refine the decoded features and help reconstruct high-quality adversarial examples.
Dynamic Infection Mechanism. The dynamic infection mechanism is the core component of AAG, as illustrated in Figure 4. It fuses the source feature $F_s$ and the target feature $F_t$ to produce the fused representation $F_{fused}$ in the feature space. The fusion process is guided by an infection factor $\alpha$, defined as follows:

$$F_{fused} = \alpha \cdot F_t + (1 - \alpha) \cdot F_s,$$

where $F_{fused}$ denotes the fused feature map used to generate the adversarial example. The infection factor $\alpha \in (0, 1)$ controls the contribution of the source and target features, where $\cdot$ denotes element-wise multiplication. Both $\alpha$ and $1 - \alpha$ are broadcast to match the spatial and channel dimensions of the feature maps.
The infection factor $\alpha$ is dynamically predicted by the infection predictor based on the input features, providing an adaptive scalar fusion mechanism guided by semantic relevance:

$$\alpha = \sigma\Big(\mathrm{FC}\big([\mu_s, \sigma_s^2 \,;\, \mu_t, \sigma_t^2]\big)\Big),$$

where the MVG produces $(\mu_s, \sigma_s^2)$ and $(\mu_t, \sigma_t^2)$ as mean–variance features, computed for each channel by taking the mean and variance across the spatial dimensions of $F_s$ and $F_t$, respectively. The concatenation operation $[\cdot \,;\, \cdot]$ merges the two $2C$-dimensional statistics vectors into a single $4C$-dimensional vector. The fully connected layer $\mathrm{FC}(\cdot)$ maps the concatenated feature to a scalar, and the sigmoid function normalizes the output to the range $(0, 1)$ to produce the infection factor $\alpha$.
The dynamic infection mechanism selects an appropriate fusion ratio based on the input pair. It uses guidance from CAG and SA to extract informative features for attack generation. This strategy improves generalization to unknown classes by adapting to diverse feature distributions.
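The mechanism can be sketched as follows; for brevity this version omits the CAG stages applied before the MVG and inside the infection predictor, and all class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class DynamicInfection(nn.Module):
    """Dynamic infection sketch: per-channel mean-variance statistics (MVG)
    are concatenated and mapped to a scalar alpha in (0, 1), which blends
    the source and target feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(4 * channels, 1)  # 4C-dim statistics -> scalar

    @staticmethod
    def mvg(f: torch.Tensor) -> torch.Tensor:
        mu = f.mean(dim=(2, 3))                  # (B, C) per-channel mean
        var = f.var(dim=(2, 3), unbiased=False)  # (B, C) per-channel variance
        return torch.cat([mu, var], dim=1)       # (B, 2C)

    def forward(self, f_src: torch.Tensor, f_tgt: torch.Tensor) -> torch.Tensor:
        stats = torch.cat([self.mvg(f_src), self.mvg(f_tgt)], dim=1)  # (B, 4C)
        alpha = torch.sigmoid(self.fc(stats)).view(-1, 1, 1, 1)       # per-sample scalar
        return alpha * f_tgt + (1.0 - alpha) * f_src                  # fused features
```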
Adversarial Decoder. Given a fused feature produced by the dynamic infection mechanism, the goal of the adversarial decoder is to reconstruct an adversarial example. It consists of three decoding blocks, each implemented as a lightweight transposed convolutional sub-network integrated with CAG and SA modules to enhance feature refinement during the decoding process.
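A decoder sketch reusing the ChannelAttentionGate and SpatialAttention classes above; the channel widths, ReLU activation, and final Tanh projection to RGB are our assumptions, not specified details of MAAG.

```python
import torch.nn as nn

def decoder_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One decoding block sketch: transposed convolution for 2x upsampling,
    followed by the CAG and SA modules defined in the earlier sketches."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
        ChannelAttentionGate(out_ch),
        SpatialAttention(),
    )

class AdversarialDecoder(nn.Module):
    """Three decoding blocks followed by a projection to a 3-channel image."""
    def __init__(self, in_ch: int = 2048):
        super().__init__()
        self.blocks = nn.Sequential(
            decoder_block(in_ch, in_ch // 4),
            decoder_block(in_ch // 4, in_ch // 16),
            decoder_block(in_ch // 16, in_ch // 64),
        )
        self.to_rgb = nn.Sequential(
            nn.Conv2d(in_ch // 64, 3, kernel_size=3, padding=1),
            nn.Tanh(),  # assumed output range normalization
        )

    def forward(self, f_fused):
        return self.to_rgb(self.blocks(f_fused))
```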
Lightweight Feature Extractor. The lightweight feature extractor (see Figure 5) is a key component of MAAG. It provides representative features throughout both the training and testing phases. This module is based on a pretrained ResNet-50, with the final fully connected and AvgPool layers removed to retain high-quality feature representations. To enhance representation capability, a channel-spatial attention module (CSAM) is inserted after each ResNet block. CSAM processes the input feature map $F$ and outputs:

$$F_{out} = \mathrm{SA}\big(\mathrm{CAG}(F)\big),$$

where $\mathrm{CAG}(\cdot)$ and $\mathrm{SA}(\cdot)$ are the channel attention and spatial attention operations in CAG and SA, respectively. The enhanced feature $F_{out}$ is obtained by sequentially applying channel and spatial attention. By combining both mechanisms, CSAM improves feature robustness and informativeness.
In MAAG, the lightweight feature extractor achieves efficiency by fine-tuning only the CSAM parameters, while keeping the ResNet-50 backbone frozen. This design ensures high-quality feature representations for AAG and enables efficient adversarial example generation.
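A sketch of the extractor, reusing the attention sketches above. The torchvision weights tag and the per-stage channel counts (256/512/1024/2048 for ResNet-50) reflect the standard architecture; inserting one CSAM per residual stage follows the description, while the exact insertion points in the paper may differ.

```python
import torch.nn as nn
from torchvision.models import resnet50

class CSAM(nn.Module):
    """Channel-spatial attention module sketch: CAG followed by SA."""
    def __init__(self, channels: int):
        super().__init__()
        self.cag = ChannelAttentionGate(channels)  # defined in the earlier sketch
        self.sa = SpatialAttention()               # defined in the earlier sketch

    def forward(self, f):
        return self.sa(self.cag(f))

class LightweightFeatureExtractor(nn.Module):
    """Frozen pretrained ResNet-50 trunk (FC and AvgPool removed) with a
    trainable CSAM inserted after each residual stage."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])
        self.csams = nn.ModuleList([CSAM(c) for c in (256, 512, 1024, 2048)])
        for module in [self.stem, *self.stages]:
            for p in module.parameters():
                p.requires_grad = False  # only the CSAM parameters are fine-tuned

    def forward(self, x):
        f = self.stem(x)
        for stage, csam in zip(self.stages, self.csams):
            f = csam(stage(f))
        return f
```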
3.3. Loss Function Design
To enhance the generation of high-quality adversarial examples, we propose a composite loss function that simultaneously accounts for attack success, perturbation sparsity, and attention diversity. The overall loss is formulated as follows:

$$\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda_1 \mathcal{L}_{sparse} + \lambda_2 \mathcal{L}_{div},$$

where $\mathcal{L}_{adv}$ denotes the adversarial loss, which drives the adversarial example to be misclassified into the designated target class; $\mathcal{L}_{sparse}$ represents a sparsity regularization term, aiming to constrain the perturbation norm to ensure visual imperceptibility; and $\mathcal{L}_{div}$ serves as a diversity regularization term that promotes variation across attention modules. The hyperparameters $\lambda_1$ and $\lambda_2$ are selected via cross-validation to balance the contributions of the individual loss components.
Adversarial Loss. The adversarial loss $\mathcal{L}_{adv}$ enforces feature similarity between the adversarial example and the target image. It is defined using cosine similarity:

$$\mathcal{L}_{adv} = 1 - \cos\big(\phi(x_{adv}), \phi(x_t)\big),$$

where $\phi(\cdot) \in \mathbb{R}^D$ denotes the feature representation extracted by the feature extractor, and $D$ is the feature dimensionality. The cosine similarity is given by $\cos(u, v) = \frac{\langle u, v \rangle}{\|u\|_2 \|v\|_2}$, which measures the angular alignment between two vectors. Specifically, $\langle u, v \rangle$ is the inner product, and $\|\cdot\|_2$ is the $\ell_2$ norm. Minimizing $\mathcal{L}_{adv}$ aligns the adversarial and target features, thereby improving attack success. In MAAG, it serves as the primary objective for the generator.
We chose feature similarity rather than logit-level cross-entropy for two main reasons. First, feature-space alignment captures semantic-level information that is less tied to a specific surrogate model, thereby improving transferability to unseen architectures and unknown classes. In contrast, logit-level objectives depend heavily on the classifier’s decision boundary, which may cause overfitting to the surrogate and hinder generalization in black-box settings. Second, cosine similarity provides a smooth optimization landscape for aligning representations, which empirically stabilizes training. We also conducted experiments by adding a logit-level loss term, and more details are provided in Appendix A.
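A minimal sketch of the cosine-similarity objective, assuming the extractor outputs are flattened per sample before comparison; the function name is ours.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(feat_adv: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """L_adv = 1 - cos(phi(x_adv), phi(x_t)), averaged over the batch."""
    a = feat_adv.flatten(start_dim=1)  # (B, D)
    b = feat_tgt.flatten(start_dim=1)  # (B, D)
    return (1.0 - F.cosine_similarity(a, b, dim=1)).mean()
```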
Sparse Regularization Loss. The sparse regularization loss penalizes the $\ell_1$ norm of the perturbation to promote visual imperceptibility of the adversarial perturbation:

$$\mathcal{L}_{sparse} = \|\delta\|_1 = \sum_{i, j, k} \big|\delta(i, j, k)\big|,$$

where $\delta = x_{adv} - x_s$ denotes the perturbation, $\|\delta\|_1$ is its $\ell_1$ norm, $(i, j)$ represent spatial pixel coordinates, and $k$ is the channel index. Minimizing $\mathcal{L}_{sparse}$ suppresses unnecessary pixel changes, encouraging sparsity and imperceptibility. In MAAG, this loss constrains the perturbation to keep the adversarial example visually similar to the original input.
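The corresponding sketch, with batch averaging as an assumed convention:

```python
import torch

def sparse_loss(x_adv: torch.Tensor, x_src: torch.Tensor) -> torch.Tensor:
    """L1 penalty on the perturbation delta = x_adv - x_src, summed over
    pixels and channels and averaged over the batch."""
    return (x_adv - x_src).abs().sum(dim=(1, 2, 3)).mean()
```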
Attention Diversity Loss. The attention diversity loss encourages distinct attention modules to produce diverse features. It is defined as follows:

$$\mathcal{L}_{div} = -\sum_{i \neq j} \big(1 - \cos(A_i, A_j)\big),$$

where $A_i$ and $A_j$ are attention maps generated by different modules, such as channel attention from CAG or spatial attention from SA. The cosine similarity $\cos(A_i, A_j)$ is computed in the same manner as in the adversarial loss, measuring the angle between the flattened attention maps. The negative sign ensures that minimizing $\mathcal{L}_{div}$ promotes orthogonality among attention maps, increasing diversity in the learned features. In MAAG, this loss enhances complementarity between attention modules and improves the generator’s generalization to unknown classes.
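A sketch of the diversity term and the composite objective follows. The pairwise form and the assumption that the attention maps have equal flattened sizes (e.g., maps from the same module type resized to a common resolution) are ours.

```python
import torch
import torch.nn.functional as F

def diversity_loss(attn_maps: list[torch.Tensor]) -> torch.Tensor:
    """Negative mean pairwise cosine distance between flattened attention
    maps; minimizing it pushes the maps toward dissimilarity."""
    flat = [m.flatten(start_dim=1) for m in attn_maps]  # assumes equal flattened sizes
    total, pairs = attn_maps[0].new_zeros(()), 0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            cos = F.cosine_similarity(flat[i], flat[j], dim=1).mean()
            total = total - (1.0 - cos)
            pairs += 1
    return total / max(pairs, 1)

def total_loss(l_adv, l_sparse, l_div, lam1: float, lam2: float):
    """Composite objective: L_total = L_adv + lam1 * L_sparse + lam2 * L_div."""
    return l_adv + lam1 * l_sparse + lam2 * l_div
```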
3.4. Mathematical Derivation and Analysis
To verify the effectiveness of the composite loss function, we present the mathematical optimization process for each individual loss component.
Gradient Analysis of the Adversarial Loss. Let $a = \phi(x_{adv})$ and $b = \phi(x_t)$ denote the feature representations of the adversarial and target inputs, respectively. The adversarial loss is defined as:

$$\mathcal{L}_{adv} = 1 - \cos(a, b) = 1 - \frac{\langle a, b \rangle}{\|a\|_2 \|b\|_2},$$

and the gradient of $\mathcal{L}_{adv}$ with respect to $a$ is computed as follows:

$$\frac{\partial \mathcal{L}_{adv}}{\partial a} = -\frac{1}{\|a\|_2}\left(\frac{b}{\|b\|_2} - \cos(a, b)\,\frac{a}{\|a\|_2}\right),$$

where $\partial \mathcal{L}_{adv} / \partial a$ denotes the gradient that guides the optimization of $a$ toward $b$. Using the chain rule, the gradient is further backpropagated to the adversarial input $x_{adv}$:

$$\frac{\partial \mathcal{L}_{adv}}{\partial x_{adv}} = \left(\frac{\partial a}{\partial x_{adv}}\right)^{\top} \frac{\partial \mathcal{L}_{adv}}{\partial a}.$$

This optimization step aligns the adversarial feature with the target class, improving attack success. In MAAG, adversarial loss gradients drive the generator’s training.
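The closed-form gradient above can be checked numerically against autograd; this is a verification sketch, not part of the method.

```python
import torch

torch.manual_seed(0)
a = torch.randn(16, requires_grad=True)
b = torch.randn(16)

# L_adv = 1 - cos(a, b), differentiated by autograd
loss = 1.0 - torch.dot(a, b) / (a.norm() * b.norm())
loss.backward()

# closed-form gradient: -(1/||a||) * (b/||b|| - cos(a, b) * a/||a||)
with torch.no_grad():
    cos = torch.dot(a, b) / (a.norm() * b.norm())
    grad = -(b / b.norm() - cos * a / a.norm()) / a.norm()
    print(torch.allclose(a.grad, grad, atol=1e-6))  # True
```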
Optimization of the Sparse Regularization Loss. The sparse regularization loss $\mathcal{L}_{sparse} = \|\delta\|_1$ is optimized using the subgradient method. For the perturbation $\delta$, the subgradient is:

$$\frac{\partial \|\delta\|_1}{\partial \delta_i} = \begin{cases} \operatorname{sign}(\delta_i), & \delta_i \neq 0, \\ [-1, 1], & \delta_i = 0, \end{cases}$$

where $\delta_i$ denotes the $i$-th element of the perturbation. This optimization promotes sparsity by suppressing unnecessary pixel changes. In MAAG, the sparse regularization loss constrains the perturbation to preserve the visual imperceptibility of the adversarial examples.
Derivation of the Attention Diversity Loss. Let $u$ and $v$ denote two attention maps. The attention diversity loss is defined as follows:

$$\mathcal{L}_{div} = -\big(1 - \cos(u, v)\big).$$

The gradient with respect to $u$ is computed as follows:

$$\frac{\partial \mathcal{L}_{div}}{\partial u} = \frac{\partial \cos(u, v)}{\partial u} = \frac{1}{\|u\|_2}\left(\frac{v}{\|v\|_2} - \cos(u, v)\,\frac{u}{\|u\|_2}\right).$$

This gradient promotes orthogonality among attention maps, thereby increasing their diversity. In MAAG, the diversity loss optimizes the attention distribution and improves the generator’s generalization to unknown classes.
Convergence Analysis. The composite loss $\mathcal{L}_{total}$ combines the adversarial loss (a smooth function), the sparse regularization (a convex function), and the diversity regularization (a smooth function). Under appropriate regularization, $\mathcal{L}_{total}$ satisfies local Lipschitz continuity. Using the AdamW optimizer with learning rate $\eta$, the (simplified) gradient update rule is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} \mathcal{L}_{total}(\theta_t),$$

where $\theta$ denotes the generator parameters. According to gradient descent convergence theory, when the learning rate is sufficiently small, the optimization process can achieve stable convergence toward a local minimum.