3.1. Problem Definition
The objective of a targeted adversarial attack is to generate, from a given source image, an adversarial example that the classifier misclassifies into a specified target class, while the perturbation remains small enough to preserve visual similarity. Let the source image be $x_s \in \mathbb{R}^{H \times W \times 3}$, where $H$ and $W$ denote the image height and width, and 3 corresponds to the number of RGB channels. The target class is denoted by $y_t \in \{1, \dots, K\}$, where $K$ denotes the total number of classes. The classifier is defined as $f_\theta : \mathbb{R}^{H \times W \times 3} \rightarrow [0, 1]^K$, parameterized by $\theta$, and outputs a probability distribution over the $K$ classes. The objective is to generate an adversarial example $x_{adv}$ by solving the following optimization problem:

$$\min_{x_{adv}} \ \mathcal{L}\big(f_\theta(x_{adv}), y_t\big) \quad \text{s.t.} \quad \|x_{adv} - x_s\|_p \leq \epsilon,$$
where $\mathcal{L}(\cdot, \cdot)$ denotes the loss function, typically cross-entropy, which measures the discrepancy between the classifier output and the target label $y_t$; $\|\cdot\|_p$ denotes the $\ell_p$ norm used to constrain the perturbation, with common choices such as $\ell_\infty$ for bounding the maximum per-pixel change and $\ell_2$ for bounding the overall perturbation norm; and $\epsilon$ is a small positive constant representing the perturbation budget. Under the $\ell_\infty$ norm, a small $\epsilon$ is commonly adopted to ensure visual imperceptibility.
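To make the constraint concrete, the following minimal PyTorch sketch enforces an $\ell_\infty$ budget by projection (clamping); the function name and the clamp to a $[0, 1]$ pixel range are illustrative assumptions, not details taken from MAAG.

```python
import torch

def project_linf(x_adv: torch.Tensor, x_src: torch.Tensor, eps: float) -> torch.Tensor:
    """Project x_adv onto the l-inf ball of radius eps around x_src,
    then clamp to the valid pixel range [0, 1] (assumed normalization)."""
    delta = torch.clamp(x_adv - x_src, min=-eps, max=eps)  # ||delta||_inf <= eps
    return torch.clamp(x_src + delta, min=0.0, max=1.0)
```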
Unlike existing methods that generate adversarial examples only for known target classes, MAAG also performs well on unknown classes, enhancing cross-model transferability while preserving visual imperceptibility. This definition guides MAAG by emphasizing the joint optimization of attack effectiveness, generalization, and visual consistency.
3.2. Model Architecture
The proposed generative model architecture is presented in Figure 1. The adaptive attention generator (AAG) is the core component, which accepts paired inputs to generate high-quality adversarial examples. It mainly consists of a channel attention gate (CAG) [39], a spatial attention module (SA) [39], a dynamic infection mechanism, an adversarial decoder, and a lightweight feature extractor. The design of each module is described in detail below. Specifically, the lightweight feature extractor provides semantic representations of source and target images, which are adaptively modulated by CAG and SA to emphasize informative channels and regions. The dynamic infection mechanism then fuses source and target features with an infection factor $\alpha$ predicted from their mean–variance statistics, ensuring adaptive and input-specific blending. Finally, the adversarial decoder reconstructs adversarial examples by progressively upsampling the fused features, with CAG and SA inserted after each deconvolutional block to refine feature quality.
Channel Attention Gate. The channel attention gate enhances feature representations by adaptively reweighting channels, as illustrated in Figure 2. It applies global average pooling to extract channel statistics, which are then processed through two lightweight convolutional blocks: the channel compression block (CCB) with a Conv-ReLU structure and the channel expansion block (CEB) with a Conv-Sigmoid structure. This yields a channel attention vector that refines the input feature map via channel-wise multiplication, emphasizing informative channels and suppressing less relevant ones.
Given an input feature map $F \in \mathbb{R}^{C \times H \times W}$, where $C$ is the number of channels and $H$, $W$ denote the spatial dimensions, the global average pooling computes channel descriptors as follows:

$$z_c = \frac{1}{HW} \sum_{i=1}^{H} \sum_{j=1}^{W} F_c(i, j),$$

where $F_c$ denotes the feature map of the $c$-th channel, $z_c$ is the average activation of that channel, and $i$, $j$ are spatial indices. The resulting vector $z = [z_1, \dots, z_C] \in \mathbb{R}^C$ serves as the input for generating the channel attention weights.
CAG processes this vector using a lightweight convolutional network to produce the final attention scores:

$$a = \sigma\big(W_2 \, \delta(W_1 z)\big),$$

where $W_1 \in \mathbb{R}^{(C/r) \times C}$ reduces the channel dimension using a reduction ratio $r$, and $W_2 \in \mathbb{R}^{C \times (C/r)}$ restores the original dimension. The activation function $\delta(\cdot)$ introduces non-linearity, and $\sigma(\cdot)$ is the Sigmoid function used to normalize the attention weights to $(0, 1)$. The resulting vector $a \in \mathbb{R}^C$ encodes the attention weights across channels.
The attention-weighted feature map is computed via element-wise multiplication:

$$F' = a \odot F,$$

where $a$ is broadcast over the spatial dimensions $H \times W$, and $F'$ denotes the modulated output. This operation emphasizes informative channels and suppresses less relevant ones, guiding the generator to focus on class-relevant features for more effective adversarial perturbations.
In MAAG, CAG enhances the quality of feature representations, providing a strong input for the subsequent spatial attention module and feature fusion process.
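For concreteness, a minimal PyTorch sketch of CAG follows, assuming $1 \times 1$ convolutions inside the CCB and CEB; the class and argument names are ours, and the reduction ratio of 16 is an illustrative default rather than a value from the paper.

```python
import torch
import torch.nn as nn

class ChannelAttentionGate(nn.Module):
    """CAG sketch: global average pooling, a Conv-ReLU compression block
    (CCB) and a Conv-Sigmoid expansion block (CEB), then channel reweighting."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.ccb = nn.Sequential(  # channel compression block
            nn.Conv2d(channels, channels // reduction, kernel_size=1),
            nn.ReLU(inplace=True),
        )
        self.ceb = nn.Sequential(  # channel expansion block
            nn.Conv2d(channels // reduction, channels, kernel_size=1),
            nn.Sigmoid(),
        )

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        z = f.mean(dim=(2, 3), keepdim=True)  # global average pooling: (B, C, 1, 1)
        a = self.ceb(self.ccb(z))             # channel attention weights in (0, 1)
        return f * a                          # broadcast over spatial dimensions
```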
Spatial Attention Module. The spatial attention module refines the feature map by emphasizing informative spatial regions, as illustrated in Figure 3. SA processes the attention-weighted feature map $F'$ from the CAG and extracts spatial context using both max pooling and average pooling along the channel dimension:

$$F_{max} = \mathrm{MaxPool}(F'), \quad F_{avg} = \mathrm{AvgPool}(F'),$$

where $\mathrm{MaxPool}(\cdot)$ selects the maximum activation across channels at each spatial location, and $\mathrm{AvgPool}(\cdot)$ computes the average activation. The two results are concatenated along the channel axis to form a $2 \times H \times W$ feature map, which is passed through a convolutional layer followed by a sigmoid function $\sigma(\cdot)$ to produce the spatial attention map $M \in \mathbb{R}^{1 \times H \times W}$. The attention-weighted feature map is obtained by element-wise multiplication:

$$F'' = M \odot F',$$

where $M$ is broadcast across all channels, and $F''$ denotes the final output feature map. SA enhances spatial focus by identifying informative regions, such as the locations of target objects, and guides the generator to produce spatially focused adversarial perturbations.
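A corresponding sketch of SA, in the same hedged spirit: the $7 \times 7$ convolution kernel is a common choice for this design but is an assumption here.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """SA sketch: channel-wise max and average pooling, concatenation into a
    2-channel map, one convolution, and a sigmoid gate over spatial locations."""
    def __init__(self, kernel_size: int = 7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2)
        self.sigmoid = nn.Sigmoid()

    def forward(self, f: torch.Tensor) -> torch.Tensor:
        f_max, _ = f.max(dim=1, keepdim=True)  # (B, 1, H, W) max over channels
        f_avg = f.mean(dim=1, keepdim=True)    # (B, 1, H, W) mean over channels
        m = self.sigmoid(self.conv(torch.cat([f_max, f_avg], dim=1)))  # (B, 1, H, W)
        return f * m                           # broadcast across channels
```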
In MAAG, CAG and SA play complementary roles: CAG emphasizes global channel-level information, while SA focuses on local spatial features. Within the dynamic infection module, CAG is applied twice: first before feeding the source and target features into the mean–variance generator (MVG) to enhance the representations extracted from the feature extractor, and then after their concatenation in the infection predictor, redistributing channel weights to emphasize important dimensions and suppress less relevant ones. In the adversarial decoder, each transposed convolutional block is sequentially followed by CAG and SA, which progressively refine the decoded features and help reconstruct high-quality adversarial examples.
Dynamic Infection Mechanism. The dynamic infection mechanism is the core component of AAG, as illustrated in Figure 4. It fuses the source feature $F_s$ and the target feature $F_t$ to produce the fused representation $F_{fused}$ in the feature space. The fusion process is guided by an infection factor $\alpha$, defined as follows:

$$F_{fused} = \alpha \cdot F_t + (1 - \alpha) \cdot F_s,$$

where $F_{fused}$ denotes the fused feature map used to generate the adversarial example. The infection factor $\alpha \in (0, 1)$ controls the contribution of the source and target features, where $\cdot$ denotes element-wise multiplication. Both $\alpha$ and $1 - \alpha$ are broadcast to match the spatial and channel dimensions of the feature maps.
The infection factor $\alpha$ is dynamically predicted by the infection predictor based on the input features, providing an adaptive scalar fusion mechanism guided by semantic relevance:

$$\alpha = \sigma\Big(\mathrm{FC}\big([\mu_s, \sigma_s^2 \,;\, \mu_t, \sigma_t^2]\big)\Big),$$

where the MVG produces $(\mu_s, \sigma_s^2)$ and $(\mu_t, \sigma_t^2)$ as mean–variance features, computed for each channel by taking the mean and variance across the spatial dimensions of $F_s$ and $F_t$, respectively. The concatenation operation $[\cdot \,;\, \cdot]$ merges the two $2C$-dimensional statistics vectors into a single $4C$-dimensional vector. The fully connected layer $\mathrm{FC}(\cdot)$ maps the concatenated feature to a scalar, and the sigmoid function normalizes the output to the range $(0, 1)$ to produce the infection factor $\alpha$.
The dynamic infection mechanism selects an appropriate fusion ratio based on the input pair. It uses guidance from CAG and SA to extract informative features for attack generation. This strategy improves generalization to unknown classes by adapting to diverse feature distributions.
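The mechanism can be sketched as follows; for brevity this version omits the CAG stages applied before the MVG and inside the infection predictor, and all class and variable names are illustrative.

```python
import torch
import torch.nn as nn

class DynamicInfection(nn.Module):
    """Dynamic infection sketch: per-channel mean-variance statistics (MVG)
    are concatenated and mapped to a scalar alpha in (0, 1), which blends
    the source and target feature maps."""
    def __init__(self, channels: int):
        super().__init__()
        self.fc = nn.Linear(4 * channels, 1)  # 4C-dim statistics -> scalar

    @staticmethod
    def mvg(f: torch.Tensor) -> torch.Tensor:
        mu = f.mean(dim=(2, 3))                  # (B, C) per-channel mean
        var = f.var(dim=(2, 3), unbiased=False)  # (B, C) per-channel variance
        return torch.cat([mu, var], dim=1)       # (B, 2C)

    def forward(self, f_src: torch.Tensor, f_tgt: torch.Tensor) -> torch.Tensor:
        stats = torch.cat([self.mvg(f_src), self.mvg(f_tgt)], dim=1)  # (B, 4C)
        alpha = torch.sigmoid(self.fc(stats)).view(-1, 1, 1, 1)       # per-sample scalar
        return alpha * f_tgt + (1.0 - alpha) * f_src                  # fused features
```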
Adversarial Decoder. Given a fused feature produced by the dynamic infection mechanism, the goal of the adversarial decoder is to reconstruct an adversarial example. It consists of three decoding blocks, each implemented as a lightweight transposed convolutional sub-network integrated with CAG and SA modules to enhance feature refinement during the decoding process.
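A decoder sketch reusing the ChannelAttentionGate and SpatialAttention classes above; the channel widths, ReLU activation, and final Tanh projection to RGB are our assumptions, not specified details of MAAG.

```python
import torch.nn as nn

def decoder_block(in_ch: int, out_ch: int) -> nn.Sequential:
    """One decoding block sketch: transposed convolution for 2x upsampling,
    followed by the CAG and SA modules defined in the earlier sketches."""
    return nn.Sequential(
        nn.ConvTranspose2d(in_ch, out_ch, kernel_size=4, stride=2, padding=1),
        nn.ReLU(inplace=True),
        ChannelAttentionGate(out_ch),
        SpatialAttention(),
    )

class AdversarialDecoder(nn.Module):
    """Three decoding blocks followed by a projection to a 3-channel image."""
    def __init__(self, in_ch: int = 2048):
        super().__init__()
        self.blocks = nn.Sequential(
            decoder_block(in_ch, in_ch // 4),
            decoder_block(in_ch // 4, in_ch // 16),
            decoder_block(in_ch // 16, in_ch // 64),
        )
        self.to_rgb = nn.Sequential(
            nn.Conv2d(in_ch // 64, 3, kernel_size=3, padding=1),
            nn.Tanh(),  # assumed output range normalization
        )

    def forward(self, f_fused):
        return self.to_rgb(self.blocks(f_fused))
```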
Lightweight Feature Extractor. The lightweight feature extractor (see Figure 5) is a key component of MAAG. It provides representative features throughout both the training and testing phases. This module is based on a pretrained ResNet-50, with the final fully connected and AvgPool layers removed to retain high-quality feature representations. To enhance representation capability, a channel-spatial attention module (CSAM) is inserted after each ResNet block. CSAM processes the input feature map $F$ and outputs:

$$F_{out} = \mathrm{SA}\big(\mathrm{CAG}(F)\big),$$

where $\mathrm{CAG}(\cdot)$ and $\mathrm{SA}(\cdot)$ are the channel attention and spatial attention operations in CAG and SA, respectively. The enhanced feature $F_{out}$ is obtained by sequentially applying channel and spatial attention. By combining both mechanisms, CSAM improves feature robustness and informativeness.
In MAAG, the lightweight feature extractor achieves efficiency by fine-tuning only the CSAM parameters, while keeping the ResNet-50 backbone frozen. This design ensures high-quality feature representations for AAG and enables efficient adversarial example generation.
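A sketch of the extractor, reusing the attention sketches above. The torchvision weights tag and the per-stage channel counts (256/512/1024/2048 for ResNet-50) reflect the standard architecture; inserting one CSAM per residual stage follows the description, while the exact insertion points in the paper may differ.

```python
import torch.nn as nn
from torchvision.models import resnet50

class CSAM(nn.Module):
    """Channel-spatial attention module sketch: CAG followed by SA."""
    def __init__(self, channels: int):
        super().__init__()
        self.cag = ChannelAttentionGate(channels)  # defined in the earlier sketch
        self.sa = SpatialAttention()               # defined in the earlier sketch

    def forward(self, f):
        return self.sa(self.cag(f))

class LightweightFeatureExtractor(nn.Module):
    """Frozen pretrained ResNet-50 trunk (FC and AvgPool removed) with a
    trainable CSAM inserted after each residual stage."""
    def __init__(self):
        super().__init__()
        backbone = resnet50(weights="IMAGENET1K_V1")
        self.stem = nn.Sequential(backbone.conv1, backbone.bn1, backbone.relu, backbone.maxpool)
        self.stages = nn.ModuleList([backbone.layer1, backbone.layer2, backbone.layer3, backbone.layer4])
        self.csams = nn.ModuleList([CSAM(c) for c in (256, 512, 1024, 2048)])
        for module in [self.stem, *self.stages]:
            for p in module.parameters():
                p.requires_grad = False  # only the CSAM parameters are fine-tuned

    def forward(self, x):
        f = self.stem(x)
        for stage, csam in zip(self.stages, self.csams):
            f = csam(stage(f))
        return f
```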
3.3. Loss Function Design
To enhance the generation of high-quality adversarial examples, we propose a composite loss function that simultaneously accounts for attack success, perturbation sparsity, and attention diversity. The overall loss is formulated as follows:

$$\mathcal{L}_{total} = \mathcal{L}_{adv} + \lambda_1 \mathcal{L}_{sparse} + \lambda_2 \mathcal{L}_{div},$$

where $\mathcal{L}_{adv}$ denotes the adversarial loss, which drives the adversarial example to be misclassified into the designated target class; $\mathcal{L}_{sparse}$ represents a sparsity regularization term, aiming to constrain the perturbation norm to ensure visual imperceptibility; and $\mathcal{L}_{div}$ serves as a diversity regularization term that promotes variation across attention modules. The hyperparameters $\lambda_1$ and $\lambda_2$ are selected via cross-validation to balance the contributions of the individual loss components.
Adversarial Loss. The adversarial loss $\mathcal{L}_{adv}$ enforces feature similarity between the adversarial example and the target image. It is defined using cosine similarity:

$$\mathcal{L}_{adv} = 1 - \cos\big(\phi(x_{adv}), \phi(x_t)\big),$$

where $\phi(\cdot) \in \mathbb{R}^D$ denotes the feature representation extracted by the feature extractor, and $D$ is the feature dimensionality. The cosine similarity is given by $\cos(u, v) = \frac{\langle u, v \rangle}{\|u\|_2 \|v\|_2}$, which measures the angular alignment between two vectors. Specifically, $\langle u, v \rangle$ is the inner product, and $\|\cdot\|_2$ is the $\ell_2$ norm. Minimizing $\mathcal{L}_{adv}$ aligns the adversarial and target features, thereby improving attack success. In MAAG, it serves as the primary objective for the generator.
We chose feature similarity rather than logit-level cross-entropy for two main reasons. First, feature-space alignment captures semantic-level information that is less tied to a specific surrogate model, thereby improving transferability to unseen architectures and unknown classes. In contrast, logit-level objectives depend heavily on the classifier’s decision boundary, which may cause overfitting to the surrogate and hinder generalization in black-box settings. Second, cosine similarity provides a smooth optimization landscape for aligning representations, which empirically stabilizes training. We also conducted experiments by adding a logit-level loss term, and more details are provided in Appendix A.
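A minimal sketch of the cosine-similarity objective, assuming the extractor outputs are flattened per sample before comparison; the function name is ours.

```python
import torch
import torch.nn.functional as F

def adversarial_loss(feat_adv: torch.Tensor, feat_tgt: torch.Tensor) -> torch.Tensor:
    """L_adv = 1 - cos(phi(x_adv), phi(x_t)), averaged over the batch."""
    a = feat_adv.flatten(start_dim=1)  # (B, D)
    b = feat_tgt.flatten(start_dim=1)  # (B, D)
    return (1.0 - F.cosine_similarity(a, b, dim=1)).mean()
```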
Sparse Regularization Loss. The sparse regularization loss penalizes the $\ell_1$ norm of the perturbation to promote visual imperceptibility of the adversarial perturbation:

$$\mathcal{L}_{sparse} = \|\delta\|_1 = \sum_{i, j, k} \big|\delta(i, j, k)\big|,$$

where $\delta = x_{adv} - x_s$ denotes the perturbation, $\|\delta\|_1$ is its $\ell_1$ norm, $(i, j)$ represent spatial pixel coordinates, and $k$ is the channel index. Minimizing $\mathcal{L}_{sparse}$ suppresses unnecessary pixel changes, encouraging sparsity and imperceptibility. In MAAG, this loss constrains the perturbation to keep the adversarial example visually similar to the original input.
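The corresponding sketch, with batch averaging as an assumed convention:

```python
import torch

def sparse_loss(x_adv: torch.Tensor, x_src: torch.Tensor) -> torch.Tensor:
    """L1 penalty on the perturbation delta = x_adv - x_src, summed over
    pixels and channels and averaged over the batch."""
    return (x_adv - x_src).abs().sum(dim=(1, 2, 3)).mean()
```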
Attention Diversity Loss. The attention diversity loss encourages distinct attention modules to produce diverse features. It is defined as follows:

$$\mathcal{L}_{div} = -\sum_{i \neq j} \big(1 - \cos(A_i, A_j)\big),$$

where $A_i$ and $A_j$ are attention maps generated by different modules, such as channel attention from CAG or spatial attention from SA. The cosine similarity $\cos(A_i, A_j)$ is computed in the same manner as in the adversarial loss, measuring the angle between the flattened attention maps. The negative sign ensures that minimizing $\mathcal{L}_{div}$ promotes orthogonality among attention maps, increasing diversity in the learned features. In MAAG, this loss enhances complementarity between attention modules and improves the generator’s generalization to unknown classes.
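A sketch of the diversity term and the composite objective follows. The pairwise form and the assumption that the attention maps have equal flattened sizes (e.g., maps from the same module type resized to a common resolution) are ours.

```python
import torch
import torch.nn.functional as F

def diversity_loss(attn_maps: list[torch.Tensor]) -> torch.Tensor:
    """Negative mean pairwise cosine distance between flattened attention
    maps; minimizing it pushes the maps toward dissimilarity."""
    flat = [m.flatten(start_dim=1) for m in attn_maps]  # assumes equal flattened sizes
    total, pairs = attn_maps[0].new_zeros(()), 0
    for i in range(len(flat)):
        for j in range(i + 1, len(flat)):
            cos = F.cosine_similarity(flat[i], flat[j], dim=1).mean()
            total = total - (1.0 - cos)
            pairs += 1
    return total / max(pairs, 1)

def total_loss(l_adv, l_sparse, l_div, lam1: float, lam2: float):
    """Composite objective: L_total = L_adv + lam1 * L_sparse + lam2 * L_div."""
    return l_adv + lam1 * l_sparse + lam2 * l_div
```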
3.4. Mathematical Derivation and Analysis
To verify the effectiveness of the composite loss function, we present the mathematical optimization process for each individual loss component.
Gradient Analysis of the Adversarial Loss. Let $a = \phi(x_{adv})$ and $b = \phi(x_t)$ denote the feature representations of the adversarial and target inputs, respectively. The adversarial loss is defined as:

$$\mathcal{L}_{adv} = 1 - \cos(a, b) = 1 - \frac{\langle a, b \rangle}{\|a\|_2 \|b\|_2},$$

and the gradient of $\mathcal{L}_{adv}$ with respect to $a$ is computed as follows:

$$\frac{\partial \mathcal{L}_{adv}}{\partial a} = -\frac{1}{\|a\|_2}\left(\frac{b}{\|b\|_2} - \cos(a, b)\,\frac{a}{\|a\|_2}\right),$$

where $\partial \mathcal{L}_{adv} / \partial a$ denotes the gradient that guides the optimization of $a$ toward $b$. Using the chain rule, the gradient is further backpropagated to the adversarial input $x_{adv}$:

$$\frac{\partial \mathcal{L}_{adv}}{\partial x_{adv}} = \left(\frac{\partial a}{\partial x_{adv}}\right)^{\top} \frac{\partial \mathcal{L}_{adv}}{\partial a}.$$

This optimization step aligns the adversarial feature with the target class, improving attack success. In MAAG, adversarial loss gradients drive the generator’s training.
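The closed-form gradient above can be checked numerically against autograd; this is a verification sketch, not part of the method.

```python
import torch

torch.manual_seed(0)
a = torch.randn(16, requires_grad=True)
b = torch.randn(16)

# L_adv = 1 - cos(a, b), differentiated by autograd
loss = 1.0 - torch.dot(a, b) / (a.norm() * b.norm())
loss.backward()

# closed-form gradient: -(1/||a||) * (b/||b|| - cos(a, b) * a/||a||)
with torch.no_grad():
    cos = torch.dot(a, b) / (a.norm() * b.norm())
    grad = -(b / b.norm() - cos * a / a.norm()) / a.norm()
    print(torch.allclose(a.grad, grad, atol=1e-6))  # True
```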
Optimization of the Sparse Regularization Loss. The sparse regularization loss $\mathcal{L}_{sparse} = \|\delta\|_1$ is optimized using the subgradient method. For the perturbation $\delta$, the subgradient is:

$$\frac{\partial \|\delta\|_1}{\partial \delta_i} = \begin{cases} \operatorname{sign}(\delta_i), & \delta_i \neq 0, \\ [-1, 1], & \delta_i = 0, \end{cases}$$

where $\delta_i$ denotes the $i$-th element of the perturbation. This optimization promotes sparsity by suppressing unnecessary pixel changes. In MAAG, the sparse regularization loss constrains the perturbation to preserve the visual imperceptibility of the adversarial examples.
Derivation of the Attention Diversity Loss. Let $u$ and $v$ denote two attention maps. The attention diversity loss is defined as follows:

$$\mathcal{L}_{div} = -\big(1 - \cos(u, v)\big).$$

The gradient with respect to $u$ is computed as follows:

$$\frac{\partial \mathcal{L}_{div}}{\partial u} = \frac{\partial \cos(u, v)}{\partial u} = \frac{1}{\|u\|_2}\left(\frac{v}{\|v\|_2} - \cos(u, v)\,\frac{u}{\|u\|_2}\right).$$

This gradient promotes orthogonality among attention maps, thereby increasing their diversity. In MAAG, the diversity loss optimizes the attention distribution and improves the generator’s generalization to unknown classes.
Convergence Analysis. The composite loss $\mathcal{L}_{total}$ combines the adversarial loss (a smooth function), the sparse regularization (a convex function), and the diversity regularization (a smooth function). Under appropriate regularization, $\mathcal{L}_{total}$ satisfies local Lipschitz continuity. Using the AdamW optimizer with learning rate $\eta$, the (simplified) gradient update rule is:

$$\theta_{t+1} = \theta_t - \eta \, \nabla_{\theta} \mathcal{L}_{total}(\theta_t),$$

where $\theta$ denotes the generator parameters. According to gradient descent convergence theory, when the learning rate is sufficiently small, the optimization process can achieve stable convergence toward a local minimum.