*3.4. Spectral-Spatial Attention Module*

To take full advantage of the complementary information of the spectral and spatial branches, we design a lightweight and effective spectral-spatial attention (SSA) module to guide spectral-spatial information integration. We first compute the spectral and spatial attention maps from the spectral and spatial branches, respectively. Then, we multiply each branch's original features by the attention map from the other branch to transfer the corresponding information. Finally, we add the original features to the weighted features in each branch to preserve the original information. The schematic of SSA is shown in Figure 4.

**Figure 4.** Schematic diagram of the spectral-spatial attention module. "⊕" denotes element-wise addition, and "⊗" denotes element-wise multiplication.

Similar to [50], we use global average pooling and 1D convolution to compute the spectral attention, which is lightweight and effective. The spectral attention weights $w_{spe} \in \mathbb{R}^{1 \times 1 \times B}$ can be computed as:

$$w_{spe} = \sigma\left(H_{1D}\left(g\left(I_{spe}^{n-1}\right)\right)\right) \tag{10}$$

where $g(x) = \frac{1}{WH}\sum_{i=1}^{W}\sum_{j=1}^{H} I_{spe}^{n-1}(i,j)$ denotes channel-wise global average pooling (GAP), $\sigma$ is the Sigmoid function, and $H_{1D}(\cdot)$ indicates 1D convolution.
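As a concrete illustration, the following is a minimal PyTorch sketch of this ECA-style spectral attention. The class name, the $(N, B, H, W)$ tensor layout, and the 1D kernel size are our assumptions for illustration, not the authors' released code.

```python
import torch
import torch.nn as nn

class SpectralAttention(nn.Module):
    """Spectral attention via GAP + 1D convolution over the bands, Eq. (10)."""
    def __init__(self, kernel_size: int = 3):  # kernel size is an assumption
        super().__init__()
        # H_1D: a single 1D convolution slid along the band dimension
        self.conv1d = nn.Conv1d(1, 1, kernel_size, padding=kernel_size // 2, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, B, H, W); g(.) averages over the spatial dimensions
        y = x.mean(dim=(2, 3)).unsqueeze(1)     # (N, 1, B)
        y = torch.sigmoid(self.conv1d(y))       # sigma(H_1D(g(x))): (N, 1, B)
        return y.transpose(1, 2).unsqueeze(-1)  # w_spe reshaped to (N, B, 1, 1)
```

The returned $(N, B, 1, 1)$ weights broadcast directly against a $(N, B, H, W)$ feature map, so no explicit expansion is needed.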

For spatial attention, we use a 1 × 1 convolution instead of max pooling to generate the spatial attention map $w_{spa} \in \mathbb{R}^{H \times W \times 1}$. It can be formulated as:

$$w_{spa} = \sigma\left(H_{1 \times 1}\left(I_{spa}^{n-1}\right)\right) \tag{11}$$
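A corresponding PyTorch sketch of the spatial attention follows; again, the class name and tensor layout are our illustrative assumptions.

```python
import torch
import torch.nn as nn

class SpatialAttention(nn.Module):
    """Spatial attention via a 1 x 1 convolution, Eq. (11)."""
    def __init__(self, channels: int):
        super().__init__()
        # H_1x1: collapses the B bands into a single-channel spatial map
        self.conv1x1 = nn.Conv2d(channels, 1, kernel_size=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (N, B, H, W) -> w_spa: (N, 1, H, W)
        return torch.sigmoid(self.conv1x1(x))
```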

To sum up, the spectral-spatial attention module can be formulated as:

$$I_{spe}^{n} = \text{ReLU}\left(H_{3 \times 3}\left(I_{spe}^{n-1}\right)\right) \otimes w_{spa} + I_{spe}^{n-1} \tag{12}$$

$$I_{spa}^{n} = \text{ReLU}\left(H_{3 \times 3}\left(I_{spa}^{n-1}\right)\right) \otimes w_{spe} + I_{spa}^{n-1} \tag{13}$$

where ⊗ denotes element-wise multiplication with the automatic broadcasting mechanism of PyTorch.
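Putting Equations (10)–(13) together, a minimal sketch of the full SSA module is shown below, reusing the `SpectralAttention` and `SpatialAttention` sketches above. The use of two independent 3 × 3 convolutions and their padding are our assumptions for illustration.

```python
import torch
import torch.nn as nn

class SSA(nn.Module):
    """Cross-branch weighting with residual additions, Eqs. (12)-(13)."""
    def __init__(self, channels: int):
        super().__init__()
        self.spe_att = SpectralAttention()       # defined in the sketch above
        self.spa_att = SpatialAttention(channels)
        self.conv_spe = nn.Conv2d(channels, channels, 3, padding=1)  # H_3x3 in Eq. (12)
        self.conv_spa = nn.Conv2d(channels, channels, 3, padding=1)  # H_3x3 in Eq. (13)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, f_spe: torch.Tensor, f_spa: torch.Tensor):
        w_spe = self.spe_att(f_spe)  # (N, B, 1, 1), from the spectral branch
        w_spa = self.spa_att(f_spa)  # (N, 1, H, W), from the spatial branch
        # Cross-branch weighting ("*" broadcasts element-wise, matching the
        # text's use of the ⊗ symbol) plus the residual addition of each
        # branch's original features
        out_spe = self.relu(self.conv_spe(f_spe)) * w_spa + f_spe  # Eq. (12)
        out_spa = self.relu(self.conv_spa(f_spa)) * w_spe + f_spa  # Eq. (13)
        return out_spe, out_spa
```

Note how each branch is multiplied by the attention map computed from the other branch, while the residual additions keep the original features intact, mirroring the description at the start of this section.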
