2.3.1. L-CSP Module

The lightweight cross stage partial (L-CSP) module mainly consists of a bottleneck structure and three convolutions. To obtain a lightweight backbone, the L-CSP module adopts LBL units (i.e., a lightweight convolution (L-Conv) layer, a BN layer, and a Leaky ReLU (L-ReLU) activation layer) instead of the original CBL units (i.e., a standard convolution (Conv) layer, a BN layer, and a Leaky ReLU activation layer).

Figure 3 shows the detailed structure of the L-CSP module. When a feature map *F*<sub>in</sub> enters the module, its transmission path is divided into two parallel branches, as shown in Figure 3; on each branch, the channel dimension of *F*<sub>in</sub> is reduced by half to generate a new feature map. The two feature maps of the parallel branches are then concatenated into a single output feature map *F*<sub>out</sub>.
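The two-branch split-and-concatenate data flow described above can be sketched as follows. This is a minimal NumPy illustration, not the paper's implementation: the channel-halving convolutions and the LBL bottleneck stack on each branch are replaced by a hypothetical per-pixel linear map (`pointwise`) with random placeholder weights.

```python
import numpy as np

# Minimal sketch of the L-CSP two-branch data flow (channel-last layout).
# The real module uses convolutions plus an LBL bottleneck stack on one
# branch; here a per-pixel linear map over channels stands in for each.

def pointwise(x, w):
    """A 1x1 convolution is a per-pixel linear map over the channel axis."""
    return x @ w  # (W, H, C_in) @ (C_in, C_out) -> (W, H, C_out)

def lcsp_forward(f_in, w_a, w_b):
    """Run F_in through two parallel channel-halving branches,
    then concatenate the branch outputs into F_out."""
    out_a = pointwise(f_in, w_a)  # branch with the bottleneck (placeholder)
    out_b = pointwise(f_in, w_b)  # parallel branch (placeholder)
    return np.concatenate([out_a, out_b], axis=-1)

rng = np.random.default_rng(0)
f_in = rng.random((8, 8, 16))    # W x H x C input feature map
w_a = rng.random((16, 8))        # reduces channels 16 -> 8
w_b = rng.random((16, 8))
f_out = lcsp_forward(f_in, w_a, w_b)
assert f_out.shape == (8, 8, 16) # concatenation restores the channel count
```

Because each branch carries only half the channels, the bottleneck stack operates on a cheaper tensor, which is the source of the module's savings.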

**Figure 3.** The detailed structure of an L-CSP module.

Figure 4 shows the detailed structures of Conv and L-Conv, respectively. Figure 4a shows the architecture of the raw Conv. In practice, the input is *X* ∈ R<sup>*W*×*H*×*C*</sup>, where *W*, *H*, and *C* represent the width, height, and number of channels of the input feature map, respectively. The convolution calculation is defined by

$$\mathbf{Y} = \mathbf{X} \ast \mathbf{f} + \mathbf{b} \tag{1}$$

where ∗ is the convolution operation; *f* ∈ R<sup>*C*×*K*×*K*×*N*</sup> are the convolution filters, where *K* × *K* denotes the kernel size and *N* the number of filters (i.e., output channels); and *b* denotes the bias of the convolution filters.

Figure 4b shows the architecture of the L-Conv. As we can see, only a reduced set of intermediate feature maps is first generated by ordinary convolution, and the output feature maps are then calculated from these intermediate feature maps with low-cost linear operations (i.e., 3 × 3 convolution kernels). In this way, SAR ship feature extraction becomes more efficient owing to fewer floating-point operations.

Specifically, given the input *X* ∈ R<sup>*W*×*H*×*C*</sup>, *M* intermediate feature maps *Y*′ ∈ R<sup>*W*×*H*×*M*</sup> are obtained by a convolution operation, i.e.,

$$\mathbf{Y}' = \mathbf{X} \ast \mathbf{f}' \tag{2}$$

where *f*′ ∈ R<sup>*C*×*K*×*K*×*M*</sup> are the convolution filters, and *M* < *N* means there are fewer convolution filters than in the original convolution block. To obtain the *N* output feature maps, a series of low-cost linear operations is applied to the intermediate feature maps to generate the remaining feature maps. The calculation is defined by

$$y_{i,j} = \Phi_{i,j}\left(y_i'\right), \quad \forall\, i = 1, \dots, m;\ j = 1, \dots, s \tag{3}$$

where *y*′<sub>*i*</sub> is the *i*-th intermediate feature map of *Y*′ and *Φ*<sub>*i*,*j*</sub> is the *j*-th linear operation, which calculates the output feature map *y*<sub>*i*,*j*</sub> from the *i*-th intermediate feature map. The last operation *Φ*<sub>*i*,*s*</sub> is an identity mapping, as shown in Figure 4b. Since there are *m* intermediate feature maps and *s* linear operations per map, we obtain *n* = *m* × *s* final output feature maps, equal to the number *N* produced by the original convolution operation.
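Equations (2) and (3) can be illustrated with a small NumPy sketch. Here the initial convolution producing *Y*′ is omitted, and each cheap linear operation *Φ*<sub>*i*,*j*</sub> is a naive single-channel 3 × 3 convolution with a random placeholder kernel, followed by the identity mapping *Φ*<sub>*i*,*s*</sub>; this is an illustration of the expansion scheme, not the paper's exact operators.

```python
import numpy as np

def conv3x3_single(fmap, kernel):
    """Naive 'same'-padded 3x3 convolution on one W x H map:
    the low-cost linear operation Phi applied per intermediate map."""
    padded = np.pad(fmap, 1)
    out = np.zeros_like(fmap)
    w, h = fmap.shape
    for i in range(w):
        for j in range(h):
            out[i, j] = np.sum(padded[i:i + 3, j:j + 3] * kernel)
    return out

def lconv_cheap_ops(y_prime, kernels):
    """Expand m intermediate maps into n = m * s output maps (Eq. 3):
    apply s - 1 cheap 3x3 convolutions plus one identity per map."""
    m = y_prime.shape[-1]
    outputs = []
    for i in range(m):
        y_i = y_prime[..., i]
        for k in kernels:
            outputs.append(conv3x3_single(y_i, k))  # Phi_{i,j}, j < s
        outputs.append(y_i)                          # identity Phi_{i,s}
    return np.stack(outputs, axis=-1)

rng = np.random.default_rng(0)
y_prime = rng.random((6, 6, 4))     # m = 4 intermediate maps
kernels = [rng.random((3, 3))]      # s = 2 operations per map
out = lconv_cheap_ops(y_prime, kernels)
assert out.shape == (6, 6, 8)       # n = m * s = 8 output maps
```

With *m* = 4 and *s* = 2 the sketch yields *n* = 8 outputs, matching the counting argument above: only 4 maps pass through full convolution, while the rest cost a single 3 × 3 kernel each.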

Since the computational complexity of the linear operations is much lower than that of ordinary convolution operations, L-Conv is a more lightweight convolution layer than Conv. Finally, by injecting the L-CSP module containing L-Conv into the backbone, we obtain a lightweight network architecture with low computational cost.
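The complexity claim can be made concrete with a rough multiply-accumulate count under hypothetical layer sizes (the numbers below are illustrative, not taken from the paper): an ordinary convolution costs about *W*·*H*·*N*·*C*·*K*², whereas an L-Conv costs about *W*·*H*·*m*·*C*·*K*² for the intermediate maps plus *W*·*H*·(*s* − 1)·*m*·*d*² for the cheap *d* × *d* linear operations, so the speed-up approaches *s*.

```python
# Rough multiply-accumulate counts for Conv vs. L-Conv.
# All layer sizes below are hypothetical, chosen only for illustration.
W, H, C, N, K = 32, 32, 64, 128, 3   # spatial size, channels, kernel size
s, d = 2, 3                          # s ops per map, d x d cheap kernel
m = N // s                           # intermediate maps so n = m * s = N

conv_flops = W * H * N * C * K * K                       # ordinary Conv
lconv_flops = (W * H * m * C * K * K                     # Eq. (2) part
               + W * H * (s - 1) * m * d * d)            # Eq. (3) part
ratio = conv_flops / lconv_flops
assert m * s == N          # output count matches the ordinary convolution
assert 1.9 < ratio < 2.0   # speed-up approaches s = 2
```

The ratio stays just below *s* because the cheap operations are not free; the larger *C*·*K*² is relative to *d*², the closer the saving gets to the ideal factor *s*.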

**Figure 4.** The detailed structures of Conv and L-Conv: (**a**) description of Conv; (**b**) description of L-Conv.
