*2.2. Anchor-Guided Feature Alignment Network*

In networks that use horizontal anchors, the convolution features are aligned with the anchors, so the features reflect the anchor representations [30]. In networks that use rotated anchors, however, the rotated anchors and the axis-aligned convolution features are misaligned, and this misalignment must be resolved. To this end, we introduce an anchor-guided feature alignment network. Guided by the rotated anchors, the feature maps are adjusted so that they align with the rotated anchors used in the subsequent rotational detection network. In this section, we first introduce the basic definition of the rotated bounding box and then describe the anchor-guided feature alignment network.

### 2.2.1. Introduction to Rotated Bounding Box

In this paper, we adopt the geometric definition of the rotated bounding box used by MMRotate [50]. A rotated bounding box is represented by five parameters (*x*, *y*, *w*, *h*, *θ*). The pair (*x*, *y*) represents the location of the center point of the rotated bounding box. The pair (*w*, *h*) represents the length and width of the rotated bounding box. *θ* is the rotation angle of the rotated bounding box, and its value range is [−45°, 135°). Figure 7 shows more details of the geometric definition of the rotated bounding box.

**Figure 7.** Geometric representation of rotated bounding boxes.
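To make the five-parameter representation concrete, the following minimal sketch converts (*x*, *y*, *w*, *h*, *θ*) into corner points and wraps an angle into the range [−45°, 135°). The function names `rbox_to_corners` and `normalize_angle` are hypothetical, and the rotation direction and the choice of measuring *θ* from the *x*-axis to the *w* side are assumptions made here for illustration; Figure 7 and MMRotate [50] remain the authoritative definition.

```python
import math


def normalize_angle(theta_deg):
    """Wrap an angle in degrees into the half-open range [-45, 135)."""
    # A rotated rectangle is unchanged by a 180-degree turn, so the angle
    # can be taken modulo 180 and shifted into the target range.
    return (theta_deg + 45) % 180 - 45


def rbox_to_corners(x, y, w, h, theta_deg):
    """Return the four corners of the rotated box (x, y, w, h, theta)."""
    t = math.radians(theta_deg)
    c, s = math.cos(t), math.sin(t)
    # Corner offsets in the box's own (unrotated) frame.
    local = [(-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2)]
    # Assumed convention: rotate each offset by theta, then translate
    # it to the box centre (x, y).
    return [(x + dx * c - dy * s, y + dx * s + dy * c) for dx, dy in local]
```

Under these assumptions, a box with *θ* = 0 reduces to an ordinary axis-aligned box centred at (*x*, *y*), which is a convenient sanity check for whichever convention is adopted.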

### 2.2.2. Anchor-Guided Feature Alignment Network

Many existing networks that predict rotated bounding boxes use heuristically defined anchors with different scales and aspect ratios. As a result, these networks tend to suffer from misalignment between the rotated anchor boxes and the axis-aligned convolution features. Previous works [30,51] have shown that this misalignment degrades detection accuracy. To solve this problem, we introduce the alignment convolution layer to build the anchor-guided feature alignment network (AFAN), which extracts adaptive features according to the predicted shape of the refined anchor.

The core idea of AFAN is to use the alignment convolution layer to reset the sampling locations according to roughly generated rotated bounding boxes. As shown in Figure 8, the extracted feature maps are sent into a rough regression subnet consisting of two convolution layers, which generates rough rotated bounding boxes. From these boxes, we calculate the offsets between the box-guided sampling locations and the regular sampling locations of the convolution. Finally, these offsets and the original feature maps are input into the alignment convolution layer to reset the sampling locations.

**Figure 8.** Architecture of anchor-guided feature alignment network (AFAN).

The feature maps extracted by BAFPN are first input into a regression network that preliminarily locates the anchors and adjusts their directions. This yields a map of size *H* × *W* × 5, that is, one 5-dimensional anchor box per location. For each anchor box, we sample 9 points, each contributing a two-dimensional offset, to obtain the 18-dimensional offset field *O*. The offset field *O* is calculated as follows:

$$L_{p_0}^{p_k} = \frac{1}{S}\left((x, y) + \frac{1}{k}(w, h)\,p_k\right)R^T(\theta) \tag{8}$$

$$O = \left\{ L_{p_0}^{p_k} - p_0 - p_k \right\}_{p_k \in R} \tag{9}$$

where $L_{p_0}^{p_k}$ represents the sampling location, $S$ represents the stride of the feature maps, and $k$ represents the kernel size. $p_0$ represents a location on the feature map $X_A$, and $p_k \in R$ enumerates the locations of the regular sampling grid $R$ of the kernel, which is distinct from the rotation matrix $R(\theta) = (\cos\theta, -\sin\theta; \sin\theta, \cos\theta)^T$. $O$ represents the offset field.
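The offset computation in Equations (8) and (9) can be sketched for a single feature-map location as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `kernel_grid` and `offset_field` are hypothetical, the angle is taken in radians, and the sign convention of the rotation is assumed. The sketch also rotates only the scaled kernel offset and leaves the centre (*x*, *y*) unrotated, so that the sampled points stay centred on the box; Equation (8) as printed groups these terms differently.

```python
import math


def kernel_grid(k=3):
    """Regular k x k sampling grid R, e.g. (-1, -1) ... (1, 1) for k = 3."""
    r = k // 2
    return [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]


def offset_field(p0, box, stride, k=3):
    """Offsets L - p0 - pk (Eq. 9) for one feature-map location p0."""
    x, y, w, h, theta = box  # box in image pixels, theta in radians
    c, s = math.cos(theta), math.sin(theta)
    offsets = []
    for dx, dy in kernel_grid(k):
        # Scale the kernel offset to the box size (the (w, h) / k term).
        sx, sy = w / k * dx, h / k * dy
        # Assumed convention: rotate the scaled offset by theta.
        rx, ry = sx * c - sy * s, sx * s + sy * c
        # Map the image-space location onto the feature map (1 / S).
        lx, ly = (x + rx) / stride, (y + ry) / stride
        offsets.append((lx - p0[0] - dx, ly - p0[1] - dy))
    return offsets
```

With *k* = 3 the function returns 9 two-dimensional offsets, matching the 18-dimensional offset field. For an unrotated box centred on `p0` whose width and height both equal *k* times the stride, every offset is zero, so the sampling pattern falls back to that of an ordinary convolution.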

Finally, the offset field and the feature maps extracted by BAFPN are input into the alignment convolution layer. Through alignment convolution, the feature maps are aligned with the rotated bounding boxes. These aligned feature maps are then input into RDN for ship target detection and classification.

The alignment convolution is established by adding the offsets of the offset field *O* to the sampling locations of the ordinary convolution. The alignment convolution can be defined as follows:

$$X_A(p_0) = \sum_{p_k \in R,\, o \in O} w(p_k)\, X(p_0 + p_k + o) \tag{10}$$

where *o* represents the offset in the offset field *O* that corresponds to the sampling location *p<sub>k</sub>*.
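As an illustration of Equation (10), the sketch below evaluates the alignment convolution at one location of a single-channel feature map. It is a simplified example under stated assumptions: the helper names `bilinear` and `align_conv_at` are hypothetical, fractional sampling locations are resolved by bilinear interpolation with zeros outside the map, and batching and multiple channels are omitted.

```python
import math


def bilinear(feat, x, y):
    """Bilinearly sample a 2-D map at fractional (x, y); zero outside."""
    h, w = len(feat), len(feat[0])
    x0, y0 = math.floor(x), math.floor(y)
    value = 0.0
    # Blend the four integer neighbours by their distance-based weights.
    for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
        for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
            if 0 <= xi < w and 0 <= yi < h:
                value += wx * wy * feat[yi][xi]
    return value


def align_conv_at(feat, weights, p0, offsets, k=3):
    """Evaluate Eq. (10) at p0 = (x, y): sum of w(pk) * X(p0 + pk + o)."""
    r = k // 2
    grid = [(dx, dy) for dy in range(-r, r + 1) for dx in range(-r, r + 1)]
    out = 0.0
    # Each kernel weight is paired with its grid location pk and offset o.
    for wk, (dx, dy), (ox, oy) in zip(weights, grid, offsets):
        out += wk * bilinear(feat, p0[0] + dx + ox, p0[1] + dy + oy)
    return out
```

When all offsets are zero, the sampled points are the regular grid and the result equals that of an ordinary convolution at `p0`; non-zero offsets shift each sampling point towards the rotated box.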
