*2.4. Loss Function*

The loss function of RBFA-Net is composed of two parts: the loss of AFAN and the loss of RDN. Each part is the sum of a classification loss and a regression loss, so the two parts take the same form. The formulae are as follows:

$$L = \frac{1}{N\_A} L\_{AFAN} + \frac{\lambda}{N\_D} L\_{RDN}$$

$$L\_{AFAN} = \sum\_{i=1}^{N} L\_{cls} \left( p\_i^A, p\_i^\* \right) + \sum\_{i=1}^{N} p\_i^\* L\_{reg} \left( \mathbf{t}\_i, \mathbf{t}\_i^\* \right) \tag{11}$$

$$L\_{RDN} = \sum\_{i=1}^{N} L\_{cls} \left( p\_i, p\_i^\* \right) + \sum\_{i=1}^{N} p\_i^\* L\_{reg} \left( \mathbf{t}\_i^D, \mathbf{t}\_i^\* \right)$$
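The composition above can be sketched in plain Python (a minimal illustration, assuming per-sample classification and regression terms have already been computed; all function and variable names here are ours, not the paper's):

```python
def total_loss(afan_terms, rdn_terms, n_a, n_d, lam=1.0):
    """Combine the AFAN and RDN losses as in the first equation above.

    afan_terms / rdn_terms: lists of (cls_loss, reg_loss, is_positive)
    tuples, one per sample; the regression term only counts for positive
    samples (p_i^* = 1). n_a / n_d are the normalizers N_A and N_D, and
    lam is the balancing hyper-parameter lambda.
    """
    l_afan = sum(c + (r if pos else 0.0) for c, r, pos in afan_terms)
    l_rdn = sum(c + (r if pos else 0.0) for c, r, pos in rdn_terms)
    return l_afan / n_a + lam * l_rdn / n_d
```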

where $p\_i$ represents the predicted class probability and $p\_i^\*$ represents the ground-truth category: $p\_i^\* = 1$ when sample $i$ is a positive one, else $p\_i^\* = 0$. $p\_i^A$ indicates the prediction probability obtained by AFAN. $\lambda$ is a hyper-parameter to balance the loss of the alignment network and that of the detection network. $t\_i^\*$ represents the offset between the $i$-th sample and the ground truth, $t\_i$ represents the offset between the $i$-th prediction and the ground truth, and $t\_i^D$ indicates the offset obtained by RDN. We use focal loss [13] and balanced L1 loss [48] as the classification loss $L\_{cls}$ and regression loss $L\_{reg}$, respectively.
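Since focal loss serves as $L\_{cls}$, a minimal NumPy sketch may help ($\alpha = 0.25$ and focusing parameter $2$ are the common defaults from Ref. [13]; the function name and vectorized form are ours):

```python
import numpy as np

def focal_loss(p, target, alpha=0.25, gamma=2.0, eps=1e-12):
    """Binary focal loss: -alpha_t * (1 - p_t)^gamma * log(p_t).

    p      : predicted probability of the positive class
    target : 1 for positive samples, 0 for negatives
    """
    p = np.clip(p, eps, 1.0 - eps)
    # Probability assigned to the true class.
    p_t = np.where(target == 1, p, 1.0 - p)
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    # (1 - p_t)^gamma down-weights easy, well-classified samples.
    return -alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)
```

The focusing term makes confidently classified samples contribute almost nothing, so training concentrates on hard examples.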

The main purpose of the regression subnet is to predict the location, size and angle of the rotated bounding boxes. According to the definition of the rotated bounding box, the regression offsets $t\_i^\*$ and $t\_i$ can be denoted by

$$\begin{aligned} t\_x^\* &= \frac{G\_x - A\_x}{A\_w}, \; t\_y^\* = \frac{G\_y - A\_y}{A\_h} \\ t\_w^\* &= \log \frac{G\_w}{A\_w}, \; t\_h^\* = \log \frac{G\_h}{A\_h} \end{aligned}$$

$$t\_\theta^\* = \tan(G\_\theta - A\_\theta)$$

$$\begin{aligned} t\_x &= \frac{B\_x - A\_x}{A\_w}, \; t\_y = \frac{B\_y - A\_y}{A\_h} \\ t\_w &= \log \frac{B\_w}{A\_w}, \; t\_h = \log \frac{B\_h}{A\_h} \end{aligned} \tag{12}$$

$$t\_\theta = \tan(B\_\theta - A\_\theta)$$

where $G\_i$, $A\_i$ and $B\_i$ represent the five-tuple coordinates $i \in (x, y, w, h, \theta)$ of the ground truth, anchor box and predicted bounding box, respectively. $t\_i^\*$ represents the regression offset between the ground truth and the anchor box, and $t\_i$ represents the regression offset between the predicted bounding box and the anchor box.
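Eq. (12) can be sketched as an encode/decode pair (plain Python; the function names and the radian convention for $\theta$ are our assumptions, not the paper's code):

```python
import math

def encode_offsets(G, A):
    """Encode the regression target t* between a ground-truth box G and
    an anchor A, following Eq. (12). Boxes are (x, y, w, h, theta)
    five-tuples with theta in radians."""
    Gx, Gy, Gw, Gh, Gt = G
    Ax, Ay, Aw, Ah, At = A
    tx = (Gx - Ax) / Aw
    ty = (Gy - Ay) / Ah
    tw = math.log(Gw / Aw)
    th = math.log(Gh / Ah)
    tt = math.tan(Gt - At)
    return (tx, ty, tw, th, tt)

def decode_offsets(t, A):
    """Invert the encoding: recover box B from offsets t and anchor A.
    Note atan only recovers angle differences within (-pi/2, pi/2)."""
    tx, ty, tw, th, tt = t
    Ax, Ay, Aw, Ah, At = A
    return (Ax + tx * Aw, Ay + ty * Ah,
            Aw * math.exp(tw), Ah * math.exp(th),
            At + math.atan(tt))
```

Encoding then decoding against the same anchor is a round trip, which is a quick sanity check for any box coder.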

Like Refs. [13,55,56], we use focal loss as the classification loss $L\_{cls}$. However, previous work has shown that an imbalance between the classification loss and the regression loss harms detection accuracy. Researchers usually rebalance the two by adjusting the loss weights, but directly increasing the weight of the regression loss makes the network more sensitive to outliers. This is unfavorable for SAR images, which are often disturbed by heavy background and speckle noise that already degrades detection accuracy. Therefore, we use balanced L1 loss as the regression loss $L\_{reg}$ instead of the widely used smooth L1 loss. The formula of balanced L1 loss is as follows:

$$L\_b(x) = \begin{cases} \frac{a}{b}(b|x| + 1)\ln(b|x| + 1) - a|x|, & \text{if } |x| < 1 \\ \gamma|x| + C, & \text{otherwise} \end{cases} \tag{13}$$

where $a$, $b$ and $\gamma$ are hyper-parameters satisfying $a\ln(b + 1) = \gamma$. Following the configuration in Ref. [48], we set $a = 0.5$ and $\gamma = 1.5$. $x$ is the difference between the predicted value and the ground truth.
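Eq. (13) can be transcribed directly, with $b$ derived from the constraint $a\ln(b+1)=\gamma$ and $C$ chosen so the two branches meet at $|x| = 1$ (a sketch; the constant names are ours):

```python
import math

# Hyper-parameters from Ref. [48]: a = 0.5, gamma = 1.5, and b chosen
# so that a * ln(b + 1) = gamma holds.
A_PARAM = 0.5
GAMMA = 1.5
B_PARAM = math.exp(GAMMA / A_PARAM) - 1.0   # ~19.085
C_PARAM = GAMMA / B_PARAM - A_PARAM         # continuity of L_b at |x| = 1

def balanced_l1(x):
    """Balanced L1 loss L_b(x) of Eq. (13); x is the regression residual."""
    ax = abs(x)
    if ax < 1.0:
        # Inlier branch: log-shaped, grows faster than smooth L1 near 0.
        return (A_PARAM / B_PARAM) * (B_PARAM * ax + 1.0) \
            * math.log(B_PARAM * ax + 1.0) - A_PARAM * ax
    # Outlier branch: plain linear growth, like L1.
    return GAMMA * ax + C_PARAM
```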

As Ref. [48] points out, compared with smooth L1 loss, balanced L1 loss increases the gradient contributed by accurate samples (inliers) while limiting the influence of outliers. Under smooth L1 loss, inliers contribute on average only about 30% as much gradient as outliers, which makes the network very sensitive to outliers. Balanced L1 loss raises the contribution of inliers to the loss and reduces this sensitivity. Therefore, using balanced L1 loss is helpful for ship detection, especially for ship detection in inshore scenes.
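This gradient claim can be checked directly: for $|x| < 1$ the derivative of balanced L1 is $a\ln(b|x| + 1)$, while that of smooth L1 is $x$ itself, so inliers receive a larger gradient under balanced L1 (a sketch under the settings $a = 0.5$, $\gamma = 1.5$ above):

```python
import math

A_PARAM, GAMMA = 0.5, 1.5
B_PARAM = math.exp(GAMMA / A_PARAM) - 1.0   # from a * ln(b + 1) = gamma

def grad_balanced_l1(x):
    """Derivative of balanced L1 for |x| < 1: sign(x) * a * ln(b|x| + 1)."""
    return math.copysign(A_PARAM * math.log(B_PARAM * abs(x) + 1.0), x)

def grad_smooth_l1(x):
    """Derivative of smooth L1 for |x| < 1: simply x."""
    return x
```

At $|x| = 1$ the balanced-L1 gradient reaches $a\ln(b+1) = \gamma$, matching the slope of the linear outlier branch, so the gradient of outliers stays capped at $\gamma$ while inliers are promoted.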
