*3.4. Loss Function*

Similar to other two-stage methods, the loss function is defined as the sum of the classification and regression losses:

$$L_{total} = L_{cls}^{RPN} + L_{bbox}^{RPN} + L_{cls}^{Cat} + L_{bbox}^{Rcg} \tag{6}$$

In Equation (6), $L_{cls}$ and $L_{bbox}$ denote the classification and regression losses of the MB-RPN and of the fine-tuning stage, respectively. Cross entropy is adopted to measure the classification loss:

$$L_{cls}(y_i, y_i^*) = -[y_i^* \log(y_i) + (1 - y_i^*) \log(1 - y_i)] \tag{7}$$

where $y_i$ and $y_i^*$ denote the predicted and annotated categories, respectively: in the MB-RPN, $y_i^*$ is 1 if the anchor is positive, while in the fine-tuning stage $y_i^*$ is 1 at the dimension of the label vector representing the object's category. $L_{bbox}$ denotes the smooth L1 regression loss [7]:
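As a minimal NumPy sketch of Equation (7) (the function name and the `eps` clipping constant are illustrative, not from the paper):

```python
import numpy as np

def binary_cross_entropy(y_pred, y_true, eps=1e-7):
    """Per-anchor cross-entropy loss, as in Eq. (7)."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return -(y_true * np.log(y_pred) + (1.0 - y_true) * np.log(1.0 - y_pred))

# A confident, correct prediction on a positive anchor yields a small loss.
loss = binary_cross_entropy(np.array([0.9]), np.array([1.0]))
print(loss)
```

The clipping step is a common numerical safeguard; it keeps the logarithm finite when the predicted probability saturates at 0 or 1.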

$$L_{bbox}\left(t_i, t_i^*\right) = \begin{cases} 0.5\left(t_i - t_i^*\right)^2, & \left|t_i - t_i^*\right| < 1 \\ \left|t_i - t_i^*\right| - 0.5, & \text{otherwise} \end{cases} \tag{8}$$

where $t_i$ and $t_i^*$ denote the predicted and annotated coordinates after the following scale transform:
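Equation (8) can be sketched as follows (a hypothetical helper, not the paper's implementation), showing the quadratic behavior near zero and the linear behavior elsewhere:

```python
import numpy as np

def smooth_l1(t_pred, t_true):
    """Smooth L1 loss, as in Eq. (8): quadratic for small errors,
    linear for large ones, which limits the gradient of outliers."""
    diff = np.abs(t_pred - t_true)
    return np.where(diff < 1.0, 0.5 * diff ** 2, diff - 0.5)

print(smooth_l1(np.array(0.5), np.array(0.0)))  # quadratic branch: 0.125
print(smooth_l1(np.array(2.0), np.array(0.0)))  # linear branch: 1.5
```

The two branches meet smoothly at $|t_i - t_i^*| = 1$, where both evaluate to 0.5.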

$$\begin{cases} t_x = (x - x_a) / w_a, & t_y = (y - y_a) / h_a \\ t_w = \log(w / w_a), & t_h = \log(h / h_a) \\ t_x^* = (x^* - x_a) / w_a, & t_y^* = (y^* - y_a) / h_a \\ t_w^* = \log(w^* / w_a), & t_h^* = \log(h^* / h_a) \end{cases} \tag{9}$$
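The anchor-relative encoding of Equation (9) can be sketched as below (boxes are assumed to be center-format `(x, y, w, h)` tuples; the function name is illustrative):

```python
import numpy as np

def encode_box(box, anchor):
    """Encode a box relative to an anchor, as in Eq. (9):
    center offsets are normalized by the anchor size, and
    width/height ratios are mapped to log space."""
    x, y, w, h = box
    xa, ya, wa, ha = anchor
    return np.array([(x - xa) / wa,
                     (y - ya) / ha,
                     np.log(w / wa),
                     np.log(h / ha)])

# A box identical to its anchor encodes to the zero vector.
print(encode_box((10.0, 10.0, 4.0, 4.0), (10.0, 10.0, 4.0, 4.0)))
```

The same transform is applied to both the predicted box (yielding $t$) and the annotated box (yielding $t^*$), so the regression loss in Equation (8) compares them in a scale-normalized space.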
