*2.5. Improvements of Loss Function*

The loss function of the YOLOv4 network during training consists of three parts: the bounding box regression loss *L<sub>ciou</sub>*, the confidence loss *L<sub>conf</sub>*, and the classification loss *L<sub>class</sub>*, as shown in Equation (11).

$$\begin{cases}
L_{ciou} = 1 - IoU + \dfrac{\rho^2(b, b^{gt})}{c^2} + \alpha v \\
L_{conf} = -\sum\limits_{i=0}^{S^2} \sum\limits_{j=0}^{B} I_{i,j}^{obj} \left[ \hat{C}_i^j \log(C_i^j) + (1 - \hat{C}_i^j) \log(1 - C_i^j) \right] - \lambda_{noobj} \sum\limits_{i=0}^{S^2} \sum\limits_{j=0}^{B} I_{i,j}^{noobj} \left[ \hat{C}_i^j \log(C_i^j) + (1 - \hat{C}_i^j) \log(1 - C_i^j) \right] \\
L_{class} = -\sum\limits_{i=0}^{S^2} I_{i,j}^{obj} \sum\limits_{c \in \text{classes}} \left[ \hat{P}_i^j \log(P_i^j) + (1 - \hat{P}_i^j) \log(1 - P_i^j) \right] \\
L_{loss} = L_{ciou} + L_{conf} + L_{class}
\end{cases} \tag{11}$$

where *S*<sup>2</sup> is the number of grid cells in the feature map, *B* is the number of a priori boxes per cell, and *λ<sub>noobj</sub>* is the weight coefficient of the no-object confidence term. *I<sup>obj</sup><sub>i,j</sub>* and *I<sup>noobj</sup><sub>i,j</sub>* indicate that a target is, or is not, present in the *j*-th a priori box of the *i*-th grid cell, respectively. *ρ* is the distance between the center points of the predicted box and the actual box, and *c* is the diagonal distance of the smallest closure area containing both boxes; *v* measures the consistency of their aspect ratios, and *α* is its trade-off weight. *b*, *w*, and *h* represent the center coordinates, width, and height of the predicted box, while *b<sup>gt</sup>*, *w<sup>gt</sup>*, and *h<sup>gt</sup>* represent those of the actual box, respectively. *C<sup>j</sup><sub>i</sub>* and *Ĉ<sup>j</sup><sub>i</sub>* represent the confidence of the predicted box and the labeled box, and *P<sup>j</sup><sub>i</sub>* and *P̂<sup>j</sup><sub>i</sub>* represent the class probability of the predicted box and the labeled box, respectively.
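As a concrete illustration, the composite loss of Equation (11) can be sketched in NumPy. This is a minimal sketch, not the authors' implementation: the function and argument names (`yolo_loss`, `obj_mask`, `lambda_noobj`) and the default no-object weight of 0.5 are assumptions, and the CIoU term follows the standard CIoU definition.

```python
import numpy as np

def bce(target, pred, eps=1e-7):
    """Binary cross-entropy term shared by the confidence and class losses."""
    pred = np.clip(pred, eps, 1.0 - eps)
    return -(target * np.log(pred) + (1.0 - target) * np.log(1.0 - pred))

def ciou_loss(box, box_gt, eps=1e-7):
    """Standard CIoU loss, 1 - IoU + rho^2/c^2 + alpha*v; boxes are (cx, cy, w, h)."""
    bx, by, bw, bh = box
    gx, gy, gw, gh = box_gt
    # intersection over union
    ix1, iy1 = max(bx - bw / 2, gx - gw / 2), max(by - bh / 2, gy - gh / 2)
    ix2, iy2 = min(bx + bw / 2, gx + gw / 2), min(by + bh / 2, gy + gh / 2)
    inter = max(ix2 - ix1, 0.0) * max(iy2 - iy1, 0.0)
    iou = inter / (bw * bh + gw * gh - inter + eps)
    # squared center distance rho^2 over squared enclosing-box diagonal c^2
    rho2 = (bx - gx) ** 2 + (by - gy) ** 2
    cw = max(bx + bw / 2, gx + gw / 2) - min(bx - bw / 2, gx - gw / 2)
    ch = max(by + bh / 2, gy + gh / 2) - min(by - bh / 2, gy - gh / 2)
    c2 = cw ** 2 + ch ** 2 + eps
    # aspect-ratio consistency term v and its trade-off weight alpha
    v = (4 / np.pi ** 2) * (np.arctan(gw / gh) - np.arctan(bw / bh)) ** 2
    alpha = v / (1 - iou + v + eps)
    return 1 - iou + rho2 / c2 + alpha * v

def yolo_loss(pred_conf, true_conf, pred_cls, true_cls, obj_mask,
              l_ciou, lambda_noobj=0.5):
    """Composite loss of Eq. (11): L_loss = L_ciou + L_conf + L_class.

    pred_conf, true_conf: (S*S, B) confidences of the predicted / labeled boxes
    pred_cls,  true_cls:  (S*S, B, n_classes) class probabilities
    obj_mask:             (S*S, B) boolean mask, True where a target exists (I^obj)
    l_ciou:               summed CIoU regression loss over the responsible boxes
    """
    conf_bce = bce(true_conf, pred_conf)
    l_conf = conf_bce[obj_mask].sum() + lambda_noobj * conf_bce[~obj_mask].sum()
    l_class = bce(true_cls, pred_cls)[obj_mask].sum()
    return l_ciou + l_conf + l_class
```

A perfectly overlapping box pair drives `ciou_loss` to zero, since the IoU term, center distance, and aspect-ratio penalty all vanish.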

When the model is used to detect cracks, the cracks themselves are small and occupy only a small proportion of the image relative to the background. To balance the foreground and background samples, a modulation coefficient *α* is added to the cross-entropy classification loss, drawing on the focal loss function of the RetinaNet network for reference, as shown in Equation (12).

$$L_{class} = -\sum_{i=0}^{S^2} I_{i,j}^{obj} \sum_{c \in \text{classes}} \left[ \hat{P}_i^j \left(1 - P_i^j\right)^{\alpha} \log(P_i^j) + \left(1 - \hat{P}_i^j\right) \left(P_i^j\right)^{\alpha} \log(1 - P_i^j) \right] \tag{12}$$
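The modulated class loss of Equation (12) can be sketched in NumPy as follows. The function name `focal_class_loss` and the default *α* = 2 are assumptions made for illustration (RetinaNet's focal loss uses a focusing exponent of 2); this is not the authors' implementation.

```python
import numpy as np

def focal_class_loss(pred_cls, true_cls, obj_mask, alpha=2.0, eps=1e-7):
    """Class loss of Eq. (12): cross-entropy with a focal modulation factor.

    The factors (1 - P)^alpha and P^alpha shrink the loss of well-classified
    (easy, mostly background) samples, so the rare crack foreground is not
    overwhelmed during training.  alpha=2.0 is an assumed default.
    """
    p = np.clip(pred_cls, eps, 1.0 - eps)
    term = (true_cls * (1.0 - p) ** alpha * np.log(p)
            + (1.0 - true_cls) * p ** alpha * np.log(1.0 - p))
    return -term[obj_mask].sum()
```

Compared with the plain cross-entropy of Equation (11), an easy positive sample with *P* = 0.9 keeps only (1 − 0.9)² = 0.01 of its original loss, while a hard sample with *P* = 0.1 keeps (1 − 0.1)² = 0.81 of it, which shifts training toward the hard crack pixels.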
