*4.1. Quantitative Results*

Table 3 shows the quantitative results of ShadowDeNet on the SNL video SAR data. In Table 3, the quantitative results are obtained by progressively adding the proposed improvements to the experimental baseline Faster R-CNN [36]. More ablation studies on the impact of each improvement on the whole ShadowDeNet model are presented in Section 5, where each improvement is installed and removed individually.



¹ HESE denotes the histogram equalization shadow enhancement.
² TSAM denotes the transformer self-attention mechanism.
³ SDAL denotes the shape deformation adaptive learning.
⁴ SGAAL denotes the semantic-guided anchor-adaptive learning.
⁵ OHEM denotes the online hard example mining.
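As a reference for how HESE-style preprocessing operates, the following is a minimal sketch of plain global histogram equalization on an 8-bit image using NumPy; the authors' HESE variant may differ in details (e.g., clipping or local windows), so this only illustrates the underlying idea of stretching the dark shadow range.

```python
import numpy as np

def histogram_equalize(img: np.ndarray) -> np.ndarray:
    """Plain global histogram equalization for an 8-bit grayscale image.

    Dark shadow regions occupy the low end of the histogram; mapping
    gray levels through the normalized CDF stretches their contrast.
    """
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = np.cumsum(hist).astype(np.float64)
    cdf_min = cdf[cdf > 0][0]  # CDF value of the lowest occupied gray level
    # Build a lookup table mapping each gray level through the normalized CDF.
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[img]

# Toy low-contrast patch: values squeezed into [100, 120].
rng = np.random.default_rng(0)
patch = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
enhanced = histogram_equalize(patch)
print(enhanced.min(), enhanced.max())  # dynamic range stretched to 0 255
```

After equalization, the lowest occupied gray level maps to 0 and the highest to 255, which is why shadow regions become easier to separate from the surrounding clutter.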

From Table 3, one can draw the following conclusions:


among a large number of "static" dark shadow-like background negative samples. Consequently, the false alarm rate is greatly reduced.
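The role of hard negative mining in suppressing such false alarms can be sketched as follows. This is a minimal illustration of OHEM-style selection with hypothetical loss values, not the authors' implementation: only the negatives with the highest loss (the "static" shadow-like background regions most easily confused with moving-target shadows) are kept for training.

```python
import numpy as np

def ohem_select(losses: np.ndarray, is_negative: np.ndarray, keep: int) -> np.ndarray:
    """Online hard example mining (sketch): keep all positives plus the
    `keep` negative samples with the highest loss."""
    pos_idx = np.flatnonzero(~is_negative)
    neg_idx = np.flatnonzero(is_negative)
    # Sort negatives by descending loss and retain only the hardest ones.
    hard_neg = neg_idx[np.argsort(losses[neg_idx])[::-1][:keep]]
    return np.concatenate([pos_idx, hard_neg])

# Hypothetical per-sample losses; samples 0-3 are negatives, 4-5 positives.
losses = np.array([0.1, 2.0, 0.05, 1.5, 0.3, 0.9])
is_neg = np.array([True, True, True, True, False, False])
selected = ohem_select(losses, is_neg, keep=2)
print(sorted(selected.tolist()))  # positives 4, 5 plus hardest negatives 1, 3
```

Easy negatives (losses 0.1 and 0.05 above) are discarded, so gradient updates concentrate on the confusing background shadows.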


Table 4 shows the performance comparison with five other state-of-the-art detectors: Faster R-CNN, FPN, YOLOv3, RetinaNet, and CenterNet. All of them are retrained on the video SAR data used in this paper, with implementations that essentially follow their original reports. Their backbone networks also load ImageNet pretrained weights to ensure a fair comparison. We acknowledge that more advanced generic deep learning object detection models exist in the CV community; however, they have not yet been applied to video SAR moving shadow detection by scholars in the SAR community, so we do not compare ShadowDeNet with them, as this paper focuses on the SAR community. In contrast, Faster R-CNN, FPN, YOLOv3, RetinaNet, and CenterNet have been applied to video SAR moving shadow detection by Ding et al. [24], Wen et al. [25], Huang et al. [26], Yan et al. [27], Zhang et al. [28], Hu et al. [29], Wang et al. [30], and others, so they are selected. Moreover, ample evidence indicates that deep learning methods have far surpassed most traditional methods, so we omit the comparison with traditional methods, which usually rely on extensive handcrafted features.

**Table 4.** Performance comparison with five other state-of-the-art detectors. These detectors have been applied for video SAR moving target shadow detection in the SAR community. The best model is marked in bold; the second-best model is underlined. The IOU threshold is 0.50.


From Table 4, one can draw the following conclusions:


Figure 16 shows the accuracy curves (*r*, *p*, *ap*, and *f*1) of different methods as the IOU threshold is increased. From Figure 16, accuracy degrades as the IOU threshold grows. This is in line with common sense, because a larger IOU threshold places much greater demands on box regression performance. Furthermore, except for the recall–IOU curve in Figure 16a, most accuracy–IOU curves of ShadowDeNet lie above all the other curves, especially the *f*1–IOU curve in Figure 16d, which demonstrates the better detection performance of ShadowDeNet. The results in Table 4 are reported at an IOU threshold of 0.50, the same as the PASCAL VOC criterion [79]. Finally, Figure 16 shows that it is still necessary to further improve box localization performance. In the future, stronger coordinate-positioning networks should be designed to ensure better performance at large IOU thresholds.
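To make the role of the IOU threshold concrete, the following minimal sketch (with hypothetical boxes, not the paper's evaluation code) computes the IoU of a slightly shifted detection: it counts as a true positive at the PASCAL VOC threshold of 0.50 but would become a false positive at 0.75, which is why all accuracy curves fall as the threshold rises.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two (x1, y1, x2, y2) axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

gt = (10, 10, 50, 50)          # hypothetical ground-truth shadow box
det = (15, 15, 55, 55)         # slightly shifted detection
print(round(iou(gt, det), 3))  # ≈ 0.62
for thr in (0.50, 0.75):
    # True positive only when IoU clears the threshold.
    print(thr, iou(gt, det) >= thr)
```

A 5-pixel shift on a 40-pixel box already drops IoU to about 0.62, illustrating how sensitive large-threshold evaluation is to localization quality.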

Figure 17 shows the precision–recall (*p*–*r*) curves of different methods. In Figure 17, the curve of ShadowDeNet lies above all the other curves across almost the entire horizontal axis. This also demonstrates the better detection performance of ShadowDeNet.
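For reference, the precision, recall, and *f*1 metrics behind these curves follow directly from the true-positive, false-positive, and false-negative counts. A minimal sketch with hypothetical counts (not the paper's actual results):

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Standard detection metrics from confusion counts."""
    p = tp / (tp + fp)          # fraction of detections that are correct
    r = tp / (tp + fn)          # fraction of ground-truth shadows found
    f1 = 2 * p * r / (p + r)    # harmonic mean of precision and recall
    return p, r, f1

# Hypothetical counts for one point on a precision-recall curve.
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=20)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.9 0.818 0.857
```

Sweeping the confidence threshold trades false positives against false negatives, tracing out the *p*–*r* curve; a curve that stays higher, as ShadowDeNet's does, yields larger *ap* and *f*1.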

**Figure 16.** The accuracy curves of different methods at different IOU thresholds. (**a**) The curve between recall (*r*) and IOU; (**b**) the curve between precision (*p*) and IOU; (**c**) the curve between average precision (*ap*) and IOU; (**d**) the curve between *f*1 and IOU.

**Figure 17.** Precision–recall (*p*–*r*) curves of different methods.
