*3.1. Three-Class Approach*

The performance of each architecture without any type of data augmentation is shown in Table 4. As can be observed, Faster R-CNN (ResNet) is the object detector with the highest average mAP@50; however, this was not the case for mAP@75 or for the performance on small objects, where CenterNet performed better. SSD-MobileNet, RetinaNet and EfficientDet-D1 all showed lower performance, especially in the detection of small objects.
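The difference between mAP@50 and mAP@75 comes down to the IoU threshold at which a predicted box counts as a true positive: the stricter 0.75 threshold rewards detectors that localize precisely. As a minimal illustration (not the evaluation code used in this work), a prediction can pass the 0.5 threshold while failing the 0.75 one:

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# A prediction shifted 30 px against a 100x100 ground-truth box:
gt = (0, 0, 100, 100)
pred = (30, 0, 130, 100)
overlap = iou(gt, pred)  # 7000 / 13000, roughly 0.54

# Counted as a true positive at mAP@50, but as a miss at mAP@75.
```

The same detection set can therefore yield a noticeably lower mAP@75 than mAP@50, which is why the two metrics can rank the architectures differently.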

**Table 4.** The performance of each architecture without any data augmentation techniques. In the parentheses, the training performance is presented for mAP@50. The bold numbers correspond to the best performances.


In the following step, a single form of augmentation was used at a time, and all of the architectures were evaluated through the same process. Table 5 shows the performances when using geometrical augmentations. The performances improved in all cases. Faster R-CNN (ResNet) and CenterNet obtained the best results in mAP@50, mAP@75 and mAP@small; however, in this case, Faster R-CNN (ResNet) ranked first in the detection of small objects. Table 6 shows the performances when using only colour augmentations. As in the previous configuration, Faster R-CNN (ResNet) and CenterNet obtained the best performances.
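The key operational difference between the two augmentation families is that geometric transforms must update the bounding-box annotations along with the pixels, while colour transforms leave the boxes untouched. A minimal sketch (the function names and the specific transforms here are illustrative assumptions, not the authors' implementation):

```python
import random
import numpy as np

def geometric_augment(image, boxes):
    """Horizontal flip; boxes are (x1, y1, x2, y2) in pixel coordinates.

    The box coordinates must be mirrored together with the image,
    otherwise the annotations no longer match the pixels.
    """
    h, w = image.shape[:2]
    flipped = image[:, ::-1]
    flipped_boxes = [(w - x2, y1, w - x1, y2) for (x1, y1, x2, y2) in boxes]
    return flipped, flipped_boxes

def colour_augment(image, max_delta=0.2):
    """Random brightness scaling; box coordinates are unaffected."""
    delta = random.uniform(-max_delta, max_delta)
    scaled = image.astype(np.float32) * (1.0 + delta)
    return np.clip(scaled, 0, 255).astype(np.uint8)
```

Because colour augmentation never perturbs object geometry, it tends to help with illumination variance, whereas geometric augmentation directly diversifies object positions and scales, which is consistent with its stronger effect on localization-sensitive metrics such as mAP@75.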

**Table 5.** The performance of each architecture with only geometric augmentation active. In the parentheses, the training performance is presented for mAP@50. The bold numbers correspond to the best performances.



**Table 6.** The performance of each architecture with only colour augmentation active. In the parentheses, the training performance is presented for mAP@50. The bold numbers correspond to the best performances.

Table 7 shows the performances when using both colour and geometrical augmentations at the same time. In general, all of the architectures improved their mAP@50 except for Faster R-CNN (ResNet), whose performance remained similar. As in the previous experiments, CenterNet was the best detector according to mAP@75 (in addition to mAP@50), confirming its superiority in producing accurate localizations.

**Table 7.** The performance of each architecture with both augmentation types (geometrical and colour) active. In the parentheses, the training performance is presented for mAP@50. The bold numbers correspond to the best performances.


Figure 9 depicts a box plot summarizing the performance of the detectors across all of the data augmentations. As can be inferred from the previous tables, Faster R-CNN and CenterNet are the most consistent detectors at mAP@50 and mAP@75. However, the two architectures behaved differently. On the one hand, Faster R-CNN obtained the highest performances, but with a higher variance. On the other hand, CenterNet did not reach the maximum, but showed a more consistent performance. The other detectors (SSD, EfficientDet-D1 and RetinaNet) performed worse on average, but in some specific experiments they reached higher mAPs than CenterNet.
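The consistency-versus-peak trade-off visible in the box plot can be quantified with simple order statistics. The sketch below uses hypothetical mAP@50 values (illustrative placeholders, not the paper's measurements) to show how a higher maximum can coexist with a wider interquartile range:

```python
from statistics import quantiles

# Hypothetical mAP@50 scores across the augmentation settings
# (illustrative placeholders, NOT the paper's measurements).
faster_rcnn = [0.83, 0.86, 0.88, 0.88]
centernet = [0.85, 0.85, 0.86, 0.86]

def summary(scores):
    """Order statistics behind a box plot: median, interquartile range, max."""
    q1, q2, q3 = quantiles(scores, n=4)
    return {"median": q2, "iqr": q3 - q1, "max": max(scores)}

# With these placeholder values, the first detector reaches the higher
# maximum, while the second has the narrower IQR (more consistent runs).
```

Reading a detector's box this way separates the two questions the tables answer jointly: how high it can go under the best augmentation, and how much its result depends on which augmentation is chosen.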

**Figure 9.** Box plot showing the performance of the different object detectors. (**a**) mAP@50; (**b**) mAP@75.

In order to provide a better description of how the data augmentation stage performs in the detection pipelines, Figure 10 depicts how each augmentation method worked across all of the experiments and detectors. It can be observed that, in general, geometrical augmentation and complete augmentation show the highest medians; however, with mAP@50, complete augmentation shows a more dispersed distribution, making it a less reliable choice. On the other hand, with mAP@75, geometrical augmentation obtains a clearer superiority over the rest of the augmentation strategies. Finally, Figure 10 also shows that all types of augmentation can reach the maximum performance (around 87.5%) with mAP@50, and, with mAP@75, close to the maximum obtained by geometrical augmentation.

**Figure 10.** Box plot showing the detection performance with and without the augmentation transforms active. (**a**) mAP@50; (**b**) mAP@75.
