### *3.4. Test Time Augmentation*

Data augmentation is a common technique for enlarging a training dataset and reducing the risk of overfitting. Test time augmentation (TTA), by contrast, applies data augmentation at inference time to improve the predictive capability of a neural network. As shown in the pipeline in Figure 4, we create multiple augmented copies of a sample image and then make predictions for both the original and the synthetic samples. The prediction results are bounding box coordinates with corresponding confidence scores for each augmented sample as well as for the original test image. TTA is thus an ensemble method in which multiple augmented versions of a test image are evaluated by a trained model, and the final decision is made by the weighted boxes fusion algorithm. To balance computational complexity against performance, we select three augmentation methods (horizontal flip, vertical flip, and 90-degree rotation) to evaluate the improvement achieved by TTA.
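As an illustration, the three augmentations and the mapping of predicted boxes back to the original image frame can be sketched as follows. This is a minimal NumPy sketch under our own assumptions: the `[x_min, y_min, x_max, y_max]` box format and the helper names are not from the original work, and the actual detector call is omitted.

```python
import numpy as np

def augment(img):
    """Generate the TTA views: identity, horizontal flip,
    vertical flip, and 90-degree (counter-clockwise) rotation."""
    return {
        "identity": img,
        "hflip": img[:, ::-1],
        "vflip": img[::-1, :],
        "rot90": np.rot90(img),
    }

def invert_box(box, mode, h, w):
    """Map a box [x_min, y_min, x_max, y_max] predicted on an
    augmented view back to original-image coordinates, so that
    boxes from all views can be fused together."""
    x1, y1, x2, y2 = box
    if mode == "identity":
        return [x1, y1, x2, y2]
    if mode == "hflip":                  # x -> w - x
        return [w - x2, y1, w - x1, y2]
    if mode == "vflip":                  # y -> h - y
        return [x1, h - y2, x2, h - y1]
    if mode == "rot90":                  # inverse of CCW rot: x = w - y', y = x'
        return [w - y2, x1, w - y1, x2]
    raise ValueError(mode)
```

After each augmented view is run through the detector, every predicted box is passed through `invert_box` so that all candidates live in one coordinate frame before weighted boxes fusion.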

**Figure 4.** Test time augmentation applied to the trained model. The bounding box predictions are fused through weighted boxes fusion to highlight the important regions and suppress false detections.

### *3.5. Weighted Boxes Fusion Algorithm*

Weighted boxes fusion (WBF) [**?** ] is a key step for efficiently merging the predicted positions and confidence scores in TTA. Common bounding box fusion algorithms such as NMS and soft-NMS [**?** ] also work well for selecting bounding boxes by removing overlapping boxes below a threshold. However, they fail to consider the contribution of every predicted bounding box. The WBF method does not discard any bounding box; instead, it uses the classification confidence score of each predicted box to produce a combined predicted rectangle of high quality. The detailed WBF algorithm is described in Table 2, and the notations used in the table are summarized as follows: **B** is the list of predicted BBoxes from all augmented inferences; **F** is the list of fused boxes, one per cluster; **L** is the list of box clusters, where each position "**pos**" in **L** holds the boxes matched to the corresponding fused box in **F**; **N** is the number of inference methods; **THR** is the IoU threshold for matching; *T* is the number of BBoxes in a cluster; and *C* is the confidence score of a box.
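Step 2 of the algorithm matches boxes by their intersection over union against the threshold **THR**. A minimal sketch of that IoU computation (assuming `[x_min, y_min, x_max, y_max]` box coordinates; the function name is ours) could be:

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes
    given as [x_min, y_min, x_max, y_max]."""
    iw = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    ih = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = iw * ih
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```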



**Table 2.** Weighted boxes fusion for merging BBoxes in the test time augmentation method.

**Step 1:** Initialize the lists **L**, **B**, and **F** and the number of inference methods **N**, and store the predicted BBoxes in **B**. **Step 2:** Iterate over the BBoxes in **B** and search **F** for a matching BBox with **IoU**(F*i*, B*i*) > **THR**.

– **Step 2.1:** If no matching BBox is found, dequeue the BBox from **B** and add it to both **L** and **F**.

– **Step 2.2:** If a matching BBox is found, dequeue the BBox from **B** and add it to **L** at the position "**pos**" of the matching box in **F**.

**Step 3:** For each position "**pos**" in **F**, recalculate the confidence score and coordinates of the fused BBox using Equations (1)–(5): the coordinates are the confidence-weighted average of the coordinates of the boxes in the cluster, where the weight of each BBox is its confidence score, so boxes with higher confidence contribute more than those with lower confidence.

$$\begin{array}{ll} C = \frac{\sum_{i=1}^{T} C_{i}}{T} & \text{(1)}\\ x_{\text{min}} = \frac{\sum_{i=1}^{T} C_{i} \cdot x_{\text{min},i}}{\sum_{i=1}^{T} C_{i}} & \text{(2)}\\ x_{\text{max}} = \frac{\sum_{i=1}^{T} C_{i} \cdot x_{\text{max},i}}{\sum_{i=1}^{T} C_{i}} & \text{(3)}\\ y_{\text{min}} = \frac{\sum_{i=1}^{T} C_{i} \cdot y_{\text{min},i}}{\sum_{i=1}^{T} C_{i}} & \text{(4)}\\ y_{\text{max}} = \frac{\sum_{i=1}^{T} C_{i} \cdot y_{\text{max},i}}{\sum_{i=1}^{T} C_{i}} & \text{(5)} \end{array}$$

**Step 4:** Rescale the confidence score as *C* = *C* × *T*/*N*. If only a few BBoxes fall at the same position "**pos**", the detected region is less likely to belong to a ground-truth category, so its confidence is reduced accordingly.
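The four steps above can be sketched end to end as follows. This is a simplified, single-class illustration rather than the reference implementation: the function names, the `[x_min, y_min, x_max, y_max]` box format, and the default threshold value are our assumptions.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes in [x_min, y_min, x_max, y_max] form."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def weighted_boxes_fusion(boxes, scores, n_models, thr=0.55):
    """Fuse overlapping predictions following Steps 1-4 of Table 2.

    boxes    -- predicted boxes [x_min, y_min, x_max, y_max] from all TTA runs
    scores   -- confidence score of each box
    n_models -- N, the number of inference (augmentation) methods
    thr      -- THR, the IoU threshold for matching
    Returns a list of (fused_box, fused_confidence) tuples.
    """
    clusters = []   # L: the member (box, score) pairs of each cluster
    fused = []      # F: the current fused (box, confidence) per cluster
    for box, score in zip(boxes, scores):               # Step 2: scan B
        match = None
        for pos, (fbox, _) in enumerate(fused):
            if box_iou(fbox, box) > thr:
                match = pos
                break
        if match is None:                               # Step 2.1: new cluster
            clusters.append([(box, score)])
            fused.append((np.asarray(box, dtype=float), score))
            continue
        clusters[match].append((box, score))            # Step 2.2: join cluster
        # Step 3 (Eqs. 1-5): confidence-weighted average of the cluster
        c = np.array([s for _, s in clusters[match]])
        b = np.array([bx for bx, _ in clusters[match]], dtype=float)
        fused[match] = ((c[:, None] * b).sum(axis=0) / c.sum(), c.mean())
    # Step 4: rescale each cluster's confidence by T / N
    return [(fbox.tolist(), conf * len(members) / n_models)
            for members, (fbox, conf) in zip(clusters, fused)]
```

For example, two strongly overlapping boxes with scores 0.9 and 0.6 fuse into one box whose coordinates lean toward the higher-confidence prediction, while an isolated box found by only one of two inference runs keeps its coordinates but has its confidence halved by the *T*/*N* factor.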
