*3.2. Multi-Scale Sample Discrimination*

The main factors affecting sample matching during training are the scale of the anchor boxes and the IOU discrimination threshold applied to the labelled ground truth. The large scale difference between small objects and the anchor boxes makes matching difficult under the existing discrimination conditions.

For the sampling process of positive samples, Figure 2a shows the matching results when the default FPN anchor shapes are adopted with an IOU discrimination threshold of 0.7, where the red rectangle indicates the manually marked area containing the object. As can be seen, none of the anchor boxes is successfully matched with the object area. The image therefore cannot guide the training of the network parameters, because it contributes no positive samples during training. Figure 2b shows the matching results when the anchor box scale is reduced by half. The green rectangle indicates the labeled samples in the training set, and the red rectangles indicate the corresponding matched anchor boxes; the sample matching results are still far from ideal.

For small objects, the default IOU threshold of FPN is a stringent condition, resulting in poor matching even when the anchor box scale is reduced. Moreover, the design of the anchor boxes must account for objects of different sizes in the image data set, so simply reducing the anchor box scale may in turn cause matching failures for object samples of normal size. It is therefore hard to improve the matching probability for small objects solely by reducing the anchor box scale.

(**a**)default FPN anchor sizes

(**b**)half of the default FPN anchor sizes

**Figure 2.** Sampling Results Example on Small objects.

To address the above issue, a multi-scale positive sampling approach with a dynamic IOU discrimination threshold is proposed. The FPN method defines anchor boxes at five scale levels, A1∼A5, ordered by size. According to the anchor box scale, the proposed approach divides the positive samples into three intervals, from the smallest anchor size to the biggest; the criteria for the levels are as follows:

$$\begin{cases} \text{small\_positive} & : a_i \in \{A_1\} \\ \text{medium\_positive} & : a_i \in \{A_2, A_3\} \\ \text{big\_positive} & : a_i \in \{A_4, A_5\} \end{cases} \tag{1}$$

In Equation (1), *ai* represents the area of an anchor box, and A1∼A5 denote the preset areas of the anchor boxes at the five levels. The IOU discrimination thresholds of the small and medium anchor boxes are then decreased to 0.5 and 0.6, respectively, so that more small and medium boxes are matched. For large anchor boxes, the default discrimination threshold is kept. Lowering the positive sample discrimination threshold makes it easier for anchor boxes to match small object areas, thereby increasing the number of positive samples covering small objects.
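The scale-dependent thresholds above can be sketched as follows. The threshold values 0.5/0.6/0.7 come from the text; the area cutoffs for the A1∼A5 levels are an assumption based on the default FPN anchor scales (32² through 512²).

```python
# Hypothetical sketch of the dynamic positive-sample IOU thresholds.
# The A1..A5 anchor areas below are assumed to follow the default FPN
# scales; only the threshold values 0.5 / 0.6 / 0.7 come from the text.
FPN_ANCHOR_AREAS = {"A1": 32**2, "A2": 64**2, "A3": 128**2,
                    "A4": 256**2, "A5": 512**2}

def positive_iou_threshold(anchor_area: float) -> float:
    """Return the IOU threshold used to accept a positive sample."""
    if anchor_area <= FPN_ANCHOR_AREAS["A1"]:
        return 0.5   # small_positive: a_i in {A1}
    if anchor_area <= FPN_ANCHOR_AREAS["A3"]:
        return 0.6   # medium_positive: a_i in {A2, A3}
    return 0.7       # big_positive: a_i in {A4, A5}, FPN default
```

An anchor is then accepted as a positive sample when its IOU with a ground-truth box exceeds the threshold returned for its area.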

Theoretically, lowering the matching threshold for the large-size anchor boxes can also effectively increase the number of matching anchor boxes. However, compared with the small object area, the large object has the following two differences:


the network training process. Therefore, reducing the IOU discrimination threshold yields only a limited improvement in detection performance for large objects.

**Figure 3.** Sampling Results Example on Big Objects.

From the network training perspective, object detection approaches that are limited by computing resources often need to set an upper limit on the number of samples; part of the sampling results is discarded randomly when too many samples are collected. Taking FPN as an example, the upper limit on the total number of samples is usually 256, with 128 slots arranged for positive and negative samples respectively, and redundant samples are discarded. In this paper, we argue that samples from small object areas should have higher priority than those from large objects. When an image contains a combination of small and large objects, the first priority is to ensure that a sufficient number of small object area samples are selected. Therefore, the same IOU threshold as in the original FPN method is maintained for large object areas, and their number of positive samples is not increased.
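This prioritization can be sketched minimally as below, assuming each positive sample is represented by its anchor area and using a hypothetical `SMALL_AREA` cutoff for the A1 level; the quota of 128 follows the FPN setting quoted above.

```python
import random

# Hypothetical sketch of small-object-first positive sampling under a
# fixed quota. Samples are represented only by their anchor areas;
# SMALL_AREA is an assumed cutoff for the A1 level.
SMALL_AREA = 32 ** 2

def select_positives(areas, limit=128):
    """Keep small-object positives first, then fill remaining slots randomly."""
    small = [a for a in areas if a <= SMALL_AREA]
    rest = [a for a in areas if a > SMALL_AREA]
    random.shuffle(rest)   # redundant large samples are discarded at random
    return (small + rest)[:limit]
```

When small and large positives together exceed the quota, every small-object sample survives and only large-object samples are dropped.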

For negative sample sampling, besides considering the match between the anchor box and the object size, it is also necessary to consider the effect of different discrimination difficulties on the accuracy of the algorithm. The proposed approach divides the negative samples into easy and hard negative samples depending on the IOU threshold. Easy negative samples help the network converge quickly, whereas the detection accuracy mainly depends on hard negative samples. Therefore, when collecting negative samples, the ratio of hard to easy negative samples is balanced. Figure 4 shows an example of negative sampling results, where the blue, green and red rectangles denote small, medium and big negative samples; most of them are easy and small samples.

**Figure 4.** Negative Sample Results of Random Sampling.

To address the above problem, Libra RCNN proposes a balanced sampling method that ensures the diversity of negative samples: first, the negative samples are divided into different intervals according to the IOU between the anchor boxes and the ground truth; second, the negative sample quota is divided equally among the intervals, and sampling is balanced within each interval. In the FPN method, negative samples are defined as the anchors whose IOU with the ground truth is lower than 0.3; Libra RCNN further divides them into easy, medium and hard negative intervals, defined as follows:

$$\begin{cases} \text{easy\_negative} & : IOU \in [0, 0.1) \\ \text{medium\_negative} & : IOU \in [0.1, 0.2) \\ \text{hard\_negative} & : IOU \in [0.2, 0.3) \end{cases} \tag{2}$$
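Equation (2) amounts to binning each negative anchor by its IOU with the ground truth; a minimal sketch of this binning, assuming negatives have already been filtered to IOU below 0.3 as in FPN:

```python
def libra_negative_interval(iou: float) -> str:
    """Classify a negative sample into the intervals of Equation (2)."""
    # Negatives are assumed pre-filtered to the FPN range IOU < 0.3.
    assert 0.0 <= iou < 0.3, "negative samples have IOU below 0.3 in FPN"
    if iou < 0.1:
        return "easy_negative"
    if iou < 0.2:
        return "medium_negative"
    return "hard_negative"
```

Libra RCNN then splits the negative quota equally across the three intervals and samples uniformly within each.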

Based on Libra RCNN, a balanced negative sampling method combining sample scale and difficulty is proposed. The negative samples are divided into 8 intervals, as shown in Equation (3). For medium and big negative samples, this approach adopts the same difficulty division as Libra RCNN; for instance, easy\_negative\_medium denotes the samples with *IOU* ∈ [0, 0.1) and scale *ai* ∈ {*A*2, *A*3}. For small negative samples, since the IOU discrimination threshold of positive samples is adjusted to 0.5, the Libra RCNN division easily causes confusion between positive and negative samples; this approach therefore adjusts the division accordingly and splits them into only two intervals.

$$\begin{cases} \text{easy\_negative\_small} & : IOU \in [0, 0.1),\ a_i \in \{A_1\} \\ \text{hard\_negative\_small} & : IOU \in [0.1, 0.2),\ a_i \in \{A_1\} \\ \text{easy\_negative\_medium} & : IOU \in [0, 0.1),\ a_i \in \{A_2, A_3\} \\ \text{medium\_negative\_medium} & : IOU \in [0.1, 0.2),\ a_i \in \{A_2, A_3\} \\ \text{hard\_negative\_medium} & : IOU \in [0.2, 0.3),\ a_i \in \{A_2, A_3\} \\ \text{easy\_negative\_big} & : IOU \in [0, 0.1),\ a_i \in \{A_4, A_5\} \\ \text{medium\_negative\_big} & : IOU \in [0.1, 0.2),\ a_i \in \{A_4, A_5\} \\ \text{hard\_negative\_big} & : IOU \in [0.2, 0.3),\ a_i \in \{A_4, A_5\} \end{cases} \tag{3}$$
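The 8-interval division of Equation (3) can be sketched as a lookup combining scale and difficulty. The interval boundaries follow the equation; the area cutoffs for the A1∼A5 levels are again an assumption based on the default FPN anchor scales.

```python
def negative_interval(iou: float, area: float) -> str:
    """Classify a negative sample by scale and difficulty per Equation (3).

    Area cutoffs for A1..A5 are assumed to match the default FPN anchor
    areas (32^2 .. 512^2); only the IOU boundaries come from the equation.
    """
    if area <= 32**2:        # small: a_i in {A1}, two intervals only
        scale, bins = "small", [(0.1, "easy"), (0.2, "hard")]
    elif area <= 128**2:     # medium: a_i in {A2, A3}
        scale, bins = "medium", [(0.1, "easy"), (0.2, "medium"), (0.3, "hard")]
    else:                    # big: a_i in {A4, A5}
        scale, bins = "big", [(0.1, "easy"), (0.2, "medium"), (0.3, "hard")]
    for upper, difficulty in bins:
        if iou < upper:
            return f"{difficulty}_negative_{scale}"
    raise ValueError("IOU outside the negative-sample range for this scale")
```

The negative quota is then balanced across these intervals in the same manner as Libra RCNN, while keeping small negatives clear of the lowered 0.5 positive threshold.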
