*3.3. Evaluation Indices*

We adopt the PASCAL VOC evaluation indices [79], with a detection IoU threshold of 0.50.

The recall (*r*) is defined by

$$r = \frac{\#tp}{\#tp + \#fn} \times 100\% \tag{13}$$

where *tp* denotes the true positives (i.e., correct detections), *fn* denotes the false negatives (i.e., missed detections), and # denotes the count. In essence, *r* equals the detection rate *Pd*.

The precision (*p*) is defined by

$$p = \frac{\#tp}{\#tp + \#fp} \times 100\% \tag{14}$$

where *fp* denotes the false positives (i.e., false alarms). In essence, *p* equals 1 − *Pf*, where *Pf* is the false alarm rate.

The average precision (*ap*) is defined by

$$ap = \int_0^1 p(r)\,dr \tag{15}$$

where *p*(*r*) denotes the precision-recall curve.
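In practice, the integral in Eq. (15) is evaluated numerically from a finite set of precision-recall points. A minimal sketch of the all-point interpolation used by the PASCAL VOC toolkit (precision envelope followed by rectangular integration) is given below; the function name and the sample points are illustrative, not taken from the paper.

```python
def average_precision(recalls, precisions):
    """Approximate Eq. (15) from sampled (recall, precision) pairs.

    recalls must be sorted in ascending order; values are fractions in [0, 1].
    """
    # Pad with sentinels at recall 0 and 1 so the integration covers [0, 1].
    mrec = [0.0] + list(recalls) + [1.0]
    mpre = [0.0] + list(precisions) + [0.0]
    # Precision envelope: make precision monotonically non-increasing,
    # as done in the VOC all-point interpolation.
    for i in range(len(mpre) - 2, -1, -1):
        mpre[i] = max(mpre[i], mpre[i + 1])
    # Sum rectangular areas over each recall step.
    return sum((mrec[i + 1] - mrec[i]) * mpre[i + 1]
               for i in range(len(mrec) - 1))
```

For example, two operating points at (r = 0.5, p = 1.0) and (r = 1.0, p = 0.5) yield an AP of 0.75 under this interpolation.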

The *f*1-score is defined by

$$f1 = 2 \times \frac{r \times p}{r + p} \tag{16}$$

In this paper, we mainly use *f*1 as the core accuracy index because it balances the detection rate against the false alarm rate, a property valued in both the traditional machine learning community and the modern deep learning community.
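Equations (13), (14), and (16) can be computed directly from the detection counts. The following sketch (function name is illustrative) shows the computation, guarding the degenerate case where both *r* and *p* are zero:

```python
def detection_metrics(tp, fp, fn):
    """Recall (Eq. 13), precision (Eq. 14), and f1 (Eq. 16) from counts."""
    r = tp / (tp + fn)   # recall: fraction of ground-truth objects detected
    p = tp / (tp + fp)   # precision: fraction of detections that are correct
    # Harmonic mean of r and p; defined as 0 when both are 0.
    f1 = 2 * r * p / (r + p) if (r + p) > 0 else 0.0
    return r, p, f1
```

For instance, 8 true positives with 2 false positives and 2 false negatives give r = p = f1 = 0.8 (i.e., 80%).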

We use the frames per second (FPS) to measure the detection speed, defined by

$$FPS = \frac{1}{T} \tag{17}$$

where *T* denotes the time required to complete detection on one image.
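Eq. (17) can be measured by timing the detector over a batch of images and dividing the image count by the elapsed time (so *T* is the mean per-image time). A minimal sketch, in which `detect` and `images` are hypothetical placeholders for the detector and the test set:

```python
import time

def measure_fps(detect, images):
    """Estimate FPS (Eq. 17) by averaging over a batch of images.

    detect : callable applied to one image (placeholder for the detector).
    images : iterable of images to process.
    """
    start = time.perf_counter()
    for img in images:
        detect(img)
    elapsed = time.perf_counter() - start  # total wall-clock time
    return len(images) / elapsed           # = 1 / (mean per-image time T)
```

Averaging over many images smooths out per-frame timing jitter; a warm-up pass is also common before timing, since the first inference often includes one-off setup costs.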
