*2.5. Evaluation Metrics*

Because comparing the results of different architectures is not trivial, several benchmarks have been developed and updated over the last few years for detection challenges, and different researchers may evaluate their models on different benchmarks. In this paper, the evaluation method for the broccoli maturity detection task was based on the Microsoft COCO: Common Objects in Context dataset [42], probably the most commonly used benchmark for object detection in images. As in COCO, the results of the broccoli maturity detection were reported using the Average Precision (AP). Precision is defined as the number of true positives (TP) divided by the sum of true positives and false positives (FP), while AP is the precision averaged across all of the unique recall levels. Because the calculation of AP involves only one class and, in object detection, there are usually several classes (three in this paper), the mean Average Precision (mAP) is defined as the mean of the AP across all classes.

To decide what counts as a TP, an Intersection over Union (IoU) threshold was used. IoU is defined as the area of the intersection between the predicted bounding box and the ground-truth bounding box divided by the area of their union. For example, Figure 8 shows different IoUs for the same image and ground truth (red box), obtained by varying the prediction (yellow box). If a threshold of IoU > 0.5 is configured, only the third image contains a TP, while the other two contain FPs. On the other hand, if IoU > 0.4 is used, the central image also counts as a TP. When multiple predictions correspond to the same ground truth, only the one with the highest IoU counts as a TP, while the remaining ones are considered FPs. Specifically, COCO reports the AP at two detection thresholds: IoU > 0.5 (traditional) and IoU > 0.75 (strict), which are the same thresholds used in this work. Additionally, the mAP for small objects (area smaller than 32 × 32 pixels) was also reported.
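Stated compactly, and letting $B_p$ and $B_{gt}$ denote a predicted and a ground-truth bounding box and $C$ the number of classes, the definitions above read:

$$
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{IoU} = \frac{\operatorname{area}(B_p \cap B_{gt})}{\operatorname{area}(B_p \cup B_{gt})}, \qquad
\text{mAP} = \frac{1}{C} \sum_{c=1}^{C} AP_c.
$$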
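To make the procedure concrete, the following is a minimal sketch in Python (NumPy only) of how IoU, per-class AP, and mAP can be computed. The (x1, y1, x2, y2) box format, the single-image scope, the confidence-ordered greedy matching, and all function names are simplifying assumptions for illustration; this is not the paper's or COCO's official evaluation code, which additionally handles multiple images, crowd regions, and area ranges such as the small-object category.

```python
import numpy as np

def iou(box_a, box_b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

def average_precision(preds, gts, iou_thr=0.5):
    """AP for a single class in a single image.

    preds: list of (confidence, box); gts: list of ground-truth boxes.
    Each ground truth may be matched at most once; any further prediction
    overlapping an already-matched ground truth counts as a false positive.
    """
    if not preds:
        return 0.0
    preds = sorted(preds, key=lambda p: p[0], reverse=True)  # by confidence
    matched = [False] * len(gts)
    tp = np.zeros(len(preds))
    fp = np.zeros(len(preds))
    for i, (_, box) in enumerate(preds):
        ious = [iou(box, g) for g in gts]
        j = int(np.argmax(ious)) if ious else -1
        if j >= 0 and ious[j] > iou_thr and not matched[j]:
            matched[j] = True  # TP: IoU above threshold, GT still unmatched
            tp[i] = 1.0
        else:
            fp[i] = 1.0        # FP: IoU too low or GT already matched
    tp_cum, fp_cum = np.cumsum(tp), np.cumsum(fp)
    recall = tp_cum / max(len(gts), 1)
    precision = tp_cum / (tp_cum + fp_cum)
    # Precision averaged over the recall levels: area under the
    # uninterpolated precision-recall curve (one common realization).
    prev_recall = np.concatenate(([0.0], recall[:-1]))
    return float(np.sum((recall - prev_recall) * precision))

def mean_average_precision(per_class_preds, per_class_gts, iou_thr=0.5):
    """mAP: mean of the per-class APs (three classes in this paper)."""
    aps = [average_precision(p, g, iou_thr)
           for p, g in zip(per_class_preds, per_class_gts)]
    return float(np.mean(aps))
```

Evaluating the same predictions once with `iou_thr=0.5` and once with `iou_thr=0.75` corresponds to the traditional and strict thresholds reported in this work.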
