2.1.3. YOLOv5 Head

The YOLOv5 head consists of the layers that generate the final predictions from the anchor boxes for object detection. Two components are associated with the head: the loss function, which is used during training, and non-maximum suppression (NMS), which is applied to the predictions as post-processing. In YOLOv5, the binary cross-entropy loss function is used to calculate the classification loss and the confidence loss, while the complete IoU (CIoU) loss function is applied to calculate the localization loss (bounding box regression loss). The three losses are summed to give the total loss. The CIoU loss function considers three key geometric factors: the overlap area, the distance between the center points of the predicted and ground-truth boxes, and the aspect ratio, thus improving the speed and accuracy of bounding box regression [34]. NMS is mainly used to remove redundant detection boxes and retain the candidate box with the highest prediction probability as the final prediction box [35].
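To make the two components concrete, the following is a minimal sketch of a CIoU loss for a single pair of boxes and a greedy NMS, written in plain Python. It is an illustration under stated assumptions, not the YOLOv5 implementation: boxes are assumed to be given as (x1, y1, x2, y2) corner coordinates, and the function names iou, ciou_loss, and nms, the eps constant, and the default threshold of 0.5 are hypothetical choices made for this example.

```python
import math


def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def ciou_loss(pred, target, eps=1e-9):
    """CIoU loss = 1 - IoU + rho^2 / c^2 + alpha * v (eps is an assumed stabilizer)."""
    # Term 1: overlap area, measured by IoU.
    overlap = iou(pred, target)

    # Term 2: squared distance between box centers, normalized by the
    # squared diagonal of the smallest box enclosing both boxes.
    pcx, pcy = (pred[0] + pred[2]) / 2, (pred[1] + pred[3]) / 2
    tcx, tcy = (target[0] + target[2]) / 2, (target[1] + target[3]) / 2
    rho2 = (pcx - tcx) ** 2 + (pcy - tcy) ** 2
    enc_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enc_h = max(pred[3], target[3]) - min(pred[1], target[1])
    c2 = enc_w ** 2 + enc_h ** 2 + eps

    # Term 3: aspect-ratio consistency v and its trade-off weight alpha.
    pw, ph = pred[2] - pred[0], pred[3] - pred[1]
    tw, th = target[2] - target[0], target[3] - target[1]
    v = (4 / math.pi ** 2) * (math.atan(tw / (th + eps)) - math.atan(pw / (ph + eps))) ** 2
    alpha = v / (1 - overlap + v + eps)

    return 1 - overlap + rho2 / c2 + alpha * v


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: keep the highest-scoring box, drop boxes overlapping a kept one."""
    order = sorted(range(len(boxes)), key=lambda k: scores[k], reverse=True)
    keep = []
    for k in order:
        # Keep box k only if it does not overlap too much with any kept box.
        if all(iou(boxes[k], boxes[j]) <= iou_threshold for j in keep):
            keep.append(k)
    return keep


if __name__ == "__main__":
    boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))  # indices of the boxes that survive suppression
    print(ciou_loss((0, 0, 2, 2), (0, 0, 2, 2)))
```

In this sketch the loss is zero when the predicted and ground-truth boxes coincide, and it exceeds 1 for disjoint boxes because the center-distance term still contributes when the IoU is zero. A practical detector would typically run NMS separately per class and on batched tensors; the per-pair, pure-Python form here is chosen only for readability.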

YOLOv5 is a family of compound-scaled object detection models pretrained on the COCO dataset [36]. YOLOv5 is the latest object detection model developed by Ultralytics and represents its open-source research into future object detection methods. A network pretrained on an open-source dataset, such as COCO, was employed because such networks have been particularly successful at similar tasks: object segmentation, object detection, and classification. However, no single open-source dataset can be used in all circumstances, because the object classes of interest vary between applications. When the target classes change, the model should be retrained on a new dataset. In addition, the model needs to be further optimized for object occlusion and small-object detection.
