2.4.1. Small Object Recognition Layer

The walnut images contain small impurities, such as fragments of broken shell, so the detection model must be able to detect small objects. In the original YOLOv5 network, the downsampling factor is large and the deeper feature maps are correspondingly small; it is therefore difficult for the deeper feature maps to learn the features of small targets, which leads to missed detections of small impurities. To solve this problem, this paper adds a small object detection layer to the original YOLOv5 head, which continues to enlarge the feature map. After the 17th layer of the head, the feature map is upsampled and further processed so that it continues to expand. At the 20th layer, the resulting 160 × 160 feature map is concatenated with the feature map from the second layer of the backbone to obtain a larger feature map for small target detection.
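The effect of the added layer on feature-map resolution can be illustrated with a short calculation. The sketch below is illustrative only and assumes a 640 × 640 network input (the YOLOv5 default, which is consistent with a 160 × 160 map at a downsampling factor of 4) and the standard YOLOv5 detection strides of 8, 16 and 32; the function names and the 12-pixel fragment size are hypothetical and are not taken from this paper.

```python
# Assumption: 640 x 640 input, the YOLOv5 default (not stated in the text).
INPUT_SIZE = 640
ORIGINAL_STRIDES = (8, 16, 32)      # detection scales of the original head
EXTENDED_STRIDES = (4, 8, 16, 32)   # with the extra small-object scale


def feature_map_sizes(input_size, strides):
    """Map each downsampling factor to the side length of its feature map."""
    return {stride: input_size // stride for stride in strides}


def cells_covered(object_px, stride):
    """Approximate number of grid cells spanned by an object along one axis."""
    return object_px / stride


if __name__ == "__main__":
    print("original head:", feature_map_sizes(INPUT_SIZE, ORIGINAL_STRIDES))
    print("extended head:", feature_map_sizes(INPUT_SIZE, EXTENDED_STRIDES))
    # Hypothetical 12-pixel shell fragment: cells spanned at each scale.
    for stride in EXTENDED_STRIDES:
        print(f"stride {stride:2d}: {cells_covered(12, stride):.2f} cells")
```

Under these assumptions, a 12-pixel fragment spans 1.5 cells at the finest original scale (stride 8) but 3 cells on the added 160 × 160 map, which is why the extra layer preserves more spatial detail for small impurities.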

As shown in Figure 4, upsampling enlarges the feature map and thereby increases its spatial resolution, which is more conducive to detecting and recognising small targets. In this paper, upsampling is implemented by transposed convolution. Unlike ordinary convolution, transposed convolution inserts a zero-valued pixel between each pair of adjacent pixels of the input before convolving, so that the resulting feature map is larger than the input.
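The zero-insertion view of transposed convolution can be made concrete with a small example. The following is a minimal pure-Python sketch, not the implementation used in this paper: it handles a single channel, assumes a square kernel and no output padding, and all function names are hypothetical. With a stride of 2, one zero is inserted between neighbouring pixels, the result is padded, and an ordinary stride-1 convolution with the flipped kernel is applied.

```python
def insert_zeros(x, stride):
    """Place (stride - 1) zeros between neighbouring pixels of a 2-D list."""
    h, w = len(x), len(x[0])
    out_h, out_w = (h - 1) * stride + 1, (w - 1) * stride + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for i in range(h):
        for j in range(w):
            out[i * stride][j * stride] = x[i][j]
    return out


def zero_pad(x, p):
    """Surround a 2-D list with a border of p zeros on every side."""
    width = len(x[0]) + 2 * p
    rows = [[0.0] * p + list(row) + [0.0] * p for row in x]
    border = [[0.0] * width for _ in range(p)]
    return [r[:] for r in border] + rows + [r[:] for r in border]


def conv2d_valid(x, k):
    """Ordinary stride-1 sliding-window sum of products, without padding."""
    kh, kw = len(k), len(k[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [
        [
            sum(x[i + a][j + b] * k[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


def transposed_conv2d(x, k, stride=2):
    """Transposed convolution as zero insertion plus ordinary convolution.

    Assumes a square kernel; output side = (n - 1) * stride + kernel_size.
    """
    size = len(k)
    flipped = [row[::-1] for row in k[::-1]]  # rotate the kernel by 180 degrees
    expanded = zero_pad(insert_zeros(x, stride), size - 1)
    return conv2d_valid(expanded, flipped)


if __name__ == "__main__":
    feature_map = [[1, 2], [3, 4]]
    kernel = [[1, 1], [1, 1]]  # all-ones kernel chosen only for readability
    for row in transposed_conv2d(feature_map, kernel, stride=2):
        print(row)
```

In this toy case the 2 × 2 input becomes a 4 × 4 output, doubling each side. In a network the kernel weights are learned rather than fixed, so the layer can learn how to fill in the enlarged map.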
