*3.2. Apple Detection Effect in Natural State*

Apple recognition in complex environments remains a research challenge. To verify how well the trained model recognizes fruit in different states, we ran detection on test-set images of unbagged apples, bagged apples, and apples at night. Figure 8 shows the detection results in a natural environment using the ShufflenetV2-YOLOX model. According to these results, the model proposed in this paper recognizes apples well in all three situations and meets the accuracy requirements of the apple picking robot.

**Figure 8.** Apple detection results in a natural environment using the ShufflenetV2-YOLOX network model.

For daytime images of unbagged apples, the model detects most of the apples; detection errors occur only for a few overlapping or very distant apples. Bagged apples are harder: in these images the fruit is not only touching, overlapping, and occluded, but also irregular in shape because of the film covering it. Gaps between the fruit and the film obscure the texture and color characteristics of the apple surface. As a result, bagged apples are identified less accurately than unbagged apples.

At night, ambient light is low, so apples close to a light source show more distinctive features than apples far from it. Apples near the light source are therefore easily detected, while apples away from it are difficult to detect. This is the biggest obstacle to nighttime detection. In future work, the overall result could be improved by planning the lighting system to achieve more uniform illumination.

Some occluded or small targets may be missed because of the 416 × 416 input image size, a limitation shared by all the models. Increasing the input size improves detection to some extent, but at the expense of detection speed. For example, ShufflenetV2-YOLOX runs at 65 FPS with a 416 × 416 input and at 60 FPS with a 640 × 640 input. Although this is a reduction of 5 FPS, detection is much better and many small targets can be detected. However, these small targets are apples that are far from the picking robot; they are not its working targets, so missing them does not affect the actual picking results. In subsequent work, an area threshold can be applied to ignore apples whose detection area is smaller than a certain percentage of the image. A target that occupies a larger proportion of the image is an apple at a shorter distance, so keeping only such targets facilitates the work of the picking robot.
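The area threshold proposed for subsequent work can be illustrated with a minimal sketch. The `Detection` class, the function names, and the default threshold of 1% of the image area are hypothetical choices made for illustration; the paper does not specify a threshold value or an implementation.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    """Hypothetical container for one detected box, in pixel coordinates."""
    x1: float
    y1: float
    x2: float
    y2: float
    score: float


def area_fraction(det: Detection, img_w: int, img_h: int) -> float:
    """Return the fraction of the image area covered by the box."""
    w = max(0.0, det.x2 - det.x1)
    h = max(0.0, det.y2 - det.y1)
    return (w * h) / float(img_w * img_h)


def filter_near_apples(dets: List[Detection], img_w: int, img_h: int,
                       min_area_frac: float = 0.01) -> List[Detection]:
    """Keep only boxes covering at least min_area_frac of the image.

    min_area_frac = 0.01 is an assumed value, not one taken from the paper.
    """
    return [d for d in dets
            if area_fraction(d, img_w, img_h) >= min_area_frac]


# Usage on a 416 x 416 input: the large box is kept, the small one is dropped.
detections = [
    Detection(0, 0, 100, 100, 0.90),  # near apple, large box
    Detection(200, 200, 210, 210, 0.80),  # distant apple, small box
]
kept = filter_near_apples(detections, 416, 416)
```

Using a fraction of the image area rather than an absolute pixel count keeps the threshold meaningful if the input size changes, for example from 416 × 416 to 640 × 640.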

Table 3 shows the precision and recall of the model in the three cases, together with the number of apples in the images and the number of apples detected. The 31 test images contained 372 apple targets, of which 345 were detected. These results indicate that our model effectively addresses the low recall of apple detection networks under bagged and nighttime conditions.
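For reference, precision and recall follow from the counts of true positives, false positives, and false negatives. The sketch below uses the standard definitions; the function names are illustrative, and the usage assumes that the 345 detected apples are all correct matches among the 372 targets, which is one reading of the totals above and not a figure reported per case in Table 3.

```python
def precision(tp: int, fp: int) -> float:
    """Fraction of detections that are real apples: TP / (TP + FP)."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of real apples that were detected: TP / (TP + FN)."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Assumption: all 345 detections are true positives, so 372 - 345 = 27
# targets were missed. The number of false positives is not stated in the
# text, so precision is not computed here.
total_targets = 372
detected = 345
overall_recall = recall(tp=detected, fn=total_targets - detected)
print(f"overall recall = {overall_recall:.4f}")  # prints 0.9274
```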


