*Article* **A Real-Time Apple Targets Detection Method for Picking Robot Based on ShufflenetV2-YOLOX**

**Wei Ji \*, Yu Pan, Bo Xu and Juncheng Wang**

School of Electrical and Information Engineering, Jiangsu University, Zhenjiang 212013, China; ailqy369@163.com (Y.P.); xubo@ujs.edu.cn (B.X.); w18981051316@icloud.com (J.W.) **\*** Correspondence: jiwei@ujs.edu.cn

**Abstract:** In order to enable the picking robot to detect and locate apples quickly and accurately in the natural orchard environment, we propose an apple object detection method based on Shufflenetv2-YOLOX. This method takes YOLOX-Tiny as the baseline and uses the lightweight network Shufflenetv2, augmented with the convolutional block attention module (CBAM), as the backbone. An adaptive spatial feature fusion (ASFF) module is added to the PANet network to improve detection accuracy, and only two feature-extraction layers are used to simplify the network structure. The average precision (AP), precision, recall, and F1 of the trained network on the validation set are 96.76%, 95.62%, 93.75%, and 0.95, respectively, and the detection speed reaches 65 frames per second (FPS). The test results show that the AP of Shufflenetv2-YOLOX is 6.24% higher than that of YOLOX-Tiny, and the detection speed is 18% higher. It also outperforms the advanced lightweight networks YOLOv5-s, Efficientdet-d0, YOLOv4-Tiny, and Mobilenet-YOLOv4-Lite in both detection accuracy and speed. Meanwhile, the half-precision floating-point (FP16) model, accelerated with TensorRT on the embedded device Jetson Nano, reaches 26.3 FPS. This method can provide an effective solution for the vision system of the apple picking robot.

**Keywords:** machine vision; picking robot; apple detection; YOLOX; ShufflenetV2

**Citation:** Ji, W.; Pan, Y.; Xu, B.; Wang, J. A Real-Time Apple Targets Detection Method for Picking Robot Based on ShufflenetV2-YOLOX. *Agriculture* **2022**, *12*, 856. https://doi.org/10.3390/agriculture12060856

Academic Editor: Surya Kant

Received: 21 April 2022 Accepted: 8 June 2022 Published: 13 June 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

#### **1. Introduction**

China's apple planting area and output each account for more than 50% of the world total [1], but harvesting is still dominated by costly manual picking. Apple picking robots are therefore the direction of future development. How to detect and locate apples quickly and accurately in the natural environment is the focus, and the difficulty, of picking-robot vision research [2].

At present, research on fruit detection, both in China and abroad, falls mainly into target detection based on traditional algorithms and target detection based on deep learning, and both approaches have made progress. Traditional algorithms require hand-crafted features [3], and their accuracy and detection speed are inferior to those of deep learning algorithms; they are now mostly used for image preprocessing. Xia [4] proposed a fruit segmentation method based on the K-means clustering algorithm: the Canny edge detection operator extracted fruit contours, a Y-node search algorithm separated touching contours, and the least-squares method reconstructed each contour. Liu [5] used a simple linear iterative clustering algorithm to segment apple images collected in the orchard into super-pixel blocks, and used color features extracted from the blocks to determine candidate target regions. Lv [6] computed the Euclidean distance between fruits in each connected region, extracted effective peaks from the smoothed curve with an improved local-extremum method, and determined the shape of overlapping apples from the number of peaks. Bochkovskiy [7] used incandescent lighting to acquire images at night; in the segmentation stage, a power transformation improved the R-G color-difference threshold segmentation, and a genetic algorithm was introduced to optimize the maximum between-class variance. The accuracy was 94% and the detection speed was 2.21 FPS.
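To make the clustering-based segmentation idea concrete, the following is an illustrative sketch (not the cited authors' code): a minimal k-means over RGB values that separates reddish fruit pixels from greenish foliage pixels, in the spirit of the K-means and super-pixel color-clustering methods surveyed above. The toy pixel values are invented for the example.

```python
import numpy as np

def kmeans_segment(pixels, k=2, iters=20):
    """Cluster an (N, 3) array of RGB pixels into k color groups."""
    # Farthest-point initialization keeps the sketch deterministic.
    centers = [pixels[0].astype(float)]
    for _ in range(1, k):
        dist = np.min([np.linalg.norm(pixels - c, axis=1) for c in centers], axis=0)
        centers.append(pixels[dist.argmax()].astype(float))
    centers = np.array(centers)
    for _ in range(iters):
        # Assign each pixel to its nearest center (Euclidean distance).
        d = np.linalg.norm(pixels[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned pixels.
        for j in range(k):
            if np.any(labels == j):
                centers[j] = pixels[labels == j].mean(axis=0)
    return labels, centers

# Toy data: reddish "fruit" pixels and greenish "leaf" pixels.
fruit = np.tile([200, 40, 30], (50, 1))
leaf = np.tile([40, 180, 60], (50, 1))
labels, centers = kmeans_segment(np.vstack([fruit, leaf]))
# The two clusters cleanly separate fruit pixels from foliage pixels.
```

In practice such clustering only yields candidate regions; the cited works follow it with contour extraction or shape analysis to handle touching and overlapping fruit.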

Detection algorithms based on deep learning have wider applicability than traditional algorithms: given a specific dataset, they learn deeper features, achieve higher accuracy, and detect targets more reliably. In recent years, deep learning has been applied across many industries, and several scholars have studied apple detection with it in depth. Sa [8] achieved near-real-time detection with an F1 score of 0.838 by training an improved Faster R-CNN on RGB color and near-infrared images. Zhao [9] used an improved 13-layer YOLOv3 to show that deep learning is feasible in the natural environment, validating it under different illumination directions, apple growth stages, and picking times. Mazzia [10] reached a detection speed of 30 FPS with a modified YOLOv3-Tiny network on a matched embedded device, the Jetson AGX Xavier; however, the Jetson AGX Xavier is very expensive, and its AP of only 83.64% does not satisfy the accuracy requirement. Yan [11] used an improved YOLOv5 to effectively distinguish graspable apples, unoccluded or occluded only by leaves, from ungraspable apples occluded by branches or other fruits. Wu [12] achieved 98.15% AP and an F1 of 0.965 with an improved EfficientNet-YOLOv4 trained on a dataset augmented with foliage-occlusion data; however, the model size is 158 MB, and the real-time detection speed is only 2.95 FPS. Chu [13] designed a novel Mask R-CNN for apple detection, adding a suppression branch to the standard Mask R-CNN to suppress non-apple features; its F1 is 0.905, but the detection speed is only 4 FPS. Because the suppression branch is designed around color, it is effective only when the color difference between fruit and leaves is large; when light, disease, or debris reduces that difference, the detection effect may suffer.
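The studies above are compared on a shared set of metrics (AP, precision, recall, F1). As a reference for how these figures relate, the sketch below derives precision, recall, and F1 from true-positive, false-positive, and false-negative counts; the counts used here are hypothetical and not taken from any cited paper.

```python
def detection_metrics(tp, fp, fn):
    precision = tp / (tp + fp)  # fraction of detections that are real apples
    recall = tp / (tp + fn)     # fraction of real apples that were detected
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    return precision, recall, f1

# Hypothetical counts from one validation pass.
p, r, f1 = detection_metrics(tp=920, fp=42, fn=60)
```

F1 penalizes an imbalance between precision and recall, which is why it is reported alongside AP when comparing detectors.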

Although the above studies have achieved results for apple recognition in different scenarios, they share a common problem: high detection speed and high detection accuracy cannot be satisfied simultaneously. In addition, the current literature leaves several directions little studied. First, most research on apple recognition has focused on apples that are dense, overlapping, or occluded by foliage, with very little work on bagged apples. Second, few studies run apple detection models on edge devices to determine how the models perform in practice. To address these problems, an apple detection algorithm based on YOLOX-Tiny is proposed in this paper; it meets the need for a picking robot to work with high precision in real time. Compared with similar studies, our main contributions are twofold.

