### **1. Introduction**

Chinese red jujube is a characteristic fruit known for its rich nutritional content [1]. As demand for red jujubes grows, counting them by machine vision becomes increasingly important because it provides a basis for estimating jujube yield. As supply increases as well, jujube counts will play an important role in planting and production management. Realizing automatic counting of red jujubes is therefore of great significance and will help improve their utilization rate. The development of artificial intelligence provides a new way to solve the problem of low fruit production efficiency [2].

Estimating fruit yield by counting fruits is an important task in orchard management. Deep learning has become a promising tool for fruit counting because it enables automatic feature extraction from datasets. At the same time, by extracting the basic parameters of crop growth, intelligent agricultural technology enables farmers to estimate crop yield and thus reasonably arrange the production and processing of red jujubes [3]. Machine learning methods, such as the watershed algorithm [4] and the Kalman filter algorithm [5], are widely used to count fruit. However, because supervised learning methods in traditional machine learning cannot capture the nonlinear relationship between input and output variables or the uncertainty of the crop environment, it is difficult to develop a reliable crop counting model with them. In recent years, technological progress has made it possible to develop advanced crop counting models using deep learning. Shileiliu et al. [6] proposed a lightweight object detection model, YOLOv5-CS, which realized detection and accurate counting of green citrus in the natural environment; the mAP (mean average precision) of the model was 98.23%. ZhangYanchao et al. [7] used the YOLOX object detection network to detect and count holly fruits, with an mAP of 95%.

**Citation:** Qiao, Y.; Hu, Y.; Zheng, Z.; Yang, H.; Zhang, K.; Hou, J.; Guo, J. A Counting Method of Red Jujube Based on Improved YOLOv5s. *Agriculture* **2022**, *12*, 2071. https://doi.org/10.3390/agriculture12122071

Academic Editors: Vadim Bolshev, Vladimir Panchenko and Alexey Sibirev

Received: 10 October 2022; Accepted: 30 November 2022; Published: 2 December 2022

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Owing to improvements in computer hardware and the development of computer vision technology, deep learning has been widely used in various industries [8–10]. Object detection algorithms based on deep learning fall mainly into two categories: two-stage and one-stage. Two-stage algorithms are based on candidate regions, such as R-CNN (Region-based Convolutional Neural Network) [11], Fast R-CNN (Fast Region-based Convolutional Neural Network) [12], and Faster R-CNN (Faster Region-based Convolutional Neural Network) [13]. One-stage algorithms treat the detection of the target position as a regression problem and apply a CNN (Convolutional Neural Network) directly to the image, such as SSD (Single Shot MultiBox Detector) [14,15] and YOLO (You Only Look Once) [16–19].

Computer vision technology has also been widely used in various fields [20–23]. Image processing is one of the key technologies in precision agriculture, where it is mainly used for classification, localization, and yield prediction [24]. Mulyono et al. [25] proposed a texture extraction method based on a gray-level co-occurrence matrix, followed by a K-nearest neighbor classifier, for the classification of litchi. Sutarno et al. [26] adopted similar ideas to extract texture information and then used the learning vector quantization (LVQ) algorithm to classify durian based on color, shape, and texture. This method had difficulty detecting subtle feature differences among different fruits, and the accuracy of fruit classification was 89%. Zhao et al. [27] proposed a matching algorithm that used the sum of absolute transformed differences (SATD) for fruit detection, followed by a support vector machine (SVM) classifier; the recognition accuracy exceeded 83%. Dorj et al. [4] proposed a method for forecasting citrus yield. The method preprocessed images by color space conversion and denoising, then detected and counted citrus with the watershed segmentation algorithm. Other researchers have studied fruit classification, identification, and counting based on shape invariant moments [28], decision trees [29], and the Hough transform [30], combined with the texture and color of fruits. The above methods use single features or multi-feature combinations of texture, shape, size, and color differences to recognize fruits. The recognition accuracy is about 93% when the environment is complex, for example under light changes, fruit overlap, and leaf occlusion. In addition, traditional machine learning algorithms are limited by the performance of their own classifiers, so it is difficult for them to complete fruit object detection in a complex environment [31].

Complex orchard environments involve occlusion between fruit and leaves, image transformations, and changing backgrounds. With their powerful learning ability and feature representation capability, deep learning-based object detection algorithms can address these problems quickly and effectively. Fu et al. [32] proposed a deep convolutional neural network detection model for kiwifruit, in which an improved Faster R-CNN based on ZFNet (Zeiler and Fergus network) was trained end-to-end using backpropagation and stochastic gradient descent. The experiment showed that the model could improve the accuracy of fruit recognition to 96%. Liu et al. [33] fused RGB and NIR images to identify kiwifruit with VGG16. The average detection precision was 90.7%, and the detection time was 0.134 s per image. Wang et al. [34] proposed an improved lightweight detection network based on SSD. The model used a modified DenseNet as the backbone, replaced the first three additional layers in SSD, and incorporated a multi-level fusion structure. Compared with the original model, the number of parameters of the improved model was reduced by $11.14 \times 10^{6}$, and the average precision was increased by 2.02%. Classical deep learning networks have thus been successful in fruit identification and detection, offering high accuracy and efficiency. However, these networks are relatively large, which is not conducive to deployment on mobile equipment in the agricultural field. Many researchers have therefore studied lightweight models. For instance, Li et al. [35] applied an adaptive spatial pyramid in YOLOv4\_tiny to detect green peppers, and the accuracy reached 96.11%. Zhang et al. [31] used MobileNet-v3 as the feature extraction network of YOLOv4-LITE, which reduced the model size and improved the detection speed. Therefore, it is feasible to reduce the weight of a model while preserving its detection precision.

A lightweight model will benefit applications on agricultural mobile equipment and help make such equipment intelligent. To ensure detection accuracy and reliable fruit counting in complex, unstructured orchards, this study proposes a counting method for red jujubes based on an improved YOLOv5s. The main goal of this research was to reduce the size of the model while ensuring its detection accuracy and speed on an embedded device. The effectiveness of counting red jujubes in a complex environment was comprehensively considered from four aspects in this research.


The second section introduces the construction of the dataset, the improved red jujube detection algorithm, the counting method for red jujubes, and the training of the network. The third section presents the test results of the model and a comparison with other algorithms. The last section summarizes the counting method for red jujubes and discusses future prospects.

### **2. Materials and Methods**

This section first introduces the acquisition and construction of the dataset. It then proposes a red jujube detection algorithm based on the improved YOLOv5s and presents a counting method for red jujubes. Finally, it describes the training method of the network. The overall workflow is shown in Figure 1.

**Figure 1.** A counting method of red jujube based on improved YOLOv5s.

#### *2.1. Image Data Acquisition*

The red jujube dataset in this study, which includes Jun jujube and Gray jujube, was collected from 5 October to 9 October in a jujube orchard of the 13th company of a group in Alar City, Xinjiang Uygur Autonomous Region, China. To ensure the reliability of the experimental results, the images were collected under different illumination conditions at 9:00 a.m., 3:00 p.m., and 9:00 p.m. The resolution of the images was 1080 × 1920 pixels, and a total of 1026 original images were obtained, covering illumination changes, leaf shading, and fruit overlap. To improve the robustness of the model, each image contained one or more of these scenarios. The distribution of the dataset is shown in Table 1.

**Table 1.** Distribution of dataset of red jujubes.


#### *2.2. Data Preprocessing and Augmentation*

The collected dataset affects the recognition performance of the object detection model. The more sufficient and comprehensive the dataset is, the better the generalization ability and robustness of the model. Therefore, the number of samples was expanded by data augmentation. To realistically simulate the shooting of red jujubes in a complex environment and to feed the images to the detection network, this research used OpenCV in Python to resize and crop the images to 640 × 640 pixels. Then, the images were randomly enhanced by different image processing methods [36], such as rotating by 180°, mirroring, adding salt-and-pepper noise with the threshold set to 0.5, and changing the image brightness with the thresholds set to 1.3 and 0.7, as shown in Figure 2. Random image processing was applied repeatedly to each image. After augmentation, a total of 10,000 images were obtained as the dataset of the model.
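The augmentation operations described above can be sketched as follows. This is a minimal illustration using NumPy only, not the authors' script: the function names are hypothetical, the noise density `amount` is an assumed value, and the reported salt-and-pepper threshold of 0.5 is interpreted here as the fraction of corrupted pixels that become white, which the text does not state explicitly. The brightness values 1.3 and 0.7 are treated as multiplicative factors, which is also an assumption.

```python
import numpy as np


def rotate_180(img):
    # Rotate by 180 degrees (same result as cv2.rotate(img, cv2.ROTATE_180)).
    return img[::-1, ::-1].copy()


def mirror(img):
    # Horizontal mirror (same result as cv2.flip(img, 1)).
    return img[:, ::-1].copy()


def salt_and_pepper(img, amount=0.02, salt_ratio=0.5, rng=None):
    # amount: assumed fraction of pixels to corrupt (not given in the text).
    # salt_ratio: assumed meaning of the reported 0.5 threshold.
    rng = np.random.default_rng() if rng is None else rng
    out = img.copy()
    h, w = img.shape[:2]
    corrupt = rng.random((h, w)) < amount      # which pixels are corrupted
    salt = rng.random((h, w)) < salt_ratio     # which of those turn white
    out[corrupt & salt] = 255
    out[corrupt & ~salt] = 0
    return out


def adjust_brightness(img, factor):
    # Scale intensities and clip back to the valid 8-bit range.
    scaled = img.astype(np.float32) * factor
    return np.clip(scaled, 0, 255).astype(np.uint8)


def random_augment(img, rng=None):
    # Apply one randomly chosen operation; calling this repeatedly on the
    # same image yields several augmented variants.
    rng = np.random.default_rng() if rng is None else rng
    ops = [
        rotate_180,
        mirror,
        lambda x: salt_and_pepper(x, rng=rng),
        lambda x: adjust_brightness(x, 1.3),
        lambda x: adjust_brightness(x, 0.7),
    ]
    return ops[rng.integers(len(ops))](img)
```

Note that for a detection dataset the geometric operations (rotation and mirroring) also require the bounding-box labels to be transformed accordingly, whereas noise and brightness changes leave the boxes unchanged.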

**Figure 2.** Image samples after data preprocessing and augmentation. (**a**) Original image, (**b**) rotating by 180°, (**c**) increasing brightness, (**d**) mirroring image, (**e**) adding noise, (**f**) reducing brightness.

#### *2.3. Images Annotation and Dataset Division*

In this research, LabelImg was used to manually label the red jujubes in the dataset with rectangular bounding boxes, as shown in Figure 3. The dataset was divided into a training set (80%), a validation set (10%), and a test set (10%). The final numbers of image samples in the training, validation, and test sets are 8000, 1000, and 1000, respectively.

**Figure 3.** LabelImg data set annotation.
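The 80/10/10 division described in this subsection can be illustrated with a short sketch. It assumes a seeded random shuffle, which the text does not specify, and the function name and file names are hypothetical.

```python
import random


def split_dataset(items, train=0.8, val=0.1, seed=0):
    # Shuffle a copy with a fixed seed (assumed) so the split is reproducible.
    items = list(items)
    random.Random(seed).shuffle(items)
    n_train = round(len(items) * train)
    n_val = round(len(items) * val)
    train_set = items[:n_train]
    val_set = items[n_train:n_train + n_val]
    test_set = items[n_train + n_val:]   # the remainder forms the test set
    return train_set, val_set, test_set


# Hypothetical file names standing in for the 10,000 augmented images.
image_names = [f"jujube_{i:05d}.jpg" for i in range(10000)]
train_set, val_set, test_set = split_dataset(image_names)
print(len(train_set), len(val_set), len(test_set))
```

With 10,000 images, this reproduces the 8000, 1000, and 1000 sample counts reported above.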
