**2. Materials and Methods**

The DL-LS algorithm proposed in this study combines deep learning with fruit tree row line fitting to carry out path-planning tasks for autonomous orchard machinery. We selected the YOLO V3 network to identify tree trunks with bounding boxes, took the midpoints of the bottom edges of the bounding boxes as the key or reference points, and fitted the tree row reference lines with the least-squares algorithm, which enables tree row line detection with high accuracy under different disturbances in orchard scenarios.

We collected a large quantity of actual orchard image data. After sorting and labeling, these images were used to train the YOLO V3 network. Bounding-box coordinates were generated once the tree trunks were detected, and the reference point coordinates of the fruit trees were calculated from these coordinates. The reference lines of the fruit tree rows were then fitted by the least-squares method. Finally, the centerline of the fruit tree rows was fitted from the two reference lines; the principle is shown in Figure 1. This centerline is regarded as the tracking or moving path for the orchard machinery.

Figure 2 is a flowchart of the deep-learning-based tree/trunk extraction method. In the training stage, images of fruit tree rows in orchards are collected to form a dataset, which is divided into a training set and a test set; the manual labeling covers two classes, tree trunks and fruit trees. The YOLO V3 network is trained on the training set to generate weight files. At test time, trunk and fruit tree bounding boxes are generated by the trained network; fruit tree row reference point coordinates are then computed from the trunk box coordinates, and the fruit tree row lines are generated by least-squares fitting. Finally, the centerline of the fruit tree rows is obtained by the algorithm.

**Figure 1.** Schematic diagram of orchard navigation line extraction.

**Figure 2.** Flowchart of the deep-learning extraction method of orchard visual navigation line.

#### *2.1. Detection of Fruit Tree and Trunk*

Traditional target recognition methods depend strongly on specific image features and are susceptible to variations in light intensity, shading, etc. In this study, the YOLO V3 network is used to identify fruit trees and the trunks of fruit trees in the area where they contact the ground.

#### 2.1.1. Network Structure of YOLO V3

YOLO V3 uses residual modules to mitigate gradient vanishing and gradient explosion, and it borrows the idea of feature pyramid networks (FPN), which gives it excellent performance in small-target detection. The YOLO V3 network treats detection as a regression problem, enabling end-to-end object detection. It is therefore well suited to field application environments, as it can quickly predict and classify targets while maintaining high accuracy.

The backbone network of YOLO V3 is Darknet-53, which contains 53 layers; the last is a fully connected layer, and the other 52 serve for feature extraction [16]. The structure is shown in Figure 3. The residual module is used throughout the Darknet-53 network [13]: gradients tend to vanish or explode when a network has too many layers, and residual connections alleviate this. YOLO V3 also adopts multiscale fusion and multiscale prediction, and its strong performance on small targets makes it highly suitable for trunk detection. It exploits both the rich detail and location information of the low-level feature maps and the rich semantic information of the high-level feature maps to improve detection precision and detect small targets better [17–21].


**Figure 3.** The structure of YOLO V3.

#### 2.1.2. Image Datasets

Training deep neural networks requires a large amount of data. The image dataset in this study was acquired from a pear orchard in Daxing District, Beijing, and contains fruit trees of different ages, including young and adult trees. A large number of images of fruit trees were taken at different angles and under different illumination; the data collection scenarios are shown in Figure 4. To improve training and prediction speed, input images were uniformly resized to 512 × 512 pixels during pre-processing. To improve the robustness of the model and suppress overfitting, random perturbations such as random adjustments of contrast, saturation, and brightness were applied to augment the data during training. In total, 971 images were obtained. In each sample image, the positions and categories of trunks and fruit trees were marked with rectangular boxes, and the annotations were saved in a standard format. We used LabelMe V3.16, installed via Anaconda, for image labeling.
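As an illustration, the photometric perturbations described above can be sketched with NumPy. This is a minimal sketch, not the study's actual augmentation pipeline: the perturbation ranges are illustrative assumptions, and saturation jitter is omitted because it would additionally require a color-space conversion.

```python
import numpy as np

def augment(image, rng):
    """Apply random brightness and contrast perturbations (illustrative ranges)."""
    img = image.astype(np.float32)
    # Random brightness: add a uniform intensity offset.
    img += rng.uniform(-30.0, 30.0)
    # Random contrast: rescale deviations around the mean intensity.
    mean = img.mean()
    img = mean + rng.uniform(0.8, 1.2) * (img - mean)
    return np.clip(img, 0, 255).astype(np.uint8)

rng = np.random.default_rng(0)
sample = np.full((512, 512, 3), 128, dtype=np.uint8)  # stand-in for a resized frame
out = augment(sample, rng)
```

Applying such perturbations on the fly during training effectively multiplies the dataset without storing additional images.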

**Figure 4.** Some examples of the image datasets.

#### 2.1.3. Model Training

The experiments in this study were conducted on a computer with an Intel i7 CPU (64-bit) and a GTX 1080Ti GPU. The dataset was split into 70% for training and 30% for testing; images were measured in pixels throughout training and testing. Model training involves many hyperparameters that must be set manually, such as the learning rate and batch size, and differences in these parameters strongly affect the quality of the model. In our model, we set the initial learning rate to 0.001 and the batch size to 8. The learning rate is an important hyperparameter of the deep-learning optimizer that determines how fast the weights are updated: if it is too high, training overshoots the optimum; if it is too low, the model converges too slowly. The batch size is limited by the available memory, and within that limit, larger batches generally yield more stable training. After many parameter adjustments, we trained a model with relatively high accuracy that can reliably identify trunks and fruit trees in images. After training, the loss curve was drawn, as shown in Figure 5; it shows the relationship between the loss value and the number of epochs during training. The detection loss of YOLO V3 dropped rapidly over the first 10 epochs and hardly changed after 50 epochs.

**Figure 5.** Loss curves of the YOLO V3 model.

#### *2.2. Path Extraction of Orchard Machinery Navigation*

The previous section extracted the tree trunk position coordinates from orchard images taken within a row. In this section, the centerline of the fruit tree rows is extracted on the basis of the trunk box coordinates.

#### 2.2.1. Reference Point Generation

From the position information of the trunk, the coordinates of the bounding box can be read directly; they comprise the upper-left and lower-right corner points, Pl(xl, yl) and Pr(xr, yr), respectively. The reference point of the trunk is ((xr − xl)/2 + xl, yr), i.e., the midpoint of the bottom edge, ((xl + xr)/2, yr). The algorithm's pseudocode is shown in Algorithm 1.

```
Algorithm 1 Obtain available coordinate points
Input: Acquired raw image img and detection result file txt
           [r, c] = size(img)      % for an RGB image, c = image width × 3
           imghalfwidth = c/3/2    % half of the image width in pixels
           A = importdata(txt)     % numeric columns of A.data: xl, yl, xr, yr
           [m, n] = size(A.data)
1: for i = 1:m
2:     if A.textdata(i) contains "trunk"
3:         x = 0.5*(A.data(i,3) - A.data(i,1)) + A.data(i,1)   % (xl + xr)/2
4:         y = A.data(i,4)                                     % yr, bottom edge
5:         if x < imghalfwidth
6:             append (x, y) to the left-row reference points
7:         else
8:             append (x, y) to the right-row reference points
9:         end
10:    end
11: end
```
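In a higher-level language, the reference-point computation reduces to a few lines. The following Python sketch is illustrative (the function name and the `(xl, yl, xr, yr)` box format are our assumptions, not the study's implementation):

```python
def reference_point(box):
    """Return the midpoint of the bounding box's bottom edge, used as
    the trunk's ground-contact reference point.

    box: (xl, yl, xr, yr) pixel coordinates of the upper-left and
    lower-right corners.
    """
    xl, yl, xr, yr = box
    # (xr - xl)/2 + xl simplifies to (xl + xr)/2; yr is the bottom edge.
    return ((xl + xr) / 2.0, yr)
```

For example, a trunk box (100, 50, 140, 300) yields the reference point (120.0, 300).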
#### 2.2.2. Line Fitting of the Tree Rows

The reference points of the fruit trees on each side of the row are fitted into the reference lines of the fruit trees by the least-squares method. If fewer than three usable trunk reference points are extracted, for example because of missing fruit trees, we simply connect the nearest two reference points. The process is shown in Algorithm 2.

**Algorithm 2** Obtain the reference lines
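The least-squares fit described above can be sketched in Python with NumPy. This is a sketch under one assumption of ours: tree rows run roughly along the image's vertical axis, so we fit x as a function of y to avoid numerical problems with near-vertical lines (with only two points, `np.polyfit` degenerates gracefully to the line through them):

```python
import numpy as np

def fit_row_line(points):
    """Least-squares fit of the line x = a*y + b to trunk reference points.

    points: iterable of (x, y) pixel coordinates on one side of the row.
    Returns the slope a and intercept b.
    """
    pts = np.asarray(points, dtype=float)
    a, b = np.polyfit(pts[:, 1], pts[:, 0], deg=1)
    return a, b
```

Fitting x(y) rather than y(x) keeps the slope finite even when a row appears nearly vertical in the image, which is the common case for in-row views.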


#### 2.2.3. Obtaining the Centerline

The centerline of the two previously obtained reference lines of the fruit tree rows is the reference line for the orchard machinery; the detailed principle is shown in Figure 6. Let Pl1 be the farthest reference point on the left reference line in the image, and Pr1 its corresponding point on the right reference line. We connect the segment Pl1Pr1 and compute its midpoint Pm1. Similarly, let Pl2 be the nearest reference point; we connect the segment Pl2Pr2 to determine the midpoint Pm2. The straight line passing through Pm1 and Pm2 is then the reference line for the orchard machinery. The algorithm flow is shown in Algorithm 3.

**Algorithm 3** Obtain the centerline


**Figure 6.** Centerline acquisition for orchard machinery.
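Under the same assumed x = a*y + b line representation used in the fitting sketch above, the midpoint construction can be written as follows (function and parameter names are illustrative, not the study's implementation):

```python
def centerline(left_line, right_line, y_far, y_near):
    """Centerline of two row lines, each given as (a, b) with x = a*y + b.

    Evaluates both lines at a far and a near image row (y_far, y_near),
    takes the midpoints Pm1 and Pm2 of the left/right point pairs, and
    returns the line through them, again as (a, b).
    """
    def x_at(line, y):
        a, b = line
        return a * y + b

    pm1 = ((x_at(left_line, y_far) + x_at(right_line, y_far)) / 2.0, y_far)
    pm2 = ((x_at(left_line, y_near) + x_at(right_line, y_near)) / 2.0, y_near)
    a = (pm2[0] - pm1[0]) / (pm2[1] - pm1[1])
    b = pm1[0] - a * pm1[1]
    return a, b
```

For symmetric rows, e.g. a left line x = −0.2y + 100 and a right line x = 0.2y + 400, the construction recovers the vertical centerline x = 250, as expected.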

**3. Results and Discussion**

#### *3.1. Tree and Trunk Detection Results*

The trained network can identify tree trunks and fruit trees accurately; the detection accuracy is shown in Table 1. The average precision (AP) is 92.7% for trees and 91.51% for trunks, and the mean average precision (mAP) reaches 92.11%. Detection is not easily affected by sunlight: the trunk of the same fruit tree can be accurately detected under both normal and strong sunlight, as shown in Figure 7. This method has stronger anti-interference ability than traditional methods, especially in the morning and afternoon, when lighting conditions change. Weeds, by contrast, can easily interfere with detection, because the color and shape of weeds and leaves are very similar and because weeds occasionally become entangled with the tree trunks. Figure 8 shows the detection results under strong sunlight. The recognition of trunks and fruit trees by this network in weed-rich environments shows that it helps alleviate the interference caused by weeds. As shown in Figure 9, trunk extraction on both sides of the fruit tree rows is excellent under normal sunlight, which is an important basis of this study. Figure 10 shows the results under weak sunlight.

**Table 1.** Detection accuracy.


**Figure 7.** Detection results under different sunlight conditions.

**Figure 8.** Detection results of tree and trunk under strong sunlight.

**Figure 9.** Detection results of trunk under normal sunlight.

**Figure 10.** Detection results of trunk under weak sunlight.
