Article

A Pineapple Target Detection Method in a Field Environment Based on Improved YOLOv7

College of Engineering, South China Agricultural University, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2023, 13(4), 2691; https://doi.org/10.3390/app13042691
Submission received: 28 January 2023 / Revised: 15 February 2023 / Accepted: 17 February 2023 / Published: 19 February 2023

Abstract

The accurate detection of pineapples of different maturity levels in a complex field environment is a key step toward early yield estimation and the mechanized picking of pineapple. This study proposes a target detection model based on an improved YOLOv7 to achieve the accurate detection and maturity classification of pineapples in the field. First, the attention mechanism SimAM is inserted into the structure of the original YOLOv7 network to improve the feature extraction ability of the model. Then, the max-pooling convolution (MPConv) structure is improved to reduce the feature loss in the downsampling process. Finally, the non-maximum suppression (NMS) algorithm is replaced by the soft-NMS algorithm, which improves detection when pineapples are occluded or overlapping. In testing, the mean average precision (mAP) and recall of the model proposed in this paper are 95.82% and 89.83%, which are 2.71% and 3.41% higher than those of the original YOLOv7, respectively. The maturity classification accuracy of the model and its detection performance under six different field scenarios were analyzed quantitatively. This method provides an effective scheme for the vision system of a field pineapple picking robot.

1. Introduction

China has a long history of pineapple cultivation and is one of the top 10 pineapple growers [1]. In China, pineapples are mainly picked by hand. At present, the continuous loss of the rural labor force has seriously restricted the development of the pineapple industry [2]. Using picking robots to mechanize picking can effectively alleviate this problem. Target detection is the key step in mechanized picking [3]. Therefore, studying and improving pineapple target detection technology is an effective means to realize efficient automatic picking, reduce the burden on workers, and ensure the timely harvesting of pineapple. This has important practical value for developing a computer vision system that can detect pineapples in real time in a complex field environment.
In the field of agriculture, target detection methods based on image algorithms and those based on deep learning are the two mainstream approaches, and both have made progress. Fruit detection methods based on traditional image algorithms usually rely on manual experience: image features such as shape, color, and texture are set by hand, candidate-region features are extracted with sliding windows, and the target detection is completed by feature classifiers. Chaivivatrakul and Dailey [4] used texture analysis to detect pineapples and bitter melons, achieving an 85% accuracy for pineapples and a 100% accuracy for bitter melons. Wang et al. [5] used local binary patterns (LBP) to detect and count green citrus, achieving an accuracy of 85.6% on the validation set. He et al. [6] used an improved linear discriminant analysis (LDA) method to identify green litchi in a natural environment. He et al. [1] used the Otsu method to perform threshold segmentation of pineapples in images. Zhao and Lee [7] used a computer image algorithm based on error conversion to detect green citrus. To overcome the overlapping and occlusion problems of green apples, Sun et al. [9] proposed a recognition method combining the GrabCut model and the Ncut algorithm. Liu et al. [8] used the simple linear iterative clustering (SLIC) method to implement the detection and localization of apples. Vitzrabin and Edan [10] proposed a threshold algorithm combined with an RGB-D sensor to detect red sweet pepper under highly variable illumination conditions and achieved a detection rate of 90.9%.
Target detection algorithms based on deep learning fall into two kinds. The two-stage detection method first uses a region proposal to screen candidate regions and obtain regions of interest (ROI) and then performs object localization and bounding-box regression on the selected regions. Common two-stage detection methods are Fast R-CNN [11], Faster R-CNN [12], and Mask R-CNN [13]. The one-stage detection method directly divides the whole image into multiple small grids and completes feature extraction, classification, and bounding-box regression in one step, predicting offsets relative to prior (anchor) boxes within each grid cell. YOLO [14,15,16] and SSD [17] are common one-stage detection methods. Deep learning is widely used mainly because of its high detection accuracy and high detection speed.
Researchers have gradually applied deep learning methods to the agricultural field.
In general, researchers improve the applicability of a deep learning model in the agricultural field to obtain better detection performance or detection speed by modifying the network structure of the model [18,19,20,21,22], adding functional modules to the model [23,24], or optimizing the postprocessing method of the model [25]. Ji [26] improved the YOLOX network, replacing the original backbone with ShufflenetV2 and adding a convolutional block attention module (CBAM). Cui [27] proposed a lightweight target detection network by applying leanNet, which addresses the problems of fruit and background having similar colors, fruit overlap, and blocking by branches and leaves. To achieve real-time apple detection, Yan [28] improved the YOLOv5 network model by upgrading the BottleneckCSP module into a BottleneckCSP-2 module, adding a squeeze-and-excitation (SE) module, and modifying the initial anchor box sizes. Kang [29] proposed a neural network model, DasNet-V2, which realizes both detection and segmentation. Liu et al. [30] proposed an improved YOLOv5 algorithm to detect dense citrus in the orchard; by adding the coordinate attention (CA) mechanism and applying the bidirectional feature pyramid network (BiFPN) structure, the efficient detection of citrus was realized. Zhang et al. [31] provided a pineapple detection method based on SSD, which used MobileNet as the backbone to optimize the detection speed, but the model is limited to the detection of a single mature pineapple, cannot distinguish pineapple maturity, and was not tested directly on detection performance in the field. Liu et al. [32] used an improved YOLOv3 model to detect pineapple and verified its effectiveness under occlusion conditions, but the model can only distinguish ripe and unripe pineapple, and the detection performance in dense multi-pineapple scenes was not analyzed. The performances of some related works are shown in Table 1.
Although the above researchers have made notable achievements in fruit detection by applying advanced target detection algorithms, most of this research focuses on improving the detection of fruits of a single maturity, with less attention to fruits of different maturities in a natural environment, and there are few studies on the target detection of pineapple in a natural field environment. Therefore, this paper presents an efficient and stable pineapple target detection algorithm based on an improved YOLOv7, which provides technical support for efficient automatic pineapple picking robots in complex field environments. The main contributions of this paper are as follows:
  • A lightweight and efficient pineapple target-detection model was designed by introducing the attention mechanism SimAM into the original YOLOv7, modifying the max-pooling convolution (MPConv) structure, and replacing the non-maximum suppression (NMS) algorithm with the soft-NMS algorithm. The detection model balances detection accuracy and speed while distinguishing the three maturities of pineapples.
  • Through comparative tests, the differences in detection performance between the proposed model and other advanced models were verified, proving that the proposed model is more suitable for the detection of pineapples in a field environment. At the same time, the classification performances of different models for pineapples with different maturities and their detection performances in different field environments were explored.

2. Materials and Methods

2.1. Pineapple Image Collection

In this study, pictures of pineapples at different growth stages in various field environments were selected as the research object. The images were collected in Qujie town, Xuwen County, Zhanjiang city, Guangdong Province, in May 2022. The acquisition equipment was an Intel RealSense D455 depth camera, and the resolution of the images was 1280 × 720 pixels. To obtain clear pineapple pictures stably and accurately, the camera was positioned 0.5–2 m from the pineapple for shooting, and 2682 pictures were collected in total. The image acquisition device is shown in Figure 1. For the diversity of image data, the collected images covered six scenarios: single pineapple, multiple pineapples, exposure, backlight, occlusion, and overlap, as shown in Figure 2.

2.2. Image Processing and Augmentation

A field environment contains a variety of interference factors, such as light and the stems and leaves of plants. Therefore, to restore the real field environment as closely as possible, data augmentation was used to expand and enrich the data set and improve the generalization ability of the trained model. The augmentation methods included rotating and flipping the image, adjusting the image color and brightness, and adding noise, as sketched below; an example is shown in Figure 3.
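As a concrete illustration, the following is a minimal sketch of how such augmented variants can be generated with OpenCV and NumPy. It is an assumption-level sketch rather than the exact pipeline used in this study, and note that the geometric transforms (rotation and flipping) also require the bounding-box annotations to be transformed accordingly.

```python
import cv2
import numpy as np

def augment(img):
    """Generate augmented variants of a field image (BGR uint8 array)."""
    rotated = cv2.rotate(img, cv2.ROTATE_90_CLOCKWISE)         # rotation
    flipped = cv2.flip(img, 1)                                  # horizontal flip
    hsv = cv2.cvtColor(img, cv2.COLOR_BGR2HSV).astype(np.int16)
    hsv[..., 0] = (hsv[..., 0] + 10) % 180                      # hue shift (color change)
    recolored = cv2.cvtColor(hsv.astype(np.uint8), cv2.COLOR_HSV2BGR)
    brightened = cv2.convertScaleAbs(img, alpha=1.0, beta=40)   # brightness offset
    noise = np.random.normal(0, 15, img.shape)                  # additive Gaussian noise
    noisy = np.clip(img.astype(np.float32) + noise, 0, 255).astype(np.uint8)
    return rotated, flipped, recolored, brightened, noisy
```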
Through the above means, the data set was expanded to a total of 4340 images. LabelImg was used as the image-annotation tool, and the label files were saved in XML format; when used, the label file format was adjusted according to the needs of the different networks (a conversion sketch is given below). Among them, 3472 pictures were used as the training set, 434 as the verification set, and 434 as the test set, a ratio of 8:1:1. The data set structure is shown in Table 2.
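Since LabelImg saves Pascal-VOC XML while YOLO-style networks expect normalized text labels, a conversion step of the following form is typically needed. This is a hedged sketch; the class names passed in (e.g., ["unripe", "semi-ripe", "ripe"]) are assumptions for illustration.

```python
import xml.etree.ElementTree as ET

def voc_to_yolo(xml_path, class_names):
    """Convert LabelImg Pascal-VOC boxes to YOLO lines: cls cx cy w h (normalized)."""
    root = ET.parse(xml_path).getroot()
    img_w = float(root.find("size/width").text)
    img_h = float(root.find("size/height").text)
    lines = []
    for obj in root.iter("object"):
        cls = class_names.index(obj.find("name").text)
        box = obj.find("bndbox")
        xmin, ymin = float(box.find("xmin").text), float(box.find("ymin").text)
        xmax, ymax = float(box.find("xmax").text), float(box.find("ymax").text)
        cx, cy = (xmin + xmax) / 2 / img_w, (ymin + ymax) / 2 / img_h
        w, h = (xmax - xmin) / img_w, (ymax - ymin) / img_h
        lines.append(f"{cls} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}")
    return lines

# Example: voc_to_yolo("img_0001.xml", ["unripe", "semi-ripe", "ripe"])
```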

2.3. YOLOv7 Object Detection Network

YOLOv7 [35] is a recent model in the YOLO series. It adopts strategies such as the extended efficient layer aggregation network (E-ELAN), model scaling for concatenation-based models [36], and convolution reparameterization [37], and it achieves a very good balance between detection efficiency and accuracy. The detection idea of YOLOv7 is similar to that of YOLOv4 and YOLOv5, and its structure is shown in Figure 4. The YOLOv7 network is composed of four modules: input, backbone, head, and prediction. The input module scales the input image to a uniform pixel size to meet the input size requirements of the backbone network. The backbone module is composed of several BConv [35] convolution layers, E-ELAN convolution layers, and MPConv [35] convolution layers, wherein BConv is composed of a convolution layer, a batch normalization (BN) layer, and the LeakyReLU activation function [38] (a sketch is given below) and is used to extract image features at different scales. The head module is composed of a path aggregation feature pyramid network (PAFPN) [39] structure; by introducing a bottom-up path, it makes it easier for low-level information to be transferred to higher levels, thus realizing the efficient integration of features at different levels. The prediction module adjusts the number of channels of the P3, P4, and P5 features output at different scales by the PAFPN through the RepVGG block (REP) [37] structure and finally applies a 1 × 1 convolution to predict the confidence, category, and anchor frame. The field pineapple detection model must meet real-time and accuracy requirements at the same time. In view of its good balance between detection accuracy and speed, YOLOv7 was selected as the baseline model.
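As a rough PyTorch sketch of the BConv block described above (convolution, batch normalization, then LeakyReLU); the kernel size, stride, and negative slope shown here are illustrative assumptions rather than values taken from the YOLOv7 implementation:

```python
import torch.nn as nn

class BConv(nn.Module):
    """Convolution + BatchNorm + LeakyReLU, as the BConv block is described above."""
    def __init__(self, c_in, c_out, k=3, s=1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.LeakyReLU(0.1, inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))
```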

2.4. Improvements to YOLOv7

Many factors need to be considered for pineapple detection in a complex field environment: for example, variable lighting conditions, occlusion between pineapples, and the similar colors of pineapple and background. At the same time, this study required the model to distinguish pineapple maturity: pineapples are divided into three categories according to maturity, so the detection task is a three-category problem. Therefore, it is not appropriate to use YOLOv7 directly as the detection model, and some improvements are needed to achieve better detection results. Based on the above points, in this study, we improved the YOLOv7 model to make it more suitable for pineapple detection in the field.

2.4.1. Improvement of the Network Structure

First, several SimAM [40] attention modules were embedded into the network structure of YOLOv7. An attention mechanism gives different weights to parts of the network input so that the model ignores irrelevant information and focuses on important information [41], which can effectively improve the feature extraction ability of the model in complex backgrounds. SimAM is an attention module that does not increase the number of network parameters; it is plug-and-play and can be embedded at any position in the model. Its principle is shown in Figure 5. The core of SimAM lies in calculating attention weights using an energy function. SimAM reduces the interference of a complex background on pineapple detection by generating spatial inhibition on the neurons adjacent to the pineapple, highlighting the key features of the pineapple and enhancing the ability to extract them. The calculation process is as follows:
$$\hat{X} = \mathrm{sigmoid}\left(\frac{1}{E}\right) \odot X \tag{1}$$
$$E = \frac{4\left(\sigma^2 + \lambda\right)}{\left(t - \mu\right)^2 + 2\sigma^2 + 2\lambda} \tag{2}$$
$$\mu = \frac{1}{Q}\sum_{i=1}^{Q} x_i \tag{3}$$
$$\sigma^2 = \frac{1}{Q}\sum_{i=1}^{Q}\left(x_i - \mu\right)^2 \tag{4}$$
where $\hat{X}$ is the enhanced feature map of the pineapple; $E$ is the energy function on each channel (the lower the energy, the higher the differentiation between the pineapple neuron and its adjacent neurons); the sigmoid function is used to prevent the value of $1/E$ from becoming too large; $\odot$ is the element-wise (dot) product operation; $X$ is the input pineapple feature map; $\mu$ is the mean value of each channel of the input feature map; $\sigma^2$ is the variance of each channel of the input feature map; $\lambda$ is a hyperparameter; and $t$ is the target pineapple neuron.
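A minimal PyTorch sketch of Equations (1)–(4), following the parameter-free formulation of the SimAM paper; the default value of λ is an assumption:

```python
import torch
import torch.nn as nn

class SimAM(nn.Module):
    """Parameter-free attention: weight each activation by sigmoid(1/E), Eqs. (1)-(4)."""
    def __init__(self, lam=1e-4):   # lam is the hyperparameter lambda
        super().__init__()
        self.lam = lam

    def forward(self, x):                                      # x: (B, C, H, W)
        q = x.shape[2] * x.shape[3] - 1                        # neighbours per channel
        d = (x - x.mean(dim=(2, 3), keepdim=True)).pow(2)      # (x_i - mu)^2
        var = d.sum(dim=(2, 3), keepdim=True) / q              # sigma^2
        inv_e = d / (4 * (var + self.lam)) + 0.5               # 1/E for each neuron
        return x * torch.sigmoid(inv_e)                        # Equation (1)
```

Because the module adds no learnable parameters, it can be dropped between existing layers of the backbone without changing the rest of the network.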
Second, the main function of MPConv is downsampling, which reduces the feature map size at the cost of some feature loss. Notice that the lower of the two branches of the MPConv module in YOLOv7 uses a 3 × 3 convolution kernel for its convolution operation. As shown in Figure 6, when the stride is 2, some feature information may be lost, and inefficient feature learning may occur in the network. Inspired by the focus module in YOLOv5, the 3 × 3 convolution kernel in the lower branch of MPConv was replaced by the focus module. As shown in Figure 7, while the feature map is still halved, the loss of features is reduced, the learning efficiency of the features is improved, and pineapple detection performance under a complex background is enhanced. A sketch of the focus operation is given below.
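The focus operation itself can be sketched as follows: a space-to-depth slicing that halves the spatial size losslessly, followed by a fusion convolution (the 1 × 1 fusion kernel here is an assumption):

```python
import torch
import torch.nn as nn

class Focus(nn.Module):
    """Rearrange each 2x2 spatial block into channels, then fuse with a 1x1 conv."""
    def __init__(self, c_in, c_out):
        super().__init__()
        self.conv = nn.Conv2d(4 * c_in, c_out, kernel_size=1, bias=False)

    def forward(self, x):
        # Four interleaved sub-grids: spatial size halves, channels quadruple,
        # and no activation values are discarded (unlike a stride-2 convolution).
        patches = torch.cat([x[..., ::2, ::2], x[..., 1::2, ::2],
                             x[..., ::2, 1::2], x[..., 1::2, 1::2]], dim=1)
        return self.conv(patches)
```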

2.4.2. Improvement of the Postprocessing

In this paper, soft-NMS [42] was selected as the postprocessing algorithm of the network model. The traditional NMS algorithm selects the highest-scoring detection box from the results, determines whether each adjacent detection box is retained according to an overlap threshold, and directly sets the score of an adjacent box to zero if its overlap is greater than the threshold. The score reset function of the traditional NMS algorithm is shown in Equation (5):
$$s_i = \begin{cases} s_i, & \mathrm{IoU}(M, b_i) < N_t \\ 0, & \mathrm{IoU}(M, b_i) \ge N_t \end{cases} \tag{5}$$
However, the traditional NMS algorithm has a problem: in densely crowded scenarios, such as pineapple fields in a natural environment, directly zeroing the scores of adjacent detection boxes above the threshold easily causes missed detections. Therefore, the soft-NMS algorithm was introduced in this paper. By modifying the score reset function, a penalty is applied to adjacent detection boxes above the threshold to reduce their scores instead of zeroing them directly. In this way, a high-scoring detection box whose score is reduced in the NMS stage may still serve as a correct detection box in subsequent calculations, effectively improving the detection accuracy and recall. At the same time, a Gaussian penalty function was used to keep the penalty continuous. The score reset function of the soft-NMS algorithm used in this paper is as follows:
$$s_i = \begin{cases} s_i, & \mathrm{IoU}(M, b_i) < N_t \\ s_i\left(1 - \mathrm{IoU}(M, b_i)\right), & \mathrm{IoU}(M, b_i) \ge N_t \end{cases} \tag{6}$$
$$s_i = s_i \, e^{-\frac{\mathrm{IoU}(M, b_i)^2}{\sigma}} \tag{7}$$
In Equation (7), $\sigma$ represents the variance of the Gaussian penalty function.
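A minimal NumPy sketch of Gaussian soft-NMS as in Equation (7); the default σ and the pruning threshold are illustrative assumptions:

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def soft_nms_gaussian(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Decay neighbour scores by exp(-IoU^2 / sigma) instead of zeroing them."""
    boxes = boxes.astype(float)
    scores = scores.astype(float).copy()
    keep, idxs = [], list(range(len(scores)))
    while idxs:
        m = max(idxs, key=lambda i: scores[i])        # highest-scoring box M
        keep.append(m)
        idxs.remove(m)
        for i in idxs:
            iou = box_iou(boxes[m], boxes[i])
            scores[i] *= np.exp(-(iou ** 2) / sigma)  # Gaussian penalty, Eq. (7)
        idxs = [i for i in idxs if scores[i] > score_thresh]  # prune negligible boxes
    return keep
```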
In summary, a schematic diagram of the improved YOLOv7 algorithm model proposed in this paper is shown in Figure 8. The overall detection flow diagram is shown in Figure 9.

2.5. Evaluation Indicators

The detection performance was evaluated by precision (P), recall (R), F1-score (F1), average precision (AP), and mAP. The specific equations are:
$$P = \frac{TP}{TP + FP} \times 100\% \tag{8}$$
$$R = \frac{TP}{TP + FN} \times 100\% \tag{9}$$
$$F1 = \frac{2PR}{P + R} \times 100\% \tag{10}$$
$$AP = \int_0^1 P(R)\,dR \times 100\% \tag{11}$$
$$mAP = \frac{1}{n}\sum_{i=1}^{n} AP_i \times 100\% \tag{12}$$
The quantity P indicates the percentage of detections that are correct, R indicates the percentage of pineapples successfully detected by the network, F1 represents the comprehensive performance of P and R, and mAP is the average of the AP values over all classes. Given thresholds on the confidence level and on the intersection over union (IoU), the numbers of true positives (TP), false positives (FP), and false negatives (FN) are determined: a TP is a pineapple detected as a pineapple, an FP is background detected as a pineapple, and an FN is a pineapple that was not detected. P and R are calculated from TP, FP, and FN, and F1 is calculated from P and R, as in the sketch below.
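For example, a small helper computing Equations (8)–(10) from raw counts; the counts in the usage line are hypothetical:

```python
def detection_metrics(tp, fp, fn):
    """Precision, recall, and F1 (in percent) from TP/FP/FN counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p * 100, r * 100, f1 * 100

# Hypothetical counts: 57 correct detections, 2 false alarms, 7 missed pineapples
print(detection_metrics(57, 2, 7))   # -> (96.61..., 89.06..., 92.68...)
```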

2.6. Experimental Details

The training and testing platform for the network model was a laboratory workstation configured as follows: CPU—two Intel Xeon Silver 4210 (10 cores and 20 threads each), maximum clock frequency 2.82 GHz; GPU—NVIDIA GeForce RTX 3090 with 24 GB of memory; RAM—64 GB; operating system—Windows 10 Professional, 64-bit; deep learning framework—PyTorch 1.7; CUDA version—11.1; programming language—Python 3.8. The input picture size was 640 × 640 during training. To shorten the training process and obtain a better training effect, the learning rate was set to 0.001, the weight decay coefficient to 0.0005, the optimizer to Adam, the batch size to 32, and the number of iterations to 200 epochs; a sketch of this configuration is given below. After training, the weights were saved as the weight file used in testing. The output of the model mainly included two parts: the pineapple location frame, and the ID and probability of the location frame.
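For reference, the reported hyperparameters correspond to a PyTorch setup along the following lines; this is a sketch under the stated settings, and `model` stands in for the detection network:

```python
import torch

config = dict(
    img_size=640,         # input images resized to 640 x 640
    lr=0.001,             # initial learning rate
    weight_decay=0.0005,  # weight attenuation coefficient
    batch_size=32,
    epochs=200,
)

def build_optimizer(model):
    # Adam optimizer with the hyperparameters reported above
    return torch.optim.Adam(model.parameters(),
                            lr=config["lr"],
                            weight_decay=config["weight_decay"])
```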

3. Results and Discussion

3.1. The Results of Network Training

Figure 10a shows the mAP curve on the training set, and Figure 10b shows the loss curve of the training process. The loss value dropped rapidly in the first 75 epochs and stabilized after 100 epochs; the mAP followed the same trend as the loss curve, and the training process did not overfit. In general, too few epochs result in inadequate training, while too many result in overfitting. In this study, the model output at 200 epochs was selected as the pineapple target detection model.

3.2. The Results of the Ablation Test

To verify the effectiveness of the improvement strategies, an ablation test was used to evaluate each one. In the test, the data set used was the whole test set, and the test environment and data set remained unchanged. The average detection time was obtained by averaging the detection time over all images. The results of the ablation test are shown in Table 3.
As shown in Table 3, all three improvement strategies were effective and optimized the comprehensive performance of the model. After applying the SimAM attention module, mAP increased by 2.37%, R decreased by 0.67%, and the average detection time increased by 20%. After applying the improved MPConv module, mAP increased by a further 0.49% to 95.97% while R increased by 1.49%, proving that this strategy could effectively improve the performance of the detection model, and the average detection time decreased by 7%. The soft-NMS effectively improved R, bringing it to 89.83%, while mAP decreased by 0.15% and the average detection time increased by 2.4%, which is within the allowable range.

3.3. The Results of Comparing the Detection Performance of the Proposed Network with Other Networks

In order to prove the superiority of the model proposed in this paper, its performance was compared with YOLOv4-tiny, YOLOv5s, and YOLOv7.
The data set used in this test was the whole test set, and the test environment and data set remained unchanged. To achieve the best results for each model, the image input size of all models was 640 × 640. Table 4 compares the P, R, F1, mAP, and average detection time of the different models; the best result for each metric is marked in bold. The model proposed in this paper achieved the best results on the four indicators of P, R, F1, and mAP, while the lightweight network YOLOv5s achieved the shortest average detection time. In terms of P, our model was 10.65%, 5.22%, and 2.90% higher than YOLOv4-tiny, YOLOv5s, and YOLOv7, respectively; in R, it was 7.06%, 4.69%, and 3.41% higher, respectively. YOLOv5s was close to YOLOv7 in detection performance and slightly better in detection speed. The average detection time of our model was basically the same as that of YOLOv4-tiny but 4.21 ms and 2.98 ms slower than YOLOv5s and YOLOv7, respectively. The detection times of all four models were less than 25 ms, so our model traded a small amount of computation time for an obvious improvement in detection performance. Compared with the other models, our proposed model had better detection accuracy, and its detection speed was also within a reasonable range. Therefore, it is more suitable for pineapple detection in a complex field environment. Three cases of detection results are shown in Figure 11.

3.4. The Results of the Comparison of the Detection Performances of the Proposed Network for Different Maturities of Pineapple

To explore the differences in detection performance between the proposed model and the original YOLOv7 when detecting pineapples of different maturities, 40 pictures of pineapples covering the three maturities were selected from the test set for a detection performance test. The test results are shown in Table 5. The test images contained 64 unripe, 73 semi-ripe, and 75 ripe pineapples. Comparing the detection results of the model output with the labeled data of the pictures shows that the improved model had better detection results than the original model for pineapples of all three maturities.
To further explore the differences between the improved model and the other models in detecting pineapples of different maturities, its performance was compared with YOLOv4-tiny, YOLOv5s, and YOLOv7. The data set used in the test was the whole test set. The test results are shown in Figure 12. YOLOv4-tiny obtained the lowest AP values for all three maturities; among them, the AP value for unripe pineapple was the highest, higher than those for semi-ripe and ripe pineapple. This is because YOLOv4-tiny had a poor ability to distinguish ripe and semi-ripe pineapples, causing the AP of both to decline simultaneously: ripe and unripe pineapples differ obviously, but ripe and semi-ripe pineapples do not, so a model that distinguishes them poorly will see both AP values reduced, leaving the AP of unripe pineapples the highest. With a better feature extraction network, YOLOv5s could better distinguish ripe and semi-ripe pineapples, preventing confusion between them from degrading detection performance; consequently, its detection AP for both ripe and semi-ripe pineapple was higher than that for unripe pineapple. Although YOLOv7's detection AP for ripe pineapple was higher than that of YOLOv5s, its AP for semi-ripe pineapple was lower than its own AP for unripe pineapple, which shows that YOLOv7 had a relatively weak detection ability for semi-ripe pineapple. The improved model proposed in this paper achieved the best detection AP for all three maturities: 94.4% for unripe, 95.5% for semi-ripe, and 97.5% for ripe pineapple. Compared with the original YOLOv7, the biggest improvement was the detection AP for semi-ripe pineapple, which proves that the improvement strategy in this paper is effective at strengthening the model's ability to distinguish ripe and semi-ripe pineapples.
In addition, Zhang et al. [31] proposed a pineapple detection method based on MobileNet-SSD that could not distinguish the maturity of pineapple, and Liu et al. [32] proposed a pineapple detection method based on improved YOLOv3 that can only distinguish ripe and unripe pineapples, whereas the method proposed in this paper can distinguish ripe, semi-ripe, and unripe pineapples. In comparison, our model has richer maturity resolution, which will lead to more accurate detection results.

3.5. The Results of the Detection Performance of the Proposed Network in Different Field Scenarios

To explore the advantages of the improved model under different complex field environments, YOLOv4-tiny, YOLOv5s, and YOLOv7 were selected for a comparison of detection performance, with the results shown in Figure 13. In this test, 30 pictures each of the single-fruit, multiple-fruit, exposure, backlight, occlusion, and overlap scenarios were selected from the test set as test pictures. As can be seen from Figure 13, occlusion had the greatest impact on the detection precision of the models, and precision also dropped when there were many pineapples in the scene. In the overall trends of Figure 13a–d, a curve closer to the top indicates better detection ability: the model proposed in this paper was generally at the top of the four charts, while YOLOv4-tiny was at the bottom, so the detection performance of our model was better than that of the other three. When there was only one pineapple in the image, there was little interference and the image features were obvious, so all four models detected a single pineapple with high accuracy. When there were many pineapples in the image, the close-range pineapples occupied a large pixel area while the far-range pineapples occupied a small one, making it difficult for the models to detect the far-range pineapples correctly. When the pineapple was exposed, the light changed the color of the pineapple in the picture, making detection more difficult; on the contrary, under backlighting, the light had little influence on the color of the pineapple and the camera could better capture its information, so all four models performed better in the backlighting scenes. Occlusion and overlap are similar: both involve partial loss of pineapple information in the image, so the detection model must rely on incomplete pineapple pixel information to detect a complete pineapple. Compared with YOLOv4-tiny and YOLOv5s, YOLOv7 and our model had better detection performance in the occluded and overlapping scenes; among them, our model outperformed YOLOv7 in detecting unripe and semi-ripe pineapples in these scenes, while their performance on ripe pineapple was very close. Our model was superior to YOLOv7 in the average detection performance over the three maturities in a variety of complex field scenes. Therefore, our model is more suitable for actual detection tasks.

4. Conclusions

The target detection algorithm based on improved YOLOv7 proposed in this study achieves the efficient detection of pineapple in complex field environments. The method used images of pineapples of different maturities collected in a variety of complex field environments as the data set. By adding an attention mechanism, modifying the original network module, and changing the postprocessing method, the detection speed and accuracy of the model were improved. The P, R, F1, mAP, and average detection time of the trained model on the test set were 94.17%, 89.83%, 91.95%, 95.82%, and 23.81 ms, respectively. Compared with the original YOLOv7 model, P increased by 2.9%, R by 3.41%, F1 by 3.17%, and mAP by 2.71%, while the average detection time increased by 2.98 ms. Compared with YOLOv4-tiny and YOLOv5s, the mAP increased by 9.27% and 4.18%, respectively. This shows that the improvement strategy in this paper could significantly improve the detection accuracy for pineapples of different maturities in complex field environments without significantly affecting the detection speed. In summary, the improved model proposed in this paper meets the accuracy and speed requirements of the pineapple detection task in a complex field environment.
The model proposed in this paper performs well on a personal computer with strong computing performance but cannot be directly deployed on edge devices with limited computing power, and its detection performance still needs improvement in dense and highly occluded scenes. In the future, we will continue to study real-time pineapple detection in dense and highly occluded scenes while reducing the computational requirements of the model to suit edge devices. We will also study three-dimensional positioning technology for pineapple based on a depth camera and achieve efficient automatic picking in a natural environment using a suitable mechanical arm. Classifying pineapple maturity during picking is expected to be another valuable direction, as it would help simplify the subsequent pineapple grading process and reduce grading costs.

Author Contributions

Conceptualization, Y.L.; Methodology, Y.L. and R.J.; Software, Y.L. and T.W.; Validation, Y.L. and H.H.; Formal analysis, Y.C. and H.H.; Investigation, T.W.; Resources, R.M.; Data curation, R.J.; Writing—original draft, Y.L.; Writing—review & editing, R.M.; Visualization, Y.L.; Supervision, Y.C.; Funding acquisition, R.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Science and Technology Planning Project of Guangdong Province of China (grant number 2021B1212040009), which also funded the APC.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, D.J.; Zhang, L.Z.; Li, X.; Li, P.; Wang, T.Y. Design of automatic pineapple harvesting machine based on binocular machine vision. Anhui Agric. Sci. 2019, 13, 207–210. (In Chinese) [Google Scholar]
  2. Gongal, A.; Amatya, S.; Karkee, M.; Zhang, Q.; Lewis, K. Sensors and systems for fruit detection and localization: A review. Comput. Electron. Agric. 2015, 116, 8–19. [Google Scholar] [CrossRef]
  3. Fu, L.S.; Gao, F.F.; Wu, J.Z.; Li, R.; Manoj, K.; Zhang, Q. Application of consumer RGB-D cameras for fruit detection and localization in field: A critical review. Comput. Electron. Agric. 2020, 177, 105687. [Google Scholar] [CrossRef]
  4. Chaivivatrakul, S.; Dailey, M.N. Texture-based fruit detection. Precis. Agric. 2014, 15, 662–683. [Google Scholar]
  5. Wang, C.L.; Lee, W.S.; Zou, X.J.; Choi, D.; Gan, H.; Diamond, J. Detection and counting of immature green citrus fruit based on the Local Binary Patterns (LBP) feature using illumination-normalized images. Precis. Agric. 2018, 19, 1062–1083. [Google Scholar] [CrossRef]
  6. He, Z.L.; Xiong, J.T.; Lin, R.; Zou, X.J.; Tang, L.Y.; Yang, Z.G.; Liu, Z.; Song, G. A method of green litchi recognition in natural environment based on improved LDA classifier. Comput. Electron. Agric. 2017, 140. [Google Scholar] [CrossRef]
  7. Zhao, C.Y.; Lee, W.S.; He, D.J. Immature green citrus detection based on colour feature and sum of absolute transformed difference (SATD) using colour images in the citrus grove. Comput. Electron. Agric. 2016, 124, 243–253. [Google Scholar] [CrossRef]
  8. Liu, X.Y.; Zhao, D.; Jia, W.K.; Ji, W.; Sun, Y.P. A Detection Method for Apple Fruits Based on Color and Shape Features. IEEE Access 2019, 7, 67923–67933. [Google Scholar] [CrossRef]
  9. Sun, S.S.; Jiang, M.; He, D.J.; Long, Y.; Song, H.B. Recognition of green apples in an orchard environment by combining the GrabCut model and Ncut algorithm. Biosyst. Eng. 2019, 187, 201–213. [Google Scholar] [CrossRef]
  10. Vitzrabin, E.; Edan, Y. Adaptive thresholding with fusion using a RGBD sensor for red sweet-pepper detection. Biosyst. Eng. 2016, 146, 45–56. [Google Scholar] [CrossRef]
  11. Girshick, R. Fast R-CNN. In Proceedings of the IEEE International Conference on Computer Vision (ICCV). arXiv 2015, arXiv:1504.08083. [Google Scholar] [CrossRef]
  12. Ren, S.Q.; He, K.M.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [Green Version]
  13. Lucena, F.; Breunig, F.M.; Kux, H. The Combined Use of UAV-Based RGB and DEM Images for the Detection and Delineation of Orange Tree Crowns with Mask R-CNN: An Approach of Labeling and Unified Framework. Future Internet 2022, 14, 275. [Google Scholar] [CrossRef]
  14. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788. [Google Scholar]
  15. Kuznetsova, A.; Maleva, T.; Soloviev, V. Using YOLOv3 Algorithm with Pre- and Post-Processing for Apple Detection in Fruit-Harvesting Robot. Agronomy 2020, 10, 1016. [Google Scholar] [CrossRef]
  16. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  17. Liu, W.; Anguelov, D.; Erhan, D.; Szegedy, C.; Reed, S.; Fu, C.Y.; Berg, A.C. SSD: Single shot multibox detector. In Proceedings of the European Conference on Computer Vision (ECCV). arXiv 2016, arXiv:1512.02325. [Google Scholar]
  18. Zheng, Z.H.; Xiong, J.T.; Lin, H.; Han, Y.L.; Sun, B.X.; Xie, Z.M.; Yang, Z.G.; Wang, C.L. A Method of Green Citrus Detection in Natural Environments Using a Deep Convolutional Neural Network. Front. Plant Sci. 2021, 12, 705737. [Google Scholar] [CrossRef] [PubMed]
  19. Zhou, Z.X.; Song, Z.Z.; Fu, L.S.; Gao, F.F.; Li, R.; Cui, Y.J. Real-time kiwifruit detection in orchard using deep learning on Android™ smartphones for yield estimation. Comput. Electron. Agric. 2020, 179, 105856. [Google Scholar] [CrossRef]
  20. Gai, R.L.; Chen, N.; Yuan, H. A detection algorithm for cherry fruits based on the improved YOLO-v4 model. Neural Comput. Appl. 2021. prepublish. [Google Scholar] [CrossRef]
  21. Tu, S.Q.; Pang, J.; Liu, H.F.; Zhuang, N.; Chen, Y.; Zheng, C.; Wan, H.; Xue, Y.J. Passion fruit detection and counting based on multiple scale faster R-CNN using RGB-D images. Precis. Agric. 2020. prepublish. [Google Scholar] [CrossRef]
  22. Tian, Y.N.; Yang, G.D.; Wang, Z.; Wang, H.; Li, E.; Liang, Z.Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426. [Google Scholar] [CrossRef]
  23. Xu, Z.F.; Jia, R.S.; Sun, H.M.; Liu, Q.M.; Cui, Z. Light-YOLOv3: Fast method for detecting green mangoes in complex scenes using picking robots. Appl. Intell. 2020, 50, 4670–4687. [Google Scholar] [CrossRef]
  24. Fan, Y.C.; Zhang, S.Y.; Feng, K.; Qian, K.C.; Wang, Y.T.; Qin, S.Z. Strawberry Maturity Recognition Algorithm Combining Dark Channel Enhancement and YOLOv5. Sensors 2022, 22, 419. [Google Scholar] [CrossRef] [PubMed]
  25. Lawal, O.M. YOLOMuskmelon: Quest for Fruit Detection Speed and Accuracy Using Deep Learning. IEEE Access 2021, 9, 15221–15227. [Google Scholar] [CrossRef]
  26. Ji, W.; Pan, Y.; Xu, B.; Wang, J.C. A Real-Time Apple Targets Detection Method for Picking Robot Based on ShufflenetV2-YOLOX. Agriculture 2022, 12, 856. [Google Scholar] [CrossRef]
  27. Cui, Z.; Sun, H.M.; Yu, J.T.; Yin, R.N.; Jia, R.S. Fast detection method of green peach for application of picking robot. Appl. Intell. 2021. prepublish. [Google Scholar] [CrossRef]
  28. Yan, B.; Fan, P.; Lei, X.Y.; Liu, Z.J.; Yang, F.Z. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5. Remote Sens. 2021, 13, 1619. [Google Scholar] [CrossRef]
  29. Kang, H.W.; Chen, C. Fruit detection, segmentation and 3D visualisation of environments in apple orchards. Comput. Electron. Agric. 2020, 171, 105302. [Google Scholar] [CrossRef] [Green Version]
  30. Liu, X.; Li, G.; Chen, W.; Liu, B.; Chen, M.; Lu, S. Detection of Dense Citrus Fruits by Combining Coordinated Attention and Cross-Scale Connection with Weighted Feature Fusion. Appl. Sci. 2022, 12, 6600. [Google Scholar] [CrossRef]
  31. Zhang, X.; Gao, Q.; Pan, D.; Cao, P.C.; Huang, D.H. Research on spatial positioning system of fruits to be picked in field based on binocular vision and SSD model. J. Phys. 2021, 1748, 042011. [Google Scholar] [CrossRef]
  32. Liu, T.H.; Nie, X.N.; Wu, J.M.; Zhang, D.; Liu, W.; Cheng, Y.F.; Zheng, Y.; Qiu, J.; Qi, L. Pineapple (Ananas comosus) fruit detection and localization in natural environment based on binocular stereo vision and improved YOLOv3 model. Precis. Agric. 2022, 24, 139–160. [Google Scholar] [CrossRef]
  33. Roy, A.M.; Bhaduri, J. Real-time growth stage detection model for high degree of occultation using DenseNet-fused YOLOv4. Comput. Electron. Agric. 2022, 193, 106694. [Google Scholar] [CrossRef]
  34. Wu, D.L.; Jiang, S.; Zhao, E.L.; Liu, Y.L.; Zhu, H.C.; Wang, W.W.; Wang, R.Y. Detection of Camellia oleifera Fruit in Complex Scenes by Using YOLOv7 and Data Augmentation. Appl. Sci. 2022, 12, 11318. [Google Scholar] [CrossRef]
  35. Wang, C.Y.; Bochkovskiy, A.; Liao, H. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. arXiv 2022, arXiv:2207.02696. [Google Scholar]
  36. Wang, C.Y.; Bochkovskiy, A.; Liao, H. Scaled-YOLOv4: Scaling cross stage partial network. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13024–13033. [Google Scholar]
  37. Ding, X.H.; Zhang, X.Y.; Man, N.N.; Han, J.G.; Ding, G.G.; Sun, J. RepVGG: Making VGG-style ConvNets great again. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 13728–13737. [Google Scholar]
  38. Jiang, T.T.; Cheng, J.Y. Target Recognition Based on CNN with LeakyReLU and PReLU Activation Functions. In Proceedings of the The 2019 IEEE Conference on Sensing, Diagnostics, Prognostics, and Control (SDPC), Beijing, China, 15–17 August 2019; pp. 718–722. [Google Scholar]
  39. Ge, Z.; Liu, S.T.; Wang, F.; Li, Z.; Sun, J. YOLOX: Exceeding YOLO series in 2021. arXiv 2021, arXiv:2107.08430. [Google Scholar]
  40. Yang, L.; Zhang, R.Y.; Li, L.; Xie, X. SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks. In Proceedings of the International Conference on Machine Learning (ICML), Virtual Event, 18–24 July 2021; pp. 11863–11874. [Google Scholar]
  41. Santana, A.; Colombini, E. Neural Attention Models in Deep Learning: Survey and Taxonomy. arXiv 2021, arXiv:2112.05909. [Google Scholar]
  42. Bodla, N.; Singh, B.; Chellappa, R.; Davis, L.S. Soft-NMS: Improving object detection with one line of code. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 5562–5570. [Google Scholar]
Figure 1. Image acquisition equipment.
Figure 2. Examples of pineapple images: (a) single pineapple; (b) exposure; (c) occlusion; (d) multiple pineapples; (e) backlight; (f) overlap.
Figure 3. Image augmentation: (a) original image; (b) rotating; (c) flipping; (d) color; (e) brightness; (f) noise.
Figure 4. The network structure of the original YOLOv7.
Figure 5. Principle of a simple and parameter-free attention module.
Figure 6. Convolution process with a stride of two.
Figure 7. Schematic diagram of the MPConv improvement.
Figure 8. The network structure of the improved YOLOv7.
Figure 9. Flow diagram of pineapple detection in the field.
Figure 10. The training result: (a) mAP; (b) loss.
Figure 11. Comparison of the detection results of the four models on pineapples. (a) YOLOv4-Tiny; (b) YOLOv5s; (c) YOLOv7; (d) Ours.
Figure 12. Comparison bar graphs of the AP of the four models for pineapple with different maturities.
Figure 13. Comparison of the detection performance of the four models in different field environments. (a) Detection precision of unripe pineapples; (b) Detection precision of semi-ripe pineapples; (c) Detection precision of ripe pineapples; (d) Average detection of pineapples.
Table 1. Performance of related work.

| Model | Detection Target | Precision (%) | Recall (%) | F1 Score (%) | mAP (%) | Average Detection Time (ms) |
|---|---|---|---|---|---|---|
| Improved YOLOv3 [32] | pineapple | 94.45 | 88.48 | 91.38 | - | - |
| Dense-YOLOv4 [33] | mango | 91.45 | 95.87 | 93.61 | 96.20 | 22.62 |
| YOLO BP [18] | citrus | - | 91.00 | - | 91.55 | 55.56 |
| Improved YOLOv5 [28] | apple | 83.83 | 91.48 | 87.49 | 86.75 | 15.00 |
| ShufflenetV2-YOLOX [26] | apple | 95.62 | 93.75 | - | 96.76 | 15.38 |
| DA-YOLOv7 [34] | Camellia oleifera fruit | 94.76 | 95.54 | 95.15 | 96.03 | 25.00 |
Table 2. The details of the pineapple data set.

| Data Set | Image Resolution (Pixels) | Number of Images |
|---|---|---|
| Training set | 1280 × 720 | 3472 |
| Verification set | 1280 × 720 | 434 |
| Test set | 1280 × 720 | 434 |
Table 3. Ablation test (✓ indicates the strategy is applied).

| SimAM | F-MPConv | Soft-NMS | mAP (%) | R (%) | Average Detection Time (ms) |
|---|---|---|---|---|---|
| | | | 93.11 | 86.42 | 20.83 |
| ✓ | | | 95.48 | 85.75 | 25.00 |
| ✓ | ✓ | | 95.97 | 87.24 | 23.25 |
| ✓ | ✓ | ✓ | 95.82 | 89.83 | 23.81 |
Table 4. Comparison of the proposed network with the other networks (best results in bold).

| Model | P (%) | R (%) | F1 (%) | mAP (%) | Average Detection Time (ms) |
|---|---|---|---|---|---|
| YOLOv4-tiny | 83.52 | 82.77 | 83.14 | 86.55 | 24.39 |
| YOLOv5s | 88.95 | 85.14 | 87.01 | 91.64 | **19.60** |
| YOLOv7 | 91.27 | 86.42 | 88.78 | 93.11 | 20.83 |
| Ours | **94.17** | **89.83** | **91.95** | **95.82** | 23.81 |
Table 5. The comparison results of the proposed model and YOLOv7 in detecting pineapple under different maturities.

| Maturity | Model | Ground Truth Count | Correctly Identified (Amount) | Correctly Identified (Rate, %) | Falsely Identified (Amount) | Falsely Identified (Rate, %) |
|---|---|---|---|---|---|---|
| Unripe | YOLOv7 | 64 | 53 | 82.81 | 5 | 7.81 |
| Unripe | Ours | 64 | 57 | 89.06 | 2 | 3.12 |
| Semi-ripe | YOLOv7 | 73 | 60 | 82.19 | 3 | 4.10 |
| Semi-ripe | Ours | 73 | 66 | 90.41 | 1 | 1.37 |
| Ripe | YOLOv7 | 75 | 68 | 90.66 | 2 | 2.66 |
| Ripe | Ours | 75 | 69 | 92.00 | 1 | 1.33 |
