Article

A Detection Model for Cucumber Root-Knot Nematodes Based on Modified YOLOv5-CMS

1 School of Information Science and Technology, Hebei Agricultural University, Baoding 071001, China
2 National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
3 Hebei Key Laboratory of Agricultural Big Data, Baoding 071001, China
4 Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences, Beijing 100081, China
* Author to whom correspondence should be addressed.
Agronomy 2022, 12(10), 2555; https://doi.org/10.3390/agronomy12102555
Submission received: 5 September 2022 / Revised: 7 October 2022 / Accepted: 17 October 2022 / Published: 19 October 2022

Abstract: The development of resistant cucumber varieties is of great importance for reducing the production loss caused by root-knot nematodes. After cucumber plants are infected with root-knot nematodes, their roots swell into spherical bumps. Rapid and accurate detection of the infected sites and assessment of the disease severity play a key role in selecting resistant cucumber varieties. Because the locations and sizes of the spherical bumps formed after different degrees of infection are random, the currently available detection and counting methods based on manual operation are extremely time-consuming and labor-intensive, and are prone to human error. In response to these problems, this paper proposes a cucumber root-knot nematode detection model based on a modified YOLOv5s model (i.e., YOLOv5s-CMS) to support the breeding of resistant cucumber varieties. In the proposed model, a dual attention module (CBAM-CA) was adopted to enhance the model's ability to extract key features, the K-means++ clustering algorithm was applied to optimize the selection of the initial cluster centers, which effectively improved the model's performance, and a novel bounding box regression loss function (SIoU) was used to fuse the direction information between the ground-truth box and the predicted box so as to improve the detection precision. The experiment results show that the recall (R) and mAP of the YOLOv5s-CMS model were improved by 3% and 3.7%, respectively, compared to the original YOLOv5s model, which means it can achieve a better performance in cucumber root-knot nematode detection. This study provides an effective method for obtaining more intuitive and accurate data sources during the breeding of cucumber varieties resistant to root-knot nematodes.

1. Introduction

Plant-parasitic nematodes are among the major pathogens causing invasive plant diseases, and root-knot nematodes are an important class of plant-parasitic nematodes that seriously threaten the production of vegetables, fruit trees and other crops around the world. According to incomplete statistics, the annual economic loss caused by root-knot nematode diseases exceeds 100 billion US dollars worldwide, and in China alone, the annual loss is about 3 billion US dollars [1]. In recent years, the cucumber planting area has been continuously increasing in China. Moreover, the large-scale promotion of solar greenhouses, the increase in the multiple cropping index, and the intensification of continuous cropping have created a favorable environment for the occurrence and development of root-knot nematode diseases. Consequently, nematodes are accumulating and proliferating rapidly in the soil, making them an important factor endangering cucumber production in China.
The breeding of cucumber varieties with better resistance is an effective measure to combat root-knot nematode diseases [1]. In the breeding process, it is necessary to evaluate the disease resistance of different cucumber cultivars by observing the visual characteristics of their roots. This evaluation requires the detection, counting and classification of the spherical bumps formed by root-knot nematode infection on a huge number of plant roots. At present, this task mainly relies on manual operation, which is not only time-consuming and labor-intensive, but also prone to subjective judgment error. As an alternative, deep-learning-based target detection models may be used to detect and locate the infection sites in order to reduce labor consumption while improving the detection efficiency and accuracy.
Following its rapid development in the past decade, deep learning has made remarkable achievements in many fields such as the classification, detection, segmentation and generation of computer vision targets. As expected, deep learning has also been widely applied in the field of high-throughput plant phenotyping. For example, in terms of plant root analysis, a convolutional neural network (CNN)-based method was proposed to monitor soybean root growth over time without destructive excavation of the soil [2], which realized the automatic segmentation and length estimation of soybean roots. Kang et al. [3] proposed a semantic segmentation model for cotton root systems by incorporating an attention mechanism into DeepLabv3+. By assigning higher weights to the pixels of fine roots and their root hairs, this method realized the segmentation of a cotton root system from the complex soil background. Smith et al. successfully segmented roots from the soil in RGB images by using the classic U-Net model, but this network could only make predictions at a single scale and was not able to handle variations in root size [4]. The encoder-decoder-based CNN architecture is able to fuse local features with global features. In addition to the segmentation of plant roots [5], this architecture has also been widely used for the segmentation of other plant structures [6,7,8].
In terms of root disease detection, Ostovar et al. realized the detection and classification of root and butt-rot (RBR) in Norway spruce stumps from RGB images based on machine learning [9]. In order to detect and quantify tomato root-knot nematode infection, Pun et al. [10] developed a semi-automatic method based on the size of the target by applying image analysis techniques. Mazurkiewicz et al. proposed a semi-automated image analysis method, based on a Leica M205C stereomicroscope, to assess nematode biomass in marine sediments [11]. Evangelisti et al. designed a CNN to detect mycorrhizal fungi in root images [12]. Yasrab et al. [5] proposed a new root image analysis method powered by a deep neural network (RootNav 2.0), which provided the ability to analyze root systems at scale. The use of deep learning in root disease detection has thus become a new research paradigm and a general means of detecting roots in the field of pest and disease control.
From the literature review above, it can be seen that deep learning has demonstrated great potential and utility in root disease detection. In the breeding of cucumber varieties resistant to root-knot nematodes, the localization and counting of infected sites consume a great amount of time and energy from agricultural and plant protection experts. In order to rapidly and accurately detect and count the infected sites and reduce the need for human labor, this paper proposes a cucumber root-knot nematode detection method based on a modified YOLOv5s model (i.e., YOLOv5s-CMS), which shows a better performance. The main contributions of this paper are as follows:
(1) A cucumber root-knot nematode image dataset was constructed, which provided data support for the training and validation of the root-knot nematode detection model.
(2) The dual-attention mechanism CBAM-CA was adopted to allow the model to focus on key regions of the target, which improved the model's ability to capture distinguishable features from small targets and enhanced the overall detection performance.
(3) The following methods were used to improve the model performance: the K-means++ algorithm was used to help the initial clustering centers jump out of local optima and reach the global optimum; the SIoU loss function was used to replace the original CIoU function in order to fully consider the influence of the vector direction between the ground-truth box and the predicted box and thereby speed up model convergence.

2. Materials and Methods

2.1. Image Acquisition

The samples of cucumber root-knot nematodes were mainly obtained from the Langfang Experimental Base of the Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences. The image acquisition process was as follows. Firstly, each root was dug out with the main root system at the center. Secondly, after the soil and debris were washed away with water, absorbent paper was used to gently remove the surface moisture. Thirdly, the roots were naturally unfolded and placed on a well-lit black background for photography. The roots of diseased cucumbers were photographed at a distance of 30–50 cm using a digital single-lens reflex (DSLR) camera to ensure that the target area was fully displayed within the image. A Canon EOS 60D camera was used, with the ISO set to 1000 and automatic focus (AF) enabled. The dataset contained root images of nematode-infected cucumber plants accumulated over the past 3 years. After screening the original images, 391 images containing the target were obtained, each with a resolution of 2592 × 1728 (see Figure 1 for examples).

2.2. Image Processing

2.2.1. Image Preprocessing

By manually cropping the 391 images, 686 images containing root-knot nematode characteristics were obtained. These 686 images were then resized to 640 × 640 to meet the model's requirements for input image size. We used the LabelImg tool to annotate the 686 images, labeling each nematode-infected area as a root-knot nematode target. See Figure 2b for an example: the red box indicates the position of the first infection label, and the green box indicates the position of the second infection label. An XML file was then generated for each image, containing the category name and the location of each target rectangle, as shown in Figure 2c. Plant protection experts familiar with root-knot nematode control were responsible for the labeling, to ensure the accuracy of infection area identification.
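As a minimal illustration of how such annotations are consumed downstream, the sketch below parses a Pascal-VOC-style XML file using Python's standard library. The file name and the exact tag layout follow common VOC conventions and are assumptions for illustration, not an excerpt from the authors' pipeline.

```python
import xml.etree.ElementTree as ET

def read_voc_boxes(xml_path):
    """Return (class_name, xmin, ymin, xmax, ymax) tuples from a VOC XML file."""
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):
        name = obj.find("name").text  # e.g., "root-knot nematode"
        bb = obj.find("bndbox")
        boxes.append((name,
                      int(bb.find("xmin").text), int(bb.find("ymin").text),
                      int(bb.find("xmax").text), int(bb.find("ymax").text)))
    return boxes

# Hypothetical usage: boxes = read_voc_boxes("root_0001.xml")
```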

2.2.2. Image Augmentation

A deep learning model is required to learn features of the target from a large number of samples. In order to provide the amount of data needed for training a deep learning target detection network, increase data diversity and reduce overfitting during training, data augmentation methods were used in this paper to expand the dataset. Taking into account the influence of shooting angle and lighting conditions in the image acquisition process, the following data augmentation methods were used (a code sketch follows): (1) Random rotation: the original image is randomly rotated by an angle within (−90°, 90°); (2) Random brightness augmentation: the image brightness is randomly adjusted within (0.7, 1.3); (3) Random color augmentation: the image color is randomly adjusted within (1, 1.3); (4) Random contrast augmentation: the image contrast is randomly adjusted within (1, 1.3). Using these data augmentation methods, the dataset was expanded to 3430 images. All images were divided into a training set (2744 images), a validation set (343 images), and a test set (343 images) according to the ratio of 8:1:1. Finally, the dataset was converted into the VOC2007 format to meet the experiment requirements of YOLOv5.
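A minimal sketch of these four operations using Pillow follows; whether the authors applied the operations jointly or independently, and the exact rotation convention, are not specified, so this is an assumption-laden illustration (rotating an image also requires transforming its bounding boxes, which is omitted here).

```python
import random
from PIL import Image, ImageEnhance

def augment(img: Image.Image) -> Image.Image:
    # (1) Random rotation within (-90, 90) degrees
    img = img.rotate(random.uniform(-90, 90), expand=True)
    # (2) Random brightness factor within (0.7, 1.3)
    img = ImageEnhance.Brightness(img).enhance(random.uniform(0.7, 1.3))
    # (3) Random color factor within (1, 1.3)
    img = ImageEnhance.Color(img).enhance(random.uniform(1.0, 1.3))
    # (4) Random contrast factor within (1, 1.3)
    img = ImageEnhance.Contrast(img).enhance(random.uniform(1.0, 1.3))
    return img

# Hypothetical usage: augmented = augment(Image.open("root_0001.jpg"))
```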

3. YOLOv5s Target Detection Algorithm

The YOLO series detection networks are an important branch of target detection models, and their applications in agricultural target detection are becoming increasingly extensive [13,14,15]. Among the YOLO series, the YOLOv5s algorithm adopts the smallest possible network depth and feature map width to ensure a small weight file and a fast detection speed. The network structure of YOLOv5 is shown in Figure 3. Under the premise of ensuring target detection precision, YOLOv5s allows mobile devices to achieve real-time target detection [16,17]. Therefore, the YOLOv5s network was chosen as the basic network for modification in this paper.
YOLOv5s is a classic one-stage target detection algorithm. Its network structure can be divided into four parts: Input, Backbone, Neck and Prediction.

3.1. Adaptive Anchor Box Calculation

At the input stage, a predicted box is output based on the initial anchor box. The predicted box is compared with the ground-truth box to calculate the difference between the two, and the network parameters are then updated through backpropagation. During each training run of YOLOv5s, the adaptive anchor box calculation identifies and updates the optimal anchor box values for the given training set. However, this method requires a large number of training iterations. In order to save time and improve the detection performance of the model, the K-means++ clustering method was applied in this paper to generate the anchor boxes.

3.2. Backbone

The Backbone is the feature extraction network; its most critical role is to extract features from the image. YOLOv5s uses a convolutional layer with a kernel size of 6, a stride of 2, and a padding of 2 as the first layer of the network. Compared with the conventional Focus layer, this improves the detection speed while also enhancing the portability of the model. The Backbone is composed of multiple CBL modules and C3 modules. The CBL module consists of a convolution layer, a BN layer and the SiLU activation function. This combination reduces the numerical operations of the BN layer, thereby improving detection efficiency, as shown in Figure 4a. The C3 module has two branches: one formed by Bottleneck layers and a convolution layer, and the other formed by a convolution layer only. The final output of C3 is the fusion of the two branches, as shown in Figure 4b.
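As a concrete reference, a minimal PyTorch sketch of the CBL block described above follows; the 32 output channels in the usage line are a hypothetical choice, not a value taken from the paper.

```python
import torch.nn as nn

class CBL(nn.Module):
    """Conv + BatchNorm + SiLU, the basic Backbone block described above."""
    def __init__(self, c_in, c_out, k=1, s=1, p=0):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, p, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x):
        return self.act(self.bn(self.conv(x)))

# First layer as described in the text: kernel size 6, stride 2, padding 2
stem = CBL(3, 32, k=6, s=2, p=2)
```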

3.3. Neck

The feature pyramid in the Neck further improves the model's feature extraction ability. The role of the Neck is to fuse the feature maps extracted by the Backbone so as to integrate contextual information and reduce feature loss, which is very useful for improving the model's overall performance. In the process of feature fusion, the Neck adopts the FPN+PAN feature pyramid structure, which fuses low-level spatial features and high-level semantic features in a bidirectional manner to enhance the detection ability for targets of different scales. The feature pyramid has different resolutions at different scales, so that targets of different sizes can be properly represented at their corresponding scales.

4. Modified YOLOv5s Model

4.1. The Embedded Dual Attention Mechanism CBAM-CA

Inspired by the fact that the human visual system is able to find important regions in complex scenes naturally and efficiently, the attention mechanism has been introduced into computer vision systems [18]. The attention mechanism does not rely on manual labeling and can be used as an effective, weakly supervised learning method. The essence of the attention mechanism is to assign different weights to different parts of interest in the model, which helps the model extract key information and make more accurate judgments. During dataset labeling, it was found that the target regions of cucumber root-knot nematodes were highly similar to other regions of the root, so the distinguishability of the target was low. This requires the model to learn the features of key regions as much as possible while ignoring the features of non-critical regions. In this regard, the attention mechanism can help the model focus on important features of the target and ignore non-important features [19]. Therefore, introducing a suitable attention mechanism is an effective means of improving the model's ability to capture key features of the target. However, existing attention modules generally have two problems. First, many of them neglect either channel information or spatial information, as well as the interaction between the two; they therefore lack the ability to infer attention weights along the channel and spatial dimensions separately and multiply them with the original feature map to adaptively adjust the features. Second, in terms of channel attention, existing modules usually ignore location information, which is actually very important for generating spatially selective attention maps. In order to solve these two problems, this paper proposes a dual attention mechanism module that concatenates the CA attention mechanism [20] after the CBAM attention module [21], enabling the model to attend not only to the key features in the spatial dimension but also to the location information via the channel attention module. This is very meaningful for improving the performance of the model.
The targets of cucumber root-knot nematodes are generally small and mostly spherical in shape. The shape of the target is the key feature distinguishing it from other root regions. Therefore, a higher weight should be assigned to the channels focusing on this feature, while the weights of other channels should be reduced appropriately. This can be achieved by the channel attention mechanism in CBAM-CA, which is of great significance in helping the model extract important features of the cucumber root-knot nematode target (the green box in Figure 5 illustrates the channel attention module). The feature map is input into the channel attention module, and the channel attention weights are obtained through a multi-layer perceptron (MLP) and applied to the input feature map, as shown in Equation (1).
$M_c(F) = \sigma\left(W_1(W_0(F^c_{avg})) + W_1(W_0(F^c_{max}))\right)$ (1)
The nematodes move very slowly. Without the help of external factors, they can only move less than one meter a year. Therefore, the cucumber root-knot nematode targets are characterized by a clustered distribution pattern. In order to reduce missed detections and false detections due to the dense distribution of the targets, the model is required to capture the location information of the target, while the spatial attention module is an effective way to achieve this goal. The spatial attention mechanism in CBAM is able to determine the location of key feature aggregation, which can strengthen the model’s attention to the target in the spatial dimension (the red box in Figure 5 illustrates the spatial attention module). The spatial attention module first performs a pooling operation to change the feature dimension, and then carries out a 7 × 7 convolution operation. At last, it standardizes the feature and fuses it with the feature map directly output by the model to complete the recalibration of the feature map of the two dimensions. This process aims to achieve the purpose of obtaining the spatial features of the target, as shown in Equation (2).
$M_s(F) = \sigma\left(f^{7 \times 7}([F^s_{avg}; F^s_{max}])\right)$ (2)
where $F$ refers to the feature map; $F_{avg}$ and $F_{max}$ refer to global average pooling and global max pooling, respectively; $f^{7 \times 7}$ represents a convolution operation with a $7 \times 7$ kernel; and $\sigma$ is the sigmoid function.
Some of the cucumber nematode targets have inconspicuous features and small sizes, and are therefore difficult to detect. This places a higher requirement on the model's ability to extract key features. A good solution to this problem is to embed location information into the channel attention module and retain the accurate location information of weak and small targets. In this regard, the coordinate attention (CA) mechanism achieves exactly this purpose. While extracting channel features, it also captures direction and location information, allowing the neural network to acquire more critical features without incurring a heavy computational overhead (the blue box in Figure 5 illustrates the CA module). The CA module performs pooling along both the horizontal and vertical directions, then encodes the spatial information and finally fuses the information in a weighted manner on the channels. The decomposed pooling is shown in Equations (3) and (4).
$z^h_c(h) = \frac{1}{W}\sum_{0 \le j < W} u_c(h, j)$ (3)

$z^w_c(w) = \frac{1}{H}\sum_{0 \le i < H} u_c(i, w)$ (4)

Equations (3) and (4) give the outputs of the $c$-th channel encoded along height $h$ and width $w$ of the input feature map $U$, where $u_c(h, j)$ is the value of the $c$-th channel at height $h$ and width $j$, and $u_c(i, w)$ is the value of the $c$-th channel at height $i$ and width $w$.
Then, location attention aggregation is performed to utilize the location information generated by the encoding in both directions; the two encoded maps are concatenated in the spatial dimension through a Concatenate operation, as shown in Equation (5).
$f = \delta\left(F_1([z^h, z^w])\right)$ (5)

where $[z^h, z^w]$ represents the Concatenate operation; $\delta$ is the nonlinear activation function; $f$ is the intermediate feature map of the location information; and $F_1$ is a $1 \times 1$ convolutional transform.
The feature map $f$ is then decomposed into two tensors along the spatial directions, $f^h$ and $f^w$, which are further transformed by convolution operations into two tensors $g^h$ and $g^w$ with the same number of channels as the input.
$g^h = \sigma(F_2(f^h))$ (6)

$g^w = \sigma(F_3(f^w))$ (7)
Finally, the output of CA can be expressed in the form of Equation (8).
$y_c(i, j) = u_c(i, j) \times g^h_c(i) \times g^w_c(j)$ (8)
where $g^h_c(i)$ is the horizontal attention weight of the $c$-th channel at height $i$; $g^w_c(j)$ is the vertical attention weight of the $c$-th channel at width $j$; and $y_c(i, j)$ is the value of the $c$-th channel at coordinates $(i, j)$ of the output feature map.
The CBAM-CA dual attention mechanism is flexible and lightweight. It not only takes into account the inter-channel relationship and the location information, but also strengthens the connections between important features of the target in both the channel and spatial dimensions. This is helpful for the YOLOv5s model to extract key features of the target. The experiment results indicate that the CBAM-CA mechanism can improve the performance of dense prediction and small target detection to varying degrees. In this paper, some target regions of cucumber root-knot nematode are characterized by a high distribution rate, high density and small target size. Therefore, the CBAM-CA mechanism is meaningful in improving the model performance. Owing to the advantages of flexibility, modularity and light weight, the CBAM-CA dual attention mechanism can better meet the requirements of model modification. Figure 6 shows the diagram of the Backbone structure, which is incorporated with the CBAM-CA mechanism.
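To make the data flow concrete, the following is one plausible PyTorch sketch of the CBAM-CA chain as described by Equations (1)-(8): channel attention, then spatial attention, then coordinate attention. The reduction ratios, the Hardswish nonlinearity used for $\delta$, and the module boundaries are assumptions drawn from the original CBAM and CA papers, not details reported by the authors.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Eq. (1): sigmoid(MLP(AvgPool(F)) + MLP(MaxPool(F)))."""
    def __init__(self, c, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(c, c // reduction, 1, bias=False),
            nn.ReLU(),
            nn.Conv2d(c // reduction, c, 1, bias=False),
        )

    def forward(self, x):
        avg = self.mlp(x.mean(dim=(2, 3), keepdim=True))
        mx = self.mlp(x.amax(dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)

class SpatialAttention(nn.Module):
    """Eq. (2): sigmoid(conv7x7([AvgPool(F); MaxPool(F)]))."""
    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, 7, padding=3, bias=False)

    def forward(self, x):
        avg = x.mean(dim=1, keepdim=True)
        mx = x.amax(dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))

class CoordAttention(nn.Module):
    """Eqs. (3)-(8): directional pooling, shared transform, per-direction gates."""
    def __init__(self, c, reduction=32):
        super().__init__()
        mid = max(8, c // reduction)
        self.f1 = nn.Sequential(nn.Conv2d(c, mid, 1), nn.BatchNorm2d(mid), nn.Hardswish())
        self.f2 = nn.Conv2d(mid, c, 1)  # height branch, Eq. (6)
        self.f3 = nn.Conv2d(mid, c, 1)  # width branch, Eq. (7)

    def forward(self, x):
        n, c, h, w = x.shape
        zh = x.mean(dim=3, keepdim=True)                      # Eq. (3): (n, c, h, 1)
        zw = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # Eq. (4): (n, c, w, 1)
        f = self.f1(torch.cat([zh, zw], dim=2))               # Eq. (5)
        fh, fw = torch.split(f, [h, w], dim=2)
        gh = torch.sigmoid(self.f2(fh))                       # Eq. (6)
        gw = torch.sigmoid(self.f3(fw.permute(0, 1, 3, 2)))   # Eq. (7)
        return x * gh * gw                                    # Eq. (8)

class CBAMCA(nn.Module):
    """CBAM (channel + spatial) followed by coordinate attention."""
    def __init__(self, c):
        super().__init__()
        self.channel = ChannelAttention(c)
        self.spatial = SpatialAttention()
        self.coord = CoordAttention(c)

    def forward(self, x):
        x = x * self.channel(x)
        x = x * self.spatial(x)
        return self.coord(x)
```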

4.2. K-Means++ Customized Anchor Box

The YOLOv5s algorithm adopts K-means clustering to cluster the bounding boxes in the dataset to obtain an appropriate anchor box size. This method is simple and efficient, but its selection of the initial clustering center is random. Moreover, it is prone to falling into a local optimal solution, and thereby unable to obtain the global optimal solution. With respect to this problem, the K-means++ clustering algorithm can be used to optimize the selection method for the initial clustering center for the purpose of saving time and cost, improving model performance and avoiding detection error. On the basis of analysis of the image and target sizes, the K-means++ algorithm was used to replace the K-means algorithm in YOLOv5s, to make sure that the model could generate an anchor box size suitable for the dataset used in this study, instead of the initial one. The main steps of this algorithm are as follows:
(1) A cluster center point $C_1$ is randomly selected from the dataset, and the distance between each target $x_i$ and the nearest existing center is calculated, denoted as $L(x_i)$. The probability that a target is selected as the next cluster center is then $\frac{L(x_i)^2}{\sum_{i=1}^{n} L(x_i)^2}$, and the roulette-wheel method is applied to select the next cluster center according to this probability. This step is repeated until $N$ cluster center points have been determined.
(2) The distances between the remaining targets and the $N$ cluster center points are calculated, and each remaining sample is assigned to the subset of the center point with the smallest distance.
(3) The center point of each subset is recalculated.
(4) Steps (2) and (3) are repeated until the cluster center points no longer move.
The sizes of the anchor boxes generated by the K-means++ clustering algorithm are presented in Table 1.
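A compact NumPy sketch of the steps above is given below, operating on the (width, height) pairs of the labeled boxes. The Euclidean distance metric, the iteration cap, and sorting by area are assumptions for illustration; YOLO-style anchor clustering often substitutes a 1 − IoU distance instead.

```python
import numpy as np

def kmeans_pp_anchors(wh, k=9, iters=100, seed=0):
    """Cluster (w, h) box sizes with K-means++ seeding; wh is an (N, 2) array."""
    rng = np.random.default_rng(seed)
    # K-means++ seeding: pick each new center with probability
    # proportional to L(x)^2, i.e., the roulette-wheel step above
    centers = [wh[rng.integers(len(wh))]]
    for _ in range(1, k):
        d2 = np.min(((wh[:, None] - np.array(centers)[None]) ** 2).sum(-1), axis=1)
        centers.append(wh[rng.choice(len(wh), p=d2 / d2.sum())])
    centers = np.array(centers, dtype=float)
    # Standard assignment/update iterations (steps (2)-(4))
    for _ in range(iters):
        assign = np.argmin(((wh[:, None] - centers[None]) ** 2).sum(-1), axis=1)
        new = np.array([wh[assign == j].mean(0) if np.any(assign == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return centers[np.argsort(centers.prod(1))]  # sorted by area, small to large

# Hypothetical usage: anchors = kmeans_pp_anchors(box_wh_array, k=9)
```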

4.3. Improved Loss Function

In neural network training, calculating the difference between the predicted value and the ground-truth value after each round of iteration is the key to ensuring that the model will be continuously optimized toward convergence. The loss function is an important means to achieve this goal. The YOLOv5s model uses the CIoU_Loss function as the loss function of the bounding box, which comprehensively takes into account the distance, the overlap rate, the box scale and the cost factor between the ground-truth box and the predicted box, making the regression of the target bounding box more stable [22].
The equation of CIoU is as follows:
$CIoU = IoU - \frac{\rho^2(b, b^{gt})}{c^2} - \alpha v$ (9)
where $\rho(b, b^{gt})$ is the Euclidean distance between the center point of the predicted box and that of the ground-truth box; $c$ is the diagonal length of the smallest enclosing rectangle that contains both the predicted box and the ground-truth box; and $\alpha v$ is the cost factor used to make the regression more stable.
The equation of CIoU Loss is as follows:
$L_{CIoU} = 1 - CIoU$ (10)
The targets in the cucumber root-knot nematode dataset are characterized by a large quantity, small bounding boxes and dense distribution. In regions with densely distributed targets, the model may produce false localizations. Because CIoU does not take into account the direction mismatch between the predicted box and the ground-truth box, the predicted box may have a high degree of freedom, the convergence between the predicted box and the ground-truth box may be slow, or the precision may be low. Consequently, the final prediction quality of the model in densely distributed regions is negatively affected. Fusing the directional information between the predicted box and the ground-truth box into the loss function is an effective way to avoid these problems. In this paper, SIoU was adopted as the loss function; it fully considers the influence of the vector direction between the predicted box and the ground-truth box in order to increase the moving speed of the predicted box and reduce the loss in degrees of freedom. Thus, the predicted box converges in a specific direction, and the model's detection speed and precision are improved.
By considering the vector angle between the required regressions, redefining the cost metrics, and adding an angle cost, SIoU [23] effectively reduces the loss in degrees of freedom. The SIoU loss function consists of four cost functions, namely, the angle cost, distance cost, shape cost and IoU cost. Firstly, the angle cost function $\Lambda$ is defined as follows:
$\Lambda = 1 - 2\sin^2\left(\arcsin(x) - \frac{\pi}{4}\right)$ (11)
where $x$ is the sine of the angle $\alpha$ formed between the line connecting the center points of the predicted box and the ground-truth box and the nearer coordinate axis, so that $\alpha \le \pi/4$. Using trigonometric functions to control the direction between the predicted box and the ground-truth box is simple and efficient, as it minimizes the number of distance-related variables, as shown in Figure 7.
To adapt to the angle cost defined above, the redefined distance cost function ∆ is as follows:
$\Delta = \sum_{t=x,y}\left(1 - e^{-\gamma \rho_t}\right)$ (12)
where $\gamma = 2 - \Lambda$. It can be seen that when $\Lambda$ approaches 0, the contribution of the distance cost is greatly reduced; in this way, the weight of the distance cost is adjusted dynamically by the angle cost during training.
The shape cost function Ω is as follows:
$\Omega = \sum_{t=w,h}\left(1 - e^{-\omega_t}\right)^{\theta}$ (13)
where $\omega_w = \frac{|w - w^{GT}|}{\max(w, w^{GT})}$ and $\omega_h = \frac{|h - h^{GT}|}{\max(h, h^{GT})}$; $(w, h)$ and $(w^{GT}, h^{GT})$ are the width and height of the predicted box and the ground-truth box, respectively; and $\theta$ controls the degree of attention paid to the shape cost. The IoU cost is defined as follows:
$L_{IoUCost} = 1 - IoU$ (14)
The final SIoU loss function is as follows:
$L_{SIoU} = 1 - IoU + \frac{\Delta + \Omega}{2}$ (15)
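For concreteness, the sketch below assembles Equations (11)-(15) into a batched PyTorch loss. The (x1, y1, x2, y2) box format, the numerical-stability epsilons, and $\theta = 4$ are illustrative assumptions (the SIoU paper recommends $\theta$ between 2 and 6); this is not the authors' implementation.

```python
import math
import torch

def siou_loss(pred, gt, theta=4.0, eps=1e-7):
    """SIoU loss per Eqs. (11)-(15); pred and gt are (..., 4) boxes (x1, y1, x2, y2)."""
    # IoU term
    ix1 = torch.max(pred[..., 0], gt[..., 0]); iy1 = torch.max(pred[..., 1], gt[..., 1])
    ix2 = torch.min(pred[..., 2], gt[..., 2]); iy2 = torch.min(pred[..., 3], gt[..., 3])
    inter = (ix2 - ix1).clamp(min=0) * (iy2 - iy1).clamp(min=0)
    w1, h1 = pred[..., 2] - pred[..., 0], pred[..., 3] - pred[..., 1]
    w2, h2 = gt[..., 2] - gt[..., 0], gt[..., 3] - gt[..., 1]
    iou = inter / (w1 * h1 + w2 * h2 - inter + eps)
    # Centers and enclosing-box width/height
    cx1, cy1 = (pred[..., 0] + pred[..., 2]) / 2, (pred[..., 1] + pred[..., 3]) / 2
    cx2, cy2 = (gt[..., 0] + gt[..., 2]) / 2, (gt[..., 1] + gt[..., 3]) / 2
    cw = torch.max(pred[..., 2], gt[..., 2]) - torch.min(pred[..., 0], gt[..., 0])
    ch = torch.max(pred[..., 3], gt[..., 3]) - torch.min(pred[..., 1], gt[..., 1])
    # Angle cost, Eq. (11): x is the sine of the angle of the center line
    # (the formula is symmetric about pi/4, handling the nearer-axis case)
    sigma = torch.sqrt((cx2 - cx1) ** 2 + (cy2 - cy1) ** 2) + eps
    sin_alpha = (torch.abs(cy2 - cy1) / sigma).clamp(-1 + eps, 1 - eps)
    angle = 1 - 2 * torch.sin(torch.arcsin(sin_alpha) - math.pi / 4) ** 2
    # Distance cost, Eq. (12), with gamma = 2 - angle
    gamma = 2 - angle
    rho_x = ((cx2 - cx1) / (cw + eps)) ** 2
    rho_y = ((cy2 - cy1) / (ch + eps)) ** 2
    dist = (1 - torch.exp(-gamma * rho_x)) + (1 - torch.exp(-gamma * rho_y))
    # Shape cost, Eq. (13)
    om_w = torch.abs(w1 - w2) / torch.max(w1, w2).clamp(min=eps)
    om_h = torch.abs(h1 - h2) / torch.max(h1, h2).clamp(min=eps)
    shape = (1 - torch.exp(-om_w)) ** theta + (1 - torch.exp(-om_h)) ** theta
    # Eq. (15)
    return 1 - iou + (dist + shape) / 2
```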

5. Experiment and Analysis

5.1. Model Training

5.1.1. Experiment Environment

The configuration of the experiment platform was as follows: operating system: 64-bit Ubuntu 18.04 LTS (Linux); CPU: Intel® Core(TM) i9-9820X @ 3.00 GHz; memory: 64 GB; GPU: Nvidia GeForce RTX 2080 Ti; development environment: Python 3.8, PyTorch 1.9.0 and CUDA 10.1.

5.1.2. Parameter Settings

In order to ensure the fairness and accuracy of the training results, all the models were trained and tested with the same parameter settings (see Table 2).

5.2. Evaluation Indicators

For YOLOv5 models, the prediction results were filtered by setting a confidence threshold for the target regions contained in the images. In this study, the confidence threshold was set to 0.5, meaning that predictions with a confidence higher than 0.5 were judged as valid detections. There is inevitably some error between the ground-truth box and the predicted box; the smaller the error, the more precise the detection result. IoU is a key parameter for measuring detection precision, and can be expressed by Equation (16). In this experiment, the IoU threshold was set to 0.5.
$IoU = \frac{DR \cap GT}{DR \cup GT} \times 100\%$ (16)
where DR stands for detection result, which represents a prediction box, and GT stands for ground truth, which represents a real annotation box.
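As a small worked example of Equation (16), the function below computes the IoU of two axis-aligned boxes; the (x1, y1, x2, y2) format is an assumption for illustration.

```python
def box_iou(a, b):
    """IoU between two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

# A detection counts as a true positive when its IoU with a
# ground-truth box is at least the 0.5 threshold used here.
```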
Precision, recall, F1 and mAP are commonly used evaluation indicators in target detection [24,25]. Precision (P) is the proportion of correctly predicted targets among all predicted targets; that is, the proportion of true positive samples among all samples predicted as positive. Recall (R) is the proportion of correctly predicted targets among all ground-truth targets. F1 is the harmonic mean of P and R. The formulas for P, R, and F1 are given in Equations (17), (18) and (19), respectively.
$P = \frac{TP}{TP + FP} \times 100\%$ (17)

$R = \frac{TP}{TP + FN} \times 100\%$ (18)

$F1 = \frac{2PR}{P + R}$ (19)
where TP represents the cucumber root-knot nematode targets that are correctly detected; FP represents the detected targets that are not actually cucumber root-knot nematodes (false detections); and FN represents the cucumber root-knot nematode targets that are missed by the detector.
mAP is a primary evaluation indicator for target detection, which measures the overall performance of a network. It is the mean of the average precisions of all categories, which can be expressed as follows.
$mAP = \frac{\sum_{i=1}^{n} AP_i}{n}$ (20)
where n represents the number of detection categories. In this paper, there is only one category, i.e., cucumber root-knot nematode, so mAP = AP.
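A few lines of Python make the bookkeeping of Equations (17)-(19) explicit; the counts in the usage line are hypothetical, chosen only to show the arithmetic.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Precision, recall, and F1 from detection counts (Eqs. (17)-(19))."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

# Hypothetical counts: 354 correct detections, 21 false detections, 60 missed targets
print(precision_recall_f1(354, 21, 60))  # -> approximately (0.944, 0.855, 0.897)
```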

5.3. Experiment Results and Analysis

5.3.1. Comparison of Algorithms Incorporated with Different Attention Mechanisms

In order to validate the effectiveness of the CBAM-CA dual attention mechanism module proposed in this paper, different mainstream attention mechanism modules were comparatively tested. To ensure the fairness of comparison, the parameter settings shown in Table 2 were used for all the tests. The position where each attention mechanism is added is the same as the position where the CBAM-CA module is added, as shown in Figure 6. The results are shown in Table 3.
It can be seen from Table 3 that the CBAM-CA dual attention mechanism achieved the best overall results. Although its total number of parameters increased by 182,706 compared with the original YOLOv5s model, its P, R and mAP improved by 1.2%, 2.3%, and 2.4%, respectively. The effectiveness of the CBAM-CA mechanism designed in this paper for detecting cucumber root-knot nematode targets was therefore validated.

5.3.2. YOLOv5s-CMS Results Analysis

Comparative tests were conducted on YOLOv3 [26], YOLOv4 [27], Faster R-CNN [28], and five versions of the YOLOv5 series in terms of multiple indicators. The results are shown in Table 4.
As can be seen from Table 4, the P values of YOLOv3, YOLOv4, and Faster R-CNN were all above 65%, but their R and mAP values were not satisfactory. The Faster R-CNN model uses ResNet50 as its feature extraction network, but its P value was low. YOLOv4 had a P value of 75.5%. In general, the R, F1, and mAP values of YOLOv3, YOLOv4, and Faster R-CNN were obviously lower than those of the five YOLOv5 versions and the YOLOv5s-CMS model. The YOLOv5m, YOLOv5l, and YOLOv5x models have deeper layers and larger numbers of parameters. Because these models have more parameters than YOLOv5s, they should have better feature extraction and target detection capabilities, yet YOLOv5s delivered a better performance in the tests. One possible reason is that the parameter counts of YOLOv5m, YOLOv5l, and YOLOv5x are too large for the limited number of training samples, so their parameters could not be well optimized during training, ultimately leading to a poor fitting effect. Table 4 also shows that YOLOv5s achieved good results on all indicators (P: 91%; R: 85.5%; mAP: 91.1%) with a low parameter count and a moderate computation volume. Furthermore, compared to the original YOLOv5s model, YOLOv5s-CMS further improved the P, R, F1, and mAP values; specifically, its R increased by 3% and its mAP by 3.7%. Overall, YOLOv5s-CMS showed obvious advantages in the detection of cucumber root-knot nematodes.
From the comparison of loss curves after training shown in Figure 8, it can be seen that the loss values of Faster R-CNN, YOLOv3, and YOLOv4 only declined to about 0.1, while the loss values of the six YOLOv5 variants eventually declined to a much lower level. In particular, the loss curve of YOLOv5s-CMS showed a smoother downward trend and reached the lowest final loss value, indicating that the modified model in this paper is effective and superior. Figure 9 compares the mAP values of the training results of the various models. All six YOLOv5 variants had higher mAP values than the other three models, and the YOLOv5s-CMS model had the highest mAP@0.5 and the fastest convergence.

5.3.3. Ablation Test

In order to validate the three improvement strategies for YOLOv5s proposed in this paper, ablation tests were carried out on the dataset to evaluate the effectiveness of each improvement point. Specifically, the CBAM-CA mechanism, the K-means++ algorithm, and the SIoU loss function were added to the original YOLOv5s model one by one. In the training process, the same parameter settings were used. The results are shown in Table 5 (“√” means that the improvement strategy is used; “-” means that the improvement strategy is not used).
According to Table 5, after the CBAM-CA dual attention mechanism was added to the model, the R value increased by 1.9% and the mAP value by 1.7%. This indicates that the dual attention mechanism indeed improved the feature extraction ability of the Backbone, allowing the model to focus on key feature regions. After the K-means++ algorithm was added, the R and mAP values improved by 2.3% and 2.4%, respectively, which reflects the ability of K-means++ to jump out of local optimal clusters and find the global optimal cluster. After the SIoU loss function was added, the loss of the predicted box was reduced and the regression precision improved, with the R and mAP values increasing by 1.3% and 2%, respectively.

5.4. Comparison of Detection Results

Finally, detection tests were carried out with the YOLOv5s-CMS model and the original YOLOv5s model; some results are shown in Figure 10. The red boxes represent targets detected by both models, the green boxes represent targets that YOLOv5s-CMS detected but YOLOv5s missed, and the yellow boxes represent targets falsely detected by YOLOv5s. Overall, the YOLOv5s-CMS model partially alleviated the problems of missed detection and false detection, thereby improving the detection performance.

6. Conclusions

In this paper, a root-knot nematode target detection model, YOLOv5s-CMS, was proposed to support the breeding of cucumber varieties resistant to root-knot nematodes. By overcoming the weaknesses of manual operation, the proposed model provides a faster detection tool for the selection and breeding of cucumber varieties. On the basis of the original YOLOv5s model, the CBAM-CA mechanism was introduced to improve the model's ability to extract key features, the K-means++ algorithm was used to find the global rather than a local optimal cluster center to improve the training performance, and the SIoU loss function was applied to replace the original CIoU loss function to effectively reduce the loss in degrees of freedom. The test results show that the R and mAP values of the modified model increased by 3.0% and 3.7%, respectively, compared to the original model. Considering that the YOLOv5s-CMS model still has room for improvement in terms of detection performance and model complexity, future research can focus on modifying the network structure to further improve its overall detection capabilities.

Author Contributions

C.W.: Writing—Original draft preparation; S.S.: Software; C.Z.: Methodology; Z.M.: Writing—Reviewing and Editing; H.W.: Investigation; G.T.: Data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key Research and Development Program of China, grant number 2021ZD0113604, and in part by the China Agriculture Research System of MOF and MARA, grant number CARS-23-D07, and in part by the Natural Science Foundation of Hebei Province, grant number F2022204004, and in part by the Hebei Province Key Research and Development Program, grant number 20327402D, 19227210D.

Acknowledgments

We are grateful to our colleagues at Hebei Key Laboratory of Agricultural Big Data and Institute of Vegetables and Flowers, Chinese Academy of Agricultural Sciences for their help and input, without which this study would not have been possible.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Atkinson, H.J.; Lilley, C.J.; Urwin, P.E. Strategies for transgenic nematode control in developed and developing world crops. Curr. Opin. Biotechnol. 2012, 23, 251–256.
  2. Wang, T.; Rostamza, M.; Song, Z.; Wang, L.; McNickle, G.; Iyer-Pascuzzi, A.S.; Qiu, Z.; Jin, J. SegRoot: A high throughput segmentation method for root image analysis. Comput. Electron. Agric. 2019, 162, 845–854.
  3. Kang, J.; Liu, L.; Zhang, F.; Shen, C.; Wang, N.; Shao, L. Semantic segmentation model of cotton roots in-situ image based on attention mechanism. Comput. Electron. Agric. 2021, 189, 106370.
  4. Smith, A.G.; Petersen, J.; Selvan, R.; Rasmussen, C.R. Segmentation of roots in soil with U-Net. Plant Methods 2020, 16, 1–15.
  5. Yasrab, R.; Atkinson, J.A.; Wells, D.M.; French, A.P.; Pridmore, T.P.; Pound, M.P. RootNav 2.0: Deep learning for automatic navigation of complex plant root architectures. GigaScience 2019, 8, giz123.
  6. Keller, K.; Kirchgessner, N.; Khanna, R.; Siegwart, R.; Walter, A.; Aasen, H. Soybean leaf coverage estimation with machine learning and thresholding algorithms for field phenotyping. In Proceedings of the British Machine Vision Conference, Newcastle, UK, 3–6 September 2018; pp. 3–6.
  7. Atanbori, J.; Chen, F.; French, A.P.; Pridmore, T. Towards low-cost image-based plant phenotyping using reduced-parameter CNN. In Proceedings of the Workshop at the 29th British Machine Vision Conference, Northumbria, UK, 4–6 September 2018.
  8. Wang, C.; Li, X.; Caragea, D.; Bheemanahallia, R.; Jagadish, S.K. Root anatomy based on root cross-section image analysis with deep learning. Comput. Electron. Agric. 2020, 175, 105549.
  9. Ostovar, A.; Talbot, B.; Puliti, S.; Astrup, R.; Ringdahl, O. Detection and classification of Root and Butt-Rot (RBR) in stumps of Norway Spruce using RGB images and machine learning. Sensors 2019, 19, 1579.
  10. Pun, T.B.; Neupane, A.; Koech, R. Quantification of Root-Knot Nematode Infestation in Tomato Using Digital Image Analysis. Agronomy 2021, 11, 2372.
  11. Mazurkiewicz, M.; Górska, B.; Jankowska, E.; Włodarska-Kowalczuk, M. Assessment of nematode biomass in marine sediments: A semi-automated image analysis method. Limnol. Oceanogr. Methods 2016, 14, 816–827.
  12. Evangelisti, E.; Turner, C.; McDowell, A.; Shenhav, L.; Yunusov, T.; Gavrin, A.; Servante, E.K.; Quan, C.; Schornack, S. Deep learning-based quantification of arbuscular mycorrhizal fungi in plant roots. New Phytol. 2021, 232, 2207–2219.
  13. Tian, Y.; Yang, G.; Wang, Z.; Wang, H.; Li, E.; Liang, Z. Apple detection during different growth stages in orchards using the improved YOLO-V3 model. Comput. Electron. Agric. 2019, 157, 417–426.
  14. Shi, R.; Li, T.; Yamaguchi, Y. An attribution-based pruning method for real-time mango detection with YOLO network. Comput. Electron. Agric. 2020, 169, 105214.
  15. Wu, D.; Lv, S.; Jiang, M.; Song, H. Using channel pruning-based YOLO v4 deep learning algorithm for the real-time and accurate detection of apple flowers in natural environments. Comput. Electron. Agric. 2020, 178, 105742.
  16. Wang, D.; He, D. Channel pruned YOLO V5s-based deep learning approach for rapid and accurate apple fruitlet detection before fruit thinning. Biosyst. Eng. 2021, 210, 271–281.
  17. Malta, A.; Mendes, M.; Farinha, T. Augmented reality maintenance assistant using YOLOv5. Appl. Sci. 2021, 11, 4758.
  18. Chaudhari, S.; Mithal, V.; Polatkan, G.; Ramanath, R. An Attentive Survey of Attention Models. ACM Trans. Intell. Syst. Technol. 2019, 12, 1–32.
  19. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. Comput. Sci. 2015, 37, 2048–2057.
  20. Hou, Q.; Zhou, D.; Feng, J. Coordinate Attention for Efficient Mobile Network Design. In Proceedings of the 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Nashville, TN, USA, 20–25 June 2021.
  21. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; Volume 11211, pp. 3–19.
  22. Zheng, Z.; Wang, P.; Ren, D.; Liu, W.; Ye, R.; Hu, Q.; Zuo, W. Enhancing Geometric Factors in Model Learning and Inference for Object Detection and Instance Segmentation. IEEE Trans. Cybern. 2021, 52, 8574–8586.
  23. Gevorgyan, Z. SIoU Loss: More Powerful Learning for Bounding Box Regression. arXiv 2022, arXiv:2205.12740.
  24. Li, S.; Li, C.; Yang, Y.; Zhang, Q.; Wang, Y.; Guo, Z. Underwater scallop recognition algorithm using improved YOLOv5. Aquac. Eng. 2022, 98, 102273.
  25. Guo, G.; Zhang, Z. Road damage detection algorithm for improved YOLOv5. Sci. Rep. 2022, 12, 15523.
  26. Redmon, J.; Farhadi, A. YOLOv3: An Incremental Improvement. arXiv 2018, arXiv:1804.02767.
  27. Bochkovskiy, A.; Wang, C.Y.; Liao, H.Y.M. YOLOv4: Optimal Speed and Accuracy of Object Detection. arXiv 2020, arXiv:2004.10934.
  28. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1137–1149.
Figure 1. The captured cucumber root images.
Figure 2. Image labeling process. (a) Original image, (b) Labeled image, (c) Label information.
Figure 3. The YOLOv5 network structure.
Figure 4. The CBL module and C3 module. (a) CBL module, (b) C3 module.
Figure 5. Diagram of the CBAM-CA module.
Figure 6. The Backbone structure incorporated with the CBAM-CA mechanism.
Figure 7. Diagram of the SIoU loss function.
Figure 8. Comparison of loss values.
Figure 9. Comparison of mAP@0.5.
Figure 10. Comparison of partial detection results between YOLOv5s-CMS and YOLOv5s.
Table 1. Prior anchor box sizes.

Feature Map Scale | Anchor Box 1 | Anchor Box 2 | Anchor Box 3
Small scale       | (10, 13)     | (16, 30)     | (33, 23)
Middle scale      | (30, 61)     | (62, 45)     | (59, 119)
Large scale       | (116, 90)    | (156, 198)   | (373, 326)
Table 2. Parameter settings.

Parameter Name  | Value
Learning rate   | 0.01
Momentum factor | 0.937
Weight decay    | 0.0005
Batch size      | 16
Epochs          | 300
Table 3. Comparison of different attention mechanism modules.

Module              | P/%  | R/%  | F1/% | mAP/% | No. of Total Parameters
None (YOLOv5s)      | 91   | 85.5 | 88.2 | 91.1  | 7,012,822
CA                  | 91.2 | 87.4 | 89.3 | 92.8  | 7,169,542
simAM               | 90.9 | 87.3 | 89.1 | 93    | 7,143,894
SE                  | 70.2 | 54.1 | 61.1 | 57.6  | 7,169,782
CBAM                | 92.3 | 85.8 | 88.9 | 93.1  | 7,160,376
CBAM+CA             | 92.2 | 87.8 | 89.9 | 93.5  | 7,195,528
Table 4. Comparison between different target detection models.

Model        | P/%  | R/%  | F1/% | mAP/%
YOLOv3       | 74.2 | 12.2 | 21   | 39.3
YOLOv4       | 75.5 | 33.3 | 46   | 51.2
Faster R-CNN | 67.7 | 36.2 | 47   | 51.4
YOLOv5s      | 91   | 85.5 | 88.2 | 91.1
YOLOv5m      | 87.8 | 77.9 | 82.6 | 85.2
YOLOv5n      | 84.8 | 82.9 | 83.8 | 88.3
YOLOv5l      | 88.3 | 79.1 | 83.4 | 87.7
YOLOv5x      | 82.8 | 78.5 | 80.1 | 84.7
YOLOv5s-CMS  | 94.3 | 88.5 | 91.3 | 94.8
Table 5. Comparison of models during ablation tests.

CBAM+CA | K-Means++ | SIoU | P/%  | R/%  | mAP/%
-       | -         | -    | 91   | 85.5 | 91.1
√       | -         | -    | 91.2 | 87.4 | 92.8
-       | √         | -    | 92.2 | 87.8 | 93.5
-       | -         | √    | 91.1 | 86.8 | 93.1
√       | √         | -    | 93.1 | 88.4 | 94.4
√       | -         | √    | 93.3 | 86.3 | 93.6
-       | √         | √    | 93.1 | 86.3 | 93.6
√       | √         | √    | 94.3 | 88.5 | 94.8
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
