Article

Target Soybean Leaf Segmentation Model Based on Leaf Localization and Guided Segmentation

1 College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
2 College of Agriculture, South China Agricultural University, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agriculture 2023, 13(9), 1662; https://doi.org/10.3390/agriculture13091662
Submission received: 14 July 2023 / Revised: 18 August 2023 / Accepted: 18 August 2023 / Published: 23 August 2023
(This article belongs to the Section Digital Agriculture)

Abstract

The phenotypic characteristics of soybean leaves are of great significance for studying the growth status, physiological traits, and response to the environment of soybeans. The segmentation model for soybean leaves plays a crucial role in morphological analysis. However, current baseline segmentation models are unable to accurately segment leaves in soybean leaf images due to issues like leaf overlap. In this paper, we propose a target leaf segmentation model based on leaf localization and guided segmentation. The segmentation model adopts a two-stage segmentation framework. The first stage involves leaf detection and target leaf localization. Based on the idea that a target leaf is close to the center of the image and has a relatively large area, we propose a target leaf localization algorithm. We also design an experimental scheme to provide optimal localization parameters to ensure precise target leaf localization. The second stage utilizes the target leaf localization information obtained from the first stage to guide the segmentation of the target leaf. To reduce the dependency of the segmentation results on the localization information, we propose a solution called guidance offset strategy to improve segmentation accuracy. We design multiple guided model experiments and select the one with the highest segmentation accuracy. Experimental results demonstrate that the proposed model exhibits strong segmentation capabilities, with the highest average precision (AP) and average recall (AR) reaching 0.976 and 0.981, respectively. We also compare our segmentation results with current baseline segmentation models, and multiple quantitative indicators and qualitative analysis indicate that our segmentation results are better.

1. Introduction

The study of soybean leaf phenotypes plays an important role in soybean breeding, real-time monitoring of plant growth, and precision cultivation management [1]. Phenotypic parameters of soybean leaves include leaf length, leaf width, and leaf area. Traditional methods of data acquisition rely on manual measurements, which are not only time-consuming but also cause irreversible damage to crops [2]. To prevent harm to plant growth, noncontact data collection is gradually becoming the trend [3]. Images, as the most convenient and easily obtainable medium, have become the primary data type. However, the target leaf images used for phenotype parameter measurements usually have complex backgrounds that contain leaves with the same color and texture as the target leaves, which makes segmenting the target leaves difficult.
Achieving fast and accurate leaf segmentation under complex background conditions has always been a challenge in the field of agricultural image recognition. Currently, there are numerous leaf segmentation algorithms based on traditional image processing techniques. For example, Kumar et al. [4] utilized a graph-based approach to extract leaf regions. Bai et al. [5] utilized a marker-based watershed algorithm that relies on the HSI space to effectively segment target leaves. Kuo et al. [6] proposed a leaf segmentation method based on the k-means algorithm, which utilizes an octree structure to reduce computational complexity and memory usage. Tian et al. [7] proposed a K-means algorithm with an adaptively determined cluster number to mitigate the adverse effects of manually selecting inappropriate cluster numbers on segmentation quality. Gao et al. [8] combined the OTSU and watershed segmentation methods to achieve leaf segmentation by utilizing manually labeled leaf edge points. Although numerous leaf segmentation algorithms have been proposed, these algorithms often heavily depend on the selection of initial parameters, involve complex preprocessing procedures, or fail to effectively segment each leaf under complex practical conditions such as image noise, uneven brightness, and overlapping leaves. These limitations restrict the widespread use of traditional techniques in agricultural production.
In recent years, significant attention has been given to the use of deep neural networks in addressing agricultural production issues. With the continuous advancement of smart agriculture, the demand for leaf segmentation algorithms in agricultural production is also increasing. Compared to traditional image processing techniques, segmentation models have a wider range of applications, streamlined processes, and a more significant impact. Bhagat et al. [9] proposed an encoder–decoder architecture for leaf segmentation. They used EfficientNet-B4 as the encoder and implemented a lateral output structure to improve segmentation accuracy. Wang et al. [10] proposed an automated algorithm for corn leaf segmentation. The algorithm improves the segmentation results of the model by incorporating image restoration techniques. Liu et al. [11] combined Mask R-CNN [12] with the DBSCAN clustering algorithm to propose a highly accurate automatic segmentation method. Tian et al. [13] combined the mask prediction branch of Mask R-CNN with the U-Net [14] model to improve the accuracy of segmenting apple blossom images. Although the aforementioned methods have yielded positive outcomes, they have only been examined in relatively simple settings; consequently, their effectiveness still needs to be verified and enhanced when confronted with complex background environments. To address the issue of complex backgrounds in image processing, some scholars have adopted a two-stage approach that prescreens the complex background. Wang et al. [15] proposed the DU-Net model, which first utilized DeepLabv3 [16] to segment cucumber leaves and then employed the U-Net model to segment leaf lesions. Tassis et al. [17] proposed a two-stage model based on Mask R-CNN. The model first utilized Mask R-CNN to identify the region of target leaves and then applied the U-Net model to segment leaf lesions within the identified leaf region.
Based on the above research and practical application requirements, we need to solve two problems: (1) high-value leaves in soybean leaf images need to be identified and segmented during the segmentation process; (2) an effective segmentation algorithm needs to be designed for soybean leaf images with complex backgrounds. For the first problem, we set the large leaf close to the center of the image as the target leaf to be segmented and designed a target leaf localization algorithm to identify it. For the second problem, we drew on the approach of Wang et al. and propose a two-stage soybean leaf segmentation algorithm. The model consists of two modules, one for target leaf recognition and localization, and the other for the guided segmentation of the target leaf.
The remaining sections of this article are organized as follows: Section 2 presents the materials and methods used in this study. Section 3 analyzes and discusses the experimental results of this study. Section 4 concludes the research and provides future perspectives.

2. Materials and Methods

2.1. Dataset

2.1.1. Large Public Dataset

Before training the relevant deep models using our soybean leaf dataset, we employed the technique of transfer learning. We initialized the model parameters using the weights obtained by pretraining the models on large public datasets. In this paper, we used two large public datasets, Microsoft COCO [18] and Pascal VOC2012 [19].

2.1.2. Data Acquisition

We conducted image data acquisition at the College of Agriculture (Tianhe District) and the Zengcheng Teaching and Research Base of South China Agricultural University, Guangzhou, Guangdong Province, China. The main acquisition device was an iPhone 12, which captured images at a resolution of 4032 × 3024 pixels. The data collection took place in the morning under cloudy weather conditions without harsh sunlight. We captured images of soybean leaves at two growth stages: flower bud differentiation, and flowering and podding. Subsequently, we performed an initial screening of the images to eliminate those with similar backgrounds. Overall, we obtained 220 original images, each of which underwent the necessary cropping and resizing operations, and the image size was adjusted to 512 × 512 pixels. Figure 1 shows some representative samples of the processed image data.

2.1.3. Data Annotation and Enhancement

We used Labelme (version 5.1.1, https://github.com/wkentaro/labelme, accessed on 18 September 2022) to annotate our dataset. Labelme is a free annotation tool designed for object detection and segmentation tasks. In this paper, as our model includes the target leaf localization module and the guided segmentation module, the dataset had to be preprocessed to train these two modules effectively. For the target leaf localization module, we required supervised data in the object detection format, which involved annotating bounding boxes to indicate the position and extent of each leaf, as depicted in Figure 2a. For the guided segmentation module, we required supervised data in the object segmentation format. As the objective of the guided segmentation module is to segment target leaves, we only needed to annotate segmentation masks for them, as shown in Figure 2b.
To ensure that the trained model possesses good robustness, we employed a range of data augmentation techniques to enhance the diversity of the dataset. The specific techniques and their corresponding values are presented in Table 1. After the augmentation process, the dataset comprised a total of 2954 images, of which 1619 were randomly assigned to model training and 1335 to model evaluation.
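To make the augmentation scheme concrete, the sketch below applies the operations listed in Table 1 (flips, brightness scaling, and additive Gaussian noise) with NumPy. The function name and the way each operation is applied independently are our own assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
    """Illustrative augmentation pass using the operations listed in Table 1.

    Assumption: each operation is applied independently to the source image;
    how the operations were actually combined is not specified here.
    """
    img = image.astype(np.float32)
    outputs = []

    # Horizontal and vertical flips.
    outputs.append(np.fliplr(image).copy())
    outputs.append(np.flipud(image).copy())

    # Brightness values {0.4, 0.8} from Table 1, interpreted here as
    # multiplicative scaling factors (an assumption).
    for factor in (0.4, 0.8):
        outputs.append(np.clip(img * factor, 0, 255).astype(np.uint8))

    # Additive Gaussian noise with mean 0.0 and standard deviation in {10, 18}.
    for sigma in (10, 18):
        noise = rng.normal(0.0, sigma, size=img.shape)
        outputs.append(np.clip(img + noise, 0, 255).astype(np.uint8))

    return outputs

# Example usage:
# rng = np.random.default_rng(0)
# augmented = augment(np.zeros((512, 512, 3), dtype=np.uint8), rng)
```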

2.2. Methods

As shown in Figure 3, the target leaf segmentation model consists of a target leaf localization module and a guided segmentation module. The target leaf localization module identifies the target leaf within the image and passes its location to the guided segmentation module in the form of a rectangular bounding box. The guided segmentation module then leverages this bounding box to segment the target leaf precisely.

2.2.1. Target Leaf Localization Module

The localization process of the target leaf localization module comprises two steps: leaf detection and target leaf localization. In the first step, the module employs Libra R-CNN [20] as a leaf detector to accurately detect all the leaves present in the image, generating a rectangular bounding box for each detected leaf. In the second step, the module applies a target leaf localization algorithm to select, from the set of generated bounding boxes, the one that corresponds to the target leaf.
(1) Libra R-CNN
Libra R-CNN consists of three parts: feature extraction, region proposal generation, and region proposal optimization, as illustrated in Figure 4.
The feature extraction component of Libra R-CNN utilizes a series of network architectures to extract image features, as depicted in Figure 5. Initially, it utilizes ResNet50 [21] to efficiently capture both intricate details and semantic information from the image. Following that, the Feature Pyramid Network (FPN) [22] is introduced to merge feature maps from neighboring levels using a top-down pathway and lateral connections, enabling the generation of multiscale feature representations. Lastly, the Balanced Feature Pyramid (BFP) is proposed to further enhance the representation capability of these features. To achieve this, an embedded Gaussian nonlocal attention module [23] is introduced, which captures global context information within the feature map and enhances its representation by calculating nonlocal similarities.
The network structure employed for region proposal generation is the Region Proposal Network (RPN), as illustrated in Figure 6. RPN was originally proposed by Faster R-CNN [24]. Its primary objective is to generate a set of candidate boxes on a given input image, which might potentially contain the objects to be detected. Within the RPN, the anchor generator generates multiple anchors that cover regions with various scales and aspect ratios. Subsequently, the region proposal generator performs classification and bounding box regression on these anchors to generate a series of candidate proposals. To ensure a balanced distribution of positive and negative anchor samples, Libra R-CNN proposed the implementation of IoU-balanced sampling within the Region Proposal Network. The loss function for the RPN is denoted as
L_{rpn} = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \quad (1)
which includes the classification loss L_{cls} and the regression loss L_{reg} of anchors, where p_i is the predicted probability that anchor i is positive, and p_i^* is the ground-truth label, which is 1 if anchor i is positive and 0 otherwise. t_i denotes the four predicted regression parameters of anchor i, while t_i^* denotes the corresponding ground-truth regression parameters. The anchor classification loss L_{cls}, defined by binary cross-entropy, is
L_{cls} = -\left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right] \quad (2)
The regression loss L_{reg} is defined as
L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} L_{1,balanced}\left( t_i^j - t_i^{*j} \right) \quad (3)
L_{1,balanced}(x) = \begin{cases} \frac{\alpha}{b}\left( b|x| + 1 \right) \ln\left( b|x| + 1 \right) - \alpha|x| & \text{if } |x| < 1 \\ \gamma|x| + C & \text{otherwise} \end{cases} \quad (4)
where t_i^j (j = x, y, w, h) is a specific regression parameter of t_i, used to correct the x-coordinate, y-coordinate, width, and height of the anchor, respectively, and t_i^{*j} is the corresponding component of t_i^*. In our experiments, α is set to 0.5 and γ is set to 1.5.
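For reference, the following sketch evaluates the balanced L1 loss of Equation (4) elementwise. The values of b and C follow from the continuity conditions stated in the Libra R-CNN paper (a smooth gradient and matching branches at |x| = 1); the exact implementation used in our experiments is not reproduced here.

```python
import numpy as np

def balanced_l1(x: np.ndarray, alpha: float = 0.5, gamma: float = 1.5) -> np.ndarray:
    """Balanced L1 loss of Equation (4), applied elementwise to regression errors.

    b and C are derived from alpha * ln(b + 1) = gamma (continuous gradient at
    |x| = 1) and from the two branches meeting at |x| = 1.
    """
    b = np.exp(gamma / alpha) - 1.0
    c = alpha / b * (b + 1.0) * np.log(b + 1.0) - alpha - gamma
    ax = np.abs(x)
    small = alpha / b * (b * ax + 1.0) * np.log(b * ax + 1.0) - alpha * ax
    large = gamma * ax + c
    return np.where(ax < 1.0, small, large)

# With alpha = 0.5 and gamma = 1.5 (the values used in this paper),
# balanced_l1(np.array([0.0, 0.5, 2.0])) is approximately [0.0, 0.40, 2.58].
```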
Region proposal optimization involves adjusting the position, width, and height of region proposals, as well as predicting probability scores for each proposal across all classes. The network architecture for this component adopts the region proposal optimization network proposed by Fast R-CNN [25], as shown in Figure 7. To begin, RoIAlign is employed to convert the features of the regions of interest (RoIs) into small feature maps with a fixed size of 7 × 7. RoIAlign, proposed by Mask R-CNN, serves as a feature extraction module for RoIs and is an improvement over RoIPooling, which was initially proposed by Fast R-CNN. RoIAlign maps the region proposals onto the corresponding feature map to obtain RoIs and then pools these regions to the fixed size, using bilinear sampling to avoid the quantization error of RoIPooling. Next, the feature matrices are flattened and passed through two consecutive fully connected layers, which are followed by two parallel branches, one outputting class probabilities and the other regression parameters for each proposal. The regression parameters are then used to adjust the position and size of the proposals. By selecting boxes with a high probability of being classified as a leaf, a series of leaf bounding boxes can be obtained. The loss for region proposal optimization L_{rpo} is defined as
L_{rpo} = \sum_i L_{cls1}(p_i, u_i) + \frac{1}{N_{cls}} \sum_i [u_i > 0] \, L_{reg}\left( t_i^{u_i}, v_i \right) \quad (5)
Here, p_i is the softmax probability of proposal i over all categories (including background), u_i is the ground-truth category label, t_i^{u_i} is the predicted regression parameters for category u_i of proposal i, and v_i is the ground-truth regression parameters. The classification loss L_{cls1} of the proposals is defined using the cross-entropy loss for multi-class classification as
L_{cls1}(p_i, u_i) = -\log p_i^{u_i} \quad (6)
where p_i^{u_i} is the predicted probability of the category u_i corresponding to proposal i. The regression loss L_{reg} of the proposals is calculated as in Equation (3).
(2) Target Leaf Localization Algorithm
During the detection phase, we detected the rectangular bounding boxes of all leaves within the image. The next step is to select the bounding box that corresponds to the target leaf. The execution flow of the target leaf localization algorithm is outlined below:
Step 1: Calculate the distance between the center point of each bounding box and the center point of the image. This distance is determined by the following equation:
d_i = \sqrt{\left( p_i^x - c^x \right)^2 + \left( p_i^y - c^y \right)^2} \quad (7)
Here, p_i = (p_i^x, p_i^y) represents the center point coordinates of the i-th bounding box, and c = (c^x, c^y) represents the center point coordinates of the image. After calculating the distances for all bounding boxes, we normalize them using Equation (8), where d_{max} represents the maximum distance and d_{min} represents the minimum distance.
d_{n_i} = \frac{d_i - d_{min}}{d_{max} - d_{min}} \quad (8)
Step 2: Calculate the area of each bounding box and normalize all the areas. The normalization formula is defined as
S_{n_i} = \frac{S_i - S_{min}}{S_{max} - S_{min}} \quad (9)
Here, S_i represents the area of the i-th bounding box, S_{max} represents the maximum area, and S_{min} represents the minimum area.
Step 3: Calculate the probability score for each bounding box that contains the target leaf and select the bounding box with the highest probability score as the target box. The formula for calculating the probability score is as follows:
PS_i = e^{-\frac{d_{n_i}}{\sigma_1}} \cdot e^{-\frac{1 - S_{n_i}}{\sigma_2}} \quad (10)
In Equation (10), σ1 and σ2 are control parameters. The bounding box for the target leaf should have a relatively large area, while the distance between its center point and the image center should be relatively small. By considering both the area and distance factors and using a Gaussian function, we can balance the effects of area and distance.
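The three steps above can be summarized in a short sketch. Boxes are assumed to be given as (x1, y1, x2, y2) arrays, the function name locate_target_leaf is our own, and the exponential form of the score follows the reconstruction of Equation (10) above; the default control parameters are the values selected in Section 3.1.

```python
import numpy as np

def locate_target_leaf(boxes: np.ndarray, image_size: tuple[int, int],
                       sigma1: float = 0.3, sigma2: float = 1.4) -> int:
    """Return the index of the box most likely to contain the target leaf.

    boxes: (N, 4) array of (x1, y1, x2, y2) bounding boxes from the leaf detector.
    image_size: (width, height) of the image.
    """
    w, h = image_size
    cx, cy = w / 2.0, h / 2.0

    # Step 1: distance between each box centre and the image centre, normalized.
    centres_x = (boxes[:, 0] + boxes[:, 2]) / 2.0
    centres_y = (boxes[:, 1] + boxes[:, 3]) / 2.0
    d = np.sqrt((centres_x - cx) ** 2 + (centres_y - cy) ** 2)
    d_n = (d - d.min()) / (d.max() - d.min() + 1e-12)

    # Step 2: box areas, normalized to [0, 1].
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    s_n = (areas - areas.min()) / (areas.max() - areas.min() + 1e-12)

    # Step 3: probability score favouring central, large boxes (Equation (10)).
    ps = np.exp(-d_n / sigma1) * np.exp(-(1.0 - s_n) / sigma2)
    return int(np.argmax(ps))
```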

2.2.2. Guided Segmentation Module

The guided segmentation module adopts the network structure proposed by Zhang et al. [26] in their interactive segmentation model. It comprises four stages, as illustrated in Figure 8. Firstly, the input data processing stage extracts the location information implied by the bounding box and employs it as guidance for segmenting the target leaf. Secondly, the feature extraction stage extracts multiscale features of the target leaf. Thirdly, the feature refinement stage upsamples and fuses the multiscale features to restore any lost boundary features of the segmentation region. Lastly, in the mask prediction stage, the module generates the mask for the target leaf.
(1) Input Data Processing
The input data processing provides guidance information for the target leaf segmentation model. This process consists of three steps, as illustrated in Figure 9. Firstly, an image cropping operation is performed to obtain a local segmentation area by shifting 30 pixels outward along the bounding box of the target leaf. Secondly, the cropped image is resized to a standard size (e.g., 512 × 512), and the vertex coordinates of the bounding box are adjusted accordingly. Lastly, the coordinates of the center point and the vertices of the bounding box are extracted and used to construct two single-channel Gaussian heatmaps. These heatmaps transform the location information into data that the model can process. The foreground guidance channel is defined by a Gaussian heatmap using the center point coordinate ( x 0 , y 0 ) of the bounding box, as shown in Equation (11):
FP = e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{\sigma^2}} \quad (11)
where \sigma = 5 \log 2. Similarly, the background guidance channel is defined using the four vertex coordinates \{(x_i, y_i) \mid i \in \{1, 2, 3, 4\}\} of the bounding box, as shown in Equation (12):
BP = \max\left\{ e^{-\frac{(x - x_i)^2 + (y - y_i)^2}{\sigma^2}} \;\middle|\; i \in \{1, 2, 3, 4\} \right\} \quad (12)
These two Gaussian heatmaps are concatenated with the resized image, resulting in a five-channel input for target leaf segmentation.
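A minimal sketch of this input construction is given below, assuming a square resized crop and a bounding box in (x1, y1, x2, y2) form; the helper names and the literal reading of σ = 5 log 2 are our own assumptions for illustration.

```python
import numpy as np

def gaussian_heatmap(points: list[tuple[float, float]], size: int, sigma: float) -> np.ndarray:
    """Pixelwise maximum over Gaussians centred on the given points (Equations (11)-(12))."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / sigma ** 2) for x, y in points]
    return np.max(np.stack(maps, axis=0), axis=0)

def build_guided_input(image: np.ndarray, box: tuple[float, float, float, float],
                       sigma: float = 5 * np.log(2)) -> np.ndarray:
    """Concatenate the resized RGB crop with foreground/background guidance channels.

    image: (H, W, 3) crop already resized to the standard size (e.g. 512 x 512, square).
    box: (x1, y1, x2, y2) target-leaf bounding box in the resized coordinates.
    """
    size = image.shape[0]
    x1, y1, x2, y2 = box
    centre = [((x1 + x2) / 2.0, (y1 + y2) / 2.0)]
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]

    fg = gaussian_heatmap(centre, size, sigma)   # foreground guidance (box centre)
    bg = gaussian_heatmap(corners, size, sigma)  # background guidance (box corners)
    return np.concatenate([image.astype(np.float32),
                           fg[..., None], bg[..., None]], axis=-1)  # (H, W, 5)
```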
(2) Feature Extraction
The feature extraction network in the model follows a structure similar to FPN, as depicted in Figure 10. It comprises two components: basic feature extraction and semantic information fusion. For basic feature extraction, ResNet101 is used to construct a pyramid-structured multiscale feature map. However, unlike the conventional FPN structure, the deepest feature map from ResNet101 is enhanced with global contextual information using the pyramid scene parsing (PSP) module [27], as shown in Figure 11. The PSP module involves averaging the input feature map with four pooling windows of different sizes, followed by sequential convolution, upsampling, and concatenation with the original feature map. The resulting fusion is then processed to obtain a feature map that incorporates semantic information.
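As an illustration of the PSP module described above, the following PyTorch sketch pools the input with four windows, convolves and upsamples each branch, and fuses the result with the original feature map. The bin sizes (1, 2, 3, 6) follow the original PSPNet design and are an assumption, since the exact pooling windows are not restated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSPModule(nn.Module):
    """Sketch of a pyramid scene parsing (PSP) module for the deepest feature map."""

    def __init__(self, in_channels: int, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(bin_sizes)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),                        # pool to a size x size grid
                nn.Conv2d(in_channels, branch_channels, 1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in bin_sizes
        ])
        # Fuse the original map with the upsampled pooled branches.
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels + branch_channels * len(bin_sizes), in_channels, 3,
                      padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        pooled = [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                                align_corners=False) for branch in self.branches]
        return self.fuse(torch.cat([x] + pooled, dim=1))
```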
(3) Feature Refinement
The feature refinement network aims to address the loss of fine details in the multiscale feature maps extracted from the feature extraction network, which can affect the accuracy of segment boundaries. It achieves this by upsampling and fusing the multiscale feature information. The network structure, as shown in Figure 12, involves convolving the feature maps of different layers using varying numbers of residual blocks. These convolved feature maps are then upsampled to match the size of the lowest-level feature map, and finally, the refined feature map is obtained by concatenating the feature maps from different levels.
(4) Mask Prediction
The structure of the mask prediction network is illustrated in Figure 13. The refined feature map is initially processed by the mask predictor to generate a target mask. Subsequently, the mask is mapped back to the original image based on the position and size of the original target leaf bounding box. The loss function for mask prediction is defined as
L_{mask} = -\frac{1}{N} \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \quad (13)
where p_i is the predicted value of pixel i in the predicted mask and y_i (0 or 1) is the value of pixel i in the ground-truth mask. To provide better supervision during training, the four-level feature maps extracted by the feature extraction network are also employed for mask prediction, and the masks they generate are incorporated into the loss calculation. Therefore, the overall loss function of the guided segmentation module is defined as
L_{total} = \sum_{k=1}^{5} L_{mask}^{k} \quad (14)
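The mask loss and its deeply supervised total can be sketched as follows. The logits-based binary cross-entropy, the upsampling of the side outputs to the ground-truth resolution, and the equal weighting of the five terms are implementation assumptions rather than details taken from the original model.

```python
import torch
import torch.nn.functional as F

def mask_loss(pred_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-averaged binary cross-entropy of Equation (13).

    target is a float mask in {0, 1} with the same shape as pred_logits.
    """
    return F.binary_cross_entropy_with_logits(pred_logits, target)

def total_loss(side_logits: list[torch.Tensor], final_logits: torch.Tensor,
               target: torch.Tensor) -> torch.Tensor:
    """Equation (14): the final mask plus the four side outputs are all supervised."""
    losses = [mask_loss(final_logits, target)]
    for logits in side_logits:  # one prediction per feature-pyramid level
        logits = F.interpolate(logits, size=target.shape[-2:], mode="bilinear",
                               align_corners=False)
        losses.append(mask_loss(logits, target))
    return torch.stack(losses).sum()
```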

2.2.3. Guidance Offset Strategy

(1) Definition of Guidance Tolerance Offset Distance
In certain cases, the bounding box predicted for the target leaf by the leaf detector may not completely encompass the entire leaf, as illustrated in Figure 14. This can result in incomplete segmentation. To address this issue, we propose adjusting the bounding box by moving its four vertices outward by an equal distance. This adjustment allows the bounding box to fully enclose the entire leaf, thereby significantly improving the segmentation effectiveness of the model. We refer to this distance as the guidance tolerance offset distance, denoted as d_t. The coordinates of a vertex after the adjustment can be represented as:
(x_i', y_i') = \left( x_i + (-1)^{\gamma_1} d_t,\; y_i + (-1)^{\gamma_2} d_t \right) \quad (15)
Here, (x_i, y_i), i = 1, 2, 3, 4, represents the original coordinates of the vertices, (x_i', y_i') represents the coordinates after the adjustment, and \gamma_1 and \gamma_2 indicate the relative position of each vertex. Specifically, \gamma_1 equals 1 if the vertex is on the left side of the bounding box and 0 otherwise; similarly, \gamma_2 equals 1 if the vertex is on the upper side and 0 otherwise.
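A minimal sketch of this vertex adjustment, with the box in (x1, y1, x2, y2) form, is given below; clipping to the image bounds is our own safeguard and is not part of Equation (15).

```python
def offset_box(box: tuple[float, float, float, float], d_t: float,
               image_size: tuple[int, int]) -> tuple[float, float, float, float]:
    """Move the four vertices of an (x1, y1, x2, y2) box outward by d_t pixels.

    Implements Equation (15) in corner form; the result is clipped to the image
    so the expanded box stays inside the frame (an added safeguard).
    """
    x1, y1, x2, y2 = box
    w, h = image_size
    return (max(x1 - d_t, 0.0), max(y1 - d_t, 0.0),
            min(x2 + d_t, float(w)), min(y2 + d_t, float(h)))

# Example: offset_box((120, 80, 390, 410), d_t=5, image_size=(512, 512))
# -> (115, 75, 395, 415)
```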
(2) Definition of Guidance Offset Strategy
To make effective use of the guidance tolerance offset distance in the guided segmentation module, we apply the same vertex movement to the bounding boxes of the input data during the training of the guided segmentation module, so that the module is trained under the guidance of a given d_t. We refer to the combination of a fixed value of d_t and the segmentation module trained under that d_t as a guidance offset strategy. The guidance offset strategy leverages the advantage of d_t, which helps compensate for cases where the leaf bounding box fails to enclose the entire leaf. As long as the relaxation of the bounding box remains within a reasonable range, the guided segmentation module associated with this strategy can accurately segment the complete leaf. In essence, the guidance offset strategy addresses the problem of unsatisfactory segmentation results caused by the detection bias of the leaf detector, thus effectively enhancing the segmentation accuracy.

3. Experimental Results and Analysis

The computer configurations for the model experiments are presented in Table 2. We conducted separate training sessions for the leaf detector and the guided segmentation module; the parameter settings for training the two models are outlined in Table 3. To ensure the performance of these models and reduce the training time, we initialized the backbone parameters of Libra R-CNN with pretrained weights from the COCO dataset, and for the guided segmentation module we utilized weights pretrained on the VOC2012 dataset. Both models were trained until convergence to ensure optimal performance.

3.1. Selection of Control Parameters for the Target Leaf Localization Algorithm

The control parameters σ_1 and σ_2 are two key parameters of the target leaf localization algorithm, directly influencing the accuracy of the target leaf localization module. To determine their optimal values, we selected 20 evenly spaced values within the range (0, 2] as candidates for σ_1 and σ_2, resulting in a total of 400 (σ_1, σ_2) combinations for analysis. Since the role of the target leaf localization algorithm is to identify the bounding box belonging to the target leaf among the boxes predicted by a detector, we used the boxes predicted by Libra R-CNN as the validation data to assess the feasibility of different σ_1 and σ_2 values, instead of relying on a manually labeled dataset. Additionally, we introduced a label to indicate whether a box is the target box; a label of one signifies that the box is the target box, while any other value indicates otherwise. We conducted tests on 1335 soybean leaf images, which amounted to a total of 2,956,800 bounding boxes. In the results depicted in Figure 15, the green dots mark the σ_1 and σ_2 values that achieved 100% accuracy in leaf localization. Based on the data presented in Figure 15, we ultimately set σ_1 and σ_2 to 0.3 and 1.4, respectively.
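The parameter search can be sketched as follows, reusing the locate_target_leaf helper from the earlier sketch; the candidate grid of 0.1, 0.2, ..., 2.0 follows from 20 evenly spaced values in (0, 2], while the data structures are illustrative assumptions.

```python
import numpy as np

# 20 evenly spaced candidate values in (0, 2], i.e. 0.1, 0.2, ..., 2.0,
# giving 400 (sigma1, sigma2) combinations in total.
candidates = np.linspace(0.1, 2.0, 20)

def grid_search(detections, target_indices, image_size=(512, 512)):
    """Return all (sigma1, sigma2) pairs that localize every target leaf correctly.

    detections: list of (N_i, 4) box arrays, one per image, from the detector.
    target_indices: list of the index of the true target box in each image.
    Assumes locate_target_leaf(...) from the earlier sketch is in scope.
    """
    perfect = []
    for s1 in candidates:
        for s2 in candidates:
            hits = sum(
                locate_target_leaf(boxes, image_size, s1, s2) == idx
                for boxes, idx in zip(detections, target_indices)
            )
            if hits == len(detections):
                perfect.append((float(s1), float(s2)))
    return perfect
```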

3.2. Comparison of Different Guidance Offset Strategies

In this section, we examine the impact of different guidance offset strategies on segmentation accuracy. We employed d_t values of 0, 5, 10, and 15, as well as random fluctuations within the range [0, 15], to guide the training of the guided segmentation module, resulting in five networks with distinct weights. For simplicity, we refer to these networks as Fix0_SNet, Fix5_SNet, Fix10_SNet, Fix15_SNet, and Ran_SNet, respectively. In this experiment, we selected d_t values of 0, 5, 10, and 15 to pair with the aforementioned five modules, resulting in a total of 20 offset strategies. The corresponding segmentation accuracy for each offset strategy is illustrated in Figure 16a. Here, the x-axis represents the values of d_t, the y-axis represents the segmentation accuracy of the model (measured using AP), and the color of the lines indicates the segmentation module used. From the data presented in the figure, it is evident that the offset strategy using a d_t value of 5 in conjunction with Fix10_SNet achieves the highest segmentation accuracy of 0.976.

3.3. Comparison of Different Leaf Detectors

In this section, we explore the impact of different leaf detectors on segmentation accuracy. Specifically, we selected two popular object detectors, Faster R-CNN and Yolov5x, and conducted experiments using the 20 offset strategies mentioned in the previous section. Before commencing the experiments, we ensured that the target leaf localization algorithm achieved 100% accuracy. The experimental results are presented in Figure 16b,c. From the data shown in the figure, it is clear that when Faster R-CNN is used as the leaf detector, the strategy combining a d_t value of 0 with Fix5_SNet achieves the highest accuracy for the segmentation model, with an AP of 96.2%. When Yolov5x is employed as the detector, the strategy combining a d_t value of 0 with Fix10_SNet attains the highest accuracy, with an AP of 95.5%. Comparing the highest achievable accuracy across the three detectors, Libra R-CNN achieves the highest segmentation accuracy, followed by Faster R-CNN, while Yolov5x exhibits the lowest accuracy.
Based on the experimental results in Section 3.2 and Section 3.3, it is evident that guidance offset strategies have a significant impact on segmentation accuracy by mitigating the effects of guidance information bias. When a guidance offset strategy is appropriately matched with the detection capability of the detector, it ensures a higher segmentation accuracy. By comparing the highest accuracy values in the three figures, we observe that although the guidance offset strategy can compensate for detection bias, the detection accuracy of the detector still determines the upper limit of the segmentation accuracy. Furthermore, regardless of the leaf detector, strategies employing higher d_t values perform worse than the others, even when the guided segmentation module is trained under the same value of d_t. This demonstrates that when the guidance information deviates excessively from the target leaf's bounding box, the segmentation accuracy of the guided segmentation module tends to decrease. In summary, based on the commonalities observed in the three figures, we conclude that strategies combining d_t = 0 or d_t = 5 with Fix0_SNet or Fix5_SNet maximize the improvement in the model's segmentation accuracy.

3.4. Comparison with Other Segmentation Models

To showcase the performance of our leaf segmentation model, we conducted a comparison experiment with four segmentation models: Mask R-CNN, Yolov5x, DeepLabv3, and U-Net. The evaluation involved both quantitative analysis and qualitative comparison. All the models were trained and tested on our constructed dataset, with 1619 images for training and 1335 images for evaluation. To ensure fairness in the comparative experiments, we initialized the model parameters using weights pretrained on the COCO or VOC2012 dataset and ensured that each model was trained to convergence.
The evaluation metrics used in quantitative analysis are Accuracy, Precision, Recall, F1 Score, AP and AR, and the evaluation results of these models’ performance metrics are shown in Table 4. Upon comparing the evaluation data of different models, it is evident that our model outperforms the others across all metrics. This demonstrates the superior localization and segmentation capabilities of our model.
The metrics Accuracy, Precision, and Recall differ from AP and AR. In the segmentation task, the model's predicted mask pixels are compared with the ground-truth pixels, and each pixel is classified as a false-positive, true-positive, false-negative, or true-negative sample; these counts are then used to compute the three metrics. However, because of the large dataset size and the relatively high proportion of background pixels in the images, the true-positive and true-negative samples can dominate the total samples during the calculation. This can inflate these three indicators, creating an illusion of high model accuracy. Since the F1 Score is the harmonic mean of Precision and Recall, its value is inflated in the same way, which is not conducive to an objective evaluation of these models. In contrast, AP and AR are calculated in a more rigorous manner: samples are divided according to the IoU (Intersection over Union) value between the predicted mask and the actual mask, Precision and Recall are computed under different IoU thresholds, and these values are then averaged. From the AP and AR values shown in Table 4, it is evident that our model, Mask R-CNN, and Yolov5x achieve higher segmentation accuracies, while DeepLabv3 and U-Net exhibit poorer segmentation performance. To further support these conclusions, we also compared some actual segmentation results of these models, as depicted in Figure 17. The results in the figure clearly demonstrate the higher accuracy of our model, while DeepLabv3 and U-Net do not perform as well as the other three models.
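As a simplified illustration of how AP and AR differ from pixel-level Accuracy, the sketch below scores one predicted mask per image against its ground truth over a range of IoU thresholds; the full COCO-style evaluation used in practice is more involved, so this is only a schematic of the idea, not the exact protocol.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def ap_ar_over_thresholds(ious: np.ndarray,
                          thresholds=np.arange(0.5, 1.0, 0.05)) -> tuple[float, float]:
    """Average the fraction of masks whose IoU clears each threshold.

    With exactly one predicted mask and one ground-truth mask per image,
    precision and recall coincide at every threshold, so this simplified
    sketch returns the same value for both.
    """
    hit_rates = [(ious >= t).mean() for t in thresholds]
    mean = float(np.mean(hit_rates))
    return mean, mean

# Example: ious = np.array([mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)])
# ap, ar = ap_ar_over_thresholds(ious)
```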

4. Conclusions

Target soybean leaf extraction is a prerequisite for calculating the phenotypic parameters of soybean leaves. In this paper, a segmentation model for the soybean target leaf was proposed by combining object detection and guided segmentation technologies. Based on the idea that the target leaf is located near the center of the image and has a large area, a method to locate the target soybean leaf was provided. To alleviate the issue of poor segmentation caused by guidance information bias, the guidance offset strategy was proposed. Extensive experimental data and comparative analysis show that our model has higher segmentation accuracy and better generalization capacity. However, since the target leaf and the background leaves are highly similar in both color and texture, in some cases it is difficult to find the boundary between them, which may lead to incorrect segmentation. To further improve the segmentation precision, image depth or NeRF (Neural Radiance Field)-based implicit 3D reconstruction technology may be adopted in the future to obtain more information for distinguishing foreground and background soybean leaves.

Author Contributions

D.W.: methodology; Z.H. and H.Y.: software; Y.L.: funding; S.T.: writing—review and editing; C.Y.: data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the key R&D project of Guangzhou (202206010091, 2023B03J1363), the Special Fund for Rural Revitalization Strategy of Guangdong (2023TS-3), and College Students’ Innovation and Entrepreneurship Competition (X202210564178).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Reynolds, M.; Chapman, S.; Crespo-Herrera, L.; Molero, G.; Mondal, S.; Pequeno, D.N.; Pinto, F.; Pinera-Chavez, F.J.; Poland, J.; Rivera-Amado, C.; et al. Breeder friendly phenotyping. Plant Sci. 2020, 295, 110396. [Google Scholar] [CrossRef] [PubMed]
  2. Yang, W.; Feng, H.; Zhang, X.; Zhang, J.; Doonan, J.H.; Batchelor, W.D.; Xiong, L.; Yan, J. Crop phenomics and high-throughput phenotyping: Past decades, current challenges, and future perspectives. Mol. Plant 2020, 13, 187–214. [Google Scholar] [CrossRef] [PubMed]
  3. Ward, B.; Brien, C.; Oakey, H.; Pearson, A.; Negrão, S.; Schilling, R.K.; Taylor, J.; Jarvis, D.; Timmins, A.; Roy, S.J.; et al. High-throughput 3D modelling to dissect the genetic control of leaf elongation in barley (Hordeum vulgare). Plant J. 2019, 98, 555–570. [Google Scholar] [CrossRef] [PubMed]
  4. Kumar, J.P.; Domnic, S. Image based leaf segmentation and counting in rosette plants. Inf. Process. Agric. 2019, 6, 233–246. [Google Scholar] [CrossRef]
  5. Bai, X.; Li, X.; Fu, Z.; Lv, X.; Zhang, L. A fuzzy clustering segmentation method based on neighborhood grayscale information for defining cucumber leaf spot disease images. Comput. Electron. Agric. 2017, 136, 157–165. [Google Scholar] [CrossRef]
  6. Kuo, K.; Itakura, K.; Hosoi, F. Leaf segmentation based on k-means algorithm to obtain leaf angle distribution using terrestrial LiDAR. Remote Sens. 2019, 11, 2536. [Google Scholar] [CrossRef]
  7. Tian, K.; Li, J.; Zeng, J.; Evans, A.; Zhang, L. Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm. Comput. Electron. Agric. 2019, 165, 104962. [Google Scholar] [CrossRef]
  8. Gao, L.; Lin, X. A method for accurately segmenting images of medicinal plant leaves with complex backgrounds. Comput. Electron. Agric. 2018, 155, 426–445. [Google Scholar] [CrossRef]
  9. Bhagat, S.; Kokare, M.; Haswani, V.; Hambarde, P.; Kamble, R. Eff-UNet++: A novel architecture for plant leaf segmentation and counting. Ecol. Inform. 2022, 68, 101583. [Google Scholar] [CrossRef]
  10. Wang, P.; Zhang, Y.; Jiang, B.; Hou, J. An maize leaf segmentation algorithm based on image repairing technology. Comput. Electron. Agric. 2020, 172, 105349. [Google Scholar] [CrossRef]
  11. Liu, X.; Hu, C.; Li, P. Automatic segmentation of overlapped poplar seedling leaves combining Mask R-CNN and DBSCAN. Comput. Electron. Agric. 2020, 178, 105753. [Google Scholar] [CrossRef]
  12. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  13. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Instance segmentation of apple flowers using the improved mask R–CNN model. Biosyst. Eng. 2020, 193, 264–278. [Google Scholar] [CrossRef]
  14. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  15. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373. [Google Scholar] [CrossRef]
  16. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  17. Tassis, L.M.; de Souza, J.E.T.; Krohling, R.A. A deep learning approach combining instance and semantic segmentation to identify diseases and pests of coffee leaves from in-field images. Comput. Electron. Agric. 2021, 186, 106191. [Google Scholar] [CrossRef]
  18. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft Coco: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  19. Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [Google Scholar] [CrossRef]
  20. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-Cnn: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  22. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  23. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
  24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  25. Girshick, R. Fast R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  26. Zhang, S.; Liew, J.H.; Wei, Y.; Wei, S.; Zhao, Y. Interactive Object Segmentation with Inside-Outside Guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12234–12244. [Google Scholar]
  27. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
Figure 1. Examples of processed image data.
Figure 2. Examples of image annotation. (a) Image annotation for the target leaf localization module. (b) Image annotation for the guided segmentation module.
Figure 3. The general framework of Target Leaf Segmentation Model.
Figure 4. The overall architecture of Libra R-CNN.
Figure 5. The structure of Feature Extraction Network.
Figure 6. The structure of Region Proposal Network.
Figure 7. The structure of Region Proposal Optimization.
Figure 8. The four stages of Target Leaf Segmentation Model.
Figure 9. The three steps of input data processing.
Figure 10. The structure of Feature Extraction Network.
Figure 11. The structure of PSP module.
Figure 12. The structure of Feature Refinement Network.
Figure 13. The structure of Mask Prediction Network.
Figure 14. Original bounding box generated by leaf detector and bounding box after vertex movement. (a) The original bounding box. (b) Bounding boxes after using different guidance tolerance offset distances.
Figure 15. The accuracy of Target Leaf Localization Algorithm using different values of σ 1 and σ 2 . The green dots indicate that the corresponding values of σ 1 , σ 2 can make the accuracy of the algorithm 100%. The red dots indicate that the corresponding values of σ 1 and σ 2 cannot make the accuracy of the algorithm 100%.
Figure 16. The segmentation accuracy of the model with different leaf detectors and guidance offset strategies. (a) Use Libra R-CNN as the detector. (b) Use Faster R-CNN as the detector. (c) Use Yolov5x as the detector.
Figure 17. The segmentation results of different models.
Table 1. The data-augmentation operations and the corresponding values.
Operation | Value
flip | horizontal/vertical flip
brightness | {0.4, 0.8}
gaussian noise | mean = 0.0, standard deviation = {10, 18}
Table 2. Computer configuration.
Configuration | Parameter
CPU | Intel(R) Core(TM) i7-6700 CPU
GPU | GeForce GTX 1080 Ti
Operating system | Ubuntu 22.04 LTS
Base environment | CUDA 11.6
Development environment | PyCharm 2022
Table 3. Training setting.
Parameter | Leaf Detector | Leaf Segmentation Network
Epoch | 60 | 100
Learning rate | 0.001 | 1 × 10−8
Batch | 4 | 5
Weight decay | 0.0005 | 0.005
Momentum | 0.9 | 0.9
Table 4. Comparison with other segmentation models.
Model | AP | AR | Accuracy | Precision | Recall | F1
Ours | 0.976 | 0.981 | 0.993 | 0.9899 | 0.9901 | 0.99
Mask R-CNN | 0.921 | 0.936 | 0.9838 | 0.9759 | 0.9778 | 0.9769
Yolov5x | 0.851 | 0.866 | 0.9619 | 0.9412 | 0.9509 | 0.9459
DeepLabv3 | 0.767 | 0.815 | 0.9645 | 0.9422 | 0.9584 | 0.9769
U-Net | 0.794 | 0.834 | 0.9675 | 0.9544 | 0.9521 | 0.9532
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, D.; Huang, Z.; Yuan, H.; Liang, Y.; Tu, S.; Yang, C. Target Soybean Leaf Segmentation Model Based on Leaf Localization and Guided Segmentation. Agriculture 2023, 13, 1662. https://doi.org/10.3390/agriculture13091662
