Article

Target Soybean Leaf Segmentation Model Based on Leaf Localization and Guided Segmentation

1 College of Mathematics and Informatics, South China Agricultural University, Guangzhou 510642, China
2 College of Agriculture, South China Agricultural University, Guangzhou 510642, China
* Author to whom correspondence should be addressed.
Agriculture 2023, 13(9), 1662; https://doi.org/10.3390/agriculture13091662
Submission received: 14 July 2023 / Revised: 18 August 2023 / Accepted: 18 August 2023 / Published: 23 August 2023
(This article belongs to the Section Digital Agriculture)

Abstract

The phenotypic characteristics of soybean leaves are of great significance for studying the growth status, physiological traits, and response to the environment of soybeans. The segmentation model for soybean leaves plays a crucial role in morphological analysis. However, current baseline segmentation models are unable to accurately segment leaves in soybean leaf images due to issues like leaf overlap. In this paper, we propose a target leaf segmentation model based on leaf localization and guided segmentation. The segmentation model adopts a two-stage segmentation framework. The first stage involves leaf detection and target leaf localization. Based on the idea that a target leaf is close to the center of the image and has a relatively large area, we propose a target leaf localization algorithm. We also design an experimental scheme to provide optimal localization parameters to ensure precise target leaf localization. The second stage utilizes the target leaf localization information obtained from the first stage to guide the segmentation of the target leaf. To reduce the dependency of the segmentation results on the localization information, we propose a solution called guidance offset strategy to improve segmentation accuracy. We design multiple guided model experiments and select the one with the highest segmentation accuracy. Experimental results demonstrate that the proposed model exhibits strong segmentation capabilities, with the highest average precision (AP) and average recall (AR) reaching 0.976 and 0.981, respectively. We also compare our segmentation results with current baseline segmentation models, and multiple quantitative indicators and qualitative analysis indicate that our segmentation results are better.

1. Introduction

The study of soybean leaf phenotypes plays an important role in soybean breeding, real-time monitoring of plant growth, and precision cultivation management [1]. Phenotypic parameters of soybean leaves include leaf length, leaf width, and leaf area. Traditional methods of data acquisition rely on manual measurements, which are not only time-consuming but also cause irreversible damage to crops [2]. To prevent harm to plant growth, noncontact data collection is gradually becoming the trend [3]. Images, as the most convenient and easily obtainable medium, have become the primary data type. However, the target leaf images used for phenotype parameter measurements usually have complex backgrounds that contain leaves with the same color and texture as the target leaves, which makes segmenting the target leaves difficult.
Achieving fast and accurate leaf segmentation under complex background conditions has always been a challenge in the field of agricultural image recognition. Currently, there are numerous leaf segmentation algorithms based on traditional image processing techniques. For example, Kumar et al. [4] utilized a graph-based approach to extract leaf regions. Bai et al. [5] utilized a marker-based watershed algorithm that relies on the HSI space to effectively segment target leaves. Kuo et al. [6] proposed a leaf segmentation method based on the k-means algorithm, which utilizes an octree structure to reduce computational complexity and memory usage. Tian et al. [7] proposed a K-means algorithm with an adaptively determined cluster number to mitigate the adverse effects of manually selecting inappropriate cluster numbers on segmentation quality. Gao et al. [8] combined the OTSU and watershed segmentation methods to achieve leaf segmentation by utilizing manually labeled leaf edge points. Although numerous leaf segmentation algorithms have been proposed, these algorithms often heavily depend on the selection of initial parameters, involve complex preprocessing procedures, or fail to effectively segment each leaf under complex practical conditions such as image noise, uneven brightness, and overlapping leaves. These limitations restrict the widespread use of traditional techniques in agricultural production.
In recent years, significant attention has been given to the use of deep neural networks in addressing agricultural production issues. With the continuous advancement of smart agriculture, the demand for leaf segmentation algorithms in agricultural production is also increasing. Compared to traditional image processing techniques, segmentation models have a wider range of applications, streamlined processes, and a more significant impact. Bhagat et al. [9] proposed an encoder–decoder architecture for leaf segmentation. They used EfficientNet-B4 as the encoder and implemented a lateral output structure to improve segmentation accuracy. Wang et al. [10] proposed an automated algorithm for corn leaf segmentation. The algorithm improves the segmentation results of the model by incorporating image restoration techniques. Liu et al. [11] combined Mask R-CNN [12] with the DBSCAN clustering algorithm to propose a highly accurate automatic segmentation method. Tian et al. [13] combined the mask prediction branch of Mask R-CNN with the U-Net [14] model to improve the accuracy of segmenting apple blossom images. Although the aforementioned methods have yielded positive outcomes, they have only been examined in relatively simple settings; consequently, their effectiveness still needs to be verified and enhanced when confronted with complex background environments. To address the issue of complex backgrounds in image processing, some scholars have adopted a two-stage approach that prescreens the complex background. Wang et al. [15] proposed the DU-Net model, which first utilized DeepLabv3 [16] to segment cucumber leaves and then employed the U-Net model to segment leaf lesions. Tassis et al. [17] proposed a two-stage model based on Mask R-CNN. The model first utilized Mask R-CNN to identify the region of target leaves and then applied the U-Net model to segment leaf lesions within the identified leaf region.
Based on the above research and practical application requirements, we need to solve two problems: (1) high-value leaves in soybean leaf images need to be identified and segmented during the segmentation process; (2) an effective segmentation algorithm needs to be designed for soybean leaf images with complex backgrounds. For the first problem, we set the large leaf close to the center of the image as the target leaf to be segmented and designed a target leaf localization algorithm to identify it. For the second problem, we drew on the approach of Wang et al. and propose a two-stage soybean leaf segmentation algorithm. The model consists of two modules, one for target leaf recognition and localization, and the other for the guided segmentation of the target leaf.
The remaining sections of this article are organized as follows: Section 2 presents the materials and methods used in this study. Section 3 analyzes and discusses the experimental results of this study. Section 4 concludes the research and provides future perspectives.

2. Materials and Methods

2.1. Dataset

2.1.1. Large Public Dataset

Before training the relevant deep models using our soybean leaf dataset, we employed the technique of transfer learning. We initialized the model parameters using the weights obtained by pretraining the models on large public datasets. In this paper, we used two large public datasets, Microsoft COCO [18] and Pascal VOC2012 [19].

2.1.2. Data Acquisition

We conducted image data acquisition at the College of Agriculture (Tianhe District) and the Zengcheng Teaching and Research Base of South China Agricultural University, Guangzhou, Guangdong Province, China. The main acquisition device was an iPhone 12, which captured images at a resolution of 4032 × 3024 pixels. The data collection took place in the morning under cloudy weather conditions without harsh sunlight. We captured images of soybean leaves at two growth stages: flower bud differentiation, and flowering and podding. Subsequently, we performed an initial screening of the images to eliminate those with similar backgrounds. Overall, we obtained 220 original images, each of which underwent the necessary cropping and resizing operations, and the image size was adjusted to 512 × 512 pixels. Figure 1 shows some representative samples of the processed image data.

2.1.3. Data Annotation and Enhancement

We used Labelme (version 5.1.1, https://github.com/wkentaro/labelme, accessed on 18 September 2022) to annotate our dataset. Labelme is a free annotation tool designed for object detection and segmentation tasks. In this paper, as our model includes the target leaf localization module and the guided segmentation module, the dataset had to be preprocessed to train these two modules effectively. For the target leaf localization module, we required supervised data in the object detection format, which involved annotating bounding boxes to indicate the position and extent of each leaf, as depicted in Figure 2a. For the guided segmentation module, we required supervised data in the object segmentation format. As the objective of the guided segmentation module is to segment target leaves, we only needed to annotate segmentation masks for them, as shown in Figure 2b.
To ensure that the trained model possesses good robustness, we employed a range of data augmentation techniques to enhance the diversity of the dataset. The specific techniques and their corresponding values are presented in Table 1. After the augmentation process, the dataset comprised a total of 2954 images, of which 1619 were randomly assigned to model training and 1335 to model evaluation.
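To make the augmentation scheme concrete, the sketch below applies the operations listed in Table 1 (flips, brightness scaling, and additive Gaussian noise) with NumPy. The function name and the way each operation is applied independently are our own assumptions for illustration, not the authors' actual pipeline.

```python
import numpy as np

def augment(image: np.ndarray, rng: np.random.Generator) -> list[np.ndarray]:
    """Illustrative augmentation pass using the operations listed in Table 1.

    Assumption: each operation is applied independently to the source image;
    how the operations were actually combined is not specified here.
    """
    img = image.astype(np.float32)
    outputs = []

    # Horizontal and vertical flips.
    outputs.append(np.fliplr(image).copy())
    outputs.append(np.flipud(image).copy())

    # Brightness values {0.4, 0.8} from Table 1, interpreted here as
    # multiplicative scaling factors (an assumption).
    for factor in (0.4, 0.8):
        outputs.append(np.clip(img * factor, 0, 255).astype(np.uint8))

    # Additive Gaussian noise with mean 0.0 and standard deviation in {10, 18}.
    for sigma in (10, 18):
        noise = rng.normal(0.0, sigma, size=img.shape)
        outputs.append(np.clip(img + noise, 0, 255).astype(np.uint8))

    return outputs

# Example usage:
# rng = np.random.default_rng(0)
# augmented = augment(np.zeros((512, 512, 3), dtype=np.uint8), rng)
```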

2.2. Methods

As shown in Figure 3, the target leaf segmentation model consists of a target leaf localization module and a guided segmentation module. The target leaf localization module identifies the target leaf within the image and passes its location to the guided segmentation module in the form of a rectangular bounding box. The guided segmentation module then leverages this bounding box to segment the target leaf precisely.

2.2.1. Target Leaf Localization Module

The localization process of the target leaf localization module comprises two steps: leaf detection and target leaf localization. In the first step, the module employs Libra R-CNN [20] as a leaf detector to accurately detect all the leaves present in the image, generating a rectangular bounding box for each detected leaf. In the second step, the module applies a target leaf localization algorithm to select, from the set of generated bounding boxes, the one that corresponds to the target leaf.
(1) Libra R-CNN
Libra R-CNN consists of three parts: feature extraction, region proposal generation, and region proposal optimization, as illustrated in Figure 4.
The feature extraction component of Libra R-CNN utilizes a series of network architectures to extract image features, as depicted in Figure 5. Initially, it utilizes ResNet50 [21] to efficiently capture both intricate details and semantic information from the image. Following that, the Feature Pyramid Network (FPN) [22] is introduced to merge feature maps from neighboring levels using a top-down pathway and lateral connections, enabling the generation of multiscale feature representations. Lastly, the Balanced Feature Pyramid (BFP) is proposed to further enhance the representation capability of these features. To achieve this, an embedded Gaussian nonlocal attention module [23] is introduced, which captures global context information within the feature map and enhances its representation by calculating nonlocal similarities.
The network structure employed for region proposal generation is the Region Proposal Network (RPN), as illustrated in Figure 6. RPN was originally proposed by Faster R-CNN [24]. Its primary objective is to generate a set of candidate boxes on a given input image, which might potentially contain the objects to be detected. Within the RPN, the anchor generator generates multiple anchors that cover regions with various scales and aspect ratios. Subsequently, the region proposal generator performs classification and bounding box regression on these anchors to generate a series of candidate proposals. To ensure a balanced distribution of positive and negative anchor samples, Libra R-CNN proposed the implementation of IoU-balanced sampling within the Region Proposal Network. The loss function for the RPN is denoted as
L_{rpn} = \frac{1}{N_{cls}} \sum_i L_{cls}(p_i, p_i^*) + \frac{1}{N_{reg}} \sum_i p_i^* L_{reg}(t_i, t_i^*) \quad (1)
which includes the classification loss L_{cls} and the regression loss L_{reg} of anchors, where p_i is the predicted probability that anchor i is positive, and p_i^* is the ground-truth label, which is 1 if anchor i is positive and 0 otherwise. t_i denotes the four predicted regression parameters of anchor i, while t_i^* denotes the corresponding ground-truth regression parameters. The anchor classification loss L_{cls}, defined by binary cross-entropy, is
L_{cls} = -\left[ p_i^* \log p_i + (1 - p_i^*) \log(1 - p_i) \right] \quad (2)
The regression loss L_{reg} is defined as
L_{reg}(t_i, t_i^*) = \sum_{j \in \{x, y, w, h\}} L_{1,balanced}\left( t_i^j - t_i^{*j} \right) \quad (3)
L_{1,balanced}(x) = \begin{cases} \frac{\alpha}{b}\left( b|x| + 1 \right) \ln\left( b|x| + 1 \right) - \alpha|x| & \text{if } |x| < 1 \\ \gamma|x| + C & \text{otherwise} \end{cases} \quad (4)
where t_i^j (j = x, y, w, h) is a specific regression parameter of t_i, used to correct the x-coordinate, y-coordinate, width, and height of the anchor, respectively, and t_i^{*j} is the corresponding component of t_i^*. In our experiments, α is set to 0.5 and γ is set to 1.5.
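For reference, the following sketch evaluates the balanced L1 loss of Equation (4) elementwise. The values of b and C follow from the continuity conditions stated in the Libra R-CNN paper (a smooth gradient and matching branches at |x| = 1); the exact implementation used in our experiments is not reproduced here.

```python
import numpy as np

def balanced_l1(x: np.ndarray, alpha: float = 0.5, gamma: float = 1.5) -> np.ndarray:
    """Balanced L1 loss of Equation (4), applied elementwise to regression errors.

    b and C are derived from alpha * ln(b + 1) = gamma (continuous gradient at
    |x| = 1) and from the two branches meeting at |x| = 1.
    """
    b = np.exp(gamma / alpha) - 1.0
    c = alpha / b * (b + 1.0) * np.log(b + 1.0) - alpha - gamma
    ax = np.abs(x)
    small = alpha / b * (b * ax + 1.0) * np.log(b * ax + 1.0) - alpha * ax
    large = gamma * ax + c
    return np.where(ax < 1.0, small, large)

# With alpha = 0.5 and gamma = 1.5 (the values used in this paper),
# balanced_l1(np.array([0.0, 0.5, 2.0])) is approximately [0.0, 0.40, 2.58].
```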
Region proposal optimization involves adjusting the position, width, and height of region proposals, as well as predicting probability scores for each proposal across all classes. The network architecture for this component adopts the region proposal optimization network proposed by Fast R-CNN [25], as shown in Figure 7. To begin, RoIAlign is employed to convert the features of the regions of interest (RoIs) into small feature maps with a fixed size of 7 × 7. RoIAlign, proposed by Mask R-CNN, serves as a feature extraction module for RoIs and is an improvement over RoIPooling, which was initially proposed by Fast R-CNN. RoIAlign maps the region proposals onto the corresponding feature map to obtain RoIs and then pools these regions to the fixed size, using bilinear sampling to avoid the quantization error of RoIPooling. Next, the feature matrices are flattened and passed through two consecutive fully connected layers, which are followed by two parallel branches, one outputting class probabilities and the other regression parameters for each proposal. The regression parameters are then used to adjust the position and size of the proposals. By selecting boxes with a high probability of being classified as a leaf, a series of leaf bounding boxes can be obtained. The loss for region proposal optimization L_{rpo} is defined as
L_{rpo} = \sum_i L_{cls1}(p_i, u_i) + \frac{1}{N_{cls}} \sum_i [u_i > 0] \, L_{reg}\left( t_i^{u_i}, v_i \right) \quad (5)
Here, p_i is the softmax probability of proposal i over all categories (including background), u_i is the ground-truth category label, t_i^{u_i} is the predicted regression parameters for category u_i of proposal i, and v_i is the ground-truth regression parameters. The classification loss L_{cls1} of the proposals is defined using the cross-entropy loss for multi-class classification as
L_{cls1}(p_i, u_i) = -\log p_i^{u_i} \quad (6)
where p_i^{u_i} is the predicted probability of the category u_i corresponding to proposal i. The regression loss L_{reg} of the proposals is calculated as in Equation (3).
(2) Target Leaf Localization Algorithm
During the detection phase, we detected the rectangular bounding boxes of all leaves within the image. The next step is to select the bounding box that corresponds to the target leaf. The execution flow of the target leaf localization algorithm is outlined below:
Step 1: Calculate the distance between the center point of each bounding box and the center point of the image. This distance is determined by the following equation:
d_i = \sqrt{\left( p_i^x - c^x \right)^2 + \left( p_i^y - c^y \right)^2} \quad (7)
Here, p_i = (p_i^x, p_i^y) represents the center point coordinates of the i-th bounding box, and c = (c^x, c^y) represents the center point coordinates of the image. After calculating the distances for all bounding boxes, we normalize them using Equation (8), where d_{max} represents the maximum distance and d_{min} represents the minimum distance.
d_{n_i} = \frac{d_i - d_{min}}{d_{max} - d_{min}} \quad (8)
Step 2: Calculate the area of each bounding box and normalize all the areas. The normalization formula is defined as
S_{n_i} = \frac{S_i - S_{min}}{S_{max} - S_{min}} \quad (9)
Here, S_i represents the area of the i-th bounding box, S_{max} represents the maximum area, and S_{min} represents the minimum area.
Step 3: Calculate the probability score for each bounding box that contains the target leaf and select the bounding box with the highest probability score as the target box. The formula for calculating the probability score is as follows:
PS_i = e^{-\frac{d_{n_i}}{\sigma_1}} \cdot e^{-\frac{1 - S_{n_i}}{\sigma_2}} \quad (10)
In Equation (10), σ1 and σ2 are control parameters. The bounding box for the target leaf should have a relatively large area, while the distance between its center point and the image center should be relatively small. By considering both the area and distance factors and using a Gaussian function, we can balance the effects of area and distance.
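The three steps above can be summarized in a short sketch. Boxes are assumed to be given as (x1, y1, x2, y2) arrays, the function name locate_target_leaf is our own, and the exponential form of the score follows the reconstruction of Equation (10) above; the default control parameters are the values selected in Section 3.1.

```python
import numpy as np

def locate_target_leaf(boxes: np.ndarray, image_size: tuple[int, int],
                       sigma1: float = 0.3, sigma2: float = 1.4) -> int:
    """Return the index of the box most likely to contain the target leaf.

    boxes: (N, 4) array of (x1, y1, x2, y2) bounding boxes from the leaf detector.
    image_size: (width, height) of the image.
    """
    w, h = image_size
    cx, cy = w / 2.0, h / 2.0

    # Step 1: distance between each box centre and the image centre, normalized.
    centres_x = (boxes[:, 0] + boxes[:, 2]) / 2.0
    centres_y = (boxes[:, 1] + boxes[:, 3]) / 2.0
    d = np.sqrt((centres_x - cx) ** 2 + (centres_y - cy) ** 2)
    d_n = (d - d.min()) / (d.max() - d.min() + 1e-12)

    # Step 2: box areas, normalized to [0, 1].
    areas = (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])
    s_n = (areas - areas.min()) / (areas.max() - areas.min() + 1e-12)

    # Step 3: probability score favouring central, large boxes (Equation (10)).
    ps = np.exp(-d_n / sigma1) * np.exp(-(1.0 - s_n) / sigma2)
    return int(np.argmax(ps))
```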

2.2.2. Guided Segmentation Module

The guided segmentation module adopts the network structure proposed by Zhang et al. [26] in their interactive segmentation model. It comprises four stages, as illustrated in Figure 8. Firstly, the input data processing stage extracts the location information implied by the bounding box and employs it as guidance for segmenting the target leaf. Secondly, the feature extraction stage extracts multiscale features of the target leaf. Thirdly, the feature refinement stage upsamples and fuses the multiscale features to restore any lost boundary features of the segmentation region. Lastly, in the mask prediction stage, the module generates the mask for the target leaf.
(1) Input Data Processing
The input data processing provides guidance information for the target leaf segmentation model. This process consists of three steps, as illustrated in Figure 9. Firstly, an image cropping operation is performed to obtain a local segmentation area by shifting 30 pixels outward along the bounding box of the target leaf. Secondly, the cropped image is resized to a standard size (e.g., 512 × 512), and the vertex coordinates of the bounding box are adjusted accordingly. Lastly, the coordinates of the center point and the vertices of the bounding box are extracted and used to construct two single-channel Gaussian heatmaps. These heatmaps transform the location information into data that the model can process. The foreground guidance channel is defined by a Gaussian heatmap using the center point coordinate ( x 0 , y 0 ) of the bounding box, as shown in Equation (11):
FP = e^{-\frac{(x - x_0)^2 + (y - y_0)^2}{\sigma^2}} \quad (11)
where \sigma = 5 \log 2. Similarly, the background guidance channel is defined using the four vertex coordinates \{(x_i, y_i) \mid i \in \{1, 2, 3, 4\}\} of the bounding box, as shown in Equation (12):
BP = \max\left\{ e^{-\frac{(x - x_i)^2 + (y - y_i)^2}{\sigma^2}} \;\middle|\; i \in \{1, 2, 3, 4\} \right\} \quad (12)
These two Gaussian heatmaps are concatenated with the resized image, resulting in a five-channel input for target leaf segmentation.
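A minimal sketch of this input construction is given below, assuming a square resized crop and a bounding box in (x1, y1, x2, y2) form; the helper names and the literal reading of σ = 5 log 2 are our own assumptions for illustration.

```python
import numpy as np

def gaussian_heatmap(points: list[tuple[float, float]], size: int, sigma: float) -> np.ndarray:
    """Pixelwise maximum over Gaussians centred on the given points (Equations (11)-(12))."""
    ys, xs = np.mgrid[0:size, 0:size]
    maps = [np.exp(-((xs - x) ** 2 + (ys - y) ** 2) / sigma ** 2) for x, y in points]
    return np.max(np.stack(maps, axis=0), axis=0)

def build_guided_input(image: np.ndarray, box: tuple[float, float, float, float],
                       sigma: float = 5 * np.log(2)) -> np.ndarray:
    """Concatenate the resized RGB crop with foreground/background guidance channels.

    image: (H, W, 3) crop already resized to the standard size (e.g. 512 x 512, square).
    box: (x1, y1, x2, y2) target-leaf bounding box in the resized coordinates.
    """
    size = image.shape[0]
    x1, y1, x2, y2 = box
    centre = [((x1 + x2) / 2.0, (y1 + y2) / 2.0)]
    corners = [(x1, y1), (x2, y1), (x1, y2), (x2, y2)]

    fg = gaussian_heatmap(centre, size, sigma)   # foreground guidance (box centre)
    bg = gaussian_heatmap(corners, size, sigma)  # background guidance (box corners)
    return np.concatenate([image.astype(np.float32),
                           fg[..., None], bg[..., None]], axis=-1)  # (H, W, 5)
```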
(2) Feature Extraction
The feature extraction network in the model follows a structure similar to FPN, as depicted in Figure 10. It comprises two components: basic feature extraction and semantic information fusion. For basic feature extraction, ResNet101 is used to construct a pyramid-structured multiscale feature map. However, unlike the conventional FPN structure, the deepest feature map from ResNet101 is enhanced with global contextual information using the pyramid scene parsing (PSP) module [27], as shown in Figure 11. The PSP module involves averaging the input feature map with four pooling windows of different sizes, followed by sequential convolution, upsampling, and concatenation with the original feature map. The resulting fusion is then processed to obtain a feature map that incorporates semantic information.
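As an illustration of the PSP module described above, the following PyTorch sketch pools the input with four windows, convolves and upsamples each branch, and fuses the result with the original feature map. The bin sizes (1, 2, 3, 6) follow the original PSPNet design and are an assumption, since the exact pooling windows are not restated here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class PSPModule(nn.Module):
    """Sketch of a pyramid scene parsing (PSP) module for the deepest feature map."""

    def __init__(self, in_channels: int, bin_sizes=(1, 2, 3, 6)):
        super().__init__()
        branch_channels = in_channels // len(bin_sizes)
        self.branches = nn.ModuleList([
            nn.Sequential(
                nn.AdaptiveAvgPool2d(size),                        # pool to a size x size grid
                nn.Conv2d(in_channels, branch_channels, 1, bias=False),
                nn.BatchNorm2d(branch_channels),
                nn.ReLU(inplace=True),
            )
            for size in bin_sizes
        ])
        # Fuse the original map with the upsampled pooled branches.
        self.fuse = nn.Sequential(
            nn.Conv2d(in_channels + branch_channels * len(bin_sizes), in_channels, 3,
                      padding=1, bias=False),
            nn.BatchNorm2d(in_channels),
            nn.ReLU(inplace=True),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h, w = x.shape[-2:]
        pooled = [F.interpolate(branch(x), size=(h, w), mode="bilinear",
                                align_corners=False) for branch in self.branches]
        return self.fuse(torch.cat([x] + pooled, dim=1))
```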
(3) Feature Refinement
The feature refinement network aims to address the loss of fine details in the multiscale feature maps extracted from the feature extraction network, which can affect the accuracy of segment boundaries. It achieves this by upsampling and fusing the multiscale feature information. The network structure, as shown in Figure 12, involves convolving the feature maps of different layers using varying numbers of residual blocks. These convolved feature maps are then upsampled to match the size of the lowest-level feature map, and finally, the refined feature map is obtained by concatenating the feature maps from different levels.
(4) Mask Prediction
The structure of the mask prediction network is illustrated in Figure 13. The refined feature map is initially processed by the mask predictor to generate a target mask. Subsequently, the mask is mapped back to the original image based on the position and size of the original target leaf bounding box. The loss function for mask prediction is defined as
L_{mask} = -\frac{1}{N} \sum_i \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \quad (13)
where p_i is the predicted value of pixel i in the predicted mask and y_i (0 or 1) is the value of pixel i in the ground-truth mask. To provide better supervision during training, the four-level feature maps extracted by the feature extraction network are also employed for mask prediction, and the masks they generate are incorporated into the loss calculation. Therefore, the overall loss function of the guided segmentation module is defined as
L_{total} = \sum_{k=1}^{5} L_{mask}^{k} \quad (14)
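The mask loss and its deeply supervised total can be sketched as follows. The logits-based binary cross-entropy, the upsampling of the side outputs to the ground-truth resolution, and the equal weighting of the five terms are implementation assumptions rather than details taken from the original model.

```python
import torch
import torch.nn.functional as F

def mask_loss(pred_logits: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Pixel-averaged binary cross-entropy of Equation (13).

    target is a float mask in {0, 1} with the same shape as pred_logits.
    """
    return F.binary_cross_entropy_with_logits(pred_logits, target)

def total_loss(side_logits: list[torch.Tensor], final_logits: torch.Tensor,
               target: torch.Tensor) -> torch.Tensor:
    """Equation (14): the final mask plus the four side outputs are all supervised."""
    losses = [mask_loss(final_logits, target)]
    for logits in side_logits:  # one prediction per feature-pyramid level
        logits = F.interpolate(logits, size=target.shape[-2:], mode="bilinear",
                               align_corners=False)
        losses.append(mask_loss(logits, target))
    return torch.stack(losses).sum()
```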

2.2.3. Guidance Offset Strategy

(1) Definition of Guidance Tolerance Offset Distance
In certain cases, the bounding box predicted for the target leaf by the leaf detector may not completely encompass the entire leaf, as illustrated in Figure 14. This can result in incomplete segmentation. To address this issue, we propose adjusting the bounding box by moving its four vertices outward by an equal distance. This adjustment allows the bounding box to fully enclose the entire leaf, thereby significantly improving the segmentation effectiveness of the model. We refer to this distance as the guidance tolerance offset distance, denoted as d_t. The coordinates of a vertex after the adjustment can be represented as:
(x_i', y_i') = \left( x_i + (-1)^{\gamma_1} d_t,\; y_i + (-1)^{\gamma_2} d_t \right) \quad (15)
Here, (x_i, y_i), i = 1, 2, 3, 4, represents the original coordinates of the vertices, (x_i', y_i') represents the coordinates after the adjustment, and \gamma_1 and \gamma_2 indicate the relative position of each vertex. Specifically, \gamma_1 equals 1 if the vertex is on the left side of the bounding box and 0 otherwise; similarly, \gamma_2 equals 1 if the vertex is on the upper side and 0 otherwise.
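A minimal sketch of this vertex adjustment, with the box in (x1, y1, x2, y2) form, is given below; clipping to the image bounds is our own safeguard and is not part of Equation (15).

```python
def offset_box(box: tuple[float, float, float, float], d_t: float,
               image_size: tuple[int, int]) -> tuple[float, float, float, float]:
    """Move the four vertices of an (x1, y1, x2, y2) box outward by d_t pixels.

    Implements Equation (15) in corner form; the result is clipped to the image
    so the expanded box stays inside the frame (an added safeguard).
    """
    x1, y1, x2, y2 = box
    w, h = image_size
    return (max(x1 - d_t, 0.0), max(y1 - d_t, 0.0),
            min(x2 + d_t, float(w)), min(y2 + d_t, float(h)))

# Example: offset_box((120, 80, 390, 410), d_t=5, image_size=(512, 512))
# -> (115, 75, 395, 415)
```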
(2) Definition of Guidance Offset Strategy
To make effective use of the guidance tolerance offset distance in the guided segmentation module, we apply the same vertex movement to the bounding boxes of the input data during the training of the guided segmentation module, so that the module is trained under the guidance of a given d_t. We refer to the combination of a fixed value of d_t and the segmentation module trained under that d_t as a guidance offset strategy. The guidance offset strategy leverages the advantage of d_t, which helps compensate for cases where the leaf bounding box fails to enclose the entire leaf. As long as the relaxation of the bounding box remains within a reasonable range, the guided segmentation module associated with this strategy can accurately segment the complete leaf. In essence, the guidance offset strategy addresses the problem of unsatisfactory segmentation results caused by the detection bias of the leaf detector, thus effectively enhancing the segmentation accuracy.

3. Experimental Results and Analysis

The computer configurations for the model experiments are presented in Table 2. We conducted separate training sessions for the leaf detector and the guided segmentation module; the parameter settings for training the two models are outlined in Table 3. To ensure the performance of these models and reduce the training time, we initialized the backbone parameters of Libra R-CNN with pretrained weights from the COCO dataset, and for the guided segmentation module we utilized weights pretrained on the VOC2012 dataset. Both models were trained until convergence to ensure optimal performance.

3.1. Selection of Control Parameters for the Target Leaf Localization Algorithm

The control parameters σ_1 and σ_2 are two key parameters of the target leaf localization algorithm, directly influencing the accuracy of the target leaf localization module. To determine their optimal values, we selected 20 evenly spaced values within the range (0, 2] as candidates for σ_1 and σ_2, resulting in a total of 400 (σ_1, σ_2) combinations for analysis. Since the role of the target leaf localization algorithm is to identify the bounding box belonging to the target leaf among the boxes predicted by a detector, we used the boxes predicted by Libra R-CNN as the validation data to assess the feasibility of different σ_1 and σ_2 values, instead of relying on a manually labeled dataset. Additionally, we introduced a label to indicate whether a box is the target box; a label of one signifies that the box is the target box, while any other value indicates otherwise. We conducted tests on 1335 soybean leaf images, which amounted to a total of 2,956,800 bounding boxes. In the results depicted in Figure 15, the green dots mark the σ_1 and σ_2 values that achieved 100% accuracy in leaf localization. Based on the data presented in Figure 15, we ultimately set σ_1 and σ_2 to 0.3 and 1.4, respectively.
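The parameter search can be sketched as follows, reusing the locate_target_leaf helper from the earlier sketch; the candidate grid of 0.1, 0.2, ..., 2.0 follows from 20 evenly spaced values in (0, 2], while the data structures are illustrative assumptions.

```python
import numpy as np

# 20 evenly spaced candidate values in (0, 2], i.e. 0.1, 0.2, ..., 2.0,
# giving 400 (sigma1, sigma2) combinations in total.
candidates = np.linspace(0.1, 2.0, 20)

def grid_search(detections, target_indices, image_size=(512, 512)):
    """Return all (sigma1, sigma2) pairs that localize every target leaf correctly.

    detections: list of (N_i, 4) box arrays, one per image, from the detector.
    target_indices: list of the index of the true target box in each image.
    Assumes locate_target_leaf(...) from the earlier sketch is in scope.
    """
    perfect = []
    for s1 in candidates:
        for s2 in candidates:
            hits = sum(
                locate_target_leaf(boxes, image_size, s1, s2) == idx
                for boxes, idx in zip(detections, target_indices)
            )
            if hits == len(detections):
                perfect.append((float(s1), float(s2)))
    return perfect
```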

3.2. Comparison of Different Guidance Offset Strategies

In this section, we examine the impact of different guidance offset strategies on segmentation accuracy. We employed d_t values of 0, 5, 10, and 15, as well as random fluctuations within the range [0, 15], to guide the training of the guided segmentation module, resulting in five networks with distinct weights. For simplicity, we refer to these networks as Fix0_SNet, Fix5_SNet, Fix10_SNet, Fix15_SNet, and Ran_SNet, respectively. In this experiment, we selected d_t values of 0, 5, 10, and 15 to pair with the aforementioned five modules, resulting in a total of 20 offset strategies. The corresponding segmentation accuracy for each offset strategy is illustrated in Figure 16a. Here, the x-axis represents the values of d_t, the y-axis represents the segmentation accuracy of the model (measured using AP), and the color of the lines indicates the segmentation module used. From the data presented in the figure, it is evident that the offset strategy using a d_t value of 5 in conjunction with Fix10_SNet achieves the highest segmentation accuracy of 0.976.

3.3. Comparison of Different Leaf Detectors

In this section, we explore the impact of different leaf detectors on segmentation accuracy. Specifically, we selected two popular object detectors, Faster R-CNN and Yolov5x, and conducted experiments using the 20 offset strategies mentioned in the previous section. Before commencing the experiments, we ensured that the target leaf localization algorithm achieved 100% accuracy. The experimental results are presented in Figure 16b,c. From the data shown in the figure, it is clear that when Faster R-CNN is used as the leaf detector, the strategy combining a d_t value of 0 with Fix5_SNet achieves the highest accuracy for the segmentation model, with an AP of 96.2%. When Yolov5x is employed as the detector, the strategy combining a d_t value of 0 with Fix10_SNet attains the highest accuracy, with an AP of 95.5%. Comparing the highest achievable accuracy across the three detectors, Libra R-CNN achieves the highest segmentation accuracy, followed by Faster R-CNN, while Yolov5x exhibits the lowest accuracy.
Based on the experimental results in Section 3.2 and Section 3.3, it is evident that guidance offset strategies have a significant impact on segmentation accuracy by mitigating the effects of guidance information bias. When a guidance offset strategy is appropriately matched with the detection capability of the detector, it ensures a higher segmentation accuracy. By comparing the highest accuracy values in the three figures, we observe that although the guidance offset strategy can compensate for detection bias, the detection accuracy of the detector still determines the upper limit of the segmentation accuracy. Furthermore, regardless of the leaf detector, strategies employing higher d_t values perform worse than the others, even when the guided segmentation module is trained under the same value of d_t. This demonstrates that when the guidance information deviates excessively from the target leaf's bounding box, the segmentation accuracy of the guided segmentation module tends to decrease. In summary, based on the commonalities observed in the three figures, we conclude that strategies combining d_t = 0 or d_t = 5 with Fix0_SNet or Fix5_SNet maximize the improvement in the model's segmentation accuracy.

3.4. Comparison with Other Segmentation Models

To showcase the performance of our leaf segmentation model, we conducted a comparison experiment with four segmentation models: Mask R-CNN, Yolov5x, DeepLabv3, and U-Net. The evaluation involved both quantitative analysis and qualitative comparison. All the models were trained and tested on our constructed dataset, with 1619 images for training and 1335 images for evaluation. To ensure fairness in the comparative experiments, we initialized the model parameters using weights pretrained on the COCO or VOC2012 dataset and ensured that each model was trained to convergence.
The evaluation metrics used in quantitative analysis are Accuracy, Precision, Recall, F1 Score, AP and AR, and the evaluation results of these models’ performance metrics are shown in Table 4. Upon comparing the evaluation data of different models, it is evident that our model outperforms the others across all metrics. This demonstrates the superior localization and segmentation capabilities of our model.
The metrics Accuracy, Precision, and Recall differ from AP and AR. In the segmentation task, the model's predicted mask pixels are compared with the ground-truth pixels, and each pixel is classified as a false-positive, true-positive, false-negative, or true-negative sample; these counts are then used to compute the three metrics. However, because of the large dataset size and the relatively high proportion of background pixels in the images, the true-positive and true-negative samples can dominate the total samples during the calculation. This can inflate these three indicators, creating an illusion of high model accuracy. Since the F1 Score is the harmonic mean of Precision and Recall, its value is inflated in the same way, which is not conducive to an objective evaluation of these models. In contrast, AP and AR are calculated in a more rigorous manner: samples are divided according to the IoU (Intersection over Union) value between the predicted mask and the actual mask, Precision and Recall are computed under different IoU thresholds, and these values are then averaged. From the AP and AR values shown in Table 4, it is evident that our model, Mask R-CNN, and Yolov5x achieve higher segmentation accuracies, while DeepLabv3 and U-Net exhibit poorer segmentation performance. To further support these conclusions, we also compared some actual segmentation results of these models, as depicted in Figure 17. The results in the figure clearly demonstrate the higher accuracy of our model, while DeepLabv3 and U-Net do not perform as well as the other three models.
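As a simplified illustration of how AP and AR differ from pixel-level Accuracy, the sketch below scores one predicted mask per image against its ground truth over a range of IoU thresholds; the full COCO-style evaluation used in practice is more involved, so this is only a schematic of the idea, not the exact protocol.

```python
import numpy as np

def mask_iou(pred: np.ndarray, gt: np.ndarray) -> float:
    """Intersection over union of two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    return float(inter) / float(union) if union > 0 else 0.0

def ap_ar_over_thresholds(ious: np.ndarray,
                          thresholds=np.arange(0.5, 1.0, 0.05)) -> tuple[float, float]:
    """Average the fraction of masks whose IoU clears each threshold.

    With exactly one predicted mask and one ground-truth mask per image,
    precision and recall coincide at every threshold, so this simplified
    sketch returns the same value for both.
    """
    hit_rates = [(ious >= t).mean() for t in thresholds]
    mean = float(np.mean(hit_rates))
    return mean, mean

# Example: ious = np.array([mask_iou(p, g) for p, g in zip(pred_masks, gt_masks)])
# ap, ar = ap_ar_over_thresholds(ious)
```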

4. Conclusions

Target soybean leaf extraction is a prerequisite for calculating the phenotypic parameters of soybean leaves. In this paper, a segmentation model for the soybean target leaf was proposed by combining object detection and guided segmentation technologies. Based on the idea that the target leaf is located near the center of the image and has a large area, a method to locate the target soybean leaf was provided. To alleviate the issue of poor segmentation caused by guidance information bias, the guidance offset strategy was proposed. Extensive experimental data and comparative analysis show that our model has higher segmentation accuracy and better generalization capacity. However, since the target leaf and the background leaves are highly similar in both color and texture, in some cases it is difficult to find the boundary between them, which may lead to incorrect segmentation. To further improve the segmentation precision, image depth or NeRF (Neural Radiance Field)-based implicit 3D reconstruction technology may be adopted in the future to obtain more information for distinguishing foreground and background soybean leaves.

Author Contributions

D.W.: methodology; Z.H. and H.Y.: software; Y.L.: funding; S.T.: writing—review and editing; C.Y.: data curation. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the key R&D project of Guangzhou (202206010091, 2023B03J1363), the Special Fund for Rural Revitalization Strategy of Guangdong (2023TS-3), and College Students’ Innovation and Entrepreneurship Competition (X202210564178).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Reynolds, M.; Chapman, S.; Crespo-Herrera, L.; Molero, G.; Mondal, S.; Pequeno, D.N.; Pinto, F.; Pinera-Chavez, F.J.; Poland, J.; Rivera-Amado, C.; et al. Breeder friendly phenotyping. Plant Sci. 2020, 295, 110396. [Google Scholar] [CrossRef] [PubMed]
  2. Yang, W.; Feng, H.; Zhang, X.; Zhang, J.; Doonan, J.H.; Batchelor, W.D.; Xiong, L.; Yan, J. Crop phenomics and high-throughput phenotyping: Past decades, current challenges, and future perspectives. Mol. Plant 2020, 13, 187–214. [Google Scholar] [CrossRef] [PubMed]
  3. Ward, B.; Brien, C.; Oakey, H.; Pearson, A.; Negrão, S.; Schilling, R.K.; Taylor, J.; Jarvis, D.; Timmins, A.; Roy, S.J.; et al. High-throughput 3D modelling to dissect the genetic control of leaf elongation in barley (Hordeum vulgare). Plant J. 2019, 98, 555–570. [Google Scholar] [CrossRef] [PubMed]
  4. Kumar, J.P.; Domnic, S. Image based leaf segmentation and counting in rosette plants. Inf. Process. Agric. 2019, 6, 233–246. [Google Scholar] [CrossRef]
  5. Bai, X.; Li, X.; Fu, Z.; Lv, X.; Zhang, L. A fuzzy clustering segmentation method based on neighborhood grayscale information for defining cucumber leaf spot disease images. Comput. Electron. Agric. 2017, 136, 157–165. [Google Scholar] [CrossRef]
  6. Kuo, K.; Itakura, K.; Hosoi, F. Leaf segmentation based on k-means algorithm to obtain leaf angle distribution using terrestrial LiDAR. Remote Sens. 2019, 11, 2536. [Google Scholar] [CrossRef]
  7. Tian, K.; Li, J.; Zeng, J.; Evans, A.; Zhang, L. Segmentation of tomato leaf images based on adaptive clustering number of K-means algorithm. Comput. Electron. Agric. 2019, 165, 104962. [Google Scholar] [CrossRef]
  8. Gao, L.; Lin, X. A method for accurately segmenting images of medicinal plant leaves with complex backgrounds. Comput. Electron. Agric. 2018, 155, 426–445. [Google Scholar] [CrossRef]
  9. Bhagat, S.; Kokare, M.; Haswani, V.; Hambarde, P.; Kamble, R. Eff-UNet++: A novel architecture for plant leaf segmentation and counting. Ecol. Inform. 2022, 68, 101583. [Google Scholar] [CrossRef]
  10. Wang, P.; Zhang, Y.; Jiang, B.; Hou, J. An maize leaf segmentation algorithm based on image repairing technology. Comput. Electron. Agric. 2020, 172, 105349. [Google Scholar] [CrossRef]
  11. Liu, X.; Hu, C.; Li, P. Automatic segmentation of overlapped poplar seedling leaves combining Mask R-CNN and DBSCAN. Comput. Electron. Agric. 2020, 178, 105753. [Google Scholar] [CrossRef]
  12. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  13. Tian, Y.; Yang, G.; Wang, Z.; Li, E.; Liang, Z. Instance segmentation of apple flowers using the improved mask R–CNN model. Biosyst. Eng. 2020, 193, 264–278. [Google Scholar] [CrossRef]
  14. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention–MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; pp. 234–241. [Google Scholar]
  15. Wang, C.; Du, P.; Wu, H.; Li, J.; Zhao, C.; Zhu, H. A cucumber leaf disease severity classification method based on the fusion of DeepLabV3+ and U-Net. Comput. Electron. Agric. 2021, 189, 106373. [Google Scholar] [CrossRef]
  16. Chen, L.C.; Papandreou, G.; Schroff, F.; Adam, H. Rethinking atrous convolution for semantic image segmentation. arXiv 2017, arXiv:1706.05587. [Google Scholar]
  17. Tassis, L.M.; de Souza, J.E.T.; Krohling, R.A. A deep learning approach combining instance and semantic segmentation to identify diseases and pests of coffee leaves from in-field images. Comput. Electron. Agric. 2021, 186, 106191. [Google Scholar] [CrossRef]
  18. Lin, T.Y.; Maire, M.; Belongie, S.; Hays, J.; Perona, P.; Ramanan, D.; Dollár, P.; Zitnick, C.L. Microsoft Coco: Common Objects in Context. In Proceedings of the Computer Vision—ECCV 2014: 13th European Conference, Zurich, Switzerland, 6–12 September 2014; pp. 740–755. [Google Scholar]
  19. Everingham, M.; Eslami, S.M.A.; Van Gool, L.; Williams, C.K.I.; Winn, J.; Zisserman, A. The Pascal Visual Object Classes Challenge: A Retrospective. Int. J. Comput. Vis. 2015, 111, 98–136. [Google Scholar] [CrossRef]
  20. Pang, J.; Chen, K.; Shi, J.; Feng, H.; Ouyang, W.; Lin, D. Libra R-Cnn: Towards Balanced Learning for Object Detection. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 821–830. [Google Scholar]
  21. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  22. Lin, T.Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature Pyramid Networks for Object Detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  23. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-Local Neural Networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7794–7803. [Google Scholar]
  24. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster R-CNN: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2015, 28, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  25. Girshick, R. Fast R-Cnn. In Proceedings of the IEEE International Conference on Computer Vision, Santiago, Chile, 7–13 December 2015; pp. 1440–1448. [Google Scholar]
  26. Zhang, S.; Liew, J.H.; Wei, Y.; Wei, S.; Zhao, Y. Interactive Object Segmentation with Inside-Outside Guidance. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 12234–12244. [Google Scholar]
  27. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2881–2890. [Google Scholar]
Figure 1. Examples of processed image data.
Figure 2. Examples of image annotation. (a) Image annotation for the target leaf localization module. (b) Image annotation for the guided segmentation module.
Figure 3. The general framework of Target Leaf Segmentation Model.
Figure 4. The overall architecture of Libra R-CNN.
Figure 5. The structure of Feature Extraction Network.
Figure 6. The structure of Region Proposal Network.
Figure 7. The structure of Region Proposal Optimization.
Figure 8. The four stages of Target Leaf Segmentation Model.
Figure 9. The three steps of input data processing.
Figure 10. The structure of Feature Extraction Network.
Figure 11. The structure of PSP module.
Figure 12. The structure of Feature Refinement Network.
Figure 13. The structure of Mask Prediction Network.
Figure 14. Original bounding box generated by leaf detector and bounding box after vertex movement. (a) The original bounding box. (b) Bounding boxes after using different guidance tolerance offset distances.
Figure 15. The accuracy of Target Leaf Localization Algorithm using different values of σ 1 and σ 2 . The green dots indicate that the corresponding values of σ 1 , σ 2 can make the accuracy of the algorithm 100%. The red dots indicate that the corresponding values of σ 1 and σ 2 cannot make the accuracy of the algorithm 100%.
Figure 16. The segmentation accuracy of the model with different leaf detectors and guidance offset strategies. (a) Use Libra R-CNN as the detector. (b) Use Faster R-CNN as the detector. (c) Use Yolov5x as the detector.
Figure 17. The segmentation results of different models.
Table 1. The data-augmentation operations and the corresponding values.
Operation | Value
flip | horizontal/vertical flip
brightness | {0.4, 0.8}
gaussian noise | mean = 0.0, standard deviation = {10, 18}
Table 2. Computer configuration.
Configuration | Parameter
CPU | Intel(R) Core(TM) i7-6700 CPU
GPU | GeForce GTX 1080 Ti
Operating system | Ubuntu 22.04 LTS
Base environment | CUDA 11.6
Development environment | PyCharm 2022
Table 3. Training setting.
Parameter | Leaf Detector | Leaf Segmentation Network
Epoch | 60 | 100
Learning rate | 0.001 | 1 × 10−8
Batch | 4 | 5
Weight decay | 0.0005 | 0.005
Momentum | 0.9 | 0.9
Table 4. Comparison with other segmentation models.
Model | AP | AR | Accuracy | Precision | Recall | F1
Ours | 0.976 | 0.981 | 0.993 | 0.9899 | 0.9901 | 0.99
Mask R-CNN | 0.921 | 0.936 | 0.9838 | 0.9759 | 0.9778 | 0.9769
Yolov5x | 0.851 | 0.866 | 0.9619 | 0.9412 | 0.9509 | 0.9459
DeepLabv3 | 0.767 | 0.815 | 0.9645 | 0.9422 | 0.9584 | 0.9769
U-Net | 0.794 | 0.834 | 0.9675 | 0.9544 | 0.9521 | 0.9532
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, D.; Huang, Z.; Yuan, H.; Liang, Y.; Tu, S.; Yang, C. Target Soybean Leaf Segmentation Model Based on Leaf Localization and Guided Segmentation. Agriculture 2023, 13, 1662. https://doi.org/10.3390/agriculture13091662
