Article

Segmentation and Coverage Measurement of Maize Canopy Images for Variable-Rate Fertilization Using the MCAC-Unet Model

1 College of Engineering, Heilongjiang Bayi Agricultural University, Daqing 163319, China
2 Eighty-Five Two Company, Heilongjiang Beidahuang Agricultural Co., Ltd., Shuangyashan 155620, China
* Author to whom correspondence should be addressed.
Agronomy 2024, 14(7), 1565; https://doi.org/10.3390/agronomy14071565
Submission received: 8 June 2024 / Revised: 12 July 2024 / Accepted: 17 July 2024 / Published: 18 July 2024

Abstract

Excessive fertilizer use has led to environmental pollution and reduced crop yields, underscoring the importance of research into variable-rate fertilization (VRF) based on digital image technology in precision agriculture. Current methods, which rely on spectral sensors for monitoring and prescription mapping, face significant technical challenges, high costs, and operational complexities, limiting their widespread adoption. This study presents an automated, intelligent, and precise approach to maize canopy image segmentation using a multi-scale attention Unet model to enhance VRF decision making, reduce fertilization costs, and improve accuracy. A dataset of maize canopy images under various lighting and growth conditions was collected and subjected to data augmentation and normalization preprocessing. The MCAC-Unet model, built upon the MobileNetV3 backbone network and integrating the convolutional block attention module (CBAM), atrous spatial pyramid pooling (ASPP) multi-scale feature fusion, and content-aware reassembly of features (CARAFE) adaptive upsampling modules, achieved a mean intersection over union (mIoU) of 87.51% and a mean pixel accuracy (mPA) of 93.85% in maize canopy image segmentation. Coverage measurements at a height of 1.1 m indicated a relative error ranging from 3.12% to 6.82%, averaging 4.43%, with a determination coefficient of 0.911, meeting practical requirements. The proposed model and measurement system effectively address the challenges in maize canopy segmentation and coverage assessment, providing robust support for crop monitoring and VRF decision making in complex environments.

1. Introduction

In the context of current agricultural development, maize, as one of the most important food crops, relies heavily on fertilization for optimal yield. Excessive fertilization not only reduces fertilizer use efficiency and increases production costs without corresponding yield benefits, but also leads to the accumulation of unused fertilizers, causing soil and water pollution along with a series of environmental issues [1]. To address the problems caused by over-fertilization, modern precision agriculture has introduced variable-rate fertilization (VRF) technology [2]. Studies have shown that improvements in crop yields can be planned and achieved through the use of variable-rate fertilization in major cereals [3,4]. However, the widespread adoption of VRF is hindered by the high technical complexity, cost, and operational difficulty of commonly used methods such as spectral sensor monitoring and prescription maps.
Currently, digital image technology is widely applied in the analysis of maize growth, and monitoring maize growth using digital image technology has become a research hotspot [5]. Researchers have already developed fertilization models based on crop coverage parameters to guide VRF. To enhance the intelligence level of fertilization machinery and reduce fertilization costs, this paper proposes a VRF method based on maize canopy coverage [6,7]. The prerequisite for implementing this technology is the accurate measurement of maize canopy coverage [8]. Therefore, this study focuses on the segmentation algorithms for maize canopy images in the field and the measurement of maize canopy coverage.
With the rapid advancement of digital imaging and computer vision technologies, digital imaging devices have become important near-ground monitoring tools, capable of replacing traditional spectral monitoring tools by accurately extracting crop canopy coverage [9]. Their portability, low cost, intuitiveness, fast monitoring speed, and non-destructive nature make them highly advantageous and promising for crop growth monitoring, and many researchers, both domestically and internationally, have conducted extensive studies in this area. By combining images captured by digital imaging devices with image segmentation algorithms, it is possible to quickly and accurately analyze changes in crop canopy structure, monitor crop growth, calculate canopy coverage, and measure bare soil area, thereby yielding significant economic benefits. Canopy coverage, defined as the percentage of ground covered by the vertical projection of vegetation leaves, branches, and stems, is a crucial phenotypic parameter of crops: it reflects the vegetation’s ability to intercept light and serves as a key indicator for assessing crop growth status and predicting yield. Estimating crop canopy coverage from digital images therefore offers the potential for variable-rate fertilization (VRF) by fertilization machinery. For example, ref. [10] developed a nitrogen management model based on canopy coverage during the maize vegetative growth stage, providing a simple new method for precise nitrogen management in maize. Similarly, ref. [11] constructed a coverage-based VRF control system for wheat. These studies demonstrate that accurate crop canopy segmentation and coverage measurement significantly promote field VRF technology and guide practical VRF applications. In coverage measurement research, constructing an efficient maize canopy segmentation model is fundamental, because the segmentation performance of the model directly determines the accuracy of coverage measurement [12,13]. Traditional image segmentation algorithms have been widely applied in agriculture, including thresholding, clustering, region growing, and graph-based methods. These methods rely primarily on manually extracted image features such as grayscale, color, texture, and spatial geometry, enhancing the differences between foreground and background regions to achieve separation. Early studies on crop canopy segmentation and coverage measurement predominantly used such traditional image processing techniques. For instance, ref. [14] processed cotton canopy images in Photoshop and estimated ground cover percentage by counting black and white pixels. Ref. [15] employed an automatic thresholding method based on green pixel intensity and a Gaussian mixture model (GMM) for wheat canopy segmentation. Additionally, André Coy et al. successfully segmented the green canopy of various crops using histogram-based thresholding techniques.
Although these methods are effective under specific conditions, they are generally limited by image quality and environmental variations. Ref. [16] proposed a winter wheat coverage extraction method based on an improved K-means algorithm, demonstrating superior segmentation performance compared with traditional methods. Despite their relative simplicity and ease of implementation, traditional image segmentation methods exhibit several limitations. Firstly, manually defined image features are often insufficient to comprehensively describe crop responses to complex growth environments, making these methods suitable only for specific types of segmentation tasks. Secondly, traditional algorithms often lack generalization capability when processing crop images under different lighting conditions or growth stages, leading to frequent over-segmentation and under-segmentation. These problems arise primarily because traditional methods depend heavily on manually designed features, which cannot fully capture the variation in crop images across environments. To overcome these limitations, researchers have turned to more intelligent and adaptive segmentation techniques, such as deep learning, to improve the accuracy and robustness of crop image segmentation and thereby better serve agricultural production. With the development of deep learning, an increasing number of studies have applied convolutional neural networks (CNNs) and other deep network models to image segmentation. Ref. [17] achieved high-precision segmentation of cotton leaves by combining depth maps and multi-view images with an improved Unet network. Ref. [18] used a CNN-based model to evaluate canopy coverage in pea–oat intercropping systems. Furthermore, refs. [19,20] applied digital image processing and deep learning models to strawberries and wheat/rice, respectively, significantly improving segmentation accuracy and practicality. Recent research has also adopted new algorithms and architectures to handle complex backgrounds and enhance segmentation efficiency; for example, ref. [21] developed a new deep learning architecture, ThelR547v1, which optimized memory usage and improved segmentation accuracy. Overall, significant progress has been made in crop canopy segmentation and coverage measurement, providing robust technical support for precision agriculture and crop growth monitoring.
The aim of this research is to develop an advanced, automated, and highly precise system for maize canopy image segmentation and coverage measurement, ultimately enhancing the efficiency and accuracy of variable-rate fertilization (VRF) decision making. This study will leverage a diverse dataset of maize canopy images, collected under various lighting and growth conditions, to design and train a sophisticated semantic segmentation model, and will optimize the model’s performance in terms of lightweight efficiency and accuracy. To ensure applicability in real-world scenarios, the model’s capabilities will be validated through maize canopy coverage measurements, with calculations of relative error and determination coefficients. The ultimate objective is to develop a comprehensive system based on this enhanced semantic segmentation model, enabling real-time maize canopy segmentation and coverage measurement adaptable to various field conditions. This system will allow users to effectively monitor maize growth and make informed VRF decisions. By meeting these objectives, this study aims to provide critical insights and technical support for precision agriculture, thereby improving crop growth monitoring and fertilization practices.

2. Materials and Methods

2.1. Data Collection and Preprocessing

2.1.1. Determining the Sampling Height

From 15 June to 29 June 2022, during the intertillage fertilization period in Zhaoguang county, Beian city, the sampling height for maize canopy images was determined experimentally, with images collected every other day (on the 15th, 17th, 19th, 21st, 23rd, 25th, 27th, and 29th). The maize studied was planted in a wide-ridge, double-row pattern (ridges 110 cm wide, with 40 cm row spacing on each ridge), and only the two narrowly spaced rows on each ridge were segmented.
To ensure diversity in the samples, images were captured using two different smartphones: vivo X21 and Huawei nova 3, with resolutions of 5632 × 4224 and 3024 × 3024 pixels, respectively. The images were stored in JPG format. For image collection, smartphones were securely mounted on a tripod, positioned for vertical overhead shooting, as shown in Figure 1.
The experiment involved six different collection heights: 0.9 m, 1.0 m, 1.1 m, 1.2 m, 1.3 m, and 1.4 m. Images were also randomly collected at intervals within these heights for validation. The support structure used for positioning the smartphones was telescopic, allowing for precise adjustment to the desired heights. This flexibility ensured that the smartphones could be accurately positioned to capture images at varying heights relative to the plant growth stages.
Images were manually segmented, and maize coverage at different heights was calculated using Photoshop 2021. The experiment showed that, on all eight sampling dates, maize coverage measured at heights between 1.0 m and 1.2 m effectively reflected the canopy coverage. On the 15th, maize coverage at 1.0 m was similar to that at 0.9 m, with only a slight increase; on the 29th, coverage peaked at 1.2 m, declining at higher heights. This trend indicates that the height range of 1.0 m–1.2 m can effectively capture variations in maize canopy coverage at different stages, accurately reflecting growth conditions.
Therefore, the fixed image collection heights of 1.0 m, 1.1 m, and 1.2 m were selected for this study, with additional random sampling within this range to enrich the dataset. These heights were chosen because they best capture canopy coverage, making the trained model highly adaptable. This is crucial for evaluating maize growth and making informed VRF decisions.

2.1.2. Image Acquisition

The classification of maize growth stages typically relies on subjective human assessment, making precise fertilization timing challenging. In practical agricultural production, fertilization timing needs to be dynamically adjusted based on the specific growth conditions of the maize in the field. To construct a comprehensive dataset, this study collected maize canopy images under various lighting conditions during the local intertillage fertilization periods.
A total of 1370 images were collected in Zhaoguang county, Beian city, from 15 June to 29 June 2022, and an additional 2955 images were collected in Qinggang county, Suihua city, from 15 June to 29 June 2023, resulting in a dataset of 4325 maize canopy images. The image acquisition heights were kept consistent to ensure data uniformity. Images were collected every other day, with three sampling plots selected at each location, covering various maize varieties. In each field, five different operational areas were randomly chosen as sampling regions.
The collected images covered the seedling stage and the trumpet stage of maize growth, where plant heights ranged from approximately 35 cm to 85 cm. To account for varying light conditions, images were captured under five different illumination levels, measured using a light meter. These levels were categorized based on the intensity of illumination: weak light conditions (less than 5000 Lux, including overcast mornings, noontime, and afternoons, as well as clear mornings) and strong light conditions (more than 5000 Lux, including clear noontime and afternoons). Additionally, some images contained straw and weeds, increasing the dataset’s diversity and practical applicability.
To ensure objectivity in defining light conditions, illumination data under different heights and angles were collected, minimizing the impact of sunlight angles and shadows on image quality. The definitions of overcast and clear conditions were determined by the average and peak values of daily illumination. Because this study’s data collection covered a wide range of lighting conditions, the trained model exhibited strong generalizability, accommodating variations in maize canopy images across different growth stages. This flexibility allows for adaptable camera height adjustments in practical applications.
To ensure consistency and comparability of images under different conditions, images were calibrated during the acquisition process. Prior to each session, a standard calibration board with known reflectance and color properties was used as a reference for white balance and exposure. This method effectively reduced image bias caused by varying lighting conditions and angles, ensuring dataset consistency. Further post-processing was performed to correct light intensity and white balance, maintaining uniform color and brightness across all images. This study employed a static acquisition method to capture field images of maize canopies, with sample images shown in Figure 2 and Figure 3.

2.1.3. Maize Canopy Image Dataset Creation

To facilitate model training and eliminate edge distortion areas, we cropped the image dimensions from 5632 × 4224 pixels to 4224 × 4224 pixels and ultimately annotated images with a resolution of 3024 × 3024 pixels. Given the complexity of maize canopy images, manual annotation using commonly used tools such as Labelme and Photoshop 2021 can be challenging. Therefore, we employed the excess green (ExG) algorithm, known for its sensitivity to green vegetation, in combination with Photoshop for annotation.
Firstly, we applied the ExG algorithm for initial image segmentation. This algorithm effectively identifies and extracts green vegetation regions but may erroneously classify some shadows or weeds as vegetation, necessitating manual corrections. During manual intervention, we used the polygonal lasso tool in Photoshop 2021 for precise corrections. Several intervention parameters were established, including edge tolerance, color threshold, region area size, and edge smoothness, to ensure the accuracy and consistency of the corrections.
The edge tolerance was set in the range of 2–5 pixels to fine-tune edge positions without extensive adjustments, thus saving time while maintaining accuracy. The color threshold for the ExG algorithm was set within the range of 100–255 in the G channel of the RGB values, guiding manual adjustment of similarly colored but mis-segmented areas. For mis-segmented or unsegmented regions, an area threshold was established: regions smaller than 50 pixels were likely misclassified and required manual handling, while larger areas necessitated more detailed inspection and adjustment. The edge smoothness parameter was set in the range of 5–10 to ensure natural transitions along the edges. During the intervention process, erroneous regions identified by the ExG algorithm were first removed by selecting and deleting the incorrectly marked areas with the lasso tool. Next, unsegmented regions were added by manually marking the unidentified vegetation areas. Finally, edge optimization was performed by fine-tuning inaccurate edges and applying the edge smoothness parameters. These manual interventions ensured the annotation results were accurate and met the requirements for model training. The image annotation process is illustrated in Figure 4.
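As a minimal sketch of the first, automatic step of this annotation pipeline, the OpenCV snippet below computes the ExG index (ExG = 2G − R − B) and derives an initial vegetation mask; the Otsu-based default threshold and the function name are illustrative assumptions rather than the exact settings used here.

```python
import cv2
import numpy as np

def exg_initial_mask(image_bgr, threshold=None):
    """Initial vegetation mask from the excess-green index ExG = 2G - R - B.
    The automatic Otsu threshold is an assumption; masks are refined manually afterwards."""
    b, g, r = cv2.split(image_bgr.astype(np.float32))
    exg = 2.0 * g - r - b
    if threshold is None:
        # Otsu's method on the rescaled index as an automatic starting point
        exg_u8 = cv2.normalize(exg, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
        _, mask = cv2.threshold(exg_u8, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    else:
        mask = (exg > threshold).astype(np.uint8) * 255
    return mask  # 255 = candidate vegetation, 0 = background
```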

2.1.4. Dataset Construction

Given the computational complexity of semantic segmentation algorithms, the input image size must match the network’s input size during training. Consequently, the resolution of both the original and annotated images was reduced to 512 × 512 pixels. Based on this, the dataset was divided into training, validation, and testing sets in a 6:2:2 ratio, as shown in Table 1. The training set is used to train the model parameters, the validation set is used to optimize the model, and the testing set is used to verify the model’s robustness.

2.1.5. Preprocessing of Maize Canopy Images

To address the issue of insufficient maize image quantities, data augmentation techniques were employed to increase the diversity of training samples, effectively preventing overfitting and enhancing the representativeness of the dataset. Additionally, these techniques improve the model’s segmentation performance and enhance its generalization capability, resulting in superior performance on the test set. During each training iteration, a dynamic data augmentation method was applied to the cropped maize canopy images and their annotated counterparts. This approach further enriches the training data and boosts model performance. Considering the varying orientations of images captured during the maize canopy image collection, and the relatively fixed growth direction of maize in the existing images, introducing rotation operations aids in enhancing dataset diversity and improving the model’s generalization ability. Specifically, the cropped raw images and their corresponding annotated images were subjected to counterclockwise rotation by 45 degrees, mirror flipping, brightness adjustments, and the addition of Gaussian noise, as illustrated in Figure 5.
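A minimal sketch of such a joint image/label augmentation step is shown below, assuming torchvision tensors; the probabilities, brightness range, and noise level are illustrative values, not the exact settings used in this study.

```python
import random
import torch
import torchvision.transforms.functional as TF

def augment_pair(image, mask):
    """Jointly augment an image tensor (C, H, W, float in [0, 1]) and its label mask
    (1, H, W, float class indices, cast back to long by the caller): 45-degree rotation,
    mirror flip, brightness jitter, and Gaussian noise."""
    if random.random() < 0.5:   # counterclockwise rotation by 45 degrees (nearest for the mask)
        image, mask = TF.rotate(image, 45), TF.rotate(mask, 45)
    if random.random() < 0.5:   # mirror flip
        image, mask = TF.hflip(image), TF.hflip(mask)
    if random.random() < 0.5:   # brightness adjustment (image only)
        image = TF.adjust_brightness(image, random.uniform(0.8, 1.2))
    if random.random() < 0.5:   # additive Gaussian noise (image only)
        image = (image + 0.02 * torch.randn_like(image)).clamp(0.0, 1.0)
    return image, mask
```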
Normalization of data is a crucial step in image processing. It adjusts the range of pixel intensity values and addresses issues of uneven pixel distribution that can negatively impact semantic segmentation performance. For RGB-channel color images, normalization ensures data stability and consistency. This prevents extended training times and convergence difficulties, thereby enhancing model performance and efficiency.
Given the unique characteristics of agricultural image datasets, we have decided to use linear normalization in subsequent experiments. This approach aims to accelerate model convergence and improve prediction accuracy. Linear transformation adjusts the raw data, ensuring the processed data falls within a specific range. This method enhances the stability and consistency of image data. The mathematical expression for the pixel value after linear normalization is shown in Equation (1).
$I_N = \dfrac{(I - Min)(newMax - newMin)}{Max - Min} + newMin$
In the equation, $I \in [Min, Max]$ represents the pixel value of the original image, and $[newMin, newMax]$ represents the range of pixel values after the linear transformation.
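For illustration, Equation (1) can be implemented directly in NumPy as below; the default target range of [0, 1] is an assumption.

```python
import numpy as np

def linear_normalize(image, new_min=0.0, new_max=1.0):
    """Linear (min-max) normalization of pixel values following Eq. (1)."""
    image = image.astype(np.float32)
    old_min, old_max = image.min(), image.max()
    return (image - old_min) * (new_max - new_min) / (old_max - old_min) + new_min
```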

2.2. Maize Canopy Image Segmentation Based on the MCAC-Unet Model

The proposed maize canopy image segmentation model is based on an improved Unet architecture, with the backbone network replaced by MobileNetV3. The MobileNetV3-Unet significantly reduces the model’s parameter count and the size of the weight files, providing faster execution speed when deployed on edge devices.
To address the issues of under-segmentation and over-segmentation in maize canopy images caused by complex backgrounds and loss of details, an improved MCAC-Unet model was introduced (the structure is shown in Figure 6). By incorporating the convolutional block attention module (CBAM) into the skip connections, the model can automatically focus on more useful feature regions during feature fusion. For maize canopy images, the inclusion of CBAM allows the network to concentrate on the most crucial features, such as the edges of maize leaves, textures, or contrasts with the surrounding environment. This enhances the model’s ability to recognize specific parts of the maize canopy, enabling more accurate differentiation between maize vegetation and non-vegetation areas in complex backgrounds.
The atrous spatial pyramid pooling (ASPP) module was embedded between the encoder and decoder bottleneck layers to capture rich contextual information. Maize canopies typically consist of leaves of varying sizes and orientations, necessitating a model capable of recognizing maize plants across different field sizes. By embedding ASPP into the maize canopy image segmentation task, the model can better understand the morphological features of maize plants at various scales. ASPP aids in identifying maize leaves of different sizes and shapes, facilitating the precise segmentation of maize leaves and the recognition of voids between the leaves.
The original model’s upsampling process was replaced with the content-aware reassembly of features (CARAFE) module. The CARAFE module can adaptively reassemble features, ensuring that the upsampled feature maps retain rich spatial information while integrating deep contextual information. By introducing content-awareness via a weighting map, the CARAFE module makes the upsampling process more flexible and targeted. The model can adjust the upsampling method and intensity according to different regions of the input image, better accommodating variations in the canopy structure. This enhances the segmentation accuracy and boundary clarity, reducing the occurrences of over-segmentation and under-segmentation, thereby improving overall segmentation performance.

2.3. The MCAC-Unet Semantic Segmentation Model

2.3.1. The Unet Semantic Segmentation Model

The Unet semantic segmentation model [22] employs a U-shaped architecture, consisting of an encoder, a decoder, and skip connections. The encoder comprises five stages, each containing two 3 × 3 convolutional layers with ReLU activation functions, followed by a 2 × 2 max-pooling layer with a stride of 2. The decoder also consists of four stages, each performing upsampling via transposed convolutions [23]. The skip connection structure is responsible for merging feature maps from the encoder and decoder to enhance pixel-level segmentation accuracy. The Unet architecture is straightforward and well-suited for scenarios with limited data and computational resources. It is an efficient semantic segmentation network, as illustrated in Figure 7.

2.3.2. Lightweight Backbone Network

The Unet semantic segmentation model employs MobileNet as its backbone network. The MobileNet architecture utilizes depthwise separable convolution for feature extraction, significantly reducing both the parameter count and computational load of the network, thus meeting the requirements for deployment on low-power devices. Building upon the foundations of MobileNetV1 [24] and MobileNetV2 [25], the MobileNetV3 network [26] was introduced. MobileNetV3 retains the functionalities of depthwise separable convolution and linear bottleneck inverted residual structure from its predecessors, while incorporating the squeeze-and-excitation (SE) module. The architecture of the MobileNetV3 network is illustrated in Figure 8.
Depthwise separable convolution is illustrated in Figure 9. It consists of two stages: first, a depthwise convolution is performed in which each convolution kernel processes only one channel of the input feature map. This is followed by a pointwise convolution, which adjusts the channel number using 1 × 1 convolution kernels. Unlike traditional convolution, depthwise convolution interacts with a single channel at a time, reducing the number of parameters. Through these two steps, depthwise separable convolution achieves feature extraction and channel adjustment while reducing parameters, making it more efficient than standard convolution.
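As an illustration, a depthwise separable convolution block can be written in PyTorch as follows; the batch normalization and H-swish activation are included to mirror MobileNetV3-style blocks and are assumptions about the exact block layout.

```python
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise 3x3 convolution (one kernel per input channel) followed by a
    pointwise 1x1 convolution that mixes channels; a minimal sketch."""
    def __init__(self, in_ch, out_ch, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, kernel_size=3, stride=stride,
                                   padding=1, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False)
        self.bn = nn.BatchNorm2d(out_ch)
        self.act = nn.Hardswish()   # H-swish, as used in MobileNetV3

    def forward(self, x):
        return self.act(self.bn(self.pointwise(self.depthwise(x))))
```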
The inverted residual structure with linear bottleneck reduces computational load while preserving fine-grained features. It employs a design with fewer channels at the ends and more channels in the middle. This structure is inspired by the residual configuration of 1 × 1, 3 × 3, and 1 × 1 convolutions; the key difference lies in the use of depthwise convolution for the 3 × 3 kernel instead of the standard convolution. The structure is applied with a stride of either 1 or 2. The inverted residual structure with linear bottleneck is depicted in Figure 10.
The SE module is not a standalone network structure but an auxiliary module that can be integrated into other networks. Its core function is to automatically evaluate the importance of each feature channel through a learning mechanism. Based on this learned importance, the SE module can enhance beneficial features and suppress irrelevant ones. Additionally, the H-swish activation function employed reduces computational complexity and further improves the accuracy of the MobileNetV3 network.
For the backbone of the MobileNetV3-Unet network, we have replaced the Unet model’s encoder with the MobileNetV3-Small model to meet lightweight requirements. The modified feature extraction backbone is detailed in Table 2. MobileNetV3-Unet first processes the input image with a standard convolution module using sixteen 3 × 3 kernels, with a stride of 2, to adjust the channel number and reduce feature map dimensions. This is followed by a series of stacked “MobileNetV3-bneck” modules, inheriting the structural characteristics of MobileNetV3-Small. This stacking process achieves feature downsampling by varying strides, consistent with Unet’s downsampling strategy, occurring four times. The key features obtained after downsampling are then concatenated with the upsampled features, enhancing feature extraction capability. Finally, the output is generated through the prediction layer.

2.3.3. CBAM Convolutional Attention Mechanism

In the MobileNetV3-Unet network structure, adding attention mechanisms can help identify important regions or features in the segmented image. By incorporating the CBAM, we can adjust the distribution of weight resources while maintaining minimal increases in computational and parameter overhead. CBAM refines feature weights from both channel and spatial dimensions [27], enhancing network recognition performance through multi-dimensional feature enhancement.
Assume the input feature map is $F \in \mathbb{R}^{C \times H \times W}$, where C is the number of channels and H and W are the height and width of the input image. The feature map first passes through a channel attention module, which generates the channel-weighted feature map; this is then passed through a spatial attention module to produce the final spatially weighted output. The entire attention computation is summarized in Equations (2)–(4):
$F' = M_C(F) \otimes F$
$F'' = M_S(F') \otimes F'$
$F \in \mathbb{R}^{H \times W \times C},\quad M_C \in \mathbb{R}^{1 \times 1 \times C},\quad M_S \in \mathbb{R}^{H \times W \times 1}$
Here, $F$ is the input feature map, $F'$ is the channel-refined feature map, and $F''$ is the final refined output; $M_C$ denotes the attention extraction operation in the channel dimension, $M_S$ denotes the attention extraction operation in the spatial dimension, and $\otimes$ denotes element-wise multiplication.
In the MobileNetV3-Unet network architecture, the CBAM has been introduced. CBAM is a hybrid attention mechanism that integrates both spatial and channel attention mechanisms, thereby capturing and utilizing feature information more comprehensively. The structure of CBAM is illustrated in Figure 11.
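To make the computation in Equations (2)–(4) concrete, the PyTorch sketch below implements a compact CBAM block with channel attention followed by spatial attention; the reduction ratio of 16 and the 7 × 7 spatial kernel follow the original CBAM design [27] and are assumptions here, not necessarily the exact settings used in MCAC-Unet.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1, bias=False),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1, bias=False))
    def forward(self, x):
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        return torch.sigmoid(avg + mx)                 # M_C: (B, C, 1, 1)

class SpatialAttention(nn.Module):
    def __init__(self, kernel_size=7):
        super().__init__()
        self.conv = nn.Conv2d(2, 1, kernel_size, padding=kernel_size // 2, bias=False)
    def forward(self, x):
        avg = torch.mean(x, dim=1, keepdim=True)
        mx, _ = torch.max(x, dim=1, keepdim=True)
        return torch.sigmoid(self.conv(torch.cat([avg, mx], dim=1)))  # M_S: (B, 1, H, W)

class CBAM(nn.Module):
    """Channel attention followed by spatial attention, as in Eqs. (2)-(4)."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        self.ca = ChannelAttention(channels, reduction)
        self.sa = SpatialAttention()
    def forward(self, x):
        x = self.ca(x) * x   # F' = M_C(F) (x) F
        x = self.sa(x) * x   # F'' = M_S(F') (x) F'
        return x
```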

2.3.4. Atrous Spatial Pyramid Pooling

The atrous spatial pyramid pooling (ASPP) [28] applies parallel sampling of the given input using atrous convolutions with different sampling rates, effectively capturing the image context at multiple scales. ASPP is a combination of atrous convolution [29] and spatial pyramid pooling (SPP) [30].
The ASPP innovates on the SPP by introducing a series of dilated convolutions with different dilation rates in each branch of the SPP. This allows each dilation rate to independently extract features, capturing features of different scales and contextual information. Subsequently, these features extracted by different dilation rates are combined to form the final feature representation, as shown in Equation (5):
$Y = \mathrm{concat}\big(\mathrm{image}(X),\ H_{1,1}(X),\ H_{6,3}(X),\ H_{12,3}(X),\ H_{18,3}(X)\big)$
In the equation, $H_{r,n}(X)$ represents the atrous convolution applied to X with a sampling rate r and a kernel size of n × n, and image(X) represents the image-level features extracted from the input X using global average pooling.
The ASPP module first processes the input feature map through a series of parallel operations: a 1 × 1 convolution, pyramid pooling (three 3 × 3 atrous convolutions), and an image-level pooling branch (comprising pooling, 1 × 1 convolution, and upsampling operations). The results are then concatenated. The pyramid pooling employs atrous convolutions with dilation rates of 6, 12, and 18 to achieve different receptive fields, enabling multi-scale feature extraction. Figure 12 illustrates the structure of the ASPP.
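The PyTorch sketch below mirrors Equation (5): a 1 × 1 branch, three 3 × 3 atrous branches with dilation rates 6, 12, and 18, and an image-level pooling branch are computed in parallel, concatenated, and fused by a 1 × 1 projection; the 256-channel width is an illustrative assumption.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ASPP(nn.Module):
    """ASPP sketch following Eq. (5); channel sizes are illustrative."""
    def __init__(self, in_ch, out_ch=256, rates=(6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList([nn.Sequential(              # 1x1 branch
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch), nn.ReLU())])
        for r in rates:                                             # atrous 3x3 branches
            self.branches.append(nn.Sequential(
                nn.Conv2d(in_ch, out_ch, 3, padding=r, dilation=r, bias=False),
                nn.BatchNorm2d(out_ch), nn.ReLU()))
        self.image_pool = nn.Sequential(                            # image-level features
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(in_ch, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch), nn.ReLU())
        self.project = nn.Sequential(                               # fuse concatenated branches
            nn.Conv2d(out_ch * (len(rates) + 2), out_ch, 1, bias=False),
            nn.BatchNorm2d(out_ch), nn.ReLU())

    def forward(self, x):
        h, w = x.shape[2:]
        feats = [branch(x) for branch in self.branches]
        pooled = F.interpolate(self.image_pool(x), size=(h, w),
                               mode="bilinear", align_corners=False)
        return self.project(torch.cat(feats + [pooled], dim=1))
```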

2.3.5. CARAFE Feature Upsampling Factor

CARAFE [31] consists of an upsampling kernel prediction module and a feature reassembly module. CARAFE maps each pixel of the input image to multiple positions in the output image using an interpolation function, followed by a convolution operation to merge the features at these positions to generate the output image [32]. In CARAFE, the upsampling factor is a critical parameter that defines the resolution multiplication ratio between the input and output images.
Assuming an upsampling factor of $\sigma$, let the input feature map have a shape of (H, W, C). The upsampling kernel prediction module predicts the upsampling kernels, and the feature reassembly module performs the upsampling to produce a feature map of shape ($\sigma$H, $\sigma$W, C). The interpolation mapping is given by Equation (6):
$(x', y') = (\sigma x + \delta_x,\ \sigma y + \delta_y)$
In the equation, (x, y) denotes a pixel position in the input image, $(x', y')$ denotes the corresponding pixel position in the output image, and $(\delta_x, \delta_y)$ are the offsets applied to (x, y).
For a given position $(x', y')$ in the output image, the surrounding input pixels $(x_i, y_i)$ with corresponding weights $w_i$ determine the output pixel value $O(x', y')$. This pixel value is calculated using Equation (7):
$O(x', y') = \sum_{i=1}^{N} w_i \times I(x_i, y_i)$
In this equation, N represents the number of input pixels that contribute to the pixel at $(x', y')$ in the output image, and $I(x_i, y_i)$ denotes the pixel value at position $(x_i, y_i)$ in the input image. The CARAFE module achieves upsampling by learning appropriate offset values $\delta_x$ and $\delta_y$, as well as the corresponding weights $w_i$, enabling content-aware upsampling of the input feature map. The structure of the CARAFE module is illustrated in Figure 13.
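A simplified PyTorch sketch of the two CARAFE stages is shown below: a kernel prediction branch produces a normalized k_up × k_up reassembly kernel for every output pixel, and the reassembly stage computes each output pixel as the weighted sum of the corresponding input neighbourhood, as in Equation (7). The compressed channel width, encoder kernel size, and k_up = 5 are defaults from the original CARAFE paper [31], not necessarily the MCAC-Unet settings.

```python
import torch.nn as nn
import torch.nn.functional as F

class SimpleCARAFE(nn.Module):
    """Content-aware reassembly of features (sketch): kernel prediction + reassembly."""
    def __init__(self, channels, scale=2, k_up=5, k_enc=3, c_mid=64):
        super().__init__()
        self.scale, self.k_up = scale, k_up
        self.compress = nn.Conv2d(channels, c_mid, 1)                         # channel compressor
        self.encoder = nn.Conv2d(c_mid, (scale ** 2) * (k_up ** 2), k_enc,
                                 padding=k_enc // 2)                          # kernel prediction

    def forward(self, x):
        b, c, h, w = x.shape
        # 1) predict one k_up x k_up reassembly kernel per output pixel
        kernels = self.encoder(self.compress(x))            # (b, s^2*k^2, h, w)
        kernels = F.pixel_shuffle(kernels, self.scale)      # (b, k^2, s*h, s*w)
        kernels = F.softmax(kernels, dim=1)                 # normalized weights w_i
        # 2) gather the k_up x k_up neighbourhood of each input pixel
        patches = F.unfold(x, self.k_up, padding=self.k_up // 2)              # (b, c*k^2, h*w)
        patches = patches.view(b, c * self.k_up ** 2, h, w)
        patches = F.interpolate(patches, scale_factor=self.scale, mode="nearest")
        patches = patches.view(b, c, self.k_up ** 2, h * self.scale, w * self.scale)
        # weighted sum over the neighbourhood gives each output pixel (Eq. (7))
        return (patches * kernels.unsqueeze(1)).sum(dim=2)  # (b, c, s*h, s*w)
```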

2.4. Model Training

2.4.1. Experimental Platform and Parameter Settings

The experiments were conducted on a Windows 10 64-bit system. The convolutional neural network was constructed using the PyTorch 1.11.0+cu8.2.1 deep learning framework, and data analysis and processing were performed with Python 3.11. The hardware comprised a 12th-generation Intel i5-12400 processor, an NVIDIA RTX 2080Ti GPU, and 16 GB of RAM. For convenience of model training, the original images were resized to 512 × 512 pixels before being input into the network. The momentum factor was set to 0.99, the weight decay coefficient to 0.00001, and the number of iterations to 100 epochs. The batch size for each training iteration was set to 4, the Adam optimizer was selected, and the initial learning rate was set to 1 × 10−4.
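A minimal training-loop sketch with the hyperparameters above is given below; MCACUnet and MaizeCanopyDataset are hypothetical placeholder names for the model and dataset classes, and the reported momentum factor of 0.99 is not mapped onto Adam's beta parameters, which are left at their PyTorch defaults.

```python
import torch
from torch.utils.data import DataLoader

# Hypothetical model/dataset classes; hyperparameters follow Section 2.4.1.
model = MCACUnet(num_classes=2).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4, weight_decay=1e-5)
criterion = torch.nn.CrossEntropyLoss()
train_loader = DataLoader(MaizeCanopyDataset("train"), batch_size=4, shuffle=True)

for epoch in range(100):                      # 100 epochs
    model.train()
    for images, labels in train_loader:       # images: (4, 3, 512, 512), labels: (4, 512, 512)
        images, labels = images.cuda(), labels.cuda()
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()
```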

2.4.2. Model Evaluation Metrics

The model’s segmentation performance was evaluated using metrics such as mean pixel accuracy (mPA), mean intersection over union (mIoU), weight files, and parameter counts. The mPA metric represents the average probability of correctly classifying pixels within each class in the image, while mIoU quantifies the ratio of the intersection and union of predicted and true values. The formulas for calculating these metrics are provided in Equations (8) and (9).
$mPA = \dfrac{1}{k+1}\sum_{i=0}^{k}\dfrac{p_{ii}}{\sum_{j=0}^{k} p_{ij}}$
$mIoU = \dfrac{1}{k+1}\sum_{i=0}^{k}\dfrac{TP}{TP + FP + FN} \times 100\%$
TP (True Positive) denotes instances correctly identified as positive; specifically, the number of pixels manually labeled as maize regions and also predicted as maize regions by the model. TN (True Negative) denotes instances correctly identified as negative, corresponding to the number of pixels manually labeled as background and also predicted as background. FP (False Positive) denotes instances incorrectly identified as positive, and FN (False Negative) denotes instances incorrectly identified as negative. Here, k represents the number of classes, i denotes the true class, and j denotes the predicted class, so that $p_{ij}$ is the number of pixels of true class i predicted as class j and $p_{ii}$ is the number of pixels correctly predicted as class i.
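For illustration, both metrics can be computed from a (k + 1) × (k + 1) confusion matrix as in the sketch below, where the per-class IoU is TP/(TP + FP + FN); the function name is hypothetical.

```python
import numpy as np

def segmentation_metrics(conf_matrix):
    """mPA and mIoU from a (k+1)x(k+1) confusion matrix whose rows are the
    ground-truth classes and columns the predicted classes (a sketch)."""
    diag = np.diag(conf_matrix).astype(np.float64)
    per_class_pa = diag / conf_matrix.sum(axis=1)                # p_ii / sum_j p_ij
    per_class_iou = diag / (conf_matrix.sum(axis=1)
                            + conf_matrix.sum(axis=0) - diag)    # TP / (TP + FP + FN)
    return per_class_pa.mean(), per_class_iou.mean()
```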

2.4.3. Canopy Coverage Calculation

Threshold segmentation was applied to the images, dividing them into black and white regions, where white represents the maize canopy and black represents the background (non-canopy areas). The ratio of the white area to the total image area is defined as the maize canopy coverage. Canopy coverage and accuracy were determined based on the definition of vegetation coverage (CC). For each segmented image, the total number of pixels in the RGB image is denoted as Yb, and the number of pixels in the maize canopy is denoted as Xm. The formula for calculating the maize canopy coverage for a given plot is provided in Equation (10).
$C_c = \dfrac{X_m}{Y_b} \times 100\%$
In the equation, $X_m$ represents the number of pixels in the maize canopy region, and $Y_b$ represents the total number of pixels in the image.
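A minimal implementation of Equation (10) on a binary segmentation mask might look as follows (the function name is illustrative):

```python
import numpy as np

def canopy_coverage(mask):
    """Canopy coverage C_c = X_m / Y_b (in %) from a binary mask in which
    nonzero pixels are maize canopy (Eq. (10))."""
    x_m = np.count_nonzero(mask)   # canopy pixels
    y_b = mask.size                # all pixels in the image
    return 100.0 * x_m / y_b
```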
The estimation accuracy of maize canopy coverage was quantitatively evaluated using the coefficient of determination (R2) and the root mean square error (RMSE), as defined in Equations (11) and (12). A higher R2 value and a lower RMSE value indicate a higher degree of agreement between the model’s segmented coverage and the actual labeled coverage.
$R^2 = \dfrac{\left[\sum_{i=1}^{N}(x_i - \bar{x})(y_i - \bar{y})\right]^2}{\sum_{i=1}^{N}(x_i - \bar{x})^2 \sum_{i=1}^{N}(y_i - \bar{y})^2}$
$RMSE = \sqrt{\dfrac{1}{N}\sum_{i=1}^{N}(y_i - x_i)^2}$
$x_i$ and $\bar{x}$ represent the actual label coverage and the average actual label coverage, respectively; $y_i$ and $\bar{y}$ represent the model segmentation coverage and the average model segmentation coverage, respectively; N represents the total number of samples.
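For reference, Equations (11) and (12) can be computed directly with NumPy as in the sketch below, treating R2 as the squared Pearson correlation between the labeled and estimated coverage values; the function name is illustrative.

```python
import numpy as np

def coverage_accuracy(y_true, y_pred):
    """R^2 (squared Pearson correlation) and RMSE between labeled and
    model-estimated coverage values, per Eqs. (11) and (12)."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    r = np.corrcoef(y_true, y_pred)[0, 1]
    rmse = np.sqrt(np.mean((y_pred - y_true) ** 2))
    return r ** 2, rmse
```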

3. Results

3.1. Test Results of Different Backbone Networks

To evaluate the effectiveness of lightweight network models for maize canopy image segmentation, we tested the performance of Ghost-Unet, MobileNetV3-Unet, and ShuffleNetV2-Unet. The experimental results were obtained by validating these models on the test dataset. A comparative summary of the quantitative results for different backbone networks is provided in Table 3. Compared to the traditional Unet model, the lightweight variants show a reduction in segmentation performance, but they offer significant advantages in terms of model size and deployability. Specifically, MobileNetV3-Unet shows a decrease in mean intersection over union (mIoU) of 3.05% and mean pixel accuracy (mPA) of 4.24% compared to the traditional Unet. ShuffleNetV2-Unet exhibits a similar trend, with mIoU and mPA reductions of 6.45% and 6.23%, respectively. Ghost-Unet’s mIoU and mPA are also reduced, by 5.41% and 5.30%, respectively. Although these lightweight models experience varying degrees of performance decline, their parameter counts and weight file sizes are significantly reduced compared to the traditional Unet. This reduction indicates that substituting the traditional Unet’s feature extraction encoder with more lightweight networks such as MobileNetV3, ShuffleNetV2, and Ghost can drastically reduce the model’s size and weight file, leading to faster run times when deployed on edge devices in practical applications. To visualize the performance comparison among the three lightweight models, we examined the mIoU curves and Loss function curves.
As depicted in Figure 14, MobileNetV3-Unet outperforms ShuffleNetV2-Unet and Ghost-Unet in terms of higher mIoU values. Additionally, MobileNetV3-Unet demonstrates faster and more stable convergence of the Loss function. By comparing the charts, it is evident that the improved lightweight model, MobileNetV3-Unet, exhibits the best network performance. Although it has a slightly higher parameter count compared to the other two lightweight models, it achieves an excellent balance between accuracy and model size. Therefore, this model is better suited for practical maize canopy image segmentation in field conditions.

3.2. Improved Network Segmentation Performance

To validate the effectiveness of the improved models for maize canopy segmentation, we conducted tests using images captured at different growth stages and under varying lighting conditions, including mixed scenes with weeds and straw. We compared the performance of the improved models (MobileNetV3-Unet, Ghost-Unet, ShuffleNetV2-Unet), the standard Unet model, and the traditional Otsu thresholding algorithm against manually annotated ground truth labels.
As illustrated in Figure 15, the segmentation results of the different improved networks were compared with those of the traditional Otsu algorithm. The MobileNetV3-Unet and Unet models demonstrated superior segmentation performance relative to Otsu. Under both strong and weak lighting conditions, Unet and the improved lightweight models effectively mitigated the impact of illumination variations, ensuring the accuracy of the segmentation results, whereas the Otsu algorithm struggled to compensate for lighting effects. Additionally, as shown in Figure 15a,c,e, when dealing with scenes containing mixed weeds and straw, both Unet and the improved lightweight models exhibited robust segmentation capabilities, accurately extracting the crop canopy images. This superior performance is attributed to the high adaptability of the models to complex environmental conditions. However, despite their overall effective segmentation, the semantic segmentation models exhibited relatively poor edge segmentation performance in scenarios where maize plants were large and heavily overlapping, as indicated in Figure 15b,d. Consequently, issues of mis-segmentation and missed segmentation of maize plants across different growth stages persisted.

3.3. Ablation Study

To verify the effectiveness of different modules in the improved model, we conducted an ablation study. The results are presented in Table 4, comparing the original Unet, MobileNetV3-Unet (M-Unet), MobileNetV3-Unet with CBAM (MC-Unet), MobileNetV3-Unet with CBAM and ASPP (MCA-Unet), and MobileNetV3-Unet with CBAM, ASPP, and CARAFE (MCAC-Unet). The key performance metrics considered were mIoU and mPA, along with model weight file sizes and parameter counts. All experimental results were obtained by validating the models on the test dataset.
By analyzing Table 4, we found that M-Unet performed worse than other models across these metrics. This is primarily because M-Unet’s lightweight design reduces its ability to capture complex features. However, when the CBAM module is added to M-Unet, the mIoU and mPA metrics improve by 2.92% and 2.99%, respectively. This indicates that CBAM enhances the model’s ability to identify and focus on key features in the maize canopy and background. Adding the ASPP module further improves the mIoU and mPA metrics by 1.86% and 2.62%, respectively, compared to the M-Unet + CBAM model. This demonstrates the effectiveness of ASPP in capturing multi-scale features. When the CARAFE module is included, the mIoU and mPA metrics increase by another 1.28% and 1.00%, respectively. This shows that the upsampled feature maps not only retain rich spatial information but also incorporate deep contextual information. The results demonstrate that the combination of CBAM, ASPP, and CARAFE significantly enhances the performance of the segmentation task. CBAM focuses attention on critical features, ASPP captures multi-scale contextual information to adapt to object scale variations, and CARAFE introduces content-aware weighting, making the upsampling process more flexible and targeted. The final improved model has a smaller parameter count and weight file size compared to the original Unet model, improving performance and maintaining a lightweight design.
As shown in Figure 16, MCAC-Unet outperforms the other models with higher mIoU values. Its Loss function starts to converge around the 15th epoch, doing so more quickly and smoothly. This proves the effectiveness of the proposed improvements.
Combining Figure 16 and Table 4, it is evident that the improved model demonstrates the best network performance. Although the improved model slightly increases the parameter count, the significant overall performance boost validates the effectiveness of the proposed model architecture. It successfully strikes a balance between performance and parameter size. In summary, compared to the original Unet and M-Unet, the MCAC-Unet with ASPP, CBAM, and CARAFE shows higher accuracy in image segmentation tasks. It is particularly effective in distinguishing different objects and features in complex scenes.

3.4. Test Results of Different Network Models

We compare the proposed MCAC-Unet model with classic semantic segmentation networks, namely DeeplabV3+, PSPnet, and Segnet. As shown in Table 5, the MCAC-Unet model outperforms the other models in both mIoU and mPA, with improvements of 2.08% and 2.36% over the best-performing DeeplabV3+ model, respectively. Additionally, the MCAC-Unet model has the smallest weight file size and parameter count, indicating that the lightweight design effectively reduces the model size. The multi-scale feature fusion strategy, attention mechanism, and feature upsampling factors significantly enhance segmentation performance.
To validate the effectiveness of the improved models for maize canopy segmentation, we utilized images captured under various growth stages and lighting conditions, including scenarios with mixed weeds and straw. Specifically, we compared the performance of the MCAC-Unet, M-Unet, Unet, and the traditional Otsu algorithm against manually annotated ground truth labels, as shown in Figure 17. Compared to traditional segmentation algorithms, the improved MCAC-Unet model demonstrated significant advantages. It achieved superior results in both fine edge segmentation and overall segmentation. As illustrated in Figure 17d, the MCAC-Unet model exhibited more meticulous handling of edge details and better capture of contextual information compared to the Unet model. In conditions of both weak and strong illumination, as depicted in Figure 17a,e, the improved model effectively mitigated the impact of lighting variations on segmentation outcomes, ensuring both accuracy and consistency of the results. Additionally, in scenarios involving mixed weeds and straw, the improved model excelled in accurately extracting crop canopy images, indicating strong adaptability and precise feature capture capabilities for distinguishing crops from weeds. Comparative analysis of the MCAC-Unet, M-Unet, and Unet models revealed that MCAC-Unet achieved the best segmentation results. Regardless of illumination conditions or the presence of mixed weeds and straw, MCAC-Unet provided superior segmentation of maize plants, especially in terms of edge precision and the accurate capture of inter-plant spaces. Notably, as shown in Figure 17b, the improved model effectively captured fine features such as maize leaf tips, demonstrating the effectiveness of the proposed enhancements. These findings affirm the significant improvements in segmentation accuracy and robustness offered by the MCAC-Unet model, underscoring its potential for practical application in precision agriculture.

3.5. Evaluation of Maize Crop Canopy Coverage

To validate the accuracy of the system’s coverage measurement, we conducted coverage rate calculations on collected images. This study involved testing images taken on the 17th, 19th, 23rd, and 25th days at heights of 1.0 m, 1.1 m, and 1.2 m (with ten images selected for each height). The measurements obtained from the trained model were compared against manually annotated ground truth labels to assess the accuracy of the coverage estimation. Figure 18 depicts the agreement between estimated maize canopy coverage and the annotated labels at the three different heights.
As the maize plants grew, the number and size of leaves increased, and leaf inclination angles became larger, resulting in an overall upward trend in canopy coverage. At a height of 1.0 m (Figure 18a), the relative error between the model’s coverage measurement and the actual values ranged from 3.43% to 7.06%, with a maximum error not exceeding 7.06% and an average error of 4.49%. This level of accuracy meets the requirements for actual maize canopy coverage measurement. The coefficient of determination (R2) at 1.0 m was 0.911, and the root mean square error (RMSE) was 0.995. For the 1.1 m height (Figure 18b), the relative error in coverage measurement ranged from 3.12% to 6.82%, with a maximum error not exceeding 6.82% and an average error of 4.43%, which is comparable to the 1.0 m height. This indicates that the model can meet the precision requirements for field operations. The R2 at 1.1 m was 0.911, and the RMSE was 0.424. At the height of 1.2 m (Figure 18c), the relative error ranged from 2.84% to 6.16%, with a maximum error not exceeding 6.16% and an average error of 4.15%, indicating lower error rates. The R2 at 1.2 m was 0.911, and the RMSE was 0.997, suggesting that the model provides a reliable estimation of canopy coverage and that the segmentation effect of the model is satisfactory, meeting the precision requirements for actual field operations. Based on these results, the 1.1 m height demonstrated the smallest RMSE compared to 1.0 m and 1.2 m, indicating that at the height of 1.1 m, the model’s accuracy in detecting maize coverage is suitable for practical field applications.

4. Discussion

This paper presents a maize canopy segmentation and coverage measurement method based on the MCAC-Unet model, aiming to achieve automation, intelligence, and precision in maize canopy segmentation to support variable-rate fertilization decisions. Compared to previous studies on cotton leaf segmentation [17] and pea/oat canopy coverage assessment [18], this research constructs a comprehensive dataset of 4325 maize canopy images under various growth stages and lighting conditions. The diversity of this dataset significantly enhances the model’s generalization capabilities for complex field environments. Meticulous manual annotation was performed using the excess green (ExG) algorithm and Photoshop tools to ensure label accuracy, providing high-quality supervision for model training. Data augmentation techniques such as rotation, flipping, and noise addition were employed to increase diversity and improve generalization, thereby preventing overfitting. Appropriate cropping and normalization were implemented as data preprocessing steps to accelerate convergence and boost training efficiency. The quality of data directly impacts model performance. While manual annotation is commendable, it can be cost-prohibitive, warranting the exploration of semi-automated or weakly supervised methods to reduce costs while preserving quality.
The MCAC-Unet model incorporates several improvements: A lightweight MobileNetV3 backbone, enabled by depthwise separable convolutions and inverted residuals, facilitates deployment on edge devices for real-time performance, albeit with a potential trade-off in feature extraction capabilities, necessitating a balance between accuracy and efficiency. A CBAM attention module is included to automatically focus on critical features such as leaf edges and textures, thereby improving segmentation accuracy by better utilizing limited resources. Additionally, an ASPP module is implemented to extract multi-scale contextual information, effectively handling scale variations in the crop canopy, which consists of differently sized leaves. The CARAFE upsampling module is used to adaptively reconstruct features, preserving details while integrating contextual information to enhance boundary clarity and reduce missed segmentation. Through these enhancements, the MCAC-Unet model achieves substantially higher segmentation accuracy under a manageable parameter budget, making it better suited for complex field conditions, though adjustments may be necessary based on specific tasks and hardware. Extensive experiments validate the effectiveness of the improved model. Ablation studies demonstrate the rationality of each module’s contributions toward accuracy and convergence. Testing under diverse growth stages, lighting conditions, and weed scenarios exhibits strong adaptability and segmentation quality compared to classic approaches. Coverage measurement at varying heights yielded average errors within 5%, meeting practical needs. While commendable in breadth and diversity, further expanding the scale and including factors such as geographic variation and different crop varieties could enhance evaluation objectivity.
Future work could involve further optimization of the model structure and attention/fusion strategies for higher precision and robustness. Extending the application of the model to other crop canopy segmentation and measurement tasks would broaden its utility in precision agriculture. Optimizing the fertilization model involves continuously collecting coverage data at different growth stages, along with corresponding fertilizer amounts and crop growth conditions. Data analysis can then elucidate the specific relationship between coverage and fertilizer requirements. Establishing and validating the model would entail building a regression model or machine learning model based on the collected data and verifying its accuracy through experiments, with iterative adjustments to enhance prediction accuracy. Refining fertilization strategies based on model predictions can be implemented through field experiments to verify the impact on crop growth and yield, leading to further improvements in fertilization approaches. Integrating these models with IoT and big data for real-time monitoring and intelligent management could significantly support agricultural decision making. These prospects highlight promising areas for innovation, forecasting significant contributions in model design, data processing, and experimental validation, while leaving ample room for further research and development.

5. Conclusions

Semantic segmentation of images can significantly advance the development of new variable-rate fertilization methods, reducing labor costs and enhancing productivity. This study emphasizes the segmentation of maize canopy images in complex field environments and the real-time generation of coverage information. By balancing lightweight design and segmentation accuracy, we propose a maize canopy image segmentation model based on the MCAC-Unet network. This improved model aims to boost the accuracy and automation of maize canopy image segmentation.
Detailed data annotation was carried out on the original images, assigning precise class labels to each region to build a comprehensive maize canopy image dataset. To prevent network overfitting and accelerate convergence, data augmentation and normalization preprocessing techniques were employed. The results indicated that data augmentation significantly increased the diversity of training samples, providing a robust data foundation for the training, validation, and testing of the maize canopy image segmentation model.
To meet the practical demands of field operations, we compared three lightweight network models based on the improved Unet paradigm: Ghost-Unet, ShufflenetV2-Unet, and MobileNetV3-Unet. The performance of these models was validated on the maize canopy dataset, with results confirming their effectiveness and analyzing their respective advantages and disadvantages. Compared to Ghost-Unet and ShufflenetV2-Unet, the proposed MobileNetV3-Unet achieved the best balance between accuracy and model size.
Further optimization was performed on the MobileNetV3-Unet by incorporating convolutional attention mechanisms, multi-scale feature fusion, and feature upsampling techniques, resulting in the new MCAC-Unet model. Validation on the constructed maize canopy dataset yielded mIoU of 87.51% and mPA of 93.85%. Compared to the original Unet model, the MCAC-Unet not only improved accuracy but also became more lightweight. The inclusion of the CBAM allowed the network to autonomously focus on informative feature regions. For maize canopy image segmentation, the network could accurately identify features crucial for distinguishing the maize canopy from the background, such as leaf edges, textures, and contrasts with the surrounding environment. The integration of the ASPP module enabled the model to better understand the morphological characteristics of maize plants at different scales, aiding in the identification of leaves of various sizes and shapes and improving segmentation accuracy. The addition of the CARAFE module further enhanced the Unet model’s performance by improving segmentation precision and boundary clarity, thereby reducing mis-segmentation and under-segmentation and enhancing overall segmentation quality.
The MCAC-Unet model was tested on images captured at different heights to evaluate measurement accuracy, and the segmentation requirements were met at heights of 1.0 m to 1.2 m. For canopy coverage measurement at a height of 1.1 m, the relative error ranged from 3.12% to 6.82%, with an average error of 4.43%, a coefficient of determination (R2) of 0.911, and a root mean square error (RMSE) of 0.424. Coverage detection at 1.1 m was the most accurate among the tested heights and aligns well with the requirements of field operations. Overall, the MCAC-Unet model demonstrates significant potential for improving the precision and efficiency of maize canopy segmentation and coverage measurement.
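The coverage value and the error statistics reported above can be computed from predicted masks in a few lines. The sketch below uses synthetic coverage values and hypothetical variable names; it is not the authors' measurement code.

```python
# Illustrative sketch: canopy coverage from a binary mask, plus relative error, RMSE, and R2.
import numpy as np

def coverage(mask):
    """Fraction of pixels labelled as maize canopy in a binary mask."""
    return float(np.count_nonzero(mask)) / mask.size

# Hypothetical predicted vs. reference coverage values for a set of test images
pred = np.array([0.42, 0.55, 0.63, 0.71, 0.48])
ref = np.array([0.44, 0.53, 0.66, 0.68, 0.50])

rel_err = np.abs(pred - ref) / ref * 100.0            # relative error in percent
rmse = float(np.sqrt(np.mean((pred - ref) ** 2)))     # root mean square error
ss_res = float(np.sum((ref - pred) ** 2))
ss_tot = float(np.sum((ref - ref.mean()) ** 2))
r2 = 1.0 - ss_res / ss_tot                            # coefficient of determination

print(f"mean relative error: {rel_err.mean():.2f}%  RMSE: {rmse:.3f}  R2: {r2:.3f}")
```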

Author Contributions

Conceptualization, L.X. and X.W.; methodology, L.X.; software, H.G.; validation, L.X., H.G. and X.W.; formal analysis, H.G.; writing—original draft preparation, H.G.; writing—review and editing, H.G.; visualization, H.G.; supervision, L.X.; project administration, X.W.; funding acquisition, X.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Research and Development Program of China, grant number 2016YFD020060802, and the “Three Verticals” Basic Cultivation Program of Heilongjiang Bayi Agricultural University, grant number ZRCPY202306.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. The dataset is not publicly available due to project restrictions.

Conflicts of Interest

Author Litong Xiao was employed by the company Eighty-Five Two Company, Heilongjiang Beidahuang Agricultural Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Figure 1. Image acquisition schematic.
Figure 2. Low-light conditions.
Figure 3. High-light conditions.
Figure 4. Image annotation.
Figure 5. Original and processed images.
Figure 6. The structure of the MCAC-Unet network model.
Figure 7. The structure of the Unet network model.
Figure 8. The structure of the MobileNetV3 network.
Figure 9. Depthwise separable convolutions.
Figure 10. The inverted residual structure with linear bottleneck.
Figure 11. The structure of CBAM.
Figure 12. The structure of the atrous spatial pyramid pooling.
Figure 13. The structure of the CARAFE module.
Figure 14. Model training curves of different backbone networks.
Figure 15. Segmentation results of the improved backbone network. (a) Weak light with abundant crop residues and weeds; (b) overlapping crop leaves; (c) unobstructed leaves with minimal crop residues and weeds under normal conditions; (d) overlapping crop leaves with the presence of weeds; (e) strong light with abundant weeds.
Figure 16. Model training curves.
Figure 17. Segmentation results of the improved network. (a) Weak light with abundant crop residues and weeds; (b) overlapping crop leaves; (c) unobstructed leaves with minimal crop residues and weeds under normal conditions; (d) overlapping crop leaves with the presence of weeds; (e) strong light with abundant weeds.
Figure 18. Measurement results at different heights.
Table 1. Classification of training sets and test sets.

Dataset Categories | Image Size | Split Ratio | Number of Samples
Training Set       | 512 × 512  | 60%         | 2595
Validation Set     | 512 × 512  | 20%         | 865
Test Set           | 512 × 512  | 20%         | 865
Table 2. The backbone of the MobileNetV3-Unet network.

Network Layer     | Output    | Convolution Kernel | Stride | SE | Output Channels
Input Image       | 512 × 512 | -                  | -      | -  | 3
Conv1             | 256 × 256 | 3 × 3              | 2      | -  | 16
MobilenetV3-bneck | 128 × 128 | 3 × 3              | 2      | 1  | 32
MobilenetV3-bneck | 64 × 64   | 3 × 3              | 2      | -  | 88
MobilenetV3-bneck | 64 × 64   | 3 × 3              | 1      | -  | 96
MobilenetV3-bneck | 32 × 32   | 5 × 5              | 2      | 1  | 112
MobilenetV3-bneck | 32 × 32   | 5 × 5              | 1      | 1  | 160
MobilenetV3-bneck | 32 × 32   | 5 × 5              | 1      | 1  | 160
MobilenetV3-bneck | 32 × 32   | 5 × 5              | 1      | 1  | 320
MobilenetV3-bneck | 32 × 32   | 5 × 5              | 1      | 1  | 320
MobilenetV3-bneck | 16 × 16   | 5 × 5              | 2      | -  | 288
MobilenetV3-bneck | 16 × 16   | 5 × 5              | 1      | -  | 576
MobilenetV3-bneck | 16 × 16   | 5 × 5              | 1      | -  | 576
Table 3. The quantitative results for different backbone networks.

Model             | mIOU/% | mPA/% | Weight File/MB | Parameters
Unet              | 84.54  | 91.49 | 94.62          | 24,712,178
MobilenetV3-Unet  | 81.49  | 87.25 | 14.93          | 3,872,583
ShufflenetV2-Unet | 78.09  | 85.26 | 12.46          | 3,161,726
Ghost-Unet        | 79.13  | 86.19 | 13.97          | 3,640,107
Table 4. Ablation study.

Unet | M-Unet | CBAM | ASPP | CARAFE | mIOU/% | mPA/% | Weight File/MB | Parameters
√    | -      | -    | -    | -      | 84.54  | 91.49 | 94.62          | 24,712,178
-    | √      | -    | -    | -      | 81.49  | 87.25 | 14.93          | 3,872,583
-    | √      | √    | -    | -      | 84.37  | 90.23 | 24.56          | 6,516,726
-    | √      | √    | √    | -      | 86.23  | 92.85 | 38.26          | 9,102,563
-    | √      | √    | √    | √      | 87.51  | 93.85 | 40.58          | 10,765,214
Table 5. Comparison of different models.

Model      | mIOU/% | mPA/% | Weight File/MB | Parameters
Unet       | 84.54  | 91.49 | 94.62          | 24,712,178
DeeplabV3+ | 85.43  | 92.25 | 179.36         | 42,586,202
PSPnet     | 81.91  | 87.37 | 68.58          | 17,616,726
Segnet     | 77.95  | 84.50 | 47.17          | 12,640,107
MCAC-Unet  | 87.51  | 93.85 | 40.58          | 10,765,214
