
A New Winter Wheat Crop Segmentation Method Based on a New Fast-UNet Model and Multi-Temporal Sentinel-2 Images

National Council for Scientific Research (CNRS-L), Beirut 11072260, Lebanon
Agronomy 2024, 14(10), 2337; https://doi.org/10.3390/agronomy14102337
Submission received: 17 September 2024 / Revised: 7 October 2024 / Accepted: 8 October 2024 / Published: 10 October 2024

Abstract

Mapping and monitoring crops are among the most complex and labor-intensive tasks facing experts who process and analyze remote sensing (RS) images. Classifying crops from RS images is also expensive, especially in the sample collection phase: fieldwork requires periodic visits to record the crops' physiochemical characteristics before the crops can be separated using conventional machine learning algorithms and RS images. As the diversity of crop types and the size of the study area increase, sample collection becomes even more complex and unreliable. To avoid these problems, a new segmentation model was created that requires neither sample collection nor high-resolution images and can successfully distinguish wheat from other crops. The model is based on UNet, a well-known Convolutional Neural Network (CNN) semantic segmentation model, adjusted to become more powerful and faster while using fewer resources. The new model was named Fast-UNet and was used to improve the segmentation of wheat crops. Fast-UNet was compared to UNet and to DeepLabV3+, Google's recently developed semantic segmentation model. The new model was faster than the compared models and had the highest average accuracy, with values of 93.45%, 93.05%, and 92.56% for Fast-UNet, UNet, and DeepLabV3+, respectively. Finally, new datasets of time series NDVI images and ground truth data were created. These datasets, and the newly developed model, were made publicly available on the Web.

1. Introduction

Precise information about agricultural activities and the way land is being exploited can help decision-makers plan for food security issues and avoid crises. It can also help in advising farmers on how to exploit their lands more profitably. However, due to the complexity of crop identification and separation, most known classification and segmentation methods depend heavily on fieldwork. Field sample collection requires an open-ended amount of time and resources, and the quality of the collected information depends on the expertise and perseverance of the person in charge of the collection.
The success of machine learning algorithms such as Support Vector Machine [1] and Random Forest [2], and of large Convolutional Neural Network (CNN) models such as ResNet50 [3], VGG16 [4], AlexNet [5], and GoogleNet [6], depends on the availability of large collections of training samples. Moreover, most of these models are slow and demand extensive computing resources and time.
Many researchers have tried to improve crop segmentation or classification using different algorithms and data. Sonmez et al. [7] explored the classification of wheat varieties and hybrids using two deep learning models, MobileNetv2 and GoogleNet. These models achieved impressive classification accuracy, with MobileNetv2 reaching 99.26% and GoogleNet achieving 97.41%. In a second scenario, the deep features obtained from these models were classified using a Support Vector Machine (SVM); the MobileNetv2-SVM hybrid model achieved an accuracy of 99.91%. Gill et al. [8] used a combination of methods, consisting of deep-learning CNN, RNN, and LSTM applications, to classify wheat crops in India; the test accuracy ranged from 85% to 95.68%. El Sakka et al. [9] used a Convolutional Neural Network with different vegetation indices to improve crop classification, including wheat, and demonstrated the importance and efficiency of such advanced Artificial Neural Networks for precision agriculture.
Kussul et al. [10] classified a large area into 11 classes, one of which was wheat, using Landsat-8 and Sentinel-1 data from October 2014 until September 2015. To verify their data, samples were collected through field surveys and divided equally between training and validation sets. They used two different CNNs that explored spectral and spatial features, respectively. On average, the CNNs reached an accuracy of 85% on major crops such as wheat.
UNet [11] was rarely used in the literature to segment wheat crops, except in a study by Zaji et al. [12], who combined UNet with existing CNN models, such as ResNet50 and VGG16, to count wheat spikes for wheat crop yield estimation. UNet was also combined with known unsupervised Artificial Neural Network (ANN) models, as in SO-UNet [13], which showed promising results in segmenting urban and peri-urban forests.
Other studies classified wheat based on crop phenology. Mashonganyika et al. [14] mapped winter wheat using Sentinel-2 NDVI data and adopted sensitivity testing, receiver operating characteristics, and the area under the curve (AUC) to measure the model's efficiency. Many sample points were randomly collected from wheat fields. The correlation between the area reported by farmers and the calculated area was positive, with an R² value of 0.98 and a root-mean-square error of 2.23 ha. Another method, by Prakash and Dubey [15], used NDVI calculated from 16 different Sentinel-2 images to differentiate between wheat and other land cover classes in the Saharanpur District, India.
The literature reviewed above attempted to improve wheat segmentation using advanced machine learning and Artificial Neural Network (ANN) models, or used vegetation indices alone or combined with ANN models such as Convolutional Neural Networks (CNNs). However, no study provided a definite and reliable solution that reduces fieldwork and improves classification accuracy with little or no supervision, and none addressed the computing resource and time requirements of the known CNN models.
In this research, three targets were achieved. The first was identifying true wheat fields using the Normalized Difference Vegetation Index (NDVI) and field crop limits. The second was creating datasets consisting of images of two different sizes together with their labels; these datasets identified the wheat fields used in training different CNN models. The last target was creating a modified UNet (Fast-UNet) model to reduce resource demand and improve wheat crop segmentation.
In this article, after Section 1, Section 2 describes the area of study, Section 3 contains the data and methods, Section 4 presents the experimental results, Section 5 is the discussion, and Section 6 provides the conclusions.

2. Area of Study

The Bekaa Plain is the largest agricultural area in Lebanon. It forms the dividing line between the western Lebanon mountains and the eastern Lebanon mountains (Figure 1). The plain is approximately 120 km long, its width ranges between 10 and 23 km, and its area is approximately 1283 km². It extends between the Lebanon mountain range to the west, the eastern Lebanon mountain range to the east, the Syrian border to the north, and Qaraoun Lake to the south. This fertile plain is characterized by an abundance of water: it contains the Litani River (the largest internal river in Lebanon) and the Orontes River (the longest river shared between Lebanon, Syria, and Türkiye), and it is also rich in groundwater. The region receives limited rainfall, particularly in the north, because Mount Lebanon creates a rain shadow that blocks precipitation coming from the sea. The northern section has an average annual rainfall of 230 mm (9.1 in), compared to 610 mm (24 in) in the central valley [16].
The major agricultural crop in the plain is wheat, followed by potato, grapevine, etc.

3. Materials and Methods

The main satellite imagery used in this research was Sentinel-2, which provides important, free optical remote sensing data through the European Space Agency (ESA, Paris, France). Sentinel-2A and Sentinel-2B were launched in June 2015 and March 2017, respectively [17]. Their spatial resolution varies between 10 m and 60 m depending on the wavelength. Sentinel-2A has a temporal resolution of 10 days, which improves to 5 days when combined with Sentinel-2B. The clipped image has a size of 6764 × 9018 pixels and comprises bands 3, 4, and 8, which correspond to green, red, and near-infrared. These bands were chosen for their high spatial resolution (10 m) and their sensitivity to the crop's photosynthetic activity. To extract the required area, I used Google Earth Engine's (GEE) Sentinel-2 dataset and computation facilities. Sentinel-2 images were selected from March to July of 2021 and 2023 for two reasons: to reduce the cloud cover effect (less than 5% of the image size) and to separate wheat from other vegetation based on the phenological progress of these crops. Farmers normally rotate winter wheat and potatoes from year to year; for this reason, recent odd years were selected.
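The exact extraction scripts belong to the released notebook; as a minimal sketch of this selection step, assuming GEE's Python API (earthengine-api), the harmonized Sentinel-2 surface reflectance collection, and an illustrative bounding rectangle for the Bekaa Plain (not the study's exact extent), the filtering could look like the following.

import ee

ee.Initialize()

# Illustrative rectangle roughly covering the Bekaa Plain (lon_min, lat_min, lon_max, lat_max).
bekaa = ee.Geometry.Rectangle([35.6, 33.4, 36.4, 34.4])

def monthly_ndvi_stack(year):
    # Sentinel-2 surface reflectance, March-July, less than 5% cloud cover.
    col = (ee.ImageCollection('COPERNICUS/S2_SR_HARMONIZED')
           .filterBounds(bekaa)
           .filterDate(f'{year}-03-01', f'{year}-07-31')
           .filter(ee.Filter.lt('CLOUDY_PIXEL_PERCENTAGE', 5)))

    def month_ndvi(m):
        # Mosaic one month's images and reduce them to a single NDVI band
        # (B8 is the 10 m NIR band; Equation (1) below is written with band 8A).
        monthly = col.filter(ee.Filter.calendarRange(m, m, 'month')).mosaic()
        return monthly.normalizedDifference(['B8', 'B4']).rename('NDVI')

    # Stack the five monthly NDVI bands into one image per year.
    return ee.ImageCollection(ee.List.sequence(3, 7).map(month_ndvi)).toBands().clip(bekaa)

ndvi_2021 = monthly_ndvi_stack(2021)
ndvi_2023 = monthly_ndvi_stack(2023)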
The field crop boundaries of the Bekaa plain were also used to create samples in the training of the Fast-UNet to segment wheat crops (Figure 2). The data were obtained by digitizing a very high-resolution panchromatic WorldView-3 image.

3.1. Creating Samples for Training

The collected images for each year represent different parts of the study area. In certain cases, clouds above the 5% threshold can obstruct the complete view of these parts. My scripts for Google Earth Engine (GEE), written in the code editor's JavaScript, allowed me to mosaic the chosen images and confirm their full coverage of the study area. Images that did not cover the complete area were discarded.
The collected images were used to create new bands of the Normalized Difference Vegetation Index (NDVI) based on Equation (1):

$$\mathrm{NDVI} = \frac{\mathrm{Band8A} - \mathrm{Band4}}{\mathrm{Band8A} + \mathrm{Band4}} \quad (1)$$
These bands were stacked to create one image for each year; the objective was to distinguish winter wheat from other crops using its phenology. First, thousands of random points were intersected with each stacked NDVI image. These points were then passed to a function that extracts representative winter wheat points by comparing each point's time series values with the values expected during the growth stages of winter wheat (see the pseudocode in Table 1 and the sketch below). The selected points were then intersected with the field crop limits to extract the polygons representing the wheat fields for each year. These polygons were combined with the study area limits and converted to raster bands such that each band represents the wheat fields of a specific year. Finally, the clipped stacked NDVIs created from Sentinel-2A and -2B images for the years 2021 and 2023, and their corresponding wheat field labels, were divided into sub-images of sizes 128 × 128 × N and 128 × 128 × 1, where N can range from 3 to over 9 bands. The sub-images were organized into two distinct folders, "Labels" and "Source". Sub-images without information were filtered out, and the rest were converted to "PNG" format by selecting 3 of the N bands, representing the first-, mid-, and end-month NDVI layers. Figure 3 shows these steps.
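A minimal sketch of this selection function, assuming the NDVI time series of each random point has been exported to a spreadsheet; the column names, file names, and threshold values are illustrative placeholders, since the paper specifies only the comparison logic (Table 1), not the exact values.

import pandas as pd

# Illustrative thresholds; the paper derives its limits from the samples themselves.
THRESHOLD_LOW = 0.30   # NDVI expected to exceed this at peak growth (April-May)
THRESHOLD_HIGH = 0.35  # early/late season NDVI expected to stay below this

def select_wheat_points(df):
    # Keep points whose NDVI time series follows the winter wheat phenology:
    # sparse canopy in March, peaking in April-May, senescing by July.
    early = df['ndvi_03']
    peak = df[['ndvi_04', 'ndvi_05']].max(axis=1)
    late = df['ndvi_07']
    mask = (
        (df[['ndvi_03', 'ndvi_04', 'ndvi_05', 'ndvi_06', 'ndvi_07']] > 0).all(axis=1)
        & (peak > THRESHOLD_LOW)     # strong canopy at peak growth
        & (early < THRESHOLD_HIGH)   # sparse canopy after sowing
        & (late < THRESHOLD_HIGH)    # senescence before harvest
    )
    return df.loc[mask, ['longitude', 'latitude']]

points = pd.read_excel('random_points_ndvi.xlsx')  # hypothetical export of the random points
select_wheat_points(points).to_csv('wheat_points.csv', index=False)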

3.2. The New Model Fast-UNet

To improve crop type identification, an efficient and reliable Convolutional Neural Network (CNN) semantic segmentation model is needed. UNet is a reliable, well-known, fully convolutional, symmetrical CNN segmentation model created by Ronneberger et al. [11] to classify medical images. Figure 4 shows the UNet structure, which comprises an encoder network followed by a decoder network. Besides requiring discrimination at the pixel level, UNet semantic segmentation also requires a mechanism to project the discriminative features learned at different stages of the encoder onto the pixel space.
The goal is to semantically project the discriminative features (lower resolution) that the encoder learns onto the pixel space (higher resolution) to obtain a compact and dense segmentation. The decoder consists of upsampling and concatenation followed by regular convolution operations. UNet takes its name from its architecture, which, when visualized, resembles the letter U, as shown in Figure 4. UNet contains only convolutional layers and no dense layers, so it can accept images of any size [18]. Figure 5 shows the Fast-UNet model of the current research. In the figure, the inputs are sub-images of size 256 × 256 × N, and the same applies to the label sub-images; the output is 256 × 256 × 1, showing the wheat fields. Notice that the original UNet (Figure 4) uses up to 1024 filters, whereas Fast-UNet uses at most 512 filters, a number that can be contracted or expanded. The modification reduced the number of convolutional layers by introducing a structured convolutional block function (sketched below) that lowers the number of required parameters from more than 32 million to less than 8.3 million.
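A minimal sketch of such a structured convolutional block, assuming a TensorFlow/Keras implementation (the released code is a Python notebook, but the framework is an assumption here); the kernel size and dropout rate shown are illustrative.

from tensorflow.keras import layers

def conv_block(x, filters, dropout_rate=0.1):
    # Two 3 x 3 convolutions with ReLU, then dropout before the optional
    # max pooling (the yellow arrow in Figure 5).
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Conv2D(filters, 3, padding='same', activation='relu')(x)
    x = layers.Dropout(dropout_rate)(x)
    return x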
Each standard convolution was followed by a ReLU activation function (Equation (2)). A ReLU layer performs a threshold operation on each element of the input, setting any value less than zero to zero [19]. Max pooling performs downsampling by dividing the input into rectangular pooling regions and computing the maximum of each region.

$$\mathrm{ReLU}(x) = \max(0, x) \quad (2)$$
In the upsampling process, a transposed 2-D convolution layer was used, followed by a depth concatenation layer that takes inputs that have the same height and width and concatenates them along the third dimension (the channel dimension).
Finally, a sigmoid function was used at the output of the model. A sigmoid is any mathematical function whose graph has a characteristic S-shaped curve; a common example is the logistic function (Equation (3)):

$$\sigma(x) = \frac{1}{1 + e^{-x}} \quad (3)$$
There are several reasons for using the sigmoid function in this research. The sigmoid function squashes the output to the range between 0 and 1, which can be interpreted as a probability. This is particularly useful for binary segmentation, where each pixel must be labeled as belonging to a class (e.g., foreground) or not (e.g., background). It introduces non-linearity into the model, allowing the neural network to learn more complex decision boundaries [20]. Finally, the sigmoid function has a smooth gradient, which helps backpropagation by providing useful gradients for updating the weights during training [21].
In this research, Binary Cross-Entropy (BCE) was deployed because it is a commonly used loss function in neural networks, particularly when dealing with binary segmentation tasks. It is especially handy when the output layer of the network uses a sigmoid activation function (which maps values to the range [0, 1]). The goal of BCE loss is to train the model to distinguish between two classes (usually class 0 and class 1) by minimizing the dissimilarity between predicted probabilities and true labels [22] (Equation (4)).
$$\mathrm{Loss} = -\frac{1}{M}\sum_{i=1}^{M}\left[\, y_{\mathrm{true}}\log(y_{\mathrm{pred}}) + (1 - y_{\mathrm{true}})\log(1 - y_{\mathrm{pred}}) \,\right] \quad (4)$$
Here is what each part of the formula means:
M: the total number of samples in the dataset;
$y_{\mathrm{true}}$: the actual label of each sample (in a binary problem, either 0 or 1);
$y_{\mathrm{pred}}$: the predicted probability that the sample belongs to the positive class (class 1), i.e., the output of the model;
log: the natural logarithm.
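As a quick numerical check of Equation (4), the hand-computed loss can be compared with a library implementation; TensorFlow/Keras is assumed here, and the sample values are illustrative.

import numpy as np
import tensorflow as tf

y_true = np.array([1.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.9, 0.2, 0.7, 0.4])  # sigmoid outputs in [0, 1]

# Direct evaluation of Equation (4).
manual = -np.mean(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# Keras' built-in loss agrees, up to internal clipping for numerical stability.
keras_bce = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred).numpy()

print(manual, keras_bce)  # both approximately 0.299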
The new specifications of Fast-UNet can be stated briefly. The UNet structure was modified for flexibility and modularity, providing an integrated way to build the UNet architecture. This made it easier to experiment with different configurations, such as changing the number of filters or adding dropout. Including dropout layers helped prevent overfitting by randomly dropping units during training, ensuring that the model generalizes better to unseen data. Typically, dropout is applied after certain layers in the contracting path (encoder) of UNet: (1) after convolutional layers, (2) after batch normalization, and sometimes after ReLU activation functions.
In this research, the dropout layer was used after the convolution layers and before the max pooling layer. This is shown in the graph as a yellow arrow (Figure 5).
The goal is to encourage the network to learn from different perspectives by deactivating specific neurons during training. This robustness helped prevent overfitting.
By including the optional max pooling layers in UNet, the spatial dimensions of the feature maps were reduced, an essential downsampling step in the contracting path that helps capture more abstract representations of the input. To further improve UNet, skip connections were used to combine low-level features from the contracting path with high-level features in the expansive path (black arrow in Figure 5). Preserving these low-level features improved the localization of object boundaries in the final segmentation map [23,24].
In general, the structure of Fast-UNet differs only slightly from the original. The main difference is the flexibility in changing the number of filters, depicted in Figure 5 as M filters: instead of a fixed number of filters, M can be changed depending on the complexity of the problem.
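A minimal sketch of how such a flexible builder might look, assuming TensorFlow/Keras and reusing the conv_block sketched earlier in this section; the depth, base filter count (the M of Figure 5), and exact layer layout are illustrative assumptions rather than the paper's precise architecture.

from tensorflow.keras import Model, layers

def build_fast_unet(input_shape=(256, 256, 3), base_filters=64, depth=3):
    # base_filters plays the role of M in Figure 5; conv_block is the structured
    # block sketched above.
    inputs = layers.Input(shape=input_shape)
    x, skips = inputs, []
    for d in range(depth):  # contracting path
        x = conv_block(x, base_filters * 2 ** d)
        skips.append(x)  # saved for the skip connection (black arrow in Figure 5)
        x = layers.MaxPooling2D(2)(x)
    x = conv_block(x, base_filters * 2 ** depth)  # bottleneck: 512 filters when M = 64
    for d in reversed(range(depth)):  # expansive path
        x = layers.Conv2DTranspose(base_filters * 2 ** d, 2, strides=2, padding='same')(x)
        x = layers.Concatenate()([x, skips[d]])  # skip connection
        x = conv_block(x, base_filters * 2 ** d)
    outputs = layers.Conv2D(1, 1, activation='sigmoid')(x)  # per-pixel wheat probability
    return Model(inputs, outputs)

model = build_fast_unet()
# Loss and optimizer follow the training setup of Section 4 (tuning details omitted).
model.compile(optimizer='sgd', loss='binary_crossentropy', metrics=['accuracy'])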

3.3. Accuracy Estimation of the Models

The accuracy of the results is computed based on the confusion matrix method [25]. The confusion matrix is of size m × m for a given classifier and shows the predicted versus observed classifications (Table 2).
Using Table 2, computing the overall accuracy (OA) for each experiment is based on Equation (5).
$$\mathrm{OA} = \frac{T1_{\mathrm{True}} + T2_{\mathrm{True}}}{T1_{\mathrm{True}} + T2_{\mathrm{True}} + T1_{\mathrm{False}} + T2_{\mathrm{False}}} \quad (5)$$
T1 and T2 denote the two classes, which can be actual or predicted; a sample counts as True when the actual class matches the predicted class, and False otherwise.
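A minimal sketch of Equation (5) applied to a 2 × 2 confusion matrix laid out as in Table 2, with illustrative counts.

import numpy as np

# Rows: actual (T1, T2); columns: predicted (T1, T2), matching Table 2.
cm = np.array([[850, 60],    # illustrative counts, not the study's results
               [45, 920]])

# Equation (5): correctly classified samples (diagonal) over all samples.
overall_accuracy = np.trace(cm) / cm.sum()
print(f"OA = {overall_accuracy:.2%}")  # OA = 94.40%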

4. Results

In this research, a reliable and efficient Lenovo LOQ 15IRH8 computer with sufficient resources was used to test the different models. The computer had an Intel Core i7 CPU, an NVIDIA GeForce RTX graphics card, 16 GB of memory, and a 500 GB hard disk. The software used in this research comprised the Windows 11 operating system, Google Earth Engine (GEE), and Google Colab.
Fast-UNet was tested on multi-temporal Sentinel-2 images from the years 2021 and 2023 combined. These dates were chosen because fieldwork had been conducted in those specific years, which assisted in verifying the labels. The first step in the experiments was creating the datasets of NDVI sub-images extracted from Sentinel-2 and the corresponding label datasets for the years 2021 and 2023. Creating these datasets required the steps described in the previous sections to keep only sub-images with data and to remove empty ones (zero values and no-data values). Moreover, the sub-images were compared between the source and label datasets to remove unshared sub-images. The total number of sub-images for both years was 7526 per dataset, which was reduced to 373 per dataset. The sub-images were shuffled, rotated, and resized from 128 × 128 to 256 × 256 pixels, then divided into three sets: 75% for training, 15% for validation, and 10% for testing. Figure 6 illustrates examples from these datasets alongside the predicted images generated by Fast-UNet. The field limits used in creating the samples, as explained in Section 3.1, were not as accurate as the field limits visible in the image, which explains the differences. Moreover, the segmentation model takes into consideration both the spatial location of each pixel and its RGB (NDVI) values, which explains why more objects were segmented as wheat.
Fast-UNet was run using different settings, varying the number of epochs, batch sizes, learning rates, and optimizers. The best combination was an SGD optimizer with a learning rate of 10⁻³, a batch size of 16, 20 epochs, and 19 steps per epoch. In general, to determine the number of cases needed to test all possible combinations of n parameters, it is necessary to consider the number of different values each parameter can take. If each parameter can take k different values, the total number of combinations is kⁿ. For example, with five parameters (optimization method, batch size, number of epochs, learning rate, image size), each taking two different values, the total number of combinations would be 2⁵ = 32 (see the sketch below).
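A minimal sketch of enumerating such a parameter grid with Python's standard library; the value lists are illustrative.

from itertools import product

grid = {
    'optimizer': ['sgd', 'adam'],
    'batch_size': [8, 16],
    'epochs': [10, 20],
    'learning_rate': [1e-3, 1e-4],
    'image_size': [128, 256],
}

# Every combination of the five parameters: 2**5 = 32 configurations.
configs = [dict(zip(grid, values)) for values in product(*grid.values())]
print(len(configs))  # 32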
This approach is exhaustive, meaning it tests every possible combination of parameter values. While this ensures thorough testing, it can be computationally expensive, especially as the number of parameters and their possible values increase. Table 3 shows some of the tested setups. To avoid excessive training of the model, an EarlyStopping function was used (pseudocode below) to stop the training session if the validation loss did not improve for five consecutive epochs.
Initialize EarlyStopping with:
    monitor = 'val_loss'            # validation loss function
    patience = 5
    restore_best_weights = True
For each epoch in training:
    Train the model on the training data
    Evaluate the model on the validation data
    If validation loss improves:
        Save the current model weights
        Reset the patience counter
    Else:
        Increment the patience counter
        If the patience counter exceeds the specified patience:
            Stop training
            Restore the model weights to the best-observed weights
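This pseudocode matches the behavior of Keras' built-in EarlyStopping callback; assuming a TensorFlow/Keras training loop (the framework is an assumption), the equivalent configuration would be:

from tensorflow.keras.callbacks import EarlyStopping

early_stop = EarlyStopping(
    monitor='val_loss',          # watch the validation loss
    patience=5,                  # tolerate 5 epochs without improvement
    restore_best_weights=True,   # roll back to the best-observed weights
)

# Usage with hypothetical model and datasets:
# model.fit(train_ds, validation_data=val_ds, epochs=20, callbacks=[early_stop])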
Running Fast-UNet took about 861 s, compared to more than 4100 s for UNet. The accuracy of Fast-UNet was 93.49%, compared to a comparable 93.18% for UNet. The graphs in Figure 7a,b show the progress of the training accuracy, validation accuracy, and loss values for both Fast-UNet and UNet.
One can notice from the graphs that Fast-UNet converges smoothly, with consistency between the training and validation data; the same applies to the loss values. Examining the UNet graphs, it is evident that both the training and validation accuracies plateaued at a single value, indicating that UNet became trapped in a local optimum. The loss functions for both the training and validation data of the UNet model exhibited the same issue. Fast-UNet was also evaluated against DeepLabV3+ [26], recognized as a leading semantic segmentation model [27]. The model was developed by Google and builds upon the original DeepLabV3 architecture with several key improvements, making it highly effective for tasks like image segmentation.
The settings of DeepLabV3+ were similar to those of Fast-UNet and UNet. However, its training time exceeded 1090 s, longer than that of Fast-UNet, and its accuracy of 92.9% was lower than that of both Fast-UNet and UNet. As shown in Figure 7c, the training and validation accuracies, as well as the loss values for DeepLabV3+, are neither homogeneous nor consistent. The validation accuracy began at lower values than the training accuracy, while the validation loss values were higher than those of the training data. This indicates that DeepLabV3+ was trapped in a local minimum until the 9th epoch and was unable to recover. Table 4 provides details of the first segmentation experiment. It is important to note that the accuracy calculated using the confusion matrix is based on the test samples and the segmented images produced by the models; the first two parts (75% for training and 15% for validation) were used during the training sessions.
Additional experiments were conducted with Fast-UNet, UNet, and DeepLabV3+ using images sized 128 × 128 × 3 and labels sized 128 × 128 × 1. The Fast-UNet model took 261 s to complete 20 epochs and converge. Figure 8 illustrates the segmentation results of Fast-UNet, once again demonstrating the model's efficiency with an accuracy of 93.4%. The UNet model was executed on the same datasets, taking over 1050 s to converge. This time, the UNet model became stuck in local optima, resulting in an accuracy of 92.91%, slightly lower than the new model. Figure 9a,b show that the loss values for UNet were higher than those for Fast-UNet, and their reduction was neither smooth nor gradual, suggesting that UNet was either overfitting or underfitting. This phenomenon is referred to as a stagnant loss function [28], which occurs when the model fails to learn effectively. Fast-UNet mitigated this issue by incorporating additional layers, such as dropout layers, and by using a smaller number of filters. Running DeepLabV3+ on the same datasets took over 324 s, and after 20 epochs its highest accuracy was 92.27%. This indicates that Fast-UNet is the fastest and most accurate model, avoiding the overfitting issues seen with UNet. Table 5 provides detailed results of the second experiment.
This further demonstrates that Fast-UNet's stability was superior to that of the other models, as it neither became stuck in local optima nor encountered overfitting issues. Furthermore, the speed of Fast-UNet remained largely unaffected as the complexity of the problem increased, for example, when the resolution of the sub-images was raised from 128 × 128 to 256 × 256. In contrast, both UNet and DeepLabV3+ exhibited unstable loss and accuracy trajectories, resulting in suboptimal final solutions; testing these models on various images revealed that their outputs were not as accurate as their training results suggested. Based on the above experiments, it can be concluded that Fast-UNet is the fastest, most stable, and most accurate model for segmenting images of background and winter wheat, outperforming both UNet and DeepLabV3+.

5. Discussion

The structure of Fast-UNet made it robust against overfitting and local optima, unlike UNet and DeepLabV3+. The addition of dropout layers and a variable number of filters helped Fast-UNet prevent these issues. Although the new model demonstrated superiority over the other semantic segmentation models, its accuracy improved only slightly. However, even a small improvement in segmentation accuracy, like the one reported for Fast-UNet over UNet and DeepLabV3+, can be significant in the following real-world crop segmentation scenarios:
  • Precision in Agricultural Practices
    • Yield estimation: higher accuracy in segmenting crops can lead to more precise yield estimates, which are crucial for planning and logistics.
    • Disease detection: accurate segmentation helps in early detection of diseases or pests, enabling timely intervention and reducing crop loss.
  • Resource Management
    • Water and Fertilizer Application: Precise segmentation allows for targeted application of water and fertilizers, optimizing resource use and reducing waste.
    • Harvesting: Automated harvesting systems rely on accurate segmentation to identify and pick crops efficiently, minimizing damage and maximizing yield.
  • Economic Impact
    • Cost Savings: Even small improvements in accuracy can translate to significant cost savings by reducing the need for manual labor and improving the efficiency of automated systems.
    • Market Value: Higher accuracy in crop quality assessment can ensure better market prices by accurately grading the produce.
  • Environmental Benefits
    • Sustainable Practices: Accurate segmentation supports precision agriculture, which promotes sustainable farming practices by minimizing the environmental footprint.
    • Reduced Chemical Use: By accurately identifying areas that need treatment, farmers can reduce the use of pesticides and herbicides, leading to a healthier ecosystem.
However, to enhance accuracy and prevent potential failures of Fast-UNet, future research must address several shortcomings, including:
  • Reduced Flexibility: Fast-UNet is optimized for speed and efficiency, which might limit its flexibility in handling diverse and complex datasets. This can be a drawback when dealing with highly variable data.
  • Potential for Lower Accuracy: The emphasis on speed can sometimes come at the cost of accuracy. Fast-UNet might not perform as well as more complex models, such as the original UNet or other advanced architectures, in terms of segmentation accuracy.
  • Limited Customization: Fast-UNet's architecture is designed to be streamlined, which might restrict the ability to customize or tweak the model for specific tasks or datasets.
To address these limitations, additional resources are required, including enhanced computational power. Additionally, various modifications to the model’s structure, such as its depth, width, and size, need to be tested.

6. Conclusions

In this research, a newly modified UNet (Fast-UNet) was implemented to segment winter wheat from time series Sentinel-2 images for the years 2021 and 2023. These images were converted to NDVI and utilized to identify wheat fields by tracking the wheat phenology from sowing to harvesting. The identified wheat fields were used as a new dataset to provide labels for training Fast-UNet, UNet, and DeepLabV3+. The comparison results demonstrated that Fast-UNet is the fastest and most accurate model among the three. Moreover, Fast-UNet did not become trapped in local minima or experience overfitting or underfitting issues, unlike the other two models. The research generated a dataset of sub-images with five bands, but only three were selected to meet the requirements for creating “PNG” images. Therefore, the sub-images were sized at 128 × 128 × 3, representing the NDVI for different months of each year. Another dataset consisted of sub-images sized at 128 × 128 × 1, representing the training labels or samples. Future improvements for this research include extending the study to cover more years, incorporating additional models for testing, and enhancing Fast-UNet with more advanced features. The progress of this work will depend on the availability of support to obtain advanced hardware and software components.

Funding

This research received no external funding.

Data Availability Statement

The datasets were made available on the IEEE data port https://ieee-dataport.org/documents/winter-wheat (accessed on 16 September 2024) and the code was made available on GitHub https://github.com/ma850419/Fast_UNet/blob/main/Fast_UNet_wheat_winter.ipynb (accessed on 16 September 2024).

Acknowledgments

The author thanks CNRS-L for supporting the research by providing the technical means to complete this research.

Conflicts of Interest

The author declares no conflict of interest.

References

  1. Foody, G.; Mathur, A. A relative evaluation of multiclass image classification by support vector machines. IEEE Trans. Geosci. Remote Sens. 2004, 42, 1335–1343. [Google Scholar] [CrossRef]
  2. Gwal, S.; Singh, S.; Gupta, S.; Anand, S. Understanding Forest biomass and net primary productivity in Himalayan ecosystem using geospatial approach. Model. Earth Syst. Environ. 2020, 6, 10. [Google Scholar] [CrossRef]
  3. Sarwinda, D.; Paradisa, R.; Bustamam, A.; Anggia, P. Deep Learning in Image Classification using Residual Network (ResNet) Variants for Detection of Colorectal Cancer. Procedia Comput. Sci. 2021, 179, 423–431. [Google Scholar] [CrossRef]
  4. Tao, J.; Gu, Y.; Sun, J.; Bie, Y.; Wang, H. Research on VGG16 convolutional neural network feature classification algorithm based on Transfer Learning. In Proceedings of the 2nd China International SAR Symposium (CISS), Shanghai, China, 3–5 November 2021; pp. 1–3. [Google Scholar]
  5. Singh, I.; Goyal, G.; Chandel, A. AlexNet architecture based convolutional neural network for toxic comments classification. J. King Saud Univ. Comput. Inf. Sci. 2022, 34, 7547–7558. [Google Scholar] [CrossRef]
  6. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9. [Google Scholar]
  7. Sonmez, M.E.; Sabanci, K.; Aydin, N. Convolutional neural network-support vector machine-based approach for identification of wheat hybrids. Eur. Food Res. Technol. 2024, 250, 1353–1362. [Google Scholar] [CrossRef]
  8. Gill, H.S.; Bath, B.S.; Singh, R.; Riar, A. Wheat crop classification using deep learning. Multimed. Tools Appl. 2024. [Google Scholar] [CrossRef]
  9. El Sakka, M.; Mothe, J.; Ivanovici, M. Images and CNN applications in smart agriculture. Eur. J. Remote Sens. 2024, 57, 2352386. [Google Scholar] [CrossRef]
  10. Kussul, N.; Lavreniuk, M.; Skakun, S.; Shelestov, A. Deep learning classification of land cover and crop types using remote sensing data. IEEE Geosci. Remote Sens. Lett. 2017, 14, 778–782. [Google Scholar] [CrossRef]
  11. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015, Proceedings of the 18th International Conference, Munich, Germany, 5–9 October 2015; Navab, N., Hornegger, J., Wells, W., Frangi, A., Eds.; Lecture Notes in Computer Science. Springer: Berlin/Heidelberg, Germany, 2015; Volume 9351. [Google Scholar] [CrossRef]
  12. Zaji, A.; Liu, Z.; Xiao, G.; Bhowmik, P.; Sangha, J.S.; Ruan, Y. Wheat spike localization and counting via hybrid UNet architectures. Comput. Electron. Agric. 2022, 203, 107439. [Google Scholar] [CrossRef]
  13. Awad, M.M.; Lauteri, M. Self-Organizing Deep Learning (SO-UNet)—A Novel Framework to Classify Urban and Peri-Urban Forests. Sustainability 2021, 13, 5548. [Google Scholar] [CrossRef]
  14. Mashonganyika, F.; Mugiyo, H.; Svotwa, E.; Kutywayo, D. Mapping of Winter Wheat Using Sentinel-2 NDVI Data. A Case of Mashonaland Central Province in Zimbabwe. Front. Clim. 2021, 3, 715837. [Google Scholar] [CrossRef]
  15. Prakash, P.; Dubey, V. Wheat Crop Classification based on NDVI using Sentinel Time Series: A Case Study Saharanpur Region. In Proceedings of the 2021 International Conference on Computing, Communication and Green Engineering (CCGE), Pune, India, 23–25 September 2021; pp. 1–6. [Google Scholar] [CrossRef]
  16. Darwish, T.; Jomaa, I.; Awad, M.; Boumetri, R. Preliminary contamination hazard assessment of land resources in Central Bekaa plain of Lebanon. Leban. Sci. J. 2008, 9, 3–15. [Google Scholar]
  17. Li, J.; Roy, D.P. A global analysis of Sentinel-2A, Sentinel-2B and Landsat-8 data revisit intervals and implications for terrestrial monitoring. Remote Sens. 2017, 9, 902. [Google Scholar] [CrossRef]
  18. Hesamian, M.; Jia, W.; He, X.; Kennedy, P. Deep Learning Techniques for Medical Image Segmentation: Achievements and Challenges. J. Digit. Imaging 2019, 32, 582–596. [Google Scholar] [CrossRef] [PubMed]
  19. Nair, V.; Hinton, G.E. Rectified linear units improve restricted Boltzmann machines. In Proceedings of the ICML’10: 27th International Conference on International Conference on Machine Learning, Haifa, Israel, 21–24 June 2010; pp. 807–814. [Google Scholar]
  20. Han, J.; Moraga, C. The influence of the sigmoid function parameters on the speed of backpropagation learning. In From Natural to Artificial Neural Computation; Mira, J., Sandoval, F., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 1995; Volume 930, pp. 195–201. ISBN 978-3-540-59497-0. [Google Scholar] [CrossRef]
  21. Langer, S. Approximating smooth functions by deep neural networks with a sigmoid activation function. J. Multivar. Anal. 2020, 182, 104696. [Google Scholar] [CrossRef]
  22. Van Beers, F.; Lindström, A.; Okafor, E.; Wiering, M. Deep Neural Networks with Intersection over Union Loss for Binary Image Segmentation. In Proceedings of the 8th International Conference on Pattern Recognition Applications and Methods (ICPRAM 2019), Prague, Czech Republic, 19–21 February 2019; pp. 438–444. [Google Scholar] [CrossRef]
  23. Korkmaz, F. U-Net##: A Powerful Novel Architecture for Medical Image Segmentation. In Medical Imaging and Computer-Aided Diagnosis, Proceedings of the 2022 International Conference on Medical Imaging and Computer-Aided Diagnosis (MICAD 2022), Leicester, UK, 20–21 November 2022; Su, R., Zhang, Y., Liu, H., Frangi, A.F., Eds.; Lecture Notes in Electrical Engineering; Springer: Singapore, 2023; Volume 810. [Google Scholar] [CrossRef]
  24. Sun, Q.; Qu, F. CPF-UNet: A Dual-Path U-Net Structure for Semantic Segmentation of Panoramic Surround-View Images. Appl. Sci. 2024, 14, 5473. [Google Scholar] [CrossRef]
  25. Congalton, R. A review of assessing the accuracy of classifications of remotely sensed data. Remote Sens. Environ. 1991, 37, 35–46. [Google Scholar] [CrossRef]
  26. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018, Munich, Germany, 8–14 September 2018; Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Lecture Notes in Computer Science 2018. Springer: Berlin/Heidelberg, Germany, 2018; Volume 11211. [Google Scholar] [CrossRef]
  27. Zhu, R.; Xin, B.; Deng, N.; Fan, M. Semantic Segmentation Using DeepLabv3+ Model for Fabric Defect Detection. Wuhan Univ. J. Nat. Sci. 2022, 27, 539–549. [Google Scholar] [CrossRef]
  28. Whang, S.E.; Roh, Y.; Song, H.; Lee, J.-G. Data collection and quality challenges in deep learning: A data-centric AI perspective. VLDB J. 2023, 32, 791–813. [Google Scholar] [CrossRef]
Figure 1. Area of Study.
Figure 2. Field crop boundaries.
Figure 3. The processes of creating datasets for training the new model for identifying winter wheat.
Figure 4. General view of the UNet structure.
Figure 5. The Fast-UNet model that processes sub-images of size 256 × 256 × N.
Figure 6. Original 256 × 256 × 3, ground truth, and segmented images by Fast-UNet.
Figure 7. Accuracy and loss values for processing 256 × 256 × 3 images using (a) Fast-UNet, (b) UNet, and (c) DeepLabV3+.
Figure 8. Original, ground truth, segmented by Fast-UNet, and ground truth images (128 × 128 × 3).
Figure 9. Accuracy and loss values for processing 128 × 128 × 3 images by (a) Fast-UNet, (b) UNet, and (c) DeepLabV3+.
Table 1. Pseudocode of the winter wheat sample selection process.

Read the samples from the MS Excel file
Define two limits: the index of the first month and the index of the last month in the samples
Get the maximum and minimum positive NDVI values of these months from the samples
Select as winter wheat representatives the samples whose values are greater than 0, less than threshold1, and greater than threshold2
Write the selected samples, including their longitude and latitude coordinates, to a CSV file
Table 2. Confusion matrix format.

                  Predicted
                  T1       T2
Actual   T1       True     False
         T2       False    True
Table 3. Various configurations for Fast-UNet and their results using 256 × 256 × 3 sub-images.

Optimizer | Learning Rate | Batch Size | Epochs | Accuracy % | Time (s) | Remarks
SGD       | 10⁻³          | 16         | 20     | 93.49      | 861      | No anomalies
ADAM      | 10⁻³          | 16         | 20     | 93.43      | 1120     | Val. accuracy did not change (local optima)
SGD       | 10⁻⁴          | 16         | 20     | 81.90      | 1053     | No anomalies
ADAM      | 10⁻⁴          | 16         | 20     | 92.98      | 1029     | Val. accuracy did not change (local optima)
Table 4. Performance of various segmentation models in the initial experiment.

Model Name  | Accuracy % | CM Accuracy % | Loss Value | Time Elapsed (s)
Fast-UNet   | 93.49      | 85.90         | 0.28       | 861
UNet        | 93.18      | 85.45         | 0.56       | 4100
DeepLabV3+  | 92.90      | 85.32         | 0.31       | 1090
Table 5. Performance of various segmentation models in the second experiment.

Model Name  | Accuracy % | CM Accuracy % | Loss Value | Time Elapsed (s)
Fast-UNet   | 93.40      | 81.30         | 0.29       | 261
UNet        | 92.91      | 82.10         | 0.65       | 1050
DeepLabV3+  | 92.27      | 80.66         | 0.29       | 324
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
