1. Introduction
Seed maturity measurement is an important factor in seed quality determination and is gaining interest as the seed multiplication sector faces modifications due to global warming, which alters crop cycles [1] and the germination quality of the seeds produced [2,3]. Quantification of seed maturity provides growers with additional indicators to precisely monitor the evolution of their crops when selecting optimal harvest dates.
To maintain seed quality, several methods have been implemented for quality control. Jalink et al. [4] developed a non-destructive technique for measuring seed maturity based on laser-induced fluorescence (LIF). This approach measures the fluorescence emissions of chlorophyll a at a given wavelength (around 660 nm, corresponding to the absorption peak of chlorophyll a). Using this procedure, chlorophyll fluorescence (CF) has been correlated with the quality of carrot, tomato, and cabbage seeds [5,6,7]. Fluorescence emissions are negatively correlated with seed chlorophyll and seed germination quality [8,9]. CF estimation is an area of research tackled at various levels, ranging from portable field equipment to laboratory spectrometers and computer vision approaches [10]. More specifically, applied to seeds, computer vision approaches are mainly based on seed pigmentation information and geometric features [11,12], combined with infrared imaging [13,14] and near-infrared hyperspectral imaging [15]. However, these techniques are limited to small samplings over the field and cannot reflect the heterogeneity of the field. Moreover, seed collection and preparation are time consuming. Unmanned aerial vehicles (UAVs) combined with multispectral imagery are emerging as an attractive approach for field-scale vegetation measurements with high spatial and temporal resolutions.
Indeed, vegetation monitoring based on UAV imagery combined with multispectral imagery has produced notable results in various applications such as yield estimation or vegetation vigor monitoring [16,17,18,19], and even forecasting vegetation evolution using temporal components [20,21]. To extract specific information from multispectral images, vegetation indexes (VIs) have been proposed, which result from arithmetic combinations of images of different wavelengths. In agriculture, they can highlight vegetation health, photosynthetic activity, and leaf chlorophyll concentration [22,23,24,25,26,27,28]. The best-known indexes are the Normalized Difference Vegetation Index (NDVI) [29] for chlorophyll content and the Normalized Difference Red-Edge (NDRE) for plant nitrogen diagnosis [30]. Other indexes such as the Soil-Adjusted Vegetation Index (SAVI) or the Leaf Area Index (LAI) [31] are also suitable for plant observation. Statistical methods and machine learning are also significant approaches that use VIs to build models for estimating crop chlorophyll content [32].
The past decade has seen the rapid development of deep learning approaches in many agricultural applications, such as weed management and disease detection [33,34,35,36,37,38]. Deep learning methods have achieved state-of-the-art accuracy results [39] compared to other machine learning approaches such as support vector machines (SVMs) [40] or random forests [41]. Each category generally addresses different use cases: multiclass classification to discriminate portions of vegetation in images [42,43,44], regression to estimate the evolution of a specific marker [45,46,47], and time series classification for the prediction of measurements, using recurrent neural networks (RNNs) [48,49]. Deep learning approaches have impressive capacities for data modeling, and their use for chlorophyll estimation from VIs appears promising.
However, the performance of machine learning approaches is strongly dependent on the amount of labeled data. Collecting samples and labeling a large amount of field data with the associated images is a very tedious and time-consuming task, if not impossible to complete. Semi-supervised learning is one of the most widely used approaches to address the dependency on large labeled datasets; its goal is to combine a small labeled dataset with a large set of unlabeled data [50]. Semi-supervised learning deals with incompletely labeled data, while weakly supervised learning additionally deals with noisy data [51,52]. Our aim is to incorporate label uncertainty and small labeled datasets to build efficient deep network models through the concept of weakly supervised learning. However, despite the success of this concept, how data are annotated remains an open problem that depends on the type of data and the application.
In this paper, we propose a new approach that uses generative models to associate an approximate CF value with an NDVI image in order to build a large labeled dataset for deep learning models. We used both parametric and non-parametric estimation techniques, namely the Gaussian mixture model (GMM), K-nearest neighbors (KNN), and the kernel density estimator (KDE). The first step consisted of the acquisition of aerial images and the collection of seed samples from different locations in the crop fields. A correlation analysis between CF and NDVI was performed to identify relevant regression variables. Then, generative models were built with a few ground truth samples collected from the studied fields. The created labels were fed into both convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to build regression models that predict chlorophyll fluorescence emissions as a function of (Date, NDVI). The proposed method showed strong potential for CF estimation from remote sensing images and for addressing the problem of labeling large amounts of data. The agronomic purpose of training a CF regression model is to provide an additional indicator of crop CF over the entire field surface, helping growers and experts select an optimal harvest date.
This paper is organized as follows: Section 2 describes the agronomic experimental protocol and the equipment used for the UAV data acquisition, followed by the exploratory analysis of the data. We then present our weak labeling approach and discuss the CF prediction performances of the trained models. Finally, we highlight the benefits, limitations, and perspectives of generative weak labeling for UAV maturity remote sensing.
2. Materials and Methods
2.1. Data Acquisition
The data acquisition campaign was conducted during the summer of 2020, from 21 July to 18 September, covering the seed filling period. Four parsley fields, located in the Centre-Val de Loire region, France, were considered, representing two types of parsley, curly and flat, and four different varieties. These field choices were made to observe potential variations between both parsley types and varieties during the seed maturation phase. The mean distance between each field is about 15 km. The data collected were of two types, ground seed samples and multispectral UAV images, both associated with a temporal dimension. As crop maturity varies rapidly in the weeks before harvest, a 3-day interval between the different data acquisitions was targeted. However, due to external factors such as weather or wind conditions preventing optimal UAV flights, the effective time step between flights varied from 3 to 6 days. This variable time interval between image acquisitions resulted in photographing the fields at different times of the day, under different weather and lighting conditions, adding image data variance.
2.1.1. Drone Multispectral Images
Aerial images were acquired at a flight height of 40 m, chosen as a trade-off between image resolution and sufficient ground footprint coverage. At an altitude of 40 m, the ground sample distance between two consecutive pixels is 2.73 cm, and the ground footprint of a single image covers 34.9 m × 26.2 m. Therefore, each 4 ha field (depicted in Figure 1) could be covered by the UAV in approximately 20 min. The on-board camera was the Micasense RedEdge-MX, a 5-band multispectral sensor that simultaneously acquires 5 images at different wavelengths, as summarized in Table 1.
This sensor was chosen because it covers both the wavelength used by the LIF seed maturity estimation method, centered on 660 nm, and the spectral ranges corresponding to the photosynthetic absorption of plants. The multispectral camera measures the reflectance of light from different types of surfaces, soil, and vegetation, and the values are directly influenced by the external lighting conditions. To compare the evolution between UAV images of the same field acquired at different times, the multispectral images needed to be calibrated. This was performed using two external references: a sunlight sensor, which measures both the Sun irradiation and the angle of incidence, and a photograph of a calibration panel of known surface reflectance. Once corrected, the multispectral images of 1280 × 960 pixels were assembled into an orthorectified image, as shown in Figure 1, using the Agisoft Metashape 1.6.3 software (Agisoft LLC, Saint Petersburg, Russia), which corrects for image distortions and GPS errors using stereo image calibration between matched points. Successive images were acquired with a minimum longitudinal and lateral overlap of 70% and 30%, respectively, to have enough matched points in adjacent images. The overlap is usually increased depending on the vegetation density.
Pixel-corrected GPS positions in the assembled images allowed the same field acquired at different times during the seed filling period to be overlaid. By spatially aligning global field images at different time steps, we can better monitor the evolution of subareas through multispectral imagery with different UAV flights.
2.1.2. Ground Truth
The ground truth, for each parsley field, was composed of the pairing between UAV images and seed samples. To guarantee the correct pairing, physical control zones were set up, as illustrated in Figure 2. For each studied parsley field, four 12 m² control zones were marked with ribbons. This ensured that seed samples were collected from the same field subzones during the sampling period and were easily identifiable on aerial images. The locations of the control zones were selected across the field to represent differences in crop maturity.
Thirty-two UAV flights were carried out to monitor the maturity of parsley seeds, resulting in 128 seed samples collected from the control zones. Each sample was composed of approximately 200 g of seeds, which were cleaned of debris before being processed. The maturity of the seed samples was quantified by the non-destructive LIF method based on CF estimation. As plant and seed maturity increases, the CF value decreases (i.e., a negative correlation) [5]. The CF estimation machine provides an average CF value ranging from 0 to 10,000 pA as lower and upper theoretical bounds, which are not reached in practice. The CF values are expressed in pA (picoamperes), as a photodiode was used to capture the fluorescence emissions. Once the CF falls below a certain threshold, the field is considered ready for swathing.
As can be observed in Figure 3, left column, the CF values of the seed samples for each control zone and each field decreased over time. The observation time of the four fields varied because of the crop types (curly, flat) and varieties. Particular crop phenotypes of the same species mature at distinct speeds; therefore, the four fields were harvested at different times. This resulted in an unbalanced distribution of samples, with Fields B and C being under-sampled compared to Fields D and A.
The CF decrease was not steady: it fluctuated with occasional upward spikes while following a global downward trend. These CF fluctuations were influenced by external factors such as weather, soil type, or irrigation, but also by potential sampling errors, as some control zones became heterogeneous as the crops matured. The CF trends of the control zones within the same field followed similar amplitude variations and maturation durations. Greater differences were observable between fields, with early varieties having faster and steeper declines in CF than late ones. With the aerial images, we calculated the vegetation indexes corresponding to the control zones and associated these indexes with the estimated seed maturity of the ground samples.
2.2. Dataset
2.2.1. Data Preparation
Using the multiple UAV flights and aligned orthorectified images, we can monitor the evolution of each field throughout the seed maturity period. Harvest dates differed from one field to another. We had four fields, A, B, C, and D, with 15, 6, 4, and 7 aerial observations, respectively. The image dataset can be divided into two categories: a small labeled set of image samples matched with ground truth CF values and a large number of unlabeled samples for which we had no ground truth CF. The ground truth samples consisted of 128 CF/image pairs. The remaining unlabeled dataset contained 19,443 image samples, obtained by dividing the field images into 128 × 128 pixel tiles. This size was selected to be consistent with the control area size (i.e., a resolution of 2.73 cm per pixel at an altitude of 40 m). In order to compare the influence of the image size, we also extracted 32 × 32 pixel tiles, where each image contains at most a single parsley crop, and performed the same preparation as for the 128 × 128 tiles.
Since the drone images from successive flights are aligned, each tile can be observed over a period, enabling the creation of a time series composed of sequential images of the same area. In this study, we limited the length of the time series to 4 consecutive observations, as this is the maximum length available for Field C. For the other three fields, we selected 4 series by sliding a window of 4 observations over the temporal dimension. Once the 128 × 128 images had been extracted from the orthophotos, those that did not contain a minimum of 85% vegetation were removed.
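As a concrete illustration of this preparation step, the tiling and vegetation filtering can be sketched as follows with NumPy. The 128-pixel tile size and the 85% cover threshold come from the text; the NDVI cutoff of 0.3 used to decide whether a pixel counts as vegetation, and the function name, are our assumptions:

```python
import numpy as np

def extract_tiles(ndvi_map, tile=128, veg_ndvi=0.3, min_cover=0.85):
    """Split an NDVI orthomosaic into square tiles and keep only those whose
    vegetation cover (fraction of pixels above veg_ndvi) reaches min_cover.
    The 0.3 NDVI cutoff is an illustrative assumption."""
    h, w = ndvi_map.shape
    tiles = []
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            patch = ndvi_map[r:r + tile, c:c + tile]
            if (patch > veg_ndvi).mean() >= min_cover:
                tiles.append(patch)
    return tiles

# Toy orthomosaic: dense vegetation on the left half, bare soil on the right.
field = np.zeros((256, 256))
field[:, :128] = 0.8
tiles = extract_tiles(field)
print(len(tiles))  # only the two fully vegetated left-hand tiles survive
```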
2.2.2. Correlation between CF and NDVI
The Normalized Difference Vegetation Index (NDVI) is an arithmetic imagery indicator computed from images acquired in two spectral wavelengths (red, 668 nm, and near-infrared, 842 nm), as expressed in Equation (1):

$$\mathrm{NDVI} = \frac{R_{842} - R_{668}}{R_{842} + R_{668}} \qquad (1)$$

where $R_{842}$ and $R_{668}$ are the reflectances in the near-infrared and red bands, respectively. This index is widely used for monitoring vegetation vigor and plant health from satellite and UAV imagery. NDVI values range from 0 to 1, with 0 representing no vegetation presence and values of 0.8 and above representing maximum vegetation cover.
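A minimal sketch of the per-pixel NDVI computation on calibrated reflectance bands (the array values below are illustrative, not taken from the dataset):

```python
import numpy as np

def ndvi(red, nir, eps=1e-9):
    """Per-pixel NDVI from calibrated red (668 nm) and NIR (842 nm) reflectance.
    eps avoids division by zero over dark pixels."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + eps)

red = np.array([[0.05, 0.30], [0.08, 0.25]])  # healthy vegetation reflects little red
nir = np.array([[0.60, 0.35], [0.55, 0.28]])  # and a lot of near-infrared
v = ndvi(red, nir)
print(v.round(2))
```

The top-left pixel (low red, high NIR) yields a high NDVI typical of dense vegetation, while the right column mimics drying plants or soil.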
As for the seed CF samples, the NDVI evolution in the control zones was monitored until the field harvest. The decrease in NDVI at different time steps can be observed in Figure 3, right column, and in Figure 4, which represents the index values as a heat map. We observed the decrease of the index values as the field seeds were maturing. To correlate the CF evolution with the NDVI [22], we computed the mean index value for each control zone. The NDVI during the maturation period showed a decreasing trend like the CF, with similar variations at given time steps, as depicted side by side in Figure 3. The NDVI amplitudes at the beginning of the study period varied depending on the crop variety. The NDVI and the CF both decreased over the seed filling period, as the parsley crop produces its seeds near the end of its life cycle: as the seeds mature, the parsley crops dry, and their photosynthetic activity decreases.
The ground truths of the CF and NDVI dimensions presented similar behavior when considering individual fields. This similarity did not necessarily hold across field types or varieties. Assuming a direct correlation between CF and NDVI would lead to a low-quality estimate, as the Pearson coefficient was centered on 0.7812 when considering each field separately and dropped to 0.6460 when calculated over the four fields studied; it would probably decrease further if more plant varieties were added. To allow for a stronger correlation, we introduced the date as an extra dimension. As can be seen in the correlation matrix transcribed in Table 2, the couple (Date, CF) had a negative Pearson coefficient of −0.6913 and the couple (Date, NDVI) a value of −0.5267 when using all samples from the four fields. A second correlation matrix, given in Table 3, was calculated using the Spearman coefficient, a non-parametric measure of rank correlation, whereas the Pearson coefficient is a parametric measure of linear correlation between variables. Both the Pearson and Spearman correlation matrices output similar results. Since the coefficients of the different dimensions are substantial, their combination allows a better modeling of the distribution of the data represented by the samples.
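The kind of correlation analysis described here can be reproduced with `scipy.stats` on synthetic stand-in data; the decay rates and noise levels below are illustrative, not the paper's measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the (Date, NDVI, CF) ground truth samples:
# both NDVI and CF decay over the season, so Date correlates negatively with each.
date = np.arange(40, dtype=float)
ndvi = 0.9 - 0.010 * date + rng.normal(0, 0.02, 40)
cf = 8000 - 120 * date + rng.normal(0, 300, 40)

r, _ = stats.pearsonr(date, cf)       # parametric, linear correlation
rho, _ = stats.spearmanr(date, cf)    # non-parametric, rank correlation
print(round(r, 3), round(rho, 3))
```

Both coefficients come out strongly negative here, mirroring the sign (though not the magnitude) of the (Date, CF) entry reported in Table 2.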
Under the assumption of correlation between the target CF and the variables (Date, NDVI), we fit the ground truth samples with generative models that were used for the data labeling phase. The following section describes the parametric and non-parametric models used in this process.
2.3. Labeling Based on Generative Models
Generative models are usually combined with neural networks [53,54] for prediction or for the optimization of model hyperparameters [55], but not directly for data labeling. Let us now describe the considered parametric and non-parametric methods and how their respective parameters were fit to our acquired ground truth dataset of (Date, NDVI, CF) using the Python scikit-learn machine learning library.
2.3.1. Gaussian Mixture Model
The Gaussian mixture model (GMM) is a parametric method combining $C$ Gaussians, where each Gaussian clusters a subdivision of the data. The Gaussian fitting was performed by the iterative expectation maximization algorithm. Combining multiple Gaussians enables a better characterization of the data compared to a single Gaussian, as each subdistribution is locally approximated. When data are fit with Gaussians, soft clustering is performed, since each prediction is quantified by a probability and not by a continuous target value. The probability density function of a multivariate GMM is given by Equation (2):

$$p(x) = \sum_{i=1}^{C} \phi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i) \qquad (2)$$

where $\mathcal{N}(x \mid \mu_i, \Sigma_i)$ is a multivariate Gaussian defined in Equation (3):

$$\mathcal{N}(x \mid \mu_i, \Sigma_i) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_i|}} \exp\!\left(-\frac{1}{2}(x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i)\right) \qquad (3)$$

where $x \in \mathbb{R}^d$ is a data vector and $C$ is the number of components. The $i$th component's parameters are the mean $\mu_i$ and the covariance matrix $\Sigma_i$. The mixture component weights are defined as $\phi_i$, with the constraint that $\sum_{i=1}^{C} \phi_i = 1$, so that the total probability distribution is normalized to 1. The covariance matrix type used was full, meaning each component has its own general covariance matrix and can independently adopt any shape. The means and weights were initialized with k-means clustering.
Selecting the appropriate number of Gaussian components was performed by measuring the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) for multiple numbers of components. The BIC, given in Equation (4), weighs the likelihood against the number of parameters used, to determine whether the likelihood gain is sufficient to justify the number of parameters:

$$\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L}) \qquad (4)$$

where $\hat{L}$ is the maximum value of the likelihood, $n$ is the number of data points, and $k$ is the number of estimated parameters, which for the GMM are the mean vectors, covariance matrices, and mixture weights. Better-performing models reduce the BIC indicator. In addition, the AIC evaluates how well a model fits the data it was generated from. The best models according to the AIC are those representing the highest variations while using the fewest independent variables. The AIC is expressed in Equation (5), with parameters similar to those in Equation (4):

$$\mathrm{AIC} = 2k - 2 \ln(\hat{L}) \qquad (5)$$

The advantage of these probabilistic model scores is that they do not require test data: they can be evaluated on all samples and handle small datasets.
The selected number of Gaussians for the GMM was 3, which presented a good trade-off between the BIC and the AIC. The strict minimum of each indicator was not retained, as the BIC tends to select overly simple models and, conversely, the AIC overly complex ones. The values of the probabilistic indicators are summarized in Table 4 for a number of Gaussians ranging from 1 to 5.
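The component selection procedure can be sketched with scikit-learn's `GaussianMixture` on synthetic three-cluster data; the cluster locations and sample sizes are our assumptions, not the paper's dataset:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic 3-cluster stand-in for the rescaled (Date, NDVI, CF) samples.
X = np.vstack([rng.normal(m, 0.05, size=(60, 3)) for m in (0.2, 0.5, 0.8)])

scores = {}
for c in range(1, 6):
    gmm = GaussianMixture(n_components=c, covariance_type="full",
                          init_params="kmeans", random_state=0).fit(X)
    scores[c] = (gmm.bic(X), gmm.aic(X))  # lower is better for both criteria

best_bic = min(scores, key=lambda c: scores[c][0])
print(best_bic)
```

On well-separated clusters, the 3-component model lowers both criteria sharply compared to 1 or 2 components, reproducing the kind of trade-off summarized in Table 4.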
2.3.2. K-Nearest Neighbors
KNN is a non-parametric model that predicts a continuous target by averaging the target values of the $k$ nearest neighbors. The distance function used for fitting the KNN model was the Euclidean distance, given in Equation (6), as it is widely used and well adapted to measuring distances between continuous variables:

$$d(p, q) = \sqrt{\sum_{j=1}^{d} (p_j - q_j)^2} \qquad (6)$$

where $p$ and $q$ are data points and $k$ is the number of nearest neighbors. To improve performance, the data dimensions were rescaled between 0 and 1, which prevents biasing the Euclidean distance measures. As the influence of the $k$ value is high, the optimal value was selected to minimize the regression error on the test data split. The error function used to evaluate the performance of the regression was the root-mean-squared error (RMSE). This metric was chosen as it keeps the error in the units of the variable of interest. Furthermore, the error is squared before being averaged, which penalizes larger errors. Equation (7) expresses the RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - Y_i)^2} \qquad (7)$$

with $n$ being the number of data samples, $y$ the predicted target, and $Y$ the true label of the target. Overall, the KNN regression method is well suited to low-dimensional datasets, but loses its practicality as the number of features increases. The optimal value of $k$ was retained by performing a grid search cross-validation across the data for multiple values of $k$. This was performed in order to minimize the prediction error and to limit the bias induced by the data splits during KNN model fitting.
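A compact sketch of the grid search cross-validation over $k$ with scikit-learn, on synthetic stand-ins for the rescaled (Date, NDVI) inputs and CF target (the data generator is illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
# Synthetic rescaled (Date, NDVI) inputs and a CF-like target in [0, 1].
X = rng.uniform(0, 1, size=(200, 2))
y = 1.0 - 0.6 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.02, 200)

# Cross-validated grid search over k, minimizing the RMSE of Equation (7).
search = GridSearchCV(KNeighborsRegressor(metric="euclidean"),
                      {"n_neighbors": range(1, 16)},
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
pred = search.predict(X)
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(search.best_params_["n_neighbors"], round(rmse, 3))
```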
2.3.3. Kernel Density Estimator
The kernel density estimator (KDE) is a probability density function estimator for random variables. For each dataset point, it evaluates its probability of belonging to a hypercube. The number of points inside the hypercube is given by Equation (8):

$$k_N = \sum_{n=1}^{N} \varphi\!\left(\frac{x - x_n}{h}\right) \qquad (8)$$

where $\varphi$ is the window function determining whether a dataset entry belongs to the hypercube or not. Knowing the points present inside the hypercube, we can estimate the probability density function of the dataset using Equation (9):

$$p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^d} \, \varphi\!\left(\frac{x - x_n}{h}\right) \qquad (9)$$

with $N$ being the total number of samples, $x$ the center of the hypercube, $x_n$ the $n$th data sample, and $h$ the bandwidth of the hypercube. The parameter $h$ has a strong influence on the resulting estimate and must be adapted along each data dimension if the data ranges vary; therefore, we rescaled all our data to the range 0–1. The kernel type used was Gaussian, as it is a smoother function than other kernel types and better suited to our observations. For the parsley maturity application, we fit the KDE to our three-dimensional ground truth dataset composed of (Date, NDVI, CF) to estimate the resulting probability density function of the distribution. The optimal bandwidth, shared across dimensions, was selected by performing a grid search over candidate values of $h$ and by scoring the KDE fit on unseen ground truth portions of the dataset. The KDE scoring was performed by computing the log-likelihood of the test folds during cross-validation on the ground truth samples.
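The bandwidth grid search can be sketched with scikit-learn's `KernelDensity` and `GridSearchCV`; the synthetic data and the candidate bandwidth range below are illustrative, not the values used in the study:

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
# Synthetic rescaled (Date, NDVI, CF) ground truth samples in [0, 1]^3.
X = rng.normal(0.5, 0.1, size=(128, 3)).clip(0, 1)

# GridSearchCV scores KernelDensity by the total log-likelihood of the
# held-out folds, matching the cross-validated scoring described above.
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.linspace(0.02, 0.30, 15)}, cv=5)
grid.fit(X)
best_h = grid.best_params_["bandwidth"]
print(round(best_h, 2))
```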
2.4. Weak Data Labeling
As previously mentioned, we aimed to combine the generative models with the deep learning approach in order to improve both the prediction performance and generalization capabilities of deep networks for CF estimation. The originality of the proposed method lies in the use of generative models to provide weak CF labels to additional multispectral images, for which no ground truth data were collected. Generative labeling introduces bias in the data distribution. This inaccuracy allows for a wider range of potential pairs of CF and multispectral images to be covered, thus enabling a better representation of natural fluctuations in the fields. In addition, labeling enough samples of the dataset permits the use of a deep learning approach. The neural network will extract additional features from multispectral images that are not considered during the generative fitting.
It can be observed in Figure 5 that the distribution of the generated weak labels varied between the methods, specifically for the KDE model. The labels generated by the GMM and KNN appear compact around the ground truths depicted in red. These distributions therefore consider less heterogeneous variations and focus on average variations, whereas the labels from the KDE span a larger bandwidth of potential data and include a variety of linked heterogeneous ground truths from the other fields. The KDE labeling was thus less restricted to the ground truth samples of one field and better suited to generating labels for different fields.
By fixing the variables (Date, NDVI) of the generative model, we can extract from the KDE a 1D histogram with the possible values of the CF (i.e., the NDVI variable consists of the image mean NDVI). We randomly picked a CF value from the 1D histogram and added the variance to this value in order to extend the range of CF, thus taking into account fluctuations and unseen data. The CF obtained was associated with the sample image to constitute its label. This procedure was applied for all unlabeled NDVI images. With these labeled data, we trained different neural networks in order to improve the CF prediction from the NDVI images.
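This labeling step (fixing (Date, NDVI), extracting a 1D CF histogram from the fitted KDE, sampling a value, and adding variance) can be sketched as follows; the synthetic data, the bandwidth, and the spread parameter are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
# Synthetic rescaled (Date, NDVI, CF) ground truth; all three decay together.
date = rng.uniform(0, 1, 150)
ndvi = 1.0 - 0.7 * date + rng.normal(0, 0.03, 150)
cf = 1.0 - 0.8 * date + rng.normal(0, 0.05, 150)
kde = KernelDensity(kernel="gaussian", bandwidth=0.05)
kde.fit(np.column_stack([date, ndvi, cf]))

def weak_cf_label(d, v, spread=0.05, grid=np.linspace(0, 1, 200)):
    """Slice a 1D CF histogram at fixed (Date, NDVI), draw a CF value from it,
    and perturb it to account for unseen fluctuations (spread is illustrative)."""
    pts = np.column_stack([np.full_like(grid, d), np.full_like(grid, v), grid])
    p = np.exp(kde.score_samples(pts))  # unnormalized conditional density over CF
    p /= p.sum()
    return rng.choice(grid, p=p) + rng.normal(0, spread)

label = weak_cf_label(0.2, 0.8)
print(round(label, 3))
```

Each unlabeled tile would then receive such a sampled CF value as its weak label.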
To prevent overfitting during the association of the CF labels, the ground truths of the field being labeled were excluded from the data used for the generative model training. As can be seen in Figure 6, middle plot, the generated CF values followed a similar trend to the ground truth from which they were generated (left plot), but were shifted towards lower values, which is logical as the acquisition duration for Field A lasted longer. The date boxplots for the generated labels and the ground truth of Field A (right plot) are identical, as the ground truths were harvested at each acquisition date. The generated CF labels covered a wider range than the ground truth, as we added uncertainty when associating the labels.
2.5. Deep Neural Networks for CF Estimation
To build a deep learning model for CF estimation, we chose two types of popular architectures. The first one was based on convolutional networks (CNNs). In this scope, we opted for popular well-performing architectures, namely ResNet and EfficientNet. However, they do not take into account the temporal evolution between successive observations. For this purpose, we used a second type of architecture, based on recurrent neural networks (RNNs), specifically long short-term memory (LSTM) cells.
The neural networks required 3-channel image inputs, as the models had weights pre-trained on the ImageNet dataset. Using pre-trained weights for fine-tuning CNN architectures yields better results than random initialization. Therefore, the NDVI tile data were stored in the first channel, and instead of duplicating the data over the 3 image channels, the values of the date and the mean image NDVI were added. Providing these data combined with the NDVI image to the neural networks standardizes the inputs with respect to the generative models. The models were trained on an Nvidia 2080 TI GPU and built with the TensorFlow 2.4.1 and Keras 2.4.3 frameworks using Python 3.6.9. The neural models were trained with an initial learning rate of 1 × 10 with a reduction factor of 0.1 when reaching a plateau and an early stopping criterion of 5 epochs. Only random rotations were performed by the data loader as data augmentation, as the quantity of images and the variations due to external factors were already high.
2.5.1. ResNet
Also called the deep residual network, this architecture is based on residual blocks, which implement skip connections between layers, as illustrated in Figure 7. With $x$ the input from the previous block, $F(x)$ the output of the weight layers, and $H(x)$ the output of each block using the skip connection, the model minimizes the residual function during training, as described in Equation (10):

$$F(x) = H(x) - x \qquad (10)$$

The ResNet architecture showed very good performance in the ImageNet and COCO 2015 competitions. It is implemented in varying depths ranging from 18 to 152 layers. We selected the ResNet-50 version because it incorporates 3-layer residual blocks, which perform better than the 2-layer residual blocks used in ResNet-18 and ResNet-34.
2.5.2. EfficientNet
As the name of the architecture suggests, the EfficientNet family consists of highly parameter-optimized neural networks. They provide an increased accuracy-per-parameter ratio and training efficiency. With only 5.3 M parameters for EfficientNetB0, compared to 26 M parameters for ResNet-50, the results are slightly better than ResNet-50 on the ImageNet dataset. The network scaling was formulated as the compound optimization problem described in Equation (11), where depth, width, and resolution scaling are performed jointly:

$$\text{depth} = \alpha^{\phi}, \quad \text{width} = \beta^{\phi}, \quad \text{resolution} = \gamma^{\phi}, \quad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \ \alpha, \beta, \gamma \geq 1 \qquad (11)$$

where $\alpha$, $\beta$, and $\gamma$ are constants to be determined and $\phi$ a coefficient defined by the user.
2.5.3. LSTM
Long short-term memory neural networks are RNNs using LSTM cells. This type of architecture is well suited to sequential data modeling, as it considers long-term dependencies between observations and implements a forget gate mechanism for discarding irrelevant features. The detailed principles of the LSTM cell's inner architecture are illustrated in Figure 8 and Equation (12):

$$\begin{aligned} f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\ i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\ o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\ \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\ c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\ h_t &= o_t \circ \tanh(c_t) \end{aligned} \qquad (12)$$

where $W$ and $U$ denote weight terms, $b$ a bias term, and $x_t$ the $t$th observation of the input sequence. The next hidden state and the previous hidden state are expressed by $h_t$ and $h_{t-1}$, respectively. $c_t$ and $c_{t-1}$ are the states of the next cell and the previous one, respectively. $\sigma$ is the nonlinear sigmoid activation function. The operator $\circ$ denotes the elementwise product.
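For concreteness, here is a minimal NumPy implementation of one LSTM cell step following Equation (12); the dimensions and random weights are illustrative, not the trained model's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step: forget, input, and output gates, candidate cell
    state, then the new cell and hidden states (Equation (12))."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f * c_prev + i * c_tilde   # elementwise products
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 4, 3  # feature size of one observation, hidden size (illustrative)
W = {g: rng.normal(0, 0.1, (d_h, d_in)) for g in "fioc"}
U = {g: rng.normal(0, 0.1, (d_h, d_h)) for g in "fioc"}
b = {g: np.zeros(d_h) for g in "fioc"}

h = c = np.zeros(d_h)
for t in range(4):  # a 4-observation sequence, as used in the paper
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
print(h.shape)
```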
As crop maturity progresses over the studied period, treating past and present observations of the same subzone as dependent enables a CF prediction that takes into account the evolution of the crops over time. Thus, bidirectional LSTM cells were used, with a 4-observation input sequence and a 4-observation output to quantify each time observation. Since the LSTM input is a sequence of vectors, a time-distributed CNN head was used as a feature extractor on the NDVI images to transform them into the LSTM input format.
4. Discussion
This study aimed to analyze the possibility of generating extra labeled data from generative models based on a few ground truth samples. Developing such approaches is necessary to enable the creation of datasets large enough for neural network applications, especially in the agricultural sector. The economic cost and time required to annotate UAV data are high and rarely comprehensive due to continuously changing external factors. To overcome these limitations, as depicted previously, parametric and non-parametric methods were used to fit the ground truths for weak labeling. They were fit on the data components (Date, NDVI, CF), with CF as the desired output for chlorophyll concentration quantification. The scope of this study was limited to these components as they were the most representative for modeling parsley plant variations, as shown in Figure 3.
Given the small amount of ground truth data, the GMM, KNN, and KDE models performed correctly, with an RMSE varying from 0.2059 to 0.0891 (i.e., from 2059 to 891 CF). We nevertheless needed to take this estimation a step further, for multiple reasons: the generative models only considered the mean NDVI of the images, and the ground truths were only a few samples from each field. Incorporating a neural network enabled additional features to be extracted from the multispectral images, improving the CF estimation and potentially correcting for manual experimental ground truth sampling errors. Increasing the input image size from 32 × 32 pixels to 128 × 128 pixels also improved the results for the LSTM, because instead of having a single plant per image, we had an overview of several plants in a 12 m² area, which better matched the size of the monitored zones. We also took into account the temporal variation of the observations by feeding the recurrent LSTM network with four successive observations of the same zone. This was performed in order to better address the CF estimation by introducing a factor of vegetation evolution in time. The dataset being distributed in time, the recurrent neural network models performed better than the CNN models for all folds. An input sequence of four observations was used because Field C was only photographed four times before harvest. Longer input sequences could further improve the CF estimation.
From an agronomic point of view, based on company field experts and their CF sampling history, an estimation error below 0.1 (i.e., 1000 CF) is equivalent to a 3–4 day variation depending on weather conditions, which is sufficient to highlight an optimal harvest date for the farmer. The CF estimation was only performed for past and present UAV orthorectified maps, as predicting future evolutions would require combining the current models with connected weather stations.
Finally, in this study, we showed that large amounts of unlabeled UAV aerial images can be labeled based on parametric and non-parametric models in order to improve CF estimation and to help neural network predictions generalize to unseen datasets. Some limitations can be highlighted for future improvement: the acquired data only covered one harvest season and could be subject to weather and/or soil type variations. Acquiring aerial images at different heights could also be interesting, as parsley types and varieties may have different leaf shapes and reflect light differently. In addition, more vegetation indexes could be combined with the NDVI, such as the Normalized Difference Red-Edge (NDRE), which is computed from different wavelengths and used for crop nitrogen monitoring.