1. Introduction
Seed maturity measurement is an important factor in seed quality determination and is gaining interest as the seed multiplication sector faces modifications due to global warming, which alters crop cycles [1] and the germination quality of the seeds produced [2,3]. Quantification of seed maturity provides growers with additional indicators to precisely monitor the evolution of their crops when selecting optimal harvest dates.
To maintain seed quality, several methods have been implemented for quality control. Jalink et al. [4] developed a non-destructive technique for measuring seed maturity based on laser-induced fluorescence (LIF). This approach measures the fluorescence emissions of chlorophyll a at a given wavelength (around 660 nm, corresponding to the absorption peak of chlorophyll a). Using this procedure, chlorophyll fluorescence (CF) has been correlated with the quality of carrot, tomato, and cabbage seeds [5,6,7]. Fluorescence emissions are negatively correlated with seed chlorophyll and seed germination quality [8,9]. CF estimation is an area of research tackled at various levels, ranging from portable field equipment to laboratory spectrometers and computer vision approaches [10]. More specifically, applied to seeds, computer vision approaches are mainly based on seed pigmentation information and geometric features [11,12], combined with infrared imaging [13,14] and near-infrared hyperspectral imaging [15]. However, these techniques are limited to small samplings over the field and cannot reflect the heterogeneity of the field. Moreover, seed collection and preparation are time consuming. Unmanned aerial vehicles (UAVs) combined with multispectral imagery are emerging as an attractive approach for field-scale vegetation measurements with high spatial and temporal resolutions.
Indeed, vegetation monitoring based on UAV imagery combined with multispectral imagery has produced notable results in various applications such as yield estimation or vegetation vigor monitoring [16,17,18,19], and even forecasting vegetation evolution using temporal components [20,21]. To extract specific information from multispectral images, vegetation indexes (VIs) have been proposed, which result from arithmetic combinations of images of different wavelengths. In agriculture, they can highlight vegetation health, photosynthetic activity, and leaf chlorophyll concentration [22,23,24,25,26,27,28]. The best-known indexes are the Normalized Difference Vegetation Index (NDVI) [29] for chlorophyll content and the Normalized Difference Red-Edge (NDRE) for plant nitrogen diagnosis [30]. Other indexes such as the Soil-Adjusted Vegetation Index (SAVI) or the Leaf Area Index (LAI) [31] are also suitable for plant observation. Statistical methods and machine learning are also significant approaches that use VIs to build models for estimating crop chlorophyll content [32].
The past decade has seen the rapid development of deep learning approaches in many agricultural applications, such as weed management and disease detection [33,34,35,36,37,38]. Deep learning methods have achieved state-of-the-art accuracy results [39] compared to other machine learning approaches such as support vector machines (SVMs) [40] or random forests [41]. Each category generally addresses different use cases: multiclass classification to discriminate portions of vegetation in images [42,43,44], regression to estimate the evolution of a specific marker [45,46,47], and time series classification for the prediction of measurements, using recurrent neural networks (RNNs) [48,49]. Deep learning approaches have impressive capacities for data modeling, and their use for chlorophyll estimation from VIs appears promising.
However, the performance of machine learning approaches is strongly dependent on the amount of labeled data. Collecting samples and labeling a large amount of field data with the associated images is a very tedious and time-consuming task, if not impossible to complete. Semi-supervised learning is one of the most widely used approaches to address the dependency on large labeled datasets; its goal is to combine a small labeled dataset with a large set of unlabeled data [50]. Semi-supervised learning deals with incompletely labeled data, while weakly supervised learning additionally deals with noisy data [51,52]. Our aim is to incorporate label uncertainty and small labeled datasets to build efficient deep network models through the concept of weakly supervised learning. However, despite the success of this concept, how data are annotated remains an open problem that depends on the type of data and the application.
In this paper, we propose a new approach that uses generative models to associate an approximate CF value with an NDVI image in order to build a large labeled dataset for deep learning models. We used both parametric and non-parametric estimation techniques, namely the Gaussian mixture model (GMM), K-nearest neighbors (KNN), and the kernel density estimator (KDE). The first step consisted of the acquisition of aerial images and the collection of seed samples from different locations in the crop fields. A correlation analysis between CF and NDVI was performed to identify relevant regression variables. Then, generative models were built with a few ground truth samples collected from the studied fields. The created labels were fed into both convolutional neural networks (CNNs) and long short-term memory (LSTM) networks to build regression models that predict chlorophyll fluorescence emissions as a function of (Date, NDVI). The proposed method showed strong potential for CF estimation from remote sensing images and for addressing the problem of labeling large amounts of data. The agronomic purpose of training a CF regression model is to provide an additional indicator of crop CF over the entire field surface, helping growers and experts select an optimal harvest date.
This paper is organized as follows: Section 2 describes the agronomic experimental protocol and the equipment used for the UAV data acquisition, followed by the exploratory analysis of the data. We then present our weak labeling approach and discuss the CF prediction performances of the trained models. Finally, we highlight the benefits, limitations, and perspectives of generative weak labeling for UAV maturity remote sensing.
2. Materials and Methods
2.1. Data Acquisition
The data acquisition campaign was conducted during the summer of 2020, from 21 July to 18 September, covering the seed filling period. Four parsley fields, located in the Centre-Val de Loire region, France, were considered, representing two types of parsley, curly and flat, and four different varieties. These field choices were made to observe potential variations between both parsley types and varieties during the seed maturation phase. The mean distance between each field is about 15 km. The data collected were of two types, ground seed samples and multispectral UAV images, both associated with a temporal dimension. As crop maturity varies rapidly in the weeks before harvest, a 3-day interval between the different data acquisitions was targeted. However, due to external factors such as weather or wind conditions preventing optimal UAV flights, the effective time step between flights varied from 3 to 6 days. This variable time interval between image acquisitions resulted in photographing the fields at different times of the day, under different weather and lighting conditions, adding image data variance.
2.1.1. Drone Multispectral Images
Aerial images were acquired at a flight height of 40 m, chosen as a trade-off between image resolution and sufficient ground footprint coverage. At an altitude of 40 m, the ground sample distance between two consecutive pixels is 2.73 cm, and the ground footprint of a single image covers 34.9 m × 26.2 m. Therefore, each 4 ha field (depicted in Figure 1) could be covered by the UAV in approximately 20 min. The on-board camera was the Micasense RedEdge-MX, a 5-band multispectral sensor that simultaneously acquires 5 images at different wavelengths, as summarized in Table 1.
This sensor was chosen because it covers both the wavelength used by the LIF seed maturity estimation method, centered on 660 nm, and the spectral ranges corresponding to the photosynthetic absorption of plants. The multispectral camera measures the reflectance of light from different types of surfaces, soil, and vegetation, and the values are directly influenced by the external lighting conditions. To compare the evolution between UAV images of the same field acquired at different times, the multispectral images needed to be calibrated. This was performed using two external references: a sunlight sensor, which measures both the Sun irradiation and the angle of incidence, and a photograph of a calibration panel of known surface reflectance. Once corrected, the multispectral images of 1280 × 960 pixels were assembled into an orthorectified image, as shown in Figure 1, using the Agisoft Metashape 1.6.3 software (Agisoft LLC, Saint Petersburg, Russia), which corrects for image distortions and GPS errors using stereo image calibration between matched points. Successive images were acquired with a minimum longitudinal and lateral overlap of 70% and 30%, respectively, to have enough matched points in adjacent images. The overlap is usually increased depending on the vegetation density.
Pixel-corrected GPS positions in the assembled images allowed the same field acquired at different times during the seed filling period to be overlaid. By spatially aligning global field images at different time steps, we can better monitor the evolution of subareas through multispectral imagery with different UAV flights.
2.1.2. Ground Truth
The ground truth, for each parsley field, was composed of the pairing between UAV images and seed samples. To guarantee the correct pairing, physical control zones were set up, as illustrated in Figure 2. For each studied parsley field, four 12 m² control zones were marked with ribbons. This ensured that seed samples were collected from the same field subzones during the sampling period and were easily identifiable on aerial images. The locations of the control zones were selected across the field to represent differences in crop maturity.
Thirty-two UAV flights were carried out to monitor the maturity of parsley seeds, resulting in 128 seed samples collected from the control zones. Each sample was composed of approximately 200 g of seeds, which were cleaned of debris before being processed. The maturity of the seed samples was quantified by the non-destructive LIF method based on CF estimation. As plant and seed maturity increases, the CF value decreases (i.e., a negative correlation) [5]. The CF estimation machine provides an average CF value ranging from 0 to 10,000 pA as lower and upper theoretical bounds, which are not reached in practice. The CF values are expressed in pA (picoamperes), as a photodiode was used to capture the fluorescence emissions. Once the CF falls below a certain threshold, the field is considered ready for swathing.
As can be observed in Figure 3, left column, the CF values of the seed samples for each control zone and each field decreased over time. The observation time of the four fields varied because of the crop types (curly, flat) and varieties. Particular crop phenotypes of the same species mature at distinct speeds; therefore, the four fields were harvested at different times. This resulted in an unbalanced distribution of samples, with Fields B and C being under-sampled compared to Fields D and A.
The CF decrease was not steady: it fluctuated with occasional upward spikes while following a global downward trend. These CF fluctuations were influenced by external factors such as weather, soil type, or irrigation, but also by potential sampling errors, as some control zones became heterogeneous as the crops matured. The CF trends of the control zones within the same field followed similar amplitude variations and maturation durations. Greater differences were observable between fields, with early varieties having faster and steeper declines in CF than late ones. With the aerial images, we calculated the vegetation indexes corresponding to the control zones and associated these indexes with the estimated seed maturity of the ground samples.
2.2. Dataset
2.2.1. Data Preparation
Using the multiple UAV flights and aligned orthorectified images, we can monitor the evolution of each field throughout the seed maturity period. Harvest dates differed from one field to another. We had four fields, A, B, C, and D, with 15, 6, 4, and 7 aerial observations, respectively. The image dataset can be divided into two categories: a small labeled set of image samples matched with ground truth CF values and a large number of unlabeled samples for which we had no ground truth CF. The ground truth samples consisted of 128 CF/image pairs. The remaining unlabeled dataset contained 19,443 image samples, obtained by dividing the field images into 128 × 128 pixel tiles. This size was selected to be consistent with the control area size (i.e., a resolution of 2.73 cm per pixel at an altitude of 40 m). In order to compare the influence of the image size, we also extracted 32 × 32 pixel tiles, where each image contains at most a single parsley crop, and performed the same preparation as for the 128 × 128 tiles.
Since the drone images from successive flights are aligned, each tile can be observed over a period, enabling the creation of a time series composed of sequential images of the same area. In this study, we limited the length of the time series to 4 consecutive observations, as this is the maximum length available for Field C. For the other three fields, we selected 4 series by sliding a window of 4 observations over the temporal dimension. Once the 128 × 128 images had been extracted from the orthophotos, those that did not contain a minimum of 85% vegetation were removed.
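As a concrete illustration of this preparation step, the tiling and vegetation filtering can be sketched as follows with NumPy. The 128-pixel tile size and the 85% cover threshold come from the text; the NDVI cutoff of 0.3 used to decide whether a pixel counts as vegetation, and the function name, are our assumptions:

```python
import numpy as np

def extract_tiles(ndvi_map, tile=128, veg_ndvi=0.3, min_cover=0.85):
    """Split an NDVI orthomosaic into square tiles and keep only those whose
    vegetation cover (fraction of pixels above veg_ndvi) reaches min_cover.
    The 0.3 NDVI cutoff is an illustrative assumption."""
    h, w = ndvi_map.shape
    tiles = []
    for r in range(0, h - tile + 1, tile):
        for c in range(0, w - tile + 1, tile):
            patch = ndvi_map[r:r + tile, c:c + tile]
            if (patch > veg_ndvi).mean() >= min_cover:
                tiles.append(patch)
    return tiles

# Toy orthomosaic: dense vegetation on the left half, bare soil on the right.
field = np.zeros((256, 256))
field[:, :128] = 0.8
tiles = extract_tiles(field)
print(len(tiles))  # only the two fully vegetated left-hand tiles survive
```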
2.2.2. Correlation between CF and NDVI
The Normalized Difference Vegetation Index (NDVI) is an arithmetic imagery indicator computed from images acquired in two spectral wavelengths (red, 668 nm, and near-infrared, 842 nm), as expressed in Equation (1):

$$\mathrm{NDVI} = \frac{R_{842} - R_{668}}{R_{842} + R_{668}} \qquad (1)$$

where $R_{842}$ and $R_{668}$ are the reflectances in the near-infrared and red bands, respectively. This index is widely used for monitoring vegetation vigor and plant health from satellite and UAV imagery. NDVI values range from 0 to 1, with 0 representing no vegetation presence and values of 0.8 and above representing maximum vegetation cover.
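A minimal sketch of the per-pixel NDVI computation on calibrated reflectance bands (the array values below are illustrative, not taken from the dataset):

```python
import numpy as np

def ndvi(red, nir, eps=1e-9):
    """Per-pixel NDVI from calibrated red (668 nm) and NIR (842 nm) reflectance.
    eps avoids division by zero over dark pixels."""
    red = np.asarray(red, dtype=float)
    nir = np.asarray(nir, dtype=float)
    return (nir - red) / (nir + red + eps)

red = np.array([[0.05, 0.30], [0.08, 0.25]])  # healthy vegetation reflects little red
nir = np.array([[0.60, 0.35], [0.55, 0.28]])  # and a lot of near-infrared
v = ndvi(red, nir)
print(v.round(2))
```

The top-left pixel (low red, high NIR) yields a high NDVI typical of dense vegetation, while the right column mimics drying plants or soil.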
As for the seed CF samples, the NDVI evolution in the control zones was monitored until the field harvest. The decrease in NDVI at different time steps can be observed in Figure 3, right column, and in Figure 4, which represents the index values as a heat map. We observed the decrease of the index values as the field seeds were maturing. To correlate the CF evolution with the NDVI [22], we computed the mean index value for each control zone. The NDVI during the maturation period showed a decreasing trend like the CF, with similar variations at given time steps, as depicted side by side in Figure 3. The NDVI amplitudes at the beginning of the study period varied depending on the crop variety. The NDVI and the CF both decreased over the seed filling period, as the parsley crop produces its seeds near the end of its life cycle: as the seeds mature, the parsley crops dry, and their photosynthetic activity decreases.
The ground truths of the CF and NDVI dimensions presented similar behavior when considering individual fields. This similarity did not necessarily hold across field types or varieties. Assuming a direct correlation between CF and NDVI would lead to a low-quality estimate, as the Pearson coefficient was centered on 0.7812 when considering each field separately and dropped to 0.6460 when calculated over the four fields studied; it would probably decrease further if more plant varieties were added. To allow for a stronger correlation, we introduced the date as an extra dimension. As can be seen in the correlation matrix transcribed in Table 2, the couple (Date, CF) had a negative Pearson coefficient of −0.6913 and the couple (Date, NDVI) a value of −0.5267 when using all samples from the four fields. A second correlation matrix, given in Table 3, was calculated using the Spearman coefficient, a non-parametric measure of rank correlation, whereas the Pearson coefficient is a parametric measure of linear correlation between variables. Both the Pearson and Spearman correlation matrices output similar results. Since the coefficients of the different dimensions are substantial, their combination allows a better modeling of the distribution of the data represented by the samples.
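The kind of correlation analysis described here can be reproduced with `scipy.stats` on synthetic stand-in data; the decay rates and noise levels below are illustrative, not the paper's measurements:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
# Synthetic stand-ins for the (Date, NDVI, CF) ground truth samples:
# both NDVI and CF decay over the season, so Date correlates negatively with each.
date = np.arange(40, dtype=float)
ndvi = 0.9 - 0.010 * date + rng.normal(0, 0.02, 40)
cf = 8000 - 120 * date + rng.normal(0, 300, 40)

r, _ = stats.pearsonr(date, cf)       # parametric, linear correlation
rho, _ = stats.spearmanr(date, cf)    # non-parametric, rank correlation
print(round(r, 3), round(rho, 3))
```

Both coefficients come out strongly negative here, mirroring the sign (though not the magnitude) of the (Date, CF) entry reported in Table 2.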
Under the assumption of correlation between the target CF and the variables (Date, NDVI), we fit the ground truth samples with generative models that were used for the data labeling phase. The following section describes the parametric and non-parametric models used in this process.
2.3. Labeling Based on Generative Models
Generative models are usually combined with neural networks [53,54] for prediction or for the optimization of model hyperparameters [55], but not directly for data labeling. Let us now describe the considered parametric and non-parametric methods and how their respective parameters were fit to our acquired ground truth dataset of (Date, NDVI, CF) using the Python scikit-learn machine learning library.
2.3.1. Gaussian Mixture Model
The Gaussian mixture model (GMM) is a parametric method combining $C$ Gaussians, where each Gaussian clusters a subdivision of the data. The Gaussian fitting was performed by the iterative expectation maximization algorithm. Combining multiple Gaussians enables a better characterization of the data compared to a single Gaussian, as each subdistribution is locally approximated. When data are fit with Gaussians, soft clustering is performed, since each prediction is quantified by a probability and not by a continuous target value. The probability density function of a multivariate GMM is given by Equation (2):

$$p(x) = \sum_{i=1}^{C} \phi_i \, \mathcal{N}(x \mid \mu_i, \Sigma_i) \qquad (2)$$

where $\mathcal{N}(x \mid \mu_i, \Sigma_i)$ is a multivariate Gaussian defined in Equation (3):

$$\mathcal{N}(x \mid \mu_i, \Sigma_i) = \frac{1}{\sqrt{(2\pi)^d \, |\Sigma_i|}} \exp\!\left(-\frac{1}{2}(x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i)\right) \qquad (3)$$

where $x \in \mathbb{R}^d$ is a data vector and $C$ is the number of components. The $i$th component's parameters are the mean $\mu_i$ and the covariance matrix $\Sigma_i$. The mixture component weights are defined as $\phi_i$, with the constraint that $\sum_{i=1}^{C} \phi_i = 1$, so that the total probability distribution is normalized to 1. The covariance matrix type used was full, meaning each component has its own general covariance matrix and can independently adopt any shape. The means and weights were initialized with k-means clustering.
Selecting the appropriate number of Gaussian components was performed by measuring the Bayesian information criterion (BIC) and the Akaike information criterion (AIC) for multiple numbers of components. The BIC, given in Equation (4), weighs the likelihood against the number of parameters used, to determine whether the likelihood gain is sufficient to justify the number of parameters:

$$\mathrm{BIC} = k \ln(n) - 2 \ln(\hat{L}) \qquad (4)$$

where $\hat{L}$ is the maximum value of the likelihood, $n$ is the number of data points, and $k$ is the number of estimated parameters, which for the GMM are the mean vectors, covariance matrices, and mixture weights. Better-performing models reduce the BIC indicator. In addition, the AIC evaluates how well a model fits the data it was generated from. The best models according to the AIC are those representing the highest variations while using the fewest independent variables. The AIC is expressed in Equation (5), with parameters similar to those in Equation (4):

$$\mathrm{AIC} = 2k - 2 \ln(\hat{L}) \qquad (5)$$

The advantage of these probabilistic model scores is that they do not require test data: they can be evaluated on all samples and handle small datasets.
The selected number of Gaussians for the GMM was 3, which presented a good trade-off between the BIC and the AIC. The strict minimum of each indicator was not retained, as the BIC tends to select overly simple models and, conversely, the AIC overly complex ones. The values of the probabilistic indicators are summarized in Table 4 for a number of Gaussians ranging from 1 to 5.
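The component selection procedure can be sketched with scikit-learn's `GaussianMixture` on synthetic three-cluster data; the cluster locations and sample sizes are our assumptions, not the paper's dataset:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(42)
# Synthetic 3-cluster stand-in for the rescaled (Date, NDVI, CF) samples.
X = np.vstack([rng.normal(m, 0.05, size=(60, 3)) for m in (0.2, 0.5, 0.8)])

scores = {}
for c in range(1, 6):
    gmm = GaussianMixture(n_components=c, covariance_type="full",
                          init_params="kmeans", random_state=0).fit(X)
    scores[c] = (gmm.bic(X), gmm.aic(X))  # lower is better for both criteria

best_bic = min(scores, key=lambda c: scores[c][0])
print(best_bic)
```

On well-separated clusters, the 3-component model lowers both criteria sharply compared to 1 or 2 components, reproducing the kind of trade-off summarized in Table 4.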
2.3.2. K-Nearest Neighbors
KNN is a non-parametric model that predicts a continuous target by averaging the target values of the $k$ nearest neighbors. The distance function used for fitting the KNN model was the Euclidean distance, given in Equation (6), as it is widely used and well adapted to measuring distances between continuous variables:

$$d(p, q) = \sqrt{\sum_{j=1}^{d} (p_j - q_j)^2} \qquad (6)$$

where $p$ and $q$ are data points and $k$ is the number of nearest neighbors. To improve performance, the data dimensions were rescaled between 0 and 1, which prevents biasing the Euclidean distance measures. As the influence of the $k$ value is high, the optimal value was selected to minimize the regression error on the test data split. The error function used to evaluate the performance of the regression was the root-mean-squared error (RMSE). This metric was chosen as it keeps the error in the units of the variable of interest. Furthermore, the error is squared before being averaged, which penalizes larger errors. Equation (7) expresses the RMSE:

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - Y_i)^2} \qquad (7)$$

with $n$ being the number of data samples, $y$ the predicted target, and $Y$ the true label of the target. Overall, the KNN regression method is well suited to low-dimensional datasets, but loses its practicality as the number of features increases. The optimal value of $k$ was retained by performing a grid search cross-validation across the data for multiple values of $k$. This was performed in order to minimize the prediction error and to limit the bias induced by the data splits during KNN model fitting.
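A compact sketch of the grid search cross-validation over $k$ with scikit-learn, on synthetic stand-ins for the rescaled (Date, NDVI) inputs and CF target (the data generator is illustrative):

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(1)
# Synthetic rescaled (Date, NDVI) inputs and a CF-like target in [0, 1].
X = rng.uniform(0, 1, size=(200, 2))
y = 1.0 - 0.6 * X[:, 0] - 0.3 * X[:, 1] + rng.normal(0, 0.02, 200)

# Cross-validated grid search over k, minimizing the RMSE of Equation (7).
search = GridSearchCV(KNeighborsRegressor(metric="euclidean"),
                      {"n_neighbors": range(1, 16)},
                      scoring="neg_root_mean_squared_error", cv=5)
search.fit(X, y)
pred = search.predict(X)
rmse = np.sqrt(np.mean((pred - y) ** 2))
print(search.best_params_["n_neighbors"], round(rmse, 3))
```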
2.3.3. Kernel Density Estimator
The kernel density estimator (KDE) is a probability density function estimator for random variables. For each dataset point, it evaluates its probability of belonging to a hypercube. The number of points inside the hypercube is given by Equation (8):

$$k_N = \sum_{n=1}^{N} \varphi\!\left(\frac{x - x_n}{h}\right) \qquad (8)$$

where $\varphi$ is the window function determining whether a dataset entry belongs to the hypercube or not. Knowing the points present inside the hypercube, we can estimate the probability density function of the dataset using Equation (9):

$$p(x) = \frac{1}{N} \sum_{n=1}^{N} \frac{1}{h^d} \, \varphi\!\left(\frac{x - x_n}{h}\right) \qquad (9)$$

with $N$ being the total number of samples, $x$ the center of the hypercube, $x_n$ the $n$th data sample, and $h$ the bandwidth of the hypercube. The parameter $h$ has a strong influence on the resulting estimate and must be adapted along each data dimension if the data ranges vary; therefore, we rescaled all our data to the range 0–1. The kernel type used was Gaussian, as it is a smoother function than other kernel types and better suited to our observations. For the parsley maturity application, we fit the KDE to our three-dimensional ground truth dataset composed of (Date, NDVI, CF) to estimate the resulting probability density function of the distribution. The optimal bandwidth, shared across dimensions, was selected by performing a grid search over candidate values of $h$ and by scoring the KDE fit on unseen ground truth portions of the dataset. The KDE scoring was performed by computing the log-likelihood of the test folds during cross-validation on the ground truth samples.
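The bandwidth grid search can be sketched with scikit-learn's `KernelDensity` and `GridSearchCV`; the synthetic data and the candidate bandwidth range below are illustrative, not the values used in the study:

```python
import numpy as np
from sklearn.neighbors import KernelDensity
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(7)
# Synthetic rescaled (Date, NDVI, CF) ground truth samples in [0, 1]^3.
X = rng.normal(0.5, 0.1, size=(128, 3)).clip(0, 1)

# GridSearchCV scores KernelDensity by the total log-likelihood of the
# held-out folds, matching the cross-validated scoring described above.
grid = GridSearchCV(KernelDensity(kernel="gaussian"),
                    {"bandwidth": np.linspace(0.02, 0.30, 15)}, cv=5)
grid.fit(X)
best_h = grid.best_params_["bandwidth"]
print(round(best_h, 2))
```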
2.4. Weak Data Labeling
As previously mentioned, we aimed to combine the generative models with the deep learning approach in order to improve both the prediction performance and generalization capabilities of deep networks for CF estimation. The originality of the proposed method lies in the use of generative models to provide weak CF labels to additional multispectral images, for which no ground truth data were collected. Generative labeling introduces bias in the data distribution. This inaccuracy allows for a wider range of potential pairs of CF and multispectral images to be covered, thus enabling a better representation of natural fluctuations in the fields. In addition, labeling enough samples of the dataset permits the use of a deep learning approach. The neural network will extract additional features from multispectral images that are not considered during the generative fitting.
It can be observed in Figure 5 that the distribution of the generated weak labels varied between the methods, specifically for the KDE model. The labels generated by the GMM and KNN appear compact around the ground truths depicted in red. These distributions therefore consider less heterogeneous variations and focus on average variations, whereas the labels from the KDE span a larger bandwidth of potential data and include a variety of linked heterogeneous ground truths from the other fields. The KDE labeling was thus less restricted to the ground truth samples of one field and better suited to generating labels for different fields.
By fixing the variables (Date, NDVI) of the generative model, we can extract from the KDE a 1D histogram with the possible values of the CF (i.e., the NDVI variable consists of the image mean NDVI). We randomly picked a CF value from the 1D histogram and added the variance to this value in order to extend the range of CF, thus taking into account fluctuations and unseen data. The CF obtained was associated with the sample image to constitute its label. This procedure was applied for all unlabeled NDVI images. With these labeled data, we trained different neural networks in order to improve the CF prediction from the NDVI images.
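This labeling step (fixing (Date, NDVI), extracting a 1D CF histogram from the fitted KDE, sampling a value, and adding variance) can be sketched as follows; the synthetic data, the bandwidth, and the spread parameter are illustrative assumptions:

```python
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(3)
# Synthetic rescaled (Date, NDVI, CF) ground truth; all three decay together.
date = rng.uniform(0, 1, 150)
ndvi = 1.0 - 0.7 * date + rng.normal(0, 0.03, 150)
cf = 1.0 - 0.8 * date + rng.normal(0, 0.05, 150)
kde = KernelDensity(kernel="gaussian", bandwidth=0.05)
kde.fit(np.column_stack([date, ndvi, cf]))

def weak_cf_label(d, v, spread=0.05, grid=np.linspace(0, 1, 200)):
    """Slice a 1D CF histogram at fixed (Date, NDVI), draw a CF value from it,
    and perturb it to account for unseen fluctuations (spread is illustrative)."""
    pts = np.column_stack([np.full_like(grid, d), np.full_like(grid, v), grid])
    p = np.exp(kde.score_samples(pts))  # unnormalized conditional density over CF
    p /= p.sum()
    return rng.choice(grid, p=p) + rng.normal(0, spread)

label = weak_cf_label(0.2, 0.8)
print(round(label, 3))
```

Each unlabeled tile would then receive such a sampled CF value as its weak label.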
To prevent overfitting during the association of the CF labels, the ground truths of the field being labeled were excluded from the data used for the generative model training. As can be seen in Figure 6, middle plot, the generated CF values followed a similar trend to the ground truth from which they were generated (left plot), but were shifted towards lower values, which is logical as the acquisition duration for Field A lasted longer. The date boxplots for the generated labels and the ground truth of Field A (right plot) are identical, as the ground truths were harvested at each acquisition date. The generated CF labels covered a wider range than the ground truth, as we added uncertainty when associating the labels.
2.5. Deep Neural Networks for CF Estimation
To build a deep learning model for CF estimation, we chose two types of popular architectures. The first one was based on convolutional networks (CNNs). In this scope, we opted for popular well-performing architectures, namely ResNet and EfficientNet. However, they do not take into account the temporal evolution between successive observations. For this purpose, we used a second type of architecture, based on recurrent neural networks (RNNs), specifically long short-term memory (LSTM) cells.
The neural networks required 3-channel image inputs, as the models had weights pre-trained on the ImageNet dataset. Using pre-trained weights for fine-tuning CNN architectures yields better results than random initialization. Therefore, the NDVI tile data were stored in the first channel, and instead of duplicating the data over the 3 image channels, the values of the date and the mean image NDVI were added. Providing these data combined with the NDVI image to the neural networks standardizes the inputs with respect to the generative models. The models were trained on an Nvidia 2080 TI GPU and built with the TensorFlow 2.4.1 and Keras 2.4.3 frameworks using Python 3.6.9. The neural models were trained with an initial learning rate of 1 × 10 with a reduction factor of 0.1 when reaching a plateau and an early stopping criterion of 5 epochs. Only random rotations were performed by the data loader as data augmentation, as the quantity of images and the variations due to external factors were already high.
2.5.1. ResNet
Also called the deep residual network, this architecture is based on residual blocks, which implement skip connections between layers, as illustrated in Figure 7. With $x$ the input from the previous block, $F(x)$ the output of the weight layers, and $H(x)$ the output of each block using the skip connection, the model minimizes the residual function during training, as described in Equation (10):

$$F(x) = H(x) - x \qquad (10)$$

The ResNet architecture showed very good performance in the ImageNet and COCO 2015 competitions. It is implemented in varying depths ranging from 18 to 152 layers. We selected the ResNet-50 version because it incorporates 3-layer residual blocks, which perform better than the 2-layer residual blocks used in ResNet-18 and ResNet-34.
2.5.2. EfficientNet
As the name of the architecture suggests, the EfficientNet family consists of highly parameter-optimized neural networks. They provide an increased accuracy-per-parameter ratio and training efficiency. With only 5.3 M parameters for EfficientNetB0, compared to 26 M parameters for ResNet-50, the results are slightly better than ResNet-50 on the ImageNet dataset. The network scaling was formulated as the compound optimization problem described in Equation (11), where depth, width, and resolution scaling are performed jointly:

$$\text{depth} = \alpha^{\phi}, \quad \text{width} = \beta^{\phi}, \quad \text{resolution} = \gamma^{\phi}, \quad \text{s.t. } \alpha \cdot \beta^{2} \cdot \gamma^{2} \approx 2, \ \alpha, \beta, \gamma \geq 1 \qquad (11)$$

where $\alpha$, $\beta$, and $\gamma$ are constants to be determined and $\phi$ a coefficient defined by the user.
2.5.3. LSTM
Long short-term memory neural networks are RNNs using LSTM cells. This type of architecture is well suited to sequential data modeling, as it considers long-term dependencies between observations and implements a forget gate mechanism for discarding irrelevant features. The detailed principles of the LSTM cell's inner architecture are illustrated in Figure 8 and Equation (12):

$$\begin{aligned} f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) \\ i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) \\ o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) \\ \tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) \\ c_t &= f_t \circ c_{t-1} + i_t \circ \tilde{c}_t \\ h_t &= o_t \circ \tanh(c_t) \end{aligned} \qquad (12)$$

where $W$ and $U$ denote weight terms, $b$ a bias term, and $x_t$ the $t$th observation of the input sequence. The next hidden state and the previous hidden state are expressed by $h_t$ and $h_{t-1}$, respectively. $c_t$ and $c_{t-1}$ are the states of the next cell and the previous one, respectively. $\sigma$ is the nonlinear sigmoid activation function. The operator $\circ$ denotes the elementwise product.
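For concreteness, here is a minimal NumPy implementation of one LSTM cell step following Equation (12); the dimensions and random weights are illustrative, not the trained model's:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, U, b):
    """One LSTM cell step: forget, input, and output gates, candidate cell
    state, then the new cell and hidden states (Equation (12))."""
    f = sigmoid(W["f"] @ x_t + U["f"] @ h_prev + b["f"])
    i = sigmoid(W["i"] @ x_t + U["i"] @ h_prev + b["i"])
    o = sigmoid(W["o"] @ x_t + U["o"] @ h_prev + b["o"])
    c_tilde = np.tanh(W["c"] @ x_t + U["c"] @ h_prev + b["c"])
    c_t = f * c_prev + i * c_tilde   # elementwise products
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
d_in, d_h = 4, 3  # feature size of one observation, hidden size (illustrative)
W = {g: rng.normal(0, 0.1, (d_h, d_in)) for g in "fioc"}
U = {g: rng.normal(0, 0.1, (d_h, d_h)) for g in "fioc"}
b = {g: np.zeros(d_h) for g in "fioc"}

h = c = np.zeros(d_h)
for t in range(4):  # a 4-observation sequence, as used in the paper
    h, c = lstm_step(rng.normal(size=d_in), h, c, W, U, b)
print(h.shape)
```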
As crop maturity progresses over the studied period, treating past and present observations of the same subzone as dependent enables a CF prediction that takes into account the evolution of the crops over time. Thus, bidirectional LSTM cells were used, with a 4-observation input sequence and a 4-observation output to quantify each time observation. Since the LSTM input is a sequence of vectors, a time-distributed CNN head was used as a feature extractor on the NDVI images to transform them into the LSTM input format.
4. Discussion
This study aimed to analyze the possibility of generating extra labeled data from generative models based on a few ground truth samples. Developing such approaches is necessary to enable the creation of datasets large enough for neural network applications, especially in the agricultural sector. The economic cost and time required to annotate UAV data are high and rarely comprehensive due to continuously changing external factors. To overcome these limitations, as depicted previously, parametric and non-parametric methods were used to fit the ground truths for weak labeling. They were fit on the data components (Date, NDVI, CF), with CF as the desired output for chlorophyll concentration quantification. The scope of this study was limited to these components as they were the most representative for modeling parsley plant variations, as shown in Figure 3.
Given the small amount of ground truth data, the GMM, KNN, and KDE models performed correctly, with an RMSE varying from 0.2059 to 0.0891 (i.e., from 2059 to 891 CF). We nevertheless needed to take this estimation a step further, for multiple reasons: the generative models only considered the mean NDVI of the images, and the ground truths were only a few samples from each field. Incorporating a neural network enabled additional features to be extracted from the multispectral images, improving the CF estimation and potentially correcting for manual experimental ground truth sampling errors. Increasing the input image size from 32 × 32 pixels to 128 × 128 pixels also improved the results for the LSTM, because instead of having a single plant per image, we had an overview of several plants in a 12 m² area, which better matched the size of the monitored zones. We also took into account the temporal variation of the observations by feeding the recurrent LSTM network with four successive observations of the same zone. This was performed in order to better address the CF estimation by introducing a factor of vegetation evolution in time. The dataset being distributed in time, the recurrent neural network models performed better than the CNN models for all folds. An input sequence of four observations was used because Field C was only photographed four times before harvest. Longer input sequences could further improve the CF estimation.
From an agronomic point of view, based on company field experts and their CF sampling history, an estimation error below 0.1 (i.e., 1000 CF) is equivalent to a 3–4 day variation depending on weather conditions, which is sufficient to highlight an optimal harvest date for the farmer. The CF estimation was only performed for past and present UAV orthorectified maps, as predicting future evolutions would require combining the current models with connected weather stations.
Finally, in this study, we showed that large amounts of unlabeled UAV aerial images can be labeled based on parametric and non-parametric models in order to improve CF estimation and to help neural network predictions generalize to unseen datasets. Some limitations can be highlighted for future improvement: the acquired data only covered one harvest season and could be subject to weather and/or soil type variations. Acquiring aerial images at different heights could also be interesting, as parsley types and varieties may have different leaf shapes and reflect light differently. In addition, more vegetation indexes could be combined with the NDVI, such as the Normalized Difference Red-Edge (NDRE), which is computed from different wavelengths and used for crop nitrogen monitoring.