**1. Introduction**

Farmers use soil plant analysis development (SPAD) devices as a field diagnostic tool to estimate plant nitrogen (N) content and to predict grain yield [1,2]. In rice, N provides critical insight into plant growth, crop yield, and biomass accumulation [3]. Monitoring N also allows farmers to adapt crop management practices appropriately [4,5]. However, crop diagnosis with SPAD devices remains time consuming. With the advent of low-cost unmanned aerial vehicles (UAVs), several authors have reported faster and more accurate remote sensing tools and methods [6,7] for estimating canopy N from aerial multispectral imagery [8,9]. One of the most common methods relies on sensing canopy reflectance in both visible (VIS) and near-infrared (NIR) wavelengths using hyperspectral sensors [10–13].

Multiple vegetation indices (VIs) have been established to associate specific spectral reflectance wavelengths with crop variables of interest [14,15]. Although these VIs can be used to estimate N content in rice plants, no single standard set of VIs works across all crop stages, rice varieties, soils, and atmospheric conditions. Most of the existing work on estimating N content in plants has relied on multi-variable regressions [5,16], in which linear functions are defined by combining and weighting each VI according to the regression model. This approach is simple but sometimes inaccurate, since crop development exhibits highly nonlinear effects that alter the N content over time. Machine and deep learning methods have recently gained traction in addressing these issues, and machine learning has the potential to drive significant breakthroughs in plant phenotyping [17–20].

For canopy N estimation, training machine learning algorithms requires the precise extraction of VI features from aerial multispectral imagery. Several authors have applied data fusion methods combining different sensors [21–23], image mosaicing techniques [24–27], crop surface models based on dense image matching (DIM) [28], or photogrammetry methods that are computationally expensive [26,27,29,30]. Other approaches rely on real-time segmentation and image registration to extract features associated with leaf/canopy N, including edge detection, thresholding, color histograms, and clustering (Otsu, K-means, watershed) [31–33].

In this work, we improve the stabilization control of the UAV to acquire accurate plot imagery, instead of relying on mosaicing or photogrammetry methods. Commercial UAV quadrotors used for canopy N monitoring usually ship with a proportional-integral-derivative (PID) waypoint navigation autopilot. The lack of proper angular stabilization in the presence of large wind perturbations limits the accuracy of image registration algorithms and therefore degrades the estimation of canopy N content through image processing. To address this problem, we demonstrate a nonlinear trajectory-tracking controller that enables precise UAV positioning through a stabilization/attitude control loop. We call this method backstepping + desired angular acceleration function (BS + DAF). The approach incorporates UAV aerodynamics information within the control law to produce roll and pitch acceleration commands that reduce abrupt angular acceleration changes caused by external wind disturbances. This added compliance enables the UAV to hover precisely over the crop plots, which in turn allows imagery to be captured and processed individually in real time.

Here, we combine an autonomous UAV that acquires multispectral imagery of the crop with machine learning methods for high-throughput nitrogen content estimation. Three major contributions are involved:


(i) A UAV stabilization control scheme (BS + DAF) that enables precise multispectral image acquisition over the crop plots, even in the presence of strong wind perturbations.

(ii) A fully automatic multispectral image segmentation method, based on GrabCut with guided-filter refinement, for the extraction of canopy VI features.

(iii) The integration of machine learning methods to capture the nonlinear N dynamics from the VI calculations during all stages of crop growth, together with a comprehensive characterization of the crop through a ground-truth dataset with several contrasting rice genotypes and accurate direct measurements of leaf nitrogen (training model).

#### **2. Materials and Methods**

#### *2.1. System*

Figure 1 details the proposed approach. Our UAV is a quadrotor, the AscTec Pelican (manufactured by Intel's Ascending Technologies (https://robots.ros.org/astec-pelican-and-hummingbird/)). This UAV comes with a C++ software development kit (SDK) that simplifies onboard code development. A high-level Intel Atom processor (HLP) offers enough computing power to run our methods in real time, whereas a low-level processor (LLP) handles sensor data fusion and rotor control with an update rate of 1 kHz. As shown in Figure 1, we developed a closed-loop attitude controller to drive the low-level rotor control running on the LLP. This control method depends on the dynamics of the UAV to properly reject wind disturbances and keep the UAV steady. During flight, a dataset of multispectral imagery is precisely collected, aiming at the above-ground estimation of nitrogen using machine learning methods. In previous work [35], our UAV system was tested during two years of in-field experiments aimed at estimating above-ground biomass accumulation. Here, we extend the capabilities of the system in Figure 1 by developing and integrating novel methods for UAV flight control and multispectral imagery segmentation for VI feature extraction, and by evaluating their impact on nitrogen estimation using machine learning algorithms.

**Figure 1.** Unmanned aerial vehicle (UAV)-based robotic system for canopy nitrogen estimation in rice crops.

The UAV was equipped with the Parrot Sequoia multispectral sensor (https://www.parrot.com/business-solutions-us/parrot-professional/parrot-sequoia), which captures four separate spectral bands: green, red, red-edge, and near-infrared. The camera offers a resolution of 1280 × 960 pixels for each independent spectral sensor, yielding a crop-to-image resolution of 1.33 cm/pixel at the flying altitude of 20 m. In addition, the multispectral camera comes with a radiometric calibration target that enables offline reflectance calibration across the spectrum of light captured by the camera, and an onboard irradiance sensor designed to correct images for illumination differences, all of which enables its outstanding performance in cloudy conditions, as evaluated in [36]. In our application, this calibration procedure was executed prior to any flight mission of the UAV. Additional sensors such as an IMU, magnetometer, and GPS are also embedded within the sunshine (irradiance) sensor. Figure 2a depicts the mounted camera setup, while Table 1 details the general specifications of our system.


**Table 1.** General specifications.

<sup>1</sup> Differential Global Positioning System (DGPS) Piksi module: https://www.swiftnav.com/piksi-multi. <sup>2</sup> Ground Sampling Distance (GSD) calculator: https://support.pix4d.com/hc/en-us/articles/202560249-TOOLS-GSD-calculator#gsctab=0. <sup>3</sup> International Center for Tropical Agriculture (CIAT) website: https://ciat.cgiar.org. <sup>4</sup> Weather information: https://weatherspark.com/y/24273/Average-Weather-in-Villavicencio-Colombia-Year-Round.

**Figure 2.** Crop field: (**a**,**b**) each plot was designed with an area of 5.7 m<sup>2</sup> and a rice crop density of 38.4 kg ha<sup>−1</sup>. The UAV is programmed to capture multispectral imagery of the plots of interest using a Sequoia camera manufactured by Parrot. (**c**) The results reported in this work were obtained during three stages of rice growth: vegetative, reproductive, and ripening, with an entire phenological cycle ranging between 86 and 101 days.

#### *2.2. Rice Crops*

The crops were designed with 3 spatial repetitions containing 8 rice genotypes that contrast in terms of N concentration, biomass accumulation, and flowering: FED50, MG2, ELWEE, NORUNKAN, IR64, AZUCENA, UPLRI7, and LINE 23. These rice varieties have a phenological cycle ranging between 86 and 101 days, as detailed in Figure 3b. The experimental design consisted of six months of in-field sampling across three different plantings. For each experiment, the amount of applied nitrogen and water was modified as follows:


Experimental trials were conducted during the dry season. To assess the effects of limited and permanent irrigation on the crops, leaf temperature (MultispeQ (https://www.photosynq.com)) and soil humidity (AquaPro (http://aquaprosensors.com)) were constantly monitored. Fertilizers were applied accordingly in order to maintain the crops through the phenological cycle. The same amounts of fertilizer were applied in all experiments: 60 kg ha<sup>−1</sup> of P<sub>2</sub>O<sub>5</sub> (diammonium phosphate) and 130 kg ha<sup>−1</sup> of K<sub>2</sub>O (potassium chloride), as detailed in the experimental protocol available in the Supplementary Materials section.

Figure 2 details some characteristics of the crop. As shown in plots (a) and (b), each plot was about 2.75 m long, with 25 cm between plants and 30 cm between rows. Within each plot, we defined 6 sampled areas of 1 linear meter in length (each containing four plants). A ground-truth was defined based on direct measurements of plant chlorophyll using a SPAD 502 Plus meter (Konica Minolta) over these sampled areas. Measurements from the crop were obtained during three stages of rice growth: vegetative, reproductive, and ripening. Figure 3b details the specifications of the crop experiments reported here.

**Figure 3.** (**a**) Correlation between soil plant analysis development (SPAD) readings and leaf N concentration [37]. (**b**) Experimental specifications.

Table 2 details how the dataset was collected; the SPAD value corresponds to the average of 24 measurements conducted over each plot, as depicted in Figure 3b. All ground-truth data are available through the Open Science Framework in the Supplementary Materials section. Figure 3a shows the linear correlation used to relate the measured SPAD value to the leaf-blade N concentration [37].

As mentioned, vegetation indices are widely used to quantify both plant and soil variables by associating spectral reflectances that are highly related to variations in canopy chemical components such as nitrogen. From the extensive list of vegetation indices (VIs) available [38–41], we selected 7 VIs with sufficient experimental evidence and quantitative trait loci (QTL)-based characterization regarding their high correlation with rice nitrogen [42]. Table 3 details the selected VIs. These formulas are applied at different wavelengths to account for changes in canopy color, since several factors affect the spectral reflectance of the crop: solar radiation, plant morphology and color, leaf angles, undergrowth, soil characteristics, and water.

In our system, the Parrot Sequoia camera has a solar radiation sensor that compensates for light variations over the canopy. The change in rice canopy color is perhaps the most notable variation during the phenological cycle. In the vegetative stage, green is predominant, whereas in the reproductive stage panicle formation starts and yellow features appear in the images. During ripening, the plants mature and the leaves begin to senesce, making yellow the predominant color. These changes can be observed in Figure 2c. Furthermore, different wavelengths of light are absorbed by the plant to different degrees depending on leaf composition, which is governed by several genetic traits. In particular, the relation between the selected VIs in Table 3 and the photosynthetic activity and canopy structural properties has allowed the association of spectral reflectances that are highly related to the physico-chemical canopy N variations in plants, especially in the green, red, red-edge, and near-infrared bands.


**Table 2.** An example of a ground-truth dataset. The crop field was designed with three spatial repetitions (Rep) containing 8 contrasting rice genotypes.

**Table 3.** Selected near-infrared vegetation indices (extracted features). The *ρ<sup>f</sup>* term denotes the reflectance at frequency *f*.


The selected VIs exhibit a strong dependence on the NIR reflectance due to leaf chlorophyll absorption, providing an accurate approach to determine the health status of the plants and the canopy N. Most of the existing research on multispectral-based N estimation [14,21,40,42,43] combines the information provided by several vegetation indices in order to avoid saturation issues. For instance, the NDVI, one of the most widely used VIs, tends to saturate under dense vegetation; consequently, the NDVI alone is not accurate during the reproductive and ripening stages of rice growth. Here, we combine several VIs across the crop stages to ensure data on wavelengths located in the red-edge and other spectral reflectances that accurately express the health status of the leaves (higher NIR and green-band readings).
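To make the feature-extraction step concrete, the sketch below computes four of the indices discussed in Section 2.5 (SR, NDVI, GNDVI, MSAVI) from reflectance-calibrated band arrays. It uses the standard textbook definitions, which may differ slightly from the exact formulas listed in Table 3; the small `eps` guard is an implementation assumption.

```python
import numpy as np

def vegetation_indices(green, red, nir):
    """Compute four commonly used VIs from reflectance-calibrated band arrays.

    Standard definitions are used here; Table 3 lists the exact formulas
    adopted in this work. Inputs are float arrays of identical shape.
    """
    eps = 1e-6  # guards against division by zero over dark/background pixels
    sr    = nir / (red + eps)                        # simple ratio
    ndvi  = (nir - red) / (nir + red + eps)          # normalized difference VI
    gndvi = (nir - green) / (nir + green + eps)      # green NDVI
    msavi = 0.5 * (2.0 * nir + 1.0
                   - np.sqrt((2.0 * nir + 1.0) ** 2 - 8.0 * (nir - red)))
    return {"SR": sr, "NDVI": ndvi, "GNDVI": gndvi, "MSAVI": msavi}
```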

#### *2.3. UAV Stabilization Control*

The estimation of canopy N requires the precise extraction of vegetation index features from the acquired multispectral imagery. It is therefore essential to capture aerial images during UAV hovering that accurately match the GPS positions of the SPAD measurements at ground level.

As previously detailed in Figure 1, three closed-loop controllers are needed to regulate: (i) the *X*-*Y* position based on GPS feedback, (ii) the *Z* altitude based on barometric pressure and laser readings (pointing downwards), and (iii) the *φ*, *θ*, *ψ* attitude based on IMU data. The UAV is constantly subjected to wind disturbances that cause unsteady angular motions and therefore imprecise trajectory tracking. Consequently, the aerial imagery captured across the crop is affected. To overcome this issue, we replaced both the roll and pitch PID-based controllers with a robust nonlinear backstepping (BS) control.

The classical BS method has several advantages: it explicitly takes into account the nonlinearities of the UAV model defined in Equation (A2) (see Appendix A), and most importantly, it allows the incorporation of a virtual control law to regulate angular accelerations. For this, we derived a desired acceleration function (DAF) for roll and pitch. This enhanced controller is called backstepping + DAF (BS + DAF). Our goal is to use the dynamic equations of motion (EoM) defined in Algorithm A1 within the control law in order to compensate for abrupt angular acceleration changes, specifically in roll and pitch. The DAF terms improve the responsiveness of the control law to external perturbative forces. Appendix B details the control law derivation for roll and pitch.

The BS + DAF control relies on the Lyapunov stability concept, which guarantees asymptotic stabilization around the equilibrium points. For our application, we require both the roll *φ* and pitch *θ* angles to remain at zero while the UAV hovers above the crop to capture multispectral imagery, i.e., *e*<sub>*φ*</sub> = *φ*<sup>*d*</sup> − *φ* → 0 and *e*<sub>*θ*</sub> = *θ*<sup>*d*</sup> − *θ* → 0. Otherwise, the set-point references for both roll and pitch controllers are defined by the *X*-*Y* position controller.

As previously mentioned, high wind-speed perturbations affect the UAV during hovering. Nonetheless, our control law is sensitive to angular acceleration changes caused by external wind disturbances, thanks to both the error dynamics (*e*<sub>2</sub>) and the DAF terms (*φ̈*<sup>*d*</sup>) in Equation (A18) (Appendix B). Given that, we show how to model the effects of the wind on the UAV aerodynamics by following the wind effect model (WEM) developed in [44]. Figure 4 details the control architecture and how the WEM module has been incorporated as an external disturbance acting on the UAV body frame.

**Figure 4.** Attitude (roll and pitch) control architecture. The UAV has a low-level PID-based controller to drive each rotor. The proposed backstepping+DAF generates the references to the inner loop according to the dynamics of the system. A wind effect model has been adopted from [44] to add disturbances to our model.

The WEM defines the relationship between the oblique airflow acting on a propeller and the resulting forces and moments applied to the UAV. In our quadrotor system, the oblique airflow forms an angle of 90° with the propeller azimuth angle. In terms of aerodynamics, the incoming airflow induces an increase in vertical thrust (*T<sub>oi</sub>*), a side force (F<sub>*wind*</sub>), and a pitching moment (M<sub>*wind*</sub>). The WEM presented in [44] uses blade element momentum theory to determine these effects. Here, we are interested in incorporating the effects on the thrust and the pitching moment (M<sub>*wind*</sub>) as a function of the magnitude of a given wind speed vector *V<sub>wind</sub>* and the rotational speed *ω* of each propeller driven by the rotor model. In this regard, the WEM is defined as:

$$M_{wind} = \begin{bmatrix} \cos\left(\beta + \frac{\pi}{2}\right) & \sin\left(\beta + \frac{\pi}{2}\right) & 0\\ -\sin\left(\beta + \frac{\pi}{2}\right) & \cos\left(\beta + \frac{\pi}{2}\right) & 0\\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} M_{prop} \\ 0 \\ 0 \end{bmatrix} + \frac{l}{2} \begin{bmatrix} C_{4}T_{4} - C_{2}T_{2} \\ C_{1}T_{1} - C_{3}T_{3} \\ 0 \end{bmatrix} \tag{1}$$

The angle *β* determines the direction of the *V<sub>wind</sub>* vector. Depending on the direction of the applied relative wind, certain propellers will be more affected by this external disturbance. Tran et al. [44,45] propose the use of a weight function that assigns values between 0 and 1, where the maximum value of 1 means the propeller is directly exposed to the wind. In Equation (1), *C* corresponds to the weighting vector that scales the thrust *T* generated by each propeller. The parameter *l* is the distance between non-adjacent propellers, and *M<sub>prop</sub>* is:

$$M_{prop} = \begin{bmatrix} C_1 & C_2 & C_3 & C_4 \end{bmatrix} \begin{bmatrix} \tau_1\\ \tau_2\\ \tau_3\\ \tau_4 \end{bmatrix} \tag{2}$$

Therefore, the scale of the weighting vector *C* is defined based on the magnitude of the applied wind speed vector and its direction (the angle *β* between the relative wind vector and the UAV body frame). In addition, the rotor speeds *ω* are required to calculate the augmented thrusts *T* and torques *τ* (weighted by *C*). Further details on this model can be found in [44,45].
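As an illustration of how Equations (1) and (2) are evaluated, the sketch below computes the wind-induced moment for a given weighting vector. The weight function of [44,45] that maps the wind direction to *C* is not reproduced here, so *C* is passed in as an input; this is an assumption of the sketch.

```python
import numpy as np

def wind_moment(beta, C, T, tau, l):
    """Wind-induced moment on the UAV body frame, per Equations (1) and (2).

    beta : wind direction angle (rad) in the body frame
    C    : length-4 weighting vector in [0, 1] (propeller exposure to the wind)
    T    : length-4 vector of augmented propeller thrusts
    tau  : length-4 vector of propeller torques
    l    : distance between non-adjacent propellers
    """
    C, T, tau = (np.asarray(v, dtype=float) for v in (C, T, tau))
    m_prop = float(C @ tau)                                  # Equation (2)
    a = beta + np.pi / 2.0
    R = np.array([[ np.cos(a), np.sin(a), 0.0],
                  [-np.sin(a), np.cos(a), 0.0],
                  [       0.0,       0.0, 1.0]])
    thrust_imbalance = np.array([C[3] * T[3] - C[1] * T[1],  # C4*T4 - C2*T2
                                 C[0] * T[0] - C[2] * T[2],  # C1*T1 - C3*T3
                                 0.0])
    return R @ np.array([m_prop, 0.0, 0.0]) + 0.5 * l * thrust_imbalance
```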

Sections 3.1 and 3.3 present the results regarding the impact of precise UAV positioning on the accurate estimation of the canopy N through the entire phenological cycle.

#### *2.4. Multispectral Image Segmentation*

Multispectral imagery is segmented using a method based on GrabCut [34]. The original method is widely used for its ease of implementation and for its excellent results in generating a binary classification; however, it has the drawback of being semi-manual: it requires user input during the algorithm iteration in order to properly determine both background and foreground pixel values.

This section introduces a modified version of the GrabCut algorithm that is fully automatic, thanks to an added refinement step based on guided filtering [46] that extracts the relevant pixel information associated with the plant canopy. Our approach solves an optimization problem with an energy function that allows the proper labeling of texture in the multispectral image using a Gaussian mixture model. The guided-filter refinement takes into account information from all band channels of the multispectral camera: green, red, red-edge, and near-infrared. The resulting multispectral image mask includes only the relevant pixel information that accurately represents the canopy for the estimation of N.

#### 2.4.1. GrabCut Segmentation

The GrabCut method requires an initial model known as a trimap. This model consists of a partition of the input image into three regions: a foreground *TF*, a background *TB*, and a region with pixels that result from the combination of the foreground and background colors *TU*.

The image is modeled as an array of values **z** = (*z*<sub>1</sub>, ..., *z<sub>N</sub>*), and the output segmentation can be a channel of values between 0 and 1 or a hard segmentation with a binary assignment (0 or 1). The segmentation is written as *α* = (*α*<sub>1</sub>, ..., *α<sub>N</sub>*) with 0 ≤ *α<sub>n</sub>* ≤ 1, or *α<sub>n</sub>* = {1, 0}.

The GrabCut algorithm also requires a model for the distribution of the foreground/background colors and gray levels. This distribution can be represented as a histogram directly assembled from *T<sub>F</sub>* and *T<sub>B</sub>*, as *θ* = {*h*(*z*; *α*), *α* = 0, 1}. Under this trimap model, the segmentation algorithm determines the values of *α* from the image **z** and the distribution model *θ*. The *α* values are calculated from an energy minimization function, as:

$$\mathbf{E}(\alpha, \theta, \mathbf{z}) = U(\alpha, \theta, \mathbf{z}) + V(\alpha, \mathbf{z}), \tag{3}$$

where the sub-function *U* evaluates the fitness by assigning a low score if the segmentation (*α*) is consistent with the image **z**, defined as follows:

$$U(\alpha, \theta, \mathbf{z}) = \sum_{n} -\ln h(z_{n}; \alpha_{n}) \tag{4}$$

Instead of using the histogram as the estimator of the probability density function, the algorithm uses a Gaussian mixture model in order to take into account the information coming from all channels. The smoothness term *V* penalizes label discontinuities between neighboring pixels:

$$V(\alpha, \mathbf{z}) = \gamma \sum_{(m,n)\in \mathbf{C}} [\alpha_n \neq \alpha_m]\, e^{-\beta(z_m - z_n)^2} \tag{5}$$

In Equation (5), the set **C** contains the pairs of neighboring pixels within a 3 × 3 window, and the term *β* is calculated according to Equation (6).

$$\beta = \frac{1}{2\left\langle (z_m - z_n)^2 \right\rangle}, \tag{6}$$

where ⟨·⟩ denotes the expectation over the image.

The image segmentation is then estimated as the global minimum of the energy function in Equation (7):

$$\hat{\alpha} = \arg\min_{\alpha} \mathbf{E}(\alpha, \theta, \mathbf{z}) \tag{7}$$

Algorithm 1 details the original GrabCut method. In order to eliminate the manual third step of the algorithm, a fixed background mask *T<sub>B</sub>* is used during the iteration, and a guided-filter refinement process is applied to achieve an automatic segmentation, as detailed in Algorithm 2.

**Algorithm 2:** Modified GrabCut with guided filter calculation. In the algorithm, *E<sub>r</sub>*(*I*) denotes a function that calculates the image mean over a radius *r*, the operators .∗ and ./ denote element-wise matrix multiplication and division, and *q* is the output image.

**Step 1: Initialization of** *T<sub>B</sub>***,** *T<sub>F</sub>***, and** *T<sub>U</sub>***.**
- Initialize the trimap *T<sub>F</sub>* as the foreground.
- Initialize the trimap *T<sub>B</sub>* as the background.
- Set all remaining pixels as possible foreground pixels *T<sub>UF</sub>*.

**Step 2: Iterative minimization.**
- Assign and learn the GMM components.
- Minimize the energy function.
- Estimate the segmentation image *α*.

**Step 3: Guided-filter refinement.** Input image *p*, guidance image *I*, radius *r*, and regularization *ε*.
1. *µ<sub>I</sub>* ← *E<sub>r</sub>*(*I*), *µ<sub>p</sub>* ← *E<sub>r</sub>*(*p*), *Corr<sub>I</sub>* ← *E<sub>r</sub>*(*I* .∗ *I*), *Corr<sub>Ip</sub>* ← *E<sub>r</sub>*(*I* .∗ *p*)
2. *σ*<sup>2</sup><sub>*I*</sub> ← *Corr<sub>I</sub>* − *µ<sub>I</sub>* .∗ *µ<sub>I</sub>*, *σ*<sup>2</sup><sub>*Ip*</sub> ← *Corr<sub>Ip</sub>* − *µ<sub>I</sub>* .∗ *µ<sub>p</sub>*
3. *a* ← *σ*<sup>2</sup><sub>*Ip*</sub> ./ (*σ*<sup>2</sup><sub>*I*</sub> + *ε*), *b* ← *µ<sub>p</sub>* − *a* .∗ *µ<sub>I</sub>*
4. *µ<sub>a</sub>* ← *E<sub>r</sub>*(*a*), *µ<sub>b</sub>* ← *E<sub>r</sub>*(*b*)
5. *q* = *µ<sub>a</sub>* .∗ *I* + *µ<sub>b</sub>*
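A minimal sketch of this pipeline is given below, assuming OpenCV with the `ximgproc` contrib module. The fixed background trimap is approximated here by the image border, which is an assumption of the sketch rather than the exact initialization used in our system.

```python
import cv2
import numpy as np

def auto_grabcut(rgn_image, guide, n_iter=5, r=60, eps=0.001 ** 2, border=0.05):
    """Automatic GrabCut with guided-filter refinement (sketch of Algorithm 2).

    rgn_image : 8-bit, 3-channel composite to segment (e.g., Red+Green+NIR)
    guide     : single-channel float32 guidance image (e.g., the NIR band)
    """
    h, w = rgn_image.shape[:2]
    mask = np.full((h, w), cv2.GC_PR_FGD, np.uint8)   # possible foreground T_UF
    b = max(1, int(border * min(h, w)))
    mask[:b, :] = cv2.GC_BGD                          # fixed background trimap T_B
    mask[-b:, :] = cv2.GC_BGD
    mask[:, :b] = cv2.GC_BGD
    mask[:, -b:] = cv2.GC_BGD
    bgd_model = np.zeros((1, 65), np.float64)
    fgd_model = np.zeros((1, 65), np.float64)
    cv2.grabCut(rgn_image, mask, None, bgd_model, fgd_model,
                n_iter, cv2.GC_INIT_WITH_MASK)        # Step 2: iterative minimization
    alpha = np.where((mask == cv2.GC_FGD) | (mask == cv2.GC_PR_FGD),
                     1.0, 0.0).astype(np.float32)
    # Step 3: guided-filter refinement of the hard GrabCut mask.
    return cv2.ximgproc.guidedFilter(guide, alpha, r, eps)
```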

#### 2.4.2. Guided Filter Refinement

The guided filter (GF) [46] can be seen as an edge-preserving alternative to the bilateral filter with a faster response due to its *O*(*N*) computational complexity. The output at each pixel, denoted *q<sub>i</sub>*, can be expressed as a weighted average over the convolutional window *W* (*i*, *j* denote the pixel coordinates):

$$q\_i = \sum\_j \mathcal{W}\_{ij} p\_j \tag{8}$$

The GF implies that an image can be filtered using the radiance of another image as guidance. We exploit this concept to filter the plot mask created with the GrabCut algorithm, refining the segmented image with the original image as guidance. In this regard, the weight used by the GF is given by:

$$\mathcal{W}\_{ij}^{GF}(I) = \frac{1}{|\omega|^2} \sum\_{k; (i,j) \in \omega\_k} \left( 1 + \frac{(I\_i - \mu\_k)(I\_j - \mu\_k)}{\sigma\_k^2 + \varepsilon} \right),\tag{9}$$

where *W*<sup>*GF*</sup><sub>*ij*</sub> depends entirely on the guidance image *I*. The parameters *µ<sub>k</sub>* and *σ*<sup>2</sup><sub>*k*</sub> are the mean and variance of the guidance image *I* estimated over a window *ω<sub>k</sub>*, *ε* denotes a regularization parameter, and |*ω*| is the number of pixels in the window *ω<sub>k</sub>*.
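For reference, a gray-scale guided filter that follows steps 1–5 of Algorithm 2 can be written directly with box means; using `scipy.ndimage.uniform_filter` to realize *E<sub>r</sub>*(·) is an implementation choice of this sketch, not a requirement of the method.

```python
import numpy as np
from scipy.ndimage import uniform_filter

def guided_filter(I, p, r, eps):
    """Gray-scale guided filter following steps 1-5 of Algorithm 2.

    I : guidance image (float array), p : image to filter (e.g., the alpha mask),
    r : window radius, eps : regularization parameter.
    """
    E = lambda x: uniform_filter(x, size=2 * r + 1)   # box mean standing in for E_r(.)
    mu_I, mu_p = E(I), E(p)                           # step 1: window means
    corr_I, corr_Ip = E(I * I), E(I * p)
    var_I = corr_I - mu_I * mu_I                      # step 2: variance and covariance
    cov_Ip = corr_Ip - mu_I * mu_p
    a = cov_Ip / (var_I + eps)                        # step 3: linear coefficients
    b = mu_p - a * mu_I
    mu_a, mu_b = E(a), E(b)                           # step 4: coefficient means
    return mu_a * I + mu_b                            # step 5: filtered output q
```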

Section 3.2 presents the results of applying the proposed modified version of GrabCut, comparing the method against traditional segmentation techniques such as thresholding, K-means, and mean-shift, as well as against the original semi-manual GrabCut method.

#### *2.5. Machine Learning for N Estimations*

As detailed in Table 2, the ground-truth for training the machine learning (ML) algorithms was defined based on direct measurements of plant chlorophyll using a SPAD 502 Plus meter (Konica Minolta) over the sampled areas, as depicted in Figure 1. The datasets contain the measured SPAD values, which correlate directly with the leaf-blade N concentrations following the linear relation [37] defined in Figure 3a. Measurements from the crop were obtained during three stages of rice growth: vegetative, reproductive, and ripening, with 3 trials conducted per crop stage. These datasets are the result of 10 flights per trial, capturing around 500 images, and yielding a database of 1500 images per stage. Since 3 trials were conducted per crop stage, around 13,500 images were processed in this work. Figure 3b details the experimental specifications.

The ML methods were trained with a set of images accounting for 60% of the entire database. For the final estimations of leaf nitrogen, we used the remaining 40% of the database (testing phase of the ML methods). The entire imagery dataset and the ground-truth are available in the Supplementary Materials section. Here, we report on the use of classical ML methods based on multi-variable linear regressions (MLR), support vector machines (SVM), and artificial neural networks (NN) for the estimation of canopy N.

For the MLR models, the VIs from Table 3 were combined following the formula $N = \beta_0 + \sum_{k=1}^{7} \beta_k\, (VI)_k$, where the parameters *β<sub>k</sub>* change according to the growth stage of the plant, weighting each VI independently. In this regard, each crop stage has an independent MLR model that linearly fits the N content.
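A per-stage MLR of this form can be fit as sketched below; scikit-learn and the random placeholder data are assumptions of this example, not the toolchain reported in this work.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.random((120, 7))             # placeholder: 7 VI features per sample (one crop stage)
y = 20.0 + 40.0 * rng.random(120)    # placeholder: SPAD-scale ground truth

mlr = LinearRegression().fit(X, y)   # fits N = beta_0 + sum_k beta_k (VI)_k
print(mlr.intercept_, mlr.coef_)     # beta_0 and the per-VI weights beta_k
```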

With the aim of improving the adaptability of the estimation models by considering several nonlinear effects (dense vegetation, optical properties of the soil, and changes in canopy color), this section presents the estimation results using support vector machines (SVM) and artificial neural networks (NN). SVM models were trained with different kernel functions in order to determine the proper mapping of the input data. Six different kernels based on linear, quadratic, cubic, and Gaussian models were tested for each crop stage independently.
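A kernel sweep of this kind can be sketched as follows; the four kernel families shown, the epsilon value, and the scikit-learn toolchain are assumptions of this example (the six kernels used in this work include Gaussian variants that are not reproduced here).

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.random((120, 7))            # placeholder VI features for one crop stage
y = 20.0 + 40.0 * rng.random(120)   # placeholder SPAD-scale ground truth

candidates = {
    "linear":    SVR(kernel="linear", epsilon=2.0),
    "quadratic": SVR(kernel="poly", degree=2, epsilon=2.0),
    "cubic":     SVR(kernel="poly", degree=3, epsilon=2.0),
    "gaussian":  SVR(kernel="rbf", epsilon=2.0),
}
for name, svr in candidates.items():
    model = make_pipeline(StandardScaler(), svr).fit(X, y)
    print(name, model.score(X, y))  # in-sample R^2, for illustration only
```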

In contrast to the MLR and SVM methods, in which N estimation models are determined for each crop stage independently, neural networks enable the combination of the entire dataset (SPAD values and extracted VI features) into a single model that fits all of the crop phenological stages. Several nonlinear training functions were tested with different numbers of hidden layers. We discarded deep learning methods such as convolutional neural networks (CNN) due to the high computational cost of learning features through many hidden layers. For this application, we use well-known vegetation index features (reported in Table 3) that have been widely used and validated in the specialized literature [14,40,42]. Other image-based features, such as color, structure, and morphology, do not work well with multispectral imagery of 1.2 megapixels per band, compared with the 16-megapixel RGB image. In fact, the main advantage of using VIs as training features is that they carry information at different wavelengths, providing key information on the plant health status related to N. A sketch of a single-network configuration is given below.
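In this sketch, the `lbfgs` solver stands in for the quasi-Newton training function and the 17:12:9:6 hidden-layer sizes mirror the five-layer configuration of Figure 12; both choices, as well as the placeholder data, are assumptions of the example.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.random((360, 7))            # placeholder VI features, all crop stages pooled
y = 20.0 + 40.0 * rng.random(360)   # placeholder SPAD-scale ground truth

nn = MLPRegressor(hidden_layer_sizes=(17, 12, 9, 6),  # five-layer configuration (cf. Figure 12)
                  solver="lbfgs", max_iter=2000,
                  random_state=0).fit(X, y)
n_hat = nn.predict(X)               # SPAD-scale N estimates
```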

In Section 3.3, we report a comprehensive comparison among multi-variable linear regressions (MLR), support vector machines (SVM), and artificial neural networks (NN) for the estimation of canopy N. We used three metrics: the root mean square error (*RMSE*), Pearson's linear correlation (*r*), and the coefficient of determination (*R*<sup>2</sup>), detailed as follows:

$$\begin{aligned} RMSE &= \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left(\hat{N}_i - N_i\right)^2} \\ r &= \frac{\sum_{i=1}^{n} \left(N_i - \overline{N}\right)\left(\hat{N}_i - \overline{\hat{N}}\right)}{\sqrt{\sum_{i=1}^{n} \left(N_i - \overline{N}\right)^2}\,\sqrt{\sum_{i=1}^{n} \left(\hat{N}_i - \overline{\hat{N}}\right)^2}} \\ R^2 &= 1 - \frac{\sum_{i=1}^{n} \left(N_i - \hat{N}_i\right)^2}{\sum_{i=1}^{n} \left(N_i - \overline{N}\right)^2} \end{aligned} \tag{10}$$

where *n* is the total number of samples, *N<sub>i</sub>* is the measured value (SPAD scale, cf. Figure 3a), *N̂<sub>i</sub>* is the estimated nitrogen, and an overbar denotes the mean over the samples. In this application, Pearson's metric indicates the linear relationship between the estimated N and the measured one (ground-truth). The *R*<sup>2</sup> metric is useful since it penalizes the variance of the estimated values from the measured ones.
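The three metrics in Equation (10) can be computed with a straightforward NumPy transcription, as sketched below.

```python
import numpy as np

def estimation_metrics(n_meas, n_est):
    """RMSE, Pearson correlation r, and R^2 between measured and estimated N (Equation (10))."""
    n_meas = np.asarray(n_meas, dtype=float)
    n_est = np.asarray(n_est, dtype=float)
    rmse = np.sqrt(np.mean((n_est - n_meas) ** 2))
    r = np.corrcoef(n_meas, n_est)[0, 1]
    ss_res = np.sum((n_meas - n_est) ** 2)   # residual sum of squares
    ss_tot = np.sum((n_meas - n_meas.mean()) ** 2)
    return rmse, r, 1.0 - ss_res / ss_tot
```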

As mentioned, the ML models require the calculation of the VI formulas presented in Table 3 in order to determine the input feature vector. Every image captured by the UAV is geo-referenced with the DGPS system with a position accuracy of 35 cm (see Table 1). Given that, the aerial imagery is registered by matching consecutive frames according to a homography computed with the oriented FAST and rotated BRIEF (ORB) computer vision technique [47]. The UAV path is planned to ensure an image overlap of 60%, which allows for precise matching with the GPS coordinates of the ground-level markers and a proper comparison between the aerial estimations and ground measurements of plant N. Part of this procedure is described in our previous work [48]. In turn, the metrics described in Equation (10) report the performance of the ML models based on this aerial-ground matching of canopy N. The geo-referencing process uses an affine transformation, a first-order polynomial function that relates the GPS latitude and longitude values to the pixel coordinates within the image; this procedure is also detailed in [48].
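The frame-to-frame registration step can be sketched with standard OpenCV ORB matching and RANSAC homography estimation; the feature count, match filtering, and RANSAC threshold below are assumptions of this sketch rather than the exact settings of [47,48].

```python
import cv2
import numpy as np

def register_pair(img_prev, img_next, n_features=2000, n_best=200):
    """Homography aligning two consecutive (grayscale) frames via ORB features."""
    orb = cv2.ORB_create(nfeatures=n_features)
    k1, d1 = orb.detectAndCompute(img_prev, None)
    k2, d2 = orb.detectAndCompute(img_next, None)
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(d1, d2), key=lambda m: m.distance)[:n_best]
    src = np.float32([k1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([k2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    H, _ = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
    return H  # warp with cv2.warpPerspective(img_prev, H, img_next.shape[::-1])
```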

Figure 5 details the procedure required by the machine-learning models. After registration, images are segmented by using the proposed GrabCut method. Although all pixels in the image are evaluated in Algorithm 2, we use pixel clusters of 10 × 10 to calculate the VI formulas, as shown in Figure 5a. After segmentation, pixels representing the rice canopy are separated from the background, as shown in Figure 5b. All vegetation indices are calculated as the average within each image sub-region.

At canopy level, several factors affect the spectral reflectance of the crop: solar radiation and weather conditions, plant morphology and color, leaf angles, undergrowth, soil characteristics, and ground water layers. As mentioned in Section 2.1, the multispectral camera comes with an integrated sunshine sensor to compensate for light variations in the resulting image. In addition, the GrabCut segmentation method handles the filtering of undergrowth and other soil noise. Despite that, it remains crucial to validate the accuracy of the selected VIs, since the estimation of N depends on the accuracy and reliability of the extracted features. In this regard, Figure 5c presents several VI calculations in order to analyze the variance of the VI features through the entire phenological cycle of the crop. For this test, we calculated the VI formulas from 360 random images per crop stage under different environmental conditions. We show the VIs most representative of nitrogen variations: simple ratio (SR), normalized difference vegetation index (NDVI), green normalized difference vegetation index (GNDVI), and modified soil-adjusted vegetation index (MSAVI). As observed, the low variance exhibited through the entire phenological cycle allows us to use the VI features as inputs to the ML models detailed in Figure 5d.


**Figure 5.** Procedure required by the machine-learning models: (**a**) single images are processed using the proposed GrabCut segmentation method; image subregions of 10 × 10 pixels are analyzed. (**b**) Pixel clustering in the Red+Green+Near-infrared (RGN) space. (**c**) Vegetation index calculation for the foreground cluster; low index variability was observed at each crop stage. (**d**) Machine-learning methods applied for the estimation of canopy N.

#### **3. Results**

The experiments reported in this paper were carried out during 2018 in the rice farms of the International Center for Tropical Agriculture (CIAT). Trials were conducted during the dry season (June–September), as detailed in Table 1. We conducted 3 trials per crop stage (vegetative, reproductive, and ripening).

#### *3.1. UAV Control Results*

Figure 6 shows simulation results evaluating the performance of the proposed controller in terms of wind-disturbance rejection. In plot (a), the desired trajectory (red line) was defined for the UAV to cover the crop while following the trapezoidal velocity profile shown in plot (b). This trajectory profile enables the UAV to hover at certain points (black dots) to capture aerial images. The maximum UAV velocity was set to 1.5 m s<sup>−1</sup>.

In plot (a), a wind disturbance (*V<sub>wind</sub>* = 9 m s<sup>−1</sup>) was added at the starting point of the trajectory, causing the UAV to deviate from the desired path (the *V<sub>wind</sub>* vector was applied along the +*y* direction). The response of the UAV driven by the BS + DAF attitude controller corresponds to the black line, whereas the PID controller is the green line. As observed, the BS + DAF immediately counteracted the disturbance by generating the corresponding roll command *φ* depicted in plot (d). This response allowed the UAV to follow the path precisely, maintaining the position error along the *y* axis near zero, as shown in plot (c). Under this scenario, our system obtained a maximum tracking error of 2%.

A second wind disturbance was added when the UAV reached 10 m in altitude. In this case, *V<sub>wind</sub>* was applied along the *z* direction (*z* points downwards). As observed, an augmented thrust caused a large altitude tracking error that affected the UAV similarly under both control schemes. Unlike for roll and pitch, the BS + DAF control does not drive the altitude loop. In general, the UAV was able to position itself at the hovering knot-points accurately (black dots in plot (a)), maintaining a minimum error with respect to the GPS waypoints. For the rest of the trajectory, the results demonstrate that the backstepping + DAF is accurate and reliable in rejecting external wind disturbances.

**Figure 6.** (Simulation) Closed-loop trajectory tracking comparison between the proposed backstepping (BS)-desired acceleration function (DAF) (black lines) and the classical PID control (green lines): (**a**) 3D navigation results. Wind disturbances were added at five instances during the trajectory (black arrows). The black dots indicate the UAV must stop to capture multispectral data. (**b**) Trapezoidal velocity profile for the desired path. (**c**) Position errors. (**d**) Roll and pitch profiles.

#### *3.2. Image Segmentation Results*

Algorithms 1 and 2 are applied to each input image in an iterative manner. In Figure 7a, two markers placed as geo-referenced guiding points appear in the RGB image (red dots in the top-right corner). The algorithm provides the user with the option to manually select these points in order to remove them from the segmentation process. An example of semi-manual GrabCut segmentation with minimal user interaction is shown in Figure 7d. If those points are not manually removed, the corresponding pixels are automatically classified into either the background or the foreground cluster. Plots (b) and (c) show how the initial foreground and background are defined. Plot (d) shows an example of applying Algorithm 1 to the input image from (a), whereas plot (e) shows the results of combining GrabCut with the guided-filter (GF) approach, yielding an automatic segmentation process, as detailed in Algorithm 2. As shown in plot (e), the GF refinement achieves richer detail in the final segmentation. This result can be compared against the traditional Otsu threshold method shown in plot (f). Although the results in Figure 7e are promising, we implemented a final refinement process based on the so-called GF feathering, in which Algorithm 2 is used with a carefully selected radius *r* and regularization parameter *ε*. The GF feathering enables a faster implementation of Algorithm 2 by avoiding the *O*(*N*<sup>2</sup>) computational cost of the convolution.

The quality of the proposed segmentation method was tested against the binary mask segmentation of three well-known methods: (i) K-means [49], (ii) mean-shift [50], and (iii) manual thresholding over the HSV color representation [51]. The acronyms used in the comparison are listed in Table 4. Applying the fully automatic segmentation method described in Algorithm 2 (listed as GCFauto in Table 4), we used the precision, recall, and F1-score to measure performance. Figure 8 and Table 5 show the results.

For image pre-processing, the automatic GrabCut method in Algorithm 2 with the guided-filter refinement enabled precise plot segmentation of the multispectral imagery, with richer detail of the canopy structure and proper elimination of the background cluster. In this regard, Figure 9 shows the final segmentation results. In plot (a), the segmented image is achieved by combining the multispectral image shown in plot (b) and the RGB image from plot (c). In plot (d), the values of the radius and *ε* depend on the image size. For this application, those values were experimentally determined as *r* = 60 and *ε* = 0.001<sup>2</sup>. Without this segmentation method, significant image corrections would be needed.

**Figure 7.** Segmentation results: (**a**) original image, (**b**) initial *T<sub>F</sub>*, (**c**) initial *T<sub>B</sub>*, (**d**) GrabCut segmentation with manual hints, (**e**) proposed GrabCut + guided filter (GF) refinement, (**f**) Otsu's threshold + GF refinement. Insets in plots (**d**–**f**) show close-ups of each segmentation output.

**Figure 8.** F1 metric of all tested algorithms in Table 4.

**Figure 9.** Final GF feathering results for segmentation: (**a**) example output image, (**b**) *α* image from GrabCut using the complete Algorithm 2, (**c**) original RGB image used as guidance, (**d**) refinement of the image in plot (a) using the guidance from plot (b), with *r* = 60 and *ε* = 0.001<sup>2</sup>.

**Table 4.** Acronyms for the segmentation methods tested.


**Table 5.** Mean Results for precision, recall, and F1-score for 10 images.


#### *3.3. Nitrogen Estimations*

In the following, we present a comprehensive comparison among multi-variable linear regressions (MLR), support vector machines (SVM), and artificial neural networks (NN) for the estimation of canopy N. All models were trained using the ground-truth (cf. Table 2) containing direct measurements of leaf chlorophyll based on SPAD readings. In this regard, the estimation results are all given in SPAD scale, where the linear relationship with nitrogen is defined as *N* = 0.079 (*SPAD*) − 0.154, according to the work reported in [37].

#### 3.3.1. MLR Models

Figure 10 shows the estimation results using the MLR. The samples axis corresponds to the aerial imagery used for the estimation of nitrogen throughout the phenological cycle. As previously mentioned, direct SPAD measurements were conducted for selected crop plots in order to assemble the ground-truth dataset. Given that, our system selects the aerial samples matching the GPS coordinates of the ground measurements. Overall, the MLR achieved average N correlations of 0.93 (V), 0.89 (R), and 0.82 (Ri). The data were filtered using a moving average filter to reduce signal fluctuations and smooth the estimation result. Table 6 details the numerical data. Since the coefficients of the linear regressions were found and calibrated for each stage independently, we found strong linear relationships between the VIs and the N accumulation in the vegetative stage. Through the ripening stage, the yellow color becomes predominant and parcels cannot be distinguished with ease (as shown in Figure 2c). These changes in the canopy color and the dense biomass accumulation tend to saturate the linear relationships between the VIs and the canopy reflectances.

**Figure 10.** Multi-variable linear regressions (MLR). Numerical data reported in Table 6.

**Table 6.** Numerical data obtained from the multi-variable linear regressions (MLR).


#### 3.3.2. SVM Models

We first tried to find a single SVM model for the entire crop phenological cycle; however, the canopy reflectances and the VIs change drastically from the vegetative through the ripening stage. In this regard, an SVM model was defined for each crop stage independently, as shown in Table 7. In addition, Figure 11 details the estimation results for the N dynamics. Based on these results, the following SVM configurations were selected for our estimation system:


#### 3.3.3. NN Models

Neural networks (NN) with several hidden layers and optimization algorithms were tested. We highlight the results with two and five hidden layers. In Figure 12, the NN with two layers achieved a correlation of 0.964, RMSE = 3.315, and *R*<sup>2</sup> = 0.94. Increasing to five layers, the NN achieved a correlation of 0.986, RMSE = 1.703, and *R*<sup>2</sup> = 0.97. Several training functions were tested for each crop stage; the BFGS quasi-Newton function achieved accurate results for most of the vegetative stage, whereas the Levenberg–Marquardt function performed best for both the reproductive and ripening stages. The numerical data are reported in Table 8. Based on these results, the following NN configurations were selected for our estimation system:




**Table 7.** Numerical data obtained from support vector machine (SVM) with several kernels.

**Figure 11.** Support vector machine (SVM) results. The left plot shows the kernels tested for the vegetative stage of the crop; the middle and right plots show the results for the reproductive and ripening stages, respectively. These results were achieved with a margin of tolerance *ε* = 2 for both the vegetative and reproductive stages and *ε* = 0.5 for ripening. Table 7 reports the numerical data.


**Figure 12.** Artificial neural networks (NN). A 9:1 configuration was used for the two-layer NN, whereas a 17:12:9:6:1 configuration was used for the five-layer NN. In addition, several training functions were tested. Table 8 reports the numerical data.


**Table 8.** Numerical data obtained from the 5-layer NN with several training functions.

#### 3.3.4. Machine-Learning Comparative Results

Figure 13 shows the overall N estimation results obtained for each machine learning model. In general, we demonstrated strong correlations between the canopy N and the corresponding vegetation indices. In the case of the MLR models, the coefficients of the regressions were determined and calibrated independently for each crop stage. We found that the N dynamics have strong linear dependencies on MSAVI, GNDVI, and NDVI, specifically during the vegetative and reproductive stages. Table 9 reports the overall numerical data.

From the ROC curve reported in Figure 13d, the accuracy (ACC) was calculated for each ML model. Neural networks achieved an average ACC = 0.85, improving the estimations of canopy N over the other ML models. In fact, comparing the N-to-SPAD correlations reported in Table 9, the correlation metric was improved for every crop stage. Table 9 compares the mean N estimations achieved by each ML method against the mean SPAD-based N readings measured at ground level. Mean results are presented for each crop stage: vegetative (V), reproductive (R), and ripening (Ri). The last columns in Table 9 report the mean linear correlations between estimations and measurements.


**Figure 13.** Overall N estimation results. (**a**–**c**) Average data for the estimated N dynamics from the vegetative to the ripening stage (86–101 days of phenological cycle). We compare the results obtained from linear regressions (MLR), support vector machines (SVM), and neural networks (NN). (**d**) ROC curve obtained for the three ML methods evaluated: ACC = 0.82 for MLR, ACC = 0.78 for SVM, and ACC = 0.85 for NN. (**e**) Histogram of N correlations during crop growth from the initial vegetative stage until ripening. Table 9 reports the numerical data.




Comparing the neural networks (NN) against the multi-variable linear regressions (MLR) and support vector machines (SVM), the former estimator achieves higher correlations partly due to the reliability and size of the assembled ground-truth database for training, i.e., reliable N datasets (direct N-leaf measurements) with sufficient variation of the crops during the growth stages (see the experimental protocol in Section 2). In addition, some VIs, such as the simple ratio (SR), tend to saturate as the crop grows, but other vegetation indices that behave better with higher biomass and N concentration compensate for the saturated ones. The nonlinear combination of these VIs enables the NN to achieve accurate estimations.

Furthermore, we tested the effect of the UAV control architecture proposed in Section 2.3 on improving the N estimations by means of precise position tracking during flight. Figure 14 presents the results. Plot (b) shows the X-Y tracking trajectory for each flight control scenario. The UAV flew at a constant altitude reference of 20 m over the crop at a maximum speed of 1.5 m/s. The black boxes represent the crop plots with ground-level markers where direct N measurements were taken via SPAD. The green dotted circles show the points where the UAV stops and hovers for 4 s to capture canopy imagery.

**Figure 14.** (Experimental) Effects of UAV navigation on N estimations. (**a**) UAV over the crop fields. The white markers placed at ground-level indicate the points for SPAD N measurements. (**b**) UAV X-Y trajectory tracking. Comparison between the proposed BS+DAF attitude controller and PID-based autopilot. The desired path was configured with hovering waypoints to capture canopy imagery. The star-like markers correspond to GPS data of the taken images. (**c**) N estimation results. (**d**) Histograms of N-to-SPAD correlations.

The paths followed by the UAV are shown for both controllers: one driven by the PID that comes standard with the UAV autopilot, and the other driven by the proposed BS + DAF. Likewise, the star markers represent the GPS points of the images taken while hovering (black for the PID and red for the BS + DAF). As observed, the attitude stabilization of the UAV is more affected with the PID, since the aerial samples within each hovering area (black star markers) are spread over different positions. In contrast, the aerial samples obtained with the BS + DAF (red star markers) are more concentrated within the crop plot of interest. The goal is to capture most images at GPS coordinates that accurately match the GPS positions of the white markers at ground level, as depicted in Figure 14a.

Finally, Figure 14c presents the N estimation results achieved under both flight control scenarios. Larger fluctuations, noise, and estimation errors appear when the ML methods are trained and tested with the imagery dataset captured by the PID-driven UAV. This is mainly due to the imprecise X-Y positioning of the aerial vehicle during hovering (caused by improper attitude stabilization), in which several images capture crop areas that do not necessarily match the ground-level markers, as detailed in Figure 14b. On the other hand, the proposed BS + DAF controller proved suitable for the proper regulation of the angular motions of the UAV, compensating for external disturbances faster and achieving accurate X-Y positioning. As a result, the N estimations correlate better with the ground-truth. As shown in the histograms in Figure 14d, the N-to-SPAD correlations grouped higher for the experiments driven by the proposed BS + DAF controller.

#### **4. Discussion**

The results presented in this work demonstrate a significant improvement in leaf nitrogen prediction in rice crops from images obtained via UAVs, validated with high SPAD correlations. These results stem from a combination of technological contributions, namely: (i) a novel UAV stabilization control scheme named BS+DAF that enables precise multispectral aerial image acquisition and registration with ground-truth fiducials, even in the presence of strong wind perturbations, (ii) an automated GrabCut image segmentation method, which leads to finer detail of the plant canopy in the RGN space, as detailed in Figures 5b and 9d, and (iii) the successful application of machine learning methods trained with the vegetation indices (VIs) extracted from the segmented multispectral imagery. Figure 5c supports the reliability and accuracy of each VI by reporting its low data variance at each crop stage.

In a similar manner, Table 9 summarizes the higher nitrogen-to-SPAD correlations achieved with the implemented neural networks, and confirms the significant nonlinear relationships between leaf spectral reflectances (VIs) and chlorophyll-based estimated N concentrations. These N estimation results are comparable to other state-of-the-art results in the scientific literature. In [52], Figure 7b shows N estimations in rice using linear regression models, for which the authors report *RMSE* = 0.212, correlation *r* = 0.89, and *R*<sup>2</sup> = 0.803. Our results (tabulated in Tables 8 and 9) reach mean correlations of *r* = 0.986 for the vegetative stage, *r* = 0.9442 for the reproductive stage, and *r* = 0.89 for ripening. In [53], Table 6 and Figure 5 present N-status estimations in rice using vegetation indices calculated from aerial UAV imagery. The authors compared several regression models, with a highest *R*<sup>2</sup> of 0.74 using the LDM method. In our case, Table 8 lists higher *R*<sup>2</sup> values obtained with the Levenberg–Marquardt training function.

Our approach contributes novel solutions to commercial UAV-based phenotyping technologies, particularly to image-based remote sensing applications that currently rely on photogrammetric post-processing for image correction, and it enables crop/plot analysis in real time.

One important drawback of our solution was the estimation of canopy N for crops with higher biomass. Counter to the expectation of finding a higher N correlation as a function of crop growth, the correlation results decreased in the ripening stage, as depicted in Figure 13e and in the numerical data presented in Table 9. We attribute this to the disproportionately high number of senescent leaves (yellow, reflecting in the 570–590 nm band) that do not properly match the VI dominant wavelengths (see Table 3). Consequently, we suggest introducing plant senescence reflectance as an added variable for the discovery of novel vegetation indices to enhance our system.

#### **5. Conclusions**

This paper presents an integrated UAV system with non-invasive image-based methods for the estimation and monitoring of the N dynamics in rice crops. Several challenges were addressed, associated with image segmentation and UAV navigation control, to achieve reliable machine learning training and results highly correlated to SPAD. The proposed BS+DAF attitude controller led to precise UAV way-point tracking, with position errors below 2%. This was key to capturing aerial multispectral imagery during hovering that accurately matched the GPS positions of the SPAD measurements at ground level, and it reduces the need for computationally expensive photogrammetry methods.

Higher N correlations were achieved with neural networks: 0.98 in the vegetative stage, 0.94 in the reproductive stage, and 0.89 in the ripening stage, with an ROC accuracy of 0.85. This is a promising result towards the autonomous estimation of rice canopy nitrogen in real-time. With the advent of small AI-dedicated systems on chip (SoC) and the powerful computing capabilities of our UAV system, we expect upcoming work to achieve real-time onboard computation of the machine learning methods presented here.

Several challenges still remain for improving the N estimations, especially when the crop biomass is high. Rice crops commonly contain several mixed varieties within small plot areas. Therefore, besides assessing the N dynamics as a function of the crop stage, it is also crucial to identify and classify the plants according to their genotype, given that nitrogen is highly sensitive to plant oxygen consumption and root growth, among other variables. To this purpose, the aerial identification of different plant varieties requires sensing spatial resolutions below 30 cm. In addition, high-dimensional data clustering is needed to extract relevant features according to the plant variety.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/2072-4292/12/20/3396/s1, FILE S1: RAW data supporting image segmentation metrics, multispectral imagery used for machine learning testing, and nitrogen estimation results are available at the Open Science Framework: https://osf.io/cde6h/?view\_only=1c4e5e03b9a34d3b96736ad8ab1b2774. FILE S2: Experimental protocol for crop monitoring available at https://www.protocols.io/view/protocol-bjxskpne. VIDEO S3: The video is available in the online version of this article. The video accompanying this paper illustrates the steps performed during the experiments.

**Author Contributions:** Conceptualization, J.D.C., N.C.-B., F.C., and J.S.C.; methodology, J.D.C., M.C.R., F.C., and A.J.-B.; software, N.C.-B., J.S.C., and D.C.; validation, J.D.C., I.F.M., E.P., D.C., N.C.-B., and J.S.C.; formal analysis and investigation, J.D.C., M.C.R., E.P., F.C., I.F.M., and A.J.-B.; data curation, E.P.; writing—original draft preparation, J.D.C. and F.C.; writing—review and editing, A.J.-B., I.F.M., and M.C.R.; supervision, A.J.-B. and J.D.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded by the OMICAS program: *Optimización Multiescala In-silico de Cultivos Agrícolas Sostenibles (Infraestructura y validación en Arroz y Caña de Azúcar)*, anchored at the Pontificia Universidad Javeriana in Cali and funded within the Colombian Scientific Ecosystem by The World Bank, the Colombian Ministry of Science, Technology and Innovation, the Colombian Ministry of Education, the Colombian Ministry of Industry and Tourism, and ICETEX under GRANT ID: FP44842-217-2018.

**Acknowledgments:** The authors would like to thank all CIAT staff who supported the experiments over the crops located at the CIAT headquarters in Palmira, Valle del Cauca, Colombia; in particular Yolima Ospina and Cecile Grenier for their support in the upland and lowland trials. We also thank Carlos Devia from Javeriana University for his insights regarding the structuring of the datasets.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. UAV Equations of Motion**

This appendix introduces the equations of motion (EoM) that describe the inertial and aerodynamic effects, including a six-dimensional operator describing the spatial accelerations of the body frame as a function of the UAV inertias, moments, and aerodynamics. Appendix B then derives the nonlinear backstepping-based controller that regulates the stabilization of the UAV, specifically both pitch *θ* and roll *φ* angles, under external force perturbations.

Our UAV is a Vertical Takeoff and Landing (VTOL) four-rotor drone. As shown in Figure A1a, the body frame {*b*} is a six-degree-of-freedom rigid body. Rotations about the body-frame axes are designated by the Euler angles roll (*φ*), pitch (*θ*), and yaw (*ψ*), following standard aerodynamic conventions. In this sense, the 6D spatial acceleration $\dot{V}_b \in \mathbb{R}^{6\times1}$ of the body frame can be written as:

$$\dot{V}_b = \begin{bmatrix} \dot{\omega}_b \\ \dot{\upsilon}_b \end{bmatrix} = \begin{bmatrix} \ddot{\phi} \\ \ddot{\theta} \\ \ddot{\psi} \\ \ddot{x}_b \\ \ddot{y}_b \\ \ddot{z}_b \end{bmatrix} \tag{A1}$$

**Figure A1.** Physical parameters used for UAV modeling and control. (**a**) Equations of motion are derived using three frames of reference: the inertial frame {*i*}, the body frame {*b*} located at the center of mass, and the rotor frames {*oi*}, ∀*oi* : 1...4. Each rotor generates a vertical thrust ($T_{oi}$) that depends on the rotor angular speed ($\omega_{oi}$) and the geometrical properties of the propeller blades. (**b**) Blade properties for deriving the aerodynamic equations. We used a Blade-Element-Theory computation to calculate both lift and drag coefficients using the FoilSim III simulator provided by NASA [54]. Our UAV has a lift coefficient of $C_L = 1.6$ and a drag coefficient of $C_D = 0.042$, since $l/r = 0.7$.

Both the rotational $\dot{\omega}_b$ and translational $\dot{\upsilon}_b$ accelerations can be derived from the Newton–Euler formulation, as:

$$
\dot{V}_b = I_b^{-1}\,[F_b - \dot{I}_b V_b], \tag{A2}
$$

where $I_b \in \mathbb{R}^{6\times6}$ is the spatial inertia operator calculated at the Center of Mass (CM) of the body frame {*b*}. It can be expressed as:

$$I_b = \begin{bmatrix} J_b & 0 \\ 0 & mU \end{bmatrix} = \begin{bmatrix} I_{xx} & -I_{xy} & -I_{xz} & 0 & 0 & 0 \\ -I_{xy} & I_{yy} & -I_{yz} & 0 & 0 & 0 \\ -I_{xz} & -I_{yz} & I_{zz} & 0 & 0 & 0 \\ 0 & 0 & 0 & m & 0 & 0 \\ 0 & 0 & 0 & 0 & m & 0 \\ 0 & 0 & 0 & 0 & 0 & m \end{bmatrix} \tag{A3}$$

where $J_b \in \mathbb{R}^{3\times3}$ is the inertia tensor with $\mathrm{diag}(I_{xx}, I_{yy}, I_{zz})$ being the moments of inertia, $m$ is the mass of the UAV, and $U$ is a $3\times3$ identity operator. Likewise, the term $F_b \in \mathbb{R}^{6\times1}$ in Equation (A2) is the 6D spatial force acting on the CM of {*b*}. $F_b$ contains the effects caused by both the inertial ($N_b$) and aerodynamic ($T_b$) forces acting on the body frame:

$$F_b = \begin{bmatrix} N_b \\ f_b \end{bmatrix} = \begin{bmatrix} N_{b,x} + \tau_{\phi} \\ N_{b,y} + \tau_{\theta} \\ N_{b,z} + \tau_{\psi} \\ f_{b,x} \\ f_{b,y} \\ f_{b,z} \end{bmatrix} = \begin{bmatrix} \dot{\theta}\dot{\psi}\,[I_{yy} - I_{zz}] + s_{oi,b}\,\hat{j}\,(T_4 - T_3) \\ \dot{\phi}\dot{\psi}\,[I_{zz} - I_{xx}] + s_{oi,b}\,\hat{i}\,(T_1 - T_2) \\ \dot{\theta}\dot{\phi}\,[I_{xx} - I_{yy}] + (T_3 + T_4 - T_1 - T_2) \\ (s\psi\, s\phi + c\psi\, s\theta\, c\phi)\, T_b \\ (-c\psi\, s\phi + s\psi\, s\theta\, c\phi)\, T_b \\ mg - (c\psi\, c\phi)\, T_b \end{bmatrix} \tag{A4}$$

In Equation (A4), we have determined an expression that incorporates the thrust produced by each independent rotor ($T_{oi}$), $\forall oi: 1\dots4$. These aerodynamic terms govern the generation of the rolling ($\tau_{\phi}$), pitching ($\tau_{\theta}$), and yawing ($\tau_{\psi}$) torques at the CM of the UAV, where the term $s_{oi,b} = 0.18$ m is the distance from each rotor to the body frame (see Figure A1a). In addition, $T_{oi}$ depends on the lift ($L$) and drag ($D$) forces acting on each propeller, as shown in Figure A1b. It can be written as:

$$T_{oi} = L + D = \tfrac{1}{2}\,\rho_{air}\,\omega_{oi}^{2}\, A_{prop}\,(C_L + C_D), \tag{A5}$$

where $\rho_{air} = 1.20$ kg m$^{-3}$ is the density of air, $\omega_{oi}$, $\forall oi: 1\dots4$, is the rotor speed, $A_{prop} = 0.013$ m$^2$ is the propeller transversal area, $C_L$ is the lift coefficient, and $C_D$ is the drag coefficient. As shown in Figure A1b, we have estimated these values as $C_L = 1.6$ and $C_D = 0.042$, respectively. In this sense, the net vertical thrust ($T_b$) generated at the CM of the UAV can be calculated as:

$$T\_b = \sum\_{oi=1}^{4} T\_{oi} \tag{A6}$$

As observed in Equation (A4), $T_b$ governs the generation of the linear forces. The expressions $s\psi$, $c\psi$ denote $\sin(\psi)$ and $\cos(\psi)$, respectively. Finally, the term $m = 0.43$ kg is the mass of the UAV and $g = 9.81$ m s$^{-2}$ is the gravitational acceleration. In Appendix B, we derive the control strategy to regulate the angular motions precisely. Since our control approach depends on the UAV model, we summarize the computational steps to calculate the EoM in Algorithm A1.
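As a minimal illustration of Equations (A5) and (A6), the following Python sketch computes the per-rotor and net thrust from measured rotor speeds using the blade parameters listed above; the function and variable names are ours and do not belong to the flight software.

```python
import numpy as np

RHO_AIR = 1.20   # air density [kg m^-3]
A_PROP  = 0.013  # propeller transversal area [m^2]
C_L     = 1.6    # lift coefficient (FoilSim III estimate)
C_D     = 0.042  # drag coefficient (FoilSim III estimate)

def rotor_thrust(omega_oi):
    """Thrust of a single rotor, Equation (A5): T_oi = L + D."""
    lift = 0.5 * RHO_AIR * omega_oi**2 * A_PROP * C_L
    drag = 0.5 * RHO_AIR * omega_oi**2 * A_PROP * C_D
    return lift + drag

def net_thrust(omega_rotors):
    """Net vertical thrust at the CM, Equation (A6): sum of the four rotor thrusts."""
    return float(np.sum([rotor_thrust(w) for w in omega_rotors]))
```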

**Algorithm A1:** EoM Computation.

**Step 1: Aerodynamic forces**
1. Read the rotor speeds from the encoders: $\omega_{oi}$, $\forall oi: 1\dots4$.
2. Calculate the lift and drag forces acting on each propeller: $L \leftarrow \frac{1}{2} C_L \rho_{air} \omega_{oi}^2 A_{prop}$, $D \leftarrow \frac{1}{2} C_D \rho_{air} \omega_{oi}^2 A_{prop}$.
3. Calculate the thrust produced by each rotor: $T_{oi} = L + D$, $\forall oi: 1\dots4$.
4. Calculate the net thrust produced at the CM: $T_b = \sum_{oi=1}^{4} T_{oi}$.
5. Rotational (rolling, pitching, and yawing) torques acting on the body frame: $\tau_{\phi} \leftarrow s_{oi,cm}\,\hat{j}\,(T_4 - T_3)$, $\tau_{\theta} \leftarrow s_{oi,cm}\,\hat{i}\,(T_1 - T_2)$, $\tau_{\psi} \leftarrow \tau_3 + \tau_4 - \tau_1 - \tau_2$.
6. Linear forces acting on the body frame: $f_{b,x} \leftarrow (s\psi\, s\phi + c\psi\, s\theta\, c\phi)\, T_b$, $f_{b,y} \leftarrow (-c\psi\, s\phi + s\psi\, s\theta\, c\phi)\, T_b$, $f_{b,z} \leftarrow (-c\psi\, c\phi)\, T_b$.
7. 6D aerodynamic forces: $[\tau_{\phi} \;\; \tau_{\theta} \;\; \tau_{\psi} \;\; f_{b,x} \;\; f_{b,y} \;\; f_{b,z}]^T$.

**Step 2: Inertial forces**
1. Calculate the 6D inertia operator: $I_b = \begin{bmatrix} J_b & 0 \\ 0 & mU \end{bmatrix}$.
2. Calculate the inertial terms: $N_{b,x} \leftarrow \dot{\theta}\dot{\psi}\,[I_{yy} - I_{zz}]$, $N_{b,y} \leftarrow \dot{\phi}\dot{\psi}\,[I_{zz} - I_{xx}]$, $N_{b,z} \leftarrow \dot{\theta}\dot{\phi}\,[I_{xx} - I_{yy}]$, $f_{b,z} \leftarrow mg - \cos(\psi)\cos(\phi)\, T_b$.
3. Assemble the 6D force: $F_b \leftarrow [N_{b,x} + \tau_{\phi} \;\; N_{b,y} + \tau_{\theta} \;\; N_{b,z} + \tau_{\psi} \;\; f_{b,x} \;\; f_{b,y} \;\; f_{b,z}]^T$.

**Step 3: 6D Equations of Motion (EoM)**
1. $\dot{V}_b \leftarrow I_b^{-1}\,[F_b - \dot{I}_b V_b]$.
2. Return $\dot{V}_b$.
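For reference, the sketch below implements Algorithm A1 in Python under two simplifying assumptions: the rate of the inertia operator $\dot{I}_b$ is neglected, and the yawing torque follows Equation (A4) (thrust differences) rather than the rotor reaction torques. It is meant as an illustration of the computation, not as the onboard implementation.

```python
import numpy as np

def eom_step(omega_rotors, rates, angles, J, m, s_b=0.18, g=9.81,
             rho=1.20, A=0.013, CL=1.6, CD=0.042):
    """One evaluation of Algorithm A1: returns the 6D spatial acceleration V_b_dot.

    omega_rotors : (4,) rotor speeds [rad/s], rotors 1..4 at indices 0..3
    rates        : (phi_dot, theta_dot, psi_dot) body angular rates
    angles       : (phi, theta, psi) Euler angles
    J            : 3x3 inertia tensor with diag(Ixx, Iyy, Izz)
    """
    phi_d, th_d, psi_d = rates
    phi, th, psi = angles
    Ixx, Iyy, Izz = np.diag(J)
    s, c = np.sin, np.cos

    # Step 1: aerodynamic forces
    T = 0.5 * rho * np.asarray(omega_rotors)**2 * A * (CL + CD)  # per-rotor thrust (A5)
    Tb = T.sum()                                                  # net thrust (A6)
    tau_phi   = s_b * (T[3] - T[2])                               # rolling torque
    tau_theta = s_b * (T[0] - T[1])                               # pitching torque
    tau_psi   = T[2] + T[3] - T[0] - T[1]                         # yawing torque, per Equation (A4)
    f_x = (s(psi)*s(phi) + c(psi)*s(th)*c(phi)) * Tb
    f_y = (-c(psi)*s(phi) + s(psi)*s(th)*c(phi)) * Tb

    # Step 2: inertial forces
    N_x = th_d * psi_d * (Iyy - Izz)
    N_y = phi_d * psi_d * (Izz - Ixx)
    N_z = th_d * phi_d * (Ixx - Iyy)
    f_z = m*g - c(psi)*c(phi) * Tb                                # gravity plus vertical thrust component
    Fb = np.array([N_x + tau_phi, N_y + tau_theta, N_z + tau_psi, f_x, f_y, f_z])

    # Step 3: 6D equations of motion, with I_b_dot assumed negligible
    Ib = np.block([[J, np.zeros((3, 3))], [np.zeros((3, 3)), m*np.eye(3)]])
    return np.linalg.solve(Ib, Fb)
```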

#### **Appendix B. Backstepping + DAF Control Derivation**

The proposed control law must be sensitive to small changes in both angular motions; therefore, the error dynamics can be defined as a function of the angular rates, as:

$$\begin{aligned} \dot{e}_{\phi} &= \dot{\phi}^{d} - \omega_{x} \\ \dot{e}_{\theta} &= \dot{\theta}^{d} - \omega_{y} \end{aligned} \tag{A7}$$

In Equation (A7), both $\omega_x$ and $\omega_y$ are measured by the IMU sensor onboard the UAV. The goal is to obtain desired angular acceleration terms within the control law to account for small angular rate changes. These terms are called the desired acceleration function (DAF):

$$\begin{aligned} \ddot{\phi}^d &= f(\phi, \dot{\phi}, F_b) = \begin{bmatrix} 1 & 0 & 0 & 0 & 0 & 0 \end{bmatrix} \dot{V}_b \\ \ddot{\theta}^d &= f(\theta, \dot{\theta}, F_b) = \begin{bmatrix} 0 & 1 & 0 & 0 & 0 & 0 \end{bmatrix} \dot{V}_b \end{aligned} \tag{A8}$$

Both DAF terms $\ddot{\phi}^d$ and $\ddot{\theta}^d$ are extracted from the spatial acceleration $\dot{V}_b \in \mathbb{R}^{6\times1}$ computed in Algorithm A1. To make the DAF terms from Equation (A8) explicit within the backstepping, in the following we focus on deriving the control law for roll ($u_{\phi}$).
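In implementation terms, extracting the DAF terms of Equation (A8) amounts to selecting the first two entries of the spatial acceleration returned by Algorithm A1; a minimal sketch follows, with a placeholder vector standing in for the actual EoM output.

```python
import numpy as np

# V_b_dot = [phi_ddot, theta_ddot, psi_ddot, x_ddot, y_ddot, z_ddot], per Equation (A1)
Vb_dot = np.zeros(6)        # placeholder; in flight this is the output of Algorithm A1

phi_ddot_d   = Vb_dot[0]    # DAF term for roll,  Equation (A8)
theta_ddot_d = Vb_dot[1]    # DAF term for pitch, Equation (A8)
```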

From Equation (A7), we introduce a virtual control law that governs the error dynamics, yielding a second tracking error $e_2 = \omega_x^d - \omega_x$, where $\dot{\phi}^d \rightarrow \omega_x^d$. In this sense, a proportional-integral-derivative structure is defined for the virtual control law $\omega_x^d$, as:

$$
\omega\_x^d = k\_p e\_\phi + \dot{\phi}^d + k\_i \int e\_\phi(t) dt \quad \text{(A9)}
$$

where $k_p$ and $k_i$ must be positive constants, since the BS requires a positive-definite Lyapunov function for stabilizing the tracking error: $L(e_{\phi}) = \frac{e_{\phi}^2}{2}$. Now, replacing Equation (A9) into the virtual control error $e_2$ yields:

$$
e_2 = \omega_x^d - \omega_x = k_p e_{\phi} + \dot{\phi}^d + k_i \int e_{\phi}(t)\,dt - \omega_x \tag{A10}
$$

By following the same approach as in Equation (A7), the error dynamics for $\dot{e}_2$ are determined as:

$$
\dot{e}\_2 = k\_p \dot{e}\_\phi + \ddot{\phi}^d + k\_i e\_\phi - \dot{\omega}\_x \tag{A11}
$$

In Equation (A11), we have derived $\ddot{\phi}^d$, corresponding to the DAF term for roll (see Equation (A8)). In this regard, the computation of Algorithm A1 is required to close the attitude loop. Now, the control action $u_{\phi}$ is determined as:

$$\begin{aligned} u_{\phi} \rightarrow \tau_{\phi} &= I_{xx}\,\ddot{\phi}, \\ \ddot{\phi} \rightarrow \dot{\omega}_x &= I_{xx}^{-1}\,\tau_{\phi} \end{aligned} \tag{A12}$$

Replacing $\dot{\omega}_x$ from Equation (A12) in (A11) and isolating the control action yields:

$$
u_{\phi} = I_{xx}\,[k_p \dot{e}_{\phi} + \ddot{\phi}^d + k_i e_{\phi} - \dot{e}_2] \tag{A13}
$$

Since the calculation of $\dot{e}_{\phi}$ and $\dot{e}_2$ for the real system would introduce accumulative numerical errors, we need to rewrite the control law to depend on the tracking errors $e_{\phi}$ and $e_2$. By isolating $\omega_x = \omega_x^d - e_2$ from Equation (A10) and replacing it into Equation (A7):

$$
\dot{e}_{\phi} = \dot{\phi}^d - (\omega_x^d - e_2) \tag{A14}
$$

Replacing Equation (A14) into (A13):

$$u_{\phi} = I_{xx}\,[k_p(\dot{\phi}^d - \omega_x^d + e_2) + \ddot{\phi}^d + k_i e_{\phi} - \dot{e}_2], \tag{A15}$$

and replacing $\omega_x^d$ from Equation (A9):

$$\begin{aligned} u_{\phi} &= I_{xx}\,[k_p(\dot{\phi}^d - (k_p e_{\phi} + \dot{\phi}^d + k_i \textstyle\int e_{\phi}) + e_2) + \ddot{\phi}^d + k_i e_{\phi} - \dot{e}_2] \\ &= I_{xx}\,[k_p(e_2 - k_p e_{\phi} - k_i \textstyle\int e_{\phi}) + \ddot{\phi}^d + k_i e_{\phi} - \dot{e}_2] \end{aligned} \tag{A16}$$

Finally, the expression for $\dot{e}_2$ can be rewritten to follow the same form as $\dot{e}_{\phi}$ in Equation (A14). Likewise, $\omega_x^d$ is also replaced by Equation (A9):

$$\begin{aligned} \dot{e}_2 &= \dot{\phi}^d - \omega_x^d + e_2 \\ &= \dot{\phi}^d - (k_p e_{\phi} + \dot{\phi}^d + k_i \textstyle\int e_{\phi}) + e_2 \\ &= e_2 - k_p e_{\phi} - k_i \textstyle\int e_{\phi} \end{aligned} \tag{A17}$$

In Equation (A17), the integral of the error can be eliminated since the control law in Equation (A16) already ensures zero steady-state error for $e_{\phi}$. Replacing $\dot{e}_2 = e_2 - k_p e_{\phi}$ in Equation (A16):

$$\begin{aligned} u_{\phi} &= I_{xx}\,[k_p(e_2 - k_p e_{\phi} - k_i \textstyle\int e_{\phi}) + \ddot{\phi}^d + k_i e_{\phi} - e_2 + k_p e_{\phi}] \\ &= I_{xx}\,[e_{\phi}(k_p - k_i - k_p^2) + e_2(k_p - 1) - k_p k_i \textstyle\int e_{\phi} + \ddot{\phi}^d] \end{aligned} \tag{A18}$$

Equation (A18) presents the control law to regulate $\phi$. This controller achieves zero steady-state error for roll via $\int e_{\phi}$, it is sensitive to small variations in the roll rate via $e_2$, and it directly depends on the UAV model via the DAF term $\ddot{\phi}^d$ (Algorithm A1). By following the same structure as Equation (A18), the control law to regulate the pitch angular motion ($\theta$) is:

$$u_{\theta} = I_{yy}\,[e_{\theta}(k_{p,2} - k_{i,2} - k_{p,2}^2) + e_2(k_{p,2} - 1) - k_{p,2} k_{i,2} \textstyle\int e_{\theta} + \ddot{\theta}^d] \tag{A19}$$
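A compact Python sketch of the resulting BS+DAF attitude law is given below; it follows Equations (A9), (A10), (A18), and (A19) literally, while the gain values, the sampling time, and the simple rectangular integration are our assumptions rather than the tuned flight parameters.

```python
class BSDAFAxisController:
    """Backstepping + DAF control for one angular axis (roll with Ixx; pitch is identical with Iyy)."""

    def __init__(self, inertia, kp, ki, dt):
        self.inertia = inertia   # Ixx for roll, Iyy for pitch
        self.kp, self.ki = kp, ki
        self.dt = dt             # control sampling time [s] (assumed)
        self.int_e = 0.0         # running integral of the tracking error

    def update(self, angle_ref, angle, angle_ref_rate, rate, daf):
        """angle_ref_rate is the desired angular rate (phi_dot^d); daf is the DAF term from Algorithm A1."""
        e = angle_ref - angle                                            # tracking error e_phi
        self.int_e += e * self.dt                                        # integral of e_phi (rectangular rule)
        omega_d = self.kp * e + angle_ref_rate + self.ki * self.int_e    # virtual control law (A9)
        e2 = omega_d - rate                                              # virtual-control error e_2 (A10)
        kp, ki = self.kp, self.ki
        # Control action, Equation (A18) for roll / (A19) for pitch
        return self.inertia * (e * (kp - ki - kp**2) + e2 * (kp - 1.0)
                               - kp * ki * self.int_e + daf)
```

Two such controllers, one for roll ($I_{xx}$ with $k_p$, $k_i$) and one for pitch ($I_{yy}$ with $k_{p,2}$, $k_{i,2}$), close the attitude loop, with the DAF terms refreshed from Algorithm A1 at every control step.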

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
