**Coupling Waveguide-Based Micro-Sensors and Spectral Multivariate Analysis to Improve Spray Deposit Characterization in Agriculture**

**Anis Taleb Bendiab 1,2, Maxime Ryckewaert 1, Daphné Heran 1, Raphaël Escalier 2, Raphaël K. Kribich 3, Caroline Vigreux 2,\* and Ryad Bendoula 1,\***


Received: 7 August 2019; Accepted: 24 September 2019; Published: 26 September 2019

**Abstract:** The leaf coverage surface is a key measurement of the spraying process to maximize spray efficiency. To determine the leaf coverage surface, we propose the development of optical micro-sensors that, coupled with a multivariate spectral analysis, will be able to measure the volume of the droplets deposited on their surface. Rib optical waveguides based on Ge-Se-Te chalcogenide films were manufactured and their light transmission was studied in response to the deposition of demineralized water droplets on their surface. The measurements were performed using a dedicated spectrophotometric bench to record the transmission spectra at the output of the waveguides, before (reference) and after drop deposition, in the wavelength range between 1200 and 2000 nm. A dip at 1450 nm was observed in the relative transmission spectra. This corresponds to the first overtone of the O–H stretching vibration in water. This result tends to show that the optical intensity decrease observed after droplet deposition is partly due to absorption, by water, of the light energy carried by the evanescent field of the guided mode. The probe based on Ge-Se-Te rib optical waveguides is thus sensitive throughout the whole range of volumes studied, i.e., from 0.1 to 2.5 μL. Principal Component Analysis and Partial Least Squares multivariate techniques then allowed the analysis of the statistics of the measurements and of the predictive character of the transmission spectra. They confirmed the sensitivity of the measurement system to water absorption, and the predictive model allowed the prediction of droplet volumes on an independent set of measurements, with a correlation of 66.5% and a precision of 0.39 μL.

**Keywords:** optical micro-sensors; crop protection; precision agriculture; infrared spectroscopy; principal component analysis (PCA); partial least squares (PLS); droplet characterization

#### **1. Introduction**

Despite the various campaigns conducted to initiate the reduction of pesticides in agriculture, their use has further increased in recent years [1]. The main challenge is to reduce the use of products while improving plant protection. This challenge motivates numerous research projects that aim to find alternative solutions [2] or to develop decision support tools to reduce doses of applied pesticides [3,4]. Meeting this challenge could significantly reduce the environmental risks associated with agricultural activities. Improving spray efficiency can be a solution implemented to optimize plant protection use in the fields. However, the efficiency of the treatments depends largely on the spray drift which must be minimized, and on the foliar coverage, which must be as large as possible.

Different authors have addressed spray drift. Examples include the work of Grella et al. [5], who set up a 20 m-long device with Petri dishes spaced every 0.5 m. The analysis of the different Petri dishes after spraying with water, to which a yellow marker (E-302 Tartrazine yellow dye tracer) had been added, allowed them to compare different spray devices and to highlight those generating the least drift. To minimize spray drift, knowledge of droplet size is of paramount importance. Indeed, studies have shown, for example, that the use of sprayers distributing too fine drops leads to a greater drift of the spray and therefore reduces its effectiveness [6]. Techniques to measure the size of the spray droplets can therefore be extremely useful. Different methods already exist. Balsari et al. [7] used, for example, a laser diffraction system (Malvern SprayTec instrument) coupled with the SprayTec software to obtain droplet size data from different sprayers. We can also cite the Phase Doppler Particle Analyzer (PDPA) method [8]. Other optical methods include the use of a non-contact optical sensor to detect and measure droplets in flight [9] or interferometric techniques (IPI: Interferometric Particle Imaging) [10].

The characterization of leaf coverage surface has also interested researchers. The techniques used were mostly based on water-sensitive papers (WSP) [11], more recently coupled with imaging [12–14]. In Reference [14], the authors propose a comparative study of three methods for using data collected on sensitive paper: visual assessment of coverage rate, visual droplet counting, and measurement by an imaging system. The correlation obtained was very good only for coverage rates below 20%. In addition to techniques using water-sensitive papers, other methods involve the use of fluorescent tracers. We can mention the work of Llop et al. [15], in which a fluorescent tracer (Helios SC 500; 0.1% *v*/*v*) was used to trace the deposition on the leaves of the plants (in ng per cm<sup>2</sup>) as well as the coverage rate of the leaves by the sprayed product. In Reference [16], the authors tried to correlate the surface coverage rate and the droplet size distribution. To do this, they simultaneously sprayed sensitive papers and Petri dishes containing silicone oil with water to which a red Ponceau marker (at a concentration of 2%) had been added.

As existing qualitative analysis methods adopting the use of sensitive paper or tracers are often not "smart" enough to achieve rapid collection with a single click and automatically adapt to the environment (e.g., changing lighting), researchers have recently proposed an intelligent node based on a vision sensor to automatically collect images of droplet deposition [17]. Other authors have considered the use of a commercial leaf wetness sensor to monitor spraying and have compared the results obtained with the WSP technique [18].

The final objective of our work is to design low-cost micro-sensors that can be used directly in the field to quickly measure the leaf coverage surface. It should be noted that our measurement of the coverage surface is an indirect measurement, since we intend to determine the covered surface via a measurement of the volume of droplets deposited on the sensor. Some authors have already proposed such a solution, consisting of an electronic detection system based on resistance [19,20]. However, its resolution is not yet good enough to detect fine drops. In our case, we propose an integrated optics solution. In a previous work, we fabricated straight waveguides based on Ge-Se-Te chalcogenide films and tested their sensitivity to water droplet deposition at the wavelength of 1.55 μm. As expected, water absorbed part of the evanescent wave of the guided light, leading to a decrease in the transmitted intensity. Moreover, we showed that the greater the droplet volume and the greater the number of drops, the greater the decrease in intensity [21]. Nevertheless, we observed a non-linearity of the results, with a lower sensitivity for large volumes. Moreover, we limited our study to volumes between 2 and 10 μL, with a step of 2 μL. In this article, we propose to go a little further, even though we are still at an early stage of sensor development. The goal was to develop a probe able to measure smaller drops. The studied volumes ranged from 0.1 to 2.5 μL, and a step of 0.1 μL was chosen to estimate the probe precision. Furthermore, we propose, with the same straight waveguides as in our previous work [21], to no longer work at a single wavelength but to use spectroscopy, which, coupled with multivariate analysis, should allow us to increase the sensitivity and to consider accessing a measurement of the composition of the sprayed liquid. Among the different methods of multivariate analysis, the principal component analysis (PCA) and partial least squares (PLS) techniques, which are commonly used in chemometrics [22], were chosen.

#### **2. Materials and Methods**

#### *2.1. Waveguide Manufacture and Characterization*

#### (a) Manufacture

According to previous studies [21,23–25], rib-type waveguides were fabricated. The waveguiding structures were made from a 6 μm-thick Ge25Se65Te10 cladding layer (refractive index n = 2.55 at the wavelength λ = 1.55 μm) deposited onto silicon substrates and a 6 μm-thick Ge25Se55Te20 core layer (n = 2.65 at λ = 1.55 μm) on top, the core layer being then lithographed and the unprotected parts etched down to 4 μm. Ge-Se-Te films were deposited by thermal co-evaporation, using a MEB 500 device from PLASSYS, Marolles-en-Hurepoix, France. Two current-heated sources were used to evaporate tellurium and selenium, and an electron beam evaporator was used for germanium. The three sources were placed in a specific configuration to obtain homogeneous films over an area of 4 cm in diameter. Silicon substrates of 3 cm × 4 cm were used to fabricate the components and microscope slides were used for further film characterization. Before deposition, the co-evaporation chamber was evacuated down to 10<sup>−5</sup> Pa to avoid contamination. During deposition, the substrate holder was rotated for better homogenization and heated at 70 °C by hot water circulation. The elemental deposition rates were controlled by three independent thin film deposition controllers (from TELEMARK, Battle Ground, WA, USA and INFICON, Bad Ragaz, Switzerland). The deposition process was initiated when the three elemental deposition rates were stable and was stopped when a total thickness of 6 μm was reached for each film.

Once the cladding layer and the core layer were deposited onto the silicon substrates, the geometry of the core layer was modified by photolithography and ion beam etching, successively. For the photolithographic step, the AZ4533 positive resin and the AZ726 developer from MicroChemicals (Ulm, Germany) were used, and the samples were exposed for 7 s under a 400 W UV lamp through a mask patterned with opaque bands of different widths (ranging from 4 to 42 μm). For the etching step, performed using a set-up provided by PLASSYS (Marolles-en-Hurepoix, France), a mixture of argon and a few percent of oxygen was used. The working pressure was fixed at 4.5 × 10<sup>−2</sup> Pa, the partial pressure PO2 being first set to 7.7 × 10<sup>−3</sup> Pa. A voltage of 400 V was applied, and an etching duration of 50 min was chosen to achieve an etching depth of 4 μm [21].

#### (b) Characterization

After deposition, each film was analyzed in terms of thickness, composition, and refractive index using a DEKTAK mechanical profilometer (BRUKER, Billerica, MA, USA), an S4500 energy-dispersive X-ray spectrometer (HITACHI, Tokyo, Japan), and by exploiting NIR spectra recorded in the 500–2500 nm range with a Cary 5000 spectrophotometer (AGILENT TECHNOLOGIES, Santa Clara, CA, USA), respectively. After etching, the etching depth was checked by profilometry and the component was observed from above and on the edge using an Inspect S50 scanning electron microscope (FEI, Hillsboro, OR, USA).

#### *2.2. Optical Setup*

The fabricated straight waveguides provided either multi-mode or single-mode propagation behaviour, depending on their width and on the wavelength. The bench used for their characterization was equipped with a supercontinuum laser source (SM-100 NIR; LEUKOS, Limoges, France) combined with a pulse generator (4422; SEFRAM, Saint-Etienne, France) (Figure 1). A long-pass wavelength filter (FELH1150nm; THORLABS, Newton, NJ, USA) was installed to match the transmission range of the Ge-Se-Te waveguides. To minimize the coupling losses, we used a 3.5 mm focal length lens (87–127 AR; Edmund Optics, Barrington, NJ, USA) with a 0.4 numerical aperture to focus the signal from the supercontinuum fiber at the waveguide entrance. The waveguide sample was held on a three-axis translation stage platform (M-562; Newport) and fixed by a vacuum mount (HWV001; THORLABS, Newton, NJ, USA). Moreover, we used a convex lens (AL1815-C; THORLABS, Newton, NJ, USA) placed 11 mm from the output of the waveguide sample to collimate the output beam. This beam hit a beam splitter (BSF10-C; THORLABS, Newton, NJ, USA): 10% of the beam was reflected onto a camera (WDY SMR S320-VS; NIT, Verrières le Buisson, France) to verify injection into the correct rib waveguide, and the remaining 90% was transmitted to a lens (AL1225 M-C; THORLABS, Newton, NJ, USA) to inject light into the spectrometer (UV-VIS-NIR TF; ARCoptix, Neuchatel, Switzerland). This spectrometer performs Fourier-transform scanning in the NIR range (FT-NIR). The measurement spectral range was between 1200 and 2000 nm.

**Figure 1.** Spectrophotometric bench used to characterize droplet deposits onto the rib waveguide surface. Zoom on the component to visualize the droplet deposited on the top to perform the tests.

#### *2.3. Droplet Deposition and Spectra Acquisition*

The influence of the droplet deposition on the waveguide transmission was studied by recording the output spectrum before and after droplet deposition onto the surface of a rib waveguide. It should be noted that a multi-mode (and therefore wide) guide, ensuring the highest possible light intensity collection, was selected for the experiments. The temperature of the room was regulated at 24 ± 1 °C. Demineralized water was used in all the experiments and different droplet volumes were tested. A micropipette was used to dispense droplets from 0.1 to 2.5 μL in volume, in steps of 0.1 μL, onto the surface of the rib waveguides. Between two droplet depositions, the sample was dried with dry air and it was verified that the initial transmission of the waveguide had not been altered. All the experiments were performed three times on three different days. Each series, corresponding to one day, contains 25 spectra for deposit volumes from 0.1 to 2.5 μL. In total, 75 spectra were thus recorded. The series are called S1, S2 and S3.

For each spectrum, the transmittance T(λ) was calculated using Equation (1), where I0 is the spectrum recorded before the droplet deposit, I the spectrum after the droplet deposit, and In the dark spectrum (recorded when the shutter of the spectrometer is closed and no light can enter it):

$$T(\lambda) = \frac{I(\lambda) - I_{n}(\lambda)}{I_{0}(\lambda) - I_{n}(\lambda)},\tag{1}$$
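
As an illustration, the sketch below shows a minimal implementation of Equation (1) (Python/NumPy; the function name and the assumption that the three spectra are sampled on the same wavelength grid are ours, not the authors'):

```python
import numpy as np

def transmittance(I, I0, In):
    """Relative transmittance T(lambda) following Equation (1).

    I  : spectrum recorded after droplet deposition
    I0 : reference spectrum recorded before deposition
    In : dark spectrum (spectrometer shutter closed)
    All three are assumed to be 1-D arrays on the same wavelength grid.
    """
    I, I0, In = (np.asarray(a, dtype=float) for a in (I, I0, In))
    return (I - In) / (I0 - In)
```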

#### *2.4. Multivariate Data Analysis*

#### (a) Principal Component Analysis (PCA)

The objective of multivariate data analysis in spectroscopy is to extract information from the different spectra by projecting the results onto a reduced basis containing far fewer dimensions than the number of acquisition points of a spectrum (number of wavelengths) [22]. The underlying hypothesis is that this information can be obtained by studying the differences between the spectra. Ideally, spectra are expected to be identical for equal droplet volumes and, on the contrary, the spectra corresponding to the smallest volumes are expected to differ somewhat from the spectra obtained for the largest volumes. However, several parameters can interfere with the measurement, such as the position of the droplet on the waveguide, the temperature, or the light injection efficiency; indeed, the latter can vary due to misalignment between series of similar experiments or to laser source fluctuations between the reference (I0) and measured (I) spectra, which makes it difficult to exploit the raw spectra. One of the objectives of the PCA will first be to identify the outliers, corresponding to aberrant or atypical spectra, in order to clean the dataset. Another objective will consist in highlighting the differences or similarities between the spectra.

PCA consists in defining new variables that cause as little information loss as possible. Several formalisms can be used, such as matrix notation [26]. Thus, we can write the matrix X of the initial data (a matrix of dimensions *N* × *P*, *N* corresponding to observations–light transmissions and *P* to variables–wavelengths) as the sum of A terms (A being the dimensionality of the new basis, lower than *N* and *P*), which explain most of the data, and a residual matrix E corresponding to "noise" (the part of the data that is not explained by the analysis). Each of the A terms is itself the product of two vectors ti and pi′, so that the matrix X of the initial data can be written as in Figure 2:

**Figure 2.** Writing of the matrix X of the initial data according to the PCA model.

The *ti* are called score vectors, whereas the *pi* are called loading vectors [26]. The *pi* are the coordinates of a principal component (PC) in the original space (the wavelength space, i.e., the PC is a spectrum), while the *ti* are the coordinates of the measured spectra in the PC basis. *p1* is computed to maximize the variance of the measured spectra along PC1, and so on for the remaining variance along the other PCi. Thus, the PCi are ordered by decreasing percentage of explained variance of the recorded spectra, PC1 corresponding to the highest percentage. Usually, the score plot of the first two score vectors *t1* and *t2*, which thus capture the essential information from the spectra, will highlight similarities or differences between the initial spectra and will show outliers.
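
As a hedged illustration of how scores, loadings, and explained-variance percentages are obtained in practice, a short Python/scikit-learn sketch is given below; the original computations were performed in Matlab, so the library choice, the variable names, and the placeholder data matrix are ours:

```python
import numpy as np
from sklearn.decomposition import PCA

# X would hold the recorded spectra, one row per measurement (N observations)
# and one column per wavelength (P variables). Random placeholder data here.
rng = np.random.default_rng(0)
X = rng.normal(size=(75, 500))

pca = PCA(n_components=2)
scores = pca.fit_transform(X)        # t_i: coordinates of each spectrum in the PC basis
loadings = pca.components_           # p_i': each row is a PC expressed in wavelength space
explained = 100 * pca.explained_variance_ratio_
print(f"PC1: {explained[0]:.2f}%  PC2: {explained[1]:.2f}%")
```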

#### (b) Regression Model

In chemometrics, partial least squares (PLS) regression is the most common regression method for multivariate data [27]. In spectroscopy, PLS is used in most cases for relating spectral data to measured variables [28]. In this study, PLS regression was used to model droplet volume using T(λ). A general PLS model was built using S1 and S2 calibration sets (50 samples) to predict the samples of the independent test set S3 (25 samples).

As a first preprocessing step, the data acquired in transmittance were transformed into absorbance [−log T(λ)]. The resulting spectra were smoothed using a Savitzky–Golay algorithm [29] with a second-order polynomial and 15 points. Then, standard normal variate (SNV) [30] was applied to remove the baseline. The number of latent variables was determined by comparing performances obtained by leave-one-out cross-validation [31].
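
A possible implementation of this preprocessing chain is sketched below (Python/SciPy; this is an illustrative re-implementation, not the authors' Matlab code, with the window length and polynomial order taken from the text):

```python
import numpy as np
from scipy.signal import savgol_filter

def preprocess(T):
    """Absorbance transform, Savitzky-Golay smoothing, and SNV baseline removal.

    T : transmittance spectra, shape (n_samples, n_wavelengths).
    """
    A = -np.log10(np.asarray(T, dtype=float))                      # transmittance -> absorbance
    A = savgol_filter(A, window_length=15, polyorder=2, axis=1)    # 2nd-order, 15-point smoothing
    # Standard normal variate: centre and scale each spectrum individually
    return (A - A.mean(axis=1, keepdims=True)) / A.std(axis=1, keepdims=True)
```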

The performance of the cross-validation (using the S1 and S2 sets) and of the prediction (using the independent set S3) was assessed through the number of latent variables used in the models, the coefficient of determination R<sup>2</sup>, the standard error of cross-validation (SECV), and the standard error of prediction (SEP) corrected from the bias.
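
The sketch below illustrates one way to select the number of latent variables by leave-one-out cross-validation and to compute SECV, SEP (corrected from the bias), and R²; it is an assumption-laden Python/scikit-learn rendering of the workflow, not the original Matlab implementation, and all names are placeholders:

```python
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.model_selection import LeaveOneOut, cross_val_predict

def fit_and_evaluate_pls(X_cal, y_cal, X_test, y_test, max_lv=10):
    """Pick the number of latent variables by LOO-CV, then predict the test set."""
    X_cal, y_cal = np.asarray(X_cal), np.asarray(y_cal, dtype=float)
    X_test, y_test = np.asarray(X_test), np.asarray(y_test, dtype=float)

    best_lv, best_secv = 1, np.inf
    for lv in range(1, max_lv + 1):
        y_cv = cross_val_predict(PLSRegression(n_components=lv), X_cal, y_cal,
                                 cv=LeaveOneOut())
        secv = np.sqrt(np.mean((y_cal - np.ravel(y_cv)) ** 2))
        if secv < best_secv:
            best_lv, best_secv = lv, secv

    model = PLSRegression(n_components=best_lv).fit(X_cal, y_cal)
    y_pred = np.ravel(model.predict(X_test))
    bias = np.mean(y_pred - y_test)
    sep = np.sqrt(np.mean((y_pred - y_test - bias) ** 2))   # SEP corrected from the bias
    r2 = np.corrcoef(y_test, y_pred)[0, 1] ** 2             # squared correlation coefficient
    return best_lv, best_secv, sep, bias, r2
```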

All the computations were performed with Matlab software (Matlab R2015b, Mathworks).

#### **3. Results & Discussion**

#### *3.1. Characterization of the Component*

Table 1 gives the characteristics of the two deposited layers. The thickness errors are less than 2.5%, the compositions differ slightly from those expected (higher atomic percentages in Te but lower atomic percentages in Se), and the refractive indices n deduced from the transmission spectra are slightly lower than the targets, but the index difference Δn remains around 0.1, as intended.

**Table 1.** Ge-Se-Te core and cladding layer compositions and optogeometrical properties.


Figure 3 shows a typical cross-section of a rib waveguide and confirms the etching depth of 4.5 μm measured by profilometry. This value is close to that expected after an etching of 50 min. A thin layer of resin can be seen, since the images were recorded before the step of removing the excess resin that had not been attacked during etching. Note that the 35 μm-wide rib waveguide presented in Figure 3 is the one used for the experiments.

**Figure 3.** SEM picture of a typical rib waveguide.

#### *3.2. Raw Spectra*

A total of 75 relative transmission spectra (corresponding to the three series S1, S2, and S3) were recorded and are superimposed in Figure 4. The first remark is that the raw spectra are not noisy, even for low transmissions, which means that the electronic noise is negligible. The second remark is that there is a certain homogeneity in the spectra, with the presence of a dip at about 1450 nm corresponding to an overtone of the absorption band related to the vibrational energy of the oxygen-hydrogen (O–H) bond in water. This means that our waveguide is sensitive throughout the whole range of volumes studied, i.e., from 0.1 to 2.5 μL, whereas resistance-based sensors could not reach the region below 5 μL [19,20]. However, one can notice that the spectra are not grouped by volume: This lack of repeatability is probably due to random fluctuations of the optical alignment. Moreover, four spectra are somewhat different, with a smaller transmittance amplitude, and correspond to the rejected outliers (dashed spectra in Figure 4). Finally, it is impossible from these raw spectra to extract a correlation between transmittance and volume. This observation also holds if we focus on the water absorption wavelength (1450 nm), as shown in the insert in Figure 4. The lack of correlation between the transmittance of the raw spectra and the droplet volumes justifies the use of a multivariate analysis to solve this problem.

**Figure 4.** Transmittance spectra obtained from measurements. Outliers are highlighted in dashed lines. The insert shows the transmittance at 1450 nm for each volume.

#### *3.3. Principal Component Analysis*

A principal component analysis (PCA) was performed on the whole dataset. It allowed identifying two principal components PC1 and PC2, explaining 95.81% of the variance between the different recorded spectra (85.72% for PC1 and 10.09% for PC2). We can therefore consider that the main information is contained in these two principal components, the rest can be considered as noise.

To go further in the analysis, we plotted the scores on each principal component for each sample, corresponding to the different volumes and measurement days (3 times 25 different volumes). Note that scores represent new coordinates of the different samples on the generated components (Figure 5a for PC1 and Figure 6a for PC2). We also plotted the loadings versus wavelength for each principal component (Figure 5b for PC1 and Figure 6b for PC2). To end, we realized a score plot on PC1 and PC2 generated from the whole dataset (Figure 7).

**Figure 5.** Scores (**a**) and loadings (**b**) related to principal component 1 (PC1). Note that in (a) the samples are classified by increasing volume, regardless of the origin of the series (S1, S2, or S3).

**Figure 6.** Scores (**a**) and loadings (**b**) related to PC2. Note that in (a) the samples are classified by increasing volume, regardless of the origin of the series (S1, S2, or S3).

**Figure 7.** Score plot on PC1 and PC2 generated from the whole dataset.

In Figure 5a we can observe that, except for four points, the scores on PC1 are homogeneously distributed around the axis corresponding to a score of 0. This means that there is no correlation between PC1 and the droplet volume (color coded). The main variance between the different spectra is thus not due to water absorption but rather to physical phenomena. This is confirmed by Figure 5b. Indeed, we can see that the loadings for PC1 are all positive and decrease with the wavelength: PC1 is probably correlated with injection losses (which are expected to be lowest at short wavelengths, where they were optimized) and propagation losses. The four points lying at the margin of Figure 5a correspond to the atypical spectra already observed in Figure 4. They probably correspond to measurement errors.

In Figure 6a, we observe a correlation between the scores on PC2 and the droplet volumes. Indeed, even if the effect is not very pronounced, we can notice that the scores on PC2 are mainly positive for low volumes, decrease when volumes increase, and are mainly negative for high volumes. The principal component PC2 is thus related to the absorption of water. This is confirmed by Figure 6b. Indeed, the loadings for PC2 versus the wavelength form a curve in close agreement with the absorption spectrum of water, with a high absorption peak at around 1450 nm and another one at around 1940 nm. The loadings for PC2 are negative in the spectral range where water absorbs.

Note that the outliers observed in Figure 5a are not observed in Figure 6a. This confirms that these outliers are indeed related to a measurement problem and not to the volume of the droplets.

The conclusions that can be drawn from Figures 4 and 5 are therefore that 85.72% of the explained variance between the different spectra is due to physical phenomena, while only 10.09% of the explained variance is due to the presence of water and therefore to the volume of the droplet deposited. Unfortunately, this means that system disturbances have a greater impact on spectra than the measurand. But the positive point is that the effect of volume is measurable.

In Figure 7, corresponding to the score plot on PC1 and PC2, outliers are still present: This is consistent with the previous observations (Figures 4 and 5a). The four spectra corresponding to the four outliers are removed for the prediction model, because a problem of measurement is suspected. Then, we notice that there are no groups formed by the different points (representing the initial spectra): The distribution of the points is indeed quite homogeneous as volume correlated groups overlap. Nevertheless, we find the same trend in PC2 as in Figure 6a: One can see that points corresponding to low volumes are mainly characterized by positive scores whereas points corresponding to high volumes are mainly characterized by negative scores.

#### *3.4. Prediction Model*

A PLS model was built with a calibration set containing the spectral data from S1 + S2, after removal of the outliers. The results of the calibration model are presented in Figure 8a, where the predicted volumes are plotted versus the real values. The metrological performance of the system (laser + probe + detector + model) can then be extracted. Repeatability is given by the standard error of cross-validation (SECV), or variance, accuracy by the bias, and precision by the coefficient of correlation R<sup>2</sup>. The latter is equal to 0.803, the bias is 0.00508, and the variance is 0.299 μL. This tends to show that the precision of the model for the calibration set is mainly limited by the variance, since most of it is due to physical perturbations (PC1), which limit the learning process. The low value of the bias may be interpreted as the variance of S1 + S2 being symmetric and having little impact on the mean error. This result is quite encouraging.

**Figure 8.** Model performance for the (**a**) cross-validation and (**b**) prediction.

In order to validate the predictive model in real conditions, it was then tested on the independent spectral dataset from the series S3. The predicted volumes are plotted versus the actual volumes in Figure 8b. The coefficient of correlation is now 0.665 and the standard error of prediction, i.e., the variance in the predicted volume, increased from 0.299 to 0.391 μL. The bias is now much larger in magnitude (−0.105), which negatively impacts R<sup>2</sup>.

The limited coefficient of determination (66.5%) is not due to the volume itself but to the test conditions: light injection quality, droplet position on the waveguide, spectrometer settings, etc. A first physical improvement concerns light injection, which can be optimized and made robust against wavelength changes and mechanical vibrations by studying the impact of injection alignment, by pigtailing the fibre from the light source to the waveguide, and by directly monitoring the output signal of the light source. The temperature may also slightly modify the extent of the evanescent field, and thermal monitoring could quantify the variance explained by temperature. Finally, the lateral position of the droplet modifies the interaction surface between water and light and thus varies the output signal; this effect can be studied by imaging the droplet position and can be limited by measuring the transmission of a series of closely spaced (a few microns or tens of microns apart) waveguides on which the droplet is deposited. By decreasing the physical variance, the percentage of chemically explained variance will increase and improve the learning process, leading to a better model. Moreover, the model can be improved from a numerical point of view: using a bigger calibration set should improve the learning process, and using a bigger test set should allow a better evaluation of the performance in real conditions. The PLS construction depends on the PLS dimension, and its error can be minimized by choosing the proper dimensionality.

The 66.5% prediction coefficient of determination can thus be increased, and the 0.39 μL precision reduced. Nevertheless, these results already unveil the possibility of reaching μL or sub-μL resolution, which is more than needed for spray deposit analysis, since a droplet smaller than 1 μL cannot be deposited as it remains suspended in air. The integrated optics approach will allow the addition of functions such as power splitting, to simplify pigtailing on any number of waveguides in a compact way. Mach–Zehnder or multimode interferometers can be used for on-chip temperature monitoring. Additionally, wavelength filters and (de)multiplexers can be used to select wavelengths of interest in order to reduce the size, complexity, and cost of the system; before this, a study of the reduction of the number of wavelength variables must be conducted with PCA to feed the PLS construction and choose the correct wavelengths depending on the prediction performance.

#### **4. Conclusions**

The accurate determination of the leaf coverage, a key measurement of the spraying process to maximize spray efficiency, requires specific measuring tools. We have developed the first brick of micro-sensors that can be used directly in the field to quickly and automatically measure leaf coverage surface. Even if this work is still a proof of principle, the first results are encouraging. The probe based on Ge-Se-Te rib optical waveguides is sensitive throughout the whole range of volumes studied, i.e., from 0.1 to 2.5 μL.

A principal component analysis performed on the spectral measurements showed that 10.09% of the spectral variance was related to the presence of water on the waveguide surface and that 85.72% of the variance, related to physical perturbations, has to be suppressed or strongly limited. The tests for each volume were repeated three times to apply a prediction model that estimates the droplet volume present on the waveguide. First results show that the prediction, on an independent data set, is 66.5% reliable and that the precision on the predicted volume is equal to 0.39 μL.

Two complementary paths can be investigated to increase the percentage of variance explained by the volume of the deposited droplets: (a) increasing the robustness of the measuring system against interfering physical quantities, and (b) deepening the data analysis and the learning process in order to build a better prediction model. The implementation of the technology into a sensor usable in real environmental conditions will then need further studies to determine the right optical waveguide structure to develop, low-cost and portable laser sources and spectrometers, as well as optimized packaging and connectors.

**Author Contributions:** Conceptualization, R.B.; methodology, M.R.; software, R.K.K. and M.R.; validation, A.T.B.; formal analysis, A.T.B. and M.R.; investigation, A.T.B.; resources, R.E. and D.H.; data curation, R.B., R.K.K. and C.V.; writing-original draft preparation, A.T.B. and M.R.; writing-review and editing, R.B., R.K.K. and C.V.; supervision, R.B. and C.V.; project administration, R.B.; funding acquisition, R.B.

**Funding:** This work was supported by the French National Research Agency under the Investments for the Future Program, referred as ANR-16-CONV-0004.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Automated Measurement and Control of Germination Paper Water Content**

#### **Lina Owino \*, Marvin Hilkens, Friederike Kögler and Dirk Söffker**

Chair of Dynamics and Control, University of Duisburg-Essen, Lotharstraße 1, 47057 Duisburg, Germany; marvin.hilkens@de.bertrandt.com (M.H.); friederike.koegler@uni-due.de (F.K.); dirk.soeffker@uni-due.de (D.S.)

**\*** Correspondence: lina.owino@uni-due.de; Tel.: +49-203-37-91866

Received: 21 January 2019; Accepted: 11 May 2019; Published: 14 May 2019

**Abstract:** Germination paper (GP) is used as a growth substrate in plant development studies. Current studies bear two limitations: (1) The actual GP water content and variations in GP water content are neglected. (2) Existing irrigation methods either maintain the GP water content at a fully sufficient level or at a constant deficit. Varying the intensity of water deficit over time for plants grown on GP is not directly achievable using these methods. In this contribution, a new measurement and control approach is presented. As a first step, a more precise measurement of water content was realized by employing the discharging process of capacitors to determine the electrical resistance of GP, which is related to the water content. A Kalman filter using an evapotranspiration model in combination with experimental data was used to refine the measurements, serving as the input for a model predictive controller (MPC). The MPC was used to improve the dynamics of the irrigation amount to more precisely achieve the required water content for regulated water uptake in plant studies. This is important in studies involving deficit irrigation. The novel method described was capable of increasing the accuracy of GP water content control. As a first step, the measurement system achieved an improved accuracy of 0.22 g/g. The application of an MPC for water content control based on the improved measurements resulted in an overall control accuracy of 0.09 g/g. This method offers a new approach, allowing the use of GP for studies with varying water content. This addresses the limitations of existing plant growth studies and allows the investigation of dependencies between dynamic water deficit and plant development using GP as a growth substrate for research studies.

**Keywords:** moisture measurement; Kalman filter; model predictive control; germination paper

#### **1. Introduction**

Germination paper (GP) is frequently used as substrate in research concerning plants, mainly in studies involving root phenotyping [1,2], root development in early developmental stages [3], and germination [4,5]. The plants are grown on vertically arranged sheets of GP with the roots growing on the GP surface [1–3]. This allows image acquisition of the root system using flatbed-scanners, and therefore can be implemented in an automated, high-throughput setting as shown in Adu et al. [1]. Irrigation of the GP is applied by a pouch-and-wick system, with the lower edge of the GP immersed in a nutrient solution.

A platform for high-throughput, high-resolution root phenotyping is described in Adu et al. [1], as applied to different genotypes of *Brassica rapa*. Here the pouch-and-wick system is used with the lower 10 cm section immersed in nutrient solution. It is stated that the plants "showed no symptoms of mineral deficiencies when provided with an appropriate nutrient solution through the wick."

A root phenotyping platform applied in the investigation of root growth dynamics of corn (*Zea mays* L.) is presented in Hund et al. [2]. Here the pouch-and-wick system is used with the lower 2 cm of GP immersed in nutrient solution. The root systems growing on the GP surface were scanned daily. The results showed a linear growth rate of axial roots and an exponential growth rate of lateral roots.

In an examination of the effect of phosphorus availability on root gravitropism for five different genotypes of common bean (*Phaseolus vulgaris* L.), Liao et al. [3] carried out experiments with different substrates (GP, sand, and soil) to "validate existing observations of seedlings in growth pouches with studies of older plants in soil and solid media." The GP was also used in a pouch-and-wick configuration with the lower edge of GP hanging into nutrient solutions with different phosphorous concentrations. The results showed that root gravitropism depends on phosphorous availability and show different characteristics among genotypes. Additionally, the adaption of genotypes to phosphorous deficit is related to their ability to develop roots in the top layers of substrate.

The existing studies demonstrate the applicability of GP in studies related to root growth and development, typically under fully watered conditions. The previous studies state that the available water content is responsible for several effects, and should therefore be controlled if related research questions are to be addressed.

#### *1.1. Induction of Water Deficit in GP*

Exploration of water deficit effects on root growth and development with GP as a growth substrate requires the induction of water deficit in GP. Solutions of polyethylene glycol (PEG) as described in [6] have been applied in various studies to induce a water deficit in plants [4,5]. The solutions show different osmotic potentials dependent upon PEG concentration. Plants irrigated with PEG6000-solutions can therefore obtain only a part of the water contained in the solution. The osmotic potential can be measured by thermocouple psychrometry or vapor pressure osmometry [6] and is specified as pressure (measured in bar).

The effect of water deficit on sixteen different wheat genotypes in early developmental stages is described in Rauf et al. [4]. The seeds were placed between two layers of GP moistened with PEG6000-solutions of different concentration to induce water stress of different intensities. It was shown that the tested genotypes differ significantly in water stress tolerance.

In an investigation on the effect of water stress on germination indices, soluble sugar, and proline for five different genotypes of wheat, Qayyum et al. [5] employed petri dishes containing GP moistened with PEG6000-solutions at different concentrations as a growth substrate. The results show that the germination percentage, mean germination time, and coleoptile length decreased with rising water stress while soluble sugar and proline increased.

For studies involving fixed water deficit levels, PEG provides a practical solution. Where dynamic variation of water deficit levels in the course of the growth cycle is necessary, alternative approaches are needed to achieve the required water uptake.

#### *1.2. Current Limitations of GP Use as a Growth Substrate*

In some studies using a pouch-and-wick system for irrigation [1–3], a sufficient amount of nutrient solution (not in deficit) is applied to the plants. However, the actual volumes of nutrient solution contained in the GP at different points in time as well as variations of GP water content between different GP sheets are not considered. It was observed in preliminary experiments conducted for this work that GP water content of different GP sheets strongly varied even if irrigated with the same volumes of nutrient solution at the same time, as illustrated in Figure 1. Taking into account the actual water content of each GP used in a study would result in a more accurate evaluation of the effects of plant water uptake in irrigation studies using GP as a growth substrate.

The pouch-and-wick configuration using a constant supply of nutrient solution leads to a maximal distribution of GP water content. This is defined by the equilibrium between absorption of nutrient solution and evapotranspiration. For research tasks concerning the effect of water deficit on root growth and development, improved approaches allowing for supply of water to GP below maximum level would be helpful.

In studies prospecting water stress [4,5], defined water deficit intensities are applied to GP sheets using a fixed concentration of PEG6000. This method maintains a constant water deficit level throughout and is therefore not applicable to studies requiring an application of a sequence of different water deficit intensities to the same GP sheet.

**Figure 1.** An example of germination paper (GP) water content behavior in sheets with an identical irrigation treatment.

#### *1.3. Resulting Targets for this Research*

The contributions of this work are:


#### **2. Materials and Methods**

#### *2.1. Ambient Conditions and Test Rig*

A system for measurement and control of GP water content was installed in a test rig for the research of plant dynamics under water stress at the Chair of Dynamics and Control, University of Duisburg-Essen, Germany. A partial view of the test rig is shown in Figure 2. The test rig contains up to 20 maize plants (*Zea mays* L., Ronaldinio variety from KWS), divided into four groups of five plants each. Sheets of GP (210 × 297 mm, Hahnemühle, grade 3644, approx. 45 g) were chosen as substrate to allow scanning of the root systems. One plant was placed on each sheet. Each GP sheet was mounted on a sheet of Perspex, which was covered with a second, transparent sheet to allow scanning. Small tanks were placed on the bottom right edge of the GP for irrigation, clamped between the two Perspex sheets. Approximately 3 cm of the GP was submerged in the tanks. These tanks could automatically be filled at discrete times with 5–30 mL of nutrient solution via two peristaltic pumps and a distribution system. The system for GP irrigation was described by Sattler [7].

**Figure 2.** Partial view of the test rig. Two out of four test groups are shown. (a) Artificial illumination; (b) Irrigation tubes; (c) Infrared camera (not relevant in this work); (d) Test group of five specimens each consisting of a Perspex frame, GP, and a corn plant.

Test runs showed that the GP could absorb 30 mL of nutrient solution from the tank in less than 2 h (the sampling time of the measurements), as long as its water content was below 0.67 g/g. The maximal water content of GP sheets was between 1.78 and 2.00 g/g.

Preliminary tests to determine the distribution of nutrient solution within the GP were conducted. Four pieces of GP were each marked into eight equal sections measuring 105 mm × 74 mm and submerged in nutrient solution to saturation. In experiments, the GP sheets were dried out until the mass of retained water within the GP was 50, 40, 35, and 20 g respectively, with periodic measurements taken using a precision balance to determine mass of water within the GP. As soon as each GP attained the desired mass, it was divided into the eight pre-marked sections, and water content for each individual section was determined gravimetrically.

Environmental conditions within the test rig were controlled to maintain an air temperature of 21–23 °C and artificial illumination for 14 h of simulated daylight. Illumination was provided by four 150 W, 9500 K fluorescent bulbs.

Relative air humidity was not controlled, but was measured twice a day. Relative humidity values ranged between 21–33% during the experiment.

#### *2.2. GP Water Content Measurement*

In pre-studies, different methods were evaluated for the measurement of GP water content (Table 1). Methods involving the direct measurement of electrical resistance were eliminated due to inconsistency in values obtained. This was taken to be due to the extremely high resistances exhibited by GP. Indirect measurement of the GP electrical resistance via the discharging of capacitors as described in Hering [8] was applied in the development of the moisture measurement system described in this work.

The relationship between the capacitor charge and electrical resistance can be described as:

$$Q(t) = Q_0\, e^{-\frac{t}{RC}},\tag{1}$$

with,

- *Q*(*t*) : Current capacitor charge;
- *t* : Elapsed time;
- *Q0* : Initial capacitor charge;
- *R* : Electrical resistance of the GP;
- *C* : Capacitance.
From this relationship, the electrical resistance of the GP is obtained as:

$$R = \frac{t}{C \ln\left(\frac{Q_0}{Q(t)}\right)}.\tag{2}$$
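
A minimal sketch of this calculation is given below (Python; the function name and the commented example values are ours, not measured data). Note that four 6300 μF capacitors in series correspond to an effective capacitance of roughly 1575 μF:

```python
import math

def gp_resistance(q_remaining, q_initial, t_discharge, capacitance):
    """Electrical resistance of the GP from the capacitor discharge, Equation (2).

    q_remaining : capacitor charge after discharging through the GP [C]
    q_initial   : initial capacitor charge [C]
    t_discharge : discharge duration [s] (3 min in the described setup)
    capacitance : effective capacitance of the series-connected capacitors [F]
    """
    return t_discharge / (capacitance * math.log(q_initial / q_remaining))

# Illustrative call with made-up values:
# r = gp_resistance(q_remaining=0.4, q_initial=1.0, t_discharge=180, capacitance=1575e-6)
```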

The system for the measurement of GP water content is described in Zimmermann [9]. An electric circuit as described in Hering [8] was implemented. Four capacitors (6300 μF ± 20% each) were connected in series to facilitate a higher operating voltage as well as to increase the range of measurable resistance. The capacitors were connected to multiple relays, two resistors (100 kΩ ± 5%, 5.3 kΩ ± 10%), a laboratory power supply, and a programmable logic controller (PLC) (ifm CR0403). In Figure 3, the circuit for one sheet of GP is shown as an example. For simplification, the relays are drawn as switches connected to the PLC. Connections are denoted by chain-dotted lines. The measurement is realized using the PLC, which is connected to a PC via CAN-BUS. Every measurement is saved on the PC with a time stamp.

**Figure 3.** Circuit for the measurement of GP water content.

The measurements were implemented as follows: Each capacitor was consecutively charged to its maximum. The series-connected capacitors were then connected to the GP at defined positions, starting the discharging process. The discharging process was terminated after 3 min. The remaining capacitor charge was determined via an analog current measurement by the PLC with an accuracy of ±0.2 mA. To facilitate the analog-to-digital conversion of the signal by the PLC, a sampling frequency of 10 Hz was selected.

The water content of 8 out of 20 GP sheets (two in each group) was measured successively. A single measurement lasted 15 min and the sampling time was 2 h. Two characteristic curves for the calculation of GP water content from the measured remaining charge were derived for two different positions as shown in Figure 4. The capacitor circuit was discharged through the GP between the indicated measurement positions and the ground. Only one measurement position was connected at a time.

**Figure 4.** Positions of contacts for measurement of GP water content.

The data were collected by repeated measurements of remaining charge and corresponding gravimetric water contents. The gravimetric water contents were measured using a precision balance. The dedicated measurements and characteristic curves are shown in Figure 5. The assumed characteristic curves were generated by numerical fitting assuming a logarithmic relationship between GP electrical resistance (and, by extension, water content) and remaining charge as suggested by Equation (2) [8].
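
The fitting step might be implemented as in the sketch below (Python/SciPy); the paired arrays are placeholders, and the logarithmic functional form is the assumption stated above:

```python
import numpy as np
from scipy.optimize import curve_fit

def characteristic_curve(q_remaining, a, b):
    """Assumed logarithmic relationship between remaining charge and water content."""
    return a * np.log(q_remaining) + b

# Placeholder calibration data: paired remaining-charge readings and
# gravimetric water contents for one measurement position.
q_meas = np.array([0.90, 0.70, 0.55, 0.40, 0.30])
w_grav = np.array([1.80, 1.50, 1.20, 0.90, 0.70])

params, _ = curve_fit(characteristic_curve, q_meas, w_grav)
w_estimate = characteristic_curve(0.50, *params)   # water content for a new charge reading
```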

**Figure 5.** GP water content and remaining charge for the two different measurement positions.

The evaluation of the measurement accuracy shows a mean deviation of the automated measurements from the gravimetric measurements of more than ±0.44 g/g over the whole effective range. This initial measurement accuracy, though typical for the implemented circuit, is not sufficient for effective control of GP water content. From Figure 5, the practical problem resulting from the use of GP in combination with direct measurements from the automated measurement system can clearly be seen. From a theoretical point of view, the measurements are not accurate, or can be assumed to be noisy. To overcome this problem when using GP, a new (filtered) measurement approach has to be developed. A similar problem is known from studies involving measurement of soil moisture content [10], where filters are also employed as solutions.

#### *2.3. Improved Model-Based Measurement System*

Uncertainties in a system measurement often limit the performance of controllers. Observers, filters, and especially Kalman filters were developed in the last few decades to improve estimations of real states resulting from noisy measurements and/or noisy processes/systems (systems with randomly varying system parameters). Kalman filters have been applied in Groenendyk et al. [10] and Rosnay et al. [11], combining model-based predictions of soil moisture content with in-situ measurements to generate more reliable moisture content estimates.

In this work, for measurement improvement, a specific filtering approach was employed. A model for time behavior of GP water content was developed to generate a prediction, which was combined with actual measurements based on correction factors derived from the mean variation of measurements. The model equation:

$$m_{GP} W_k = m_{GP} W_{k-1} + I_{k-1} - E_{GP}\, T_0,\tag{3}$$

with,

*mGP* : GP mass;

*Wk* : Current gravimetric water content;

*Wk−1* : Gravimetric water content at previous time step;

*Ik−1* : Irrigation amount applied at the previous time step;

*EGP* : Evaporation from the GP;

*T0* : Sampling time;
is based on the assumption that the evaporation *EGP* depends on the GP water content. Preliminary tests showed a logarithmic dependency between the water content of GP and the rate of evaporation shown in Figure 6. This is consistent with a previously documented relationship between evaporation and soil water depletion [12]. Measurements were carried out gravimetrically using a precision balance. The two curves show the assumed dependencies between evaporation and GP water content. From these results, a strong (and formalizable) dependency on air humidity can be assumed.

To account for these dependencies on evaporation, the Penman equation, as expressed in Ostrowski [13],

$$E_{pot} = \frac{s}{s+\gamma}\,\frac{E_R^{net}}{L} + \frac{\gamma}{s+\gamma}\, f(v)\,(e_s - e_d)\tag{4}$$

is employed, where,

$$e_d = \frac{e_s H}{100}\tag{5}$$

is assumed with:

*Epot* : Potential evaporation;

*s* : Saturation vapor pressure gradient;

*γ* : Psychrometric constant;

*EnetR* : Net irradiance;

*L* : Latent heat of vaporization;

*f(v)* : Wind function;

*es* : Saturation vapor pressure;

*ed* : Actual vapor pressure;

*H* : Relative air humidity.
**Figure 6.** Evaporation and GP water content for two different values of air humidity (28% and 58%).

The parameters saturation vapor pressure gradient *s*, wind velocity *v*, net irradiance *EnetR*, and latent heat of vaporization *L* are assumed to be constant. From this, the dependency:

$$E_{pot} = C_1 + C_2\,(1 - H/100),\tag{6}$$

is obtained, with constant terms *C*1 and *C*2 defining the relation between the relative air humidity *H* and the potential evaporation *Epot*. From this, the evaporation from the germination paper, *EGP*, can be derived. The resulting equation for GP evaporation is:

$$E_{GP}(H, W_k) = (c_1 H + c_2) \ln(m_{GP} W_k) - c_3 H + c_4.\tag{7}$$

This model is fitted to the measurements using coefficients *c*1 to *c*4 (Figure 6), resulting in:

*c*1 = −0.038292758; *c*2 = 2.901962527; *c*3 = −0.096373808, and *c*4 = 7.18246835.
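
A sketch of the fitted evaporation model and of the one-step prediction given by Equation (3) is shown below (Python; the function names are ours, and the use of the approximately 45 g sheet mass from Section 2.1 as m_GP is an assumption):

```python
import math

# Fitted coefficients of Equation (7), as reported in the text:
C1, C2, C3, C4 = -0.038292758, 2.901962527, -0.096373808, 7.18246835

def evaporation_gp(humidity, water_content, m_gp=45.0):
    """Evaporation from the GP, Equation (7).

    humidity      : relative air humidity H [%]
    water_content : gravimetric water content W_k [g/g]
    m_gp          : GP sheet mass [g] (approx. 45 g, assumed)
    """
    return (C1 * humidity + C2) * math.log(m_gp * water_content) - C3 * humidity + C4

def predict_water_content(w_prev, irrigation, humidity, t0=2.0, m_gp=45.0):
    """One-step prediction of W_k from Equation (3):
    m_GP * W_k = m_GP * W_{k-1} + I_{k-1} - E_GP * T_0."""
    e_gp = evaporation_gp(humidity, w_prev, m_gp)
    return w_prev + (irrigation - e_gp * t0) / m_gp
```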

The influence of transpiration can be neglected, since the leaf surface of the plants at the growth stage of interest in the studies related to this work (3 to 5 leaves) was significantly smaller than the surface of the GP sheets. Measurements of evapotranspiration from plants at a similar growth stage grown under identical environmental conditions in 200 mL PET pots were found to be lower than 1 g/h, which is significantly lower than the evaporation from the GP sheets.

The automated measurements with model-based correction showed a significantly better correlation to the gravimetrically measured water contents. Implementation of the Kalman filter resulted in the elimination of measurement outliers and consequently a more accurate representation of the system state was obtained as compared to direct measurements taken from the system (Figure 7).
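
A scalar Kalman-filter step combining the model prediction with the charge-based measurement might look like the sketch below (Python, reusing the `predict_water_content` function from the previous sketch; the process and measurement noise variances are assumptions, not values reported in the paper):

```python
def kalman_step(w_est, p_est, irrigation, humidity, w_meas,
                t0=2.0, m_gp=45.0, q_proc=1e-3, r_meas=0.2 ** 2):
    """One predict/update cycle of a scalar Kalman filter for GP water content.

    w_est, p_est : previous state estimate [g/g] and its variance
    w_meas       : charge-based water content measurement [g/g]
    q_proc       : assumed process noise variance
    r_meas       : assumed measurement noise variance (from the +/-0.2 g/g figure)
    """
    # Prediction step, using the evapotranspiration model of Equations (3) and (7)
    w_pred = predict_water_content(w_est, irrigation, humidity, t0, m_gp)
    p_pred = p_est + q_proc
    # Correction step with the automated measurement
    k_gain = p_pred / (p_pred + r_meas)
    w_new = w_pred + k_gain * (w_meas - w_pred)
    p_new = (1.0 - k_gain) * p_pred
    return w_new, p_new
```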

**Figure 7.** Automated measurements of GP water content W(Q) with and without model-based correction (Kalman filtering).

#### *2.4. GP Water Content Control (MPC and PI)*

The control loop for GP water content control is shown in Figure 8. The irrigation element was assumed as a zero order hold element. The measurement circuit was assumed as ideal. The evaporation was considered as a disturbance variable acting directly on the output.

**Figure 8.** Control loop for the control of GP water content. w(k)—reference variable; e(k)—control deviation; u(k)—calculated irrigation volume; u(t)—irrigation volume; E(t)—evaporation; y(t)—GP water content; y(k)—measured GP water content. (k) denotes a variable in discrete time; (t) denotes a variable in continuous time.

The system to be controlled (GP) is modeled as an integral transfer element. Including the zero-order hold element, the *z*-transfer function:

$$G_1(z) = \frac{T_0}{T_1}\,\frac{1}{z - 1}\tag{8}$$

is deduced. From the transfer function as described by Equation (8), two controllers are derived: A model predictive controller (MPC) based on Equations (3) and (7) [14] and a time discrete PI-Controller [15] described as:

$$G_{R,PI}(z) = \frac{1 - 0.6\,z^{-1}}{1 - z^{-1}}.\tag{9}$$
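
Equation (9) corresponds to the difference equation u(k) = u(k−1) + e(k) − 0.6·e(k−1); a minimal sketch of such a controller is given below (Python; the class name and the example call are ours, and the restriction of u(k) to the 5–30 mL irrigation range is only noted in a comment):

```python
class DiscretePI:
    """Time-discrete PI controller of Equation (9):
    u(k) = u(k-1) + e(k) - 0.6 * e(k-1)."""

    def __init__(self):
        self.u_prev = 0.0
        self.e_prev = 0.0

    def step(self, error):
        u = self.u_prev + error - 0.6 * self.e_prev
        self.u_prev, self.e_prev = u, error
        # In the real loop, u would still have to be quantized and limited to the
        # 5-30 mL volumes the irrigation system can deliver (the MPC handles such
        # actuation limits natively).
        return u

# Example: reference 0.89 g/g, measured 0.75 g/g -> positive irrigation command.
controller = DiscretePI()
u0 = controller.step(0.89 - 0.75)
```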

The MPC approach was selected because it allows the quantization of the actuation as well as the related limitations to be taken into account. Both controllers were applied to realize two different control tasks:

- Individual control mode, in which the water content of a GP sheet is controlled based on its own measurement;
- Group control mode, in which the water content of all GP sheets in a group is controlled based on the measurements of a subset of sheets in that group.
#### *2.5. Testing*

A test run was carried out:


Test groups A and C were controlled via individual control mode (two GP sheets each). Groups B and D were controlled via group control mode (five GP sheets each). Groups A and B were controlled using the discrete PI-controller while groups C and D were controlled using MPC. The reference variable was set to be 0.89 g/g for five days and afterwards 0.56 g/g for four days for all groups. The ambient conditions were adjusted as stated above. The GP sheets were weighed twice a day using a precision balance. These measurements were taken as reference for the evaluation of accuracy of the automated measurements.

#### **3. Results**

#### *3.1. Preliminary Tests*

The investigation of distribution of nutrient solution in GP over time shows a heterogeneous distribution pattern dependent on total GP water content, as qualitatively shown in Figure 9. The distribution gradient increased as the GP water content decreased, indicating an acceleration of evaporation with time.

**Figure 9.** Qualitative distribution of nutrient solution within a GP sheet.

Drying out the GP from saturation and evaluation of the water distribution indicated greater evaporation from the corner samples. This was due to a greater exposure to the environment at the frame boundary. Similarly, the left edge (Figure 9) experienced greater exposure to the atmosphere as a result of facing the interior of the greenhouse. This is assumed to be the reason for the left-right gradient of water distribution obtained. Gravitational effects would also contribute to faster loss of water from the topmost sections.

#### *3.2. Accuracy of Water Content Measurements*

The accuracy of the automated water content measurements was defined by the maximal difference between the automated measurements and the gravimetric measurements, as shown in Table 2. An accuracy of ±0.2 g/g was assumed to be sufficient for most applications within the experimental setup employed. Therefore, the effective range was defined as 1.64 g/g > *W* > 0.80 g/g for measurement position 1 and as 0.66 g/g > *W* > 0.45 g/g for measurement position 2. Outside of these ranges, the automatic measurements are not accurate enough to be used for effective control of GP water content. Variations of 0.2 g/g are large in comparison with a maximum water content of 2.0 g/g. It is assumed that these variations can be minimized by correction using a look-up table and by elimination of disturbances within the measurement unit.

**Table 2.** Results of the measurement accuracy evaluation. The measurement accuracy is defined as the maximal deviation of the automated measurements from the manual measurements. The accuracy is evaluated for different ranges of retained GP moisture.


#### *3.3. Assessment of Water Content Control*

The gravimetric GP water content over time for all groups and for a reference variable of 0.89 g/g is shown in Figure 10.

**Figure 10.** Results of the control test run for a reference variable of 0.89 g/g. GP moisture values over time for all tested GP sheets. Groups A and C were controlled individually and groups B and D were controlled in group control mode.

The control deviation:

$$
\epsilon = w - W_{grav} \tag{10}
$$

with the reference variable *w* and the gravimetrically measured water content *Wgrav* as output variable is evaluated. Additionally, the integral of squared error:

$$A_{quad} = \int_{0}^{\infty} \epsilon^{2}\, dt\tag{11}$$

is evaluated. The maximal control deviations and the integral of squared error of the different groups for *w* = 0.89 g/g are shown in Table 3.
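As an illustration of how Equations (10) and (11) can be evaluated from sampled measurements, the sketch below computes the control deviation and approximates the integral of squared error with the trapezoidal rule; all numerical values are placeholders, not the data behind Table 3:

```python
import numpy as np

# Illustrative sampled data: time in hours and gravimetric water content in g/g.
t      = np.array([0.0, 12.0, 24.0, 36.0, 48.0])    # sampling instants
w_grav = np.array([0.80, 0.86, 0.91, 0.88, 0.90])   # measured W_grav
w_ref  = 0.89                                        # reference variable w

eps = w_ref - w_grav                  # control deviation, Eq. (10)
max_dev = np.max(np.abs(eps))         # maximal absolute control deviation

# Integral of squared error, Eq. (11), approximated by the trapezoidal rule.
A_quad = np.sum(0.5 * (eps[1:] ** 2 + eps[:-1] ** 2) * np.diff(t))

print(f"max |eps| = {max_dev:.3f} g/g, A_quad = {A_quad:.4f} (g/g)^2 h")
```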

**Table 3.** Maximal absolute control deviations and integrals of squared error for the different test groups.


For the plants under group control, the means and standard deviations of the control error within the groups are shown in Table 4. Each group consisted of 5 plants, with 7 measurements for each plant taken into consideration.

**Table 4.** Medians and standard deviations of control error for group-controlled plants.


#### **4. Discussion**

The actual GP water content as well as variations in GP water content between different GP sheets over time have not been considered in past studies [1–3]. One aim of this work was the development of a method for automated measurement of GP water content allowing the consideration of the above mentioned variables. Another aim was the development of GP water content control using automated measurements. This allows the application of defined water deficits with a variable intensity over time to examine related plant growth. This is not possible with common methods in existing studies [1–5].

The main focus of this work was the development of an automatic measurement of GP water content with an adequately effective range and accuracy. The measurements were realized by a capacitor circuit ultimately measuring the electrical resistance of GP. The measured values were additionally processed by a model-based Kalman filter to improve accuracy.
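As a rough illustration of such model-based filtering, a minimal scalar Kalman filter with an assumed random-walk model of the water content could look as follows; the noise variances and data are placeholders and do not correspond to the filter actually used in this work:

```python
import numpy as np

def kalman_filter_1d(measurements, q=1e-4, r=0.02, x0=1.0, p0=1.0):
    """Minimal scalar Kalman filter assuming a random-walk model of GP water
    content (process variance q) and noisy resistance-based readings
    (measurement variance r). Illustrative stand-in, not the authors' filter."""
    x, p = x0, p0
    estimates = []
    for z in measurements:
        # Predict: the random-walk model keeps the state, uncertainty grows by q.
        p = p + q
        # Update: blend prediction and measurement by the Kalman gain.
        k = p / (p + r)
        x = x + k * (z - x)
        p = (1.0 - k) * p
        estimates.append(x)
    return np.array(estimates)

# Illustrative noisy readings around a true water content of 0.89 g/g.
rng = np.random.default_rng(0)
raw = 0.89 + rng.normal(0.0, 0.1, size=50)
smoothed = kalman_filter_1d(raw, x0=raw[0])
```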

The effective range of measurement was 1.64 g/g > *W* > 0.80 g/g (for contact position 1) with a maximal GP water content of 2.0 g/g. This effective range could be adjusted by the reduction of distance between the measurement contacts on the GP sheets. The final accuracy of measurements (including model-based correction) within the effective range was smaller than ±0.2 g/g.

The control system performance was tested at two different levels of GP water content. For the experiments, an MPC was used to reduce the control deviations of the implemented control loop. The best control result in the test run was realized in a single group of GP sheets, with a maximal control deviation of 0.09 g/g.

The application of MPC resulted in smaller maximal control deviations when compared to a time discrete PI-controller. Based on the integral of squared error, a direct advantage of one of the two controllers over the other cannot be ascertained. However, both controllers induced similar dynamical behavior. The best control result was obtained using a MPC applied in group control mode.

The control output in group control was based on the measurements from only 2 plants per group which were used for the control of the water content of all the plants in the group. From the results, it can be concluded that this method could be used to improve the control of GP water content based on measurements taken from a sample of GP sheets rather than from all individuals.

The quality of water content control strongly depends on measurement accuracy. Measurement accuracy could be improved in the future by refining the measurement circuit and by reducing disturbances acting on the measurement.

The system allows the application of defined time-varying intensities of water deficit on plants grown on GP. Therefore, it can be used to explore the dependencies between dynamically changing water deficit and plant development. The system for measurement of GP water content can also be applied to test rigs using a pouch-and-wick configuration to identify variations in GP water content, which are not considered in many studies [1–3].

Causes of the differences between test groups were not explored for this work. The assumption was that these differences are either caused by differences in GP attributes or by local variations of surrounding conditions like temperature or wind.

#### **5. Summary and Conclusions**

Plant experiments involving water deficit require specific knowledge about the moisture available to the plant for control of irrigation. In studies using germination paper as a substrate, moisture measurement and precise watering based on the individual water uptake situation are difficult due to extreme measurement noise.

A novel method for improved measurement and suitable control of GP water content was described in this work. The measurement principle was based on the relationship between electrical resistance of GP, determined via capacitance measurements, and the moisture content of the GP. Kalman filtering was applied for improved accuracy. The control of GP water content was implemented using a model predictive controller.

The measurement system allowed for accurate determination of water content within the 0.8 g/g to 1.64 g/g range for GP with maximal water content of 2.0 g/g. The measurement range could be further extended by adjusting the positioning of the electrical contacts during measurements. The model predictive controller allowed the GP water content to be maintained at predetermined levels, with the ability to vary the desired GP water content over time. System measurement accuracy was ±0.22 g/g and control accuracy was ±0.09 g/g.

The introduced system strongly improved the existing technology and helped to improve sensing and actuation within lab procedures. The application of the system would allow continuous tracking of GP moisture content which is crucial in mapping dependencies between water content and plant behavior under water deficit conditions. The system also allows the maintenance of GP at dynamically varying water content values, which would allow the exploration of controlled variable water deficit sequences in plant growth studies.

Further work on the distribution of water on GP during water uptake would be useful in mapping out the actual water availability at different sections of the GP.

**Author Contributions:** The contribution of the authors to the work is as follows: Conceptualization, D.S. and F.K.; Methodology, D.S.; Validation, M.H., F.K. and L.O.; Formal Analysis, M.H.; Investigation, M.H.; Resources, D.S.; Data Curation, F.K.; Writing—Original Draft Preparation, M.H. (in 2017) and L.O.; Writing—Review & Editing, L.O. and D.S.; Proofreading—D.S.; Visualization, M.H.; Supervision, D.S.; Project Administration, D.S.; Funding Acquisition, D.S.

**Funding:** The APC was funded by the Open Access Publication Fund of the University of Duisburg-Essen. The research was partially funded by the German Academic Exchange Service.

**Acknowledgments:** Support by the Open Access Publication Fund of the University of Duisburg-Essen and German Academic Exchange Service is gratefully acknowledged.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Evaluating Soil-Borne Causes of Biomass Variability in Grassland by Remote and Proximal Sensing**

**Sebastian Vogel 1,\*, Robin Gebbers 1, Marcel Oertel 1 and Eckart Kramer 2**


Received: 20 August 2019; Accepted: 18 October 2019; Published: 22 October 2019

**Abstract:** On a grassland field with sandy soils in Northeast Germany (Brandenburg), vegetation indices from multi-spectral UAV-based remote sensing were used to predict grassland biomass productivity. These data were combined with soil pH value and apparent electrical conductivity (ECa) from on-the-go proximal sensing serving as indicators for soil-borne causes of grassland biomass variation. The field-internal magnitude of spatial variability and hidden correlations between the variables of investigation were analyzed by means of geostatistics and boundary-line analysis to elucidate the influence of soil pH and ECa on the spatial distribution of biomass. Biomass and pH showed high spatial variability, which necessitates high-resolution data acquisition of soil and plant properties. Moreover, boundary-line analysis showed grassland biomass maxima at pH values between 5.3 and 7.2 and ECa values between 3.5 and 17.5 mS m−1. After calibrating ECa to soil moisture, the ECa optimum was translated to a range of optimum soil moisture from 7% to 13%. This matches well with the plant-available water content of the predominantly sandy soil as derived from its water retention curve. These results can be used in site-specific management decisions to improve grassland biomass productivity in low-yield regions of the field due to soil acidity or texture-related water scarcity.

**Keywords:** apparent electrical conductivity (ECa); pH; UAV; boundary-line; quantile regression; law of minimum

#### **1. Introduction**

Precision agriculture (PA) technologies are increasingly applied on arable land to improve resource efficiency, reduce environmental impacts, and optimize agricultural productivity (e.g., [1–3]). This can only be achieved by understanding and controlling within-field variability of soil and/or vegetation properties [4–6]. Soil and biomass mapping using on-the-go proximal and remote sensing are a time, cost, and labor effective way to investigate soil characteristics and biomass production, quantify their spatial variability, and delimit homogeneous management zones (HMZ) for variable rate applications [4,7,8].

In grassland research and management, the advent of PA has long lagged behind [9]. One major obstacle was the high variability of soil and crop characteristics, both within-field and between grassland plots, as well as the lack of knowledge in explaining the causes of such heterogeneity [9,10]. However, over the past decade numerous studies have focused on that matter to close that knowledge gap (e.g., [11–15]). They monitored grassland biophysical parameters by means of optical remote sensing technologies [16–20], or mapped soil properties using proximal soil sensors, most of all applying geophysical methods to measure the apparent electrical conductivity (ECa) [11,13,21]. Soil ECa in particular has recently been used for precision grassland management to characterize within-field variability of soil properties and identify HMZ. This is because ECa correlates with a series of yield-affecting soil properties such as soil texture or moisture availability [22,23], while, on the other hand, ECa sensors are relatively easy to use and cause only minor damage to the grass sward [11,24].

However, only a few studies exist that combine soil data from proximal sensing with crop data from remote sensing (e.g., [14]). This is of interest because vegetation mapping by remote sensing may indicate spatial variation in a very efficient way but often does not tell the reasons for the observed variability. However, for an appropriate site-specific grassland management it is indispensable to understand the cause-effect relationship between soil properties and grassland productivity. Several soil properties such as pH, nutrient content, bulk density, water content, and soil texture can affect plant growth considerably. A couple of them are manageable (e.g., by fertilization) while others cannot be changed (e.g., soil texture) but have to be regarded in the decision making.

The overarching objective of this study was to investigate the relationship between soil characteristics and biomass production to identify high and low yielding regions within the field and their possible soil-borne causes. Specific aims were to: (i) Map a grassland field on-the-go for the pH value and the apparent electrical conductivity (ECa) using proximal soil sensors, (ii) quantify grassland biomass production from unmanned aerial vehicles (UAV)-based remote multi-spectral imagery, (iii) evaluate the spatial variability of pH, ECa, and grassland biomass using semivariance modelling, and (iv) investigate the correlation among pH, ECa, and grassland biomass by means of boundary-line analysis to evaluate causes of grassland biomass variability with special emphasis on the yield-limiting effect of pH and ECa.

#### **2. Materials and Methods**

#### *2.1. Site Description*

The grassland field Königsfeld is situated northwest of the city of Potsdam, in the federal state of Brandenburg (Germany; 5811800N, 365600E; ETRS89). According to the Koeppen-Geiger climate classification system, Potsdam belongs to the humid continental climate (Dfb). It has an average temperature of 9.2 °C, with January and July being the coldest (−0.6 °C) and hottest (18.6 °C) months, respectively. The mean annual total precipitation is 566 mm, with February being the driest (35 mm) and June the wettest (67 mm) month (German Meteorological Service [DWD], 1981–2010).

The soil cover consists mainly of Gleyic Cambisols of sandy soil texture [25]. Cambisols, as pedogenetically young soils, are associated with the glacial and periglacial origin of the soil's parent material. The principal qualifier Gleyic derives from the vicinity to the Sacrow-Paretz Canal in the northeast and its related groundwater influence. However, during the field campaigns in 2016 and 2017, no water logging was observed at the soil surface.

The Königsfeld has a total area of 15 ha and was covered by a mixture of grass and alfalfa. The field has been used as grassland since 2011, predominantly as grazing land for cattle. In the past, a fence parted the field into two halves. Although the fence was removed, the growing patterns of the grassland are still affected at its former location. The same applies to a former watering trough for the cattle that was situated at the western corner of the field. For that reason, both areas were disregarded in the following analyses (Figure 1A).

**Figure 1.** Unmanned aerial vehicles (UAV)-based orthophoto of (**A**) the Königsfeld with the locations of (**B**) biomass sampling sites, (**C**) sensor-based pH, and (**D**) apparent electrical conductivity (ECa) measurements.

#### *2.2. Methods*

#### 2.2.1. Proximal Soil Sensing of Grassland Soil

The Königsfeld was mapped in March 2017 for pH and apparent electrical conductivity (ECa) using the Mobile Sensor Platform (MSP) by Veris Technologies (Salina, KS, USA), as described by Lund et al. [26] and Schirrmann et al. [27] (Figure 2A).

The pH value was measured on-the-go by two ion-selective antimony electrodes on naturally moist soil material (Figure 2B). While driving across the field, a sampler is lowered into the soil to about 12 cm depth and a soil core flows through the sampler's orifice. When the soil sampler is raised out of the soil, it presses the sample against the two pH electrodes for measurement. Measurements were stopped once sufficiently stable or after a maximum time of 20 s. A logger recorded the raw potential data along with Global Navigation Satellite System (GNSS) coordinates. Additionally, an on-line conversion of the voltage data into pH values was conducted based on a preceding calibration with pH 4 and 7 standard solutions. After each measurement, the sampler was pushed into the soil again and the old soil core was replaced by new material entering the sampler trough. In the meantime, the pH electrodes were cleaned with tap water from two spray nozzles to prepare them for the next measurement cycle. Typically, pH values were recorded every 10 to 12 s. GNSS coordinates were recorded when the sampler shank was raised out of the soil.

ECa was measured every second with a galvanic coupled resistivity instrument using six parallel rolling coulter electrodes (Figure 2A). This electrode configuration provided readings over two depths with a median depth of exploration of 0.12 and 0.37 m [28]. This enables the identification of significant soil textural and/or soil moisture changes between soil horizons.

**Figure 2.** (**A**) Veris mobile sensor platform, (**B**) Veris pH-Manager, and (**C**) furrow cut by the pH sampler trough. Numbers indicate (1) rolling ECa coulter electrodes, (2) pH sampler trough, (3) pH electrodes, and (4) cleaning nozzles.

#### 2.2.2. UAV-Based Remote Sensing of Grassland Biomass

The UAV flight was performed at the end of October 2016. The UAV platform was a hexa-copter system HP-Y6 (HEXAPILOTS AHLtec, Dresden, Germany) with flight control software based on DJI Wookong M (DJI Innovations, Shenzhen, China). The camera was mounted on a two-axis gimbal underneath the copter. A Sony α6000 (ILCE-6000) (Sony Corporation, Tokyo, Japan) RGB camera was used for image acquisition. It had a 23.5 × 15.6 mm sensor (APS-C format) with 4000 × 6000 pixels. The Sony SEL-16F28 lens had a fixed focal length of 16 mm and a maximum aperture of F2.8. The copter was navigated along a predefined route at an altitude of about 100 m. The camera produced a sequence of nadir images at fixed points along the route with an overlap of 70%. From these images, an orthophoto mosaic with 10 cm resolution was generated with Agisoft PhotoScan photogrammetry software (Agisoft LLC, St. Petersburg, Russia). Before the flight, white 30 × 30 cm landmarks were placed as ground control points in the field and were georeferenced with an RTK GNSS.

To estimate grassland biomass from the RGB ortho image, the following primary data and derivatives were used:


• Normalized difference vegetation index,

$$NDVI = \frac{R_{NIR} - R_{Red}}{R_{NIR} + R_{Red}} \tag{1}$$

• Visible atmospheric resistant index,

$$VARI = \frac{R_{Green} - R_{Red}}{R_{Green} + R_{Red} - R_{Blue}} \tag{2}$$
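Purely as an illustration of Equations (1) and (2), a minimal sketch of the per-pixel index computation is given below; the band arrays and values are generic placeholders, not the imagery of this study:

```python
import numpy as np

def ndvi(nir, red, eps=1e-9):
    """Normalized difference vegetation index, Eq. (1); eps guards against division by zero."""
    return (nir - red) / (nir + red + eps)

def vari(green, red, blue, eps=1e-9):
    """Visible atmospheric resistant index, Eq. (2)."""
    return (green - red) / (green + red - blue + eps)

# Illustrative 2 x 2 reflectance patches (values in [0, 1]).
red   = np.array([[0.10, 0.20], [0.15, 0.30]])
green = np.array([[0.25, 0.30], [0.20, 0.35]])
blue  = np.array([[0.05, 0.10], [0.08, 0.12]])
nir   = np.array([[0.60, 0.55], [0.40, 0.70]])

print(ndvi(nir, red))
print(vari(green, red, blue))
```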

#### 2.2.3. Reference Sampling and Laboratory Analysis

To relate the ECa data to soil texture and/or soil moisture, 25 reference soil samples were taken at locations that meet the following three requirements [29]:


At each reference sampling point, five subsamples were taken with an auger from 0 to 30 cm depth within a radius of 0.5 m. The subsamples were mixed and filled in a plastic bag.

In the laboratory, the soil samples were oven-dried at 105 °C. Afterwards, the particle size distribution of the fraction <2 mm was determined according to the German standard in soil science (DIN ISO 11277) by wet sieving and sedimentation after removal of organic matter with H2O2 and dispersal with 0.2 N Na4P2O7.

To calibrate the pH sensor data, 13 reference samples were analyzed in the laboratory by the German standard method (DIN ISO 10390). The soil was dried and sieved (see above) and 10 g of it were mixed with 25 mL of a 0.01 M CaCl2 solution. The pH was measured with a glass electrode after 30 min.

To relate the multispectral aerial photograph to the grassland biomass, aboveground biomass from 1 m2 was cut at 20 points along four transects at the beginning of November 2016 (Figure 1B). The biomass samples were weighed before and after oven-drying at 75 °C to obtain fresh and dry matter weight.

#### 2.2.4. Data Analysis

#### Spatial Data Alignment and Visualization

Spatial data alignment and visualization was accomplished with QGIS (QGIS Geographic Information System, QGIS Development Team, Open Source Geospatial Foundation). Coordinates were transformed to a common Cartesian ETRS89 system. Image data at sampling locations and sensor measurement points were extracted by spatial queries.

#### Analysis of Spatial Variability and Spatial Interpolation by Geostatistics

Spatial variability of the sensor data was quantified by variogram modeling [30–32]. The variogram can provide information about the range of spatial autocorrelation (range parameter) and the disparity between observations beyond the range of autocorrelation, which is called the sill parameter. Additionally, the nugget parameter summarizes the measurement error and sample micro-variability. For variogram modeling, firstly, the method of moments [32] was used to obtain the empirical semivariogram, which relates the average squared differences between observed values to their respective distance class (lag interval). Secondly, a theoretical semivariogram model was fitted to the empirical semivariogram. Final variogram models were established after outlier removal (see below). Calculation and visualisation were carried out with custom MATLAB (The MathWorks, Inc., Natick, MA, USA) codes based on [30–33].
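The original computations were done with the authors' MATLAB codes; as an illustration of the method-of-moments estimator described above, a minimal Python sketch could look as follows (lag definition and data are placeholders):

```python
import numpy as np

def empirical_semivariogram(coords, values, lag_width, n_lags):
    """Method-of-moments estimate: gamma(h) is the mean of 0.5*(z_i - z_j)^2
    over all point pairs whose separation falls into the lag class around h."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = 0.5 * (values[:, None] - values[None, :]) ** 2
    iu = np.triu_indices(len(values), k=1)            # consider each pair once
    d, sq = d[iu], sq[iu]
    lags, gamma = [], []
    for k in range(n_lags):
        lo, hi = k * lag_width, (k + 1) * lag_width
        mask = (d >= lo) & (d < hi)
        if mask.any():
            lags.append(0.5 * (lo + hi))
            gamma.append(sq[mask].mean())
    return np.array(lags), np.array(gamma)

# Illustrative random field sample (not the field data of this study).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 300, size=(200, 2))           # x, y positions in metres
values = rng.normal(6.0, 0.5, size=200)               # e.g., pH-like values
lags, gamma = empirical_semivariogram(coords, values, lag_width=20.0, n_lags=10)
```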

#### Outlier Removal by Spatial Cross-Validation

Since ECa and pH sensor measurements can be prone to error, data were checked for spatial outliers, which strongly deviate from the surrounding observations. After an initial fit of a variogram model (see above), a leave-one-out cross-validation procedure was applied [32], which used the variogram model to estimate the value by kriging at each measurement location after excluding the sample value there. A linear regression was calculated between the measured and the estimated values, and data points with a Cook's distance larger than 0.033 were excluded from the data set.
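As a sketch of the screening step only (the kriging-based leave-one-out estimates themselves are assumed to be given), the Cook's distance filtering could be implemented as follows; the arrays are illustrative placeholders:

```python
import numpy as np
import statsmodels.api as sm

# measured: sensor values; estimated: leave-one-out kriging estimates at the
# same locations (both illustrative; the kriging step itself is omitted here).
measured  = np.array([5.8, 6.1, 5.9, 6.4, 7.9, 6.0, 6.2])
estimated = np.array([5.9, 6.0, 6.0, 6.3, 6.1, 6.1, 6.2])

X = sm.add_constant(estimated)                     # intercept + slope
fit = sm.OLS(measured, X).fit()
cooks_d = fit.get_influence().cooks_distance[0]    # Cook's distance per observation

keep = cooks_d <= 0.033                            # threshold used in this study
cleaned_measured = measured[keep]                  # outliers removed
```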

#### Calibration of Sensor Data

Veris pH data: Even though the electrode readings in mV were referenced against pH 4 and 7 standard solutions, the measurements of actual soil samples need further calibration to match them with the standard laboratory method. This was necessary because the standard laboratory method extracts more protons from the soil due to the longer extraction time, and because glass electrodes show a higher sensitivity/responsiveness than antimony electrodes, which results in a systematic discrepancy between the laboratory method and the mobile sensor measurements, in particular at lower pH values. Since outlier removal (see above) reduced the number of samples that were co-located with sensor measurements, samples within a four-meter distance of sensor measurements were also included in the calibration. A spatially weighted linear regression was applied, which yielded a slope of 0.35, an intercept of 4, an RMSE of 0.2, and a Pearson's r of 0.66.

RGB image: Data from the primary RGB image and derived indices were picked with the zonal statistics procedure of QGIS within a four-meter radius around the biomass sampling locations. Biomass (fresh and dry) was related to image data by linear regression.

#### Modeling of Growth Response by Boundary-Line Analysis

Under outdoor conditions, biomass development can be restricted by various environmental factors in space and time. Consequently, the bivariate correlation between environmental factors and biomass of a whole field very often shows only weak relationships and statistical distributions with non-constant variances. This is because a particular factor controls biomass development only in a subset of field observations, namely when all other growth factors are in their optimum range. This fact was first noted by Carl Sprengel in 1828 and later popularized by Justus von Liebig as the "Law of the Minimum" or "Liebig's Law". Consequently, for quantifying the effect of a single factor on biomass development in observational studies (uncontrolled experiments), where several factors can be limiting in different situations, the assumption of a symmetric error distribution in classical regression analyses is not valid. Instead, Webb [34] suggested using the observable upper boundary of the point cloud in a bivariate scatter plot to describe the cause-and-effect relationship (with the limiting factor on the abscissa and the biomass or yield on the ordinate). While this early approach was mainly biologically motivated, mathematical solutions were presented later by Koenker and Bassett [35], Kaiser and Speckman [36], and Lark and Milne [37]. In particular, quantile regression is a powerful tool to detect relationships between variables when an asymmetric distribution of errors (residuals) is assumed [35]. When estimating an upper quantile (e.g., the 90th, 95th, or 99th percentile) of the conditional distribution of a dependent variable, quantile regression can model the effect of limiting factors by masking the effect of other unknown or unmeasured limiting factors [38–40]. From controlled experiments (with all factors but one being optimum) it is known that growth-controlling factors often vary between too low, optimum, and too high [40]. This is particularly true for pH: Many crops show increasing growth depressions if pH values fall below 5.5 or rise above 7.5, while pH values around 6.5 are optimum. This effect can be modeled by a piecewise linear function with a trapezoidal shape. Fitting of trapezoidal models of growth response by quantile regression was implemented in MATLAB based on the loss function described in Koenker and Bassett [35].
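The fitting was implemented by the authors in MATLAB; purely for illustration, a simplified Python sketch of fitting a trapezoidal boundary line by minimizing the quantile (pinball) loss of Koenker and Bassett is shown below. It uses a flat plateau rather than the slightly sloped middle stage reported later, and all data are synthetic:

```python
import numpy as np
from scipy.optimize import minimize

def trapezoid(x, x1, x2, y_lo, y_hi):
    """Piecewise-linear trapezoidal response: rise until x1, plateau between
    x1 and x2, decline after x2 (y_lo is the level at the data edges)."""
    x = np.asarray(x, dtype=float)
    left  = y_lo + (y_hi - y_lo) * (x - x.min()) / max(x1 - x.min(), 1e-9)
    right = y_hi + (y_lo - y_hi) * (x - x2) / max(x.max() - x2, 1e-9)
    y = np.where(x < x1, left, np.where(x > x2, right, y_hi))
    return np.clip(y, min(y_lo, y_hi), max(y_lo, y_hi))

def pinball_loss(params, x, y, tau=0.95):
    """Quantile (pinball) loss for the tau-th quantile (Koenker and Bassett)."""
    r = y - trapezoid(x, *params)
    return np.sum(np.where(r >= 0, tau * r, (tau - 1.0) * r))

# Synthetic example: pH on the abscissa, biomass on the ordinate (not study data).
rng = np.random.default_rng(1)
ph = rng.uniform(4.5, 8.0, 300)
biomass = trapezoid(ph, 5.3, 7.2, 50.0, 300.0) * rng.uniform(0.2, 1.0, 300)

x0 = [5.5, 7.0, biomass.min(), biomass.max()]       # initial breakpoints and levels
res = minimize(pinball_loss, x0, args=(ph, biomass, 0.95), method="Nelder-Mead")
x1, x2, y_lo, y_hi = res.x                           # fitted boundary-line parameters
```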

#### **3. Results and Discussion**

#### *3.1. Descriptive Statistics and Correlation Between Grassland Biomass and Multi-Spectral Indices*

Descriptive statistics of the sensor and laboratory data are shown in Table 1. During the field survey on 7 November 2016, as well as in the UAV-based aerial photograph from 24 October 2016, variation in stand height and species composition was clearly visible. In areas of taller and denser vegetation, alfalfa was dominant, whereas zones of smaller and rather sparse vegetation were characterized by grasses (Figures 1A and 3).


**Table 1.** Descriptive statistics of the sensor and laboratory data.

**Figure 3.** Visible small-scale variation of the grassland vegetation at Königsfeld. Alfalfa is dominant in the fore- and background while mainly graminaceous plants can be seen in the midst of the image.

To estimate area-wide grassland biomass of Königsfeld, multi-spectral indices derived from the UAV-based orthophoto mosaic were checked for correlation with fresh and dry matter (Figure 4). Hue, saturation, the normalized difference vegetation index (NDVI), and the visible atmospheric resistant index (VARI) correlated well with fresh and dry matter weight of the biomass (Figure 4). Despite the risk of underestimating high biomass due to saturation of vegetation indices at high densities [41,42], the correlation between NDVI and dry matter weight was strongest, with a Pearson correlation coefficient (Pearson's r) of 0.812. The NDVI was used to generate a regression model for the quantification of dry matter (DM) grassland biomass for the entire field (Figure 5). To derive DM, the regression model was rearranged (Equation (3)). The RMSE was 36.1 g.

**Figure 4.** Scatterplots and correlations between fresh and dry matter weight of the grassland biomass and multi-spectral indices deduced from the UAV-based aerial photographs.

**Figure 5.** (**A**) Linear regression model of grassland biomass (dry) and normalized difference vegetation index (NDVI) and (**B**) grassland biomass estimation for the entire field.

#### *3.2. Spatial Variability of Grassland Biomass, pH, and ECa*

The results of the spatial variability assessment of the sensor data by means of semivariance modeling are shown in Figure 6. For grassland biomass, pH, and ECa, two nested models plus the nugget effect were fitted. That means that there were at least two spatial processes with different ranges. Variography of grassland biomass, derived from NDVI, resulted in a very low nugget effect as the first component of spatial correlation. This indicates a very low spatial micro-variance and a very low random error from the measurement. Two spherical models were fitted as components of spatial (macro) correlation. The range of the first spherical model is about 18 m and that of the second is 245 m (Figure 6A). From the first range, it can be deduced that about half of the spatial variance of grassland biomass already occurs at distances smaller than 18 m. This corresponds to the visual impression of the UAV-based grassland biomass map (Figures 1A and 3) showing pronounced small-scale disparities.

The empirical semivariogram of the pH value also exhibits a nested structure (Figure 6B). The nugget effect is relatively high, which can be attributed (a) to micro-variance, partly due to the small sensor footprint, and (b) to measurement errors. The first spatial macro process was described by a spherical model with a range of 41 m, indicating a high small-scale pH variance of the soil. This can generally be attributed to the high degree of soil variability in Brandenburg due to the Pleistocene and Holocene origin of its parent material and is especially amplified by decades of agricultural use [43–45]. Finally, the third component of spatial correlation within the pH semivariogram was described by a Gaussian model with a range of 295 m.

As with the biomass, the semivariogram model of the apparent electrical conductivity (ECa) was also characterized by a very low nugget effect (Figure 6C). Two nested exponential model components show ranges of 89 and 900 m.

The semivariance models have shown that most of the spatial variability of grassland biomass, pH, and ECa occurs at relatively low distances of 18, 41, and 89 m. This illustrates the necessity of small-scale data acquisition of soil and plant properties for precision agriculture applications.

**Figure 6.** Empirical semivariograms and fitted models of (**A**) NDVI-derived grassland biomass [dry matter weight, g m−2], (**B**) pH value, and (**C**) ECa [mS m−1]. Dashed vertical lines indicate the range of the first component of the nested theoretical models.

#### *3.3. Relationship Between ECa, pH, and Grassland Biomass*

To investigate the relationship between the sensor-based pH value, ECa data, and grassland biomass, scatterplots were created in a first step (Figure 7). At first sight, the scatterplots show no correlation between the dependent and the independent variable and, especially visible for the ECa data, a non-constant variance (Figure 7B). In particular, the variance of ECa clearly increases with increasing grassland biomass, which violates one of the key assumptions of linear regression. This is generally due to the complex nature of factors affecting biomass production. In the context of Liebig's law of the minimum, the variability in the biomass data cannot be entirely explained by ECa and the pH value alone, since many other unmeasured factors are potentially limiting resources for plant growth [46]. Thus, instead of linear regression, which analyzes the correlation of the means, quantile regression was applied to estimate the 95th percentile of the conditional distribution of grassland biomass. This masks the effect of other unknown or unmeasured yield-limiting factors (e.g., [36,38,39]), and the limiting effect of ECa and the pH value can be visualized by means of the boundary line.

**Figure 7.** Association between (**A**) pH, (**B**) ECa, (**C**) log-transformed ECa and grassland biomass of an alfalfa-grass mixture, and modelling of their yield-limiting effect by means of quantile regression.

A trapezoidal boundary-line model fitted best to the 95th percentile of the pH value (Figure 7A). It characterizes three stages of response of alfalfa-grass mixture productivity to soil pH [47]: A linear yield increase until a pH value of 5.3, a middle stage of yield stability with a slight negative slope until pH 7.2, and a linear yield decline with further increasing pH values. This corresponds with several publications stating a pH optimum for alfalfa between 5.8 and 7.2. However, some of these references [48–50] indicate that there might be an interaction between pH and soil texture: The pH optimum shifts to a higher range on fine-textured soils and to a lower range on sandy soils. As soil texture at Königsfeld is strongly dominated by sand (86% sand, 11% silt, and 3% clay), this could be a reason for observing a lower threshold of optimum pH at 5.3. For management of the field, it can be considered to raise the pH value by liming at sites with pH lower than 5.3. In contrast, lowering too-high pH values (>7.2) is more difficult, in particular if the high pH is due to carbonaceous parent material. The farmer might, at least, avoid any liming in these areas and can reduce other inputs as well.

Before applying quantile regression to ECa, the data were log-transformed (log10) to improve the visibility of patterns in the data (Figure 7C). For the ECa data, a trapezoidal boundary-line model also matched best with the 95th percentile. The model can be interpreted as follows: Biomass increases until 3.5 mS m−1, no limits related to ECa are imposed between 3.5 and 17.5 mS m−1, and biomass decreases with increasing ECa above 17.5 mS m−1. In fact, by visually comparing the grassland biomass with the ECadeep map, it is striking that especially low (light green) and high (dark green) values of grassland biomass coincide well with low and high ECadeep values, respectively (Figure 8A,B).

As statistical analyses often show correlations between ECa and clay or sand content, geoelectrical methods are commonly accepted for the estimation of soil texture [28,51,52]. However, at Königsfeld, the analysis of the correlation between ECadeep and soil texture of the 25 lab-analyzed reference soil samples showed only poor results. Pearson coefficients of −0.45, 0.43, and 0.25 were obtained for the sand, silt, and clay fraction, respectively. Weak and inconsistent relationships between ECa and soil properties were also reported by Corwin and Lesch [53] and Sudduth et al. [54]. One reason is that ECa is a cumulative parameter which is affected by a series of soil characteristics such as clay content, soil moisture, salinity, temperature, bulk density, or organic matter content [53,55]. If these parameters are not correlated and if a single parameter does not dominate the influence on ECa, statistical analysis becomes difficult. A low range of variation in the control variable can be another reason for poor correlations. At Königsfeld, soil texture varied only within three soil texture classes, namely slightly silty sand (Su2), slightly loamy sand (Sl2), and pure sand (Ss) (Figure 9A). However, the correlation between ECadeep and soil moisture (θ) showed much better results, with a Pearson's r of 0.84 (Figure 9B). Thus, in the next step, the ECadeep data were calibrated to the soil moisture following Equation (4):

$$
\theta \left[ \% \right] = 6.0 + 0.4 \, ECa_{deep} \tag{4}
$$

where θ is the soil moisture in % and ECadeep the sensor-based apparent electrical conductivity in mS m−1 at a depth of 0.37 m. Similarly, Cousin et al. [56] and Besson et al. [57] used the relationship between geoelectrical methods and water content to map the spatial distribution of soil water content at the field scale.

**Figure 8.** (**A**) Sensor-based estimate of dry grassland biomass, (**B**) soil ECadeep with soil moisture in parentheses, and (**C**) pH value at Königsfeld.

**Figure 9.** (**A**) Soil texture classes of the 25 lab-analysed reference soil samples. (**B**) Linear regression analysis of apparent electrical conductivity (deep) and soil moisture. (**C**) Mean water retention curve of Königsfeld based on the soil texture of the samples and the soil physical data base of [58]. (PWP: Permanent wilting point, FC: Field capacity, AWC: Available water content).

Using the calibration model in Equation (4), the soil moisture optimum of the alfalfa-grass mixture was derived from the ECadeep and biomass data. It ranges between 7% and 13%. To better understand the ecological significance and plant availability of these two soil moisture values, we calculated a mean water retention curve on the basis of the mean soil texture data of Königsfeld and the soil physical data base of Wessolek et al. [58] (Figure 9C). Special emphasis was put on the two ecologically important soil moisture states field capacity (FC, pF 2.5) and permanent wilting point (PWP, pF 4.2). FC is the maximum amount of soil water that a soil is able to hold in the root zone against gravity, at a matrix potential of about −33 kPa (pF 2.5). In contrast, the PWP is the amount of soil water that is so strongly retained by the smallest soil pores, at a matrix potential of about −1500 kPa (pF 4.2) [59]. As a consequence, plants are not able to absorb that water and start to wilt. The difference between FC and PWP describes the plant-available water content of a soil (AWC) [60]. According to Figure 9C, the PWP and FC at Königsfeld are at soil moisture contents of 5% and 13%, respectively. At the dry end, biomass production of the alfalfa-grass mixture already decreases at soil moistures below 7%, even though this is still within the range of the usable field capacity of the soil. This is probably caused by an uneven soil water distribution within the range of AWC, especially at low moisture contents [61]. Moreover, plant roots are not uniformly distributed in the soil, and the soil water potential that can be overcome by roots differs between plant species [62]. To increase the water holding capacity (WHC) of these sandy and dry sites, showing ECa below 3.5 mS m−1, the farmer can try amelioration by spreading organic fertilizers. However, a significant improvement of WHC will require huge amounts of organic matter. Alternatively, an adaptive strategy could include the seeding of grassland species that are more drought-tolerant.

At high soil moistures, on the other hand, grassland yield begins to decline when soil moisture exceeds field capacity (>13%). Below a pF value of 2.5, water is no longer held by the soil matrix potential. Instead, it starts to slowly percolate through the soil following the gravitational potential and may contribute to the groundwater below the root zone, or it is retained by poorly draining soil layers in the root zone. The latter situation creates water logging with adverse effects on crop growth due to the lack of oxygen. High ECa values were observed at elevated areas of the field, in particular top-slopes. We assume that at these terrain positions soils were eroded and glacial till can be found at shallow depths. This is in line with information from the national soil survey [63].

#### **4. Conclusions**

A grassland field on sandy soils in Northeast Germany was mapped by proximal soil sensors and remote sensing. The UAV-based orthophoto was used to estimate spatial variation in grassland biomass. Soil pH value and apparent electrical conductivity (ECa) from proximal sensing were investigated as indicators for soil-borne causes of biomass variation. The spatial variability and correlation between pH, ECa, and grassland biomass were analyzed by means of semivariance modelling and boundary-line analysis, focusing on the yield-limiting behavior of pH and ECa.

Area-wide mapping of biomass provided means to obtain a sound description of spatial variability by variography based on a large data set. However, the RGB imagery alone does not explain the reasons of spatial variation in biomass. Knowledge about cause-effect relationships is fundamental for taking the appropriate measures in grassland management. Proximal soil sensing in conjunction with boundary line analysis helped to understand some of the drivers of biomass variability.

Comparing the two proximal soil sensors, the pH meter offers the benefit of a direct assessment of a key soil fertility factor. The agronomic interpretation of pH data is straightforward, and many recommendation tables exist that relate pH values to decisions on liming or the selection of crops or varieties. In contrast, the ECa values are affected by several soil properties, including water, salinity, texture, temperature, and compaction. It requires additional effort to interpret the ECa data and to transfer them into information that is meaningful for agricultural decision making. Technically, however, the ECa sensor showed a better performance than the pH system. The ECa data were less noisy (as derived from the nugget effect parameter in the variogram model), they provided a higher spatial resolution due to the higher measurement frequency (1 Hz), and the coulter electrodes did not seriously harm the turf. The marks of the pH sampler, however, were visible even after one year. This can promote the expansion of unwanted plant species in the grassland or initiate soil erosion. Compared to the ECa sensor, the pH data were relatively noisy and much sparser. To improve the pH measurements, better solutions for sampling and/or sample preparation have to be found and the measurement time has to be reduced.

As one of our reviewers pointed out, this study has neglected the temporal aspect of vegetation dynamics. In particular, seasonal variation of plant growth, due to changes in temperature and photoperiod, could modify the spatial distribution of biomass and probably even the vegetation composition. For example, above-ground plant parts can die off due to frost over winter. When the plants regrow in the following spring, spatial variation of biomass might not be very strong or may even show different spatial patterns over time. Thus, the relationship between biomass and soil properties (ECa and pH) could change within the season. On the other hand, spatial patterns of vegetation composition and biomass might evolve over several vegetation periods. The vegetation might be spatially more uniform after seeding, while more pronounced patterns will appear in subsequent years. This evolution of stable and pronounced spatial patterns can not only be attributed to abiotic environmental factors but also to interspecies competition and to species persistence due to the accumulation of root systems and/or seeds of certain plant species at certain spots. Unfortunately, we did not have enough resources to thoroughly monitor the temporal dynamics of spatial vegetation patterns in this study. From qualitative observation (field scouting), we had the impression that the patterns of Königsfeld were relatively stable five years after sowing. With regard to the interaction of spatial and temporal dynamics of alfalfa-grass mixtures, we suggest that biomass should be assessed a few years after sowing and preferentially later in the vegetation period. However, further studies are required to substantiate this assumption.

**Author Contributions:** Conceptualization: R.G. and S.V.; Methodology: R.G.; Formal Analysis: R.G.; Investigation: R.G., S.V., M.O., and E.K.; Data Curation: R.G., S.V., and M.O.; Writing—Original Draft Preparation: S.V.; Writing—Review and Editing: R.G., M.O., and E.K.; Visualization: S.V.; Supervision: R.G.; Project Administration: R.G. and E.K.; Funding Acquisition: R.G. and E.K.

**Funding:** The study was mainly funded by the Federal Ministry of Food and Agriculture of Germany within the research project pHGreen (Grants 2812NA116 and 2812NA031).

**Acknowledgments:** The authors thank Jörn Selbeck for UAV flights, Antje Giebel for photogrammetry, Christiane Oertel, Harald Kohn, and Michael Facklam for laboratory analysis as well as two anonymous referees for their constructive comments that considerably improved the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest. The sponsors had no role in the design, execution, interpretation, or writing of the study.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **High Speed Crop and Weed Identification in Lettuce Fields for Precision Weeding**

#### **Lydia Elstone \*, Kin Yau How, Samuel Brodie, Muhammad Zulfahmi Ghazali, William P. Heath and Bruce Grieve**

School of Electrical and Electronic Engineering, The University of Manchester, Oxford Rd, Manchester M13 9PL, UK; hkyhow12@gmail.com (K.Y.H.); brodie667@hotmail.com (S.B.); ghazalizulfahmi@gmail.com (M.Z.G.); william.heath@manchester.ac.uk (W.P.H.); bruce.grieve@manchester.ac.uk (B.G.)

**\*** Correspondence: lydia.elstone@postgrad.manchester.ac.uk

Received: 7 November 2019; Accepted: 25 December 2019; Published: 14 January 2020

**Abstract:** Precision weeding can significantly reduce or even eliminate the use of herbicides in farming. To achieve high-precision, individual targeting of weeds, high-speed, low-cost plant identification is essential. Our system uses red, green, and near-infrared reflectance, combined with a size differentiation method, to identify crops and weeds in lettuce fields. Illumination is provided by LED arrays at 525, 650, and 850 nm, and images are captured in a single shot using a modified RGB camera. A kinematic stereo method is utilised to compensate for parallax error in images and provide accurate location data of plants. The system was verified in field trials across three lettuce fields at varying growth stages, at speeds from 0.5 to 10 km/h. In-field results showed weed and crop identification rates of 56% and 69%, respectively. Post-trial processing resulted in average weed and crop identification rates of 81% and 88%, respectively.

**Keywords:** precision weeding; multispectral imaging; kinetic stereo imaging; plant detection

#### **1. Introduction**

Traditionally, weed management has been achieved through broadcast application of selective herbicide. A large proportion of herbicide used in broadcast spraying is released into the environment through run-off and drift [1]. Increasing environmental and public health concerns have resulted in stricter regulation of herbicide use [2,3]. Traditional selective herbicides also have to contend with a growing number of resistant weed species [4]. Weeds represent a loss potential of 32%, with an effect comparable to that of pathogens and parasites combined [5]. Therefore, effective weed control is essential to maintaining and increasing worldwide food productivity to provide for a growing global population [6]. Speciality crops, such as vegetables, are disproportionally affected by herbicide resistance, as very few herbicides are registered for use in the sector [7]. Such farmers have been forced to use hand-weeding methods, which are expensive, inefficient, and made more difficult by an industry-wide labour shortage [8,9].

Several approaches have attempted to tackle weed management issues through the introduction of new technology. Herbicide waste can be moderately reduced through the use of variable-rate spraying and by modifying existing spray booms [10]. Herbicide inputs may be completely eliminated through robotic weed removal [11,12]. These approaches can be implemented using relatively simple detection techniques, estimating overall plant coverage [10,13] or crop position [12,14]. Individual weed targeting for mechanical removal [15] or herbicidal micro-dosing [16,17] has also been investigated in recent years. Non-selective herbicides, which are less affected by herbicide resistance, can be utilised for micro-dosing systems, which can reduce herbicide requirements by up to 99.3% according to [18]. In order to achieve individual weed targeting, plant identification and accurate target position information are essential.

Weed detection methods most often use a combination of a colour index followed by image segmentation and feature extraction, to locate weeds in the crop bed [19]. Other methods have been considered, but not so extensively, such as the use of LIDAR (light detection and ranging) to discriminate between crops, plants, and soil using height information [20]. Colour indices use some combination of visible and near infra-red (NIR) reflectance to create a single greyscale image [19,21]. Some methods require the transformation into different colour spaces [22], such as normalised RGB [23] or HSV (hue, saturation, and value) [24]. Common indices include ExG (excess green), ExR (excess red), NDI (normalised difference index), and NDVI (normalised difference vegetative index), which discriminate between plants and soil with varying effectiveness [19]. Images are segmented using a variation of thresholding techniques, but most commonly Otsu's thresholding technique [21]. To discriminate crops from weeds, other information is needed; shape [25] or textural features [26] are most commonly used, which can be computationally expensive [3].

Machine learning has been used extensively in the discrimination of vegetation and soil and between vegetation types. Support vector machines proposed in [27–29] have been utilised, in combination with a selection of features, such as colour or shape information, for plant identification. Fuzzy decision making methods were considered in [25] for the identification of monocot and dicot weeds, using a set of shape features achieving up to a 92.9% classification accuracy. Each of these approaches for decision making is applied following the segmentation of the image into crop and weed areas and the calculation of a selection of features for plant areas. Convolutional neural networks (CNN) have also been utilised successfully in [30,31] to locate weed patches, without the need to provide feature sets prior to processing. While the method increases the training time for the algorithm compared to, for example, support vector machines, it may provide a more universally applicable system [31]. These methods show an important next step in the identification process, but are complex and relatively computationally expensive. They are most notably being used to identify patches of weeds within a field, not individual weeds in real-time.

This study focuses on the development of a low cost, high-speed detection of individual plants for use with a herbicide micro-dosing system, assuming a targeting area 20 mm in diameter. As such, our detection system was designed with the target of operating at 5 km/h with the capacity for cost-effective retrofitting to existing tractor tool-bar mounts, and minimal equipment cost. The approach should be robust to variations in ambient lighting and to inadvertent wind dispersion on the injected herbicidal products. The assembly, therefore, is enclosed under a tractor-mounted hood (Figure 1) and controlled illumination is provided. The research aimed to determine the effectiveness of using spectral reflectance in the red, green, and NIR wavebands using controlled illumination, combined with size information in the discrimination between crops, weeds, and soil in horticultural crops. Accurate position information for individual weed targets is required whilst maintaining a large field-of-view from the camera to ensure high-speed operation. A system to mitigate the effect of parallax error on location information of plants has been proposed through a kinematic stereo method. Where possible, the detection method should provide flexibility for use across a range of scenarios. As such, a modular design was utilised to allow for customisable width up to a maximum of seven modules (3.78 m wide), which may be implemented concurrently for each PC.

#### **2. Materials and Methods**

#### *2.1. Hardware Design*

Each camera-lighting module (Figure 1b) covers a width of 0.54 m, allowing for customisation according to application. Weather, time, and location may cause significant variations in the imaging environment, and the detection system must be robust to these changes. Therefore, a closed canopy houses the imaging system to provide illumination control. For easy integration with current farming equipment, the canopy is mounted to the rear of a tractor (Figure 1a) and covers one crop bed (1.6 m, 3 camera-lighting modules).

**Figure 1.** The system in field trials. (**a**) The system mounted to the rear of the tractor. (**b**) The sensor system in-field.

Three main components are used for crop and weed identification: a PC, an RGB camera, and a lighting rig. A 12 V lead-acid battery powers the system, in conjunction with an inverter to supply the PC. Illumination is provided by 5 × 5 LED arrays at 525–530 nm (green), 650 nm (red), and 850 nm (NIR) (Justar Electronic Technology Co. LTD; 25 W, 1.75 A).

An RGB machine vision camera (PointGrey, Grasshopper 3, GS3-U3-41C6C-C), with a wide-angle lens (Fujinon CF12.5HA-1), is used for image capture. The camera is mounted between two lamps to provide shadowless illumination. Each lamp consists of three LED arrays of each wavelength placed at 120° intervals within a metal hemisphere (Figure 2). The camera is positioned 0.6 m above the ground, resulting in a field of view of 0.54 × 0.54 m and a resolution of 0.9481 px/mm. This height was chosen to balance coverage and precision requirements, whilst minimising image distortion.

**Figure 2.** Illumination system containing nine LED arrays per lamp, positioned at regular intervals.

Illumination control is facilitated through two circuit boards: the LED driver and microcontroller boards. The camera receives signals from the PC to capture images, which are relayed to the microcontroller board. The microcontroller generates PWM signals to control the intensity of the LED arrays of different wavelengths, and multiplexes these with the ON/OFF signals received from the camera. The LED drivers take this control signal and a 12 V supply to illuminate the area when required by the camera. The duty cycles for each of the LED arrays are 50%, 50%, and 20% for the NIR, green, and red, respectively. Comparisons were made between a selection of images taken at varying duty cycles (between approximately 10% and 70% for each wavelength). Where the duty cycle for each wavelength was equal, the red response appeared to dominate the image. The combination of duty cycles chosen allowed for a balanced response in each channel and resulted in an image with clearly differing responses between plants and soil, and to a lesser extent between plant types.

A custom built PC provides the required processing power for high speed identification, including a dedicated GPU. The full PC component list can be seen in Table 1.


**Table 1.** PC components.

#### *2.2. Plant Identification*

Our approach uses red, green, and NIR reflectivity, combined with an assumption that crops are generally larger than weeds, to identify crops and weeds in-field. In [32] a modified RGB camera is used to implement NDVI using multiple filters to allow the green and blue channels to detect NIR wavelengths. Our approach similarly modifies an RGB camera through the removal of the NIR filter and introduction of a 515 nm long-pass filter to remove the unwanted response to blue light in the blue channel. However, in addition to utilising red and NIR wavelengths, the green response is maintained from the camera. The green response is intended to obtain minor tonal differences between plant types to be included in the index calculation (RGNIR—Equation (1)). This may be useful where leaves of crops and weeds overlap or large bodies of multiple weeds appear to be a single larger object.

A single shot method is used, flashing all LEDs simultaneously and capturing the red, green and NIR reflectance in one frame. The RGB channels of the camera overlap, and as such, all respond to the wavelengths used in the system to a varying extent. From the datasheet for the CMOSIS CMV400 chip, an estimation of the quantum efficiency of the RGB channels at the system wavelengths is shown in Table 2.


**Table 2.** Estimated quantum efficiency of camera channels.

The removal of the NIR filter led to colour distortion of the captured image; to compensate, a background image is taken in a black box condition and then removed from the image. An example of each of the captured RGB frames is shown in Figure 3. The channel responses are combined in Equation (1) to produce a single greyscale image for segmentation. This equation is similar to the normalised green index proposed by [33], although in our case the blue channel is a combined response to NIR and green wavelengths. The RGNIR intensity is calculated as

$$RGNIR = \frac{\beta(G_{c} - g_{c})}{\alpha(R_{c} - r_{c}) + \beta(G_{c} - g_{c}) + \gamma(B_{c} - b_{c}) + L}, \tag{1}$$

where *Gc*, *Rc*, and *Bc* are the green, red, and blue channels, respectively; *gc*, *rc*, and *bc* are the background corrections for each channel (0–255); *α*, *β*, and *γ* are the channel weights (0–1); and *L* is the soil-adjustment constant.

The histogram of the RGNIR image (Figure 4a) is taken and segmentation is performed using Otsu's multi-level (in this case three) thresholding method [34,35]. The features from the segmented image are identified as either crop or weeds using a size exclusion method (Figure 4b).
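A minimal sketch of this processing chain is shown below, assuming scikit-image's multi-level Otsu implementation as a stand-in for the thresholding of [34,35]; the channel weights, soil-adjustment constant, and size threshold are placeholders, not the values used in the trials:

```python
import numpy as np
from skimage.filters import threshold_multiotsu
from skimage.measure import label, regionprops

def rgnir_index(img, bg, alpha=1.0, beta=1.0, gamma=1.0, L=0.1):
    """Greyscale RGNIR image following Eq. (1); img and bg are float RGB arrays
    (captured frame and black-box background) from the modified camera."""
    R, G, B = (img[..., i] - bg[..., i] for i in range(3))  # background-corrected channels
    return beta * G / (alpha * R + beta * G + gamma * B + L)

def classify_by_size(binary, min_crop_area_px):
    """Label connected vegetation regions and split them into crops and weeds
    using the size-exclusion assumption (crops are larger than weeds)."""
    regions = regionprops(label(binary))
    crops = [r.centroid for r in regions if r.area >= min_crop_area_px]
    weeds = [r.centroid for r in regions if r.area < min_crop_area_px]
    return crops, weeds

# Illustrative usage with random data standing in for real frames.
rng = np.random.default_rng(0)
frame, dark = rng.random((480, 640, 3)), np.full((480, 640, 3), 0.02)
grey = rgnir_index(frame, dark)
thresholds = threshold_multiotsu(grey, classes=3)     # three-level Otsu thresholding
vegetation = grey > thresholds[-1]                    # keep the brightest class
crops, weeds = classify_by_size(vegetation, min_crop_area_px=2000)
```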

**Figure 3.** Responses of the red (**a**), green (**b**), and blue (**c**) camera channels, in controlled illumination.

**Figure 4.** Image processing steps. (**a**) Greyscale output of the RGNIR function. (**b**) Image following thresholding.

#### *2.3. Height Estimation*

In order to image a field efficiently at high speed, it is essential that each frame captures the largest feasible area while maintaining sufficient detail to accurately identify and locate small plants. However, where plants are not located directly below the camera, variations in their height can result in an inaccurate determination of their position due to parallax error. In our system, the usable frame area (assuming a parallax error of less than 5% is acceptable) is approximately a 19% slice across the centre (Figure 5).

**Figure 5.** Heat map of the normalised position error across a frame.

While it is possible to disregard all but this area within the image, the camera would need to capture five times more images in order to cover the same area. This effectively reduces the camera frame rate from 90 fps to ≈17 fps, limiting the maximum speed of the system and the ability to incorporate additional wavelengths in the future. A stereo imaging technique using a single camera in motion [36,37] was chosen to compensate for the error, eliminating the need to purchase additional equipment.

Given that the imaging system is in constant motion, and the camera has a high frame rate, it is possible to use a single camera to achieve kinetic depth estimation. As the camera moves between two consecutive images, taller objects will appear to move faster than their shorter counterparts. Therefore the height disparity between the objects can be calculated using a dense optical flow algorithm. The system utilises the Farneback optical flow algorithm [38], chosen because of the low processing time and availability of the GPU kernel in the OpenCV library.
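The sketch below shows this step with OpenCV's Farneback implementation; the flow parameters are common defaults, not the values tuned for the system.

```python
import cv2
import numpy as np

def height_disparity(frame_prev, frame_next):
    """Dense Farneback optical flow between two consecutive greyscale
    frames. Taller objects shift further between frames, so the flow
    component along the direction of travel serves as a height
    disparity map."""
    flow = cv2.calcOpticalFlowFarneback(
        frame_prev, frame_next, None,
        0.5,   # pyramid scale
        3,     # pyramid levels
        21,    # averaging window size
        3,     # iterations per level
        7,     # pixel neighbourhood for polynomial expansion
        1.5,   # Gaussian sigma for the expansion
        0)     # flags
    # Assume the platform moves along the image x axis.
    return np.abs(flow[..., 0])
```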

Tests were carried out to verify the reliability of the algorithm using objects of known height imaged twice with 2.1 cm camera displacement between images (chosen from the 90 fps camera frame rate and target 5 km/h system speed). The item on the left in Figure 6a is 2.5 cm tall; the object on the right is 3.5 cm tall at its lowest point and 11 cm at its highest. A height disparity map was produced from these images (Figure 6b), which shows significant error in the height estimation algorithm. The error was attributed to the small number of features, and thus, low spatial frequency of the image. Given that neighbouring pixels had very similar values, the algorithm was not able to accurately calculate the pixel shift.

**Figure 6.** Height estimation using kinematic stereo method—preliminary test. (**a**) The image prior to processing containing two objects of different heights. (**b**) The height disparity map produced using the optical flow algorithm on the first attempt.

The error in the disparity values can be reduced by applying a Laplacian filter to the image and summing the filtered and non-filtered images (Figure 7). This improves the texture of the image, preserving the high spatial frequency components. As can be seen in Figure 7, the error in disparity values across the image is reduced. Approximately 1 cm in height corresponds to 3 units in the height disparity map (Figure 7), showing a good estimation of the object height by the algorithm. However, some inconsistencies still exist at object edges, which can be mitigated by taking the disparity value in the middle of any given object.
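A minimal example of this texture-enhancement step, assuming 8-bit greyscale input and OpenCV, is shown below.

```python
import cv2
import numpy as np

def add_texture(grey):
    """Sum a frame with its Laplacian to boost high spatial frequencies
    before optical flow, as described above."""
    lap = cv2.Laplacian(grey, cv2.CV_16S, ksize=3)
    summed = cv2.add(grey.astype(np.int16), lap)
    return cv2.convertScaleAbs(summed)   # clip back to 8 bit
```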

Using Equation (2) the real distance can be calculated, allowing the system to provide accurate position information of individual plants across an entire frame. The distance between two frames is calculated using a speed estimation algorithm and the time between images, as:

$$\mathbf{x} - \mathbf{x}' = \frac{bf}{d} \tag{2}$$

where *x* and *x*′ are the points of interest in frame one and frame two, respectively; *b* is the distance between frames; *f* is the focal length of the camera; and *d* is the disparity map (object height).
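Rearranged for *d*, Equation (2) can be applied to each detected object as in the sketch below. The paper labels *d* the object height; in the sketch it is treated as the camera-to-object distance under the assumption of a downward-facing camera, so that plant height is the mounting height minus *d*; this interpretation is ours and is made only for illustration.

```python
def object_distance(pixel_shift, baseline_m, focal_px):
    """Camera-to-object distance from Equation (2), rearranged as
    d = b * f / (x - x').

    pixel_shift: measured shift of the object between frames (pixels).
    baseline_m:  camera displacement between the two frames (metres).
    focal_px:    camera focal length (pixels).
    """
    return baseline_m * focal_px / pixel_shift

# Illustrative use: with a downward-facing camera mounted at 1.0 m,
# plant height would be 1.0 - object_distance(shift, 0.021, focal_px).
```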

**Figure 7.** Height estimation using the kinematic stereo method with the addition of a Laplacian filtered image. (**a**) Sum of the original and Laplacian filtered images. (**b**) The height disparity map using the optical flow algorithm with the addition of the Laplacian filtered image to reduce error.

#### *2.4. System Optimisation*

The system uses an NVIDIA GTX1080 GPU with a parallel implementation of the software developed in NVIDIA Compute Unified Device Architecture (CUDA)—Toolkit 9.0. A parallel prefix sum algorithm is used to reduce the run time of the multi-thresholding algorithm by several orders of magnitude. Three images were processed 1000 times, and the average speed was taken for the CPU and GPU implementations to establish the time advantage of introducing parallelisation. The results show the algorithm is up to 10,000 times faster using the GPU method, with the same threshold values found. A trial run of 50 images gave a total average loop time of 100 ms and a maximum of 175 ms per frame with one detection module in use.
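To illustrate why prefix sums are at the heart of this optimisation, the sketch below computes a single Otsu threshold purely from cumulative sums of the image histogram; the multi-level variant used by the system evaluates the same cumulative quantities over pairs of candidate thresholds, and it is these sums that the CUDA kernel parallelises.

```python
import numpy as np

def otsu_from_prefix_sums(hist):
    """Single-threshold Otsu's method written entirely in terms of
    prefix sums of the 256-bin histogram."""
    bins = np.arange(len(hist), dtype=np.float64)
    w = np.cumsum(hist.astype(np.float64))   # class-0 pixel count
    m = np.cumsum(hist * bins)               # class-0 intensity mass
    w_total, m_total = w[-1], m[-1]

    w0 = w[:-1]
    w1 = w_total - w0
    mu0 = m[:-1] / np.maximum(w0, 1.0)
    mu1 = (m_total - m[:-1]) / np.maximum(w1, 1.0)

    between_class_variance = w0 * w1 * (mu0 - mu1) ** 2
    return int(np.argmax(between_class_variance))
```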

An in-field calibration method allows the user to optimise the system for different conditions. A background image is taken with no illumination, as are two further images in different positions to allow the user to tune parameters in Equation (1) and the height estimation algorithm. Figure 8 shows the calibration process with the "ideal" processed images output from the tuned parameters.


**Figure 8.** Screenshot of the system during calibration process for plant identification, showing a desirable output following thresholding.

#### *2.5. Field Trial Procedure*

Trials were performed in three separate iceberg lettuce fields at three growth stages in Ely, Cambridgeshire, United Kingdom, on 17 August 2018.

The tractor speed was varied from 0.5 to 10 km/h. The average weed density varied across each of the fields (150, 69, and 31 weeds/m<sup>2</sup> for fields 1, 2 and 3 respectively). A maximum density of 364 weeds/m<sup>2</sup> was found in the frames analysed. Weeds found in the fields were between the one and three leaf stage and were predominantly broadleaf type.

Over 20,000 unprocessed images were captured from the trials for further testing. Each image file name contains the timestamp of capture, such that any lag in the system may be detected. The time between frame capture is fixed regardless of system speed, resulting in increased overlap between frames at a lower speed. This variable could be optimised to reduce overlap at lower speeds for future trials. Each trial run produced a database entry for each crop detected, including its size and location. These databases were produced to log information which may be of use to the farmer in other activities.

The in-field calibration files were also saved to reproduce the processed images after the trial; there were two files, one used for field 1 and part of field 2, the other for the remainder of field 2 and field 3. A random sample of 50 images was considered for further analysis. Each of the raw images was evaluated, and weeds and crops were identified by eye. This was then compared to the processed images to identify which plants had been correctly identified and where false negatives and positives had occurred according to the colour code described in Table 3. The process was completed first with the trial calibration and then with the post-trial calibration to establish the limitations of the calibration process and the capabilities of the system with optimal calibration (Figure 9).

**Figure 9.** Images following processing (**left**) using in-field (**top**) and post-trial (**bottom**) calibration. Blue areas were designated as crops, and weeds are shown in green, with the centre of each target identified by a star. Unprocessed images are labelled by hand (**right**) according to the accuracy of the system identification using the key as defined in Table 3.

Crops are encircled with yellow and red regions to indicate the area within 2 and 4 cm of a crop respectively. Targets within the red zone are ignored, as they are considered likely to result in crop damage; those within the yellow zone are flagged to be handled with care, as are those within the pink box. Blue boxes at the top and bottom of each image identify the edge of the area for targets to be generated within each frame. The post-trial set also used two calibrations, one for field 1 and another for fields 2 and 3; this is mainly due to the significant size difference between the crops in these fields.



#### **3. Results and Discussion**

Figure 10 shows the proportion of weeds per frame correctly identified taken from a sample of 50 of the processed images. Post-trial calibration (C2) results are consistently better than their in-field (C1) counterparts. With increasing field number, positive weed identification decreased for both in-field and post-trial calibration, with a particularly poor result (18%) for field 3 in-field calibration.

**Figure 10.** Proportion of weeds correctly identified for each field (F1–3) and calibration (C1, C2).

Field 3 had the smallest crops and, therefore, the fewest weeds/frame on average (nine), which may have resulted in the smaller peak in the histogram, a change in the threshold values, and a worse detection rate. Field 3 and part of field 2 also used a second, different calibration in-field due to the substantial change in crop size from those in field 1. This in-field calibration may have been less effective than that used for field 1. Post-trial calibration results are an improvement because parameters could be adjusted by trial and error, which is not possible in-field, where poor screen visibility also makes fine tuning difficult.

As can be seen from Figure 11, there was a wide range of values for the percentage of weeds identified as multiple targets, particularly in field 3. With increasing field number, there was a decreasing trend in multiple-target weeds. Larger weeds with several leaves are most often misidentified as multiple targets, where each leaf is mistakenly identified as an individual plant. Those fields with fewer and smaller weeds are less likely to have weeds identified as multiple targets. Overall, the post-trial calibration reduces this error in identification (from an average 19% to 12%). Post-trial results in field 3 do not appear to reduce the error as expected; however, the wide range in results makes it difficult to draw a definitive conclusion. Although the addition of extra targets to the system may decrease efficiency, these weeds can still be effectively managed, and the overall effect is determined by the resolution of any actuator system. As such, it is clear the system performs well in identifying weeds, with total average identification rates of 56% and 81% for the in-field and post-trial calibrations, respectively (Figure 12).

The system has a significant problem with false positives, where debris in the field (e.g., stones and twigs) is incorrectly identified as weeds. Whereas weed identification improves in the post-trial calibrations, false positives worsen, and there is a clear trade-off when calibrating the system between improved weed identification and increasing false positives. Figure 13 shows the proportion of generated targets which are in fact debris.

**Figure 11.** Proportion of weeds identified as multiple targets for each field (F1–3) and calibration (C1, C2).

**Figure 12.** Proportion of weeds identified for each field (F1–3) and calibration (C1, C2).

**Figure 13.** Proportion of objects identified which are debris for each field (F1–3) and calibration (C1, C2).

Field 1 had the lowest proportion of misidentified targets, due to its higher number of weeds/frame (average of 43 weeds/frame). If the average amount of debris per image is approximately constant, a larger number of weeds results in a smaller proportion of identified objects being debris. Where there is higher leaf coverage in an image, more of the debris will also be obscured. Finally, where there is a very high debris-to-weed ratio, a false peak may appear in the histogram, causing errors in identification.

Approximately 31% and 12% of crops were misidentified as weeds in the in-field and post-trial calibrations respectively, as shown in Figure 14. The vast majority of these false positives were at the edge of images where the crop is not fully visible, and as such the size exclusion method identified the object as a weed (see Figure 9). The effect was most pronounced in field 3, where crops were smallest. This issue could be mitigated in future work using image stitching when more than one module is in use so the full crop is visible to the system. Alternatively, as in [39], plants not fully contained in the image may be excluded from the classification algorithm to avoid crop damage. The increase in crop misidentification in in-field calibration is due to the threshold for size exclusion being set too high. The size exclusion threshold is currently taken from approximate crop size provided during calibration. It may be more effective to derive the value instead from weed size, as there is considerable variation in crop size within each field.

**Figure 14.** Percentage of crops misidentified for each field (F1–3) and calibration (C1, C2).

Figures 15 and 16a show the debris misidentification and the percentage of weeds correctly identified, respectively, with respect to weed count. No clear correlation is shown between weed count and correct weed identification, or between speed and weed identification (Figure 16b). It should be noted that at the highest speeds results may deteriorate as images become more blurred, and identification by eye of objects in raw images becomes more difficult. However, the percentage of debris and the weed count are clearly correlated for the reasons discussed previously.

It should be noted that some lag in the capture of a very small minority of images occurred during the field trial; the reason for this is unclear and should be considered in future studies. Due to the significant overlap of each frame at most of the trial speeds, this did not result in a failure to identify objects in the areas where this error occurred. Individual weed and debris identification may have resulted in the introduction of some human error into the determination of the effectiveness of the system.

These results are promising, but further development should be considered to increase classification rates and reduce false positives. The system provides a relatively simple approach for crop and weed identification, but more complex approaches have shown better classification performance. In [28], plants were divided first into three groups of monocot, dicot, and barley with 97.7% accuracy, and weeds further classified by species with varying success using a support vector machine and shape features. Hyperspectral imaging is suggested for species identification in [40], providing 100% crop recognition and weed species identification (31%–98% correctly identified depending on species and classifier method). This method is implemented at low resolution and operating speeds (average ground speed = 0.09 m/s). Plants were identified in maize fields using a selection of nine colour indices combined with support vector data description in [41], achieving up to 90.79% classification, but with significant variation in results due to weather and time of day. By comparison, our system has a lower classification rate, but provides real-time identification at high resolution for individual plant treatment, whilst correcting target position data through the height estimation algorithm. The system currently has sufficient computational redundancy such that future work may introduce further discriminating features and processes to improve plant recognition and reduce false positives. This increased complexity may be justified if it offers an economic advantage to total weed management beyond re-running the system periodically and capturing the weeds missed in earlier passes.

**Figure 15.** Proportion of objects identified which are debris versus weed count.

**Figure 16.** System weed identification capability with varying speed and weed count. (**a**) Percentage of correctly identified weeds versus total weed count. (**b**) Percentage of correctly identified weeds versus system speed in-field

#### **4. Conclusions**

This paper reports a full-scale, on-tractor, field demonstration of a single-shot, multispectral imaging system for autonomous identification of emergent weeds at forward velocities of up to 10 km/h (2.8 m/s). The method successfully identified an average of 81% of weeds and 88% of crops in field trials, showing promise for the future development of this approach. The approach combines controlled-spectrum artificial illumination with a short-wavelength (nominal cut-off wavelength, 515 nm), optically filtered, and modified (removal of the NIR filter) Bayer-array (colour) digital imaging sensor. Single-shot imaging of the red, green, and NIR reflectivity is used to achieve plant–soil discrimination.

Weeds are assumed to have a leaf area significantly less than the lettuce crop, and so a size-exclusion approach has been used to separate the target weeds from the neighbouring crop plants. This method of processing relies on the nature of lettuce production and similar horticultural crops where soil pre-treatment is used prior to the growing season to remove emergent weeds, before the transplantation of young plants. The green wavelengths (centred around 525 nm), are intended to handle more difficult weed detection cases associated with overlapping small weeds, which are then interpreted as a larger single plant. Further system testing to establish the performance at high weed density is required to verify the effectiveness of this approach.

A kinematic stereo method has been used to estimate the height of plants in the image and correct the parallax error to allow for accurate targeting of plants by a herbicide micro-dosing system. The method allows the full frame to be processed without increasing the system cost through the addition of expensive equipment, such as stereo vision cameras or LIDAR sensors. As a follow-on from this research programme, tests to ensure the proper functionality of the height-estimation system, and thereby the accuracy of the plant location data in-field are required.

Future developments of the design may include more sophisticated image processing, and taking advantage of the tonal differences between the crop and weed types or the morphologies of the various plant shapes. The significant improvement in plant identification using the post-trial calibration indicates the calibration system should be improved to ensure the system can be operated more effectively in-field.

**Author Contributions:** Conceptualization, investigation, and methodology K.Y.H., L.E., S.B. and M.Z.G.; software, K.Y.H. and S.B.; validation, L.E., S.B. and K.Y.H.; resources, B.G.; writing—original draft preparation, L.E. and K.Y.H.; writing—review and editing, L.E., B.G. and W.P.H.; visualisation, L.E., K.Y.H. and S.B.; supervision, B.G. and W.P.H.; project administration, K.Y.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** Technical support and materials for experiments were provided by Syngenta and G's fresh. The work was funded by the University of Manchester.

**Conflicts of Interest:** No conflict of interest to declare.

#### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **An Autonomous Fruit and Vegetable Harvester with a Low-Cost Gripper Using a 3D Sensor**

#### **Tan Zhang, Zhenhai Huang, Weijie You, Jiatao Lin, Xiaolong Tang and Hui Huang \***

Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Shenzhen 510000, China; tan.zhang@utoronto.ca (T.Z.); zhenhaihuang0@gmail.com (Z.H.); youweijiee@gmail.com (W.Y.); zeup300@gmail.com (J.L.); tangtxl80@gmail.com (X.T.) **\*** Correspondence: huihuang@szu.edu.cn

Received: 13 October 2019; Accepted: 19 December 2019; Published: 22 December 2019

**Abstract:** Reliable and robust systems to detect and harvest fruits and vegetables in unstructured environments are crucial for harvesting robots. In this paper, we propose an autonomous system that harvests most types of crops with peduncles. A geometric approach is first applied to obtain the cutting points of the peduncle based on the fruit bounding box, for which we have adapted the model of the state-of-the-art object detector named Mask Region-based Convolutional Neural Network (Mask R-CNN). We designed a novel gripper that simultaneously clamps and cuts the peduncles of crops without contacting the flesh. We have conducted experiments with a robotic manipulator to evaluate the effectiveness of the proposed harvesting system in being able to efficiently harvest most crops in real laboratory environments.

**Keywords:** harvesting robot; gripper; segmentation; cutting point detection

#### **1. Introduction**

Manual harvesting of fruits and vegetables is a laborious, slow, and time-consuming task in food production [1]. Automatic harvesting has many benefits over manual harvesting, e.g., managing the crops in a short period of time, reduced labor involvement, higher quality, and better control over environmental effects. These potential benefits have inspired the wide use of agricultural robots to harvest horticultural crops (the term crops is used indistinctly for fruits and vegetables throughout the paper, unless otherwise indicated) over the past two decades [2]. An autonomous harvesting robot usually has three subsystems: a vision system for detecting crops, an arm for motion delivery, and an end-effector for detaching the crop from its plant without damaging the crop.

However, recent surveys of horticultural crop robots [3,4] have shown that the performance of automated harvesting has not improved significantly despite the tremendous advances in sensor and machine intelligence. First, cutting at the peduncle leads to higher success rates for crop detachment, and detachment at the peduncle reduces the risk of damaging the flesh or other stems of the plant and maximizes the storage life. However, peduncle detection is still a challenging step due to varying lighting conditions and occlusion of leaves or other crops [5], as well as similar colors of peduncle and leaves [6,7]. Second, existing manipulation tools that realize both grasping and cutting functions usually require a method of additional detachment [8,9], which increases the cost of the entire system.

In this paper, we present a well-generalized harvesting system that is invariant and robust to different crops in an unstructured environment, aimed at addressing these challenges. The major contributions of this paper are the following: (i) development of a low-cost gripper (Supplementary Materials) that simultaneously clamps and cuts peduncles, in which the clamper and cutter share the same power-source drive, thus maximizing the storage life of crops and reducing the complexity and cost; (ii) development of a geometric method to detect the cutting point of the crop peduncle based on the Mask Region-based Convolutional Neural Network (Mask R-CNN) algorithm [10].

The rest of this paper is organized as follows: In Section 2, we introduce related work. In Section 3, we describe the design of the gripper and provide a mechanical analysis. In Section 4, we present the approach of cutting-point detection. In Section 5, we describe an experimental robotic system to demonstrate the harvesting process for artificial and real fruits and vegetables. Conclusions and future work are given in Section 6.

#### **2. Related Work**

During the last few decades, many systems have been developed for the autonomous harvesting of soft crops, ranging from cucumber [11] and tomato-harvesting robots [12] to sweet-pepper [2,13] and strawberry-picking apparatus [14,15]. The work in [16] demonstrated a sweet-pepper-harvesting robot that achieved a success rate of 6% in unstructured environments. This work highlighted the difficulty and complexity of the harvesting problem. The key research challenges include manipulation tools along with the harvesting process, and perception of target fruits and vegetables with the cutting points.

#### *2.1. Harvesting Tools*

To date, researchers have developed several types of end-effectors for autonomous harvesting, such as suction devices [17], swallow devices [18], and scissor-like cutters [15]. The key component of an autonomous harvester is the end-effector that grasps and cuts the crop. A common approach for harvesting robots is to use a suction gripping mechanism [16,19,20] that picks up fruit by pushing the gripper onto the crop and generating an airflow that draws the crop into the harvesting end and through the tubular body to the discharge end. These types of grippers do not touch the crops during the cutting process, but physical contact occurs when the crops are drawn in and slide into the container, potentially causing damage, especially to fragile crops. Additionally, the use of additional pump equipment increases the complexity and weight of the robotic system, making it difficult to achieve the compactness and high power density required in autonomous mobile robots [21,22]. Some soft plants, such as cucumbers and sweet peppers, must be cut from the plant and thus require an additional detachment mechanism and corresponding actuation system. The scissor-like grippers [16,23] that grasp the peduncles or crops and cut the peduncles can be used for such soft plants. However, these grippers are usually designed at the expense of size. A custom harvesting tool in [16] can simultaneously envelop a sweet pepper and cut the peduncle with a hinged jaw mechanism; however, size constraints restrict its use for crops with irregular shapes [24].

To overcome these disadvantages, we have designed a novel gripper that simultaneously clamps and cuts the peduncles of crops. In addition, the clamping and detachment mechanisms share the same power-source drive rather than using separate drives. The benefits of sharing the same power source for several independent functions are detailed in our previous work [25,26]. Since the gripper clamps the peduncle, it can be used for most fruits that are harvested by cutting the peduncle. Since the gripper does not touch the flesh of the crop, it can maximize the storage life and market value of the crops. As the clamping and cutting occur simultaneously at the peduncle, the design of the gripper is relatively simple and therefore cost-effective.

#### *2.2. Perception*

Detection in automated harvesting determines the location of each crop and selects appropriate cutting points [1,27,28] using RGB cameras [29,30] or 3D sensors [13,15]. The 3D localization allows a cutting point and a grasp to be determined. Currently, some work has been done to detect the peduncle based on color information using RGB cameras [6,7,29,30]; however, such cameras are not capable of discriminating between peduncles, leaves, and crops when they are of similar color [7]. The work in [31] proposed a dynamic thresholding method to detect apples in variable lighting conditions, and the authors further used a dynamic adaptive thresholding algorithm [32] for fruit detection using a small set of training images. The work in [27] addressed the detection of peduncles of sweet peppers using multi-spectral imagery; however, the accuracy is too low to be of practical use. Recent work in deep neural networks has led to the development of state-of-the-art object detectors, such as the Faster Region-based CNN (Faster R-CNN) [33] and the Mask Region-based CNN (Mask R-CNN) [10]. The work in [34] uses a Faster R-CNN model to detect different fruits, and it achieves better performance with many fruit images; however, the detection of peduncles and cutting points is not addressed in that work. To address these shortcomings, we use a Mask R-CNN model and a geometric feature to detect the crops and the cutting points on the peduncles.

#### **3. System Overview and the Gripper Mechanism**

#### *3.1. The Harvesting Robot*

The proposed harvesting robot has two main modules: a detection module for detecting an appropriate cutting point on the peduncle, and a picking module for manipulating crops (see Figure 1). Specifically, the detection module determines the appropriate cutting points on the peduncle of each crop. The picking module controls the robot to reach the cutting point, clamps the peduncle, and cuts the peduncle. The clamping and cutting actions are realized by the newly designed gripper.

**Figure 1.** Architecture of the proposed harvesting robot.

#### *3.2. The Gripper Mechanism*

The proposed gripper is illustrated in Figure 2 from a 3D point of view. Inspired by skilled workers who use fingers to gently hold the peduncles and use their fingernails to cut the peduncles, this gripper is designed with both clamping and cutting functions with three key parts only: a top plate with a cutting blade, a middle plate to press the peduncle, and a bottom plate to hold the peduncle. Two torsion springs and an axle are included.

**Figure 2.** The 3D view of the proposed gripper [35,36].

The clamping and cutting processes work as follows:

1. Clamping the peduncle: when the top plate is pressed, the middle plate is driven onto the bottom plate through "*torsion spring 2*", and thus the peduncle is clamped, as shown in Figure 3b.

2. Cutting the peduncle: continuing to press the top plate, the cutting blade contacts the peduncle on the bottom plate, thereby cutting the peduncle, as shown in Figure 3c.

After releasing the top plate, the top and middle plates are automatically restored to the rest position (Figure 3a) by "*torsion springs 1 and 2*", respectively.

**Figure 3.** Cutting process: (**a**) rest position; (**b**) clamping state; (**c**) cutting state.

#### *3.3. Force Analysis of the Gripper*

As shown in Figure 4, when the cutting blade touches the bottom plate, the cutting force on the peduncle is calculated using the following equations:

$$M_{\text{cut}} = M_{\text{push}} - M_{\text{spring\_1}} - M_{\text{spring\_2}} \tag{1}$$

$$M_{\text{spring\_1}} = k_1 \cdot \theta_1 \tag{2}$$

$$M_{\text{spring\_2}} = k_2 \cdot \theta_2 \tag{3}$$

where *Mcut*, *Mpush*, *Mspring\_1*, and *Mspring\_2* are the moments of the cutting force, the pushing force, and torsion springs 1 and 2, respectively. *k*<sub>1</sub> and *k*<sub>2</sub> are the torsion coefficients of "torsion springs 1 and 2", respectively. θ<sub>1</sub> and θ<sub>2</sub> are the angles of deflection from the equilibrium position of "torsion springs 1 and 2", respectively.

"*Torsion spring 1*" is the spring between the top plate and bottom plate, while "*torsion spring 2*" is the spring between the top plate and middle plate; see also Figures 2 and 4d. With the moment *Mcut*, the cutting force *Fcut* can be described as:

$$M\_{\rm cut} = F\_{\rm cut} \cdot L\_{\rm cut} \tag{4}$$

where *Fcut* is the force applied on the peduncle by the cutting blade, and *Lcut* is the distance between the axis of the coil and the line of action of the cutting force (see Figure 4b). *P* is the contact point between the cutting blade and the peduncle.

Thus, *Fcut* is derived from Equations (1)–(4) as follows:

$$F\_{\rm cut} = \frac{M\_{\rm push} - k\_1 \cdot \theta\_1 - k\_2 \cdot \theta\_2}{L\_{\rm cut}} \tag{5}$$

To further investigate the cutting force on the peduncle, we analyzed the stress:

$$F\_{\rm cut} = \tau\_{\rm cut} A \tag{6}$$

$$\tau\_{\rm cut} = \frac{M\_{\rm push} - k\_1 \cdot \theta\_1 - k\_2 \cdot \theta\_2}{A \cdot L\_{\rm cut}} \tag{7}$$

where τ*cut* is the stress and *A* is the surface area of the peduncle; these two variables determine whether the peduncle can be cut off. Equation (7) is derived from Equations (5) and (6).


As shown in Figure 5, when the top plate is pressed and the middle plate touches the bottom plate, "*torsion spring 2*" creates a pressing force on the middle plate and thus generates a force on the bottom plate. This force ensures that the gripper clamps the peduncle before the cutting blade touches the peduncle:

$$M_{\text{clamp}} = M_{\text{spring\_2}} \tag{8}$$

$$M_{clamp} = F_{clamp} \cdot L_{clamp} \tag{9}$$

$$M\_{\text{spring\\_2}} = k\_2 \cdot \theta\_2 \tag{10}$$

**Figure 4.** Force diagram of the cutting blade touching the bottom plate. (**a**) The diagram of the gripper. The force analysis of the gripper (**b**), *Torsion spring* 1 (**c**), and *Torsion spring* 2 (**d**).

**Figure 5.** Force diagram of the middle plate touching the bottom plate.

After the peduncle is cut, the crop will drop from the gripper if the clamping force is smaller than the weight of the crop. To ensure that the crop remains clamped, the minimum clamping force needed to overcome gravity is expressed as:

$$F_{clamp} \geq \frac{m \cdot g}{\mu} \tag{11}$$

where *m* is the mass of the crop, *g* is the gravitational acceleration, and μ is the coefficient of friction between the crop and the gripper plate surface. Based on Equations (8)–(11), we obtain the minimum torsion coefficient of "*torsion spring 2*":

$$k_2 \geq \frac{m \cdot g \cdot L_{clamp}}{\mu \cdot \theta_2} \tag{12}$$
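These relations are straightforward to evaluate numerically. The sketch below implements Equations (5)–(7) and (12); the example values in the final comment are purely illustrative and are not measurements from the gripper.

```python
def cutting_force(M_push, k1, theta1, k2, theta2, L_cut):
    """Cutting force on the peduncle, Equation (5)."""
    return (M_push - k1 * theta1 - k2 * theta2) / L_cut

def cutting_stress(F_cut, A):
    """Stress on the peduncle, from Equation (6): tau = F_cut / A."""
    return F_cut / A

def min_spring2_coefficient(m, mu, L_clamp, theta2, g=9.81):
    """Minimum torsion coefficient of spring 2, Equation (12), so the
    clamping force can hold the crop against gravity."""
    return m * g * L_clamp / (mu * theta2)

# Illustrative check: a 0.15 kg crop, mu = 0.5, L_clamp = 0.04 m and
# theta2 = 1.0 rad give k2 >= ~0.12 N*m/rad.
```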

#### **4. Perception and Control**

#### *4.1. Cutting-Point Detection*

The approach to detecting the cutting points of the peduncles consists of two steps: (i) the pixel area of crops is obtained using deep convolutional neural networks; (ii) the edge images of fruits are extracted, and then a geometric model is established to obtain the cutting points. Here, we obtain the external rectangle size of each pixel area to determine the region containing the peduncle for each crop. The architecture of the cutting-point-detection approach is shown in Figure 6.

Accurate object detection requires detecting not only which objects are in a scene, but also where they are located. Accurate region proposal algorithms thus play significant roles in the object-detection task. Mask R-CNN [10] makes use of edge information to generate region proposals by using the Region Proposal Network (RPN). Mask R-CNN uses color images to perform general object detection in two stages: region proposal, which proposes candidate object bounding boxes, and a region classifier, which predicts the class and box offset together with a binary mask for each region of interest (ROI). To train Mask R-CNN for our task, we perform fine-tuning [37], which requires labeled bounding-box information for each of the classes to be trained. After extracting the bounding box for each fruit, the geometric information of each region (e.g., external rectangle) is extracted by geometric morphological calculation. The cutting region of interest of the targeted crop can thus be determined. An RGB-D camera such as the Kinect provides RGB images along with per-pixel depth information in real time. With the depth sensor, the 2D position of the cutting point is mapped to the depth map, and thus the 3D information of the cutting point (x, y, z) is obtained.
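A minimal sketch of this 2D-to-3D mapping, assuming a standard pinhole model with generic intrinsic parameters (the variable names are ours, not a specific camera SDK), is shown below.

```python
import numpy as np

def cutting_point_3d(depth_map, u, v, fx, fy, cx, cy):
    """Back-project the 2D cutting point (u, v) into camera coordinates
    using the registered depth map and pinhole intrinsics (fx, fy, cx,
    cy). depth_map is assumed to be in metres and aligned with the RGB
    image."""
    z = float(depth_map[v, u])      # depth at the cutting-point pixel
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.array([x, y, z])
```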

**Figure 6.** Illustration of the vision system that detects and localizes the crop and its cutting point.

The peduncle is usually located above the fruit. Thus, we set the ROI of the peduncle above the fruit, as shown in Figure 7.

The directions of the crops can be downward or tilted. We also adapted the minimum bounding rectangle to the Mask R-CNN output, thus obtaining bounding boxes for crops in tilted directions. Other cases, such as the direction of the crop being parallel to the camera or the peduncle being occluded by the flesh, will be discussed in future work.

We then obtain the bounding box of the fruit, i.e., a top point coordinate (*xt, yt*), and a bottom point coordinate (*xb, yb*). The coordinates of the cutting point (*xc, yc*) can be determined as follows:

$$Roi_L = 0.5 \cdot L_{max} \tag{13}$$

$$Roi\_H = 0.5 \cdot H\_{\text{max}} \tag{14}$$

$$L\_{\text{max}} = |\mathbf{x}\_{\text{b}} - \mathbf{x}\_{t}| \tag{15}$$

$$H\_{\text{max}} = |y\_b - y\_t| \tag{16}$$

$$\mathbf{x}\_c = 0.5(\mathbf{x}\_b - \mathbf{x}\_t) \pm 0.5 \text{Roi}\_L \tag{17}$$

$$y\_{\mathcal{C}} = y\_{\mathcal{I}} - 0.5Roi\_{\mathcal{H}} \tag{18}$$
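The sketch below evaluates Equations (13)–(18) for one bounding box. Because the ± term in Equation (17) leaves the horizontal placement ambiguous, the sketch assumes the cutting point sits at the horizontal centre of the box, half an ROI-height above its top edge (image y increasing downwards); this interpretation is ours.

```python
def cutting_point_from_bbox(x_t, y_t, x_b, y_b):
    """Cutting-point ROI above a fruit bounding box, Equations (13)-(18).

    (x_t, y_t), (x_b, y_b): top and bottom corner points of the fruit
    bounding box returned by Mask R-CNN, in pixel coordinates.
    """
    L_max = abs(x_b - x_t)          # Equation (15)
    H_max = abs(y_b - y_t)          # Equation (16)
    roi_l = 0.5 * L_max             # Equation (13)
    roi_h = 0.5 * H_max             # Equation (14)
    x_c = 0.5 * (x_b + x_t)         # horizontal centre of the box
    y_c = y_t - 0.5 * roi_h         # Equation (18): above the fruit
    return (x_c, y_c), (roi_l, roi_h)
```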

**Figure 7.** A schematic diagram of the cutting-point-detection approach.

#### *4.2. System Control*

A common method of motion planning for autonomous crop harvesting is open-loop planning [16,38]. Existing work on manipulation planning usually requires some combination of precise models, including manipulator kinematics, dynamics, interaction forces, and joint positions [39]. The reliance on manipulator models makes transferring solutions to a new robot configuration challenging and time-consuming. This is particularly true when a model is difficult to define, as in custom harvesting robots. In automated harvesting, a manipulator is usually customized to a specific crop; such manipulators are often less accurate than modern industrial manipulators, and it is relatively difficult to obtain accurate models of them. In addition, high-accuracy position sensing is difficult to integrate into such manipulators given the space constraints.

In this paper, we demonstrate a learning-based kinematic control policy that maps target positions directly to joint actions using a convolutional neural network (CNN) [40]. This approach makes no assumptions regarding the robot's configuration or its kinematic and dynamic models, and it has no dependency on absolute position sensing being available on-board. The inverse-kinematics function is learned using CNNs, which also accounts for self-collision avoidance and for errors caused by manufacturing and/or assembly tolerances.

As shown in Figure 8, the network comprises three convolutional layers and three fully connected layers. The input is the spatial position of the end-effector, and the output is the displacement of all of the joints. The fully connected layers serve as a predictor on top of the features extracted from the convolutional layers, assigning a likelihood to the joint motion that the algorithm predicts.

We collected data by sending a random configuration request *q* = (*q*1, *q*2, ... , *qn*) to a robot with *n* joints and observing the resulting end-effector position *x* in 3D space. This results in a kinematic database with a certain number of trials:

$$X = \left( \langle q^{1}, x^{1} \rangle, \langle q^{2}, x^{2} \rangle, \dots, \langle q^{N}, x^{N} \rangle \right) \tag{19}$$

We use the neural network shown in Figure 8 to estimate the inverse kinematics from the collected data. We can then generate a valid configuration *q<sup>n</sup>* for a given target position *x<sup>n</sup>*, where *q<sup>n</sup>* and *x<sup>n</sup>* are the *n*th set of training data. We use this approach for planning to reach a cutting point on the peduncle.
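A sketch of such a network and its training loop is given below, using PyTorch. The layer widths, the treatment of the 3D target position as a one-channel signal fed to the convolutional layers, and the use of a mean-squared-error regression objective are our assumptions; the paper specifies only the three-convolution, three-fully-connected structure of Figure 8.

```python
import torch
import torch.nn as nn

class IKNet(nn.Module):
    """Three convolutional and three fully connected layers mapping an
    end-effector target position to a configuration of n joints."""
    def __init__(self, n_joints=7):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv1d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv1d(32, 32, kernel_size=3, padding=1), nn.ReLU())
        self.fc = nn.Sequential(
            nn.Linear(32 * 3, 128), nn.ReLU(),
            nn.Linear(128, 64), nn.ReLU(),
            nn.Linear(64, n_joints))

    def forward(self, xyz):                 # xyz: (batch, 3)
        h = self.conv(xyz.unsqueeze(1))     # -> (batch, 32, 3)
        return self.fc(h.flatten(1))        # -> (batch, n_joints)

def train_ik(positions, configs, epochs=200, lr=1e-3):
    """Fit the network to the collected (position, configuration) pairs
    of Equation (19). positions: (N, 3) tensor; configs: (N, n) tensor."""
    net = IKNet(n_joints=configs.shape[1])
    optimiser = torch.optim.Adam(net.parameters(), lr=lr)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        optimiser.zero_grad()
        loss = loss_fn(net(positions), configs)
        loss.backward()
        optimiser.step()
    return net
```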

**Figure 8.** Network architecture for the kinematic control of a robotic manipulator.

#### **5. Hardware Implementation**

The following experiments aim at evaluating the general performance of the harvesting system. As shown in Figure 9, the proposed gripper was mounted on the end-effector of the Fetch robot (see Figure 9c). The Fetch robot consists of seven rotational joints, a gripper, a mobile base, and a depth camera. The depth camera was used to detect and localize the crops. The robot is based on the open source robot operating system, ROS [41]. The transformation between the image coordinate frame and the robot coordinate frame is obtained with ROS packages. The three plates and the shaft of the proposed gripper were fabricated in polylactic acid (PLA) with a MakerBot Replicator 2 printing machine. The length, width, and height of the gripper modules are 11, 5, and 9 cm, respectively.

Since the Fetch robot is not suitable for working in the field, the experiments were conducted in a laboratory environment. In order to verify the effectiveness of the proposed system in identifying and harvesting a variety of crops, we used both real and plastic crops, as shown in Figure 9a,b. Two experiments are presented in the following section, aimed at validating the perception system and the harvesting system. First, we present the accuracy of the detection of crops and the cutting points. Second, we present the experiment on the full harvesting platform, demonstrating the harvesting performance of the final integrated system.

**Figure 9.** Setup for the experiments. (**a**) The experiment setup; (**b**) The plastic crops with plastic peduncles hung on a rope; (**c**) The proposed gripper mounted on the end-effector of the Fetch robot.


#### *5.1. Detection Results*

Seven types of crops were tested: lemons, grapes, common figs, bitter melons, eggplants, apples, and oranges. For each type of crop, we used 60 images for training, and noise was introduced into the images. The number of images was inspired by Sa et al. [34]. We collected images from Google and labelled bounding-box information for each crop. The images taken with the Kinect sensor were used for testing only. In this paper, we used 15 testing images for each type of crop. These images were taken from different viewpoints. We had one plant for each type of real fruit.

An example is shown in Figure 10, where the real crops were on the plants (see Figure 9a). For convenient harvesting, all the plastic crops were hung on a rope, as shown in Figure 9b. We made three attempts, and the crops were repositioned every time. Therefore, we used 27 plastic crops (six apples, three lemons, six oranges, three bitter melons, three clusters of grapes, and six eggplants) and 31 real crops (seven grape bunches, 14 lemons, and 10 common figs).


**Figure 10.** Results of the cutting-point-detection for some plastic (**a**) and real crops (**b**).

We utilize the precision–recall curve [42] as the evaluation metric for the detection of cutting points of lemons. The performance of the proposed detection method was compared with that of the baseline method, Conditional Random Field (CRF) [43]. Figure 11 shows that the two methods have similar performance. It can be seen that, for the proposed approach, there is a gradual linear decrease in precision with a large increase in recall.

**Figure 11.** Precision-recall curve for detecting lemons using the CRF-based approach and the proposed approach.

Some of the detection results for real and plastic crops are shown in Figure 10. The results show that the detection success rates for plastic and real crops are 93% (25/27) and 87% (27/31), respectively. The detection success rates for the cutting points of plastic and real crops are 89% (24/27) and 71% (22/31), respectively. These results can also be seen in Figure 12.

**Figure 12.** Harvesting rates for autonomous harvesting experiment.

#### *5.2. Autonomous Harvesting*

Motion planning is performed for autonomous attachment and detachment. The attachment trajectory starts at a fixed offset from the target fruit or vegetable, determined by the bounding box. The trajectory then makes a linear movement towards the cutting point. The whole motion was computed using the control policy that maps target positions to joint actions, introduced in Section 4.2. Self-collisions of the robot were included within the motion planning framework to ensure safe planning. Once the attachment trajectory has been executed and the peduncle is clamped, the end-effector is moved horizontally towards the robot, away from the cutting point.

A successful harvest consists of a successful detection of the cutting point and a successful clamping and cutting of the peduncle. Each crop that was successfully picked was replaced with a new one hung on the rope. We repositioned the crop for repicking if it was not harvested successfully. For real crops, the robot harvested each crop once, including seven grape bunches, 14 lemons, and 10 common figs. Similar to the plastic fruits, reaching the crop's cutting point is counted as a successful attachment, and clamping and cutting of the peduncle is counted as a successful detachment. Results show that most plastic fruits and vegetables were successfully harvested. The result is shown in Figure 12. The detailed success and failure cases can be seen in Table 1. The attachment success, i.e., the success rate of the robot's end-effector reaching the cutting point, is 81% and 61% for plastic and real crops, respectively. The detachment success rates are 67% and 52%, respectively.


**Table 1.** Harvesting results for plastic and real crops.

Figure 13 shows video frames of the robot reaching a cluster of plastic grapes, cutting the peduncle, and placing them in the desired place. Figure 14 shows video frames of the robot harvesting real lemons, grapes, and common figs in the laboratory environment. A video of the robotic harvester demonstrating the final experiment by performing autonomous harvesting of all crops is available at http://youtu.be/TLmMTR3Nbsw.

For cases where the crop's cutting point is occluded by leaves, the robot can remove the leaves to make the cutting point visible. First, the overlap between the crop's cutting area and the leaf is calculated to see whether it exceeds a given threshold. Then, the robot detects the leaf's cutting point and cuts it using the same method as that for cutting crops. Figure 15 shows a modified scenario in which the robot removed leaves that occluded the fruits. We made four attempts and failed twice. The figure shows that the leaf stuck onto the cutting blade and did not detach even when the gripper was opened.

**Figure 13.** Video frames showing the robot reaching a cluster of plastic grapes, cutting the peduncle, and placing them onto a table.

**Figure 14.** Video frames showing robot harvests real lemons (top), grapes (middle), and common figs (bottom).

**Figure 15.** Video frames for the modified scenario in which the robot picked an occluded lemon (bottom) by removing an occluding leaf (top).

#### *5.3. Discussion*

A failure analysis was conducted to understand the major causes of harvesting failures within the proposed system and thus to improve it in future work. The failure modes include: (1) fruits or vegetables not detected, or detected inaccurately, due to occlusions; (2) cutting point not detected due to the irregular shape of the crop; (3) peduncle not attached due to motion planning failure; (4) peduncle only partially cut because the cutting tool became stuck in the peduncle; (5) crop dropped because the gripper did not clamp the crop tightly; (6) detachment failed because the crop or leaves stayed on the blade when the gripper was opened.

Figure 12 shows the success rates for three scenarios: Scenario 1 with plastic crops, Scenario 2 with real crops, and a modified scenario in which the robot picked crops by removing occlusions. The success rates for the detection of the fruits and cutting points, the attachment, and the detachment are included. The detachment success rate reflects the overall harvesting performance.

The results showed that Scenario 1 with plastic crops had a higher harvest success rate. The most frequent failure modes were (2), cutting point not detected, and (5), crop dropped, which together accounted for a 30% harvest failure rate. For instance, the two attachment failures (orange 1 and eggplant 1) occurred in the first nine crops. For orange 1, the peduncle was short and the cutting blade missed it. Eggplant 1 was irregularly shaped, which resulted in a poor cutting-pose estimation. We then adjusted the directions of the orange and eggplant for the following two rounds, and there were no further detection errors. The lower detachment success rate is most likely attributable to the grapes and bitter melons, since they are heavier and tend to fall.

The harvest success rate (55%) was lower for Scenario 2 with real crops. The most frequent failure modes were (4) and (5), which represented a significant portion of the failure cases and showed that the attachment and detachment process is challenging due to the lower stiffness of the cutting mechanism, as well as the more complex environment. For instance, the common figs have short peduncles while the gripper is relatively thick and wide, resulting in detachment failures. The grape peduncle is thicker and difficult to cut.

The results also showed a lower harvest success rate (50%) for the modified scenario. The most frequent failure modes were (2) and (6). For instance, the leaves of the lemon tree did not fall when the gripper was opened, since the leaves were thin and light. The cutting points of the leaves were difficult to detect due to their random poses; thus, the leaves may not be removed completely in one attempt. In future work, we will improve the detection of the occluded crops and the leaves, as well as the robot's grasping pose.

#### **6. Conclusions and Future Work**

In this paper, we developed a harvesting robot with a low-cost gripper and proposed a cutting-point detection method for autonomous harvesting. The performance of the proposed harvesting robot was evaluated using a physical robot in different scenarios. We demonstrated the effectiveness of the harvesting robot on different types of crops. System performance results showed that our system can detect the cutting point and cut the fruit at the appropriate position on the peduncle. Most failures occurred in the attachment stage due to irregular shapes. This problem was avoided by adjusting the directions of the crops. In the future, we plan to devise a new approach to detect the cutting pose for irregularly shaped crops.

There are some detachment issues related to the fabrication of the gripper. For instance, in some cases the crop did not detach when the gripper was opened, while in others the clamped crop fell due to gravity. Thus, we are working to improve the gripper using metal materials and a better cutting blade, so that the gripper is sufficiently stiff to successfully clamp different types of crops, including those with thicker peduncles.

In this paper, we mainly focused on cases where the directions of the crops are downward or tilted. The hypothesis posed in this paper is that the robot is capable of reaching targets or leaves without considering obstacles. A human approaches a fruit by pushing a leaf aside or reaching through the stems without causing any damage to the plant, since damage can seriously hamper plant growth and affect future fruit production. Inspired by this, we demonstrated how a robot removes leaves to reach the target crop. However, the proposed crop-picking process of removing leaves still assumes an idealised situation. In future work, we will improve the detection of obstacles such as unripe crops, leaves, and stems. For cases where the crop is oriented parallel to the camera or the peduncle is occluded by the flesh, the robot needs to rotate the camera to the next best view [44] to make the cutting point of the crop visible. A method to determine the next best view with an optimal grasp [45] based on occlusion information will be considered in future work.

For the case where stems are obstacles that are currently not avoided, we are developing a vision-guided motion planning method using a deep reinforcement learning algorithm. This approach allows the robot to reach the desired fruits and avoid obstacles using current visual information. The aforementioned next-best-view approach will be combined with this motion planning. Furthermore, we aim to explore the estimation of the locations of crops in three dimensions and the prediction of the occlusion order between objects [46].

As our current robot is inappropriate for use outside, we have not tested our harvesting robot in the field. Our next step is to test more types of crops in the field using a UR5 robot and, in addition, consider the autonomous reconstruction of unknown scenes [47].

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1424-8220/20/1/93/s1. Video: Autonomous fruit and vegetable harvester with a low-cost gripper using a 3D sensor.

**Author Contributions:** Conceptualization, T.Z.; methodology, T.Z., Z.H. and W.Y.; software, Z.H. and W.Y.; validation, T.Z., Z.H. and W.Y.; formal analysis, T.Z.; investigation, T.Z.; resources, J.L.; data curation, X.T.; writing-original draft preparation, T.Z.; writing-review and editing, H.H.; visualization, T.Z.; supervision, T.Z., H.H.; project administration, H.H.; funding acquisition, H.H. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was funded in parts by NSFC (61761146002, 61861130365), Guangdong Higher Education Innovation Key Program (2018KZDXM058), Guangdong Science and Technology Program (2015A030312015), Shenzhen Innovation Program (KQJSCX20170727101233642), LHTD (20170003), and the National Engineering Laboratory for Big Data System Computing Technology.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Development of Willow Tree Yield-Mapping Technology**

#### **Maxime Leclerc 1, Viacheslav Adamchuk 1,\*, Jaesung Park <sup>1</sup> and Xavier Lachapelle-T. <sup>2</sup>**


Received: 26 February 2020; Accepted: 3 May 2020; Published: 6 May 2020

**Abstract:** With today's environmental challenges, developing sustainable energy sources is crucial. From this perspective, woody biomass has been, and continues to be, a significant research interest. The goal of this research was to develop new technology for mapping willow tree yield grown in a short-rotation forestry (SRF) system. The system gathered the physical characteristics of willow trees on-the-go, while the trees were being harvested. Features assessed include the number of trees harvested and their diameter. To complete this task, a machine-vision system featuring an RGB-D stereovision camera was built. The system tagged these data with the corresponding geographical coordinates using a Global Navigation Satellite System (GNSS) receiver. The proposed yield-mapping system showed promising detection results considering the complex background and variable light conditions encountered in the outdoors. Of the 40 randomly selected and manually observed trees in a row, 36 were successfully detected, yielding a 90% detection rate. The correctly detected tree rate of all trees within the scenes was actually 71.8% since the system tended to be sensitive to branches, thus, falsely detecting them as trees. Manual validation of the diameter estimation function showed a poor coefficient of determination and a root mean square error (RMSE) of 10.7 mm.

**Keywords:** precision agriculture; yield estimation; machine vision; willow tree

#### **1. Introduction**

Biomass has garnered much interest as an alternative sustainable energy source through combustion and transformation. Biomass comprises natural materials from living, or recently dead, plants, trees, or animals, which are recycled as fuel in industrial production. More specifically, forest biomass consists of all parts of the tree, such as the trunk, bark, branches, needles, leaves, and roots [1]. At the basic level, photosynthesis allows trees to convert light energy, carbon dioxide, and water into biomass and oxygen. Since trees fix carbon while growing, the production of woody biomass demonstrates a neutral carbon balance when cultivated in low-quality areas such as marginal and abandoned fields. To maximize carbon fixation, research has focused on tree species that produce large amounts of biomass rapidly. Suitable species for biomass production include Willow (*Salix* spp.), Poplar (*Populus* spp.) and Eucalyptus (*Eucalyptus* spp.). Forest biomass can be processed into biofuel in solid, liquid, or gaseous form, which can then be burnt to generate useful energy, usually in the form of electrical energy [2].

Just as biomass production can positively impact sustainability in the energy industry, Precision Agriculture (PA) has had a positive impact on sustainability in the agricultural industry. In the case of crop production, this technology aims at implementing site-specific management (SSM) solutions for agricultural inputs [3]. By doing so, farmers can optimize their operations, reducing the effect of these inputs on the environment. PA technologies rely heavily on the Global Navigation Satellite System (GNSS), which can be used to locate agricultural vehicles in real-time in a field environment. One of the earliest and most common applications of GNSS-based systems is yield monitoring and mapping. Yield monitoring systems are designed to estimate crop yield and/or yield quality in real-time during harvesting [4]. The system is then able to link yield estimates to geographical coordinates to produce a yield map. The map is updated in real-time on a display that can be viewed by the operator. After harvesting, the map can be retrieved for further analysis to assess field performance, the impact of management practices, the location of low- and high-yielding zones, and so on.

Although yield maps are essential tools for today's agriculture, not all crops can be monitored for yield. Currently on the market, yield-monitoring systems are available for grain (e.g., corn, soybean, wheat), cotton, and sugarcane. Consequently, farmers growing other field crops (e.g., specialty crops, vegetables, fruits) must still rely on older techniques to determine crop yield, whose spatial resolution is far inferior to that of yield-monitoring systems.

Methods have been proposed to estimate the yield of specialty crops based on optical feature detection [5–8]. Image data acquired using vision sensors, such as conventional cameras, can be analyzed in real-time or post-harvest by computer-vision (CV) algorithms to generate yield maps. The main advantage of optical feature detection is its flexibility, since multiple types of features can be extracted from an image, such as color, texture, and shape. Yield-monitoring systems can thus be built to extract distinctive features of a crop and use the output to produce a yield map.

A machine vision (MV)-based yield-monitoring system was developed for vegetable crops [6]. Vegetable segmentation was performed through the watershed algorithm and color data were used for classification. The system was designed to detect crop flow at the individual level, classify the vegetables according to shape and size, and count them. Testing was done with shallot onions, but the algorithm was constructed in a way to make it transferable to charlotte onions, Chinese radish, carrots, and lettuce. Results showed a coefficient of determination (R<sup>2</sup>) of 0.46 for the overall accuracy of the system. A single RGB camera was used, and occlusion as well as variable light conditions were the limiting factors for this research.

An image segmentation framework for fruit detection and yield estimation in apple orchards was investigated [7]. A spherical video camera capable of capturing a full 360° panoramic view was mounted on a remotely controlled testing platform. The framework included contextual information from relevant metadata to evaluate state-of-the-art convolutional neural networks (CNN) and multiscale multilayer perceptrons for pixel-wise fruit segmentation. Once the binary image was obtained, watershed segmentation followed by a circle Hough transform algorithm was applied to detect and count the apples. Results showed that the combination of CNN pixel-wise classification and watershed segmentation produced the highest coefficient of determination (R<sup>2</sup> = 0.83) when compared to post-harvest fruit counts obtained from grading and counting machines.

An automated apple yield monitor using a stereo rig was demonstrated [8]. The system was mounted on an autonomous orchard vehicle fitted with a controlled artificial light source for nighttime data acquisition. Images from both sides of each tree row were captured. The algorithm started with hue and saturation segmentation, followed by the detection of local maxima in the grayscale intensity profile (i.e., specular reflection) as the feature indicating apples. Once detected, the intensity profile over four lines (i.e., going through the local maxima) of 21 pixels each was checked for roundness to remove false positives. Apples were then registered and counted as the crop yield estimate. Errors of −3.2% and 1.2% in yield were reported for red and green apples, respectively.

A method for robust detection of trees using color images in a forest environment was published [9]. The proposed approach was characterized by a seed point generation algorithm followed by a multiple segmentation algorithm. Output segments were refined by morphological operations before being evaluated by a quality function based on the segment's specific properties (i.e., area of convex hull, area, orientation, eccentricity, solidity, and height). For each seed point, the segment with the best quality score was selected, and all selected segments were fused in one frame which represented the final output of the tree detection algorithm. For that research, 30 images containing 197 trees were collected in a forest environment. The tree detection rate was reported as 86.8% with a corresponding precision of 81.4%.

The goal of this research was to develop a new yield mapping system for willow trees based on MV technology. The system was designed and built in partnership with Ramea Phytotechnologies, Inc. (Ramea), based in Saint-Roch-de-l'Achigan, Quebec. The system was designed to gather the physical characteristics of willow trees and tag them to their geographical coordinates on-the-go, while the trees are being harvested. Features assessed include the number of trees harvested and their diameter. Furthermore, the system has potential as a fast and non-destructive yield estimation tool which could be used to assess carbon dioxide (CO2) sequestration of willow trees grown in a short-rotation forestry (SRF) system over time.

Since methods for growing willows in SRF vary greatly depending on countries and regions [2], the developed willow tree yield mapping technology must be flexible enough to make this research relevant to industry partners and to producers around the world. Optical sensors have the capacity to sense more than one feature, such as color, intensity, geometry, and texture [10]. Considering that the number of stems and morphological characteristics of the willow trees were assessed, and not bulk mass, the use of an optical, multi-sensing platform was desirable. Moreover, for systems to be adopted in the agricultural industry, they must be affordable, rugged, and easy to install and manage. Thus, willow tree yield mapping based on MV was the preferred and selected method to accomplish the goals of this project.

#### **2. System Components**

In terms of fruit detection and localization, several studies have been carried out to assess the performance of different vision sensor technologies. In doing so, researchers reported on the main issues involved in using camera sensors in an outdoor environment. Variations in light conditions, as well as variations of color, shape, and size of the target, are common problems that researchers face [10]. Moreover, complex plant canopy structures can create occlusion of targets, which makes detection even more challenging [11]. For this research, cameras using monochrome imagers were preferred to diminish the effect of variations in light conditions that are prominent in outdoor scenes.

To improve target segmentation in the complex background environment, a camera featuring 3D vision was also favored to better extract the willow's features from the scenes. By using two imagers, it is possible to extract depth information from a pair of images. This technology is called stereovision. It aims to mimic the mammalian binocular vision system by collecting a stereo pair of images of a scene from two cameras with a known geometrical relationship. Using a matching algorithm, the system matches key-points of one stereo-pair image to the other. It then estimates the disparity between matching key-points and uses this to compute depth according to the system's intrinsic and extrinsic parameters [12].
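As an illustration of this disparity-to-depth step, the short sketch below converts a disparity map into a depth map from the focal length and baseline of a calibrated stereo pair. It is illustrative only: on the RealSense D435 this computation is performed by the embedded Vision Processor D4, and the function name and parameters are not part of the original system.

```python
import numpy as np

def disparity_to_depth(disparity_px, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) to a depth map (meters).

    Illustrative sketch of the stereovision principle; the RealSense D435
    performs this on-board, so this function is not part of the original system.
    """
    depth = np.zeros_like(disparity_px, dtype=np.float32)
    valid = disparity_px > 0               # zero disparity means no stereo match
    depth[valid] = focal_length_px * baseline_m / disparity_px[valid]
    return depth
```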

#### *2.1. Sensors*

To acquire raw data, the RealSense D435 RGB-D camera from Intel (Intel Corporation, Santa Clara, California) and a Global Positioning System (GPS) receiver with Wide Area Augmentation System (WAAS) differential correction capability model 19x-HVS from Garmin (Garmin Ltd., Olathe, Kansas) were used. The RealSense D435 (Table 1) features infrared active stereovision technology to create depth maps using an embedded infrared projector. Its large Field Of View (FOV) makes it ideal for an outdoor setting with large targets at close range. The camera is fitted with one red green blue (RGB) imager and one stereo depth module containing two monochrome imagers. Images from the stereo depth module are processed by an embedded processor (Vision Processor D4) to generate depth maps. The RealSense can output four data streams at once in real-time: RGB stream, two monochrome imager streams, and the depth map stream. The on-board Vision Processor D4 makes this camera suitable for an embedded solution since it does not require the host computer to perform the heavy computation required by the stereo-matching algorithm.


**Table 1.** RealSense D435 camera specifications.

\* Varies depending on calibration, scene, and lighting condition.

For this research, a resolution of 1280 × 720 was used to obtain the best pixel-to-millimeter conversion factor possible. The infrared projector was turned off, since it is designed to improve performance in low texture scenes such as a dimly lit white wall [13,14].

#### *2.2. Sensors Position and Mount*

Early in the research, the front end of the tractor was considered to be the best location for the camera. The region of interest (ROI) for the willow trees was set between 3.05 m and 3.66 m measured from the soil surface. To achieve this with the RealSense FOV, the camera was mounted at 1.78 m from the ground surface and given a 28.5° tilt angle measured from the ground plane. The horizontal axis of the camera FOV was parallel to the row and the perpendicular distance between the camera and the row was 1.22 m.

The camera and GNSS mount (Figure 1) were designed as a multi-purpose assembly. In addition to providing mounting points for both sensors, it served as a guard against branches and sunlight. Branches can be expected to be leaning outside of the row if they have been hit while harvesting the adjacent row. Contact between a branch and the stereovision camera could result in loss of frame quality and physical damage to hardware (i.e., camera and data transfer cable). As the camera is tilted, direct sunlight can overexpose the optical sensors resulting in a significant decline of the frame's information. The mount was designed to protect the camera when the sun is shining directly above. The sensor mount is also designed to allow easy fine-tuning of the camera position and angle within the field. Extra mounting points were also integrated to secure the assembly to the tractor in the event that future vibration became excessive.

**Figure 1.** (**a**) Overall view of final camera position; (**b**) close view of final camera and Global Positioning System (GPS) mount.

#### **3. Willow Tree Algorithm**

The multiple steps that had to be taken in order to count the number of willow trees and to estimate their diameter for each frame are described below and summarized in Figure 2. From this point, when the term "monochrome imager" is used, it refers to the leftmost imager of the RealSense D435 sensor (looking at the front view). Since the depth map is computed from this imager's perspective of a scene, the two are naturally aligned with one another and additional computation is avoided. The frame produced by the monochrome imager will be referred to as the "grayscale image".

**Figure 2.** Willow tree algorithm flowchart.

Python was chosen as the programming language, and the algorithm was heavily dependent on the NumPy and OpenCV libraries. As the RealSense camera is marketed to developers, a low-level software development kit (SDK) for the camera is freely provided by Intel through GitHub (GitHub, Inc., San Francisco, CA, USA).

The Willow Tree algorithm runs using multiple parameters input to open-source and user-defined functions. It is relevant to mention that no formal procedures took place to optimize those values. For steps where parameters were involved, system performance and processing time were important criteria, with special attention paid to overfitting since the system was not validated on an extensive data set.

As this system is the first of its kind to the knowledge of the researchers (i.e., an MV-based yield monitoring system for willow trees grown in SRF), it was of interest to see if classical CV operations and algorithms, such as masks, histograms, adaptive histogram equalization, and contour properties, could achieve tree detection and diameter estimation with good results. More complex methods such as machine learning and deep learning were not examined for the development of this algorithm since potential problems with classical methods were not known for this application [15]. Furthermore, for the system to be adopted, the algorithm must be able to run on an embedded system with finite computing power that is affordable for farmers and designed for a harsh environment.

#### *3.1. Segmentation*

To perform the segmentation, several steps were carried out using the depth map and the grayscale image to yield a segmented binary frame. The first was the computation of a depth mask, followed by the application of adaptive histogram equalization on the grayscale image. Then, both outputs were used as inputs to a dynamic histogram thresholding function, where seed points were determined and input to the flood-fill connected-components algorithm. The flood-fill algorithm output was the product of the segmentation procedure.

#### 3.1.1. Depth Mask

The computation of a binary mask using the depth map was a necessary prior step to the dynamic histogram thresholding. The goal was to create a rough mask of the row being harvested to be input to the histogram calculation. This ensured that most of the background pixels were not considered in the histogram. A thresholding plane was first initialized to set the depth value of every pixel to zero if it was greater than the threshold plane at that location. Since the camera was tilted at 28.5° from the soil surface, the threshold plane was tilted by the same amount to account for a greater thresholding distance for the top pixel row compared to the bottom pixel row of the depth map. The top was computed as 2.33 m and the bottom was set as a constant at 1.23 m, which corresponds to the center of the row relative to the camera position. Using the vertical FOV of the monochrome imager, the height of a pixel in meters was computed based on the perpendicular distance from the camera to the threshold plane. Next, simple trigonometry was applied to find the length in meters of the hypotenuse formed between the camera center (fixed), the bottom pixel row (fixed), and the currently accessed row. This method was repeated iteratively until the top row was reached. This approach was found to be more computationally efficient than rotating the depth map itself.

Before inputting the depth map in the thresholding function, a median filter was applied to remove some of the noise with a 5 × 5 pixel square kernel. This type of filter works by replacing the kernel's center value by the median of all pixels under itself [16]. It was effective at removing the within region noise while keeping the tree stems sharper than a standard Gaussian filter which was important to promote accurate diameter estimation. The equation below shows the final thresholding operation.

$$\text{Depth mask}\_{i,j} = \begin{cases} 1 & \text{if Depth median}\_{i,j} < \text{Threshold plane}\_{i,j} \\ 0 & \text{otherwise} \end{cases} \tag{1}$$
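A minimal sketch of Equation (1) is given below, assuming the depth map has already been converted to meters. For brevity, the tilted threshold plane is approximated by a linear interpolation between the bottom (1.23 m) and top (2.33 m) distances rather than the row-by-row trigonometric construction described above; the function and parameter names are illustrative.

```python
import cv2
import numpy as np

def depth_mask(depth_map_m, bottom_dist=1.23, top_dist=2.33):
    """Binary depth mask of Equation (1); a sketch, not the exact implementation."""
    rows, cols = depth_map_m.shape
    # 5 x 5 median filter to suppress within-region noise while keeping stem edges sharp.
    depth_median = cv2.medianBlur(depth_map_m.astype(np.float32), 5)
    # Tilted threshold plane approximated by one distance per pixel row.
    row_thresholds = np.linspace(top_dist, bottom_dist, rows).reshape(-1, 1)
    threshold_plane = np.repeat(row_thresholds, cols, axis=1)
    return (depth_median < threshold_plane).astype(np.uint8)
```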

#### 3.1.2. Adaptive Histogram Equalization

Adaptive histogram equalization was used to increase the contrast in the grayscale image from the monochrome imager. The goal was to increase the intensity of the tree stem pixels relative to the background and non-tree foreground objects. The OpenCV implementation of the contrast-limited adaptive histogram equalization (CLAHE) algorithm was used. This algorithm was first introduced by Pizer, et al. [17] for post-processing of still medical images. It starts by dividing the input image into regions. There are three classes of regions: the corner regions (one in each corner), the border regions (adjacent to the image border), and the inner regions. Then, the CLAHE algorithm calculates the histogram of each individual region independently. From the function input parameters, a clip limit is calculated and applied to the histograms. Those histograms are then redistributed so as not to exceed the clip limit. Lastly, the cumulative distribution functions of the resultant contrast-limited histograms are determined, and a linear combination of the four nearest regions is taken for grayscale mapping. The major drawback of this first implementation was its expensive computation. However, Reza [18] proposed another approach that made the CLAHE algorithm possible for real-time applications. CLAHE, with a clip limit of 40 and 8 × 8 pixel subregions, was applied as a pre-processing step before the frame was input to the dynamic histogram thresholding function. The resulting frame is shown in Figure 3. CLAHE was used over non-adaptive histogram equalization methods as it yielded a frame with greater contrast between tree objects and the background, thus making trees easier to segment using the dynamic histogram thresholding method explained in the next section.

**Figure 3.** (**a**) Original grayscale image from the infrared imager; (**b**) grayscale image after applying the contrast-limited adaptive histogram equalization (CLAHE) algorithm.
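The CLAHE pre-processing step can be reproduced in a few lines with OpenCV, as sketched below. The clip limit of 40 is taken from the text, and the 8 × 8 subregions are expressed here as OpenCV's 8 × 8 tile grid; the file name is a placeholder.

```python
import cv2

# Sketch of the CLAHE pre-processing step (clip limit 40, 8 x 8 tiling).
gray = cv2.imread("monochrome_frame.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
clahe = cv2.createCLAHE(clipLimit=40.0, tileGridSize=(8, 8))
equalized = clahe.apply(gray)
```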

#### 3.1.3. Dynamic Histogram Thresholding

A user-defined dynamic histogram thresholding function was implemented to find a list of pixel intensities that were most likely representing a willow tree in the CLAHE equalized frame. While developing the Willow Tree algorithm, it was noticed that when applying the depth mask to a histogram function of the corresponding grayscale image, the histogram region representing the tree pixel count spiked significantly compared to other regions.

The function was designed to take five arguments (i.e., image, mask, peak range, low limit, count difference) and output a Python list of the intensity values (0 to 255 scale) that best represent the tree objects. The first step was to compute the histogram of the complete input image with one bin for every pixel intensity value and then to normalize the obtained histogram object from 0 to 255. Those two operations use functions already available in the OpenCV library. Next, the location of the maximum value was found, and the peak range parameter was applied where the central element was the maximum value of the normalized histogram object (Equation (2)). Figure 4 graphically represents the location of the parameters for the histogram of Figure 3b.

$$\text{Intensity range boundaries} = \text{maximum pixel count} \pm \frac{\text{peak range}}{2} \tag{2}$$

**Figure 4.** Histogram curve with thresholding parameters and boundaries displayed.

In this case, the function was run with a peak range value of 50. Low and high clipping values, equal to the low limit parameter and the total number of bins, were applied to restrict the intensity range to one area of the histogram as well as to handle run-time errors in Python. The following step was to check whether the normalized pixel count values found in the intensity range respected the pixel count difference relative to the maximum normalized pixel count found earlier. To perform this test, the difference in Equation (3) was computed. If the difference was greater than zero, the intensity value was added to the final list of pixel intensity values which best represent willow trees in the current frame.

$$\text{Difference}\_{i} = \text{intensity range}\_{i} - (\text{maximum pixel count} - \text{count difference}) \tag{3}$$

$$\text{Pixel intensity list} = \{\ \text{intensity range}\_{i} \mid \text{difference}\_{i} > 0\ \} \tag{4}$$
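A compact sketch of the dynamic histogram thresholding function (Equations (2)–(4)) is shown below. The peak range of 50 is taken from the text, while the low limit and count difference defaults are illustrative placeholders, since their values are not all reported.

```python
import cv2
import numpy as np

def dynamic_histogram_threshold(image, mask, peak_range=50, low_limit=0,
                                count_difference=30):
    """Return the intensities most likely to belong to tree pixels (Eqs. (2)-(4)).

    Sketch only; low_limit and count_difference are placeholder values.
    """
    hist = cv2.calcHist([image], [0], mask, [256], [0, 256])
    hist = cv2.normalize(hist, None, 0, 255, cv2.NORM_MINMAX).flatten()
    peak = int(np.argmax(hist))                      # intensity at the histogram peak
    lo = max(low_limit, peak - peak_range // 2)      # Equation (2), clipped to valid bins
    hi = min(255, peak + peak_range // 2)
    max_count = hist[peak]
    # Equations (3) and (4): keep intensities whose count is close enough to the peak.
    return [i for i in range(lo, hi + 1)
            if hist[i] - (max_count - count_difference) > 0]
```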

#### 3.1.4. Seed Point Determination and Flood-Fill Algorithm

In this final step of the segmentation procedure, the intensity values found in the previous step are used to find seed point coordinates to be used in the flood-fill algorithm. The output is a binary image where the white pixels represent the segmented areas that are most likely to be willow trees using depth and intensity information from the scene. The first step was to create a frame, called background removed, from which the seed points could be found. The following equation describes this operation.

$$\text{Background removed}\_{i,j} = \begin{cases} \text{input}\,\,\text{image}\_{i,j} & \text{if } \text{Depth}\,\text{mask}\_{i,j} \neq \; 0\\ 0 & \text{otherwise} \end{cases} \tag{5}$$

In this case, the input image is the same as in the previous step, namely the CLAHE equalized grayscale image. Next, for every intensity value in the pixel intensity list, the background removed image was scanned, and the seed point image array was given a value of 1 at locations where a match was found.

$$\text{Seed points}\_{\mathbf{i},\mathbf{j}} = \begin{cases} 1 & \text{if Background removed}\_{\mathbf{i},\mathbf{j}} = \text{Pixel intensity list}\_{\mathbf{k}} \\ 0 & \text{otherwise} \end{cases} \tag{6}$$

Using the argwhere() function in the NumPy library, a list of all (x,y) coordinates of the marked pixels in the seed point array was created. Finally, the coordinates were used iteratively in the OpenCV implementation of the flood-fill algorithm. Two other image arrays were given to the function: the CLAHE equalized infrared image and an empty mask. The grayscale image was used to compute the connected components of each seed point, and the corresponding pixels in the mask were set to 1. For two pixels to be connected, they must agree with the following statement:

$$\text{Mask}\_{i,j} = \begin{cases} 1 & \text{if } input\_{x,y} - lowdiff \le input\_{i,j} \le input\_{x,y} + updiff\\ 0 & \text{otherwise} \end{cases} \tag{7}$$

where *input<sub>i,j</sub>* is the pixel intensity of the currently observed pixel and *input<sub>x,y</sub>* is the pixel intensity of the input image at the seed point coordinate. The parameters lowdiff and updiff are the maximal lower and higher brightness differences, respectively, between the currently observed pixel and its corresponding seed point. In this case, it is a fixed-range implementation since the condition always refers to the original seed points [16]. For this research, values of 10 and 30 for lowdiff and updiff, respectively, were used.

More points were added to the mask as the function scanned the list of seed point coordinates. The running time was limited by adding an if statement to check whether a value of 1 had already been assigned in the mask at the coordinate of the next seed point to be analyzed. This condition reduced the number of times flood-fill had to be run. For example, Figure 5a contains 11,300 points, but only 748 were used to produce the segmentation result in Figure 5b. After the iteration of all seed points, the mask array was reported as the final segmented binary image of the current scene.

**Figure 5.** (**a**) Resultant seed point frame from the background removed image; (**b**) segmentation output binary frame.
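Putting Equations (5)–(7) together, a minimal sketch of the seed point determination and flood-fill step is given below. The lowdiff and updiff values of 10 and 30 and the fixed-range behaviour are taken from the text; the function name, and the shortcut of skipping seeds already reached by a previous fill, are written as described above.

```python
import cv2
import numpy as np

def segment_trees(clahe_img, depth_mask, intensity_list, lowdiff=10, updiff=30):
    """Seed point determination and flood-fill segmentation (Eqs. (5)-(7)); a sketch."""
    # Equation (5): keep grayscale values only where the depth mask is non-zero.
    background_removed = np.where(depth_mask != 0, clahe_img, 0).astype(np.uint8)
    # Equation (6): seed points are pixels whose intensity is in the retained list.
    seeds = np.argwhere(np.isin(background_removed, intensity_list))
    h, w = clahe_img.shape
    mask = np.zeros((h + 2, w + 2), np.uint8)   # OpenCV flood-fill mask needs a 1-px border
    flags = cv2.FLOODFILL_MASK_ONLY | cv2.FLOODFILL_FIXED_RANGE | (255 << 8)
    for y, x in seeds:
        if mask[y + 1, x + 1]:                  # skip seeds already reached by a previous fill
            continue
        cv2.floodFill(clahe_img, mask, (int(x), int(y)), 255,
                      loDiff=lowdiff, upDiff=updiff, flags=flags)
    return mask[1:-1, 1:-1]                     # final segmented binary image
```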

#### *3.2. Tree Detection*

In the detection procedure of the willow tree algorithm, a morphological closing operation was applied to the segmented binary image before using the OpenCV contour function to find all closed contours that could represent willow trees. A user-defined function was designed to filter out unwanted contours and retain only contours assumed to represent trees. Contours that were still present after the filtering operation were considered by the algorithm as true detected willow trees.

The morphological closing operation was undertaken to reduce noise that might have emerged from the flood-fill algorithm. It is commonly used after connected-components algorithms to remove elements resulting purely from noise and to connect nearby regions [16]. The closing operation is simply two morphological operations applied one after the other. The first is dilation, where the pixel at the center of the kernel is replaced by the local maximum of all pixels covered by the kernel. For a binary image, the dilation operation has the effect of "growing" each filled region. The second operation is erosion. It is simply the inverse of dilation, where the pixel at the center of the kernel is replaced by the local minimum of all pixels covered by the kernel. The effect of erosion on a binary image is to "reduce" the size of each region [16]. For this research, the kernel used was a solid 2 × 15 kernel, applied with only one iteration. The kernel was set as an upright rectangle since that was the expected shape of the target objects. The morphological close was applied using the OpenCV implementation, as sketched below.
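The snippet below is a sketch of the closing operation under the stated assumptions (one iteration, 2 × 15 upright rectangular kernel); `segmented` stands for the binary output of the segmentation procedure, and the function name is illustrative.

```python
import cv2

def close_binary(segmented):
    """Morphological closing with a 2 x 15 upright rectangular kernel (one iteration)."""
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (2, 15))  # (width, height) in OpenCV
    return cv2.morphologyEx(segmented, cv2.MORPH_CLOSE, kernel, iterations=1)
```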

Once the noise had been removed, the binary image was input to the contour-finding function provided in the OpenCV library (i.e., the findContours() function). It outputs a list of contour objects found in the binary frame. At the basic level, contour objects are a list of points that represent a curve within the image [16]. Contour objects make it possible to retrieve features of filled regions such as area, center, and bounding box. Those features were exploited to filter the raw contour list and find parameters for the true tree contours.

Contour filtering used two criteria to remove non-tree contours. The first was the area, used to remove small filled regions that were most likely unwanted foreground objects or background residuals. The area of each contour was computed using a built-in function in OpenCV. A fixed area threshold of 600 was applied for this experiment. Next, it was assumed that if a contour is a tree, the contour must be present on the bottom pixel row of the frame, so that it does not "appear" somewhere in the middle of the frame. To test this assumption, each contour was individually drawn in an empty image array. If all pixel values of the bottom row were equal to zero, the observed contour did not meet the requirement and was removed from the contour list. Contours that were left in the list were considered as true detected willow tree regions. The willow tree algorithm then moves on to assign an identification (ID) number and to find the diameter of all detected trees within the frame. Figure 6b shows the final output.

**Figure 6.** (**a**) Segmentation result after morphological closing was applied; (**b**) Detection procedure output frame.
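The two filtering criteria described above can be sketched as follows, assuming the OpenCV 4 `findContours()` signature; the area threshold of 600 is taken from the text, while the function name and the use of `RETR_EXTERNAL` are illustrative choices.

```python
import cv2
import numpy as np

def filter_tree_contours(closed_binary, min_area=600):
    """Keep contours that are large enough and reach the bottom pixel row; a sketch."""
    contours, _ = cv2.findContours(closed_binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    h, w = closed_binary.shape
    trees = []
    for cnt in contours:
        if cv2.contourArea(cnt) < min_area:
            continue                                    # too small to be a tree
        canvas = np.zeros((h, w), np.uint8)
        cv2.drawContours(canvas, [cnt], -1, 255, thickness=cv2.FILLED)
        if canvas[-1, :].any():                         # contour touches the bottom row
            trees.append(cnt)
    return trees
```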

#### *3.3. Diameter Estimation*

Once the willow trees were detected, the algorithm proceeded by computing the diameter of the trees. As a first step, the pixel width of each remaining contour at the bottom row of the image array was found. Since each tree was at a different distance from the camera, the depth at this location was also found. Lastly, those features were fed into a function to find the pixel-to-millimeter ratio and the diameter in millimeters for all detected trees.

To find the pixel width and depth of a detected tree, its corresponding contour was drawn in an empty image array. Then, the bottom pixel row of the drawn contour and the depth median frame were extracted. A new array, including both aligned rows, was created. The array was then trimmed on the column axis where zeros were present in the contour row. The function traversed the row starting from both ends until it reached a non-zero element. From there, two possible cases were handled: in the first case, the contour row is solid (meaning no zeros within the trimmed array), and in the second case some zeros are present.

Finding the pixel width and depth for the first case was straightforward. Once the zero columns were trimmed, the length of the array was saved as the pixel width and the depth was computed as the median of all depth values at the coordinates corresponding to the non-trimmed pixels. For the second case, the array was split in two at the location of the first zero encountered while traversing the contour row. Then, both sections were trimmed and traversed again for zeros. If no further zeros were found, the function recorded the pixel width and depth for every section. The section showing a greater pixel width was assumed to belong to the true tree. The consequence of handling all contours as in the first case would be to significantly overestimate the tree's diameter.

As depth values did not always exist at the non-trimmed pixel coordinates, the zeros within the depth row were removed so they did not affect the median result. In the particular case where no depth values were available at the non-trimmed pixel coordinates, the median depth of all other contours in that frame was assigned. This was to be expected, since depth values were not always available for all pixels due to low confidence within the RealSense stereo-matching algorithm.

To compute the estimate of the tree's diameter, a function takes the arguments found in the previous step (i.e., pixel width, depth) along with the depth scale and the FOV of the monochrome imager, which are fixed constants acquired through the camera's SDK. The depth scale was first used to convert the depth value of an observed tree from the camera scale to millimeters. Then, the half-width of the frame in millimeters at the converted distance was found. Using this value, the pixel-to-millimeter ratio was found for the observed tree. The final step was to multiply the pixel width by the obtained conversion factor. The steps described above to find the diameter of any *tree<sub>i</sub>* are shown by the following equations,

$$\text{Distance}\_{i}\ (mm) = \text{Depth value}\_{i} \times \text{Depth scale} \times 1000 \tag{8}$$

$$\text{Half frame width}\_{i}\ (mm) = \tan\left(\frac{FOV\_{x}}{2} \times \frac{\pi}{180}\right) \times \text{Distance}\_{i} \tag{9}$$

$$\text{Ratio}\_{i}\ \left(\frac{mm}{\text{pixel}}\right) = \frac{\text{Half frame width}\_{i}}{\text{Frame size}\_{x}/2} \tag{10}$$

$$\text{Diameter}\_{i}\ (mm) = \text{Pixel width}\_{i} \times \text{Ratio}\_{i} \tag{11}$$

where *Depth scale* is the camera parameter used to convert depth values from the camera's scale to meters; *Distance<sub>i</sub>* is the distance between the tree and the camera in millimeters; *FOV<sub>x</sub>* is the camera's FOV over the *x*-axis; *Half frame width<sub>i</sub>* is the equivalent length of 640 pixels on the *x*-axis (i.e., half the normal resolution of 1280 pixels) in millimeters for a particular scene at *Distance<sub>i</sub>*; *Frame size<sub>x</sub>* is the normal resolution on the *x*-axis in pixels (i.e., 1280 pixels); *Ratio<sub>i</sub>* is the pixel-to-millimeter conversion factor for *tree<sub>i</sub>* over the *x*-axis; and *Diameter<sub>i</sub>* is the output diameter value for *tree<sub>i</sub>* in millimeters.
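Equations (8)–(11) translate directly into a few lines of Python, as sketched below. The horizontal FOV value is a placeholder, since the actual constant is read from the camera's SDK at run time.

```python
import math

def estimate_diameter_mm(pixel_width, depth_value, depth_scale,
                         fov_x_deg=90.0, frame_size_x=1280):
    """Convert a contour's bottom-row pixel width to millimeters (Eqs. (8)-(11)).

    fov_x_deg is a placeholder; the real value is obtained from the RealSense SDK.
    """
    distance_mm = depth_value * depth_scale * 1000.0                            # Eq. (8)
    half_frame_width_mm = math.tan(math.radians(fov_x_deg / 2)) * distance_mm   # Eq. (9)
    ratio_mm_per_px = half_frame_width_mm / (frame_size_x / 2)                  # Eq. (10)
    return pixel_width * ratio_mm_per_px                                        # Eq. (11)
```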

#### *3.4. Field Trials*

All field trials were conducted in Ramea fields located in Saint-Roch-de-l'Achigan. Preliminary field trials were run in the first half of August 2019. The goal was to test the camera position discussed above and to check for any hardware issues on site. In the final field trials, five classes of trees were defined, and eight trees were randomly selected per class. A total of 40 trees were observed from the same row on the same day. The classes were defined with the Ramea team and represent Ramea's internal classification system for the manufacturing of green fences and noise barriers. Table 2 shows the morphological characteristics of all classes.


**Table 2.** Willow tree classification classes.

Two diameter ranges were considered: greater than 28.57 mm but smaller than 34.92 mm, and strictly greater than 34.92 mm. Trees having a diameter less than 28.57 mm were not targeted for diameter characterization by Ramea, since they would be converted to mulch instead of being kept as rods for manufacturing green fences and noise barriers. It is essential to mention that the diameter measurement for classification was taken at 152.4 mm from the ground surface. Diameter measurements for data validation were taken with a caliper with sub-millimeter accuracy at 1.88 m from the soil surface, which was the real-world height of pixels located on the bottom row of the analyzed frame. The stem diameter was measured on the plane parallel to the row line and perpendicular to the ground plane. Straightness was a subjective measure of how straight the stem was relative to itself for the 3.05 m to 3.66 m height section, measured 152 mm from the ground surface. Table 2 describes where an excessive curvature is located relative to the height range of interest, if any.

Each randomly chosen tree was identified using colored electrical tape as a marker. The electrical tape was applied at 1.5 m above the soil surface. A caliper was used to measure the stem diameter at 1.78 m and 0.15 m from the ground surface. As stems are not perfectly round in nature, the diameter was measured on a plane parallel to the tree row, which was coplanar with the plane used by the camera to estimate diameter. To ensure repeatability, a custom-built measuring stick was prepared to indicate the height at which every measurement should be taken. This tool reduced the chance of errors and made data collection faster. The tool was also used to adjust the tilt of the RealSense camera. Once it was placed straight up at the center of the row, the camera was tilted upward until the corresponding mark (3.81 m) on the stick could be seen in the monochrome imager output frame.

Data were then recorded by driving the tractor in front of the row as if harvesting were taking place. In order not to compromise the tree segmentation procedure, the colored electrical tape was placed below the RealSense camera FOV. To capture the markers, another camera, a KeyMission model featuring a wide-FOV color sensor from Nikon (Nikon Corporation, Tokyo, Japan), was used. The latter was able to capture the color markers in addition to the ROI of the RealSense. However, this technique required manual work to match the marked trees with the correct tree ID to perform data validation. As the Willow Tree algorithm was not ready for live testing during the harvesting season, all four streams of the RealSense were saved as one .bag file, along with one text file containing raw GPS strings and one .mp4 file from the KeyMission, for later analysis. Final field trials were conducted in late September 2019. The tractor was a Massey Ferguson (AGCO, Duluth, Georgia) model 6485 equipped with a track system from SoucyTrack (SoucyTrack, Quebec, Canada). The operator aligned the tractor center line with the center of the next parallel willow tree row, allowing the camera to be at roughly 1.22 m from the center of the row as designed. The data acquisition was carried out at a speed of 2.3 km/h with the engine crankshaft speed set at 2200 revolutions per minute (RPM). Those parameters are typical for the harvesting operation at Ramea. Figure 7 shows the complete hardware installed during the final field trials, carried out on a sunny afternoon with little cloud cover.

**Figure 7.** (**a**) Final field trials; (**b**) sensing components layout.

#### *3.5. Data Analysis*

To assess the performance of the tree-detection procedure, 10 labeled images out of the 37 images containing the 40 observed trees were randomly selected and manually validated. Additionally, stratified sampling of frames with respect to the number of detected trees was undertaken. Two categories were created: frames containing 7 or fewer detected trees and frames containing 10 or more detected trees. For each category, 3 frames were randomly selected, so a total of 6 frames were selected as stratified samples. Thus, the complete set of images for which manual validation was performed was 16 out of 37 images from the same pass in the field (i.e., 10 random samples and 6 stratified samples). This was done to better capture the distribution of the number of detected trees across frames for statistical analysis. Three variables were examined in each frame: the number of correctly detected trees, the number of falsely detected trees, and the number of undetected trees. Since it can be ambiguous to distinguish trees and branches visually, clear criteria were specified to count a tree as a tree and not a branch (and vice-versa). The criteria are defined in Table 3.


**Table 3.** Tree detection validation criteria.

The performance of the diameter estimation procedure was assessed using the 40 observed trees in Ramea fields. They corresponded to eight randomly selected trees per class for the five classes observed in the same row and on the same day. Since two ranges of diameters are examined within all classes (i.e., 28.57 mm < d < 34.92 mm, 34.92 mm < d), classes with the same range were grouped together. Thus, results from Class 1 and Class 2 are shown together, as are Class 3, Class 4, and Class 5. Root mean square error (RMSE) was computed by class group as well as for all data points together. A linear regression with the intercept fixed at the origin between the observed and estimated diameters was undertaken without regard to class, as sketched below. Moreover, the two largest residuals (i.e., one above and one below the regression line) were removed from the dataset for further analysis; hence, they were not included in the linear regression.
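As a sketch of the statistical analysis described above, the snippet below computes the slope of a least-squares regression forced through the origin and the RMSE between estimated and observed diameters; the function name and array contents are placeholders.

```python
import numpy as np

def regression_through_origin(observed_mm, estimated_mm):
    """Slope of a least-squares fit forced through the origin, plus RMSE; a sketch."""
    observed = np.asarray(observed_mm, dtype=float)
    estimated = np.asarray(estimated_mm, dtype=float)
    slope = np.sum(observed * estimated) / np.sum(observed ** 2)
    rmse = np.sqrt(np.mean((estimated - observed) ** 2))
    return slope, rmse
```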

#### **4. Results and Discussion**

#### *4.1. Hardware Reliability*

During the field trials, no issues related to hardware occurred. Mechanical and electrical connections were sturdy, and the sensor mount did not show excessive vibration. No loss of power, connection fault with sensors or loss of data took place. Considering the relatively high machine vibration due to the installed track system, the system hardware proved to be rugged and reliable.

#### *4.2. Tree Detection Results*

Manual validation results of the tree detection procedure are shown in Figure 8. From the 16 images, a total of 135 trees were detected, of which 71.8% were correctly detected and 28.2% were falsely detected. Furthermore, 19 trees were not detected, which represents 16.4% of the total detected count. However, within the falsely detected group, 89.7% were branch objects and the remainder were clusters of trees for which the algorithm filled in the same contour. To support those results, it was also found that 90% of the 40 observed trees were correctly detected. This means that the algorithm was more sensitive than expected to branch objects. From the manually validated images, no non-tree or non-branch objects had been detected, which indicated that the segmentation procedure was able to properly remove the background and non-tree-like objects in the foreground. On the horizontal axis of Figure 8, ID 4 to ID 37 are the 10 randomly sampled frames out of the original dataset of 40 frames. To the right of ID 37, ID 26 to ID 0 represent the 6 stratified sampled frames.

Undetected trees were mostly due to fragmented filled regions which, in fact, represented a single tree. Since the contour filtering function deletes regions smaller than a certain threshold before confirming their presence on the bottom pixel row, if only a small region was observed, the entire contour was deleted. Thus, all other contours representing the same tree were deleted as well and the tree was not detected. The Willow Tree algorithm has no feature to regroup fragmented contours of the same object.

Looking at the distribution of the undetected trees within the sample of 40 observed trees, one tree per class was undetected, excluding Class 5, which equates to a 90% detection rate. Table 4 summarizes those findings.

**Figure 8.** Manual validation compared to algorithm results.


**Table 4.** Results of tree detection of the 40 observed trees.

To examine detection precision, a linear regression between the true number of trees and the estimated number of trees was undertaken with the same 10 randomly sampled and 6 stratified sampled frames (Figure 9). As can be seen, the algorithm tends to overestimate the count of true trees in frames. Knowing that the system has a high number of falsely detected trees due to branches, this behavior was to be expected. ID 12 had the highest residual from the fitted regression line, so it was not included in the statistical analysis but is still shown as an outlier in Figure 9. The resultant linear regression features an R<sup>2</sup> of 0.40 and an RMSE of 1.37 trees.

**Figure 9.** True tree count vs. estimated tree count per frame.

High falsely detected tree rates were mainly due to branches being detected as trees. The willow tree algorithm was sensitive to branches that were closest to the camera since they could be picked up by the depth maps; hence, their pixel intensity values were included in the determination of seed point coordinates to be input to the flood-fill algorithm. In the detection procedure, some of the filled regions representing branches were not filtered out since they were large enough and present on the bottom pixel row. Thus, improvement in contour filtering is needed to handle such cases without diminishing the number of correctly detected trees. One solution would be the implementation of a machine-learning technique to improve tree classification. Contour features could be fed into a classifier, such as a support vector machine (SVM), which would be able to distinguish between contours representing trees and branches. The SVM classifier has demonstrated a high accuracy rate compared to other machine-learning algorithms for fruit detection in outdoor scenes [10]. However, to create a robust machine-learning model, training data would need to be acquired at different times of the day (e.g., morning, afternoon), in different weather conditions (e.g., sunny, cloudy), and at different times of the year (i.e., spring, summer, fall). This, of course, requires more time and resources but has the potential to improve results.

#### *4.3. Diameter Estimation Results*

Due to detection performance, it was possible to analyze 36 of 40 data points. The slope of the regression was found to be 0.78, which indicated that the function tends to underestimate tree diameter. The RMSE for the full dataset was found to be 10.7 mm, which represents 37.5% and 30.7% of the low and high diameter range boundaries, respectively. RMSE for group Classes 1–2 and Classes 3–4–5 was 10.6 mm and 10.8 mm, respectively. The overall coefficient of determination R<sup>2</sup> was found to be poor at 0.099 as shown in Figure 10.

**Figure 10.** Measured tree diameter compared to estimated tree diameter.

Tree ID 277 and Tree ID 52 were the furthest outliers from the fitted regression line for the lowest and highest diameter estimation, respectively. Analyzing the situation at these points provided insight into the system behavior. For Tree ID 277, the system output was 12.8 mm in diameter, whereas the measured diameter was 43.2 mm. After the diagnosis, a possible cause was found to be the poor quality of the CLAHE equalized grayscale images. CLAHE equalized frames are used as input to the dynamic histogram thresholding function to compute the histograms and find seed point coordinates. In this case, some parts of the trees within the CLAHE frame were darker. Since the flood parameters in the flood-fill algorithm are taken as constant, the function was not able to connect the darker components to the final segmented binary image. Other contours of the same scene appeared skeletal as well. Figure 11 displays the CLAHE frame and the filtered contours frame.

**Figure 11.** (**a**) CLAHE equalized frame; (**b**) segmentation procedure output with labeled detected trees.

For Tree ID 52, the system output a diameter of 43.4 mm, whereas the measured diameter was only 24.9 mm. Again, a diagnostic of the processed frames was conducted. In contrast to the previous case, here the CLAHE equalized frame was too bright in the background, leading to non-tree objects being connected to pixels pertaining to true tree objects, as the distinction in pixel intensity was more subtle. This effect caused the filled region to be larger and, hence, overestimated the diameter.

Even though the use of color images was avoided to increase the robustness of the algorithm in varying light conditions, the effect was not completely prevented with grayscale images. Since the RealSense was tilted upward to improve depth map quality, it was also prone to oversaturation when beams of light found their way through the tree canopy and into the camera's optical sensors. To address this problem, the low and high difference parameters used by the flood-fill algorithm could be adapted dynamically to the current frame. The intensity location of the peak count value in the dynamic histogram thresholding function could possibly be used as an indicator.

#### *4.4. Row Transect*

Because data were acquired for a single willow tree row, spatial interpolation of willow tree count would not have been appropriate. Instead, a transect graph (Figure 12) was produced to show the variability of yield (i.e., number of trees) along the row. Since the operator must stop harvesting to unload before moving on to the next load, data collection must stop once for every cycle. To produce a yield map for an entire field, all data from single loads need to be merged and analyzed together. A spatial interpolation using the inverse distance weighting method could then be used.

**Figure 12.** Willow tree yield transect.
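For completeness, a minimal sketch of the inverse distance weighting interpolation mentioned above is given below; point coordinates, yield values, the function name, and the power parameter are all placeholders.

```python
import numpy as np

def idw_interpolate(xy_known, values, xy_query, power=2.0):
    """Inverse distance weighting of tree counts onto query points; a sketch."""
    xy_known = np.asarray(xy_known, dtype=float)
    values = np.asarray(values, dtype=float)
    out = np.empty(len(xy_query))
    for i, q in enumerate(np.asarray(xy_query, dtype=float)):
        d = np.linalg.norm(xy_known - q, axis=1)
        if np.any(d < 1e-9):                  # query point coincides with a sample
            out[i] = values[np.argmin(d)]
            continue
        w = 1.0 / d ** power
        out[i] = np.sum(w * values) / np.sum(w)
    return out
```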

#### **5. Conclusions**

A willow tree yield-mapping technology was developed based on the number and diameter of tree stems harvested. An RGB-D camera using stereovision technology was implemented to collect grayscale images and compute the depth maps of willow tree row scenes grown in an SRF system, prior to harvesting. The system also has potential as a fast and non-destructive yield estimation tool which could be used to assess carbon dioxide (CO2) sequestration of willow trees grown in SRF over time. Data from a GNSS receiver were integrated to tag image data to their corresponding geographical coordinates. A CV algorithm was developed to detect willow trees in images and estimate their diameter. To achieve this, the algorithm relied on grayscale images equalized using the CLAHE algorithm and on depth map information. The RGB stream of the camera was not used. The system was designed and built in partnership with Ramea Phytotechnologies, Inc.

The system was able to correctly detect 71.8% of tree stems and estimate their diameter with an RMSE of 10.7 mm. To achieve this, the willow tree algorithm relied on equalized grayscale images using the CLAHE algorithm and depth map information. Detection errors were primarily due to the detection of branches as tree objects which represented 89.7% of the number of falsely detected trees. Diameter estimation errors were mostly due to the low contrast between pixels pertaining to tree objects and backgrounds despite the implementation of the CLAHE algorithm. To increase system performance, the contour filtering procedure needs to be strengthened and the system must be able to better handle cases of oversaturation in scenes. This system is the first of its kind and provides a promising first step for further development of a more robust and commercially viable product.

**Author Contributions:** Conceptualization, X.L.-T., and M.L.; methodology, M.L. and V.A.; software, M.L. and J.P.; validation, M.L.; formal analysis, M.L.; writing—original draft preparation, M.L.; review and editing, V.A.; supervision and project administration, V.A.; funding acquisition, M.L., X.L.-T., and V.A. All authors have read and agreed to the published version of the manuscript.

**Funding:** Research was funded by the Natural Sciences and Engineering Research Council of Canada (NSERC), grant number NSERC EGP 536515-18.

**Acknowledgments:** The authors would like to thank Ramea Phytotechnologies, Inc for their time, input, and field-based advice throughout this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**



## *Article* **Deep Learning and Machine Vision Approaches for Posture Detection of Individual Pigs**

#### **Abozar Nasirahmadi 1,\*, Barbara Sturm 1, Sandra Edwards 2, Knut-Håkan Jeppsson 3, Anne-Charlotte Olsson 3, Simone Müller <sup>4</sup> and Oliver Hensel <sup>1</sup>**


Received: 22 July 2019; Accepted: 28 August 2019; Published: 29 August 2019

**Abstract:** Posture detection targeted towards providing assessments for the monitoring of health and welfare of pigs has been of great interest to researchers from different disciplines. Existing studies applying machine vision techniques are mostly based on methods using three-dimensional imaging systems, or on two-dimensional systems limited to monitoring under controlled conditions. Thus, the main goal of this study was to determine whether a two-dimensional imaging system, along with deep learning approaches, could be utilized to detect the standing and lying (belly and side) postures of pigs under commercial farm conditions. Three deep learning-based detector methods, including faster regions with convolutional neural network features (Faster R-CNN), single shot multibox detector (SSD) and region-based fully convolutional network (R-FCN), combined with Inception V2, Residual Network (ResNet) and Inception ResNet V2 feature extractions of RGB images, were proposed. Data from different commercial farms were used for training and validation of the proposed models. The experimental results demonstrated that the R-FCN ResNet101 method was able to detect lying and standing postures with the highest average precision (AP) of 0.93, 0.95 and 0.92 for standing, lying on side and lying on belly postures, respectively, and a mean average precision (mAP) of more than 0.93.

**Keywords:** convolutional neural networks; livestock; lying posture; standing posture

#### **1. Introduction**

Computer vision techniques, either three-dimensional (3D) or two-dimensional (2D), have been widely used in animal monitoring processes and play an essential role in the assessment of animal behaviours. They offer benefits for the monitoring of farm animals due to their wide range of applications, cost and efficiency [1]. Examples of employing machine vision techniques in the monitoring of animal health and behaviour have been reviewed [1–3]. In the monitoring of pigs, the interpretation of animal behaviours, particularly those relating to health and welfare as well as barn climate conditions, is strongly related to their locomotion and posture patterns [3,4]. In order to determine lying and standing postures in pigs, 3D cameras have been widely used, due to their ability to encode in each pixel of an image the distance between the object and the depth sensor. One such effort was reported by [5], in which the monitoring of standing pigs was addressed by using the Kinect sensor. In this research, noise was first removed from the depth images by applying a spatiotemporal interpolation technique, then a background subtraction method was applied to detect standing pigs. Lao et al. [6] recognized lying, sitting and standing behaviours of sows based on 3D images. In another study, a Kinect sensor was also employed to localize standing pigs in depth images for automatic detection of aggressive events [7]. 2D images have also been used for locomotion and lying posture detection, mainly based on pixel movement [8] or on features of ellipses fitted to the animals [9,10].

In most of the studies, due to different light sources and the resolution (quality) of captured images, a high level of noise and the generation of a great deal of data cause challenges for machine vision-based monitoring of livestock. It has been reported that machine learning techniques have the potential to tackle these problems [11,12]. The most popular machine learning techniques employed for the analysis of optical data for monitoring of pigs include models such as linear discriminant analysis (LDA), artificial neural networks (ANN) and support vector machines (SVM). However, in recent years, deep learning approaches, a fast-growing field of machine learning, have been used in image classification, object recognition, localization and object detection [13]. Deep learning is similar to ANN, but with a deeper architecture and massive learning capacity, which leads to higher performance [14]. Deep learning has recently been applied in 2D and 3D pig-based machine vision studies. An example is the separation of touching pigs in 3D images using a low-cost Kinect camera, based on a convolutional neural network (CNN), with a high accuracy of around 92% [15].

Deep learning has also been used for the detection and recognition of pigs' or sows' behaviours in different imaging systems. A detection system based on the Kinect v2 sensor and the faster regions with convolutional neural network features (Faster R-CNN) technique, in combination with a region proposal network (RPN) and Zeiler and Fergus Net (ZFnet), was developed by [16] to identify different postures of sows (i.e., standing and sitting behaviours, and sternal, ventral and lateral recumbency). Yang et al. [17] used deep learning for automatic recognition of sows' nursing behaviours in 2D images, with an accuracy of 97.6%, sensitivity of 90.9% and specificity of 99.2%. Faster R-CNN and ZFnet were also employed to recognize individual feeding behaviours of pigs by [18], where each pig in the barn was labelled with a letter. Their proposed method was able to recognize pigs' feeding behaviours with an accuracy of 99.6% during the study. Segmentation of sow images from the background in top-view 2D images was also addressed by means of a Fully Convolutional Network (FCN), built on the Visual Geometry Group Network with 16 layers (VGG16), in another study by [19]. Their results showed the possibility of employing deep learning approaches for segmentation of the animal from different background conditions.

Due to the fast growth of pig production around the world, the issue of real-time monitoring of the animals, particularly in large scale farms, becomes more crucial [1]. Monitoring of an individual pig's posture during the whole lifetime in large scale farms is almost impossible for farmers or researchers, due to the labour- and time-intensive nature of the task. Utilizing state-of-the-art machine learning, along with machine vision techniques, for monitoring of groups of pigs in different farming conditions could offer the advantage of early problem detection, delivering better performance and lower costs. Thus, the main objective of this study was to develop a machine vision and a deep learning-based method to detect standing and lying postures of individual pigs in different farming conditions, including a variety of commercial farm systems.

#### **2. Material and Methods**

#### *2.1. Housing and Data Collection*

The data sets for this study were derived from three different commercial farms in Germany and Sweden. In Germany, two farms for weaning and fattening of commercial hybrids of Pietrain × (Large White × Landrace) were selected. A set of four pens in a room in each farm was chosen; these had fully slatted floors with central concrete panels and plastic panels on both sides (weaning farm), and fully slatted concrete floors (fattening farm). In a Swedish fattening farm, two rooms with two pens were selected, each having part-slatted concrete flooring with some litter on the solid concrete in the lying area. Pigs were of commercial Hampshire × (Landrace × Large White) breeding.

The images used in this study were recorded by top view cameras (VIVOTEK IB836BA-HF3, and Hikvision DS-2CD2142FWD-I). The cameras were connected via cables to servers and video images from the cameras were recorded simultaneously and stored on hard disks. Recording in different farming conditions allowed for capture of images with pigs of different skin colour and age, housed under various floor type and illumination conditions, which facilitated the development of a robust deep learning model. Examples of images used for development of the detection model are illustrated in Figure 1.

**Figure 1.** Example of images used for development of the detection algorithms in this study.

#### *2.2. Proposed Methodology*

According to the findings of Nasirahmadi et al. [12], when in a side (lateral) lying posture pigs lie in a fully recumbent position with limbs extended, and when in a belly (sternal) lying posture the limbs are folded under the body. Due to the lack of available benchmark datasets for pigs' lying and standing postures, three classes of standing, lying on belly and lying on side were defined in this work. Examples of these postures are represented in Figure 2. Most of the possible scenarios of individual pig standing and lying postures within the group have been considered. The proposed detection methods were developed and implemented in Python 3.6 and OpenCV 3.4 using TensorFlow [20]. Experiments and the deep learning training were conducted on a Windows 10 computer with an NVIDIA GeForce GTX 1060 GPU with 6 GB of memory.

**Figure 2.** Examples of three posture classes used in this study in different farming conditions.

In order to establish the lying and standing postures, images from the three commercial farms were collected from surveillance videos over a period of one year. A total of 4900 images, incorporating various farm, animal age, and animal and background colour conditions, were selected from this dataset: 4500 for training and validation, and 400 for testing. The 4500 images were split into training (80%: 3600) and validation (20%: 900) datasets. The images were selected at 1–2 week intervals from the beginning to the end of every batch, and different samples of images were collected randomly on one day of these selected weeks. The remaining 400 images were taken randomly from the image data set as a testing set. The testing data were independent of the training and validation sets and were used for the evaluation of the detection phase. All images were first resized to 1280 × 720 (width × height), then annotated using the graphical image annotation tool LabelImg, created by [21]. The annotations included the identification of standing and lying postures and were saved as XML files in the PASCAL VOC format. Since each image included different numbers of pigs (7–32) depending on farm conditions and time of recording, the total number of labelled postures was 52,785. The information on each posture class is summarized in Table 1.


**Table 1.** Details of image data sets used for posture detection.
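Since the annotations follow LabelImg's PASCAL VOC XML layout, the per-class counts in Table 1 can be reproduced by walking the annotation files. The following is a minimal sketch; the file name and the exact class label strings are assumptions, not taken from the paper.

```python
import xml.etree.ElementTree as ET
from collections import Counter

def count_postures(xml_path):
    """Count posture labels in one LabelImg PASCAL VOC annotation file."""
    root = ET.parse(xml_path).getroot()
    counts = Counter()
    for obj in root.findall("object"):
        label = obj.find("name").text                # e.g. "standing", "lying_belly", "lying_side"
        xmin = int(float(obj.find("bndbox/xmin").text))
        ymin = int(float(obj.find("bndbox/ymin").text))
        xmax = int(float(obj.find("bndbox/xmax").text))
        ymax = int(float(obj.find("bndbox/ymax").text))
        if xmax > xmin and ymax > ymin:              # keep only valid bounding boxes
            counts[label] += 1
    return counts

# Hypothetical file name; summing the counters of all 4500 training/validation
# files would reproduce the per-class totals summarized in Table 1.
print(count_postures("farm1_pen3_sample.xml"))
```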

In this study, various combinations of deep learning-based models (i.e., Faster R-CNN (Inception V2, ResNet50, ResNet101 and Inception-ResNet V2), R-FCN (ResNet101) and SSD (Inception V2)) were developed with the aim of finding the best posture detection technique in 2D images. Two architectures of the ResNet model (i.e., ResNet50 and ResNet101) were applied. ResNet is a deep residual network introduced by [22] and has been widely used in object detection and localization tasks. Both ResNet50 and ResNet101 are based on the repetition of four residual blocks. These blocks are made of three convolution layers of size 1 × 1, 3 × 3 and 1 × 1, with an additional connection joining the input of the first 1 × 1 convolution layer to the output of the last 1 × 1 convolution layer. Residual learning in the ResNet model thus eases training by fusing filtered features with the original features. ResNet50 and ResNet101 are very deep networks, containing 50 and 101 layers, respectively. ResNet applies skip connections and batch normalization and provides short connections between layers [23]. Furthermore, ResNets allow direct signal propagation from the first to the last layer of the network without the degradation issue. Inception V2 GoogLeNet [24], with its layer-by-layer structure, was also implemented as a feature extractor for the input pig posture images. The feature extractors, which map input images to feature maps, determine the computational cost as well as the performance of the networks. Inception V2 GoogLeNet is based on the repetition of building blocks whose extracted features are concatenated as the output of the module [24]. Each block is made of a pooling layer and convolutional layers of different sizes (1 × 1, 3 × 3 and 5 × 5), which help to extract features from the same feature map at different scales and improve performance. Inception V2, by factorizing n × n filters into a combination of 1 × n and n × 1 convolutions, achieves better performance and reduces representational bottlenecks. This widens the filter banks instead of deepening them, removing the representational bottleneck without increasing the computational cost or the number of parameters [25]. Finally, Inception-ResNet V2, which was proposed in 2017 [26], was utilized as a feature extractor in this work. Inception-ResNet V2 is one of the state-of-the-art approaches in image classification and is based on a combination of residual connections and Inception architectures; it is an Inception-style network which uses residual connections instead of filter concatenation [26]. Further details on all parameters of the proposed feature extractors can be found in [22,24,26].
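To make the 1 × 1 / 3 × 3 / 1 × 1 bottleneck structure with its shortcut connection concrete, the following is a minimal tf.keras sketch of one such residual block; the layer widths and the projection rule are illustrative assumptions, not the exact ResNet50/101 configuration used in the study.

```python
import tensorflow as tf
from tensorflow.keras import layers

def bottleneck_block(x, filters, stride=1):
    """1x1 -> 3x3 -> 1x1 convolutions with a shortcut joining input and output."""
    shortcut = x
    y = layers.Conv2D(filters, 1, strides=stride, padding="same")(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(4 * filters, 1, padding="same")(y)
    y = layers.BatchNormalization()(y)
    # Project the shortcut when its shape no longer matches the block output.
    if stride != 1 or shortcut.shape[-1] != 4 * filters:
        shortcut = layers.Conv2D(4 * filters, 1, strides=stride)(shortcut)
        shortcut = layers.BatchNormalization()(shortcut)
    return layers.ReLU()(layers.Add()([y, shortcut]))

inputs = tf.keras.Input(shape=(720, 1280, 3))   # the resized image size used in this study
outputs = bottleneck_block(inputs, filters=64)
model = tf.keras.Model(inputs, outputs)
```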

#### 2.2.1. Faster R-CNN

The Faster R-CNN method [27] was used to monitor the standing, side lying and belly lying postures of pigs. Figure 3 shows the architecture and the different steps of the Faster R-CNN utilized in posture detection. Various feature extraction methods (Inception V2, ResNet50, ResNet101 and Inception-ResNet V2) were applied, and the region proposal network (RPN), which is a fully convolutional network for generating object proposals, was used to produce proposed regions for each image and to generate feature maps in the final layer by predicting the classes [25]. The feature maps were fed into region of interest (RoI) pooling to extract regions from the last feature maps. Each RoI (region containing a pig) was then assigned a confidence score. Finally, the feature vector from the RoI was fed into several fully connected (FC) layers with a softmax classifier to obtain the final confidence scores, and a regression bounding box to localize the coordinates of the detected objects.

**Figure 3.** Schematic diagram of the faster regions with convolutional neural network (Faster R-CNN) used in this study.

#### 2.2.2. R-FCN

R-FCN [28] is an improvement of the Faster R-CNN and was proposed as an object detection technique using CNNs. The structure of the R-FCN applied in this study is shown in Figure 4. The R-FCN consists of a feature extractor (ResNet101 in this study) and several FC layers behind an RoI pooling layer. The RoI pooling layer uses position-sensitive score maps to address the issue of translation invariance [29]. In the R-FCN, the computation is shared across the whole image by creating a deeper FCN without increasing the speed overhead. Region proposal and classification are done by the RPN, followed by the position-sensitive maps and RoI pooling, respectively. For training of the R-FCN model in this study, the same hyperparameters as for the Faster R-CNN were used.

**Figure 4.** Schematic diagram of the region-based fully convolutional network (R-FCN) used in this study.

#### 2.2.3. SSD

The SSD was proposed by Liu et al. [30] and is based on a bounding box regression principle. In the SSD, the input images are first divided into small kernels on different feature maps to predict anchor boxes of each kernel (Figure 5). In comparison with the Faster R-CNN and R-FCN techniques, SSD uses a single feed-forward CNN to produce a series of bounding boxes and scores for the presence of the objects in each box [29]. Finally, the classification scores of each region are computed based on the scores obtained in the previous step. The SSD model, instead of applying proposal generation, encloses the whole process in a single network, which leads to less computational time.

**Figure 5.** Schematic diagram of the single shot multibox detector (SSD) used in this study.

The initial learning rate is a hyperparameter that controls the adjustment of the network weights. Too high an initial learning rate may cause poor convergence or a network that is unable to converge, while a low initial learning rate will result in slow convergence [31]. Therefore, in this study different initial learning rates of 0.03, 0.003 and 0.0003 were selected to train the Faster R-CNN, R-FCN and SSD networks. The momentum algorithm accumulates a weighted average of the previous training gradients and continues to move in that direction. The past weighted average updates the current gradient direction, and the momentum coefficient specifies to what extent the update needs to increase or decrease [32]. In this study, training of all networks was conducted with a momentum of 0.9. The right number of iterations is important in training of the models, as too few iterations will result in poor performance, while a high number of iterations may weaken the generalization ability of the trained model [32]. Therefore, according to the number of samples [32], the iteration number, which is the number of weight updates during the training process, was chosen to be 70,000 for all networks in this study. Furthermore, a random horizontal flip method was used for data augmentation in all detectors.

#### **3. Experimental Results and Discussion**

The big challenges in monitoring of pigs' behaviour, particularly in large-scale farms, are cost, time and labour-intensity of the visual investigation. The importance of automatic monitoring of group and individual behaviours of pigs has led to the development of image processing techniques with the ability to monitor a large number of pigs. However, due to the variability in farm and animal conditions, especially in commercial farms, the development of a robust monitoring technique with the ability to detect individual animal behaviours has been examined in this study.

The performance obtained by each model for each posture class in the test data set at each initial learning rate is given in Table 2. In order to evaluate the detection capability of the proposed models, the widely used average precision (AP) of each class and the mean average precision (mAP) were calculated. High values for the AP and mAP show the acceptable performance of the R-FCN ResNet101, Faster R-CNN ResNet101 and Faster R-CNN Inception V2 detection approaches for scoring of standing, lying on belly and lying on side postures among group-housed pigs compared to the other models. Additionally, the learning rate of 0.003 gave the best results in the mentioned detection models. The evaluation metrics show a trend of improvement when the learning rate decreases from 0.03 to 0.003; however, these values declined at the learning rate of 0.0003. This is in line with previous findings that the detection performance changes with the same trend in learning rates in various scenarios [33].


**Table 2.** Performance of the detection phase in test data set in various learning rates.
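For reference, the AP of one posture class can be computed as the area under the precision–recall curve built from the ranked detections, and the mAP is the mean over the three classes. A minimal NumPy sketch, assuming detections have already been matched to ground truth (e.g., at IoU ≥ 0.5):

```python
import numpy as np

def average_precision(scores, is_true_positive, num_gt):
    """Area under the precision-recall curve for one posture class.

    scores           -- confidence score of every detection of this class
    is_true_positive -- 1 if the detection matched an unmatched ground-truth box, else 0
    num_gt           -- number of ground-truth boxes of this class in the test set
    """
    order = np.argsort(-np.asarray(scores, dtype=float))
    tp = np.asarray(is_true_positive, dtype=float)[order]
    cum_tp = np.cumsum(tp)
    cum_fp = np.cumsum(1.0 - tp)
    recall = cum_tp / num_gt
    precision = cum_tp / (cum_tp + cum_fp)
    # Precision envelope (monotonically non-increasing), then integrate over recall.
    precision = np.maximum.accumulate(precision[::-1])[::-1]
    recall_steps = np.diff(np.concatenate(([0.0], recall)))
    return float(np.sum(recall_steps * precision))

def mean_average_precision(per_class_ap):
    """mAP over the three posture classes (standing, lying on belly, lying on side)."""
    return sum(per_class_ap) / len(per_class_ap)
```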

The training and validation loss of the models at a learning rate of 0.003 are presented in Figure 6. As can be seen, there are more fluctuations in the validation curves than in the training ones for each model, which could be due to the smaller size of the data set used in the validation phase [34]. The graphs show the decrease in the loss during the training and validation steps of the developed models, and the values converged at a low value when the training process was finished, which indicates a sufficiently trained model. After training the models, 400 images (not used for training and validation) containing 4987 lying and standing postures were used as a test set to evaluate the correctness of the posture detection approaches. The test data set contained images from the three farms of the study, selected from different times of the day when the animals had lying, feeding, drinking and locomotion activities in the barn.


**Figure 6.** Training and validation loss during the (**a**) Faster R-CNN, (**b**) R-FCN and SSD training processes.

Figure 7 illustrates examples of postures detected by the Faster R-CNN, R-FCN and SSD models in the different farming conditions at the learning rate of 0.003. As can be seen from the figure, some of the developed algorithms have the ability to detect the different lying and standing postures under all tested commercial farming conditions. In some cases, there were multiple postures in an image, so the possibility of detecting various postures in one image is also shown in Figure 7.

It can be observed from Table 2 that one of the major sources of misdetection was confusion between the standing and belly lying postures. Due to the similarity between these postures in top view images (Figure 8), some incorrect detections were observed. However, when lying on side the animals present a different top view appearance (Figure 2) compared to the other postures, and the number of incorrect detections was lower.

**Figure 7.** Examples of detected standing (light green rectangle), lying on belly (yellow rectangle) and side posture (green rectangle) of six different models in various farming conditions.

The confusion matrix of the selected R-FCN ResNet101 model on the test dataset is shown in Table 3. As can be seen in the table, the number of misdetections between the standing and belly lying postures is higher than for the other postures.

**Figure 8.** Sample of images of standing posture which are similar to belly lying posture in top view.

**Table 3.** Confusion matrix of the proposed R-FCN ResNet101 in the test data set at learning rate of 0.003.


The proposed R-FCN ResNet101 technique for the detection of standing and lying postures in pigs under commercial farming conditions with 2D cameras provided a high level of detection performance (e.g., mAP of 0.93), which is illustrated in Figure 7. In a sow posture study, Faster R-CNN along with RPN and ZF-net models was able to detect standing, sitting and lying behaviours with a mAP of 0.87 using Kinect v2 depth images [16]. In another study, monitoring of lying and nursing behaviour of sows using a deep learning approach (FCN and VGG16) achieved accuracy, sensitivity and specificity of 96.4%, 92.0% and 98.5%, respectively [17]. However, the study reported by [17] was carried out to monitor one sow per image, which has a lower probability of detection mistakes compared to our study, in which the number of pigs varied from 7 to 32 per image. The use of Faster R-CNN as a detector and ZF-net as a feature extractor showed high values of precision and recall (99.6% and 86.93%, respectively) in a study by [18]. However, compared to our study, they had just four animals in each image and each animal's body was labelled as A, B, C or D for tracking their behaviours, which would help to improve detection performance. There were some misdetections in the test data set for each posture in our study. This was mainly due to the similarity between standing and belly lying postures in top view images (Figure 8, as previously discussed) and to the camera lens becoming covered with fly dirt as time progressed, reducing the visibility in images [10]. Furthermore, the shape change of an animal's body caused by the fisheye lens effect at the edge of the pen (particularly in the weaning farm), and the water pipe in the middle of the pen (at the fattening farm) which caused some invisible areas in the image, led to some misdetections. Low resolution of the image data impacts the performance of machine vision techniques, as the model is not able to extract enough features to perform accurate detection. As shown in this study, misdetection was more likely for pigs with black skin colour, as they have a greater similarity with the background (pen floor) colour.

Furthermore, in this study the testing time of the state-of-the-art detection models was calculated. The speed of each proposed method was measured in frames per second (FPS) [35]. The results show an FPS of 6.4 for the Faster R-CNN Inception V2, 3.6 for the Faster R-CNN ResNet50, 3.8 for the Faster R-CNN ResNet101, 2.3 for the Faster R-CNN Inception-ResNet V2, 5 for the R-FCN ResNet101, and 8.8 for the proposed SSD Inception V2. According to Table 2, R-FCN ResNet101 showed the highest performance as well as an acceptable FPS (5) for pig posture detection. It was also found that the Faster R-CNN Inception V2 had high enough performance for lying-on-side posture detection along with an FPS of 6.4. Since the lying posture can be used as a parameter of pig comfort assessment in various thermal conditions, the Faster R-CNN Inception V2 model can be beneficial in terms of higher processing speed. The SSD model had the highest FPS (8.8); however, the lowest performance was obtained with this model, which was not sufficiently reliable for pig lying and standing detection.
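The FPS figures above correspond to timing the forward pass of each trained detector over the test frames; a small sketch of such a measurement is given below, where `detect_fn` is a hypothetical wrapper around a trained model's inference call.

```python
import time

def measure_fps(detect_fn, frames, warmup=5):
    """Average frames per second of a detector over a list of test frames."""
    for frame in frames[:warmup]:          # warm-up runs are excluded from the timing
        detect_fn(frame)
    start = time.perf_counter()
    for frame in frames[warmup:]:
        detect_fn(frame)
    elapsed = time.perf_counter() - start
    return (len(frames) - warmup) / elapsed
```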

The proposed R-FCN model could be used to monitor pig postures in the group to control climate and barn conditions continuously. In this context, three days of recorded video data were fed into the model in order to score the number of the lying and standing postures. Results of the scoring phase are presented in Figure 9.


**Figure 9.** (**a**) Results of scoring (in percentage) of the lying and standing postures across the day. (**b**) Standing posture (light blue rectangle), lying on belly (blue rectangle) and side posture (green rectangle).

As shown in the scoring diagrams, the algorithm can continuously monitor the activity and postures of group-housed pigs in the barn. In the graph of standing postures, activity peaks were apparent at feeding and activity times or when the animals were checked by the farmer. The lying and standing activity patterns of the animals, which were scored using the proposed automated method, offer the advantage of early detection of health and welfare issues in pig farms [3]. For example, changes in lying and standing activity levels could be used for the detection of lameness and tail biting events [36].

In a previous study, Nasirahmadi et al. [12] showed that image processing and a linear SVM model were able to score different lying postures (sternal and lateral) under commercial farming conditions. However, the performance of the scoring technique was highly dependent on the output of the image processing method, which led to some wrong scoring of the pigs' lying postures. In this study, due to the use of different deep learning approaches, high enough precision for practical use was obtained. The monitoring approach described in this study could be a valuable tool to detect changes in the number of pigs in the standing, lying on side or lying on belly postures to improve welfare and health surveillance. However, this new method needs to be adapted and evaluated with a wider range of farming conditions (both commercial and research) in future, which may require a greater number of images for training of the model or the use of other feature extraction methods.

#### **4. Conclusions**

In pig husbandry, it is important to monitor the animals' behaviour continuously between birth and slaughter. However, this can only be achieved using robust machine learning and machine vision techniques capable of processing image data from varied farming conditions. In this study, two commercial farms (weaning and fattening) in Germany and one commercial fattening farm in Sweden were used to record 2D video data for a year and provide an image database. In order to have various imaging conditions, pens with different floor type and colour, various numbers of pigs per pen (from 7 to 32) and varying ages and colours of pig were selected. The techniques proposed in this study were based on using image data from the surveillance system and the Faster R-CNN, R-FCN and SSD methods. These models were trained using 80% (3600) of the image data and validated with 20% (900) of the 2D images, with a total of 52,785 standing, lying on belly and lying on side postures in the images. The trained models were then evaluated using 400 new images. Results of the testing phase showed a high level of mAP and good processing speed for the different postures in the R-FCN ResNet101 model. This technique performed well in the monitoring of various lying and standing postures of pigs in image data acquired from commercial farms, which has been the limitation of most 2D and 3D conventional machine vision techniques. The proposed model shows good flexibility and robustness toward different pig farm lighting conditions, ages of the animals and skin colours, which can be an important step toward developing real-time, continuous computer-based monitoring of livestock farms.

**Author Contributions:** Conceptualization, A.N., B.S. and S.E.; methodology, A.N., B.S., S.E. and S.M.; software, A.N.; validation, A.N., B.S. and S.M.; formal analysis, A.N., B.S. and S.M.; investigation, A.N., B.S., S.E. and S.M.; resources, H.J., A.-C.O. and S.M.; data curation, A.N.; writing—original draft preparation, A.N.; writing—review and editing, B.S., K.-H.J., A.-C.O., S.E., S.M. and O.H.; project administration, B.S. and A.N.; funding acquisition, B.S., S.E., A.N. and O.H.

**Funding:** This research was funded by the European Union's Horizon 2020 research and innovation programme, grant number "No 696231". The work was financially supported by the German Federal Ministry of Food and Agriculture (BMEL) through the Federal Office for Agriculture and Food (BLE), grant number "2817ERA08D" and The Swedish Research Council Formas, grant number "Dnr 2017-00152".

**Acknowledgments:** We thank the funding organizations of the SusAn ERA-Net project PigSys.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Body Dimension Measurements of Qinchuan Cattle with Transfer Learning from LiDAR Sensing**

**Lvwen Huang 1,2,\*, Han Guo 1, Qinqin Rao 1, Zixia Hou 1, Shuqin Li 1,2,\*, Shicheng Qiu 1, Xinyun Fan <sup>3</sup> and Hongyan Wang <sup>4</sup>**


Received: 7 October 2019; Accepted: 18 November 2019; Published: 19 November 2019

**Abstract:** Because measuring the body dimensions of Qinchuan cattle is time-consuming and stressful for both the cattle and farmers, the demand for automatic measurement of body dimensions has become more and more urgent. It is necessary to explore automatic measurement with deep learning to improve breeding efficiency and promote the development of the industry. In this paper, a novel approach to measuring the body dimensions of live Qinchuan cattle based on transfer learning is proposed. A deep Kd-network was trained with the classical three-dimensional (3D) point cloud data (PCD) of the ShapeNet datasets. After a series of processing steps on the PCD sensed by a light detection and ranging (LiDAR) sensor, the cattle silhouettes could be extracted, which after augmentation could be applied as an input layer to the Kd-network. With the output of a convolutional layer of the trained deep model, the output layer of the deep model could be applied to pre-train the fully connected network. The TrAdaBoost algorithm was employed to transfer the pre-trained convolutional layers and fully connected layers of the deep model. To classify and recognize the PCD of the cattle silhouette, the average accuracy rate after training with transfer learning could reach up to 93.6%. On the basis of silhouette extraction, the candidate regions of the feature surface shapes could be extracted with the mean curvature and Gaussian curvature. After computation of the FPFH (fast point feature histogram) of the surface shape, the centers of the feature surfaces could be recognized and the body dimensions of the cattle could finally be calculated. The experimental results showed that the comprehensive error of the body dimensions was close to 2%, which could provide a feasible approach to the non-contact observation of the bodies of large-physique livestock without any human intervention.

**Keywords:** transfer learning; deep learning; body dimensions; point cloud; Kd-network; feature recognition; FPFH; non-contact measurement

#### **1. Introduction**

The availability of three-dimensional (3D) sensing models for large animals is becoming more and more significant in many different agricultural applications [1–3]. For the healthy cultivation and genetic breeding of large animals, periodic measurement of the animals' body dimensions is necessary for breeders and researchers to monitor growing complications related to pregnancy, lameness, and animal diseases [4–7]. Large-scale measuring at a holding frame by skilled inspectors (traditional methods use a meter stick and metric tapes [8]) is ubiquitous, but it comes at the cost of heavy manual labor, the animal's stress response, and low efficiency due to fatigue or differences in individual experience among inspectors [9]. Very often, large animals cannot be effectively modeled on the spot by a series of classical 3D modeling methods due to their geometrical complexity or texture at different growing periods, and 3D imaging or range scanners have been widely used to acquire the shape of an animal [10]. With the great achievements and breakthroughs of deep convolutional neural networks (DCNNs) in light detection and ranging (LiDAR) data classification [11–13], deep models could be reconstructed for the body measurement of large animals. This paper follows state-of-the-art DCNN models and classical PCD processing methods, focusing particularly on the possibility of measuring the body dimensions of Qinchuan cattle, one of the five most excellent beef breeds in China.

Qinchuan cattle are historic, famous, and have excellent performance in meat production. For an adult cow, the average height at the withers is about 132 cm and its average body weight reaches 420 kg, whilst for an adult bull, the average height at the withers is about 148 cm and its average body weight can reach above 820 kg. With the growing demand for meat from the vast population, the number of Qinchuan cattle is on the rise. To master the periodic feedback of growth and nutrition status in mass production, it is necessary to be able to automatically measure them in a feasible and effective way.

#### *1.1. Livestock Body Measuring with LiDAR*

With the development of 3D information technology, LiDAR (light detection and ranging) sensor scanning and structural light measurements have become important methods to quickly obtain 3D spatial data. They can be used to quickly and accurately obtain the coordinate data of the measured object surface and realize the non-contact measurement of the object. Compared with traditional measurement methods, 3D scanning measurement technology has many advantages such as fast speed, good real-time performance, and high precision [14]. According to different research objectives, relevant studies on the measurement requirements of 3D data in animal husbandry can be roughly divided into three categories: (i) to improve the breeding quality and precision feeding and to prevent diseases such as claudication; (ii) to analyze the body condition of specific parts; and (iii) to study the acquisition methods of body weight, body dimensions, etc. To generate a 3D cattle body model based on depth image, the point cloud data of cattle can be collected by a Kinect sensor to obtain 3D information of the cattle body to quantify the evaluation of cattle body condition [15]. By using the advantages of 3D depth image, researchers combined a thermal imaging camera and proposed a system for measuring the body shape and temperature of black cattle to complete the periodic quality assessment during the growth of cattle [16]. Some researchers have used a depth camera to obtain the body dimension data of pigs to estimate their weight, and found that the average absolute error of the nonlinear model measured by non-contact depth camera was 40% lower than that of the same nonlinear model measured by hand [17]. A dual web-camera high-resolution system was developed to obtain the 3D position of homologous points in a scene, which can be used to estimate the size and weight of live goats [18].

Due to the wide application of livestock body measurement with 3D data, many processing and analysis methods have been proposed for different purposes. In order to measure pig body dimensions, the random sample consensus (RANSAC) algorithm can be used to remove the background point cloud, and the foreground point cloud can be extracted with Euclidean clustering [19]. Additionally, some researchers have applied spatiotemporal interpolation techniques to remove moving noise in depth images and then detect standing pigs using the undefined depth values around them [20]. Azzaro et al. [21] used linear and polynomial kernel principal component analysis to reconstruct the shapes of cows as a linear combination of basic shapes constructed from an example database, and model validation showed that the polynomial model performed better than other state-of-the-art methods in estimating body condition score (BCS). These successful studies show that it is feasible to use 3D depth sensors to measure cattle body dimensions.

#### *1.2. Applications of Deep Learning*

In recent years, with the sustainable development of the field of artificial intelligence and the continuous improvement of deep learning methods, image classification and recognition technology is developing in a more intelligent direction. At present, image feature extraction is mainly divided into two categories: manual feature extraction and automatically learned feature extraction. The most representative and classical manual features are the scale-invariant feature transform (SIFT) [22] and the histogram of oriented gradients (HOG) [23]. These features are widely used for image classification on small datasets; however, in the case of large datasets, it is difficult to extract appropriate features from images [24]. Therefore, deep learning is commonly used to extract high-level features of images to resolve this problem and reduce the impact of the low-level features of manual feature extraction on image classification performance. At present, there have been many in-depth studies on two-dimensional (2D) image processing [25], and good results have been achieved in classification, recognition and other tasks. However, 3D data processing is still in its infancy. With the advent of 3D sensors such as the IFM O3D303, Microsoft Kinect, Google Tango, and so on, 3D data have grown rapidly, and recognition or classification based on 2D images has some limitations on special occasions. Image processing is developing from 2D to 3D, and the construction of deep learning networks for 3D images will be a hotspot of future research [26].

For 3D data sensed by LiDAR, deep learning techniques have been developed and applied on different occasions. To extract and classify tree species from mobile LiDAR data, deep learning techniques have been used to generate feature abstractions of the waveform representations of trees, which contribute to the improvement of the classification accuracy of tree species [27]. A building detection approach based on deep learning utilizing the fusion of LiDAR data and orthophotos has been presented, and comparison experiments show that the proposed model outperforms support vector machine (SVM) models in the working area [28]. A way to segment individual maize plants from terrestrial LiDAR data has been proposed by combining deep learning and region growing algorithms, and it shows the possibility of solving the individual maize segmentation problem from LiDAR data through deep learning [29].

The development of neural networks for 3D data has gone through the three stages of PointNet [30], PointNet++ [31], and the Kd-network [32]. PointNet mainly resolved the problem of the disorder of PCD: the point cloud features were abstracted point by point, and then the global feature vector was obtained by using a symmetric function. PointNet++ has a hierarchical structure based on PointNet, divides the point cloud into more child point clouds at each level, and uses PointNet to extract the features of each child point cloud. PointNet applied to local point cloud recognition has obvious advantages in accuracy, which is up to 95%. The wide application of convolutional neural networks (CNNs) to 2D images provides the theoretical basis and technical support for the classification of PCD. A team designed and implemented a visualization technology to optimize the CNN model and deeply understand the functions and operations of the intermediate layers, which is the classic study of CNN visualization [33]. The Kd-network can perform fast computation through a 3D indexing structure, where a parameter sharing mechanism is applied and the representations from the leaf nodes to the root are calculated. However, this method needs to sample point clouds and construct Kd-trees for every iteration, and can employ multiple Kd-trees to represent a single object [34].

3D sensing for the non-contact measurement of body conditions works well and effectively reduces the possibility of injuries to animals and inspectors [35]. With the development of deep learning models proposed for PCD, the evolving applications of PCD with deep learning could improve the accuracy of object classification and measurement, and, similarly, the application of transfer learning could refine a deep model for a specific object. Transfer learning is an emerging research direction of machine learning and, unlike deep learning, it is mostly applied in cases of insufficient training data. The prior knowledge acquired by deep learning, namely the training results, can be applied to related recognition fields. Most 3D datasets are not large enough or complete enough to satisfy the demands of all applications, and there is also a lack of sufficient PCD of large livestock to train and verify deep models. Therefore, it is necessary to resolve the problem of the different distributions of the training and test datasets by means of transfer learning [36]. To improve the generalization ability of the deep model, features and related examples can be transferred from a massive dataset to a small dataset [37]. The deep model framework has a practical impact on the actual operation performance and classification accuracy. In this paper, the PyTorch framework was employed after performance comparisons of the mainstream deep learning frameworks, since its application to PCD is more concise and flexible than the TensorFlow framework.

#### *1.3. Main Purposes*

There are two key problems in realizing the automatic measurement of the body dimensions of live Qinchuan cattle through transfer learning: how to preprocess the original PCD and transfer the features of the PCD to obtain the target cattle body, and how to automatically recognize the feature points of the body dimensions. For these problems, the original contributions of this paper can be summarized as follows:


The rest of this paper is organized as follows: Section 2 describes the proposed methods in detail. In Section 3, the automatic extraction experimentation of the feature points of body dimensions was carried out on the collected PCD of several groups of live Qinchuan cattle. Section 4 describes the experimental results. Section 5 discusses the performance of the cattle body dimension measurements. The last section summarizes the considerations, conclusions, and future works.

#### **2. Materials and Methods**

An overview of the proposed methodology is shown in Figure 1. The deep learning network was developed with the Conda software package management system under the PyTorch deep learning programming framework, in an environment of Python 3.6 and CUDA 9.1. The LiDAR sensor IFM O3D303 (IFM Inc., Essen, Germany) was used to acquire the PCD of the cattle body shape. Visual Studio 2015 was combined with the IFM software development suite and the Point Cloud Library (PCL), and the C++ language was adopted to acquire and preprocess the collected PCD. The acquired point cloud dataset of the cattle body shape was then combined with the pre-trained deep learning network for transfer learning. Finally, the recognition of the feature points of the cattle body dimensions was realized, and normalization of the cattle body point cloud and selection of the feature points were carried out to obtain the measurement data of the cattle body dimensions.

**Figure 1.** Flowchart of body dimensions measurement of Qinchuan cattle with transfer learning from LiDAR sensor.

#### *2.1. 3D Point Cloud Deep Learning Network*

The construction of a neural network is influenced by the defects of PCD collection and the characteristics of the point cloud itself. Three problems will be encountered in the acquisition process: data missing, noise, and rotation variability [38]. In addition, the characteristics of 3D PCD also affect the construction of a neural network including unstructured data, invariance of arrangement, and change in point cloud number [39].

To resolve the problems in the acquisition process and the point cloud's own characteristics, in this paper, the Kd-tree network [40] was first used to make the 3D PCD have a fixed representation method and arrangement order. Then, the CNN was used to construct the deep learning network structure to realize the deep learning of the 3D point cloud.

The size of the input PCD in this paper was 2^11 = 2048 points, and there were 16 categories. Therefore, 11 convolution layers were set, each corresponding to a level of the Kd-tree. The deep learning network framework was composed of 11 convolution layers and one fully connected layer. The specific model hierarchy is shown in Figure 2.

**Figure 2.** Network model hierarchy.

Similar to a CNN, the network shares weights within the same level and uses a bottom-up, layered process. In some sense, the representation of a spatial position at a certain layer is obtained through linear and nonlinear operations on the representations of multiple surrounding positions in the previous layer. In the same layer of the Kd-tree, the receptive fields of any two nodes do not overlap [41].

Since the data in deep learning are usually not linearly distributed, an activation function needs to be used to add nonlinear factors [42]. Common activation functions include the Sigmoid function, the Tanh function, and the ReLU function. The ReLU (rectified linear unit) function has been the mainstream neuron activation function in deep learning in recent years. Its mathematical expression is shown in Equation (1). When *x* > 0, the function gradient is identically equal to 1. Therefore, in the process of back propagation, the parameters of the first few layers of the network can be updated quickly, alleviating the problem of the vanishing gradient. Compared with the other two functions, the ReLU function is piecewise linear and non-saturating, which can significantly accelerate the convergence rate of the neural network [43].

$$f(\mathbf{x}) = \max(0, \mathbf{x}) \tag{1}$$

where *x* is the input neuron.

As there is no body shape dataset for animals to train the network, the 3D contour shape dataset of PCD in ShapeNet was selected. The training data contained 16 classes, a total of 15,990 samples of PCD files, which included planes (2421), bags (68), hats (49), cars (1641), chairs (3371), headsets (62), guitars (708), knives (352), lights (1391), laptops (400), motorcycles (181), cups (165), guns (247), rockets (59), skateboards (136), and tables (4739). (data source: [44]).

Different numbers of training epochs have different influences on deep models. Too few training epochs lead to underfitting and very large training errors. When the training epoch is increased to a certain number, the accuracy basically remains unchanged or changes only slightly while consuming more and more computing resources; continuing to increase the number of epochs only wastes more training time and resources [45]. Training epochs of 100, 500, 1000, 2000, and 3000 were therefore selected to train the deep model, and the training results are shown in Figure 3, where the abscissa in each panel is the training epoch and the ordinate is the corresponding accuracy rate. Table 1 compares the average accuracy rate for each number of training epochs. From Figure 3 and Table 1, as the number of training epochs increased, so did the accuracy rate; once the number of epochs reached about 1000, the accuracy changed only slowly. Considering the training time, learning rate, and computational cost, the number of training epochs was set to 1000 for this Kd-network model.

**Figure 3.** Accuracy rates with different training epochs. (**a**) Accuracy rates with 100 training epochs; (**b**) accuracy rates with 500 training epochs; (**c**) accuracy rates with 1000 training epochs; (**d**) accuracy rates with 2000 training epochs; (**e**) accuracy rates with 3000 training epochs.


**Table 1.** Comparison of the accuracy rate with different training times.

The learning rate controls the step size of stochastic gradient descent (SGD) during training, and its value affects the convergence error. If the learning rate is set too large, the error may not converge. If the learning rate is set too small, the training time increases and the optimization may become trapped in a local optimum [46]. Learning rates of 0.001, 0.003, 0.005, 0.007, and 0.009 were used to train the deep network. The training results are shown in Figure 4, where the abscissa is the training epoch and the ordinate is the corresponding accuracy. It can also be seen from Table 2 that when the learning rate was less than 0.003, the average accuracy rate increased with the increase in the learning rate; when the learning rate was over 0.003, the average accuracy rate decreased with the increase in the learning rate. Considering the global SGD behaviour, computing resources, accuracy, and number of samples, the learning rate was set to 0.003 to achieve better training results. With the appropriate training epochs and learning rate selected as described above, the comprehensive accuracy rate on 3D PCD could reach up to 89.6%, showing that the deep model can effectively extract, recognize, and classify 3D PCD.

**Figure 4.** Accuracy rates with different learning rates. (**a**) Accuracy rates with a learning rate of 0.001; (**b**) accuracy rates with a learning rate of 0.003; (**c**) accuracy rates with a learning rate of 0.005; (**d**) accuracy rates with a learning rate of 0.007; (**e**) accuracy rates with a learning rate of 0.009.


**Table 2.** Comparison of accuracy with different learning rates.

#### *2.2. Cattle Body Point Cloud Recognition Based on Transfer Learning*

#### 2.2.1. Data Acquisition and Preprocessing

The LiDAR sensor IFM O3D303, based on the principle of Time of Flight (ToF), was employed to acquire the PCD of the cattle shape. The sensor sends continuous light pulses to the target, receives the light returned by the target, and calculates the flight time of the detected light pulses to obtain the distance to the target [47]. As this industrial depth sensor has a relatively low resolution, the captured PCD is sparse; therefore, point cloud files were stored every five frames so that the accumulated data were dense enough for body shape computation. To capture more concise and more precise PCD, the live cattle to be measured were kept calm by farmers or workers and guided to stand still or walk slowly in front of the LiDAR sensor; the cattle were led through the measuring passage one by one and prevented from trotting or jumping. The original results of the 3D PCD acquisition are illustrated in Figure 5, where the 3D PCD after transformation clearly show the basic contour of the target cattle together with background information.
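For the pulsed ToF principle described above, the distance to the target follows from half the measured round-trip time of the light pulse:

$$d = \frac{c \cdot \Delta t}{2}$$

where *c* is the speed of light and Δ*t* is the measured round-trip time of the pulse (the exact modulation scheme of the O3D303 is not detailed here).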


**Figure 5.** 3D PCD (Point Cloud Data) acquisition for Qinchuan cattle (real specimen) with the LiDAR sensor where the 3D image shows the basic silhouette of the target cattle. (**a**) Shown in RGB; (**b**) Shown in 3D image.

To obtain better results in segmentation, clustering, and feature extraction, the original PCD needs to be preprocessed by a series of classical point cloud processing methods to remove background information and artefacts caused by occlusions, target surface reflections, equipment errors, and so on [48]. According to the different characteristics of each filter, a fusion of different filters was applied to achieve the best filtering results: the conditional filter [49] was employed to remove the background at different distances, the statistical filter [50] to remove outliers from the object surfaces, and the voxel filter [51] to compress the PCD of the cattle. The fused processing results of the multiple filters are shown in Figure 6, where much irrelevant information and many outliers have clearly been eliminated. In Figure 6, the regions marked with yellow circles in the left panel were almost completely removed, leaving a clean PCD of the target cattle shape with a few adherent outliers.

**Figure 6.** Filtering results with multiple filters. (**a**) Original PCD; (**b**) Results with three filters where most noises and outliers were well removed.
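The three filters can be illustrated with simplified NumPy re-implementations; the authors used the PCL implementations in C++, and the parameter values below (distance limit, neighbour count, standard-deviation ratio, voxel size) are illustrative assumptions only.

```python
import numpy as np
from scipy.spatial import cKDTree

def conditional_filter(points, z_max=3.0):
    """Keep only points closer than z_max metres (coarse background removal)."""
    return points[points[:, 2] < z_max]

def statistical_filter(points, k=20, std_ratio=1.0):
    """Drop points whose mean distance to their k nearest neighbours is abnormally large."""
    tree = cKDTree(points)
    dists, _ = tree.query(points, k=k + 1)          # the first neighbour is the point itself
    mean_d = dists[:, 1:].mean(axis=1)
    threshold = mean_d.mean() + std_ratio * mean_d.std()
    return points[mean_d < threshold]

def voxel_filter(points, voxel=0.02):
    """Replace all points falling into the same voxel by their centroid (compression)."""
    keys = np.floor(points / voxel).astype(np.int64)
    _, inverse = np.unique(keys, axis=0, return_inverse=True)
    counts = np.bincount(inverse)
    centroids = np.stack(
        [np.bincount(inverse, weights=points[:, d]) / counts for d in range(3)], axis=1)
    return centroids

def preprocess(points):
    """Fusion of the three filters in the order described in the text."""
    return voxel_filter(statistical_filter(conditional_filter(points)))
```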


For the adherent outliers remaining after preprocessing, clustering segmentation [52] was required to obtain the PCD of a single cattle body. Euclidean-distance-based clustering segmentation [53] can only separate objects lying beyond a certain distance from the target; objects close to the target (such as the ground) are classified into the same cluster and cannot be separated [54]. To separate such clusters of different objects, the RANSAC algorithm [55] is needed for plane extraction: a plane model is used to extract the ground PCD close to the cattle body and to segment a single cattle PCD file. The segmentation results with Euclidean clustering and RANSAC are shown in Figure 7, where only a single cattle body silhouette is preserved in the point cloud file.

**Figure 7.** Segmentation with Euclidean clustering and RANSAC, where most of the background adherent to cattle has been canceled.

Since there is no 3D PCD dataset for animals, all data need to be collected manually, so the amount of data obtained is limited [56]. However, a large amount of training data is needed in the training process of neural networks to prevent overfitting. Therefore, the affine transformation [57] was used to enrich the existing PCD of the cattle body silhouette.

By rotating the PCD at different angles and mirroring the PCD in the horizontal and vertical directions, the PCD of the cattle body can be enriched. As shown in Equation (2), the coordinates (x, y, z) of the PCD are multiplied by the affine transformation matrix *M* to obtain the new coordinates (x', y', z'), through which rotation, mirroring, and other transformations can be realized.

$$\begin{pmatrix} \ x' & y' & z' \end{pmatrix} = M \begin{pmatrix} \ x & y & z \end{pmatrix} \tag{2}$$

The PCD were rotated clockwise by 45◦, 90◦, 135◦, 180◦, 225◦, 270◦, and 315◦ and mirrored through horizontal and vertical transformations. This data augmentation added nine transformed copies of each of the original 251 cattle point clouds, giving a final dataset of 2510 PCD files of cattle bodies. The transformation results for a single PCD of the cattle body silhouette are shown in Figure 8.

**Figure 8.** Transformation results of a single point cloud. (**a**) Rotation result with clockwise 45◦; (**b**) rotation result with clockwise 90◦; (**c**) rotation result with clockwise 135◦; (**d**) rotation result with clockwise 180◦; (**e**) rotation result with clockwise 225◦; (**f**) rotation result with clockwise 270◦; (**g**) rotation result with clockwise 315◦; (**h**) horizontal mirror result; (**i**) vertical mirror result.
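A NumPy sketch of the augmentation described above is given below; the rotation axis, the sign convention for "clockwise", and the mapping of horizontal/vertical mirroring to coordinate axes are assumptions made only for illustration.

```python
import numpy as np

def rotate_about_vertical(points, angle_deg):
    """Rotate a point cloud about the vertical axis; the negative sign gives a clockwise turn."""
    a = -np.deg2rad(angle_deg)
    rot = np.array([[np.cos(a), -np.sin(a), 0.0],
                    [np.sin(a),  np.cos(a), 0.0],
                    [0.0,        0.0,       1.0]])
    return points @ rot.T

def mirror(points, axis):
    """Mirror a point cloud across the plane normal to the given axis (0 = x, 1 = y, 2 = z)."""
    m = np.eye(3)
    m[axis, axis] = -1.0
    return points @ m

def augment(points):
    """Return the nine transformed copies used to enrich one cattle point cloud."""
    copies = [rotate_about_vertical(points, a) for a in (45, 90, 135, 180, 225, 270, 315)]
    copies += [mirror(points, 0), mirror(points, 1)]   # horizontal and vertical mirrors
    return copies
```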

#### 2.2.2. Design of Transfer Learning Network Structure

After the pre-training of the Kd-network, the initial parameters of the 3D deep model were obtained on large-scale ShapeNet network datasets. The spatial feature information of 3D PCD was extracted, and needed to be transferred to recognize the cattle body silhouette. For transfer learning here, the source data and the target data do not need to have the same data distribution [58].

In transfer learning, the PCD dataset of the cattle body was used as input to obtain the output of the convolution layers pre-trained on the Kd-network, and this output was then used to train a fully connected network. The pre-trained convolution layers were connected with the new fully connected layer, and the training of the transfer learning model was then started. During transfer, training and verification take place in each iteration; the loss is back-propagated during training, while the parameters are optimized and the training results adjusted during verification. In each iteration, the loss value and the accuracy are recorded, the model with the highest accuracy is saved, and the next iteration of training is conducted. In transfer learning, the weights of all network layers except the last fully connected layer are frozen and only the fully connected layer is modified, so that the gradients of the frozen layers are not calculated during back propagation, which effectively avoids overfitting and improves training efficiency [59]. Finally, the original 16 outputs were changed into two outputs; that is, the original 16-class classification was changed into a two-class classification: cattle body point cloud and other point clouds.
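In PyTorch, freezing the pre-trained layers and replacing the 16-class head with a 2-class head can be sketched as follows; the checkpoint name and the attribute name of the final fully connected layer (`fc`) are assumptions about the model structure, not details given in the paper.

```python
import torch
import torch.nn as nn

# Load the Kd-network pre-trained on ShapeNet; the checkpoint name is assumed.
model = torch.load("kdnet_shapenet_pretrained.pt")

# Freeze every parameter so the convolutional feature extractor is not updated.
for param in model.parameters():
    param.requires_grad = False

# Replace the final 16-class fully connected layer with a 2-class head
# (cattle body point cloud vs. other point clouds); the attribute name "fc"
# and its input size are assumptions about the model structure.
model.fc = nn.Linear(model.fc.in_features, 2)   # the new head is trainable by default

# Only the parameters of the new head are optimized.
optimizer = torch.optim.SGD(model.fc.parameters(), lr=0.003)
criterion = nn.CrossEntropyLoss()
```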

In the training, 2510 enriched PCD of cattle body were applied for input, and the TrAdaboost algorithm [60] was used to transfer the samples. This algorithm can gradually improve the training weight of target samples in the examples of source datasets according to certain weight rules, reduce the weights of non-target samples, and improve the generalization ability of the model.

#### *2.3. Recognition of Feature Points of Live Qinchuan Cattle Body*

#### 2.3.1. Normalization of Cattle Body Point Cloud

In the process of collecting cattle data, due to the influence of the position and orientation of collecting equipment, the acquired PCD of the cattle body had a different orientation. To extract uniform features, the normalization method [61] was first used to calculate a unified orientation.

The standard measurement coordinate of cattle body was defined as follows: take the center of the mass of PCD of live Qinchuan cattle silhouettes as the coordinate origin, the body length direction of cattle as the *x* axis, the body height direction of cattle as the *y* axis, and the chest width direction of cattle as the *z* axis. The tail direction of the cattle is the positive direction of the *x* axis, and the direction pointing vertically to the ground is the positive direction of the *y* axis. The positive direction of three axes conforms to the right-handed rectangular coordinate system.

Principal component analysis (PCA) [62] was adopted to obtain the coordinate axis of the cattle point cloud and establish a new coordinate system. The processing flow is as follows:

(1) Obtain the center point of the cattle body.

Define the input point set of the cattle body PCD as *P* = {*pi* | *i* = 1, 2, ··· , *n*}, where *n* is the number of points in the set. Then, the center point *pm* of the cattle body can be calculated by Equation (3).

$$p\_m = \frac{1}{n} \Sigma\_{i=1}^n p\_i \tag{3}$$

(2) Calculate the covariance matrix.

The covariance matrix *Cp* of the PCD of cattle body can be calculated by Equation (4) with the center point *pm* of the cattle body.

$$C\_p = \frac{1}{n} \Sigma\_{i=1}^n (p\_i - p\_m)(p\_i - p\_m)^T \tag{4}$$

(3) PCA coordinate system is established according to feature vectors.

Three non-negative eigenvalues, λ0, λ1, and λ2, can be obtained from the covariance matrix *Cp*, and Equation (5) is used to calculate the eigenvectors *e*0, *e*1, and *e*2, where λ*i* is the *i*-th eigenvalue of the covariance matrix *Cp* and *ei* is the corresponding eigenvector of λ*i*.

$$C\_p e\_i = \lambda\_i e\_i, \quad i \in \{0, 1, 2\} \tag{5}$$

The PCA coordinate system is established by taking the obtained center point *pm* of the cattle body as the coordinate origin, and the directions of the obtained characteristic vectors *e*0, *e*1, and *e*2 as the directions of the *x* axis, *y* axis, and *z* axis, respectively.

The PCA coordinate system obtained is ambiguous as to whether the cattle tail points in the positive or the negative direction of the *x* axis. Here, the tail was made to point uniformly in the positive direction of the *x* axis. The orientation of all the cattle body PCD was corrected based on the common-sense observation that the highest point of the front half of the cattle body is higher than that of the rear half [63]. Define the point set *Q*1 as the points of the cattle point set *P* with a positive *x* coordinate after orientation correction, and the point set *Q*2 as the points with a negative *x* coordinate.

$$\begin{array}{l} Q\_1 = \{ p\_i \in P | \mathbf{x}\_i > 0 \} \\ Q\_2 = \{ p\_i \in P | \mathbf{x}\_i < 0 \} \end{array}$$

The point sets *Q*1 and *Q*2 were searched for their maximum values in the *y* axis direction, denoted *q*1 and *q*2, respectively. If *q*1 < *q*2, the cattle body is oriented in the positive direction of the *x* axis and the point cloud of the cattle body needs to be corrected; otherwise, no correction of the point cloud is required. In that case, a mirror transformation of the point set *P* is performed and its symmetric data are obtained by Equation (6) to change the orientation of the cattle body, where (x, y, z) represents the coordinates before the mirror transformation and (x', y', z') represents the new coordinates after it. Finally, the tail of the cattle body in the corrected PCD files points in the positive direction of the *x* axis.

$$\begin{pmatrix} x' & y' & z' \end{pmatrix} = \begin{pmatrix} x & y & z \end{pmatrix} \begin{pmatrix} -1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{pmatrix} \tag{6}$$
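A compact NumPy sketch of this normalization (Equations (3)–(6)) is given below; the ordering of the PCA axes and the handedness of the resulting coordinate system are simplifying assumptions.

```python
import numpy as np

def normalize_orientation(points):
    """Align a cattle point cloud with its PCA axes and correct the tail direction.

    Simplified sketch of Equations (3)-(6): centre the cloud, take the eigenvectors
    of the covariance matrix as the new x/y/z axes, then mirror along x when the
    criterion q1 < q2 described in the text is met.
    """
    center = points.mean(axis=0)                       # Equation (3)
    centered = points - center
    cov = centered.T @ centered / len(points)          # Equation (4)
    eigvals, eigvecs = np.linalg.eigh(cov)             # Equation (5)
    # Sort axes by decreasing eigenvalue: body length (x), body height (y), chest width (z).
    eigvecs = eigvecs[:, np.argsort(eigvals)[::-1]]
    aligned = centered @ eigvecs
    q1 = aligned[aligned[:, 0] > 0, 1].max()           # highest point of the x > 0 half
    q2 = aligned[aligned[:, 0] < 0, 1].max()           # highest point of the x < 0 half
    if q1 < q2:                                        # orientation correction criterion
        aligned[:, 0] *= -1.0                          # mirror along x, Equation (6)
    return aligned
```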

#### 2.3.2. Extraction of the Candidate Areas of Feature Points

To accelerate the selection of feature points in the PCD of the cattle body, candidate regions for the feature points can be extracted. Curvature is a basic characteristic of the surface shape, reflecting the degree of concavity or convexity of the PCD surface, and it is geometrically invariant under translation, rotation, scaling, and other transformations [64]. The mean curvature *H* [65] and the Gaussian curvature *K* [66] were applied to extract the feature points in the candidate regions of the Qinchuan cattle.

From differential geometry, it is known that there are infinitely many normal planes at a point on a surface, and the normal curvature *k* is the curvature of the intersection curve between a normal plane and the surface. The normal curvature *k* describes the degree of bending of the surface in a certain direction, and its maximum *k*1 and minimum *k*2 are the principal curvatures. The normal curvature *k* in any direction can be calculated by the Euler equation, as shown in Equation (7), where θ is the angle between the chosen direction and the principal direction corresponding to the maximum value *k*1.

$$k = k\_1 \cos^2 \theta + k\_2 \sin^2 \theta \tag{7}$$

The mean curvature of a surface is the mean value of the two principal curvatures, as calculated by Equation (8), and indicates the concavity or convexity of the surface. The Gaussian curvature *K* of a surface is the product of the two principal curvatures, as shown in Equation (9), and its sign determines the nature of the points on the surface. When *K* > 0, the points on the surface are elliptic points. When *K* = 0, the points on the surface are parabolic points. When *K* < 0, the points on the surface are hyperbolic points.

$$H = \frac{1}{2}(k\_1 + k\_2) \tag{8}$$

$$K = k\_1 k\_2 \tag{9}$$

The principal curvature [67] of a point on a surface is the representation of the local shape of the point, which is independent of the surface parameters and can be calculated according to the first and second basic forms of the surface. Therefore, the mean curvature and Gaussian curvature can also be obtained by Equations (10) and (11), where *Ix*, *Iy* are the first derivatives along the *x* axis and *y* axis, and *Ixy*, *Ixx*, and *Iyy* are the corresponding second derivatives.

$$H = \frac{\left(1 + I\_x^2\right)I\_{yy} - 2I\_x I\_y I\_{xy} + \left(1 + I\_y^2\right)I\_{xx}}{2\left(1 + I\_x^2 + I\_y^2\right)^{\frac{3}{2}}}\tag{10}$$

$$K = \frac{I\_{\text{xx}} I\_{yy} - I\_{\text{xy}}^2}{\left(1 + I\_{\text{x}}^2 + I\_{\text{y}}^2\right)^2} \tag{11}$$

The mean curvature and Gaussian curvature reflect the concavity and convexity of the surface, and nine combinations can be obtained from the different combinations of their signs, as shown in Table 3. As the sixth combination, *H* = 0 and *K* > 0, is mathematically contradictory, it does not exist, so there are actually eight surface types that can be obtained from the combinations. The surface shapes of the corresponding combinations are shown on the right side of Table 3. Finally, through the calculation of the mean curvature and Gaussian curvature, the cattle point cloud was classified according to the concavity and convexity of the feature points on the point cloud surface, and the candidate areas of the feature points were obtained.
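The sign-based classification can be written as a small lookup; the surface-type names below are the standard ones from curvature-sign (HK) classification and are only illustrative, since Table 3 itself is not reproduced here.

```python
def surface_type(H, K, eps=1e-6):
    """Classify a surface point by the signs of mean (H) and Gaussian (K) curvature."""
    h = 0 if abs(H) < eps else (1 if H > 0 else -1)
    k = 0 if abs(K) < eps else (1 if K > 0 else -1)
    table = {
        (-1,  1): "peak (convex elliptic)",
        (-1,  0): "ridge",
        (-1, -1): "saddle ridge",
        ( 0,  0): "flat",
        ( 0, -1): "minimal surface",
        ( 1,  1): "pit (concave elliptic)",
        ( 1,  0): "valley",
        ( 1, -1): "saddle valley",
    }
    # H = 0 with K > 0 cannot occur, so only eight surface types remain.
    return table.get((h, k), "impossible combination (H = 0, K > 0)")
```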



#### 2.3.3. Feature Point Recognition

The body dimensions to be measured are the height at the withers, chest depth, back height, waist height, and body length, as shown in Figure 9. The points that need to be obtained automatically are the upper point of the height at the withers, the lower point of the chest depth, the upper point of the back height, the upper point of the waist height, and the upper and lower points of the body length, as shown in Figure 9b. Since the height at the withers, chest depth, back height, and waist height are all measured perpendicular to the ground, the lower points of the height at the withers and of the back height are the corresponding ground points. The lower point of the waist height can be obtained from the *y*-axis coordinate of the ground point together with the *x*-axis and *z*-axis coordinates of the corresponding upper point, and the upper point of the chest depth can be obtained from the *x*-axis and *z*-axis coordinates of the lower point of the chest depth, with the highest point in that area taken as the *y*-axis coordinate. Finally, the fast point feature histogram (FPFH) [69] descriptors were used to construct a feature model database for each cattle and to recognize the feature points on the cattle body.

**Figure 9.** Schemes of adult Qinchuan cattle: (**a**) Five body dimensions; (**b**) Positions of feature points to be automatically acquired.
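To make the coordinate reconstruction of the derived points concrete, here is a minimal sketch; the point coordinates are hypothetical, and the *y* axis is taken as the vertical axis, as implied above:

```python
import numpy as np

# Hypothetical recognized feature points (x, y, z), with y the vertical axis.
withers_top = np.array([0.42, 1.31, 0.10])   # upper point of the height at the withers
ground_y = 0.00                              # y coordinate of the ground plane

# Height at the withers: vertical distance from the upper point down to the ground.
height_at_withers = withers_top[1] - ground_y

# Lower point of the waist height: ground y combined with the x and z of the upper point.
waist_top = np.array([0.05, 1.28, 0.12])
waist_bottom = np.array([waist_top[0], ground_y, waist_top[2]])
waist_height = waist_top[1] - waist_bottom[1]

print(height_at_withers, waist_height)
```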

As the recognition process is similar for every feature point, the recognition of the upper point of the height at the withers is described in detail as an example. First, the FPFH values of the points in the candidate areas were calculated one by one, following the order in which they were saved, and then matched against the values in the feature model database. If the difference between the two values is within a certain threshold, the match is successful, the point is identified as the feature point of the height at the withers of the target cattle, and the search and calculation stop. Otherwise, the match fails and the search continues in the saving order until a match succeeds.
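The matching step can be sketched as follows; this is illustrative only: the FPFH descriptors are taken here as plain NumPy vectors, and the `threshold` value and database contents are hypothetical (in the actual system the descriptors were computed with PCL in C++):

```python
import numpy as np

def match_feature_point(candidate_fpfh, database_fpfh, threshold=0.1):
    """Return the index of the first candidate whose FPFH descriptor is within
    `threshold` (Euclidean distance) of the database descriptor, or None."""
    for idx, descriptor in enumerate(candidate_fpfh):      # follow the saving order
        if np.linalg.norm(descriptor - database_fpfh) <= threshold:
            return idx                                     # match found: stop searching
    return None                                            # no candidate matched

# Usage: 33-bin descriptors (the standard FPFH length), drawn at random for illustration.
rng = np.random.default_rng(0)
candidates = rng.random((50, 33))
database_entry = candidates[17] + rng.normal(scale=0.001, size=33)
print(match_feature_point(candidates, database_entry, threshold=0.05))   # 17
```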

#### **3. Results**

After the transfer learning model was trained, it took about 25 s to process the dataset and extract the relevant parameters. The non-contact measurements of the body dimensions were performed in Visual Studio 2015 with PCL 1.8 (Intel i7 eight-core CPU at 3.4 GHz, 64 GB of Gloway DDR4 RAM, 64-bit Windows 10, Nvidia GeForce GTX 950 with 2 GB, CUDA 9.1). The running time of all the C++ code, from acquisition, filtering, and clustering segmentation to reconstruction, was less than five seconds. Figure 10 shows the result of the transfer learning training, with the abscissa representing the training epochs and the ordinate the corresponding accuracy rate. The model was trained for 1000 epochs and the accuracy rate was computed for each epoch. The average accuracy rate of the deep model was 93.6%, a clear improvement over the average accuracy rate obtained before transfer learning.

**Figure 10.** Accuracy rates of the transfer learning training.

To validate the deep model for measuring the body dimensions of live Qinchuan cattle, the 3D PCD of three live Qinchuan cattle were collected at the National Beef Cattle Improvement Center, Ministry of Agriculture and Rural Affairs, Shaanxi Province. The ear tags of the three cattle were Q0392, Q0526, and Q0456. In the experiment, six feature points of the body dimensions were extracted: the upper point of the height at the withers, the lower point of the chest depth, the upper point of the back height, the upper point of the waist height, the upper point of the body length, and the lower point of the body length. The visualized results of feature point recognition for the three cattle are shown in Figure 11, and Figure 12 shows the body dimensions calculated from the automatically acquired feature points of the reconstructed cattle silhouettes.

**Figure 11.** Recognition results of the feature points of three adult Qinchuan cattle: (**a**) Cattle Q0392; (**b**) Cattle Q0526; (**c**) Cattle Q0456.

**Figure 12.** Automatic measurement results of the body dimensions of three live Qinchuan cattle: (**a**) Cattle Q0392; (**b**) Cattle Q0526; (**c**) Cattle Q0456.

#### **4. Discussion**

The body dimension data of cattle Q0392, Q0526, and Q0456 extracted by automatic recognition and by manual interactive extraction, together with their errors, are shown in Table 4. The largest error for cattle Q0392 was in the body length, 2.36%, caused by a large error in the recognition of the body length feature points. For cattle Q0526, the chest depth had the largest error, 1.45%, while the height at the withers had the smallest, only 0.08%. For cattle Q0456, the errors of the height at the withers and the chest depth were relatively large, while the error of the back height was the smallest, only 0.09%.


**Table 4.** The data and error values of the body dimensions of the three cattle (unit: m).

From Table 4, the overall error of the body dimensions obtained with the proposed deep learning model was within 0.03 m and the average error was about 0.02 m. In the calculated body dimension data, the error rate was basically within 1.5%. The larger error in the body length of cattle Q0526 was caused by a larger error in the recognition of the upper point of the body length. The methodology can therefore effectively achieve automatic acquisition of the feature points of cattle body dimensions, avoid manual intervention, simplify the measurement process, and improve measurement efficiency.
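The error rates above presumably follow the usual relative-error formula, with the manual interactive measurement taken as reference; a sketch with made-up numbers (not the values from Table 4):

```python
def dimension_error(automatic_m, manual_m):
    """Absolute error (m) and relative error (%) of an automatic measurement
    against the manual interactive measurement taken as reference."""
    abs_err = abs(automatic_m - manual_m)
    rel_err = 100.0 * abs_err / manual_m
    return abs_err, rel_err

# Hypothetical example: manual body length 1.650 m, automatic 1.627 m.
print(dimension_error(1.627, 1.650))   # about (0.023 m, 1.39 %)
```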

#### **5. Conclusions**

This paper proposed an automatic acquisition method of body dimensions based on deep learning for live Qinchuan cattle. A PyTorch deep-learning framework for 3D point clouds based on a Kd-network was pre-trained on the ShapeNet 3D PCD dataset. The training epochs and learning rate of the Kd-network deep model were analyzed, yielding an average classification accuracy of 89.6%. After the 3D PCD of the cattle was acquired with the LiDAR sensor, preprocessed with a fusion of classical filters, and segmented with classical methods, the PCD of the cattle silhouette was extracted; after data augmentation, the training dataset for transfer learning based on the TrAdaBoost algorithm was created. This dataset, as the target data, was used to train the transfer learning model, reaching a final average accuracy of 93.6%. After normalization and orientation correction of the cattle body, the candidate feature regions were extracted using the mean curvature and Gaussian curvature, and the FPFH descriptors of the candidate regions were extracted and matched. Finally, five linear body dimensions of the cattle were calculated with an error rate within 2.36% and measuring errors within 0.03 m, which basically meets the measurement demands of real production.

Qinchuan cattle are one of the five main beef cattle breeds in China and the most widely raised in northwest China. The proposed method is only applicable to short-haired cattle breeds such as Qinchuan cattle; in our experiments with adult Qinghai yaks, large errors and even gross mistakes occurred. Although some achievements have been made in theory, methods, and application, many challenges remain. In the near future, we will experiment with other beef breeds and crossbred cattle to validate these methods, and will improve the classification accuracy of the deep model by adjusting the network layers and designing convolution filters of different dimensions.

**Author Contributions:** L.H. designed the whole research, edited the English language, and rewrote this paper. S.L. contributed to the whole research. H.G. and Q.R. designed and implemented the measuring system in C++/C# and wrote this manuscript. X.F. designed the fitting model and the correction experimentation of the system and wrote the detection part of this manuscript. Z.H. and S.Q. assisted in gathering all the experimental data during the experiments and in preparing the manuscript. H.W. contributed the effective experimental solution.

**Funding:** This research was funded by the Key Research and Development Project in Ningxia Hui Nationality Autonomous Region, grant number 2017BY067.

**Acknowledgments:** All of the authors thank the research group of Prof. Linsen Zan (Chief Scientist of the National Beef Cattle Improvement Center, Ministry of Agriculture and Rural Affairs, China) for their manual guidance and for holding the Qinchuan cattle during data acquisition.

**Conflicts of Interest:** The authors declare no conflicts of interest.

#### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
