**Thermal Imaging Reliability for Estimating Grain Yield and Carbon Isotope Discrimination in Wheat Genotypes: Importance of the Environmental Conditions**

**Sebastián Romero-Bravo 1,2,\*, Ana María Méndez-Espinoza 2, Miguel Garriga 2, Félix Estrada 2, Alejandro Escobar 2, Luis González-Martinez 2, Carlos Poblete-Echeverría 3, Daniel Sepulveda 4, Ivan Matus 5, Dalma Castillo 5, Alejandro del Pozo 2 and Gustavo A. Lobos 2,\***


Received: 10 April 2019; Accepted: 5 June 2019; Published: 13 June 2019

**Abstract:** Canopy temperature (Tc) by thermal imaging is a useful tool to study plant water status and estimate other crop traits. This work seeks to estimate grain yield (GY) and carbon isotope discrimination (Δ13C) from stress degree day (SDD = Tc − Ta, where Ta is the air temperature), considering the effect of environmental variables such as the averages of the maximum vapor pressure deficit (VPDmax) and the maximum ambient temperature (Tmax), and the soil water content (SWC). For this, a set of 384 and a subset of 16 genotypes of spring bread wheat were evaluated in two Mediterranean-climate sites under water stress (WS) and full irrigation (FI) conditions, in 2011–2012 and 2014–2015, respectively. The relationship between the GY of the 384 wheat genotypes and SDD was negative and highly significant in 2011 (r2 = 0.52 to 0.68), but not significant in 2012 (r2 = 0.03 to 0.12). Under WS, the average GY, Δ13C, and SDD of wheat genotypes growing in ten environments were more associated with changes in VPDmax and Tmax than with the SWC. Therefore, the amount of water available to the plant is not enough information to assume that a particular genotype is experiencing a stress condition.

**Keywords:** remote sensing; phenotype; phenotyping; phenomics; *Triticum aestivum*; water deficit; stress; infrared

### **1. Introduction**

Since the 1960s, crop temperature has been recognized as an indicator of water status [1]. When the plant is facing a water deficit, the stomata begin to close, reducing the transpiratory capacity (evaporative cooling) [2], and this results in an increase in canopy temperature [3–7].

The development of infrared sensors/cameras has allowed a faster characterization of canopy temperatures [8]. At the same time, through computational analysis, it is possible to split the different parts of the image (e.g., soil, air, leaves, weeds), focusing only on the fraction(s) of interest [4,9,10]. Although thermal imaging does not directly measure stomatal conductance, the stomatal response is the main cause of changes in canopy temperature [10], so it is a useful tool to indirectly study the spatial and temporal heterogeneity of leaf/canopy transpiration and the photosynthetic rate [10–12]. Indeed, in bread wheat grown in hot environments in Mexico under irrigation, high correlations have been reported between temperature depression (TD = Ta − Tc) and leaf stomatal conductance (*r* = 0.76–0.85) and grain yield (GY; up to *r* = 0.84) [11,13]. Other researchers have used the concept of stress degree day (SDD), defined as the difference between leaf/canopy temperature (Tc) and air temperature (Ta) (SDD = Tc − Ta), which is equivalent to TD but with positive values, mostly because canopy temperature in rainfed environments is higher than air temperature.

The main problem with the use of thermal assessments to estimate physiological and agronomic traits is that Tc is influenced by several environmental factors, such as air temperature and humidity, wind speed, net radiation, and soil water content [14–17]. Therefore, without detailed information about environmental factors, measurements of Tc are not sufficient to properly perform agronomic or physiological trait estimations.

Unlike under irrigated conditions, a good correlation between Tc and GY is not always expected under water deficit [18]. However, it would be very useful for breeding programs to find such associations in stressful environments, because the focus is on developing drought-tolerant cultivars with higher GY under water-limiting conditions.

It has been established that measurements of carbon isotope discrimination (Δ13C) in wheat are crucial for the selection of individuals with efficient water use, mainly because this parameter is positively correlated with GY and negatively correlated with water-use efficiency (WUE) in moderately water-stressed to non-water-stressed environments [19–25]. The determination of Δ13C is simple and relatively fast but requires expensive equipment or a paid analysis service; attempts have also been made to estimate Δ13C by modeling the canopy spectral reflectance [24,26]. Under non-stressed conditions, the stomata remain open and the substomatal cavity is enriched with 12C relative to the air, since the heavier isotope 13CO2 diffuses more slowly than the lighter 12CO2 [20]. Additionally, the ribulose bisphosphate carboxylase/oxygenase (RuBisCO) carboxylation enzyme of C3 plants has a higher affinity for 12CO2. On the other hand, when stress forces the stomata to close, the proportion of 12CO2 in the substomatal cavity is reduced, thus increasing the amount of fixed 13CO2 [20]. Thus, the daily conditions throughout the season are summarized in the Δ13C of leaves and kernels (calculation details in Section 2.2.1). In this sense, under the climate change scenarios predicted for the coming decades [27], the estimation of Δ13C should be relevant in plant breeding programs oriented to environmental constraints [28–30].

As in all species, the phenotype of wheat plants is controlled by a large number of genes, and the expression of these is modulated, predominantly, in response to the environmental conditions (G×E) [31–33]. Consequently, it was hypothesized that the environmental conditions during and between seasons could interfere with the ability of canopy thermal imaging to estimate GY and Δ13C; in particular, the vapor pressure deficit (VPDmax) and soil water content (SWC) can have a strong influence on canopy temperatures [34]. Therefore, the aim of this work was to study the reliability of thermal imaging for estimating grain yield and carbon isotope discrimination in wheat genotypes growing under water stress (WS) and full irrigation (FI) conditions, considering the relevance of the prevailing environmental conditions for the interpretation of the results.

### **2. Materials and Methods**

### *2.1. Plant Material and Experimental Conditions*

During four growing seasons (2011, 2012, 2014, and 2015), two sets of plant material were evaluated in two Mediterranean environments: (1) Cauquenes (c) (35°58′ S, 72°17′ W; 177 m.a.s.l.) under WS (rainfed) conditions during seasons 2011, 2012, and 2015, and under FI in 2015; and (2) Santa Rosa (sr) (36°32′ S, 71°55′ W; 217 m.a.s.l.) under WS in 2011 and 2015, and FI conditions in 2011, 2012, 2014, and 2015. Each combination of season (year), location (c or sr), and water condition (WS or FI) was considered as an environment.

A collection of 384 advanced lines and cultivars of spring bread wheat (*Triticum aestivum* L.) was evaluated during 2011 and 2012. Plant material originated from three breeding programs: the Instituto de Investigaciones Agropecuarias in Chile (INIA-Chile) (153 genotypes), INIA-Uruguay (178 genotypes), and the International Maize and Wheat Improvement Center (CIMMYT) (53 genotypes). In 2014 and 2015, a subset of 16 genotypes with contrasting tolerance to water deficit was studied.

Each genotype was established in plots of five rows (2.0 × 0.2 m) with a seeding rate of 20 g m−2. Plots were fertilized with 260 kg ha−1 of ammonium phosphate (46% P2O5 and 18% N), 90 kg ha−1 of potassium chloride (60% K2O), 200 kg ha−1 of Sul-Po-Mag (22% K2O, 18% MgO, and 22% S), 10 kg ha−1 of Boronatrocalcita (11% B), and 3 kg ha−1 of zinc sulfate (35% Zn). Fertilizers were incorporated with a cultivator before sowing. During tillering, an extra 153 kg ha−1 of N was applied. Weeds were controlled with a pre-emergence application of flufenacet + flurtamone + diflufenican (96 g a.i.) and a further post-emergence application of MCPA (525 g a.i.) + metsulfuron-methyl (5 g a.i.) [35]; dates of sowing and the main phenological stages are shown in Table 1. Furrow irrigation was used at Santa Rosa, with the WS trials including one irrigation at the end of tillering (Zadoks stage 21, Z21; [36]) and FI comprising irrigations at the end of tillering (Z21), the flag leaf stage (Z37), heading (Z50), and early grain filling (Z71). At Cauquenes, the WS trials were purely rainfed, and the FI trial during 2015 received sprinkler irrigation at Z37, Z50, and Z71. Approximately 50 mm of water was applied during each furrow/sprinkler irrigation.

At each location, soil volumetric water content (m3 m−3) was monitored periodically using 10HS sensors (Decagon Devices, Pullman, WA, USA), scanning the first 50 cm of depth every 4 h. To generate the soil water content (SWC; mm), the volumetric values were multiplied by the soil depth (500 mm). Precipitation (mm), ambient temperature (°C), and relative humidity (%) were monitored hourly by autonomous weather stations (AWSs) belonging to the Red Agroclimática Nacional (National Agroclimatic Network, available at: www.agromet.inia.cl). Vapor pressure deficit (VPD; kPa) was determined hourly from the ambient temperature and relative humidity, according to Reference [37]. For analysis purposes, each environmental variable was studied as follows from sowing to harvest: (1) precipitation: daily summation; (2) ambient temperature: average of the daily maximum temperatures (Tmax); and (3) VPDmax: the VPD was estimated at the highest ambient temperature and the corresponding relative humidity of each day, and then the average of the daily maximum VPDs (VPDmax) was calculated. Because water deficit in Mediterranean environments occurs primarily between anthesis and grain filling, SWC was considered as the average of the daily mean values between anthesis and grain maturity.
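To make the derivation of these variables concrete, the sketch below computes hourly VPD from ambient temperature and relative humidity and scales a volumetric reading into SWC. The paper computes VPD following its Reference [37], whose exact formulation is not reproduced here, so a standard Tetens-type saturation curve is assumed and the numbers are illustrative only.

```python
import numpy as np

def saturation_vapor_pressure(t_air):
    """Saturation vapor pressure (kPa) at air temperature t_air (degrees C),
    using a Tetens-type formula (an assumption; the paper follows its
    Reference [37], whose exact form is not given in the text)."""
    return 0.6108 * np.exp(17.27 * t_air / (t_air + 237.3))

def vpd(t_air, rh):
    """Vapor pressure deficit (kPa) from air temperature (C) and relative humidity (%)."""
    return saturation_vapor_pressure(t_air) * (1.0 - rh / 100.0)

def soil_water_content(theta_v, depth_mm=500.0):
    """SWC (mm): volumetric content (m3 m-3) times the monitored depth (500 mm)."""
    return theta_v * depth_mm

# Daily VPDmax: VPD at the day's highest temperature and its concurrent RH.
t_at_tmax, rh_at_tmax = 25.4, 40.0            # hypothetical hourly readings
print(round(vpd(t_at_tmax, rh_at_tmax), 2))   # ~1.95 kPa
print(soil_water_content(0.35))               # 175.0 mm
```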

Tmax, VPDmax, and SWC are summarized in Table 1 and Figure S1, and rainfall in Figure S2.

### *2.2. Evaluations*

### 2.2.1. Grain Yield and Carbon Isotope Discrimination

Grain yield was evaluated by harvesting the whole plot (2 m2) and was expressed as t ha−1. Carbon isotope composition (δ13C) was determined in mature kernels using an elemental analyzer (ANCA-SL, PDZ Europa, UK) coupled with an isotope ratio mass spectrometer, at the Laboratory of Applied Physical Chemistry at Ghent University (Belgium): δ13C (‰) = [(13C/12C)sample/(13C/12C)standard − 1] × 1000 [20], where the 13C/12C ratio of the sample refers to the plant material and that of the standard is calibrated against the international standards from Iso-Analytical (Crewe, Cheshire, UK). The carbon isotope discrimination (Δ13C) of kernels was calculated as: Δ13C (‰) = (δ13Ca − δ13Cp)/[1 + δ13Cp/1000], where a and p refer to the air and the plant, respectively [20]. δ13Ca of the air was taken as −8.0‰.
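A minimal sketch of the Δ13C calculation above; the kernel δ13C value here is hypothetical and used only to show the arithmetic.

```python
def carbon_isotope_discrimination(delta_air=-8.0, delta_plant=-26.0):
    """Kernel carbon isotope discrimination (per mil) from the formula in
    Section 2.2.1: D13C = (d13Ca - d13Cp) / (1 + d13Cp / 1000).
    delta_air = -8.0 per mil follows the paper; delta_plant is a
    hypothetical kernel d13C value used only for illustration."""
    return (delta_air - delta_plant) / (1.0 + delta_plant / 1000.0)

print(round(carbon_isotope_discrimination(), 2))  # 18.48 per mil
```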

### 2.2.2. Thermography

Thermal infrared images were taken using a portable infrared camera (i40, FLIR Systems, Sweden) at the soft dough (Z85) phenological stage. This camera provides images of 120 × 120 pixels (each pixel holding a temperature value) and has an uncooled infrared detector (microbolometer) operating in the spectral range from 7.5 to 13 microns. Infrared images were taken within ±2 h of solar noon (12:00 to 16:00 h), at a position of 45° from the horizontal, 0.5 m above the plant canopy, and at a 3 m distance from the plot. Images were filtered using a process of interactive segmentation to exclude foreign matter (i.e., soil, weeds, neighboring plots, and air) using custom MATLAB code [38]. To avoid noise from surrounding plots, only the center of the image (30 × 30 pixels) was analyzed, using a temperature frequency histogram (percentile levels). The coldest and hottest pixels were eliminated, taking the 1st and 97.5th percentiles as the lower and upper thresholds, respectively. The remaining pixels were used to calculate the average canopy temperature (Tc), while the air temperature (Ta) was recorded from the AWS at the precise time the image was taken. Finally, Tc and Ta were used to calculate the SDD (°C) [39,40].
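The pixel-filtering pipeline can be summarized as follows. The study used custom MATLAB code [38]; this Python re-sketch, with a synthetic image standing in for an i40 frame, only mirrors the steps described above.

```python
import numpy as np

def canopy_temperature(thermal_img, crop=30, p_low=1.0, p_high=97.5):
    """Average canopy temperature (Tc) from a 120 x 120 thermal image,
    mimicking Section 2.2.2: keep the central crop x crop window, then
    drop pixels below the 1st and above the 97.5th percentile."""
    h, w = thermal_img.shape
    r0, c0 = (h - crop) // 2, (w - crop) // 2
    center = thermal_img[r0:r0 + crop, c0:c0 + crop].ravel()
    lo, hi = np.percentile(center, [p_low, p_high])
    kept = center[(center >= lo) & (center <= hi)]
    return kept.mean()

# SDD (degrees C) = Tc - Ta, with Ta read from the AWS at image time.
rng = np.random.default_rng(0)
img = 20.0 + 5.0 * rng.random((120, 120))   # synthetic stand-in for a frame
t_air = 24.0
print(round(canopy_temperature(img) - t_air, 2))
```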

### *2.3. Statistical Design and Data Analysis*

The experimental design for the trials at Cauquenes and Santa Rosa in seasons 2011 and 2012 was an alpha-lattice with two replicates; for this study, just one replicate (*n* = 384 genotypes) was assessed by thermography in each trial. For seasons 2014 and 2015, the experimental design was a randomized block design with four replicates (16 genotypes; *n* = 64).

Correlations (x versus y) were performed through regression analysis: (1) genotype values: SDD versus GY and Δ13C; (2) environmental values: SDD, Tmax, VPDmax, and SWC versus GY and Δ13C; and (3) environmental values: VPDmax, Tmax, and SWC versus SDD.

Using the environmental (Tmax, VPDmax, and SWC), phenological (days between stages), physiological (Δ13C and SDD), and productive (GY) information (Table 1), a clustering analysis was performed to verify whether the two water regimes evaluated (i.e., FI and WS) grouped together, within and between seasons and locations, which is important for modeling and validation procedures. Hierarchical clustering was used to group the different environments studied, a group being defined as a set of elements with a greater degree of similarity among its members than with elements outside the set [41]. The steps performed in the analysis were the following: obtaining the data; eliminating the columns that did not provide information to the grouping model; normalizing the data; applying hierarchical clustering with the "ward.D2" linkage method [42]; and plotting the Euclidean distances between elements as a dendrogram. For the grouping itself, a tree cluster was considered, which uses the Euclidean distance to identify the closeness of the nodes (environmental data points). In addition, the principal component analysis (PCA) method was applied to display the results more clearly [41]. The "ward.D2" method was set to find two and three main data groups.
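The clustering itself was run in R (see below); an equivalent sketch in Python is given here, since scipy's "ward" linkage on z-scored data corresponds to R's hclust with the "ward.D2" method on Euclidean distances. The numeric values in the example frame are illustrative stand-ins, not the Table 1 entries.

```python
import pandas as pd
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.stats import zscore

# Hypothetical environment-level means; the real inputs are the Table 1 values.
env_df = pd.DataFrame(
    {"Tmax":   [25.4, 21.5, 19.1, 23.4],
     "VPDmax": [2.39, 1.51, 1.35, 1.80],
     "SWC":    [381.7, 399.0, 542.7, 507.8],
     "SDD":    [12.3, -1.7, 0.5, 6.4],
     "GY":     [1.68, 8.03, 9.90, 5.50]},
    index=["WSc2011", "FIsr2012", "FIsr2014", "WSsr2011"])

X = env_df.apply(zscore)            # normalize each variable
Z = linkage(X, method="ward")       # Ward linkage on Euclidean distances,
                                    # matching R's hclust(..., "ward.D2")
groups = fcluster(Z, t=3, criterion="maxclust")   # cut into three groups
print(dict(zip(env_df.index, groups)))
```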

All statistical analysis was performed using R 3.0.0 [43].

### **3. Results**

### *3.1. Environmental Conditions, Grain Yield, Carbon Isotope Discrimination, and Stress Degree Days*

In general terms, the environmental conditions (Tmax, VPDmax, and SWC) varied according to the seasons, both within and between FI and WS conditions (Table 1, Figures S1 and S2). Under each water supply condition, the minimum and maximum values from sowing to harvest were (Table 1): Tmax, 19.1 and 23.5 °C (FI) and 19.1 and 25.4 °C (WS); VPDmax, 1.35 and 1.92 kPa (FI) and 1.35 and 2.39 kPa (WS); and SWC, 198.3 and 542.7 mm (FI) and 180.4 and 418.8 mm (WS).


> **Table 1.** Dates of sowing, anthesis, grain filling, and harvest, and mean values of grain yield (GY), carbon isotope discrimination in kernels (Δ13C), and stress degree day (SDD = Tc − Ta), determined at the soft dough stage (Z85), for wheat genotypes grown under full irrigation (FI) and water stress (WS) conditions, at Cauquenes and Santa Rosa.

Grain yield under WS conditions was 45% lower than under FI (Table 1). Also, the range of variation among seasons was much greater under WS (1.68–8.13 t ha−1) than under FI (8.03–9.9 t ha−1). The Δ13C data showed lower values (by 10.5%) and higher variability under WS conditions compared to FI conditions (Table 1). The average SDD was much higher (5.3-fold) under WS and had greater variability compared to FI conditions (Table 1).

### *3.2. Relationships between Grain Yield and Canopy and Ambient Temperatures in 384 Wheat Genotypes*

The relationship between GY and SDD of the 384 genotypes was negative and highly significant in 2011 (r2 = 0.52–0.68; *p* < 0.001) (Figure 1A). However, when SDD was compared with Δ13C, the determination coefficients (r2) were significant only in FIsr and WSsr (0.22 and 0.32, respectively), but not in WSc (Figure 1C). During the second season, r2 values were much lower and not significant for both GY (r2 = 0.03–0.12; *p* > 0.05) (Figure 1B) and Δ13C (r2 = 0.0002–0.04; *p* > 0.05) (Figure 1D). In terms of environmental conditions, the two seasons showed important differences; Tmax (°C) values were higher in 2011 (WSc = 25.4, FIsr and WSsr = 23.4) than in 2012 (WSc = 20.6 and FIsr = 21.5). Consequently, VPDmax (kPa) in 2011 (WSc = 2.4, FIsr and WSsr = 1.8) was higher than in 2012 (WSc and FIsr = 1.5). In the case of SWC (mm), the values in 2011 (WSc = 381.7, FIsr = 550.7, and WSsr = 507.8) exceeded those recorded in 2012 (WSc = 256.9 and FIsr = 399.0). Also, SDD (°C) differed between seasons, being higher in 2011 (WSc = 12.3, FIsr = 1.8, and WSsr = 6.4) than in 2012 (WSc = 2.2 and FIsr = −1.7).

**Figure 1.** Relationship between the stress degree day (SDD = Tc − Ta, where Tc is crop temperature and Ta is air temperature, both measured at the soft dough stage (Z85)) and grain yield or carbon isotope discrimination in kernels, for 384 spring bread wheat genotypes grown under two water regimes (full irrigation (FI) and water stress (WS)) in two locations (Santa Rosa (sr) and Cauquenes (c)), during the 2011 ((**A**,**C**), respectively) and 2012 ((**B**,**D**), respectively) seasons. Regression lines and equations are presented for each water regime and location (determination coefficients are also included).

### *3.3. Environmental Effects on Grain Yield, Carbon Isotope Discrimination, and Stress Degree Day*

The average GY of wheat genotypes under FI and WS conditions indicated different responses to environmental variables (Figure 2). Under WS conditions, GY decreased exponentially as SDD, Tmax, and VPDmax increased, which was not the case under FI conditions (Figure 2A–C). Similarly, Δ13C also decreased as SDD, Tmax, and VPDmax increased (Figure 2E–G). No significant relationships were found between SWC and GY or Δ13C (Figure 2D,H).

**Figure 2.** Average grain yield (**A**–**D**) and carbon isotope discrimination in kernels (Δ13C; **E**–**H**) of wheat genotypes growing in ten environments, in relation to the stress degree day (SDD = Tc − Ta, where Tc is crop temperature and Ta is air temperature, both measured at the soft dough stage Z85; **A**,**E**), the seasonal averages of daily maximum temperature (Tmax; **B**,**F**) and maximum vapor pressure deficit (VPDmax; **C**,**G**), and the soil water content between 0 and 50 cm depth (SWC; **D**,**H**). The environments corresponded to the water regime applied (full irrigation (FI) and water stress (WS)), the trial location (Santa Rosa (sr) and Cauquenes (c)), and the growing seasons (2011, 2012, 2014, and 2015); the trial code is a combination of these factors. Regression lines and equations are presented for each water regime (determination coefficients are also included).

In relation to the environmental conditions during the study, when all the environments were combined (Table 2), SDD was correlated only with Tmax (*r* = 0.64; *p* < 0.05) and VPDmax (*r* = 0.80; *p* < 0.01). When the water regimes were considered separately, close and significant relationships between SDD and both Tmax and VPDmax were found only in plants growing under WS conditions (Figure 3). The relationship between SDD and SWC was not significant under either WS or FI conditions (Figure 3C).

Pearson correlation analysis (Table 2) showed that mean values of GY in the ten environments were highly correlated with SDD (*r* = −0.81; *p* < 0.01) and Δ13C (*r* = 0.92; *p* < 0.01). Also, Δ13C was negatively correlated with SDD (*r* = −0.71; *p* < 0.05). In concordance with this, GY and Δ13C were primarily affected by Tmax and VPDmax but not by SWC (Figure 2B,F; 2C,G; and 2D,H, respectively).
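Correlations of this kind can be reproduced from the environment-level means; a minimal sketch, reusing the hypothetical env_df from the clustering example in Section 2.3 (with the real data, n = 10 environments per pair).

```python
from scipy.stats import pearsonr

# env_df: environment-level means (see the clustering sketch in Section 2.3).
for trait in ["SDD", "Tmax", "VPDmax", "SWC"]:
    r, p = pearsonr(env_df[trait], env_df["GY"])
    print(f"GY vs {trait:<6}: r = {r:+.2f}, p = {p:.3f}")
```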

**Table 2.** Pearson's correlation matrix for stress degree day (SDD = Tc − Ta, where Tc is crop temperature and Ta is air temperature, both measured at the soft dough stage Z85), grain yield (GY), carbon isotope discrimination in kernels (Δ13C), and seasonal averages of daily maximum temperature (Tmax), maximum vapor pressure deficit (VPDmax), and soil water content between 0 and 50 cm depth (SWC; between anthesis and grain maturity). Data from the water regimes applied (full irrigation and water stress), the trial locations (Santa Rosa and Cauquenes), and the evaluated seasons (2011, 2012, 2014, and 2015).


\* Statistically significant (*p* < 0.05); \*\* Highly statistically significant (*p* < 0.01).

**Figure 3.** Relationships between the stress degree day (SDD = Tc − Ta, where Tc is crop temperature and Ta is air temperature, both measured at the soft dough stage Z85) and the seasonal averages of daily maximum vapor pressure deficit (VPDmax; (**A**)), maximum temperature (Tmax; (**B**)), and soil water content between 0 and 50 cm depth (SWC; (**C**)). Mean values were the average of all genotypes growing in each particular environment according to the water regime (full irrigation (FI) and water stress (WS)), the trial location (Santa Rosa (sr) and Cauquenes (c)), and the growing seasons (2011, 2012, 2014, and 2015). Regression lines and equations are presented for each water regime (determination coefficients are also included).

Finally, when the environmental, phenological, physiological, and productive information (Table 1) was included to generate the two- and three-group cluster dendrograms and plots (Figure 4A,B and Figure 4C,D, respectively), the main difference was found in the division of the first branch of the less stressful environments (green lines in Figure 4A) at the height of the first node of the most stressful environments (origin of the blue and green lines in Figure 4B). Differences between the cluster plots followed the changes in the cluster dendrograms; the three groups were (Figure 4B): (i) lowest environmental limitations: FIsr 2012, FIsr 2014, FIsr 2015, and WSsr 2015; (ii) intermediate environmental limitations: WSc 2012, FIc 2015, and WSc 2015; and (iii) highest environmental limitations: WSc 2011, FIsr 2011, and WSsr 2011. The cluster plot, which explains 61.3% of the variance (Figure 4C), shows a clear separation between the groups with the lowest and the highest environmental constraints (green and red colors in Figure 4, respectively). In the three-group cluster plot, two of the groups overlap. However, even though WSc 2011, FIsr 2011, and WSsr 2011 had the highest SWC values, they also presented, on average, the greatest VPDmax, Tmax, and SDD but the lowest GY (Table 1).

**Figure 4.** Cluster dendrogram ((**A**) two groups and (**B**) three groups) and plot ((**C**) two groups and (**D**) three groups) for the general characterization of the assessed environments according to the water regime applied (full irrigation (FI) and water stress (WS)), the trial location (Santa Rosa (sr) and Cauquenes (c)), and the evaluated season (2011, 2012, 2014, and 2015); the trial code is a combination of these factors. Data included the phenological (dates), productive (grain yield—GY), physiological (carbon isotope discrimination in kernels (Δ13C) and the stress degree day measured at the soft dough stage Z85 (SDD)), and environmental information (seasonal averages of daily maximum temperature and maximum vapor pressure deficit, and the soil water content between 0 and 50 cm depth). In the case of GY, Δ13C, and SDD, the mean values analyzed were the average of all genotypes growing in the particular environment.

### **4. Discussion**

### *4.1. Environmental Effects on Grain Yield and Carbon Isotope Discrimination*

Tolerance to WS usually implies some improvement or maintenance of metabolic processes that enables the plant to regulate cell water status and maintain leaf turgor under stressful conditions. One of the first mechanisms involved in reducing water loss by transpiration is stomatal control, which partially closes the stomata, thus affecting carbon assimilation and storage [44]. This gas exchange limitation between the atmosphere and the substomatal cavity is primarily driven by the surrounding environmental conditions (e.g., water availability, ambient temperature, relative humidity, wind speed, light intensity). To the extent that the diffusion of CO2 through the stomata becomes more restricted, the carbon isotope discrimination (Δ13C) between 12C and 13C will also be reduced, increasing the proportion of fixed 13C [24]. Therefore, in a particular environment, Δ13C at the grain level provides an integrated assessment of the transpiration efficiency during the whole season [25,45]. As in other cereal studies [24,35,46–49], GY and Δ13C in the current work had a strong and positive association (*r* = 0.92) (Table 2). Additionally, the evaluated environmental conditions generated similar responses in GY and Δ13C (Figure 2), reaffirming the strong relationship between these traits.

The SWC has also been used as an indicator of water stress in plants and is positively related to GY in wheat [25,50–52]. Working in the same species, Reference [53] evaluated the effect of water content in different soil profiles, concluding that plants in soil well watered throughout the first 50 cm of the profile achieved a greater yield and harvest index than plants in soil with dry upper layers. In the present work, even though FI environments always showed higher SWC than WS in the same location and season (Figure S2A–D), the results did not show a significant correlation between SWC and GY (Table 2). Moreover, while GY and Δ13C were both higher under FI than under WS (Figure 2D,H), neither GY nor Δ13C was affected by increases in SWC (~200 to 550 mm) under either FI or WS.

Contrastingly, the environmental water demand (VPD; [54]), which is mainly driven by the ambient temperature and relative humidity, proved to influence both GY and Δ13C under WS but not under FI (Figure 2C,G). The combination of high temperatures and low relative humidity, which is frequently encountered in the late stages of the growing season in Mediterranean climates (e.g., Santa Rosa and Cauquenes), caused an increase in the VPD. References [55,56] assessed the effect of environmental variables on wheat physiology and GY, proposing that a high-VPD environment varies between 2.5 and 3.9 kPa. Therefore, the average values of VPDmax found in the present study (1.35 to 2.39 kPa; Figure 2C,G) could be considered moderately low to moderately high, although maximum values reached as high as 6.34 kPa in WSc in 2011 (Figure S1B).

Several studies have shown that growing cereals under non-limiting water conditions but with high VPD values leads to reduced GY and Δ13C [21,55–57]. The present work, considering all measurements performed, shows a non-significant relationship between GY and VPDmax (Table 2). When FI and WS were analyzed separately, only the genotypes growing under WS showed lower GY and Δ13C as VPDmax increased (Figure 2C,G), and VPDmax had a stronger association with GY than with Δ13C (Table 2).

Likewise, the trends for Tmax were similar to those for VPDmax (Figure 2B,F). However, despite Tmax and VPDmax being relatively low (20.6 °C and 1.51 kPa, respectively) in the WSc 2012 trial, the lowest SWC (256.9 mm) was registered there, especially after anthesis, and this generated a low GY (3.18 t ha−1) (Table 1 and Figure S2B). On the other hand, the WSc 2011 trial had a relatively adequate SWC (320 mm) between anthesis and grain filling (Figure S2B), but due to the late sowing date (Table 1), the plants were exposed to higher Tmax and VPDmax (25.4 °C and 2.39 kPa, respectively) (Table 1), reaching ~40 °C and ~6 kPa, respectively (Figure S1A,B), resulting in this trial having the lowest GY (1.68 t ha−1).

Under high VPD, guard cell turgor may be decreased by direct evaporative losses from the guard cells and/or a decreased water supply to the guard cells if the root or shoot hydraulic conductance is limiting [58], and this has a detrimental effect on plant production. Reference [56] tested the effect of VPD and ambient temperature on gas exchange and GY in wheat, finding that environments with high VPD (3.9 kPa) and high temperature (36 °C) increased respiration by up to 22% and decreased photosynthetic water-use efficiency by up to 64% compared to environments with high temperature and lower VPD (1.5 kPa). Indeed, environments with high VPD and temperature caused a reduction in leaf area and net assimilation of CO2; however, for plants under the same conditions but without water restriction, there was no decrease in GY. The same authors showed that GY was reduced by 7% in environments with water stress compared to no stress, which is concordant with the findings of the present study, where at similar VPDmax and Tmax (between 1.5 and 2 kPa, and between 21 and 24 °C, respectively), plants grown under WS showed lower GY than plants under FI (Figure 2B,C). In FI conditions, the plants had a GY higher than 8 t ha−1, while in WS environments, the GY never exceeded that threshold; an exception was WSc 2015 (8.1 t ha−1), which was influenced by an abnormally rainy season ("El Niño" phenomenon; Figure S2D).

### *4.2. The Potential of Stress Degree Day to Estimate Grain Yield*

Stomatal closure causes a decrease in the transpiration rate and, as a consequence, a reduction in the cooling effect, which finally increases leaf/canopy temperature [10]. The reduction in stomatal conductance could be a consequence of the limited capacity of the roots to absorb enough water to meet the atmospheric water demand [6]. Numerous studies have confirmed that canopy temperature is associated with crop yield [59–61], as well as with a series of physiological characteristics, including stomatal conductance [11], the water status of the plant [59], and the presence of deep roots.

In general, the present study establishes a negative and highly significant correlation (*r* = −0.81) between SDD and GY (Table 2). Analysis of the responses according to water regime (FI or WS) indicated that there was no significant relationship between GY and SDD in plants growing under FI conditions, but under WS conditions the correlation was moderately high (r2 = 0.59) (Figure 2A). The Δ13C followed the same pattern, but with lower determination coefficients (FI = 0.17 and WS = 0.34). When Reference [12] studied the relationship between GY and Tc in wheat genotypes grown with similar water regimes (FI and WS), they also found a stronger association under WS (r2 = 0.66) than FI (r2 = 0.58).

Despite similarities between the studies described above, there are also contradictory results for the relationship between SDD and GY. For example, References [62,63] found no significant relationships, whereas Reference [11] found a high and significant association in irrigated environments. These differences could be explained by the lower VPD registered in the studies of References [62,63] (~2.4 kPa) compared to that of Reference [11] (~5.5 kPa), with the latter allowing a greater expression of the tolerance of each genotype to the environmental conditions. Although in the present work there were trials that reached a VPDmax of 6.5 kPa (WSc 2011) (Figure S1), the seasonal averages were ~2.4 kPa.

Therefore, the low SDD values of plants growing under FI are likely due to their ability to meet the water demand of the air (VPD), thus maintaining a high transpiration rate and allowing the plants to cool their leaves; under this condition there is more CO2 fixation, explaining the higher yields under FI. In this kind of environment, where soil water availability is enough to compensate for the VPD, the plants do not need to express their water-deficit tolerance mechanisms, which results in lower dispersion of the SDD versus GY data and, consequently, lower coefficients of determination.

Similar to GY and Δ13C, SDD was more sensitive to VPDmax and Tmax than to SWC, with the WS environments having the largest effect on plant temperature. Interestingly, an SDD of 2 °C seems to be the threshold between FI and WS environments (Figure 2A,E); SDD averages under FI were lower than 2 °C, while under WS they were greater than 2 °C.

When the relationship between SDD and GY was studied in individual genotypes under contrasting environments (seasons 2011 and 2012), the association (r2 values) between these two variables depended on the environment. While in 2011 the relationships in FIsr, WSsr, and WSc were negative and moderately high (r2 = 0.52, 0.59, and 0.68, respectively) (Figure 1A), the FIsr and WSc relationships in 2012 were low (r2 = 0.03 and 0.12, respectively; WSsr was not sown in 2012) (Figure 1B). As seen before from the average values per environment, the best determination coefficients were observed in the more stressful conditions (FIsr < WSsr < WSc), likely associated with the higher trait range during the first season; FIsr 2011, WSsr 2011, and WSc 2011 had a higher SDD data dispersion than FIsr 2012 and WSc 2012 (Figure 1 and Figure S1E).

Except for the WSsr 2011 trial, the minimum SDD values of 2011 corresponded, approximately, to the maxima registered during 2012. As already explained, GY is influenced by SDD (Figure 2A,E), which in turn depends on VPDmax and Tmax (Figure 3A,B). In this sense, although there was less SWC during the second season, Tmax and VPDmax were also lower, reaffirming that these two variables have a more significant impact on the transpiratory and cooling capacity than the SWC itself.

Finally, because the main differences in the dendrograms between the two- and three-environment groupings were found in the division of the first branch of the less stressful environments (green lines in Figure 4A) at the height of the first node of the most stressful environments (origin of blue and green lines in Figure 4B), it is logical to think that there were at least three environmental conditions across the trials. In order to establish the possible differences between the superimposed groups (lower and intermediate environmental constraints; green and blue lines in Figure 4D), each trait average was contrasted (percentage of change) with the respective value in the group with the highest environmental limitations (2011 data; red line in Figure 4D). Using this as a form of normalization, it was possible to establish which traits had the largest and smallest differences between the superimposed groups; GY varied by 52.4%, while SDD, Δ13C, Tmax, VPDmax, and SWC varied only within the range of 9.77% to 15.1% (data not shown), these being traits that probably shared similar spatial coordinates in the cluster plot (Figure 4B).

For modeling purposes, it would then be desirable to stop grouping collected data according to treatment codes (e.g., FI and WS, defined only by the amount of applied water) and start associating them with the environmental conditions prevailing during the season (e.g., VPDmax). For an adequate estimation of GY and carbon isotope discrimination by thermal imaging, more complex models (e.g., tree-based neural networks) will then be needed, allowing identification of the "type of environment" to which the collected data belong, so that the appropriate model can be generated and applied. Thus, a deeper environmental characterization would allow the development of models with better fit and consistency between years.

### **5. Conclusions and Future Perspectives**

The ability to predict GY through the use of thermal images is highly variable and will not only depend on the amount of water stored in the soil profile, but also on other environmental variables such as VPDmax and Tmax. To the extent that better environmental characterization can be achieved, an objective and integral classification of the assessed environment should then be possible. Because the environmental information usually originates from a standard AWS, characterization of the environment at the canopy level or in the first few centimeters above it would also be an important consideration. This would help to generate models with better predictive capacity, thus improving the consistency between seasons.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1424-8220/19/12/2676/s1, Figure S1: Daily maximum temperature from sowing to harvest (**A**); daily maximum VPD from sowing to harvest (**B**); grain yield (**C**); carbon isotope discrimination (Δ13C) in kernels (**D**); and canopy temperature − air temperature (SDD) at the early dough stage of grain filling (Zadoks Z83) (**E**), for genotypes of wheat grown under full irrigation (FI) and water stress (WS) at Santa Rosa (sr) and Cauquenes (c) during four growing seasons (2011, 2012, 2014, and 2015). Box and whiskers show minimum, 25th percentile, median, mean, 75th percentile, and maximum values. Open symbols represent outlier data; Figure S2: Soil water content between 0 and 50 cm depth and precipitation according to the water regime applied (full irrigation (FI) and water stress (WS)), the trial location (Santa Rosa (sr) and Cauquenes (c)), and the evaluated seasons: 2011 (**A**), 2012 (**B**), 2014 (**C**), and 2015 (**D**); the trial code is a combination of these factors. Bars represent the precipitation and the arrows the phenological stages at Santa Rosa (orange) and Cauquenes (black). Solid arrows indicate anthesis and dashed arrows the early dough stage (Z85 from the Zadoks scale).

**Author Contributions:** S.R.-B., A.d.P., and G.A.L. contributed to the conception and design of the work. S.R.-B., A.M.M.-E., M.G., F.E., A.E., L.G.-M., C.P.-E., D.S., A.d.P., and G.A.L. performed acquisition, analysis, and interpretation of the data. D.C. and I.M. were in charge of the management of the field experiments and evaluation of agronomic traits. S.R.-B., A.d.P., and G.A.L. collaborated to generate and validate the version for publication.

**Funding:** This work was supported by the National Commission for Scientific and Technological Research CONICYT (Beca doctorado nacional 21130514, FONDECYT Nº1110678, FONDEF IDEA 14I10106, and 14I20106, PAI 781413006) and the Universidad de Talca, Chile (Beca doctorado and the research programs "Adaptation of Agriculture to Climate Change-A2C2" and "Núcleo Científico Multidisciplinario").

**Conflicts of Interest:** The authors declare that the work and publication was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **UAV-Borne Dual-Band Sensor Method for Monitoring Physiological Crop Status**

### **Lili Yao, Qing Wang, Jinbo Yang, Yu Zhang, Yan Zhu, Weixing Cao \* and Jun Ni \***

National Engineering and Technology Center for Information Agriculture, Key Laboratory for Crop System Analysis and Decision Making, Ministry of Agriculture, Jiangsu Key Laboratory for Information Agriculture, Nanjing Agricultural University, Nanjing 210095, Jiangsu, China; 2017201083@njau.edu.cn (L.Y.); 2016101037@njau.edu.cn (Q.W.); 2018101166@njau.edu.cn (J.Y.); zhangyu@njau.edu.cn (Y.Z.); yanzhu@njau.edu.cn (Y.Z.)

**\*** Correspondence: caow@njau.edu.cn (W.C.); nijun@njau.edu.cn (J.N.); Tel./Fax: +86-25-8439-6593 (J.N.)

Received: 29 December 2018; Accepted: 14 February 2019; Published: 17 February 2019

**Abstract:** Unmanned aerial vehicles (UAVs) equipped with dual-band crop-growth sensors can achieve high-throughput acquisition of crop-growth information. However, the downwash airflow field of the UAV disturbs the crop canopy during sensor measurements. To resolve this issue, we used computational fluid dynamics (CFD), numerical simulation, and three-dimensional airflow field testers to study the UAV-borne multispectral-sensor method for monitoring crop growth. The results show that when the flying height of the UAV is 1 m from the crop canopy, the generated airflow field on the surface of the crop canopy is elliptical, with a long semiaxis of about 0.45 m and a short semiaxis of about 0.4 m. The flow-field distribution results, combined with the sensor's field of view, indicated that the support length of the UAV-borne multispectral sensor should be 0.6 m. Wheat test results showed that the ratio vegetation index (RVI) output of the UAV-borne spectral sensor had a linear-fit coefficient of determination (R2) of 0.81 and a root mean square error (RMSE) of 0.38 compared with the ASD FieldSpec HandHeld 2 spectrometer. Our method improves the accuracy and stability of the measurement results of the UAV-borne dual-band crop-growth sensor. Rice test results showed that the RVI value measured by the UAV-borne multispectral sensor had good linearity with leaf nitrogen accumulation (LNA), leaf area index (LAI), and leaf dry weight (LDW); R2 was 0.62, 0.76, and 0.60, and RMSE was 2.28, 1.03, and 10.73, respectively. Our monitoring method could be well applied to UAV-borne dual-band crop-growth sensors.

**Keywords:** CFD; airflow field test; monitoring method; spectral sensor; crop growth

### **1. Introduction**

Accurate management of crop water and fertilizer in crop fields is an important prerequisite for ensuring high yield and quality of crops, sustainable use of cultivated land, and healthy development of the environment [1]. High-throughput, accurate, and real-time acquisition of crop-growth information is an important basis for the accurate management of crop water and fertilizer [2]. Monitoring technology based on the characteristics of the reflection spectrum has the advantages of being nondestructive, providing real-time information, and delivering high-efficiency analysis. Thus, it is widely used in crop-growth parameter acquisition. Various research institutions have developed spectral sensors to monitor crop growth, providing effective technical support for field-crop production management [3–5].

In 2004, Moya et al. [6] designed a chlorophyll-fluorescence test device that uses sunlight as a light source. During the test, the leaf blade was required to be in a relatively static state, and the reflection-spectrum information of chlorophyll fluorescence in the 510 and 570 nm bands could be obtained from a short distance. Quantitative inversion of chlorophyll fluorescence could be achieved using the physiological reflectance index (PRI) calculated from the test results. In 2010, Ryu et al. [7] developed a normalized-vegetation-index spectroscopy sensor to achieve the inversion of vegetation-growth indicators. It has its own LED light source for illumination, and the best test results were achieved when the test height was less than 3 m. Several commercial instruments are currently available for crop-growth monitoring. For example, the GreenSeeker spectral sensor designed by Trimble (USA) can obtain the spectral information of reflection characteristics in the crop-canopy red and near-infrared bands and calculate the relevant vegetation indices; for this device, the test distance should be kept at a height of 60–180 cm from the canopy [8,9]. The ASD FieldSpec 4 spectrometer, developed by the American company ASD, can acquire the reflection spectrum of the crop canopy in the 350–2500 nm band, with rich data information and high accuracy. Relying on sunlight as a light source, the test needs to be carried out at noon without wind or clouds, the test height should be kept between 30 and 120 cm, and the crop canopy needs to remain relatively static [10–12]. Holland Scientific designed and developed active-light-source spectral sensors for monitoring crop growth, such as the Crop Circle and RapidScan. These instruments emit light and receive the reflection-spectrum information of a crop canopy in real time through their own light-source system. The test height should be kept within 3 m of the canopy, and the canopy structure needs to maintain a steady state [13–17]. These spectral crop-growth monitoring devices are simple to operate, easy to carry, highly accurate, and intuitive in their results, and they provide nondestructive access to crop-growth information; however, they also have shortcomings, such as a small monitoring range, high labor intensity, and discontinuous monitoring, and thus cannot provide high-throughput information and real-time decision-making for large-scale crop-production management in the field.

Unmanned-aerial-vehicle (UAV) operation is highly efficient, flexible, and easy, and has strong terrain applicability. Thus, UAVs are widely used as agricultural information-acquisition platforms [12], but so far there are few studies on the use of UAV-borne spectral sensors for monitoring crop growth. Krienke et al. attempted to measure the normalized vegetation index of a lawn using a MikroKopter OktoKopter XL UAV equipped with a RapidScan CS-45 spectroscopy sensor. Flight height was maintained at 0.5–1.5 m above the lawn. However, test results were poor because the disturbance of the turf canopy from the downwash airflow field was ignored [18]. Shafian et al. mounted an image sensor on a fixed-wing UAV and collected image information of a sorghum planting area at an altitude of 120 m. Pix4D software was used to splice, correct, and extract vegetation indices from each acquired image, and the leaf area index (LAI) of the sorghum was sampled and tested simultaneously. The results show that the Normalized Difference Vegetation Index (NDVI) value extracted from the acquired image information had a high linear fit with the sorghum LAI value obtained in the sampling test [19]. Schirrmann et al. used a UAV operating at an altitude of 50 m to obtain RGB images of the wheat growth period. The acquired images were calibrated with Agisoft Lens software, distortion was corrected with the Brown distortion model, and the final image mosaic and surface-model generation results were improved by radiometric pretreatment, from which information such as crop coverage and plant height was extracted [20]. Zheng et al. used an OktoXL UAV equipped with a Cubert UHD 185 hyperspectral camera to acquire hyperspectral images of a rice canopy at an altitude of 50 m. The acquired images were corrected using ENVI software. By comparing the synchronous test results of a ground-object spectrometer and the agronomic parameters obtained from laboratory chemical analysis, they proved that the image information acquired by a UAV platform equipped with imaging instruments could be used for the quantitative inversion of agronomic crop parameters [21]. Stroppiana et al. acquired a large number of multispectral images at a height of 70 m from the ground by using a 3D Robotics SOLO quadrotor UAV equipped with a Parrot Sequoia multispectral camera. By screening the acquired images, and then correcting them and extracting the vegetation index, they proposed an automatic classification method for weeds and crops, which can be used for the classification and management of specific weeds in the field [22]. However, these studies simply installed image sensors on UAVs to obtain crop images from a high altitude. Although the influence of the UAV downwash airflow field is small in such cases, the captured images can only be stored in memory. Scientific-research personnel are required to use special software for image correction, cropping, splicing, enhancement, and other offline processing, and then analyze the relationship between the images and crop-growth parameters. The process is complex, requires specialized knowledge, and cannot acquire information in real time, which is not conducive to popularization [23,24].

In this paper, we studied the dual-band crop-growth sensor independently developed by Nanjing Agricultural University [25]. First, we investigated the monitoring method of the UAV-borne dual-band crop-growth sensor based on its spectral-monitoring mechanism and structural-design features. Then, we analyzed the spatial-distribution characteristics of the airflow field under the low-altitude hovering operation of the UAV. Finally, we built a UAV-borne crop-growth monitoring system to achieve high throughput and real-time access to rice- and wheat-growth information.

### **2. Materials and Methods**

### *2.1. Test Equipment*

### 2.1.1. Dual-Band Crop-Growth Sensor

There is a certain relationship between crop growth and the spectral reflectance of the crop canopy. As shown in Figure 1, the reflectivity of a wheat canopy is relatively low below 710 nm, rises approximately linearly in the 710–760 nm band, and reaches a comparatively high level in the 760–1210 nm band. Wheat-canopy reflectivity shows significant differences between nitrogen levels in the 460–730 and 760–1210 nm bands. In the 460–730 nm band, spectral reflectivity is negatively correlated with the amount of nitrogen fertilizer, being lowest at N5 and highest at N1. In the 760–1210 nm band, spectral reflectivity is positively correlated with the amount of nitrogen fertilizer, being highest at N5 and lowest at N1. The 710–760 nm band is an apparent transition zone. The spectral reflectivity of the wheat canopy above 1150 nm is not very susceptible to the amount of nitrogen fertilizer, and there is little difference between the spectral reflectivities at N2–N5. According to currently available research, there is a good linear relationship between leaf nitrogen accumulation (LNA) and spectral-reflectivity changes near 550 nm, at 600–700 nm, and near 720 nm; there is a good linear relationship between leaf dry weight (LDW) and spectral-reflectivity changes at 580–700 and 770–900 nm; and the spectral-reflectivity changes at 460–680 nm and near 810 nm are closely correlated with LAI. Considering the sensitive bands of these three agronomic parameters, crop growth can be well inverted with the ratio vegetation index (RVI) constructed from the 730 and 810 nm bands [26–28].

**Figure 1.** Characteristic curve of the spectral reflectivity of wheat canopy. (**a**) Multispectral reflectivity of wheat canopy under application of different amounts of nitrogen, and (**b**) characteristic spectral bands of agronomic wheat parameters.

The dual-band crop-growth sensor was designed by Nanjing Agricultural University. It is equipped with a dual-band detection lens, and its structure can be divided into an upward light sensor and a downward light sensor, as shown in Figure 2. The upward light sensor acquires sunlight-radiation information at the 730 and 815 nm wavelengths, and the downward light sensor receives the crop-canopy-reflected light-radiation information at the corresponding wavelengths. The sensor is packaged in a nylon case and weighs 11.34 g, with a test field of view of 27°. The crop-canopy RVI can be output in real time, and wireless transmission allows long-distance transmission and analysis.

With sunlight as the light source, the dual-band crop-growth sensor requires the testing object (i.e., the wheat canopy) to remain relatively static, so that the canopy presents Lambertian reflection characteristics, and the field of view of the sensor must point vertically downward. During measurement, optical-radiation energy is converted into electric-energy signals by the sensor. Therefore, to ensure high sensitivity, the measuring height should be maintained at 1.0–1.5 m above the canopy. The principle is shown in Figure 2.

**Figure 2.** Introduction of dual-band crop-growth sensor. (**a**) Sensor; (**b**) sensor structure; (**c**) schematic diagram of dual-band crop-growth sensor test.
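The RVI computation from the two channels is straightforward; the sketch below follows the sensor description above, with hypothetical channel readings, and takes the near-infrared band as 810 nm (the text cites both 810 and 815 nm).

```python
def rvi(incident, reflected):
    """Ratio vegetation index from the sensor's two channels: canopy
    reflectance at ~810 nm divided by reflectance at ~730 nm, each taken
    as downward (reflected) over upward (incident) channel radiation."""
    r730 = reflected["730"] / incident["730"]
    r810 = reflected["810"] / incident["810"]
    return r810 / r730

incident = {"730": 1.20, "810": 1.10}    # upward light sensor (a.u., hypothetical)
reflected = {"730": 0.30, "810": 0.55}   # downward light sensor (a.u., hypothetical)
print(rvi(incident, reflected))          # (0.50 / 0.25) = 2.0
```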

### 2.1.2. ASD FieldSpec HandHeld 2 Handheld Spectrometer

Developed by Analytical Spectral Devices (ASD Inc., Boulder, CO, USA), the ASD FieldSpec HandHeld 2 handheld spectrometer can be used for the reflection-spectrum acquisition of different objects such as crops, marine organisms, and minerals. The device has the advantages of portability, simple operation, and accurate results. The test wavelength range is 325–1075 nm, the wavelength accuracy is ±1 nm, the spectral resolution is less than 3 nm, and the test field of view is 25° [29,30].

### 2.1.3. LAI-2200C Vegetation Canopy Analyzer

The LAI-2200C (LI-COR, Lincoln, NE, USA) is a vegetation-canopy analyzer. It is light in weight, consumes little power, and is very suitable for outdoor measurements. In addition, the analyzer can work independently, perform unattended long-term continuous measurement, and automatically record data. The measurement principle is to measure the transmitted light at five angles above and below the vegetation canopy using a "fisheye" optical sensor (148° vertical field of view and 360° horizontal field of view), and to calculate canopy-structure parameters such as LAI, average leaf inclination angle, void ratio, and aggregation index using the radiation-transfer model of a vegetation canopy [31–33].

### 2.1.4. Three-Dimensional Airflow Field Tester

The three-dimensional (3D) airflow field tester (South China Agricultural University, Guangzhou, China) uses three wind-speed sensors to measure the X-, Y-, and Z-axial airflow velocities. The test results are transmitted to a computer through a ZigBee module in real time. The sensor's power consumption is low, and it can maintain continuous operation for a long time. The center point of the three test axes was fixed at a height of 60 cm from the ground, and the distance from each wind-speed sensor to the center point was 15 cm. The three test axes were kept perpendicular to each other. The structure is shown in Figure 3.

**Figure 3.** Structure of three-dimensional airflow field tester. (**a**) X-axial velocity test axis; (**b**) Y-axial velocity test axis; (**c**) Z-axial velocity test axis.

### *2.2. Research Methods*

When the dual-band crop-growth sensor is measuring, the test height must remain 1–1.5 m from the crop canopy, and the crop canopy must remain relatively static, exhibiting Lambertian reflection characteristics. However, while the UAV is hovering at a low altitude, the rotors rotate at high speed, drawing in and accelerating the surrounding air and forming a downwash airflow field. This airflow field acts on the crop canopy, disturbing it, destroying its Lambertian reflection characteristics, affecting the test results, and even preventing completion of the test. Therefore, when a UAV is equipped with a dual-band crop-growth sensor for crop-growth monitoring, it is necessary to consider the disturbance effect of the downwash airflow field on the crop canopy.

To solve this problem, we analyzed the spatial distribution of the UAV rotor downwash airflow field by using computational fluid dynamics (CFD) and 3D airflow field testers. We then determined an acceptable deployment location for the dual-band crop-growth sensor based on the surface velocity and distribution range of the airflow field.

### 2.2.1. Airflow-Field Numerical Simulation

CFD is a branch of fluid mechanics that uses computers as tools and applies various discrete mathematical methods to conduct numerical experiments, computer simulations, and analytical studies of fluid-mechanics problems. The advantage of CFD lies in its ability to simulate the experimental process from basic physical laws instead of requiring expensive fluid-dynamics experimental equipment. Following the specific process of CFD numerical-simulation analysis [25], we obtained 3D profile data of the DJI Phantom drone (the UAV used) with a 3D scanning system, converted the 3D information of the UAV into a digital signal that could be processed by the computer, and used CFD to perform mesh generation and numerical solution for the UAV. The second-order upwind scheme was selected for the calculation to improve accuracy [34–36]. The 3D scanning technology was used only to measure the profile data of the UAV fuselage and rotors and to construct a 3D UAV model; it is not a 3D model of the crop-canopy structure. The specific calculation process is shown in Figure 4.

**Figure 4.** Computational fluid dynamics (CFD) mesh generation and numerical solution process.
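
The choice of a second-order upwind scheme can be made concrete with a toy example. The sketch below applies the classical Beam–Warming second-order upwind update to a one-dimensional linear advection problem; it is purely illustrative and unrelated to the paper's 3D CFX rotor case, and every grid and flow parameter in it is an assumption.

```python
import numpy as np

# Illustrative 1D sketch (not the paper's 3D CFX case): linear advection
# du/dt + a*du/dx = 0 marched with the Beam-Warming scheme, a classical
# second-order upwind discretization. Grid and flow values are invented.
a, dx, dt = 1.0, 0.02, 0.005
c = a * dt / dx                        # Courant number = 0.25 (stable for 0 <= c <= 2)
x = np.arange(0.0, 1.0, dx)
u = np.exp(-200.0 * (x - 0.3) ** 2)    # initial Gaussian pulse

for _ in range(100):
    um1, um2 = np.roll(u, 1), np.roll(u, 2)          # upwind neighbors (periodic grid)
    u = (u - 0.5 * c * (3.0 * u - 4.0 * um1 + um2)   # second-order upwind difference
           + 0.5 * c**2 * (u - 2.0 * um1 + um2))     # stabilizing correction term
```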

Physical parameters such as the UAV flight area and the rotor rotation speed and direction were set in the CFD software, and a simulation of the UAV in the hovering state was carried out. In the simulation domain, 1.2 m in diameter and 1.85 m high, the velocity at each grid node is composed of X-, Y-, and Z-axial velocity components. The flying height of the UAV was set to 1 m above the ground, and the ground-velocity nephogram distribution was displayed with CFX's post-processing module, as shown in Figure 5.

**Figure 5.** Ground-velocity nephogram distribution results. (**a**) X-axial velocity distribution nephogram; (**b**) Y-axial velocity distribution nephogram; (**c**) Z-axial velocity distribution nephogram; (**d**) combined velocity distribution nephogram.

Velocity intensity was analyzed as shown in Figure 5. When the UAV hovered at a height of 1 m above the ground, the Z-axial velocity component was much larger than the X-axial and Y-axial components, so the combined velocity depends mainly on the Z-axial component. The Z-axial velocity nephogram was elliptical: wind speed was highest in the central region and decreased gradually toward the periphery. The combined-velocity nephogram shows that, because the two pairs of rotors of the quadrotor UAV rotate in opposite directions, the forward-rotating rotors generate the dominant downwash flow; the region of highest wind speed is therefore distributed directly below the two forward-rotating rotors, and wind speed decreases gradually toward the periphery, again in an elliptical shape. The long semiaxis of the region is about 0.35 m, and the short semiaxis is about 0.3 m.

### 2.2.2. Actual Test of Airflow Field

In order to verify the numerical-simulation results, the 3D airflow field testers were used to carry out a real-world test of the downwash airflow field with the UAV in the hovering state. Nine 3D airflow field testers were used to measure the X-, Y-, and Z-axial velocity components of the downwash airflow field. The testers were arranged in a three-row, three-column array at equal intervals, and the UAV hovered directly above the center of the array at a height of 1 m from the array plane. When the spacing between adjacent testers was 0.6 m, the wind-speed data collected by the testers at the edge of the array were 0 m/s; the spacing was therefore reduced to 0.5 m so that the edge testers collected nonzero data, which met the test requirements. In the test, after the flight attitude of the UAV had stabilized, each 3D airflow field tester stopped after collecting 100 datasets. The test process is shown in Figure 6.

**Figure 6.** Actual test of downwash flow field.

To reduce error, the 10 largest and 10 smallest values were removed from each tester's 100 collected datasets before the average of the remaining data was calculated. The averaged results were then interpolated with the four-point spline-interpolation (V4) algorithm. Because the edge-tester readings were 0 m/s at a spacing of 0.6 m, the interpolation boundary was set 0.6 m from the center. The V4 algorithm is also called interpolation based on biharmonic Green's functions: the interpolating surface is a linear combination of Green's functions centered on each sample, and the surface is forced through the data points by adjusting the weight of each point. The Green's function of the spline satisfies the following biharmonic equation:

$$\frac{d^4\phi}{dx^4} = 6\delta(x) \tag{1}$$

The particular solution of Equation (1) is

$$\phi(\mathbf{x}) = \left| \mathbf{x} \right|^3 \tag{2}$$

When this Green's function is used to interpolate *N* data points, with value *wi* prescribed at each point *xi*, the problem becomes

$$\frac{d^4 w}{dx^4} = \sum\_{j=1}^{N} 6\alpha\_j \delta(x - x\_j) \tag{3}$$

$$w(x\_i) = w\_i \tag{4}$$

The particular solution of Equations (3) and (4) is a linear combination of Green's functions centered on each data point, so no homogeneous solution is required:

$$w(\mathbf{x}) = \sum\_{j=1}^{N} \alpha\_j |\mathbf{x} - \mathbf{x}\_j|^3 \tag{5}$$

The Green's function for planar space is shown in Table 1:


**Table 1.** Green's function in planar space.

The weights $\alpha\_j$ are obtained from the $x$ values and $w(x)$ of the $N$ points by solving the resulting linear system, and the interpolation result is obtained by substituting the weights into Equation (5).
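
As an illustration, the following sketch implements the one-dimensional form of Equations (1)–(5), together with the outlier trimming described above; the node positions and wind speeds are invented for the example, and the 2D V4 variant used in MATLAB relies on the planar Green's function of Table 1 rather than $|x|^3$.

```python
import numpy as np

def trimmed_mean(samples, k=10):
    """Average after dropping the k largest and k smallest values
    (the outlier-rejection step applied to each tester's 100 readings)."""
    s = np.sort(np.asarray(samples, dtype=float))
    return s[k:len(s) - k].mean()

def biharmonic_fit(x_nodes, w_nodes):
    """Solve for the weights alpha_j of Equation (5):
    w(x) = sum_j alpha_j |x - x_j|^3. Forcing the curve through every
    sample gives the linear system G alpha = w with G_ij = |x_i - x_j|^3."""
    x = np.asarray(x_nodes, dtype=float)
    G = np.abs(x[:, None] - x[None, :]) ** 3   # Green's-function matrix
    return np.linalg.solve(G, np.asarray(w_nodes, dtype=float))

def biharmonic_eval(x_query, x_nodes, alpha):
    """Evaluate the interpolant of Equation (5) at query locations."""
    q = np.asarray(x_query, dtype=float)
    return (np.abs(q[:, None] - np.asarray(x_nodes)[None, :]) ** 3) @ alpha

# Illustrative use along one axis of the tester array: positions in meters,
# with the 0.6 m boundary pinned to 0 m/s; wind speeds are invented values.
x_nodes = np.array([-0.6, -0.5, 0.0, 0.5, 0.6])
w_nodes = np.array([0.0, 0.8, 3.2, 0.7, 0.0])
alpha = biharmonic_fit(x_nodes, w_nodes)
profile = biharmonic_eval(np.linspace(-0.6, 0.6, 25), x_nodes, alpha)
```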

The data from each 3D airflow field tester were sorted, invalid data were eliminated, and the valid data were analyzed. The interpolation results for the tester node data, computed in MATLAB according to the V4 interpolation principle, are displayed in Figure 7.

From the results in Figure 7, we can see that, when the UAV hovered 1 m above the test plane of the 3D airflow field testers, the Z-axial velocity component was much larger than the X-axial and Y-axial components, and the combined velocity depended mainly on the Z-axial component. Within the distribution range, the Z-axial velocity component was elliptical: the center velocity was largest and decreased gradually toward the periphery. The combined-velocity distribution shows that the influence range of the UAV downwash airflow field was likewise elliptical, with the largest velocity at the center point and gradually decreasing peripheral speed. The long semiaxis of the affected area was about 0.45 m, and the short semiaxis about 0.4 m. The test results are consistent with the CFD numerical-simulation results.


**Figure 7.** Three-dimensional tester interpolation results when the DJI phantom drone (unmanned aerial vehicle, UAV) hover-flight height is 1 m from 3D airflow field testers. (**a**) X-axial velocity profile; (**b**) Y-axial velocity profile; (**c**) Z-axial velocity profile; (**d**) combined velocity profile.

### 2.2.3. Dual-Band Crop-Growth-Sensor Deployment Method

According to the CFD numerical-simulation analysis and the actual 3D airflow field-test results, combined with the test field of view of the dual-band crop-growth sensor, when the sensor was deployed 60 cm away from the UAV rotors, the tested area of the crop canopy retained its Lambertian characteristics and the measurement results were unaffected by the airflow field, enabling normal testing. We therefore designed a carbon-fiber sensor support with a length of 60 cm. One end of the support was fixed under the UAV rotor arm, and the other end extended outward along the arm; the sensor was fixed to the outer end of the support through a mechanical structure, so that the support is connected to the UAV as a cantilever beam. Supports of other lengths were also designed for experimental comparison.

To avoid the impact of vibration from high-speed rotor rotation on the dual-band crop-growth sensor and its support during flight, a damping rod was designed for shock absorption, keeping the support and the sensor stable. The support and damping rod were joined in a triangular structure to improve overall stability and shock resistance, as shown in Figure 8. The angle between the support and the damping rod is critical to the stability of the structure and the balance of the aircraft; its optimal value was therefore derived from a static force analysis.

**Figure 8.** Design of UAV support and damping rod. (**a**) Support; (**b**) damping rod.

Taking the sensor support as the object of analysis, the forces sustained by the support are diagrammed in Figure 9.

**Figure 9.** Analysis of the force sustained by UAV sensor support. (**a**) Schematic diagram of the structure of the support and damping rod; (**b**) analysis of the force sustained by each part of the support.

In Figure 9, AB is the sensor support, CD is the support's damping rod, and points C and D are the points at which the sensor support and the damping rod are fixed at the UAV end. The force sustained by the sensor support is analyzed with the following static equations:

$$F = -F\_{By} + F\_D \sin \alpha \tag{6}$$

$$F\_D \cos \alpha = -F\_{Bx} \tag{7}$$

$$Fl = F\_D \sin \alpha \cdot \frac{h}{\tan \alpha} \tag{8}$$

Solving Equations (6)–(8) simultaneously yields the following:

$$F\_D = \frac{Fl}{h\cos\alpha} \tag{9}$$

$$F\_{Bx} = -\frac{Fl}{h} \tag{10}$$

$$F\_{By} = F\left(\frac{l}{h}\tan \alpha - 1\right) \tag{11}$$

In Equations (6)–(11), *FD* is the internal force of CD, *FBx* and *FBy* are the reaction forces in the horizontal and vertical directions at fixed point B, respectively, *α* is the angle between AB and CD, *h* is the height difference between fixed points B and C, and *l* is the length of AB.

When the length of the UAV support is 60 cm, *α* can range from 9.5° to 90°. Since part of the support must be used for fixing it to the UAV and to the dual-band crop-growth sensor, the practical range of *α* is 13°–90°. From Equations (9)–(11), *FD* and *FBy* decrease as *α* decreases, and the smaller *FD* and *FBy* are, the more stable the loading of the support structure is. Therefore, the optimal angle between the sensor support and the damping rod is 13°.
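
The monotonic behavior claimed above can be checked numerically. In the sketch below, the load F and the height difference h are illustrative placeholders (the paper fixes only the support length l = 0.6 m and the feasible angle range):

```python
import numpy as np

# F (load) and h (height difference between B and C) are assumed values;
# only the trend of the forces with alpha matters for this check.
F, l, h = 10.0, 0.60, 0.10

alpha = np.radians(np.arange(13, 91))        # feasible angles, 13-90 degrees
F_D  = F * l / (h * np.cos(alpha))           # Equation (9): damping-rod internal force
F_By = F * (l / h * np.tan(alpha) - 1.0)     # Equation (11): vertical reaction at B

# Both forces increase monotonically with alpha, so the smallest feasible
# angle gives the most stable loading:
print(np.degrees(alpha[np.argmin(F_D)]), np.degrees(alpha[np.argmin(F_By)]))
# -> 13.0 13.0
```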

### *2.3. Field Trial*

### 2.3.1. Test Design

Field Trial 1 was conducted at the Baipu Town (32°14′58.88″ N, 120°45′44.26″ E) test base in Rugao City, Jiangsu Province, China, from February to May 2016. The test varieties were Ningmai 13 (V1) and Huaimai 33 (V2). Three nitrogen-fertilizer levels were applied: N0 (0 kg/hm²), N1 (180 kg/hm²), and N2 (270 kg/hm²), with three replicates per variety. Each plot was 30 m² (5 × 6 m). In addition, phosphate fertilizer was applied at 120 kg/hm² and potassium fertilizer at 135 kg/hm², both once as base fertilizer. Other cultivation-management measures were the same as in general high-yield fields.

Field Trial 2 was conducted at the Lingqiao Township (33°35′53.27″ N, 118°51′11.01″ E) test base in Huai'an City, Jiangsu Province, China, from July to October 2016. The test varieties were Nanjing 9108 (V1) and Lianjing 10 (V2). Four nitrogen-fertilizer levels were applied: N0 (0 kg/hm²), N1 (120 kg/hm²), N2 (240 kg/hm²), and N3 (360 kg/hm²), with three replicates per variety, and each plot was 30 m² (5 × 6 m). In addition, phosphate fertilizer was applied at 105 kg/hm², once as base fertilizer; potassium fertilizer was applied at 135 kg/hm², split 50% as base fertilizer and 50% at the early booting stage. Other cultivation-management measures were the same as in general high-yield fields.

### 2.3.2. Test Method

Field Trial 1 was used to test whether the proposed monitoring method can effectively avoid the disturbance range of the UAV downwash airflow field when acquiring data from the crop canopy, and to verify the accuracy and stability of the dual-band crop-growth sensor's results. The experiment was carried out in the middle of the wheat jointing stage on a clear, windless, cloudless day, between 10:00 and 14:00. The UAV was flown 1 m above the wheat canopy and, as shown in Figure 10, the dual-band crop-growth sensor was deployed at three horizontal distances from the UAV rotors: 0 (i.e., directly below the UAV), 30, and 60 cm. The sensor determined the RVI of the wheat canopy by measuring three random points in each subplot and repeating each measurement three times to obtain an average. An ASD FieldSpec HandHeld 2 spectrometer was used to measure the RVI of the wheat canopy at the same time.

**Figure 10.** Comparison of field flights with the conditions of three support lengths.

Field Trial 2 was used to evaluate the applicability and accuracy of UAV-borne spectral sensors for measuring crop-growth parameters. The test was carried out at the tillering, jointing, booting, and heading stages of the rice on sunny, windless days, between 10:00 and 14:00. The UAV hovered at a height of 1 m above the rice canopy in the different plots, with the dual-band crop-growth sensor deployed at a horizontal distance of 60 cm from the UAV rotors, to obtain the RVI from the 730 and 815 nm bands of the rice canopy. Three points were randomly measured in each subplot, and each measurement was repeated three times to obtain an average. The FieldSpec HandHeld 2 and LAI-2200C instruments were used synchronously to obtain the RVI and LAI of the rice leaf layer. In parallel with the test, rice plants were destructively sampled; samples were fixed at 105 °C for 30 min, oven-dried at 80 °C to constant weight, and weighed to obtain the LDW. After the samples were ground, the LNA was determined by the Kjeldahl method.

### 2.3.3. Analysis of Field-Test Data

The field-test datasets were statistically analyzed in Microsoft Excel 2010; model performance was evaluated with the coefficient of determination (R²) and the root mean square error (RMSE).
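
For reference, the two statistics can be computed as follows (a minimal sketch; the original analysis was carried out in Excel):

```python
import numpy as np

def r_squared(y_obs, y_pred):
    """Coefficient of determination of predictions against observations."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_obs - y_pred) ** 2)
    ss_tot = np.sum((y_obs - y_obs.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y_obs, y_pred):
    """Root mean square error between predictions and observations."""
    y_obs, y_pred = np.asarray(y_obs, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_obs - y_pred) ** 2)))
```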

### **3. Results and Discussion**

### *3.1. Field-Test Results*

Figure 11 shows the results of Field Trial 1, in which the dual-band crop-growth sensor was located at different positions relative to the UAV during the middle of the wheat jointing stage. Parts a, b, and c of Figure 11 show the simple linear fits between the RVI measured by the UAV-borne spectral sensor with support lengths of 0, 30, and 60 cm, respectively, and the RVI measured by the handheld ASD FieldSpec HandHeld 2 spectrometer in the corresponding growth period. With support lengths of 0 and 30 cm, the results were close, with R² values of 0.63 and 0.66, respectively. When the support was extended to 60 cm, the fit improved markedly: R² was 0.81 and RMSE was 0.38.


**Figure 11.** Fitting curves for the sensor and ASD data. (**a**) Sensor located under the UAV; (**b**) sensor located 30 cm from the rotor; (**c**) sensor located 60 cm from the rotor.

It can be seen from these results that the farther the dual-band crop-growth sensor was deployed from the UAV rotors, the better the correlation between its data and the ASD results. According to the CFD numerical-simulation and 3D airflow field-test results, when the sensor was deployed directly below the UAV, its test field of view lay completely within the disturbance range of the rotor downwash airflow field. When the sensor was deployed 30 cm from the rotors, its field of view included both the disturbed and undisturbed zones of the downwash airflow field; the correlation improved slightly but remained unsatisfactory. When the sensor was deployed 60 cm from the rotors, its field of view lay completely in the undisturbed zone, and the correlation improved significantly. In summary, the downwash airflow field generated by the rotation of the UAV rotors clearly influences the dual-band crop-growth sensor's results, and the proposed UAV-borne spectral-sensor crop-growth monitoring method can effectively target areas of the crop canopy outside the disturbance range of the downwash airflow field.

In Field Trial 2, the UAV flew 1 m above the rice canopy with the dual-band crop-growth sensor deployed 60 cm from the rotors, and measurements were taken throughout the entire rice growth period. Figure 12 shows the linear fits between the RVI values obtained by our method and the LNA, LAI, and LDW obtained from the field test and the indoor chemical analysis. R² values were 0.62, 0.76, and 0.60, and RMSE values were 2.28, 1.03, and 10.73, respectively. Using the proposed UAV-borne spectral-sensor crop-growth monitoring method, rice growth parameters could be obtained accurately over the entire growth period.

**Figure 12.** Spectral model for the UAV-borne crop-growth monitoring system. (**a**) LNA–RVI fitting curve; (**b**) LAI–RVI fitting curve; (**c**) LDW–RVI fitting curve.

### *3.2. Discussion*

UAVs are simple to operate and highly efficient, yet there has been little targeted research on monitoring crop growth with UAV-borne dual-band crop-growth sensors. Under low-altitude hovering, the downwash airflow field strongly disturbs the crop canopy, destroying its Lambertian reflection characteristics; this seriously degrades test accuracy and can even cause the test to fail. We therefore used CFD numerical simulation and real-world 3D airflow field testing to analyze the spatial distribution of the downwash airflow field while the UAV hovers 1 m above the crop canopy and to determine its disturbance range on the canopy. Most existing studies simply mount an energy-type spectral sensor on a UAV without considering the disturbance of the crop canopy by the downwash airflow field during the test [37]; our study fills this gap. Our tests showed that the influence range of the UAV downwash airflow field is elliptical, with the largest wind speed at the center point, decreasing gradually toward the periphery; the long semiaxis was about 0.45 m, and the short semiaxis about 0.4 m. According to this distribution range, we designed a 0.6 m support to deploy the dual-band crop-growth sensor so that its test field of view lies beyond the distribution range of the downwash airflow field, avoiding the disturbance of the crop canopy. When the UAV flew 1 m above the wheat canopy with the sensor deployed 0.6 m from the rotors, the linear-fit R² between the RVI output by the sensor and the RVI measured by the handheld ASD FieldSpec HandHeld 2 spectrometer was 0.81, with an RMSE of 0.38. Therefore, the UAV-borne spectral-sensor crop-growth monitoring method can effectively target areas of the crop canopy outside the disturbance range of the UAV downwash airflow field.

During testing with the 3D airflow field testers, the wind speeds measured in the different dimensions were larger than the CFD numerical-simulation results. The main reason is that the test plane of the testers was 60 cm above the ground and arranged as a lattice; when the downwash airflow diffused downward through the lattice plane, its direction changed, whereas in the CFD numerical simulation the downwash airflow reached the ground plane directly, so the distributions in the real-world test and the numerical simulation differed. Although the UAV analyzed in this paper is a DJI Phantom, the proposed UAV-borne spectral-sensor crop-growth monitoring method can be applied to various types of multirotor UAV structures.

### **4. Conclusions**

1. We proposed the UAV-borne spectral-sensor crop-growth monitoring method, used CFD numerical simulation and an actual 3D airflow field test to determine the distribution range of the UAV downwash airflow field above the crop canopy, and designed sensor supports so that the sensor targets areas of the crop canopy outside the disturbance range of the downwash airflow field.

2. When the flying height of the UAV was 1 m above the crop canopy, the influence range of the downwash airflow field was elliptical: wind speed was largest at the center and decreased gradually toward the periphery, with a long semiaxis of about 0.45 m and a short semiaxis of about 0.4 m. With the designed 60 cm sensor support, the sensor's test field of view was completely outside the disturbance range of the UAV downwash airflow field.

3. The wheat test showed that, when the dual-band crop-growth sensor was deployed 0, 30, and 60 cm from the UAV rotors, the linear-fit R² between the RVI obtained by our method and the RVI measured by the ASD FieldSpec HandHeld 2 spectrometer was 0.63, 0.66, and 0.81, respectively. With the 60 cm support, the fit was markedly improved. The UAV-borne spectral-sensor crop-growth monitoring method can thus effectively avoid the disturbance range of the UAV downwash airflow field on the crop canopy.

4. The rice experiment showed that the RVI measured by the UAV-borne spectral sensor had a good linear relationship with the LNA, LAI, and LDW obtained from the field test and the indoor chemical analysis: R² values were 0.62, 0.76, and 0.60, and RMSE values were 2.28, 1.03, and 10.73, respectively. Using the UAV-borne spectral-sensor crop-growth monitoring method, rice growth parameters could be obtained accurately over the entire growth period.

**Author Contributions:** J.N., Y.Z. (Yan Zhu) and W.C. designed the research, J.N., Q.W., Y.Z. (Yu Zhang) and L.Y. performed the research, L.Y., J.Y. and J.N. analyzed the data, and L.Y. and Y.Z. (Yan Zhu) wrote the paper. All authors have read and approved the final manuscript submitted to the editor.

**Funding:** This research was funded by grants from the National Key Research and Development Program of China (2016YFD070030403), the Primary Research and Development Plan of Jiangsu Province of China (Grant No. BE2016378), the Jiangsu Agricultural Science and Technology Independent Innovation Fund Project (Grant No. CX(16)1006), the Priority Academic Program Development of Jiangsu Higher Education Institutions (PAPD), and the 111 Project (B16026).

**Acknowledgments:** The authors thank all those who helped in the course of this research.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Multi-Pig Part Detection and Association with a Fully-Convolutional Network**

### **Eric T. Psota 1,***∗***, Mateusz Mittek 1, Lance C. Pérez 1, Ty Schmidt <sup>2</sup> and Benny Mote <sup>2</sup>**


Received: 27 November 2018; Accepted: 16 February 2019; Published: 19 February 2019

**Abstract:** Computer vision systems have the potential to provide automated, non-invasive monitoring of livestock animals; however, the lack of public datasets with well-defined targets and evaluation metrics presents a significant challenge for researchers. Consequently, existing solutions often focus on achieving task-specific objectives using relatively small, private datasets. This work introduces a new dataset and method for instance-level detection of multiple pigs in group-housed environments. The method uses a single fully-convolutional neural network to detect the location and orientation of each animal, where both body part locations and pairwise associations are represented in the image space. Accompanying this method is a new dataset containing 2000 annotated images with 24,842 individually annotated pigs from 17 different locations. The proposed method achieves over 99% precision and over 96% recall when detecting pigs in environments previously seen by the network during training. To evaluate the robustness of the trained network, it is also tested on environments and lighting conditions unseen in the training set, where it achieves 91% precision and 67% recall. The dataset is publicly available for download.

**Keywords:** computer vision; deep learning; image processing; pose estimation; animal detection; precision livestock

### **1. Introduction**

Researchers have established that changes in animal behavior correlate with changes in health [1–3]; however, the labor-intensive methods used to monitor behaviors do not lend themselves well to modern commercial swine facilities, where just two seconds of daily observation per pig is recommended [4]. Because industry caretakers are typically responsible for thousands of pigs, it is nearly impossible for them to thoroughly assess the health and well-being of individuals using manual observation. This limitation is further compounded by the fact that the effectiveness of human visual assessments is inherently limited by both the attention span and the subjectivity of observers [5].

By adapting modern technology to commercial operations, precision livestock farming aims to ease the burden on individual caregivers and provide a solution for continuous automated monitoring of individual animals [6–8]. Over the past two decades, researchers have approached this problem from a multitude of different angles [9,10]. These include 3D tracking via wearable ultra-wide band (UWB) devices [11,12], GPS motes [13,14], inertial measurement unit (IMU) activity trackers [15–18], RFID ear tags [19–21], and detection-free tracking using depth cameras [22,23].

This work aims to provide a generalizable vision-based solution for animal detection in group-housing environments. While animal tracking methods using radio identification devices directly provide data on individual animals, there are several disadvantages to using wearable methods when compared to video-based approaches [24,25]. Wearables must withstand harsh environments, they require costly installation and maintenance on a per animal basis, and the localization accuracy of both UWB and GPS systems is typically too low to detect animal orientation, activities, and social behaviors. In contrast, video already provides information-rich data that allows humans to identify precisely what each animal is doing at all times. However, converting digital video to meaningful data without human intervention is an ongoing pursuit in the research community. Existing solutions require sophisticated computer vision algorithms and they have begun to rely increasingly on advancements in machine learning and artificial intelligence [26,27].

Computer vision researchers experienced a dramatic shift in 2012 when Krizhevsky et al. demonstrated that a deep convolutional neural network, trained from scratch without any hand-crafted heuristics, could easily outperform all existing methods for image classification [28]. This shift was made possible by efficient implementations of convolutional neural network (CNN) training [29,30], improvements in computational resources via graphics processing units (GPUs) [31], and large (100K+ images) human-annotated datasets like PASCAL [32], MS-COCO [33], and CityScapes [34]. Modern computer vision applications that locate and classify objects in images often leverage established methods like YOLO [35,36], SSD [37], and Faster R-CNN [38,39] in their pipeline. While common tasks like pedestrian detection lend themselves well to pre-trained networks and existing datasets, there still exist unique challenges that require custom datasets and accompanying solutions.

This work introduces a fully convolutional neural network used to identify the location and orientation of multiple group-housed pigs. The target output of the network is an image-space representation of each pig's body part locations along with a method for associating them to form complete instances. To train the network, a new dataset is used that contains 2000 images with 24,842 uniquely labeled pig instances. The dataset is divided into a training set and a testing set, and the testing set is subdivided into two sets: one with images depicting the same environments as the training set, and another with images of new environments not represented in the training set. This dataset design allows the robustness of detection algorithms to be tested against novel animal presentations and environments. The three key contributions of this work are (1) a fully convolutional instance detection method, (2) a public dataset for training and evaluation, and (3) metrics that can be used to measure the performance of methods that detect both location and orientation.

The remainder of this paper is organized as follows. Section 2 discusses related work in vision-based animal detection and modern deep-learning detection methods. Section 3 introduces the proposed method and the design of the fully-convolutional neural network. Section 4 presents the dataset, evaluation methodology, and results, and Section 4.6 discusses some key insights and additional practical considerations. Finally, concluding remarks are provided in Section 5.

### **2. Background**

Visual detection of multiple moving targets with a static camera often begins with segmentation of foreground objects using background subtraction. If sufficient separation between targets exists, traditional computer vision methods like connected components can be used to easily identify unique instances [40]. However, this is hardly the case for group-housed animals that constantly engage each other socially and often prefer to lie in groups to preserve warmth.

Using Otsu's method [41] to adaptively threshold background from foreground and Kashiha's ellipse-fitting method [42], Nasirahmadi et al. [43] used computer vision to identify the lying behavior of group-housed pigs as a function of temperature. While their method achieved approximately 95% accuracy when locating pigs in the images, it was only tested on a single environment where the animals had consistent age and appearance, so it is difficult to predict how well the method generalizes. Ahrendt et al. [44] used a continuously updated model and a 5D Mahalanobis distance between foreground and background pixels to isolate and track the targets; however, the authors point out that the method struggles to identify separate instances when animals are moving quickly and/or overlapping one another. Nilsson et al. [45] used a region-based method to segment group-housed pigs using supervised learning, demonstrating accurate results when the pigs are fully visible (i.e., non-occluded) from a top-down perspective.

A popular approach to dealing with the difficulties of background subtraction using color images is to incorporate depth-imaging cameras [22,46–54]. However, while foreground extraction is made simple, depth image processing still faces the challenge of separating abutting pigs. To address this challenge, Ju et al. [27] introduced a top-down method that first detects bounding boxes around pigs using the YOLO object detector [36]. When YOLO inevitably detects multiple touching pigs with a single bounding box, their method proceeds by separating segmented images (obtained using a Gaussian mixture model) to establish boundaries between instances. While their overall method achieves 92% accuracy, it is limited to three touching pigs, and results were obtained within a single environment over a ten-minute window, so generalizability is difficult to assess.

Some approaches using the Kinect depth-sensing camera explicitly define the region of interest (pen) as a three-dimensional bounding box to separate background from foreground. Mittek et al. [23] use this approach to track pigs as ellipsoidal collections of points in the foreground. While their method achieves an average of 20 min of continuous tracking without errors, it requires manual initialization and does not introduce a detection method. The method introduced by Matthew et al. [55] uses regional grouping of surface normals to detect individual instances. They do not perform a formal assessment of the detection accuracy, but the resulting tracking algorithm maintains accurate tracking for an average of 22 s without requiring manual initialization.

Beginning in 2014 with the introduction of R-CNN by Girshick et al. [56], visual detection methods have predominantly used deep convolutional neural networks. Furthermore, they generally fall into one of two categories: (1) top-down approaches that define regions of interest before performing subsequent segmentation or keypoint annotation, or (2) bottom-up approaches that directly segment pixels or detect keypoints without explicitly detecting regions of interest. Mask R-CNN [57], for example, is a state-of-the-art top-down approach for performing instance-level object segmentation and keypoint detection. However, because it relies on a priori region proposal, it is inherently unable to separate objects with significant bounding box overlap—a common occurrence among group-housed animals. In contrast, bottom-up detection and classification is directly done per-pixel in the image space [58]. Two notable methods, OpenPose [59] and PersonLab [60], directly identify keypoints and affinity metrics that allow them to be clustered into whole instances.

Bottom-up keypoint detection was adapted to cow tracking by Ardö et al. [26]. They identified a set of landmarks visible from a top-down view to represent each cow's location and orientation, and trained a fully-convolutional neural network to detect them in the image space. A post-processing network was then used to convert the annotations to per-pixel orientation classification outputs, resulting in 95% accuracy in correctly labeling all cows in a given image. In a follow-up experiment, they applied the previously trained network to a new environment and observed that it only succeeded on 55% of images, indicating that the network was overfitting to a particular environment.

The proposed method introduces a new bottom-up strategy that identifies multiple pig instances in images as a collection of keypoints. Unlike the approach by Ardö et al. [26], this method preserves the association between keypoints and instances, making it possible to evaluate the performance of the method directly as a keypoint detection method. Furthermore, keypoints provide a precise representation of the pose of each animal, making it possible to identify activities and interactions between animals.

### **3. Proposed Method**

The goal of the proposed method is to detect the location and orientation of all visible pigs in the pen environment. The first stage of the process aims to find the location of all pertinent body parts, while the second stage aims to associate them with one another to form whole instances. The following two sections (Sections 3.1 and 3.2) describe the method used to represent parts and associations within the image space. Section 3.3 then presents a method that interprets these image-space representations and produces a set of unique animal instances. Finally, Section 3.4 introduces the fully-convolutional network that takes, as input, an image of the pen environment and attempts to produce the image-space outputs described in Sections 3.1 and 3.2.

### *3.1. Representation of Body Part Location*

The proposed method assumed that images of the pen environment were captured from a downward-facing camera mounted above the pen. When trying to detect and differentiate multiple animals in a group-housed setting, a top-down view has three distinct advantages over alternative visual perspectives. Firstly, animals are generally nonoccluded from the top-down perspective unless they are crawling over (or lying on top of) one another. Secondly, the size and appearance of the animals are consistent from a top-down perspective, making it easier for a system to identify them. Thirdly, one can reliably approximate the 3D position of each animal from its projection onto the 2D image plane by assuming a constant height above the pen floor plane. Approximation is often necessary if 3D coordinates are desired and the single-camera system being used lacks the ability to measure depth.

From a top-down perspective, the part of the animal most likely to be visible was the surface of the back. Thus, in order to represent both the position and orientation of each pig, the proposed method used the image-space location of the tail and shoulder belonging to each animal. Assuming there are $N$ animals in the pen, the tail and shoulder positions of animal $n \in \{1, \dots, N\}$ are denoted $\mathbf{t}\_n = (x\_{\mathbf{t}\_n}, y\_{\mathbf{t}\_n})$ and $\mathbf{s}\_n = (x\_{\mathbf{s}\_n}, y\_{\mathbf{s}\_n})$, respectively. More specifically, "tail" refers to a surface point along the center ridge of the back between the left and right ham, and "shoulder" refers to a surface point along the center ridge of the back between the shoulder blades. The chosen representation also includes the 2D positions of the left and right ears, denoted $\mathbf{l}\_n = (x\_{\mathbf{l}\_n}, y\_{\mathbf{l}\_n})$ and $\mathbf{r}\_n = (x\_{\mathbf{r}\_n}, y\_{\mathbf{r}\_n})$, respectively. While their visibility is not guaranteed, such as when the animal lies on its side or positions its head in a feeder, their locations can be used to approximate the pose of the head and/or assign animals a unique visual marker in the form of an ear tag.

Figure 1a illustrates an example of an input image depicting a single pig. The location of the left ear, right ear, shoulder, and tail are represented by the red, green, blue, and yellow spots in the target mapping shown in Figure 1b, respectively. Finally, the superimposed visualization given in Figure 1c illustrates the locations of the four parts in reference to the original image. Note that the target mapping in Figure 1b has the same spatial dimensions (rows and columns) as the input image, but it contains four channels with each corresponding to a single body part type.

**Figure 1.** The original image (**a**) is mapped to a four-channel output (**b**), where the location of the left ear, right ear, shoulder, and tail are represented by Gaussian kernels in channels 1–4 of the output, respectively. Note that the four colors used in (**b**) are purely for visualization of the four separate channels of the target image. The overlay in (**c**) is provided to illustrate the locations relative to the original image.

To approximate the level of uncertainty inherent in the user annotations of each body part location, parts within the target mapping were each represented by 2D Gaussian kernels. While the distribution of a 2D Gaussian kernel is in general defined by a 2 × 2 covariance matrix, here we considered only symmetric 2D Gaussian kernels that can be characterized by a single standard deviation multiplying a 2 × 2 identity matrix. This standard deviation is denoted $\sigma\_n$ for each animal $n \in \{1, \dots, N\}$. This mapping of uncertainty was meant to approximate the probability distribution of part locations annotated by a human given the original image $\mathbf{I}$. The kernels were also scaled so that the magnitude at their center is 1.0, which allows a straightforward translation between kernels and 2D image coordinates via a simple thresholding operation. The first four channels of the target output, as defined in Table 1, were thus proportional to the probability that parts $\{\mathbf{l}, \mathbf{r}, \mathbf{s}, \mathbf{t}\}$ exist at each spatial location in the image.

**Table 1.** Channels 1–4 of the image-space representation used to represent pig locations and orientations. Each channel corresponds to a different body part. The locations of each part are marked with Gaussian kernels meant to represent the distribution of part locations provided by a human annotator.

| Channel | Body Part |
|---------|-----------|
| 1 | Left ear (**l**) |
| 2 | Right ear (**r**) |
| 3 | Shoulder (**s**) |
| 4 | Tail (**t**) |
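
As a concrete sketch of these detection channels, the snippet below renders unit-peak Gaussian kernels into a four-channel target; the coordinates, standard deviations, and the max rule for combining overlapping kernels are our assumptions for illustration.

```python
import numpy as np

def part_channel(height, width, points, sigmas):
    """Render one detection channel: a unit-peak 2D Gaussian kernel per
    annotated point; sigma reflects per-animal annotation uncertainty.
    Overlapping kernels are combined with a max (an assumption)."""
    yy, xx = np.mgrid[0:height, 0:width].astype(float)
    channel = np.zeros((height, width))
    for (x, y), sigma in zip(points, sigmas):
        g = np.exp(-((xx - x) ** 2 + (yy - y) ** 2) / (2.0 * sigma ** 2))
        channel = np.maximum(channel, g)
    return channel

# Channels 1-4 for a single pig at toy (x, y) coordinates:
H, W = 64, 96
target = np.stack([
    part_channel(H, W, [(30, 20)], [3.0]),   # channel 1: left ear
    part_channel(H, W, [(40, 20)], [3.0]),   # channel 2: right ear
    part_channel(H, W, [(35, 28)], [4.0]),   # channel 3: shoulder
    part_channel(H, W, [(35, 52)], [4.0]),   # channel 4: tail
])  # shape (4, H, W)
```
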
### *3.2. Representation of Body Part Association*

Even if every body part location is detected correctly, parts must be associated with each other in order to identify individual whole-animal instances. A naive approach would be to associate each body part with its nearest neighbor in terms of Euclidean distance using an optimal bipartite assignment method, such as the Hungarian algorithm. However, due to the elongated shape of pigs, this approach is prone to failure in cluttered environments, as illustrated in Figure 2.


**Figure 2.** Two nearby pigs with body parts properly annotated are depicted in (**a**). Whole instances must be formed by joining the parts together through part association. An optimal Euclidean nearest-neighbor part association is prone to failure when the animals are in close proximity, as illustrated in (**b**).

The proposed method used additional channels in the target output to encode, at each body part location, 2D vector offsets to the other body parts belonging to the same animal. These offsets represent the direction and distance in pixels from one body part to another. While there are $\binom{4}{2} = 6$ part pairs among the four parts, the target output represents only three of them in order to reduce redundancy (e.g., a tail-to-left-ear vector can be closely approximated by combining the tail-to-shoulder and shoulder-to-left-ear vectors). Specifically, 12 channels were used to represent the three bidirectional part-pair associations, as listed in Table 2. The three part pairs and their associated channels are given below:
- Left ear ↔ shoulder (channels 5–8);
- Right ear ↔ shoulder (channels 9–12);
- Shoulder ↔ tail (channels 13–16).

**Table 2.** Channels 5–16 of the image-space representation used to represent pig locations and orientations. Each pair of neighboring channels corresponds to the *x* and *y* offset between neighboring parts. Overall, these 12 channels represent bidirectional vectors linking three pairs of body parts.

| Channels | Offset Encoded |
|----------|----------------|
| 5, 6 | *x*, *y* offset, left ear → shoulder |
| 7, 8 | *x*, *y* offset, shoulder → left ear |
| 9, 10 | *x*, *y* offset, right ear → shoulder |
| 11, 12 | *x*, *y* offset, shoulder → right ear |
| 13, 14 | *x*, *y* offset, shoulder → tail |
| 15, 16 | *x*, *y* offset, tail → shoulder |

Each of the 12 channels encodes a real-valued offset from one point to another. Much like the part-detection mappings, these vectors are encoded regionally into the spatial dimensions of the image. Figure 3 illustrates this encoding for a pair of side-by-side pigs. The diameter of the circular regions is denoted $d\_n$ for each pig $n$ in the image and is proportional to the standard deviation $\sigma\_n$ of the Gaussian kernels used in the previous section. For visualization purposes, each of the six images in Figure 3b,c represents the direction and distance between part pairs as a color, where hue represents direction and saturation represents magnitude (the encoding is provided in Figure 3d). Figure 3c further illustrates the lines joining the parts to one another.

**Figure 3.** The original image (**a**) is mapped to a 12-channel output (**b**), where vectors joining three pairs of body parts are encoded into circular regions in channels 5–16 of the output. Note that the four colors used in (**b**) are purely for visualization of the direction and magnitude of the vectors, where (**d**) provides a mapping between vectors and colors. The overlay in (**c**) is provided to illustrate the locations of the vector encodings and their magnitude and direction (illustrated by the gray line) in relation to the original image.
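
The regional offset encoding can be sketched as follows. Storing the offset p − q inside a disc around p lets the sampling rule of Equation (4) in Section 3.3 recover q exactly; the coordinates and disc diameter below are illustrative.

```python
import numpy as np

def offset_channels(height, width, p, q, diameter):
    """Render one directed association (two of channels 5-16): within a
    disc of the given diameter around part p, store the offset such that
    q = p - offset, matching Equation (4) in Section 3.3. The disc
    diameter d_n and the coordinates are assumptions for illustration."""
    yy, xx = np.mgrid[0:height, 0:width].astype(float)
    inside = (xx - p[0]) ** 2 + (yy - p[1]) ** 2 <= (diameter / 2.0) ** 2
    mx = np.where(inside, p[0] - q[0], 0.0)   # x-offset channel (p -> q)
    my = np.where(inside, p[1] - q[1], 0.0)   # y-offset channel (p -> q)
    return mx, my

# Shoulder-to-tail offsets for one pig (toy coordinates):
mx_st, my_st = offset_channels(64, 96, p=(35, 28), q=(35, 52), diameter=8)
```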

### *3.3. Instances from Part Detection and Association*

The goal of the proposed method was to detect all visible parts and group them together in order to form whole-animal instances. Sections 3.1 and 3.2 provided a means of representing body part locations and the vectors associating them with one another in the form of a 16-dimensional image-space mapping. To translate this image-space mapping into a set of discrete instance locations, a sequence of steps is followed (illustrated in Figure 4).

**Figure 4.** Flow diagram of the proposed method for converting the 16-channel image space representation to a set of 2D coordinates of each visible instance.

First, the 2D body part locations must be extracted from the Gaussian kernels contained in channels 1–4 of the 16-channel image-space representation. Thus, the first step was to split the channels into the 4-channel part detections and the 12-channel part associations. The precise 2D part locations were represented by the peaks of the Gaussian kernels in the image-space mapping. Let $\mathbf{M}\_{\mathbf{p}}$ be the $R \times C$ image-space map for body part $\mathbf{p} \in \{\mathbf{l}, \mathbf{r}, \mathbf{s}, \mathbf{t}\}$, corresponding to the left ear, right ear, shoulder, and tail, respectively; the input image and the output mappings are assumed to have $R$ rows and $C$ columns. The part locations can be extracted from the image space using a form of regional max-response detection defined by

$$\mathcal{R}\{\mathbf{p}\} = \left\{ (x, y) \;\middle|\; \mathbf{M}\_{\mathbf{p}}(x, y) \ge \mathbf{M}\_{\mathbf{p}}(x', y') \text{ for all } (x', y') \in \mathcal{R}\_{(x, y)} \right\} \quad \text{for } \mathbf{p} \in \{\mathbf{l}, \mathbf{r}, \mathbf{s}, \mathbf{t}\}, \tag{1}$$

where $\mathcal{R}\_{(x, y)}$ is a region surrounding image-space location $(x, y)$. Part locations are therefore detected only if their value in the image-space mapping is greater than that of all their neighbors. This works well for detecting the peak pixel coordinates of the Gaussian kernels, and it can be further refined with quadratic sub-pixel interpolation, performed by replacing the original integer coordinates $(x, y)$ with real-valued coordinates using

$$(x, y) \leftarrow \left(x + \frac{\mathbf{M}\_{\mathbf{p}}(x-1,y) - \mathbf{M}\_{\mathbf{p}}(x+1,y)}{2\left(\mathbf{M}\_{\mathbf{p}}(x+1,y) + \mathbf{M}\_{\mathbf{p}}(x-1,y) - 2\mathbf{M}\_{\mathbf{p}}(x,y)\right)},\; y + \frac{\mathbf{M}\_{\mathbf{p}}(x,y-1) - \mathbf{M}\_{\mathbf{p}}(x,y+1)}{2\left(\mathbf{M}\_{\mathbf{p}}(x,y+1) + \mathbf{M}\_{\mathbf{p}}(x,y-1) - 2\mathbf{M}\_{\mathbf{p}}(x,y)\right)}\right). \tag{2}$$
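
A minimal sketch of this two-step extraction on one part channel follows; because the kernels have unit peaks, a fixed detection threshold is reasonable, though the value 0.5 and the neighborhood size are our assumptions.

```python
import numpy as np
from scipy.ndimage import maximum_filter

def detect_parts(M, region=5, threshold=0.5):
    """Regional max detection of Equation (1) followed by the quadratic
    sub-pixel refinement of Equation (2) on one part channel M, indexed
    as M[y, x]. The threshold and region size are assumptions."""
    peaks = (M == maximum_filter(M, size=region)) & (M >= threshold)
    coords = []
    for y, x in zip(*np.nonzero(peaks)):
        if 0 < x < M.shape[1] - 1 and 0 < y < M.shape[0] - 1:
            # Quadratic peak interpolation per Equation (2):
            dx = (M[y, x - 1] - M[y, x + 1]) / (
                2.0 * (M[y, x + 1] + M[y, x - 1] - 2.0 * M[y, x]))
            dy = (M[y - 1, x] - M[y + 1, x]) / (
                2.0 * (M[y + 1, x] + M[y - 1, x] - 2.0 * M[y, x]))
            coords.append((x + dx, y + dy))
    return coords
```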

Given the complete set of detected body part locations

$$\{\mathbf{p}\_1, \dots, \mathbf{p}\_{N\_{\mathbf{p}}}\} = \{ (x\_{\mathbf{p}\_1}, y\_{\mathbf{p}\_1}), \dots, (x\_{\mathbf{p}\_{N\_{\mathbf{p}}}}, y\_{\mathbf{p}\_{N\_{\mathbf{p}}}}) \} \quad \text{for } \mathbf{p} \in \{\mathbf{l}, \mathbf{r}, \mathbf{s}, \mathbf{t}\}, \tag{3}$$

the next step was to estimate the locations of associated parts by sampling association vectors from the 12-channel part-association mapping. The 12 dimensions of the association mapping are denoted $[\mathbf{M}^x\_{\mathbf{l}\to\mathbf{s}}\ \mathbf{M}^y\_{\mathbf{l}\to\mathbf{s}}\ \mathbf{M}^x\_{\mathbf{s}\to\mathbf{l}}\ \mathbf{M}^y\_{\mathbf{s}\to\mathbf{l}}\ \mathbf{M}^x\_{\mathbf{r}\to\mathbf{s}}\ \mathbf{M}^y\_{\mathbf{r}\to\mathbf{s}}\ \mathbf{M}^x\_{\mathbf{s}\to\mathbf{r}}\ \mathbf{M}^y\_{\mathbf{s}\to\mathbf{r}}\ \mathbf{M}^x\_{\mathbf{s}\to\mathbf{t}}\ \mathbf{M}^y\_{\mathbf{s}\to\mathbf{t}}\ \mathbf{M}^x\_{\mathbf{t}\to\mathbf{s}}\ \mathbf{M}^y\_{\mathbf{t}\to\mathbf{s}}]$, and the estimated location of an associated part $\mathbf{q}$ from location $\mathbf{p}\_n$ can be obtained using

$$(\mathbf{p}\rightarrow\mathbf{q})\_n = \left(x\_{\mathbf{p}\_n} - \mathbf{M}^x\_{\mathbf{p}\rightarrow\mathbf{q}}(x\_{\mathbf{p}\_n}, y\_{\mathbf{p}\_n}),\; y\_{\mathbf{p}\_n} - \mathbf{M}^y\_{\mathbf{p}\rightarrow\mathbf{q}}(x\_{\mathbf{p}\_n}, y\_{\mathbf{p}\_n})\right) \quad \text{for all } n = 1, \ldots, N\_{\mathbf{p}}. \tag{4}$$

To join parts together, the distance between the estimated part locations and the actual locations were first computed using a pairwise distance evaluation. Specifically, the association distance between two parts **p***n* and **q***m* is given by

$$d(\mathbf{p}\_n, \mathbf{q}\_m) = \frac{\left| (\mathbf{p} \rightarrow \mathbf{q})\_n - \mathbf{q}\_m \right| + \left| (\mathbf{q} \rightarrow \mathbf{p})\_m - \mathbf{p}\_n \right|}{2}, \tag{5}$$

where |**a**| denotes the L2-norm of vector **a**. Overall, this collection of part-to-part distances forms a set of three unique distance matrices

$$\mathbf{D}\_{\mathbf{p},\mathbf{q}} = \begin{bmatrix} d(\mathbf{p}\_1, \mathbf{q}\_1) & d(\mathbf{p}\_1, \mathbf{q}\_2) & \dots & d(\mathbf{p}\_1, \mathbf{q}\_{N\_{\mathbf{q}}}) \\ d(\mathbf{p}\_2, \mathbf{q}\_1) & d(\mathbf{p}\_2, \mathbf{q}\_2) & \dots & d(\mathbf{p}\_2, \mathbf{q}\_{N\_{\mathbf{q}}}) \\ \vdots & \vdots & \ddots & \vdots \\ d(\mathbf{p}\_{N\_{\mathbf{p}}}, \mathbf{q}\_1) & d(\mathbf{p}\_{N\_{\mathbf{p}}}, \mathbf{q}\_2) & \dots & d(\mathbf{p}\_{N\_{\mathbf{p}}}, \mathbf{q}\_{N\_{\mathbf{q}}}) \end{bmatrix} \tag{6}$$

where (**p** = **l**, **q** = **s**), (**p** = **r**, **q** = **s**), and (**p** = **s**, **q** = **t**). An optimal assignment between pairs of body parts that minimizes the sum of distances can be obtained by applying the Hungarian assignment algorithm to each distance matrix.

Finally, individual animals were identified as those containing a joined shoulder and tail. The shoulder–tail instance extraction began by identifying the matches from the Hungarian assignment algorithm's output for $\mathbf{D}\_{\mathbf{s},\mathbf{t}}$. Once all instances had been identified, the left- and right-ear detections were joined to the shoulder locations of the instances via the Hungarian assignment algorithm's outputs for $\mathbf{D}\_{\mathbf{l},\mathbf{s}}$ and $\mathbf{D}\_{\mathbf{r},\mathbf{s}}$.
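
A sketch of the matching step, using SciPy's Hungarian-algorithm implementation, is shown below; the distance gate `max_dist` is an assumption, added so that distant parts are left unmatched rather than forced into an instance.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def associate(D, max_dist=20.0):
    """Optimal bipartite matching on one distance matrix of Equation (6)
    via the Hungarian algorithm; the gate max_dist (pixels) is an
    assumed hyperparameter."""
    rows, cols = linear_sum_assignment(D)
    return [(r, c) for r, c in zip(rows, cols) if D[r, c] <= max_dist]

# Assembly order described above: shoulders and tails are joined first,
# then ears are attached to matched shoulders. D_st here is a toy
# shoulder-tail distance matrix; D_ls and D_rs would be built the same
# way from Equations (4)-(6).
D_st = np.array([[2.0, 40.0],
                 [38.0, 3.0]])
instances = associate(D_st)   # -> [(0, 0), (1, 1)]
```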

### *3.4. Fully-Convolution Network for Part Detection and Association Mapping*

A fully-convolutional neural network was used to approximate the 16-channel target output, given a red-green-blue (RGB) image as input. Specifically, the hourglass network illustrated in Figure 5 was proposed. Hourglass networks with symmetry between the downsampling and upsampling stages have previously been used for pose estimation [61] and image segmentation [62]. The proposed network was largely based on the popular SegNet architecture [62], which introduced a way to improve upsampling by sharing the indices of each max pooling layer with a corresponding max unpooling layer. This approach has been shown to achieve impressive performance in segmentation tasks by removing the burden of "learning to upsample" from the network.

The network architecture used by the proposed method also incorporated skip-connections in the form of depth concatenation immediately after max unpooling layers. Skip-connections have been demonstrated to encourage feature-reuse, thus improving performance with a fixed number of network coefficients [63]. They were also shown to drastically decrease the amount of training time required by the network. The U-net architecture, introduced by Ronneberger et al. [64], further demonstrated the power of skip-connections for hourglass-shaped networks.

During training, the objective function attempted to minimize the mean-squared error between the network output and the target ground truth. For the first four channels that correspond to part detections, gradients were back-propagated for all pixel locations regardless of their value. For the last 12 channels, gradients were back-propagated exclusively for pixel locations where the target output is assigned (non-zero). Therefore, a specific output was only encouraged in the regions surrounding the point locations. This type of selective training helped to ensure that the vector outputs did not tend toward zero in areas where part detections were uncertain. In essence, this approach was meant to separate the tasks of detection and association mapping.
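
A sketch of this selective objective in PyTorch follows; the framework choice and the equal weighting of the two terms are assumptions, and treating exactly-zero offset targets as unassigned is a simplification.

```python
import torch

def detection_association_loss(pred, target):
    """Mean-squared-error objective with the selective back-propagation
    described above, for (batch, 16, H, W) tensors: channels 0-3 (part
    detections) are penalized at every pixel; channels 4-15 (association
    vectors) only where the target is assigned (non-zero)."""
    det_loss = ((pred[:, :4] - target[:, :4]) ** 2).mean()
    mask = (target[:, 4:] != 0).float()                  # assigned offsets only
    assoc_err = ((pred[:, 4:] - target[:, 4:]) ** 2) * mask
    assoc_loss = assoc_err.sum() / mask.sum().clamp(min=1.0)
    return det_loss + assoc_loss
```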

### Receptive Field

When designing neural networks for visual tasks, it is important that the network be able to "see" the entirety of the objects it is considering. This viewable area is often referred to as the receptive field. To derive the receptive field, the effective stride between adjacent coordinates in a feature map is calculated using

$$s\_{l,\text{effective}} = s\_{l-1,\text{effective}} \times s\_l, \tag{7}$$

where $s\_l$ is the stride length at layer block $l$ in the network and $s\_{0,\text{effective}} = 1$. Note that, in the proposed network, all max pooling layers have $s\_l = 2$ and all max unpooling layers have $s\_l = 0.5$, while all other layers have $s\_l = 1$. The effective stride essentially relates the resolution of a downsampled feature map to the original input size. Given $s\_l$ for all $l$ in the network, the receptive field size can be calculated using

$$r\_l = r\_{l-1} + (w\_l - 1) \times s\_{l-1,\text{effective}}, \tag{8}$$

where $w\_l$ is the convolutional kernel width at layer $l$ and $r\_0 = 1$. In the proposed network, each convolutional kernel has width $w\_l = 3$. Because of the stochastic nature of the max pooling and max unpooling operations, it is difficult to define their effective kernel size; in this analysis, we therefore use the lower bound $w\_l = 1$ for all pooling operations.

Table 3 provides the receptive field of the network as a function of layer block for a subset of the 41 layer blocks featured in Figure 5. The receptive field represents the width of a square region in the original image space that affects a single pixel location at the output. While the receptive field of the proposed network is 363, the distance between any two image locations that can affect each other's outputs is 181 (the radius of the receptive field). In practice, it is recommended that the receptive field be considerably larger than the maximum object size because of a decaying effect observed toward the outer extremes of the square region [65]. It will be shown in Section 4 that the chosen image scale results in pigs that are typically much smaller than the receptive field radius.
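
The two recurrences are easy to evaluate programmatically; the sketch below reproduces Equations (7) and (8) for an arbitrary layer stack (the toy stack shown is illustrative, not the paper's 41-block network).

```python
def receptive_field(layers):
    """Evaluate the recurrences of Equations (7) and (8).
    layers: list of (kernel_width, stride) per layer block; pooling and
    unpooling blocks use the lower-bound width w = 1 adopted above."""
    r, s_eff = 1, 1.0
    for w, s in layers:
        r += (w - 1) * s_eff   # Equation (8): receptive field growth
        s_eff *= s             # Equation (7): effective stride update
    return r

# Toy stack: two 3x3 convolution blocks, a 2x2 max pool (stride 2),
# then two more 3x3 convolution blocks.
print(receptive_field([(3, 1), (3, 1), (1, 2), (3, 1), (3, 1)]))  # -> 13.0
```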

**Figure 5.** The hourglass-shaped network used by the proposed method to convert images to 16-channel image-space instance detection maps.


**Table 3.** Sampling of the receptive field calculations at the output of every layer of the proposed network. The different types of layers are abbreviated with the following notation: I: input image, C: convolution block, M: max pooling, U: max unpooling, D: depth concatenation, O: output image

### **4. Experimental Results**

### *4.1. Dataset*

To the best of our knowledge, no open-source dataset exists for pig detection in group-housing environments. Therefore, to enable quantitative evaluation, a new dataset with 2000 annotated images of pigs was introduced. The dataset (http://psrg.unl.edu/Projects/Details/12-Animal-Tracking) depicts 17 different pen locations and includes pigs ranging in age from 1.5 to 5.5 months. Each unique image was randomly extracted from video recordings spanning multiple weeks in each location. More than two hours, on average, existed between samples at each location. Thus, a wide range of unique animal poses were represented in the dataset.

The dataset was divided into two subsets: 1600 images for training and 400 images for testing. Furthermore, the 400 testing images were subdivided into two additional subsets: 200 captured in the same environments seen in the training set (test:seen), and 200 images from environments previously unseen in the training set (test:unseen). The cameras used to capture the images included both a Microsoft Kinect v2 color camera with resolution 1080 × 1920 and Lorex LNE3162B and LNE4422 color/IR cameras with resolution 1520 × 2688. All of the environments were captured with the camera mounted above the pen looking down. The distance between the pen floor and the camera varied between 2.5 and 6.0 m, and the specific poses of the cameras ensured that the animal pen of interest was centered and entirely contained within the field of view. Variations in environment and image capture technology were used to ensure that the analysis emphasizes robustness.

Figure 6 shows sample images from the training set, depicting 13 different pen locations with color-coded annotations for each hand-labeled body part. Note that the last two images in Figure 6 depict the same environment, but one was captured with full color in the daytime and the other was captured with active IR at night. The first 200 images of the testing set (test:seen) were captured in the same environment as the training set, but at different times. Because more than two hours existed between subsequent randomly sampled images, it is likely that each test:seen image contained different animal poses than each training set image.

Figure 7 illustrates six sample images of the 200 images from the test:unseen set. Note that, not only was this environment previously unseen in the training set, but this set also included challenging lighting conditions that were not represented among the training images. Twenty images from the training set were captured with the camera's IR night vision activated, but all of the remaining 1580 training set images (and all of the test:seen images) were captured with overhead lights on. To achieve the challenging lighting conditions present in the test:unseen set, the lights were turned on at 6 am and off at 6 pm every day. For a short duration between approximately 6 pm and 8 pm, ambient lighting dimly illuminated the pens. After 8 pm, the cameras activated night-vision mode and captured IR images while actively illuminating the scene with built-in IR lights. Two of the four pens presented in the test:unseen set were also illuminated with IR flood lights. This had the effect of creating well-lit scenes with harsh shadows and side-lighting.

In each of the images, a user manually annotated the location of the left ear (red), right ear (green), shoulder (blue), and tail (yellow) for each visible animal, in that order. Annotations belonging to the same instance are connected with a continuous black line. If the ears were not visible, they were not annotated; however, emphasis was placed on annotating both the shoulder and tail for each instance even when these locations were occluded, i.e., both shoulder and tail were annotated as long as they were located in the pen of interest and their estimated positions were within the field of view of the camera.

It should be noted that, in some cases, pigs from adjacent pens were partially visible through the bars that separate the pens. These partially visible pigs were not annotated. It was assumed that a camera placed above a pen is responsible for detecting only the animals in that pen and, while some areas of the image belonging to the adjacent pen were masked out, it was difficult to remove partially visible pigs from the image without unintentionally masking out pigs within the pen of interest. In practice, areas of interest were defined by polygons for each image in the dataset, and masking out was done by setting all pixels outside the area of interest to pure black. Examples of masked-out regions can be seen in Figures 6 and 7, where the blacked-out regions correspond to areas with pigs in adjacent pens.
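A minimal sketch of this masking step, assuming Matplotlib's point-in-polygon test and our own function name, is shown below.

```python
import numpy as np
from matplotlib.path import Path

def mask_outside_polygon(image, polygon):
    """Set pixels outside the area-of-interest polygon to pure black.

    image:   HxWx3 uint8 array
    polygon: list of (x, y) vertices defining the pen of interest
    """
    h, w = image.shape[:2]
    xs, ys = np.meshgrid(np.arange(w), np.arange(h))
    inside = Path(polygon).contains_points(
        np.column_stack([xs.ravel(), ys.ravel()])).reshape(h, w)
    out = image.copy()
    out[~inside] = 0  # black out everything outside the pen
    return out
```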

**Figure 6.** Sample images depicting different environments represented in the training set. The first 13 images (left-to-right, top-to-bottom) were captured during daylight hours with lights on. The last image (from the same environment as the 13th image) was captured using the infrared night vision mode used by the Lorex LNE3162B with active IR illumination.

**Figure 7.** Sample images from the "unseen" portion of the testing set (test:unseen). These images depict environments and lighting conditions not represented in the training set.

### *4.2. Training Details*

Prior to training the network, images were downsampled so that the number of columns was 480. This was empirically deemed to be a sufficient resolution for discerning the parts of interest while remaining small enough for the computing hardware to process multiple images per second. The average length of pigs in each image after downsampling is presented in the histogram of Figure 8. While the majority of the pigs had a body length of less than 100 pixels, some exceeded 140 pixels in length. For these pigs, it was important that the network was able to "see" the entirety of the pig as it estimated the body part locations and vector associations. In Section 3.4, the radius of the receptive field of the proposed network was found to be 181. Therefore, the network was capable of observing the entire animal even in the most extreme images, where the shoulder-to-tail distance approached 140 pixels.

**Figure 8.** Distribution of the average length from shoulder to tail in each partition of the dataset.

Target images for training the fully-convolutional network were created by adapting the size of the Gaussian kernels used to mark each part in channels 1–4 (Figure 1b) to the size of the animals. This adaptation encouraged continuity of image-space annotations between different environments and ages/sizes of pigs. Specifically, this was done by first computing the average distance between the shoulder and tail over all instances, denoted *μ*s→t, to provide a numerical representation of the average size of pigs in the image space. Then, the shoulder-to-tail distance for instance *n*, given by *δ*s→t, was combined with the average distance to compute the Gaussian kernel standard deviation, defined as *σn* = 0.16 × (*μ*s→t + *δ*s→t). This combination prevented unusual animal poses from shrinking the kernels too much, while still allowing some adaptation to individual size variations. The scale factor of 0.16 was determined empirically to provide a suitable approximation to the variability of human annotations. If *σn* was too large, kernels belonging to nearby pigs interfered with each other and often resulted in a single part location being extracted by regional maximum detection. When *σn* was too small, the network training unfairly penalized kernels that were not exactly equal to the location provided by the human annotator, even if the kernel's location was within the natural variance of human annotations. Finally, the Gaussian kernels were multiplied by a scalar so that their maximum value was 1.0 and, in cases where two nearby Gaussian kernels for the same body part intersected, the output was assigned the maximum of the two kernel values. Scaling the kernels so that the peak was 1.0 helped ensure that a fixed threshold could be used in peak detection regardless of *σn*.
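A minimal sketch of this kernel-rendering step is given below, assuming NumPy; the function name and array layout are our own, not the authors' implementation. Using an unnormalized Gaussian makes each kernel peak at exactly 1.0, standing in for the scalar rescaling described above.

```python
import numpy as np

def part_target_map(height, width, parts, mu_st, deltas_st):
    """Render one target channel (a Gaussian kernel per annotated part).

    parts:     list of (row, col) locations for one body part
    mu_st:     average shoulder-to-tail distance over all instances (px)
    deltas_st: per-instance shoulder-to-tail distances, aligned with parts
    """
    ys, xs = np.mgrid[0:height, 0:width]
    target = np.zeros((height, width), dtype=np.float32)
    for (r, c), delta in zip(parts, deltas_st):
        sigma = 0.16 * (mu_st + delta)  # per-instance kernel width
        g = np.exp(-((ys - r) ** 2 + (xs - c) ** 2) / (2.0 * sigma ** 2))
        target = np.maximum(target, g)  # overlapping kernels take the max
    return target
```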

The circular regions used to assign association vectors between parts in channels 5–16 (Figure 3b) should ideally cover all possible pixel locations where the part might be detected. In practice, this area was sufficiently covered by considering regions where the Gaussian kernel for each part had a magnitude greater than 0.2. In situations where one region intersected with another, the target output vector was composed of a weighted combination of the two intersecting vectors, with weights given by the corresponding Gaussian kernel magnitude at each pixel location.
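A corresponding sketch for one pair of association channels follows, again with hypothetical names; it writes each instance's (dx, dy) offset wherever that instance's kernel exceeds 0.2 and blends intersections using the kernel magnitudes as weights.

```python
import numpy as np

def association_target(height, width, kernels, vectors, thresh=0.2):
    """Render one (dx, dy) association channel pair.

    kernels: list of HxW Gaussian maps, one per instance (peak 1.0)
    vectors: list of (dx, dy) offsets from this part to its paired part
    """
    out = np.zeros((height, width, 2), dtype=np.float32)
    wsum = np.zeros((height, width), dtype=np.float32)
    for g, (dx, dy) in zip(kernels, vectors):
        mask = g > thresh                 # region covered by this instance
        out[mask, 0] += g[mask] * dx
        out[mask, 1] += g[mask] * dy
        wsum[mask] += g[mask]
    nz = wsum > 0
    out[nz] /= wsum[nz, None]             # weighted average at intersections
    return out
```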

The network was trained using heavy augmentation of both input and target images. Augmentations included random left-right flipping, random rotations sampled from a uniform distribution ranging from 0 to 360°, random scaling sampled uniformly between 0.5 and 1.5, and XY shifts uniformly sampled from the range of ±20 pixels along both dimensions. Careful consideration was needed when augmenting the 16-channel target image. Rotations and scaling were applied spatially to both the association vector regions and the output values along pairs of channels that correspond to XY offsets between body parts, as shown in the sketch below. Left-right flips were handled by switching the labels for the left and right ears.
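Rotating a vector-valued target involves two coupled operations: rotating the channel grids spatially and rotating the (x, y) values stored in them. A sketch of this step, assuming SciPy and image-coordinate sign conventions, follows.

```python
import numpy as np
from scipy.ndimage import rotate

def rotate_vector_channels(vx, vy, degrees):
    """Rotate one XY pair of association channels.

    Plain image-style augmentation would only rotate the grids; the
    (vx, vy) values must be rotated by the same angle as well. The sign
    convention below assumes y increases downward (image coordinates).
    """
    th = np.deg2rad(degrees)
    rx = rotate(vx, degrees, reshape=False, order=1)  # spatial rotation
    ry = rotate(vy, degrees, reshape=False, order=1)
    vx_new = np.cos(th) * rx - np.sin(th) * ry        # rotate the values
    vy_new = np.sin(th) * rx + np.cos(th) * ry
    return vx_new, vy_new
```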

### *4.3. Processing Details*

After obtaining the 16-channel mapping from the trained network, each of the part location maps (channels 1–4) and the association maps (channels 5–16) was smoothed using a 5 × 5 averaging box filter. This step would not be necessary to extract information from ground truth mappings, but it was beneficial for reducing the effects of noise on regional maximum detection. In practice, box filtering was done by adding an average pooling layer to the end of the neural network. The regions R(*x*, *y*) used in (1) consisted of a 15 × 15 window surrounding each pixel (*x*, *y*).
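This post-processing can be expressed compactly, assuming SciPy and our own function name: a 5 × 5 box filter followed by a 15 × 15 regional maximum test against the part detection threshold.

```python
import numpy as np
from scipy.ndimage import uniform_filter, maximum_filter

def detect_parts(channel, box=5, window=15, thresh=0.25):
    """Smooth one part-location channel, then keep pixels that are the
    regional maximum of their window and exceed the detection threshold."""
    smoothed = uniform_filter(channel, size=box)
    peaks = (smoothed == maximum_filter(smoothed, size=window)) \
            & (smoothed > thresh)
    return np.argwhere(peaks)  # (row, col) part locations
```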

The method was implemented in Matlab 2018b using the deep learning toolbox [66]. A desktop computer equipped with an Intel i7-6700K CPU, 32 GB of RAM, and an NVIDIA GTX1070 GPU was used for training and inference. The approximate time required by the fully-convolutional neural network to perform forward inference was 0.24 s, and it took an additional 0.01 s to find instance locations. Thus, the system was capable of fully processing four frames per second.

### *4.4. Instance Detection Performance Metric*

The goal of the proposed method was to identify the location and orientation of each pig in a given image. Although the method generated detections and associations for four body parts, only the shoulder and tail locations were used to identify a complete instance. This decision was based on two factors: first, they are sufficient for approximating the center-of-mass location and orientation of each animal and, second, special emphasis was placed on ensuring their labeling by human annotators. Given a complete set of *N* ground truth shoulder-tail pairs {(**s**1, **t**1), ... , (**s***N*, **t***N*)} and a set of *M* estimated shoulder-tail pairs {(**s̃**1, **t̃**1), ... , (**s̃***M*, **t̃***M*)}, an association method was needed to determine whether an estimate corresponded to the ground truth, since both sets of pixel coordinates were unlikely to contain exactly the same values.

Bipartite matching problems are commonly solved with the Hungarian algorithm; however, this can sometimes lead to matches between far-away pairs in order to minimize a global cost. For this particular matching problem, where shoulder-tail pairs are associated with each other, there was likely to be very little distance between the ground truth and detected positions. Setting a maximum allowed distance between matching pairs can fix this issue, but it comes at the cost of introducing additional parameters that depend on image resolution and the relative sizes of animals. To avoid parameterization, a strict cross-check matching criterion was used here to assign estimates to the ground truth only when they were each other's minimum-cost matches. More formally, two instances *n* and *m* matched if and only if

$$m = \underset{m' \in \{1, \dots, M\}}{\operatorname{argmin}} \ \|\mathbf{s}_n - \tilde{\mathbf{s}}_{m'}\| + \|\mathbf{t}_n - \tilde{\mathbf{t}}_{m'}\| \tag{9}$$

and

$$n = \underset{n' \in \{1, \dots, N\}}{\operatorname{argmin}} \ \|\mathbf{s}_{n'} - \tilde{\mathbf{s}}_m\| + \|\mathbf{t}_{n'} - \tilde{\mathbf{t}}_m\|,\tag{10}$$

where ‖·‖ denotes the L2 norm. Figure 9 illustrates the advantage of using the cross-check method instead of the unparameterized Hungarian algorithm.
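A direct NumPy transcription of this mutual minimum-cost criterion (our own formulation of Equations (9) and (10), not the authors' code) might look as follows.

```python
import numpy as np

def cross_check_match(gt, est):
    """Mutual nearest-neighbor matching of shoulder-tail pairs.

    gt:  (N, 4) array of ground truth [s_x, s_y, t_x, t_y] rows
    est: (M, 4) array of estimated   [s_x, s_y, t_x, t_y] rows
    Returns (n, m) index pairs matched only when each is the
    other's minimum-cost choice.
    """
    cost = (np.linalg.norm(gt[:, None, :2] - est[None, :, :2], axis=2)
            + np.linalg.norm(gt[:, None, 2:] - est[None, :, 2:], axis=2))
    best_m = cost.argmin(axis=1)  # best estimate for each ground truth
    best_n = cost.argmin(axis=0)  # best ground truth for each estimate
    return [(n, m) for n, m in enumerate(best_m) if best_n[m] == n]
```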

**Figure 9.** Example with two ground truth instance locations (solid circles connected with black lines) and two detected instances (empty circles connected with white lines) using matching results achieved with both the Hungarian algorithm and the proposed cross-check matching of Equations (9) and (10). While the detection *x* and ground truth location *b* in the middle are clearly nearest neighbors of one another, they are not matched by the Hungarian algorithm. Instead, in an effort to minimize the global matching cost, the Hungarian algorithm will assign *a* to *x* and *b* to *y*. In contrast, the cross-check matching method leaves the outer detection *y* and ground truth location *a* unmatched while assigning the two in the middle, *x* and *b*, together.

### *4.5. Instance Matching Results*

In order to evaluate the effectiveness of the proposed vector part association method, it was compared to an alternative Euclidean part association method. The Euclidean part association method joins parts together by simply minimizing their Euclidean distance. This method, previously illustrated in Figure 2, removes the effects of part association vectors on detection performance and allows for a partial ablation study. Figure 10 presents the precision and recall for all three partitions of the dataset and Table 4 presents full numerical results over the training set. Each sample along the curve corresponds to a different threshold for part detection, where parts are detected only if the neural network output in channels 1–4 exceeds the threshold value.

The results show that the proposed vector part association method provided a significant boost to matching precision when compared to Euclidean part association. Less than 0.1% of detections were false positives, compared to more than 5% when using Euclidean matching, regardless of threshold. Figure 10a,b illustrate nearly identical results across the training and test:seen sets. This provides a strong indication that the method was not overfitting to the specific animal poses presented in the training set. Both Figure 10a,b demonstrate a minimum precision of ≈0.91 at a recall of ≈0.42 for the Euclidean matching method. This was because, at this threshold, less than half of the animal parts were being detected, but the ones that were detected were matched to their nearest neighbors. As a result, there was a relatively high likelihood that only a shoulder was detected, but not a tail, or vice versa. In an effort to form whole instances, the Euclidean method simply joined together nearest neighbors, and many of these instances were not aligned with the ground truth. When the threshold was adjusted higher or lower, there was a higher likelihood that either both shoulder and tail were detected or neither was detected. In either case, this led to improved precision, because the instances that were identified were more likely to be true positives.

**Figure 10.** Precision-recall curves for both the proposed method and an alternative association strategy that assigns parts to one another by minimizing Euclidean distance. The results are nearly identical across the training set (**a**) and test:seen set (**b**), and both illustrate a dramatic improvement achieved by joining parts together using the proposed vector mapping. The results on the test:unseen set (**c**) illustrate the limitations of the method when operating on different environments than those used for training.



In Table 5, the results are compared across all three partitions of the dataset with a fixed threshold of 0.25. The fixed threshold of 0.25 yielded an F-measure of 0.981, slightly below the peak F-measure of 0.987 achieved at a threshold of 0.1; however, the lower threshold produced more than twice as many false positives. When F-measure values were nearly identical, the choice of threshold depended on how sensitive an application was to false positives and false negatives. The comparison at threshold 0.25 highlighted both the performance similarities across the training and test:seen sets and the discrepancies between those sets and the test:unseen set. One interpretation is that the discrepancy illuminates the importance of environment and lighting variations when training the neural network. The test:seen results were nearly identical to the training results, even though the specific poses of each animal were previously unseen. However, due to the use of heavy image augmentation, similar poses were likely represented during training. In contrast, the test:unseen results were much worse, likely due to the novel environments and challenging lighting conditions not present in the training set images.
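The precision, recall, and F-measure values discussed here follow the standard definitions from counts of true positives (TP), false positives (FP), and false negatives (FN); a minimal sketch:

```python
def precision_recall_f1(tp, fp, fn):
    """Standard detection metrics from match counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Example with hypothetical counts (not values from Table 5):
print(precision_recall_f1(tp=950, fp=10, fn=40))
```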

**Table 5.** Instance detection results for the training set (1600 images), the test set with images of the same environments in the training set (200 images), and the test set with images of new environments and challenging lighting conditions (200 images). The part detection threshold was fixed at 0.25.


By digging deeper into the results and looking at specific examples, it is possible to learn more about the performance discrepancies. Examples of 100% successful detections from both the test:seen and test:unseen sets are shown in Figure 11. Here, the neural network output is illustrated for each of the 16 channels, and the final detections are shown below, where a green line connecting the shoulder to the tail defines a true positive detection. Note that, unlike the target part association maps illustrated in Figure 3, the outputs of the neural network do not clearly conform to the part locations. This is because the network was only trained to produce the correct vector (illustrated by color) at part locations; at all other locations, the network essentially tried to minimize the cost with hypothetical part association vectors in case a part was present in that location. This attempt to minimize cost "in case" was most visible when comparing the part association maps of shoulder-to-tail and tail-to-shoulder (the bottom two part association maps in Figure 11). Even when the network was highly confident that a location belonged to a tail, it produced an association vector pointing from shoulder-to-tail at that location, in case a shoulder at that location had been mistaken for a tail.

Due to the similar lighting and overall appearance of the test:seen image in Figure 11a, the method was able to identify every instance within the pen environment with high confidence (as indicated by the first four channel outputs of the neural network). However, in the test:unseen image (Figure 11b), the pig behind bars in the adjacent pen caused some confusion in the network. This was likely due to the fact that the network had never been exposed to this particular pen environment and thus had not been trained to ignore partial animals on the other side.

Alternatively, Figure 12 illustrates failure cases for both test:seen and test:unseen images. Each of the failures in the test:seen image occurred because of occlusions that made it difficult to discern the location of the shoulders and/or tail. In this case, it was difficult even for a human observer to confidently assign the ground truth locations. On the other hand, failures on the test:unseen image were not due to occlusions. They can instead be attributed to the unusual lighting conditions and the relatively large presentation of the animals in the image. Neither of these properties was represented in the training set, making it difficult for the neural network to interpret the image.

Figures 13 and 14 illustrate 24 failures from the test:seen and test:unseen sets, respectively. In the test:seen sample set of Figure 13, 17 of the 23 false negatives can be attributed to occlusions or lack of visibility when the pig approached the edge of the image. Other causes of error included unusual poses where the head was hidden and situations where the pig had atypical markings. In contrast, only four of the 21 false negatives from the test:unseen sample set (Figure 14) can be attributed to occlusion. At least 10 can likely be attributed to lighting conditions. All three false positives occurred when a pig in an adjacent pen was lying next to the dividing bars. The outline of the bars on the pig's body appeared to confuse the network into interpreting this as a smaller body pointed in the orthogonal direction.


**Figure 11.** Examples of successful instance detection from both the test:seen set (**a**) and the test:unseen set (**b**). The top images depict the first four channels of the neural network output. The middle image composed of six sub-images depicts the color-coded vector associations from the last 12 channels of the neural network output. The bottom images depict both ground truth locations and estimates using the following color coding: false negative (blue), and true positive (green). Note that these images depict 100% successful detections, so only true positives are present.


**Figure 12.** Examples of unsuccessful instance detection from both the test:seen set (**a**) and the test:unseen set (**b**). The top images depict the first four channels of the neural network output. The middle image composed of six sub-images depicts the color-coded vector associations from the last 12 channels of the neural network output. The bottom images depict both ground truth locations and estimates using the following color coding: false negative (blue), and true positive (green).

**Figure 13.** Twenty-four random samples of unsuccessful instance detections from the test:seen set using the following color coding: false negative (blue), and false positive (red).

**Figure 14.** Twenty-four random samples of unsuccessful instance detections from the test:unseen set using the following color coding: false negative (blue), and false positive (red).

### *4.6. Discussion*

The proposed method focuses on detecting the location and orientation of individual pigs in group-housing environments. Due to the lack of an available public dataset, it was necessary to create a new collection of annotated images. This presented a challenge in terms of capturing the appropriate level of variability and, while we believe that the chosen images sufficiently represent a variety of environments and ages of pigs, it would likely have been beneficial to include more camera angles and more than three types of cameras. Furthermore, the four body part locations were arbitrarily chosen as representatives of the location and orientation of each animal instance. A different set of body parts might have been equally effective.

Compared to datasets like ImageNet [67] and COCO [33], 2000 images may seem like an insufficient number of images to train a deep network. However, pig detection from an overhead camera is a much more specific task than classifying images into one of 1000 categories. With nearly 25,000 different animal poses captured, it is likely that any new pose presented to the network will strongly resemble one that already exists in the dataset. Augmentation is also critical to the success of network training. The chosen network contains nearly 4,000,000 coefficients, so it might be possible to overfit to 25,000 static animal poses, but it is much more difficult to overfit when the angle, size, and left-right orientation are randomized.

The fully-convolutional network introduced in Section 3.4 to estimate body part locations and association vectors was designed with sufficient complexity and a wide enough receptive field to achieve high performance levels in terms of precision and recall. However, the chosen hourglass architecture using max pooling/unpooling with skip connections and depth concatenations is almost certainly one of many network architectures capable of producing the desired 16-dimensional output representation. Atrous convolution, for example, has been shown to be particularly effective for creating a large receptive field [68], and spatial pyramid pooling has further been demonstrated to achieve excellent performance on multi-scale tasks [69]. A thorough investigation into the suitability of different network architectures might lead to a more accurate or efficient network for instance detection.

By inspecting the specific outputs of the network and the instance formation process, it becomes evident that errors are most commonly caused when the shoulder or tail of one pig occludes the same body part on another pig. Due to the network's inability to represent multiple part instances at the same image space location, only one part instance can be detected in these situations. The vector associations inherently estimate the location of adjacent body parts; therefore, the occlusion can be inferred from the existing output of the network. Alternatively, it might also be possible to augment the dataset to explicitly label occlusions and build the network to detect such events.

In addition to shoulders and tails, the left and right ear were annotated in the dataset and explicitly detected by the network. While the results for instance-level detection do not evaluate the quality of these detections, we foresee them being integrated into future systems as a way to uniquely identify each instance. Ear tags are a common way for livestock to be identified in commercial facilities, and this may provide a convenient way to differentiate between individuals in an otherwise homogeneous population.

Perhaps the most likely application of this detection method would be within a larger tracking system, where detection serves as the first stage for video processing. To this end, the proposed per-frame detection method naturally lends itself to multi-object tracking (MOT). Specifically, a sub-category known as tracking-by-detection MOT methods directly process the outputs of per-frame detection methods, and their performance is often strongly tied to the quality of the detector [70]. For this reason, it is expected that high quality detection methods will eventually contribute to more reliable methods for multi-object tracking.

### **5. Conclusions**

The proposed method and accompanying dataset introduced in this paper attempt to provide a robust solution to instance-level detection of multiple pigs in group-housing environments. A major contribution of this work is the introduction of an image space representation of each pig as a collection of body parts along with a method to join parts together to form full instances. The method for estimating the desired image space representation leverages the power of deep learning using a fully-convolutional neural network. Through gradual downsampling and upsampling, the network is able to consider large regions in the image space with a receptive field that covers even the largest pigs in the dataset.

Results demonstrate that the method is capable of achieving over 99% precision and over 95% recall at the task of instance detection when the network is tested and trained under the same environmental conditions. When testing on environments and lighting conditions that the network had not been trained to handle, the results drop significantly to 91% precision and 67% recall. These results can be interpreted in one of three ways: (1) networks should be fine-tuned to handle new environments, (2) a larger number and variety of images should be included in the dataset, or (3) the design and/or training methodology should be revised to improve robustness to environmental variability. As the dataset and the number of environments grow, there might eventually be enough variety that new environments add little to the network's ability to handle novel presentations. Regarding the third interpretation, while significant augmentations were applied to the input and output images during training, it is impossible for spatial transformations to mimic variations in lighting conditions. Therefore, a new set of non-uniform color-space transformations may provide a solution that improves the robustness of the trained network.

By introducing a new dataset and providing an accompanying method for instance-level detection, we hope that this work inspires other researchers to introduce competing methods and perform objective evaluations on the dataset.

**Author Contributions:** E.P., M.M., L.C.P., T.S. and B.M. conceived and designed the experiments and assisted in data acquisition. E.P. designed and implemented the neural network algorithm and performed the formal analysis. E.P., M.M., L.C.P., T.S. and B.M. wrote the paper.

**Funding:** This research was supported by the National Pork Board (NPB #16-122).

**Acknowledgments:** We would like to thank Lucas Fricke and Union Farms for their cooperation and allowing us to capture data in their swine facility.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Monitoring of the Pesticide Droplet Deposition with a Novel Capacitance Sensor**

### **Pei Wang 1,2,\*,†, Wei Yu 2,†, Mingxiong Ou 1,2, Chen Gong 1,2 and Weidong Jia 1,2,\***


Received: 25 December 2018; Accepted: 24 January 2019; Published: 28 January 2019

**Abstract:** Rapid detection of spray deposit can contribute to the precision application of plant protection products. In this study, a novel capacitor sensor system was implemented for measuring the spray deposit immediately after herbicide application. Herbicides in different formulations and nozzles of different types were included to test their impact on the capacitance of this system. The results showed that there was a linear relationship between the deposit mass of spray droplets on the sensor surface and the digital voltage signals representing the capacitance. The linear models were similar for water and for spray mixtures with non-ionized herbicides, which usually come in formulations of emulsifiable concentrates and suspension concentrates. However, ionized herbicides in the aqueous solution formulation presented a distinct linear model. With this novel sensor, it is possible to monitor the deposit mass in real time shortly after pesticide application. This will contribute to the precision application of plant protection chemicals in the field.

**Keywords:** capacitor sensor; deposit mass; pesticide droplets; formulations; ionization

### **1. Introduction**

The usage of pesticides has increased significantly in the last two decades in China [1]. Meanwhile, the average application dose is 2.5 times the global average [2,3]. The overuse of pesticides causes unnecessary investment for farmers, leads to more residue of pesticide ingredients in food products and soil, and damages the ecosystem as well [4–6]. Furthermore, it can induce resistance in weeds, insects, and diseases, which makes pest management strategies more complicated [7,8]. Thus, effective approaches to monitor droplet deposition doses during pesticide application are in urgent demand.

To enhance the efficiency of assessing spray deposition quality, several approaches have been introduced based on image processing technology [9]. Various systems are available for users to identify the pesticide deposition effect by analyzing images of water-sensitive paper after the application, for instance, the Swath Kit [10], the USDA Image Analyzer [11], DropletScan [12], DepositScan [13], ImageJ [14], and DropVision-Ag [15]. Most of these technologies are applied to measure coverage, droplet density, and/or droplet sizes. Few methodologies have been reported to evaluate droplet deposition doses after pesticide application.

Conventionally, the droplet deposition dose is measured based on an elution procedure [16]. Pigments such as poinsettia and methylene blue are usually selected to replace the pesticide in the spray preparation. After the application, plant leaves or Petri dishes are collected, and the pigment ingredients are eluted with deionized water from the leaf surfaces or the Petri dishes. The deposition dose can then be calculated from the concentration of the eluent measured with a spectrophotometer [17]. Salyani and Serdynski [18] reported a prototype sensor for spray deposit monitoring based on measuring the electric capacitance of parallel copper conductors with spray deposited in the gaps between them. It could thus indicate the mass of the droplet deposit with voltage signals, enabling real-time and rapid measurement of pesticide deposition doses. Recently, Zhang et al. [19] applied a similar system for droplet deposition evaluation of aerial spraying. Meanwhile, a fringing capacitive sensor with similar principles was also applied for water content measurement in the field [20]. However, the prototype equipment was tested with electrolyte solutions of high conductivity, such as NaCl and NaOH, which is not a common feature of typical spray mixtures. Apart from aqueous solutions and emulsions in water whose ingredients are ionic compounds, most pesticide formulations contain organic ingredients that hardly ionize in the water mixture. Thus, further studies are required to verify the fitness and measurement accuracy of this novel sensor before its application in agricultural practice.

The objectives of this study were, first, to test the sensor's capability for deposition dose measurement of spray mixtures with pesticides in different formulations and, second, to evaluate the measurement accuracy for droplet deposition doses of spray mixtures with several common pesticide formulations, including emulsifiable concentrates (EC), suspension concentrates (SC), and aqueous solutions (AS).

### **2. Materials and Methods**

### *2.1. Implementation of the Droplet Deposit Sensing System*

A leaf-like capacitor (Yingtai Tech., Tianjin, China) was adopted for the implementation of the droplet deposit sensing system in this study. The sensor was designed as a capacitor with 84 parallel copper electrodes (Figure 1). The electrodes were separated into two groups, each connected as one of the two plates of the capacitor. The whole structure of the capacitor was packaged (painted) with an insulating ceramic coating. Lim et al. [21] showed that the capacitance varies according to the dielectric constant of the media composition (air or spray) inside the gap between the electrodes. The dielectric constant changes when the ratio of each component in the media composition varies. The capacitance can be calculated with the equation

$$C_s = \frac{\varepsilon_a S_a + \varepsilon_s S_s}{d} \tag{1}$$

where *Cs* is the capacitance of the capacitor with droplets deposited on it, *εa* is the dielectric constant of the air, *εs* is the dielectric constant of the spray mixture, *Sa* is the surface area of the electrode in contact with the air, *Ss* is the surface area of the electrode in contact with the spray droplets, and *d* is the distance between the copper electrodes.

When considering the total area of the electrodes (*S*), which is the sum of *Sa* and *Ss*, the equation for the capacitance can be rewritten as

$$C_s = \frac{\varepsilon_a (S - S_s) + \varepsilon_s S_s}{d} = \frac{\varepsilon_a S}{d} + \frac{(\varepsilon_s - \varepsilon_a) S_s}{d} = C_0 + \frac{(\varepsilon_s - \varepsilon_a) S_s}{d} \tag{2}$$

where, *C*<sup>0</sup> is a constant for each sensor.

Theoretically, the surface area of the electrode in contact with the spray droplets (*Ss*) can be calculated from the droplets' mass (*ms*) and density (*ρs*). The deposit mass of the spray can then be calculated following the equation

$$m_s = \rho_s S_s d = \rho_s \frac{(C_s - C_0)\, d}{\varepsilon_s - \varepsilon_a}\, d = C_s \frac{\rho_s d^2}{\varepsilon_s - \varepsilon_a} - \frac{\rho_s C_0 d^2}{\varepsilon_s - \varepsilon_a} \tag{3}$$

Thus, a linear model can be fitted to calculate the deposit mass of the spray from the measured capacitance of the capacitor with droplets deposited on it.
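As a sketch of how Equation (3) might be applied in software, the function below recovers the deposit mass from a calibrated capacitance reading; all parameter names are our own, and the calibration constants are placeholders, not values from this study.

```python
def deposit_mass(c_s, c_0, d, rho_s, eps_s, eps_a=1.0):
    """Equation (3): m_s = rho_s * S_s * d, with
    S_s = (C_s - C_0) * d / (eps_s - eps_a).

    c_s    measured capacitance with droplets deposited
    c_0    baseline capacitance of the dry sensor
    d      distance between the copper electrodes
    rho_s  density of the spray mixture
    eps_s  dielectric constant of the spray
    eps_a  dielectric constant of air (approximately 1)
    """
    s_s = (c_s - c_0) * d / (eps_s - eps_a)  # wetted electrode area
    return rho_s * s_s * d
```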

The leaf-like capacitor adopted in this study provides analog signals that indicate its capacitance. The sensor is implemented with an operational amplifier circuit, as shown in Figure 1d. Connected with a capacitor of fixed capacitance, the leaf-like sensor takes a different share of the voltage when the dielectric constant inside it changes. The capacitor was linked to an analog-to-digital converter (ADS1115, Texas Instruments, Dallas, TX, USA), which converted the analog signal of the real-time capacitance into digital voltage signals. The digital signal was then processed by a 32-bit microcontroller (STM32F4 EXPLORER, ST Microelectronics, Geneva, Switzerland). The microcontroller read the input digital signal and displayed the voltage information representing the analog signal on a thin film transistor-liquid crystal display (TFT screen). Linear models of deposit mass against the voltage signals were developed for the different herbicide sprays in this research and installed in the microcontroller. Thus, the microcontroller could calculate the deposit result of each measurement and display it on the TFT screen. All measurement results were recorded on a trans-flash card (SD card) mounted on the microcontroller processing board. Figure 1 presents the structure of the system.

**Figure 1.** The structure of the leaf-like capacitor sensor and the deposit monitoring system. (**a**) The leaf-like sensor. (**b**) The electrode structure on the resin board. (**c**) The implementation diagram of the deposit monitoring system. (**d**) An electric schematic of the circuit of the leaf-like sensor. 1 = electrodes, 2 = capacitor, 3 = data cable, 4 = power cable, 5 = insulating coating, 6 = resin board. The sensor is 11.2 cm long, 5.8 cm wide, and 0.075 cm thick. The distance between adjacent electrodes is 1.58–1.78 mm. The width of each electrode is 0.59–0.79 mm and its thickness is 0.01–0.02 mm. The thickness of the insulating layer is about 2 μm. In (**d**), *C0* is a fixed capacitor and *Cx* is the leaf-like sensor.

### *2.2. Experimental Setup and Processing*

To develop the models of deposit mass against the voltage signals, experiments were conducted to measure the voltage signals of the leaf-like sensor with different herbicide sprays deposited on it. Herbicides of different formulation types were selected for the spray preparation at the concentrations suggested on the product labels, including Yudasheng® (SC, 20% a.i. atrazine, Xinnong Guotai, Beijing, China), Lvlilai Butachlor® (EC, 50% a.i. butachlor, Lvlilai, Suzhou, China), and Ruidefeng Lanhuoyan® (AS, 41% a.i. glyphosate isopropylamine salt, Noposion, Shenzhen, China), as well as water as a control. The sprays were prepared according to the details listed in Table 1. A hanging orbit sprayer (Figure 2) was used for the herbicide application. The application pressure was set at 0.3 MPa, while the application height was set at 50 cm above the sensor, as is common in field conditions. Standard flat fan nozzles of different sizes were selected for the spraying (Lechler® ST 110-01/015/02/03, Lechler GmbH, Metzingen, Germany). Before spraying onto the sensors, the median droplet sizes were measured for each nozzle with a laser particle size analyzer (Winner® 318B, Winner Particle Instrument, Ji'nan, China). During the testing experiment, the moving velocity of the nozzles was fixed for three repeated measurements of each treatment and then varied randomly to obtain different deposits on the sensor surface. For each application, the signals from the sensor were recorded 40 times over the three seconds following the application. The mass of the sensor was measured with an electronic scale (YP102N, Sop-Top, Shanghai, China) before and after application, and the deposit mass was calculated from the two mass measurements.


**Figure 2.** The hanging orbit sprayer for herbicide application in the experiment.

### *2.3. Data Analysis*

The data analysis was performed with RStudio 0.98.490 [22]. A linear model was employed to fit the relationship between the voltage signals and the deposit mass of the spray. Analysis of variance (ANOVA) was carried out to evaluate the differences between treatments. Data were tested for normal distribution using the Shapiro-Wilk test (*p* > 0.05). Homogeneity of variances was tested using Levene's test for each treatment (*p* > 0.05).
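The calibration itself reduces to a one-variable least-squares fit of deposit mass against the voltage signal. A minimal sketch with illustrative numbers (Python here, rather than the R environment used in the study) follows.

```python
import numpy as np

# Illustrative (not measured) calibration points:
voltage = np.array([0.61, 0.78, 0.95, 1.22, 1.49])  # digital voltage signal (V)
mass = np.array([0.05, 0.11, 0.18, 0.27, 0.38])     # weighed deposit mass (g)

# Least-squares fit of the linear model mass = slope * V + intercept
slope, intercept = np.polyfit(voltage, mass, 1)
r2 = np.corrcoef(voltage, mass)[0, 1] ** 2
print(f"mass = {slope:.3f} * V + {intercept:.3f} (R^2 = {r2:.3f})")
```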

### **3. Results and Discussion**

### *3.1. Measurement Impact Analysis*

Experiments were conducted to evaluate the impact of application factors, namely nozzle type (droplet size) and herbicide formulation type, on the sensor's measurement results. The results are presented in Figures 3 and 4.

**Figure 3.** Linear regression models of the deposit data of sprays with herbicides in different formulations. The shaded area represents the standard error of the regressed linear models.

Linear models were employed for the regression of the signal voltage against deposit mass for each treatment. All the models were compared by ANOVA (α = 0.05). The comparison indicated that, when the nozzle type was fixed, there were no significant differences between the deposit-to-voltage curves of the spray mixtures with EC (butachlor), SC (atrazine), and water. However, for the AS (glyphosate), the voltage signal curve of the spray deposit was markedly different from the other groups. When the deposit on the leaf-like sensor exceeded 0.2 g, the signal voltages of the sensor were much higher with AS droplets than with EC, SC, or water droplets. To explain this difference, the ionization of the glyphosate isopropylamine salt should be considered. Glyphosate is a weak acid herbicide. It usually exists as a salt compound and dissociates into two oppositely charged ions (a cation and an anion) when dissolved in water, while the other herbicides in formulations like EC or SC usually contain organic ingredients that cannot dissolve and ionize in water. The concentration of ions alters the dielectric constant of the spray mixture. As a result, the voltage signals representing the electric capacitance were higher when the concentration of glyphosate increased.

In Figure 4, the deposit curves of different nozzles were compared for each herbicide formulation. The comparison showed that, for spray mixtures with the same herbicide formulation, there were no significant differences in the deposit curves between nozzle types. Therefore, the spray mixtures could be classified into two groups according to the ionization of their ingredients.

**Figure 4.** Linear regression models of the deposit data of sprays with droplet size spectra generated by different nozzles. The shaded area represents the standard error of the regressed linear models.

### *3.2. Statistical Modeling*

Given the negligible differences between the deposit curves of the water, SC, and EC sprays, their data could be pooled to model the signal voltage against the deposit mass for all non-ionized herbicides, while ionized herbicides required separate models. In this research, glyphosate was taken as an example of an ionized ingredient. Since nozzle type did not show any impact on the sensor's voltage signals, data obtained with different nozzle types could be pooled for the same herbicide group. Linear models were applied separately for the regression of the two cases. The linear regression fitted the grouped data well, in line with the results of a former study by Zhang et al. [19].

With the linear model of the electric signals against the deposit of the non-ionized sprays, the algorithm could be programmed and installed in the microcontroller. Thus, the sensor was able to give precise information on the quantified spray deposit in the field. The dielectric constant of a dielectric material depends on the frequency of the applied electric field. The behavior of the dielectric constant is affected by three types of polarization: orientation polarization, ionic polarization, and electronic polarization. Ionized sprays may show all three types. At low frequencies, all polarization mechanisms act; as the frequency increases, the water dipoles fail to follow the field alternations and orientation polarization ceases, after which the polarization is dominated by ionic polarization. This continues until the ions' oscillations can no longer follow the field alternations, leaving only electronic polarization. When the frequency of the applied field equals the resonance frequency of the ionic movement, the dielectric constant shows resonance behavior. The dielectric constant is proportional to the density of the induced dipoles and is therefore expected to increase with the salt concentration of the electrolyte [23]. Thus, the electric signal of an ionized spray is affected by the amount, concentration, and other parameters of the electric charges in the mixture. Therefore, a one-variable regression model cannot simulate the application deposit across different ingredients. However, in the specific case of glyphosate applied at the dose recommended on the product label, the linear regression model presented in Figure 5 could be adopted, with an algorithm program similar to the one used for deposit detection of the non-ionized sprays.

**Figure 5.** Linear regression models of the deposit data of sprays with ionized and non-ionized ingredients. The shaded area represents the standard error of the regressed linear models (ionized: R<sup>2</sup> = 0.837, non-ionized: R<sup>2</sup> = 0.882).

### **4. Conclusions**

According to this study, we can conclude that the electric capacitor sensor has shown strong potential for application in the real-time monitoring of herbicide spray deposit. The system and model implemented in this study can already be applied to the deposit measurement of herbicides with non-ionized ingredients. For other water-soluble herbicides with ionized ingredients, however, further studies are required on the measurement model, as the concentration and the valence of the ions impact the dielectric constant of the spray mixture, which directly affects the signal voltage representing the electric capacitance of the sensor.

**Author Contributions:** Conceptualization, P.W.; Data curation, W.Y.; Formal analysis, P.W.; Funding acquisition, M.O. and W.J.; Investigation, W.Y.; Methodology, P.W., W.Y., M.O., and C.G.; Project administration, W.J.; Resources, C.G. and W.J.; Software, W.Y.; Supervision, P.W., M.O., and W.J.; Writing—original draft, P.W.; Writing—review & editing, P.W.

**Funding:** This research was funded by the National Key Research and Development Plan of China (grant number 2017YFD0700901), the National Natural Science Foundation of China (grant number 51475215, 31601676), and the Advanced Talent Research Funding of Jiangsu University (grant number 5501200004).

**Acknowledgments:** The authors would like to thank Huitao Zhou, Guangyang Mao, Shengnan Cao, Wanting Yang, and Shuai Zang for technical help in this study.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**

1. National Bureau of Statistics of China. 2018. Available online: http://data.stats.gov.cn/easyquery.htm?cn= C01 (accessed on 25 December 2018).


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
