#### *2.1. Image Acquisition*

Experimental plots were photographed with a quadcopter drone (DJI Inspire 2, DJI Science and Technology Co. Ltd., Shenzhen, China) (Figure 2a) mounted with two cameras (Figure 2b): (1) a modified (Llewellyn Data Processing LLC, Carlstadt, NJ, USA) Canon (Canon, Tokyo, Japan) PowerShot ELPH 130 IS; and (2) a MAPIR Survey2 (MAPIR, Peau Productions, Inc., CA, USA). During each flight, the quadcopter captured images with each camera at a height of 30 m, with forward and side overlap of 90% and 60%, respectively. The technical specifications of the two sensors are shown in Table 2. Figure 3 shows examples of the data collected from both sensors.
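Assuming the ground sampling distance (GSD) scales linearly with flight altitude, the 4.05 cm GSD quoted at 120 m (Table 2) corresponds to roughly 1 cm per pixel at the 30 m flight height used here:

$$\mathrm{GSD}_{30\,\mathrm{m}} \approx 4.05\ \mathrm{cm} \times \frac{30\ \mathrm{m}}{120\ \mathrm{m}} \approx 1.0\ \mathrm{cm}$$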


**Figure 2.** (**a**) DJI Inspire 2 with (**b**) sensors installed underneath. Images on (b) were from www.canon.com (Canon Powershot) and www.mapir.camera (MAPIR).

**Table 2.** Technical specifications of the two sensors.

| | Modified Canon | MAPIR Survey2 |
|---|---|---|
| Image no. for Area 2 | 154 | 324 |
| Wavebands | Red, Green, NIR | Red, Green, NIR |
| GSD (at 120 m) | 4.05 cm | 4.05 cm |

**Figure 3.** Examples of false-color near-infrared images taken by (**a**) modified Canon and (**b**) MAPIR cameras.

#### *2.2. Image Pre-Processing*


Images were cropped using a custom, browser-based interface in LabelBox (Figure 4). Data were annotated by dragging a bounding box across each plant and labeling it as 'high water stress', 'low water stress', or 'no stress' according to the key provided in Figure 5. The GraphQL application programming interface (API) was used to pull the pixel coordinates of each bounding box onto a local computer so that individual plants could be cropped from the original aerial images. The resolution of the cropped images was approximately 150 by 150 pixels. The number of cropped images for each condition is shown in Table 3.
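As a rough illustration of this step, the sketch below crops individual plants from an aerial image given exported bounding-box annotations; the export file name and field names (`image_path`, `bbox`, `left`, `top`, `width`, `height`, `label`) are illustrative placeholders rather than the exact LabelBox export schema.

```python
import json
import os
from PIL import Image

# Hypothetical export of LabelBox annotations: one record per bounding box,
# with the source image path, a class label ('high water stress',
# 'low water stress', or 'no stress'), and pixel coordinates of the box.
with open("labels_export.json") as f:
    annotations = json.load(f)

os.makedirs("crops", exist_ok=True)
for i, ann in enumerate(annotations):
    image = Image.open(ann["image_path"])
    box = ann["bbox"]  # e.g. {"left": 512, "top": 340, "width": 150, "height": 150}
    crop = image.crop((box["left"], box["top"],
                       box["left"] + box["width"],
                       box["top"] + box["height"]))
    crop.save(f"crops/{ann['label'].replace(' ', '_')}_{i:04d}.png")
```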

**Figure 4.** Custom LabelBox interface used to crop aerial images.

**Figure 5.** Example of the key used to identify plants according to species and condition. Stressed plants are labeled red (high water stress), yellow (low water stress), or blue (low phosphorus fertilizer, not used). Unmarked plants are non-stressed.


**Table 3.** Number of images for each camera, species, and treatment.

| Species | Water Treatment | Plants | Modified Canon | MAPIR |
|---|---|---|---|---|
| *Buddleia* (BUD) | HWS | 8 | 24 | 24 |
| | LWS | 8 | 24 | 24 |
| | NS | 18 | 72 | 72 |
| *Cornus* (CO) | HWS | 8 | 25 | 25 |
| | LWS | 8 | 25 | 25 |
| | NS | 28 | 85 | 85 |
| *Hydrangea paniculata* (HP) | HWS | 8 | 40 | 40 |
| | LWS | 8 | 40 | 40 |

HWS = high water stress; LWS = low water stress; and NS = no stress treatment.

Since multiple photographs were taken of the same plots from different angles, cropped images of the same plants were grouped together so that they could be segregated into the training set or validation set as complete units. This procedure protected against overly optimistic performance estimates that would occur if photographs of the same plant appeared in both the training and validation datasets. For each species and treatment, the centers of each bounding box were calculated and normalized to a range of zero to one. Spatstat (http://spatstat.org), an open-source R package for analyzing point patterns, was then used to match plants from different aerial images based on the similarity of their pixel coordinates. For example, if there were eight plants in the HWS treatment of a certain species, all images of plants one through six would be used to train the model and all images of plants seven and eight would be used for validation. This allowed us to make full use of the data during the training phase without artificially inflating performance metrics by validating models with images of the same plants they were trained with. The successful grouping was confirmed by visual inspection (Figure 6).
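The authors performed this matching with the R package spatstat; the sketch below illustrates the same idea in Python using a nearest-neighbor lookup of normalized bounding-box centers. The function name, distance tolerance, and example coordinates are illustrative only.

```python
import numpy as np
from scipy.spatial import cKDTree

def match_plants(reference_centers, other_centers, max_dist=0.05):
    """Assign each bounding-box center from one aerial image to the nearest
    plant in a reference image, based on normalized (0-1) coordinates.
    Returns an index into `reference_centers` for each row of
    `other_centers`, or -1 if no reference plant lies within `max_dist`."""
    tree = cKDTree(reference_centers)
    dist, idx = tree.query(other_centers)
    return np.where(dist <= max_dist, idx, -1)

# Example: centers of the same three plants seen in two overlapping images.
ref = np.array([[0.20, 0.30], [0.52, 0.31], [0.81, 0.29]])
img2 = np.array([[0.79, 0.28], [0.22, 0.31], [0.51, 0.33]])
print(match_plants(ref, img2))  # -> [2 0 1]: each crop is matched to a plant ID
```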


**Figure 6.** Images of the same plants cropped from multiple aerial images (Img1–Img5) were matched based on their pixel coordinates so that unique plants could be grouped into either the training set or the validation set. Brackets show one of four train/test splits.

#### *2.3. Model Training and Testing*

Cropped images were used to train models with the Watson Visual Recognition API, a cloud-hosted artificial intelligence service provided by IBM that uses CNNs to build custom image classifiers. Here, models were trained to predict water stress status using red, green, and near-infrared pixel values of the cropped images. A Python script was used to access the service and transfer images from a local computer to a cloud server for model training and testing. For each species and camera, three-quarters of NS and HWS images were used to train a model that was then used to classify the remaining quarter. The API returned a prediction between zero and one for each validation image with zero indicating no stress and one indicating water stress (Table 4). This process was repeated four times so that a prediction could be made for each image in the dataset and compared to the ground truth.
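A minimal sketch of this workflow is given below, assuming the `ibm-watson` Python SDK for the Visual Recognition v3 service (the service has since been retired); the API key, zip archive names, and class names are placeholders rather than the authors' actual script.

```python
import json
from ibm_watson import VisualRecognitionV3
from ibm_cloud_sdk_core.authenticators import IAMAuthenticator

# Authenticate against the Watson Visual Recognition v3 service
# (API key and version date are placeholders).
service = VisualRecognitionV3(
    version="2018-03-19",
    authenticator=IAMAuthenticator("YOUR_API_KEY"),
)

# Train a custom classifier from zipped folders of cropped training images:
# one zip of water-stressed (HWS) examples and one zip of non-stressed (NS)
# examples (the service requires at least 10 images per class).
with open("buddleia_hws_train.zip", "rb") as hws_zip, \
     open("buddleia_ns_train.zip", "rb") as ns_zip:
    classifier = service.create_classifier(
        name="buddleia_water_stress",
        positive_examples={"stressed": hws_zip},
        negative_examples=ns_zip,
    ).get_result()

# Training is asynchronous; in practice, poll
# service.get_classifier(classifier["classifier_id"]) until its status is "ready".

# Classify a held-out validation image; the service returns a score between
# 0 (no stress) and 1 (water stress) for the "stressed" class.
with open("buddleia_validation_01.png", "rb") as img:
    result = service.classify(
        images_file=img,
        classifier_ids=[classifier["classifier_id"]],
        threshold=0.0,
    ).get_result()
print(json.dumps(result, indent=2))
```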

**Table 4.** Predictions returned by the Watson Visual Recognition API (Score) are compared to the ground truth (Stress).


#### *2.4. Statistical Analysis*

A receiver operating characteristic area under the curve (AUC) score was used to quantify the degree of separation between treatments for each species and camera. A one-sample *t*-test was used to compare the AUC scores returned by the four-fold validation sets to a hypothesized mean of 0.5, corresponding to random classification.
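A compact sketch of this analysis using standard Python libraries is shown below; the fold scores and labels are illustrative placeholders, not data from this study.

```python
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

# Illustrative placeholder scores for the four validation folds of one
# species/camera combination: each fold has ground-truth labels
# (1 = HWS, 0 = NS) and the scores returned by the classifier.
folds = [
    (np.array([1, 1, 0, 0, 0]), np.array([0.91, 0.78, 0.22, 0.35, 0.10])),
    (np.array([1, 1, 0, 0, 0]), np.array([0.66, 0.84, 0.41, 0.19, 0.28])),
    (np.array([1, 1, 0, 0, 0]), np.array([0.73, 0.58, 0.33, 0.45, 0.12])),
    (np.array([1, 1, 0, 0, 0]), np.array([0.88, 0.69, 0.25, 0.51, 0.07])),
]

# ROC AUC quantifies the separation between treatments in each fold.
aucs = [roc_auc_score(y_true, y_score) for y_true, y_score in folds]

# One-sample t-test against a hypothesized mean of 0.5 (random classification).
t_stat, p_value = stats.ttest_1samp(aucs, popmean=0.5)
print(f"mean AUC = {np.mean(aucs):.3f}, p = {p_value:.4f}")
```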

#### **3. Results**

Of the 11 combinations of species and camera used in this study, four produced models that were able to discriminate images of NS and HWS plants with a statistically significant degree of separation (*p* < 0.05): Canon and MAPIR images of *Buddleia*, Canon images of *Physocarpus opulifolius*, and MAPIR images of *Hydrangea paniculata* (Table 5). Of these four, models trained with MAPIR or Canon images of NS and HWS *Buddleia* were also able to discriminate NS and LWS plants with high separation (Table 6). Four datasets produced models with a marginally significant degree of separation (0.05 < *p* < 0.10): Canon and MAPIR images of *Hydrangea quercifolia*, Canon images of *Hydrangea paniculata*, and MAPIR images of *Physocarpus opulifolius* (Table 5). Images of *Spiraea japonica* were not tested because the HWS class in the training set did not meet the minimum of 10 images required by the Visual Recognition API. Overall, models trained with four of five species tested achieved marginal significance or better (*p* < 0.10) in one or both cameras (Figures 7 and 8).

Results were compared to a previous study by de Castro et al. [18] that described the same dataset by masking the background and comparing mean pixel values in stressed and non-stressed plants. The three wavelengths detected by each camera were delineated, and differences between treatments were evaluated by performing an analysis of variance (ANOVA) followed by a Tukey honestly significant difference (HSD) range test. Experiments that demonstrated a significant difference in mean pixel value between water stress treatments in one or more wavelengths (*p* < 0.05) are highlighted green in Table 5. Marginal significance is not shown because de Castro et al. [18] did not report specific *p*-values.

**Table 5.** Performance of models trained to classify HWS and NS images. Models achieving a statistically significant degree of separation (*p*-value < 0.05) are highlighted green and models achieving a marginal degree of separation (0.05 < *p*-value < 0.10) are highlighted yellow.

| Species | Camera | Mean AUC | St. Dev. | *P*-value | de Castro et al. |
|---|---|---|---|---|---|
| *Buddleia* (BUD) | Canon | 0.9931 | 0.0046 | 1.10E−05 | |
| | MAPIR | 0.9907 | 0.0076 | 2.97E−05 | |
| *Cornus* (CO) | Canon | 0.463 | 0.0976 | 0.2635 | |
| | MAPIR | 0.5094 | 0.1661 | 0.4602 | |
| *Hydrangea paniculata* (HP) | Canon | 0.6448 | 0.1381 | 0.0854 | |


**Table 6.** Models that achieved a statistically significant degree of separation on HWS images were also used to classify LWS images.


**Figure 7.** Performance of models trained to classify HWS and NS images.


**Figure 8.** Models that achieved a statistically significant degree of separation on HWS images were also used to classify LWS images.

#### **4. Discussion**

Unlike traditional machine vision models that require users to manually select features, CNNs have layers of neurons that allow them to automatically learn relevant features from data. CNNs improve with each training example by iteratively rewarding neurons that amplify aspects of the image that are important for discrimination and suppressing those that do not. For example, in traditional techniques, the background must be manually segmented prior to analysis. By contrast, CNNs can automatically 'learn' to ignore the background because it is not relevant to the classification task. Similarly, rather than manually delineating spectral indices thought to be correlated with plant health, networks can infer relevant transformations of the input color channels from the data. Low-level features inferred by the network feed into higher-order features such as the specific location or pattern of discoloration within the plant. Information from spectral indices may combine with other features such as the unique structure of sagging branches or the distinct texture created by the shadows of wilted leaves. Thus, CNNs can learn multiple features of the training images and are not limited by a priori hypotheses.

Models tested in this study demonstrated significant variation in their ability to identify water stress in different species. Models trained on *Buddleia* achieved near-perfect separation while those trained on *Cornus* approximated random classification. Such variation is consistent with previous literature showing differences in morphological and physiological responses to water stress across genera, species, and even cultivars. In Michigan, Warsaw et al. [37] tracked the daily water use and water use efficiency of 24 temperate ornamental taxa from 2006 to 2008. Daily water use varied from 12 to 24 mm per container and daily water use efficiency (increase in growth index per total liters applied) varied from 0.16 to 0.31. Of the similar taxa used, *Buddleia davidii* 'Guinevere' (24 mm per container) had the greatest water use, followed by *Spiraea japonica* 'Flaming Mound' (18 mm per container), *Hydrangea paniculata* 'Unique' (14 mm per container), and *Cornus sericea* 'Farrow' (12 mm per container), with estimated crop coefficients (KC) of 6.8, 5.0, 3.6, and 3.4, respectively. Low-water tolerant taxa such as *Cornus* may simply not have been exhibiting symptoms of water stress when they were photographed. Models that achieved moderate performance were likely provided with too few examples to distinguish patterns relevant to the classification task from those specific to the training data, causing them to generalize poorly to new data during the testing phase. Such overfitting bias can be overcome by training models with a larger and more diverse set of training images. Varying the location, weather, and growing period in which images are taken, for example, can force models to learn features that generalize to all conditions. Future studies can also use images of plants with multiple degrees of water stress to train regression models that return a value along a numeric scale rather than a stressed or not-stressed binary.

While the complicated nature of CNNs prevents us from knowing exactly which features are driving the model, insight can be gained from the conditions in which classifiers succeed or fail. For example, classifiers trained by pooling images of all species had significantly lower performance than classifiers trained with images of just one species, despite having a considerably larger training set. This suggests that symptoms of water stress differ from one species to the next. Subsequent studies can identify which features are driving the model by iteratively removing them from the image. For example, one experiment could train models with individual red, green, or near-infrared channels to determine if certain spectral indices are more sensitive to water stress than others. Another experiment could crop a rectangle circumscribed around the plant to see if plant shape or other peripheral features aid the classifier. Features that significantly reduce performance when removed may represent biologically relevant phenotypes that are worthy of further study.
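For instance, the single-channel experiment suggested above could be set up by zeroing out all but one channel of each cropped image before training, as in the sketch below; the file paths are placeholders and the channel ordering of the false-color images is an assumption.

```python
import os
import numpy as np
from PIL import Image

def keep_channel(image_path, channel, out_path):
    """Zero out all but one channel of a cropped false-color image so that a
    model can be trained on that channel alone. Channel indices (e.g.,
    0 = red, 1 = green, 2 = NIR) depend on how the false-color images were
    composed and are assumptions here."""
    arr = np.array(Image.open(image_path).convert("RGB"))
    single = np.zeros_like(arr)
    single[:, :, channel] = arr[:, :, channel]
    Image.fromarray(single).save(out_path)

# Example: keep only the assumed NIR channel of one cropped image.
os.makedirs("crops_nir_only", exist_ok=True)
keep_channel("crops/high_water_stress_0000.png", channel=2,
             out_path="crops_nir_only/high_water_stress_0000.png")
```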

#### **5. Conclusions**

Our findings confirm that the IBM Watson Visual Recognition service can be used to identify early indicators of water stress in ornamental shrubs despite constraints such as small sample size, low image resolution, and lack of clear visual differences. Watson-generated models were able to detect indicators of stress after 48 hours of water deprivation with a significant to marginally significant degree of separation in four out of five species tested (*p* < 0.10). Models trained on images of *Buddleia* achieved near-perfect separation after only 24 hours with a maximum AUC of 0.9884. Furthermore, unlike traditional algorithms that require users to manually select plant parameters believed to correlate with health status, CNNs were able to automatically infer relevant features from the training data and combine multiple types of visual information. Despite this, not all models were successful. Failure of models trained on images of *Cornus* was consistent with previous literature, suggesting higher water stress tolerance in *Cornus* compared to the other species tested. Because all plants were grown in the same experimental area, the authors cannot be certain that these models will generalize well to new situations.

Future studies can focus on improving model accuracy and generalizability by increasing the number of training examples and varying the conditions in which images are taken. Fully trained networks can also be introspected to give biological backing to the most predictive features. Other studies can expand the application of this workflow by testing data collected with different sensors and on different species. These experiments provide a valuable case study for the use of CNNs to monitor plant health. Brought to scale, artificial intelligence frameworks such as these can drive responsive irrigation systems that monitor plant status in real time and maximize sustainable water use.

**Author Contributions:** Conceptualization, A.I.d.C., J.M.P., J.M.M., J.R., and J.S.O.J.; Methodology, D.F., S.G., and D.H.S.; Software, D.F. and S.G.; Investigation, A.I.d.C., J.M.P., J.M.M., J.R., and J.S.O.J.; Resources, J.S.O.J. and J.M.M.; Validation, A.I.d.C., and J.M.P.; Formal Analysis, D.H.S., D.F., and S.G.; Writing—original draft preparation, D.F.; Writing—review and editing, D.F., S.G., D.H.S., J.M.M., J.R., and J.S.O.J.; Supervision, J.M.M.; Project Administration, J.M.M.; Funding Acquisition, J.M.M. and J.R.

**Funding:** This work was partially supported by a grant from the J. Frank Schmidt Family Charitable Foundation and is based on work supported by NIFA/USDA under project numbers SC-1700540, SC-1700543 and 2014-51181-22372 (USDA-SCRI Clean WateR3). Research of Drs. Peña and de Castro was financed by the "Salvador de Madariaga" for Visiting Researchers in Foreign Centers Program (Spanish MICINN funds) and the Intramural-CSIC Project (ref. 201940E074), respectively.

**Acknowledgments:** The authors would like to thank Julie Brindley and Ana Birnbaum for their support and assistance in this project. Special thanks to IBM for supporting our research with access to their artificial intelligence services.
