**Classification of Smoke Contaminated Cabernet Sauvignon Berries and Leaves Based on Chemical Fingerprinting and Machine Learning Algorithms**

**Vasiliki Summerson 1, Claudia Gonzalez Viejo 1, Colleen Szeto 2,3, Kerry L. Wilkinson 2,3, Damir D. Torrico 4, Alexis Pang 1, Roberta De Bei 2 and Sigfredo Fuentes 1,\***


Received: 22 August 2020; Accepted: 5 September 2020; Published: 7 September 2020

**Abstract:** Wildfires are an increasing problem worldwide, with their number and intensity predicted to rise due to climate change. When fires occur close to vineyards, this can result in grapevine smoke contamination and, subsequently, the development of smoke taint in wine. Currently, there are no in-field detection systems that growers can use to assess whether their grapevines have been contaminated by smoke. This study evaluated the use of near-infrared (NIR) spectroscopy as a chemical fingerprinting tool, coupled with machine learning, to create a rapid, non-destructive in-field detection system for assessing grapevine smoke contamination. Two artificial neural network models were developed using grapevine leaf spectra (Model 1) and grape spectra (Model 2) as inputs, and smoke treatments as targets. Both models displayed high overall accuracies in classifying the spectral readings according to the smoking treatments (Model 1: 98.00%; Model 2: 97.40%). Ultraviolet to visible spectroscopy was also used to assess the physiological performance and senescence of leaves, and the degree of ripening and anthocyanin content of grapes. The results showed that chemical fingerprinting and machine learning might offer a rapid, in-field detection system for grapevine smoke contamination that will enable growers to make timely decisions following a bushfire event, e.g., avoiding harvest of heavily contaminated grapes for winemaking or assisting with a sample collection of grapes for chemical analysis of smoke taint markers.

**Keywords:** smoke taint; remote sensing; climate change; near-infrared spectroscopy; volatile phenols

## **1. Introduction**

The incidence and intensity of wildfires are increasing worldwide, mainly due to the effects of climate change [1–5]. Bushfires that occur near wine regions can result in grapevine smoke exposure, which can alter the chemical composition of grape berries. Wine produced from these smoke-affected grapes may exhibit unpalatable smoky aromas and flavors, such as "burnt wood", "ashy", and "burnt rubber" [6–9]. These undesirable characters have been attributed to smoke-derived volatile phenols (VPs), including guaiacol, 4-methylguaiacol, cresols, and syringol [7,10,11]. It is thought that these VPs accumulate primarily in the skin of grape berries following smoke exposure and, to a lesser extent, in the pulp and seeds [12–15]. Grapevine smoke exposure, and the resulting smoke taint in wine, have caused significant financial losses for grape growers and winemakers due to discarded grapes and unsaleable wine. For example, the 2009 Black Saturday bushfires in Victoria, Australia, were estimated to have caused AUD 300 million in lost revenue [16–19]. More recently, the Australian Grape and Wine Incorporated (AGWI) estimated an AUD 40 million loss from the 2019/2020 summer bushfires [20]. Vineyard smoke exposure, therefore, remains a significant issue for the wine industry, particularly given the increasing frequency and severity of bushfires [21].

Grapevine leaves have also been found to accumulate VPs, and a positive correlation has been demonstrated between the levels of smoke compounds detected in leaves and wine when they were included in the primary fermentation [13,22,23]. From a physiological point of view, smoke exposure has also been shown to decrease stomatal conductance in leaves, which may result from the reaction of carbon dioxide (CO2) and carbon monoxide (CO) with water vapor in the substomatal cavity producing carbonic acid (H2CO3) [24,25]. Carbonic acid reduces the pH in the stomata, resulting in partial or complete stomatal closure [25,26]. Damage to leaf surfaces following smoke exposure has also been observed, with the development of necrotic lesions or, in extreme cases, total leaf necrosis [10,22,27]. This may be the result of ozone (O3) present in smoke, which has been linked to chlorophyll destruction and accelerated leaf senescence [28,29].

Some chromatographic techniques such as gas chromatography-mass spectrometry (GC-MS) and high-performance liquid chromatography-tandem mass spectrometry (HPLC-MS/MS) have been developed to quantify levels of free and glycosidically bound VPs in grapes and wines [30–33]. While these techniques are currently used for qualitative and quantitative analysis and may assist growers in determining the level of smoke taint in the final wine, there are numerous shortcomings: sample preparation is time-consuming and destructive, and analyses require expensive reagents, standards, and equipment, as well as trained personnel. Furthermore, following a bushfire event, there may be long delays in the availability of results due to large numbers of samples being submitted to commercial laboratories for analysis [34,35]. Consequently, alternative methods of smoke taint analysis have recently been investigated and may offer non-destructive sample preparation, as well as accurate and rapid results.

The use of spectroscopic techniques has increased in recent years due to their ease of use, rapid results, minimal sample preparation, and non-destructive nature, all of which allow repeated measurements to be taken [34–39]. Furthermore, the development of smaller, handheld spectroscopic devices, coupled with decreasing costs, has made these technologies more accessible and affordable to growers and farmers, while their portability allows for in-field use, reducing the risk of sample deterioration during transportation [39,40]. Ultraviolet (UV) to visible (Vis) spectroscopy covers the region between 200 and 780 nm and can be used to analyze compounds such as organic acids, phenolic compounds, and pigments (e.g., anthocyanins, carotenoids, and chlorophylls) [41]. UV-Vis spectroscopy has been used to relate the chemical composition of extra virgin olive oils to the Mediterranean region in which they were produced, to optimize the aging process of Spanish wines, and to assess the impact of heating on edible oils and determine their acid level [42–44]. Near-infrared (NIR) spectroscopy, which covers the 780–2500 nm region, has been widely used in agricultural and food science applications, with NIR bands corresponding to overtones resulting from the vibrations of O-H, C-H, N-H, and S-H bonds [39,41]. Various spectroscopic techniques, most notably in the NIR region, have been used for numerous applications in viticulture, including the assessment of grape quality and ripeness as well as the authentication of geographical origin [38,45–50]. Research has also been conducted on the use of mid-infrared (MIR) spectroscopy (2500–25,000 nm), as well as synchronous two-dimensional MIR correlation spectroscopy (2D-COS), for the classification of smoke tainted wines [34,35]. Both techniques showed potential for screening smoke tainted wine, with MIR spectroscopy achieving 61 and 70% classification rates for control and smoke affected wines, respectively. However, classification rates were affected by the degree of smoke taint, as well as compositional differences arising from the grape variety and oak maturation [34]. While this technology may help to assess wine samples for smoke taint, it does not provide an early, in-field detection system that could help growers identify which grapes may be contaminated before winemaking. At present, there is very little research investigating the in-field use of Vis-NIR spectroscopy for the classification of smoke-affected grapevine leaves and berries. Fuentes and coworkers [19] developed models using NIR spectroscopy in the 700–1100 nm region to predict the levels of guaiacol glycoconjugates in berries and wine, and the levels of guaiacol in wine. These models may offer growers a non-destructive in-field detection system for grapevine smoke contamination. However, further research is required to determine the effectiveness of different NIR regions for monitoring smoke contamination.

Several chemometric techniques have been used to analyze spectral data, including partial least squares (PLS) regression, principal component analysis (PCA), and artificial neural networks (ANN), to name a few [41]. Of these techniques, ANNs have increased in popularity as classification, prediction, and clustering tools, particularly since they can better interpret the non-linear patterns of spectral data [51–54]. Machine learning (ML) modeling based on ANN can be trained from a set of given data known as 'inputs' or independent variables and form complex, non-linear relationships with these inputs and the 'targets' or dependent variables [54]. For example, preliminary ML models for the classification of smoke tainted grapevines have been developed using infra-red (IR) thermal imagery from canopies, which gave an indication of changes in stomatal conductance for classification of control and smoke-exposed grapevines [25]. In addition to this, another model has been proposed that aims to quantify levels of smoke derived compounds in grapes and wine using NIR spectroscopy measurements as inputs [25]. Furthermore, UV-Vis spectroscopy may offer insights into the degree of physiological performance of leaves as well as fruit ripening and quality through analyzing pigment content, such as chlorophylls, anthocyanins, and carotenoids [55–59].

The objective of this study was to investigate the use of NIR spectroscopy, coupled with ML modeling for the detection of grapevine smoke contamination. Grapevine leaves and berries were analyzed in the vineyard in a smoke trial using a NIR spectrometer, and the absorbance values were used as inputs to train different machine learning algorithms in order to create ANNs with the best classification performances. In addition to this, UV-Vis spectroscopy was used to assess the physiological performance and degree of senescence of leaves, as well as the degree of ripening and anthocyanin content of grapes. This may offer growers a rapid and non-destructive detection system that they can employ themselves to obtain real-time information regarding smoke exposure. This will facilitate timely decision-making around which fruit to sample for chemical analysis and/or to harvest to maintain wine quality.

## **2. Materials and Methods**

## *2.1. Vineyard Site and Experimental Design for the Smoke Trial*

The smoke trial was conducted from late January to early February during the 2018/2019 growing season, at the University of Adelaide's Waite Campus in Urrbrae, South Australia (34°58′ S, 138°38′ E). The trial, described previously by Szeto and colleagues [60], involved the application of smoke and/or in-canopy misting to Cabernet Sauvignon grapevines and comprised five different treatments: (i) a control (C), i.e., neither misting nor smoke exposure; (ii) a control with misting (CM), i.e., in-canopy misting but no smoke exposure; (iii) a high-density smoke treatment (HS); (iv) a high-density smoke treatment with misting (HSM); and (v) a low-density smoke treatment without misting (LS). Treatments were applied to Cabernet Sauvignon grapevines planted in 1998 at 2.0 and 3.3 m vine and row spacings, and trained to a bilateral cordon, vertical shoot positioned trellis system (VSP), hand-pruned to a two-node spur system, with under vine drip irrigation (twice weekly, from fruit set to pre-harvest). Smoke treatments were applied (approximately seven days post-véraison, the period when grapes are thought to be most susceptible to smoke contamination [10]) using a purpose-built smoke tent (Figure 1a,b)
and experimental conditions reported previously [4,61]: low and high-density smoke treatments were achieved by burning different fuel loads (i.e., ~1.5 and 5 kg of barley straw, respectively). In-canopy misting was evaluated as a method for mitigating the uptake of smoke-derived volatile phenols by grapes and involved the continuous application of fine water droplets (65 μm) to the grapevine bunch zone using a purpose-built sprinkler system (delivering water at 11 L/h), as previously described [62]. Each treatment was applied to six vines from three adjacent panels, except the HS treatment, which comprised only five vines, with treatments separated by at least one buffer vine. LS, HS, and HSM treatments comprised duplicate applications of smoke to 1.5 panels/three vines at a time (except for one HS treatment). The in-canopy sprinkler system was turned on 5 min before the first HSM treatment was applied and off 15 min after the second HSM treatment was completed, such that CM and HSM grapevines were misted for approximately 2.5 h in total. The second and fifth vine from each treatment (the middle vines from smoke treatments) were then selected for physiological and NIR measurements.

**Figure 1.** Smoke treatments were applied to grapevines using a purpose-built smoke tent; grapevines were enclosed in the tent and exposed to smoke derived from the combustion of barley straw (**a**,**b**).

## *2.2. Physiological Measurements*

The rate of photosynthesis (A), stomatal conductance (gs), and transpiration (E) were determined using a portable infrared gas analyzer equipped with a broad leaf chamber (LCpro-SD, ADC Bioscientific Ltd., Hoddesdon, UK). Measurements were taken on three leaves of each side of the canopy per vine (*n* = 12 leaves per treatment) with a photosynthetic photon flux density of 1000 μmol m<sup>−2</sup> s<sup>−1</sup> supplied by a high efficiency, low heat output, mixed red-blue light-emitting diode (LED) array unit. Water vapor and CO2 concentration in the chamber were set to ambient. Measurements were taken one day (24 h) after smoke treatments were applied, on clear, sunny days.

## *2.3. Determination of Volatile Phenols and Their Glycoconjugates in Grape Juice/Homogenate*

The concentrations of volatile phenols and their glycoconjugates were determined (in grape juice and homogenate, respectively) using analytical methods described previously [30,32,33,60]. Volatile phenols were measured by stable isotope dilution analysis (SIDA) [3,30,33], using an Agilent 6890 gas chromatograph coupled to a 5973 mass spectrometer (Agilent Technologies, Forest Hill, Vic., Australia). Isotopically labeled standards, i.e., *d*4-guaiacol and *d*3-syringol, were prepared in-house using methods outlined previously [3,30,33]. The limit of quantitation for volatile phenols was 1–2 μg/L. Volatile phenol glycoconjugates were also measured by SIDA [30,32], using an Agilent 1200 high-performance liquid chromatograph (HPLC) equipped with a 1290 binary pump, coupled to an AB SCIEX Triple Quad™ 4500 tandem mass spectrometer, with a Turbo V™ ion source (Framingham, MA, USA). The preparation of the isotopically labeled internal standard, i.e., *d*3-syringol gentiobioside, has been reported previously [30,32]. The limit of quantitation for volatile phenol glycosides was 1 μg/kg.

## *2.4. Near-Infrared Data Collection*

Grapevine leaf and berry spectra were collected one day after smoke exposure, using a microPHAZIR™ RX Analyzer (Thermo Fisher Scientific, Waltham, MA, USA), which had a spectral range of 1596 to 2396 nm at intervals of 7–9 nm. Prior to undertaking the measurements and after every 10–15 readings, the device was calibrated using a white background calibration standard (included with the device). The white background was placed on top of the leaf while measuring to avoid signal noise inclusion due to variation in light or environmental changes. Leaves and berries were also analyzed using the Lighting Passport Pro™ handheld spectrometer (Asensetek Incorporation, Xindian District, New Taipei City, Taiwan), which has a spectral range of 380–780 nm at intervals of 1 nm. Measurements were taken at approximately 3 cm from the leaves and berries. All measurements were conducted at ambient temperature between 9:00 a.m. and 6:00 p.m.

For the leaf spectral measurements, nine sunlit and nine shaded, mature, fully expanded leaves were selected (i.e., 18 leaves per vine, 36 leaves per treatment). Leaves were free of any visible signs of disease or blemishes. Each leaf was measured in three areas, in triplicate, using the microPHAZIR™ RX Analyzer, while three measurements per leaf were taken with the Lighting Passport Pro™ handheld spectrometer. For the berry spectra, two bunches were selected per vine, and nine berries (three each from the top, middle, and bottom of each bunch) were measured in triplicate using the microPHAZIR™ RX Analyzer (*n* = 540). In contrast, twelve berries per treatment were analyzed using the Lighting Passport Pro™ (*n* = 180) while still attached to the bunch.

## *2.5. Calculating Spectral Indices*

Spectral indices for the analysis of pigment content were calculated for both leaves and berries. Leaf spectra taken using the Lighting Passport Pro™ were used to calculate the normalized difference vegetation index (NDVI), normalized anthocyanin index (NAI), plant senescence reflectance index (PSRI), and carotenoid reflectance index (CRI) [56,57,59,63–65]. Berry spectra were used to calculate the NAI and PSRI. The calculations and wavelengths used for determining these indices are given in Table 1.


**Table 1.** Calculations for the spectral indices investigated in this study.
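As an illustration of how such indices are obtained from a reflectance spectrum, the following minimal sketch computes NDVI and PSRI on a synthetic spectrum. The band centers used here (670 and 780 nm for NDVI; 678, 500, and 750 nm for PSRI) follow commonly published forms of these indices and are assumptions; they are not necessarily the wavelengths listed in Table 1. The function names and the step-shaped spectrum are hypothetical.

```python
def nearest_band(wavelengths_nm, target_nm):
    """Return the index of the measured wavelength closest to target_nm."""
    return min(range(len(wavelengths_nm)),
               key=lambda i: abs(wavelengths_nm[i] - target_nm))


def band_value(wavelengths_nm, reflectance, target_nm):
    """Reflectance at the measured band nearest to target_nm."""
    return reflectance[nearest_band(wavelengths_nm, target_nm)]


def ndvi(wavelengths_nm, reflectance, red_nm=670.0, nir_nm=780.0):
    """Normalized difference of a far-red/NIR band and a red band (assumed centers)."""
    red = band_value(wavelengths_nm, reflectance, red_nm)
    nir = band_value(wavelengths_nm, reflectance, nir_nm)
    return (nir - red) / (nir + red)


def psri(wavelengths_nm, reflectance):
    """Plant senescence reflectance index in the common (R678 - R500) / R750 form."""
    r678 = band_value(wavelengths_nm, reflectance, 678.0)
    r500 = band_value(wavelengths_nm, reflectance, 500.0)
    r750 = band_value(wavelengths_nm, reflectance, 750.0)
    return (r678 - r500) / r750


# Synthetic step-shaped spectrum on a 1 nm grid (380-780 nm); NOT measured data.
wavelengths = list(range(380, 781))
reflectance = [0.05 if w < 700 else 0.50 for w in wavelengths]

print("NDVI:", ndvi(wavelengths, reflectance))
print("PSRI:", psri(wavelengths, reflectance))
```

Selecting the nearest measured band, rather than interpolating, keeps the sketch simple and is adequate for a spectrometer with 1 nm sampling.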

## *2.6. Statistical Analysis*

Physiological measurements, spectral indices, volatile phenols, and their glycoconjugates were analyzed by one-way analysis of variance (ANOVA) using Minitab® version 18.1 (Minitab Inc., State College, PA, USA). Mean comparisons were performed using the Fisher least significant difference (LSD) method as a *post-hoc* test at α = 0.05. Near-infrared data were analyzed using The Unscrambler X version 10.3 software (CAMO Software, Oslo, Norway). Absorbance values for all wavelengths were plotted for both the microPHAZIR™ RX Analyzer and Lighting Passport Pro™ leaf and berry readings. Principal component analysis (PCA) was also performed using The Unscrambler X program. All microPHAZIR™ RX Analyzer measurements were pre-processed in The Unscrambler X version 10.3 software (second derivative transformation, Savitzky–Golay derivation, and smoothing) prior to the plotting of graphs and statistical analysis.
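The second derivative pre-processing can be sketched as follows. This is not The Unscrambler X implementation: the 5-point window, the quadratic fit, and the uniform 8 nm spacing are assumptions chosen for illustration (the instrument samples at 7–9 nm intervals), and the spectrum is synthetic. For a 5-point quadratic Savitzky–Golay fit, the second-derivative convolution weights are (2, −1, −2, −1, 2)/7 per squared sampling interval.

```python
# Savitzky-Golay second-derivative weights for a 5-point window and a
# quadratic (or cubic) local polynomial; divide by 7 * spacing**2.
SG_D2_5PT = (2.0, -1.0, -2.0, -1.0, 2.0)


def savgol_second_derivative(values, spacing=1.0):
    """Second derivative at interior points; the two points at each edge are dropped."""
    half = len(SG_D2_5PT) // 2
    result = []
    for i in range(half, len(values) - half):
        window = values[i - half:i + half + 1]
        weighted = sum(c * v for c, v in zip(SG_D2_5PT, window))
        result.append(weighted / (7.0 * spacing ** 2))
    return result


# Synthetic absorbance curve (a parabola) on an assumed uniform 8 nm grid.
spacing_nm = 8.0
wavelengths = [1596.0 + spacing_nm * k for k in range(11)]
absorbance = [1e-5 * (w - 1650.0) ** 2 for w in wavelengths]

second_derivative = savgol_second_derivative(absorbance, spacing_nm)
print(second_derivative)
```

Because the synthetic curve is an exact parabola, the filter should recover its constant curvature at every interior point, which gives a quick sanity check of the weights.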

## *2.7. Artificial Neural Network Modeling*

Two ANN models were developed, one for leaf and one for berry NIR readings, which were used as inputs to classify the different smoke treatments using customized code written in MATLAB® (version R2020a, MathWorks Inc., Natick, MA, USA) (Figure 2). This code tested a total of 17 training algorithms in a loop to find the optimum in terms of accuracy and performance. Once the optimum training algorithm was identified, further training was performed to develop the most accurate ANN model. For both models, the Levenberg–Marquardt training algorithm was found to be the best algorithm, resulting in models with the highest accuracy and no signs of overfitting.

**Figure 2.** Two-layer feedforward network with ten hidden neurons and sigmoid function for the two classification models: (**a**) microPHAZIR™ leaf model (Model 1) and (**b**) microPHAZIR™ berry model (Model 2). Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke.
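The architecture in Figure 2 can be illustrated with a forward pass through a two-layer feedforward network with ten hidden neurons and five output classes. This is not the authors' MATLAB code: the hyperbolic tangent hidden activation (one sigmoid-type function), the softmax output, the random untrained weights, and the input size of 25 bands are all assumptions made for illustration.

```python
import math
import random

CLASSES = ("C", "CM", "HS", "HSM", "LS")
N_HIDDEN = 10


def init_layer(n_out, n_in, rng):
    """Random weights and zero biases for one fully connected layer (untrained)."""
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(weights, biases, inputs):
    """Weighted sum plus bias for every neuron in the layer."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert output scores to class probabilities (shifted for numerical stability)."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(spectrum, hidden_layer, output_layer):
    """Hidden layer with tanh activation followed by a softmax output layer."""
    activations = [math.tanh(z) for z in dense(*hidden_layer, spectrum)]
    return softmax(dense(*output_layer, activations))


rng = random.Random(0)
n_inputs = 25  # hypothetical number of wavelength bands used as inputs
hidden_layer = init_layer(N_HIDDEN, n_inputs, rng)
output_layer = init_layer(len(CLASSES), N_HIDDEN, rng)
spectrum = [rng.gauss(0.0, 1.0) for _ in range(n_inputs)]  # synthetic input

probabilities = forward(spectrum, hidden_layer, output_layer)
predicted = CLASSES[probabilities.index(max(probabilities))]
print(predicted, probabilities)
```

In practice the weights would be fitted by a training algorithm such as Levenberg–Marquardt; the sketch only shows how a spectrum is mapped to one probability per treatment.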

Overtones within the 1596–1800 nm range were used as inputs for the microPHAZIR™ leaf model (Model 1). This region was selected to avoid water overtones and any classification resulting from the water status of the vines. The entire spectral range was used for the microPHAZIR™ berry model (Model 2) (1596–2396 nm). The two models were developed using a random data division with 70% (*n* = 1134 for Model 1 and 378 for Model 2) for training, 15% (*n* = 243 for Model 1 and 81 for Model 2) for validation with a mean squared error (MSE) performance algorithm, and 15% (*n* = 243 for Model 1 and 81 for Model 2) for testing with a default derivative function. Ten hidden neurons were selected for each of the two models after conducting a trimming exercise with three, five, and ten neurons.
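The sample counts quoted for each stage follow directly from the 70/15/15 division of 1620 leaf readings and 540 berry readings. A minimal sketch of such a random division is given below; the function name, the seed, and the use of Python's `random` module are assumptions, and MATLAB's own division routine may assign individual samples differently.

```python
import random


def split_indices(n_samples, fractions=(0.70, 0.15, 0.15), seed=0):
    """Randomly divide sample indices into training, validation, and testing sets."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * fractions[0])
    n_val = round(n_samples * fractions[1])
    train = indices[:n_train]
    validation = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder, so every sample is used once
    return train, validation, test


for label, n in (("Model 1 (leaves)", 1620), ("Model 2 (berries)", 540)):
    train, validation, test = split_indices(n)
    print(label, len(train), len(validation), len(test))
```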

## **3. Results**

## *3.1. Physiological Measurements*

Results of gas exchange parameters are shown in Table 2. The transpiration rate was lower for the HS treatment (*P* < 0.005) with a mean rate of 1.43 mmol m<sup>−2</sup> s<sup>−1</sup>, while no differences were observed in the other treatments. The CM and C treatments both had the highest gs values with an average value of 0.15 mol m<sup>−2</sup> s<sup>−1</sup> for each, while HS and LS treatments had the lowest average gs at 0.056 mol m<sup>−2</sup> s<sup>−1</sup> and 0.082 mol m<sup>−2</sup> s<sup>−1</sup>, respectively. Mean rates of A were found to be highest in the C and CM treatments (10.77 μmol m<sup>−2</sup> s<sup>−1</sup> and 9.66 μmol m<sup>−2</sup> s<sup>−1</sup>, respectively), while the LS and HS treatments had the lowest (7.01 μmol m<sup>−2</sup> s<sup>−1</sup> and 5.59 μmol m<sup>−2</sup> s<sup>−1</sup>, respectively).

**Table 2.** Gas exchange parameters measured for the different smoke treatments.


Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke; SD = standard deviation. Means followed by different letters are significantly different based on Fisher least significant difference (LSD) post hoc test (α = 0.05).

## *3.2. Levels of Smoke Taint Marker Compounds in Grape Juice/Homogenate*

Differences in volatile phenol concentrations between HS and HSM treatments were found for guaiacol, 4-methylsyringol, and syringol (*P* < 0.05; Table S1). In particular, 4-methylsyringol and syringol had the largest differences in concentrations amongst the smoke treatments, with the HS treatment exhibiting the highest mean values (17 and 126 μg/L, respectively), followed by the HSM treatment (9 and 59 μg/L, respectively), while the CM treatment exhibited the lowest mean values (2 and 8 μg/L, respectively). There were no differences between the HS and HSM treatments, nor between the C, CM, and LS treatments, for 4-methylguaiacol, phenol, and total cresols; however, HS and HSM grapes had significantly higher volatile phenol concentrations than C, CM, and LS grapes.

Some differences in volatile phenol glycoconjugate levels could be seen amongst the five treatments, including differences between the HS and HSM treatments for some glycoconjugates. There was no difference in GuRG levels between the LS, HS, and HSM treatments, with no levels detected in the C and CM treatments. The HS smoke treatment had the highest levels of PhRG, PhGG, CrPG, SyGG, and SyPG, followed by the HSM and LS treatments and then the C and CM treatments. Interestingly, the C and HS treatments had the highest level of CrGG, followed by the CM and HSM treatments, while the LS treatment had the lowest concentration.

## *3.3. NIR Absorbance Patterns for Leaves and Berries*

The raw and transformed absorbance spectra, averaged across replicates, are depicted in Figure 3 for leaves and Figure 4 for berries. For the microPHAZIR™ RX Analyzer leaf absorbances, clear differences in spectral readings were observed for each smoking treatment. A peak was observed at approximately 1784–1793 nm (Figure 3a), while for the transformed data (Figure 3b), large peaks were present between 1596–1647 nm.

**Figure 3.** Raw leaf absorbance (**a**) and second derivative spectra (**b**) measured with the microPHAZIR™ near-infrared (NIR) analyzer for the different smoke and misting treatments. Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke.


**Figure 4.** Raw berry absorbance (**a**) and second derivative spectra (**b**) measured with the microPHAZIR™ NIR analyzer for the different smoke and misting treatments. Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke.

Differences in absorbance readings were also found for the microPHAZIR™ RX Analyzer berry absorbance spectra (Figure 4a). In the raw data, peaks were observed at approximately 1785 and 1902 nm, while in the transformed data (Figure 4b), large peaks were observed between approximately 1596–1640 nm and 1820–1940 nm.

## *3.4. Principal Component Analysis*

Figure 5a shows the principal component analysis (PCA) for the microPHAZIR™ RX Analyzer leaf spectra with absorbance values between 1600–1800 nm. The first principal component (PC1) accounted for 62% of the data variability, while principal component two (PC2) accounted for 24%. Hence, 86% of the total variability was explained by these PCs. There was no clear separation of the different smoke treatments when modeled with the microPHAZIR™ leaf spectra. PC1 was represented by wavelengths between 1604–1621 nm and between 1621–1647 nm (loadings shown in Figure 5b). PC2 was represented by wavelengths between 1613–1647 nm, as well as 1604 nm.
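The percentages reported for each principal component are ratios of the eigenvalues of the covariance matrix to the total variance, so the 86% quoted above is simply 62% + 24%. A minimal sketch of this calculation is given below using power iteration with deflation. It is not The Unscrambler X algorithm, the function names are hypothetical, and the four-row data set is synthetic.

```python
import math


def covariance_matrix(rows):
    """Sample covariance matrix of mean-centered data (rows are observations)."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    centered = [[r[j] - means[j] for j in range(p)] for r in rows]
    return [[sum(c[a] * c[b] for c in centered) / (n - 1) for b in range(p)]
            for a in range(p)]


def leading_eigenpair(matrix, iterations=500):
    """Largest eigenvalue and eigenvector of a symmetric matrix by power iteration."""
    p = len(matrix)
    # Slightly uneven start vector; a start exactly orthogonal to the leading
    # eigenvector would not converge, which this simple sketch does not guard against.
    vec = [1.0 + 0.1 * j for j in range(p)]
    for _ in range(iterations):
        nxt = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
        norm = math.sqrt(sum(v * v for v in nxt))
        if norm == 0.0:
            return 0.0, vec
        vec = [v / norm for v in nxt]
    product = [sum(matrix[a][b] * vec[b] for b in range(p)) for a in range(p)]
    value = sum(v * w for v, w in zip(vec, product))
    return value, vec


def explained_variance_ratios(rows, n_components=2):
    """Fraction of total variance explained by each of the first components."""
    cov = covariance_matrix(rows)
    p = len(cov)
    total = sum(cov[j][j] for j in range(p))
    ratios = []
    for _ in range(n_components):
        value, vec = leading_eigenpair(cov)
        ratios.append(value / total)
        # Deflation: remove the component just found before extracting the next one.
        cov = [[cov[a][b] - value * vec[a] * vec[b] for b in range(p)]
               for a in range(p)]
    return ratios


# Synthetic data (NOT measured spectra): three variables, four observations.
demo_rows = [(2.0, 0.0, 0.0), (-2.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, -1.0, 0.0)]
pc1, pc2 = explained_variance_ratios(demo_rows)
print("PC1:", pc1, "PC2:", pc2, "cumulative:", pc1 + pc2)
```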

**Figure 5.** Principal component analysis (PCA) for the microPHAZIR™ leaf absorbance values between 1600–1800 nm (**a**) and loadings (**b**). Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke.

Figure 6a shows the PCA for the microPHAZIR™ RX Analyzer berry spectra, where 59% of the data variability was described by PC1, while PC2 accounted for 10% of the data variability; thus, a total of 69% of the total data variability was explained by the first two components of the PCA. As with the microPHAZIR™ RX Analyzer leaf spectra, most of the smoke treatments overlapped quadrants. The CM treatment was grouped primarily in the upper right quadrant, while C and LS treatments were grouped primarily in the lower right. The HS treatment was located primarily in the upper right and left quadrants, while the HSM treatment was grouped in the left upper and lower quadrants. PC1 was represented by the wavelength region 1604–1622 nm. PC2 was represented by the wavelengths between 1630–1647 nm and 2374–2389 nm (loadings shown in Figure 6b).

**Figure 6.** Principal component analysis (PCA) for the microPHAZIR™ berry absorbance values between 1600–2396 nm (**a**) and loadings (**b**). Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke.

## *3.5. Spectral Indices*

Results for the spectral indices are shown in Table 3. In the case of the leaf NDVI and NAI, the HS and C treatments had the lowest mean values (0.72 and 0.64 for the HS treatment and 0.84 and 0.74 for the C) (*P* < 0.05). There were no differences for the remaining treatments. For the leaf PSRI, the HS treatment had the highest mean value at 0.065, with no differences for the remaining treatments. For the leaf CRI500, the LS and HS treatments had the highest values at 1.45 and 1.20, respectively, and for the CRI700, the LS treatments had the highest mean values at 1.76, with no differences for the remaining treatments.

In the case of the berry NAI, the HS and LS treatments had the highest mean values with 0.88 and 0.87, with both the C and LS treatments having the lowest mean values of 0.80 and 0.75. For the PSRI, both the LS and C treatments had the highest mean values of 0.02, while the HSM had the lowest value at −0.02.


**Table 3.** Means and standard deviation (SD) of spectral indices calculated for leaves and berries.

Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke. Means followed by different letters are significantly different based on Fisher's least significant difference (LSD) post hoc test (α = 0.05).

## *3.6. Artificial Neural Network Models*

Table 4 shows the confusion matrices for the two models developed using the spectral readings as inputs and the experimental treatments as targets. Both models displayed high accuracy in classifying the spectral readings according to the treatments, with an overall accuracy of 98% for the microPHAZIR™ leaf model (Model 1) and 97.4% for the microPHAZIR™ berry model (Model 2). Models 1 and 2 presented validation accuracies (94% and 93%, respectively) close to those of the training stage (100% for both models). Furthermore, performance values for training (Models 1 and 2: MSE < 0.01) were lower than those of the other stages, while the values for validation (Model 1: MSE = 0.02; Model 2: MSE = 0.03) and testing (Model 1: MSE = 0.02; Model 2: MSE = 0.04) were similar to each other; this indicates that there were no signs of overfitting in either model.

Figure 7 depicts the receiver operating characteristic (ROC) curves for the two ANN models developed. Both models showed high true-positive rates (sensitivity) and low false-positive rates (1 − specificity) for classifying the spectral readings according to the experimental treatment, which can also be observed in the last column of each confusion matrix. For Model 2, the HS treatment had the highest sensitivity (100%), followed by the CM and HSM treatments (99.1% each) and LS treatment (96.3%). The C treatment had the lowest sensitivity of 92.6% for this model. For Model 1, the C treatment had the highest sensitivity (99.1%), followed by the LS treatment (98.8%), HS treatment (97.8%), and CM treatment (97.5%), while the HSM had the lowest sensitivity of 96.9%.
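Per-class sensitivity and false-positive rate can be derived from a multi-class confusion matrix by treating each class in turn as "positive" and all remaining classes as "negative"; the false-positive rate is then 1 − specificity. The sketch below illustrates this on a small synthetic three-class matrix. The function name and the counts are hypothetical and are not the values in Table 4.

```python
def one_vs_rest_rates(confusion, class_index):
    """Rates for one class; confusion[i][j] counts true class i predicted as class j."""
    n_classes = len(confusion)
    true_pos = confusion[class_index][class_index]
    false_neg = sum(confusion[class_index]) - true_pos
    false_pos = sum(confusion[i][class_index] for i in range(n_classes)) - true_pos
    total = sum(sum(row) for row in confusion)
    true_neg = total - true_pos - false_neg - false_pos
    return {
        "sensitivity": true_pos / (true_pos + false_neg),
        "specificity": true_neg / (true_neg + false_pos),
        "false_positive_rate": false_pos / (false_pos + true_neg),
    }


# Synthetic counts for three classes (NOT the study's results).
demo_confusion = [
    [9, 1, 0],
    [0, 10, 0],
    [1, 0, 9],
]

rates = one_vs_rest_rates(demo_confusion, 0)
print(rates)
```

Each point on a ROC curve pairs one such sensitivity with its false-positive rate at a given decision threshold, so a curve close to the upper left corner corresponds to high sensitivity at a low false-positive rate.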


**Table 4.** Statistical results for the artificial neural network pattern recognition models. Model 1: microPHAZIR™ for leaves, and Model 2: microPHAZIR™ for berries. Performance is based on mean squared error (MSE).

**Figure 7.** Receiver operating characteristic (ROC) curves for the two models developed: (**a**) the microPHAZIR™ leaf model, (**b**) the microPHAZIR™ berry model. Colored lines represent the different smoking treatments. Abbreviations: C = control without misting; CM = control with misting; HS = high density smoke without misting; HSM = high density smoke with misting; and LS = low density smoke.

## **4. Discussion**

## *4.1. Physiological Measurements*

Leaf gas exchange parameters were measured the day after smoking. The three smoke treatments showed significant reductions in gs, in particular, the high-density smoke without misting (HS) treatment, which showed the lowest average reading for gs (Table 2). Stomatal closure is one of the first responses to smoke exposure undertaken by plants [6,26], and a study by Ristic and colleagues [26] found that the time required for gs to recover following one hour of smoke exposure for Cabernet Sauvignon grapevines was approximately 6–10 days. A previous study by Bell et al. [6] found that gs of potted Cabernet Sauvignon grapevines had returned to 60% of pre-smoke exposure rate following fifteen min exposure to smoke using Tasmanian blue gum (*Eucalyptus globulus* L.) leaves as fuel, while rates had returned to 80% of pre-smoke values following exposure to smoke derived from Coast Live Oak (*Quercus agrifolia* Née) leaves. This indicates that in addition to the type of fuel used, the intensity of smoke exposure may also affect the extent of stomatal closure and, hence, reduction in gs. It is, therefore, not surprising that the HS treatment had the lowest gs. However, it is interesting that the low smoke treatment (LS) had lower gs than the high smoke with misting treatment (HSM), which indicates that misting may have reduced the effect of smoke exposure on gs. During a bushfire, the type of fuel burnt will vary depending on the region and the type of plant species native to the area, as well as the amount of smoke exposure due to land topography and wind vectors; therefore, the effect on gs may vary [17,18,23,66]. While misting only partially prevented the uptake of volatile phenols and glycoconjugates in grapes [60], it did appear to have a physiological effect. It is evident that misting reduced the effect of smoke exposure on gs. 
Smoke contains a complex mixture of gases such as sulfur dioxide (SO2), O3, and nitrogen dioxide (NO2), as well as dust particles that have been shown to inhibit photosynthesis and affect stomatal opening [6,26,29]. Stomata are the primary point of entry for these gases and dust particles [6]; therefore, misting may help prevent the uptake of dust and other particles by trapping them in water that has condensed on the leaf surface, preventing their entrance into the stomata. The water present may also act as a solvent for gases such as SO2 and NO2, thereby incorporating them into a solution that may then drip off the leaf surface. In addition, smoke exposure may trigger stomatal closure by producing high vapor pressure deficits [26,29]. Misting may help reduce the leaf-to-air vapor pressure difference produced by smoke exposure, thereby reducing the impact on gs. Misting also appeared to reduce the effect of smoke exposure on transpiration rate (E), as there were no differences between the two control treatments and the LS and HSM treatments. Only the HS treatment had significantly reduced E. Mean rates of photosynthesis (A) followed similar patterns to gs, with the HS treatment having the lowest value, followed by the LS treatment and then the HSM treatment, while the control without misting (C) had the highest rate of A. This indicates that while misting may have reduced A in the control treatments, it may also help reduce the effects of smoke exposure on A.

#### *4.2. Near-Infrared Spectroscopy Patterns and Principal Component Analysis*

From the PCA biplots (Figures 5 and 6) and spectra (Figures 3 and 4) generated in the current study, it is evident that smoke exposure alters the NIR spectral signals of grapevine leaves and berries, and this may prove useful for the detection of grapevine smoke contamination. For the microPHAZIR™ RX Analyzer leaf spectra, high loadings (Figure 5b) were observed for the wavelength regions between 1604–1621, 1621–1647, and 1613–1647 nm, all of which correspond to C-H stretching of sugars and aromatic compounds [67–70]. For the microPHAZIR™ RX Analyzer berry spectra, high loadings (Figure 6b) were observed for the wavelength regions between 1604–1622, 1630–1647, and 2374–2389 nm, which correspond to C-H stretching of sugars, such as glucose, as well as aromatic hydrocarbons, which may be due to the presence of smoke-derived volatile phenols, such as guaiacols, cresols, and syringols, and their glycoconjugates [67,68,71,72].

#### *4.3. Spectral Indices*

#### 4.3.1. Leaf

The normalized difference vegetation index (NDVI) gives an indication of plant vigor and fruit ripening resulting from relative changes in chlorophyll content. It is based on the variation between the maximum absorption of red by chlorophyll pigments and the maximum reflectance in the infrared caused by leaf cellular structure [56,57,73–75]. Similarly, relative changes in anthocyanin content are expressed as the normalized anthocyanin index (NAI). Both the NDVI and NAI are expressed as a normalized value between −1 (lack of green or redness) and +1 (green or red) [56,57]. Not surprisingly, HS leaves had the lowest NDVI and NAI values. Previous studies investigating the effects of pollution on leaf pigments found a decrease in photosynthetic pigments following exposure to pollutants, including sulfur dioxide (SO2), carbon dioxide (CO2), nitrogen dioxide (NO2), and ozone (O3) [59,76,77]. These studies are often used as comparisons for investigating the effects of smoke exposure on leaves, as compounds in air pollution can also be found in smoke [6,22]. There were no differences in NDVI and NAI values between the LS, HSM, and control treatments (C and CM), indicating that misting may reduce the effects of smoke exposure on leaf pigments, and that low levels of smoke exposure for one hour may also have no effect. Longer periods of smoke exposure (days or weeks, as is often the case with wildfires) may be required to cause a noticeable change in leaf pigments.

The plant senescence reflectance index (PSRI) gives an indication of the stage of leaf senescence and fruit ripening through assessing changes in carotenoid accumulation and their proportion to chlorophyll. Values range from −1 to +1, with higher values indicating increased stress and carotenoid accumulation [55,63–65,78]. The PSRI was highest for the HS treatment, indicating heightened stress and leaf senescence. This also corresponds with the high CRI500 value for this smoke treatment, indicating increased carotenoid accumulation.
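The indices discussed above can be illustrated with a minimal sketch. The band choices are assumptions rather than values taken from this section: NDVI is written in its generic form using one near-infrared and one red reflectance, and PSRI uses the commonly cited form (R678 − R500)/R750. The reflectance values are hypothetical and only show how the numbers behave.

```python
def normalized_difference(r_a: float, r_b: float) -> float:
    """Generic normalized difference index: (r_a - r_b) / (r_a + r_b).

    For non-negative reflectances the result always lies between -1 and +1.
    """
    total = r_a + r_b
    if total == 0:
        raise ValueError("reflectances sum to zero; index is undefined")
    return (r_a - r_b) / total


def ndvi(r_nir: float, r_red: float) -> float:
    """NDVI from near-infrared and red reflectance (band centers are assumed)."""
    return normalized_difference(r_nir, r_red)


def psri(r_678: float, r_500: float, r_750: float) -> float:
    """PSRI in the commonly cited form (R678 - R500) / R750 (assumed here)."""
    if r_750 == 0:
        raise ValueError("R750 is zero; index is undefined")
    return (r_678 - r_500) / r_750


# Hypothetical leaf reflectances (fractions of incident light), illustration only.
ndvi_vigorous = ndvi(r_nir=0.50, r_red=0.05)
ndvi_stressed = ndvi(r_nir=0.40, r_red=0.15)
psri_example = psri(r_678=0.10, r_500=0.04, r_750=0.50)

print(f"NDVI (vigorous leaf): {ndvi_vigorous:.3f}")
print(f"NDVI (stressed leaf): {ndvi_stressed:.3f}")
print(f"PSRI (example):       {psri_example:.3f}")
```

In this sketch, a leaf that absorbs less red light and reflects less near-infrared light yields a lower NDVI, which matches the direction of the change reported for the HS treatment.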

#### 4.3.2. Berries

Research by Noestheden et al. [5] found that smoke exposure induced changes in phenylpropanoid metabolites in Pinot Noir berries and wine, some of which are associated with the color and mouthfeel of the wine. Berries exposed to HS and LS treatments had the highest mean NAI values, indicating that smoke exposure may increase anthocyanin content, possibly due to an increase in phenolic accumulation as a stress response induced by exposure to ozone present in smoke [5,79,80]. The HSM treatment had a low NAI value, indicating that misting may reduce anthocyanin concentrations through increased irrigation. Castellarin et al. [81] found that both early-season (before véraison) and late-season (after the onset of ripening) water deficits increased anthocyanin accumulation during ripening. The application of in-canopy misting may reduce water stress and, therefore, reduce anthocyanin accumulation.

Interestingly, the HSM treatment, followed by the HS treatment, had the lowest PSRI values. As carotenoid concentrations in grapes generally decrease during véraison, this may have resulted in lower PSRI values. Therefore, the PSRI may not be suitable for assessing the degree of ripening in grape berries.

#### *4.4. ANN Modeling*

Both ANN models classified leaf and berry readings as a function of smoke exposure with high accuracy. The microPHAZIR™ leaf model (Model 1) had the highest positive classification, with 98% accuracy (Table 4). The NIR region selected for use in Model 1 was between 1600–1800 nm in order to minimize any possible interference due to the absorption spectra of water in the region of approximately 1930 nm [69]. Furthermore, the region between 1680–1690 nm is associated with aromatic C-H stretching [67]; as such, any patterns observed by the ANN would most likely be due to the presence of smoke-derived volatile phenols. Research by Kennison [22] found a positive correlation between levels of smoke-derived compounds found in leaves and levels in wine; the ANN model developed may, therefore, offer a rapid, in-field method for assessing grapevine smoke contamination. It also demonstrates great promise for further research into the use of NIR spectroscopy coupled with unmanned aerial vehicles (UAVs) with Global Positioning System (GPS) trackers, which could fly over vineyards to scan grapevine canopies and provide maps of smoke-contaminated regions.
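The band selection described for Model 1 can be sketched as a simple wavelength filter. This is an illustration only: the 8 nm sampling grid, the starting wavelength, and the placeholder absorbance values are assumptions, and `select_band` is a hypothetical helper rather than part of the instrument software or the authors' code.

```python
def select_band(wavelengths, absorbances, low_nm, high_nm):
    """Keep only the readings whose wavelength lies within [low_nm, high_nm]."""
    if len(wavelengths) != len(absorbances):
        raise ValueError("wavelengths and absorbances must have the same length")
    kept = [(w, a) for w, a in zip(wavelengths, absorbances) if low_nm <= w <= high_nm]
    return [w for w, _ in kept], [a for _, a in kept]


# Assumed sampling grid (nm); the real instrument grid may differ.
wavelengths = list(range(1596, 2397, 8))
# Placeholder spectrum; a real reading would come from the spectrometer.
absorbances = [0.0] * len(wavelengths)

WATER_BAND_NM = 1930  # approximate water absorption region named in the text

band_w, band_a = select_band(wavelengths, absorbances, low_nm=1600, high_nm=1800)

print(f"{len(band_w)} of {len(wavelengths)} points kept "
      f"({band_w[0]}-{band_w[-1]} nm), all below {WATER_BAND_NM} nm")
```

Restricting the inputs this way keeps the aromatic C-H region near 1680–1690 nm while leaving out the readings around the water band.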

The microPHAZIR™ berry model (Model 2) also had a high overall accuracy in classifying grape berries according to smoke treatment (97.4%). For Model 2, the entire wavelength range between 1600–2396 nm was used. This includes the C-H stretching of aromatic compounds at 1680 nm, O-H stretching at 1930 nm associated with glucose, cellulose, and water, and C=O second overtone associated with carboxylic acids and water between 1900–1910 nm [67,69]. As NIR measurements were conducted in-field on whole berries, this offers a non-destructive tool for assessing grapevine smoke contamination. Whole grapes may be used for assessment as smoke compounds have been found to occur primarily in grape skins [3,25]. Furthermore, the Lighting Passport™ smart handheld spectrometer may be of interest to growers due to its affordability compared to other spectrometers. It is also very small and lightweight, making it easy to undertake measurements in-field, and it can be connected to smartphones via Bluetooth, where data can be stored and retrieved for later analysis [82].

The two ANN models more accurately differentiated the spectral readings relative to PCA. This may be because ANNs are better suited to handle complex, non-linear data, and more readily find patterns or relationships between data than other forms of analysis [53,83–85]. Research by Janik et al. [53] found that the combination of ANNs with partial least squares (PLS) or PCA overcomes issues of non-linearity as well as increasing the accuracy of regression models in predicting total anthocyanin concentrations in red grape homogenates. This may also explain why Model 2 was able to accurately differentiate the berry spectral readings from C, CM, and LS treatments, despite analysis of variance indicating there were no statistically significant differences.

As smoke exposure altered the chemical fingerprinting of grapevine leaves and berries, the ANN models were able to detect changes in the spectral patterns and then classify the readings as a function of experimental treatments. This may offer grape growers a rapid method of assessing the level of smoke contamination in grape berries and leaves, with a high level of accuracy and precision. This may assist growers in deciding which berry samples to send for further chemical analysis to quantify the levels of smoke compounds in grapes and predict the level of smoke taint in the final wine, or they may decide to avoid harvesting heavily contaminated grapes for winemaking. Furthermore, as this method is non-destructive, repeated measurements are possible. By knowing the level of smoke contamination, growers can make informed decisions.

While the ANN models developed were able to classify Cabernet Sauvignon leaf and berry spectra accurately, further research is required to assess whether these models can be used for other grape varieties, as differences in berry composition and leaf physiology may affect the accuracy of classification [6,34]. Previous research that evaluated MIR spectroscopy for the classification of smoke-tainted wines found that compositional differences due to grape variety prevailed over differences resulting from low levels of smoke exposure [34]. Furthermore, the physiological responses of different grape varieties to smoke were found to vary, both in magnitude and in recovery time [6,26]. Thus, further testing of these models using berry and leaf spectra from different grapevine varieties is required.

### **5. Conclusions**

Results from this study indicate that smoke exposure alters the NIR spectra of Cabernet Sauvignon grapevine leaves and berries. As a result, accurate classification models can be developed using ANN modeling. Artificial neural networks are better at classifying non-linear or complex data than traditional techniques, such as principal component analysis. Furthermore, the use of UV-Vis spectroscopy may offer insights into the physiological performance of leaves and the quality and degree of ripening of grapes. These techniques may assist grape growers in identifying grapevines that have been contaminated by smoke, thereby informing decision-making to avoid harvesting and processing heavily contaminated grapes and/or the need for mitigation techniques to manage the risk of smoke taint in resulting wine. Further testing of the ANN models developed in the current study is required to assess their accuracy in classifying grapevine leaf and berry spectra from other grape varieties.

**Supplementary Materials:** The following are available online at http://www.mdpi.com/1424-8220/20/18/5099/s1, Table S1: Concentrations of volatile phenols in grape juice (μg/L) and their glycoconjugates in grape homogenate (μg/kg) one hour after smoke treatments.

**Author Contributions:** Conceptualization, V.S., and S.F.; data curation, V.S., C.G.V., and S.F.; formal analysis, V.S.; funding acquisition, S.F.; investigation, V.S., and C.G.V.; methodology, V.S., C.G.V., C.S., K.L.W., and S.F.; project administration, K.L.W., and S.F.; resources, K.L.W., R.D.B., and S.F.; software, C.G.V., and S.F.; supervision, D.D.T., A.P., and S.F.; validation, C.G.V., K.L.W., and S.F.; visualization, V.S., C.G.V., D.D.T., and S.F.; writing—original draft, V.S.; writing—review and editing, V.S., C.G.V., C.S., K.L.W., D.D.T., A.P., R.D.B., and S.F. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** This research was supported through the Australian Government Research Training Program Scholarship, as well as the Digital Viticulture program funded by the University of Melbourne's Networked Society Institute, Australia. C.S. was supported by the Australian Research Council Training Centre for Innovative Wine Production (www.arcwinecentre.org.au), which is funded as part of the ARC's Industrial Transformation Research Program (Project No. ICI70100008), with support from Wine Australia and industry partners. The authors gratefully acknowledge the Digital Agriculture, Food, and Wine Group.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

## **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Early Detection of Aphid Infestation and Insect-Plant Interaction Assessment in Wheat Using a Low-Cost Electronic Nose (E-Nose), Near-Infrared Spectroscopy and Machine Learning Modeling**

**Sigfredo Fuentes 1,\*, Eden Tongson 1, Ranjith R. Unnithan <sup>2</sup> and Claudia Gonzalez Viejo <sup>1</sup>**


**Abstract:** Advances in early insect detection have been reported using digital technologies through camera systems, sensor networks, and remote sensing coupled with machine learning (ML) modeling. However, to date, there is no cost-effective system to accurately monitor insect presence and insect-plant interactions. This paper presents results on the implementation of near-infrared spectroscopy (NIR) and a low-cost electronic nose (e-nose) coupled with machine learning. Several artificial neural network (ANN) models were developed based on classification to detect the level of infestation and regression to predict insect numbers for both e-nose and NIR inputs, and plant physiological response based on e-nose to predict photosynthesis rate (A), transpiration (E) and stomatal conductance (gs). Results showed high accuracy for classification models ranging within 96.5–99.3% for NIR and between 94.2–99.2% using e-nose data as inputs. For regression models, high correlation coefficients were obtained for physiological parameters (gs, E and A) using e-nose data from all samples as inputs (R = 0.86) and R = 0.94 considering only control plants (no insect presence). Finally, R = 0.97 for NIR and R = 0.99 for e-nose data as inputs were obtained to predict number of insects. Performances for all models developed showed no signs of overfitting. In this paper, a field-based system using unmanned aerial vehicles with the e-nose as payload was proposed and described for deployment of ML models to aid growers in pest management practices.

**Keywords:** remote sensing; volatile compounds; artificial neural networks; photosynthesis modeling; plant water status modeling

## **1. Introduction**

Early detection of insect infestation in crops is critical for decision-making related to pest management and for alerting growers of neighboring susceptible crops to potential infestation. One of the most common agronomical assessments of detrimental insect infestation in crops is visual inspection at defined, critical periods of crop development, in synchrony with the insect's population dynamics [1] and migrations [2]. The next step for more practical monitoring is using pheromone traps [3], which can be used for more ecological pest management [4]. Some of these pheromone traps have been integrated with digital technologies, such as video cameras [5] to assess effectiveness [6], and with computer vision for pest identification and automatic counting using machine learning [7–12]. Some of these systems are web-based and used to support agronomical decision-making in developing countries [8].

Even though these applications are certainly an advancement in automated pest monitoring and management, they still rely on sentinel locations within the crop field. The latter could translate into an economic limitation for extensive crops, which require a significant number of monitoring nodes and increasing complexity of the sensor network. Furthermore, these monitoring and counting systems do not give much information on the insect-plant interaction, interactions with the insect's natural predators, or detrimental effects or symptomatology from the plant's perspective.

**Citation:** Fuentes, S.; Tongson, E.; Unnithan, R.R.; Gonzalez Viejo, C. Early Detection of Aphid Infestation and Insect-Plant Interaction Assessment in Wheat Using a Low-Cost Electronic Nose (E-Nose), Near-Infrared Spectroscopy and Machine Learning Modeling. *Sensors* **2021**, *21*, 5948. https://doi.org/10.3390/s21175948

Academic Editor: Asim Biswas

Received: 28 July 2021; Accepted: 1 September 2021; Published: 4 September 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Other remote sensing techniques have been implemented for pest detection in crops [13] based on sensor networks [14], IoT for moths [15], and hyperspectral imaging from airborne [16], satellite [17], and unmanned aerial vehicle (UAV) [18] platforms, among others. These systems offer the advantage of increased spatial resolution and potential temporal resolution in airborne and UAV platforms. However, there are some disadvantages related to the plant-based nature of remote sensing monitoring. The first disadvantage is that monitoring and modeling rely on plant symptomatology in response to insect attacks, which is often assessed late, with detrimental implications for yield and quality of produce. Another disadvantage is that there is no assurance that the symptomatology targeted using remote sensing to detect insects is entirely related to the specific biotic stress of interest. Some plants may have other biotic and abiotic symptomatology, such as water, salinity, and other insect interaction stresses. These issues could create biases in the models developed and hinder the deployment of models to other locations.

Hence, there is a need for a digital system that considers the early detection of the pest of interest and early interaction with the host plant. To understand the specific insect-plant interactions for machine learning modeling purposes, controlled experiments must be considered before deployment in field conditions. Furthermore, a digital system based on volatile compounds could offer advantages compared to other systems. The implementation of electronic noses (e-noses) has been proposed for disease detection and diagnosis [19] and pest detection [20], specifically for cotton [21], as a portable e-nose development, and for aphid detection on tomato plants [22] using four low-cost gas sensors and comparing with gas chromatography results. In wheat, some authors have also used commercial e-noses to detect mite infestation [23], to predict the age and insect damage during storage using linear discriminant analysis [24], and to detect rusty grain beetle, *Cryptolestes ferrugineus*, and red flour beetle [25]. Some studies have also combined computer vision systems and e-noses for pests in agriculture [26]. There is an increasing interest in developing compact, portable, and low-cost e-noses for these purposes [27]. However, most of these recent studies focus only on detecting the variation in volatile compounds related to insect presence and the interaction between insects and plants [25,28–30], in some cases combining e-nose and computer vision [26] for insect detection and identification; so far, no attempt has been made to separate these processes through comprehensive modeling.

This paper proposed the implementation of a newly developed e-nose comprising nine gas sensors, described by Gonzalez Viejo et al. (2020) [31], and near-infrared spectroscopy (NIR) for the early detection of aphids (*Rhopalosiphum padi*) on wheat plants in controlled conditions. Raw data from the e-nose and NIR were used as inputs for machine learning algorithms to develop different classification models to detect insect presence at different phenological stages and regression models to predict the number of insects and physiological responses of plants based on gas-exchange measurements. Furthermore, a deployment system was proposed to validate these models in the field using the e-nose as a UAV payload to test different flying altitudes for detection sensitivity purposes. The latter system could have several advantages compared to research done so far by addressing the gaps discussed above.

The implementation of the proposed system could be highly beneficial to growers, as it can provide high temporal and spatial resolution for more precise and targeted decision-making. Furthermore, the deployment of this system could support not only pest detection and management but also other agronomical activities, such as plant water status and irrigation scheduling and the detection of other biotic and abiotic stresses.

## **2. Materials and Methods**

#### *2.1. Plant and Insect Material, and Experimental Design Description*

Wheat seeds of Kittyhawk variety (Pacific Seeds, Toowoomba, QLD, Australia) were surface sterilized with 0.8% sodium hypochlorite and were pre-germinated in the dark at 4 °C for 48 h, followed by lit conditions (17–25 °C) for 72 h. The germinated seeds were transferred individually to Jiffy-7® pellets (Jiffy Products S.L. (Private) Ltd., Mirigama, Sri Lanka). The seedlings were further grown to a two-leaf stage (GS12) prior to transplanting in pots.

The plants were grown in a non-circulating passive hydroponic method based on Kratky [32]. The wheat seedlings were transplanted into 3 L (190 mm × 170 mm) hydroponic pots (Anti-Spiral Pot, Garden City Plastics, Dandenong South, VIC, Australia) filled with expanded clay pebbles (CANNA Aqua Clay Pebbles, Subiaco, WA, Australia) as substrate, with three seedlings placed equidistant in each pot. Duplicate pots were placed in a black plastic tub filled with modified Hoagland nutrient solution [33] up to root submergence level. The nutrient solutions were replaced every two weeks throughout the experiment. Each hydroponic tub was placed inside an insect rearing tent (Bug-Dorm, Australian Entomological Supplies Pty., Ltd., South Murwillumbah, NSW, Australia) constructed with nylon mesh with 160 μm aperture. The plants were maintained inside a growth room (Biosciences Glasshouse Complex, The University of Melbourne, Parkville, VIC, Australia) with 16 h daylight/8 h night and 20–25 °C controlled automatically.

Oat aphids (*Rhopalosiphum padi*) were obtained from laboratory cultures of the Pest & Environmental Adaptation Research Group, School of Biosciences, The University of Melbourne, Australia. The starting colony was allowed to reproduce for population increase in a rearing tent supplied with wheat plants (in a hydroponic set-up similar to that described above). Adult *R. padi* were randomly selected from the colony plants and introduced into the experimental plants, approximately at stem elongation stage with third leaf emerged (GS32). Three treatments were determined based on the economic threshold for winter cereals, which is an average of 15 aphids per tiller on 50% of tillers [34]: high load (15 aphids per tiller in 50% of tillers), medium load (10 aphids per tiller in 50% of tillers), and low aphid load (5 aphids per tiller in 50% of tillers). The aphids were carefully transferred onto the wheat plants with a fine natural bristle brush. For simplicity, the days referred to in the models developed correspond to days after infestation at the wheat phenological stage GS32.

A total of eight experimental set-ups were made with duplicate set-ups for each treatment (low, medium, and high aphid load) and two aphid-free set-ups as controls. Each experimental set-up was composed of one insect rearing tent, containing two pots with each pot planted with three wheat plants, maintained in hydroponics as described above and shown in Figure 1. These were randomly arranged inside the growth room.

Insect population models (adults) were developed using initial insect infestations and exponential growth models applicable to sigmoidal population insect growth [35,36]. Curves were adjusted by image analysis and manual insect counting per leaf, and extrapolation per plant in the middle and end of the experiment to account for insect mortality (data not shown).
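As a rough illustration of a sigmoidal population curve of the kind mentioned above, the sketch below uses the standard logistic growth equation. The functional form is an assumption, since the section does not state the exact model used, and the initial count, carrying capacity, and growth rate are hypothetical values chosen only to show the shape of the curve on the measurement days named in this study.

```python
import math


def logistic_population(t_days, n0, carrying_capacity, rate):
    """Logistic growth: near-exponential at first, levelling off at carrying_capacity."""
    if n0 <= 0 or carrying_capacity <= 0:
        raise ValueError("n0 and carrying_capacity must be positive")
    ratio = (carrying_capacity - n0) / n0
    return carrying_capacity / (1.0 + ratio * math.exp(-rate * t_days))


# Hypothetical parameters, for illustration only (not fitted to the study data).
INITIAL_APHIDS = 15.0      # insects at infestation (day 0)
CARRYING_CAPACITY = 500.0  # assumed plateau of the population
GROWTH_RATE = 0.35         # assumed intrinsic rate per day

MEASUREMENT_DAYS = [0, 3, 7, 10, 14, 17]
curve = [
    logistic_population(day, INITIAL_APHIDS, CARRYING_CAPACITY, GROWTH_RATE)
    for day in MEASUREMENT_DAYS
]

for day, count in zip(MEASUREMENT_DAYS, curve):
    print(f"day {day:2d}: {count:6.1f} aphids")
```

In practice the parameters would be adjusted against the manual and image-based counts described above rather than set by hand.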

#### *2.2. Physiological Measurements*

Plant physiological parameters such as stomatal conductance (gs; mol H2O m<sup>−2</sup> s<sup>−1</sup>), transpiration (E; mmol H2O m<sup>−2</sup> s<sup>−1</sup>), and photosynthesis (A; μmol CO2 m<sup>−2</sup> s<sup>−1</sup>) were measured using a Li-6400 XT open gas exchange system (Li-Cor Inc., Environmental Sciences, Lincoln, NE, USA). Measurements were made on the youngest fully expanded leaves, repeated three times in different tillers of each plant (*n* = 18 per tent; *n* = 36 per treatment).

**Figure 1.** Wheat plants were grown in non-circulating passive hydroponic system (**left**) and contained in insect rearing tents (**right**).

#### *2.3. Near-Infrared Spectroscopy Measurements*

A single leaf of each wheat plant (three per pot and six leaves per tent) was measured on six different spots (*n* = 36 per tent; *n* = 72 per treatment) using a handheld near-infrared (NIR) spectroscopy device (MicroPHAZIR™ RX; Thermo Fisher Scientific, Waltham, MA, USA). This device measures the absorbance values within the 1596–2396 nm wavelength range. A blank reference was used as background to calibrate the device every 10 measurements and was placed on the top of the leaf while measuring to avoid recording noise from the environment (Figure 2). The raw absorbance values were used for all analyses presented in this study.

**Figure 2.** Photosynthetic gas exchange (**left**) and near-infrared (**right**) devices while taking measurements.

#### *2.4. Electronic Nose Measurements*

A portable and low-cost electronic nose (e-nose) developed by the Digital Agriculture Food and Wine Group and the Department of Electrical and Electronic Engineering from The University of Melbourne was used to assess volatile compounds produced by the control plants and treatments with aphids. This e-nose consists of an array of nine sensors sensitive to different gases: (i) MQ3 (alcohol), (ii) MQ4 (methane: CH4), (iii) MQ7 (carbon monoxide: CO), (iv) MQ8 (hydrogen: H2), (v) MQ135 (ammonia/alcohol/benzene), (vi) MQ136 (hydrogen sulfide: H2S), (vii) MQ137 (ammonia), (viii) MQ138 (benzene/alcohol/ammonia), and (ix) MG811 (carbon dioxide: CO2), as well as a humidity and temperature sensor to measure the environment conditions (Figure 3; Henan Hanwei Electronics Co., Ltd., Henan, China). The e-nose was calibrated for ~30 s prior to recording each measurement to ensure all sensors reached the baseline and then placed inside the tent on top of the plants to record data for 1.5 min; each tent was measured in triplicates. The output data (Volts) were then analyzed using a code written in MATLAB® R2020a (Mathworks Inc., Natick, MA, USA) to extract the mean values of ten segments from the highest peak of the curves as described by Gonzalez Viejo et al. [37].


**Figure 3.** Electronic nose (e-nose) showing (**a**) the front part with gas sensors and their model ID (Henan Hanwei Electronics Co., Ltd., Henan, China) and (**b**) the back part which holds the humidity/temperature sensor; (**c**) Shows the e-nose positioning while taking measurements.

#### *2.5. Statistical Analysis and Machine Learning Modeling*

Physiological and e-nose data were analyzed using ANOVA to assess significant differences (*p* < 0.05) between samples; additionally, a Tukey honestly significant difference (HSD) *post hoc* test (α = 0.05) was conducted using XLSTAT v.2020.3.1 (Addinsoft, New York, NY, USA). These data were then analyzed for significant correlations (*p* < 0.05) based on covariance using MATLAB® R2020a and represented with a matrix.

Several machine learning models based on artificial neural networks (ANN) were developed for three different purposes: (i) to predict physiological data using e-nose outputs and the infestation level (control: 0, low: 0.25, medium: 0.75, and high: 1) as inputs using data from all treatments (Model 1), and only the baseline and control treatments (Model 2), (ii) to classify samples into the different infestation treatments (control, low, medium, and high) using the NIR absorbance values (Models 3–7), and e-nose outputs (Models 8–12) as inputs, and (iii) to predict the number of aphids using the NIR absorbance values (Model 13) and e-nose outputs (Model 14) as inputs. All models were constructed using a customized code written in MATLAB® R2020a to test 17 different training algorithms in a loop and find the best models based on accuracy and performance [38,39]. Furthermore, a neuron trimming test (3, 5, 7, and 10 neurons) was performed to assess the optimal number of neurons to avoid under- or over-fitting of the models (data not shown). The regression models (i, iii) consisted of a feedforward network with a hidden (tan-sigmoid function) and an output (linear transfer function) layer. On the other hand, the classification models (ii) consisted of a feedforward network with a hidden (tan-sigmoid function) and an output (Softmax neurons) layer.
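The two network structures described above can be sketched as a plain forward pass. This is not the authors' MATLAB code: the weights are random and untrained, the layer sizes are picked from the values mentioned in the text (nine e-nose sensors, four infestation classes, five hidden neurons as one of the tested options), and all function names are hypothetical. The tan-sigmoid transfer function is mathematically equivalent to the hyperbolic tangent, so `math.tanh` is used.

```python
import math
import random


def dense(x, weights, biases):
    """Weighted sum of inputs plus bias for each neuron in a layer."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]


def softmax(logits):
    """Convert raw scores to class probabilities that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward_classifier(x, w_hidden, b_hidden, w_out, b_out):
    """Hidden tan-sigmoid layer followed by a softmax output layer."""
    hidden = [math.tanh(z) for z in dense(x, w_hidden, b_hidden)]
    return softmax(dense(hidden, w_out, b_out))


def forward_regressor(x, w_hidden, b_hidden, w_out, b_out):
    """Hidden tan-sigmoid layer followed by a linear output layer."""
    hidden = [math.tanh(z) for z in dense(x, w_hidden, b_hidden)]
    return dense(hidden, w_out, b_out)


rng = random.Random(0)


def random_layer(n_neurons, n_inputs):
    """Untrained placeholder weights and biases, for illustration only."""
    weights = [[rng.uniform(-1, 1) for _ in range(n_inputs)] for _ in range(n_neurons)]
    biases = [rng.uniform(-1, 1) for _ in range(n_neurons)]
    return weights, biases


N_SENSORS, N_HIDDEN, N_CLASSES, N_TARGETS = 9, 5, 4, 3

# Classification: nine e-nose readings in, four class probabilities out.
enose_reading = [rng.uniform(0, 1) for _ in range(N_SENSORS)]  # hypothetical volts
wh, bh = random_layer(N_HIDDEN, N_SENSORS)
wo, bo = random_layer(N_CLASSES, N_HIDDEN)
probs = forward_classifier(enose_reading, wh, bh, wo, bo)

# Regression: nine readings plus infestation level in, three targets (A, E, gs) out.
reg_input = enose_reading + [0.25]  # 0.25 encodes the low infestation level
wh_r, bh_r = random_layer(N_HIDDEN, N_SENSORS + 1)
wo_r, bo_r = random_layer(N_TARGETS, N_HIDDEN)
targets = forward_regressor(reg_input, wh_r, bh_r, wo_r, bo_r)

print("class probabilities:", [round(p, 3) for p in probs])
print("regression outputs: ", [round(t, 3) for t in targets])
```

The only structural difference between the two model types in this sketch is the output layer: softmax constrains the outputs to a probability distribution over treatments, while the linear layer leaves the predicted values unbounded.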

The best models to predict the physiological data (photosynthesis, stomatal conductance, transpiration) were developed using the Bayesian Regularization training algorithm for regression modeling. For this, two models were developed: Model 1 using as inputs the e-nose outputs and infestation level (control: 0, low: 0.25, medium: 0.75, and high: 1) from all measurements and treatments (general model), and Model 2 using the e-nose outputs from samples with no insects, i.e., the baseline and control (Figure 4a). Data were divided randomly as 70% for training and 30% for testing using a performance algorithm based on mean squared error (MSE).

Models to classify the samples into the different treatments (Figure 4b) using the NIR absorbance values as inputs were developed using the Levenberg–Marquardt training algorithm. One model was developed per day of measurement as Model 3 (baseline + Day 3), Model 4 (Day 7), Model 5 (Day 10), Model 6 (Day 14), and Model 7 (Day 17) to assess the level of infestation at different stages. Data were divided randomly as 70% for training, 15% for validation using a performance algorithm based on MSE, and 15% for testing. On the other hand, models to classify the samples into the different treatments using the e-nose outputs as inputs were constructed using the Bayesian Regularization training algorithm. As with the NIR models, one model was developed per day of measurement as Model 8 (baseline + Day 3), Model 9 (Day 7), Model 10 (Day 10), Model 11 (Day 14), and Model 12 (Day 17). Data were also divided randomly as 70% for training and 30% for the testing stage using the MSE performance algorithm.
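The random data division described above can be sketched as follows. This is a minimal illustration in Python rather than the MATLAB routine used in the study; `split_indices`, the sample count of 100, and the seed are hypothetical.

```python
import random


def split_indices(n_samples, fractions, seed=None):
    """Shuffle sample indices and cut them into consecutive parts by fraction.

    The last part takes whatever remains, so every sample is used exactly once.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    parts = []
    start = 0
    for position, fraction in enumerate(fractions):
        is_last = position == len(fractions) - 1
        end = n_samples if is_last else start + round(fraction * n_samples)
        parts.append(indices[start:end])
        start = end
    return parts


# 70/15/15 division, as described for the Levenberg-Marquardt models.
train_idx, val_idx, test_idx = split_indices(100, (0.70, 0.15, 0.15), seed=42)

# 70/30 division, as described for the Bayesian Regularization models.
train_br, test_br = split_indices(100, (0.70, 0.30), seed=42)

print(len(train_idx), len(val_idx), len(test_idx))
print(len(train_br), len(test_br))
```

The validation subset appears only in the three-way division; in the two-way division, all samples not used for training are held out for testing.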

The Bayesian Regularization training algorithm produced the best models to predict the number of aphids using the NIR absorbance values (Model 13) and e-nose outputs (Model 14) from days 7 to 17 as inputs (Figure 4c). A random data division was used as 70% for training and 30% for testing with an MSE performance algorithm.


**Figure 4.** Diagrams of machine learning models based on artificial neural networks showing (**a**) the structure of regression Models 1 and 2; (**b**) Pattern recognition Models 3 to 12, and (**c**) Regression Models 13 and 14. Abbreviations: W: weights; b: bias; electronic nose sensors MQ3: alcohol; MQ4: methane; MQ7: carbon monoxide; MQ8: hydrogen; MQ135: ammonia/alcohol/benzene; MQ136: hydrogen sulfide; MQ137: ammonia; MQ138: benzene/alcohol/ammonia; MG811: carbon dioxide.

For Models 3–14, six support vector machine (SVM) algorithms, (i) linear, (ii) quadratic, (iii) cubic, (iv) fine Gaussian, (v) medium Gaussian, and (vi) coarse Gaussian, were also tested to compare results with ANN and find the best models. These algorithms were run using the Classification and Regression Learner applications in MATLAB® Statistics and Machine Learning Toolbox 12.1. Accuracy percentage for classification models, and correlation coefficient (R) and MSE for regression models, were considered to compare the different ML methods/algorithms. However, because the SVM models were less accurate than the ANN models, only their accuracy percentages and R values are reported in the results. These algorithms were not tested for Models 1 and 2 because SVM algorithms are unable to construct multi-target models, which makes them inefficient for further deployment.
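The six SVM variants differ only in their kernel function. The Python sketch below writes the kernels out explicitly; it is not the MATLAB implementation. The polynomial offset of 1 and the Gaussian kernel scales (√P/4, √P, and 4√P for fine, medium, and coarse, where P is the number of predictors) are assumptions based on the usual presets of the Learner applications and should be checked against the toolbox documentation.

```python
import math

def squared_distance(x, y):
    return sum((a - b) ** 2 for a, b in zip(x, y))

def linear_kernel(x, y):
    return sum(a * b for a, b in zip(x, y))

def polynomial_kernel(x, y, degree):
    # degree=2 gives the quadratic kernel, degree=3 the cubic kernel.
    # The offset of 1 is an assumption.
    return (1.0 + linear_kernel(x, y)) ** degree

def gaussian_kernel(x, y, scale):
    # Larger scales make distant samples look more similar.
    return math.exp(-squared_distance(x, y) / scale ** 2)

def gaussian_scales(n_predictors):
    # Assumed preset kernel scales for fine, medium, and coarse Gaussian SVM.
    base = math.sqrt(n_predictors)
    return {"fine": base / 4, "medium": base, "coarse": base * 4}

scales = gaussian_scales(9)  # nine e-nose sensors as predictors
a, b = [0.0] * 9, [1.0] * 9
similarities = {name: gaussian_kernel(a, b, s) for name, s in scales.items()}
print(similarities)
```

Under these assumptions, a fine Gaussian kernel treats two distinct samples as nearly unrelated, while a coarse kernel smooths over the same distance.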

## **3. Results**

Table 1 shows non-significant differences (*p* > 0.05) between treatments for baseline measurements of any physiological parameters. For photosynthesis, at Days 10 and 17, the control was significantly higher (*p* < 0.05; 12.47 and 12.57 μmol m<sup>−2</sup> s<sup>−1</sup>, respectively) than the infested treatments. Similarly, stomatal conductance was significantly higher for the control at Days 7, 10, and 17 (0.51, 0.55, and 0.62 mol m<sup>−2</sup> s<sup>−1</sup>, respectively). On the other hand, transpiration was significantly higher for the control on all measurement days (Days 3–17), with values within the 3.60–6.00 mmol m<sup>−2</sup> s<sup>−1</sup> range.

Figure 5a shows that the non-infested plants presented higher absorbance values at Days 10–17, especially within the 1900–2000 nm range, with Day 7 being the lowest. For the infested treatments (Figure 5b), the major overtones were also within the 1900–2000 nm range. The lowest absorbance values were at Day 7 for all low, medium, and high treatments, while the highest values were found at Day 17 for the medium and high treatments.

**Figure 5.** Near-infrared curves showing the absorbance values within the 1596–2396 nm wavelength range for (**a**) the control measurements and (**b**) the treatments (low, medium, high infestation) measured at different dates.

Figure 6 shows there were significant differences (*p* < 0.05) between treatments in all measurement days for all sensors, except for MQ4 (CH4) on Day 3, MQ136 (H2S) on Days 3, 14, and 17, and MQ8 (H2) at Days 14 and 17. It can be observed that the highest values were found in sensors MG811 (CO2), MQ4 (CH4), MQ3 (alcohol), and MQ7 (CO).



*Sensors* **2021** , *21*, 5948

Abbreviations: BL: baseline; D: Day. Different letters denote significant differences between treatments according to ANOVA (*p* < 0.05) and Tukey honestly significant difference *post hoc* test (α = 0.05).

**Figure 6.** Electronic nose (e-nose) outputs (means) for each integrated sensor at each day of measurements. Error bars are based on standard error, and different letters denote significant differences between treatments according to ANOVA (*p* < 0.05) and Tukey honestly significant difference *post hoc* test (α = 0.05).

Figure 7 shows that both photosynthesis and transpiration had a positive and significant correlation (*p* < 0.05) with MQ3 (alcohol; r = 0.45 and r = 0.65, respectively) and MQ7 (CO; r = 0.55 and r = 0.71, respectively), and a negative correlation with the number of aphids (r = −0.44 and r = −0.59, respectively) and MQ4 (CH4; r = −0.51 and r = −0.45, respectively). Similarly, stomatal conductance had a positive correlation with MQ3 (r = 0.65) and MQ7 (r = 0.69), and a negative correlation with the number of aphids (r = −0.56). Transpiration and stomatal conductance were also positively correlated with MQ8 (H2; r = 0.43). On the other hand, the number of aphids had a positive correlation with MQ4 (r = 0.52) and a negative correlation with MQ3, MQ7, MQ8, MQ135, MQ136, MQ137, and MQ138, with correlations ranging from r = −0.56 to r = −0.81.
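For reference, the correlation coefficient r reported above is the Pearson coefficient, which can be computed as in the following Python sketch. The two short series are invented for illustration only and are not data from the study.

```python
import math

def pearson_r(xs, ys):
    # Covariance of the two series divided by the product of their spreads.
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical sensor signal and aphid counts (illustrative values only).
sensor_signal = [1.0, 2.0, 3.0, 4.0]
aphid_count = [40.0, 30.0, 20.0, 10.0]
print(pearson_r(sensor_signal, aphid_count))
```

A value near −1 indicates that the sensor signal falls as aphid numbers rise, which is the pattern reported for most sensors other than MQ4.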

**Figure 7.** Matrix showing the significant correlations (*p* < 0.05) between the physiological data, number of aphids, and the electronic nose sensors. Color bar represents the negative (yellow) to positive (blue) correlations. Numbers within the boxes denote the correlation coefficients (r).

Table 2 shows the results from the machine learning regression models to predict physiological data (photosynthesis, stomatal conductance, and transpiration) using the e-nose outputs and infestation level as inputs. Model 1, constructed as a general model using data from all treatments and measurement days, had an overall correlation coefficient of R = 0.86. It showed no signs of under- or overfitting, as the MSE of the training stage (MSE = 0.05) was lower than that of the testing stage (MSE = 0.06); however, the slope was moderate (b = 0.76). On the other hand, Model 2, which was developed using only the data from non-infested plants (baseline and controls), had a high overall correlation coefficient (R = 0.94) with a high slope (b = 0.90) and no signs of under- or overfitting, with the training MSE (0.02) lower than the testing MSE (0.04). The overall models are shown in Figure 8, where data points from Model 1 (Figure 8a) are more dispersed, with 5% outliers (216 out of 4320) based on the 95% prediction bounds. Model 2 (Figure 8b) also presented 5% outliers (81 out of 1620), but its slope was closer to unity (b = 0.90). It can also be observed that for Model 2, most of the outliers were from stomatal conductance, while in Model 1, they were more evenly distributed among the three targets.
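The outlier percentages and the training versus testing comparison quoted above reduce to simple arithmetic, sketched here in Python using only the figures given in the text. The helper names are hypothetical.

```python
def outlier_percentage(n_outliers, n_points):
    # Share of observations lying outside the 95% prediction bounds.
    return 100.0 * n_outliers / n_points

def mse_gap(train_mse, test_mse):
    # A positive gap means the testing error exceeds the training error.
    return test_mse - train_mse

model_1 = {"outliers": 216, "points": 4320, "train_mse": 0.05, "test_mse": 0.06}
model_2 = {"outliers": 81, "points": 1620, "train_mse": 0.02, "test_mse": 0.04}

for name, m in (("Model 1", model_1), ("Model 2", model_2)):
    pct = outlier_percentage(m["outliers"], m["points"])
    gap = mse_gap(m["train_mse"], m["test_mse"])
    print(f"{name}: {pct:.2f}% outliers, MSE gap {gap:.2f}")
```

Bounds drawn at the 95% level are expected to leave roughly 5% of points outside when the residuals behave as assumed, so the distribution of outliers across targets is likely the more informative quantity.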


**Table 2.** Machine learning regression models based on artificial neural networks (Bayesian Regularization) to predict physiological data using the electronic nose outputs as inputs. Abbreviations: R: correlation coefficient; b: slope; MSE: mean squared error.

**Figure 8.** Overall regression models to predict physiological data using (**a**) the electronic nose outputs and infestation level as inputs for general data using all treatments at all measurement days and (**b**) using the electronic nose outputs as inputs with the baseline and control data (non-infested). Abbreviations: R: correlation coefficient; T: targets.

Table 3 shows the results from the pattern recognition models to classify samples into the different treatments (control, low, medium, and high) using the NIR absorbance values as inputs. Model 3, constructed using data from the baseline and Day 3, and Model 6, developed with data from Day 14, both had a very high overall accuracy of 97%, although this was the lowest among the days of measurement. Model 4, developed using data from Day 7, presented a higher overall accuracy of 98%. Models 5 and 7 had the highest overall accuracy (99%), with Model 5 being the best as it was constructed using fewer neurons (Model 5: 7 neurons; Model 7: 10 neurons). None of the five models presented any signs of under- or overfitting: the training MSE values (MSE < 0.01 for all) were lower than those of the validation and testing stages, and the latter two stages had similar MSE values.
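Overall and per-treatment accuracies of this kind are read from a confusion matrix. The Python sketch below shows the calculation on a small invented matrix; the counts are illustrative only and do not correspond to any model in Table 3.

```python
def overall_accuracy(confusion):
    # Correctly classified samples sit on the diagonal of the matrix.
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    total = sum(sum(row) for row in confusion)
    return 100.0 * correct / total

def per_class_accuracy(confusion):
    # Rows are true treatments; each row is scored against its own total.
    return [100.0 * row[i] / sum(row) for i, row in enumerate(confusion)]

# Hypothetical counts; rows and columns ordered control, low, medium, high.
confusion = [
    [10, 0, 0, 0],
    [1, 9, 0, 0],
    [0, 0, 10, 0],
    [0, 0, 1, 9],
]
print(overall_accuracy(confusion))
print(per_class_accuracy(confusion))
```

The overall error is simply 100% minus the overall accuracy.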


**Table 3.** Machine learning pattern recognition models based on artificial neural networks (Levenberg–Marquardt) to classify samples into infestation treatment levels using the near-infrared absorbance values as inputs. Abbreviations: MSE: mean squared error.

Accuracy results for Models 3–7 using SVM were lower than those from the ANN Levenberg–Marquardt algorithm (Table 3). Results were within the following ranges: (i) Linear SVM (Models 3–7: 56–74%), (ii) Quadratic SVM (Models 3–7: 80–95%), (iii) Cubic SVM (Models 3–7: 90–92%, 88–98%), (iv) Fine Gaussian SVM (Models 3–7: 82–83%), (v) Medium Gaussian SVM (Models 3–7: 58–65%), and (vi) Coarse Gaussian SVM (Models 3–7: 41–45%). As can be seen, the highest accuracy was obtained with quadratic SVM (98%). However, this was lower than the highest accuracy of the ANN models (99.3%).

**Overall** 288 99.3% 0.7% -

Table 4 shows the results from the pattern recognition models to classify samples into the different treatments (control, low, medium, and high) using the e-nose outputs as inputs. Models 8, 9, and 11, developed using data from Days 3, 7, and 14, respectively, had very high overall accuracy (98%), while Model 10, constructed with data from Day 10, presented the highest overall accuracy (99%). On the other hand, Model 12, developed using data from the last day of measurements (Day 17), presented a high overall accuracy of 94%; however, it was the lowest compared to models from previous days. All of the models were constructed using three neurons, and none of them presented signs of under- or overfitting, as the MSE values of the training stage (MSE < 0.01) were lower than those of the testing stage.

**Table 4.** Machine learning pattern recognition models based on artificial neural networks (Bayesian Regularization) to classify samples into infestation treatment levels using the electronic nose outputs as inputs. Abbreviations: MSE: mean squared error.


Accuracy results for Models 8–12 using SVM were lower than those from ANN Bayesian Regularization algorithm (Table 4). Results were within the following ranges: (i) Linear SVM (Models 8–12: 75–85%), (ii) Quadratic SVM (Models 8–12: 84–96%), (iii) Cubic SVM (Models 8–12: 88–98%), (iv) Fine Gaussian SVM (Models 8–12: 89–93%), (v) Medium Gaussian SVM (Models 8–12: 85–94%), and (vi) Coarse Gaussian SVM (Models 8–12: 72–85%). As can be observed, the model with the highest accuracy was cubic SVM (98%). However, this was lower than the ANN models, which presented the highest accuracy of 99.2%.

Table 5 shows the results from regression models to predict the number of aphids using data from Days 7 to 17. Model 13, developed using NIR absorbance values as inputs, had a very high overall correlation coefficient (R = 0.97). However, Model 14, constructed with the e-nose outputs as inputs, presented a higher correlation coefficient (R = 0.99). Both models had very high overall slope values (b = 0.97), and neither showed any signs of under- or overfitting based on the performance values. From the overall models, Model 13 (Figure 9a) had 4.98% outliers (43 out of 864) based on the 95% prediction bounds, with the highest number of outliers coming from the low infestation treatment. Similarly, Model 14 (Figure 9b) presented 5% outliers (36 out of 720); however, for this model, the highest number of outliers came from the medium infestation treatment. The number of aphid count targets differs between the two models (864 versus 720) because NIR was measured on each plant, while the e-nose was measured per tent.
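The slope b reported for these models describes the least-squares line relating predicted to observed aphid numbers, with b = 1 indicating that predictions neither compress nor inflate the observed range. A minimal Python sketch follows; the observed and predicted values are invented for illustration, and the function name is hypothetical.

```python
def regression_line(targets, predictions):
    # Ordinary least squares fit of predictions against targets.
    n = len(targets)
    mean_t = sum(targets) / n
    mean_p = sum(predictions) / n
    s_tt = sum((t - mean_t) ** 2 for t in targets)
    s_tp = sum((t - mean_t) * (p - mean_p) for t, p in zip(targets, predictions))
    slope = s_tp / s_tt
    intercept = mean_p - slope * mean_t
    return slope, intercept

# Hypothetical observed and predicted aphid counts (illustrative only).
observed = [0.0, 10.0, 20.0, 30.0]
predicted = [1.0, 10.0, 19.0, 28.0]
slope, intercept = regression_line(observed, predicted)
print(round(slope, 2), round(intercept, 2))
```

A slope below unity, as in this invented example, means that high counts are slightly underestimated and low counts slightly overestimated.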

**Table 5.** Machine learning regression models based on artificial neural networks (Bayesian Regularization) to predict the number of aphids using the near-infrared absorbance values (Model 13) and electronic nose outputs (Model 14) from Days 7 to 17 as inputs. Abbreviations: R: correlation coefficient; b: slope; MSE: mean squared error.


**Figure 9.** Overall regression models to predict the number of aphids using (**a**) the near-infrared absorbance values and (**b**) the electronic nose outputs as inputs with data from Days 7–17. Abbreviations: R: correlation coefficient; T: targets.

Correlation coefficients for Models 13 and 14 using SVM were lower than or equal to those from the ANN Bayesian Regularization algorithm (Table 5). Results from regression SVM were the following: (i) Linear SVM (Model 13: R = 0.68; Model 14: R = 0.80), (ii) Quadratic SVM (Model 13: R = 0.85; Model 14: R = 0.91), (iii) Cubic SVM (Model 13: R = 0.95; Model 14: R = 0.89), (iv) Fine Gaussian SVM (Model 13: R = 0.80; Model 14: R = 0.97), (v) Medium Gaussian SVM (Model 13: R = 0.73; Model 14: R = 0.92), and (vi) Coarse Gaussian SVM (Model 13: R = 0.60; Model 14: R = 0.70). For the model developed using NIR inputs, i.e., Model 13, the highest correlation was obtained with cubic SVM (R = 0.95), while for the model developed using e-nose inputs, i.e., Model 14, the highest correlation was obtained with fine Gaussian SVM (R = 0.97). However, both were lower than the corresponding ANN models, which presented R values of 0.97 (Model 13) and 0.99 (Model 14).

## **4. Discussion**

#### *4.1. Physiological Response of Plants to Insect Infestation*

The absence of statistical differences in the baseline physiological data (no insects) across all plants (Table 1) helped ascertain that initial conditions were similar for all the plants considered in the study and that no other stresses were present. After the introduction of insects, differences in physiological data among treatments followed a variable pattern, with little difference in the photosynthetic rate (A). This is expected, since plants compensate by either maintaining or, in some conditions, increasing A under abiotic [40,41] or biotic stresses such as aphid attack [42].

Stomatal conductance (gs) and transpiration decreased according to the level of insect infestation. This is in accordance with previous studies showing that gs is the most sensitive parameter to other stresses, such as water stress [43,44], pathogen-based stress [45], and water stress–aphid interactions in wheat [46].

#### *4.2. Chemical Fingerprinting and Volatile Compounds' Response to Insect Infestation*

The NIR measurements offer a chemical fingerprint of the different leaf samples monitored, including the baseline measurements (Figure 5a) and treatments (Figure 5b) for the different days of the experimental trial. The main variations observable are in the overtones corresponding to hydrogen peroxide (H2O2) between 1596 and 1650 nm [47,48], with similar absorbance levels for all treatments, which may explain the limited reduction in photosynthesis. The overtones for water content (status) are shown in the major peak within 1900–2000 nm (1940 nm) [49]. Furthermore, overtones of aromatic compounds can be found within the sensitivity range of the NIR instrument, at 1660 nm, 1672 nm, and 1685 nm [50,51]. Compounds with amide functional groups are found at 1920 nm, 1960–1980 nm, 2000–2050 nm, and 2110–2160 nm [51,52]. Overtones of urea, which is an important amide compound, are found at 1990 nm, 2030 nm, and 2070 nm [51]; urea was expected in the samples, as it is a nitrogen component of the fertilizers added to the hydroponic solution and is translocated through the plants.

In the case of the e-nose (Figure 6), the baseline data were similar for all plants and tents measured. However, some differences between tents were statistically significant, contrary to the physiological data measured by gas exchange (Table 1). This can be explained by the sensitivity and responsiveness of the e-nose sensors (one reading every 0.5 s), which depend on small eddies formed in the growth chamber. Some sensors were more stable than others, such as MQ4 (methane) on Day 3. The differences in sensor readings on subsequent days are expected, and it is assumed that their patterns are related to the interaction between aphids and plants, the increasing number of insects over time, and plant growth or decline, including the small changes in MG811 (CO2), which correspond to photosynthetic activity.

When analyzing the correlations between physiological parameters and the sensors that compose the e-nose (Figure 7), it can be seen that, as expected, there is a positive and direct correlation between photosynthesis, stomatal conductance, and transpiration. On the contrary, there is an inverse correlation between physiological parameters and the number of insects, which corresponds to the decline of the plants or their response to insect activity. Alcohol has been documented to be produced in plants as an allelopathic response to insect infestation. However, salivary proteins from aphids are able to stop this process when feeding on the plant sap [53,54]. The latter effect may explain the inverse correlation between the number of insects and the MQ3 sensor response signal. Furthermore, the increase in the methane (MQ4) signal response may be due to the activity of insects and anaerobic digestion [55], which would explain its inverse correlations with physiological parameters and its positive correlation with the number of insects. The carbon monoxide sensor (MQ7) had a similar response to the alcohol sensor (MQ3), shown by the high correlation coefficient (r = 0.89), and to MQ8 (hydrogen). Most of the other sensors (MQ135, MQ136, MQ137, and MQ138) had an inverse correlation with the number of insects. The inverse correlation between the ammonia sensors (MQ135, MQ137, and MQ138) and the number of insects may be due to the capacity of aphids to assimilate ammonia into amino acids [56]. Finally, the levels of CO2 (MG811) were not significantly affected by the interaction between insects and plants. The correlations among the different sensors from the e-nose have been reported in previous research [31], which explains in detail the e-nose used in this study.

#### *4.3. Machine Learning Models Developed*

The plant physiological machine learning models developed using e-nose data as inputs and LiCOR data as targets, for all plants and treatments (Figure 8a) and for only control plants (Figure 8b), showed high correlation coefficients and no signs of overfitting. To the authors' knowledge, this is the first time such models have been presented; they use a low-cost e-nose and are compared against an established gas exchange method for plant physiological measurements. The LiCOR instrument has been used as a validation method for several remote sensing techniques for other crops [57–60]. The accuracy of the models obtained may not be surprising, since both systems, LiCOR and the e-nose, measure gas exchange in different ways. Furthermore, these models are supported by the correlations between different sensors and physiological parameters (Figure 7). The lower correlation found in the ML model that included all plants (R = 0.86) may be explained by the higher variability of the data due to the interaction between plant and insect. Both models may be used to assess the effect of plant-insect interaction on physiological parameters and for further applications to assess plant water stress [61], irrigation scheduling, and the physiological effect of other biotic or abiotic stresses, such as salinity, other insects, plant diseases, and environmental stress such as heatwaves, cold temperatures, and smoke contamination due to bushfires [62].

The accuracy of the classification ML models based on NIR and e-nose data as inputs and level of insect infestation as targets was high and similar, with at least 94% accuracy for all models and dates (Tables 3 and 4, respectively). Among the most important are Models 3 and 8, since they can be considered for early detection only three days after the introduction of insects to the plants' environments and the corresponding treatments, at a critical and vulnerable wheat phenological stage. In these models specifically, the baseline data from all plants were used as control, which explains the higher number of samples (576 and 480, respectively). Even though the data were unbalanced among the treatments used as classes, the models were able to recognize non-infested plants with 96.5% and 98.3% accuracy, respectively. All further models can be used either to monitor insect activity or to verify the effectiveness of control methods using chemical pesticides [63], organic pesticides [64,65], or natural predators through integrated pest management (IPM) [66,67].

#### *4.4. Deployment Method for ML Models Developed Proposed Using UAV*

One of the main advantages of creating AI models for the early detection of pests using growth chambers is that data can be obtained under controlled conditions. Hence, the ML models developed do not include stresses related to other biotic or abiotic factors. The similarity of the models developed using NIR and the e-nose validates the effectiveness of the proposed low-cost instrumentation by comparison with a more established instrument, NIR spectroscopy; other studies have used gas chromatography as a validation point [20,22].

One advantage of the NIR models, especially for insect number detection, is that they are based on the different patterns of chemical fingerprinting resulting from the plant-insect interactions. Hence, this instrument can be used as a validation method to deploy the ML models developed in this study to field conditions. NIR measurements in plant leaves take just seconds and can be made on a grid of 10 × 10 m in a wheat field instead of visual insect counting, which is extremely difficult and time-consuming [68,69]. The latter can also be assessed using mathematical modeling strategies based on population models [36,70] or through smartphone devices and machine vision [71], image analysis and machine learning [72], and deep learning [73]. However, the e-nose model (R = 0.99) was more accurate and practical, and it is a low-cost method. Even though ANN models were selected as the best compared to SVM, the authors also have the SVM models available for deployment depending on future usage needs.
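The 10 × 10 m measurement grid mentioned above can be laid out programmatically when planning a field validation. The Python sketch below generates grid coordinates for a rectangular field; the field dimensions (50 × 30 m) and the function name are hypothetical and not taken from the study.

```python
def sampling_grid(width_m, length_m, spacing_m=10.0):
    # Measurement points on a regular grid, starting at the field corner.
    n_x = int(width_m // spacing_m) + 1
    n_y = int(length_m // spacing_m) + 1
    return [(ix * spacing_m, iy * spacing_m)
            for ix in range(n_x) for iy in range(n_y)]

points = sampling_grid(50.0, 30.0)
print(len(points), points[0], points[-1])
```

The number of points grows with field area, which gives a rough estimate of the time needed for leaf-level NIR measurements.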

The proposed deployment method is to carry the e-nose as a payload on a UAV (Figure 10); the advantage of the e-nose is that it weighs only 200 g, and power can be accessed via the UAV. To assess the sensitivity and efficacy of the ML models, it is proposed to start with test flights at 5, 15, 20, and 50 m from the crop's surface.

**Figure 10.** Diagram showing the proposed validation and deployment of machine learning models developed for early detection of aphids in wheat fields using an unmanned aerial vehicle and the e-nose as payload.

## **5. Conclusions**

The low cost and accuracy of the models presented in this study could make the early detection of insect infestation in crop fields feasible using the UAV system proposed. The data and models used in this study can be used as a base for deployments in wheat fields and as validation points when considering other insects of interest. Further models developed following the phenological stages of plants can be used as testing systems for agronomical management practices for insect control, such as chemical and organic product applications, the introduction of natural predators, and integrated pest management tools. Furthermore, plant physiology models based on the low-cost e-nose open the use of models to assess other biotic and abiotic stress effects on plants for further management practices, such as fertilization and irrigation scheduling, and the general effect of climate change and climatic anomalies, such as heatwaves, frosts, and smoke contamination due to bushfires.

**Author Contributions:** Conceptualization, S.F., E.T. and C.G.V.; Data curation, C.G.V.; Formal analysis, C.G.V., S.F.; Funding acquisition, S.F.; Investigation, S.F., E.T. and C.G.V.; Methodology, S.F., E.T. and C.G.V.; Project administration, S.F. and C.G.V.; Resources, S.F. and R.R.U.; Software, S.F., R.R.U. and C.G.V.; Validation, S.F. and C.G.V.; Visualization, S.F., E.T. and C.G.V.; Writing—original draft, S.F., E.T. and C.G.V.; Writing—review and editing, S.F., E.T., R.R.U. and C.G.V. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was supported by seed funding from (i) the Australian Grain Pest Innovation Program, an investment from the Grains Research and Development Corporation (GRDC), Australia (ID: 2062311) and (ii) Bayer Grants4Ag sustainability-focused program (ID: 106027).

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Data and intellectual property belong to The University of Melbourne; any sharing needs to be evaluated and approved by the University.

**Acknowledgments:** This research was supported by the Australian Research Council's Linkage Projects funding scheme (LP160101475). The authors would like to acknowledge Bryce Widdicombe, Mimi Sun, and Jorge Gonzalez from the School of Engineering, Department of Electrical and Electronic Engineering of The University of Melbourne for their collaboration in electronic nose development.

**Conflicts of Interest:** The authors declare no conflict of interest.

## **References**


MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Sensors* Editorial Office E-mail: sensors@mdpi.com www.mdpi.com/journal/sensors


ISBN 978-3-0365-2905-9