**A Comprehensive Study of the Potential Application of Flying Ethylene-Sensitive Sensors for Ripeness Detection in Apple Orchards**

### **João Valente \*†, Rodrigo Almeida† and Lammert Kooistra**

Laboratory of Geo-information Science and Remote Sensing, Wageningen University & Research, 6708 PB Wageningen, The Netherlands; rodrigo.almeida@wur.nl (R.A.); lammert.kooistra@wur.nl (L.K.)

**\*** Correspondence: joao.valente@wur.nl; Tel.: +31-628-398-164

† These authors contributed equally to this work.

Received: 25 November 2018; Accepted: 14 January 2019; Published: 17 January 2019

**Abstract:** The right moment to harvest apples in fruit orchards is still decided after persistent monitoring of the fruit orchards via local inspection and using manual instrumentation. However, this task is tedious, time consuming, and requires costly human effort because of the manual work that is necessary to sample large orchard parcels. Sensor miniaturization and advances in gas detection technology have increased the usage of gas sensors and detectors in many industrial applications. This work explores the combination of small-sized sensors with Unmanned Aerial Vehicles (UAV) to understand their suitability for ethylene sensing in an apple orchard. To accomplish this goal, a simulated environment built from field data was used to understand the spatial distribution of ethylene when subject to the orchard environment and the wind of the UAV rotors. The simulation results indicate the main driving variables of the ethylene emission. Additionally, preliminary field tests are also reported. It was demonstrated that the minimum sensing wind speed cut-off is 2 ms<sup>−1</sup> and that a small commercial UAV (like the Phantom 3 Professional) can sense volatile ethylene at less than six meters from the ground with a maximum detection probability of 10%. This work is a step forward in the usage of aerial remote sensing technology to detect the optimal harvest time.

**Keywords:** apple orchards; modeling and simulation; unmanned aerial vehicles; fruit ripeness; ethylene gas detection

### **1. Introduction**

Sustainable agriculture is a top priority for governments and nations worldwide. Our population is growing fast, and our resources are becoming scarcer every day. By 2050, the world population will reach nine billion, requiring crop production to double in order to meet food demands [1].

An efficient way to meet the upcoming demands is to avoid fruit spoilage during harvesting. Immature fruits are of poor quality and subject to mechanical damage, while overripe fruits are soft and flavorless, with a very short shelf-life. In general, if the harvesting is done too early or too late, physiological disorders will be provoked in the fruits, with the consequence of a shorter shelf-life [2]. These issues become more relevant as international trade of fruit and vegetables increases, making shelf-life an important marketing tool [3]. Therefore, the Optimal Harvest Date (OHD) will dictate the resulting fruit yield.

The OHD is usually obtained from maturity indices that take into account fruit chemical composition, like total soluble solids or total acidity, fruit physical properties, like firmness or color, fruit physiological changes, like aroma and ethylene emission rate, and finally, chronological features, like the number of days after planting or blooming [4].

Fruits' and vegetables' lifespan can be broken down into three steps: maturation (i.e., increase in fruit size), ripening (i.e., increase in flavor), and senescence (i.e., tissue death) [3]. Fruits that ripen after harvesting are denoted as climacteric fruits [4]. For climacteric fruits, like apples, the optimal harvest date occurs when the pre-climacteric minimum happens, equivalent to the end of the maturation process or the beginning of the ripening process, as illustrated in Figure 1.

**Figure 1.** Relative rate of respiration, ethylene production, and growth in climacteric and non-climacteric fruits. Adapted from [5].

The fruits' distinctive aromas are characterized by a wide variety of Volatile Organic Compounds (VOCs) that are released during their maturation process [6]. The VOCs can be detected using a single-gas sensor or an array of gas sensors (also known as an electronic nose) [7]. An important VOC that is associated with fruit ripening is ethylene [8].

Ethylene (*C*2*H*4) is a gaseous phytohormone that regulates several growth and development processes in plants. In climacteric fruits, ethylene production regulates processes like flesh softening, color changes, and aroma emissions during ripening [9]. Ethylene can be measured via gas chromatography techniques, electrochemical sensors, and optical sensors [10].

Most current destructive and non-destructive methods of assessing fruit maturity require the sampling of individual fruit in the field and, in some cases, a further assessment in the lab [8,11,12]. That process is both labor intensive, since it requires an operator to physically go to the field and sample fruits, and dependent on the individual fruits that are sampled. Using electronic noses and gas measurements with the fruit in concentration chambers provides less noise and substantially augments the ethylene signal, but it requires time and manpower to harvest and analyze the fruit and, at the same time, is highly dependent on the fruit sampling scheme used [13,14].

The increasing availability of UAVs is a potential solution for acquiring data on a plot of land remotely and quickly, without the manual labor that would traditionally be required. The land manager/owner does not have to survey the plot manually, but can deploy a UAV. Several aerial remote sensing applications in agriculture have been successfully reported as important contributions and steps forward in Precision Agriculture (PA) practices [15].

Using the combination of airborne and electronic nose technology to map ethylene concentration in the orchard might give important information regarding fruit maturity in a fast and more representative way, without the need for additional labor. To the authors' knowledge, no studies have been made so far regarding the potential limitations of this mapping application, but one could hypothesize that the sensitivity of the sensor and the atmospheric conditions during the measurements (i.e., wind speed and direction) are decisive.

Although plenty of research has been developed linking ethylene emission or VOC emission in apples to their maturity [8,16–18] and, in some literature, there are indications towards measuring ethylene in the field [11,19], to the authors' knowledge, no work of this sort has been carried out. This work should, because of this, be considered as a first attempt at understanding the potential and the limitations of such measurements, creating with it a theoretical framework from which further work can be developed.

On the other hand, in air quality monitoring systems, some development has occurred considering mobile measurement platforms such as a UAV, especially when it comes to gas source localization and adaptive path planning for gas plume estimations [20,21]. Additionally, the optimal position of a gas sensor in the UAV has been studied using a simulation approach by [22]. Although some successful gas sensing experiments have been reported with the sensor pointing down [23,24], no literature was found regarding the challenges of measuring in an orchard environment, especially when it comes to the dispersion dynamics and its effect on the measurement process. Additionally, most of these works were performed using artificial gas sources that are easily modeled and do not take into account the complexities of a natural emission source such as apples.

The main goal of this work is to evaluate if ethylene produced by apple orchards can be sensed using an electrochemical sensor mounted on a UAV. The evaluation is made using a model-based approach to identify the most influential factors for detection, after which the model results are compared to measurements from a UAV-mounted electrochemical sensor flown over an experimental apple orchard.

### **2. Materials and Methods**

### *2.1. Study Area*

The study area on which this research is based is located at the Wageningen Plant Research facility for Flower bulbs, Nursery stock and Fruits in Randwijk, The Netherlands (see Figure 2a). A test plot of 0.17 ha of apple trees was selected (Study Area A). It had 5 m between tree rows and 1.1 m between trees in the row (5 × 1.1), which results in 14 lines and about 300 trees in the plot. Two apple (*Malus domestica*) cultivars are shown in Figure 2c: Junami and Golden Delicious (the latter on the headers of each line for pollination purposes). Additionally, one other test plot was selected: Study Area B, a traditional apple orchard with 5 m between rows and 1 m between apple trees. The variety in this plot is Natyra. Only two lines were selected in Study Area B, as shown in Figure 2d.

**Figure 2.** General and detailed map of the study area. (**a**) Map of the selected study areas in Randwijk (A and B). (**b**) Sections of apple lines used for fruit load assessment in Study Area A. Some lines show discontinuities since trees were removed in that section. (**c**) Junami and Golden Delicious cultivar. (**d**) Natyra cultivar.

### *2.2. Ethylene Flying Detector*

The selected ethylene sensor was the Winsen ME4-C2H4, an electrochemical gas sensor. According to the sensor specification sheet, it has a sensing range of 0–100 ppm of *C*2*H*<sup>4</sup> and a response and recovery time of 100 s. Furthermore, the manufacturer indicates that the sensor error is less than 10%.

In order to test both the ethylene sensor and the entire prototype, several preliminary experiments were conducted. With these, response times (the time it takes the sensor to detect the presence of ethylene) and recovery times (the time until the sensor signal returns to null after the ethylene source is removed) were tested. Figure 3 illustrates the results of a four-hour experiment in a controlled environment. The ethylene-sensitive sensor was placed inside a sealed 40 cm × 50 cm × 40 cm plastic container with four *Junami* apples. After 3.5 h, the box was opened, and the sensor was then placed outside the box.

The UAV-based measurements were conducted with the Phantom 3 Professional. This is a quadcopter drone weighing 1280 g with approximately 23 min of maximum flight time designed primarily for photo and video capture applications. The default payload (an HD camera) was removed and replaced with the ethylene sensor, as illustrated in Figure 4. The maximum payload of the UAV is 300 g, and the total payload was 218 g (very similar to the default payload).

Additionally, the sensor prototype is equipped with a memory card: whenever the device is on, measurements are recorded at a configurable frequency, each with its time-stamp and the output signal from the sensor. In these experiments, the measurement frequency was set to one measurement per second (1 Hz).

**Figure 3.** Tests conducted indoors in a sealed environment with an ethylene emission source (apples) that was placed in the box at the green line and removed at the red line.

The complete remote sensing system design for detecting and measuring ethylene is illustrated in Figure 4, and it has three main components: electrochemical sensor, Arduino board, and battery. The system was composed of commercially-available materials and open source tools. Finally, it can be easily acquired with a cost of less than 1000 Euros.

**Figure 4.** The ethylene flying-detector system: (**a**) air-ground system architecture and (**b**) Phantom 3 Professional (UAV) with the sensor prototype attached.

### **3. Determining the UAV Hovering Height**

Understanding how the ethylene emission distributes above the orchard canopy is very important in order to define a starting sampling strategy. Determining the height above the orchard canopy where ethylene presents a higher concentration is not a trivial task, mainly because it depends on several biophysical parameters such as wind speed, temperature, and humidity. In this study, the effects of wind speed (environment) and wind flow (UAV rotors) on the ethylene distribution were observed, while temperature and humidity were omitted.

In order to determine the ideal sensing position, a modeling and simulation framework was developed to decrease the system deployment and testing times. Moreover, it allows a more reliable data acquisition by restricting the aerial sampling to areas within the orchard where a minimum ethylene concentration is expected. The modeling and simulation framework used GADEN, a gas dispersion simulation framework developed by [25], which is compatible with the ROS (Robot Operating System) [26].

### *3.1. Environment Wind Speed Modeling*

Several parameters had to be obtained from the orchard field manager and from the research center where the experimental field is located in order to build the simulation workspace in GADEN. The parameters taken into account to simulate the ethylene distribution within the orchard when subject to wind were:


These parameters were used to define an ethylene emission source for each tree represented in the simulation environment. Only one sample was taken per tree. Ideally, it would be possible to simulate each individual fruit on the tree canopy, but in this case, a simplification was performed using an artificial center for the total emissions of ethylene from a single tree. The distribution of the samples used in the simulations is illustrated in Figure 5.

**Figure 5.** Distribution of the parameters used in the simulations: {*e*, *E*}1, {*e*, *E*}2, and {*e*, *E*}<sup>3</sup> stand for pre-climacteric, entering climacteric, and climacteric stages, respectively. Moreover, *l* stands for fruit load per tree and h for height.

This artificial center (P) can be described as the average position of the emission sources of the tree and is defined by a height (h) and a direction in relation to the main stem (dir). The dir parameter is assigned so that its distribution over the trees is uniform; therefore, the number of trees should be divisible by the number of directions. In this case, six directions were defined, each spanning one sixth of a circle, equivalent to 60◦.

This simplification was applied mainly due to computational constraints. The simulator creates for each emission source a separate process, and for each process and time-step, the output is a simulation file of 90 MB. One can imagine that if each apple were simulated individually, the local memory of a standard computer would be very quickly surpassed. At the same time, according to our observations in the field, apples are usually clumped together in a branch, which means that the average distance between apples is in general small. Several branches can be further apart, but usually occupy one zone of the canopy. Figure 6 shows the orchard CAD model and the assumptions previously explained and used in the simulation process.
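The per-tree simplification above can be sketched as follows: each tree receives one artificial emission center at height h, offset from the stem in one of six uniformly distributed directions (60° apart). The offset radius is a hypothetical value for illustration only.

```python
import math

# Sketch of the per-tree simplification: one artificial emission center per
# tree, offset from the stem in one of six directions (60 degrees apart).
# The 0.4 m offset radius is an assumed value, not taken from the paper.
def emission_centers(stem_xy, heights, radius=0.4):
    """stem_xy: list of (x, y) stem positions; heights: per-tree source height (m)."""
    centers = []
    for i, ((x, y), h) in enumerate(zip(stem_xy, heights)):
        angle = math.radians((i % 6) * 60)  # 6 directions, one sixth of a circle each
        centers.append((x + radius * math.cos(angle),
                        y + radius * math.sin(angle),
                        h))
    return centers

# Example: 6 trees in a row, 1.1 m apart, sources at 1.5 m height.
centers = emission_centers([(0.0, i * 1.1) for i in range(6)], [1.5] * 6)
print(centers)
```

Cycling `i % 6` over the trees keeps the direction distribution uniform, matching the constraint that the number of trees be divisible by the number of directions.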

**Figure 6.** CAD model and respective parameters set in GADEN.

In the designed workspace, two environment inlets were set, the *x*-plane = 0 and the *y*-plane = 0. One of these inlets was chosen in order to simulate wind flow in a given direction: *x* for *x* and *y* for *y* (see Figure 7). The corresponding wind speed was assigned to this inlet, while the exact opposite plane (at the end of the environment) was set as a pressure outlet. All the other boundaries in the environment were set as walls with a slip setting. The computational fluid dynamics simulations were developed in SimScale, an online CFD software, with the recommended settings given in [25].

The number of ethylene-occupied cells in the environment is another important metric since it provides information on the probability of randomly finding an ethylene-filled cell. To get an understanding about which height is the most suitable to fly above the orchard, we must first look into the percentage of occupied cells with ethylene concentration above the canopy, as represented in Figure 8.
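The occupancy metric above can be sketched as follows, assuming one simulation snapshot stored as a nested (z, y, x) grid; the toy concentration values are illustrative, not GADEN output.

```python
# Sketch of the occupancy metric: the fraction of grid cells in each z-layer
# whose ethylene concentration is above zero. `conc` stands in for one GADEN
# snapshot on a regular grid; the toy values below are illustrative only.
def occupied_fraction_per_z(conc):
    """conc[z][y][x]: nested lists of concentrations; returns one fraction per z-layer."""
    fractions = []
    for layer in conc:
        cells = [c for row in layer for c in row]
        fractions.append(sum(1 for c in cells if c > 0) / len(cells))
    return fractions

# Toy snapshot: 10 x 10 x 5 grid with ethylene only in the two lowest layers.
conc = [[[1.0] * 10 for _ in range(10)]]                                   # z = 0: full
conc += [[[0.5] * 10 for _ in range(5)] + [[0.0] * 10 for _ in range(5)]]  # z = 1: half
conc += [[[0.0] * 10 for _ in range(10)] for _ in range(3)]                # z = 2..4: empty
print(occupied_fraction_per_z(conc))  # [1.0, 0.5, 0.0, 0.0, 0.0]
```

Plotting these per-layer fractions across time steps and simulations yields a figure of the kind shown in Figure 8.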

**Figure 7.** Wind flow simulations used as input for GADEN without considering the rotors' airflow.

**Figure 8.** Percentage of occupied cells (cells with ethylene concentration higher than zero) in the environment across all time steps and simulations for the z-plane.

In almost all the simulations, less than 5% of the cells above the tree height were filled with ethylene. When the wind speed was zero, there were more ethylene-filled cells above the tree height, but the majority of ethylene-filled cells could still be found below the tree height. It is also clear that the ethylene-filled cells above the tree height had a much lower ethylene concentration than the cells below it. Therefore, the most likely place along the *z* axis to find ethylene-filled cells is between 1 and 2 m, where all simulations showed the largest percentage of occupied cells.

To evaluate the impact of wind speed on the average ethylene concentration, Figure 9 was constructed. Looking at the environment as a whole, each additional 1 ms<sup>−1</sup> of wind speed results, on average, in a decrease of about 30 ppb in the average ethylene concentration. In the rows, the zone with the highest average ethylene concentration, this decrease was 440 ppb per ms<sup>−1</sup>, while in between rows, it was only 110 ppb per ms<sup>−1</sup>. This difference is also accompanied by a very large difference in absolute ethylene concentration. This indicates that choosing to sample in the rows might yield a higher concentration, but that such a measurement is very sensitive to the wind conditions.
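The trend lines of the form *y* = *a* + *bx* used in Figure 9 can be reproduced with an ordinary least-squares fit, as sketched below. The sample points are hypothetical, not the paper's simulation output.

```python
# Sketch of the Figure 9 trend lines: an ordinary least-squares fit of average
# ethylene concentration (ppb) against wind speed (m/s), y = a + b*x.
# The data points below are hypothetical, for illustration only.
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx
    return a, b  # intercept (ppb), slope (ppb per additional m/s of wind)

wind = [0, 1, 2, 3]                # simulated wind speeds, m/s
conc = [900, 460, 20, 0]           # hypothetical in-row average concentrations, ppb
a, b = fit_line(wind, conc)
print(a, b)                        # slope b is strongly negative
```

A strongly negative slope for the in-row zone, as here, corresponds to the high wind sensitivity of in-row sampling discussed above.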

**Figure 9.** Relation between wind speed and average ethylene concentration in the four different zones. The colored lines represent the trend line for each zone, as given by the equation *y* = *a* + *bx*, where *b* is the decrease in average ethylene concentration (ppb) per additional unit of wind speed (ms<sup>−1</sup>).

From Figures 8 and 9, it can be inferred that higher ethylene concentration levels can be found below the tree tops and that the wind speed cut-off for best practice is 2 ms<sup>−1</sup>. In the next section, the rotors' airflow effect will be added to the environment to corroborate the results previously obtained while omitting it.

### *3.2. Rotors' Airflow Modeling*

In order to simulate the effect of a UAV flying in the orchard, two different drone positions were considered: over the row (Position 1) and in between rows (Position 2). The drone over the row was positioned at 4 m, while the drone in between rows was positioned at 2 m. Only one wind scenario was considered for these simulations, *x* = 2 ms<sup>−1</sup>, in line with the results obtained in the previous section. This results in a total of six drone simulations, as exemplified in Table 1.

**Table 1.** Summary of drone simulator runs. The simulation number (#) will be used as a reference for naming each of these scenarios in the following sections.


The wind flow caused by the rotors of the drone was modeled as four square air inlets with a given wind speed in the negative *z* direction. The squares had a width of 0.1 m, which is approximately the diameter of a single rotor in the Phantom 3 Professional. A relationship exists between the rotation speed of the propeller and the resulting wind speed generated, or thrust [27]. Taking the example of the Phantom 3 Professional in hovering flight in normal conditions, the rotors spin at around 8000 rpm [28], which results in an airflow of about 18 ms<sup>−1</sup>. This wind speed was assigned to the velocity inlets mentioned above. The resulting wind flow simulations used as input for the GADEN simulations are displayed in Figure 10.
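As a rough plausibility check (not the rpm-to-thrust model of [27,28]), the ~18 ms<sup>−1</sup> inlet speed is consistent with a simple momentum balance: the momentum flux through a 0.1 m × 0.1 m inlet at that speed roughly equals one quarter of the hover weight. The masses are taken from Section 2.2; everything else is a back-of-the-envelope assumption.

```python
import math

# Back-of-the-envelope check: the air speed at which the momentum flux
# (rho * A * v^2) through one 0.1 m x 0.1 m rotor inlet balances a quarter
# of the hover weight. Masses from Section 2.2; this is NOT the rpm-based
# thrust model of [27,28], only an order-of-magnitude sanity check.
rho = 1.225                       # air density, kg/m^3
mass = 1.280 + 0.218              # UAV + sensor payload, kg
thrust_per_rotor = mass * 9.81 / 4
inlet_area = 0.1 * 0.1            # one square inlet, m^2
v = math.sqrt(thrust_per_rotor / (rho * inlet_area))
print(round(v, 1))                # about 17.3 m/s, close to the ~18 m/s used
```

That the two estimates agree within a few percent supports modeling each rotor as a small fixed-speed inlet in the CFD setup.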

**Figure 10.** Drone wind flow simulations used as input for GADEN.

The biggest difference between the ethylene concentration distribution with and without a drone appears to be the range of values that are present. When looking at the climacteric simulations with the drone in both positions, the maximum concentration was about 250 ppb, while without the drone, the same conditions yielded a maximum of 300 ppb. This range also decreased substantially with the height of the drone (Position 2 to 1), from 250 to 150 ppb, as Figure 11 clearly shows. There was also a gas concentration effect right under the drone position where it appeared that the wind displacement of the gas decreased.

**Figure 11.** Maximum ethylene concentration in the *xz*-plane (top plots) and *yz*-plane (bottom plots) for the drone simulations in the climacteric stage: (**a**) omitting rotor wind flow; (**b**) Drone Position 1; and (**c**) Drone Position 2. The ethylene sources' position and emission rate are also provided at the bottom.

When looking at the immediate vicinity of the position of the drone, a clear difference was detected between Position 1 and 2, as Figure 12 illustrates. While no ethylene was detected around Position 1, at Position 2, in every simulated time step, ethylene was present. This is a consequence of the concentration effect mentioned above.

**Figure 12.** Average ethylene concentration across time in the vicinity of the drone position (±0.2 m in *xyz*) for Positions 1 and 2.

The distribution of the occupied cells in the environment was also very different, as Figure 13 illustrates. The percentage of occupied cells was in general lower, due to the increase in average wind speed in the environment. Especially on the *z* axis, a compression of the occupied cells towards *z* = 0 was visible, depending on the height of the drone, which further confirms the ethylene concentration effect described previously. This compression results in a higher percentage of occupied cells closer to the ground.

**Figure 13.** Percentage of occupied cells (cells with ethylene concentration higher than zero) in the environment across all time steps and simulations for the z-plane.

In general, we can say that the drone flying overhead had two main effects: a decrease in the average ethylene concentration in the orchard, directly correlated with the height of the drone (4 m caused more gas dispersion than 2 m), and a concentration of gas directly under the drone, close to the ground (an effect more discernible at a 2-m height). Overall, the drone flying overhead at 4 m caused a decrease in average ethylene concentration of 95%, while at 2 m, the decrease was 90%.

### **4. Field Tests on the Orchard**

The simulation results obtained in Section 3 provide evidence that both the wind speed and the rotor wind flow play an important role in ethylene sensing. The behavior observed in the simulations is now used as a reference to define boundaries for the field tests, to observe other measuring heights, and to analyze the feasibility of this practice in a real orchard environment from an aerial mission perspective.

### *4.1. Sampling Scheme*

In order to analyze the sensor functioning and to detect ethylene in the selected plots, both on the ground and using a UAV, a spatial and a temporal sampling scheme were defined. The UAV measurements were conducted per measurement point, hovering at each height for 120 s. Please refer to Figure 14.

**Figure 14.** Hovering and sampling at two different positions per experiment: (**a**) low and high heights within the orchard; (**b**) samples on Study Field A over the two days; and (**c**) samples on Study Field B over the two days.

For Study Area A (Figure 2a,c), nine measurement points were selected for UAV-based measurements on two different days: 15 and 21 September 2017. These points were selected by dividing the plot into a three by three grid and placing a point roughly in the center of each grid cell. Additionally, two UAV-based measurements were conducted per measurement point: at 6 m and 12 m above the ground (3 m and 9 m above the canopy), with a sensing time of less than 3 min each. This restricted sensing time has to do with practical constraints related to the battery life of the UAV employed. With one battery charge of the Phantom 3 Professional, Study Area A could be sampled a maximum of nine times.
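The 3 × 3 grid placement described above can be sketched as follows. The plot dimensions used here are assumptions roughly consistent with the 0.17 ha plot (14 rows at 5 m spacing), not surveyed values.

```python
# Sketch of the 3 x 3 sampling grid for Study Area A: divide the plot extent
# into a 3 x 3 grid and place one measurement point at each cell center.
# The 70 m x 24 m extent is an assumption (~0.17 ha), not a surveyed value.
def grid_centers(width, length, nx=3, ny=3):
    """Return (x, y) cell-center coordinates, row by row."""
    return [((i + 0.5) * width / nx, (j + 0.5) * length / ny)
            for j in range(ny) for i in range(nx)]

points = grid_centers(70.0, 24.0)
print(len(points), points[4])  # 9 points; the 5th is the plot center
```

The same helper with `nx=1, ny=3` would give the three mid-line points used in Study Area B.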

For Study Area B (Figure 2a,d), the same spatial sampling approach as Study Area A was used, but in this case, only two lines were taken into account. Three points were selected in the middle of these two apple lines, where measurements were taken as described before, with a different height in the UAV measurements: 4 and 6 m. These measurements were also performed on two different dates: 4 and 10 October 2017.

### *4.2. System Deployment and Sensitivity Tests*

A weather station adjacent to the plot was used to provide real-time atmospheric measurements of wind speed and direction, air temperature, and air moisture during the sampling period on the different dates. The average wind speed on 15 September was about 3 ms<sup>−1</sup>, while on 21 September, it was about 3.2 ms<sup>−1</sup>. The average wind speed on 4 October was about 5 ms<sup>−1</sup>, while on 10 October, it was about 3.9 ms<sup>−1</sup>. Furthermore, the first harvesting day for Study Areas A and B was, respectively, 28 September and 10 October.

The average measurements obtained per day are shown in Figure 15. Looking at the UAV-based measurements, no output was measured above 10% of the reference signal, corresponding to a maximum of 0.5 ppm (500 ppb). There was no variation over the first two days of measurements (Study Field A). Nevertheless, there was an increase from 4 to 10 October, suggesting a higher ethylene concentration on the second date (which was expected). The decrease in wind speed might also explain part of this variation from one date to the other.

**Figure 15.** Sensor voltage output for aerial measurements on different days.

When flying at different heights (see Figure 16), an interesting behavior was observed: the measurements performed at the highest altitude (12 m) showed no variation from the baseline; at 6 m, there were more outliers, indicating more detection peaks; and at 4 m, some variation was actually detected.

**Figure 16.** Sensor voltage output for aerial measurements at different heights.

### **5. Discussion**

Although plenty of research has been developed linking ethylene emission or VOC emission in apples to their maturity [8,16–18] and in some literature there are indications towards measuring ethylene in the field [11,19], to the authors' knowledge, no work of this sort has been carried out, and there is to date no investigation addressing flying ethylene-sensitive sensor systems.

The modeling part of this study has shown that several factors influence the ethylene emission from the apple trees: these will be discussed in more detail below. We included the main influencing factors in the model, but additional ones, like time of the day, might be important, as well. These should be evaluated in follow-up studies.

In this section, the lessons learned from this study and major outcomes will be summed up.

### *5.1. Wind Speed and Rotors' Effect*

The hypothesis that wind speed decreases the probability of ethylene detection was verified through simulation. Although this supposition was expected, the literature contains no discussion of the maximum wind speed that still ensures a minimum sensing capability. It was shown through simulation that the wind speed cut-off is 2 ms<sup>−1</sup> (Figure 9).

Moreover, several authors have discussed the rotor effect when this sensor technology is mounted on multi-rotor UAVs. Our simulations confirm this effect and show that a small margin is left for detection, which may be improved at a suitable hovering height (Figure 8).

### *5.2. Theoretical versus Practical Optimal Sampling Height*

Simulation and practical results agreed that, for wind speeds higher than 2 ms<sup>−1</sup>, there would be very few or almost no detections. The minimum wind speed recorded during the field campaign was 3 ms<sup>−1</sup>, and indeed, the sensor variation was very low.

The most important analytical outcome reinforced by the field tests was that flying lower would increase ethylene detection. Furthermore, flying close to or under the tree canopy gave a better result, although the margin for detection was limited, as shown in Figure 13. For the safety and security of the platform and the flight crew, the drone did not fly under the tree canopy, but it was observed that, as the height decreased, more variation was measured (see Figure 16).

This study suggests that the UAV overflight should be performed at the lowest possible height to decrease the impact of the wind flow generated by the rotors on the ethylene distribution. However, further flying maneuvers should be explored when flying within or close to the orchard.

### *5.3. Discrete versus Continuous Sampling*

It is important to note that only the UAV measurements taken while hovering over the determined measurement points were considered. Thus, the data acquired along the UAV's path through the study area were not used, although moving, continuous measurements are an important topic for further research. In that case, the long response time of the sensor (>90 s) should be taken into account in the sampling strategy to adopt.
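To illustrate why the response time constrains continuous sampling, the sketch below computes the distance a moving UAV covers during one sensor response interval, using the 100 s response time from the specification sheet; the flight speed is a hypothetical value.

```python
# Sketch: at a given flight speed, a slow sensor integrates the signal over a
# distance of roughly speed x response time, smearing any local ethylene peak.
# Response time from the ME4-C2H4 specification sheet; speed is hypothetical.
def smear_distance(speed_ms, response_s=100):
    """Distance (m) travelled during one sensor response interval."""
    return speed_ms * response_s

# Even a slow 1 m/s transect smears the reading over ~100 m, comparable to the
# plot size itself, which is why hovering measurements were used in this study.
print(smear_distance(1.0))  # 100.0
```

This back-of-the-envelope figure suggests that continuous measurements would only become practical with sensors whose response time is on the order of seconds.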

### *5.4. Ethylene Detection over the Season and Inferring the OHD*

The simulations showed that the range of ethylene concentration in an orchard was in the ppb range, and wind speed had a very big impact on this ethylene concentration. According to [29], the usual measuring range for VOCs starts at 100 ppb, and very few sensors offer a sub-ppb range. If the state of the art of the technology does not provide a sensor with such characteristics, this might decrease the feasibility of this remote sensing strategy.

The observations when entering the climacteric stage were expected to be stronger than those observed in Figure 12. In order to determine the OHD, an ethylene increase during this stage was expected. This further strengthens the case that a more sensitive sensor must be considered in future experiments.

### *5.5. Feasibility of Using Flying Ethylene-Sensitive Sensors*

No literature was found on the challenges of measuring ethylene with a UAV in an orchard environment, especially when it comes to the dispersion dynamics and their effect on the measurement process.

On the other hand, in air quality monitoring systems, some development has happened considering mobile measurement platforms such as a UAV, especially when it comes to gas source localization and adaptive path planning for gas plume tracking [20,21]. The benefits of using a mobile platform for air quality monitoring and for the purpose of this research are similar: it can offer high-resolution sampling at both a spatial and a temporal level at a low cost [23]. However, most of these works were performed using artificial gas sources that are easily modeled, and none takes into account the complexities of a natural emission source such as apple trees.

The optimal position of the gas sensor has also been discussed [22,23] and could have been a valuable reference for this study. However, in one work, the authors performed experiments indoors, inside a garage, and in the other, the outcomes provided were very limited. The two studies suggested different sensor placements: pointing down, separated from the main frame, and on top of the platform. In this study, the sensor was used pointing down because the frame of the UAV did not allow other configurations. Further studies are needed to explore the placements suggested by the previous authors. Nevertheless, the simulations provided in this study reveal that higher concentration values will be found mostly below the platform (see Figure 13).

### *5.6. UAV vs. UGV*

The discussion of which mobile vehicle performs better in a given agricultural management task is not new. In general, UAVs have more of a sensing role, such as aerial surveying, where there is a need to increase spatial resolution, while Unmanned Ground Vehicles (UGV) have more of an actuation role, where an action must be performed, such as mechanical weeding [30].

In this study, we were interested in a versatile platform that could carry different instrumentation and sample the orchard at different 2D and 3D positions. While this could be achieved at different heights with a UAV, a UGV would be limited to a static height. Moreover, UAVs are considerably better than UGVs with regard to price, maintenance, and portability. Summing up, they can offer high-resolution sampling at both a spatial and a temporal level at low cost [23].

### **6. Conclusions**

This is the first study to investigate the feasibility of using a flying ethylene-sensitive sensor system in a fruit orchard only a few days before harvest. A simulated environment built from field data was used to understand the spatial distribution of ethylene within the apple orchard, to define the field sampling boundaries, and to evaluate how this influences detection by a miniaturized sensor on a UAV. Finally, some preliminary tests in the orchard field were carried out to elucidate the sensor's sensitivity and to contrast with the theoretical study.

The effect of the drone flight on the ethylene distribution was tested, and we concluded that flying at a higher altitude causes more disturbance and lowers the average ethylene concentration more than flying lower. At the same time, at higher altitude, almost no ethylene is present in the vicinity of the drone. In general, the drone flying overhead at 4 m causes a decrease in average ethylene concentration of 95%, while at 2 m it causes a decrease of 90%. The detection margin is short and not sufficient to infer fruit maturity, for which increased variability over the season is expected. These results further confirm the issue of measurement system sensitivity: a requirement for a sub-ppb ethylene sensor is clearly supported.

The use of a UAV to perform ethylene measurements in an uncontrolled environment such as an apple orchard still needs to be explored further, but practical applications of this system appear within reach given further research. The effect of different UAV propeller spans on the intensity of gas dispersion, as well as detailed response models of different sensors, are among the pressing issues to be considered in the future.

**Author Contributions:** Conceptualization, J.V. and R.A.; methodology, J.V. and R.A.; software, R.A.; validation, R.A.; formal analysis, J.V. and R.A.; investigation, J.V. and R.A.; resources, J.V. and L.K.; writing, original draft preparation, all; writing, review and editing, all; supervision, J.V. and L.K.; project administration, J.V.; funding acquisition, L.K.

**Funding:** This work was supported by the SPECTORS project (143081), which is funded by the European cooperation program INTERREG Deutschland-Nederland.

**Acknowledgments:** The authors would also like to thank Pieter van Dalfsen, who is with Wageningen Plant Research, for providing the field test and data that helped us to understand more about the apple orchard lifecycle.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Investigating 2-D and 3-D Proximal Remote Sensing Techniques for Vineyard Yield Estimation**

### **Chris Hacking <sup>1</sup>, Nitesh Poona <sup>1</sup>, Nicola Manzan <sup>2</sup> and Carlos Poblete-Echeverría <sup>3,\*</sup>**


Received: 27 June 2019; Accepted: 20 August 2019; Published: 22 August 2019

**Abstract:** Vineyard yield estimation provides the winegrower with insightful information regarding the expected yield, facilitating managerial decisions to achieve maximum quantity and quality and assisting the winery with logistics. The use of proximal remote sensing technology and techniques for yield estimation has produced limited success within viticulture. In this study, 2-D RGB and 3-D RGB-D (Kinect sensor) imagery were investigated for yield estimation in a vertical shoot positioned (VSP) vineyard. Three experiments were implemented, including two measurement levels and two canopy treatments. The RGB imagery (bunch- and plant-level) underwent image segmentation before the fruit area was estimated using a calibrated pixel area. RGB-D imagery captured at bunch-level (mesh) and plant-level (point cloud) was reconstructed for fruit volume estimation. The RGB and RGB-D measurements utilised cross-validation to determine fruit mass, which was subsequently used for yield estimation. Experiment one's (laboratory conditions) bunch-level results achieved a high yield estimation agreement with RGB-D imagery (r<sup>2</sup> = 0.950), which outperformed RGB imagery (r<sup>2</sup> = 0.889). Both RGB and RGB-D performed similarly in experiment two (bunch-level), while RGB outperformed RGB-D in experiment three (plant-level). The RGB-D sensor (Kinect) is suited to ideal laboratory conditions, while the robust RGB methodology is suitable for both laboratory and in-situ yield estimation.

**Keywords:** Kinect sensor; RGB; RGB-D; image segmentation; colour thresholding; bunch area; bunch volume; point cloud; mesh; surface reconstruction

### **1. Introduction**

Modern-day viticulture has seen an increase in the use of robust scientific methods combined with new technologies to improve overall production [1]. Precision farming, a direct result of the modernisation in agriculture, can be discipline-specific—i.e., specific to horticulture or viticulture. Precision viticulture aims to effectively manage production inputs to improve yield and grape quality while reducing the environmental impact of farming [2]. The use of remote sensing technology and techniques in precision viticulture allows variability to be monitored at vineyard level, per individual block or on a vine basis. Aspects such as vine shape, size and vigour can be observed, providing more accurate yield and fruit quality information [3].

Yield estimation provides information to the winegrower that can be used to manage the vineyard, optimising quality and yield [4]. Awareness of the estimated yield allows the vineyard manager to manipulate the vines to obtain the desired grape characteristics, and provides an effective plan for use during the winemaking process [5]. Accurate yield forecasting assists with logistical planning, both during and after the harvest; for example, what volume will be harvested, where the grapes will be stored, and an expected market price [5].

Wolpert and Vilas [6] outlined a two-step method for vineyard yield estimation. The start of the process determines the number of bunches situated on individual vines early in the season. Subsequent determination of bunch weights occurs at véraison. Unfortunately, the two-step method is labour-intensive, error-prone and destructive in the estimation process. Additionally, De la Fuente et al. [7] presented yield prediction models using destructive, manually collected data between fruit-set and véraison, aligning with the more 'classical' two-step method. To overcome the limitations of the manual methods, modern techniques have employed sensors attached to automatic harvesters to monitor yield during the harvesting process [3]. Yield estimation before harvest is becoming possible, with increasing accuracies when making use of non-invasive proximal remote sensing (PRS) technology and techniques [8–12].

A common PRS approach employs 2-D (2-dimensional) RGB (Red, Green, Blue) imagery, captured with a digital camera for yield estimation; for example [8,9,13]. The 2-D approach can be categorised into two steps: (i) image segmentation (to differentiate the bunch from the background); and (ii) yield estimation, using a suitable bunch metric (i.e., the pixel count of the bunch area in an image).
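The two steps above can be sketched in a few lines of Python. This is a hedged illustration only: the HSV threshold ranges, the `ruler_px`/`ruler_cm` calibration values, and the function names are illustrative placeholders, not the thresholds or implementation used in any of the cited studies.

```python
import numpy as np

def threshold_hsv(hsv, h_range, s_min, v_min):
    """Step (i): binary bunch mask from manually chosen HSV thresholds.
    hsv: float array of shape (H, W, 3), all channels scaled to [0, 1].
    The threshold values are illustrative, not taken from the paper."""
    h, s, v = hsv[..., 0], hsv[..., 1], hsv[..., 2]
    return (h >= h_range[0]) & (h <= h_range[1]) & (s >= s_min) & (v >= v_min)

def pixel_area_cm2(mask, ruler_px, ruler_cm):
    """Step (ii): convert the segmented pixel count into a calibrated
    area (cm^2) using a ruler of known length imaged alongside the bunch."""
    cm_per_px = ruler_cm / ruler_px
    return mask.sum() * cm_per_px ** 2
```

The calibrated area (rather than a raw pixel count) can then be regressed against reference bunch mass to build the yield model.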

Diago et al. [13] used image classification (for segmentation) and a bunch metric to classify 'background noise' and 'grape' classes. The authors achieved a testing r<sup>2</sup> of 0.73. An alternative segmentation approach to image classification uses colour thresholding to differentiate grapes from the background. Dunn and Martin [8] presented an RGB colour thresholding approach that applied specific thresholds to the colour properties of an RGB image, generating a binary image—background and grapes. The authors achieved an r<sup>2</sup> of 0.72. A similar thresholding approach was adopted by Liu, Marden and Whitty [9] and Font et al. [14]. Liu, Marden and Whitty [9] introduced a bunch-level experiment under laboratory conditions with manual colour thresholding for image segmentation, which resulted in a yield estimation r<sup>2</sup> of 0.77. Additionally, these authors [9] presented a more complex automatic process for image segmentation on the same dataset presented by Dunn and Martin [8], resulting in an improved r<sup>2</sup> of 0.86. Automatic segmentation stemming from colour thresholding removes the human factor of manual thresholding, resulting in a more robust methodology [15,16]. However, these techniques are more sophisticated than manual thresholding, and therefore require adequate knowledge of computer vision techniques.

The second step, yield estimation, depends on the selected bunch metric. A simple pixel count of the segmented bunches is a favoured metric [8,13], with adaptations of the pixel count presented by Liu, Marden and Whitty [9] and Font et al. [14]. Liu, Marden and Whitty [9] tested five metrics: (i) volume, (ii) pixel count, (iii) perimeter, (iv) berry number and (v) berry size. Yield estimation using the pixel count produced superior results over the remaining metrics [9]. Unlike Liu, Marden and Whitty [9], Nuske et al. [11] avoided image segmentation and used a berry detection algorithm to determine a berry count. The use of the berry count as a yield estimation metric provided an r<sup>2</sup> of 0.74. Subsequent work on multiple multi-temporal datasets produced r<sup>2</sup> values between 0.60 and 0.73 [4].

A less common PRS approach utilises an RGB-D (RGB-Depth) camera to capture a 3-D (3-dimensional) model in either mesh or point-cloud format, representing the bunch or vine in a 3-D coordinate system [17]. The use of the Microsoft Kinect™ constitutes an ideal low-cost RGB-D sensor for in-situ imaging of vines [17]. The resulting 3-D models can be used to extract volumetric measurements for yield estimation. A limited number of studies have investigated the utility of the Kinect sensor for volume estimation. For example, Wang and Li [18] and Andújar et al. [19] employed the Kinect sensor under laboratory conditions for the volume estimation of sweet onions and cauliflowers, respectively. To date, the only use of an RGB-D Kinect sensor for yield estimation within viticulture was presented by Marinello et al. [10]. The authors assessed the sensor position for volume estimation of table grape bunches by testing two viewing angles, side-on and bottom-up. Multiple sensor-target distances were tested for the side-on viewing angle. Marinello et al. [10] concluded that a side-on viewing angle with a sensor-target distance of 0.8–1.0 m generated the best results.

An alternative PRS technique combines 2-D imagery with computer vision techniques, whereby 3-D models are created from the 2-D RGB images. Volume estimations are extracted from the subsequent 3-D models, enabling yield estimation calculations. Advanced computer vision techniques allow substantial automation in the process [20–22].

RGB and RGB-D technology and techniques incorporated into a suitable methodology present a viable solution for vineyard yield estimation. The established use of RGB imagery is evident, while the novel use of RGB-D imagery shows promise for future yield estimation. However, to date no study has investigated 2-D and 3-D PRS techniques side-by-side for vineyard yield estimation. Examining these two techniques in a commercial vineyard could provide insight into their capabilities and operational potential.

A key aspect for consideration when implementing these methodologies within a vineyard is the canopy coverage. The combination of essential canopy management practices and the vineyard's trellis system—particularly a vertical shoot positioned (VSP) system—directly influences canopy coverage, and inevitably the success of PRS techniques for yield estimation. High canopy coverage in the bunch zone results in bunch occlusion from the sensor. The incorporation of a canopy treatment was therefore proposed for this study.

This study aimed to investigate 2-D RGB and 3-D RGB-D PRS techniques for yield estimation in a VSP vineyard, using bunch area/volume estimation. The study was undertaken as three experiments, occurring under laboratory and field conditions. Field conditions were conducted at both bunch- and plant-level. Two canopy treatments were implemented from direct canopy manipulation and were defined as full canopy (FC) and leaf removal (LR). Hypothetically, the LR treatment will produce better estimation results. Furthermore, to achieve the aim of the study, two objectives were determined: to develop independent 2-D and 3-D yield estimation methodologies and to analyse and compare the success of the two PRS techniques for yield estimation.

### **2. Materials and Methods**

### *2.1. Study Site*

The study was carried out at the end of the 2016/17 growing season in a drip-irrigated Shiraz vineyard at the Welgevallen Experimental Farm located in Stellenbosch, South Africa (33°56′26″ S; 18°51′56″ E). The vineyard was planted in the year 2000, with a grapevine spacing of 2.7 × 1.5 m in a North-South orientation, and lies approximately 157 m above sea level. A seven-wire hedge VSP trellis system with three sets of moveable canopy wires is used in the vineyard. The Stellenbosch area falls within the coastal wine grape region of the Western Cape, which is characterised by a Mediterranean climate with long, dry summers [23]. Thirty-one individual vines were selected across three rows and sampled for this study (Figure 1).

### *2.2. Data Acquisition*

Data were acquired between 28 February and 3 March 2017 (harvest), with in-situ data collection implemented under the two canopy treatments. The purpose of the canopy treatments was to gain a direct line of sight to the bunches, which were generally hidden by the vine's canopy. No manipulation of the canopy, essentially the normal canopy condition, was classified as the FC treatment. The alternative LR treatment followed manual manipulation of the canopy, resulting in complete leaf removal in the bunch zone and effectively exposing the bunches.

Figure 2 illustrates the data-acquisition process that resulted in a total of ten datasets, as outlined by each step:

1. RGB and RGB-D imagery acquired for the FC treatment taken at bunch-level (n = 21; randomly selected and labelled bunches from the 31 vines) and plant-level (n = 31; individual vines). The resulting four datasets included: (i) RGB: bunch; (ii) RGB-D: bunch; (iii) RGB: vine; and (iv) RGB-D: vine.


**Figure 1.** Location of the Shiraz vineyard in Stellenbosch, South Africa. Inset map (red rectangle) shows the three rows used for data collection.

**Figure 2.** Data-acquisition protocol used in this study. Order of acquisition indicated by the grey arrow. {Key: FC = full canopy; LR = leaf removal; Lab = laboratory; Ref = reference measurements; Exp = experiment}.

RGB and RGB-D images were captured by two PRS sensors. A D3200 digital single-lens reflex camera (Nikon, Tokyo, Japan) was used for capturing 24.2-megapixel RGB images. The camera captured images in *auto* mode, with the flash disabled. The second sensor, a Kinect™ V1 (Microsoft, Redmond, WA, USA), was used to capture RGB-D imagery as either a mesh (bunch-level) or a point cloud (plant-level). The following data-acquisition subsections provide experiment-specific details.

### 2.2.1. Reference Measurements

Reference measurements were collected under laboratory conditions for the 21 individual bunches and the 31 individual vines. Individual bunch mass (g) was recorded with a Mentor scale (Ohaus, Parsippany, NJ, USA), and individual vine mass (g) was recorded with a Viper SW scale (Mettler Toledo, Columbus, OH, USA). Bunch/vine volume measurements were recorded as the displacement (mL) of water when bunches were submerged in a container of water [24]. The mass and volume measurements were used as reference measurements for the estimated measurements derived from the two PRS techniques.

### 2.2.2. Experiment One: Individual Bunches under Laboratory Conditions

The 21 individual bunches were imaged using the RGB camera and Kinect sensor under laboratory conditions; i.e., the laboratory was illuminated with white fluorescent lights and natural light entered through the windows. Each bunch was suspended from a tripod against a green background to maximise image contrast (Figure 3a):


**Figure 3.** Data acquisition under laboratory conditions. (**a**) Experimental setup for image capture; (**b**) RGB image of an individual bunch with a ruler for reference length; and (**c**) RGB-D (Kinect mesh) of an individual bunch.

### 2.2.3. Experiment Two: Individual Bunches in Field Conditions

Images of the same 21 individual bunches were captured in situ under both canopy treatments with the same RGB and RGB-D proximal sensors that were used in experiment one. Here, individual bunches were still attached to their respective vines:


**Figure 4.** Data acquisition of individual bunches in the field. (**a**) RGB image with full canopy (FC); (**b**) RGB image with leaf removal (LR); (**c**) RGB-D (Kinect mesh) with FC; and (**d**) RGB-D (Kinect mesh) with LR.

### 2.2.4. Experiment Three: Individual Vines in Field Conditions

The same RGB and RGB-D sensors were used to capture in-situ images at plant-level (31 individual vines imaged for both FC and LR treatments):


**Figure 5.** Experiment three data examples at plant-level. RGB imagery of FC (**a**) and LR (**b**) treatments. RGB-D (Kinect point cloud) of FC (**c**) and LR (**d**) treatments.

### *2.3. Data Analysis*

Data pre-processing and analysis occurred sequentially from experiment one (indicated via red arrows in Figure 2). The canopy treatments in experiments two and three had no effect on how the datasets were analysed for yield estimation. The proposed RGB and RGB-D methodologies were developed on the LR datasets before being applied directly to the FC datasets.

### 2.3.1. RGB Imagery

RGB images were processed using a custom script in MATLAB® [27], as follows:


Figure 6 illustrates the image segmentation process for experiment one at bunch-level (Figure 6a,b), and experiment three at plant-level (Figure 6c,d).

For experiment three, segmentation produced a segmented bunch area image (Figure 6d) for a single side of the vine, with a similar image for the reverse side of the vine. To obtain a single area per vine, the *Total Bunch Area of Vine (TBAV)* was calculated, as follows:

$$TBAV = (Ae + Aw)/2\tag{1}$$

where *Ae* was the area of the east-facing side of the vine, and *Aw* represented the vine's west-facing area. The *TBAV* was in cm<sup>2</sup>. Segmentation success was assessed on the testing subset (25% of the total dataset). A confusion matrix was computed for each experiment, using the predicted values from the segmented binary image versus the actual values of the original RGB image. F1-score and accuracy metrics calculated from the confusion matrix evaluated the segmentation accuracy.

**Figure 6.** (**a**) Represents the original RGB image, with (**b**) illustrating the segmented binary image at bunch-level. (**c**) An RGB image of an east-facing vine, with (**d**) the segmented binary image at plant-level.
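The segmentation assessment described above reduces to counting the four confusion-matrix cells between the predicted binary mask and the reference mask. A minimal Python sketch of this evaluation (assuming boolean NumPy masks; the study's own processing used a custom MATLAB script):

```python
import numpy as np

def segmentation_scores(pred, truth):
    """F1-score and accuracy for a predicted binary bunch mask versus a
    reference mask. Both inputs are boolean arrays of the same shape."""
    tp = np.sum(pred & truth)    # bunch pixels correctly detected
    fp = np.sum(pred & ~truth)   # background labelled as bunch
    fn = np.sum(~pred & truth)   # bunch pixels missed
    tn = np.sum(~pred & ~truth)  # background correctly rejected
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return f1, accuracy
```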

2.3.2. RGB-D (Kinect) Imagery

Due to the two image data types captured—mesh vs. point cloud—RGB-D imagery was processed differently for experiments one and two (mesh: bunch-level) and experiment three (point cloud: plant-level). Data processing for experiments one and two progressed as follows:


*Screened Poisson Surface Reconstruction* [30] allows the back of the mesh, which was 'open' (Figure 7a), to be reconstructed. It becomes 'watertight', as seen in Figure 7b.

**Figure 7.** (**a**) Example of mesh prior to reconstruction, and (**b**) the same mesh after Poisson reconstruction.

The nature of the point-cloud data requires different processing steps to those of the mesh data, mainly because the point cloud must be closed to produce a 'watertight' mesh for volume extraction.

The point cloud datasets from experiment three were processed as follows:


Figure 8 illustrates the raw point cloud (Figure 8a) captured by the Kinect sensor, with the segmented point cloud (Figure 8b) displaying the bunches.

**Figure 8.** (**a**) The Kinect point cloud for an LR-treated vine (1E—east side) and (**b**) the segmented point cloud of the same vine.

The custom script in R statistical software [31] incorporated the *alphashape3d* package v1.3 [32] to compute the 3-D shape for volume calculation purposes. The *alphashape3d* [32] package includes the α*-shape* algorithm [33] to recover the geometric structure of the 3-D point cloud for volume calculation. The α*-shape* algorithm [33] requires a specific alpha value for computation; hence, the alpha value directly influences the total volume calculated. To adjust an alpha value for our experimental conditions, various levels of alpha were tested on 25% of the dataset and linearly regressed against the reference values. Similar investigative experiments were conducted by Rueda-Ayala et al. [34] and Ribeiro et al. [35] to determine experiment-specific alpha values.
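The alpha tuning described above is a simple model-selection sweep: for each candidate alpha, compute volumes, regress them against the reference volumes, and keep the alpha pairing a high r<sup>2</sup> with the lowest RMSE. A Python sketch of that sweep is given below; `volume_fn` is a hypothetical stand-in for the *alphashape3d* volume computation (which the study ran in R), so only the selection logic is illustrated.

```python
import numpy as np

def fit_r2_rmse(estimated, reference):
    # Linear regression reference ~ a*estimated + b, as in the tuning step.
    a, b = np.polyfit(estimated, reference, 1)
    residuals = reference - (a * estimated + b)
    r2 = 1 - np.sum(residuals ** 2) / np.sum((reference - reference.mean()) ** 2)
    rmse = np.sqrt(np.mean(residuals ** 2))
    return r2, rmse

def select_alpha(alphas, volume_fn, reference):
    """Sweep candidate alpha values; volume_fn(alpha) is a hypothetical
    stand-in returning estimated volumes for the 25% tuning subset.
    Returns the alpha with the lowest RMSE plus all (r2, rmse) scores."""
    scores = {a: fit_r2_rmse(volume_fn(a), reference) for a in alphas}
    best = min(scores, key=lambda a: scores[a][1])
    return best, scores
```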

The process described above for experiment three was repeated for both sides of the vine, resulting in two volume measurements per vine. A single volume value per vine was obtained via the *Total Bunch Volume per Vine* (*TBVV*) calculation:

$$TBVV = (Ve + Vw)/2\tag{2}$$

where *Ve* was the volume of the vine's east-facing side, and *Vw* was the volume of the west-facing side. The resultant *TBVV* was in cm<sup>3</sup> per vine.

### 2.3.3. Cross-Validation

Five-fold cross-validation was used to develop the yield estimation model for each dataset. Cross-validation was implemented using the *Caret* package [36] in R statistical software [31], repeated ten times for model robustness. The model produced 'fitted values', which represented the estimated yield (in grams). Following this, the estimated yield (g) values were linearly regressed against the actual mass (g) to produce a final r<sup>2</sup> value (coefficient of determination), indicating the potential for yield estimation. The Root Mean Square Error (RMSE) was computed from the linear regression, indicating the yield estimation error (in grams).
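The procedure can be sketched in Python as a repeated five-fold loop that averages out-of-fold predictions before the final regression. This is a sketch under assumptions: the study used the *caret* package in R, and the simple linear model form `mass ~ a*metric + b` is illustrative, not the confirmed model specification.

```python
import numpy as np

def repeated_kfold_fit(metric, mass, k=5, repeats=10, seed=0):
    """Five-fold cross-validation repeated ten times for a simple linear
    yield model mass ~ a*metric + b (model form assumed for illustration).
    Returns averaged out-of-fold 'fitted values' (g), r2 and RMSE (g)."""
    rng = np.random.default_rng(seed)
    n = len(metric)
    preds = np.zeros(n)
    counts = np.zeros(n)
    for _ in range(repeats):
        idx = rng.permutation(n)
        for fold in np.array_split(idx, k):
            train = np.setdiff1d(idx, fold)          # all samples outside the fold
            a, b = np.polyfit(metric[train], mass[train], 1)
            preds[fold] += a * metric[fold] + b      # out-of-fold prediction
            counts[fold] += 1
    fitted = preds / counts                          # averaged over repeats
    r2 = 1 - np.sum((mass - fitted) ** 2) / np.sum((mass - mass.mean()) ** 2)
    rmse = np.sqrt(np.mean((mass - fitted) ** 2))
    return fitted, r2, rmse
```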

### **3. Results**

### *3.1. Reference Measurements*

Results of the reference measurements indicated a strong relationship between mass and volume at bunch-level (r<sup>2</sup> = 0.971) and plant-level (r<sup>2</sup> = 0.996). The established relationships between mass and volume served as the basis for the subsequent experiments, which were used to evaluate the 2-D and 3-D techniques.

### *3.2. Pre-Processing*

The complexity of the RGB-D (Kinect) datasets required two additional pre-processing steps. The first step was the determination of an alpha value required for volume calculation, using the *alphashape3d* package [32]. The second step was volume correction for all Kinect datasets, due to volume estimation errors in the datasets. The segmentation accuracy for the RGB pre-processing has also been included in this section.

### 3.2.1. RGB Segmentation Accuracy

Experiment one's segmentation results yielded an F1-score of 0.976, with an accuracy of 0.971. Experiment two resulted in a lower F1-score of 0.842, with an accuracy of 0.781. Lastly, experiment three's F1-score and accuracy were 0.833 and 0.932, respectively. Variable solar illumination likely contributed to the lower scores of the in-situ measurements.

### 3.2.2. alphashape3d's Adjusted Alpha Value

Figure 9 presents the r<sup>2</sup> and RMSE curves for the various alpha values tested. Values tested ranged from 0.001 to 0.050, in increments of 0.001. It is evident from this figure that the selected alpha value (alpha = 0.010) combines a high coefficient of determination (r<sup>2</sup> = 0.605) with the lowest RMSE (703.301 cm<sup>3</sup>). Figure 10 provides a visual interpretation of the results from Figure 9. A low alpha value, such as 0.005 (Figure 10b), produced an r<sup>2</sup> of 0.520 and an RMSE of 2052.751 cm<sup>3</sup> (sample dataset mean volume = 1961.875 cm<sup>3</sup>). Conversely, a high alpha value of 0.050 (Figure 10d) produced an r<sup>2</sup> of 0.506 and an RMSE of 13,936.31 cm<sup>3</sup>. The selected alpha value of 0.010 (Figure 10c) was subsequently used for all further analyses.

**Figure 9.** The r<sup>2</sup> and RMSE values for each alpha value tested for the *alphashape3d* package in the custom R script. Alpha values were incremented in steps of 0.001, ranging from 0.001 to 0.050.

**Figure 10.** Point-cloud reconstruction testing the alpha value for the *alphashape3d* package. The original point cloud before reconstruction, represented as white points for visual purposes (**a**), and after reconstruction (**b**–**d**).

### 3.2.3. Kinect Volume Correction

Figure 11 shows a volume estimation error present in experiment one's data, which aligns with a subsequent review of the literature [10,18,19]. The 21 bunches have a mean actual volume (the reference volume) of 144.952 cm<sup>3</sup> and a mean estimated Kinect volume of 175.672 cm<sup>3</sup>. Overestimation by the Kinect V1 sensor was evident from the results. Volume correction via cross-validation was therefore subsequently incorporated into the methodology, whereby a mean corrected volume of 144.952 cm<sup>3</sup> was achieved. Thereafter, the correction was applied to the remaining Kinect datasets.

**Figure 11.** Experiment one's results, for the 21 individual bunches, illustrating the volume estimation error by the Kinect sensor.
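A correction of this kind can be sketched as a least-squares linear recalibration fitted on the reference set. This is an illustrative assumption about the correction's form (the paper only states that correction was done via cross-validation); note that ordinary least squares with an intercept reproduces the calibration set's mean actual volume exactly, consistent with the reported mean corrected volume of 144.952 cm<sup>3</sup>.

```python
import numpy as np

def fit_volume_correction(estimated, actual):
    """Fit a linear correction actual ~ a*estimated + b on a calibration
    set of Kinect volume estimates, and return a callable that applies it.
    The linear form is an illustrative assumption, not the paper's stated model."""
    a, b = np.polyfit(estimated, actual, 1)
    return lambda v: a * np.asarray(v) + b
```

Once fitted on experiment one's 21 bunches, the returned callable could be applied to the remaining Kinect datasets.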

### *3.3. RGB Results*

Figure 12 shows the results for the three experiments that used 2-D RGB digital imagery. The best results were obtained in experiment one (Figure 12a), which produced an r<sup>2</sup> of 0.889 and an RMSE of 17.978 g. The level of accuracy achieved in experiment one can be attributed to the controlled laboratory conditions and supports the proposed methodology for yield estimation. Applying this methodology to in-situ bunches produced less accurate results, as seen in experiment two. Experiment two's FC treatment (Figure 12b) produced the lowest yield estimation results for 2-D RGB imagery, with an r<sup>2</sup> of 0.625 and an RMSE of 27.738 g. The LR treatment's (Figure 12c) r<sup>2</sup> and RMSE values were 0.742 and 25.066 g, respectively. The lesser FC values were directly attributed to the canopy coverage present, as the bunches were partially occluded from the sensor's view. The effect of the canopy treatment was evident when comparing the results. At plant-level (experiment three), the same pattern was present between the canopy treatments. The FC (Figure 12d) treatment of experiment three produced an r<sup>2</sup> of 0.779, while the LR (Figure 12e) treatment produced an even higher r<sup>2</sup> of 0.877. The respective RMSE values were 559.357 g and 443.235 g. The success of yield estimation in experiment three, specifically the LR treatment, supports the methodology's capability for 2-D RGB yield estimation. The in-situ yield estimation was preferable at plant-level, which may be attributed to the lighting conditions and the success of the colour thresholding for bunch segmentation at plant-level.

### *3.4. RGB-D Results*

Figure 13 illustrates the RGB-D results obtained for the three experiments. The unrivalled results obtained in experiment one (Figure 13a) produced an r<sup>2</sup> of 0.950 and an RMSE of 12.458 g—the best-performing results presented in this study. The Kinect sensor favoured the controlled conditions of the laboratory, specifically the artificial illumination as a source of light. Applying the same methodology to experiment two (in-situ bunches) resulted in a lower yield estimation performance for both canopy treatments. The FC treatment (Figure 13b) produced an abnormally low r<sup>2</sup> of 0.020, with an RMSE of 8.081 g. A statistical outlier in the data was evident when the results were analysed. With the removal of this outlier, the modified FC results (n = 20) improved drastically, with a new r<sup>2</sup> of 0.609 and an RMSE of 26.790 g (Figure 13c). In contrast, the LR treatment (Figure 13d) resulted in an r<sup>2</sup> of 0.756 and an RMSE of 24.601 g, which aligned with the LR results for bunch-level obtained with RGB imagery (Figure 12c). At plant-level, the unfavourable results of the FC treatment (Figure 13e) generated an r<sup>2</sup> of 0.487 and an RMSE of 673.535 g. However, the LR treatment (Figure 13f) provided some promise for the Kinect sensor at plant-level, achieving an r<sup>2</sup> of 0.594 and an RMSE of 661.739 g. The same effect of the canopy treatment was evident in the RGB-D results as in the RGB results, with the LR treatment producing a better yield estimation agreement. The results of experiment three indicated a limitation within the proposed methodology for RGB-D yield estimation at plant-level. Such a limitation could stem from the data-acquisition process or from the image segmentation within the data analysis.

The poor results obtained in experiment two's FC dataset can be attributed to the mesh reconstruction step in the data-analysis process. Exaggerated reconstruction of bunches presents a potential limitation of the *Screened Poisson Surface Reconstruction* algorithm [30], as depicted in Figure 14. The statistically-outlying bunch circled in red (Figure 13b) produced a Kinect volume of 856 cm<sup>3</sup>, as illustrated in Figure 14a, when the actual volume was only 35 cm<sup>3</sup>. An example of an accurately reconstructed mesh (Figure 14b) was included for visual comparison. This anomaly was unavoidable during data processing.

**Figure 12.** RGB results presented for the three experiments; experiment one (**a**), experiment two FC (**b**) & LR (**c**) and experiment three FC (**d**) & LR (**e**). *{Key: Exp.* = *experiment; FC* = *full canopy; LR* = *leaf removal}.*

**Figure 13.** Presented RGB-D results of the three experiments; experiment one (**a**), experiment two FC (n = 21) (**b**), experiment two FC\* (n = 20) (**c**), experiment two LR (**d**) and experiment three FC (**e**) & LR (**f**). {Key: Exp. = experiment; FC = full canopy; LR = leaf removal; \*statistical outlier removed, resulting in 20 bunches}.

**Figure 14.** Illustration of *Screened Poisson Surface Reconstruction*. The reconstructed bunch (circled in red—Figure 13b) with the incorrect volume (**a**), and an example of a reconstructed bunch of the correct volume (**b**).

### **4. Discussion**

To date, most studies have employed 2-D RGB imagery for vineyard yield estimation at both bunch- and plant-level. However, only Marinello et al. [10] have investigated the use of 3-D RGB-D (specifically making use of a Kinect sensor) imaging for vineyard yield estimation. The research we have presented investigated 2-D and 3-D PRS techniques for vineyard yield estimation. The study was undertaken as three experiments, consisting of bunch-level and plant-level datasets, with in-situ measurements captured for the two canopy treatments (FC and LR). The following subsections discuss the results of the two PRS techniques in further detail.

### *4.1. Using 2-D RGB Imagery for Yield Estimation*

The presented results using RGB imagery for yield estimation are robust and support the 2-D PRS technique. Experiment one (r<sup>2</sup> = 0.889; RMSE = 17.978 g) illustrated the success of RGB imagery in a controlled environment for yield estimation at bunch-level. At plant-level, similar results were obtained for the LR treatment in experiment three (r<sup>2</sup> = 0.877; RMSE = 443.235 g). The success of the methodology under both laboratory and field conditions supports the use of colour thresholding for image segmentation, and of the adapted pixel-area metric for yield estimation.

Colour thresholding for image segmentation was favoured by several studies [8,9,14]. In our study, the use of the HSV colour space for thresholding proved fruitful. The HSV colour space is supported by Font et al. [14], who achieved favourable results (estimation error of 13.55%) when working with the H layer for segmentation purposes. At bunch-level, experiment one's result (r<sup>2</sup> = 0.889) improves on the result (r<sup>2</sup> = 0.77) presented by Liu, Marden and Whitty [9], which was similarly obtained under laboratory conditions. This is a noteworthy improvement, especially considering the manual nature of the colour thresholding. At plant-level, the manually produced results of experiment three (FC: r<sup>2</sup> = 0.779; LR: r<sup>2</sup> = 0.877) align with the automated classification results (r<sup>2</sup> = 0.865) of Liu, Marden and Whitty [9]. Note that Liu, Marden and Whitty [9] used the same 1 × 1 m image dataset as Dunn and Martin [8], as opposed to the plant-level imagery used in our study. Additionally, the colour-thresholding approach of our study outperformed the image-classification approach to segmentation (test r<sup>2</sup> = 0.73) [13].
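The HSV thresholding step described above can be sketched as follows. This is a minimal illustration, not the study's implementation: the hue range, saturation/value cut-offs, and the synthetic image are assumed values (the study's thresholds were set manually per image).

```python
import numpy as np

def hsv_threshold(hsv_img, h_range, s_min, v_min):
    """Binary mask of pixels whose hue (degrees) lies in h_range
    and whose saturation and value exceed the given minimums."""
    h, s, v = hsv_img[..., 0], hsv_img[..., 1], hsv_img[..., 2]
    lo, hi = h_range
    return (h >= lo) & (h <= hi) & (s >= s_min) & (v >= v_min)

# Synthetic 2x2 HSV image: two dark-berry pixels (hue ~270),
# one leaf pixel (hue ~120) and one dim background pixel.
img = np.array([[[270.0, 0.8, 0.6], [120.0, 0.7, 0.7]],
                [[265.0, 0.9, 0.5], [0.0, 0.0, 0.1]]])
mask = hsv_threshold(img, (250, 290), s_min=0.3, v_min=0.2)
print(mask.sum())  # 2 berry pixels segmented
```

The mask's pixel count then feeds directly into the pixel-area metric discussed below.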

We presented an adaptation of the pixel-count metric for yield estimation, which expands on the current literature [8,9,13,14]. Of the five metrics tested by Liu, Marden and Whitty [9], the pixel count produced the best results. We found that incorporating a calibration length (the ruler) for the pixel count yielded an improved quantitative pixel area (cm<sup>2</sup>) for yield estimation. Again, our bunch-level result (r<sup>2</sup> = 0.889) improves on the bunch-level result (r<sup>2</sup> = 0.77) obtained under laboratory conditions by Liu, Marden and Whitty [9], who specifically tested the various yield-estimation metrics. At plant-level, our LR treatment (r<sup>2</sup> = 0.877) outperformed all results in the current literature, with our FC treatment (r<sup>2</sup> = 0.779) representing a slight improvement for in-situ measurements. The presented pixel-area (cm<sup>2</sup>) metric also improved on the berry-count results [4,11], with the highest reported berry-count r<sup>2</sup> (0.74) still lower than our plant-level pixel-area r<sup>2</sup> (0.779) for the FC treatment. However, the berry count is applicable across all cultivars, as it does not depend on the colour of the berry [11].

The limitation of slight distance variations between the camera and the bunches within each image was resolved by incorporating the reference length (the ruler). The benefit of determining the calibration length for each image outweighs the added processing requirement of this step; overall, it allows improved yield estimation, as reflected in our results. Future work could attempt to automate this process. A more restrictive limitation, however, was the human involvement in determining the appropriate threshold values; this could explain the lowered in-situ estimation performance at bunch-level. Future work could investigate a more automated methodology, thus alleviating this limitation.
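The ruler-based calibration can be sketched as follows; the ruler length, its pixel span, and the bunch pixel count are hypothetical numbers chosen for illustration.

```python
import numpy as np

def pixel_area_cm2(bunch_pixels, ruler_pixels, ruler_length_cm):
    """Convert a segmented pixel count to an area in cm^2 using a
    ruler of known length visible in the same image. The linear
    scale (cm per pixel) is squared to obtain the area per pixel."""
    cm_per_pixel = ruler_length_cm / ruler_pixels
    return bunch_pixels * cm_per_pixel ** 2

# Hypothetical example: a 30 cm ruler spans 600 pixels, giving
# 0.05 cm/px, so each pixel covers 0.0025 cm^2.
area = pixel_area_cm2(bunch_pixels=40_000, ruler_pixels=600,
                      ruler_length_cm=30.0)
print(round(area, 1))  # 100.0 cm^2
```

Because the scale is recomputed per image, small camera-to-bunch distance differences between images are absorbed by the calibration rather than biasing the area metric.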

### *4.2. Using 3-D RGB-D Imagery for Yield Estimation*

To date, the work presented by Marinello et al. [10] is the only literature supporting the use of the Kinect V1 sensor for vineyard yield estimation. Marinello et al. [10] used table-grape bunches to determine the optimal viewing angle and distance for the sensor, concluding that a side-on view of the bunch, at a distance of 0.8–1.0 m, produced the least variability in mass estimation. The findings of our study, which incorporated RGB-D imagery for yield estimation across the three experiments, are therefore the most comprehensive to date. The presented study exemplifies the potential of 3-D PRS techniques for yield estimation, specifically with a cost-effective sensor such as the Kinect V1.

The nature of our study's datasets required separate data analysis for the bunch- and plant-level datasets. Excellent results were obtained in experiment one (r<sup>2</sup> = 0.950; RMSE = 12.458 g). It was evident that the Kinect sensor favours ideal laboratory conditions, allowing accurate yield estimation at bunch-level. The ramifications of less favourable conditions, such as in-situ monitoring, became apparent in experiments two (bunch-level) and three (plant-level).

Although no bunch-level studies have used a Kinect sensor for vineyard yield estimation, a similar approach under laboratory conditions, estimating the volume of cauliflowers, was presented by Andujar et al. [19]. The authors achieved an r<sup>2</sup> of 0.868 when regressing the estimated volume against the known fruit mass. Our methodology, in contrast, models the relationship between fruit volume and mass, allowing the adjusted mass (calculated from the model) to be used for subsequent yield estimation. A fundamental difference between our methodology and that of Andujar et al. [19] lies in how the 3-D model was captured: our methodology captured a 3-D mesh of the fruit, which was reconstructed into a 'watertight' mesh, while Andujar et al. [19] captured a 3-D point cloud of the vegetable. Our improved results at this level (r<sup>2</sup> = 0.950) can be attributed to this fundamental difference. Interestingly, both methodologies used the same 3-D reconstruction method, *Screened Poisson Surface Reconstruction* [30], as found in MeshLab [29].
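The volume-to-mass modelling step can be sketched as a simple least-squares linear regression. The volumes and masses below are hypothetical values for illustration, not data from the study.

```python
import numpy as np

# Hypothetical mesh volumes (cm^3) and field-measured masses (g).
volume = np.array([120.0, 150.0, 180.0, 210.0, 240.0])
mass   = np.array([115.0, 148.0, 175.0, 214.0, 238.0])

# Fit the linear model: mass = a * volume + b.
a, b = np.polyfit(volume, mass, deg=1)

# "Adjusted mass" predicted by the model for each reconstructed bunch,
# used here for subsequent yield estimation.
adjusted = a * volume + b

# Coefficient of determination (r^2) of the fit.
ss_res = np.sum((mass - adjusted) ** 2)
ss_tot = np.sum((mass - mass.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
print(round(r2, 3))  # 0.996
```

Summing the adjusted masses of all bunches on a vine would then give the plant-level yield estimate.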

The primary limitation of the proposed methodology at bunch-level was a combination of dataset quality and the *Screened Poisson Surface Reconstruction* method [30]. The consequence of a poor-quality mesh became apparent when bunch reconstruction resulted in significant defects, as seen in Figure 14a. Such imperfections directly affect the potential for accurate yield estimation, as exemplified by the results of experiment two's FC dataset (r<sup>2</sup> = 0.020; Figure 13b). Further research is necessary to better understand this shortfall in the methodology.
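In this study the defective reconstruction was removed as a statistical outlier; one way such implausible volumes could be flagged automatically is a median-absolute-deviation (MAD) test, sketched below. The volumes are hypothetical (the 856 cm<sup>3</sup> value mirrors the defect of Figure 14a), and the 3.5 cut-off is a commonly used modified z-score threshold, not a value from the study.

```python
import numpy as np

def volume_outliers(volumes, z_thresh=3.5):
    """Flag implausible mesh volumes using a modified z-score based
    on the median absolute deviation, which stays robust even when
    a single reconstruction defect inflates one volume massively."""
    v = np.asarray(volumes, dtype=float)
    med = np.median(v)
    mad = np.median(np.abs(v - med))
    modified_z = 0.6745 * (v - med) / mad
    return np.abs(modified_z) > z_thresh

# Hypothetical bunch volumes (cm^3); 856 mimics the defective mesh.
vols = [32.0, 35.0, 38.0, 33.0, 36.0, 856.0, 34.0]
flags = volume_outliers(vols)
print(flags)
```

Flagged bunches could then be excluded, or queued for re-reconstruction, before the volume-to-mass regression is fitted.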

This study presented the novel use of a Kinect RGB-D sensor for in-situ vineyard yield estimation at plant-level (experiment three). The plant-level methodology produced promising results for the LR treatment (r<sup>2</sup> = 0.594; RMSE = 661.739 g), while the effect of canopy coverage was evident in the FC treatment (r<sup>2</sup> = 0.487; RMSE = 673.535 g). Future work should improve on the presented methodology, potentially overcoming several limitations. Additionally, the nature of the Kinect V1 sensor imposes the fundamental constraint of indoor use only, as solar illumination produces excessive interference in the captured imagery [19]. Future work should make use of the Kinect V2 sensor, since several improvements to the sensor allow improved outdoor imagery to be captured [37]. RGB-D sensors, specifically the Kinect V2, are already being incorporated into terrestrial vehicles as cheap sensor alternatives for vineyard modelling and yield estimation, as demonstrated in [38].

### *4.3. The Operational Potential of Developed Methodologies*

Both the presented 2-D RGB and 3-D RGB-D methodologies achieved acceptable accuracies across the three experiments. Our results (Figures 12 and 13) support the use of PRS technology and techniques for vineyard yield estimation, especially for VSP-trained Shiraz vineyards. The presented work was conceptualised to assess 2-D and 3-D PRS sensors side by side, a novelty in the vineyard yield-estimation domain.

Experiment one illustrated the capability of both techniques for successful yield estimation of individual bunches, where the Kinect RGB-D sensor (r<sup>2</sup> = 0.950) outperformed the digital RGB sensor (r<sup>2</sup> = 0.889). The suitability of the lighting under laboratory conditions, coupled with the Kinect's ability to capture a 3-D model of the bunch, contributed to the success of the Kinect sensor over the RGB sensor. Robust methodologies were thus established in a controlled environment.

Experiment two tested the established methodologies in situ, under both the FC and LR canopy treatments. The results of experiments two and three confirmed the hypothesis of a superior yield estimation agreement under the LR canopy treatment. If good canopy-management practices are established early in the season, better yield estimation results than those obtained under the FC treatment could be achieved. Both sensors produced similar results for the FC [approximate r<sup>2</sup> = 0.61; using the Kinect's modified results (n = 20)] and LR (approximate r<sup>2</sup> = 0.75) treatments in experiment two.

The success of the two PRS methodologies can be differentiated at plant-level by the results of experiment three. Unlike in experiment one, the RGB methodology outperformed the RGB-D methodology for yield estimation. Under the FC treatment, the RGB results (r<sup>2</sup> = 0.779) significantly outclassed the RGB-D results (r<sup>2</sup> = 0.487), and a comparable margin between the RGB (r<sup>2</sup> = 0.877) and RGB-D (r<sup>2</sup> = 0.594) sensors occurred under the LR treatment. Several factors may explain the differing results. The lighting conditions differed: the RGB imagery was collected at midday, while the RGB-D imagery was collected immediately before sunset. Additionally, the continuous movement of the Kinect sensor (the RGB imagery was captured from a stationary position) could have contributed to the lowered RGB-D results. Further research is encouraged, using a standardised experimental setup where feasible, to create more favourable conditions under which both sensors can operate.

For the RGB and RGB-D datasets, the LR treatment yielded higher estimation agreements compared with the FC treatments. The LR treatment was adopted to create an ideal in-situ environment for yield estimation. The commercial feasibility of complete leaf removal in the bunch zone is not practical in some viticulture regions, since there are adverse effects associated with this practice (such as 'sunburn'). However, this practice can be implemented in specific zones of the vineyards, enabling an approximation of the total yield using a monitoring strategy. The FC results provide a better indication of the commercial readiness of the developed methodologies. Future work should determine an optimal canopy coverage that favours both sensor operationality and commercial farming methods.

Overall, the use of the Kinect sensor as a cost-effective RGB-D sensor for vineyard yield estimation, specifically at bunch-level, is supported by the results of experiment one. However, the robustness of the RGB methodology is evident across all three experiments, with substantial plant-level results obtained in situ. A better understanding of the current limitations would allow further improvements to the methodologies. To this end, the results obtained in this study support the potential for operationalisation of both PRS sensors. A refined methodology could see commercially favourable data-acquisition methods implemented on a larger dataset, with fully automated data processing (see, for example, [15,16]). Commercialisation of such methodologies could then become more feasible, given their simplicity and robustness, coupled with improved yield-estimation accuracy.

### **5. Conclusions**

A novel side-by-side investigation of 2-D and 3-D PRS techniques for vineyard yield estimation has been presented. This study assessed RGB imagery captured by a digital camera and RGB-D imagery captured by a Kinect V1 sensor across three experiments, with in-situ measurements obtained under two canopy treatments. Our results show that the Kinect RGB-D sensor produced the highest yield estimation agreement under laboratory conditions (bunch-level). At bunch-level, the RGB and RGB-D PRS techniques performed equally under both canopy treatments for in-situ yield estimation. At plant-level, the best in-situ results were obtained using the RGB imagery, which significantly outperformed the RGB-D results. The results for both sensors support the use of PRS technology and techniques for vineyard yield estimation, with improved accuracies presented. The results of this study confirm the operational potential of 2-D RGB imagery for accurate yield estimation, with the recommendation that future work investigate a more automated RGB methodology suitable for operational environments. Regarding the presented RGB-D methodology, the Kinect demonstrates the potential of 3-D RGB-D imagery for vineyard yield estimation. Future work should investigate the use of the Kinect V2 sensor, coupled with suitable lighting conditions, for in-situ yield estimation.

**Author Contributions:** C.H.—data analysis, writing and editing of the manuscript; N.P.—content supervision, and editing of the manuscript; N.M.—data collection and analysis; C.P.-E.—conceptualisation, experimental study design, content supervision and editing of the manuscript.

**Funding:** This research was funded by Winetech, project DVO 07: "Near-real-time characterization of vines for more efficient vineyard management".

**Acknowledgments:** The authors would like to thank Albert Strever, Berno Greyling, Aloïs Houeto and Alessandro Bellotto for their technical and research support.

**Conflicts of Interest:** The authors declare no conflicts of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
