### **About the Editors**

**Sigfredo Fuentes** is a professor in Digital Agriculture, Food and Wine Sciences at the University of Melbourne. Previously, he worked at the University of Adelaide, the University of Technology Sydney, Western Sydney University (Ph.D.), and in Chile. His scientific interests include climate change impacts on agriculture; the development of new computational tools for sensory, animal, plant physiology, food, and wine sciences; new and emerging sensor technology; proximal-, short- and long-range remote sensing using robots and UAVs; and machine learning and artificial intelligence. For more information visit: www.vineyardofthefuture.com.

**Carlos Poblete-Echeverria** is a senior lecturer of Advanced Viticulture and coordinator of the Digital Viticulture research group at Stellenbosch University, South Africa. Carlos worked for 10 years as a lecturer and researcher in Chile, at the University of Talca and the Catholic University of Valparaiso, in the areas of water management and the use of new technologies. Currently, he is the research leader of the Winetech project "Integrated vineyard monitoring system to improve water management". His research areas include the estimation of water consumption using models and micrometeorological techniques, climate change, machine vision, and the detection of biotic and abiotic stresses using thermography, spectroscopy, remote sensing, UAVs, and robots. Carlos has published more than 80 scientific and popular articles. For more information, please visit: https://www.researchgate.net/profile/Carlos_Poblete-Echeverria.

### *Editorial* **Special Issue "Emerging Sensor Technology in Agriculture"**

### **Carlos Poblete-Echeverría <sup>1</sup> and Sigfredo Fuentes <sup>2,</sup>\***


Received: 3 July 2020; Accepted: 6 July 2020; Published: 9 July 2020

Research and innovation in sensor technology can accelerate the adoption of new and emerging digital tools in the agricultural sector through the implementation of precision farming practices such as remote sensing, operations, and real-time monitoring. The agricultural industry has been greatly affected by climate change; therefore, to overcome these effects and remain competitive and sustainable, there is a need to support research and application development of new and emerging sensor technologies in agriculture. A total of 13 papers were published in this Special Issue, entitled "Emerging Sensor Technology in Agriculture". The topics addressed include different emerging technologies with applications to ecosystems (grasslands) [1] and several agricultural crops such as peppers [2], apples [2,3], grapevines [2,4–7], cocoa trees [6], citrus [8], legumes [9], wheat [10], and rice [11]. Further papers addressed the use of remote sensing and machine learning to assess forage quality [9], detect regions of interest on pigs [12], and measure pesticide droplet deposition [13].

In Rueda-Ayala et al. [1], an aerial (unmanned aerial vehicle, UAV) and an on-ground (Kinect RGB-D depth camera) method were used to characterize grass ley fields (plant height, biomass, and volume) composed of different species mixtures, using digital grass models. Both methods performed well, and the authors also compared them on basic economic and practical grounds. Hacking et al. [4] used a similar approach to determine yield in grapevines. Another UAV-based study investigated the effect of eddies formed at low flight altitude on the effective estimation of water status and other physiological parameters in rice [11]. Yield estimation is a key topic in agriculture in general, and it is particularly relevant in viticulture, since winegrowers need such information to manage several logistic aspects at the cellars. In Hacking et al. [4], 2D (RGB images) and 3D (RGB-D) approaches were tested and compared, providing promising results and perspectives for the potential application of these technologies at the vineyard scale (in situ yield estimation). Another interesting use in viticulture was presented in the research of Palacios et al. [5], who combined computer vision (RGB images) and machine learning to assess cluster compactness (the degree of aggregation of the berries) under field conditions, with the system mounted on an all-terrain vehicle. In this study, the bunches were detected and classified, and cluster compactness was determined using a Gaussian process regression model. The authors highlighted the potential applicability of this method to determine the spatial variability of cluster compactness in commercial vineyards. As stated by Palacios et al. [5] and Hacking et al. [4], fruit detection is the first mandatory step before other calculations can be performed. In this regard, Zemmour et al. [2] presented an automatic parameter-tuning procedure for fruit detection. They developed a tuning process to determine the best-fit values of the algorithm parameters, enabling easy adaptation to different kinds of fruits (shapes, colors) and environments (illumination conditions). In this study, the algorithm was tested under challenging conditions in three crops: red apples, green grapes, and sweet yellow peppers. The algorithm successfully detected apples and peppers in variable lighting conditions; however, for green grapes, the authors indicated the need to incorporate additional features, such as morphological parameters, to improve the detection process. Estimates of the amount of fruit are important for yield prediction, but also for deciding the right moment to harvest [4]. The study presented by Valente et al. [3] explored the use of a small electrochemical sensor mounted on a UAV for sensing ethylene concentration in an apple orchard. This was the first study to investigate the feasibility of ethylene-sensitive sensors in a fruit orchard. However, the results are not conclusive for harvest decisions (fruit maturity), and the study opens a new research area in this field.

Like RGB and RGB-D information, temperature is another variable that can be measured remotely to detect plant conditions such as water status and stress (biotic and abiotic). New infrared sensors/cameras and computational analyses have enabled faster and more accurate characterization of canopy temperatures. Romero-Bravo et al. [10] presented an application of thermography for estimating grain yield and carbon isotope discrimination in wheat genotypes growing under water stress and full irrigation conditions. The results of this study show that the water regime influences the thermal approach, which performed better under water stress conditions. The authors highlighted that more complex models are needed to estimate grain yield and carbon isotope discrimination, since environmental conditions strongly influence the temperature profile of the plants.

Bushfires are one of the climatic anomalies that have increased in number, severity, and window of occurrence within agricultural seasons. For grapevines, they present a critical problem due to smoke contamination and smoke taint. Fuentes et al. [7] proposed the first artificial intelligence approach to model smoke contamination in canopies and smoke taint in grapes and wines using non-invasive infrared thermal imagery (IRTI) and near-infrared spectroscopy (NIR), producing highly accurate machine learning models. From the same research group, further applications of remote sensing and machine learning modelling produced one of the first specific models to assess the aroma profiles of cocoa beans for chocolate manufacturing based on canopy architecture profiles at harvest [6]. These two technological developments can assist growers in combating environmental hazards and predicting quality traits of final products.

Mechanization and agricultural management practices can require significant labor and investment that do not necessarily secure efficiency. Mechanical harvesting is a hot topic in agriculture that requires technology to monitor different aspects of the process to increase productivity. A vibration monitoring system for citrus harvesting was proposed and tested to improve fruit detachment frequency, with promising results [8]. Other management practices, such as pesticide application, require accurate monitoring methods to assess the distribution of droplets within crop canopies, minimizing detrimental effects on the environment and maximizing application efficiency. A rapid method to detect spray deposits based on capacitance sensors was developed [13].

Agriculture involves not only crop production but also animal farming. Digital technologies have been applied in recent years to monitor the quality of animal feed and to detect animals in remote sensing imagery, providing information on physiological stresses and the general welfare of the animals. One paper researched the use of NIR to predict the forage quality of warm-season legumes using machine learning modelling with high accuracy [9]. For animals, different regions of interest on pig bodies were successfully detected using convolutional deep learning networks, which may allow more efficient extraction of information to identify biotic or abiotic stress-related problems [12].

The diversity of applications within this Special Issue demonstrates the importance of novel research on new and emerging technologies for the agricultural industry.

**Acknowledgments:** The guest editors would like to extend their gratitude to all the authors who contributed to this Special Issue and to the reviewers who dedicated their valuable time providing the authors with critical and constructive recommendations.

**Conflicts of Interest:** The guest editors declare no conflict of interest.

### **References**


© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Comparing UAV-Based Technologies and RGB-D Reconstruction Methods for Plant Height and Biomass Monitoring on Grass Ley**

**Victor P. Rueda-Ayala <sup>1,†</sup>, José M. Peña <sup>2</sup>, Mats Höglind <sup>1</sup>, José M. Bengochea-Guevara <sup>3</sup> and Dionisio Andújar <sup>3,</sup>\*<sup>,†</sup>**


Received: 10 December 2018; Accepted: 23 January 2019; Published: 28 January 2019

**Abstract:** Pastures are botanically diverse and difficult to characterize. Digital modeling of pasture biomass and quality by non-destructive methods can provide highly valuable support for decision-making. This study aimed to evaluate aerial and on-ground methods to characterize grass ley fields, estimating plant height, biomass and volume using digital grass models. Two fields were sampled, one timothy-dominant and the other ryegrass-dominant. Both sensing systems allowed estimation of biomass, volume and plant height, which were compared with ground truth, also taking basic economic aspects into consideration. To obtain ground-truth data for validation, 10 plots of 1 m<sup>2</sup> were manually and destructively sampled in each field. The studied systems differed in data resolution and thus in estimation capability. There was reasonably good agreement between the UAV-based and RGB-D-based estimates and the manual height measurements in both fields. RGB-D-based estimation correlated well with ground truth of plant height (*R*<sup>2</sup> > 0.80) for both fields, and with dry biomass (*R*<sup>2</sup> = 0.88) only for the timothy field. RGB-D-based estimation of plant volume for ryegrass showed high agreement (*R*<sup>2</sup> = 0.87). The UAV-based system showed a weaker estimation capability for plant height and dry biomass (*R*<sup>2</sup> < 0.6). UAV systems are more affordable, easier to operate and can cover a larger surface. On-ground techniques with RGB-D cameras can produce highly detailed models, but with more variable results than UAV-based models. On-ground RGB-D data can be effectively analysed with open-source software, which is a cost advantage compared with aerial image analysis. Since agricultural operations do not require fine identification of the end-details of the grass plants, aerial platforms could be the better option in grasslands.

**Keywords:** 3D crop modeling; remote sensing; on-ground sensing; depth images; parameter acquisition

### **1. Introduction**

Pastures are botanically diverse and difficult to characterize due to their complex species composition. A good characterization of forage crop parameters is crucial for successful grassland management [1]. Technological advancement and current sensing technologies are powerful tools for elaborating accurate plant architecture models for phenotyping [2,3]. Digital modeling can be used to detect environmental stress problems and diseases, or the need for agricultural operations, at the right location and time. Sensor data can be acquired throughout the whole plant life cycle and be made available for model development and validation [4]. Spatial and temporal crop parameter information, for instance biomass and nitrogen content [5], adds value to vegetation models [6], strengthening decision-support systems for site-specific agronomic applications. Because perennial crops have a great biomass-building potential, continuous supervision of crop development parameters along their life cycle is recommended [7]. Continuous supervision by means of spatial models could improve decision-making for forage grass production. Moreover, spatial models facilitate the estimation of above-ground biomass, canopy height and plant cover in a non-destructive manner, which allows better scheduling of specific tasks, such as cutting time, fertilization and grassland renewal [8].

Spatial vegetation models are based on data from digital imaging, spectrometry, fluorescence, thermal or distance measurements, which relate to particular plant traits. Spectral reflectance of plant leaves, ranging from ultraviolet (UV) through visible light to near-infrared (NIR) and infrared (IR) wavelengths, has been found particularly important for the calculation of various vegetation indices [9,10]. Vegetation indices often correlate with leaf area index (LAI), biomass or dry matter yield [11]. Biomass estimation using the normalized difference vegetation index (NDVI) has given good results on annual pastures under grazing, although poor data quality caused large estimation discrepancies on grazed or partially grazed paddocks [12]. Ground-based and aerial visible imaging data acquired at a specific time, together with algorithms able to segment the RGB spectrum, have been proposed for quick and simple description of plant growth dynamics [5,13,14]. However, data assessment from RGB images is limited in some aspects, such as leaf overlapping, which can make important parts of the plant difficult to detect, especially in grass mixtures. Distance sensors measure distances by different principles (e.g., time-of-flight) and enable estimation of plant height, or derivation of biomass weight through indirect relationships with height [15,16]. Distance sensors, normally divided into ultrasonic devices and LiDAR (Light Detection and Ranging), have been widely applied in modern agricultural operations [17–20]. Because they are easy to handle, these sensors can be used to assess large field areas in a short time.

The use of 3D technologies from on-ground or aerial platforms opens new scenarios for plant modeling. Characterization of plants with the aid of 3D models is available for use in breeding programs and agricultural decision-making [8,21]. Various processes are available for capturing the three dimensions, height, width and depth, as 3D point clouds with X-Y-Z coordinates. The most explored, fastest and most accurate 3D sensing system is LiDAR combined with sequential displacement of the sensor to acquire the Z coordinate [22]. A drawback of this system is the requirement for calibration and displacement across the sampling space, which increases the associated costs as the resolution increases [23]. Fortunately, RGB-Depth (RGB-D) cameras with image processing based on Time-of-Flight can compensate for those drawbacks by combining depth information with the color scene in a single shot [17]. RGB-D cameras have been used for several agricultural research and application purposes. The most common is the Microsoft Kinect® v2, which allows reconstruction of 3D models associated with color information. Microsoft launched the first version of this development, the Kinect, in 2010, and since then several other RGB-D devices have appeared on the market, such as the Intel RealSense, Primesense Carmine, Google Tango, and Occipital's Structure Sensor. These sensors are available at low prices and can capture pixel color and depth images at adequate resolution and at a high rate. Because they produce similar output, the reconstruction method can be easily replicated. This methodology has been successfully applied to many crops, but not yet to grasslands. Wang and Li [24] calculated onion volume with high accuracy compared to real measurements. Foliar density of trees was estimated for autonomous spraying [25]. Andújar et al. [17] used a dual methodology separating crops and weeds from soil in maize under field conditions; this methodology included height selection and RGB segmentation, using a single model for plant discrimination. Combining multiple frames allows reconstruction of large crop surface areas [21,26]. Live outdoor use of RGB-D is possible with the current version of the Kinect® v2.

UAVs can cover large areas and operate independently of soil conditions [27], which allows more flexibility than ground-based systems at reduced operational time and cost. Photogrammetry on aerial imagery has proven highly functional in different studies. High spatial resolution images can be obtained when flying at low altitude with large overlaps between images. The data can then be processed through Structure-from-Motion reconstruction to build the 3D model. This method has been tested in olive trees to calculate canopy area, tree height and crown volume by generating digital surface models and applying an OBIA algorithm [28]. Hyperspectral aerial imagery can be used to calculate plant height and values related to dry biomass [29]. This technology is rapidly improving for application in complex grassland scenarios [30].

Current challenges in both agricultural research and production rely on sensing devices and technological advancement directed at improving crop quality and increasing yields. Like other crop producers, forage farmers can benefit immensely from advanced technological support for digital grass modeling to enhance forage productivity. Digital models could objectively deal with the complexity of grass mixtures and assist in the optimization of inputs, leading to better distribution or reduction of fertilizers, pesticides or seeds, e.g., through site-specific fertilization and renewal of grass mixtures. In addition, some grassland farming activities depend on biomass estimation to evaluate productivity, normally done via destructive methods (in this study, cutting numerous grass samples). Therefore, this study was carried out with the aim of evaluating aerial and on-ground methods to characterize grass ley fields composed of different species mixtures. Specifically, it attempted to objectively estimate plant height, biomass and volume using digital grass models, avoiding unnecessary destruction of the swards.

### **2. Materials and Methods**

#### *2.1. Experiments and Modeling Systems*

Two digital characterization systems, an on-ground and a UAV-based system, were used to map pasture architecture in two fields located at the NIBIO Særheim research station (Klepp Stasjon, Norway; lat. 58.76 N, long. 5.65 E). The site is characterized by a cold maritime climate with cool summers and cold winters. Annual precipitation is about 1180 mm, concentrated in autumn and spring. The on-ground system used an RGB-D camera, while the UAV-based system used an RGB camera with geo-positioning (geo-tagging) for data acquisition. These systems had previously been tested and compared on a small area [31] at the same location. Two fields, each 0.5 ha in size, were mapped. To obtain ground-truth data for validation, 10 plots of 1 m<sup>2</sup> were sampled in each field (Figure 1a). Each plot was subdivided into four quadrants to measure the variability within the 1 m<sup>2</sup> area (Figure 1b). Field 1 (ryegrass-dominant) was composed of 80% perennial ryegrass (*Lolium perenne* L.), 5% annual ryegrass (*Lolium multiflorum* L.) and 15% white clover (*Trifolium repens* L.). Field 2 (timothy-dominant) was composed of 85% timothy (*Phleum pratense* L.), 10% perennial ryegrass, and 5% annual ryegrass. Both fields were established in 2015. From 2016, they were fertilized annually with 10, 8 and 6 tons of liquid manure in early spring (March–April), after the first cut (June–July) and after the second cut (August–September), respectively, corresponding to about 260 kg ha<sup>−1</sup> year<sup>−1</sup>. Field assessments were conducted during July–August 2017, when the swards were fully developed, at anthesis stage.

**Figure 1.** Field test conducted at NIBIO Særheim: orthophoto of the ryegrass-dominant field with 10 sampling plots (**a**); a zoomed-in 1 m<sup>2</sup> sampling plot subdivided into four quadrants (**b**); UAV sampling system (**c**); and RGB-D sampling system (**d**).

For the on-ground system, the RGB-D Microsoft Kinect® v2 (Microsoft, Redmond, WA, USA) was used, as described by Andújar et al. [17]. The Kinect v2 is the most widely used RGB-D sensor. Although the device is no longer supported by Microsoft, its capabilities are similar to any other option on the market. In addition, sensors of this type share a common output, and the processing methodology is similar. The device is equipped with a standard 1080p RGB camera, a depth camera, an infrared (IR) camera and an array of microphones. The RGB camera has a resolution of 1920 × 1080 and can automatically adapt its exposure time to obtain brighter images under limiting light conditions. The IR camera can see clearly in darkness, with a resolution of 512 × 424 pixels. The field of view (FOV) differs between the cameras: the IR camera has a horizontal FOV of 70 degrees, and depth perception is limited to 60 degrees vertically. The measurable depth range goes from 0.5 to 4.5 m from the sensor, although under outdoor conditions the maximum range decreases. Studies conducted outdoors under different daytime illumination conditions showed valid depth measurements up to 1.9 m on sunny days, increasing to 2.8 m under the diffuse illumination of an overcast day [17]. The overlap required to fuse the acquired images and create the models is achieved with a frame rate that can be set up to 30 fps during data acquisition. The distance is calculated for every pixel in the scene by the Time-of-Flight method with phase detection, i.e., the distance is derived from the time a pulse of light takes to travel from the light source to the impacted plant and back to the sensor.

An Intel laptop computer running Windows 8 and the Kinect SDK (software development kit) was used for data collection. The SDK facilitates data acquisition through classes, functions and structures, provides the necessary drivers for the sensor, and includes sample functions that were implemented for the measurements in combination with OpenCV (Open Source Computer Vision Library, https://www.opencv.org/). The sensor was handheld, pointed at the field samples from a top view. The method developed for point cloud generation and reconstruction of large regions, based on the fusion of different overlapped depth images, built on a previous development [32]: information is stored only in the voxels closest to the detected object and accessed through a hash table. For every new input depth image, with the camera position known, the ray-casting technique [33] was applied, projecting a ray from the camera focus through each pixel of the input depth image to determine the voxels in the 3D world crossed by each ray. The voxels related to the depth information were thus determined. The next step used a variant of the iterative closest point (ICP) algorithm, which produces a point cloud as output. The modified algorithm creates the point cloud by detecting overlapping areas in sequential frames, assessing the relative position of the Kinect sensor for each frame to build a 3D model and removing outliers from the mesh [26]. Outliers can appear isolated in the point cloud; a point was considered an outlier if the average distance to its 64 nearest neighbours was greater than the standard deviation of the neighbour distances over all points (Figure 2); a sketch of this filter is given after Figure 2. Acquisition took less than 2 s from the top view. The system was supplied with electric power by a field vehicle, which supported every device used during the acquisition process.

**Figure 2.** Section of the 3D reconstruction: before filtering (**a**); removed points marked in fluorescent color (**b**); and after filtering (**c**).
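As a minimal sketch of the outlier criterion above, assuming an (N, 3) NumPy array of points, the snippet below uses SciPy's k-d tree. The exact threshold is not fully specified in the text; here it is read as the global mean of the per-point neighbour distances plus one standard deviation, a common form of statistical outlier removal, and all names are illustrative.

```python
import numpy as np
from scipy.spatial import cKDTree

def remove_outliers(points: np.ndarray, k: int = 64) -> np.ndarray:
    """Drop points whose mean distance to their k nearest neighbours is
    unusually large compared with the same statistic over all points."""
    tree = cKDTree(points)
    # Query k + 1 neighbours because each point's nearest neighbour is itself.
    dists, _ = tree.query(points, k=k + 1)
    mean_dist = dists[:, 1:].mean(axis=1)  # per-point mean neighbour distance
    # Threshold: global mean plus one standard deviation (assumed reading).
    keep = mean_dist <= mean_dist.mean() + mean_dist.std()
    return points[keep]
```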

One 3D model was built on the sward before harvesting and one immediately after. Once these 3D representations of the sampled plots were available (Figure 3), plant height and volume could be estimated. For this purpose, both models were overlaid and plant height was estimated as the difference between the two models, using CloudCompare. First, an alpha shape [34], i.e., a volume enveloping the set of 3D points, was obtained. The alpha parameter specifies how tightly the body fits the points. For this purpose, the R package alphashape3d [35] was employed. Figure 4 shows different alpha shapes for the same point cloud according to the alpha value selected: higher values produced very loose shapes, whereas lower values generated tight bodies. The volume was estimated with the same function library used to calculate the alpha shapes; a simplified volume computation is sketched after Figure 4.

**Figure 3.** Point clouds created by the RGB-D (Microsoft Kinect® v2) system.

**Figure 4.** Alpha shapes for the same point cloud using alpha = 0.1 (**a**,**b**), alpha = 0.2 (**c**) and alpha = 0.4 (**d**).
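The study computed volumes with alpha shapes in R (alphashape3d). As a simplified, hedged stand-in in Python, the convex hull below is the limiting case of an alpha shape as alpha grows large; it overestimates the volume of concave canopies but illustrates the computation, with synthetic data in place of a real plot.

```python
import numpy as np
from scipy.spatial import ConvexHull

def cloud_volume(points: np.ndarray) -> float:
    """Volume of the convex hull of an (N, 3) point cloud, in cloud units."""
    return float(ConvexHull(points).volume)

# Example with a synthetic 1 m x 1 m x 0.5 m plot of random canopy points:
rng = np.random.default_rng(0)
plot = rng.uniform([0, 0, 0], [1, 1, 0.5], size=(2000, 3))
print(f"estimated volume = {cloud_volume(plot):.3f} m^3")  # close to 0.5 m^3
```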

The aerial system consisted of a DJI Mavic Pro quadcopter combining a 4K digital camera with location information. The camera mounted on the UAV had a 28 mm lens with a field of view (FOV) of 78.8 degrees and a resolution of 4000 × 3000, capable of shooting 12.35-megapixel photos; it was 3-axis stabilized by the drone's gimbal (https://www.dji.com). The acquired aerial imagery was tested and compared with the RGB-D on-ground system. The UAV flew autonomously, following a route programmed with the Litchi app via its internal GPS receiver. The route was set up to take images at 1 s intervals, creating minimum overlaps of 90% forward and 60% sideways at 30 m flying altitude, ensuring the overlap between images necessary for photogrammetric post-processing, mosaicking and Digital Surface Model (DSM) generation.
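For orientation, a rough ground-sample-distance calculation for these flight settings is sketched below, treating the quoted 78.8-degree FOV as spanning the image diagonal (an assumption; the reference axis is not stated in the text).

```python
import math

altitude_m = 30.0                 # flying altitude from the text
fov_deg = 78.8                    # lens field of view
diag_px = math.hypot(4000, 3000)  # 5000-pixel image diagonal

footprint_diag_m = 2 * altitude_m * math.tan(math.radians(fov_deg / 2))
gsd_cm = 100 * footprint_diag_m / diag_px
print(f"footprint diagonal = {footprint_diag_m:.1f} m, "
      f"GSD = {gsd_cm:.2f} cm/px")
# footprint diagonal = 49.3 m, GSD = 0.99 cm/px at 30 m altitude
```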

Agisoft PhotoScan Professional Edition (Agisoft LLC, St. Petersburg, Russia) version 1.0.4 was used for 3D model building. This software provides a fully automatic process for image alignment, field geometry building, and orthophoto generation. Quality analysis of all acquired images was done with this software, and images with a quality value higher than 0.7 were used to reconstruct the DSMs by photogrammetry. The whole process was fully automatic, except for the manual placement of the reference points used to correct the model. Model building included several phases: acquisition of very high spatial resolution images with the UAV and their import into the software; image alignment; building field geometry by close-range photogrammetry methods; dense point cloud generation; and application of advanced image analysis to extract the selected geometric features. Common points and the camera position for each image were then located and matched to refine the camera calibration parameters. Next, the software searches for more points in the images to create a dense 3D point cloud, followed by the creation of a 3D polygon mesh, from which the final model was generated (Figure 5a).

The DSM and orthomosaics were joined to create a 4-band multi-layer file, i.e., the RGB bands plus the DSM. This file was processed using an OBIA algorithm developed with the eCognition Developer 9 software (Trimble GeoSpatial, Munich, Germany). This image segmentation and classification tool applies the multiresolution segmentation algorithm and Otsu's automatic thresholding. 3D features (volume) were calculated by integrating the volume of the individual pixels below the top of the crop as a solid generated object (Figure 5b,c, respectively) [36]; a sketch of this integration is given after Figure 5. This technique has been successfully applied to UAV images in agriculture and grassland, as well as in urban areas and forestry. A desktop computer equipped with an Intel Core i7-4771 @ 3.5 GHz processor, 16 GB of RAM, and an NVIDIA GeForce GTX 660 graphics card was used for image processing and 3D modeling.

**Figure 5.** Model constructed by photogrammetry methods (**a**), point cloud processing of the DSM model (**b**) and solid generation (**c**).
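A minimal sketch of the per-pixel volume integration described above: summing each DSM cell's height above ground level over the plot area. The array names and pixel size are hypothetical, not taken from the eCognition workflow.

```python
import numpy as np

def plot_volume(dsm: np.ndarray, ground: np.ndarray, cell_m: float) -> float:
    """Integrate canopy volume (m^3) from a DSM and a ground-level model,
    both (H, W) arrays in metres; cell_m is the pixel size on the ground."""
    height = np.clip(dsm - ground, 0.0, None)  # canopy height per pixel
    return float(height.sum() * cell_m ** 2)   # height x pixel area, summed
```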

After sensor data acquisition, the actual height of every sampled plot was determined with a measuring tape at the four quadrants plus the center of each plot. Additionally, the compressed sward height was determined using a rising plate meter, which represented the average height of each sampling plot. The compressed height is used by pasture managers as an indicator of herbage yield for decision support. Thereafter, all plants inside the sampling plot were cut at ground level, oven-dried at 80 °C for 48 h, and the dry biomass was weighed. The resulting ground-truth data were compared with the data extracted from the 3D models. From the Kinect-based models, plant volume, maximum height, average height and cover area were extracted.

#### *2.2. Statistical Analysis*

Actual field measurements of plant height and dry biomass were compared with the RGB-D-based and UAV-based 3D model assessments within each field. Simple linear regressions were then fitted to all relationships, using Pearson's correlation and *R*<sup>2</sup> coefficients, with their corresponding standard errors, to evaluate the best fit. Differences between the two assessed fields were determined through ANOVA and subsequent lack-of-fit tests for the linear regression models.
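A minimal sketch of one such comparison with SciPy, using made-up per-plot values purely for illustration:

```python
import numpy as np
from scipy import stats

# Hypothetical per-plot heights (cm): field measurement vs. model estimate.
measured = np.array([42.0, 48.5, 51.0, 55.2, 60.1, 47.3])
estimated = np.array([40.3, 47.0, 52.4, 54.0, 58.8, 46.1])

res = stats.linregress(estimated, measured)
print(f"r = {res.rvalue:.2f}, R^2 = {res.rvalue ** 2:.2f}, "
      f"slope SE = {res.stderr:.3f}")
```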

### **3. Results and Discussion**

#### *Plant Height, Volume and Biomass*

The studied sampling systems differed in data resolution, and thus in estimation capability. Accurate measurement of plant height and volume in pastures is difficult, because single grass plants vary enormously in height, even within areas as small as 0.25 m<sup>2</sup>. Measurements of compressed sward height with a rising plate meter, or of undisturbed sward height with a measuring tape, disregard such variation as well. The former bends the largest leaves down to a 'common height' (comparable to an average value) at which all grass plant tips support the plate weight. Similarly, using a measuring tape is based on an 'average height', but determined visually. Despite this difficulty, there was reasonably good agreement between the RGB-D-based estimates and the manual height measurements in both fields. These relationships were stronger for the estimates averaged per 1 m<sup>2</sup> sampling plot, and weaker for the measurements at the four quadrants. UAV image quality was poor due to weather conditions difficult for flying, typical of the south-western Norwegian region where this study was carried out. Consequently, UAV-based plant heights could not be reliably estimated, and volume was used instead to evaluate the system. The volume was calculated from the 3D surface using alpha shapes (Figure 4). An alpha shape represents the outline surrounding a set of 3D points: spheres of radius alpha containing no points inside are generated such that their surfaces are in contact with more than one point; connecting those points with the ones of the nearest spheres produces the surrounding outline, generating the volume. The alpha parameter specifies how tightly the outline fits the points. Although the height measurement could be done, the exact position of the sampling frame within the model was difficult to locate. The height varied significantly depending on where the plot was positioned, resulting in false measurements; consequently, height values were not used as validation information. The actual plant height (rising plate meter) averaged over the 20 sampled plots was 49.37 cm, which UAV measurements underestimated by 6.18 cm on average across the 20 reconstructed models.

RGB-D height assessments by quadrant showed a good linear relationship with the measured heights at each quadrant, with *R*<sup>2</sup> = 0.88 for field 1 and *R*<sup>2</sup> = 0.81 for field 2 (Figure 6a). This relationship improved greatly when the assessments were averaged per sampling plot, to *R*<sup>2</sup> = 0.98 and 0.99 for fields 1 and 2, respectively (Figure 6b). Although the end-details of grass plant leaves were difficult to reconstruct in the model, the RGB-D-based system demonstrated its capability to deliver accurate height measurements, even in small sampling areas like those in the present study. Field measurements with a rising plate meter or a measuring tape also disregard such end-details. Similarly, UAV missions could not reconstruct the end-details. However, good agreement between UAV-estimated heights and ground truth has been found in other studies [1,37].

UAV-based plant height estimation generally offers two major advantages over on-ground technologies: UAV monitoring can properly be defined as a non-destructive method, and UAVs can cover large areas in a short time. Technically, on-ground measurement should not be considered fully non-destructive when a whole field is to be scanned: driving field vehicles to carry out the assessments would destroy a substantial amount of sward biomass, because appropriate sampling pathways across the grass field are absent. On the other hand, on-ground monitoring is more time-consuming and has a higher operational cost than aerial inspection with UAVs. Similar results were found using multi-temporal crop surface models derived from UAVs and from a terrestrial laser scanner (TLS) [37]: crop density was well related to the 3D model reconstructions, but differences between the methods were evident. Comparable to our results, UAV-derived plant height was generally lower than TLS estimates at all growth stages, although the coefficient of variation was expected to be higher for the TLS than for the models created from UAV data [37]. Furthermore, Bareth and Schellberg [1] showed the temporal stability of UAV measurements in grassland fields using Structure-from-Motion and Multi-View Stereopsis techniques, reaching an overall agreement of *R*<sup>2</sup> = 0.86 between rising plate meter heights and model estimates.

RGB-D-based grass heights correlated poorly with actual dry biomass in the ryegrass field (Figure 7a, left). This result indicates that plant height is a weak proxy for grass biomass in ryegrass-dominant pastures. Conversely, the correlation between RGB-D-estimated heights and measured biomass in the timothy-dominant field showed a much better fit, with *R*<sup>2</sup> = 0.88 (Figure 7a, right). These differing results may be explained by the different growth habits of the two species. In the studied fields, timothy built biomass primarily by growing tall, whereas ryegrass built biomass only partly by growing tall, and more by tillering and developing biomass close to the ground. However, the models created from the UAV system showed more stability in the relationship between dry biomass and plant height. As with the height measurements, aerial models provide a baseline that avoids the noise caused by individual leaves or stems above the average coverage.

In general, the RGB-D sward height estimates were slightly lower than the measuring tape and rising plate meter heights. UAV systems seem to offer highly reliable assessments, closer to reality. Nevertheless, capturing fine details of grass plants, such as the tips of leaves, would require low flying heights, increasing the number of images to be acquired and demanding enormous computational power for the corresponding analysis. These aspects would therefore increase the risk of disturbing the estimation results, an effect more common in UAV flights. On-ground methods could improve the models in some breeding programs where high fidelity is demanded, keeping in mind that these types of agronomic applications need fast, high-capacity scanning with non-destructive methods.


**Figure 6.** RGB-D estimated grass height compared with field measurements at all four quadrants per sampling plot (**a**), and with rising plate meter height per sampling plot (**b**), in fields 1 and 2. Shading indicates upper and lower confidence limits.


**Figure 7.** RGB-D estimated grass height (**a**) and volume (**b**) compared with dry biomass per sampling plot, in fields 1 and 2. Shading indicates upper and lower confidence limits.

RGB-D-based volume estimates showed low and intermediate correlation with the assessed grass biomass in fields 1 and 2, with *R*<sup>2</sup> = 0.32 and 0.66, respectively (Figure 7b). Apparently, the higher leaf content of ryegrass (Figure 7b, left) contributes more to biomass than to the visible and measurable plant volume, to which plant height contributes more than plant density. Conversely, biomass in the timothy-dominant pasture (Figure 7b, right) corresponded better with the RGB-D-based volume estimates, as timothy built yield primarily by growing tall. The same trend was observed when comparing the actual measured plant height and biomass, where the relationship was rather poor for ryegrass (Figure 8, left) but good for timothy (Figure 8, right).

A different tendency was observed for the aerial models. Volume estimated with the UAV system on the 20 reconstructed plots had a mean value of 0.39 m<sup>3</sup> and a standard deviation of 0.17 m<sup>3</sup> (min = 0.15; max = 0.67). The created models showed an intermediate agreement between the assessed grass biomass and the calculated volume, with *R*<sup>2</sup> = 0.54, indicating good capabilities of this method for volume calculation. The developed models reproduced the irregular shape of the different plots (Figure 5a) and the typical corridors in the experiment. Thus, the accuracy of this method is high, with underestimation of only a few centimeters. The models also show that the 3D reconstruction procedure was more problematic in areas with low canopy density. An analogous problem was found in tree reconstruction of orchards using visible-light images: the 3D structure of some trees was not properly built, and consequently the mosaicked images showed some blurry areas [28].

**Figure 8.** Actual plant height (rising plate meter) averaged by plot compared with average dry biomass. Shading indicates upper and lower confidence limits.

The estimated grass volume and the rising plate meter height (average height) per plot showed identical values for both the aerial and on-ground methods. Plant volume estimated with the UAV system correlated somewhat weakly with plant dry biomass (*R*<sup>2</sup> = 0.54, Figure 9a). The aerial model showed a correlation between volume and rising plate meter height of *R*<sup>2</sup> = 0.57 (Figure 9b). RGB-D-estimated plant volume for the ryegrass-dominant pasture showed a high correlation (*R*<sup>2</sup> = 0.87) with plant height measured and averaged per plot (Figure 10, left), but only an intermediate correlation (*R*<sup>2</sup> = 0.6) for the timothy pasture (Figure 10, right). The timothy pasture showed much more variability in height among individual tillers within the 0.25 m<sup>2</sup> quadrants (Figure 6), which may explain the lower correlation with estimated plant volume for this species. Even though ryegrass plants had a higher number of leaves occupying more volume than timothy per unit area, their leaves bent almost uniformly to a common plant height, which was better measured by the RGB-D system. This has been shown in similar studies: good plant volume estimation using UAV-based image analysis was demonstrated for small weeds [36]. In addition, combination with multispectral images could improve the results. Estimating above-ground biomass has helped monitor crops to predict yield in cereals [29], and the method proved reliable in several scenarios, for instance relating model biomass estimates to crop nitrogen requirements.

**Figure 9.** UAV-estimated grass volume compared with measured dry biomass (**a**) and with rising plate meter height (**b**) averaged per sampling plot, in fields 1 and 2. Shading indicates upper and lower confidence limits.

**Figure 10.** RGB-D estimated grass volume compared with rising plate meter height averaged per sampling plot, in fields 1 and 2. Shading indicates upper and lower confidence limits.

Considerable cost differences exist between the UAV-based and RGB-D-based systems. It has been argued that aerial imaging costs less and can cover bigger areas [38]. The advantage of UAV-based sampling was evident in our study, where whole-field coverage could be achieved in less than 12 min, whereas the RGB-D-based system needed considerably more time for all sampling plots. However, in grassland production the RGB-D-based system could be mounted on a tractor, and monitoring could be done simultaneously with other agronomic operations, e.g., fertilization or reseeding, thus diminishing the cost.

### **4. Conclusions**

UAV-based sampling systems offer a higher operative capability and are also affordable from an economic point of view. They are more affordable, easier to operate and can cover a larger surface than on-ground systems. Since agricultural operations do not require fine identification of the end-details of the grass plants (i.e., tips of leaves), aerial platforms could be the better option in grasslands. However, the resolution of UAV-acquired imagery is affected by conditions external to the camera sensor, such as sunlight, clouds, wind speed and climate, which in turn affect the models for parameter estimation. Conversely, on-ground techniques with RGB-D cameras can produce highly detailed models; nevertheless, despite their higher fidelity, the results showed more variability than the UAV models. Increasing the speed of on-ground platforms would improve the performance of these systems, allowing more area to be monitored. On-ground RGB-D data can be effectively analysed with open-source software, as was done in this study, which may offset the expenses compared with aerial sampling. However, this technique can be destructive in pasture scenarios. Although not part of this study, on-ground reconstruction methods could be more reliable for row crops or breeding programs. In particular, the inclusion of depth information in vegetation models could contribute to improved results in breeding programs.

**Author Contributions:** Conceptualization, V.P.R.-A. and D.A.; methodology, V.P.R.-A., D.A. and J.M.P.; software, J.M.B.-G.; validation, V.P.R.-A., D.A. and J.M.P.; formal analysis, D.A., V.P.R.-A., J.M.P. and J.M.B.-G.; investigation, V.P.R.-A. and D.A.; resources, V.P.R.-A. and D.A.; data curation, D.A. and V.P.R.-A.; writing—original draft preparation, D.A. and V.P.R.-A.; writing—review and editing, D.A., V.P.R.-A., J.M.P. and M.H.; visualization, D.A. and V.P.R.-A.; supervision, D.A. and V.P.R.-A.; project administration, D.A. and V.P.R.-A.; funding acquisition, V.P.R.-A., D.A. and M.H.

**Funding:** This research was funded by the projects AGL2017-83325-C4-1-R and AGL2017-83325-C4-3-R (Spanish Ministry of Economy and Competition); the RYC-2016-20355 agreement, Spain, as well as, by the Norwegian research funding for agriculture and the food industry (NRF), project 255245 (FOREFF) and the Department of Grassland and Livestock, NIBIO, Norway.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**




© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Automatic Parameter Tuning for Adaptive Thresholding in Fruit Detection**

### **Elie Zemmour \*, Polina Kurtser and Yael Edan**

Department of Industrial Engineering and Management, Ben-Gurion University of the Negev, Beer-Sheva 8410501, Israel; kurtser@post.bgu.ac.il (P.K.); yael@bgu.ac.il (Y.E.)

**\*** Correspondence: eliezem@post.bgu.ac.il

Received: 27 March 2019; Accepted: 2 May 2019; Published: 8 May 2019

**Abstract:** This paper presents an automatic parameter tuning procedure specially developed for a dynamic adaptive thresholding algorithm for fruit detection. One of the algorithm's major strengths is its high detection performance using a small set of training images. The algorithm enables robust detection in highly variable lighting conditions. The image is dynamically split into variably sized regions, where each region has approximately homogeneous lighting conditions. Nine thresholds were selected to accommodate three different illumination levels for three different dimensions in four color spaces: RGB, HSI, LAB, and NDI. Each color space uses a different method to represent a pixel in an image: RGB (Red, Green, Blue), HSI (Hue, Saturation, Intensity), LAB (Lightness, Green to Red and Blue to Yellow) and NDI (Normalized Difference Index, which represents the normalized difference between the RGB color dimensions). The thresholds were selected by quantifying the required relation between the true positive rate and false positive rate. A tuning process was developed to determine the best-fit values of the algorithm parameters to enable easy adaptation to different kinds of fruits (shapes, colors) and environments (illumination conditions). Extensive analyses were conducted on three different databases acquired in natural growing conditions: red apples (nine images with 113 apples), green grape clusters (129 images with 1078 grape clusters), and yellow peppers (30 images with 73 peppers). These databases are provided as part of this paper for future developments. The algorithm was evaluated using cross-validation with 70% of the images for training and 30% for testing. The algorithm successfully detected apples and peppers in variable lighting conditions, resulting in F-scores of 93.17% and 99.31%, respectively. Results show the importance of the tuning process for generalizing the algorithm to different kinds of fruits and environments. In addition, this research revealed the importance of evaluating different color spaces, since for each kind of fruit a different color space might be superior to the others. The LAB color space is most robust to noise. The algorithm is robust to changes in the threshold learned by the training process and to noise effects in images.

**Keywords:** adaptive thresholding; fruit detection; parameter tuning

### **1. Introduction**

Fruit detection is important in many agricultural tasks such as yield monitoring [1–8], phenotyping [9–11], precision agriculture operations (e.g., spraying [12] and thinning [13–15]), and robotic harvesting [16–18]. Despite intensive research conducted in identifying fruits, implementing a real-time vision system remains a complex task [17,18]. Features like shape, texture, and location are subject to high variability in the agricultural domain [18]. Moreover, fruits grow in unstructured environments with highly-variable lighting conditions [19,20] and obstructions [21] that influence detection performance. Color and texture are fundamental characteristics of natural images and play an important role in visual perception [22].

Nevertheless, despite the challenges, several algorithms have been developed with impressive detection rates of over 90–95%. However, these detection rates were achieved only for specific fruits (apples, oranges, and mangoes) [13,23,24]. These crops are known for their high ratio of fruits per image, allowing easier acquisition of large quantities of data. Other crops, such as sweet peppers and rock melons [23], with a lower fruit-to-image ratio, yield lower rates of 85–90% [19,23], even with the employment of cutting-edge techniques such as deep learning [25]. Additionally, the crops with high detection rates are very distinct from their background in terms of color, a central feature for color-based-only detection. The only recent research on detecting green targets concerns weed detection [26,27], but there the green crops are set against a brown background. In grape cluster detection [28], an accuracy of about 90% was achieved. Some work on the development of cucumber harvesters [29] has been done, but the harvesting success rate was not distinguished from the detection success rate and therefore cannot be reported.

In this research, we focus on the detection of three different types of challenging crops: red apples (a high ratio of fruits per image; however, we used a very small dataset http://icvl.cs.bgu.ac.il/lab\_projects/agrovision/DB/Sweeper05/#/scene), green grapes ("green-on-green" dataset [28]), and yellow sweet peppers (a low fruit-to-image ratio http://icvl.cs.bgu.ac.il/lab\_projects/agrovision/DB/Sweeper06/#/scene).

The adaptive thresholding algorithm presented in this paper is based on previous work [19] that was developed for sweet peppers' detection for a robotic harvester. A set of three thresholds was determined for each region of the image according to its lighting setting. Preliminary results of the same algorithm for an apple detection problem have been previously presented [30].

The current paper advances previous research [30] with several new contributions: (1) a new parameter tuning procedure developed to best-fit the parameters to the specific database; (2) the application and evaluation of the adaptive thresholding algorithm for different color spaces; (3) application of the algorithm to different types of fruits along with intensive evaluation and sensitivity analyses; (4) comparing the contribution of the new developments (Items 1–2) to previous developments.

### **2. Literature Review**

### *2.1. Detection Algorithms in Agriculture*

While this paper does not aim to be a review paper, in addition to the many recent reviews (e.g., [16,18,25]), a summary of previous results helps place the outcomes of this paper into context (Table 1).

As can be seen in the table, most algorithms focus on pixel-based detection (e.g., segmentation). This is indeed a common method in fruit detection (e.g., [31–34]). Many segmentation algorithms have been developed [35] including: K-means [36], mean shift analysis [37], Artificial Neural Networks (ANN) [38], Support Vector Machines (SVM) [39], deep learning [25], and several others.

A common challenge facing agricultural detection research is the lack of data [20], due to the harsh conditions for image acquisition and the tedious ground-truth annotation involved [40]. Current advanced algorithms (e.g., deep learning) require large amounts of data. Therefore, to date, the best detection results have been achieved for crops with high fruit-to-image ratios (e.g., apples, oranges, and mangoes) that grow in high density, where each image provides much data. Some research [26,41] has aimed to cope with the need for large quantities of highly variable data by pre-training a network on non-agricultural open-access data [26] or by generating synthetic data [41]. Both methods have shown promising results.

In this paper, we present an alternative direction, focusing on the development of algorithms based on smaller datasets. This research focuses on segmenting objects in the image using an adaptive thresholding method. Observing the histogram of the image color implies that a threshold can be determined to best differentiate between the background and the object distributions [42]. The threshold is computed by finding the histogram minimum (Figure 1) separating two peaks: the object and the background. However, the global minimum between the distributions is very hard to determine in most cases [43].

Currently, most optimal thresholding algorithms determine the threshold only in a one-dimensional space, for example in the RGB space, either R, or G, or B, or a linear combination of their values (e.g., grayscale transformation) [44]. In the transformation from three color dimensions into one, information is lost. In this research, a three-dimensional thresholding algorithm based on [19] was applied and evaluated also for additional color spaces (RGB, HSI, LAB, and NDI color spaces); a threshold is determined for each dimension in the color space.

There are two common adaptive thresholding concepts: (1) global thresholding, in which a different threshold is determined for each image according to conditions specific to the entire image, which is then transformed into a binary image; and (2) local thresholding, in which the image is divided into sections and a different threshold is calculated for each section; the sections are then combined into a binary image. Several methods utilize dynamic local thresholding [45,46]. A common approach uses multi-resolution windows in a bottom-up method, merging pixels while a criterion is met [45,46]. Another approach is the top-down method, where the image is divided into subregions according to specific criteria. The top-down approach reduces execution time and improves generalization [47] and was therefore used in this research.
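To make the two concepts concrete, the OpenCV snippet below contrasts a single global Otsu threshold with a per-neighbourhood local threshold. This is generic OpenCV usage, not the authors' region-splitting algorithm, and the file name is illustrative.

```python
import cv2

gray = cv2.imread("orchard.png", cv2.IMREAD_GRAYSCALE)

# (1) Global: one Otsu threshold chosen for the whole image.
_, binary_global = cv2.threshold(
    gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

# (2) Local: each pixel is thresholded against the mean of its 31x31 window.
binary_local = cv2.adaptiveThreshold(
    gray, 255, cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY, 31, 2)
```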

The previously developed algorithm by Vitzrabin et al. [19], on which this research is based, dynamically divides the image into several regions, each with approximately the same lighting conditions. The main contribution of the adaptive local 3D thresholding is a very high true positive rate (TPR) and a low false positive rate (FPR) in the fruit detection task in an unstructured, highly variable, and dynamic crop environment. Another contribution is the ability to change the task objective in real time, enabling fast adaptation to other crops, varieties, or operating conditions while requiring small datasets and fast training. The algorithm's adaptability to the desired ratio between TPR and FPR makes it specifically fit for the robotic harvesting tasks for which it was originally designed: it contributes to a better success rate in robotic operations in which, at first, FPR should be minimized (to reduce cycle times), and, when approaching the grasping operation itself [19], TPR should be maximized (to increase grasping accuracy). This is applicable to other fruit detection tasks (e.g., spraying, thinning and yield detection, first maximizing TPR to decide on harvesting timing and then minimizing FPR for accurate marketing estimation).

**Figure 1.** Optimal threshold in the bimodal histogram.


DL = Deep Learning; PX = pixel segmentation; AD = adaptive threshold; NB = Naive Bayes; W = window detection; F = F-score; A = accuracy; P = precision; R = recall; \* calculated F-score based on reported TPR and FPR according to Equation (4).

### *2.2. Color Spaces*

Images can be represented in different color spaces (e.g., RGB, HSI, LAB, and NDI), each one emphasizing different color features [22]. RGB is the most common color space, representing each pixel in the three color channels as acquired: red, green, and blue. HSI represents every color with three components: hue (H), saturation (S), and intensity (I); this space is also known as HSV [37]. The LAB color space approximates human vision [36] and represents each pixel by L\* (lightness, from black to white), a\* (from green to red), and b\* (from blue to yellow). An additional color space commonly employed in agriculture [19] is the Normalized Difference Index (NDI) space. The NDI is used to differentiate between fruit and background [48], since its normalization helps overcome changes in illumination and shading [49]. Each dimension in the NDI space is the normalized difference between two colors in the RGB space, resulting in three dimensions (Equation (1)). These operations are applied at all pixel locations in the image, creating a new image with this contrast index. The equations yield NDI values ranging between −1 and +1.

$$NDI\_1 = \frac{R-G}{R+G}; NDI\_2 = \frac{R-B}{R+B}; NDI\_3 = \frac{B-G}{B+G} \tag{1}$$
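As an illustration, Equation (1) can be computed with NumPy as in the following sketch; the function name is ours, and the small epsilon added to each denominator (to avoid division by zero on saturated pixels) is an assumption not stated in the text:

```python
# Minimal sketch of Equation (1): the three NDI channels of an RGB image.
import numpy as np

def ndi_channels(rgb: np.ndarray, eps: float = 1e-9) -> np.ndarray:
    r, g, b = [rgb[..., i].astype(np.float64) for i in range(3)]
    ndi1 = (r - g) / (r + g + eps)   # red vs. green
    ndi2 = (r - b) / (r + b + eps)   # red vs. blue
    ndi3 = (b - g) / (b + g + eps)   # blue vs. green
    return np.stack([ndi1, ndi2, ndi3], axis=-1)  # values in [-1, +1]
```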

### **3. Materials and Methods**

### *3.1. Databases*

The algorithm was evaluated on three databases representing three different fruit colors: red (apples), green (grapes), and yellow (peppers), different numbers of fruits per image (high and low fruit-to-image ratios), and two environmental settings (greenhouse and field) under different illumination conditions. Images were acquired with different cameras. Each image was processed by a human labeler who performed manual segmentation of the image into targets and background (Figures 2 and 3) by visually analyzing the image and marking all the pixels considered as fruit, in accordance with the common protocols used in the computer vision community [50].

### 3.1.1. Apples

The orchard apple database included 113 "Royal Gala" apples in 9 images acquired from an orchard in Chile in March 2012 under natural growing conditions with a Prosilica GC2450C (Allied Vision Technologies GmbH, Stadtroda, Germany) camera with 1536 × 2048 resolution; the camera was attached to a pole. The images were captured in daylight, half under direct sunlight and half in the shade.

**Figure 2.** Apple (**top**) and grape (**bottom**) RGB image (**left**) and ground truth (**right**) examples

### 3.1.2. Grapes

The images were acquired in a commercial vineyard growing green grapes of the "Superior" variety. An RGB camera (Microsoft NX-6000) with 600 × 800 resolution was manually moved, at mid-day, along a commercial vineyard in Lachish, Israel, during the summer season of 2011, one month before harvest time. The images were captured from five different growing rows. A set of 129 images was acquired, including 1078 grape clusters.

### 3.1.3. Peppers

The dataset included 30 images of 73 yellow peppers acquired in a commercial greenhouse in Ijsselmuiden, Netherlands, using a 6-degree-of-freedom manipulator (Fanuc LR Mate 200iD/7L) equipped with an iDS UI-5250RE RGB camera with 600 × 800 resolution. Two different datasets were created by marking the images twice. The first dataset included only peppers with high visibility (denoted as "high-visibility peppers"; this was done for 10 images of 25 yellow peppers). In the second dataset, all peppers were marked, including peppers in dark areas that were less visible in the image (denoted as "including low-visibility peppers", done for all 30 images) (Figure 3).

**Figure 3.** Peppers tagging example. **Top**: RGB image (**left**) and ground truth (**right**) example in high visibility. **Bottom**: RGB image (**left**) and labeled image (**right**). "High-visibility peppers" marked in red and "low-visibility peppers" marked in blue.

### 3.1.4. Performance Measures

Metrics included the TPR (True Positive Rate, also noted as hits), FPR (False Positive Rate, also noted as false alarms), and the F-score (the harmonic mean of precision and recall [51]). The TPR metric (Equation (2)) states the number of correctly-detected objects relative to the actual number of objects, while the FPR metric calculates the number of falsely-detected objects relative to the actual number of background objects (Equation (3)). The F-score (Equation (4)) balances TPR and (1 − FPR) equally.

$$TPR = \frac{N\_{TDF}}{N\_F} \tag{2}$$

where *NTDF* is the number of pixels detected correctly as part of the fruit and *NF* is the actual number of pixels that represent the fruit.

$$FPR = \frac{N\_{FDF}}{N\_B} \tag{3}$$

where *NFDF* is the number of pixels falsely classified as fruit and *NB* is the number of pixels that represent the background.

$$F(TPR, FPR) = \frac{2 \ast (TPR \ast (1 - FPR))}{TPR + (1 - FPR)} \tag{4}$$
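For concreteness, Equations (2)–(4) can be evaluated on pixel masks as in the following sketch; function and variable names are ours (boolean arrays where True marks fruit pixels), not from this work:

```python
# Minimal sketch of the performance measures on pixel-level masks.
import numpy as np

def pixel_metrics(pred: np.ndarray, truth: np.ndarray):
    tpr = np.logical_and(pred, truth).sum() / truth.sum()        # Equation (2)
    fpr = np.logical_and(pred, ~truth).sum() / (~truth).sum()    # Equation (3)
    f = 2 * tpr * (1 - fpr) / (tpr + (1 - fpr))                  # Equation (4)
    return tpr, fpr, f
```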

### *3.2. Analyses*

The following analyses were conducted for the three databases, apples, grapes, and peppers, using 70% of the data for training and 30% for testing [52]. This ratio was chosen to make the evaluation more demanding, since the number of images in each database was relatively small. In addition, to ensure robust results, each split into training and testing sets was performed randomly 5 times, and all reported detection results are averages over the 5 test sets.

	- 1. Noise: Noise was created by adding to each pixel in the RGB image a random number drawn from a zero-mean normal distribution, for noise values of up to 30% (see the sketch after this list). The artificial noise represents the algorithm's robustness toward noisier cameras or images captured with different camera settings. Noise values of 5%, 10%, 20%, and 30% were evaluated.
	- 2. Thresholds learned in train process: Thresholds were changed by ±5%, ±10%, and ±15% according to the threshold in each region.
	- 3. Stop condition: The selected STD value was changed by 5% and 10% to test the robustness of the algorithm to these parameters.
	- 4. Training vs. testing: The algorithm performances were evaluated while using different percentages of DB images for the training and testing processes.
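The noise perturbation in Analysis 1 can be sketched as follows; interpreting the noise percentage as the standard deviation of a zero-mean Gaussian relative to the 0–255 intensity range is our assumption, and the placeholder image and names are illustrative:

```python
# Sketch of Analysis 1: adding zero-mean Gaussian noise to an RGB image.
import numpy as np

def add_noise(rgb: np.ndarray, percent: float, rng=np.random.default_rng(0)):
    # Noise std is percent% of the full 8-bit intensity range (our assumption).
    noise = rng.normal(0.0, percent / 100.0 * 255.0, size=rgb.shape)
    return np.clip(rgb.astype(np.float64) + noise, 0, 255).astype(np.uint8)

image = np.full((100, 100, 3), 128, dtype=np.uint8)  # placeholder image
noisy = {p: add_noise(image, p) for p in (5, 10, 20, 30)}  # evaluated levels
```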

### **4. Algorithm**

### *4.1. Algorithm Flow*

The overall flow of the algorithm is outlined in Figure 4 and is as follows. The RGB images were the inputs for the training process. Some areas in the images received more illumination than others, depending on the position of the light source and on shading caused by leaves, branches, and the covering net when it existed. To overcome this issue, the algorithm divided each image into multiple sub-images with approximately homogeneous illumination conditions (Figure 5). These sub-images were categorized into three illumination conditions: low, medium, and high. The illumination level was obtained by calculating the average of the grayscale sub-image, whose values range between zero (completely dark) and 255 (completely white). In the previous algorithm [19], the sub-images were categorized into groups using levels selected empirically as 10, 70, and 130, corresponding to low-, medium-, and high-light-level images based on manual image analyses. The high value was set to 130 in order to filter overexposed areas in the images. In the current algorithm, a parameter tuning process (detailed in Section 4.3) was developed to determine these three values.
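The categorization step can be sketched as follows; assigning each sub-image to the nearest of the three light levels, and treating sub-images above the high level as overexposed, is our interpretation of the description above, and the function name is illustrative:

```python
# Sketch of the illumination categorization of a grayscale sub-image, using the
# empirical levels 10/70/130 quoted above for the previous algorithm [19].
import numpy as np

def light_group(sub_gray: np.ndarray, levels=(10, 70, 130)):
    mean = sub_gray.mean()  # 0 = completely dark, 255 = completely white
    if mean > levels[2]:
        return None  # above the high level: overexposed, filtered out
    # Assign to the nearest low/medium/high level (our interpretation).
    return min(range(3), key=lambda i: abs(mean - levels[i]))
```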

**Figure 4.** Algorithm flowchart.

**Figure 5.** Image split into sub-images: visualization.

The algorithm then created a 3D color space image (transformed the RGB image to the NDI, HSI, or LAB space, or used the RGB space directly). For each color dimension, a binary image (mask) was created, where each pixel representing the fruit received a value of one and all other pixels received a value of zero. Finally, the algorithm created an ROC (Receiver Operating Characteristic) curve representing TPR as a function of FPR [53], including all nine thresholds learned from the training process. Figure 6 presents an example of nine ROC curves computed for three sub-images with different light levels (L1, L2, L3) in the NDI color space. In this example, the sub-image with light level 2 (L2) in the first NDI dimension obtained the best performance (high TPR and low FPR).

In the test process, the algorithm received RGB images from the camera in real time, transformed them to the relevant color space (HSI/LAB/NDI), and created a binary image by applying thresholds as follows: three thresholds, one for each color dimension, were derived from the nine thresholds learned in training by linear interpolation (Equation (5)) between the two of the three illumination levels (low, medium, high) closest to the illumination level calculated for the specific sub-image from the grayscale image.

$$T = \frac{T(LL[i]) \ast (currentLL - LL[i]) + T(LL[i+1]) \ast (LL[i+1] - currentLL)}{LL[i+1] - LL[i]} \tag{5}$$

where LL is an array of the light level values for each group (LL = [low, medium, high]), and *i* is the light level index representing the group closest to the current image light level from below.

For example, if the current light level is 40 and the light levels of the low, medium, and high groups in the training process were 10, 70, and 130, respectively, the threshold would be calculated in the following way (Equation (6)):

$$T = \frac{T(10)(40-10) + T(70)(70-40)}{70-10} \tag{6}$$
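A minimal sketch of this interpolation, following Equations (5) and (6) as printed, is given below; the function name is ours, and it assumes the current light level lies between the lowest and highest group levels:

```python
# Sketch of the threshold interpolation of Equations (5)-(6). LL holds the light
# levels of the low/medium/high groups; T[i] is the threshold learned for group i.
def interpolate_threshold(current_ll: float, LL, T):
    # Index of the group closest to the current light level from below;
    # assumes LL[0] <= current_ll < LL[-1].
    i = max(k for k in range(len(LL)) if LL[k] <= current_ll)
    num = T[i] * (current_ll - LL[i]) + T[i + 1] * (LL[i + 1] - current_ll)
    return num / (LL[i + 1] - LL[i])

# Worked example from the text: current light level 40, LL = [10, 70, 130]
# -> T = (T(10)*(40-10) + T(70)*(70-40)) / (70-10), as in Equation (6).
```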

The process results in binary images in which white areas (pixels with a value of one) represent the fruit and black areas (pixels with a value of zero) represent the background (see Figures 2 and 3). In total, the algorithm created 7 binary images: 3 corresponding to the three color space dimensions and 4 corresponding to the intersections between the first 3. For example, the intersection between Binary Images 1 and 2 resulted in a binary image 1 ∩ 2 containing white pixels only where the same pixels were white in both Images 1 and 2 (Figure 7).
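Building the 7 binary images can be sketched as follows (mask names are ours; the masks are boolean arrays of equal shape):

```python
# Sketch of the 7 binary images: 3 per-dimension masks plus the 4 intersections
# (1 n 2, 1 n 3, 2 n 3, and 1 n 2 n 3).
import numpy as np

def all_masks(m1: np.ndarray, m2: np.ndarray, m3: np.ndarray):
    return [m1, m2, m3,
            m1 & m2, m1 & m3, m2 & m3,   # pairwise intersections
            m1 & m2 & m3]                # triple intersection
```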

**Figure 6.** Nine ROC curves: 3 dimensions × 3 light levels *NDIi* − *Lj*; *i* represents the color space dimension; *j* represents the illumination level.

**Figure 7.** Use of dimension intersection to increase performance.

### *4.2. Morphological Operations: Erosion and Dilation*

The algorithm result is a binary image in which the major fruit regions are detected, together with small clusters of pixels wrongly classified as fruit (e.g., Figure 8). In addition, some fruits were split between several clusters (e.g., Figure 8). To overcome these problems, several morphological operations were performed, based on previous research indicating their contribution [19]: erosion followed by dilation with a neighborhood of 11 × 11-pixel squares. The square function was used since there was no pre-defined knowledge about the expected fruit orientation. To connect close clusters, a closing morphological operation was then applied: dilation followed by erosion implemented with a 5 × 5-pixel square neighborhood.
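With OpenCV, this post-processing corresponds to an opening with an 11 × 11 square followed by a closing with a 5 × 5 square, as in the following sketch (the function name is ours):

```python
# Sketch of the morphological post-processing described above.
import cv2
import numpy as np

def clean_mask(binary: np.ndarray) -> np.ndarray:
    # Opening (erosion then dilation) removes small falsely-classified clusters.
    opened = cv2.morphologyEx(binary, cv2.MORPH_OPEN, np.ones((11, 11), np.uint8))
    # Closing (dilation then erosion) connects nearby clusters of the same fruit.
    return cv2.morphologyEx(opened, cv2.MORPH_CLOSE, np.ones((5, 5), np.uint8))
```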

**Figure 8.** Morphological operation.

### *4.3. Parameter Tuning*

The algorithm used several parameters that influence its performance: T1, T2, STD, and the classification rule direction for each color dimension (D1, D2, D3). The following parameter tuning procedure (Figure 9) was developed; it should be performed when exploring images from a new database, a new color space, or new operating conditions (cameras, illumination). The parameters, detailed below, are: light level thresholds, stop splitting condition, and classification rule direction.

### 4.3.1. Light Level Thresholds (T1, T2)

The algorithm split the images into sub-images whose size was set to 1% of the pixels of the entire image. Then, the algorithm computed the light level of each sub-image by calculating the average pixel value of the grayscale sub-image. Finally, the algorithm grouped the sub-images into three light level categories (see Figure 10) using two thresholds, as presented in Equation (7).

$$i = \begin{cases} Low & 0 < x \leq T1 \\ Medium & T1 < x \leq T2 \\ High & x > T2 \end{cases} \tag{7}$$

where *i* is the light level index, as detailed above in Equation (5).

**Figure 9.** Parameter tuning process.

**Figure 10.** Sub-images level of light distribution.

An attempt was made to identify the probability density function (PDF) of the light level distribution of each database through a *χ*<sup>2</sup> goodness-of-fit test. However, since these tests did not reveal significant results [54], the thresholds were selected as follows: T1 and T2 were chosen so that 15% of the data would be categorized as low, 15% as high, and 70% as medium.

Note that as described in the algorithm flow, the algorithm used a third threshold. Sub-images above that threshold were ignored in the training process since they were almost completely white.

### 4.3.2. Stop Splitting Condition (STD)

The algorithm split an image into sub-images until the sub-image reached a predefined Standard Deviation (STD) value; a sketch of this recursive split is given below. This approach assumes that a larger sub-image has a higher STD value. To test this assumption, the STD was calculated for different sizes of sub-images in the different databases. The stop condition value (minimum STD value) was determined by maximizing the F-score (Equation (4)).
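The following is a minimal sketch of the top-down recursive split with the STD stop condition; splitting into quadrants is our assumption (the text specifies only the stop condition), and the names are illustrative:

```python
# Sketch of the recursive top-down split: a grayscale region is split into
# quadrants until its standard deviation falls below the tuned STD stop value.
import numpy as np

def split_regions(gray: np.ndarray, std_stop: float, regions=None):
    if regions is None:
        regions = []
    h, w = gray.shape
    if gray.std() <= std_stop or min(h, w) < 2:
        regions.append(gray)          # homogeneous enough: keep as a sub-image
        return regions
    for part in (gray[:h // 2, :w // 2], gray[:h // 2, w // 2:],
                 gray[h // 2:, :w // 2], gray[h // 2:, w // 2:]):
        split_regions(part, std_stop, regions)
    return regions
```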

### 4.3.3. Classification Rule Direction (D1, D2, D3)

As detailed in the Introduction, as part of the thresholding process, an intensity value is determined to differentiate between object and background pixels. In order to execute the thresholding process, the algorithm must receive as input the classification rule direction, i.e., it must automatically determine whether the intensity of the background distribution is higher or lower than the intensity of the object distribution in each color dimension.

This information was learned as part of the tuning procedure. A simple heuristic rule, based on the assumption that the images contain more background pixels than object pixels, was used as follows: (1) execute *image* > *Threshold*; (2) if the pixels categorized as background represent less than 70% of the image, reverse the thresholding direction to *image* < *Threshold*.
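This heuristic can be sketched as follows for one color dimension (the function name and return convention are ours):

```python
# Sketch of the direction heuristic: background is assumed to dominate; if
# "image > threshold" labels less than 70% of pixels as background, the rule
# is reversed to "image < threshold".
import numpy as np

def classification_direction(channel: np.ndarray, threshold: float) -> str:
    background_fraction = (channel <= threshold).mean()
    return "greater" if background_fraction >= 0.70 else "less"
```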

### **5. Results and Discussion**

### *5.1. Sub-Image Size vs. STD Value*

For images with a size of 300 × 300 pixels or lower, splitting an image into smaller sub-images decreases the average STD of the sub-images (in all three databases; Figure 11). Although a direct decrease is not observed for very large images, we can still conclude that splitting a large image to 300 × 300 or lower will decrease the average STD.

**Figure 11.** Sub-image size vs. average STD.

### *5.2. Tuning Process*

This section presents the tuning procedure results, including thresholds derived to categorize the sub-images into light level groups, as well as the recursive stop condition that achieved the best result for each database.

### 5.2.1. Light Level Distribution

The light level distribution was computed for each database (Figure 12) along with T1 and T2 (Table 2). The variation in the light distributions between the different databases is described in Table 3. The variance of light in the grape database was significantly higher than in both the apple and pepper databases, with the pepper database being significantly darker and more highly skewed. Therefore, the selected T1 and T2 differed significantly for each database, underlining the importance of the tuning procedure.

**Table 2.** T1 and T2 values determined for each database.


**Table 3.** Descriptive statistics of the different light distributions.


**Figure 12.** Light level distribution computed for each database.

### 5.2.2. Stop Splitting Condition

Using a low STD value as a stop condition increased performance (Figure 13), since smaller sub-images contain fewer illumination differences. However, very small STD values can also create sub-images that are too small, which may not contain both fruit and background pixels in the same frame; in these cases, the algorithm cannot learn a threshold that differentiates between them. Additionally, the results revealed that with high STD values, performance remained constant, since beyond a certain value the algorithm did not split the image even once.

As part of the parameter tuning process, the STD value was selected by testing performance over a range of STD values [0, 100]. For each STD value, the algorithm ran five iterations in which it randomly selected P% of the images; from the selected images, 70% were used for training and 30% for testing. The final selected STD values are presented in Table 4 for each database and color space (using P = 30% and 50%).

**Figure 13.** F-score vs. increasing STD value as the stop condition for the recursive function on the apple DB.

### 5.2.3. Classification Rule Direction

As shown in Table 5, the direction of the classification rule in the thresholding process can be different for each color dimension; therefore, this must be learned as part of the tuning procedure.


**Table 4.** STD value chosen for each database and color space. D, Direction.

### *5.3. Color Space Analyses*

In this section, the algorithm performance results are presented for each color space, followed by a table presenting the best color space performances, including the performances for all combinations of color space dimensions.

### 5.3.1. Apples

Results (Figure 14) revealed that the NDI and LAB color spaces achieved similar best performances. Table 5 shows the performances for each dimension in the NDI color space and the performances when using the intersections between them. The first NDI dimension (see Equation (1)) represents the difference between the red and green colors in the image. The objects in this database were red apples, and most of the background was green leaves; therefore, as expected, the first NDI dimension obtained the best F-score of 93.17%. In the LAB color space, the results (Table 5) revealed that the second dimension (a\*) yielded the best F-score of 93.19%.

**Figure 14.** Color space performances for (**top**) apples (**left**), grapes (**right**), peppers (**bottom**) with high visibility (**left**), and peppers with low visibility (**right**).

### 5.3.2. Grapes

The NDI color space obtained the best result for grapes (Figure 14) with an F-score of 73.52%. The second-best color space was the LAB with an F-score of 62.54%. The best NDI results were obtained using the second dimension (Table 5).

### 5.3.3. Peppers

High visibility: Figure 14 indicates that the HSI color space obtained the best results with relatively low FPR (0.81%) and very high TPR (99.43%), resulting in a high F-score (99.31%). The second-best color space was NDI with FPR = 2.48% and TPR = 97.96% (F = 97.72%). The best HSI result was obtained using the combination of the first and the second dimensions (Table 5).

Including low visibility: Figure 14 indicates that the NDI color space obtained the best results, with relatively low FPR (5.24%) and very high TPR (95.93%), resulting in a high F-score (95.19%). Although HSI obtained the best performance for the "high-visibility" peppers, NDI showed better results when detecting peppers in dark, less visible areas. The best NDI result was obtained using the intersection between the first and second dimensions (Table 5).


**Table 5.** Performances of each color space for each dimension and intersection for all datasets.

### *5.4. Sensitivity Analysis*

### 5.4.1. Noise

Analysis showed that the algorithm was robust to noise of up to 15% in the apple and pepper databases (Figure 15). The grape images were more sensitive to noise, and performance dropped when noise values of 5% were added. Although better F-score values were obtained with NDI and HSI for grapes and peppers, the LAB color space yielded more robust performance when noise was added to the images.

**Figure 15.** Sensitivity analysis: adding noise to the image.

### 5.4.2. Thresholds Learned in the Training Process

As expected, TPR decreased when the threshold values were changed. The algorithm was relatively robust to threshold changes for apples and peppers. Performance on the grape images was more sensitive to threshold changes and yielded a significant decrease in TPR when the threshold value was increased (Table 6).


**Table 6.** Threshold values changed by ±5%, ±10%, and ±15% according to the threshold in each region.

### 5.4.3. Stop Condition

The algorithm was more robust for the apple and pepper images than for the grape images (Figure 16).

**Figure 16.** Sensitivity analysis: adding noise to STD stop condition.

### 5.4.4. Training/Testing

The expectation was that more training images would lead to better performance until overfitting occurred. There was a clear increase in TPR; however, FPR increased as well at 80% and 90% training (Table 7).

The tuning process resulted (Table 8) in increased performance for both the grape and pepper databases, with 6.22% and 0.84% increases, respectively. The results for the apple database were similar, with only a 0.1% increase, as expected (since this database was similar to the one from which the previous parameters were derived).


**Table 7.** Performances vs. different % images database as the training set.

### *5.5. Morphological Operations*

The morphological operations increased the F-score by 2.85%, 8.59%, and 2.71% for the apple, grape, and pepper databases, respectively (Figure 17).


**Table 8.** Parameter tuning contribution to algorithm performances.

**Figure 17.** Contribution of the morphological operations to the F-score.

### **6. Conclusions and Future Work**

The algorithm successfully detected apples and peppers (Table 1) under variable lighting conditions, resulting in F-scores of 93.17% and 99.31%, respectively, which, to the best of our knowledge, are among the best detection rates achieved to date in fruit detection. The average F-score across all datasets was 88.8% (Table 1). Among previously reported results, the lowest F-score (65) was achieved with the method of [11] for red and green pepper plants, while the highest (96.4) was obtained for oranges with the method of [24]. Previously reported results (Table 1) revealed F-scores of 91.5 and 92.6 for peppers in [19,34], respectively, versus our method, which resulted in an F-score of 99.43. For apple images, our method obtained F-score performances similar to previous work (~93), even though the dataset was much smaller (9 vs. 64 images).

The high F-score was mostly due to low FPR values (except for grapes). In addition, our method achieved high performances using a relatively small dataset.

The algorithm produced less impressive results on the grape database (73.52%) due to the difficulty of differentiating between green fruits and a green background (leaves). In this case, additional features (e.g., morphological operations fitted for grapes; see [28]) should be used to increase performance; however, this requires the development of specially-tailored features. It is important to note that these results cannot be compared to the weed detection results presented in Table 1, since the background of the green objects there was the ground on which they grew and not green leaves. Different color spaces yielded the best results for each fruit variety, implying that the color space must be analyzed and fitted to the specific fruit. The LAB color space was more robust to noise in images and hence should be used when images are of low quality. The algorithm was robust to changes in the thresholds learned by the training process and to noise effects in the images. Morphological operations such as erosion and dilation can improve performance in agricultural images and hence should be utilized. The tuning process developed in this paper enabled the previous algorithm [30] to adapt automatically to changing conditions/objectives (i.e., to detect other fruits with different colors and other outdoor conditions) and hence should be used for improved target detection in highly variable illumination conditions. Finally, this work has demonstrated the feasibility of color-based algorithms for solving challenges that advanced machine learning algorithms face, such as small training sets (a small number of images and/or a small number of fruits per image). This work has shown that for challenging color conditions (e.g., green on green for grapes), additional features should be considered for improved fruit detection.

**Author Contributions:** E.Z.: Formal algorithm development and analysis, Investigation, Methodology, Software, Writing—Original draft, Writing—review & editing; P.K.: Formal analysis, Methodology, Supervision, Validation; Writing—review & editing; Y.E.: Methodology, Supervision, Validation, Writing—original draft; Writing—review & editing.

**Funding:** This research was partially supported by the European Commission (SWEEPER GA No. 664313) and by Ben-Gurion University of the Negev through the Helmsley Charitable Trust, the Agricultural, Biological and Cognitive Robotics Initiative, the Marcus Endowment Fund, and the Rabbi W. Gunther Plaut Chair in Manufacturing Engineering.

**Conflicts of Interest:** The authors declare no conflict of interest.

### **References**


© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

