1. Introduction
Digital models allow information gathering about crop status for agricultural management or breeding programmes. The use of high-throughput plant phenotyping has a great impact on plant characterization and plant selection processes [
1]. Digital plant models can be objective tools for quantifying plant characteristics that avoid the subjective differences inherent in human judgement. Phenological traits can be accurately measured, expediting the decision-making processes [
2]. Thus, plant models can measure and characterize complex plant shapes, providing essential information to plant breeding programmes that is necessary for modifying traits related to physiology, architecture, stress or agronomical management [
3]. Plant modeling makes information, such as plant treatment or growth assessments, more accessible to agricultural managers, therefore providing managers with a detailed and comprehensive understanding of plant development throughout the life cycle. In addition to plant breeding, digital models can be used to help growers in disease detection, yield estimation or biomass production, to characterize fruit quality, discriminate between weeds and crop, to correlate with the photosynthetic activity, etc. Although the creation of digital models allows for a better understanding of the internal processes in plant growing, they require technological developments for sensing and capturing purposes. New sensors and procedures are required to achieve these objectives. Electronic devices can rapidly characterize crops in a non-destructive, accurate and repeatable manner. Most of the sensing technologies used are based on two-dimensional characterization, from visible imagery to thermal, multispectral imaging or fluorescence sensors [
4]. Fluorescence sensors are typically used for the extraction of parameters related to leaf composition; they expose plants to a specific wavelength of visible or ultraviolet (UV) light, and the emitted fluorescent radiation is measured and the calculated values are related with nitrogen content, plant stress, etc., due to the metabolic changes occurring in the chloroplast. This method is typically used for measuring the response to induced stress [
5]. In similar applications, spectral reflectance sensors are commonly used for nitrogen assessment or to separate plants from the ground by distinguishing the values of different spectral bands [
6]. In this way, it is possible to combine these spectral values to quantify crop aspects such as leaf area index (LAI), biomass or expected yield. Thermal imaging is generally used for the monitoring of water stress by measuring the leaf or canopy temperature [
7] whereby the measured values are related to the evapotranspiration capacity of the plants and the water availability, in order to manage precise irrigation systems. Visible imaging methods are the most extensively developed; they are widely used and have been improved in recent decades [
8]. These methods are fast, reliable and simple ways to describe plant dynamics using high-resolution, low-cost cameras. However, some problems arise when these methods are used in complex environments, such as uncontrolled illumination, the presence of shadows or overlapping leaves leading to information loss, which may affect the accuracy of the measurements [
9]. Although using red-green-blue (RGB) cameras is a common method of plant phenotyping, the details included in a planar image only provide a limited view of the scene. However, these cameras are widely used for phenology monitoring, nitrogen application, yield monitoring and weed discrimination [
10].
The aforementioned sensors only provide information in a two-dimensional plane while other characteristics of the third dimension are hidden. The creation of 3D models by distance sensors is opening new opportunities in phenotyping processes by increasing efficiency and accuracy [
11]. During the last two decades, distance sensors, which measure the distance from the sensor to an object, have become widely used in agricultural research. Distance sensors can describe plant morphology with high accuracy, and 3D models can be created by adding a third dimension through the movement and spatial displacement of the sensor. The main distance sensors that are available for many operations and that can supply data suitable for creating 3D models for plant reconstruction are ultrasonic and LiDAR (light detection and ranging) sensors. However, these two types of sensors are completely different with regard to their mechanism, cost, field of view and accuracy. An ultrasonic sensor measures distance by means of an echo; sound waves are transmitted and return to their source as an echo after striking an obstacle. Since the speed of sound is known and the travelling time can be measured, the distance can then be calculated. This methodology has been used for different agronomical tasks, such as crop characterization, fertilizer application or weed discrimination [
12]. Although the use of ultrasonic devices for canopy identification is commonly accepted as a good method for tree characterization and fertilization, their limited resolution in combination with their wide field of view makes them unsuitable for precise plant description. On the other hand, LiDAR sensors are able to provide 2D or 3D plant models by displacing the sensor along a path and storing the sensor’s relative position. LiDAR sensors fire rapid pulses of laser light at a surface and measure the amount of time it takes for each pulse to bounce back; this time is then converted to distance. The accuracy of LiDAR is higher than that of ultrasonic sensors, and they have been widely used for tree canopy description [
13] and, more recently, in breeding programmes to create high-resolution 3D models. LiDAR sensors are robust and reliable and they allow scanning at high frequency and over large distances. Similar systems, such as radar systems [
14], hemispherical photography [
15], and magnetic resonance and X-ray visualization, have been studied for plant morphology characterization [
16].
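The time-of-flight principle shared by the ultrasonic and LiDAR sensors described above can be illustrated with a minimal sketch. The function name and the constant for the speed of sound (dry air at roughly 20 °C) are assumptions made for illustration only; they do not correspond to any particular sensor or to the processing used in this study.

```python
# Illustrative sketch of time-of-flight ranging (not the authors' code).
SPEED_OF_SOUND_M_S = 343.0          # assumed value: dry air at about 20 degrees C
SPEED_OF_LIGHT_M_S = 299_792_458.0  # speed of light in vacuum


def time_of_flight_distance(round_trip_s, wave_speed_m_s):
    """Return the one-way distance (m) for a measured round-trip time (s)."""
    if round_trip_s < 0:
        raise ValueError("round-trip time must be non-negative")
    # The pulse travels to the target and back, so the path is halved.
    return wave_speed_m_s * round_trip_s / 2.0


if __name__ == "__main__":
    # Hypothetical readings: a 10 ms ultrasonic echo and a 20 ns laser return.
    print(time_of_flight_distance(0.010, SPEED_OF_SOUND_M_S))
    print(time_of_flight_distance(20e-9, SPEED_OF_LIGHT_M_S))
```

The same relation applies to both sensor types; only the propagation speed and the timing resolution required differ.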
Although the aforementioned methods can accurately describe plant shape, they are time-consuming and lack colour information, which, in many cases, limits their application. While colour can be added to the scene by means of sensor fusion, this combination of the traditional sensing devices with RGB information from a camera is technically challenging and remains limited. The emergence of new sensors in the market coming from different sectors opened new modes of plant description. Initially distributed mainly by the gaming industry, today there are a large number of commercially available RGB-Depth cameras. The more advanced ones combine more than one sensor like the Multisense from Carnegie Robotics (Carnegie Robotics, LLC, Pittsburgh, Pennsylvania) which combines lasers, depth cameras and stereoscopic vision; the ZED Stereo Camera (Stereolabs Inc, San Francisco, CA, USA) which also provides depth information; or the DUO MLX sensor (Code Laboratories Inc, Henderson, NV, USA) which uses a stereoscopic camera to provide depth information.
However, Microsoft Kinect v2® (Microsoft Corp., Redmond, WA, USA) is the primary and most widely known sensor in this market and has been widely used for plant characterization in agriculture. For instance, the maximum diameter, volume and density of sweet onions were estimated with an accuracy of 96% [
17]. Paulus, et al. [
18] evaluated the possibilities of RGB-D cameras in comparison with a high-precision laser scanner to measure the taproots of sugar beet, the leaves of sugar beets and the shape of wheat ears. These authors concluded that the low-cost sensors could replace an expensive laser scanner in many of the plant phenotyping scenarios. Andújar, et al. [
19] reconstructed trees at different wind speeds to estimate the LAI and the tree volume; the resolution potentials of this sensor from null wind speed to high wind speeds were successfully demonstrated as the 3D models were properly reconstructed at high wind speeds. Far from being used only for plant characterization, a Kinect sensor could be used to estimate plant health. The NDVI (normalized difference vegetation index) can be calculated by using the values of near-infrared (NIR) pixels and red pixels for plant health monitoring [
20]. Consequently, new applications for sensors, unlike those traditionally used in plant phenotyping, are emerging. However, there are still some unresolved issues, such as a proper reconstruction of the end-details required in some phenotyping programmes. In this regard, a new approach has emerged in recent years; photogrammetry using low-cost RGB cameras is having a high impact on phenotyping processes. This method, also known as stereo vision, is relatively inexpensive but allows the acquisition of a high level of detail. By examining the relative positions of objects in the two planes, similar to human vision, 3D information can be extracted. The 3D scene can be reconstructed either by a single camera or by multiple cameras shooting around the object of interest. For model reconstruction, some parameters should be fixed prior to image acquisition; for example, the distance from the camera to the target should be established based on the camera’s focal length in order to reach a proper overlap between images according to the reconstruction algorithm. This principle has already been used for plant reconstruction to extract information regarding plant height, LAI and leaf position [
21]. In addition, a combination of cameras and images from Structure from Motion (SfM) creates high-fidelity models wherein a dense point cloud is created by means of a growing region [
22]. Similarly, the multi-view stereo (MVS) technique creates dense 3D models by fusing images [
23]. Nevertheless, there are still issues that should be improved and considered during the reconstruction processes. The boundaries and limitations of the methods must be clearly stated before they are used. Some aspects that need to be improved, such as the accurate reconstruction of end-details, should be well known before applying reconstruction techniques to more detailed applications. In addition, these low-cost methods for plant reconstruction are opening up possibilities for the easy use of a single camera or a budget device without expensive additional equipment for plant model reconstruction.
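The NDVI mentioned earlier in this paragraph is computed from near-infrared and red pixel values with the standard formula NDVI = (NIR − Red) / (NIR + Red). The following minimal sketch applies it pixel by pixel; the function names, the nested-list image layout and the convention of returning 0 where both bands are zero are assumptions made for illustration, not the procedure of the cited work.

```python
# Illustrative NDVI sketch (hypothetical helper names, not from the cited study).

def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed convention for pixels with no signal in either band
    return (nir - red) / total


def ndvi_image(nir_band, red_band):
    """Apply NDVI to two equally sized bands stored as lists of rows."""
    return [
        [ndvi(n, r) for n, r in zip(nir_row, red_row)]
        for nir_row, red_row in zip(nir_band, red_band)
    ]


if __name__ == "__main__":
    # Hypothetical reflectance values: one vegetation-like and one soil-like pixel.
    print(ndvi_image([[0.5, 0.2]], [[0.1, 0.2]]))
```

Values close to 1 indicate strong near-infrared reflectance relative to red, which is typical of healthy vegetation, whereas values near 0 are typical of bare soil.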
Thus, research into new methods and modes that employ depth for 3D plant reconstruction for plant phenotyping has a strong impact and is still of great interest to the sector, in the search for inexpensive equipment that can perform high-throughput field phenotyping with adequate precision. Furthermore, the use of 3D structural models can improve the decision-making processes. The objective of this study was to assess the combination of some of the most novel methods and sensors for plant reconstruction and to compare them to the planar RGB method. The capacities of RGB-D cameras and photogrammetry for 3D model reconstruction of three crops at different stages of development were compared. The crops selected to be characterized had different shapes; they were maize, sugar beet and sunflower. We specifically explored the capabilities and limitations of the different principles and systems in terms of accuracy to extract various parameters related to the morphology of these crops. The methodology compared the accuracy and the capacities of self-developed algorithms to process the point clouds for solid model construction.
3. Results
The plant reconstruction method created different models (
Figure 2), which were similar to reality; three-dimensional models were properly reconstructed, lacking only small details. These errors varied slightly between the two systems: the Kinect v2-based measurements were less precise than the photogrammetry measurements. However, in all of the models, the errors appeared at the ends of leaves and at branch borders. Although the final end-details were not properly reconstructed, the reconstruction displayed high fidelity, as shown by the statistical analysis. The parameters measured using the modeling systems showed a high correlation with the LA and dry biomass ground truth data. Indeed, strong consistency in the linear correlation equations was observed between the parameters estimated by the digital 3D model and their actual measured values. Regarding 2D RGB image processing, the results differed from those of the three-dimensional systems, as their precision was lower in every case. In addition, the number of leaves was always accurately estimated by the three-dimensional methods, while the 2D method always underestimated the number of leaves due to the overlapping of upper leaves. No information was missing in the 3D reconstruction. Every leaf appeared reconstructed in the digital models with only small differences in LA estimation, as shown in
Figure 2. Although the general values were similar between both three-dimensional methods, small differences arose when the sampled crops were analyzed independently. Regarding the 2D method, dicots showed less underestimation of the number of leaves. Small leaves at the bottom were occluded by the top ones. Only one to two leaves were occluded on sugar beet and sunflower plants. In maize plants, due to the parallelism between leaves, the number of occluded leaves varied from 0 (small plants) to a maximum of four true leaves in the case of those plants at BBCH 19. These slight differences can be due to manual movement of the camera and areas becoming hidden while scanning [
31]. Therefore, fine details can be improved by increasing the sampling size or reducing the distance between the camera and the target plant. However, this modification in the acquisition process would increase the time required for data acquisition and the data processing.
While separate analyses concerning each crop showed differences between them, the sampling methodology (SfM, RGB-D or planar RGB) was the main factor in terms of accuracy of the constructed models (
Table 1). The average height of the maize plant was 61.17 cm with an R
2 = 0.998 between the actual height and the measured height for both the Kinect-based calculations and the photogrammetry reconstruction methods at a significance of P < 0.01. Similarly, the RMSE values of 7.58 and 4.48 and the MAPE values of 4.34% and 3.08% for each method, respectively, confirmed the accuracy, as they indicate little deviation of the values estimated by the model from the actual values. Varied results were found when considering every crop individually; the models of the dicot crops were more similar to the real plants than those of the monocot crop (maize). In addition, the plant growth stage influenced the results obtained, with smaller plants showing slightly higher differences from the ground truth data. Considering the dicotyledonous crops (sunflower and sugar beet), the accuracy improved with respect to the maize models; the leaf shape allowed for better reconstruction, leading to better results. The actual height of sunflower and sugar beet averaged 34.67 cm and 21.28 cm, respectively. The photogrammetry and Kinect v2 models in sunflower plants generated underestimates of 1.3 cm and 0.94 cm, respectively, which was similar for sugar beet plants wherein underestimates of 0.91 cm for the maximum height in both crops were calculated for the 3D models. Nevertheless, regression analysis showed a strong correlation (R
2 = 0.998 and R
2 = 0.999) with extracted values from the model (
Figure 3). The RMSE values of both crops exhibited a similar tendency, with lower values than those of maize plants for both methodologies. The comparison of sugar beet models with the actual values reached an RMSE of 0.81 cm for the photogrammetric method, and a similar value was obtained for sunflower, for which the difference from ground truth showed an RMSE of 0.89 cm and a MAPE of 2.7%. When considering the use of Kinect v2, the calculated values of the RMSE and MAPE showed a similar tendency; these parameters showed that maize plant models had higher error than those of dicot crops. However, the Kinect v2 models showed slightly higher errors than those created by photogrammetry (
Table 1). The underestimation of maximum height in maize plants can be explained by the difficulty of identifying the end of the plant stem. Although this point is clear for dicot species, it remains unclear for maize plants with regard to their leaves and stem. While the measurements were conducted manually in the models, the use of an automated approach of colour segmentation followed by a definition of the region of interest defined by the stem through a circular Hough transform could improve the results [
23]. Although the use of this automated process would increase the accuracy of height measurements, the current level of detail can be enough for most phenotyping purposes; the capability of both methods to determine height has been demonstrated. While the acquisition and post-processing time is higher in the case of photogrammetry, slightly better results were obtained in terms of determining plant height.
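The RMSE and MAPE statistics reported above follow their standard definitions and can be sketched as below. The function names and the example heights are hypothetical and chosen only for illustration; they are not measurements from this study.

```python
# Illustrative sketch of the error statistics (hypothetical data, not study values).
import math


def rmse(actual, estimated):
    """Root mean square error between ground truth and model estimates."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(a - e) ** 2 for a, e in zip(actual, estimated)]
    return math.sqrt(sum(squared) / len(squared))


def mape(actual, estimated):
    """Mean absolute percentage error (%); actual values must be non-zero."""
    if len(actual) != len(estimated) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    relative = [abs((a - e) / a) for a, e in zip(actual, estimated)]
    return 100.0 * sum(relative) / len(relative)


if __name__ == "__main__":
    # Hypothetical plant heights (cm): manual measurement vs. 3D model.
    measured = [60.0, 35.0, 21.0]
    modelled = [58.5, 34.0, 20.5]
    print(rmse(measured, modelled), mape(measured, modelled))
```

The RMSE is expressed in the units of the measured trait, whereas the MAPE is relative, which makes it easier to compare crops of different size.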
Regarding the comparison with ground truth data, the estimation of LA using three-dimensional models was in good agreement with reality. The RGB models, which were 2D-based, reached low values of agreement due to leaf overlapping. The actual averaged values of LA were 989.11 cm
2 for maize, 657.44 cm
2 for sugar beet and 495.22 cm
2 for sunflower plants; planar models underestimated these values, and the correlations were not significant. Thus, the use of 2D images for LA determination cannot be considered an accurate method. When maize plants were modelled, the values extracted from the model averaged an LA of 868.3 cm
2 and 954.4 cm
2 for the Kinect-based system and the photogrammetry reconstruction methods, respectively. The correlation between actual and model values was significant at P < 0.01 with R
2 = 0.997 and R
2 = 0.995, respectively (
Figure 4). The statistical parameters RMSE and MAPE showed high accuracy in model fit compared to the actual LA. The RMSE values were 4393.4 cm
2 and 22267 cm
2, and MAPE values revealed a deviation of 3.88% and 11.66% for photogrammetry and Kinect v2, respectively. Similar to the aforementioned height values, the dicot crops (sugar beet and sunflower) were modelled with higher accuracy with respect to LA (
Table 2). The MAPE values showed a deviation of approximately 1% when plants were modelled with photogrammetry and even higher for Kinect v2 models. The regression models were always significant at P < 0.01 (
Figure 4) with low deviations. Similar values were obtained when correlating LA with the dry biomass (
Figure 5).
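The linear correlations reported in this section (R2 between model-derived and actual values) can be reproduced in principle with an ordinary least-squares fit. The sketch below is a generic implementation under that assumption; the function name and the example data are hypothetical and the source does not specify the statistical software actually used.

```python
# Illustrative least-squares fit with coefficient of determination
# (hypothetical data; not the statistical procedure of the study).

def linear_fit_r2(x, y):
    """Fit y = intercept + slope * x and return (slope, intercept, r_squared)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equally sized samples with at least 2 points")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0:
        raise ValueError("x values must not all be identical")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot if ss_tot != 0 else 1.0
    return slope, intercept, r_squared


if __name__ == "__main__":
    # Hypothetical leaf areas (cm^2): model estimate vs. ground truth.
    model_la = [450.0, 640.0, 950.0]
    actual_la = [495.0, 657.0, 989.0]
    print(linear_fit_r2(model_la, actual_la))
```

A slope close to 1 together with a high R2 would indicate that the model values track the ground truth with little systematic bias.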
Although there are differences between the studied methods, both of the methods are consistent and generate significant values. The photogrammetry models attained higher fidelity, as they have a better capacity of reconstructing end-details and thinner stems; this effect was more pronounced in maize plants due to their elongated shape. Similar results were obtained by Andújar, et al. [
23] when modeling weed plants. Our study showed that there are still some technical aspects to be improved, such as an accurate reconstruction of the end-details, mainly for monocot plants. Additionally, the underestimation of height could be corrected with the automated recognition of the region of interest by image-processing software in combination with the reconstruction algorithm, for instance through colour identification. The use of SfM can be combined with other algorithms for automatic recognition, helping breeders to make fast decisions. However, the method requires an automatic system for image acquisition and processing, since the time–cost relationship is not suitable for large scanning areas. Although the current process is suitable for research purposes, it needs to be integrated into autonomous systems to reduce the time cost and the use of human labor during the image acquisition process. Rose, et al. [
30], using a similar approach of SfM- and MVS-based photogrammetric methods, created models at high resolution in lab conditions. These authors reported missing parts and triangulation errors at the leaf and branch borders, which are similar to those found in the current study. However, the analysis of different crop species has led to differing results mainly because overestimation is due to leaf borders and stem borders. Thus, thinner plants, such as monocots, would lead to more inaccurate results. This effect was also observed for all methods studied; although there was a lower level of detail, the methods revealed similar effects, usually showing differences regarding plant shape. The close-up laser scanning method has shown a high accuracy in point cloud reconstruction in combination with the steps needed to create a 3D model; this method is fast and attains a high level of detail. However, investment costs are higher and the use of this method requires calibration and warming-up [
32]. The use of RGB images in phenotyping processes considerably reduces the acquisition cost. Imagery systems are the most common systems and, although the method has been widely adopted, the effect of occlusions is challenging and the usage of 2D phenotyping could be limited in some specific scenarios [
33]. Thus, three-dimensional modeling is rapidly expanding, as the higher accuracy of these models leads to better results in plant breeding programmes [
34] or in decision-making processes in agriculture [
32]. The computational power and the availability of new low-cost sensors favour this development. The two studied methods for 3D reconstruction are both low-budget techniques, which reach a high level of detail. Although photogrammetry reconstruction leads to more accurate models than RGB-D, its acquisition and processing require more time. This procedure has shown similar results in other studies; previous approaches using three-dimensional modeling were based on stereovision to reconstruct maize plants, whereby accurate estimations of leaf position and orientation and of the leaf area distribution were undertaken [
35]. Additionally, close-range photogrammetry has been used to calculate similar parameters, such as LAI [
36], which allows managing the crop needs of pesticide applications by using differential sprayers [
37]. The SfM improved the acquisition time by calculating the position of the camera in relation to stereovision [
38]. Similarly to our results, a more expensive high-throughput stereo-imaging system for 3D reconstruction of the canopy structure in oilseed rape seedlings showed the valuable options of the SfM for plant breeding programmes [
39]. In addition, the easy mode of operation allows the SfM to be implemented in on-field applications. Andújar, et al. [
23] reconstructed weed plants by the SfM and MVS to create 3D models of species with contrasting shapes and plant structures. The models showed similar conclusions with good consistency in the correlation equations, as dicot models were more accurate than monocots. Indeed, using a fixed camera position with a tripod increases accuracy. Santos and Oliveira [
40] reconstructed basil plants using the SfM; however, these authors concluded that the method was not suitable for very dense canopies. Another study using MVS-modelled plants provided a good representation of the real scene but did not fully reconstruct branch and stem details [
41]. Every studied case concluded that measuring leaves and stem is possible through different photogrammetry processes. Completely reconstructing small leaves and thin stems to reach a high level of accuracy requires costly and time-consuming procedures, and errors are mostly concentrated on small leaves and stem borders. Increasing the number of images, lowering the distance from the camera and filling holes in the mesh by manual or automatic processes would lead to better models; these processes, however, will increase the cost of model creation.
A compromise between cost and accuracy must be reached. While the cost of photogrammetry processes is low when the SfM is applied, generating more detailed models would increase time and cost. In this respect, RGB-D cameras have shown lower acquisition and model processing times. However, the terminal parts and small details were not properly reconstructed. Thus, the use of this type of camera can be suitable for on-line methods which require fast decisions. The obtained results showed a slighter underestimation of the calculated values than the photogrammetry procedure; however, the acquisition time was much lower. Thus, its use in field applications or situations in which high detail is not demanded, such as agronomical plant management at field level, can widely reduce the applied inputs, i.e., the use of these models can be helpful to adapt the use of agrochemicals to the calculated plant volume or LA. The low cost of the procedure and the sensing device could help farmers and machinery developers to develop site-specific applications through this method. On the other hand, whereas the SfM requires almost 40 images per plant along a concentric track to create a proper model, one second of acquisition time was enough to reconstruct plant shape when Kinect v2 measurements were taken. Thus, the models created with this principle lead to a much faster process of data acquisition and processing. However, SfM is a suitable budget solution for breeders demanding a low-cost system for decision-making processes. The possibilities of the use of depth cameras such as Kinect v2 have been previously tested in other scenarios with similar results. The scanning of lettuce plants showed that Kinect-measured height and projected area have fine linear relationships with actual values [
42]. Similarly, Hämmerle and Höfle [
43] derived crop heights directly from data of full-grown maize and compared the results with LiDAR measurements. The obtained results showed similar values to our study with a general underestimation of crop height, and the combination of more point clouds of Kinect v2 would increase accuracy. However, this approach will increase the acquisition and processing time, and the combination of point clouds would require online approaches for registration [
44]. Thus, an improved solution must be developed to reach the final target. The use of the close-range photogrammetry method could offer a budget solution in breeding programmes or on-field mapping when decision-making processes need not be quickly undertaken. On the other hand, the current algorithms for RGB-D processing allow Kinect v2 measurements to rapidly create models which can be used in phenotyping scenarios when high resolution is not demanded or for on-field applications.