*Article* **Automatic Segmentation of** *Mauritia flexuosa* **in Unmanned Aerial Vehicle (UAV) Imagery Using Deep Learning**

#### **Giorgio Morales \*, Guillermo Kemper, Grace Sevillano, Daniel Arteaga, Ivan Ortega and Joel Telles**

National Institute of Research and Training in Telecommunications (INICTEL-UNI), National University of Engineering, Lima 15021, Peru; guillermo.kemper@gmail.com (G.K.); ksevillano@uni.pe (G.S.); darteaga@inictel-uni.edu.pe (D.A.); ivan.ortega91@gmail.com (I.O.); jtelles@inictel-uni.edu.pe (J.T.)

**\*** Correspondence: gmorales@inictel-uni.edu.pe

Received: 30 October 2018; Accepted: 23 November 2018; Published: 26 November 2018

**Abstract:** One of the most important ecosystems in the Amazon rainforest is the *Mauritia flexuosa* swamp or "aguajal". However, deforestation of its dominant species, the *Mauritia flexuosa* palm, also known as "aguaje", is a common issue, and conservation is poorly monitored because of the difficult access to these swamps. The contribution of this paper is twofold: the presentation of a dataset called MauFlex, and the proposal of a segmentation and measurement method for areas covered in *Mauritia flexuosa* palms using high-resolution aerial images acquired by UAVs. The method performs a semantic segmentation of *Mauritia flexuosa* using an end-to-end trainable Convolutional Neural Network (CNN) based on the Deeplab v3+ architecture. Images were acquired under different environment and light conditions using three different RGB cameras. The MauFlex dataset was created from these images and it consists of 25,248 image patches of 512 × 512 pixels and their respective ground truth masks. The results over the test set achieved an accuracy of 98.143%, specificity of 96.599%, and sensitivity of 95.556%. It is shown that our method is able not only to detect full-grown isolated *Mauritia flexuosa* palms, but also young palms or palms partially covered by other types of vegetation.

**Keywords:** *Mauritia flexuosa*; semantic segmentation; end-to-end learning; convolutional neural network; forest inventory

#### **1. Introduction**

The *Mauritia flexuosa* L. palm is the main species of one of the most remarkable ecosystems of the Amazon rainforest: the *Mauritia flexuosa* swamp, also known as "aguajal" [1–3]. Its importance is not only ecological but also social and economic. It is the ecosystem with the greatest carbon dioxide absorption capacity in the Amazon [4,5] and it is the habitat of a wide range of fauna [1]. In addition, due to the high demand for the *Mauritia flexuosa* fruit and its derivatives, this species is a key economic engine for indigenous populations and contributes to their economic and social development [3,6]. Unfortunately, in spite of stringent government efforts to control deforestation, cutting down *M. flexuosa* palm trees to harvest their fruits is a common activity [1]. For harvested trees, the proportion that is cut down versus climbed is unknown; multidisciplinary studies of species population assessment and extraction locations would therefore help to target conservation and management efforts in communities that are hot-spots for extraction [7,8].

Recently, there has been a drastic increase in the use of Unmanned Aerial Vehicles (UAVs) for forest applications due to their low cost, automation capabilities, and the fact that they can support different types of payloads, e.g., RGB or multispectral cameras, LiDAR (Light Detection and Ranging), radar, etc. For instance, UAV photogrammetric data has been used to rapidly detect tree stumps or coniferous seedlings in replanted forest harvest areas using basic image processing and machine learning techniques [9,10]. Similarly, UAVs have been used to tackle the problem of tree detection from many perspectives. For example, LiDAR-based methods model the 3D shape of trees for detection with accuracy values ranging from 86% to 98% [11,12]; however, the high cost of LiDAR for UAVs represents an important limitation. The same limitation applies to hyperspectral-based methods, such as [13], which uses a hyperspectral frame-format camera and an RGB camera along with 3D modeling and Multilayer Perceptron (MLP) neural networks, and obtains accuracy values ranging from 40% to 95% depending on the conditions of the area. Following the idea of exploiting the 3D shape of trees, some methods perform tree detection from RGB images using generated Digital Surface Models (DSMs), Structure-from-Motion (SfM) or local-maxima-based algorithms on UAV-derived Canopy Height Models (CHMs) [14,15]. Nevertheless, the aforementioned methods are likely to show poor performance for trees with irregular canopies, trees in mixed-species forests, or trees that are partially occluded by taller trees.

There exist tree detection methods that use multispectral or RGB cameras and specific descriptors such as crown size, crown contour, foliage cover, foliage color and texture [16], while others rely on pixel-based classification techniques, such as calculating the Normalized Difference Vegetation Index (NDVI) and applying the Circular Hough Transform (CHT) and morphological operators to segment palm trees with an accuracy of 95% [17]. Other methods depend on object-based classification techniques; for example, the Random Forest algorithm on multispectral data achieved an accuracy of 78% [18], and a naive Bayesian network on high-resolution aerial orthophotos and ancillary data (Digital Elevation Models and forest maps) achieved an accuracy of 87% [19].

In recent years, the availability of large datasets and optimal computational resources has allowed for the development of different deep learning techniques, which have now become a benchmark for tackling computer vision problems such as object detection or segmentation. Nevertheless, to the best of our knowledge, few deep learning-based techniques have been proposed to solve the problem of tree detection in aerial images. For instance, the method in [20] used the AlexNet CNN (Convolutional Neural Network) architecture with a sliding window for palm tree detection and counting, obtaining an overall accuracy of 95% over QuickBird images with a spatial resolution of 2.4 m. Similarly, the method in [21] used a pre-trained CNN in combination with the YOLOv2 algorithm to detect Cohune palm trees (*Attalea cohune* C.), with an average precision of 79.5%, and deciduous trees, with an average precision of 67.3%. Furthermore, the method in [22] used Google's CNN Inception v3 with transfer learning and sliding windows to detect coconut trees with a precision of 71% and a recall of 93%. Finally, the method in [23] first segmented aerial forest images into individual tree crowns using the eCognition software and then trained the GoogLeNet model to classify seven tree types with an accuracy of 89%. It is worth mentioning that all of these methods are trained to classify visible tree crowns in the images but do not attempt to delineate or segment the tree crowns; as a consequence, if most of a tree crown is covered by taller trees, trained CNNs are not likely to detect it.

In this work, we present a new efficient method to semantically segment *Mauritia flexuosa* palm trees in aerial images acquired with RGB cameras mounted on Unmanned Aerial Vehicles (UAVs). Our aerial images of a *Mauritia flexuosa* swamp located south of the Peruvian city of Iquitos were obtained with three different cameras under different climate conditions. From these images, we created a publicly available dataset of 25,248 image patches of 512 × 512 pixels, each with its respective hand-drawn ground truth. With this dataset, we trained five state-of-the-art segmentation deep learning models and chose a model based on the Deeplab v3+ architecture [24], as it showed the best performance. The model was trained to detect and segment *Mauritia flexuosa* crowns at different growing stages and scales, even when only a small part of the crown was visible.

#### **2. Materials and Methods**

#### *2.1. Mauritia flexuosa*

The *Mauritia flexuosa* swamp, also known as "aguajal", is a swamp (humid forest ecosystem) in permanently flooded depressions. Although it is home to more than 500 flora species and 12 fauna species, its dominant species is the *Mauritia flexuosa* palm, also known as "aguaje", which is a palm tree that belongs to the family Arecaceae. In the adult stage, aguajes can grow up to 40 meters (131 feet) in height and 50 centimeters (1.6 feet) in trunk diameter; their leaves are large and form a rounded crown (Figure 1). Each palm tree has an average of eight clusters of fruit, and each cluster produces more than 700 oval-shaped drupes covered in dark red scales [1].

**Figure 1.** Aerial view of a *Mauritia flexuosa* palm.

The extent of *Mauritia flexuosa* swamps in the Peruvian Amazon rainforest is quite significant. An example is the Ucamara depression between the Ucayali and Marañón rivers, in the region of Loreto, whose capital is Iquitos. There, these swamps cover about four million hectares (10% of the region's surface) [3].

In addition to the economic (Iquitos City alone consumes up to 50 metric tons of aguaje a day) [1], social [3] and nutritional value [25] of this palm tree, its environmental importance is also to be highlighted: in 2010, the FAO Forestry Department stated that, for the evaluation period 2002–2008 in an area of 1,415,100 hectares of aguajales, 146,462,850 metric tons of carbon were stored in vegetation (103.5 t/ha) and 141,510,000 metric tons of carbon in soil (100 t/ha), which represents the greatest carbon absorption capacity of all ecosystems in the Amazonian rainforest [5].

Worryingly, cutting down these trees to harvest the aguaje fruit is affecting several populations of female *Mauritia flexuosa* palms. It is estimated that 17 million of these palms have been cut down in the surroundings of Iquitos to meet the demand of the city [1]. This has resulted in the disappearance of female individuals in accessible *Mauritia flexuosa* populations, thus affecting the food chains of such regions (given their key importance in the diet of the Amazonian fauna) and causing genetic erosion (since the best and most productive palms are cut down). For these reasons, these ecosystems should be properly and continuously monitored so that measures can be taken to prevent illegal logging and the disappearance of this important palm tree.

#### *2.2. Image Acquisition*

#### 2.2.1. Study Area

The study area consisted of two regions with different densities of *Mauritia flexuosa*. The one with the higher density was located in the surroundings of Lake Quistococha, south of Iquitos City. The other region was located next to the facilities of the Peruvian Amazon Research Institute (IIAP). Both areas are in Iquitos City, in Maynas Province. Figure 2 shows six orthomosaics corresponding to the regions above.

**Figure 2.** Study area in Iquitos City, Maynas Province, north of Peru.

#### 2.2.2. UAV Imagery

UAV imagery was collected over four years (2015, 2016, 2017 and 2018) under different atmospheric conditions. The flight crew consisted of two pilots and one spotter. We used three UAVs with different camera models, and thus acquired images with different characteristics. Further details are summarized in Table 1.


**Table 1.** Unmanned aerial vehicles (UAVs) and cameras specifications.

The Sony Nex-7 camera mounted on the Matrix-E UAV was manually configured: the ISO value was 200, the maximum aperture was *f/8*, and the shutter speed was 1/320. The settings of the SkyRanger and Mavic Pro cameras were set to automatic. Many of the images were acquired near midday under cloud-free conditions (Figure 3a); however, Iquitos is normally covered in large clouds, which is why we obtained some dark images of forest under shadows (Figure 3b). Some images were also acquired in the afternoon, when, due to the angle of incidence of the sun's rays, there were many shadows cast by tall trees (Figure 3c). Moreover, the images acquired with the SkyRanger camera showed a defect around the corners known as vignetting (Figure 3d). Finally, because we flew at different altitudes, we obtained Ground Sample Distances (GSDs) from 1.4 to 2.5 cm/pixel. In summary, we acquired images with different resolutions, white balance settings, light conditions and other defects; nevertheless, *Mauritia flexuosa* palms can still be recognized in them by any trained human.

**Figure 3.** Aerial images acquired by different UAVs. (**a**) Cloud-free region captured with a Sony Nex-7. (**b**) Shadowed region captured with a Sony Nex-7. (**c**) Aerial image acquired in the afternoon with a Sony Nex-7. (**d**) Aerial image captured by the Skyranger UAV with vignetting. (**e**) and (**f**) Aerial images captured by the Mavic Pro UAV.

#### 2.2.3. MauFlex Dataset

Among all the aerial images acquired over the last four years, we selected 96 of the most representative to create the dataset: 47 were acquired by the TurboAce UAV; 28, by the Mavic Pro UAV; and 21, by the SkyRanger UAV. Each image has a binary hand-drawn mask indicating the presence of *Mauritia flexuosa* palms in white. From these images, we extracted image patches of 512 × 512 pixels.

To analyze the images at different scales, the images captured by the TurboAce UAV were resized to 50% and 25% of their original size due to their high level of detail. In addition, we used data augmentation to increase the dataset size and prevent overfitting; thus, each patch was rotated 90°, 180° and 270° [26]. This is how we created the MauFlex dataset (see Supplementary Materials) [27], which is made up of 25,248 image patches, each one with its respective binary mask, as shown in Figure 4. We split the data into a training set (95%), a validation set (2.5%) and a test set (2.5%). These three sets are mutually independent.
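The rotation-based augmentation described above can be sketched in a few lines of numpy (a minimal illustrative sketch; the function name is ours, not from the MauFlex tooling):

```python
import numpy as np

def augment_patch(patch, mask):
    """Return the original patch/mask pair plus its 90-, 180- and
    270-degree rotations, i.e., 4 pairs per original patch."""
    pairs = []
    for k in range(4):  # k quarter-turns: 0, 90, 180, 270 degrees
        pairs.append((np.rot90(patch, k), np.rot90(mask, k)))
    return pairs

# Example: one 512 x 512 RGB patch and its binary ground-truth mask
patch = np.zeros((512, 512, 3), dtype=np.uint8)
mask = np.zeros((512, 512), dtype=np.uint8)
augmented = augment_patch(patch, mask)
print(len(augmented))  # 4
```

With this four-fold augmentation, 6,312 original patch/mask pairs yield the 25,248 pairs reported for the dataset.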

**Figure 4.** Samples of original images and ground truth masks from the MauFlex dataset.

#### *2.3. Proposed CNN for Segmentation*

We propose a semantic level segmentation of *Mauritia flexuosa* using a Convolutional Neural Network (CNN). The architecture of our network is based on the Deeplab v3+ architecture [24], which integrates an encoder, a spatial pyramid pooling module, and a decoder. Those modules use inverted residual units, atrous convolutions and atrous separable convolutions, which are briefly described below:

• Inverted residual unit: The main feature of a residual unit is the skip/shortcut between input and output, which allows the network to access earlier activations that were not modified by the convolution blocks, thus preventing network degradation problems, such as vanishing or exploding gradients, when the network is too deep [28]. Inverted residual units (IRUs) were first introduced in [29]; the main difference is that, instead of expanding the number of input channels and then shrinking them, an IRU expands the number of input channels using a 1 × 1 convolution, then applies a 3 × 3 depthwise convolution (the number of channels remains the same), and finally applies another 1 × 1 convolution that reduces the number of channels, as shown in Figure 5. The IRU shown in Figure 5 uses a batch normalization layer ("BN") and a Rectified Linear Unit layer with a maximum possible value of 6 ("ReLU6") after each convolution layer.

**Figure 5.** Inverted residual unit (IRU) used in our proposed network. It uses regular 1 × 1 convolutions ("Conv"), 3 × 3 depthwise convolutions, batch normalization ("BN") and Rectified Linear Unit activation with a maximum possible value of 6 ("ReLU6").

• Atrous convolution: Also known as dilated convolution, it is basically a convolution with upsampled filters [30]. Its advantage over convolutions with larger filters is that it enlarges the field of view of the filters without increasing the number of parameters [31]. Figure 6 shows how a convolution kernel with different dilation rates is applied to a channel. This allows for multi-scale aggregation.

**Figure 6.** Atrous convolution kernel (green) dilated with different rates.

• Atrous separable convolution: This is a depthwise convolution with atrous convolutions followed by a pointwise convolution [24]. The former performs an independent spatial atrous convolution over each channel of the input, and the latter combines the outputs of the previous operation using 1 × 1 convolutions. This arrangement effectively reduces the number of parameters and mathematical operations compared with a normal convolution.
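The parameter savings of separable convolutions and the enlarged field of view of atrous kernels can be verified with simple arithmetic (the channel counts below are illustrative choices of ours, not values from the paper):

```python
def conv_params(k, c_in, c_out):
    """Weights of a standard k x k convolution (biases omitted)."""
    return k * k * c_in * c_out

def separable_conv_params(k, c_in, c_out):
    """Depthwise k x k convolution plus 1 x 1 pointwise convolution."""
    return k * k * c_in + c_in * c_out

def effective_kernel_size(k, rate):
    """Field of view of a k x k kernel dilated with the given rate."""
    return k + (k - 1) * (rate - 1)

# Example: 3 x 3 kernel, 256 input and 256 output channels
standard = conv_params(3, 256, 256)             # 589,824 weights
separable = separable_conv_params(3, 256, 256)  # 67,840 weights
print(standard, separable)

# An atrous 3 x 3 kernel with rate 4 covers a 9 x 9 window
print(effective_kernel_size(3, 4))  # 9
```

In this example the separable variant needs roughly 8.7 times fewer weights than the standard convolution while covering the same spatial extent.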

#### *2.4. CNN Architecture*

As we stated before, our proposed architecture is similar to the Deeplab v3+ architecture [24]. Figure 7 shows our architecture and its three main modules: an encoder, an Atrous Spatial Pyramid Pooling (ASPP) module, and a decoder. The main difference from the original Deeplab v3+ network is the number of layers used.

**Figure 7.** The proposed network architecture. It uses regular convolutions ("CONV"), inverted residual units ("IRU") and atrous separable convolutions ("ASC").

The encoder is a feature extractor that uses several inverted residual units as a backbone and reduces the original size of the image by a factor of eight (*output stride* = 8). The ASPP module applies four parallel atrous separable convolutions with different dilation rates; this allows analyzing the extracted features at different scales. These outputs are concatenated and passed through a 1 × 1 convolution in order to reduce the number of channels. This result is upsampled by a factor of four and concatenated with low-level features of the same dimension. The motivation for doing so is that the structure in the input should be aligned with the structure in the output, so it is convenient to share information from low levels of the network, such as edges or shapes, to the higher ones. Then, we apply two more 3 × 3 separable convolutions and finally, a 1 × 1 convolution with one channel and sigmoid activation, so that a binary mask is obtained. This result is upsampled by a factor of two to recover the original size of the image.
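As a sanity check, the spatial dimensions implied by this description can be traced for a 512 × 512 input (the function below is our own sketch, not the authors' code):

```python
def trace_shapes(input_size=512, output_stride=8):
    """Follow the spatial size of a square input through the encoder,
    ASPP and decoder stages described in the text."""
    encoder_out = input_size // output_stride  # encoder reduces size by 8
    aspp_out = encoder_out                     # ASPP preserves spatial size
    upsampled = aspp_out * 4                   # x4 upsampling in the decoder,
    # then concatenation with low-level features of the same dimension
    final = upsampled * 2                      # final x2 upsampling
    return encoder_out, upsampled, final

print(trace_shapes())  # (64, 256, 512)
```

The trace confirms that the final ×2 upsampling recovers the original 512 × 512 resolution of the input patch.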

In Figure 7, convolution blocks are denoted as "CONV"; inverted residual units, as "IRU"; and atrous separable convolution blocks, as "ASC". The output number of filters of each block is reported using the hash symbol ("#"). The stride of all convolutions is denoted as "s". Blocks marked with "S" are "same padded", which means that the output is the same size as the input. "ReLU" represents a standard rectified linear unit activation layer and "BN" a batch normalization layer. If an IRU block is strided, there cannot be a skip between its input and its output; in such cases, the "skip" option is set to "False".

#### **3. Results and Discussion**

#### *3.1. CNN Training*

The training algorithm was implemented in Python 3.6 on a PC with an Intel i7-8700 CPU at 3.7 GHz, 64 GB of RAM and an NVIDIA GeForce GTX 1080 Ti GPU. The proposed CNN was trained using the Adam optimizer [32] with a learning rate of 0.003, a momentum term *β*<sub>1</sub> of 0.9, a momentum term *β*<sub>2</sub> of 0.999 and a mini-batch size of 16. The binary cross-entropy function was chosen as the loss function because it is commonly used for binary segmentation problems and because the number of pixels of the two training classes is balanced; thus, it was not necessary to implement specialized loss functions, such as the weighted binary cross-entropy function. Figure 8 shows the evolution of the network accuracy and loss over training time. After each training epoch, the accuracy and the loss are calculated on the validation set to monitor the network's ability to generalize and to avoid overfitting. The spikes in the validation loss around epochs 30 and 50 correspond to a decrease in performance on the training set. This is expected behavior during the first training epochs, since the model is still unstable and not yet able to generalize well; once the model stabilizes, the validation loss fluctuates with small spikes close to the training loss.
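For reference, the per-pixel binary cross-entropy loss used here can be written in a few lines of numpy (a generic sketch of the standard formula, not the authors' training code):

```python
import numpy as np

def binary_cross_entropy(y_true, y_pred, eps=1e-7):
    """Mean binary cross-entropy over all pixels.
    y_true: binary ground-truth mask; y_pred: sigmoid outputs in (0, 1)."""
    p = np.clip(y_pred, eps, 1 - eps)  # avoid log(0)
    return -np.mean(y_true * np.log(p) + (1 - y_true) * np.log(1 - p))

y_true = np.array([1.0, 0.0, 1.0, 0.0])
good = binary_cross_entropy(y_true, np.array([0.9, 0.1, 0.8, 0.2]))
bad = binary_cross_entropy(y_true, np.array([0.1, 0.9, 0.2, 0.8]))
print(good, bad)  # confident correct predictions give a much lower loss
```

The loss goes to zero as the predicted probabilities approach the ground-truth labels, which is what the optimizer minimizes during training.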

**Figure 8.** Metrics evolution over training time of our proposed network. (**a**) Epochs vs. Accuracy. (**b**) Epochs vs. Loss.

To compare the performance of our proposed network with a different segmentation approach, we trained four other networks based on the U-NET structure [33] and chose the best one. A U-NET is a network composed of an encoder and a decoder with skip connections that has been widely used for solving segmentation problems. The encoder-decoder structure of the U-NET tends to extract global features of the inputs and generate new representations from this overall information. Because we experienced a sudden drop in the accuracy metric during training, we strengthened our networks by implementing skips between the input and output of each layer, with 1 × 1 convolutions to equalize the number of channels before the addition operation, thus converting our U-NETs into ResU-NETs [34]. The first implemented network (*U-NET1*) has three layers in the encoder and three in the decoder; each layer has a 3 × 3 convolution block followed by a batch normalization block and a ReLU activation. Furthermore, we added a 10% dropout rate in the decoder layers to prevent overfitting. The second network (*U-NET2*) is similar to the previous one but has four layers in the encoder and four in the decoder. The third (*U-NET3*) and fourth (*U-NET4*) networks have the same structure as the first and the second networks, respectively, but apply atrous separable convolutions with dilation rates of two instead of regular convolutions. Figure 9 shows the evolution of the accuracy and loss of all networks over training time.

**Figure 9.** Comparison of metrics evolution over training time of all networks. (**a**) Epochs vs. Accuracy. (**b**) Epochs vs. Loss.

To statistically analyze the behavior of our network against the other networks, we calculated four metrics from the validation set: accuracy (ACC), precision (PREC), recall/sensitivity (SN), and specificity (SP), as shown in Table 2. The ACC ratio indicates correctly predicted observations against total observations; the PREC ratio indicates correctly predicted positive observations against total predicted positive observations; the SN ratio indicates correctly predicted positive observations against total actual positive observations, and the SP ratio indicates correctly predicted negative observations against total actual negative observations. Additionally, the number of trainable parameters of each network is added in Table 2.
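These four ratios can be computed directly from the confusion-matrix counts of a binary prediction (a generic numpy sketch; the example values below are illustrative, not taken from Table 2):

```python
import numpy as np

def segmentation_metrics(y_true, y_pred):
    """Accuracy, precision, sensitivity and specificity for binary masks."""
    y_true, y_pred = y_true.astype(bool), y_pred.astype(bool)
    tp = np.sum(y_true & y_pred)    # palm pixels correctly detected
    tn = np.sum(~y_true & ~y_pred)  # background correctly rejected
    fp = np.sum(~y_true & y_pred)   # background labeled as palm
    fn = np.sum(y_true & ~y_pred)   # palm pixels missed
    acc = (tp + tn) / (tp + tn + fp + fn)
    prec = tp / (tp + fp)
    sn = tp / (tp + fn)  # recall / sensitivity
    sp = tn / (tn + fp)
    return acc, prec, sn, sp

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 0, 1, 0, 0, 0, 0])
print(segmentation_metrics(y_true, y_pred))  # (0.75, 0.667, 0.667, 0.8)
```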

**Table 2.** Metrics comparison of the different segmentation networks.


In Table 2, we observe that our method achieved the highest metric values. Our method is nearly 0.5% more accurate, sensitive and specific than the second-best accuracy, sensitivity and specificity values, and nearly 1.5% more precise than the second-best precision value. This means that our proposed network is particularly better than the others at avoiding false positives. Although these differences may not seem significant, we observe in Figures 8 and 9 that only our method shows little difference between the training and validation values over training time, meaning that it avoids overfitting and performs better than the other networks when predicting new samples outside the training set. Furthermore, we notice a huge difference between the number of trainable parameters of *U-NET1* and *U-NET3*, and of *U-NET2* and *U-NET4*, although they have similar architectures, proving that using atrous separable convolutions instead of regular convolutions significantly reduces the amount of computation. Finally, another advantage of our method is that it has 34,731 fewer parameters than *U-NET4*; thus, it is faster because it has fewer operations to perform. When evaluated on the test set, the proposed network showed an accuracy of 98.143%, a specificity of 96.599%, and a sensitivity of 95.556%. This represents an unbiased evaluation of the final selected network.

#### *3.2. Mauritia flexuosa Segmentation*

Figure 10 shows the segmentation results for 512 × 512 patches; however, one aerial photograph contains several of these small patches, as its dimensions are much larger (Table 1). Thus, to perform the *Mauritia flexuosa* segmentation of a whole image, we apply a 512 × 512 sliding window across the image in both the horizontal and vertical directions with a 50-pixel overlap, and process the window with the trained CNN at each position. Then, the image is reconstructed from the segmentation results, as shown in Figure 11. To avoid discontinuities or discrepancies in the pixels covered by overlapping windows, we always keep the maximum pixel value. Furthermore, a threshold of 0.5 is applied over the probability map (Figure 11b) to obtain a binary mask, as shown in Figure 11c.
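The sliding-window reconstruction with maximum merging in the overlaps can be sketched as follows (window size, overlap and threshold taken from the text; the `predict` argument stands in for the trained CNN, and the dummy model below is purely illustrative):

```python
import numpy as np

def segment_large_image(image, predict, window=512, overlap=50):
    """Slide a window across the image, run `predict` on each crop and
    merge the probability maps, keeping the maximum in overlapping pixels.
    Assumes the image is at least window x window pixels."""
    h, w = image.shape[:2]
    prob = np.zeros((h, w), dtype=np.float32)
    stride = window - overlap
    ys = list(range(0, h - window + 1, stride))
    xs = list(range(0, w - window + 1, stride))
    # make sure the bottom/right borders are fully covered
    if ys[-1] + window < h:
        ys.append(h - window)
    if xs[-1] + window < w:
        xs.append(w - window)
    for y in ys:
        for x in xs:
            crop = image[y:y + window, x:x + window]
            p = predict(crop)  # (window, window) probability map
            prob[y:y + window, x:x + window] = np.maximum(
                prob[y:y + window, x:x + window], p)
    return prob > 0.5, prob  # binary mask and probability map

# Dummy stand-in model: mean crop brightness as a constant probability
dummy = lambda crop: np.full(crop.shape[:2], crop.mean() / 255.0)
image = np.random.randint(0, 256, (1200, 1600, 3), dtype=np.uint8)
mask, prob = segment_large_image(image, dummy)
print(mask.shape)  # (1200, 1600)
```

Taking the maximum over overlapping windows means a palm detected near the border of one window is not suppressed by a lower-confidence prediction from a neighboring window.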

**Figure 10.** *Mauritia flexuosa* segmentation results.

**Figure 11.** *Mauritia flexuosa* segmentation result for a whole image. (**a**) Original image. (**b**) *Mauritia flexuosa* probability map. (**c**) *Mauritia flexuosa* binary mask.

#### *3.3. Mauritia flexuosa Monitoring*

The proposed algorithm is designed to be used as a tool by experts from the Peruvian Amazon Research Institute (IIAP), who will acquire aerial images of areas of interest to periodically monitor the approximate number of *Mauritia flexuosa* palms.

Hundreds of images can be taken in one single flight; using only one of them is not representative enough to analyze a large area, which is why it is necessary to create a georeferenced image mosaic using the GPS information of each image. The elaboration of a mosaic consists of reconstructing a scene in two dimensions by combining images acquired with a certain overlap. To carry out this operation, a series of geometric transformations between pairs of images must be estimated, so that when warping one image onto another, they can be blended with the least possible error. For this, we use an algorithm that was specifically developed as part of this project to work on areas with abundant vegetation [35]. Figure 12 illustrates two types of mosaics: one made up of RGB images and the other of binary *Mauritia flexuosa* masks. Figure 13 shows five mosaics of areas with different concentrations of *Mauritia flexuosa* palms. By doing this, we can analyze large areas and fly periodically to monitor this kind of natural resource.

**Figure 12.** Aerial image mosaic composed of 168 photographs. (**a**) Mosaic of RGB images. (**b**) Mosaic of *Mauritia flexuosa* masks.

**Figure 13.** Aerial image mosaics acquired near Lake Quistococha. (**a**) Mosaics of RGB images. (**b**) Mosaics of *Mauritia flexuosa* masks.

#### **4. Conclusions**

In this paper, we have presented a new end-to-end trainable deep neural network to tackle the problem of *Mauritia flexuosa* palm tree segmentation in aerial images acquired by Unmanned Aerial Vehicles (UAVs).

The proposed model is based on Google's Deeplab v3+ network and achieved better performance than the other Convolutional Neural Networks used for comparison. With an accuracy of 98.036%, the segmentation results prove to be quite similar to the hand-drawn ground truth masks. What is more, after learning the particular features of *Mauritia flexuosa* and its leaves (e.g., shape, texture, color), our model is able to detect the presence of *Mauritia flexuosa* palms and segment them even when they are partially covered by taller trees. Further work will focus on both segmenting and counting the approximate number of *Mauritia flexuosa* palms in high-resolution aerial photographs.

**Supplementary Materials:** The dataset is available at http://didt.inictel-uni.edu.pe/dataset/MauFlex_Dataset.rar, dataset license: CC-BY-NC-SA 4.0.

**Author Contributions:** Conceptualization, G.M.; Methodology, G.M.; Software, G.M.; Investigation, G.M., G.S., D.A., and I.O.; UAV Data Acquisition, D.A., I.O., and G.M.; Writing—original draft preparation, G.M.; Writing—review and editing, G.M., G.K. and J.T.; Supervision, G.K.; Project administration, J.T.

**Acknowledgments:** This research was funded by Programa Nacional de Innovación para la Competitividad y Productividad (Innóvate Perú) grant number 393-PNICP-PIAP-2014. The authors acknowledge the Peruvian Amazon Research Institute (IIAP) for its support during the image acquisition process in the Amazon rainforest.

**Conflicts of Interest:** The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Application of UAV Photogrammetric System for Monitoring Ancient Tree Communities in Beijing**

#### **Zixuan Qiu 1, Zhong-Ke Feng 1,\*, Mingming Wang 1, Zhenru Li 2 and Chao Lu 3**


Received: 18 October 2018; Accepted: 23 November 2018; Published: 24 November 2018

**Abstract:** Ancient tree community surveys have great scientific value to the study of biological resources, plant distribution, environmental change, genetic characteristics of species, and historical and cultural heritage. The largest ancient pear tree communities in China, which are rare, are located in the Daxing District of Beijing. However, the environmental conditions are tough, and the distribution is relatively dispersed. Therefore, a low-cost, high-efficiency, and high-precision measuring system is urgently needed to complete the survey of ancient tree communities. Through research on an unmanned aerial vehicle (UAV) photogrammetric program, on ancient tree information extraction methods, and on models for predicting diameter at breast height (DBH) and tree age, the proposed method realizes the measurement of tree height and crown width and the prediction of DBH and tree age with low cost, high efficiency, and high precision. Through experiments and analysis, the root mean square error (RMSE) of the tree height measurement was 0.1814 m, the RMSE of the crown width measurement was 0.3292 m, the RMSE of the DBH prediction was 3.0039 cm, and the RMSE of the tree age prediction was 4.3753 years, which could meet the needs of the ancient tree survey of the Daxing District Gardening and Greening Bureau. Therefore, a UAV photogrammetric measurement system proved to be capable when applied in the survey of ancient tree communities and even in partial forest inventories.

**Keywords:** UAV photogrammetry; forest modeling; ancient trees measurement; tree age prediction

#### **1. Introduction**

Ancient trees are the cornerstone of the natural, agricultural, and urban ecosystems on earth [1–5]. The investigation of ancient tree communities is of great scientific value to the study of biological resources, plant distribution, environmental change, genetic characteristics of species, and historical and cultural heritage [6,7]. Given the specific requirements of ancient tree community surveys, few dedicated survey patterns have been developed; at present, the traditional forest survey pattern is mainly used. The Daxing District of Beijing has the world's rarest ancient pear tree communities. For the purpose of protecting and managing ancient trees, preventing the malicious and illegal felling of ancient trees, and strengthening the information management of ancient trees, the Daxing District Gardening and Greening Bureau hopes to carry out ancient tree community monitoring every year. As more than 100,000 pear trees and 40,000 ancient pear trees are scattered over more than 400 km², great difficulties have arisen in the investigation of ancient tree communities. In addition, the Daxing District Gardening and Greening Bureau can only conduct the survey of ancient pear trees within one month before the Pear Flower Festival every year. Therefore, the high efficiency and real-time performance of the survey of ancient tree communities becomes particularly critical.

Traditional forest survey patterns include ground surveys and aerial remote sensing surveys [8,9]. As ground survey equipment, an electronic total station can accurately measure the three-dimensional coordinates, height, diameter at breast height (DBH), stem volume, and canopy volume of an individual tree [10]. Forest intelligent surveying and mapping instruments, which are portable and integrate internal and external work, can measure the height, DBH, and volume of an individual tree; in addition, stand average height, stand average DBH, stand density, and stand volume can be estimated by the angle gauge measure function embedded in the program [11]. A forest telescope intelligent dendrometer is capable of measuring the height and DBH of an individual tree from long range, and stand parameters can be estimated using the embedded micro sample plot measure function [12]. Even though the above survey instruments can estimate stand parameters, they are more suitable for the accurate measurement of individual trees than for a large-scale forest survey. A terrestrial laser scanning (TLS) system, as an efficient and high-precision measurement method, has gradually become an important means of conducting forest surveys and can visually measure stand structure parameters in three dimensions [13,14]. The main purpose of TLS is to improve the efficiency of forest sample plot monitoring: with the help of the point cloud, automatic acquisition replaces manual measurement of tree attributes, mainly tree height, DBH, crown width, and coordinates [15]. However, the line of sight of TLS is limited to only a few tens of meters in forest, so several scanning locations are required to avoid gaps in the point cloud due to occlusion from terrain or vegetation [16]. This limitation means that using TLS point cloud data to describe large areas of forest space is time-consuming and costly [17].

Airborne laser scanning (ALS) is a good solution for large-area forest investigation. ALS systems can generate 3D point cloud data to describe tree height and canopy structure and to build relationships between tree heights, canopy structures, and other forest attributes [18]. Many studies have shown that ALS systems achieve satisfying accuracy and precision in obtaining forest attributes such as forest volume and forest biomass [19–23]. Increasingly, researchers use high-resolution digital aerial images to generate 3D data, just like ALS, and apply it to forest surveys [24–26]. The cost of unmanned aerial vehicle (UAV) photogrammetric measurement systems is more acceptable than that of ALS, and attributes such as species composition, maturity, and health status can be acquired from the images. Many studies have shown that UAV photogrammetric measurement systems can generate 3D data, and the combination of the digital surface model (DSM) and digital elevation model (DEM) can deduce the height of the canopy above the ground, producing a canopy height model (CHM) [27]. Generally, UAV photogrammetric measurement systems rely on ground control points (GCPs) as a source of reliable georeferencing information [28], and additional GCPs are often used to calibrate the location parameters [28]. In general, UAV photogrammetric measurement systems require more than 30 control points per square kilometer, which is extremely difficult and inefficient to achieve in a complex forest environment [29]. Therefore, a UAV photogrammetric measurement system is needed that can meet forestry survey requirements with far fewer control points. In addition, without a proper sensor geometric calibration, the GCPs will not provide enough accuracy for coordinate correction, which affects the final 3D model accuracy.
Nan An and others imported the converted TIFF images into the Agisoft Photoscan program to generate an orthophoto by correcting perspective distortion [30]. Dongwook Kim and others used Pix4Dmapper software to auto-compensate the principal point and radial distortion by processing the bundle block adjustment [31].

Forest modeling is a type of ecological modeling. With the development of forestry informatization, the relationship between forest modeling and high-spatial-resolution three-dimensional (3D) remote sensing has become closer [32–35]. Temuulen T. Sankey and others have proposed combining light detection and ranging (LiDAR) data and hyperspectral data for ecological modeling and subtle environmental change detection [36,37]. Remote sensing techniques, in combination with forest modeling, can be applied to estimate forest biomass and carbon stock [38,39] and to monitor forest harvest and recruitment [40], and are widely used in ecosystem process modeling [41]. Remote sensing techniques such as LiDAR can be combined with individual tree detection algorithms [42–44] to obtain tree height and canopy area. Many studies have shown that linear mixed models can establish the correlation between tree height, canopy area, and DBH well [45]. In addition, if DBH and tree height are known, tree growth modeling combined with the forest environment can predict tree age well [46,47]. However, the development of these two models requires an expensive monitoring system to effectively monitor the forest environment [48]. The survey of ancient tree communities in Daxing District requires high efficiency and low cost; therefore, a new forest modeling method is needed to predict DBH and tree age.

This study aims to solve three main problems: (1) an improved UAV photogrammetric program that meets forest survey requirements with few ground control points; (2) an ancient tree information extraction method for tree height and crown width; and (3) prediction models for ancient tree DBH and tree age.


#### **2. Materials and Methods**

#### *2.1. Profile of Study Sites*

Daxing District (Figure 1) is located in the south of Beijing, with Tongzhou District to the east, Gu'an County and Bazhou City of Hebei Province to the south, the Yongding River to the west, and the Fangshan, Fengtai, and Chaoyang districts to the north. It lies between 116°13′ and 116°43′ E and between 39°26′ and 39°51′ N. The whole district is on the Yongding River alluvial plain. The terrain slopes gradually from the west to the southeast, with an altitude between 14 and 52 m. There are six main rivers in the Daxing District: the Yongding, Liangshui, Tiantang, Dalong, Xiaolong, and Xinfeng. The Beijing Daxing Wanmu pear orchard is the ancient pear tree community with the largest planting area, the earliest flowering, and the most varieties around Beijing. Its central area is located in Lihua Village, where more than 40,000 ancient pear trees (Figure 2) have been preserved for over 50 years. There are more than 40 varieties of pear trees in this area, one of which is 417 years old and was named "gold yellow" by an emperor of the Qing dynasty. Appendix A shows the distribution of ancient trees in each town of Daxing District.

#### *2.2. Technical Information*

In this study, aerial images of Panggezhuang Town were obtained using the YS-500 Fixed-Wing UAV (Figure 3 and Table 1, Beijing Global Forest Technology Co. LTD, Beijing, China) and a Sony-A7R camera (Table 2). The flight area was about 49 km², and the ground resolution was better than 5 cm. A total of 4984 images were taken over six take-offs and landings, at an average height of 331 m. The fore-and-aft overlap of the route planning was 65%, and the side overlap was 75%.

Control points were measured with the Yinhe I RTK (Real Time Kinematic) measurement system (China South Surveying & Mapping Instrument Company). The specific parameters of the RTK system are shown in Table 3.

**Figure 1.** Layout of the research area.

**Figure 2.** The scene of ancient trees investigation.

**Table 1.** Parameters of YS-500 Fixed-Wing UAV.



**Table 2.** Calibration parameters of the Sony-A7R camera (measurement by the Chinese Academy of Surveying and Mapping).

**Figure 3.** YS-500 Fixed-Wing Unmanned Aerial Vehicle (UAV).

**Table 3.** Specific parameters of Yinhe I RTK (Real Time Kinematic) measurement system (measurement by the China South Surveying & Mapping Instrument Company).


#### *2.3. Research on the Improved UAV Photogrammetric Program*

Structure from motion (SfM) is a valuable tool for generating 3D models from 2D images. It developed from computer vision and conventional photogrammetry. Unlike conventional photogrammetry, SfM uses algorithms to identify matching features in a set of overlapping images and to calculate the camera position and direction from the differences between the multiple matching features [49,50]. Based on these calculations, the overlapping images can be used to reconstruct a "sparse" or "rough" three-dimensional point cloud model of the captured object. The model obtained by the SfM method can be further refined by a multi-view stereo (MVS) algorithm, completing the SfM-MVS workflow [51].

The SfM-MVS method is relatively inexpensive in terms of both hardware and software requirements. It is faster than other digital field measurement methods and is almost independent of spatial scale. In addition, SfM-MVS can produce 3D point cloud data of high precision, high density, and high resolution; in some cases, it may even rival a terrestrial laser scanner [49].

In this study, the SfM-MVS workflow was used: the scale-invariant feature transform (SIFT) algorithm (Figure 4) detected feature points and generated feature vectors. By identifying correspondences between feature points on different images, image pairs with overlapping regions were screened out. Matching was then carried out according to the feature vectors, and the RANSAC (random sample consensus) algorithm was used to delete correspondences with conflicting geometry.

**Figure 4.** SIFT algorithm image matching the partial region (blue "+" represents feature points that match between images).
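The match-then-filter step above can be sketched in a few lines. This is a minimal, pure-Python illustration of the RANSAC idea only: it fits the simplest possible motion model (a 2-D translation between matched keypoints) rather than the epipolar geometry of a real SfM pipeline, and the function name and thresholds are ours, not the paper's.

```python
import math
import random

def ransac_translation(matches, iters=200, tol=2.0, seed=0):
    """Estimate a 2-D translation (dx, dy) between matched keypoints and
    reject outlier correspondences. `matches` is a list of
    ((x1, y1), (x2, y2)) pairs; returns (best_model, inlier_matches)."""
    rng = random.Random(seed)
    best_model, best_inliers = None, []
    for _ in range(iters):
        # minimal sample for a translation model: a single correspondence
        (x1, y1), (x2, y2) = rng.choice(matches)
        dx, dy = x2 - x1, y2 - y1
        # count correspondences consistent with this hypothesis
        inliers = [m for m in matches
                   if math.hypot(m[1][0] - m[0][0] - dx,
                                 m[1][1] - m[0][1] - dy) < tol]
        if len(inliers) > len(best_inliers):
            best_model, best_inliers = (dx, dy), inliers
    return best_model, best_inliers
```

A correspondence whose displacement conflicts with the consensus model (e.g. a repeated-texture mismatch) simply never lands in the inlier set and is dropped, which is the role RANSAC plays after SIFT matching in the text above.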

The regional network bundle adjustment method of aerial triangulation takes the bundle of light rays of a single image as the basic adjustment unit and the collinearity equations of central projection as the basic adjustment formulas [52,53]. Through the rotation and translation of every light beam in space, the best intersection of the rays from the points common to the models can be achieved, and the entire region can be optimally incorporated into the known control point coordinate system [54,55]. The advantage of this algorithm is that the free network bundle adjustment and the system bundle adjustment require only four or more GCPs to obtain the image corrections, which improves the efficiency of the field work and minimizes the measurement error.

$\varphi$ is the longitudinal tilt, $\omega$ is the lateral tilt, and $\kappa$ is the swing angle; $X$, $Y$, $Z$ are longitude, latitude, and altitude. Given the profile of the test area, the UAV POS (Positioning and Orientation System) data $(X_i^0, Y_i^0, Z_i^0, \varphi_i^0, \omega_i^0, \kappa_i^0)$ and ground control points $P_n(X_{n0}, Y_{n0}, Z_{n0})$ ($n \geq 4$), free network adjustment was first conducted (Figure 5). Nine control points were arranged at the four corners of the rectangle, the centers of the four edges, and the center of the rectangle. Here, $(u_j', v_j')$ and $(u_j, v_j)$ are the corresponding image points in image $i+1$ and image $i$, respectively, and $\lambda_j'$ and $\lambda_j$ are the corresponding scale factors for image $i+1$ and image $i$, respectively. $X_{i+1}^0$, $Y_{i+1}^0$, $Z_{i+1}^0$, $\varphi_{i+1}^0$, $\omega_{i+1}^0$, and $\kappa_{i+1}^0$ are the elements of exterior orientation of the POS data in image $i+1$; $X_i^0$, $Y_i^0$, $Z_i^0$, $\varphi_i^0$, $\omega_i^0$, and $\kappa_i^0$ are those in image $i$. $R_i^0$ and $R_{i+1}^0$ are the initial values of the rotation matrices composed of the elements of exterior orientation of the POS data in image $i$ and image $i+1$, respectively.
Formula (1) can be obtained by substituting the values of three corresponding image points.


$j$ denotes a corresponding image point. $\lambda_j^0$ and $\lambda_j'^0$, as well as $b_1$, $b_2$, and $b_3$, are calculated beforehand. $u_j$, $v_j$, $u_j'$, $v_j'$, and $f$ are the coordinates of the corresponding image points and the focal length. Therefore, we can obtain the corrections $\Delta X_{i+1,i}^0$, $\Delta Y_{i+1,i}^0$, $\Delta Z_{i+1,i}^0$, $\Delta \varphi_{i+1}$, $\Delta \omega_{i+1}$, and $\Delta \kappa_{i+1}$.

**Figure 5.** Image correction of the free network adjustment in ancient tree communities.

The selection and positioning of the image control points (Figure 6) serves to indicate the position of each control point on the image accurately and is the basis of image interpretation and measurement. When selecting and positioning image control points, the intersections of linear ground objects and the corners of ground objects are generally chosen.

According to the correction between images (Figure 7), the adjustment value of each image center is calculated, and the coordinates of each ground control point in the independent coordinate system of images are then calculated. We convert the image coordinate system to the ground coordinate system, so as to obtain the final correction value after the system bundle adjustment method.

Based on the above, we independently developed software called "New UAV Bundle Adjustment" for image matching and POS correction, written in C# on the Microsoft Visual Studio 2010 development platform. The images and POS data processed by this software were imported into Pix4Dmapper for three-dimensional (3D) point cloud modeling. Pix4Dmapper is a product of the Swiss company Pix4D and a research result of EPFL (Swiss Federal Institute of Technology in Lausanne).

**Figure 6.** Selection and positioning of the image control points.

**Figure 7.** Image correction of the regional system adjustment of ancient tree communities.

The number of control points and their layout have a great influence on the precision of aerial triangulation. To verify the accuracy of the improved UAV photogrammetric program, nine control points were set up in Panggezhuang Town (Figure A1), and a total of 68 field control points were collected by RTK over a flight area of about 49 km². Points of the same height were set up on the four sides and the center line of the region. As the test area was a fairly regular rectangle, nine points were arranged at its four corners, the centers of its four edges, and its center; the remaining 59 points served as precision check points. Data acquisition was based on the same camera (Table 2): Data Source A used the improved bundle adjustment for calculation, and Data Source B adopted the conventional aerial measurement method. The layout scheme is shown in Figure 8.

**Figure 8.** Layout scheme.

The precision of orientation points and check points by using the bundle adjustment was evaluated via field measurement. The difference value between the calculated value of ground coordinate and the measured coordinate was considered to be the true error. The root mean square error was calculated according to Formula (2).

$$\begin{cases} \sigma_X = \sqrt{\frac{\sum \left(X_C - X_R\right)^2}{n}}\\ \sigma_Y = \sqrt{\frac{\sum \left(Y_C - Y_R\right)^2}{n}}\\ \sigma_{XY} = \sqrt{\sigma_X^2 + \sigma_Y^2}\\ \sigma_Z = \sqrt{\frac{\sum \left(Z_C - Z_R\right)^2}{n}} \end{cases} \tag{2}$$

In the formula, *σ<sup>X</sup>* and *σ<sup>Y</sup>* were the root mean square error (RMSE) of points in the *X* and *Y* directions. *σ<sup>Z</sup>* was the root mean square error of the points in the elevation. *σXY* was the root mean square error of the points in the plane. *XC*, *YC*, and *ZC* were the measured coordinate values of the check points. *XR*, *YR*, and *ZR* were the calculated coordinate values of check points by using the bundle adjustment.
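Formula (2) translates directly into code. The sketch below is ours (function name and tuple layout are assumptions); it computes the plane and elevation RMSEs from paired calculated and measured check point coordinates.

```python
import math

def rmse_components(calc, ref):
    """Formula (2): RMSE of check points in X, Y, plane (XY), and elevation (Z).
    `calc` and `ref` are equal-length lists of (X, Y, Z) tuples holding the
    calculated and the field-measured coordinates, respectively."""
    n = len(calc)
    sx = math.sqrt(sum((c[0] - r[0]) ** 2 for c, r in zip(calc, ref)) / n)
    sy = math.sqrt(sum((c[1] - r[1]) ** 2 for c, r in zip(calc, ref)) / n)
    sz = math.sqrt(sum((c[2] - r[2]) ** 2 for c, r in zip(calc, ref)) / n)
    sxy = math.hypot(sx, sy)  # plane RMSE combines the X and Y components
    return sx, sy, sxy, sz
```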

#### *2.4. Research on the Ancient Tree Information Extraction Method*

The image matching point cloud based on the UAV platform can cover large areas and generate high-density, accurate point clouds at relatively low cost. An image matching point cloud is a series of inhomogeneous, discrete point sets in space that contains certain texture information. A DSM can be obtained by processing the point cloud data, and a DEM can be obtained through filtering [49]. As the image point cloud is a passive remote sensing product, it has no multiple echoes, and it is difficult to obtain accurate DEM data directly in densely vegetated areas. Therefore, research and analysis of forest information extraction is necessary. This study used LiDAR 360 software (Beijing Digital Green Earth Technology Co. LTD, Beijing, China) to extract ancient tree information.

#### 2.4.1. Classification of Ground Point Based on Point Cloud Data

(1) Point cloud denoising. In the process of point cloud acquisition, some noise points will appear due to equipment inaccuracy and environmental factors. Removal of noise points before data processing can improve data accuracy and reduce the error caused by noise points [56]. The noise removal algorithm adopted in this study includes the high threshold method, the isolated point search method, and the low point search method.

(2) Ground point classification. A ground point is a point below the ground vegetation or ground buildings. The basic idea of extracting ground points is to assume that the lowest point in an area is a ground point and to search these local lowest points to form an initial surface. On this basis, the relationship between the remaining points and the initial surface is evaluated; points that conform to a given relationship are classified as ground points, and the process is repeated over multiple iterations. Different geomorphic features call for different iterative algorithms and thresholds [49]. In this study, a hierarchical robust linear predictive filtering algorithm, a gradient-based morphological filtering algorithm, and a TIN stepwise encryption algorithm were used.
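The "seed with local lowest points, then accept nearby points" idea can be sketched as follows. This is a toy single-pass version under assumed cell size and height tolerance, not the iterative TIN densification or morphological filters actually used in the study.

```python
def classify_ground(points, cell=5.0, dz=0.3):
    """Toy ground filter: take the lowest point in each grid cell as a seed
    of the initial surface, then accept any point within `dz` metres of its
    cell's seed elevation. `points` is a list of (x, y, z) tuples;
    returns (ground, non_ground)."""
    seeds = {}
    for x, y, z in points:
        key = (int(x // cell), int(y // cell))
        if key not in seeds or z < seeds[key]:
            seeds[key] = z            # local lowest point per cell
    ground, non_ground = [], []
    for p in points:
        key = (int(p[0] // cell), int(p[1] // cell))
        (ground if p[2] - seeds[key] <= dz else non_ground).append(p)
    return ground, non_ground
```

A real filter would iterate: refit the surface from the accepted points and re-test, with thresholds tuned per landform, as the paragraph above describes.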

#### 2.4.2. Generation of Raster Data from Point Clouds

(1) DSM generation. Point cloud data are irregular three-dimensional discrete points, which need to be interpolated to generate a three-dimensional model with continuous changes [49]. The algorithms used in this study include inverse distance weighted (IDW) interpolation, Kriging interpolation, natural neighbor interpolation, and radial basis function interpolation.

(2) DEM generation. Discrete ground points are interpolated to generate DEM [49]. Firstly, the ground point cloud is rasterized, the ground point is then extracted by the local minimum search window algorithm, and the raster data is interpolated using the TIN interpolation algorithm.

(3) CHM generation. The canopy height model (CHM) is a high-resolution raster dataset that maps tree height onto a continuous surface, where each pixel represents the height of the canopy above the ground. The CHM is obtained by subtracting the DEM from the DSM [49]. Ground fluctuation in the ancient tree community study area means that the tree bases are not on the same horizontal surface; introducing the CHM solves this problem well. The CHM reduces the calculation of tree height to a plane, which conveniently reflects tree height information, as shown in Figure 9.

**Figure 9.** Digital surface model (DSM) and digital elevation model (DEM) schematic.
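The DSM − DEM subtraction can be sketched as below. This toy version grids the DSM with a per-cell maximum of all points and the DEM with a per-cell minimum of the ground points; the real pipeline interpolates instead (IDW, Kriging, natural neighbor, or TIN, as listed above), and the cell size and names here are assumptions.

```python
def chm_grid(points, ground, cell=2.0):
    """Toy CHM: per-cell DSM (max z of all points) minus per-cell DEM
    (min z of ground points). Returns {cell_key: canopy_height} for every
    cell that has both a surface and a ground estimate."""
    dsm, dem = {}, {}
    for x, y, z in points:                      # all returns -> surface
        k = (int(x // cell), int(y // cell))
        dsm[k] = max(dsm.get(k, z), z)
    for x, y, z in ground:                      # classified ground points
        k = (int(x // cell), int(y // cell))
        dem[k] = min(dem.get(k, z), z)
    return {k: dsm[k] - dem[k] for k in dsm if k in dem}
```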

#### 2.4.3. Segmentation of Individual Ancient Tree

The three-dimensional point cloud is rasterized and transformed into a CHM, from which information such as tree height and crown width can be obtained. For the individual tree segmentation of pear trees, a seed region growing algorithm is adopted [49]. The basic method is to form convex hull polygons based on the CHM and then reconstruct the canopy along the two-dimensional convex hull with the normalized image point cloud, so as to achieve individual tree segmentation. The detailed process is as follows.

(1) A sliding window for detecting seed point locations is defined, and a minimum tree height threshold is set; when a value greater than this threshold is detected during sliding, the point is considered a seed point. (2) The points on the CHM are marked and divided into seed points and non-seed points. (3) The four points adjacent to a seed point are examined to determine whether the planar distance from the seed point to each point is within the set crown width threshold and whether the height exceeds the height threshold. If both conditions are satisfied, the point is regarded as a new seed point; this reclassification is repeated until all points are classified as seed or non-seed points. (4) The marked seed points are used as centers to establish a two-dimensional convex envelope marking the boundary. (5) The final boundary contour of the generated boundary polygon is used to segment the normalized point cloud data and achieve individual tree segmentation.
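The steps above can be sketched as a breadth-first region growing on a CHM grid. This toy version keeps only the height threshold and a "no uphill growth" rule, and omits the crown-width threshold and the convex-hull stage; all names and thresholds are ours.

```python
from collections import deque

def grow_crowns(chm, h_min=2.0):
    """Toy seed-region-growing on a CHM grid (2-D list of heights).
    Seeds are local maxima above `h_min`; each seed grows to 4-neighbours
    that are above `h_min` and not higher than the pixel they grow from.
    Returns a label grid (0 = background, 1..n = crown ids)."""
    rows, cols = len(chm), len(chm[0])
    labels = [[0] * cols for _ in range(rows)]
    nbrs = [(-1, 0), (1, 0), (0, -1), (0, 1)]
    label = 0
    for r in range(rows):
        for c in range(cols):
            h = chm[r][c]
            if h < h_min or labels[r][c]:
                continue
            # seed test: no 4-neighbour is strictly higher (local maximum)
            if any(0 <= r + dr < rows and 0 <= c + dc < cols
                   and chm[r + dr][c + dc] > h for dr, dc in nbrs):
                continue
            label += 1
            labels[r][c] = label
            queue = deque([(r, c)])
            while queue:                       # grow downhill from the seed
                rr, cc = queue.popleft()
                for dr, dc in nbrs:
                    nr, nc = rr + dr, cc + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and not labels[nr][nc]
                            and h_min <= chm[nr][nc] <= chm[rr][cc]):
                        labels[nr][nc] = label
                        queue.append((nr, nc))
    return labels
```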

According to the image data and knowledge of the field situation, the positions of missed, excessive, and false detection points were edited: false seed points were deleted, excessive seed points were deleted while the corrected seed points were retained, and missed seed points were added. The individual trees could then be segmented, and the central location of each tree was obtained from the center coordinate of its prominent position.

#### *2.5. Research on the Ancient Tree DBH and Age Prediction Model*

The height–crown-width–DBH model and "3 speed 2 inflection points" optimum condition growth model of pear trees were developed based on the measured data from a pear tree survey in Lihua Village, Daxing District. In combination with a UAV photogrammetric tree measurement system, the DBH and age of individual trees can be estimated accurately.

#### 2.5.1. Building of the Height–Crown-Width–DBH Model

Data from fixed plots are usually affected by within-plot and temporal correlation. To address this potential autocorrelation, linear mixed models can be fitted to the data by including a random plot effect (to model spatial correlation) and by specifying an autoregressive error structure (to model temporal correlation) [45]. The fixed plot data of this study are the previous survey data of the pear tree communities in Lihua Village (Appendix A, Figure A1), with a total of 1484 pear trees, including only DBH, tree height, and crown width. In 2014, Local Forestry Station investigators used a diameter ruler and an Electronic Total Station (NTS 362R, China South Surveying & Mapping Instrument Company, Guangdong, China) to measure the DBH, height, and crown width of the pear trees. Given these data, a nonlinear model is more suitable. Referring to the common height–DBH and crown width–DBH models, the height–crown-width–DBH model (Formula (3)) is obtained.

$$d_{1.3} = g_1 \cdot H^{q_1} + g_2 \cdot D^{q_2} \tag{3}$$

$d_{1.3}$ is the diameter at breast height, $H$ is the tree height, and $D$ is the crown width. $g_1$ and $q_1$ are the factors establishing the correlation between tree height and DBH; $g_2$ and $q_2$ are the factors establishing the correlation between crown width and DBH.
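Once fitted, Formula (3) is straightforward to evaluate. The sketch below uses placeholder coefficients; the defaults are illustrative values of ours, not the fitted parameters reported in Table 6.

```python
def predict_dbh(height, crown, g1=2.1, q1=1.2, g2=1.5, q2=1.1):
    """Formula (3): d_1.3 = g1 * H^q1 + g2 * D^q2.
    `height` is tree height H (m), `crown` is crown width D (m); the
    default coefficients are illustrative placeholders, NOT fitted values."""
    return g1 * height ** q1 + g2 * crown ** q2
```

In practice the four coefficients would be estimated from the 1484-tree plot data by nonlinear least squares; the function above only evaluates the fitted form.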

#### 2.5.2. Building of the "3 Speed 2 Inflection Points" Optimum Condition Growth Model

The growth data in this study were obtained from specimens of 39 ancient pear trees in Lihua Village, including the annual diameter growth of different types of pear trees. Commonly used tree growth models include the logistic model, the Mitscherlich model [57], the Gompertz model, the Korf model [58], and the Richards model [59]. In fitting these models, the sample data are treated as a mean tree of the same site and environment. Therefore, it is necessary to propose a growth model that can fit the sample data at different sites and under different environments while also fitting the overall growth trend of the data. Tree growth consists of three basic processes: cell division, cell elongation, and cell differentiation. Theoretically, the growth potential of cells and tissues is unlimited, and their growth should always be exponential. However, since the internal interactions between individual cells or organs limit growth [60], the growth process of trees is divided into the juvenile period, the medium period, and the near-mature period. In this study, the three stages were summarized as "3 speed 2 inflection points," and the overall growth trend was relatively stable. A tree growth index, a composite of site index, structure index, and growth rate index, was assigned to each sample. By selecting the optimal tree growth index and tree growth model, the optimal tree growth can be obtained, as shown in Formula (4).

$$\begin{cases} d_{1.3} = a_1 \cdot e^{\frac{-b_1}{t}} \cdot \left( e^{-b_4 \cdot t + a_4} + a_5 \right) & 0 < t \le t_1 \\ d_{1.3} = a_1 \cdot e^{\frac{-b_2}{t} + a_2} \cdot \left( e^{-b_4 \cdot t + a_4} + a_5 \right) & t_1 < t \le t_2 \\ d_{1.3} = a_1 \cdot e^{\frac{-b_3}{t} + a_3} \cdot \left( e^{-b_4 \cdot t + a_4} + a_5 \right) & t_2 < t \end{cases} \tag{4}$$

$d_{1.3}$ is the DBH and $t$ is the age of the tree; $a_1$, $a_2$, $a_3$, $b_1$, $b_2$, and $b_3$ are the parameters of the tree growth model for the overall sample data. $e^{-b_4 \cdot t + a_4} + a_5$ is the tree growth index model of each independent sample, and $b_4$, $a_4$, and $a_5$ are the factors of this model.
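Formula (4) is evaluated piecewise, and age prediction then amounts to inverting a monotone growth curve, e.g. by bisection. All parameter values below are illustrative placeholders of ours, not the fitted values from Table 8; $a_2$ and $a_3$ are chosen so the three pieces join continuously at $t_1$ and $t_2$, and the growth-index factor is collapsed to 1 for simplicity.

```python
import math

# Illustrative placeholder parameters, NOT the fitted values from the paper.
T1, T2 = 26.0, 59.0                 # the "2 inflection points" (years)
A1, B1, B2, B3 = 60.0, 30.0, 25.0, 20.0
A2 = (B2 - B1) / T1                  # continuity of pieces 1 and 2 at T1
A3 = A2 + (B3 - B2) / T2             # continuity of pieces 2 and 3 at T2
B4, A4, A5 = 0.0, 0.0, 0.0           # growth-index factor collapses to 1

def dbh_at_age(t):
    """Formula (4): piecewise '3 speed 2 inflection points' growth curve."""
    idx = math.exp(-B4 * t + A4) + A5     # per-tree growth index factor
    if t <= T1:
        return A1 * math.exp(-B1 / t) * idx
    if t <= T2:
        return A1 * math.exp(-B2 / t + A2) * idx
    return A1 * math.exp(-B3 / t + A3) * idx

def age_from_dbh(d, lo=1.0, hi=200.0):
    """Predict tree age from DBH by bisecting the monotone growth curve."""
    for _ in range(60):
        mid = (lo + hi) / 2.0
        if dbh_at_age(mid) < d:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0
```

With the fitted parameters, `age_from_dbh` is how a DBH predicted from tree height and crown width (Formula (3)) would be turned into a tree age estimate.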

#### **3. Results**

#### *3.1. Experiment Preparation*

We chose Beicao Village (Figure A1), Gaodian Village (Figure A3), Houyechang Village (Figure A3), Nandi Village (Figure A1), Qiancao Village (Figure A1), Shilipu Village (Figure A2), Taiziwu Village (Figure A2), Xinzhuang Village (Figure A2), Zhouying Village (Figure A4), and Zhuzhuang Village (Figure A4) as the study area. The reference parameters of the pear trees were obtained using an electronic total station, a diameter tape, and an increment borer, and the system accuracy was verified against them.

#### *3.2. Analysis of the Improved UAV Photogrammetric Program*

According to the computer three-dimensional vision algorithm, the matched points are processed to generate the sparse point cloud (Figure 10). The sparse point cloud is densified to obtain the dense point cloud with a geographical reference, as shown in Figure 11. Missing point cloud regions are a common problem in photogrammetric systems; the main solution in this study was to build the 3D point cloud for the missing areas separately from their images and to select the high-precision, high-density point cloud construction options of Pix4Dmapper. The precision results of the orientation points and check points using the bundle adjustment are shown in Table 4.

Analysis of the different data sources shows that the RMSEs at the orientation and check points of Data Source A were generally smaller than those of Data Source B. Both the plane RMSE and the elevation RMSE of Data Source B were more than twice those of Data Source A.

**Figure 10.** Sparse 3D point cloud of ancient tree communities.

**Figure 11.** Dense 3D point cloud of ancient tree communities.


**Table 4.** Aerial triangulation accuracy for different data sources.

#### *3.3. Analysis of the Tree Information Extraction Method*

After multiple processing experiments, tree information extraction had the best processing effect when the area was divided into sections of no more than 10 km². The point cloud data displayed by elevation of the ancient tree communities is shown in Figure 12. The classification results of ground points are shown in Figure 13. The DSM (Figure 14a), DEM (Figure 14b), and CHM (Figure 14c) after processing are shown in Figure 14.

**Figure 12.** Point cloud data displayed by elevation of ancient tree communities.

**Figure 13.** Ground point classification of ancient tree communities.

**Figure 14.** The DSM (**a**), DEM (**b**), and canopy height model (CHM) (**c**) of ancient tree communities.

According to the seed points (Figure 15), the prominent position of point cloud data is segmented and the information of individual trees is calculated, as shown in Figure 16.

To verify the measurement accuracy of tree height in the system, the reference tree heights of 745 pear trees were arranged from small to large, and the tree number was reset. The tree height was distributed between 3.02 and 5.42 m (Figure 17). The results (Table 5) show that the measured values were distributed on both sides of the reference values, and the maximum measurement error was 0.688 m. Most of the measurement error was within 0.3 m.

**Table 5.** Verification and analysis of measurement accuracy of tree height and crown width.


In order to verify the measurement accuracy of the crown width of the system, the reference crown widths of 745 pear trees were arranged by size, and the tree numbers were reset. The crown width was distributed between 3.01 and 12.02 m (Figure 18). The results (Table 5) show that the measured values were distributed on both sides of the reference values, with a maximum measurement error of 1.165 m. Most of the measurement errors were within 0.5 m.

**Figure 15.** Seed point generated in ancient tree communities.

**Figure 16.** Eye-dome lighting (EDL) display of ancient tree communities.

**Figure 17.** Reference value and measured value distribution of tree height.

**Figure 18.** Reference value and measured value distribution of tree crown width.

#### *3.4. Analysis of the Ancient Tree DBH and Age Prediction Model*

The fitting status of the height–crown-width–DBH model is shown in Table 6. In order to verify the DBH prediction accuracy of the system, the reference DBH values of 745 pear trees were arranged by size, and the tree numbers were reset. DBH was distributed between 16.7 and 55.3 cm (Figure 19). The results (Table 7) show that the predicted values were distributed on both sides of the reference values, with a maximum prediction error of 10.33 cm. Most of the prediction errors were within 5 cm.

**Table 6.** Fitting analysis of the height–crown-width–diameter-at-breast-height (DBH) model (*R*<sup>2</sup> = 0.986).


**Table 7.** Accuracy analysis of DBH prediction.

**Figure 19.** Reference value and measured value distribution of DBH.

In the "3 speed 2 inflection points" model, K-means cluster analysis was carried out on the sample data of the 39 ancient pear trees, dividing growth into three stages: 1–26 years as the juvenile period, 27–59 years as the medium period, and ≥60 years as the near-mature period. In addition, the sample with the optimal tree growth index was taken as the mean tree; the fitting analysis of the "3 speed 2 inflection points" model is shown in Table 8.


**Table 8.** Fitting analysis of "3 speed 2 inflection points" model (*R*<sup>2</sup> = 0.892).

To verify the age prediction accuracy of the system, the reference ages of the 745 pear trees were sorted from smallest to largest and the trees were renumbered; ages ranged from 30 to 104 years (Figure 20). The results (Table 9) show that the predicted values were distributed on both sides of the reference values, with a maximum prediction error of 16 years; most prediction errors were within 8 years. The number of pear trees identified through the system was 391, and the number of pear trees sampled with the increment borer was 401.

**Table 9.** Age prediction accuracy analysis.


**Figure 20.** Reference value and measured value distribution of tree age.

#### **4. Discussion**

#### *4.1. Comparative Analysis of UAV Photogrammetry in Forestry Application*

#### 4.1.1. UAV Photogrammetry to Obtain Forest Structure

There are many studies on extracting single-tree structure information by UAV photogrammetry. Dandois and others (2010) used cameras mounted on a kite platform to obtain aerial images of forests of different and uniform ages and reconstructed 3D point clouds through Ecosynth; the image-derived CHM could be used to estimate tree height (*R*<sup>2</sup> > 0.64) [61]. That accuracy, however, differs greatly from the tree-height accuracy of our improved system, mainly because our fixed-wing UAV platform is relatively stable. Zarco-Tejada and others analyzed near-infrared images of olive trees obtained by fixed-wing UAVs, performed 3D reconstruction with Pix4UAV software (Pix4D, Switzerland), and obtained tree heights from the resulting DSM that correlated well with ground measurements (*R*<sup>2</sup> = 0.83, RMSE = 35 cm) [62]. That accuracy is close to, though still somewhat below, the tree-height accuracy of our system. The main reason is that our system matches and corrects the UAV images and POS data before 3D reconstruction, which is an innovative and important means of improving the accuracy of low-cost UAV photogrammetry. Ni and others acquired aerial images of a boreal forest with a multi-rotor UAV, reconstructed three-dimensional point clouds with Agisoft PhotoScan, compared the photogrammetric CHM with the LiDAR CHM, and found that the two were highly correlated at the plot scale (*R*<sup>2</sup> = 0.87, RMSE = 1.9 m) [63]. Because that study used a multi-rotor UAV platform, a certain accuracy gap remains relative to a fixed-wing platform. In addition, White and others used RSG to perform three-dimensional reconstruction of high-altitude aerial images (point cloud density of 12.27 points/m²) [64]; because the side overlap was too low, only fore-and-aft overlapping images could be matched.
Subtracting the LiDAR DEM from the image-derived point cloud yields a terrain-normalized point cloud, which was analyzed hierarchically according to slope and canopy closure. The differences between the image-derived and LiDAR point-cloud feature statistics were significant, with DBH showing the largest difference; the model results were the same, and no trend with slope or canopy closure was found. Nevertheless, the combination of photogrammetry and LiDAR proposed in that study points toward extracting forest structure with higher precision in the future.

#### 4.1.2. UAV Photogrammetry to Obtain Topography under Tree Crowns

There are many related studies on measuring sub-canopy terrain with UAV photogrammetry. Dandois and others (2010) used aerial images of forests of different and uniform ages for 3D reconstruction [61]; comparing the image-derived DEM with the LiDAR DEM showed that the image-derived DEM had low accuracy owing to canopy occlusion. Dandois and others (2013) performed three-dimensional reconstruction of aerial images of deciduous forest acquired in both the growing season and the leaf-off season [65]; comparison with the LiDAR DEM showed that the leaf-off DEM accuracy (RMSE of 0.89–3.04 m) was higher than that of the growing season (RMSE of 2.49–5.69 m), and that the difference between the image-derived DEM and the LiDAR DEM was larger in forested areas than in non-forested areas. Wallace and others performed 3D reconstruction of aerial imagery of a Eucalyptus forest; the image-derived DEM and the LiDAR DEM generally differed little (average difference of 0.09 m), although in areas of high canopy density the image-derived DEM was less accurate than the LiDAR DEM. For the survey of the ancient tree community, our system chose to extract the DEM by UAV photogrammetry. The main reason, besides the low cost, is that the overall density of the ancient pear tree community (7000–60,000 trees/km²) is low and the surrounding bare ground is extensive, so the accuracy of the image-derived DEM is higher.

#### *4.2. Error Analysis of Improved UAV Photogrammetric Measurement System*

The UAV photogrammetric measurement system is based on SfM-MVS. In tree height and crown width measurement, errors arise from several sources. As point clouds are scaled, shifted, and rotated into geographic coordinates, the measurement error of each GCP's three-dimensional position introduces additional registration error into the SfM-MVS result. The modified UAV image matching and correction algorithm in this paper aims to minimize these registration errors. Over short distances, SfM-MVS can generate data equivalent to TLS; however, as the measurement distance increases, accuracy decreases significantly, which is the main factor limiting the precision of a UAV photogrammetric tree measurement system. A reasonable flight height can therefore improve measurement accuracy.
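The registration step described above (scaling, shifting, and rotating an SfM point cloud into geographic coordinates via GCPs) is commonly solved with a closed-form similarity transform. The sketch below uses the standard Umeyama solution on synthetic matched points; it illustrates the general technique, not the paper's specific matching and correction algorithm:

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate scale s, rotation R, translation t with dst ~= s*R@src + t
    (Umeyama closed form). src/dst are (N, 3) matched points, e.g. the SfM
    coordinates of GCPs and their surveyed geographic coordinates."""
    mu_s, mu_d = src.mean(axis=0), dst.mean(axis=0)
    A, B = src - mu_s, dst - mu_d
    U, S, Vt = np.linalg.svd(B.T @ A / len(src))      # cross-covariance SVD
    d = np.sign(np.linalg.det(U @ Vt))                 # guard against reflection
    D = np.diag([1.0, 1.0, d])
    R = U @ D @ Vt
    s = np.trace(np.diag(S) @ D) / (A ** 2).sum() * len(src)
    t = mu_d - s * R @ mu_s
    return s, R, t

# Synthetic check: scale/rotate/shift a cloud, then recover the transform.
rng = np.random.default_rng(0)
src = rng.normal(size=(6, 3))
theta = 0.3
Rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta),  np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
dst = 2.5 * src @ Rz.T + np.array([100.0, 200.0, 50.0])
s, R, t = similarity_transform(src, dst)
residual = np.abs(dst - (s * src @ R.T + t)).max()
print(s, residual)
```

In practice the residuals at the GCPs after such a transform are exactly the registration errors discussed above; GCP survey error propagates into them directly.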

Differences in contrast and terrain texture also affect the measurement accuracy of the UAV photogrammetric tree measurement system. SfM-MVS faces many challenges in vegetation surveys, such as the dynamic nature and complexity of vegetation and the need for many topographic models to filter out vegetation and recover the bare surface. Although TLS data can also recover vegetation surfaces, most validation datasets are bare-ground elevations obtained from total station (TS), differential GPS (global positioning system), and other survey methods. The UAV photogrammetric tree measurement system uses a variety of vegetation filtering algorithms: it classifies pixels according to RGB values using multi-scale criteria, resamples the point cloud at lower resolution, and finally extracts the minimum observed height in wide, dense vegetation areas. In this filtering process, error increases with vegetation density. Because the stand density of the ancient trees is low, the error in the ancient tree community survey is relatively small. Moreover, to protect the ancient tree community, the Daxing District Gardening and Greening Bureau regularly clears the other vegetation around the ancient trees, and ancient trees, other trees, and bare land are clearly distinguishable in the images. Therefore, the UAV photogrammetric tree measurement system can be used reliably for terrain data extraction.
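The "minimum observed height" idea behind such vegetation filtering can be sketched as a per-cell minimum filter over a point cloud. The synthetic cloud, grid size, and slope below are invented; this is a deliberately simplified stand-in for the multi-step filtering pipeline described above:

```python
import numpy as np

# Hypothetical point cloud (x, y, z); real clouds come from SfM-MVS
# densification after vegetation-pixel classification and resampling.
rng = np.random.default_rng(1)
n = 2000
x, y = rng.uniform(0, 100, n), rng.uniform(0, 100, n)
ground = 0.05 * x                                     # gently sloping terrain
veg = rng.uniform(0, 6, n) * (rng.random(n) < 0.3)    # sparse vegetation hits
z = ground + veg

# Coarse ground filter: per-cell minimum elevation on a 10 m grid,
# i.e. the lowest observation in each cell is taken as bare ground.
cell = 10.0
ix, iy = (x // cell).astype(int), (y // cell).astype(int)
dem = np.full((10, 10), np.inf)
np.minimum.at(dem, (ix, iy), z)                       # unbuffered in-place min
print(dem.round(2))
```

As the text notes, this kind of filter degrades as vegetation density rises: when few ground returns reach a cell, the cell minimum sits on vegetation rather than bare earth.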

In addition, because of the large imaging distortion residuals of consumer-grade cameras, the aerial triangulation of GCPs may lead to cumulative diffusion of orientation error. Therefore, choosing medium-format cameras and high-precision UAV platforms and optimizing the algorithms can not only significantly reduce the uncertainty of the elements of exterior orientation but also improve measurement accuracy.

#### *4.3. Error Analysis of the Ancient Tree DBH and Age Prediction Model*

Both DBH growth and tree-ring changes are influenced by many factors, such as climate, the environment, and the physiological characteristics of the vegetation, and are especially sensitive to climate change. Earlier non-linear tree growth models did not consider factors such as climate and environment, mainly because their data sources were normally growing trees and their purpose was to study the growth curve. Since ancient pear trees are old, it is difficult to build a growth model for them using the approach applied to normally growing trees. The forest modeling in this study was therefore combined with UAV photogrammetry to obtain a universal growth model of ancient pear trees for the inversion of DBH and tree age. By effectively classifying the growth index of each pear tree in the data source, the height–crown-width–DBH model and the "3 speed 2 inflection points" optimum-condition growth model can simulate the optimal growth process of ancient pear trees.

There are several sources of uncertainty in the ancient tree community model. Few studies exist on the physiological parameters of pear species in China, which makes it unreliable to predict the growth trend of pear trees from those parameters; the model is instead fitted with its own parameters, which introduces a certain amount of error into the simulation results. Because climate data were lacking in this study, the optimal growth process of ancient pear trees could not be studied against climate records, which also reduces model accuracy. In addition, the forest modeling in this study has some limitations: external disturbances to the forest ecosystem (such as natural disasters, diseases, and pests) are not taken into account. Improving the model parameters and incorporating vegetation physiological parameters and climate data would increase the accuracy of the model simulation.

The forest modeling results in this paper show that most predictions are highly consistent with the reference values and achieve significant correlations. The variability of individual predictions is mainly caused by individual differences and particular environmental impacts. Because the pear community is relatively scattered, crown width is little affected by stand density, so the precision of the height–crown-width–DBH model is high. However, some pear trees are affected by site conditions and by disease and insect damage, which produces large prediction errors. A small number of pear trees grow in sandy environments, which slows DBH growth, so their predicted age is much smaller than the actual age; nevertheless, age prediction is accurate for most pear trees. The overall precision can meet the demands of an ancient tree community survey.

#### **5. Conclusions**

In this study, a UAV photogrammetric measurement system was developed for the investigation of ancient tree communities. Through an improved UAV photogrammetric program and an ancient tree information extraction method, highly efficient and highly precise measurement of tree height and crown width can be achieved. The accurate prediction of DBH and age can be achieved through the construction of a height–crown-width–DBH model and a "3 speed 2 inflection points" optimum condition growth model.

Although the system is aimed at the investigation of ancient tree communities, the improved UAV photogrammetric program and the ancient tree information extraction method can provide a new method for the application of UAV photogrammetry in a forestry survey. In addition, the height–crown-width–DBH model can predict tree diameter with high precision. The study proposed that a "3 speed 2 inflection points" optimum condition growth model based on tree growth index could predict tree age accurately and provide a new method for age identification in a forest survey. Therefore, a UAV photogrammetric tree measurement system can be applied in ancient tree community surveys and even partial forestry surveys. In the future, if forest environment ground monitoring data can be combined, forest modeling research can be improved and achieve the low-cost and high-precision measurement of stand density, stand volume, carbon storage, and biomass.

**Author Contributions:** Z.Q. and Z.-K.F. conceived and designed the experiments; Z.Q. and M.W. performed the experiments; Z.Q., Z.L., and C.L. analyzed the data; Z.Q. wrote the main manuscript. All authors contributed in writing and discussing the paper.

**Funding:** This work was supported by the Medium-to-Long-Term Project of Young Teachers' Scientific Research in Beijing Forestry University (Grant number 2015ZCQ-LX-01), the National Natural Science Foundation of China (Grant number U1710123), and the Beijing Municipal Natural Science Foundation (Grant number 6161001).

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A**

Ancient trees are mainly distributed in Panggezhuang Town (Figure A1), Yufa Town (Figure A2), Anding Town (Figure A3), and Zhangziying Town (Figure A4).

**Figure A1.** Ancient trees distributed in Panggezhuang Town.

**Figure A2.** Ancient trees distributed in Yufa Town.

**Figure A3.** Ancient trees distributed in Anding Town.

**Figure A4.** Ancient trees distributed in Zhangziying Town.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Detection of Coniferous Seedlings in UAV Imagery**

#### **Corey Feduck 1, Gregory J. McDermid 1,\* and Guillermo Castilla <sup>2</sup>**


Received: 26 June 2018; Accepted: 16 July 2018; Published: 18 July 2018

**Abstract:** Rapid assessment of forest regeneration using unmanned aerial vehicles (UAVs) is likely to decrease the cost of establishment surveys in a variety of resource industries. This research tests the feasibility of using UAVs to rapidly identify coniferous seedlings in replanted forest-harvest areas in Alberta, Canada. In developing our protocols, we gave special consideration to creating a workflow that could perform in an operational context, avoiding comprehensive wall-to-wall surveys and complex photogrammetric processing in favor of an efficient sampling-based approach, consumer-grade cameras, and straightforward image handling. Using simple spectral decision rules from a red, green, and blue (RGB) camera, we documented a seedling detection rate of 75.8% (*n* = 149), on the basis of independent test data. While moderate imbalances between the omission and commission errors suggest that our workflow has a tendency to underestimate the seedling density in a harvest block, the plot-level associations with ground surveys were very high (Pearson's *r* = 0.98; *n* = 14). Our results were promising enough to suggest that UAVs can be used to detect coniferous seedlings in an operational capacity with standard RGB cameras alone, although our workflow relies on seasonal leaf-off windows where seedlings are visible and spectrally distinct from their surroundings. In addition, the differential errors between the pine seedlings and spruce seedlings suggest that operational workflows could benefit from multiple decision rules designed to handle diversity in species and other sources of spectral variability.

**Keywords:** unmanned aerial vehicles; seedling detection; forest regeneration; reforestation; establishment survey; machine learning; multispectral classification

#### **1. Introduction**

The rapid assessment of forest and vegetation structure using unmanned aerial vehicles (UAVs) is likely to decrease the cost of field surveys for a variety of resource industries. UAVs may be particularly well suited for applications in reforestation, because they can collect very high-resolution imagery of small features with great operational flexibility. In Canada, establishment surveys are conducted at every forest-harvest area that has been replanted, to assess the adequacy of spacing, survival, growth, and species composition. For example, the Regeneration Standards of Alberta [1] call for a reconnaissance survey to be conducted three growing seasons after planting, wherein certified forestry technicians walk through the harvest area to visually estimate 'stocking', the percentage of 10 m<sup>2</sup> cells within the block that contain a live seedling at least 30 cm in height, from an acceptable tree species. If the estimated stocking rate is above 84%, the harvest area passes the establishment assessment. If the stocking rate is below 70%, the harvest area is rejected and must be replanted. If the stocking falls between 70% and 84%, the harvest area becomes subject to further assessment [1].
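The three-way decision rule above translates directly into code. Treating the 70% and 84% boundary values as falling into the "further assessment" band is our reading of the standard, since the text only specifies strictly above 84% and strictly below 70%:

```python
def stocking_decision(stocking_pct: float) -> str:
    """Apply the Regeneration Standards of Alberta thresholds described above:
    >84% passes, <70% is rejected (replant), 70-84% needs further assessment."""
    if stocking_pct > 84:
        return "pass"
    if stocking_pct < 70:
        return "reject (replant)"
    return "further assessment"

print(stocking_decision(90))
print(stocking_decision(65))
print(stocking_decision(75))
```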

If the condition, minimum height, and species of seedlings within the sample cells used to perform establishment surveys could be derived from UAV imagery, then the reduced need for manual surveys could lead to considerable cost savings. However, it needs to be demonstrated that seedlings can be automatically or semi-automatically detected from remotely sensed imagery. Coniferous seedlings under five years of age in Canada have crown diameters between 5 and 30 cm; imagery of very high spatial resolution is required to detect seedlings of this size. Although automated procedures have been used to detect small tree stumps [2], and to detect weed seedlings on bare soil in an agricultural application [3], we could not find any previous research attempting to identify coniferous seedlings of this age in an automated manner. Early studies of remote sensing applications in forestry [4,5] used manual photointerpretation of large-scale aerial photographs acquired from piloted helicopters. For example, Hall and Aldred [5] detected just 44% of seedlings with crown diameters less than 30 cm using 1:500 color-infrared photography. More recently, Goodbody et al. [6] classified 2.4 cm spatial resolution red, green, and blue (RGB) imagery acquired from a UAV over harvest blocks replanted 5 to 15 years earlier in British Columbia, Canada, and obtained user accuracies for coniferous cover between 35% and 97%. However, no attempt was made to detect individual conifer seedlings or saplings, which in their study area were greater than 1 m in height. These authors acknowledged that the potential to detect all stems using aerial remote-sensing technologies is still limited, and called for further research.

Depending on the spatial resolution of the imagery, seedling detection is akin to individual-tree detection in mature forests, which has been well studied using satellites [7], piloted aircraft [4,8–15] and, more recently, UAVs [16–23]. Given a fixed spatial resolution, the accuracy with which individual trees are detected tends to improve with crown size [11]. For example, using 15 cm resolution multispectral airborne imagery and an image-segmentation algorithm, Hirschmugl et al. [24] obtained 70% accuracy on replanted coniferous trees between 5 and 10 years of age, with an average height of 138 cm; accuracy improved for crown diameters larger than 30 cm. UAV-based tree-detection studies have achieved even higher accuracies. For example, Wallace et al. [16] obtained overall accuracies of 97% (*n* = 308) using a tree-crown segmentation algorithm applied to a dense (>50 points/m²) light detection and ranging (LiDAR) point cloud. However, LiDAR sensors come with equipment costs and operational difficulties that some may wish to avoid. Consumer-grade cameras mounted on UAVs provide an attractive low-cost source of vegetation information over disturbed and regenerating forests [25,26].

In this research, we show how millimetric (i.e., spatial resolution on the order of millimeters) UAV imagery can be used to detect coniferous seedlings less than five years old within forest-harvest areas in Alberta, Canada, with good accuracies using simple processing workflows. This study represents the first step towards creating a larger UAV-based stocking-assessment workflow, which, once realized, could extend to the remote assessment of height (from LiDAR or photogrammetric point clouds), species, and condition (from deep-learning algorithms). Achieving this complete workflow would reduce the need for in situ assessments of forest stocking, and provide a powerful new tool for establishment surveys.

#### **2. Materials and Methods**

#### *2.1. Study Area*

We surveyed two replanted forest-harvest areas located in western Alberta, Canada, for this study (Figure 1). One of the harvest areas was used to develop and train the seedling-detection algorithm, and is hereafter referred to as the 'training study area'. A second block was used as an independent validation site and is hereafter referred to as the 'test study area'. The 20.3 ha training study area is managed by Weyerhaeuser Canada (Pembina Forest Management Area), and was replanted approximately four years before our survey, with a mix of lodgepole pine (*Pinus contorta*) and white spruce (*Picea glauca*) seedlings. The 3.3 ha test study area was also replanted with a mix of lodgepole pine and white spruce seedlings between three and four years before our field survey, although most individuals we encountered in the field were lodgepole pine. The vegetation surrounding these two harvest areas consists of forests regenerating to mixed stands of aspen (*Populus tremuloides*), lodgepole pine, and white spruce.

**Figure 1.** Location of two replanted forest-harvest areas surveyed for this research in southwestern Alberta, Canada. Sample plots within the harvest areas are depicted with the red dots.

#### *2.2. Reference Data*

Field crews surveyed the test study area on 24 April 2014, and the training study area on 1 October 2015. We timed our visits to exploit the leaf-off seasonal windows in the spring and fall when the coniferous seedlings have increased their spectral contrast with their surroundings. We contend that the 1.5 year time gap between the two surveys is irrelevant, given that testing took place independently of the training. The crews located randomly generated plot centers using handheld global positioning system (GPS) units and established 50 m<sup>2</sup> circular plots with 3.99 m radii (Figure 2). Biodegradable clay targets or plastic boards were placed in the center of the plot and pinned down with metal spikes as ground control points, whose precise locations were recorded with a survey-grade Trimble real-time kinematic (RTK) global navigation satellite system (GNSS) unit. The plot outlines were marked using chalk or spray paint. The crews then recorded the species and precise location of each seedling inside the plot using the RTK GNSS. A total of 254 seedlings within 14 plots in the training study area, and 149 seedlings within 10 plots in the test study area were surveyed in this manner.

#### *2.3. UAV Imagery*

UAV imagery was collected by a flight crew consisting of a pilot and a spotter using a 3DR X8+ octocopter (Figure 3a). Both areas were flown in conjunction with the seedling surveys: 24 April 2014 for the test study area and 1 October 2015 for the training study area. Details of the X8+ platform and payload are summarized in Table 1. The platform was modified to carry two cameras simultaneously: one standard RGB camera and a second camera with a modified red-edge (RE) filter. Single-scene images, one RGB and one RE, were acquired for each of the 24 plots during 24 separate flights. The flights took place between 10:00 am and 5:30 pm to cover a variety of lighting conditions. The platform operated a simple automated flight plan, as follows: the X8+ launched, flew to the plot center, and hovered at 15 m above ground level to acquire imagery (single photographs) at a consistent scale (Figure 3b). We installed a LidarLITE laser range finder (vertical accuracy < 2.5 cm) on the UAV, which allowed us to control the altitude of the X8+ for imaging. It is important to note that the circular plot did not cover the entire image, but was instead located at the center of each frame. As the UAV hovered directly over each plot prior to image acquisition, the plots were always located very close to the principal point, with the maximum off-nadir angles never exceeding 15 degrees. This reduced terrain distortion and layover effects. Each flight was less than three minutes in length. The imagery was collected using a sampling approach, avoiding the need for wall-to-wall aerial surveys designed to image the entire harvest area.

**Figure 2.** A sample plot, outlined with paint, measured 3.99 m in radius. The coordinates of both ground control point/center point and seedlings were measured using a survey grade global navigation satellite system (GNSS) unit. Note that the image has been clipped to just the plot extent for the purpose of display.

We used a Nikon Coolpix A digital camera to collect standard RGB imagery and a modified Canon PowerShot S110 to collect imagery in the RE wavelength. We substituted the internal near-infrared (NIR) filter in the Canon PowerShot S110 with an Event 38 near-infrared green blue (NGB) filter, which pushed the red band response to be centered on 715 nm. The aerial imagery was collected at a low altitude of 15 m, resulting in a ground sampling distance of 3 mm for the Nikon Coolpix A (18.5 mm focal length) and 5 mm for the Canon PowerShot S110 (5.2 mm focal length). The cameras were set to a fixed shutter speed of 1/1250 s with varying apertures, and were manually triggered using a remote control.
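The quoted ground sampling distances follow from the usual nadir-imaging relation GSD = pixel pitch × altitude / focal length. The pixel pitches below are approximate values we assume for these sensors (roughly 4.8 µm for the APS-C Nikon Coolpix A and 1.9 µm for the 1/1.7-inch Canon S110); they are not stated in the paper, so the results only approximately reproduce the 3 mm and 5 mm figures:

```python
def ground_sampling_distance(pixel_pitch_um: float, altitude_m: float,
                             focal_length_mm: float) -> float:
    """GSD in m/pixel for a nadir image: pixel pitch * altitude / focal length."""
    return pixel_pitch_um * 1e-6 * altitude_m / (focal_length_mm * 1e-3)

# Assumed pixel pitches (not from the paper):
# Nikon Coolpix A (APS-C, ~16 MP) ~4.8 um; Canon S110 (1/1.7", ~12 MP) ~1.9 um.
nikon = ground_sampling_distance(4.8, 15, 18.5)
canon = ground_sampling_distance(1.9, 15, 5.2)
print(f"Nikon Coolpix A: {nikon * 1000:.1f} mm/px")
print(f"Canon S110:      {canon * 1000:.1f} mm/px")
```

The same relation shows why the 15 m hover altitude matters: halving the altitude would halve the GSD for both cameras.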

**Figure 3.** (**a**) The 3DR X8+ unmanned aerial vehicle (UAV) can collect density samples at a rate of 45 s/ha. (**b**) The sample-based flight plan used for this study enables a rapid assessment of large forestry blocks. The UAV moves to a sample plot waypoint, descends to a 15 m altitude above ground level (AGL), captures an image, ascends, and continues to the next waypoint.


**Table 1.** Unmanned aerial vehicle (UAV) and camera specifications. RGB—red green blue. NIR—near-infrared;


#### *2.4. Data Handling and Image Analysis*

The seedling locations were exported to a geographic information system (GIS) point layer, visually confirmed using the UAV imagery, and, if required, spatially edited to be centered within each seedling crown in the image. The seedlings ranged in height from 5 cm to 35 cm, and the crown radii ranged between 5 cm and 50 cm. The tall seedlings were those manually planted about four years before our study, while the short seedlings regenerated naturally. The wider crowns (up to 50 cm) corresponded generally to clusters of naturally regenerating seedlings, rather than to individuals. We treated these clusters as single entities for the purpose of this study. It is important to note that the planted seedlings (taller, generally isolated from other individuals) are more important than the naturally regenerating seedlings (shorter, sometimes occurring in clusters) when assessing stocking in the planted harvest areas. Not all of the seedlings will survive to maturity, and the planted seedlings have the best chance; within a cluster, no more than one seedling will typically survive.

The image analysis workflow is a three-step object-based process consisting of (i) image segmentation, (ii) automated classification using a classification and regression tree (CART) machine-learning algorithm, and (iii) the merging of adjacent image objects classified as 'seedlings' into single seedling objects. Priority was placed on creating a workflow that is economic with regards to both the UAV flight time and processing time, so image analyses were conducted on single scenes with minimal preprocessing. The challenge was to identify a classification ruleset that could perform under a variety of target and illumination conditions. We used the CART approach in the training study area to test the importance of the spectral, spatial, and textural variables for this task. On the basis of the results of this testing, we selected a single model for application in the test study area.

Red-edge and RGB imagery were co-registered into 16-bit unsigned raster layers for each sample, using ArcGIS Desktop 10.1 (Figure 4a,b). The images were rectified with first-order polynomial functions, using the seedlings and ground control points as reference marks. It is worth noting that this step—rectifying images to match field data—would not be required in an operational workflow. Additional raster layers were generated from band ratios to gain a spectral contrast between the green seedlings and their non-photosynthetic surroundings (Figure 4c). We used Trimble eCognition Suite 9.1 (www.ecognition.com) to segment the imagery and derive image-object statistics. The input raster for the initial segmentation was a ratio of ratios (red ratio/blue ratio) scaled to the 0–255 interval. The user-defined segmentation parameters were as follows: scale = 50; shape = 0.1; and compactness = 0.3. All of the 48 images (24 RGB and 24 RE) were segmented using the same parameters. We arrived at the final segmentation parameters iteratively through trial and error. Our goal was to develop object primitives that best delineated the seedling edges from their surroundings. The resulting image-objects were then further merged using a homogenous region-growing algorithm, with shape and compactness factors of 0.1 and 0.5, respectively. Once the final image objects were generated for each plot, we assembled a number of attributes for each image object. The final list of spectral, spatial, and textural variables evaluated by the CART approach is summarized in Table 2.
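The segmentation input described above can be computed directly from an RGB raster. Note that the "ratio of ratios" simplifies algebraically: (R/(R+G+B)) / (B/(R+G+B)) = R/B. The sketch below uses a tiny random image as a stand-in for the UAV rasters:

```python
import numpy as np

# Hypothetical 4x4 RGB image; the real inputs are the co-registered UAV rasters.
rng = np.random.default_rng(2)
rgb = rng.integers(1, 256, size=(4, 4, 3)).astype(float)
r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]

total = r + g + b
red_ratio = r / total
blue_ratio = b / total
roi = red_ratio / blue_ratio          # "ratio of ratios" (equivalent to r / b)

# Linear stretch to the 0-255 interval used as the segmentation input
scaled = np.round(255 * (roi - roi.min()) / (roi.max() - roi.min()))
print(scaled.astype(np.uint8))
```

This layer emphasizes the red/blue balance of each pixel, which is what gives the contrast between green vegetation and non-photosynthetic background that the segmentation exploits.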

#### *2.5. Machine Learning*

Image-object attributes were exported to a table, resulting in a database with 18,905 records (image-objects from all 14 plots in the training study area combined). Each record was then classified as either seedling or non-seedling using the CART machine-learning algorithm in the Salford Predictive Modeler (SPM v. 70) software (info.salford-systems.com). This algorithm generates a classification decision tree, with rules that can be used in structured query language (SQL) queries or to build a decision tree in eCognition. Three sets of models were evaluated. All three sets used the same spatial and textural attributes as predictors (Table 2), but the spectral attributes varied as follows: (i) RGB-only variables, (ii) RE-only variables, and (iii) RGB-combined-with-RE variables. Adjacent image objects classified as seedling were merged together using a GIS 'dissolve' function. The detection accuracy of each model was assessed using a 10-fold cross-validation procedure. The mean overall seedling-classification accuracy of each model across the 10 trials was taken as a measure of global accuracy. The most accurate model was applied to the test study area, which served as an independent validation of our workflow.
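The evaluation loop can be sketched without the commercial SPM software. Below, a minimal CART-style decision stump (a one-split tree chosen by Gini impurity) stands in for the full CART model, and the accompanying 10-fold cross validation mirrors the assessment procedure described above. The data are synthetic; the real table has 18,905 records and many more attributes:

```python
import numpy as np

def gini(labels):
    """Gini impurity of a binary label array."""
    if len(labels) == 0:
        return 0.0
    p = np.bincount(labels, minlength=2) / len(labels)
    return 1.0 - (p ** 2).sum()

def fit_stump(X, y):
    """Minimal CART-style decision stump: the single feature/threshold split
    that minimizes weighted Gini impurity across the two children."""
    best = None
    for j in range(X.shape[1]):
        for t in np.unique(X[:, j])[:-1]:           # keep both children non-empty
            mask = X[:, j] <= t
            score = mask.sum() * gini(y[mask]) + (~mask).sum() * gini(y[~mask])
            if best is None or score < best[0]:
                best = (score, j, t,
                        int(np.round(y[mask].mean())),
                        int(np.round(y[~mask].mean())))
    return best[1:]                                  # (feature, threshold, left, right)

def predict(stump, X):
    j, t, left, right = stump
    return np.where(X[:, j] <= t, left, right)

# Hypothetical image-object table: two "spectral index" features per object
# plus a seedling/non-seedling label depending on the first feature.
rng = np.random.default_rng(3)
X = rng.normal(size=(400, 2))
y = (X[:, 0] > 0.5).astype(int)

# 10-fold cross validation, as used for model assessment above
folds = np.array_split(rng.permutation(len(X)), 10)
accs = []
for k in range(10):
    train = np.concatenate([folds[i] for i in range(10) if i != k])
    stump = fit_stump(X[train], y[train])
    accs.append((predict(stump, X[folds[k]]) == y[folds[k]]).mean())
print(f"mean 10-fold accuracy: {np.mean(accs):.3f}")
```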

**Figure 4.** Detail of the UAV image from a sample plot. Left to right: (**a**) red, green, and blue (RGB) (3 mm spatial resolution), (**b**) red-edge (RE) (5 mm spatial resolution), and (**c**) red ratio/blue ratio (from the RGB image).

#### **3. Results and Discussion**

#### *3.1. Model Selection*

The overall classification accuracy of all image-objects in the training study area was 96%, 97%, and 97% for the RGB-only, RE-only, and RGB-and-RE variables, respectively. It should be noted that the overall accuracy reported here (raw agreement) is a high-level accuracy statistic based on a disproportionately small number of seedling objects (2420) relative to non-seedling objects (18,663). We report more detailed error analytics for the test dataset below. As the RE-only and RGB-and-RE models did not significantly increase performance (1%), we chose the RGB-only model for parsimony.

The final CART decision tree was pruned to a simple two-rule model based on just two spectral vegetation indices, the green-red difference index and the blue-green difference index. We found that reasonable classification models could be generated using an RGB camera alone, and—more importantly—that one classification model could be used to detect coniferous seedling crowns across many sample plots in our study sites imaged under different lighting conditions. An example test-plot classification is shown in Figure 5.
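A two-rule spectral classifier of the kind described can be sketched as follows. The index formulations (simple band differences) and the thresholds are assumptions for illustration; the paper does not state its actual rule thresholds in this section.

```python
# Minimal sketch of a two-rule classifier built on the two vegetation
# indices named above. Index forms and thresholds are assumed, not the
# authors' actual CART rules.
def classify_object(r, g, b, grdi_thresh=0.03, bgdi_thresh=-0.01):
    """Label an image object from its mean RGB reflectance (0-1 scale)."""
    grdi = g - r  # green-red difference index (simple-difference form assumed)
    bgdi = b - g  # blue-green difference index (simple-difference form assumed)
    # Rule 1: seedlings are greener than red; Rule 2: and not overly blue.
    return grdi > grdi_thresh and bgdi < bgdi_thresh

print(classify_object(0.10, 0.20, 0.08))  # green conifer crown -> True
print(classify_object(0.25, 0.22, 0.20))  # senesced grass -> False
```

A pruned tree of this size translates directly into two SQL-style rules or an eCognition rule set, which is what makes the model portable across plots.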

#### *3.2. Detection Accuracy, Error Patterns, and Density Estimates*

To assess the detection accuracy in the test area, we considered a reference seedling detected (i.e., a true positive) if its corresponding geolocation point was inside a seedling object; otherwise, we considered the image object containing the point to be a false negative. Likewise, seedling objects not containing a seedling geolocation point were considered false positives, and the remaining non-seedling image objects were considered true negatives. The overall detection rate (sensitivity) for conifer seedlings in the independent test dataset was 75.8%: 113 out of the 149 seedlings surveyed in the test site were detected (Table 3). The classification model had a commission error (false positive) rate of 12.4% and an omission error (false negative) rate of 24.2%. The moderate imbalance between the omission and commission errors suggests that our workflow tends to underestimate the seedling density. This is understandable given the small size of the target seedlings and the complex environmental conditions in which they are found. The overall Kappa coefficient was 0.810, and the area under the receiver operating characteristic (ROC) curve was 0.93.
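The headline detection statistics can be re-derived from the counts quoted above (149 reference seedlings, of which 113 were detected); a minimal sketch:

```python
# Re-deriving sensitivity and omission error from the counts in the text.
# (The commission-error rate also needs the false-positive object count,
# which is reported in Table 3 rather than derived here.)
TP = 113            # reference seedlings detected
FN = 149 - 113      # reference seedlings missed

sensitivity = TP / (TP + FN)   # overall detection rate
omission = FN / (TP + FN)      # false-negative rate

print(f"sensitivity = {sensitivity:.1%}")      # 75.8%
print(f"omission error = {omission:.1%}")      # 24.2%
```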


**Table 2.** Spectral, spatial, and textural attributes of image objects used for classification and regression tree (CART) modelling.

**Figure 5.** Reference seedlings (**a**) were GNSS-surveyed in the field by trained personnel. UAV imagery was segmented into image objects (**b**) and subjected to a decision-tree classification based on spectral indices from the RGB imagery. The final output (**c**) delineates individual seedlings, whose accuracy was checked against the reference data.

We noted differential errors between the pine seedlings (86% detection rate, *n* = 124) and spruce seedlings (24% detection rate, *n* = 25), which were relatively rare in our test study area. This issue could likely be addressed using multiple classification models (one per species), although we did not attempt it here.

Plot-level associations between the CART-predicted stem numbers and field observations produced a Pearson's correlation coefficient of *r* = 0.98 in the test dataset (Figure 6). Plot accuracies in the test dataset ranged between 62.5% and 100%, with a mean accuracy of 86.2%. Once again, we observed a tendency of our workflow to underestimate when large numbers of seedlings are present (top end of Figure 6). This is not a critical problem, however: as plots with large numbers of small seedlings would be considered fully stocked, a moderate underestimation bias in these conditions can be tolerated in practical applications.
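The plot-level association check amounts to a Pearson correlation between predicted and observed counts, which can be sketched as below. The per-plot counts are made-up stand-ins with a mild underestimation at the top end; the paper's actual plot values are not listed in this section.

```python
# Sketch of the plot-level association: Pearson's r between CART-predicted
# and field-observed stem counts. Counts below are illustrative only.
import numpy as np

observed  = np.array([4, 7, 10, 12, 16, 20, 25, 32])
predicted = np.array([5, 6,  8, 12, 13, 16, 20, 25])  # underestimates at the top end

r = np.corrcoef(observed, predicted)[0, 1]
print(f"Pearson's r = {r:.2f}")
```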

**Table 3.** Confusion matrix for the test data (independent validation) set. Interior matrix cells indicate both correctly classified objects (true positives [TP] and true negatives [TN]) and errors (false positives [FP] and false negatives [FN]). Square brackets in the interior cells break down the seedling reference data by species [pine/spruce].


Seedling Commission Error (false-positive rate) = 12.4%; Seedling Omission Error (false-negative rate) = 24.2%; Overall Accuracy = 99.4%; Kappa = 0.810.

Expressing the stem numbers derived from remote sensing on a per-hectare basis produced an estimated seedling density of 2160 stems/ha, versus 2500 stems/ha from the reference data. This represents an underestimation of 340 stems/ha (13.6%), which is consistent with the previously noted tendency of the UAV data to slightly underestimate seedling counts. Puliti et al. [20] estimated the stem numbers of mature trees in 38 fixed-area plots in Norway using photogrammetric data from UAVs, and reported root-mean-square errors of 538 stems/ha (39.2%). While less accurate than our results, the area-based regression analysis used by those authors is quite different from the object-detection approach used here.
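The per-hectare comparison reduces to simple arithmetic using only the two density figures quoted above:

```python
# Under-estimation arithmetic from the quoted per-hectare densities.
estimated, reference = 2160.0, 2500.0   # stems/ha

shortfall = reference - estimated        # absolute under-estimation
relative_bias = shortfall / reference    # relative under-estimation

print(f"underestimation: {shortfall:.0f} stems/ha ({relative_bias:.1%})")
```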

**Figure 6.** Association between classified seedling counts and field reference observations in the test-case study.

We could not find any published literature on the automated classification of very small seedlings (less than 5 years of age) against which to compare our results. Hall and Aldred [5] detected just 44% of the seedlings with a crown diameter smaller than 30 cm using the manual interpretation of 1:500 scale color-infrared imagery. Our results are slightly less accurate than those of Sperlich et al. [32], who reported an 88% overall accuracy from the photogrammetric detection of 219 mature tree crowns. Not surprisingly, our accuracy is lower than that of studies using UAVs to detect mature trees in plantation contexts, which present a simpler classification problem in which individuals are spatially and structurally homogeneous. For example, Torres-Sánchez et al. [21] reported a 95% accuracy using photogrammetric data on 54 olive trees, and Wallace et al. [17] reported a 98% detection rate of 308 eucalyptus plants in rows using UAV LiDAR data. Our results exceed those reported by Ke and Quackenbush [13] (70% user and producer accuracy) in their classification of individual trees in single-scene forest stands from piloted aerial imagery, and those of Chisolm et al. [33] (73% overall accuracy) in a below-canopy LiDAR survey of mature trees.

#### *3.3. Challenges with Small Seedlings and Clusters of Seedlings*

Many of the omission errors we encountered arose from image-segmentation challenges. Small individuals (down to 10 cm in height with 5 cm crown diameter) and clusters of seedlings with contiguous crowns posed problems for our workflow (Figure 7). While millimetric spatial resolution helps identify fine features in our environment, it comes at the cost of spatial and spectral heterogeneity. While the CART algorithm is efficient at negotiating this heterogeneity, it can do so at the expense of specificity. To guard against this tendency, we pruned the decision tree to just two rules.

**Figure 7.** Despite using images with ultra-high spatial resolution, segmentation routines had difficulty detecting very small seedlings (**a**) or delineating groups of seedlings with contiguous crowns (**b**). Normally, individuals within clusters could only be delineated when small gaps occurred between crowns (**c**).

#### *3.4. A Sample-Based Approach to Silvicultural Surveys*

A significant portion of our study was devoted to working with a sample-based survey approach, rather than the conventional wall-to-wall mapping that is common in remote sensing. This approach achieves significant time savings in terms of field data collection and avoids additional photogrammetric and orthomosaic processing costs. We estimate that a standard wall-to-wall UAV survey over the training site would require a flight time of 33 minutes, during which the platform would fly 9.8 km (Figure 8a). Alternatively, a sample-based survey of the same area could take place in under six minutes and cover just 2.8 km, a time saving of 81% (Figure 8b). While we acknowledge that wall-to-wall surveys of forest-harvest areas the size of the ones we worked in are currently possible with UAV platforms, and that wall-to-wall surveys (i.e., a census) may provide incremental benefits over sampling, we contend that sample-based flight planning is currently underutilized by the UAV community, and may be crucial to developing operational workflows.
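The time-saving comparison above is simple arithmetic on the quoted flight times and distances:

```python
# Wall-to-wall vs. sample-based survey comparison, using the figures quoted
# in the text (33 min / 9.8 km wall-to-wall; about 6 min / 2.8 km sampled).
wall_time, sample_time = 33.0, 6.0   # minutes
wall_dist, sample_dist = 9.8, 2.8    # km

time_saving = 1 - sample_time / wall_time
dist_ratio = wall_dist / sample_dist

# ~82% with exactly six minutes; the text conservatively reports 81%,
# consistent with its "under six minutes" flight time.
print(f"time saving: {time_saving:.0%}")
print(f"distance ratio: {dist_ratio:.1f}x")
```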

Our workflow is based on spectral variables from a standard RGB camera alone, with no geometric pre-processing and no secondary photogrammetric products. This approach has its pros and cons. For example, the exclusion of spatial and structural variables from our workflow limits our approach to applications where seedlings are spectrally distinct from their surroundings; hence, our requirement for seasonal leaf-off conditions. The main benefit is a streamlined workflow that simplifies the survey and processing procedures. Nex and Remondino [34] estimate that the placement of ground control points and photogrammetric processing constitute 55% of the time effort required to perform a photogrammetric UAV survey, compared to 25% for flight planning and image acquisition. Alternative workflows to ours that incorporate multiple datasets often require precise geometric integration using ground-control points, which must be laid out and surveyed prior to image acquisition. Photogrammetric processing also adds significant computational costs, and may introduce ground-object distortions [35], mosaicking artefacts [36], and radiometric inconsistencies. For example, Borgogno-Mondino et al. [37] explain how color-balancing algorithms embedded in commercial image-processing packages can degrade the radiometric quality of the resulting orthomosaics and limit the effectiveness of derived spectral indices. These issues will certainly diminish over time, given the rapid advancement in direct georeferencing technology [34,38], alternative spatial processing routines [39], and integrated sensor systems. As a result, we expect future studies to find incremental value in photogrammetric data for forest-regeneration surveys, as other authors have reported with the detection of mature trees [2,21,32]. In the meantime, the benefits of the simplified workflows for operational projects are substantial.

**Figure 8.** The flight plan for a standard wall-to-wall survey (**a**) is almost six times longer in duration and more than three times longer in distance than the sample-based approach (**b**) used in this research.

Despite these benefits, our simplified sample-based workflow also has a number of drawbacks. The lack of geometric correction means that images contain relief distortion and optical lens distortion, which introduce variability into the seedlings' appearance. Excluding geometric correction from the workflow also introduced variability into the spatial resolution of the resulting images, with the ground-sample distance (GSD) being a simple function of the flying altitude and the camera lens focal length. We reduced this scale effect by outfitting our UAV with a laser rangefinder, which allowed us to hover at exactly 15 m above ground level during imaging. With this, we ensured a common GSD of 3 mm at nadir (5 mm for the RE camera), which became 3.2 mm at the edge of the circular plot over flat terrain. Even on slightly uneven ground (slopes in our study area did not exceed 10%), the GSD at the low side of a sloping plot did not exceed 3.4 mm. A standard UAV equipped with a conventional GNSS and barometer would be unable to acquire images in such a consistent manner. While the seedlings at the edge of plots appeared up to 30% smaller than those at the center, this effect is unlikely to decrease the detection rate, except perhaps for very small seedlings arising from natural regeneration. Occlusion by taller vegetation could also impair the detection of seedlings close to the edge of the plots. Once again, though, this is unlikely to decrease detection rates, because most shrubs were devoid of leaves at the acquisition date, and there were no saplings or mature conifer trees within the plots. Additional challenges in our workflow arose from the substantial spectral variability caused by changing illumination conditions during our flights. While this variability could be reduced with the use of an integrated irradiance sensor, operational workflows could benefit from several classification models designed to account for diversity in seedling species, radiometric conditions, and surrounding vegetation, as well as for other sources of spectral variability. We encourage future researchers to assess the value of incorporating scene-level variables that categorize samples on the basis of brightness, time of day, latitude, greenness, and other factors.
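The GSD relationship described above can be checked with a quick calculation. The pixel pitch and focal length below are assumed values chosen so that the nadir GSD reproduces the quoted 3 mm at 15 m altitude; the 5.64 m plot radius (a 100 m² circular plot) is likewise an assumption.

```python
# GSD as a function of flying altitude and lens focal length.
# Camera parameters and plot radius are assumptions for illustration.
import math

pixel_pitch = 1.6e-6   # m, assumed sensor pixel size
focal = 8.0e-3         # m, assumed lens focal length
altitude = 15.0        # m above ground, held constant by the laser rangefinder
plot_radius = 5.64     # m, assumed radius of the circular sample plot

gsd_nadir = pixel_pitch * altitude / focal   # ground-sample distance at nadir
slant = math.hypot(altitude, plot_radius)    # camera-to-plot-edge distance
gsd_edge = pixel_pitch * slant / focal       # GSD at the plot edge (flat terrain)

print(f"nadir GSD: {gsd_nadir * 1000:.1f} mm")  # 3.0 mm
print(f"edge GSD:  {gsd_edge * 1000:.1f} mm")   # 3.2 mm
```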

Finally, we note that the Regeneration Standards of Alberta [1] call for between 2.77 and 12.4 sample plots per hectare, depending on the size of the harvest area being assessed. In a real-world scenario, this means that we would need to increase our sampling intensity substantially (four- to five-fold) over the design used in this research. However, it was not our intention to conduct actual stocking assessments, but rather to create and evaluate a workflow that could perform efficiently in this respect.

#### **4. Conclusions**

In this study, we assessed the capacity of optical photography from an unmanned aerial vehicle (UAV) to perform coniferous-seedling detection in an object-based environment. The 75.8% overall detection rate of the ground-surveyed seedlings in an independent test site (*n* = 149) demonstrates the utility of our approach. Error analytics revealed a slight tendency to underestimate seedlings, although the plot-level associations with ground surveys were very high (*r* = 0.98, *n* = 14). Red-edge imagery offered no significant advantage over the data from a standard RGB camera, and our final decision tree consisted of a simple two-rule model based on just two spectral vegetation indices, the green-red difference index and the blue-green difference index. We found spatial and textural variables to be unnecessary for identifying coniferous seedlings under the conditions we assessed. Our workflow relies on seasonal leaf-off conditions, when grasses are senesced and seedlings are spectrally distinct from their surroundings, and it would not be expected to perform with the same efficiency at other times of the year, or with deciduous seedlings.

Seedling surveys conducted with UAVs are feasible and efficient, but further research is required. For example, we expect that the increased variability encountered under operational conditions will require the complementary use of several models or the application of more sophisticated machine-learning approaches. We encourage other researchers to explore the detectability of other seedling species in new environments and at different times of the year. A full stocking-assessment workflow would also require delineated seedlings to be assessed for other attributes, including height, species, and condition: challenges that will probably require enhanced spectral information and 3D data collected using light detection and ranging (LiDAR) or photogrammetry.

The trend towards quantified vegetation surveys at this level of detail is a promising development, both for forest management and in the larger context of restoration. Perhaps the most useful area for the future development of UAVs in forest management is repeatability: the future use of UAVs may depend less on correlating UAV-derived metrics with field observations than on simple data and mensuration consistency. Wallace et al. [18] have been pioneers in this regard, publishing a study focused on the repeatability of measurements taken using unmanned aerial vehicles. We encourage other researchers to assist in developing and reporting remote-sensing-based forest mensuration workflows, to establish a body of literature from which the consistency and repeatability of these novel techniques can be assessed.

**Author Contributions:** C.F. conceived of the study, performed the field work, operated the UAV, completed the data analysis, and wrote the first draft of the manuscript. C.F., G.J.M., and G.C. designed the experiments and interpreted the results. G.C. and G.J.M. edited the manuscript and prepared the final draft for publication.

**Acknowledgments:** This research is part of the Boreal Ecosystem Recovery and Assessment (BERA) project (www.bera-project.org), and was supported by a Natural Sciences and Engineering Research Council of Canada Collaborative Research and Development Grant (CRDPJ 469943-14), in conjunction with Alberta-Pacific Forest Industries, Cenovus Energy, ConocoPhillips Canada, and Devon Canada Corporation. The BERA funding facilitated both the research activities and subsequent open-access publishing of this work. Jamie Miller (Weyerhaeuser Grand Prairie) and Victor Fobert (Weyerhaeuser Pembina) provided access to the forest-harvest blocks used for this study. We also thank Jennifer Thomas for her editorial review. The comments of three anonymous reviewers helped us improve the manuscript.

**Conflicts of Interest:** The authors declare no conflict of interest. The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **References**


© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

MDPI St. Alban-Anlage 66 4052 Basel Switzerland Tel. +41 61 683 77 34 Fax +41 61 302 89 18 www.mdpi.com

*Forests* Editorial Office E-mail: forests@mdpi.com www.mdpi.com/journal/forests
