Article

Classification of Tree Species in Transmission Line Corridors Based on YOLO v7

Shicheng Xu 1,2, Ruirui Wang 1,2,*, Wei Shi 3 and Xiaoyan Wang 1,2
1 College of Forestry, Beijing Forestry University, Beijing 100083, China
2 Beijing Key Laboratory of Precision Forestry, Beijing Forestry University, Beijing 100083, China
3 Beijing Ocean Forestry Technology Co., Ltd., Beijing 100083, China
* Author to whom correspondence should be addressed.
Forests 2024, 15(1), 61; https://doi.org/10.3390/f15010061
Submission received: 20 October 2023 / Revised: 8 December 2023 / Accepted: 10 December 2023 / Published: 28 December 2023
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract

The effective control of trees in transmission line corridors is crucial to mitigate the damage that they can cause to transmission lines. Investigating trees in these corridors presents a significant challenge, particularly in classifying individual tree species. Although current deep learning models can segment single tree species, they exhibit low recognition accuracy in areas with dense forest canopies, and their detection speed is limited. To address these challenges, this study relies on aerial multispectral images obtained from drones as the primary data source. The process begins by extracting single tree crowns and establishing a sample dataset, divided in a 9:1 ratio into training and verification sets. The training set then undergoes iterative parameter training using the YOLO v7 network; once optimal parameters are obtained, the system outputs information on individual tree species, and the verification set is used to assess the accuracy. The YOLO v4 network model is applied to the same data for comparison, and the training results of the YOLO v7 network reach peak accuracy of 85.42% in recognizing single tree species. This approach provides an effective solution, offering reliable data for the in-depth investigation of trees in transmission line corridors and the accurate monitoring of concealed tree hazards.

1. Introduction

Countries are accelerating the construction of power line networks to adapt to economic development and meet the demand for electricity. As the voltage of transmission lines continues to increase, so do their length and coverage, making it increasingly important to ensure the safe, stable, and efficient operation of the transmission network. At the same time, transmission lines inevitably pass through forested areas during construction [1], and trees near the transmission line corridors can interfere with the lines and cause accidents: power lines short-circuit after contact with trees, causing power outages and, in severe cases, fires [2,3]. To avoid the adverse impacts of forest areas on the power corridor transmission system, efficiently obtaining tree species information in and around the transmission line corridor plays an important role in eliminating potential safety hazards [4,5]. The management of trees in power line corridors is a critical aspect of ensuring the uninterrupted flow of electricity and safeguarding communities from potential hazards. Implementing comprehensive vegetation management programs can minimize the risk of outages, prevent damage to power lines, and promote a thriving wildlife habitat within transmission corridors.
With the rapid advancement of high-resolution satellites and drones, the utilization of remote sensing images for automated tree identification and counting has become increasingly prevalent [6]. Among traditional recognition algorithms, support vector machines and random forests are widely employed due to their higher recognition accuracy. However, ongoing research indicates that their accuracy and speed have largely plateaued, presenting challenges for further improvement [7]. The recent rapid progress in deep learning has significantly enhanced detection accuracy and computing speed, particularly through the use of convolutional neural networks in image detection. This improvement can be attributed to the automatic learning and feature extraction inherent in deep learning techniques [8]. In the realm of tree species classification, deep learning consistently produces classification results that outperform those of other commonly used classifiers, such as support vector machines and random forests [9]. Hakula et al. [10] employed a drone equipped with a multispectral laser scanning system to scan the forest. They utilized layer-by-layer segmentation on dense point cloud data to identify and separate individual trees, subsequently calculating features to aid in identifying different tree species. The researchers categorized co-dominant and dominant tree species, achieving classification accuracy between 92% and 93%. Liu et al. [11] adopted a novel approach by directly abstracting high-dimensional features from three-dimensional data, bypassing the conventional step of converting point clouds into voxels or two-dimensional images. Their methodology involved multi-layer perceptrons, max pooling, fully connected layers, and shared weights. The resulting deep neural network, incorporating a softmax classifier, automatically extracted high-dimensional features from trees and seamlessly executed tree species classification. Hao et al. [12] pioneered the exploration of fir tree detection in artificial forests through the application of the Mask R-CNN network. Their findings affirm the substantial potential of Mask R-CNN in enhancing the accuracy and efficiency of remote sensing for forest resource surveys. Weinstein et al. [13] established a semi-supervised deep learning model using LiDAR point cloud data. By supplementing the tree species labels generated by an unsupervised algorithm with a small amount of manually annotated data, they achieved an average tree detection rate of 82% in the dataset. This outcome serves as compelling evidence that deep learning can significantly enhance detection results and accuracy. Liu et al. [14] introduced LayerNet, a point-based deep neural network designed to extract local 3D structural features from LiDAR data. By aggregating features from all layers and utilizing convolution to obtain global features, they successfully classified tree species, achieving a highest classification accuracy of 92.5%. Wang et al. [15] focused on LiDAR data, converting the frontal and lateral projections of point clouds into depth images. Their use of the Faster R-CNN network for training and identifying the locations of tree trunks in single tree segmentation yielded an accuracy rate exceeding 90%, even for overlapping tree trunks. In a different study, Yu et al. [16] employed three machine learning classification algorithms, namely a neural network, a three-dimensional convolutional neural network (3DCNN), and a support vector machine, to identify and compare dominant forest tree species in airborne hyperspectral images. The results demonstrated that the 3DCNN exhibited the highest classification accuracy among the three algorithms.
Distinct tree species exhibit different spectral reflectance characteristics, and multispectral data frequently yield more information than a single spectral band [17]. Osco et al. [18] applied deep learning to the detection of individual tree crowns, examining different band combinations with convolutional neural networks to analyze fruit trees in orchards. The research revealed that a combination of the green, red, and near-infrared bands performed excellently. Ampatzidis and Partel [19] substituted the near-infrared band for the RGB green band, leveraging the distinct reflection characteristics of plants in the infrared spectrum. This adjustment enhanced recognition accuracy, underscoring the value of leveraging different spectral bands for improved tree species identification.
By 2022, the YOLO network model had reached its seventh iteration. As a typical one-stage detector, it is best known for its speed, portability, and versatility [20]. The YOLO v7 algorithm used in this study is therefore the most sophisticated in the YOLO series to date, outperforming all other target detection models in the FPS range of 5 to 160 in terms of both speed and accuracy [21].
Lin et al. [22] utilized an improved YOLO v4 network for the detection of larch caterpillar damage to trees, achieving an impressive accuracy rate of 97.5%. This accuracy closely rivals that of the mainstream Faster R-CNN network, and the detection speed significantly outpaces that of the original two-stage convolutional neural network. In a related context, Jin [23] incorporated an attention mechanism into the YOLO v4-tiny network to detect dead trees, resulting in accuracy of 93.36%, a notable increase of 9.69% over the original setup. While there is limited current research employing the YOLO network to classify tree species in forest areas, its demonstrated accuracy and efficiency in tree identification suggest its viability for species classification.
Presently, the majority of research on tree species classification centers on identifying single tree species or individual species within forest stands. There is a relative scarcity of studies addressing the identification of complex forest structures and mixed forests with multiple tree species. Similarly, research classifying single tree species in transmission line corridors is limited. To address these gaps, this paper employs UAV multispectral remote sensing images to initially extract single tree crowns. Subsequently, tree species are labeled to create a dataset, which is then input into the YOLO v7 network model for parameter learning. The model is trained to discern the distinctive characteristics of single trees in transmission line corridors, ultimately outputting information on the identified single tree species.

2. Materials and Methods

2.1. Materials

2.1.1. Study Area

The study area is located within the jurisdiction of Haikou City, Hainan Province, along the Haikou to Longjiang power transmission corridor, at elevations of 0–60 m. The region falls within the tropical monsoon climate zone. The predominant vegetation includes coconut trees (Cocos nucifera L.), betel nut trees (Areca catechu L.), rubber trees (Hevea brasiliensis), jackfruit trees (Artocarpus heterophyllus Lam.), banyan trees (Ficus microcarpa L. f.), neem trees (Melia azedarach), eucalyptus trees (Eucalyptus spp.), bamboo groves (Bambusoideae), and pine trees (Pinaceae). The geographical location of the research area and a true color image are shown in Figure 1. The total area is approximately 5 hectares, characterized by a complex composition of tree species, with various evergreen broad-leaved trees, shrubs, and tall trees intermingled and overlapping. This area was chosen for deep learning feature extraction because of its representativeness.

2.1.2. Data Acquisition

The data for this study were collected in June 2022 via aerial photography with an M300 UAV equipped with a Changguang Yuchen MS600 Pro UAV-borne multispectral camera (Changguang Yuchen Information Technology And Equipment (Qingdao) Co., Ltd., Qingdao, China); the sensor parameters are shown in Table 1 below. On the day of data collection, the weather conditions were good and suitable for aerial photography. The drone was flown at a height of 120 m, with a side overlap of roughly 45% and a heading overlap of 65%. The aerial remote sensing images obtained contain six bands. Figure 2 displays the remote sensing true color composite image of the research area.

2.2. Methods

This study employs aerial multispectral remote sensing images captured by low-altitude drones. The study area is characterized by dense broad-leaved forests with large crown widths and an overall high canopy density. To address the limitations of traditional single tree species segmentation methods, which can be time-consuming and prone to over- and under-segmentation, the YOLO v7 network is utilized to learn the texture characteristics of tree species in drone images and identify their distinctive features. Recognizing the challenge of feeding multi-band information from remote sensing imagery into the YOLO network, the study tackles this issue by selecting the best band combination: different band combinations are assessed and compared within the same experimental environment to optimize the extraction of relevant information. In response to the issue of high similarity among drone images, the study employs the Mosaic and Mixup data enhancement techniques. These strategies enhance the model's robustness, mitigating the risk of overfitting and ensuring more accurate and reliable results.
In this study, the following steps are used to identify individual tree species.
Step 1: Construct a dataset from labeled original aerial drone images.
Step 2: Apply three different band combinations to the images in the dataset.
Step 3: Use the Mosaic and Mixup methods for data enhancement.
Step 4: Use the YOLO v7 model to carry out feature extraction and training on the data under the different band combinations to obtain the best parameter set.
Step 5: Load the optimal parameter set to obtain single tree species information.
Step 6: Verify the accuracy. The research process is shown in Figure 3.

2.2.1. Construction of Original Dataset

Training deep learning models for feature extraction often requires a substantial amount of labeled data [24]. The production of the dataset involves three main components: visual interpretation, data annotation, and data partitioning. In the study area under consideration, characterized by diverse vegetation, extensive coverage, and a high density, the process involves capturing remote sensing images using drones and employing manual visual interpretation to construct vegetation sample data. The visual interpretation process includes classifying various types of trees by drawing vector labels, with the crown outline of each individual tree serving as the boundary. Attributes of the vector are then assigned to correspond to different tree species. To streamline the recording and subsequent dataset production, this study employs numerical identifiers from 1 to 6 to represent each tree species, as illustrated in Table 2 below.
Due to the constraints posed by the network model and computer hardware, including limitations in memory and video memory, it is not feasible to input the entire remote sensing image directly into the network model for training. Consequently, the remote sensing image must be segmented. To expedite the processing of remote sensing images during model training and enhance the network model's ability to extract various tree features, the study divided the remote sensing images from the two drones into 1677 images, each sized 412 × 412 pixels, based on the label positions, as sketched below. This segmentation approach, along with the corresponding labels, facilitates the more efficient handling of the data during model training, overcoming the hardware constraints. The data sample description is shown in Table 3.
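As a concrete illustration of this tiling step, the sketch below crops fixed-size 412 × 412 patches centered on labeled crown positions from a large image array. It is a minimal NumPy example under assumed conditions (zero padding at scene edges, a hypothetical crop_patch helper); it is not the preprocessing code used in this study.

```python
import numpy as np

def crop_patch(image: np.ndarray, cx: int, cy: int, size: int = 412) -> np.ndarray:
    """Crop a size x size patch centered on (cx, cy), zero-padding at scene borders."""
    h, w = image.shape[:2]
    half = size // 2
    patch = np.zeros((size, size) + image.shape[2:], dtype=image.dtype)
    x0, y0 = max(cx - half, 0), max(cy - half, 0)
    x1, y1 = min(cx + half, w), min(cy + half, h)
    # Paste the in-bounds window into the fixed-size patch so edge tiles keep their size.
    patch[(y0 - cy + half):(y1 - cy + half), (x0 - cx + half):(x1 - cx + half)] = image[y0:y1, x0:x1]
    return patch

# Example: cut one training tile around a labeled crown center in a stand-in orthomosaic.
scene = np.random.randint(0, 255, (5000, 5000, 3), dtype=np.uint8)
tile = crop_patch(scene, cx=1200, cy=800)
print(tile.shape)  # (412, 412, 3)
```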

2.2.2. Data Enhancement

In the realm of deep learning applied to image recognition, be it for target detection or semantic segmentation, a primary challenge is that most models inherently accept only three-channel input, while the original data here comprise six bands. Consequently, the dimensionality of the data must be reduced to a three-channel input suitable for the network. Widely employed methods of reducing the dimensionality of multispectral images fall into two categories. One involves characteristic band selection, where bands meeting specific conditions are chosen from the multiple bands based on certain principles. The other utilizes a feature extraction algorithm to compress multi-band images into 3-band images [25].
The second challenge arises from the fact that large or general network models typically require tens of thousands of data samples for effective training. However, many smaller research projects may not have access to such extensive datasets. Small datasets often lack sufficient information, leading to issues like insufficient feature extraction or overfitting during model training. Larger datasets, on the other hand, tend to yield more favorable outcomes. Addressing this challenge involves employing data augmentation techniques on the dataset, resolving overfitting issues at the data level and enhancing the model’s generalization. Common data augmentation methods include random cropping, flipping, color dithering, noise injection (Gaussian noise), rotation, translation, scaling, and affine transformation.
As a result, the main methods used in this study to address the aforementioned issues with deep learning network training are band selection comparison, Mosaic data enhancement, and Mixup data enhancement. There are three steps.
The first step is to select the dataset bands. As can be seen in Figure 4 below, three different band combinations were chosen for this study: (a) the red, green, and blue bands, synthesizing a true color RGB image; (b) the near-infrared, red, and green bands, for false color synthesis; and (c) the near-infrared, green, and blue bands. The band combination with the best detection accuracy is chosen after the data from the three combinations are input into the network for learning and feature extraction and compared.
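As a sketch of this band selection step, the following Python example assumes the six-band tile is stored as a (bands, height, width) array in the Table 1 order and picks the 840 nm channel as the near-infrared band; both the band order and the choice of NIR channel are assumptions made for illustration.

```python
import numpy as np

# Assumed band order following Table 1: 0=Blue, 1=Green, 2=Red, 3=NIR1, 4=NIR2, 5=NIR3.
COMBINATIONS = {
    "true_color":  (2, 1, 0),  # (a) red, green, blue
    "false_color": (5, 2, 1),  # (b) near-infrared, red, green
    "nir_gb":      (5, 1, 0),  # (c) near-infrared, green, blue
}

def compose_3band(cube: np.ndarray, name: str) -> np.ndarray:
    """Select three bands from a (bands, H, W) cube and rescale each to 8-bit."""
    selected = cube[list(COMBINATIONS[name])].astype(np.float32)
    lo = selected.min(axis=(1, 2), keepdims=True)
    hi = selected.max(axis=(1, 2), keepdims=True)
    scaled = (selected - lo) / np.maximum(hi - lo, 1e-6) * 255.0
    return scaled.astype(np.uint8).transpose(1, 2, 0)  # (H, W, 3) for the detector

cube = np.random.randint(0, 10000, (6, 412, 412)).astype(np.uint16)  # stand-in tile
rgb = compose_3band(cube, "true_color")
```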
In the second step, Mosaic data enhancement is the primary technique employed to augment the dataset. This process involves randomly cropping, zooming, and splicing up to four images into a mosaic picture. The remaining areas are filled with gray edges [26], as depicted in Figure 5. The advantage of this technique lies in the substantial increase in dataset samples. The images are scaled, enhancing the model’s ability to detect the same target at different size scales, thereby significantly improving the model’s robustness. One notable benefit is the considerable enrichment of the background surrounding detected objects [27]. Furthermore, when performing batch normalization (BN), data from the four pictures are computed simultaneously. This approach allows for effective results with a smaller mini-batch size, making it feasible to achieve improved results with a single GPU.
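The following simplified sketch conveys the Mosaic idea described above: four tiles are resized into the four quadrants around a random center point, with a gray fill value of 114 (a common default in YOLO implementations, assumed here). Remapping the ground-truth boxes, which a real pipeline must also do, is omitted for brevity.

```python
import random
import numpy as np

def mosaic(images: list[np.ndarray], out_size: int = 412, gray: int = 114) -> np.ndarray:
    """Splice four H x W x 3 images into one mosaic around a random center point."""
    canvas = np.full((out_size, out_size, 3), gray, dtype=np.uint8)
    cx = random.randint(out_size // 4, 3 * out_size // 4)  # random split point
    cy = random.randint(out_size // 4, 3 * out_size // 4)
    quadrants = [(0, 0, cx, cy), (cx, 0, out_size, cy),
                 (0, cy, cx, out_size), (cx, cy, out_size, out_size)]
    for img, (x0, y0, x1, y1) in zip(images, quadrants):
        h, w = y1 - y0, x1 - x0
        # Naive nearest-neighbor resize by index sampling (cv2.resize in practice).
        ys = np.linspace(0, img.shape[0] - 1, h).astype(int)
        xs = np.linspace(0, img.shape[1] - 1, w).astype(int)
        canvas[y0:y1, x0:x1] = img[ys][:, xs]
    return canvas

tiles = [np.random.randint(0, 255, (412, 412, 3), dtype=np.uint8) for _ in range(4)]
augmented = mosaic(tiles)
```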
In the third step, the Mixup data enhancement method is applied to enhance the model’s generalization ability and increase the robustness against adversarial samples. This approach also aids in reducing the model’s sensitivity to noisy samples, minimizing the impact of multiple noisy samples on the model. Mixup has demonstrated effectiveness in improving the recognition accuracy in image detection [28]. The main process of Mixup involves the linear transformation of randomly selected images within the training set. The formula for Mixup is as follows:
$$\tilde{x} = \lambda x_i + (1 - \lambda)\,x_j,$$
$$\tilde{y} = \lambda y_i + (1 - \lambda)\,y_j,$$
Here, $(x_i, y_i)$ and $(x_j, y_j)$ are sample pairs randomly selected from the training set, and $\lambda$ is drawn from a Beta distribution, ranging from 0 to 1. In this study, $\lambda$ is set to 0.5, and the Mixup effect is shown in Figure 6.
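A direct sketch of the Mixup formulas above, with $\lambda$ fixed at 0.5 as in this study; the image shapes and one-hot labels are illustrative only.

```python
import numpy as np

def mixup(x_i, y_i, x_j, y_j, lam: float = 0.5):
    """Blend two samples and labels: x~ = lam*x_i + (1-lam)*x_j, same for y."""
    x_tilde = lam * x_i.astype(np.float32) + (1.0 - lam) * x_j.astype(np.float32)
    y_tilde = lam * y_i + (1.0 - lam) * y_j
    return x_tilde, y_tilde

img_a = np.random.randint(0, 255, (412, 412, 3), dtype=np.uint8)
img_b = np.random.randint(0, 255, (412, 412, 3), dtype=np.uint8)
label_a = np.array([1, 0, 0, 0, 0, 0], dtype=np.float32)  # e.g., class 1 (betel nut)
label_b = np.array([0, 0, 0, 0, 1, 0], dtype=np.float32)  # e.g., class 5 (rubber tree)
mixed_img, mixed_label = mixup(img_a, label_a, img_b, label_b)
```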

2.2.3. Feature Extraction and Training Based on YOLO v7 Network Model

Deep learning algorithms rely on different components in the network architecture, usually including convolutions, fully connected layers, storage units, gates, pooling layers, activation functions, encoding/decoding, etc. [29]. The backbone network of YOLO v7 consists of several BConv layers, E-ELAN layers, and MPconv layers [30]; the network model is shown in Figure 7 below. (1) The BConv layer is made up of a Conv (convolution) layer, a BN (batch normalization) layer, and a SiLU activation function. (2) The output of the E-ELAN layer contains multiple branches, composed of four feature branches that pass through one, one, three, and five stacked convolution-normalization-activation blocks, respectively. The structure is shown in Figure 8. (3) The MPconv layer is a transition module responsible for downsampling.
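To make the BConv description concrete, here is a minimal PyTorch sketch of a Conv-BN-SiLU block; the channel counts and kernel size are illustrative and do not reproduce the exact YOLO v7 configuration.

```python
import torch
import torch.nn as nn

class BConv(nn.Module):
    """Conv2d -> BatchNorm2d -> SiLU, the basic block described in (1) above."""
    def __init__(self, c_in: int, c_out: int, k: int = 3, s: int = 1):
        super().__init__()
        self.conv = nn.Conv2d(c_in, c_out, k, s, padding=k // 2, bias=False)
        self.bn = nn.BatchNorm2d(c_out)
        self.act = nn.SiLU()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.act(self.bn(self.conv(x)))

x = torch.randn(1, 3, 412, 412)   # one three-channel tile
print(BConv(3, 32)(x).shape)      # torch.Size([1, 32, 412, 412])
```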
The YOLO v7 model extracts three feature layers positioned at different depths of the main network: the middle layer, the middle-lower layer, and the bottom layer. The bottom features are first processed by the SPPCSPC module, illustrated in Figure 9, which expands the receptive field; this is followed by upsampling and layer stacking. The features obtained from the lower and middle layers are directly stacked, convolved with the feature layer from the next level, and then passed to the subsequent step. The three enhanced feature layers obtained through the FPN pyramid are fed into the YOLO Head to generate the prediction results. YOLO v7 employs a RepConv structure preceding the YOLO Head, inspired by the RepVGG concept. The core idea is to introduce a special residual structure to aid training, primarily employed to reduce the network complexity while maintaining the prediction performance of the network [31]. Each feature layer uses a convolution to adjust the number of channels; the final number of channels is determined by the number of categories requiring differentiation. In YOLO v7, three a priori boxes are designated for each feature point on every feature layer, corresponding to small, medium, and large targets. These boxes are employed to detect objects of various sizes, ultimately producing the detection results.
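To make the channel bookkeeping of the last step concrete: in a generic YOLO-style head, each of the three a priori boxes predicts four box offsets, one objectness score, and one confidence per class, so for the six tree species in this study the per-layer output channel count would be as sketched below (a standard YOLO calculation, not code from the paper).

```python
# Generic YOLO-style head channels: anchors * (box offsets + objectness + classes).
NUM_ANCHORS = 3   # small, medium, and large a priori boxes per feature point
NUM_CLASSES = 6   # the six tree species labeled in this study
head_channels = NUM_ANCHORS * (4 + 1 + NUM_CLASSES)
print(head_channels)  # 33 output channels per detection feature layer
```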

2.2.4. Single Tree Species Identification Based on YOLO v7 Network Model

Using the dataset with individual tree species labels, the multispectral images captured by the drone were composed into the three band-combination control groups. The original dataset underwent data enhancement, and the processed data were then loaded into the YOLO v7 network for deep learning feature extraction. After feature extraction, the best parameters trained on the three band combinations were employed for the identification of single tree species. The results include the detection of individual trees and the identification of their respective species.

2.2.5. Experimental Environment and Parameter Settings

In this experiment, the GPU is an NVIDIA GeForce RTX 3060 Laptop GPU (6 GB), the machine is a LENOVO laptop with 16 GB of memory, and the CPU used for training is an AMD Ryzen 7 5800H. Because the ideal combination of hyperparameters depends on the model itself, as well as the data and hardware environment, the choice of hyperparameters in deep learning training has a significant impact on model training and the final results. Multiple experiments were conducted to identify the model hyperparameters appropriate for this study. When training the target detection model, a variable learning rate is used, decaying at regular intervals according to a cosine schedule; the initial learning rate is set to 0.001, with a minimum learning rate of 0.00001. The SGD optimizer, which employs stochastic gradient descent with momentum, is used, and the weight decay is set to 0.00005. Training is divided into a freezing phase and a thawing phase. During the freezing phase, the backbone feature extraction network remains unchanged and its parameters are not updated; during the thawing phase, the backbone is unfrozen and trained along with the rest of the model. Fifty epochs are set aside for the freezing phase and 300 epochs for the thawing phase, with batch sizes of 8 and 4, respectively, because the two stages have different memory occupancies.
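A PyTorch sketch mirroring these settings is shown below: SGD with weight decay 0.00005, cosine decay from 0.001 to 0.00001, and a backbone frozen for the first 50 epochs. The momentum value and the stand-in two-layer model are assumptions, since the paper does not specify them.

```python
import torch
import torch.nn as nn
from torch.optim.lr_scheduler import CosineAnnealingLR

# Stand-in model: model[0] plays the role of the backbone to be frozen/unfrozen.
model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.Conv2d(16, 33, 1))
backbone = model[0]

optimizer = torch.optim.SGD(model.parameters(), lr=0.001,
                            momentum=0.9,           # assumed; not stated in the paper
                            weight_decay=0.00005)
scheduler = CosineAnnealingLR(optimizer, T_max=350, eta_min=0.00001)  # cosine decay

for epoch in range(350):
    # Freezing phase (first 50 epochs, batch size 8): backbone weights stay fixed.
    # Thawing phase (remaining 300 epochs, batch size 4): backbone weights update.
    for p in backbone.parameters():
        p.requires_grad = epoch >= 50
    # ... one training epoch over the tiled dataset would run here ...
    scheduler.step()
```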

2.2.6. Accuracy Evaluation

Experiments on the sample dataset were performed using the combination of the red, green, and blue bands with both the YOLO v7 network model and the widely used YOLO v4 network model. The two sets of results were compared in terms of the highest single tree species recognition accuracy and the average accuracy. As shown in Table 4 below, the YOLO v7 model significantly improves both the recognition of individual tree species and the average accuracy of tree species recognition relative to the earlier YOLO v4, and it is capable of identifying the tree species in the area.

2.2.7. Accuracy Evaluation Index

The average precision (AP) and mean average precision (mAP) are typically used in target detection to assess the effectiveness of detection and model performance [32]. The precision rate is the percentage of detected samples that are correctly classified, the average precision (AP) of a class is the area under its precision-recall (PR) curve, and the mAP is the mean of the AP values across all categories, i.e., the sum of the per-class AP values divided by the number of classes in the dataset.
The recall rate R and F1 score were also included in this study's evaluation of the model's performance, giving important insights into how the model performed at various levels of confidence. With the detection threshold set at IoU = 0.5, the precision P, recall R, F1 score, and mean average precision mAP of all categories are calculated as follows:
$$P = \frac{TP}{TP + FP}$$
$$R = \frac{TP}{TP + FN}$$
$$F1 = \frac{2PR}{P + R}$$
$$mAP = \frac{\sum_{i=1}^{classes} AP_i}{classes} = \frac{\sum_{i=1}^{classes} \int_0^1 P(R)\,\mathrm{d}R}{classes}$$
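A minimal sketch of these formulas is given below, with AP computed as the area under the recall-sorted PR curve via the trapezoidal rule; real evaluation toolchains often use 11-point or COCO-style interpolation instead, and the counts in the example are illustrative, chosen only to approximately reproduce the betel nut row of Table 6.

```python
import numpy as np

def precision_recall_f1(tp: int, fp: int, fn: int):
    """P, R, and F1 from true positive, false positive, and false negative counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

def average_precision(recalls: np.ndarray, precisions: np.ndarray) -> float:
    """AP for one class: area under the PR curve over recall-sorted points."""
    order = np.argsort(recalls)
    return float(np.trapz(precisions[order], recalls[order]))

def mean_average_precision(per_class_ap: list[float]) -> float:
    """mAP: the mean of the per-class AP values."""
    return sum(per_class_ap) / len(per_class_ap)

print(precision_recall_f1(tp=766, fp=58, fn=234))  # ~ (0.93, 0.77, 0.84)
```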

3. Results

3.1. Experimental Results

In this study, within a carefully designed experimental framework, the YOLO v7 target detection model was trained using a manually labeled training set specific to individual tree species. Subsequently, the trained target detection model was employed to analyze individual tree species across three different band combinations. The comparison of the detection effects for the three band combinations is presented in Table 5, showcasing the final average accuracy (mAP). Notably, under the YOLO v7 network model, the combination of the red, green, and blue bands demonstrates the most effective classification and detection of single tree species, while the combination of the near-infrared, green, and blue bands yields the least favorable detection results.
The final detection results for the YOLO v7 model are presented in Figure 10. In Figure 10a, betel nut trees are marked with orange boxes, coconut trees with blue, and jackfruit with green. Most of the single trees are accurately detected, demonstrating precise categorization. In Figure 10b, the sky-blue boxes indicate the detected single rubber trees; the quantity and quality of the detected rubber trees are only moderate. It is evident that the detection effect is excellent for single trees with distinct characteristics, but the model's performance diminishes in areas with a high canopy density or unclear features. The detection results also suggest that the model performs better for trees with larger crown widths, whereas for single trees with broken patches and small crown widths, the category prediction is less satisfactory. This indicates potential areas for improvement in the model's performance under specific conditions.

3.2. Model Accuracy Evaluation

Following model training, the recall rate R and F1 score, as well as the model's average precision (AP) and mean average precision (mAP), were computed. The results are shown in Table 6 below. According to the recognition accuracy and recall rates of the different trees, the betel nut tree, with average precision of 85.42%, has the best detection effect: its characteristics are simple to extract, and it typically appears in clear spaces in the image. For tree species that overlap with other trees, the detection effects show varying strengths and weaknesses. Generally speaking, tree species with wider crowns are detected better.

3.3. Model Performance Analysis

In deep learning target detection, the loss function serves as a metric that reflects the error between the final prediction result of the model and the actual true values. It provides direct insights into the quality of the training process, the convergence of the model, and the potential occurrence of overfitting [33]. The model loss values in this study are depicted in Figure 11. After 30 epochs, the model’s loss function gradually converges and stabilizes. There is a sudden drop around 220 epochs, after which the loss stabilizes again. The final convergence of the model’s loss value is at 0.125. This observed phenomenon may be attributed to the settings of the model’s learning rate and batch size. It is essential to carefully tune these hyperparameters to achieve optimal training performance and avoid issues like abrupt drops or plateaus in the loss function. Adjustments in these settings could potentially enhance the model’s convergence and stability during training.

4. Discussion

Currently, deep learning in tree species identification primarily focuses on the research of single or dominant tree species, with relatively few studies addressing the identification of individual tree species within mixed forests. Nevertheless, the target detection algorithm model has exhibited commendable accuracy and speed in detecting individual targets within wooded areas.
In this study, we used datasets manually annotated on UAV remote sensing images and applied data enhancement methods such as Mosaic and Mixup to improve the richness of the dataset. We conducted comparative experiments on the YOLO v7 network with different band combinations as input, and we finally selected the combination of the red, green, and blue bands to achieve the optimal single tree species detection accuracy. Using the YOLO v7 network parameters trained with this band combination, we propose a fast and efficient single tree species identification and classification method. Compared with traditional algorithms, this method performs better in both speed and accuracy.
In comparison to related studies, Qin et al. [34] applied the watershed algorithm to identify individual trees in subtropical broad-leaved forests, achieving overall accuracy of 72.8% using only RGB images. However, our method excels in mixed forest environments, attaining not only higher recognition accuracy but also mitigating the over-segmentation problem often associated with watershed algorithms.
Furthermore, the choice of the deep learning network significantly influences the classification accuracy. The network model that we employed is widely recognized as an excellent choice for target detection. Zhang et al. [35] conducted a comparison between the k-nearest neighbor (KNN) algorithm and the BP neural network, affirming the relatively high accuracy of convolutional neural networks (CNN) and providing effective support for this assertion. In the study conducted by Choi [36], which focused on detecting trees along streets, the YOLO v3 model was trained on 5480 images for up to one million iterations, achieving precision of 0.727 and recall of 0.634. In addition, the results obtained by training the YOLO v4 network on our research dataset show that the YOLO v7 model has better recognition accuracy and better detection results in identifying single trees and their types in images.
Regarding the selection of feature extraction bands for deep learning in single tree detection, our study ultimately identified the combination of the red, green, and blue bands as the optimal choice. However, Xi [25], comparing various band combinations on the YOLO v3 model (such as green and blue; near-infrared, red, and green; and blue, red, and near-infrared) for urban single tree crown detection, found that the near-infrared, red, and green bands showed the most effective detection results. The variance in outcomes may be attributed to differences in tree species within distinct study areas and variations in the reflective properties of different trees across different wavelength bands.

5. Conclusions

In this study, we endeavored to utilize the YOLO v7 deep learning model to identify tree species in power transmission line corridors through drone aerial photography. The key findings are summarized as follows.
  • We introduced a single tree species detection method based on the YOLO v7 model. The model exhibited average accuracy of 75.77% in the tropical tree species research area of Hainan Province, with a measured frames per second (FPS) value of 3.39 on the GPU. This model proves effective in rapidly and accurately detecting single tree species in small areas, significantly reducing the manual workload.
  • Considering the characteristics of the YOLO network, we compared the performance of YOLO v7 in single tree species identification under different band combinations. Among the various combinations, including the red light band, green light band, blue light band, near-infrared band, and combinations of these bands, the red, green, and blue band combination demonstrated the most effective single tree species identification and segmentation.
  • The YOLO v7 model used in this study exhibited improved average accuracy compared to the YOLO v4-Mobilenet and YOLO v5 models. Additionally, the GPU detection speed was faster, showcasing its superiority in classifying single tree species.
However, certain limitations were noted in the research process. The spatial resolution of the aerial images was low, and jagged edges in the images could impact the recognition accuracy. The tree species in the study area consisted primarily of broad-leaved trees, so further research is needed to extend the methodology to tree species identification in larger areas. Concerning single tree segmentation, in regions with a high canopy density, the boundary segmentation of individual trees may not be clear, indicating the need for further model refinement in this scenario. Future research will consider incorporating an attention mechanism into the model or combining it with other auxiliary data to enhance the information content and improve the feature extraction.

Author Contributions

Conceptualization, S.X. and R.W.; methodology, S.X.; software, S.X. and X.W.; validation, S.X. and W.S.; writing—original draft preparation, S.X.; writing—review and editing, S.X. and W.S.; visualization, S.X. and X.W.; supervision, S.X. and X.W.; project administration, R.W.; funding acquisition, R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 41971376.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

Author Wei Shi is employed by the company Beijing Ocean Forestry Technology Co., Ltd. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. The Beijing Ocean Forestry Technology Co., Ltd. had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Chen, Y.; Lin, J.; Liao, X. Early detection of tree encroachment in high voltage powerline corridor using growth model and UAV-borne LiDAR. Int. J. Appl. Earth Obs. Geoinf. 2022, 108, 102740.
  2. Mu, C. Study on Power Lines Corridor Features Extraction Method from Different Remote Sensing Data; Wuhan University: Wuhan, China, 2010.
  3. Ruan, J.; Tao, X.J.; Wei, X.K. 3D Modeling and Tree Barrier Analysis of Transmission Lines Based on LiDAR Point Cloud Data of Fixed Wing UAV. South Energy Const. 2019, 6, 114–118.
  4. Zhang, R.J. Risk assessment of power transmission corridors in forestry area based on multi-source data. Acta Geod. Cartogr. Sin. 2022, 51, 78.
  5. Wang, R.; Li, W.; Shi, W.; Su, T. Tree Species Classification of Power Line Corridor Based on Multi-source Remote Sensing Data. Trans. Chin. Soc. Agric. Mach. 2021, 52, 2226–2233.
  6. Kipli, K.; Osman, S.; Joseph, A.; Zen, H.; Awang Salleh, D.N.S.D.; Lit, A.; Chin, K.L. Deep learning applications for oil palm tree detection and counting. Smart Agric. Technol. 2023, 5, 100241.
  7. Wang, Z.-W.; Sun, J.J.; Yu, Z.-Y.; Bu, Y.-Y. Review of Remote Sensing Image Classification Based on Support Vector Machine. Comput. Sci. 2016, 43, 11–17, 31.
  8. Wang, Q. Research on Classification of Coniferous Tree Species of Airborne Hyperspectral Images Based on Convolutional Neural Network; Northeast Forestry University: Harbin, China, 2022.
  9. Liao, W.; Van Coillie, F.; Gao, L.; Li, L.; Zhang, B.; Chanussot, J. Deep Learning for Fusion of APEX Hyperspectral and Full-Waveform LiDAR Remote Sensing Data for Tree Species Mapping. IEEE Access 2018, 6, 68716–68729.
  10. Hakula, A.; Ruoppa, L.; Lehtomäki, M.; Yu, X.; Kukko, A.; Kaartinen, H.; Taher, J.; Matikainen, L.; Hyyppä, E.; Luoma, V.; et al. Individual tree segmentation and species classification using high-density close-range multispectral laser scanning data. ISPRS Open J. Photogramm. Remote Sens. 2023, 9, 100039.
  11. Liu, M.; Han, Z.; Chen, Y.; Liu, Z.; Han, Y. Tree species classification of airborne LiDAR data based on 3D deep learning. J. Natl. Univ. Def. Technol. 2022, 44, 123–130.
  12. Hao, Z.; Lin, L.; Post, C.J.; Mikhailova, E.A.; Li, M.; Chen, Y.; Yu, K.; Liu, J. Automated tree-crown and height detection in a young forest plantation using mask region-based convolutional neural network (Mask R-CNN). ISPRS J. Photogramm. Remote Sens. 2021, 178, 112–123.
  13. Weinstein, B.G.; Marconi, S.; Bohlman, S.; Zare, A.; White, E. Individual Tree-Crown Detection in RGB Imagery Using Semi-Supervised Deep Learning Neural Networks. Remote Sens. 2019, 11, 1309.
  14. Liu, M.; Han, Z.; Chen, Y.; Liu, Z.; Han, Y. Tree species classification of LiDAR data based on 3D deep learning. Measurement 2021, 177, 109301.
  15. Wang, J.; Chen, X.; Cao, L.; An, F.; Chen, B.; Xue, L.; Yun, T. Individual Rubber Tree Segmentation Based on Ground-Based LiDAR Data and Faster R-CNN of Deep Learning. Forests 2019, 10, 793.
  16. Yu, H.; Tan, B.X.; Shen, M.T. Research on identification of dominant tree species using airborne hyperspectral image based on machine learning algorithm. Remote Sens. Nat. Resour. 2023, 1–10.
  17. Cui, B.; Dong, W.; Yin, B.; Li, X.; Cui, J. Hyperspectral image rolling guidance recursive filtering and classification. J. Remote Sens. 2019, 23, 431–442.
  18. Osco, L.P.; Arruda, M.d.S.d.; Marcato Junior, J.; da Silva, N.B.; Ramos, A.P.M.; Moryia, É.A.S.; Imai, N.N.; Pereira, D.R.; Creste, J.E.; Matsubara, E.T.; et al. A convolutional neural network approach for counting and geolocating citrus-trees in UAV multispectral imagery. ISPRS J. Photogramm. Remote Sens. 2020, 160, 97–106.
  19. Ampatzidis, Y.; Partel, V. UAV-Based High Throughput Phenotyping in Citrus Utilizing Multispectral Imaging and Artificial Intelligence. Remote Sens. 2019, 11, 410.
  20. Redmon, J.; Divvala, S.; Girshick, R.; Farhadi, A. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 779–788.
  21. Wang, C.-Y.; Bochkovskiy, A.; Liao, H.-Y.M. YOLOv7: Trainable bag-of-freebies sets new state-of-the-art for real-time object detectors. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 7464–7475.
  22. Lin, W.S.; Zhang, J.; He, N. Real-time Detection Method of Dendrolimus superans-infested Larix gmelinii Trees Based on Improved YOLO v4. Trans. Chin. Soc. Agric. Mach. 2023, 54, 304–312, 393.
  23. Jin, Y.H. Dead Tree Information Detection Based on Convolution Neural Network; University of Science and Technology Liaoning: Anshan, China, 2022.
  24. Kuai, Y. Research on UAV Remote Sensing Vegetation Recognition Method Based on Deep Learning; Anhui University: Hefei, China, 2022.
  25. Xi, X.; Xia, K.; Yang, Y.; Du, X.; Feng, H. Urban individual tree crown detection research using multispectral image dimensionality reduction with deep learning. J. Remote Sens. 2022, 26, 711–721.
  26. Zhu, W.; He, Y.; Chen, J.; Ren, W.; Sun, Y. Marine Organism Detection Algorithm Based on Improved YOLOv5. Comput. Digit. Eng. 2022, 50, 1631–1636.
  27. Wang, Y.Y. Research on Remote Sensing Image Target Detection Algorithm Based on Deep Learning; Hebei University of Economics and Business: Shijiazhuang, China, 2023.
  28. Zhang, Z.; He, T.; Zhang, H.; Zhang, Z.; Xie, J.; Li, M. Bag of freebies for training object detection neural networks. arXiv 2019, arXiv:1902.04103.
  29. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
  30. Wu, Z.; Chen, M. Lightweight detection method for microalgae based on improved YOLO v7. J. Dalian Fish. Univ. 2023, 38, 129–139.
  31. Su, P.C. Research on Dim and Small Target Recognition Technology in Earth Background; Xi'an Technological University: Xi'an, China, 2023.
  32. Hou, R.H.; Yang, X.W.; Wang, Z.C.; Gao, J. A Real-Time Detection Method for Forestry Pests Based on YOLOv4-TIA. Comput. Eng. 2022, 48, 255–261.
  33. Ma, Y.K.; Liu, H.; Ling, C.X.; Zhao, F.; Zhang, Y. Object Detection of Individual Mangrove Based on Improved YOLOv5. Laser Optoelectron. Prog. 2022, 59, 436–446.
  34. Qin, H.; Zhou, W.; Yao, Y.; Wang, W. Individual tree segmentation and tree species classification in subtropical broadleaf forests using UAV-based LiDAR, hyperspectral, and ultrahigh-resolution RGB data. Remote Sens. Environ. 2022, 280, 113143.
  35. Zhang, C.; Xia, K.; Feng, H.; Yang, Y.; Du, X. Tree species classification using deep learning and RGB optical images obtained by an unmanned aerial vehicle. J. For. Res. 2020, 32, 1879–1888.
  36. Choi, K.; Lim, W.; Chang, B.; Jeong, J.; Kim, I.; Park, C.-R.; Ko, D.W. An automatic approach for tree species detection and profile estimation of urban street trees using deep learning and Google street view images. ISPRS J. Photogramm. Remote Sens. 2022, 190, 165–180.
Figure 1. Location of the study area and its true color image.
Figure 2. UAV remote sensing image of some study areas (the imagery is composed of the red band, green band, and blue band).
Figure 3. Research flow chart.
Figure 4. Three different band combinations.
Figure 5. Mosaic method example (four random pictures are cropped and combined into one picture, and the length and width of the cropping position can be changed randomly).
Figure 6. Mixup method example (interpolate the two images proportionally to mix the samples).
Figure 7. Overview of YOLOv7 network structure. From left to right, it can be roughly divided into the backbone, FPN, and YOLO Head detectors.
Figure 8. Structure of ELAN_Block.
Figure 9. Structure of SPPCSPC.
Figure 10. Single tree identification results. The box represents the single tree detection result, and the upper left corner indicates its confidence level. The dark blue box represents coconut trees, the orange box represents betel nuts, the green box represents jackfruit, and the sky-blue box represents rubber trees.
Figure 11. Model loss curve.
Table 1. Sensor bands and parameters.
Band Name | Center Wavelength | Bandwidth | Band Value Range
Blue | 450 nm | 35 nm | 0–7350
Green | 555 nm | 27 nm | 0–10,420
Red | 660 nm | 22 nm | 0–8737
NIR1 | 720 nm | 10 nm | 0–6843
NIR2 | 750 nm | 10 nm | 0–12,454
NIR3 | 840 nm | 30 nm | 0–10,260
Table 2. Tree species numbers and label quantities in the study area.
Number | Tree Species | Number of Labels
1 | Betel Nut | 9591
2 | Jackfruit Trees | 4688
3 | Neem Trees | 1113
4 | Banyan Trees | 2336
5 | Rubber Trees | 2195
6 | Coconut Trees | 290
Table 3. Dataset sample description.
Category | Value
Number of Training Set Images | 1509
Number of Images in the Verification Set | 168
Number of Test Set Images | 187
Total Number of Dataset Labels | 22,790
Table 4. Comparison of single tree species detection accuracy between YOLO v4 and YOLO v7.
Model | Highest Recognition Accuracy for a Single Tree Species | Average Accuracy
YOLO v4 | 33.27% | 29.43%
YOLO v7 | 85.42% | 75.77%
Table 5. Comparison of detection accuracy of single tree species under different band combinations.
Band Combination | mAP
Red, Green, Blue | 75.77%
NIR, Red, Green | 36.74%
NIR, Green, Blue | 34.66%
Table 6. Comparison of single tree species detection accuracy indicators of different tree species under YOLO v7 model detection.
Class | AP | F1 | Recall | Precision
Betel Nut | 85.42% | 0.84 | 76.58% | 92.92%
Jackfruit Trees | 61.28% | 0.58 | 47.75% | 74.56%
Banyan Trees | 54.00% | 0.52 | 35.77% | 98.00%
Neem Trees | 56.84% | 0.46 | 33.00% | 78.57%
Rubber Trees | 50.68% | 0.55 | 36.41% | 66.67%
Coconut Trees | 63.27% | 0.60 | 42.86% | 100%