1. Introduction
Countries are accelerating the construction of power line networks to keep pace with economic development and meet the demand for electricity. As the voltage of transmission lines continues to increase, so do their length and coverage, making it increasingly important to ensure the safe, stable, and efficient operation of the transmission line network. At the same time, transmission lines inevitably pass through forested areas [1], where trees near the corridors can interfere with the lines and cause accidents. Power lines can short-circuit on contact with trees, causing power outages and, in severe cases, fires [2,3]. To avoid the adverse impacts of forest areas on the transmission system, efficiently obtaining tree species information in and around the transmission line corridor plays an important role in eliminating potential safety hazards [4,5]. The management of trees in power line corridors is critical to ensuring the uninterrupted flow of electricity and safeguarding communities from potential hazards. Comprehensive vegetation management programs can minimize the risk of outages, prevent damage to power lines, and promote a thriving wildlife habitat within transmission corridors.
With the rapid advancement of high-resolution satellites and drones, the use of remote sensing images for automated tree identification and counting has become increasingly prevalent [6]. Among traditional recognition algorithms, support vector machines and random forests are widely employed because of their relatively high recognition accuracy. However, ongoing research indicates that their accuracy and speed have largely plateaued, leaving little room for further improvement [7]. The recent rapid progress in deep learning has significantly improved detection accuracy and computing speed, particularly through the use of convolutional neural networks in image detection. This improvement can be attributed to the automatic learning and feature extraction inherent in deep learning techniques [8]. In tree species classification, deep learning consistently produces classification results that outperform those of other commonly used classifiers, such as support vector machines and random forests [9]. Hakula et al. [10] employed a drone equipped with a multispectral laser scanning system to scan a forest. They applied layer-by-layer segmentation to dense point cloud data to identify and separate individual trees, and then calculated features to help identify different tree species. They categorized co-dominant and dominant tree species, achieving a classification accuracy between 92% and 93%. Liu [11] and his collaborators adopted a novel approach by directly abstracting high-dimensional features from three-dimensional data, bypassing the conventional step of converting point clouds into voxels or two-dimensional images. Their network combined multi-layer perceptrons, max pooling, fully connected layers, and shared weights. The resulting deep neural network, which incorporated a softmax classifier, automatically extracted high-dimensional features from trees and performed tree species classification. Hao [12] and his team pioneered the exploration of fir tree detection in artificial forests through the application of the Mask R-CNN network. Their findings affirm the substantial potential of Mask R-CNN in enhancing the accuracy and efficiency of remote sensing for forest resource surveys. Weinstein et al. [13] established a semi-supervised deep learning model using LiDAR point cloud data. By supplementing the tree species labeling data generated by an unsupervised algorithm with a small amount of manually annotated data, they achieved an average tree detection rate of 82% in the dataset. This outcome suggests that deep learning can significantly improve detection results and accuracy. Liu et al. [14] introduced LayerNet, a point-based deep neural network designed to extract local 3D structural features from LiDAR data. By aggregating features from all layers and using convolution to obtain global features, they successfully classified tree species, achieving a highest classification accuracy of 92.5%. Wang et al. [15] focused on LiDAR data, converting the frontal and lateral projections of point clouds into depth images. They trained a Faster R-CNN network to locate tree trunks for single tree segmentation, achieving an accuracy exceeding 90%, particularly for overlapping tree trunks. In a different study, Yu et al. [16] employed three machine learning classification algorithms, namely a neural network, a three-dimensional convolutional neural network (3DCNN), and a support vector machine, to identify and compare dominant forest tree species in airborne hyperspectral images. The results demonstrated that 3DCNN exhibited the highest classification accuracy among the three algorithms.
Different tree species reflect light differently, and multispectral data frequently yield more information than a single band [17]. Osco et al. [18] applied deep learning to the detection of individual tree crowns. In their study, they examined different band combinations using convolutional neural networks to analyze fruit trees in orchards. The research revealed that a combination of the green, red, and near-infrared bands exhibited excellent performance. Yiannis et al. [19] replaced the green band of RGB imagery with the near-infrared band, exploiting the distinct reflection characteristics of plants in the infrared spectrum. This adjustment improved recognition accuracy, underscoring the value of different spectral bands for tree species identification.
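The band substitution described above can be sketched in a few lines. This is only an illustration under stated assumptions, not code from any of the cited studies: the band order (blue, green, red, near-infrared), the synthetic image, and the helper name `make_composite` are all hypothetical.

```python
import numpy as np

# Assumed band order of a hypothetical multispectral image (not from the source).
BAND_INDEX = {"blue": 0, "green": 1, "red": 2, "nir": 3}

def make_composite(image, bands):
    """Stack the named bands of an (H, W, C) array into an (H, W, len(bands)) array."""
    idx = [BAND_INDEX[name] for name in bands]
    return image[:, :, idx]

# Synthetic 2 x 2 image; every pixel holds the values (blue, green, red, nir).
image = np.tile(np.array([10, 20, 30, 40], dtype=np.uint8), (2, 2, 1))

# Standard three-channel input in red, green, blue order.
rgb = make_composite(image, ["red", "green", "blue"])
# Same layout, but the green channel is replaced by the near-infrared band.
false_color = make_composite(image, ["red", "nir", "blue"])
```

Because the composite keeps three channels, it could in principle be fed to a detector that expects RGB input without changing the network's first layer.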
By 2022, the YOLO network model had reached its seventh iteration. As a typical one-stage detector, it is best known for its speed, portability, and versatility [20]. YOLO v7, the algorithm used in this study, was the most advanced in the YOLO series at the time of this work. The YOLO v7 model is reported to outperform other target detection models in both speed and accuracy in the range of 5 to 160 FPS [21].
Lin [22] and his team used an improved YOLO v4 network to detect larch caterpillar damage to trees, achieving an accuracy of 97.5%. This accuracy closely rivals that of the mainstream Faster R-CNN network, and the detection speed significantly outpaces that of the original two-stage convolutional neural network. In a related context, Jin [23] incorporated an attention mechanism into the YOLO v4-tiny network to detect dead trees, achieving an accuracy of 93.36%, an increase of 9.69% over the original network. Although little research has so far used the YOLO network to classify tree species in forest areas, its demonstrated accuracy and efficiency in tree identification suggest that it is viable for species classification.
Presently, the majority of research on tree species classification centers on identifying single tree species or individual species within forest stands. There is a relative scarcity of studies addressing the identification of complex forest structures and mixed forests with multiple tree species. Similarly, research classifying single tree species in transmission line corridors is limited. To address these gaps, this paper employs UAV multispectral remote sensing images to initially extract single tree crowns. Subsequently, tree species are labeled to create a dataset, which is then input into the YOLO v7 network model for parameter learning. The model is trained to discern the distinctive characteristics of single trees in transmission line corridors, ultimately outputting information on the identified single tree species.
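To make the labeling step concrete, the sketch below converts one annotated crown into a label line of the kind commonly used by YOLO-family training code: class index followed by the box center, width, and height, all normalized by the image size. The source does not state which annotation format was used, so this format, the function name `to_yolo_label`, and the example class index and box are assumptions for illustration only.

```python
def to_yolo_label(class_id, box, image_size):
    """Convert a pixel box (x_min, y_min, x_max, y_max) to a YOLO-style label line."""
    x_min, y_min, x_max, y_max = box
    width, height = image_size
    # Box center and size, each normalized to the range [0, 1].
    x_center = (x_min + x_max) / 2 / width
    y_center = (y_min + y_max) / 2 / height
    box_w = (x_max - x_min) / width
    box_h = (y_max - y_min) / height
    return f"{class_id} {x_center:.6f} {y_center:.6f} {box_w:.6f} {box_h:.6f}"

# Hypothetical crown of species class 2 in a 1000 x 1000 pixel image tile.
line = to_yolo_label(2, (100, 200, 300, 400), (1000, 1000))
print(line)  # 2 0.200000 0.300000 0.200000 0.200000
```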
4. Discussion
Currently, deep learning in tree species identification primarily focuses on the research of single or dominant tree species, with relatively few studies addressing the identification of individual tree species within mixed forests. Nevertheless, the target detection algorithm model has exhibited commendable accuracy and speed in detecting individual targets within wooded areas.
In this study, we used datasets manually annotated on UAV remote sensing images and applied data augmentation methods such as Mosaic and Mixup to enrich the dataset. We compared different band combinations as inputs to the YOLO v7 network and selected the combination of the red, green, and blue bands, which achieved the best single tree species detection accuracy. Using the YOLO v7 network parameters trained with this band combination, we propose a fast and efficient single tree species identification and classification method. Compared with traditional algorithms, this method performs better in both speed and accuracy.
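As a rough illustration of the Mixup augmentation mentioned above (Mosaic is not covered here), the sketch below blends two equally sized images with a weight drawn from a Beta distribution and keeps the boxes of both. It is not the implementation used in this study: the function name `mixup`, the `alpha` value, and the toy images and boxes are assumptions.

```python
import numpy as np

def mixup(image_a, boxes_a, image_b, boxes_b, alpha=8.0, rng=None):
    """Blend two equally sized images and keep the bounding boxes of both."""
    if rng is None:
        rng = np.random.default_rng()
    # Mixing weight in (0, 1); alpha is an illustrative choice, not from the source.
    lam = rng.beta(alpha, alpha)
    mixed = lam * image_a.astype(np.float32) + (1.0 - lam) * image_b.astype(np.float32)
    # For detection, the labels of both source images are retained.
    boxes = list(boxes_a) + list(boxes_b)
    return mixed, boxes, lam

# Toy example: a black tile and a uniform gray tile, one labeled box each.
image_a = np.zeros((4, 4, 3), dtype=np.uint8)
image_b = np.full((4, 4, 3), 100, dtype=np.uint8)
boxes_a = [(0, 0.5, 0.5, 0.2, 0.2)]
boxes_b = [(1, 0.3, 0.3, 0.1, 0.1)]

mixed, boxes, lam = mixup(image_a, boxes_a, image_b, boxes_b,
                          rng=np.random.default_rng(0))
```

Passing a seeded generator, as in the example, makes the augmentation reproducible when comparing training runs.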
Among related studies, Qin et al. [34] applied the watershed algorithm to identify individual trees in subtropical broad-leaved forests, achieving an overall accuracy of 72.8% using only RGB images. By contrast, our method performs well in mixed forest environments, attaining higher recognition accuracy and mitigating the over-segmentation problem often associated with watershed algorithms.
Furthermore, the choice of deep learning network significantly influences classification accuracy. The network model that we employed is widely recognized as an excellent choice for target detection. Zhang et al. [35] conducted a comparison involving the k-nearest neighbor (KNN) algorithm and the BP neural network, which affirmed the relatively high accuracy of convolutional neural networks (CNN) and supports this assertion. In the study by Choi [36], which focused on tree detection along streets, the YOLO v3 model was trained on 5480 images for up to one million iterations. The precision and recall achieved were 0.727 and 0.634, respectively. In addition, we also trained the YOLO v4 network on the dataset of this study, and the comparison shows that the YOLO v7 model achieves better recognition accuracy and detection results in identifying single trees and their species in images.
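For readers comparing detectors, precision and recall are often summarized by their harmonic mean, the F1 score. The source does not report F1, so the sketch below only shows the arithmetic applied to the precision and recall quoted above for the YOLO v3 street-tree study; the function names and the detection counts in the second example are hypothetical.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true positive, false positive, and false negative counts."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return precision, recall

def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

# Values quoted in the text for the YOLO v3 street-tree study.
f1_quoted = f1_score(0.727, 0.634)
print(round(f1_quoted, 3))  # 0.677

# Hypothetical counts, only to show how the two metrics are obtained.
p, r = precision_recall(tp=80, fp=20, fn=20)
```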
Regarding the selection of input bands for deep learning in single tree detection, our study ultimately identified the combination of the red, green, and blue bands as the optimal choice. By contrast, Xi [25] compared various band combinations on the YOLO v3 model, such as green and blue; near-infrared, red, and green; and blue, red, and near-infrared. That study found that these combinations exhibited the best detection accuracy for urban single tree crowns, with the near-infrared, red, and green bands giving the most effective detection results. The difference in outcomes may be attributed to differences in tree species between study areas and to variations in the reflective properties of different trees across wavelength bands.