1. Introduction
Forest resource inventory is an essential component of forestry, reflecting the quantity, quality, and dynamic changes of forest resources. Identifying the tree species present in a survey area is a primary goal of forest resource inventory. Surveyors typically identify species visually from external characteristics such as roots, stems, leaves, flowers, and fruits; for trees that cannot be identified visually, they usually collect specimens and consult reference materials. This manual process requires solid expertise in dendrology and is often time consuming, costly, and inefficient. Machine learning can be applied to tree species identification to improve the efficiency of forest resource inventory. In machine learning and cognitive science, neural networks are models that simulate the structures and functions of biological neural networks [
1,
2]. After training, deep neural networks can automatically learn to extract features from large-scale, diverse, high-dimensional complex data and perform efficient classification, prediction, and pattern recognition [
3].
Early studies of tree species identification typically applied artificial neural networks or support vector machines to the hyperspectral features of trees, and several other methods used texture feature extraction and descriptors to assist tree classification [
4,
5,
6,
7,
8,
9,
10,
11,
12,
13]. Although these methods reduced some of the manual cost of classification, traditional machine learning approaches required hand-designed or hand-selected texture features and descriptors. Such features are subjective, often fail to fully capture the complexity and diversity of textures, and lose information during post-processing operations such as dimensionality reduction [
14,
15,
16]. Therefore, the efficiency of traditional machine learning methods needs to be improved.
Deep learning is a machine learning approach that uses deep neural networks to extract features and automatically learn data representations. Compared to traditional machine learning methods, deep learning can better discover and extract the feature information in datasets and thus has a higher generalization ability. As deep learning algorithms developed, network depth increased, making convolutional neural networks and multi-feature recognition networks increasingly applicable to species classification and identification based on forest images [
17,
18,
19,
20]. For tree species identification, scholars who use deep learning models usually focus on leaves, flowers, fruits, or tree shapes. The classifier of the network usually recognizes the entire image as a whole [
21], so mixed background content can easily confound the network's learned representation. Tree organs such as leaves and flowers are often difficult to separate from the background (noise) during image acquisition, and it is challenging to obtain precise images of leaves and flowers on tall trees. In addition, flowers and fruits are only present at certain times of the year, while leaf information is unavailable during defoliation. Consequently, species identification from images of tree organs faces many challenges in forest resource inventory. The morphological characteristics of the bark, the outermost layer of the stems and roots of woody plants, are important features for distinguishing tree species. Tree bark has several advantages over leaves as an identifier: most bark shapes are stable unless subjected to irreversible disasters (e.g., forest fires), and bark textures change little, if at all, with the seasons [
11].
Few existing studies have applied deep learning algorithms to tree species identification based on bark texture features. Most of these studies used ResNet [
22] as the backbone and adopted deeper convolutional layers to achieve higher accuracy in bark texture recognition. Carpentier et al. released BarkNet 1.0, the largest publicly available bark dataset, and achieved a high tree species classification accuracy of 93.88% using ResNet18 and ResNet34 [
23]. Misra et al. implemented an alternative classification approach using patch-based convolutional neural networks, fine-tuning the network's patch predictions and determining the image category via majority voting with an ensemble-based classifier [
24]. Robert et al. developed DeepBark, a model capable of detecting bark surfaces under high background brightness [
25]. Faizal achieved promising results on BarkVN-50 using a deeper network called ResNet101 [
26]. Kim et al. trained VGG-16 and EfficientNet, obtained identification accuracy above 90%, and applied class activation mapping (CAM) aggregation to identify the critical classification features of each tree species [
27]. Convolutional neural network image recognition therefore has great practical value for bark-based identification: it can identify tree species quickly and accurately, making forest resource surveys more intelligent and efficient. Using this technique, forestry workers can collect bark images on-site and upload them to a server for identification. With optimized data collection and image processing, such a system can meet the efficiency and accuracy requirements of forest survey fieldwork teams.
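The patch-based majority-voting scheme of Misra et al. can be sketched as follows; `classify_patch` is a hypothetical stand-in for their patch-level CNN classifier, and any callable returning a class label will do here:

```python
from collections import Counter

def predict_image(patches, classify_patch):
    """Image-level prediction by majority vote over patch predictions.

    `classify_patch` is a hypothetical stand-in for a patch-level CNN;
    the returned share is the fraction of patches voting for the winner.
    """
    votes = Counter(classify_patch(p) for p in patches)
    label, count = votes.most_common(1)[0]
    return label, count / len(patches)
```

For example, with a dummy classifier that simply echoes precomputed patch labels, `predict_image(["BOJ", "BOJ", "BOP"], lambda p: p)` returns `"BOJ"` with a vote share of 2/3.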
Most existing studies on tree species identification use pre-trained weights of networks that were trained on ImageNet, rather than using bark images as the pre-training data [
22,
23,
24,
25,
26,
27]. This practice can lead to misclassification and performance degradation. Moreover, many researchers rely on relatively dated networks for tree species image classification [
22,
23,
24,
25,
26], such as ResNet, whose performance is inferior to that of algorithms developed in recent years. Furthermore, because the identification process of a deep convolutional neural network is difficult to demonstrate on images, only a few studies have combined network visualization with the biological features of tree species, and the basis of the networks' classification results therefore remains unexplained.
In this paper, we pre-train three ConvNeXt networks of different depths on the bark dataset. Our three research objectives are as follows: (1) to compare the performance of these ConvNeXt networks; (2) to analyze the biological features underlying the differences in classification accuracy between tree species; and (3) to explore the relationships between the regions of visual attention and the biological features of bark images of different tree species.
4. Discussion
Our experiments show that the ConvNeXt network can accurately identify 33 tree species, achieving an average accuracy of 97.61% on the test set using bark images from the BarkNetV2 dataset. The ConvNeXt network used in our experiments outperformed the models applied to the BarkNet dataset in previous studies.
Transfer learning is a technique in which a model trained for one task is adapted to another, related task. Its advantage is that it saves time and computational resources by leveraging the knowledge contained in a large dataset (such as ImageNet) for a smaller dataset (such as CIFAR-10). Although network weights obtained from ImageNet are broadly applicable, network performance may fall short of expectations for specific identification tasks. Taxonomically similar trees often show a high degree of similarity in appearance; thus, as our experimental results demonstrate, it is difficult to distinguish closely related species from bark images alone, and the network frequently fails to separate tree species with similar identification features or low identification confidence. It is also worth noting that trees of the same species pass through several phases of change during growth. Although bark changes far less than other tree organs, some morphological changes still occur, such as plate shedding, groove deepening, and plate-scale thickening. Therefore, if the network attends too strongly to local features of bark images instead of capturing common biological patterns at the species level, it may overfit, reducing its overall identification performance. For such identification tasks, pre-training on bark images incorporates similar features from different categories more effectively than transfer learning from ImageNet. Compared to the results of Carpentier et al. and Kim et al., who used ImageNet pre-trained weights, the pre-trained weights obtained through our training on BarkNetV2 yield better identification capability [
23,
27].
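As a minimal illustration of the transfer-learning setup discussed above, the sketch below freezes a stand-in "backbone" feature map (whose weights we pretend were learned during pre-training) and trains only a new classification head on a toy task. All arrays, dimensions, and labels are invented for illustration; they are not our networks or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "backbone": a frozen feature map whose weights we pretend
# were learned during pre-training (on ImageNet, or on bark images).
W_backbone = rng.normal(size=(16, 8))

def features(x):
    """Frozen backbone; only the head below is updated when fine-tuning."""
    return np.tanh(x @ W_backbone)

# Toy downstream task whose labels are, by construction, linearly
# separable in the pre-trained feature space (an idealization).
X = rng.normal(size=(200, 16))
v_true = rng.normal(size=8)
y = (features(X) @ v_true > 0).astype(float)

# Transfer learning: train only a new logistic-regression head.
w_head = np.zeros(8)
lr = 0.5
for _ in range(300):
    z = features(X)
    p = 1.0 / (1.0 + np.exp(-(z @ w_head)))   # sigmoid
    w_head -= lr * z.T @ (p - y) / len(y)     # logistic-loss gradient

acc = float(np.mean((features(X) @ w_head > 0) == (y == 1)))
```

Because the frozen features already encode the task-relevant structure, the small head suffices; when they do not (the ImageNet-to-bark mismatch discussed above), re-training or re-pre-training the backbone becomes necessary.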
During training, the accuracy and loss curves of the three networks were broadly similar. Although ConvNeXt-B has more parameters, ConvNeXt-S performed better on the test set, possibly because an excess of parameters can reduce a network's generalization ability and impair its predictive performance. The underlying reason is that a network with more parameters may overfit the noise in the training data rather than capture the underlying signal, producing a model that is highly accurate on the training set but performs poorly on new data. Although the overall identification accuracy is considerably high, the confusion matrix shows that some taxonomically similar tree species remain difficult to distinguish, and the pre-trained network often misidentifies species within the same family or genus. For example, BOJ (Betula alleghaniensis) is often confused with BOP (Betula papyrifera), while ERB (Acer platanoides), ERS (Acer saccharum), and ERR (Acer rubrum) are repeatedly misidentified as one another; EPO (Picea abies) and EPR (Picea rubens) are likewise often difficult to distinguish. Identifying taxonomically similar trees is a challenging task for convolutional neural networks (CNNs): the high degree of similarity between such species often leads to misidentification, a critical issue in image recognition tasks. When a CNN is pre-trained on one category and tested on a closely related one, such as different species of plants or animals, misidentifications are likely to occur because of the inherent variation in features and patterns.
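Confusions of the kind described above are read directly off the confusion matrix: diagonal entries count correct identifications per species, and off-diagonal mass reveals confusable pairs. A minimal sketch (the labels below are invented toy data, not our experimental results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true species; columns index the predicted species."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example with three classes standing in for species codes
# (0 = "BOJ", 1 = "BOP", 2 = "ERS"); purely illustrative.
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 0, 1, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)

# Per-class recall: how often each true species is recognized.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
```

Here classes 0 and 1 confuse each other (off-diagonal counts in both directions), while class 2 is always recognized, mirroring the Betula/Acer confusions versus well-separated species in our results.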
It is also worth noting that, according to the confusion matrix from our experiment, the pre-trained network rarely misidentifies the tree species in BarkNJ, and its overall identification accuracy and identification confidence are higher for these species than for those in BarkNet. This result arises because the bark images collected in this paper exclude the effects of lighting and shadows, and subsequent processing removed noisy images, making our BarkNJ dataset superior to BarkNet in quality.
Tree bark provides essential information about tree species and their environmental conditions. Different kinds of trees possess unique bark properties, such as smooth or rough textures, that can be used to differentiate them. For tree identification, the network is fed data on different tree species, including their taxonomic attributes and bark characteristics; these data train the network to recognize patterns unique to each species and to use those patterns to identify new samples. Feature extraction is an essential mechanism in CNNs that allows the network to focus on the most critical aspects of an image and ignore irrelevant or redundant features. For example, the network architecture can be designed to extract bark-related features such as texture, pattern, or thickness to distinguish tree species by their bark. However, the feature extraction process of a neural network is often called a black box: the network learns to identify and extract the features of the input data most relevant to the task, and these features can be highly abstract and difficult to interpret or visualize. During our visualization, we found that the pre-trained network selectively focused on regions of the image with distinct features, including grooves, cracks, and lenticels, closely resembling the way humans recognize tree species from bark images. The patterns revealed in our visualization experiment suggest that the network's identification mechanism is closely linked to the taxonomic attributes of tree species and the unique biological features of their bark.
Integrated Gradients, Grad-CAM, and Deep Feature Decomposition are three visualization methods that aim to explain the predictions of deep convolutional neural networks by highlighting the regions or features that contribute to the output. However, these methods also have limitations [
37,
38,
39]. For example, Integrated Gradients may fail to capture certain types of relationships between a model's input and output; Grad-CAM may produce noisy or blurry heat maps for some models; and Deep Feature Decomposition requires a pre-trained autoencoder to reconstruct the input image from the decomposed features, which may introduce reconstruction errors or artifacts. Therefore, in future research, deep network visualization techniques must be further improved to better reveal the principles and characteristics of the network's workflow.
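To make the first of these methods concrete: Integrated Gradients attributes a prediction by averaging the model's gradient along a straight path from a baseline to the input and scaling by the input-baseline difference. A minimal sketch, demonstrated on a linear model (where the exact attributions are known in closed form) rather than a real CNN:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Riemann-sum (midpoint) approximation of Integrated Gradients:
    attr_i = (x_i - baseline_i) * mean_alpha dF/dx_i at
    baseline + alpha * (x - baseline), alpha in (0, 1)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Sanity check on f(x) = w @ x, whose exact attributions with a zero
# baseline are w_i * x_i, and which satisfies the completeness axiom:
# attributions sum to f(x) - f(baseline).
w = np.array([1.0, -2.0, 0.5])
f = lambda x: w @ x
grad_f = lambda x: w            # gradient of a linear model is constant
x = np.array([2.0, 1.0, 4.0])
attr = integrated_gradients(grad_f, x, baseline=np.zeros(3))
```

For a real network, `grad_f` would be the gradient of the class score with respect to the input pixels, computed by backpropagation; the closed-form check above is only possible because the model is linear.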
Image recognition using convolutional neural networks has significant practical value for identifying tree species from bark images and can improve the intelligence and efficiency of forest resource surveys through fast, precise identification. Traditional field methods of tree species identification are inefficient, especially for tall trees whose leaves are visually difficult to differentiate, and for deciduous species considerable effort is needed to collect leaf or flower samples during fall or winter. Handheld mobile devices make bark-based field identification more convenient and simplify sampling, and combined with lightweight convolutional neural networks they can greatly improve the efficiency of field identification. The technology proposed in our study enables forest workers to collect bark images in the field and upload them to a back-end server for identification. With optimized data collection and image processing, the proposed tree species identification technology can meet field workers' needs for identification efficiency and accuracy. In addition, through the accumulation of large numbers of bark images and further network optimization, the accuracy and generalization performance of the neural network can be continuously improved, further strengthening the model's predictive ability in different environmental settings.
Traditional deep learning typically requires large amounts of training data to achieve good performance, which entails high costs and complicated data acquisition. In contrast, few-shot learning techniques exploit prior knowledge from related tasks, significantly reducing data requirements and providing a more cost-effective solution. Few-shot learning can also improve model robustness, allowing algorithms to maintain high accuracy in more complex scenarios. In future research, we will explore the application of few-shot learning to tree species identification to obtain a model with better generalization ability while reducing the amount of training data required. We will also continue to collect bark images to build a large-scale bark dataset for deep learning.
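One common few-shot approach we may adopt, nearest-prototype classification in an embedding space (in the style of prototypical networks), can be sketched as follows. Raw 2-D vectors stand in for learned bark embeddings, and all data below are invented for illustration:

```python
import numpy as np

def prototype_classify(support_x, support_y, query_x):
    """Few-shot classification by nearest class prototype: each class is
    represented by the mean of its few labeled support embeddings, and
    each query takes the label of the closest prototype."""
    classes = sorted(set(support_y))
    labels = np.array(support_y)
    protos = np.stack([support_x[labels == c].mean(axis=0) for c in classes])
    # Euclidean distance from every query to every prototype.
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=2)
    return [classes[i] for i in d.argmin(axis=1)]

# Two toy "species", three support embeddings each (2-D for clarity).
support_x = np.array([[0., 0.], [0., 1.], [1., 0.],
                      [5., 5.], [5., 6.], [6., 5.]])
support_y = [0, 0, 0, 1, 1, 1]
query_x = np.array([[0.5, 0.5], [5.5, 5.5]])
preds = prototype_classify(support_x, support_y, query_x)
```

In a full system, the embeddings would come from a network trained episodically so that species cluster tightly, allowing new tree species to be added from only a handful of labeled bark images.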