1. Introduction
Forest resource inventory is an essential component of forestry, reflecting the quantity, quality, and dynamic changes of forest resources. Identifying the tree species present in a survey area is a primary goal of forest resource inventory. Surveyors typically identify species visually from external characteristics such as roots, stems, leaves, flowers, and fruits; for trees that cannot be identified visually, they usually collect specimens and consult reference materials. This manual process requires solid expertise in dendrology and is often time consuming, costly, and inefficient. Machine learning can be applied to tree species identification to improve the efficiency of forest resource inventory. In machine learning and cognitive science, neural networks are models that simulate the structures and functions of biological neural networks [
1,
2]. After training, deep neural networks can automatically learn to extract features from large-scale, diverse, high-dimensional complex data and perform efficient classification, prediction, and pattern recognition [
3].
Early studies of tree species identification typically applied artificial neural networks or support vector machines to the hyperspectral features of trees, and several other methods used texture feature extraction and descriptors to assist tree classification [
4,
5,
6,
7,
8,
9,
10,
11,
12,
13]. Although these methods reduced some of the manual cost of classification, traditional machine learning approaches required hand-designed or hand-selected texture features and descriptors. Such features are subjective, often fail to fully capture the complexity and diversity of textures, and lose information during post-processing operations such as dimensionality reduction [
14,
15,
16]. Therefore, the efficiency of traditional machine learning methods needs to be improved.
Deep learning is a machine learning approach that uses deep neural networks to extract features and automatically learn data representations. Compared to traditional machine learning methods, deep learning can better discover and extract the feature information in datasets and thus has a higher generalization ability. As deep learning algorithms developed, network depth increased, making convolutional neural networks and multi-feature recognition networks increasingly applicable to species classification and identification based on forest images [
17,
18,
19,
20]. For tree species identification, scholars who use deep learning models usually focus on leaves, flowers, fruits, or tree shapes. The classifier of the network usually recognizes the entire image as a whole [
21], so mixed background content can easily confound the network's learned representation. Tree organs such as leaves and flowers are often difficult to separate from the background (noise) during image acquisition, and it is challenging to obtain precise images of leaves and flowers on tall trees. In addition, flowers and fruits are only present at certain times of the year, while leaf information is unavailable during defoliation. Consequently, species identification from images of tree organs faces many challenges in forest resource inventory. The morphological characteristics of the bark, the outermost layer of the stems and roots of woody plants, are important features for distinguishing tree species. Tree bark has several advantages over leaves as an identifier: most bark shapes are stable unless subjected to irreversible disasters (e.g., forest fires), and bark textures change little, if at all, with the seasons [
11].
Few existing studies have applied deep learning algorithms to tree species identification based on bark texture features. Most of these studies used ResNet [
22] as the backbone and adopted deeper convolutional layers to achieve higher accuracy in bark texture recognition. Carpentier et al. released BarkNet 1.0, the largest publicly available bark dataset, and achieved a high tree species classification accuracy of 93.88% using ResNet18 and ResNet34 [
23]. Misra et al. implemented an alternative classification approach using patch-based convolutional neural networks, fine-tuning the network's patch predictions and determining the image category via majority voting with an ensemble-based classifier [
24]. Robert et al. developed DeepBark, a model capable of detecting bark surfaces under high background brightness [
25]. Faizal achieved promising results on BarkVN-50 using a deeper network called ResNet101 [
26]. Kim et al. trained VGG-16 and EfficientNet, obtained identification accuracy above 90%, and applied class activation mapping (CAM) aggregation to identify the critical classification features of each tree species [
27]. Convolutional neural network image recognition therefore has great practical value for bark-based identification: it can identify tree species quickly and accurately, making forest resource surveys more intelligent and efficient. Using this technique, forestry workers can collect bark images on-site and upload them to a server for identification. With optimized data collection and image processing, such a system can meet the efficiency and accuracy requirements of forest survey fieldwork teams.
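The patch-based majority-voting scheme of Misra et al. can be sketched as follows; `classify_patch` is a hypothetical stand-in for their patch-level CNN classifier, and any callable returning a class label will do here:

```python
from collections import Counter

def predict_image(patches, classify_patch):
    """Image-level prediction by majority vote over patch predictions.

    `classify_patch` is a hypothetical stand-in for a patch-level CNN;
    the returned share is the fraction of patches voting for the winner.
    """
    votes = Counter(classify_patch(p) for p in patches)
    label, count = votes.most_common(1)[0]
    return label, count / len(patches)
```

For example, with a dummy classifier that simply echoes precomputed patch labels, `predict_image(["BOJ", "BOJ", "BOP"], lambda p: p)` returns `"BOJ"` with a vote share of 2/3.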
Most existing studies on tree species identification use pre-trained weights of networks that were trained on ImageNet, rather than using bark images as the pre-training data [
22,
23,
24,
25,
26,
27]. This practice can lead to misclassification and performance degradation. Moreover, many researchers rely on relatively dated networks for tree species image classification [
22,
23,
24,
25,
26], such as ResNet, whose performance is inferior to that of algorithms developed in recent years. Furthermore, because the identification process of a deep convolutional neural network is difficult to demonstrate on images, only a few studies have combined network visualization with the biological features of tree species, and the basis of the networks' classification results therefore remains unexplained.
In this paper, we pre-train three ConvNeXt networks of different depths on the bark dataset. Our three research objectives are as follows: (1) to compare the performance of these ConvNeXt networks; (2) to analyze the biological features underlying the differences in classification accuracy between tree species; and (3) to explore the relationships between the regions of visual attention and the biological features of bark images of different tree species.
4. Discussion
Our experiments show that the ConvNeXt network can accurately identify 33 tree species, achieving an average accuracy of 97.61% on the test set using bark images from the BarkNetV2 dataset. The ConvNeXt network used in our experiments outperformed the models applied to the BarkNet dataset in previous studies.
Transfer learning is a technique in which a model trained for one task is adapted to another, related task. Its advantage is that it saves time and computational resources by leveraging the knowledge contained in a large dataset (such as ImageNet) for a smaller dataset (such as CIFAR-10). Although network weights obtained from ImageNet are broadly applicable, network performance may fall short of expectations for specific identification tasks. Taxonomically similar trees often show a high degree of similarity in appearance; thus, as our experimental results demonstrate, it is difficult to distinguish closely related species from bark images alone, and the network frequently fails to separate tree species with similar identification features or low identification confidence. It is also worth noting that trees of the same species pass through several phases of change during growth. Although bark changes far less than other tree organs, some morphological changes still occur, such as plate shedding, groove deepening, and plate-scale thickening. Therefore, if the network attends too strongly to local features of bark images instead of capturing common biological patterns at the species level, it may overfit, reducing its overall identification performance. For such identification tasks, pre-training on bark images incorporates similar features from different categories more effectively than transfer learning from ImageNet. Compared to the results of Carpentier et al. and Kim et al., who used ImageNet pre-trained weights, the pre-trained weights obtained through our training on BarkNetV2 yield better identification capability [
23,
27].
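As a minimal illustration of the transfer-learning setup discussed above, the sketch below freezes a stand-in "backbone" feature map (whose weights we pretend were learned during pre-training) and trains only a new classification head on a toy task. All arrays, dimensions, and labels are invented for illustration; they are not our networks or data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in "backbone": a frozen feature map whose weights we pretend
# were learned during pre-training (on ImageNet, or on bark images).
W_backbone = rng.normal(size=(16, 8))

def features(x):
    """Frozen backbone; only the head below is updated when fine-tuning."""
    return np.tanh(x @ W_backbone)

# Toy downstream task whose labels are, by construction, linearly
# separable in the pre-trained feature space (an idealization).
X = rng.normal(size=(200, 16))
v_true = rng.normal(size=8)
y = (features(X) @ v_true > 0).astype(float)

# Transfer learning: train only a new logistic-regression head.
w_head = np.zeros(8)
lr = 0.5
for _ in range(300):
    z = features(X)
    p = 1.0 / (1.0 + np.exp(-(z @ w_head)))   # sigmoid
    w_head -= lr * z.T @ (p - y) / len(y)     # logistic-loss gradient

acc = float(np.mean((features(X) @ w_head > 0) == (y == 1)))
```

Because the frozen features already encode the task-relevant structure, the small head suffices; when they do not (the ImageNet-to-bark mismatch discussed above), re-training or re-pre-training the backbone becomes necessary.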
During training, the accuracy and loss curves of the three networks were broadly similar. Although ConvNeXt-B has more parameters, ConvNeXt-S performed better on the test set, possibly because an excess of parameters can reduce a network's generalization ability and impair its predictive performance. The underlying reason is that a network with more parameters may overfit the noise in the training data rather than capture the underlying signal, producing a model that is highly accurate on the training set but performs poorly on new data. Although the overall identification accuracy is considerably high, the confusion matrix shows that some taxonomically similar tree species remain difficult to distinguish, and the pre-trained network often misidentifies species within the same family or genus. For example, BOJ (Betula alleghaniensis) is often confused with BOP (Betula papyrifera), while ERB (Acer platanoides), ERS (Acer saccharum), and ERR (Acer rubrum) are repeatedly misidentified as one another; EPO (Picea abies) and EPR (Picea rubens) are likewise often difficult to distinguish. Identifying taxonomically similar trees is a challenging task for convolutional neural networks (CNNs): the high degree of similarity between such species often leads to misidentification, a critical issue in image recognition tasks. When a CNN is pre-trained on one category and tested on a closely related one, such as different species of plants or animals, misidentifications are likely to occur because of the inherent variation in features and patterns.
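Confusions of the kind described above are read directly off the confusion matrix: diagonal entries count correct identifications per species, and off-diagonal mass reveals confusable pairs. A minimal sketch (the labels below are invented toy data, not our experimental results):

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the true species; columns index the predicted species."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

# Toy example with three classes standing in for species codes
# (0 = "BOJ", 1 = "BOP", 2 = "ERS"); purely illustrative.
y_true = [0, 0, 0, 1, 1, 1, 2, 2]
y_pred = [0, 0, 1, 1, 0, 1, 2, 2]
cm = confusion_matrix(y_true, y_pred, 3)

# Per-class recall: how often each true species is recognized.
per_class_recall = cm.diagonal() / cm.sum(axis=1)
```

Here classes 0 and 1 confuse each other (off-diagonal counts in both directions), while class 2 is always recognized, mirroring the Betula/Acer confusions versus well-separated species in our results.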
It is also worth noting that, according to the confusion matrix from our experiment, the pre-trained network rarely misidentifies the tree species in BarkNJ, and its overall identification accuracy and identification confidence are higher for these species than for those in BarkNet. This result arises because the bark images collected in this paper exclude the effects of lighting and shadows, and subsequent processing removed noisy images, making our BarkNJ dataset superior to BarkNet in quality.
Tree bark provides essential information about tree species and their environmental conditions. Different kinds of trees possess unique bark properties, such as smooth or rough textures, that can be used to differentiate them. For tree identification, the network is fed data on different tree species, including their taxonomic attributes and bark characteristics; these data train the network to recognize patterns unique to each species and to use those patterns to identify new samples. Feature extraction is an essential mechanism in CNNs that allows the network to focus on the most critical aspects of an image and ignore irrelevant or redundant features. For example, the network architecture can be designed to extract bark-related features such as texture, pattern, or thickness to distinguish tree species by their bark. However, the feature extraction process of a neural network is often called a black box: the network learns to identify and extract the features of the input data most relevant to the task, and these features can be highly abstract and difficult to interpret or visualize. During our visualization, we found that the pre-trained network selectively focused on regions of the image with distinct features, including grooves, cracks, and lenticels, closely resembling the way humans recognize tree species from bark images. The patterns revealed in our visualization experiment suggest that the network's identification mechanism is closely linked to the taxonomic attributes of tree species and the unique biological features of their bark.
Integrated Gradients, Grad-CAM, and Deep Feature Decomposition are three visualization methods that aim to explain the predictions of deep convolutional neural networks by highlighting the regions or features that contribute to the output. However, these methods also have limitations [
37,
38,
39]. For example, Integrated Gradients may fail to capture certain types of relationships between a model's input and output; Grad-CAM may produce noisy or blurry heat maps for some models; and Deep Feature Decomposition requires a pre-trained autoencoder to reconstruct the input image from the decomposed features, which may introduce reconstruction errors or artifacts. Therefore, in future research, deep network visualization techniques must be further improved to better reveal the principles and characteristics of the network's workflow.
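To make the first of these methods concrete: Integrated Gradients attributes a prediction by averaging the model's gradient along a straight path from a baseline to the input and scaling by the input-baseline difference. A minimal sketch, demonstrated on a linear model (where the exact attributions are known in closed form) rather than a real CNN:

```python
import numpy as np

def integrated_gradients(grad_f, x, baseline, steps=50):
    """Riemann-sum (midpoint) approximation of Integrated Gradients:
    attr_i = (x_i - baseline_i) * mean_alpha dF/dx_i at
    baseline + alpha * (x - baseline), alpha in (0, 1)."""
    alphas = (np.arange(steps) + 0.5) / steps
    grads = np.array([grad_f(baseline + a * (x - baseline)) for a in alphas])
    return (x - baseline) * grads.mean(axis=0)

# Sanity check on f(x) = w @ x, whose exact attributions with a zero
# baseline are w_i * x_i, and which satisfies the completeness axiom:
# attributions sum to f(x) - f(baseline).
w = np.array([1.0, -2.0, 0.5])
f = lambda x: w @ x
grad_f = lambda x: w            # gradient of a linear model is constant
x = np.array([2.0, 1.0, 4.0])
attr = integrated_gradients(grad_f, x, baseline=np.zeros(3))
```

For a real network, `grad_f` would be the gradient of the class score with respect to the input pixels, computed by backpropagation; the closed-form check above is only possible because the model is linear.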
Image recognition using convolutional neural networks has significant practical value for identifying tree species from bark images and can improve the intelligence and efficiency of forest resource surveys through fast, precise identification. Traditional field methods of tree species identification are inefficient, especially for tall trees whose leaves are visually difficult to differentiate, and for deciduous species considerable effort is needed to collect leaf or flower samples during fall or winter. Handheld mobile devices make bark-based field identification more convenient and simplify sampling, and combined with lightweight convolutional neural networks they can greatly improve the efficiency of field identification. The technology proposed in our study enables forest workers to collect bark images in the field and upload them to a back-end server for identification. With optimized data collection and image processing, the proposed tree species identification technology can meet field workers' needs for identification efficiency and accuracy. In addition, through the accumulation of large numbers of bark images and further network optimization, the accuracy and generalization performance of the neural network can be continuously improved, further strengthening the model's predictive ability in different environmental settings.
Traditional deep learning typically requires large amounts of training data to achieve good performance, which entails high costs and complicated data acquisition. In contrast, few-shot learning techniques exploit prior knowledge from related tasks, significantly reducing data requirements and providing a more cost-effective solution. Few-shot learning can also improve model robustness, allowing algorithms to maintain high accuracy in more complex scenarios. In future research, we will explore the application of few-shot learning to tree species identification to obtain a model with better generalization ability while reducing the amount of training data required. We will also continue to collect bark images to build a large-scale bark dataset for deep learning.
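One common few-shot approach we may adopt, nearest-prototype classification in an embedding space (in the style of prototypical networks), can be sketched as follows. Raw 2-D vectors stand in for learned bark embeddings, and all data below are invented for illustration:

```python
import numpy as np

def prototype_classify(support_x, support_y, query_x):
    """Few-shot classification by nearest class prototype: each class is
    represented by the mean of its few labeled support embeddings, and
    each query takes the label of the closest prototype."""
    classes = sorted(set(support_y))
    labels = np.array(support_y)
    protos = np.stack([support_x[labels == c].mean(axis=0) for c in classes])
    # Euclidean distance from every query to every prototype.
    d = np.linalg.norm(query_x[:, None, :] - protos[None, :, :], axis=2)
    return [classes[i] for i in d.argmin(axis=1)]

# Two toy "species", three support embeddings each (2-D for clarity).
support_x = np.array([[0., 0.], [0., 1.], [1., 0.],
                      [5., 5.], [5., 6.], [6., 5.]])
support_y = [0, 0, 0, 1, 1, 1]
query_x = np.array([[0.5, 0.5], [5.5, 5.5]])
preds = prototype_classify(support_x, support_y, query_x)
```

In a full system, the embeddings would come from a network trained episodically so that species cluster tightly, allowing new tree species to be added from only a handful of labeled bark images.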