Article

Comparison of Deep Learning Models and Feature Schemes for Detecting Pine Wilt Diseased Trees

by Junjun Zhi 1, Lin Li 1, Hong Zhu 1, Zipeng Li 1, Mian Wu 1, Rui Dong 1, Xinyue Cao 1,2,*, Wangbing Liu 3, Le’an Qu 1, Xiaoqing Song 1 and Lei Shi 4

1 School of Geography and Tourism, Anhui Normal University, Wuhu 241002, China
2 College of Resources and Environment, Anhui Science and Technology University, Chuzhou 233100, China
3 Key Laboratory of Jianghuai Arable Land Resources Protection and Eco-Restoration, Hefei 230088, China
4 No. 311 Geological Team, Bureau of Geology and Mineral Exploration of Anhui Province, Anqing 246000, China
* Author to whom correspondence should be addressed.
Forests 2024, 15(10), 1706; https://doi.org/10.3390/f15101706
Submission received: 26 July 2024 / Revised: 17 September 2024 / Accepted: 25 September 2024 / Published: 26 September 2024
(This article belongs to the Section Forest Inventory, Modeling and Remote Sensing)

Abstract:
Pine wilt disease (PWD) is a severe forest disease caused by the invasion of pine wood nematode (Bursaphelenchus xylophilus), which has caused significant damage to China’s forestry resources due to its short disease cycle and strong infectious ability. Benefiting from the development of unmanned aerial vehicle (UAV)-based remote sensing technology, the use of UAV images for the detection of PWD-infected trees has become one of the mainstream methods. However, current UAV-based detection studies mostly focus on multispectral and hyperspectral images, and few studies have focused on using red–green–blue (RGB) images for detection. This study used UAV-based RGB images to extract feature information using different color space models and then utilized semantic segmentation techniques in deep learning to detect individual PWD-infected trees. The results showed that: (1) The U-Net model realized the optimal image segmentation and achieved the highest classification accuracy with F1-score, recall, and Intersection over Union (IoU) of 0.9586, 0.9553, and 0.9221, followed by the DeepLabv3+ model and the feature pyramid networks (FPN) model. (2) The RGBHSV feature scheme outperformed both the RGB feature scheme and the hue saturation value (HSV) feature scheme, regardless of the choice of semantic segmentation technique. (3) The semantic segmentation techniques in deep-learning models achieved superior model performance compared with traditional machine-learning methods, with the U-Net model obtaining 4.81% higher classification accuracy compared with the random forest model. (4) Compared to traditional semantic segmentation models, the newly proposed segment anything model (SAM) performed poorly in identifying pine wood nematode disease; its success rate was 0.1533 lower than that of the U-Net model when using the RGB feature scheme and 0.2373 lower when using the HSV feature scheme. Overall, the U-Net model using the RGBHSV feature scheme performed best in detecting individual PWD-infected trees, indicating that the proposed method using semantic segmentation techniques and UAV-based RGB images to detect individual PWD-infected trees is feasible. The proposed method not only provides a cost-effective solution for timely monitoring of forest health but also provides a precise means to conduct remote sensing image classification tasks.

1. Introduction

Forestry resources are one of China’s most important natural resources, and the control of forest pests and diseases has always been a crucial task in the field of forestry ecology. Pine wilt disease (PWD), also known as pine wood nematode (Bursaphelenchus xylophilus) disease, is referred to as the cancer of pine trees. It is one of the most destructive and dangerous types of pests and diseases in the field of forestry, which was first discovered in the United States of America and later spread to various regions of the world [1]. In China, the first case of pine wood nematode infection was detected in Jiangsu Province in 1982; since then, PWD has rapidly spread across the country [2]. According to a 2024 announcement by the National Forestry and Grassland Administration of China, PWD has spread to 664 county-level epidemic areas in 18 provinces (including municipalities and autonomous regions). The economic losses caused by this disease are enormous and have caused significant damage to China’s environmental protection work and forestry resources.
PWD is characterized by a wide transmission range, fast spread, and high mortality, and it is difficult and costly to control. It takes as little as 40 days for a single tree to die from infection and only 3 to 5 years for an entire pine forest to be infected. The optimal strategy for controlling such a disease is early detection and early intervention, aiming to manage the disease at the initial stages of tree infection [3]. However, traditional survey methods mainly rely on manual surveys, which are time-consuming and costly and make it difficult to quickly grasp the dynamic occurrence of PWD in an epidemic area. Consequently, traditional survey methods fail to meet the needs of regular pest and disease monitoring [4]. Fortunately, remote sensing technology offers advantages such as wide coverage, high temporal resolution, short revisit cycles, and extensive spatial reach, providing a robust technical foundation for timely monitoring of PWD. Currently, there are two main approaches to monitoring forest pests and diseases with remote sensing: satellite remote sensing and unmanned aerial vehicle (UAV) remote sensing. Among these, numerous studies have focused on the application of high-resolution satellite images in monitoring forest pests and diseases, such as WorldView [5], QuickBird [6], and GeoEye-1 [7]. However, the application of satellite images still has several limitations: optical satellite images are easily affected by cloudy weather, atmospheric conditions, and spatial resolution constraints, while radar satellite images are susceptible to interference from mountainous terrain. Additionally, satellite images cannot always provide the timely data needed for epidemic areas due to the limitations of revisit cycles. In contrast, UAV images overcome these limitations and have higher spatial resolution, which has the potential to better identify PWD-infected trees. Currently, scholars have conducted many studies on the application of UAV remote sensing technology to detect PWD [8], using multispectral [9], hyperspectral [10], and LiDAR [11] data.
In addition to the original bands of remote sensing images, indices derived from the original bands of remote sensing images are commonly used features for detecting forest pests and diseases. Numerous studies have demonstrated that commonly used remote sensing vegetation indices can effectively detect diseased trees, such as the normalized difference vegetation index (NDVI) [12] and enhanced vegetation index (EVI) [13]. Additionally, some studies have focused on the pathogenesis of PWD to develop specific indices targeting PWD and have achieved promising results, such as single red-edge [14], green normalized difference vegetation index (GNDVI) [15,16], Green-Red Spectral Area Index (GRSAI) [17], Green to Red Region Spectral anGle Index (GRRSGI) [18], and Multiple Ratio Disease-Water Stress Indices (MR-DSWIs) [19]. Besides vegetation indices, the use of texture features has also been proven to significantly improve the accuracy of identifying diseased trees [20].
In recent years, due to the rapid development of machine-learning and computer vision technologies, numerous scholars have conducted studies on integrating UAV remote sensing technology with machine-learning methods to identify diseased trees. Some of the machine-learning methods used include random forest [11], support vector machines [21], and spatiotemporal change detection [22]. With the rise of deep learning methods, an increasing number of studies have focused on the integration of deep learning and remote sensing and have demonstrated that deep learning techniques can be effectively applied to the detection and extraction of diseased trees [23,24,25]. For the applications of deep learning to disease monitoring, two types of techniques have been widely used: (1) Object detection techniques, which involve improving the performance of existing object detection models, most notably the You Only Look Once (YOLO) series, for example by modifying the backbone architecture of the YOLOv3 model [26], making lightweight improvements to the YOLOv4 model to obtain the YOLOv4-Tiny-3Layers model [27], combining the YOLOv5 model with spatial pyramid pooling and a focus mechanism to create the YOLO-PWD model [28], integrating UAV images with Sentinel-2 satellite data to develop the YOLOv5-PWD model [29], and combining the YOLOv8 model with attention mechanisms for an improved model [30]. Apart from the YOLO series models, the Faster R-CNN model has also been proven to perform well in the field of diseased tree detection [31]; (2) Semantic segmentation techniques. Few studies have focused on the applications of such techniques in diseased tree detection, examples being the novel segmentation model SCANet [32] and a semi-supervised semantic segmentation model [33]. However, compared to object detection techniques, the application of semantic segmentation techniques in PWD detection is limited, and in particular there have been few attempts to apply semantic segmentation models to UAV images for this purpose.
This study compared three classic semantic segmentation models combined with different feature schemes to detect individual PWD-infected trees. Specifically, this study aims to achieve the following three objectives: (1) Compare the effects of the red–green–blue (RGB), hue saturation value (HSV), and RGBHSV feature schemes on identification accuracy; (2) compare the performance of the U-Net, DeepLabV3+, and feature pyramid network (FPN) semantic segmentation models in detecting individual PWD-infected trees; (3) compare the performance of deep learning models with traditional machine-learning methods and the newly proposed segment anything model (SAM) in the detection of individual PWD-infected trees. The study utilizes UAV images and semantic segmentation techniques to rapidly and accurately monitor PWD with direct localization of individual PWD-infected trees. The proposed approach provides an efficient and straightforward solution for the prevention and management of PWD.

2. Materials

2.1. Study Area

Anhui Province (29°41′ N–34°38′ N and 114°54′ E–119°37′ E) is in central China (Figure 1). The topography of Anhui Province is primarily composed of plains, hills, and mountains, exhibiting a geographical pattern of high elevations in the south and lower elevations in the central and north regions. Anhui Province boasts abundant forestry resources, with forests covering nearly one-third of its total land area. Most of these forests are in the southern Anhui mountainous region and the Dabie Mountains. According to the Ninth Forest Resource Inventory Report, coniferous forests dominate Anhui’s forest resources, accounting for 27.29% of the province’s total forest area. However, the National Forestry and Grassland Administration’s 2024 announcement reported that there were 48 county-level epidemic areas in Anhui Province, indicating a severe prevention and control situation.
Qianshan City (30°27′ N–31°04′ N and 116°14′ E–116°46′ E) was selected as the study area; it is located in southwestern Anhui Province on the southeastern foothills of the Dabie Mountains. The terrain of Qianshan City is terraced, with elevation gradually decreasing from northwest to southeast. The vegetation cover mainly includes deciduous broad-leaved forests, evergreen broad-leaved forests, coniferous forests, mountain meadows, and grasslands. Qianshan City has numerous forestry areas and high forest coverage, with a total forest cover rate of 39%, making it a prime location for monitoring PWD, which has long been a persistent issue in the region.

2.2. Data Source and Processing

The DJI M300 series drone equipped with the Zenmuse-P1 camera was used to obtain RGB images. The camera has an effective pixel count of 45 million, with a minimum shooting interval of 0.7 s. The planar accuracy is 3 cm, while the elevation accuracy is 5 cm. The ISO range is 100 to 25,600. The flowchart of the experiment is shown in Figure 2. After preprocessing, all RGB images were resampled to a spatial resolution of 0.1 m. The flights took place in November 2021 under favorable weather conditions with no clouds or rain and were set at a flying altitude of 350 m. The captured areas included Tianzhushan Forest Farm, Tianzhushan Town, and parts of Tuoling Forest Farm in Qianshan City, covering a total area of 140.17 km2.
Based on the obtained UAV RGB images and field surveys, a total of 561 PWD-infected tree sample points were identified and vectorized through manual annotation. The annotated results were binarized in ArcGIS Pro to create label rasters, where 1 represents diseased trees and 0 represents others. Using the slicing function in ArcGIS Pro, the UAV RGB images and label rasters were cropped into tiles of 512 × 512 pixels with a forward stride of 512 pixels, resulting in 2094 sets of valid data.
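For readers who want to reproduce the tiling step outside ArcGIS Pro, the sketch below shows a minimal NumPy equivalent of cropping an image/label pair into 512 × 512 tiles with a forward stride of 512 pixels; the array shapes and the `tile_image` helper are illustrative assumptions rather than part of the original workflow.

```python
import numpy as np

def tile_image(image: np.ndarray, label: np.ndarray, size: int = 512, stride: int = 512):
    """Cut an (H, W, C) image and its (H, W) binary label raster into fixed-size tiles."""
    tiles = []
    height, width = label.shape
    for top in range(0, height - size + 1, stride):
        for left in range(0, width - size + 1, stride):
            img_tile = image[top:top + size, left:left + size]
            lbl_tile = label[top:top + size, left:left + size]
            tiles.append((img_tile, lbl_tile))   # label values: 1 = diseased tree, 0 = others
    return tiles
```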
To ensure sample balance, we selected a total of 2050 slices containing diseased trees and 1134 samples containing other land cover types, with a ratio close to 2:1. The other land cover types mainly include buildings, bare land, non-PWD forest, plantation, and water (Figure 2).
After the classification was completed, post-processing operations were performed on all classification results, including small patch processing, clustering, primary and secondary analysis, and raster-to-vector conversion. These operations were all carried out in ENVI. The entire experimental process is shown in Figure 3.

3. Methods

3.1. Color Space Model

Common color space models mainly include the RGB color space model and the HSV color space model. The RGB color space model consists of three components: red (R), green (G), and blue (B). It is widely used in the field of computer vision due to its simple representation principle. However, the three color components in the RGB color space model exhibit a certain degree of correlation, which may lead to information redundancy and potentially affect classification accuracy.
The HSV color space model consists of three components: hue (H), saturation (S), and value (V). Previous studies have shown that the HSV color space model has advantages over the RGB color space model in the field of remote sensing digital image processing. For instance, converting an RGB color space model to an HSV color space model could improve classification accuracy [34]. For an RGB image with pixel values ranging from 0 to 255, the process for calculating HSV is as follows [35]:
First, the RGB values are normalized to the range [0, 1]:
$R = R/255, \quad G = G/255, \quad B = B/255$
Second, the value (V), saturation (S), and hue (H) are calculated:
$V = C_{max}$
$S = \begin{cases} 0, & C_{max} = C_{min} \\ \dfrac{C_{max} - C_{min}}{C_{max}}, & C_{max} \neq C_{min} \end{cases}$
$H = \begin{cases} 60^{\circ} \times \left( \dfrac{G - B}{\Delta} \bmod 6 \right), & C_{max} = R \\ 60^{\circ} \times \left( \dfrac{B - R}{\Delta} + 2 \right), & C_{max} = G \\ 60^{\circ} \times \left( \dfrac{R - G}{\Delta} + 4 \right), & C_{max} = B \end{cases}$
where $C_{max}$ is the maximum value of the normalized RGB channels, $C_{min}$ is the minimum value, and $\Delta$ is the difference between $C_{max}$ and $C_{min}$:
$C_{max} = \max(R, G, B), \quad C_{min} = \min(R, G, B), \quad \Delta = C_{max} - C_{min}$
To assess the impact of input features on classification accuracy, we tested three input feature schemes, including RGB, HSV, and RGBHSV. After converting the RGB images to HSV images, we stacked the RGB and HSV images to create a six-band RGBHSV image. The NumPy and OpenCV libraries in Python were adopted to perform the transformation and overlay processing.
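As a hedged illustration of this step, the snippet below converts an RGB tile to HSV with OpenCV and stacks the two color spaces into a six-band array; the file name is a hypothetical placeholder, and note that for 8-bit inputs OpenCV scales H to [0, 179] and S, V to [0, 255] rather than the unit ranges in the equations above.

```python
import cv2
import numpy as np

# Load a tile (OpenCV reads BGR, so convert to RGB first). File name is illustrative.
bgr = cv2.imread("tile_0001.png")
rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)

# RGB -> HSV; for uint8 images OpenCV returns H in [0, 179] and S, V in [0, 255].
hsv = cv2.cvtColor(rgb, cv2.COLOR_RGB2HSV)

# Stack the two color spaces into the six-band RGBHSV feature scheme.
rgbhsv = np.dstack([rgb, hsv])   # shape: (rows, cols, 6)
```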

3.2. Semantic Segmentation Model

Compared to traditional machine-learning algorithms, convolutional neural network (CNN) techniques can determine the target variable (e.g., PWD-infected trees) through operations such as convolution and fully connected layers, thereby improving classification accuracy. However, this technology still cannot provide the exact location information of the PWD-infected trees, necessitating manual visual interpretation in the later stages [36]. The fully convolutional network (FCN) proposed in 2015 can not only accurately provide health diagnostic information on PWD-infected trees but also provide their precise location information [37]. Therefore, this study employed and compared three advanced semantic segmentation models to identify PWD-infected trees: U-Net, FPN, and DeepLabV3+. The input layers of these models were extended to accept either three or six feature dimensions, allowing for a comparative analysis of the results produced by these models.
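One way to realize this channel extension, sketched below under the assumption that the segmentation_models_pytorch package is used (the paper does not state which implementation was adopted), is to pass the desired number of input channels when instantiating each architecture with the ResNet-152 backbone mentioned in Section 3.2.4.

```python
import segmentation_models_pytorch as smp

IN_CHANNELS = 6   # 3 for the RGB or HSV schemes, 6 for the RGBHSV scheme

# All three architectures share the ResNet-152 backbone; classes=1 produces a
# single-channel logit map for the binary PWD-infected / background task.
models = {
    "U-Net": smp.Unet("resnet152", encoder_weights="imagenet",
                      in_channels=IN_CHANNELS, classes=1),
    "DeepLabv3+": smp.DeepLabV3Plus("resnet152", encoder_weights="imagenet",
                                    in_channels=IN_CHANNELS, classes=1),
    "FPN": smp.FPN("resnet152", encoder_weights="imagenet",
                   in_channels=IN_CHANNELS, classes=1),
}
```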

3.2.1. U-Net Model

The U-Net model is a deep learning model designed for image segmentation and is widely used for medical image segmentation [38]. Subsequently, numerous scholars have introduced the U-Net model into the field of remote sensing combined with remote sensing image segmentation techniques to achieve significant advancements, such as using improved U-Net models for image segmentation [39].
The U-Net model is a CNN-based model that consists of a symmetrical encoder and decoder. A complete U-Net model generally comprises three parts: encoder, decoder, and skip connections. The encoder is composed of a series of convolutional layers and pooling layers, which are responsible for capturing context and extracting features from the input images. The encoder progressively reduces the spatial dimensions while increasing the depth of the feature maps, downsampling the input image to a smaller feature map and extracting high-level semantic information. The decoder consists of a series of upsampling and convolutional layers. It upsamples the feature maps generated by the encoder back to the original resolution and merges them with the corresponding layers’ features from the encoder to progressively reconstruct the original resolution image. In this process, U-Net uses skip connections to directly connect corresponding layers of the encoder and decoder. These connections enable the model to maintain fine-grained spatial information and ensure that the decoder is aware of both low-level and high-level features to obtain high-precision segmentation outcomes. Finally, U-Net employs a convolutional layer in the last layer of the decoder to conduct pixel-level classification and produce the final segmentation results. Taking a six-band image input as an example, the U-Net model structure is illustrated in Figure 4.
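To make the encoder–decoder and skip-connection layout concrete, the following is a minimal two-level U-Net sketch in PyTorch (not the exact network used in this study, which relies on a ResNet-152 encoder); it shows how encoder features are concatenated into the decoder and how a final 1 × 1 convolution performs the pixel-level classification.

```python
import torch
import torch.nn as nn

class DoubleConv(nn.Module):
    """Two 3x3 convolutions with batch norm and ReLU: the basic U-Net building block."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 3, padding=1), nn.BatchNorm2d(out_ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class TinyUNet(nn.Module):
    """Two-level U-Net illustrating the encoder/decoder/skip-connection layout."""
    def __init__(self, in_ch=6, n_classes=1):
        super().__init__()
        self.enc1 = DoubleConv(in_ch, 64)
        self.enc2 = DoubleConv(64, 128)
        self.pool = nn.MaxPool2d(2)
        self.bottleneck = DoubleConv(128, 256)
        self.up2 = nn.ConvTranspose2d(256, 128, 2, stride=2)
        self.dec2 = DoubleConv(256, 128)          # 128 upsampled + 128 skip channels
        self.up1 = nn.ConvTranspose2d(128, 64, 2, stride=2)
        self.dec1 = DoubleConv(128, 64)           # 64 upsampled + 64 skip channels
        self.head = nn.Conv2d(64, n_classes, 1)   # pixel-level classification layer

    def forward(self, x):
        e1 = self.enc1(x)                          # encoder level 1 (full resolution)
        e2 = self.enc2(self.pool(e1))              # encoder level 2 (1/2 resolution)
        b = self.bottleneck(self.pool(e2))         # bottleneck (1/4 resolution)
        d2 = self.dec2(torch.cat([self.up2(b), e2], dim=1))   # skip connection from e2
        d1 = self.dec1(torch.cat([self.up1(d2), e1], dim=1))  # skip connection from e1
        return self.head(d1)                       # logits; apply sigmoid for probabilities

logits = TinyUNet(in_ch=6)(torch.randn(1, 6, 512, 512))       # output shape: (1, 1, 512, 512)
```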

3.2.2. DeepLabv3+ Model

The DeepLabv3+ model, proposed by the Google Brain team in 2018, is the latest version in the DeepLab series [40]. It is a further improvement and optimization of the DeepLabv3 model and is also commonly used in remote sensing image classification tasks. Compared to the DeepLabv3 model, the DeepLabv3+ model offers higher segmentation accuracy and efficiency. Previous studies further improved the DeepLabv3+ model by building the semantic segmentation decoder on the DeepLabv3+ architecture and obtained high accuracy while reducing computational costs and processing time [41].
Due to the strided pooling or convolution operations within the DeepLabv3 model’s backbone network, the model struggles to capture detailed information related to object boundaries. The DeepLabv3+ model uses DeepLabv3 as the encoder and adds a decoder module to recover the boundary information. Moreover, by combining the spatial pyramid pooling module with the encoder–decoder structure, the DeepLabv3+ model produces more refined segmentation results that not only contain rich semantic information but also restore boundary details. The DeepLabv3+ model structure is illustrated in Figure 5.

3.2.3. FPN Model

The FPN model constructs feature pyramids through cross-layer connections and a top-down feature pyramid structure. It simultaneously retains the semantic information of high-level features and the spatial information of low-level features [42]. FPN consists of two pathways: a bottom-up pathway and a top-down pathway. The bottom-up pathway extracts features through convolution operations, during which the spatial resolution progressively decreases while the semantic level of the features increases. The top-down pathway restores spatial resolution through upsampling operations while propagating high-level semantic information downward. Like the U-Net model, FPN fuses feature maps from different levels through skip (lateral) connections between the bottom-up and top-down pathways to obtain richer, multi-scale feature representations. Due to its ability to better capture contextual information and increase feature map resolution, FPN can obtain more useful information about small objects and is highly effective in handling small object segmentation. Additionally, the FPN model can adaptively construct feature pyramids without requiring scale transformations of the images, thus reducing computational load and time costs while improving segmentation accuracy. The FPN model structure is illustrated in Figure 6.

3.2.4. Settings for Training Framework

The PyTorch framework was used as the training framework with Compute Unified Device Architecture (CUDA) version 12.4. All three semantic segmentation models used ResNet-152 as the backbone network. During the training processes, the batch size was set to 4, the maximum number of iterations was set to 50, and the learning rate was set to 0.0001. DiceLoss was used as the loss function, and Adam was used as the optimizer. The computer hardware included an NVIDIA GeForce RTX 4060 Ti graphics card with 16 GB of memory.
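A condensed sketch of this training configuration is shown below, assuming the segmentation_models_pytorch U-Net from the earlier sketch and a hypothetical `train_dataset` that yields (six-band tile, binary mask) pairs; the DiceLoss class is a generic soft-Dice implementation rather than the authors' exact loss code.

```python
import torch
import segmentation_models_pytorch as smp
from torch.utils.data import DataLoader

class DiceLoss(torch.nn.Module):
    """Generic soft Dice loss for binary segmentation: 1 - Dice coefficient."""
    def __init__(self, eps: float = 1e-6):
        super().__init__()
        self.eps = eps

    def forward(self, logits, targets):
        probs = torch.sigmoid(logits)
        inter = (probs * targets).sum(dim=(1, 2, 3))
        union = probs.sum(dim=(1, 2, 3)) + targets.sum(dim=(1, 2, 3))
        return 1.0 - ((2 * inter + self.eps) / (union + self.eps)).mean()

device = "cuda" if torch.cuda.is_available() else "cpu"
model = smp.Unet("resnet152", encoder_weights="imagenet", in_channels=6, classes=1).to(device)

# `train_dataset` is a hypothetical Dataset yielding (six-band tile, binary mask) pairs.
loader = DataLoader(train_dataset, batch_size=4, shuffle=True)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
criterion = DiceLoss()

for epoch in range(50):                 # maximum number of iterations used in the paper
    model.train()
    for images, masks in loader:
        images, masks = images.to(device), masks.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), masks)
        loss.backward()
        optimizer.step()
```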

3.3. Random Forest Model

This study selected the random forest (RF) model, a commonly used machine-learning model typically applied to classification and regression problems [43], to compare with the model performance of the semantic segmentation model. Random forest uses a combination of multiple decision trees to make predictions and decisions. These decision trees can be seen as different classifiers, with each decision tree generated independently, making random forest better adapted to complex data and nonlinear relationships. Due to its excellent generalization ability and resistance to overfitting, random forest is often used to handle large and high-dimensional datasets [44].
This study implemented the random forest model using the Google Earth Engine platform. Two important parameters needed to be set for the random forest model: the number of trees and mtry. The former represents the number of decision trees, and the latter represents the number of variables considered for each split; they were set to 500 and the square root of the number of variables, respectively. A total of 11,504 random points were selected based on UAV images for training and validation, including 5518 PWD-infected tree points and 5986 non-diseased tree points. The training was conducted using the six RGBHSV features of these random points. The classification results of the random forest model were then compared with those of the best-performing of the three semantic segmentation models.
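A hedged sketch of this setup with the Google Earth Engine Python API is given below; the asset paths, band names, and the `class` property are hypothetical placeholders, and leaving `variablesPerSplit` unset lets GEE default to the square root of the number of input variables, matching the mtry setting described above.

```python
import ee
ee.Initialize()

# Hypothetical assets: the six-band RGBHSV mosaic and the labelled random points
# (property "class": 1 = PWD-infected tree, 0 = non-diseased).
image = ee.Image("users/example/qianshan_rgbhsv")
points = ee.FeatureCollection("users/example/training_points")

bands = ["R", "G", "B", "H", "S", "V"]
samples = image.select(bands).sampleRegions(
    collection=points, properties=["class"], scale=0.1)

# 500 trees; variablesPerSplit defaults to sqrt(number of variables), i.e., mtry.
classifier = ee.Classifier.smileRandomForest(numberOfTrees=500).train(
    features=samples, classProperty="class", inputProperties=bands)

classified = image.select(bands).classify(classifier)
```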

3.4. SAM Model

The segment anything model (SAM) is a semantic segmentation model developed by Meta [45]. SAM produces high-quality object masks from input prompts such as points or boxes, and it can be used to generate masks for all objects in an image. It has been trained on a dataset of 11 million images and 1.1 billion masks and has strong zero-shot performance on a variety of segmentation tasks. Since the SAM model can only accept three-channel inputs [46], we fine-tuned the SAM model using both the RGB and HSV images rather than the RGBHSV images.
In this study, we froze all VisionEncoder and PromptEncoder layers. During the training process, bounding boxes were used as prompts, while during the inference process, random points were used for prediction.
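The sketch below outlines this freezing strategy under the assumption that the Hugging Face transformers implementation of SAM is used (the paper does not name the code base); the checkpoint name, image, and box prompt are illustrative placeholders.

```python
import torch
from transformers import SamModel, SamProcessor

processor = SamProcessor.from_pretrained("facebook/sam-vit-base")  # illustrative checkpoint
model = SamModel.from_pretrained("facebook/sam-vit-base")

# Freeze the vision encoder and prompt encoder; only the mask decoder is updated.
for name, param in model.named_parameters():
    if name.startswith("vision_encoder") or name.startswith("prompt_encoder"):
        param.requires_grad = False

optimizer = torch.optim.Adam(
    (p for p in model.parameters() if p.requires_grad), lr=1e-4)

# Training step with a bounding-box prompt; `image` is an RGB (or HSV) tile and
# `box` is [x_min, y_min, x_max, y_max] around an annotated tree (both hypothetical).
inputs = processor(image, input_boxes=[[box]], return_tensors="pt")
outputs = model(**inputs, multimask_output=False)
pred_masks = outputs.pred_masks   # mask logits to compare against the label raster
```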

3.5. Accuracy Assessment

This study selected three indexes to evaluate the classification accuracies of the three semantic segmentation models, including Precision, F1-score, and intersection over union (IoU) [47]. The formulas for calculating these indexes are as follows:
$\mathrm{Precision} = \dfrac{TP}{TP + FP}$
$\mathrm{F1\text{-}score} = \dfrac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$
$\mathrm{IoU} = \dfrac{TP}{TP + FP + FN}$
where TP (true positives) is the number of pixels correctly predicted by the model as positives (PWD-infected trees); FP (false positives) is the number of pixels predicted as positives that are actually negatives; TN (true negatives) is the number of pixels correctly predicted as negatives; and FN (false negatives) is the number of pixels predicted as negatives that are actually positives. In this study, positives refer to PWD-infected tree pixels, and negatives refer to non-diseased tree pixels. Recall is calculated as follows:
$\mathrm{Recall} = \dfrac{TP}{TP + FN}$
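For clarity, a small NumPy helper that computes these four indexes from a predicted mask and a reference mask (both binary, 1 = PWD-infected pixel) is sketched below; it assumes non-empty positive classes so the denominators are non-zero.

```python
import numpy as np

def pixel_metrics(pred: np.ndarray, truth: np.ndarray):
    """Precision, Recall, F1-score, and IoU from binary masks (1 = PWD-infected)."""
    tp = np.sum((pred == 1) & (truth == 1))
    fp = np.sum((pred == 1) & (truth == 0))
    fn = np.sum((pred == 0) & (truth == 1))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    return precision, recall, f1, iou
```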

4. Results

4.1. Comparisons of Different Feature Schemes

The outputs of different feature schemes for identifying PWD-infected tree sample points are shown in Table 1. The results showed that the RGBHSV feature scheme obtained the highest classification accuracy, followed by the RGB feature scheme and the HSV feature scheme, regardless of the choice of semantic segmentation model. The classification accuracies of different feature schemes are mainly reflected in the number of successfully identified PWD-infected trees. For example, the RGBHSV feature scheme using the U-Net model successfully identified 553 PWD-infected trees with a success rate of 98.57%, which was 1.60% and 2.67% higher than that of the RGB and HSV feature schemes, respectively. The RGBHSV feature scheme using the FPN model successfully identified 546 diseased trees with a success rate of 97.33%, which was 0.72% and 2.14% higher than that of the RGB and HSV feature schemes, respectively. The RGBHSV feature scheme using the DeepLabv3+ model successfully identified 540 diseased trees with a success rate of 96.26%, which was 1.61% and 2.50% higher than that of the RGB and HSV feature schemes, respectively. Although the performance ranking of different feature schemes was consistent, the performance gaps among different semantic segmentation models varied noticeably. Specifically, the performance difference between the RGB and RGBHSV feature schemes was small for the FPN model, whereas the advantage of the RGBHSV feature scheme over the RGB and HSV feature schemes was more pronounced for the U-Net and DeepLabv3+ models.
After using the trained models to predict all UAV images, the results are shown in Table 2. The U-Net model detected 710, 747, and 759 individual suspected PWD-infected trees using the RGB, HSV, and RGBHSV feature schemes, respectively. Among these, 544, 538, and 553 trees were confirmed as true PWD-infected trees. The DeepLabv3+ model detected 717, 674, and 714 individual suspected PWD-infected trees using the RGB, HSV, and RGBHSV feature schemes, respectively. Among these, 531, 526, and 540 trees were confirmed as true PWD-infected trees. The FPN model detected 727, 671, and 718 individual suspected PWD-infected trees using the RGB, HSV, and RGBHSV feature schemes, respectively. Among these, 542, 534, and 546 trees were confirmed as true PWD-infected trees.
The model parameters are shown in Table 3. The U-Net model with 3-channel and 6-channel inputs has a total of 67,156,881 and 67,166,289 trainable parameters, respectively. The DeepLabv3+ model with 3-channel and 6-channel inputs has a total of 61,313,361 and 61,322,769 trainable parameters, respectively. The FPN model with 3-channel and 6-channel inputs has a total of 60,751,809 and 60,761,217 trainable parameters, respectively.
From the perspective of accuracy assessment indexes, there were differences in the classification results of the three semantic segmentation models (Table 4). For example, the RGBHSV feature scheme using the U-Net model obtained the highest classification accuracy, followed by the RGB feature scheme and the HSV feature scheme. Although the F1-score and IoU score of the RGBHSV feature scheme were higher than those of the other two input feature schemes, the difference was less than 1%, indicating a very slight advantage. The RGB feature scheme obtained the highest classification accuracy among the three feature schemes using the DeepLabv3+ model. Unlike the slight differences in the three feature schemes using the U-Net model, the advantage of the RGB feature scheme was evident compared with the other two feature schemes using the DeepLabv3+ model. This advantage was most pronounced in the IoU score, where the RGB feature scheme outperformed the HSV and RGBHSV feature schemes by 1.27% and 1.95%, respectively. There was almost no difference in classification accuracies among the three feature schemes using the FPN model, with the three accuracy assessment indexes being extremely close.
To better understand the differences in the performance of the three feature schemes, we compared some representative classification results, as shown in Figure 7. The classification results obtained by the RGB feature scheme using the U-Net model exhibited missed detections, failing to successfully identify some PWD-infected trees. Similarly, the classification results obtained by the HSV feature scheme using the FPN model also exhibited similar missed detections. Notably, the classification results obtained by the RGBHSV feature scheme showed that PWD-infected trees were successfully detected by both the U-Net model and the FPN model. The results indicated that a single-feature scheme may lack robustness and accuracy in complex environments. In contrast, the RGBHSV feature scheme demonstrated excellent performance in each model and outperformed the individual RGB and HSV feature schemes in detecting PWD-infected trees with the lowest missed detection rate and no incorrect detections, which may be attributed to its ability to combine the advantages of both RGB and HSV features. Therefore, the RGBHSV feature scheme is a more effective and reliable choice in the task of detecting PWD-infected trees.

4.2. Comparisons of Different Semantic Segmentation Models

The outputs of different semantic segmentation models for identifying PWD-infected tree sample points showed the same trend across the different feature schemes: the U-Net model obtained the highest classification accuracy, followed by the FPN model and the DeepLabv3+ model. As with the results for the different feature schemes, there were differences in the classification results of the three semantic segmentation models. Specifically, the success rate of the U-Net model using the RGB feature scheme was 0.36% and 2.32% higher than that of the FPN and DeepLabv3+ models, respectively. The success rate of the U-Net model using the HSV feature scheme was 0.71% and 2.14% higher than that of the FPN and DeepLabv3+ models, respectively. The success rate of the U-Net model using the RGBHSV feature scheme was 1.24% and 2.31% higher than that of the FPN and DeepLabv3+ models, respectively, which showed the most significant difference. Moreover, the classification accuracies of the optimal models throughout the entire training also showed that the U-Net model obtained the highest classification accuracy, followed by the FPN model and the DeepLabv3+ model. This indicates that the U-Net model outperforms both the FPN and DeepLabv3+ models.
To better understand the differences in the performance of the three semantic segmentation models, we compared some representative classification results, as shown in Figure 8. The classification results obtained by using the DeepLabv3+ model exhibited missed detections in both the HSV feature scheme and the RGBHSV feature scheme. Specifically, when using the HSV feature scheme, the DeepLabv3+ model failed to identify some PWD-infected tree samples, resulting in a higher missed detection rate. Moreover, when using the RGBHSV feature scheme, despite the integration of both RGB and HSV feature types, the DeepLabv3+ model still could not eliminate missed detections. This indicates that the DeepLabv3+ model may have certain limitations in dealing with complex PWD-infected tree detection tasks. Despite the structural advantages of the FPN model, which allows it to capture multi-scale information, some PWD-infected tree samples were still not successfully detected when using the RGB feature scheme. This could be due to the limitations of the RGB feature scheme in expressing color information, which may not adequately represent the characteristics of the PWD-infected tree samples. In contrast, the U-Net model successfully identified all PWD-infected tree samples using each of the three feature schemes and demonstrated exceptionally high accuracy and robustness. This indicates that the U-Net model has greater adaptability and reliability in the task of detecting PWD-infected tree samples and effectively handles detection challenges using different feature schemes. Therefore, among the three semantic segmentation models, the U-Net model is the most effective and reliable choice for detecting PWD-infected tree samples.

4.3. Comparison of the U-Net Model with the Traditional Machine-Learning Model

The random forest model was selected for comparison with the semantic segmentation model with the highest model performance (i.e., the U-Net model). By comparing the identification results of all PWD-infected tree samples using the random forest model and the U-Net model, it was found that the U-Net model’s success rate was 4.81% higher than that of the random forest model (Table 5). In terms of accuracy metrics, all three indexes demonstrated that the U-Net model outperformed the random forest model (Table 6). Specifically, the F1-score of the U-Net model was 0.0254 higher, the IoU score was 0.0476 higher, and the precision was 0.0442 higher. This indicates that the U-Net model is superior to the random forest model, whether from the perspective of identification success rate or accuracy metrics.
The prediction results of the U-Net model and the random forest model are shown in Figure 9. The prediction results of the random forest model had a serious misclassification issue, especially incorrectly identifying bare soil and building rooftops as PWD-infected tree pixels. In contrast, there were no fragmented patches in the prediction results of the U-Net model, with each PWD-infected tree clearly and accurately identified. This indicates that the U-Net model is better at accurately identifying PWD-infected trees.

4.4. Comparison of the U-Net Model with the SAM Model

Table 7 shows the difference in the parameters between the U-Net model and the SAM model. Both models have three-channel inputs, with the U-Net model having 67,156,881 trainable parameters, while the fine-tuned SAM model has 93,735,472 trainable parameters. The difference in parameters leads to a difference in training time between the two models.
The results of PWD identification accuracy are shown in Table 8. When using the RGB feature scheme, the U-Net model identified a total of 544 PWD-infected trees with a success rate of 96.97%. Compared with the SAM model, the U-Net model successfully detected 86 more PWD-infected trees, with a 15.33% higher success rate. When using the HSV feature scheme, the U-Net model identified a total of 538 PWD-infected trees with a success rate of 95.90%. Compared with the SAM model, the U-Net model successfully detected 136 more PWD-infected trees, with a 24.24% higher success rate.
In terms of accuracy metrics (Table 9), the U-Net model’s F1-score, IoU, and precision using the RGB feature scheme were higher than those of the SAM model by 0.1606, 0.1993, and 0.2929, respectively. When using the HSV feature scheme, these three accuracy indexes of the U-Net model were 0.2723, 0.3, and 0.3712 higher, respectively.
To better illustrate the difference between the two models, Figure 10 shows the details of the prediction results. Generally, there were more errors in the SAM model’s prediction outputs, such as incorrectly identifying shaded areas as PWD-infected trees when using the HSV feature scheme. This demonstrated that the SAM model’s ability to identify PWD-infected trees was significantly inferior to that of the U-Net model. Additionally, the SAM model incorrectly classified shadowed areas and some discolored trees as PWD-infected trees, whereas in the U-Net model, such instances were much less frequent.

5. Discussion

5.1. The Optimal Input Feature Scheme

When assessing the performance of the three input feature schemes, it is necessary to consider not only the models’ performance but also the practical applications of these feature schemes in image processing and computer vision. Although the RGB feature scheme is highly intuitive and widely used, it has poor feature separation. In contrast, the HSV feature scheme offers better feature independence and is more suitable for color segmentation. It is commonly used for land cover classification in the field of remote sensing [48]. The RGBHSV feature scheme comprehensively utilizes information from both the RGB and HSV color spaces and fully considers the contributions of color and brightness information to the image content, allowing it to more accurately capture object features in various scenarios. In the task of detecting PWD-infected tree samples, it indeed demonstrates the highest performance compared with the RGB and HSV feature schemes. The superiority of the RGBHSV input feature scheme lies in its relatively lower rate of misclassification, meaning that the model could more accurately distinguish different categories of objects, which is crucial for object detection tasks [49] and image classification [50].
In previous studies, multispectral and hyperspectral UAV images have been more widely used than UAV RGB images due to their rich feature information and higher identification accuracy. However, multispectral and hyperspectral UAV images are costly and difficult to obtain. As a result, the lower-cost UAV RGB images are a complementary choice for detecting diseased trees. Unlike previous studies, which typically only utilized a single color space model [51], this study explores color space information by combining information from different color space models as model input. Experimental results show that the synthesized color space model (i.e., the RGBHSV input feature scheme) achieves higher identification accuracy and lower false detection rates compared to a single-color space model. The superiority of the RGBHSV input feature scheme is apparent; it lies not only in the enhancement of model performance but also in its ability to meet the demands of various scenarios and offer better versatility and flexibility. This finding is validated both theoretically and practically, providing an important reference for the application of UAV RGB images in the detection tasks of PWD-infected trees.

5.2. The Optimal Semantic Segmentation Model

When selecting a suitable semantic segmentation model for a specific object detection task, it is important to consider factors including model architecture, feature extraction capability, and the ability to handle multi-scale features. Among the three semantic segmentation models used in this study, the unique structure of the U-Net model allows it to perform exceptionally well in many scenarios, which could be attributed to the encoder–decoder structure [52,53]. The encoder part captures the global features of the image through multi-level feature extraction, while the decoder part effectively reconstructs these features into pixel-level prediction results, thus preserving rich detail information. This makes the U-Net model highly advantageous for performing object detection tasks that require precise segmentation of targets. The ability to maintain detailed spatial information and context throughout the process enables the U-Net model to achieve high accuracy in segmenting complex images [54].
The FPN and DeepLabv3+ models also have their advantages in specific scenarios. The FPN model primarily addresses multi-scale issues by using a top-down feature fusion mechanism and effectively deals with targets at different scales. This makes the FPN model perform well in tasks such as object detection and instance segmentation, especially when dealing with large-scale scenes or targets of varying scales [55]. The DeepLabv3+ model expands the receptive field of the model by incorporating techniques such as atrous (dilated) convolutions and spatial pyramid pooling. This enables it to better handle segmentation tasks in large-scale scenes and large-scale images [56]. However, for tasks that require fine segmentation or the preservation of details, the U-Net model generally outperforms the DeepLabv3+ and FPN models [47]. The results of the comparison experiments of this study confirm this characteristic. In this study, PWD-infected trees usually occupy small areas in the image, which requires a high degree of fine extraction by the segmentation model. Fortunately, the U-Net model meets these requirements well, which is why the performance of the U-Net model is higher than that of the DeepLabv3+ and FPN models.
Additionally, it is worth noting that the differences between the random forest model and the U-Net model are quite pronounced and that the U-Net model is superior to the random forest model, which has also been observed in previous studies [53,57]. This is mainly because the random forest model essentially performs pixel-by-pixel classification, lacking the semantic information of surrounding pixels. It cannot capture the context of neighboring pixels, which leads to poorer performance. In contrast, the advantage of deep learning methods over traditional machine-learning methods in image classification tasks lies in their ability to fully utilize the correlations between pixels, while traditional machine-learning methods often rely on pixel-based classification, neglecting the spatial relationships between pixels. This results in poor performance for traditional machine-learning methods when dealing with salt-and-pepper noise or local discontinuities in the image. In contrast, deep learning methods can better capture the semantic information between pixels through operations such as convolution and pooling. This improves robustness to noise and local discontinuities, resulting in better performance when conducting semantic segmentation tasks in complex scenes.
The SAM model has achieved significant success in the field of computer vision and demonstrated notable potential in the field of remote sensing [46,58], which may benefit from one of SAM’s major advantages: its powerful recognition ability derived from pre-training on the SA-1B dataset. However, the SAM model performed the worst in this study. A probable reason is that most of the images in SA-1B were taken from smartphones or cameras with fine spatial resolution, while the image data in this study were captured by drones with coarser spatial resolution. The difference in spatial resolution may be one of the reasons for SAM’s poor performance. Additionally, most of the SA-1B data consist of RGB images, making the SAM model more adapted to RGB imagery. This also explains why the SAM model achieved relatively higher accuracy in recognizing RGB images compared to HSV images.

5.3. Error Source of the Semantic Segmentation Model

The sources of error in identifying PWD-infected trees using the semantic segmentation models primarily encompass two aspects: missed detections and false positives. Regarding missed detections, the UAV RGB images collected for this study were captured in November 2021, when most PWD-infected trees were in the middle to late stages of infection, while a small number of PWD-infected trees were still in the early stages of infection. Given the significant differences in physiological condition and morphological characteristics between early-stage and middle-to-late-stage PWD-infected trees, the presence of PWD-infected trees at different stages within the study area can lead the method to miss many early-stage or middle-to-late-stage PWD-infected trees, depending on the features learned by the semantic segmentation model during training. To address this issue, PWD-infected trees at the early and middle-to-late stages can be sampled separately and subjected to multi-class classification. Regarding false positives, they are primarily influenced by the local features in RGB images. If the detection area contains a large amount of land cover with RGB digital values similar to those of PWD-infected trees, misclassifications may occur. For example, middle-to-late-stage PWD-infected trees often exhibit reddish-brown colors in RGB images, which are similar to the colors of some building surfaces. This similarity may cause the model to mistakenly classify buildings as PWD-infected trees. To address this issue, new feature information, such as near-infrared bands and vegetation indices derived from satellite images, could be introduced, which would help to differentiate between buildings and trees and reduce the occurrence of false positives.

5.4. Model Generalization

The prediction results of different models are shown in Figure 11. It was found that the predictive abilities of the U-Net, DeepLabv3+, and FPN models were similar, with the number of predicted suspected PWD-infected trees ranging from 650 to 750. When using the RGBHSV feature scheme, aside from the confirmed PWD-infected trees, the U-Net, DeepLabv3+, and FPN models identified 206, 174, and 178 more suspected PWD-infected trees, respectively. There were no significant misclassifications, but a few omissions were present. These suspected diseased trees cannot be distinguished from the drone images alone and require field investigations for confirmation. When using the HSV feature scheme, these three models identified 209, 148, and 137 more suspected PWD-infected trees, respectively, with no significant misclassifications and a few omissions. When using the RGB feature scheme, these three models identified 166, 186, and 185 more suspected PWD-infected trees, respectively. Unlike the previous two feature schemes, the RGB feature scheme presented noticeable misclassification issues, which were mainly due to serious confusion between some red roofs and PWD-infected trees showing reddish-brown colors, along with a few omissions. Generally, all three models have excellent generalization capabilities and can achieve high detection accuracy of PWD-infected trees at a large scale.
In contrast, the RF model detected a total of 1043 suspected PWD-infected trees, of which only 526 were confirmed diseased trees. Moreover, there were many misclassifications, such as incorrectly classifying red roofs, mountain shadows, and other discolored trees as diseased trees. The SAM model using the RGB feature scheme and the HSV feature scheme detected 862 and 1041 suspected PWD-infected trees, respectively, of which only 458 and 402 were confirmed PWD-infected trees. Similar to the RF model, the SAM model also exhibited a large number of misclassifications, and the omission of confirmed PWD-infected trees was also severe, with 103 and 159 confirmed PWD-infected trees missed, respectively. Compared to the three classical segmentation models (i.e., the U-Net, DeepLabv3+, and FPN models), both the RF and SAM models exhibited much poorer generalization capabilities.

6. Conclusions

This study utilized UAV RGB images and semantic segmentation techniques to detect individual PWD-infected trees. Specifically, the main innovations of this study are as follows: (1) By extracting color space feature information from RGB images and combining different color space features, the recognition accuracy was improved; (2) the backbone networks of three classic semantic segmentation models (i.e., the U-Net, DeepLabv3+, and FPN models) were modified for enhancement, and comparisons were made with a traditional machine-learning model (i.e., the RF model) and the newly proposed SAM model; (3) a low-cost, fast, high-precision, and large-scale deep learning technique for monitoring PWD-infected trees was achieved. Nevertheless, the study has some limitations, such as instances of misclassification and false positives. In future research, distinguishing different stages of PWD-infected trees to further improve recognition accuracy deserves further exploration.

Author Contributions

Conceptualization, J.Z., X.C. and L.L.; methodology, J.Z. and L.L.; software, L.L., L.Q., R.D. and X.C.; validation, H.Z. and M.W.; formal analysis, J.Z. and X.C.; investigation, J.Z., L.S., H.Z., X.C., L.Q. and W.L.; resources, J.Z., L.S. and W.L.; data curation, L.S., R.D. and M.W.; writing—original draft preparation, L.L.; writing—review and editing, J.Z. and X.S.; visualization, L.L., H.Z., M.W., L.Q. and Z.L.; supervision, J.Z.; project administration, J.Z. and X.C.; funding acquisition, J.Z. and X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was financed by the Natural Science Foundation of Anhui Province (No. 2208085MD91), the Natural Science Foundation of China (Nos. 42271060 and 42101419), and the Natural Resources Science and Technology Project of Anhui Province (No. 2023-K-5).

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Hou, Y.; Ding, Y. Dynamic analysis of pine wilt disease model with memory diffusion and nonlocal effect. Chaos Solitons Fractals 2024, 179, 114480. [Google Scholar] [CrossRef]
  2. Xu, Q.; Zhang, X.; Li, J.; Ren, J.; Ren, L.; Luo, Y. Pine Wilt Disease in Northeast and Northwest China: A Comprehensive Risk Review. Forests 2023, 14, 174. [Google Scholar] [CrossRef]
  3. Dong, Y.-H.; Peng, F.-L.; Li, H.; Men, Y.-Q. Spatial autocorrelation and spatial heterogeneity of underground parking space development in Chinese megacities based on multisource open data. Appl. Geogr. 2023, 153, 102897. [Google Scholar] [CrossRef]
  4. Abdulridha, J.; Ehsani, R.; Abd-Elrahman, A.; Ampatzidis, Y. A remote sensing technique for detecting laurel wilt disease in avocado in presence of other biotic and abiotic stresses. Comput. Electron. Agric. 2019, 156, 549–557. [Google Scholar] [CrossRef]
  5. Takenaka, Y.; Katoh, M.; Deng, S.; Cheung, K. Detecting Forests Damaged by Pine Wilt Disease at the Individual Tree Level Using Airborne Laser Data and Worldview-2/3 Images over Two Seasons. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2017, 42, 181–184. [Google Scholar] [CrossRef]
  6. Coops, N.C.; Johnson, M.; Wulder, M.A.; White, J.C. Assessment of QuickBird high spatial resolution imagery to detect red attack damage due to mountain pine beetle infestation. Remote Sens. Environ. 2006, 103, 67–80. [Google Scholar] [CrossRef]
  7. Dennison, P.E.; Brunelle, A.R.; Carter, V.A. Assessing canopy mortality during a mountain pine beetle outbreak using GeoEye-1 high spatial resolution satellite data. Remote Sens. Environ. 2010, 114, 2431–2435. [Google Scholar] [CrossRef]
  8. Yang, G.; Liu, J.; Zhao, C.; Li, Z.; Huang, Y.; Yu, H.; Xu, B.; Yang, X.; Zhu, D.; Zhang, X.; et al. Unmanned Aerial Vehicle Remote Sensing for Field-Based Crop Phenotyping: Current Status and Perspectives. Front. Plant Sci. 2017, 8, 1111. [Google Scholar] [CrossRef]
  9. Syifa, M.; Park, S.J.; Lee, C.W. Detection of the Pine Wilt Disease Tree Candidates for Drone Remote Sensing Using Artificial Intelligence Techniques. Engineering 2020, 6, 919–926. [Google Scholar] [CrossRef]
  10. Yu, R.; Ren, L.; Luo, Y. Early detection of pine wilt disease in Pinus tabuliformis in North China using a field portable spectrometer and UAV-based hyperspectral imagery. For. Ecosyst. 2021, 8, 44. [Google Scholar] [CrossRef]
  11. Zhao, X.; Qi, J.; Xu, H.; Yu, Z.; Yuan, L.; Chen, Y.; Huang, H. Evaluating the potential of airborne hyperspectral LiDAR for assessing forest insects and diseases with 3D Radiative Transfer Modeling. Remote Sens. Environ. 2023, 297, 113759. [Google Scholar] [CrossRef]
  12. Suits, G.H. The Calculation of the Directional Reflectance of a Vegetative Canopy. Remote Sens. Environ. 1971, 2, 117–125. [Google Scholar] [CrossRef]
Figure 1. (a) Location of Anhui Province in China; (b) location of Qianshan City within Anhui Province, with Sentinel-2 satellite imagery of Qianshan City; (c) distribution of drone images and PWD-infected trees determined by field survey; (d) examples of PWD-infected tree samples shown in red boxes.
Figure 2. Major land cover types in the study area.
Figure 3. Workflow of the research project.
Figure 4. Structure of the U-Net model.
Figure 5. Structure of the DeepLabv3+ model.
Figure 6. Structure of the FPN model.
Figure 7. (a) UAV RGB image; (b) annotation of PWD-infected trees; (c) classification result of the RGB feature scheme; (d) classification result of the HSV feature scheme; (e) classification result of the RGBHSV feature scheme. Red circles mark PWD-infected tree samples misclassified by the RGB and HSV feature schemes.
Figure 8. (a) UAV RGB image; (b) annotation of PWD-infected trees; (c) classification result of the U-Net model; (d) classification result of the DeepLabv3+ model; (e) classification result of the FPN model. Red circles mark PWD-infected tree samples misclassified by the DeepLabv3+ and FPN models.
Figure 9. Comparison of the prediction results of the U-Net model and the RF model.
Figure 10. Comparison of the prediction results of the U-Net and SAM models. (a) Drone image; (b) PWD-infected trees determined by field survey; (c) comparison of prediction results between the U-Net and SAM models using the RGB feature scheme; (d) comparison of prediction results between the U-Net and SAM models using the HSV feature scheme.
Figure 11. Comparison of the prediction results of all models. The orange line represents the administrative boundary of Qianshan City, and the red points indicate PWD-infected trees predicted by the different models.
Table 1. Classification results of different feature schemes for identifying PWD-infected trees.

Model        Feature Scheme   Success   Success Rate   Fail   Fail Rate
U-Net        RGB              544       96.97%         17     3.03%
U-Net        HSV              538       95.90%         23     4.10%
U-Net        RGBHSV           553       98.57%         8      1.43%
DeepLabv3+   RGB              531       94.65%         30     5.35%
DeepLabv3+   HSV              526       93.76%         35     6.24%
DeepLabv3+   RGBHSV           540       96.26%         21     3.74%
FPN          RGB              542       96.61%         19     3.39%
FPN          HSV              534       95.19%         27     4.81%
FPN          RGBHSV           546       97.33%         15     2.67%
Table 2. Recognition results of different models on all images.

Model        Feature Scheme   Suspected PWD-Infected Trees   Confirmed True PWD-Infected Trees
U-Net        RGB              710                            544
U-Net        HSV              747                            538
U-Net        RGBHSV           759                            553
DeepLabv3+   RGB              717                            531
DeepLabv3+   HSV              674                            526
DeepLabv3+   RGBHSV           714                            540
FPN          RGB              727                            542
FPN          HSV              671                            534
FPN          RGBHSV           718                            540
Table 3. Comparison of model parameters and training time between the U-Net, DeepLabv3+, and FPN models.

Model        Channels   Trainable Params   Training Time (/epoch)
U-Net        3          67,156,881         5 m 55 s
U-Net        6          67,166,289         6 m 02 s
DeepLabv3+   3          61,313,361         5 m 48 s
DeepLabv3+   6          61,322,769         5 m 53 s
FPN          3          60,751,809         5 m 04 s
FPN          6          60,761,217         5 m 11 s
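The 3-channel and 6-channel inputs in Table 3 correspond to the RGB and RGBHSV feature schemes, respectively. Below is a minimal sketch of assembling a 6-channel RGBHSV input from a UAV tile, assuming an OpenCV-based color-space conversion; the file name, normalization choice, and function name are illustrative and are not taken from the authors' preprocessing pipeline.

```python
import cv2
import numpy as np

def build_rgbhsv_input(tile_path: str) -> np.ndarray:
    """Stack RGB and HSV channels into one 6-channel array of shape (H, W, 6).

    Illustrative sketch only; the normalization is a simplification (OpenCV
    stores 8-bit hue in the range 0-179), not the paper's exact preprocessing.
    """
    bgr = cv2.imread(tile_path)                    # OpenCV reads images as BGR
    rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)     # reorder to RGB
    hsv = cv2.cvtColor(bgr, cv2.COLOR_BGR2HSV)     # hue, saturation, value
    stacked = np.concatenate([rgb, hsv], axis=-1)  # shape (H, W, 6)
    return stacked.astype(np.float32) / 255.0      # scale to [0, 1] for the network

# Hypothetical usage:
# x = build_rgbhsv_input("uav_tile_0001.png")
# print(x.shape)  # (H, W, 6)
```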
Table 4. Accuracy assessment indexes of different feature schemes using three semantic segmentation models.

Model        Feature Scheme   F1-Score   IoU      Precision
U-Net        RGB              0.9538     0.9146   0.9744
U-Net        HSV              0.9576     0.9201   0.9654
U-Net        RGBHSV           0.9586     0.9221   0.9741
DeepLabv3+   RGB              0.9373     0.8838   0.9610
DeepLabv3+   HSV              0.9296     0.8711   0.9556
DeepLabv3+   RGBHSV           0.9254     0.8643   0.9598
FPN          RGB              0.9494     0.9049   0.9639
FPN          HSV              0.9411     0.8903   0.9718
FPN          RGBHSV           0.9480     0.9030   0.9632
Table 5. Comparison of identification results of the U-Net model and the random forest model.

Model   Success   Success Rate (%)   Fail   Fail Rate (%)
U-Net   553       98.57              8      1.43
RF      526       93.76              35     6.24
Table 6. Comparison of accuracy metrics of the U-Net model and the random forest model.

Model   F1-Score   IoU      Precision
U-Net   0.9586     0.9221   0.9741
RF      0.9332     0.8745   0.9299
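Tables 5 and 6 use a random forest (RF) classifier as the traditional machine-learning baseline. A minimal pixel-wise RF sketch built with scikit-learn is given below; the placeholder feature data and hyperparameters are assumptions for illustration only and do not reproduce the configuration evaluated in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)

# Placeholder data standing in for per-pixel RGBHSV features (6 bands) and
# binary labels (1 = PWD-infected tree pixel); purely illustrative.
X_train = rng.random((5000, 6))
y_train = rng.integers(0, 2, 5000)

# Hyperparameters are assumptions, not the settings tuned in the study.
rf = RandomForestClassifier(n_estimators=200, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

# Per-pixel probability of belonging to the infected class for new samples.
probabilities = rf.predict_proba(X_train[:10])[:, 1]
```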
Table 7. Comparison of model parameters and training time between SAM and U-Net.

Model   Trainable Params   Training Time (/epoch)
U-Net   67,156,881         5 m 55 s
SAM     93,735,472         10 m 56 s
Table 8. Classification results of SAM and U-Net for identifying PWD-infected trees.

Model   Feature Scheme   Success   Success Rate   Fail   Fail Rate
U-Net   RGB              544       96.97%         17     3.03%
U-Net   HSV              538       95.90%         23     4.10%
SAM     RGB              458       81.64%         103    18.36%
SAM     HSV              402       71.66%         159    28.34%
Table 9. Accuracy assessment indexes of the SAM and U-Net models.

Model   Feature Scheme   F1-Score   IoU      Precision
U-Net   RGB              0.9538     0.9146   0.9744
U-Net   HSV              0.9576     0.9201   0.9654
SAM     RGB              0.7932     0.7153   0.6824
SAM     HSV              0.6853     0.6201   0.5942
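Tables 7–9 compare SAM against U-Net. For readers unfamiliar with SAM, the sketch below shows how the publicly released segment_anything package can generate candidate masks for a UAV tile in a zero-shot setting; the checkpoint file name and input image are assumptions, and the sketch does not reproduce the fine-tuning configuration compared in this study.

```python
import cv2
from segment_anything import sam_model_registry, SamAutomaticMaskGenerator

# Load a pretrained SAM ViT-B backbone (checkpoint path is a local file assumption).
sam = sam_model_registry["vit_b"](checkpoint="sam_vit_b_01ec64.pth")
mask_generator = SamAutomaticMaskGenerator(sam)

# Read a UAV tile and convert BGR to RGB, the channel order SAM expects.
image = cv2.cvtColor(cv2.imread("uav_tile_0001.png"), cv2.COLOR_BGR2RGB)

# Each returned dict contains a binary 'segmentation' mask plus quality scores;
# the candidate masks would still need to be filtered or classified to isolate
# PWD-infected trees.
masks = mask_generator.generate(image)
print(len(masks))
```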
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
