Article

Early-Stage Pine Wilt Disease Detection via Multi-Feature Fusion in UAV Imagery

College of Information, Beijing Forestry University, Beijing 100083, China
* Author to whom correspondence should be addressed.
Forests 2024, 15(1), 171; https://doi.org/10.3390/f15010171
Submission received: 14 November 2023 / Revised: 22 December 2023 / Accepted: 29 December 2023 / Published: 14 January 2024
(This article belongs to the Section Forest Health)

Abstract

Pine wilt disease (PWD) is a highly contagious and devastating forest disease. The timely detection of pine trees infected with PWD at the early stage is of great significance for effectively controlling the spread of PWD and protecting forest resources. However, the features of early-stage PWD are not distinctly evident in the spatial domain, leading to numerous missed detections and false positives when spatial-domain images are used directly. In contrast, we found that frequency domain information can express the characteristics of early-stage PWD more clearly. In this paper, we propose a deep-learning-based detection method for early-stage PWD that comprehensively utilizes features in both the frequency domain and the spatial domain. An attention mechanism is introduced to further enhance the frequency domain features, and two deformable convolutions are employed to fuse the features of both domains so as to fully capture semantic and spatial information. To validate the proposed method, this study employs UAVs to capture images of pine trees at the early stage of PWD infection at the Dahuofang Experimental Forest in Fushun, Liaoning Province. A dataset of pine trees at the early stage of PWD infection is curated to facilitate future research on the detection of early-stage infestations. The results on the early-stage PWD dataset indicate that, compared to Faster R-CNN, DETR and YOLOv5, the best-performing method improves the average precision (AP) by 17.7%, 6.2% and 6.0%, and the F1 score by 14.6%, 3.9% and 5.0%, respectively. The study provides technical support for the counting and localization of early-stage PWD trees in forest areas and lays the foundation for the early control of pine wood nematode disease.

1. Introduction

Pine trees are highly prone to pine wilt disease (PWD) after being infested by the pine wood nematode (PWN; Bursaphelenchus xylophilus) [1]. PWD is commonly known as “pine cancer”. After infection, the needles of pine trees change from green to yellow and finally turn reddish-brown. Infected pine trees succumb to the disease within a few months, and failure to implement timely management measures will result in severe damage to the entire pine forest within 3 to 5 years [2]. Because of its wide host range, easy transmission, rapid onset, high fatality rate and the difficulty of its prevention and control, PWD can cause devastating mortality among pine species. It is a global-level forest disease [3] and one of the most serious natural disasters currently affecting China [4]. In this paper, we categorize pine trees infected by PWD into four stages: early stage, moderate stage, severe stage and death [5] (see Figure 1). At the early stage, the resin secretion of infected pine trees stops, and the needles begin to fade from green to yellow-green. The timely and accurate detection of early-stage PWD is of great significance in preventing the wide-ranging spread of the disease among pine trees. It is also crucial for both ecological protection and economic development in China [6].
Collecting images using UAVs and employing object detection algorithms based on deep learning networks can help to rapidly and efficiently detect pest infestations [7,8,9,10,11], thereby enhancing the cost-effectiveness of forest pest identification. Safonova et al. [12] proposed a two-stage convolutional neural network (CNN) framework for the calibration and detection of UAV images. They first established a detection strategy to locate crown areas with suspected infection and then predicted the extent of damage for each candidate area based on an adapted CNN model. This method operated on the spatial domain images captured by the UAV and achieved a detection accuracy of 95%. In the task of detecting PWD, Deng et al. [13] combined the region proposal network (RPN) [14] and Faster R-CNN, achieving an accuracy of 90%. In the study by Qin et al. [15], a detection algorithm based on YOLOv5 [16] was used to fuse spatial domain images and multispectral images of pine trees with PWD.
However, the methods mentioned above can only detect the moderate and severe stages of PWD. The visual differences between early-stage PWD and healthy pine trees are minimal, which makes traditional object detection methods based on the image spatial domain (i.e., the RGB color space) poorly suited to early-stage detection. Wu et al. [17] proposed the real-time assessment of moderate-stage PWD using UAV images based on YOLOv3 [18], achieving a detection accuracy of 62%. In the study conducted by Yu et al. [19], two object detection algorithms, Faster R-CNN and YOLOv4 [20], as well as two traditional machine learning algorithms based on feature extraction, the random forest algorithm [21] and the support vector machine (SVM) [22], were employed to recognize early-stage PWD. The detection accuracy of these four methods reached 60.98%, 57.07%, 75.33% and 73.28%, respectively. The detection accuracy of the above methods was limited by the insufficient extraction of early-stage PWD features.
The main obstacle in early-stage PWD detection is that the color difference between early-stage PWD areas and healthy vegetation areas is not obvious in the spatial domain. On the other hand, the frequency domain distribution of aerial drone imagery may reflect the characteristics of the infected areas better than images captured under the forest canopy or spatial domain images (such as the red boxes depicted in Figure 2). Therefore, in this paper, we consider utilizing frequency domain information to explore the visually inconspicuous features of early-stage PWD in UAV images. In recent years, an increasing number of researchers have been exploring imperceptible information through the frequency domain [23,24,25]. Influenced by the theory of digital signal processing, Xu et al. [26] found that, compared to traditional spatial down-sampling, frequency domain learning could better preserve image information during image preprocessing. They therefore proposed a learnable frequency domain selection method using frequency domain features as input data for both image classification and object detection. By incorporating the frequency domain, the top-1 accuracy with ResNet-50 [27] and MobileNetV2 [28] backbones was improved by 1.60% and 0.63%, respectively, compared to using the spatial domain alone. Moreover, the average precision of Mask R-CNN [29] was improved by 0.8% compared to using only spatial domain features. In a study by Wang et al. [30], a multi-feature fusion network for ship detection in SAR images was proposed. This network extracted the spatial and frequency domain features of ship targets and utilized feature fusion blocks to merge spatial and frequency texture information. Finally, the network used an RPN to detect ship targets in the original images. The detection accuracy on the SAR ship detection dataset reached 96.43%. Yang et al. [31] developed a novel cross-domain fusion network, CDF-Net, based on Discrete Cosine Transform (DCT) convolution. This network extracted and integrated the frequency domain and spatial domain features of input samples. When applied to the ImageNet 2012 dataset for image classification using ResNet-50, the proposed method achieved a maximum accuracy improvement of 3.68% compared to using only spatial domain features. On the COCO 2017 dataset for object detection, the mean average precision (mAP) of ResNet-50 and ResNeXt-50 [32] increased by 0.5% and 1.2%, respectively, when frequency domain features were combined, also compared to using only spatial domain features.
Based on the above discussion, this paper proposes an early-stage PWD detection method that fuses the frequency domain and spatial domain features of UAV images. The main contributions of our method are as follows: (1) utilizing the frequency domain to explore the visually imperceptible information of early-stage PWD, which is difficult to distinguish in the spatial domain; (2) employing an attention mechanism to enhance the extraction of features in different frequency bands; (3) integrating spatial domain features and frequency domain features via deformable convolution to fully express semantic and positional information, thus improving the model’s ability to detect early-stage PWD; (4) constructing an early-stage pine wilt disease UAV imagery dataset covering five PWD-affected forest areas and comprising 547 high-resolution drone images from the Dahuofang Experimental Forest.

2. Materials and Methods

2.1. Study Area and UAV Image Data

The study area is located in the Dahuofang Experimental Forest in Fushun, Liaoning Province, China. The forest is rich in vegetation types, most of which are mixed broadleaf–conifer forests, and PWD is the main pest in this area. We divide the PWD-affected area into five blocks (shown as five colored rectangular boxes in Figure 3), namely the Yanjiagou sample plots (Y-0 to Y-3) and the Puyaoshanqian sample plot (PY-4) (Figure 3c). Firstly, the five sample plots are established within the experimental site, and high-resolution imagery of the forest at these plots is captured using a drone equipped with a high-resolution camera. Subsequently, we conduct a comprehensive analysis of the health status of the trees by integrating manual verification with the drone imagery. Ultimately, we determine whether the discoloration of the trees is caused by PWD infection. Drone imagery was collected in August 2020 and from April to August 2021 using a DJI Phantom 4 Pro V2.0 drone equipped with an optical camera. The terrain in the study area ranges from 0 to 60 m in slope, and the drone’s flight altitude varied from 100 to 240 m to capture pine forest image information at different scales according to the undulating terrain. Aerial images were captured in JPEG format, containing geographical coordinates and flight height information, with an image resolution of 5472 × 3648 pixels. A total of 547 aerial images containing red pine were obtained from the drone remote sensing, among which 527 images were from sites Y-1, Y-2, Y-3 and PY-4, and the other 20 images were from site Y-0.

2.2. Dataset

The dataset is constructed using the open-source annotation software LabelImg 3.18 (Tzutalin), which is used to annotate the early-stage PWD targets in the original images, recording the bounding box positions and producing labels in the format [category, center coordinates, width, height]. In this study, data from sites Y-1, Y-2, Y-3 and PY-4 are used for model training. A total of 309 randomly selected images are used as the original data for the training set, and the other 218 images are used as the original data for the validation set.
In order to standardize the image specifications and alleviate the pressure on model training, we employ a scanning and cropping approach to reduce the original images to a resolution of 2500 × 1500 pixels. This is achieved by setting the sliding window size to 2500 × 1500 and scanning each original image with this window, so that each original image generates four sub-images. Two randomly selected sub-images from each original image are placed into the corresponding training or validation set. Ultimately, the early-stage PWD dataset is established, consisting of 718 images in the training set and 436 images in the validation set. Images from site Y-0 are not included in either the training or validation set; they are only used to assess the generalization ability of the model.
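The following is a minimal sketch of this sliding-window cropping step, assuming a 2 × 2 window layout anchored at the image corners (with overlap) so that four 2500 × 1500 sub-images cover each 5472 × 3648 original image; the exact window positions used in the paper may differ, and the function name is illustrative.

```python
from PIL import Image

def crop_subimages(image_path, window=(2500, 1500)):
    """Scan an original aerial image with a fixed 2500 x 1500 window and
    return four sub-images (a 2 x 2 layout anchored at the image corners)."""
    img = Image.open(image_path)
    w, h = window
    subs = []
    for top in (0, img.height - h):          # top and bottom rows
        for left in (0, img.width - w):      # left and right columns
            subs.append(img.crop((left, top, left + w, top + h)))
    return subs
```

Two of the four sub-images per original image would then be sampled at random into the training or validation split, as described above.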

2.3. Model Structure

In the frequency domain, the features of early-stage PWD are more significant. Therefore, as shown in Figure 4, this paper proposes a model framework for early-stage PWD detection in drone imagery that combines frequency domain features with spatial domain features. This addresses the challenge of unclear early-stage PWD features in drone imagery and reduces false positives and missed detections.
Firstly, we obtain the frequency domain feature information from the spatial domain, and then enhance this information through the frequency domain feature enhancement module (FEM). The spatial domain images and frequency domain images are used as separate input data for the model. In a feature pyramid network (FPN) [33], the feature fusion module (FFM) is employed to integrate the frequency domain feature information with corresponding spatial domain feature information at different scales. The prediction consists of two parts: frequency domain feature prediction and combined feature prediction. In the first part, after integrating different scales of frequency domain features with features from the backbone, we employ two ordinary convolutional blocks to generate predictions for each scale of frequency domain features separately. The second part involves fusing features at different scales, which are processed through the path aggregation network (PANet) [34] to form new features. These new features are processed by other ordinary convolutional blocks to generate predictions for each scale of combined features. The optimal prediction result is calculated based on the different predictions mentioned above and is marked in the image as a rectangular bounding box.
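To make the data flow concrete, the following is a schematic PyTorch sketch of the pipeline described above, with the backbone/FPN, FEM, FFMs, PANet and prediction heads passed in as placeholder submodules; the class name METDSketch and this exact decomposition are illustrative, not the authors’ implementation.

```python
import torch.nn as nn

class METDSketch(nn.Module):
    """Schematic data flow: enhanced frequency features are fused with
    multi-scale spatial (backbone/FPN) features by the FFMs; one branch of
    heads predicts from the fused features directly, the other predicts
    from PANet-aggregated features."""
    def __init__(self, backbone, fem, ffms, freq_heads, panet, fused_heads):
        super().__init__()
        self.backbone, self.fem, self.panet = backbone, fem, panet
        self.ffms = nn.ModuleList(ffms)              # one FFM per FPN scale
        self.freq_heads = nn.ModuleList(freq_heads)  # frequency-branch heads
        self.fused_heads = nn.ModuleList(fused_heads)

    def forward(self, rgb, freq):
        freq = self.fem(freq)                        # enhance DCT features
        spatial = self.backbone(rgb)                 # list of multi-scale features
        fused = [ffm(s, freq) for ffm, s in zip(self.ffms, spatial)]
        preds_freq = [h(f) for h, f in zip(self.freq_heads, fused)]
        preds_comb = [h(f) for h, f in zip(self.fused_heads, self.panet(fused))]
        return preds_freq, preds_comb                # best boxes selected downstream
```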

2.4. DCT

In this paper, to obtain the frequency domain features of an image (as shown in Figure 5), we first convert the spatial domain image from the RGB space to the YCbCr space. The image data in the YCbCr space are then divided into 8 × 8 patches, and each patch is processed by the DCT from left to right and top to bottom. Finally, we group all coefficients of the same frequency band into one channel so as to maintain their spatial relationships within each frequency band. Since the Y, Cb and Cr components each provide 8 × 8 = 64 channels, there is a total of 192 channels in the frequency domain. The input size of the spatial domain images for our model is 800 × 800 × 3, and the size of the resulting frequency domain features is 100 × 100 × 192.
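As a reference, the following is a minimal NumPy/SciPy sketch of this block-wise DCT feature extraction (RGB to YCbCr, 8 × 8 DCT per patch, same-frequency coefficients grouped into channels). The color conversion matrix is the standard JPEG one, and the channel ordering is one plausible choice; the exact implementation in the paper may differ.

```python
import numpy as np
from scipy.fft import dctn  # type-II DCT applied over the two patch axes

def rgb_to_ycbcr(img):
    """Standard JPEG RGB -> YCbCr conversion; img is H x W x 3 in [0, 255]."""
    m = np.array([[ 0.299,   0.587,   0.114],
                  [-0.1687, -0.3313,  0.5  ],
                  [ 0.5,    -0.4187, -0.0813]])
    ycbcr = img @ m.T
    ycbcr[..., 1:] += 128.0
    return ycbcr

def dct_features(img, block=8):
    """Convert an H x W x 3 RGB image to (H//8) x (W//8) x 192 frequency
    features: per-patch DCT, then same-frequency coefficients of Y, Cb, Cr
    grouped into channels (3 x 64 = 192)."""
    ycbcr = rgb_to_ycbcr(img.astype(np.float64))
    H, W, _ = ycbcr.shape
    h, w = H // block, W // block
    patches = ycbcr[:h * block, :w * block]
    patches = patches.reshape(h, block, w, block, 3).transpose(0, 2, 1, 3, 4)
    coeffs = dctn(patches, axes=(2, 3), norm="ortho")     # 8 x 8 DCT per patch
    return coeffs.transpose(0, 1, 4, 2, 3).reshape(h, w, 3 * block * block)
```

For an 800 × 800 × 3 input, this yields a 100 × 100 × 192 feature tensor, matching the sizes quoted above.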

2.5. Frequency Domain Feature Enhancement Module

In order to discover more concealed features of early-stage PWD in the frequency domain features, this paper proposes a learnable FEM. This module can be applied to process frequency domain information in any model. The module employs the convolutional block attention module (CBAM) [35], which is capable of extracting richer feature channels, thereby enhancing the model’s utilization of key features. The human visual system (HVS) exhibits differential sensitivity to various frequency domain channels, and studies have found that CNN-based models share similar characteristics [26], with higher sensitivity to low-band information than to high-band information. Therefore, we design two FEM structures to deal with high- and low-frequency features. FEM_HL (as shown in Figure 6) processes the low-band and high-band features separately to simultaneously highlight the low-band information representing the image foreground and the high-band features capturing image details. FEM_L (as shown in Figure 7) disregards the high-band features and retains only the low-band features, in order to emphasize the image foreground information with higher responses in the frequency domain features.
The FEM_HL structure is as follows: firstly, the obtained frequency domain features are separated into the Y, Cb and Cr frequency bands based on their channels. Each band’s features are then further divided into high-band and low-band features. The high-band features from the three bands are recombined to form a new set of high-band features $x_h^{freq}$, and performing the same operation on all low-band features yields new low-band features $x_l^{freq}$. To enhance the feature information in the corresponding bands, CBAM is applied separately to $x_h^{freq}$ and $x_l^{freq}$. Next, the high-band and low-band features from the same band are concatenated into individual channel features for Y, Cb and Cr. These three channel features are then concatenated, resulting in frequency domain features with 192 channels. The concatenated features are passed through two CBAM modules to enhance the model’s sensitivity to informative channels, which strengthens the output features for both the high- and low-band components.
The FEM_L structure is as follows: similar to FEM_HL, the obtained frequency domain features are first divided into the Y, Cb and Cr channels, and only the low band of each channel’s features is retained. All the low-band features are recombined to form a new set of low-band features $x_L^{freq}$, which is then processed by a CBAM module. Next, the low-band features are separated again by the Y, Cb and Cr channels, and each set is processed with an ordinary convolution with a kernel size of 1 × 1, adjusting its channel number from 32 to 64. The three adjusted features are then concatenated to form frequency domain features with 192 channels. Subsequently, two CBAM modules are applied to this concatenated feature set in order to enhance the output for the low-band features. All concatenation operations mentioned above are performed along the channel dimension.
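The sketch below illustrates FEM_HL in PyTorch with a compact CBAM implementation. It assumes that the 192 DCT channels are ordered as 64 channels each for Y, Cb and Cr and that the first 32 channels of each component form the low band; these ordering assumptions, and the module names, are illustrative rather than the authors’ exact code.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    """Compact CBAM: channel attention followed by spatial attention."""
    def __init__(self, c, r=16, k=7):
        super().__init__()
        self.mlp = nn.Sequential(nn.Conv2d(c, c // r, 1), nn.ReLU(), nn.Conv2d(c // r, c, 1))
        self.spatial = nn.Conv2d(2, 1, k, padding=k // 2)

    def forward(self, x):
        ca = torch.sigmoid(self.mlp(x.mean((2, 3), keepdim=True)) +
                           self.mlp(x.amax((2, 3), keepdim=True)))
        x = x * ca
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

class FEM_HL(nn.Module):
    """Sketch of FEM_HL: split the 192-channel DCT features into per-component
    low/high bands, enhance each group with CBAM, regroup per component, and
    apply two CBAMs to the re-concatenated 192-channel features."""
    def __init__(self, low=32):
        super().__init__()
        self.low, self.high = low, 64 - low      # assumed band split per component
        self.att_low = CBAM(3 * self.low)
        self.att_high = CBAM(3 * self.high)
        self.att_out = nn.Sequential(CBAM(192), CBAM(192))

    def forward(self, x):                        # x: N x 192 x H x W
        y, cb, cr = x.chunk(3, dim=1)            # 64 channels per colour component
        lows = torch.cat([c[:, :self.low] for c in (y, cb, cr)], dim=1)
        highs = torch.cat([c[:, self.low:] for c in (y, cb, cr)], dim=1)
        lows, highs = self.att_low(lows), self.att_high(highs)
        parts = []                               # regroup as [Y_low, Y_high, Cb_low, ...]
        for i in range(3):
            parts += [lows[:, i * self.low:(i + 1) * self.low],
                      highs[:, i * self.high:(i + 1) * self.high]]
        return self.att_out(torch.cat(parts, dim=1))
```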

2.6. Feature Fusion Module

In deep-learning-based object detection networks, each layer generates information in the form of feature maps. The higher layers of the network contain richer semantic information, whereas the lower layers contain more precise positional information. The features from the spatial domain have a larger receptive field, which can compensate for the frequency domain features. On the other hand, frequency domain features are more compactly arranged to represent information of different frequencies, which helps the model to better learn the features of early-stage PWD.
The FFM aims to fuse the frequency domain features with the different scales of spatial domain features extracted by the backbone and then align them spatially. This module can also be used in any network to fuse frequency domain and spatial domain features. Because early-stage PWD appears at varied sizes and shapes in drone imagery, we employ deformable convolution (DCN) [36] in the FFM. DCN allows the convolution kernel to cover a larger range during training by adding an offset to the sampling positions of the standard convolution. This approach yields better results in learning the features of early-stage PWD of varying sizes and shapes.
In the computational process of the FFM (as shown in Figure 8), we first make three copies of the frequency domain feature map. A 1 × 1 ordinary convolution is then applied to each copy to adjust its channel number to match that of the spatial domain features extracted at the corresponding scale by the backbone. These frequency domain feature maps are then up-sampled separately using bilinear interpolation, aligning them with the corresponding scale of the spatial domain feature maps. We denote the spatial domain feature map as $x_{RGB}$ and the up-sampled frequency domain feature map as $x_{FREQ}$. $x_{RGB}$ and $x_{FREQ}$ are concatenated along the channel dimension to obtain $x_1$ and $x_2$, as shown in Equations (1) and (2):
$x_1 = [\, x_{RGB} \times \mu,\ x_{FREQ} \,]$,  (1)
$x_2 = [\, x_{RGB},\ x_{FREQ} \times \mu \,]$,  (2)
where $[\,\cdot\,,\cdot\,]$ denotes concatenation along the channel dimension and the multiplication by $\mu$ is intended to enhance the weight of $x_{RGB}$ and $x_{FREQ}$ in their respective branches. Then, $x_1$ and $x_2$ are each passed through a regular convolutional block, and the resulting outputs serve as inputs to the deformable convolutions. Finally, the two obtained feature maps are added pixel-wise to obtain the output of the FFM.
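The following is a hedged PyTorch sketch of the FFM for a single scale, using torchvision’s DeformConv2d with offsets predicted by ordinary convolutions; the exact composition of the “regular convolutional block” and of the offset prediction are assumptions, not the authors’ confirmed design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.ops import DeformConv2d

class FFM(nn.Module):
    """Sketch of the feature fusion module: project 192-channel frequency
    features to the spatial-feature width, upsample them, build the two
    weighted concatenations of Eqs. (1)-(2), pass each through a regular
    conv block and a deformable conv, and sum the results pixel-wise."""
    def __init__(self, c_spatial, c_freq=192, k=3, mu=2.0):
        super().__init__()
        self.mu = mu
        self.proj = nn.Conv2d(c_freq, c_spatial, 1)
        self.pre1 = nn.Sequential(nn.Conv2d(2 * c_spatial, c_spatial, 3, padding=1),
                                  nn.BatchNorm2d(c_spatial), nn.SiLU())
        self.pre2 = nn.Sequential(nn.Conv2d(2 * c_spatial, c_spatial, 3, padding=1),
                                  nn.BatchNorm2d(c_spatial), nn.SiLU())
        # Offsets for the deformable kernels are predicted from the inputs.
        self.off1 = nn.Conv2d(c_spatial, 2 * k * k, 3, padding=1)
        self.off2 = nn.Conv2d(c_spatial, 2 * k * k, 3, padding=1)
        self.dcn1 = DeformConv2d(c_spatial, c_spatial, k, padding=k // 2)
        self.dcn2 = DeformConv2d(c_spatial, c_spatial, k, padding=k // 2)

    def forward(self, x_rgb, x_freq):
        x_freq = self.proj(x_freq)
        x_freq = F.interpolate(x_freq, size=x_rgb.shape[-2:],
                               mode="bilinear", align_corners=False)
        x1 = self.pre1(torch.cat([x_rgb * self.mu, x_freq], dim=1))   # Eq. (1)
        x2 = self.pre2(torch.cat([x_rgb, x_freq * self.mu], dim=1))   # Eq. (2)
        return self.dcn1(x1, self.off1(x1)) + self.dcn2(x2, self.off2(x2))
```

At each FPN scale, the module would be called as `ffm(spatial_feature, dct_feature)`, matching the data flow in Figure 8.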

2.7. Evaluation Indicators

In this paper, the F1 score (the harmonic mean of precision and recall) and the average precision (AP) are used as evaluation indices. In object detection, precision, recall and the F1 score are defined as shown in Equations (3)–(5):
$\mathrm{precision} = \dfrac{TP}{TP + FP} \times 100\%$,  (3)
$\mathrm{recall} = \dfrac{TP}{TP + FN} \times 100\%$,  (4)
$F1 = \dfrac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$,  (5)
where TP is the number of early-stage PWD targets correctly detected by the model; FP is the number of detections that are incorrectly identified as early-stage PWD; and FN is the number of early-stage PWD targets that are not detected by the model.
AP is a commonly used evaluation index for object detection. A good model maintains high precision and high recall at the same time. Precision and recall can be plotted as a P–R curve, and the area under the P–R curve is the average precision (AP), as shown in Equation (6):
$AP = \displaystyle\int_{0}^{1} p(r)\, dr$,  (6)
where p represents precision and r represents recall.
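For reference, a minimal NumPy sketch of these metrics is given below; the AP here is a simple numerical integration of the P–R curve obtained by sweeping the confidence threshold, which is a simplification of the interpolated AP used by most detection toolkits.

```python
import numpy as np

def precision_recall_f1(tp, fp, fn):
    """Precision, recall and F1 from detection counts (Eqs. (3)-(5))."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

def average_precision(precisions, recalls):
    """Area under the P-R curve (Eq. (6)); precisions/recalls are paired
    values obtained by sweeping the detection confidence threshold."""
    order = np.argsort(recalls)
    r = np.concatenate(([0.0], np.asarray(recalls, float)[order], [1.0]))
    p = np.concatenate(([1.0], np.asarray(precisions, float)[order], [0.0]))
    return float(np.trapz(p, r))
```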

2.8. Other Details

In this paper, the spatial domain images and frequency domain images input to the model are first subjected to rotation, perspective and scaling preprocessing operations, and the data are then further augmented with the Cutout [37] method.
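A minimal sketch of Cutout is shown below; the number of holes and the patch size are illustrative values, not the settings used in the paper.

```python
import numpy as np

def cutout(image, n_holes=1, length=64, rng=None):
    """Cutout augmentation: zero out square patches at random locations
    of an H x W x C image array."""
    rng = rng or np.random.default_rng()
    img = image.copy()
    h, w = img.shape[:2]
    for _ in range(n_holes):
        cy, cx = int(rng.integers(h)), int(rng.integers(w))
        y1, y2 = max(cy - length // 2, 0), min(cy + length // 2, h)
        x1, x2 = max(cx - length // 2, 0), min(cx + length // 2, w)
        img[y1:y2, x1:x2] = 0
    return img
```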

3. Results

3.1. Implementation Details

The hardware platform is an Intel Core i7-9700 CPU @ 3.00 GHz with 16 GB of RAM and an NVIDIA GeForce GTX 1080 Ti GPU (11 GB). The software platform is the Ubuntu 18.04 LTS 64-bit operating system with the PyTorch deep learning framework. The initial learning rate is set to 1 × 10−2, the learning rate decay to 5 × 10−4, and the batch size to 16 for training and 8 for testing. The stochastic gradient descent (SGD) optimizer with a momentum of 0.937 is used, and a total of 200 epochs of training are performed. We set μ = 2 in the FFM.
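A sketch of the corresponding optimizer setup is given below. It interprets the reported 5 × 10−4 “learning rate decay” as the SGD weight decay, as in common YOLOv5-style configurations, and uses cosine annealing as one plausible learning-rate schedule; both choices are assumptions rather than the authors’ confirmed settings.

```python
import torch

def build_optimizer(model, epochs=200, lr0=1e-2, decay=5e-4, momentum=0.937):
    """SGD with momentum, matching the hyperparameters in Section 3.1;
    the weight-decay interpretation and cosine schedule are assumptions."""
    optimizer = torch.optim.SGD(model.parameters(), lr=lr0,
                                momentum=momentum, weight_decay=decay)
    scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=epochs)
    return optimizer, scheduler
```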

3.2. Experiments on the Structures of FEM

In order to determine the optimal FEM structure for the model, we train and validate models with the two FEM structures on the early-stage PWD dataset. Models on this dataset are denoted by the letter “K”. K0 denotes the baseline model without any enhancement module. For K1 and K2, the frequency domain features corresponding to the spatial domain features are first added; K1 then uses FEM_HL (enhancing both high-band and low-band features) and K2 uses FEM_L (enhancing low-band features only). The AP and F1 scores for early-stage PWD are shown in Table 1. It can be observed that for models using either FEM structure, both the AP and F1 scores are higher than those of the original model.
Compared to the model using only FEM_L, the model employing both the high-band and low-band features via FEM_HL achieves the highest AP and F1 scores, reaching 81.1% and 78.1%, respectively. FEM_HL performs better because the main information in the images is primarily concentrated in the low-band features, while image details are mainly contained in the high-band features; deep learning models have a higher response to low-band features and a lower response to high-band features. Comparing the results of the two structures shows that preserving the high-band features more effectively improves the accuracy of the model in detecting early-stage PWD. Therefore, we adopt FEM_HL, which preserves both the high-band and low-band features.

3.3. Ablation Experiments

In order to validate the effectiveness of the different enhancement modules, our study conducts ablation experiments by gradually incorporating the modules to acquire frequency domain features, FEM_HL, FFM, and Cutout data augmentation. All these proposed modules can be integrated into any model for object detection.
We conduct ablation experiments from two directions: (1) incorporating individual enhancement modules separately to verify the effectiveness of each module on the original method, and (2) progressively adding enhancement modules to assess the impact of each module on the final method. The results are shown in Table 2, where a “√” indicates that the module is added.
From the results of the ablation experiments, it can be observed that each proposed module improves the AP to a certain extent, and when the modules are added cumulatively, all evaluation metrics reach their optimal values. On the early-stage PWD dataset, integrating the frequency domain features obtained through the DCT (K3), FEM_HL (K1), FFM (K4) and Cutout data augmentation (K5) separately increases the AP by 3.9%, 4.7%, 3.5% and 5.1%, respectively. Upon the cumulative integration of the frequency domain features, FEM_HL, FFM and Cutout data augmentation (rows K3, K1, K6 and METD in Table 2), the AP improves by 3.9%, 4.7%, 6.7% and 8.2%, respectively, and the F1 score improves by a maximum of 7.3%, reaching 79.2%. These results demonstrate the effectiveness of the proposed modules and further validate the usefulness of frequency domain features for representing early-stage PWD. An attention mechanism can assign high weights to important information and low weights to irrelevant details; therefore, we employ an attention mechanism to enhance the frequency domain features, leading to an increased AP. This shows that the proposed method can effectively extract the crucial features of early-stage PWD from images. In addition, the degree of fusion between spatial domain and frequency domain features also determines the prediction results of the model, which is why we design the FFM; deformable convolution is used to better learn the features of early-stage PWD with varying shapes and sizes. The purpose of data augmentation is to enhance the generalization performance of the model. Adding Cutout data augmentation alleviates the limited number of image samples in the early-stage PWD dataset, leading to a further improvement in the model’s AP.
To verify that the proposed improvements indeed enhance the detection accuracy for early-stage PWD, we select a group of images from the early-stage PWD dataset (Figure 9a) and compare the performance of YOLOv5s before and after integrating the modules proposed in this study. The results are shown in Figure 9c,d.
As can be seen from the images in Figure 9c, the original algorithm exhibits a poor detection capability for early-stage PWD, leading to problems such as false positives and missed detections. The main reason for these problems is the insufficient extraction of early-stage PWD features by the original algorithm. By introducing frequency domain feature information, using FEM_HL and the FFM to fuse the spatial domain and frequency domain features, and employing the Cutout data augmentation strategy, our model significantly alleviates these issues (as shown in Figure 9d). The detection accuracy is greatly enhanced, and instances of false positives and missed detections are significantly reduced.
To evaluate the generalization ability of the model, the improved model is tested on the Y-0 sample plot dataset, which appears in neither the training set nor the validation set. The experimental results are shown in Figure 10. It can be observed that on this previously unseen dataset, the proposed method achieves more accurate detection of early-stage PWD, indicating strong generalization capability.
Firstly, the original image resolution is scaled down to 800 × 800. Subsequently, Eigen Class Activation Maps (EigenCAM) [38] are employed to compare the early-stage PWD predictions of the method proposed in this paper with those of the original method (as illustrated in Figure 11). In the heatmaps, darker colors indicate a greater impact on the model’s final classification and localization decisions for the target. Figure 11b represents the ground truth. The left sections of Figure 11c–f depict the EigenCAM results across the entire image. It is noticeable that the introduction of frequency domain information leads to significant improvements in the model’s predictions, especially in the upper left corner. With the addition of FEM_HL, the model’s predictions for the areas of early-stage PWD become more precise. Furthermore, after incorporating the FFM, the colors indicating the model’s predictions in the target area are deepened, aligning closely with the ground truth. The right sections of Figure 11c–f show the results obtained after removing the heatmap data outside the bounding boxes and scaling the heatmap inside each bounding box. It is evident that with the gradual incorporation of the proposed methods, the model’s activated areas for regions of early-stage PWD increase and become more accurate.
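For readers unfamiliar with EigenCAM, the following is a minimal NumPy sketch of the underlying idea (projecting the chosen layer’s activations onto their first principal component to obtain a class-agnostic heatmap); it is not the exact implementation used to produce Figure 11.

```python
import numpy as np

def eigen_cam(activations):
    """EigenCAM principle: the heatmap is the first principal component of
    the layer activations. activations: C x H x W array from one layer."""
    c, h, w = activations.shape
    flat = activations.reshape(c, h * w)
    flat = flat - flat.mean(axis=1, keepdims=True)
    _, _, vt = np.linalg.svd(flat, full_matrices=False)
    v = vt[0] * np.sign(vt[0][np.abs(vt[0]).argmax()])  # resolve SVD sign ambiguity
    cam = v.reshape(h, w)
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)                     # normalize to [0, 1]
```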

3.4. Experiments of Different Algorithms

For the early-stage PWD dataset, Table 3 compares Faster R-CNN, DETR [39] and the YOLO series algorithms with our method integrated into the corresponding YOLO algorithms. The results indicate that the precision of Faster R-CNN is lower, likely due to the presence of numerous small targets in the early-stage PWD dataset; Faster R-CNN is not well suited to targets with less distinctive features and smaller sizes. DETR is built on the Transformer architecture and achieves an AP 2.3% higher than that of YOLOv5s. However, compared to the method proposed in this paper applied to YOLOv5s, its detection accuracy is 5.9% lower, which emphasizes the importance of frequency domain information in the detection of early-stage PWD. Compared to the models before improvement, the early-stage PWD detection capabilities are improved to varying degrees by the proposed modules.
On the spatial domain image dataset, the detection accuracy of the YOLOv5 variants surpasses that of YOLOv3 (K9). The AP of YOLOv5n (K13), YOLOv5s (K15) and YOLOv5m (K17) is slightly higher than that of YOLOv4 (K11). Among them, the model based on YOLOv5m with our modules (K18) exhibits the best detection performance: compared to the original model trained only on the spatial domain (K17), its AP is increased by 6.9% and its F1 score by 4.8%. YOLOv5l (K19) and YOLOv5x (K21) have AP results lower than that of K17, and the corresponding methods based on YOLOv5l and YOLOv5x in this paper (K20, K22) have AP results lower than that of K18.

4. Discussion

The extraction of features of early-stage PWD is crucial for accurate detection. The comparative analysis involving the introduction of frequency domain information into various object detection models demonstrates that obtaining frequency domain information corresponding to drone imagery provides a more effective means of detecting early-stage PWD, surpassing detection methods relying solely on spatial domain information.

4.1. Object Detection

Our study uses high-resolution UAV images together with their corresponding frequency domain representations. The proposed enhancement modules are incorporated into seven object detection models to detect early-stage PWD on the validation dataset. Among these models, YOLOv5m with the proposed modules achieves the highest AP of 84.9%, with an F1 score of 79%. The use of high-resolution spatial domain images significantly reduces the need for extensive ground surveys, especially in dense forest canopies [40]. In comparison to satellite images, which are limited by weather conditions and spatial resolution, UAV-based images provide greater flexibility [41]. UAV-based object detection has been successfully applied to pest and disease detection, such as for the red turpentine beetle (RTB; Dendroctonus valens LeConte) [42,43], rice pests (stem borer and Hispa) [44] and other forest insect pests [45]. The combination of frequency domain information and deep learning object detection models offers an operational, flexible and cost-effective method for the early detection of pine trees affected by PWD.

4.2. Model Performance

After incorporating the methods proposed in this study, the YOLOv3, YOLOv4, and YOLOv5 series object detection models have shown gradual improvements in detecting early-stage pine trees affected by PWD. Except for YOLOv3, these models have achieved AP above 80% and F1 scores above 73%. Several scholarly investigations have assessed the detection precision of diverse object detection frameworks [46,47,48,49]. Jiang et al. [50] evaluated the detection accuracy of YOLOv3, YOLOv4, and YOLOv5 series models on infrared thermal images and videos, with YOLOv5s demonstrating the best performance. Chen et al. [51,52] compared the detection performance of Faster R-CNN, SSD, and YOLOv5 on a water surface floater test dataset. However, in this study, the evaluation results demonstrate that YOLOv5m outperforms YOLOv5x in terms of detection performance, highlighting that enhancing the network depth and width does not guarantee a proportional improvement in the model’s accuracy across different datasets.

4.3. Limitations and Practical Considerations

Color changes in pine trees may be caused not only by PWD infestation but also by other factors such as drought. The method described in this article is currently applicable only in pine forest areas potentially infected with PWD: the model detects pine trees with color changes, and whether the discoloration is caused by PWD infestation is confirmed through manual checks conducted in the forest area. The specific cause of the color change in pine trees cannot be determined solely through model detection.
The use of UAVs is promising because of their ability to provide comprehensive spatial and spectral data. However, acknowledging their constraints is vital, especially in scenarios involving vast areas where deploying UAVs may not be practical.
Our method enables the localization of early-stage PWD in unmanned aerial vehicle images. The main factors affecting the detection accuracy are as follows: (1) accurately detecting trees with distinctive postures poses a challenge due to limited training samples, leading to false positives and missed detections; (2) the detection results of early-stage PWD are better for trees with complete shapes in the images, whereas for trees with incomplete shapes at the image edges the detection performance is poorer. Therefore, future work will continue to expand the dataset, add texture features of early-stage PWD, conduct further in-depth research on early-stage PWD with incomplete tree shapes, enhance the differentiation of the various causes of pine tree color changes, and increase the practical value of the model.

5. Conclusions

Due to the subtle characteristics of early-stage PWD compared to healthy pine trees and moderate-stage PWD, existing object detection methods exhibit low accuracy, leading to issues such as false positives and missed detections. To address these challenges, we propose a deep-learning-based detection model for pine trees infected with PWD in unmanned aerial vehicle imagery. The model incorporates frequency domain feature information from UAV images and enhances the disease detection accuracy by fusing spatial domain features with frequency domain features. Specifically, the experimental results demonstrate that frequency domain features can more significantly represent the characteristics of pine trees infected with PWD. Therefore, the paper first extracts frequency domain feature information from the spatial domain using the DCT. Subsequently, an attention mechanism is employed to enhance these frequency domain features, improving the utilization of key features. The model takes both the spatial domain information and the frequency domain information as input data. In order to spatially align the features from the spatial and frequency domains for better fusion results, we introduce deformable convolution in the feature fusion stage. By leveraging the extensible properties of deformable convolution kernels, the model learns the diverse features of early-stage PWD, which vary in size and shape. Ultimately, this approach enhances the performance of the object detection model.
Experiments are conducted on the early-stage PWD dataset, and the proposed method is compared with other similar approaches. The results indicate an improvement in the AP of this method in detecting early-stage PWD, and the provided modules are plug-and-play solutions. This work lays the foundation for field investigations of pine trees affected by PWD in forest areas and enhances the capability for the early detection of forest pests and diseases.

Author Contributions

Methodology, W.X.; software, W.X.; supervision, W.L., H.W. and H.Z.; writing—review and editing, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R & D Program of China, grant number 2022YFF1302700 and the Science and Technology of Complex Electronic System Simulation Laboratory, grant number 614201004012102.

Data Availability Statement

The data presented in this study are available from the corresponding author, H.W., upon reasonable request.

Acknowledgments

The authors would like to sincerely thank the editors and the anonymous reviewers.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mamiya, Y. Pathology of the pine wilt disease caused by Bursaphelenchus xylophilus. Annu. Rev. Phytopathol. 1983, 21, 201–220. [Google Scholar] [CrossRef]
  2. Wu, Z.; Jiang, X. Extraction of Pine Wilt Disease Regions Using UAV RGB Imagery and Improved Mask R-CNN Models Fused with ConvNeXt. Forests 2023, 14, 1672. [Google Scholar] [CrossRef]
  3. Li, M.; Li, H.; Ding, X.; Wang, L.; Wang, X.; Chen, F. The Detection of Pine Wilt Disease: A Literature Review. Int. J. Mol. Sci. 2022, 23, 10797. [Google Scholar] [CrossRef] [PubMed]
  4. Wang, J.; Deng, J.; Yan, W.; Zheng, Y. Habitat Suitability of Pine Wilt Disease in Northeast China under Climate Change Scenario. Forests 2023, 14, 1687. [Google Scholar] [CrossRef]
  5. Wang, C.; Zhang, X.; An, S. Spectral Characteristics Analysis of Pinus Massoniana Suffered by Bursaphelenchus Xylophilus. Remote Sens. Technol. Appl. 2007, 22, 4. [Google Scholar]
  6. Hao, Z.; Huang, J.; Li, X.; Sun, H.; Fang, G. A multi-point aggregation trend of the outbreak of pine wilt disease in China over the past 20 years. For. Ecol. Manag. 2022, 505, 119890. [Google Scholar] [CrossRef]
  7. Cai, P.; Chen, G.; Yang, H.; Li, X.; Zhu, K.; Wang, T.; Liao, P.; Han, M.; Gong, Y.; Wang, Q. Detecting Individual Plants Infected with Pine Wilt Disease Using Drones and Satellite Imagery: A Case Study in Xianning, China. Remote Sens. 2023, 15, 2671. [Google Scholar] [CrossRef]
  8. Näsi, R.; Honkavaara, E.; Blomqvist, M.; Lyytikäinen-Saarenmaa, P.; Hakala, T.; Viljanen, N.; Kantola, T.; Holopainen, M. Remote sensing of bark beetle damage in urban forests at individual tree level using a novel hyperspectral camera from UAV and aircraft. Urban For. Urban Green. 2018, 30, 72–83. [Google Scholar] [CrossRef]
  9. Yuan, Y.; Hu, X. Random forest and objected-based classification for forest pest extraction from UAV aerial imagery. Int. Arch. Photogramm. Remote Sens. Spat. Inf. Sci. 2016, 41, 1093–1098. [Google Scholar] [CrossRef]
  10. Hu, G.; Yin, C.; Wan, M.; Zhang, Y.; Fang, Y. Recognition of diseased Pinus trees in UAV images using deep learning and AdaBoost classifier. Biosyst. Eng. 2020, 194, 138–151. [Google Scholar] [CrossRef]
  11. Diez, Y.; Kentsch, S.; Fukuda, M.; Caceres, M.L.L.; Moritake, K.; Cabezas, M. Deep Learning in Forestry Using UAV-Acquired RGB Data: A Practical Review. Remote Sens. 2021, 13, 2837. [Google Scholar] [CrossRef]
  12. Safonova, A.; Tabik, S.; Alcaraz-Segura, D.; Rubtsov, A.; Maglinets, Y.; Herrera, F. Detection of fir trees (Abies sibirica) damaged by the bark beetle in unmanned aerial vehicle images with deep learning. Remote Sens. 2019, 11, 643. [Google Scholar] [CrossRef]
  13. Deng, X.; Tong, Z.; Lan, Y.; Huang, Z. Detection and location of dead trees with pine wilt disease based on deep learning and UAV remote sensing. AgriEngineering 2020, 2, 294–307. [Google Scholar] [CrossRef]
  14. Ren, S.; He, K.; Girshick, R.; Sun, J. Faster r-cnn: Towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 2017, 39, 1137–1149. [Google Scholar] [CrossRef] [PubMed]
  15. Qin, J.; Wang, B.; Wu, Y.; Lu, Q.; Zhu, H. Identifying pine wood nematode disease using UAV images and deep learning algorithms. Remote Sens. 2021, 13, 162. [Google Scholar] [CrossRef]
  16. Jocher, G. Ultralytics-YOLOv5. Available online: https://github.com/ultralytics/yolov5 (accessed on 4 October 2022).
  17. Wu, B.; Liang, A.; Zhang, H.; Zhu, T.; Zou, Z.; Yang, D.; Tang, W.; Li, J.; Su, J. Application of conventional UAV-based high-throughput object detection to the early diagnosis of pine wilt disease by deep learning. For. Ecol. Manag. 2021, 486, 118986. [Google Scholar] [CrossRef]
  18. Redmon, J.; Farhadi, A. Yolov3: An incremental improvement. arXiv 2018, arXiv:1804.02767. [Google Scholar]
  19. Yu, R.; Luo, Y.; Li, H.; Yang, L.; Huang, H.; Yu, L.; Ren, L. Three-dimensional convolutional neural network model for early detection of pine wilt disease using UAV-based hyperspectral images. Remote Sens. 2021, 13, 4065. [Google Scholar] [CrossRef]
  20. Bochkovskiy, A.; Wang, C.-Y.; Liao, H.-Y.M. Yolov4: Optimal speed and accuracy of object detection. arXiv 2020, arXiv:2004.10934. [Google Scholar]
  21. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  22. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  23. Wang, K.; Fu, X.; Huang, Y.; Cao, C.; Shi, G.; Zha, Z.-J. Generalized UAV Object Detection via Frequency Domain Disentanglement. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 18–22 June 2023; pp. 1064–1073. [Google Scholar]
  24. Sun, X.; Deng, H.; Liu, G.; Deng, X. Combination of spatial and frequency domains for floating object detection on complex water surfaces. Appl. Sci. 2019, 9, 5220. [Google Scholar] [CrossRef]
  25. Al-Saad, M.; Aburaed, N.; Panthakkan, A.; Al Mansoori, S.; Al Ahmad, H.; Marshall, S. Airbus ship detection from satellite imagery using frequency domain learning. In Proceedings of the Image and Signal Processing for Remote Sensing XXVII, Online, 13–18 September 2021; pp. 279–285. [Google Scholar]
  26. Xu, K.; Qin, M.; Sun, F.; Wang, Y.; Chen, Y.-K.; Ren, F. Learning in the frequency domain. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 14–19 June 2020; pp. 1740–1749. [Google Scholar]
  27. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  28. Howard, A.; Zhmoginov, A.; Chen, L.-C.; Sandler, M.; Zhu, M. Inverted residuals and linear bottlenecks: Mobile networks for classification, detection and segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  29. He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask r-cnn. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 2961–2969. [Google Scholar]
  30. Wang, S.; Cai, Z.; Yuan, J. Automatic SAR Ship Detection Based on Multi-Feature Fusion Network in Spatial and Frequency Domain. IEEE Trans. Geosci. Remote Sens. 2023, 61, 1–11. [Google Scholar]
  31. Yang, A.; Li, M.; Wu, Z.; He, Y.; Qiu, X.; Song, Y.; Du, W.; Gou, Y. CDF-net: A convolutional neural network fusing frequency domain and spatial domain features. IET Comput. Vis. 2023, 17, 319–329. [Google Scholar] [CrossRef]
  32. Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
  33. Lin, T.-Y.; Dollár, P.; Girshick, R.; He, K.; Hariharan, B.; Belongie, S. Feature pyramid networks for object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 2117–2125. [Google Scholar]
  34. Liu, S.; Qi, L.; Qin, H.; Shi, J.; Jia, J. Path Aggregation Network for Instance Segmentation. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Salt Lake City, UT, USA, 18–22 June 2018. [Google Scholar]
  35. Woo, S.; Park, J.; Lee, J.-Y.; Kweon, I.S. Cbam: Convolutional block attention module. In Proceedings of the European Conference on Computer Vision (ECCV), Munich, Germany, 8–14 September 2018; pp. 3–19. [Google Scholar]
  36. Dai, J.; Qi, H.; Xiong, Y.; Li, Y.; Zhang, G.; Hu, H.; Wei, Y. Deformable Convolutional Networks. In Proceedings of the IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017. [Google Scholar]
  37. DeVries, T.; Taylor, G.W. Improved regularization of convolutional neural networks with cutout. arXiv 2017, arXiv:1708.04552. [Google Scholar]
  38. Muhammad, M.B.; Yeasin, M. Eigen-cam: Class activation map using principal components. In Proceedings of the 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, UK, 19–24 July 2020; pp. 1–7. [Google Scholar]
  39. Carion, N.; Massa, F.; Synnaeve, G.; Usunier, N.; Kirillov, A.; Zagoruyko, S. End-to-end object detection with transformers. In Proceedings of the European Conference on Computer Vision, Glasgow, UK, 23–28 August 2020; pp. 213–229. [Google Scholar]
  40. Ferreira, M.P.; Lotte, R.G.; D’Elia, F.V.; Stamatopoulos, C.; Kim, D.-H.; Benjamin, A.R. Accurate mapping of Brazil nut trees (Bertholletia excelsa) in Amazonian forests using WorldView-3 satellite images and convolutional neural networks. Ecol. Inform. 2021, 63, 101302. [Google Scholar] [CrossRef]
  41. Flood, N.; Watson, F.; Collett, L. Using a U-net convolutional neural network to map woody vegetation extent from high resolution satellite imagery across Queensland, Australia. Int. J. Appl. Earth Obs. Geoinf. 2019, 82, 101897. [Google Scholar] [CrossRef]
  42. Gao, B.; Yu, L.; Ren, L.; Zhan, Z.; Luo, Y. Early Detection of Dendroctonus valens Infestation at Tree Level with a Hyperspectral UAV Image. Remote Sens. 2023, 15, 407. [Google Scholar] [CrossRef]
  43. Wang, G.; Zhao, H.; Chang, Q.; Lyu, S.; Liu, B.; Wang, C.; Feng, W. Detection Method of Infected Wood on Digital Orthophoto Map–Digital Surface Model Fusion Network. Remote Sens. 2023, 15, 4295. [Google Scholar]
  44. Hassan, S.I.; Alam, M.M.; Illahi, U.; Suud, M.M. A new deep learning-based technique for rice pest detection using remote sensing. PeerJ Comput. Sci. 2023, 9, e1167. [Google Scholar] [CrossRef]
  45. Zhang, J.; Cong, S.; Zhang, G.; Ma, Y.; Zhang, Y.; Huang, J. Detecting Pest-Infested Forest Damage through Multispectral Satellite Imagery and Improved UNet++. Sensors 2022, 22, 7440. [Google Scholar] [CrossRef]
  46. Xu, X.; Zhao, S.; Xu, C.; Wang, Z.; Zheng, Y.; Qian, X.; Bao, H. Intelligent Mining Road Object Detection Based on Multiscale Feature Fusion in Multi-UAV Networks. Drones 2023, 7, 250. [Google Scholar] [CrossRef]
  47. Yao, J.; Qi, J.; Zhang, J.; Shao, H.; Yang, J.; Li, X. A Real-Time Detection Algorithm for Kiwifruit Defects Based on YOLOv5. Electronics 2021, 10, 1711. [Google Scholar] [CrossRef]
  48. Yan, B.; Fan, P.; Lei, X.; Liu, Z.; Yang, F. A Real-Time Apple Targets Detection Method for Picking Robot Based on Improved YOLOv5. Remote Sens. 2021, 13, 1619. [Google Scholar] [CrossRef]
  49. Nepal, U.; Eslamiat, H. Comparing YOLOv3, YOLOv4 and YOLOv5 for Autonomous Landing Spot Detection in Faulty UAVs. Sensors 2022, 22, 464. [Google Scholar] [CrossRef] [PubMed]
  50. Jiang, C.; Ren, H.; Ye, X.; Zhu, J.; Zeng, H.; Nan, Y.; Sun, M.; Ren, X.; Huo, H. Object detection from UAV thermal infrared images and videos using YOLO models. Int. J. Appl. Earth Obs. Geoinf. 2022, 112, 102912. [Google Scholar] [CrossRef]
  51. Chen, F.; Zhang, L.; Kang, S.; Chen, L.; Dong, H.; Li, D.; Wu, X. Soft-NMS-Enabled YOLOv5 with SIOU for Small Water Surface Floater Detection in UAV-Captured Images. Sustainability 2023, 15, 10751. [Google Scholar] [CrossRef]
  52. Wang, J.; Zhang, F.; Zhang, Y.; Liu, Y.; Cheng, T. Lightweight Object Detection Algorithm for UAV Aerial Imagery. Sensors 2023, 23, 5786. [Google Scholar] [CrossRef]
Figure 1. Diagram of each stage of pine wood nematode infestation.
Figure 2. Feature distribution of pine wood nematode infestation (the red box represents the early-stage PWD): (a) UAV image; (b) DCT transformed image; (c) selected area after DCT transformation. Y, Cb, Cr frequency domain signal statistic results (the blue area represents early-stage PWD, and the purple area represents the background).
Figure 3. Location of the study areas: (a) the map of Liaoning Province; (b) the colored squares on the map indicate five different study areas, with each area numbered from 0 to 4; (c) the study areas, as observed in high-resolution images captured by Google Earth on 20 April 2023.
Figure 4. Structure of proposed METD network.
Figure 5. Frequency domain information extraction process.
Figure 6. Structure of FEM_HL.
Figure 7. Structure of FEM_L.
Figure 8. Structure of FFM.
Figure 9. Comparison of YOLOv5s and our method on early-stage PWD dataset detection results (the white boxes represent the ground truth, while the red boxes represent the detections made by the model): (a) the original image; (b) ground truth; (c) detection results of YOLOv5s; (d) detection results of our method.
Figure 10. Comparison of YOLOv5s and our method on Y-0 sample plot dataset detection results (the white boxes represent the ground truth, while the red boxes represent the detections made by the model): (a) the original image; (b) ground truth; (c) detection results of YOLOv5s; (d) detection results of our method.
Figure 11. Comparison of heatmaps between original method and our method (the red boxes represent the ground truth, while boxes of other colors represent the detections made by the model): (a) original image; (b) ground truth annotated as 800 × 800; (c) last layer of K0; (d) last layer of K3; (e) last layer of K1; (f) last layer of K6.
Table 1. Detection performance of FEM_HL and FEM_L.
Method    FEM_HL    FEM_L    AP0.5/%    F1/%
K0                           76.4       71.9
K1        √                  81.1       78.1
K2                  √        79.1       75.8
The “√” indicates that the module is added.
Table 2. Comparison of ablation experiments.
Method    DCT    FEM_HL    FFM    Cutout    AP0.5/%    F1/%
K0                                          76.4       71.9
K3        √                                 80.3       77.0
K1        √      √                          81.1       78.1
K4        √                √                79.9       77.0
K5                                √         81.5       78.6
K6        √      √         √                83.1       79.1
METD      √      √         √      √         84.6       79.2
Table 3. The performance comparison between different algorithms and our method on the early-stage PWD dataset.
Method        Model                    Input Data    AP0.5/%    F1/%
K7            Faster R-CNN (Res101)    RGB           67.2       64.4
K8            DETR (Res101)            RGB           78.7       75.1
K9            YOLOv3                   RGB           74.9       71.0
K10 (ours)    YOLOv3                   RGB + Freq    79.3       72.1
K11           YOLOv4                   RGB           74.8       70.5
K12 (ours)    YOLOv4                   RGB + Freq    80.8       73.2
K13           YOLOv5n                  RGB           75.3       70.4
K14 (ours)    YOLOv5n                  RGB + Freq    82.2       76.1
K15           YOLOv5s                  RGB           76.4       71.9
K16 (ours)    YOLOv5s                  RGB + Freq    84.6       79.2
K17           YOLOv5m                  RGB           78.0       74.0
K18 (ours)    YOLOv5m                  RGB + Freq    84.9       79.0
K19           YOLOv5l                  RGB           76.1       71.0
K20 (ours)    YOLOv5l                  RGB + Freq    83.2       77.0
K21           YOLOv5x                  RGB           77.5       72.0
K22 (ours)    YOLOv5x                  RGB + Freq    82.4       77.1
RGB represents input data as spatial domain images, and Freq represents input data as frequency domain images.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
