*2.2. Deep Learning-Based PWD Detection*

Several deep neural architectures have been proposed in recent years, and they have achieved significant breakthroughs in diverse problem domains. Deep learning-based object detection is a challenging multi-task problem that involves both assigning a class label to an object of interest and locating the object's position. Research on object detection models has followed two main streams. The first is the region of interest (ROI) driven method, in which the deep neural network (DNN) first filters out the irrelevant background and then passes the remaining regions on for refined classification. A typical example is Faster R-CNN [**?** ], which consists of an encoder (CNN layers), a Region Proposal Network (RPN), ROI pooling, and a classifier. The encoder network extracts feature maps by applying convolutions that map the input sample into a latent space. The RPN first generates thousands of candidate bounding boxes from the feature map; a simple classifier then filters out the negative bounding boxes (i.e., background) while retaining the more probable positive bounding boxes (i.e., foreground). ROI pooling [**?** ] collects the features inside each positive bounding box and pools them to a fixed size. Finally, the bounding box regression block refines the location, and the classifier block assigns each ROI to a specific category.

These two-stage object detection methods are relatively slow for real-time problems, which is why single-stage object detection methods have been proposed. Well-known single-stage architectures include the YOLO series [**? ???** ] and the SSD series [**? ?** ]. A single-stage method has no RPN; the DNN localizes and classifies the target object simultaneously. Using a single-stage detector therefore significantly reduces network complexity and speeds up inference. However, it is still generally less accurate in localization and classification than the two-stage ROI-driven method.
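To make the proposal filtering and ROI pooling steps of the two-stage pipeline described above more concrete, the following is a minimal, simplified sketch in plain Python. It is not the Faster R-CNN implementation: the function names `filter_proposals` and `roi_max_pool`, the score threshold, the box format, and the bin layout are assumptions chosen for illustration, and a real detector would use learned objectness scores, non-maximum suppression, and multi-channel feature maps.

```python
from typing import List, Tuple

# Hypothetical box format: (x0, y0, x1, y1), end-exclusive, in feature-map cells.
Box = Tuple[int, int, int, int]


def filter_proposals(boxes: List[Box], scores: List[float],
                     threshold: float = 0.5) -> List[Box]:
    """Keep candidate boxes whose foreground score reaches the threshold."""
    return [box for box, score in zip(boxes, scores) if score >= threshold]


def roi_max_pool(feature_map: List[List[float]], box: Box,
                 out_size: int = 2) -> List[List[float]]:
    """Max-pool the region inside `box` into an out_size x out_size grid."""
    x0, y0, x1, y1 = box
    height, width = y1 - y0, x1 - x0
    pooled = []
    for i in range(out_size):
        # Row range of bin i; each bin covers at least one cell.
        r0 = y0 + (i * height) // out_size
        r1 = max(y0 + ((i + 1) * height) // out_size, r0 + 1)
        row = []
        for j in range(out_size):
            # Column range of bin j, computed the same way.
            c0 = x0 + (j * width) // out_size
            c1 = max(x0 + ((j + 1) * width) // out_size, c0 + 1)
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Toy single-channel 4 x 6 feature map with values 1..24.
feature_map = [[float(r * 6 + c + 1) for c in range(6)] for r in range(4)]
candidates = [(0, 0, 4, 4), (2, 1, 6, 3), (0, 0, 2, 2)]
scores = [0.9, 0.8, 0.1]  # made-up foreground scores

kept = filter_proposals(candidates, scores)
pooled = [roi_max_pool(feature_map, box) for box in kept]
```

In this toy example, the two retained boxes have different shapes (4 x 4 and 2 x 4 cells), yet both are pooled to the same 2 x 2 output, which is what allows a single downstream classifier to process every ROI.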

Some researchers have applied deep learning-based object detection to PWD. Reference [**?** ] presented a two-stage object detection method that uses UAV remote sensing images to locate PWD-infected trees. Reference [**?** ] compared the performance of single-stage and two-stage object detection methods for PWD. In addition, Reference [**?** ] presented a dataset of multi-band images and built a spatial-context-attention network (SCANet) with an expanded receptive field to better utilize context information. Further, Reference [**?** ] proposed a faster disease filtering method in which a lightweight one-stage detection model discards a large number of irrelevant images before the remaining images are classified.

Previous studies on PWD detection have suffered from a lack of data for training the DNN. By contrast, the DNN in our PWD detection method was trained on a large and previously unused dataset, and our method therefore generalizes better and is more robust. The large number of "disease-like" objects in this dataset pushes our object detector to learn clear boundaries that distinguish them from infected trees. Compared with the existing methods, our proposed method is more precise and rigorous in real-world scenarios.
