5.1.3. Evaluation on a Real-World Dataset

We captured another 10 orthophotographs (Appendix **??**) from different cities in South Korea to test the reliability of the proposed system. Each orthophotograph covers more than 30,000 × 20,000 pixels (an area of 360,000 m<sup>2</sup> at a GSD of 6.00 cm/pixel) (Table **??**). We followed the same preprocessing method as described in the "software integration" section, and the PWD detector captured 711 out of 730 PWD-infected trees across various resolutions (Table **??**). To illustrate the robustness of our network, Figure **??** presents a portion of an orthophotograph. Potentially infected pine trees (ground truth, GT) are labeled with red dots, and the blue bounding boxes denote the inference results of the trained detector. The green panel shows sample true positives (TPs), which exhibit various symptoms of PWD infection from the early to the late stage. The red panel shows falsely detected trees. Distinguishing these "disease-like" objects in RGB channels remains challenging because of the ambiguity in both shape and color; further investigation using either multi-spectral images or field surveys is needed.

**Figure 7.** Prediction results on the Goomisi Goaeup dataset described in Table **??**. The bottom left corner shows a magnification of the content in the white dashed box. We selected some predicted bounding boxes and marked them with circles. Enlarged instances are shown in the right figure; the green panel shows correctly detected PWD, while the red panel shows false alarms whose appearance is similar to that of a diseased tree.
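The detection counts reported above (711 of 730 infected trees, a recall of about 97.4%) can be reproduced from GT points and predicted boxes with a short script. The following is only a minimal sketch: the matching rule (a GT dot counts as detected when it falls inside any predicted box) and the names `point_in_box` and `evaluate` are assumptions made for illustration, not the evaluation protocol used in this work.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


def point_in_box(p: Point, b: Box) -> bool:
    """True if point p lies inside (or on the border of) box b."""
    x, y = p
    return b[0] <= x <= b[2] and b[1] <= y <= b[3]


def evaluate(gt_points: List[Point], pred_boxes: List[Box]):
    """Count detected GT points and false-alarm boxes.

    Assumed matching rule: a GT point is detected if any predicted box
    contains it; a box is a false alarm if it contains no GT point.
    """
    detected = sum(
        any(point_in_box(p, b) for b in pred_boxes) for p in gt_points
    )
    false_alarms = sum(
        not any(point_in_box(p, b) for p in gt_points) for b in pred_boxes
    )
    recall = detected / len(gt_points) if gt_points else 0.0
    return detected, false_alarms, recall


# Toy data (hypothetical coordinates, in pixels).
gt = [(10, 10), (50, 50), (90, 90)]
boxes = [(5, 5, 15, 15), (45, 45, 55, 55), (200, 200, 210, 210)]
detected, false_alarms, recall = evaluate(gt, boxes)
print(detected, false_alarms, round(recall, 3))

# Recall implied by the counts reported in the text.
reported_recall = 711 / 730
print(f"reported recall: {reported_recall:.3f}")
```

In the toy data, two of the three GT points are covered and one box contains no GT point, which mirrors the TP and false-alarm panels of the figure.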

### *5.2. The Effect of Hard Negative Mining*

HNM was proposed to alleviate the high variance and irreducible error that arise when training samples are limited. When the model sees only a few types of disease symptoms from a limited number of samples, the large number of background regions makes the detector prone to over-learning the background, so that it tends to classify true targets as non-diseased. HNM provides many ambiguous "disease-like" objects that share a similar pattern with the disease in its middle and late stages. Including these ambiguous objects helps the model draw clear decision boundaries by learning more discriminative features. In addition, the generated hard negative samples contain many "out of interest" regions (background) such as highways, broadleaf forests, and farmland; this diversity in the training data improves the system's ability to generalize to real-world scenes.
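The selection step behind this idea can be sketched in a few lines. The snippet below is an illustrative assumption, not the pipeline used in this work: it treats any confident detection that contains no GT point as a hard negative candidate, and the names `Detection`, `mine_hard_negatives`, and the `score_threshold` value are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max)


@dataclass
class Detection:
    box: Box
    score: float  # detector confidence for the PWD class


def contains(box: Box, p: Point) -> bool:
    """True if point p lies inside (or on the border of) box."""
    return box[0] <= p[0] <= box[2] and box[1] <= p[1] <= box[3]


def mine_hard_negatives(
    detections: List[Detection],
    gt_points: List[Point],
    score_threshold: float = 0.5,  # assumed value, for illustration only
) -> List[Detection]:
    """Return confident detections that contain no GT point.

    These are the "disease-like" false positives; sorting by score puts
    the most confusing ones first.
    """
    hard = [
        d for d in detections
        if d.score >= score_threshold
        and not any(contains(d.box, p) for p in gt_points)
    ]
    return sorted(hard, key=lambda d: d.score, reverse=True)


# Toy data (hypothetical): one GT tree and three detections.
gt_points = [(25.0, 25.0)]
detections = [
    Detection((0, 0, 10, 10), 0.96),    # confident, no GT inside: hard negative
    Detection((20, 20, 30, 30), 0.90),  # contains the GT point: true positive
    Detection((40, 40, 50, 50), 0.20),  # low score: ignored
]
hard = mine_hard_negatives(detections, gt_points)
print([(d.box, d.score) for d in hard])
```

Under this sketch, the mined crops would then be assigned to a "disease-like" category and added to the training set before fine-tuning; how the categories are assigned is left outside the snippet.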

Figure **??** shows some examples of the stark contrast that occurred when we applied fine-tuning with six "disease-like" categories. HNM successfully suppresses the confidence scores of ambiguous objects while still localizing real disease well. The "wb" category (first column) denotes dead trees that are not infected by PWD; "wb" and PWD-infected dead trees differ only slightly in their branches. Including "wb" reduces the number of FP bounding boxes and, by lowering the confidence scores of FPs, likely helps guide the network's updates toward real PWD-infected dead trees. Moreover, low-resolution images lead to confusion between yellow land and small PWD objects. The new category "yellow land" alleviates this effect of the ground (confidence score 0.960 → 0.003 in column 2). The same phenomenon occurs with the "maple" category. Owing to water loss in the late disease stage, PWD-infected trees tend to turn red-brown and thus appear similar to maple trees. Without HNM, the network predicted maple trees (column 6, first row) as PWD-infected trees with a high confidence score, but including the additional maple samples improved its ability to filter out such errors (confidence score 0.893 → 0.630).

**Figure 8.** Advantage of using hard negative mining. The detected bounding boxes are "disease-like" objects (FPs). For convenience, we zoomed out from the region of interest to highlight the difference when hard negative mining is applied.
