**3. Results**

Four well-known loss functions are used in this study: L1, mean squared error (MSE), binary cross-entropy (BCE) [48], and focal loss [49]. L1 and MSE are the most classic and commonly used criteria for pixel-to-pixel comparisons. BCE is a typical loss function for binary classification; its penalty is the negative logarithm of the probability assigned to the true class, so it grows without bound as that probability approaches zero. Focal loss adds a modulating factor to BCE that reduces the weight of easy examples. A model was trained with each loss function, both with and without the NFS. All experiments were performed on the same dataset and processing platform.
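For reference, the four losses can be sketched per pixel as follows. This is a minimal illustration in plain Python, not the implementation used in the experiments: the function names are hypothetical, and the focal loss defaults `gamma=2.0` and `alpha=0.25` are the commonly cited values from the focal loss paper [49], assumed here because this section does not state the settings used.

```python
import math


def l1_loss(pred, target):
    """Mean absolute difference between predicted and target pixel values."""
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(pred)


def mse_loss(pred, target):
    """Mean squared difference between predicted and target pixel values."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


def bce_loss(pred, target, eps=1e-7):
    """Binary cross-entropy; pred holds probabilities, target holds 0/1 labels."""
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += -(t * math.log(p) + (1 - t) * math.log(1.0 - p))
    return total / len(pred)


def focal_loss(pred, target, gamma=2.0, alpha=0.25, eps=1e-7):
    """BCE scaled by (1 - p_t)^gamma so that easy pixels contribute less.

    gamma and alpha are assumed defaults, not values taken from this study.
    """
    total = 0.0
    for p, t in zip(pred, target):
        p = min(max(p, eps), 1.0 - eps)
        p_t = p if t == 1 else 1.0 - p          # probability of the true class
        alpha_t = alpha if t == 1 else 1.0 - alpha
        total += -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(pred)
```

With `gamma=0` and `alpha=0.5`, the focal loss in this sketch reduces to half the BCE, which makes the role of the modulating factor easy to check in isolation.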

Three commonly used balanced metrics, i.e., the f1-score, Jaccard index, and kappa coefficient, are used for the quantitative evaluation. Compared with unbalanced metrics such as precision and recall, the selected metrics give a more general measure of accuracy because they account for both precision and recall.
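The three metrics can be computed from the pixel-level confusion counts, as in the sketch below. It assumes binary predictions and labels (1 for outline, 0 for background) and uses the standard definitions of each metric; the function names are hypothetical, and how the counts are aggregated across images in this study is not specified here.

```python
def confusion_counts(pred, truth):
    """Count true/false positives and negatives for paired 0/1 labels."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    tn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 0)
    return tp, fp, fn, tn


def f1_score(tp, fp, fn):
    """Harmonic mean of precision and recall."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def jaccard_index(tp, fp, fn):
    """Intersection over union of predicted and true positives."""
    denom = tp + fp + fn
    return tp / denom if denom else 0.0


def kappa_coefficient(tp, fp, fn, tn):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = tp + fp + fn + tn
    p_observed = (tp + tn) / n
    p_chance = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if p_chance == 1.0:
        return 0.0  # degenerate case: only one class present
    return (p_observed - p_chance) / (1.0 - p_chance)
```

Unlike the f1-score and Jaccard index, the kappa coefficient also uses the true negatives, which matters for outline extraction because background pixels dominate the image.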

#### *3.1. Learning Curves*

Figure 6 shows the relative loss values for the different loss functions on the validation dataset. For all loss functions (i.e., L1, MSE, BCE, and focal), training with the NFS (i.e., +NFS) converges faster than training without it (i.e., −NFS).

**Figure 6.** Trends in validation loss values over different iterations.

Figure 7 shows the trend of the kappa coefficient over iterations for the four loss functions on the validation dataset. Among all conditions, the focal loss trained with the proposed NFS (i.e., focal + NFS) achieves the highest kappa coefficient in most iterations. By contrast, the L1 loss trained without the NFS (i.e., L1 − NFS) yields the lowest kappa coefficient in almost every iteration.

**Figure 7.** Trends in validation accuracy values over different iterations.

#### *3.2. Quantitative Results*

Figure 8a shows the relative performances of the different loss functions on the test dataset. For all loss functions (i.e., L1, MSE, BCE, and focal), training with the NFS yields higher values for all evaluation metrics.

Figure 8b shows the corresponding values of the evaluation metrics for the different loss functions. Among the four loss functions, with or without the NFS, the focal loss is generally better than the BCE, MSE, and L1 losses. The L1 loss without the NFS (L1 − NFS) yields the lowest values for all metrics across all conditions. The best performance is achieved by the focal loss with the NFS, i.e., 0.651 for the f1-score, 0.490 for the Jaccard index, and 0.626 for the kappa coefficient. For every loss function, the addition of the NFS results in significantly higher values for all evaluation metrics. This result indicates that the proposed NFS can effectively manage slight misalignments in the annotation and thereby achieve better performance. Interestingly, for the weakest loss, L1, the addition of the NFS produces the largest increments in the three evaluation metrics: 8.8%, 8.9%, and 9.8% for the f1-score, kappa coefficient, and Jaccard index, respectively.

#### *3.3. Qualitative Results*

Figure 9 presents six representative results of outlines extracted by the model trained with the L1 loss with/without the NFS on the test dataset. The backgrounds, red lines, and green circles represent the aerial input, predicted outline, and focused area, respectively. In general, the addition of the NFS yields a better building outline extraction, particularly in shadowed areas (e.g., green circles in a, b, and e) and at turning corners (e.g., green circles in d and f). Additionally, the model trained with the NFS yields a more intact outline (e.g., green circles in c).

Figure 10 shows six representative groups of building outlines extracted by the model trained with the MSE loss with/without the NFS. Generally, the addition of the NFS yields a slightly better building outline extraction. With the NFS, the extracted outlines contain fewer false positives within buildings (e.g., green circles in a and b) and fewer breakpoints (e.g., green circles in c, d, e, and f).

Figure 11 shows six representative groups of outlines extracted by the model trained with the BCE loss with or without the NFS. The backgrounds, red lines, and green circles represent the aerial input, predicted outline, and focused area, respectively. As shown in the figure, the addition of the NFS yields a slightly better line extraction in areas shadowed by surrounding trees (e.g., green circles in columns a, e, and f). Moreover, the NFS results in better line continuity around the corners of the buildings (e.g., green circles in columns b, c, and d). In general, with the proposed NFS, the building outline extracted from the aerial image is more intact, particularly at building corners and in shadowed areas.


