**4. Discussion**

#### *4.1. Regarding the NFS*

In recent years, fully convolutional networks have demonstrated their ability to automatically extract line features, including roads and building outlines [36,39,54]. However, those studies mainly focused on designing deeper or more complex network architectures to enhance representation capability and improve predictions. The loss functions used to train these networks cannot handle misalignments or rotations between the inputs and the manually created annotations. Because building outlines occupy only a small fraction of the pixels in an image, misalignments and rotations severely degrade the accuracy of building outline extraction.
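To make the sensitivity to misalignment concrete, the sketch below applies the same one-pixel diagonal offset to a one-pixel-wide outline and to a filled building footprint, and scores each shifted copy against the original with the pixel-level F1 (Dice) score. The 10 by 10 masks and the helper names (`box_mask`, `ring_mask`, `f1_overlap`) are illustrative assumptions, not data or code from this study.

```python
import numpy as np

SHAPE = (10, 10)  # toy image size (assumption for illustration)

def box_mask(top, left, size, shape=SHAPE):
    """Filled square footprint of side `size` with its top-left corner at (top, left)."""
    m = np.zeros(shape, dtype=bool)
    m[top:top + size, left:left + size] = True
    return m

def ring_mask(top, left, size, shape=SHAPE):
    """One-pixel-wide outline of the same square (filled square with its interior removed)."""
    m = box_mask(top, left, size, shape)
    m[top + 1:top + size - 1, left + 1:left + size - 1] = False
    return m

def f1_overlap(pred, gt):
    """Pixel-level F1 (Dice): 2*TP / (|pred| + |gt|)."""
    tp = np.logical_and(pred, gt).sum()
    return float(2 * tp / (pred.sum() + gt.sum()))

# Annotation at (2, 2); the "prediction" is the same building shifted one pixel down and right.
outline_f1 = f1_overlap(ring_mask(3, 3, 6), ring_mask(2, 2, 6))
footprint_f1 = f1_overlap(box_mask(3, 3, 6), box_mask(2, 2, 6))
print(f"outline F1:   {outline_f1:.3f}")    # 0.100
print(f"footprint F1: {footprint_f1:.3f}")  # 0.694
```

The thin outline keeps only 2 of its 20 pixels in common after the shift, while the footprint keeps 25 of 36, which is the asymmetry the paragraph describes.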

Herein, we propose the NFS module to dynamically re-align the prediction with its corresponding annotation. The proposed framework can easily be integrated into existing loss functions, such as L1, MSE, and focal loss. Through this dynamic re-alignment, the NFS locates the correct position of the annotation so that the loss is calculated against a properly aligned target. Qualitative and quantitative results on the test data demonstrate the effectiveness of the proposed NFS.
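This section describes the NFS only as a sliding-and-matching re-alignment wrapped around an existing loss. The sketch below is one plausible reading of that idea, not the authors' implementation. It assumes a single integer translation per patch, searches it exhaustively within a hypothetical `radius`, and keeps the offset with the smallest loss. `shift2d`, `nfs_loss`, and `radius` are illustrative names. The focal loss uses the common alpha-balanced form with assumed defaults `alpha=0.25` and `gamma=2`.

```python
import numpy as np

def shift2d(a, dy, dx):
    """Translate a 2-D array by (dy, dx) pixels; vacated pixels are filled with 0 (no wrap-around)."""
    h, w = a.shape
    out = np.zeros_like(a)
    out[max(0, dy):min(h, h + dy), max(0, dx):min(w, w + dx)] = \
        a[max(0, -dy):min(h, h - dy), max(0, -dx):min(w, w - dx)]
    return out

def l1_loss(pred, target):
    """Mean absolute error between predicted probabilities and a binary annotation."""
    return float(np.mean(np.abs(pred - target)))

def focal_loss(pred, target, alpha=0.25, gamma=2.0, eps=1e-7):
    """Alpha-balanced binary focal loss, averaged over pixels (defaults are assumptions)."""
    p = np.clip(pred, eps, 1.0 - eps)
    p_t = np.where(target == 1, p, 1.0 - p)            # probability of the true class
    alpha_t = np.where(target == 1, alpha, 1.0 - alpha)
    return float(np.mean(-alpha_t * (1.0 - p_t) ** gamma * np.log(p_t)))

def nfs_loss(pred, target, loss_fn, radius=2):
    """Hypothetical sliding-and-matching wrapper: evaluate loss_fn against the annotation
    translated by every offset in [-radius, radius]^2 and keep the smallest loss."""
    best_loss, best_offset = None, (0, 0)
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            loss = loss_fn(pred, shift2d(target, dy, dx))
            if best_loss is None or loss < best_loss:
                best_loss, best_offset = loss, (dy, dx)
    return best_loss, best_offset

# Toy case: a horizontal outline segment, predicted one row lower than annotated.
target = np.zeros((8, 8))
target[3, 1:6] = 1.0
pred = shift2d(target, 1, 0)
print(l1_loss(pred, target))              # 0.15625
print(nfs_loss(pred, target, l1_loss))    # (0.0, (1, 0))
```

Because the wrapper only takes a `loss_fn` callable, swapping L1 for MSE or focal loss needs no change to the search. Under this sketch's assumptions, the exhaustive search costs `(2 * radius + 1) ** 2` loss evaluations per patch. In a training framework the minimum would be taken over tensors, so gradients flow through the loss at the selected offset.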

#### *4.2. Accuracies, Uncertainties, and Limitations*

Among all methods, the focal loss with NFS achieves the highest values for all evaluation metrics: its f1-score, Jaccard index, and kappa coefficient are 0.624, 0.597, and 0.468, respectively. Compared with the naive L1 loss, the addition of the NFS yields significant increases in all evaluation metrics; the increments in the f1-score, kappa coefficient, and Jaccard index reach 8.8%, 8.9%, and 9.8%, respectively. Because the suitability of the kappa coefficient for assessing and comparing accuracy is debatable [55], the actual performance gain from the NFS might be less significant (i.e., less than 9.8%). For robust loss functions (e.g., focal and BCE losses), the improvement afforded by the NFS is smaller (see details in Figure 8b). Owing to its sliding-and-matching mechanism, the proposed NFS cannot be applied to annotations that require rotation correction. In addition, since the methods were designed and trained on image patches with dense buildings, the trained model is not appropriate for evaluating the entire study area, where buildings are sparsely distributed.
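All three reported scores can be derived from a pixel-level confusion matrix. The sketch below uses the standard definitions of F1, Jaccard (IoU), and Cohen's kappa. Whether this study pools pixels or averages per patch is not stated in this section, and the function names and toy counts are illustrative assumptions.

```python
import numpy as np

def confusion(pred, gt):
    """Return (TP, FP, FN, TN) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    return (int(np.sum(pred & gt)), int(np.sum(pred & ~gt)),
            int(np.sum(~pred & gt)), int(np.sum(~pred & ~gt)))

def f1_score(tp, fp, fn, tn):
    return 2 * tp / (2 * tp + fp + fn)

def jaccard(tp, fp, fn, tn):
    return tp / (tp + fp + fn)

def cohen_kappa(tp, fp, fn, tn):
    """Observed agreement corrected for the agreement expected by chance."""
    n = tp + fp + fn + tn
    p_o = (tp + tn) / n
    p_e = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return (p_o - p_e) / (1 - p_e)

# Toy counts: 10 outline pixels out of 100, 8 found, 2 false alarms.
counts = (8, 2, 2, 88)
print(f1_score(*counts), jaccard(*counts), cohen_kappa(*counts))
```

For a single confusion matrix, the Jaccard index equals F1 / (2 - F1), so here 0.8 maps to 2/3. Kappa (7/9 in this toy case) also rewards the many true negatives, which is one reason its suitability for strongly imbalanced outline maps is debated.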

Our analysis of computational efficiency shows a slight decrease in processing speed when the NFS is applied. Given the performance gain provided by the NFS, this degradation in computational efficiency is negligible. Because the NFS does not depend on characteristics specific to aerial imagery, it should, in principle, apply not only to aerial images but also to other data sources (e.g., satellite, SAR, and UAV imagery). The effectiveness of the NFS will be further evaluated using publicly available datasets from various sources [56].

Because of the extremely imbalanced ratio of negative to positive pixels, complete building outline extraction remains challenging. With the current classification-based scheme, the model is trained to generate pixel-to-pixel predictions using features extracted by sequential convolutional layers. The predicted outline pixels lack internal connectivity, so some of them may be misclassified as non-outline, leaving gaps in the outline (e.g., the 2nd and 3rd rows in Figure 9).
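Both effects can be quantified with simple diagnostics. The sketch below is an illustration rather than part of the proposed method. It measures the positive-pixel ratio of a mask and counts 8-connected components with a breadth-first flood fill, so that an outline broken by missed pixels shows up as several fragments. The 8-connectivity choice, the toy 10 by 10 ring, and the function names are assumptions.

```python
from collections import deque

import numpy as np

def positive_ratio(mask):
    """Fraction of pixels labeled as outline."""
    mask = np.asarray(mask, dtype=bool)
    return float(mask.sum() / mask.size)

def count_components(mask):
    """Count 8-connected foreground components using breadth-first flood fill."""
    mask = np.asarray(mask, dtype=bool)
    h, w = mask.shape
    seen = np.zeros_like(mask)
    count = 0
    for y in range(h):
        for x in range(w):
            if mask[y, x] and not seen[y, x]:
                count += 1  # found an unvisited fragment; flood-fill all of it
                seen[y, x] = True
                queue = deque([(y, x)])
                while queue:
                    cy, cx = queue.popleft()
                    for dy in (-1, 0, 1):
                        for dx in (-1, 0, 1):
                            ny, nx = cy + dy, cx + dx
                            if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                                seen[ny, nx] = True
                                queue.append((ny, nx))
    return count

# One-pixel-wide square outline of 20 pixels in a 10 by 10 patch.
outline = np.zeros((10, 10), dtype=bool)
outline[2:8, 2:8] = True
outline[3:7, 3:7] = False
broken = outline.copy()
broken[2, 4] = broken[7, 4] = False  # two missed outline pixels on opposite sides
print(positive_ratio(outline), count_components(outline), count_components(broken))  # 0.2 1 2
```

Two missed pixels are enough to split one building outline into two fragments, even though pixel-wise accuracy barely changes.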
