
**Figure 8.** Performances of different losses, with and without the nearest feature selector (NFS). (**a**) Bar chart comparing relative performances; (**b**) table of performances under different loss functions. For each loss function, the highest values are highlighted in bold.

**Figure 9.** Representative results of outlines extracted from the model trained with L1 loss, with/without the nearest feature selector (NFS). Backgrounds, red lines, and green circles represent the aerial input, predicted outline, and focused area, respectively. Selected results are denoted as (**a**–**f**).

**Figure 10.** Representative results of outlines extracted from the model trained with mean square error (MSE) loss, with/without the nearest feature selector (NFS). Backgrounds, red lines, and green circles represent the aerial input, predicted outline, and focused area, respectively. Selected results are denoted as (**a**–**f**).

**Figure 11.** Representative results of outlines extracted from the model trained with binary cross-entropy (BCE) loss, with/without the nearest feature selector (NFS). Backgrounds, red lines, and green circles represent the aerial input, predicted outline, and focused area, respectively. Selected results are denoted as (**a**–**f**).

Figure 12 presents six representative pairs of building outlines extracted from the model trained with the focal loss, with and without the NFS. Owing to the robustness of the focal loss, even without the NFS the model successfully recognizes and extracts the major parts of the building outline from the aerial input (e.g., b, c, and f). With the additional NFS, however, the generated outlines contain fewer false positives around corners with complicated backgrounds (e.g., a, d, and e). Compared with the L1 loss, the addition of the NFS has a less pronounced effect on the model trained with the focal loss. This observation is consistent with the quantitative results shown in Figure 8b.

**Figure 12.** Representative results of outlines extracted from the model trained with focal loss, with/without the nearest feature selector (NFS). Backgrounds, red lines, and green circles represent the aerial input, predicted outline, and focused area, respectively. Selected results are denoted as (**a**–**f**).
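The paper does not reproduce its focal loss definition in this section; a minimal binary variant, following the standard formulation of Lin et al., might look like the following PyTorch sketch. The function name `focal_loss` and the defaults `alpha = 0.25`, `gamma = 2.0` are illustrative assumptions, not necessarily the settings used in the experiments.

```python
import torch
import torch.nn.functional as F


def focal_loss(pred, target, alpha=0.25, gamma=2.0):
    """Binary focal loss over a predicted outline map (a sketch).

    pred:   raw logits, shape (N, 1, H, W)
    target: binary ground-truth outline map, same shape
    """
    # Per-pixel BCE, kept unreduced so it can be reweighted below.
    bce = F.binary_cross_entropy_with_logits(pred, target, reduction="none")
    p = torch.sigmoid(pred)
    # Probability assigned to the true class at each pixel.
    p_t = p * target + (1 - p) * (1 - target)
    # Class-balancing weight.
    alpha_t = alpha * target + (1 - alpha) * (1 - target)
    # (1 - p_t)^gamma down-weights easy, well-classified pixels,
    # which is the source of the robustness discussed above.
    return (alpha_t * (1 - p_t) ** gamma * bce).mean()
```

The modulating factor `(1 - p_t) ** gamma` is what lets the model focus on hard pixels (e.g., corners with complicated backgrounds) rather than the abundant easy background pixels.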

Figure 13 presents four representative pairs of failure cases from models trained with the four loss functions, with and without the nearest feature selector (NFS). Compared with the models trained without the NFS, adding the NFS may lead to unexpected misclassifications around corners.

**Figure 13.** Representative failure cases of outlines extracted from model trained by four losses with/without nearest feature selector (NFS). Backgrounds, red lines, and green circles represent aerial input, predicted outline, and focused area, respectively.

#### *3.4. Computational Efficiency*

All experiments are trained and tested on a Sakura "koukakuryoku" Server (https://www.sakura.ad.jp/koukaryoku/) equipped with 4× NVIDIA Tesla V100 GPUs (https://www.nvidia.com/en-us/data-center/tesla-v100/) running 64-bit Ubuntu 16.04 LTS. The original SegNet is implemented in Caffe [50] and trained on multi-class scene segmentation tasks: CamVid road scene segmentation [51] and SUN RGB-D indoor scene segmentation [52]. Stochastic gradient descent (SGD) with a fixed learning rate of 0.1 and a momentum of 0.9 is applied to train the model.

The implementation of the modified SegNet is based on geoseg (https://github.com/huster-wgm/geoseg) [53], which is built on top of PyTorch (version ≥ 0.4.1). To avoid interference from other hyperparameters, all models are trained with a fixed batch size (i.e., 24) and a constant number of iterations (i.e., 10,000). The Adam stochastic optimizer with default settings (lr = 2e−4, betas = [0.9, 0.999]) is used to train the different models.
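The optimizer settings described above can be expressed directly in PyTorch. This is a sketch only: the single-layer `model` is a hypothetical stand-in for SegNet, and the learning-rate value follows the stated defaults rather than the authors' actual training script.

```python
import torch

# Hypothetical stand-in for the segmentation network; any nn.Module works.
model = torch.nn.Conv2d(3, 1, kernel_size=3, padding=1)

# SGD as used for the original SegNet: fixed lr = 0.1, momentum = 0.9.
sgd = torch.optim.SGD(model.parameters(), lr=0.1, momentum=0.9)

# Adam as used for the modified SegNet (geoseg defaults).
adam = torch.optim.Adam(model.parameters(), lr=2e-4, betas=(0.9, 0.999))
```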

Table 1 shows the computing speeds of the methods in frames per second (FPS). For all loss functions, the additional NFS results in a slightly longer processing time during both training and testing. However, the decline in FPS is not significant.


**Table 1.** Comparison of the computational efficiencies of different loss functions, with and without the NFS.
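FPS figures like those in Table 1 can be obtained by timing repeated forward passes. The following is a sketch under stated assumptions, not the paper's actual benchmark code: `measure_fps`, the warm-up count, and the input shape are all illustrative choices.

```python
import time

import torch


@torch.no_grad()
def measure_fps(model, input_shape=(1, 3, 224, 224), n_iters=50, device="cpu"):
    """Rough frames-per-second estimate for a model's forward pass."""
    model = model.to(device).eval()
    x = torch.randn(*input_shape, device=device)
    # Warm-up runs so one-time initialization does not skew the timing.
    for _ in range(5):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()  # GPU kernels are asynchronous
    start = time.perf_counter()
    for _ in range(n_iters):
        model(x)
    if device == "cuda":
        torch.cuda.synchronize()
    elapsed = time.perf_counter() - start
    # Frames processed per second of wall-clock time.
    return n_iters * input_shape[0] / elapsed
```

On CUDA devices the explicit `torch.cuda.synchronize()` calls are essential; without them, `time.perf_counter()` would measure only kernel launch time and overstate the FPS.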
