#### 3.2.1. Fully Convolutional Networks

We address boundary detection as a supervised pixel-wise image classification problem that discriminates between boundary and non-boundary pixels. The network used in this research is modified from the FCN with dilated kernels (FCN-DK) described in [18]. We made three main modifications: (1) discarding the max-pooling layers; (2) using smaller filters; and (3) constructing a deeper network. A typical max-pooling layer computes the local maximum within the pooling filter, thereby merging the information of nearby pixels and reducing the dimension of the feature map [14]. In most cases, down-sampling is performed using pooling layers to capture large spatial patterns in the image, and the coarse feature maps extracted through this process are then up-sampled to produce pixel-wise predictions at the resolution of the input image. In the FCN-DK architecture, however, the max-pooling layers use a stride of one and therefore do not down-sample the input feature map, which avoids the need for up-sampling at a later stage. Nevertheless, max-pooling introduces a smoothing effect that reduces the accuracy of the boundaries, so we discarded it in the proposed network. In [18], Persello and Stein also demonstrated a case study based on an FCN-DK with six convolutional layers and a filter size of 5 × 5 pixels. We modified this into 12 convolutional layers with 3 × 3 filters, as two stacked 3 × 3 filters have the same receptive field as one 5 × 5 filter but fewer learnable parameters. Moreover, compared with a single larger filter, multiple small filters are interleaved with activation functions, yielding better abstraction ability. With fewer learnable parameters and better feature abstraction, smaller filters combined with a deeper network are therefore preferred.
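As a quick sanity check on this trade-off, the following Python snippet compares two stacked 3 × 3 filters with a single 5 × 5 filter for a hypothetical layer with C input and C output channels (C = 32 is an arbitrary choice here; biases are ignored):

```python
# Compare two stacked 3x3 convolutions against a single 5x5 convolution.

def conv_params(k, c_in, c_out):
    """Learnable weights of a single k x k convolution (no biases)."""
    return k * k * c_in * c_out

def stacked_receptive_field(kernel_sizes):
    """Receptive field of stacked stride-1, undilated convolutions:
    each k x k layer adds (k - 1) pixels."""
    rf = 1
    for k in kernel_sizes:
        rf += k - 1
    return rf

C = 32  # hypothetical channel count
two_3x3 = 2 * conv_params(3, C, C)   # 2 * 9 * C^2 = 18 * C^2 weights
one_5x5 = conv_params(5, C, C)       # 25 * C^2 weights

print(stacked_receptive_field([3, 3]))  # 5 -> same field as a single 5x5
print(stacked_receptive_field([5]))     # 5
print(two_3x3 < one_5x5)                # True
```

Both options see a 5 × 5 neighborhood, but the stacked variant uses 18C² rather than 25C² weights and inserts an extra non-linearity between the two convolutions.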

Figure 3 shows the architecture of the proposed FCN. It consists of 12 convolutional layers interleaved with batch normalization and Leaky Rectified Linear Units (Leaky ReLU). The batch normalization layers normalize each input mini-batch [19], and Leaky ReLU is the activation function of the network [20]. The classification is performed by the final softmax layer.

The core components of our network are the convolutional layers. They extract spatial features hierarchically, with different layers corresponding to different levels of abstraction. The 3 × 3 kernels used in the convolutional layers have dilation factors increasing from 1 to 12 to capture a wider spatial context. As a result, a receptive field of 157 × 157 pixels is reached in the final layer. In each convolutional layer, zero padding was used to keep the output feature maps at the same spatial dimension as the input. The proposed FCN can therefore classify arbitrarily sized images directly and produce correspondingly sized outputs.
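The 157 × 157 figure follows from the fact that each stride-1 dilated convolution enlarges the receptive field by dilation × (kernel size − 1) pixels. A short Python check of this arithmetic:

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of stacked stride-1 dilated convolutions.
    Each layer adds dilation * (kernel_size - 1) pixels."""
    rf = 1
    for d in dilations:
        rf += d * (kernel_size - 1)
    return rf

# Twelve 3x3 layers with dilation factors 1, 2, ..., 12:
print(receptive_field(3, range(1, 13)))  # 157
```

With 3 × 3 kernels this reduces to 1 + 2 · (1 + 2 + … + 12) = 1 + 2 · 78 = 157.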

**Figure 3.** Architecture of the proposed FCN.

To train the FCN, we randomly extracted 500 patches for training and 500 patches for validation from each training tile. All patches were fully labeled, with a patch size of 145 × 145 pixels. Stochastic gradient descent with a momentum of 0.9 was used to optimize the loss function. The training was performed in multiple stages with different learning rates: 10<sup>−5</sup> for the first 180 epochs and 10<sup>−6</sup> for a further 20 epochs. A sudden decrease can be observed in the learning curves when the learning rate changes (Figure 4). The implementation of the network is based on the MatConvNet (http://www.vlfeat.org/matconvnet/) library. All experiments were performed on a desktop workstation with an Intel Core i7-8750H CPU at 2.2 GHz, 16 GB of RAM, and an Nvidia Quadro P1000 GPU. The training time for the FCN was 6 h for each study area.
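The staged schedule can be sketched as follows. This is a minimal plain-Python illustration of the settings reported above (momentum 0.9, learning rate 10⁻⁵ for 180 epochs, then 10⁻⁶ for 20 epochs); the function names are placeholders, not the authors' MatConvNet code:

```python
def learning_rate(epoch):
    """Learning rate at a given 1-based epoch (200 epochs in total)."""
    return 1e-5 if epoch <= 180 else 1e-6

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9):
    """One SGD-with-momentum update on a single scalar weight:
    v <- momentum * v - lr * grad;  w <- w + v."""
    velocity = momentum * velocity - lr * grad
    return w + velocity, velocity

# The schedule drops by a factor of 10 at epoch 181:
rates = [learning_rate(e) for e in range(1, 201)]
print(rates.count(1e-5), rates.count(1e-6))  # 180 20
```

The factor-of-10 drop late in training is what produces the sudden decrease visible in the learning curves of Figure 4.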

#### 3.2.2. Globalized Probability of Boundary (gPb)

Globalized Probability of Boundary (gPb) was proposed by Arbeláez et al. in 2011 [11]. gPb (global Pb) is a linear combination of mPb (multiscale Pb) and sPb (spectral Pb); the former conveys local multiscale Pb signals and the latter introduces global information. Multiscale Pb is an extension of the Pb detector advanced by Martin, Fowlkes and Malik [21]. The core block of the Pb detector is the computation of an oriented gradient signal *G*(*x*,*y*,θ) from intensity images. By placing a circular disc at pixel location (*x*,*y*) and dividing it into two half-discs at angle θ, we obtain two histograms of the pixel intensity values within each half-disc. *G*(*x*,*y*,θ) is defined as the χ² distance between the two histograms. For each input image, the Pb detector derives four intensity channels: brightness, color a, color b, and texture. The oriented gradient signals are calculated separately for each channel. Multiscale Pb extends the Pb detector by considering the gradients at three different scales, i.e., discs with three different diameters. We thereby obtain local cues at different scales, from fine to coarse structures. For each pixel, the final mPb is obtained by combining the gradients of the brightness, color a, color b, and texture channels over the three scales. Spectral Pb combines the multiscale image cues into an affinity matrix that defines the similarity between pixels. The eigenvectors of the affinity matrix, which carry contour information, are computed; they are treated as images and convolved with Gaussian directional derivative filters. The sPb is calculated by combining the information from the different eigenvectors.
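The χ² distance at the heart of the oriented gradient signal can be sketched as follows. This is a plain-Python illustration, assuming the two half-disc histograms are normalized to sum to one:

```python
def chi2_distance(g, h):
    """Chi-squared distance between two histograms:
    0.5 * sum((g_i - h_i)^2 / (g_i + h_i)), skipping empty bin pairs."""
    total = 0.0
    for gi, hi in zip(g, h):
        if gi + hi > 0:
            total += (gi - hi) ** 2 / (gi + hi)
    return 0.5 * total

# Identical half-disc histograms -> 0 (no boundary evidence);
# disjoint normalized histograms -> 1 (maximal boundary evidence).
print(chi2_distance([0.5, 0.5], [0.5, 0.5]))  # 0.0
print(chi2_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Evaluating this distance at every pixel, orientation θ, and disc radius, for each of the four channels, yields the multiscale gradient signals that mPb combines.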

Generally speaking, mPb detects all edges, while sPb extracts only the most salient ones in the image. gPb combines the two and provides uniformly better performance. After detecting the boundary probability of each pixel using gPb, we also applied a grouping algorithm based on the Oriented Watershed Transform and Ultrametric Contour Map (gPb–owt–ucm) to extract connected contours [11].

**Figure 4.** Learning curves of the FCNs in Busogo (**left**) and Muhoza (**right**).
