#### *3.1. CNN Architecture*

We first tested different pre-trained base CNNs (VGG, ResNet, Inception, Xception, MobileNet, and DenseNet), to which we added trainable layers. The combined CNN model was trained with a batch size of 32 for 100 epochs; when the network showed no learning, training was stopped earlier. Of the 50% of balanced data used for training, we used 10% for validation. These data were not seen by the network during training, but were used only to calculate loss and accuracy per epoch. These metrics and their curves looked most promising for VGG19 [25]. We then applied the trained network to the remaining 50% of testing data. VGG19 is a 19-layer-deep CNN developed by the Visual Geometry Group (VGG) at the University of Oxford (Oxford, Oxfordshire, UK). It is trained to classify images into 1000 object categories, such as keyboard, mouse, pencil, and many animals, and has learned high-level features for a wide range of images from ImageNet [34]. ImageNet is a dataset of over 15 million labeled high-resolution images in around 22,000 categories. VGG has been shown to generalize well compared to more complex and less deep CNN architectures [25].
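This comparison can be set up with the base models from `keras.applications`, loaded with their ImageNet weights and without their classification top. The following is a minimal sketch of that setup; the specific model variants (e.g., ResNet50, DenseNet121), the head layers, and all names are illustrative assumptions, not the exact published code [23]:

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import (
    VGG19, ResNet50, InceptionV3, Xception, MobileNet, DenseNet121)

def build_candidate(base_fn, input_shape=(224, 224, 3)):
    """Attach a small trainable head to a frozen pre-trained base."""
    base = base_fn(weights="imagenet", include_top=False,
                   input_shape=input_shape)
    base.trainable = False  # keep the pre-trained ImageNet filters fixed
    return models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(16, activation="relu"),    # initial head depth
        layers.Dropout(0.5),                    # initial dropout rate
        layers.Dense(2, activation="softmax"),  # 'boundary' / 'not boundary'
    ])

# One candidate per base architecture, trained and compared per epoch.
candidates = {fn.__name__: build_candidate(fn)
              for fn in (VGG19, ResNet50, InceptionV3,
                         Xception, MobileNet, DenseNet121)}
```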

We used the pre-trained VGG19 layers, comprising 20,024,384 parameters, as a base model. Next, we tuned the hyper-parameters for VGG19, namely the learning optimizer, the depth of the fully connected layer, and the dropout rate, to optimize accuracy and loss. We used softmax as the activation function to retrieve predictions in the range [0, 1] for a tile being 'not boundary'. These values represent the weights for the later least-cost-path calculation. Sigmoid activation, which is the two-class special case of softmax, provided similar results in terms of accuracy and loss. However, it required more post-processing, as its single output value in the range [0, 1] cannot be interpreted directly as described for the softmax activation.
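Continuing the sketch above, the practical difference between the two activations can be illustrated as follows; `model` and `tiles` are hypothetical placeholders, and the column order of the softmax output is an assumption depending on the label encoding:

```python
# Two-unit softmax: each tile yields [P(boundary), P(not boundary)],
# summing to 1. The 'not boundary' component is used directly as the
# weight for the later least-cost-path calculation.
probs = model.predict(tiles)   # shape: (n_tiles, 2)
weights = probs[:, 1]          # P(not boundary), already in [0, 1]

# A single sigmoid unit outputs one value whose interpretation depends
# on the label encoding, so it would first need a mapping such as:
# weights = 1.0 - sigmoid_probs   # if the sigmoid predicts 'boundary'
```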

The aim was to maximize the accuracy for training and validation data while minimizing loss. To avoid over-fitting, the curves for training and validation accuracy should not diverge; this was achieved by increasing the dropout rate from 0.5 to 0.8. To avoid under-fitting, the curve for training accuracy should not lie below that for validation accuracy; this was avoided by increasing the depth of the fully connected layer from 16 to 1,024 units. To avoid oscillations in loss, the learning rate was lowered from 0.01 to 0.001. Training was stopped once the validation accuracy no longer improved. Results and observations derived from different hyper-parameter settings and different pre-trained base CNNs are provided in Appendix A (Table A1).
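In Keras terms, this tuning and the stopping criterion could look like the following sketch; the optimizer choice, the patience value, and the variable names are assumptions, not the exact published configuration [23]:

```python
from tensorflow.keras.callbacks import EarlyStopping
from tensorflow.keras.optimizers import Adam

# Tuned hyper-parameters: 1,024-unit fully connected layer and a
# dropout rate of 0.8 (replacing the Dense/Dropout layers in the
# sketch above), plus a learning rate of 0.001 against oscillating loss.
model.compile(optimizer=Adam(learning_rate=0.001),
              loss="categorical_crossentropy",
              metrics=["accuracy"])

# Stop once the validation accuracy no longer improves.
early_stop = EarlyStopping(monitor="val_accuracy", patience=10,
                           restore_best_weights=True)

history = model.fit(train_tiles, train_labels,
                    batch_size=32, epochs=200,
                    validation_split=0.1,  # 10% of the training data
                    callbacks=[early_stop])
```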

We achieved the best results after training 8,242 parameters in four trainable layers added to the 22 pre-trained VGG19 layers (Table 3). This led to a validation accuracy of 71% and a validation loss of 0.598 after 200 epochs (Figure 7). The accuracy could be increased by 1% after 300 epochs, but the validation loss then started to increase again, reaching 0.623. We conclude that optimal results are achieved after 200 epochs. Training for 100 epochs halves the training time to 11 h, with 1% less accuracy and a loss of 0.588. The implementation relies on the open-source library Keras [35] and is publicly available [23]. All experiments were conducted on a machine with an NVIDIA GM200 (GeForce GTX TITAN X) GPU and 128 GB RAM (Nvidia Corporation, Santa Clara, CA, USA).



**Figure 7.** Accuracy and loss for our fine-tuned VGG19.

#### *3.2. RF vs. CNN Classification*

Of those lines that should receive a boundary likelihood > 0, i.e., those that fall within the cadastral reference buffer, 100% are assigned a boundary likelihood > 0 by RF and 98% by CNN (Table 4). This means that both classifiers predict a boundary likelihood in the range [0, 1] whenever there is some overlap with the cadastral reference buffer.


**Table 4.** Is the boundary likelihood predicted for the correct lines?

Next, we investigated how valid the boundary likelihood is, i.e., whether its value equals the line's overlap with the cadastral reference buffer. For this, we excluded lines having no overlap with the cadastral reference buffer, i.e., those with an overlap of 0. We grouped the remaining lines to compare boundary likelihood and overlap values (Table 5). For RF-derived boundary likelihoods, we obtained an accuracy of 41% and a precision of 49%. For CNN-derived boundary likelihoods, we obtained an accuracy of 52% and a precision of 76%. The percentage of lines per value interval of 0.25 with the same boundary likelihood and overlap value deviated on average by 15% for RF and by 7% for CNN (Table 5).
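A minimal sketch of this grouping, assuming each line carries a predicted likelihood and a reference overlap value in [0, 1]; the exact computation behind Table 5 (e.g., of precision) may differ:

```python
import numpy as np

def interval_agreement(likelihood, overlap, bin_width=0.25):
    """Group likelihood and overlap into fixed-width intervals and
    measure how often a line falls into the same interval for both."""
    likelihood, overlap = np.asarray(likelihood), np.asarray(overlap)

    # Exclude lines without any overlap with the reference buffer.
    keep = overlap > 0
    likelihood, overlap = likelihood[keep], overlap[keep]

    edges = np.arange(bin_width, 1.0, bin_width)  # 0.25, 0.5, 0.75
    lik_bin = np.digitize(likelihood, edges)      # interval index 0..3
    ovl_bin = np.digitize(overlap, edges)

    # Accuracy: share of lines whose likelihood falls into the same
    # 0.25 interval as their overlap with the reference buffer.
    accuracy = np.mean(lik_bin == ovl_bin)

    # Mean deviation between the percentage of lines per interval
    # for likelihood and for overlap.
    n_bins = len(edges) + 1
    lik_pct = np.bincount(lik_bin, minlength=n_bins) / lik_bin.size
    ovl_pct = np.bincount(ovl_bin, minlength=n_bins) / ovl_bin.size
    deviation = np.mean(np.abs(lik_pct - ovl_pct))
    return accuracy, deviation
```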

Overall, CNN-derived boundary likelihoods obtained a similar recall, a higher accuracy, and a higher precision (Table 4). Further, the percentage of lines per range of boundary likelihoods represented the distribution of overlap values more accurately (Table 5). Even though overlap and boundary likelihood do not express the same quantity, they provide a valid basis for comparing RF- and CNN-derived boundary likelihoods. We consider CNN-derived boundary likelihoods the better input for the interactive delineation, and continue the accuracy assessment with a boundary classification based on CNN.


**Table 5.** How correct is the predicted boundary likelihood?

#### *3.3. Manual vs. Automated Delineation*

Indirect surveying, whether based on manual or automated delineation, relies on visible boundaries. Before comparing manual to automated delineation, we therefore filtered the cadastral reference data for Ethiopia (Figure 2b) to contain visible parcels only. We kept only those parcels for which all boundary parts were visually demarcated. As in Kohli et al. [4], we considered only fully closed polygons that are entirely visible in the image. From the original cadastral reference data, we kept the 38% of parcels for which all boundaries were visible. In Kohli et al. [4], the proportion of fully visible parcels was reported to average around 71% of all cadastral parcels in rural Ethiopian areas. We can confirm 71% for those parts of our study area that cover smallholder farms. Cadastral data for Rwanda and Kenya were delineated based on local knowledge in alignment with visible boundaries. As for Ethiopia, only fully closed and visible parcels were considered. The mean size of our visible parcels amounts to 2,725 m<sup>2</sup> for Ethiopia, 656 m<sup>2</sup> for Rwanda, and 730 m<sup>2</sup> for Kenya.

When manually delineating visible boundaries, we observed how tiring this task is: the delineator has to continuously scan the image for visible boundaries and then click precisely and repeatedly along the boundary to be delineated. Apart from visually inspecting the ortho-image, the delineator has no further guidance on where to click. Each parcel is delineated in the same way, which makes delineation a highly repetitive task that quickly exhausts the eyes and fingers.

When comparing manual to automated delineation, this impression changes: the delineator now has lines and vertices to choose from, which can be connected automatically using multiple functionalities (Table 2, Figure 6). Complex as well as simple parcels require fewer clicks with the automated approach: to follow a curved outline, manual delineation requires frequent and accurate clicking while zooming in and out, whereas automated delineation requires clicking once on the start and end vertex, after which both are automatically connected precisely along the object outline (Figure 6d). Similarly, automated delineation is superior for simple rectangular parcels: while manual delineation requires accurate clicking on each of the at least four corners of a rectangle, automated delineation allows clicking once somewhere inside the rectangle to retrieve its outline (Figure 8a).

However, choosing the optimal functionality can be time-consuming, especially in the case of fragmented MCG lines obtained from high-resolution UAV data. We assume that the time for automated delineation can be reduced through increased familiarity with all functionalities and by further improving their usability, e.g., through keyboard shortcuts.

**Figure 8.** (**a**) Automated delineation requires clicking once somewhere inside the parcel, while manual delineation requires precise clicking on each of at least four corners. (**b**) Boundaries partly covered or demarcated by vegetation impede indirect surveying and limit the effectiveness of our automated delineation compared to manual delineation.

Automated delineation required fewer clicks for our rural and peri-urban study areas (Table 6). Only those parcels for which one of our functionalities was more effective than manual delineation were considered for the automated delineation, amounting to 40–58% of all visible parcels. The effectiveness of manual delineation was assessed on all visible parcels. By maximizing the number of delineated parcels, we aimed to minimize the effect of unusual parcels that required much effort to delineate manually. We expect the measures obtained for manual delineation to be similar for the 40–58% of parcels considered for automated delineation. For the remaining parcels, MCG lines were either not available or did not align well enough with the reference data. Manually delineating these parcels with the plugin requires the same number of clicks and the same time as conventional manual delineation, but is partly less tiring, as the delineation can be snapped to the MCG lines and vertices.


**Table 6.** Does automated delineation cost less effort?

Nevertheless, the lines and vertices can also impede visibility: for our data from Rwanda and Kenya, the boundaries are not continuously visible. The partly vegetation-covered boundaries result in zigzagged and fragmented MCG lines (Figure 8b). In addition, visible boundaries with low contrast were partly missed by the MCG image segmentation. In both cases, the advantages of automated delineation are limited.

We claimed that the least-cost-path based on the boundary likelihood is beneficial for delineating long and curved outlines [21,22]. For the Ethiopian data, however, we barely made use of the boundary likelihood: for the often small and rectangular parcels, connecting all lines surrounding a click, or a selection of lines, was more efficient. For areas with few fragmented, long, or curved outlines, the workflow is assumed to be of similar effectiveness when the boundary classification is left out. Including the boundary classification is beneficial when boundaries are demarcated by long and curved objects, such as roads, waterbodies, or vegetation.
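The least-cost-path itself reduces to a shortest-path search over the graph of MCG lines. The following is a minimal sketch, assuming the lines are available as a `networkx` graph whose edges carry the CNN-derived 'not boundary' probability as a cost; the input structure and all names are hypothetical, not the plugin's actual code [23]:

```python
import networkx as nx

# Nodes are MCG vertices; each edge is a line segment carrying both
# possible weight attributes selectable in the plugin interface.
G = nx.Graph()
for (u, v), p_not_boundary, length in segments:  # hypothetical input
    G.add_edge(u, v, likelihood=p_not_boundary, length=length)

# Weighting by P(not boundary) favors paths along likely boundaries;
# weighting by 'length' instead ignores the CNN (as for Rwanda/Kenya).
path = nx.shortest_path(G, source=start, target=end, weight="likelihood")
```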

For our data from Kenya and Rwanda, we omitted the boundary classification, since we hardly used it for the Ethiopian data. The least-cost-path, for which a weight attribute can be selected in the plugin interface, used line length instead of boundary likelihood. Since the boundaries differ from those in the Ethiopian scene, the CNN would need to be retrained or fine-tuned for the new boundary types. Retrieving CNN-derived boundary likelihoods for these UAV data would require further experiments on whether and how to rescale tiles to 224 × 224 pixels while providing context comparable to our aerial tiles (Figure 5).
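A minimal sketch of such a rescaling step, assuming a UAV tile has already been cropped to a ground footprint comparable to our aerial tiles; the function and the preprocessing choice are illustrative only:

```python
import tensorflow as tf

def prepare_tile(tile):
    """Resample a cropped image tile to the 224 x 224 pixel input
    expected by VGG19 and apply its ImageNet preprocessing."""
    tile = tf.image.resize(tile, (224, 224), method="bilinear")
    return tf.keras.applications.vgg19.preprocess_input(tile)
```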

Overall, the automated delineation provided diverse functionalities for different boundary types (Table 7), which made delineation less tiring and more effective (Table 6). Improvements over manual delineation were strongest for parcels fully surrounded by MCG lines. Such parcels were mostly found in the rural Ethiopian scene, where boundaries aligned with agricultural fields. In the Rwandan scene, automated delineation was time-consuming, since the boundaries were not demarcated consistently: selecting and joining fragmented MCG lines required more careful visual inspection than in the rural Ethiopian scene. In the Kenyan scene, the boundaries were less often covered by vegetation and thus generally better visible. Compared to the rural Ethiopian scene, the automated delineation nevertheless required more zooming, as boundaries were demarcated by more diverse objects.


**Table 7.** Which plugin functionality to use for which boundary type?
