*3.3. Accuracy Assessment*

The results in this research were evaluated considering the detection accuracy on the testing tiles. When validating predicted results against reference data, a certain tolerance is commonly used in cadastral mapping. According to the International Association of Assessing Officers (IAAO), the horizontal spatial accuracy for cadastral maps in urban environments is usually 0.3 m or less, while in rural areas an accuracy of 2.4 m is sufficient [24]. Moreover, fit-for-purpose (FFP) approaches advocate flexibility in terms of accuracy to best accommodate social needs [3]. In line with FFP land administration, we chose a 0.4 m tolerance for the urban and peri-urban environments in this research. When the method is adopted for other applications, this tolerance can be adjusted to the corresponding requirements.

The accuracy assessment relied on precision-recall measures, a standard evaluation technique for boundary detection in computer vision [21]. Precision (P), also called correctness, measures the ratio of correctly detected boundary pixels to the total number of detected boundary pixels. Recall (R), also called completeness, indicates the percentage of correctly detected boundaries relative to the total boundaries in the reference. The F-score (F) is the harmonic mean of precision and recall [25]. As F combines precision and recall, it can be regarded as an overall quality measure. All three measures range between 0 and 1, with larger values representing higher accuracy.

Specifically, the accuracy assessment was performed by overlaying the detected boundary with the buffered reference data (0.4 m buffer). The following table and formulas show how precision, recall and the F-score are calculated. Pixels labelled as boundary in both the detection and the buffered reference are called True Positives (TP), while pixels labelled as boundary in the detection but non-boundary in the buffered reference are called False Positives (FP). False Negatives (FN) and True Negatives (TN) are defined analogously (Table 2). By overlaying the detection result with the buffered reference, we obtain the values of TP, FP, TN and FN. TP is the number of correctly detected boundary pixels, and TP + FP is the total number of detected boundary pixels; hence precision follows directly from Formula 1. The sum TP + FN, however, counts the boundary pixels in the buffered reference rather than in the original reference: the buffered reference has a uniform width of 8 pixels, while the original reference is only a single pixel wide. We therefore divide TP + FN by 8, the width of the buffer, to obtain the total boundary length in the original reference. Equations (2) and (3) show how recall and the F-score are calculated.

**Table 2.** Confusion matrix for binary classification.

| | Boundary (reference) | Non-boundary (reference) |
|---|---|---|
| **Boundary (detection)** | TP | FP |
| **Non-boundary (detection)** | FN | TN |
To assess the capability of the different methods in detecting visible and invisible cadastral boundaries, we calculated the classification accuracy for visible cadastral boundaries, invisible cadastral boundaries and all cadastral boundaries separately. By overlaying the detected boundary with the buffered reference of only visible, only invisible or all cadastral boundaries, we obtain three sets of accuracy assessment results for each algorithm on every testing tile.

$$P = TP / (TP + FP),\tag{1}$$

$$R = 8 \cdot TP / (TP + FN),\tag{2}$$

$$F = 2 \cdot P \cdot R / (P + R).\tag{3}$$
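The evaluation procedure above can be expressed compactly in code. The following is a minimal sketch, assuming the detection and reference maps are boolean NumPy arrays and that the 8-pixel buffer is built by morphological dilation with a square structuring element; the exact buffering tool used in this research may differ.

```python
import numpy as np
from scipy.ndimage import binary_dilation

def buffer_reference(reference, width=8):
    """Dilate a single-pixel-wide reference boundary into a buffer of
    roughly `width` pixels total width (8 px corresponds to the 0.4 m
    tolerance at the assumed ground sampling distance)."""
    structure = np.ones((width, width), dtype=bool)
    return binary_dilation(reference.astype(bool), structure=structure)

def boundary_metrics(detection, reference, width=8):
    """Precision, recall and F-score of a detected boundary map against
    a single-pixel reference, following Equations (1)-(3)."""
    buffered = buffer_reference(reference, width)
    det = detection.astype(bool)
    tp = np.sum(det & buffered)    # correctly detected boundary pixels
    fp = np.sum(det & ~buffered)   # detections outside the buffer
    fn = np.sum(~det & buffered)   # buffer pixels that were missed
    p = tp / (tp + fp) if tp + fp else 0.0
    # TP + FN counts pixels of the buffered reference; dividing by the
    # buffer width converts it to the single-pixel reference length.
    r = width * tp / (tp + fn) if tp + fn else 0.0
    f = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f
```

A detection that exactly reproduces the single-pixel reference yields P = R = F = 1, since the 8-pixel buffer contains eight times as many pixels as the reference line and the factor 8 in Equation (2) compensates for this.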

## **4. Results**

The proposed method, along with the competing methods, was implemented on both study sites, Busogo and Muhoza, to test its generalization ability. The results are evaluated considering the classification accuracy on the testing tiles using the precision-recall framework. The visual and numerical results on the testing tiles are presented in the following table and figures.

Table 3 presents the separate accuracies for visible and invisible boundaries, as well as the overall accuracy for all cadastral boundaries, for each algorithm on TS1 and TS2. Taking the classification accuracy of FCN on visible cadastral boundaries in TS1 as an example: FCN achieves a precision of 0.75, meaning that 75% of the detected boundaries are correctly detected visible boundaries. The recall is 0.65, indicating that 65% of the visible cadastral boundaries in the reference are detected. The resulting F-score of 0.70 can be regarded as an overall measure of performance. The other results in Table 3 can be interpreted in the same way. Interestingly, because the visible and invisible references together make up all cadastral boundaries, while the set of detected pixels is the same in each case, the P values on visible and invisible boundaries should sum to the P value on all cadastral boundaries. The six sets of data in Table 3 confirm this within a small tolerance of ±0.03.
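This additivity can be made explicit. Since the visible and invisible references partition the set of all cadastral boundaries, the true positives split as $TP_{\mathrm{all}} = TP_{\mathrm{vis}} + TP_{\mathrm{inv}}$, while the denominator of precision, the total number of detected boundary pixels $D = TP + FP$, is common to all three cases:

$$P_{\mathrm{vis}} + P_{\mathrm{inv}} = \frac{TP_{\mathrm{vis}}}{D} + \frac{TP_{\mathrm{inv}}}{D} = \frac{TP_{\mathrm{vis}} + TP_{\mathrm{inv}}}{D} = \frac{TP_{\mathrm{all}}}{D} = P_{\mathrm{all}}.$$

The small deviations of up to ±0.03 plausibly arise where the buffered visible and invisible references overlap, so that a detected pixel is counted as a true positive in both subsets.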

**Table 3.** Classification accuracies of the Fully Convolutional Network (FCN), Globalized Probability of Boundary–Oriented Watershed Transform–Ultrametric Contour Map (gPb–owt–ucm) and Multi-Resolution Segmentation (MRS) on TS1 and TS2. Three kinds of accuracies are calculated by comparing the detected boundary to the reference of visible, invisible and all cadastral boundaries.


According to Table 3, FCN achieves an F-score of 0.70 on visible boundaries and 0.06 on invisible boundaries in TS1; the former is much higher than the latter. A similar pattern is observed in TS2, which indicates that FCN detects mainly visible cadastral boundaries. The F-score of FCN on all boundaries in TS1 is 0.52, larger than that in TS2 (0.48). We can interpret this result considering the proportion of visible cadastral boundaries in each tile (computed as the ratio of the total length of visible cadastral boundaries to that of all cadastral boundaries), which is 57% in TS1 and 72% in TS2. Surprisingly, despite having more visible cadastral boundaries, TS2 yields poorer detection results. According to the R value on visible boundaries, 65% of the visible cadastral boundaries are detected in TS1, against only 45% in TS2. In other words, although TS2 has more visible cadastral boundaries, fewer of them are detected. The difference in the detection ability of the FCN between TS1 and TS2 can be understood by considering the types of visible cadastral boundaries. TS1 is located in a suburban area, where fences and strips of stones are the predominant visible boundaries, whereas TS2 is in an urban area, where building walls and fences play the leading role. The better performance of FCN in TS1 indicates that FCN is good at detecting visible boundaries such as fences and strips of stones, while cadastral boundaries that coincide with building walls are more difficult for FCN to recognize. Based on the above analysis, we conclude that FCN detects mainly visible cadastral boundaries, especially those demarcated by fences or strips of stones.

Comparing FCN to gPb–owt–ucm and MRS, the most salient finding is that in the same setting, e.g., the detection accuracy for visible boundaries in TS1 or all boundaries in TS2, the P value of FCN is always larger than that of gPb–owt–ucm and MRS, while its R value is always smaller. FCN always achieves the highest F-score. These results show that gPb–owt–ucm and MRS detect a large proportion of the cadastral boundaries, but also many false boundaries. FCN has a very high precision, leading to the best overall performance.

Figure 5 shows the visible and invisible boundary references and the detected results of the investigated algorithms. According to Figure 5, the missing boundary fragments in the FCN classification output are mainly invisible boundaries. Moreover, FCN produces a more regular and cleaner output than gPb–owt–ucm and MRS. Although the outlines of buildings and trees correspond to strong edges, FCN does not confuse them with cadastral boundaries.

**Figure 5.** Reference and classification maps obtained by the investigated techniques. Visible boundary references are shown as green lines, invisible references as red lines and detected boundaries as yellow lines.

Figure 6 presents the error maps of the detection results. By overlaying the detection map with the boundary reference, correctly detected boundaries are marked in yellow, false detections in red and missed boundaries in green. Figure 6 gives a better intuition of the detection results: fewer red lines are observed in the FCN output compared to the other two algorithms, once again confirming that FCN has higher precision.
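The color coding of such an error map follows directly from the confusion matrix. The following is a minimal sketch, assuming boolean NumPy masks for the detection and the buffered reference; the rendering pipeline used to produce Figure 6 is not specified in the text.

```python
import numpy as np

# RGB color codes matching the legend of Figure 6.
YELLOW, RED, GREEN = (255, 255, 0), (255, 0, 0), (0, 255, 0)

def error_map(detection, buffered_reference):
    """Build an RGB error map from a detected boundary mask and the
    buffered reference mask (both boolean 2-D arrays)."""
    det = detection.astype(bool)
    ref = buffered_reference.astype(bool)
    out = np.zeros(det.shape + (3,), dtype=np.uint8)  # black background
    out[det & ref] = YELLOW    # true positives: correctly detected
    out[det & ~ref] = RED      # false positives: false detections
    out[~det & ref] = GREEN    # false negatives: missed boundaries
    return out
```

Each pixel is assigned exactly one of the three colors (or black for true negatives), so the relative amount of red directly visualizes the precision of each algorithm.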

The difference in computational cost between these methods is also worth highlighting. As mentioned earlier, training the FCN takes 6 h. Once trained, however, the FCN classifies one tile in 1 min, whereas MRS takes 38 min and gPb–owt–ucm 1.5 h.

**Figure 6.** The error map of the investigated techniques. Yellow lines are TP; red lines are FP; and green lines are FN.
