*4.3. Discussion*

The obtained results confirm that open spatial data can be used as a dataset for segmenting buildings from raster images. However, the most important issue is the required data accuracy, which must be matched to the specific task. The described approach may therefore have limitations when the goal is to segment building outlines very accurately, in which case the problems discussed in Section 2.1 may have a significant impact on the results. This section discusses the achieved results in more detail.

Firstly, comparing the results for the two datasets shows slightly higher metric values for the dataset with the smaller ground pixel size. The difference is particularly noticeable for the DeepLabV3+ architecture, where the mIoU score was 12.90 points higher, which is attributed to the specificity of this architecture. For both UNETs, the difference did not exceed 0.01 points, but a visual comparison of the outputs of these networks shows that some buildings in the dataset with the larger pixel were not classified correctly. Such a small difference in metrics is due to the size, in pixels, of the buildings in each dataset. For example, a building measuring 10 × 10 m occupies 20 × 20 pixels in the dataset with the larger pixel and 100 × 100 pixels in the dataset with the smaller pixel. A failure to segment such an object, or its being shaded in the test image, therefore has a much greater impact in the dataset with the smaller pixel, and this is reflected in the calculated metrics.
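The pixel counts in the example above can be reproduced with a short calculation. The following is a minimal sketch; the ground sampling distances of 0.5 m and 0.1 m are inferred from the stated pixel counts (10 m / 20 px and 10 m / 100 px), and the 512 × 512 tile size and the function name `footprint_pixels` are hypothetical, used only for illustration.

```python
def footprint_pixels(side_m: float, gsd_m: float) -> int:
    """Pixels covered by a square building of side `side_m` (metres)
    at a ground sampling distance of `gsd_m` (metres per pixel)."""
    side_px = round(side_m / gsd_m)
    return side_px * side_px


# GSD values inferred from the 20 x 20 and 100 x 100 pixel figures in the text.
coarse = footprint_pixels(10.0, 0.5)  # 20 x 20 pixels
fine = footprint_pixels(10.0, 0.1)    # 100 x 100 pixels
ratio = fine // coarse                # how many times more pixels at the finer GSD

# Hypothetical tile size, used only to express the footprint as a share of a tile.
TILE_PIXELS = 512 * 512
coarse_share = coarse / TILE_PIXELS
fine_share = fine / TILE_PIXELS

print(coarse, fine, ratio)
print(f"share of tile: {coarse_share:.4f} vs {fine_share:.4f}")
```

Under these assumptions, the same building covers 25 times more pixels at the finer resolution, so a single missed building removes a correspondingly larger share of the pixels that enter the metric.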

The visual comparison also shows that the dataset with the smaller ground pixel size is more effective. In its test images, all objects were correctly identified, whereas the dataset with the larger pixel size showed significantly more False Negative and False Positive areas.

A noticeable problem for both datasets is shaded and wooded areas. Shadows cast by tall buildings cover or shade the objects directly below them. The result is pixel misclassification or only partial segmentation of the object. This problem mainly concerns garages and extensions to the main building. Similar results are produced for images in which vegetation, usually trees, covers the objects. While the first problem can be eliminated by creating a true orthophotomap, the second will always occur when photogrammetric digital cameras are used. A possible solution is to use LiDAR and to add further analysis dimensions to the RGB colours.

The loss function that was used achieved its purpose. The aim of using Dice loss was to maintain a balance between FP and FN values. The obtained results, i.e., similar Recall and Precision values, show that this aim can be considered satisfied.
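The link between the Dice loss and the balance of FP and FN can be illustrated with a small example. This is a sketch under stated assumptions, not the implementation used in the experiments: it uses a common soft Dice formulation with a smoothing term `eps`, and the toy masks and function names are hypothetical.

```python
def confusion_counts(pred, truth):
    """Count true positives, false positives and false negatives for binary masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(pred, truth) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(pred, truth) if p == 0 and t == 1)
    return tp, fp, fn


def dice_loss(probs, truth, eps=1e-7):
    """Soft Dice loss: 1 - 2*|P.G| / (|P| + |G|); `eps` avoids division by zero.
    This formulation is an assumption; the exact variant used may differ."""
    intersection = sum(p * t for p, t in zip(probs, truth))
    total = sum(probs) + sum(truth)
    return 1.0 - (2.0 * intersection + eps) / (total + eps)


def precision_recall(pred, truth):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    tp, fp, fn = confusion_counts(pred, truth)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


# Hypothetical flattened masks: 1 = building, 0 = background.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0]  # one missed pixel (FN) and one spurious pixel (FP)

tp, fp, fn = confusion_counts(pred, truth)
loss = dice_loss([float(p) for p in pred], truth)
# For hard masks the Dice coefficient equals 2TP / (2TP + FP + FN).
dice_from_counts = 2 * tp / (2 * tp + fp + fn)
precision, recall = precision_recall(pred, truth)

print(tp, fp, fn, round(loss, 4), precision, recall)
```

For hard masks the Dice coefficient reduces to 2TP / (2TP + FP + FN), so FP and FN enter the denominator with equal weight. Neither error type is favoured by the loss, which is consistent with obtaining similar Precision and Recall values.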

The mIoU values obtained for building segmentation are similar to those achieved by other researchers on public datasets, where the reported value is typically around 90%.
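For reference, the mIoU metric can be computed as sketched below. The sketch assumes that mIoU is the mean of the per-class IoU over the two classes, building and background; the averaging convention used in the experiments is not restated here, and the toy masks and function names are hypothetical.

```python
def iou_for_class(pred, truth, cls):
    """Intersection over union for one class; None if the class is absent
    from both the prediction and the ground truth."""
    intersection = sum(1 for p, t in zip(pred, truth) if p == cls and t == cls)
    union = sum(1 for p, t in zip(pred, truth) if p == cls or t == cls)
    return intersection / union if union else None


def mean_iou(pred, truth, classes=(0, 1)):
    """Mean of the per-class IoU values, skipping classes absent from both masks."""
    scores = [iou_for_class(pred, truth, c) for c in classes]
    scores = [s for s in scores if s is not None]
    return sum(scores) / len(scores)


# Hypothetical flattened masks: 1 = building, 0 = background.
truth = [1, 1, 1, 1, 0, 0, 0, 0]
pred = [1, 1, 1, 0, 1, 0, 0, 0]

building_iou = iou_for_class(pred, truth, 1)
background_iou = iou_for_class(pred, truth, 0)
miou = mean_iou(pred, truth)

print(building_iou, background_iou, miou)
```

Because the background class usually dominates the image and is segmented well, averaging over both classes tends to yield a higher value than the building IoU alone, which should be kept in mind when comparing values across publications.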

The limitations of the dataset used, as described in this section and in Section 2.2, may be difficult to overcome fully. One possible solution is to expand the dataset to include other areas that provide new learning patterns. The analysed dataset was also not varied in terms of lighting, so applying the algorithm to images with different histogram characteristics may not give satisfactory results. Further tests on more, and more diverse, areas are therefore a possible next step. In addition, a full assessment of the accuracy of the obtained results requires a comparison with fully correct, manually digitised building outlines.

The applied models and their hyperparameters can be optimised. However, the aim of this paper was not to develop new architectures or approaches, but to verify the possibility of using open data to generate data for training sets to solve the semantic segmentation problem.
