*Article* **Using Open Vector-Based Spatial Data to Create Semantic Datasets for Building Segmentation for Raster Data**

**Szymon Glinka, Tomasz Owerko and Karolina Tomaszkiewicz \***

> Faculty of Geo-Data Science, Geodesy, and Environmental Engineering, AGH University of Science and Technology, al. Mickiewicza 30, 30-059 Krakow, Poland; glinka@agh.edu.pl (S.G.); owerko@agh.edu.pl (T.O.)
>
> **\*** Correspondence: tomaszki@agh.edu.pl

**Abstract:** With increasing access to open spatial data, it is possible to improve the quality of analyses carried out in the preliminary stages of the investment process. The extraction of buildings from raster data is an important process, especially for urban, planning and environmental studies. After processing, it allows the buildings registered in a given image to be represented, e.g., in a vector format. With an up-to-date image, it is possible to obtain current information on the location of buildings in a defined area. At the same time, recent years have seen huge progress in the use of machine learning algorithms for object identification. In particular, semantic segmentation algorithms built on deep convolutional neural networks, which extract features from an image by means of masking, have proven effective for this task. The main problem with applying semantic segmentation is the limited availability of masks, i.e., labelled data for training the network. Creating datasets by manually labelling data is a tedious, time-consuming and capital-intensive process. Furthermore, any errors may be reflected in later analysis results. Therefore, this paper aims to show how cadastral data from open spatial databases can be used to automate data labelling for convolutional neural networks, and how buildings can be identified and extracted from high-resolution orthophotomaps on the basis of these data. The research conducted has shown that automatic feature extraction using ML-based semantic segmentation on the basis of data from open spatial databases is possible and can provide results of adequate quality.

**Keywords:** semantic segmentation; open data; deep learning; building extraction; U-Net; DeepLab
