*2.2. Selection of Study Areas*

The following criteria were used to select the area for further analysis:


Based on these criteria, the city of Bielsko-Biała, located in the Silesian Voivodeship in southern Poland, was selected for further analysis. Its architecture is diverse, with both older buildings and districts of modern buildings. The city also contains industrial areas with factories and large warehouses. In terms of building density and diversity, the city has a compactly built-up centre and, within a radius of about one kilometre, a less dense suburban area. Both an orthophotomap (dated 2021, with a ground pixel size of at most 10 cm) and data from the EGiB database were available for the city.

When selecting data sources, two vector data resources were considered: EGiB and OSM. The choice between them was based on verifying how up to date each resource was relative to the 2021 orthophotomap obtained from the Polish Geoportal [26]. Figure 1 compares the EGiB and OSM data: parts common to both resources are shown in green, objects present only in the EGiB database in yellow, and objects present only in the OSM database in red.

**Figure 1.** Comparison of data from EGiB and OSM (green: common parts of both databases; yellow: objects only in the EGiB database; red: objects only in the OSM database).
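The three colour classes used in this comparison correspond to a simple overlay of the two building layers. The sketch below illustrates the idea on a coarse grid using plain Python sets. It is not the procedure used in the study: the helper names `rasterise_rect` and `compare_layers` and the rectangle coordinates are hypothetical, and real footprints would be arbitrary polygons handled by a GIS overlay tool.

```python
def rasterise_rect(xmin, ymin, xmax, ymax):
    """Return the set of integer grid cells covered by an axis-aligned rectangle.

    Hypothetical stand-in for a building footprint; cell (x, y) covers
    the half-open square [x, x+1) x [y, y+1).
    """
    return {(x, y) for x in range(xmin, xmax) for y in range(ymin, ymax)}


def compare_layers(egib_cells, osm_cells):
    """Split the covered cells into the three classes shown in the figure."""
    return {
        "common": egib_cells & osm_cells,     # green: present in both layers
        "egib_only": egib_cells - osm_cells,  # yellow: present only in EGiB
        "osm_only": osm_cells - egib_cells,   # red: present only in OSM
    }


# Made-up footprints: one building mapped in both layers with a one-cell
# shift, and one building that exists only in the EGiB layer.
egib = rasterise_rect(0, 0, 4, 4) | rasterise_rect(10, 0, 12, 2)
osm = rasterise_rect(1, 0, 5, 4)

classes = compare_layers(egib, osm)
areas = {name: len(cells) for name, cells in classes.items()}
print(areas)
```

A large share of cells in either of the two "only" classes, relative to the common class, signals a disagreement between the layers; deciding which layer is correct still requires a check against the orthophotomap.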

The analysis showed that the main problem with the OSM data is that they are outdated. The EGiB data are more up to date, more accurate and more complete, and contain no artefacts. Examples are presented in Figure 1 and, depending on the type of problem, are marked as the following areas:


• Area D: generalisation of the building outline (simplification of the building outline shape).

Since all the indicated problems may result in a much lower accuracy of the network and the wrong extraction of buildings, it was decided to use only the EGiB database in further work.

Before proceeding to further work, the input data from the EGiB database were analysed in relation to the orthophotomap. This comparison was aimed at identifying possible errors that could affect the results of the algorithm and, consequently, the possibility of extracting buildings.

Firstly, the most obvious problem identified was that the mask outlines in the dataset follow the building walls rather than the roofs. This results from the nature of the EGiB database, which stores the vertices of the building walls. Roof eaves and other elements intended to protect buildings against, for example, the degrading effect of rainwater make the outline visible on the orthophotomap larger than the wall outline. The problem is illustrated in Figure 2a. However, it was considered that this problem could be disregarded given the purpose of the study, i.e., to identify the existence of objects with an approximate outline rather than to determine their exact outline.


**Figure 2.** Errors identified in the input data. Red marks the outlines of buildings from the cadastral database. The images show, respectively: (**a**) outline along the building walls rather than along the roof, (**b**) gaps in the EGiB database, (**c**) radial displacement for tall buildings, (**d**) correct outline.

For the same reason, the problem caused by the available orthophotomap not being a true orthophotomap [28], and thus containing radial displacements that are particularly visible for tall objects, was also disregarded. True orthophotomaps are gradually being made available by the Central Office of Geodesy and Cartography. At the time of writing, i.e., the end of February 2022, the EGiB database was not available for the area covered by this type of product, and therefore that area had to be excluded as a candidate for analysis. However, in each case, the building from the EGiB database lies within the outline of the corresponding object on the orthophotomap. This problem is presented in Figure 2c.
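To illustrate why radial displacement grows with building height, the sketch below applies the standard similar-triangles relation for a vertical image over flat terrain: a roof point at height h above the ground and at ground distance R from the nadir point is projected onto the ground plane at R·H/(H − h), i.e., shifted outwards by R·h/(H − h), where H is the flying height above the ground. The flying height and radial distance used here are hypothetical values chosen only for illustration; only the 10 cm pixel size comes from the text.

```python
def relief_displacement_m(ground_radial_distance_m, object_height_m, flying_height_m):
    """Outward shift (in metres on the ground plane) of an elevated point.

    Assumes a vertical image and flat terrain. By similar triangles, the
    ray through a point at height h and ground distance R from nadir meets
    the ground at R * H / (H - h), so the shift is R * h / (H - h).
    """
    if not 0 <= object_height_m < flying_height_m:
        raise ValueError("object height must be in [0, flying height)")
    return (ground_radial_distance_m * object_height_m
            / (flying_height_m - object_height_m))


PIXEL_SIZE_M = 0.10        # ground pixel size of the orthophotomap (from the text)
FLYING_HEIGHT_M = 1530.0   # hypothetical flying height above ground
RADIAL_DISTANCE_M = 600.0  # hypothetical distance of the building from nadir

for label, height_m in (("two-storey house", 6.0), ("tall block", 30.0)):
    shift_m = relief_displacement_m(RADIAL_DISTANCE_M, height_m, FLYING_HEIGHT_M)
    shift_px = shift_m / PIXEL_SIZE_M
    print(f"{label}: roof shifted by {shift_m:.2f} m (about {shift_px:.0f} px)")
```

Under these assumed values, the roof of the taller building is shifted several times further than that of the low one, which is consistent with the displacement being most visible for tall objects.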

In addition, the EGiB database does not include objects that are not permanently connected to the ground or that are additional elements of buildings, such as terraces, summer cottages, farmhouses, outbuildings or allotments. In this respect, the database is incomplete. Therefore, during data preprocessing it was decided not to consider objects that are absent from the EGiB database. The problem is presented in Figure 2b.

Figure 2d shows an area that is not affected by any of the problems described above. This is also the case for most of the area to be analysed, so it was decided to test the segmentation of buildings on these data.
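Since the building masks discussed in this section derive from polygon vertices stored in the cadastral database, the step from a vector outline to a pixel mask can be sketched as follows. This is a minimal illustration under stated assumptions, not the preprocessing used in the study: the function names, the outline coordinates and the tile size are hypothetical, pixel centres are tested with the even-odd (ray casting) rule, and row indices are assumed to increase with the y coordinate. In practice a GIS rasterisation tool would normally be used.

```python
def point_in_polygon(x, y, vertices):
    """Even-odd rule: count crossings of a ray cast from (x, y) towards +x."""
    inside = False
    n = len(vertices)
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def rasterise_polygon(vertices, width, height, pixel_size):
    """Return a height x width list of 0/1 values, testing each pixel centre."""
    mask = []
    for row in range(height):
        y = (row + 0.5) * pixel_size
        mask.append([
            1 if point_in_polygon((col + 0.5) * pixel_size, y, vertices) else 0
            for col in range(width)
        ])
    return mask


# Hypothetical wall outline in metres, rasterised at a 10 cm pixel size.
outline = [(0.2, 0.2), (0.6, 0.2), (0.6, 0.5), (0.2, 0.5)]
mask = rasterise_polygon(outline, width=8, height=6, pixel_size=0.1)

for row in mask:
    print("".join(str(value) for value in row))
```

Because the mask is built from wall vertices, pixels showing eaves just outside the outline are labelled as background, which is the discrepancy illustrated in Figure 2a.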
