Article

Irregular Facades: A Dataset for Semantic Segmentation of the Free Facade of Modern Buildings

1 College of Architecture, Nanjing Tech University, Nanjing 211816, China
2 College of Art & Design, Nanjing Tech University, Nanjing 211816, China
* Author to whom correspondence should be addressed.
Buildings 2024, 14(9), 2602; https://doi.org/10.3390/buildings14092602
Submission received: 16 July 2024 / Revised: 14 August 2024 / Accepted: 21 August 2024 / Published: 23 August 2024
(This article belongs to the Section Architectural Design, Urban Science, and Real Estate)

Abstract

Semantic segmentation of building facades has enabled much intelligent support for architectural research and practice in the last decade. Faced with the free facades of modern buildings, however, segmentation accuracy decreases significantly, partly because of their low compositional regularity. A freely organized facade composition tends to weaken the features of different elements and thus increases the difficulty of segmentation. At present, the existing facade datasets for semantic segmentation tasks were mostly developed from classical facades, which are organized regularly. To train pixel-level classifiers for free facade segmentation, this study developed a finely annotated dataset named Irregular Facades (IRFs). The IRFs consist of 1057 high-quality facade images, mainly in the modernist style. In each image, the pixels were labeled into six classes, i.e., Background, Plant, Wall, Window, Door, and Fence. A multi-network cross-dataset control experiment demonstrated that the IRFs-trained classifiers segment the free facades of modern buildings more accurately than those trained with existing datasets. The former show a significant advantage in average WMIoU (0.722) and accuracy (0.837) over the latter (average WMIoU: 0.262–0.505; average accuracy: 0.364–0.662). In the future, the IRFs are also expected to serve as a baseline for coming datasets of freely organized building facades.

1. Introduction

Images of building facades carry much information about design. With the rapid progress of computer vision (CV) technology, image-based research and practice in architecture have become a promising field [1]. Many intelligent applications, such as knowledge discovery and 3-D reconstruction [2,3], have been developed to support architects in various ways. For the design information to be used further, it is necessary to segment the building facades accurately [4].
In the past few years, scholars have explored the semantic segmentation of building facades. Training classifiers to recognize facade elements of different classes is a dominant direction of these studies [5]. Martinović et al. proposed a three-layered approach (ATLAS) to improve the segmentation of rectified building facades by combining classifiers and object detectors [6]. For unrectified facades, Lotte et al. integrated a convolutional neural network (CNN), Structure-from-Motion (SfM), and Multi-View-Stereo (MVS) to segment facades with high spectral and spatial variability [7]. Focusing on the segmentation of windows, Neuhausen and König trained soft cascaded classifiers to detect windows in ground-view facade images [8]. By using a spatial attention and relation module, Mask R-CNN, an advanced CNN architecture, was optimized to improve the segmentation of windows at the instance level [9]. Mao et al. devised a projection-based method [10] to segment the glass facade in oblique aerial images with geometric deformation and distortion. Besides machine learning, Oskouie, Becerik-Gerber, and Soibelman used the gradient profile of images to predict the locations of the common facade elements (i.e., windows and doors) [11]; these locations were then used to extract pixel- and texton-level local and global features to classify the facade elements.
The studies mentioned above have achieved high accuracy in the segmentation of classical facades. Compared to classical facades, the free facades of modern buildings show a much lower regularity of composition [12]. Moreover, the extensive use of glass blurs the distinctions between different elements [13]. The optical properties of glass make it easy to reflect the outdoor environment or transmit the indoor environment, and therefore weaken the features of facade elements [14]. Faced with free facades, the accuracy of the abovementioned methods drops sharply [7]. This situation can be partly attributed to the fact that most previous studies focused only on improving training technology; the development of training samples has received less attention than it should have.
At present, classifiers for facade segmentation are mostly trained with a few existing datasets, including the CMP Facade Database [15], ADE20K dataset [16], ECP Paris [17], eTRIMS Image Database [18], ICG Graz50 [19], etc. These datasets provide a large number of finely annotated facade images. However, most of the images in them are of classical facades. Although some scholars have noticed the importance of architectural style and contributed several pixel-level annotated datasets of modern building facades, such as Comprehensive Facade Parsing [20] and the Glass Facade Segmentation Dataset [21], their street-view and oblique aerial samples suffer from severe perspective distortion and unexpected environmental disturbances. Moreover, the incomplete classes (GFSD) and the high proportion of regular facades (CFP) also limit their scope of application. Classifiers cannot learn enough features of free composition from these datasets and, therefore, show weak performance in the segmentation of free building facades.
Several latest studies aim to enhance images with reinforcement learning (RL). Most explorations focused on solving the deficiencies of image color, such as uneven exposure [22], low light [23], color deviations, and low contrasts [24,25]. In addition to colors, scholars also paid attention to the RL-based denoising [26], restoration [27], and super-resolution [28] of images. The methods proposed in these studies can improve the quality of building images from multiple aspects. However, they do not change the architectural style of the facade in images. In other words, classifiers still cannot learn the logic of free composition from the classical facades even though the images have been enhanced.
Considering that modern buildings are the most common style in cities and their number is growing much faster than that of other architectural styles, suitable samples are urgently needed to train classifiers that segment the free facade accurately. This study provides a pixel-level annotated dataset of low-regularity facades, named Irregular Facades (IRFs). With this specialized dataset, classifiers can learn the features of freely composed facade elements and thus segment free building facades accurately. The IRFs were developed based on 1057 facade images in which the facade elements were annotated into six classes, namely Background, Plant, Wall, Window, Door, and Fence. To demonstrate the IRFs' effect on classifier training, a multi-network cross-dataset control experiment was conducted. The results indicate that the classifiers trained with the IRFs reached higher accuracy on free facade segmentation than those trained with existing datasets. At the image level, the IRFs-trained classifiers (average WMIoU: 0.722; average accuracy: 0.837) show a significant advantage over those trained with the existing datasets (average WMIoU: 0.262–0.505; average accuracy: 0.364–0.662). The similar trend in the class- and subclass-level metrics, including IoU and F1-score, suggests that the advantage is comprehensive and reliable.
The rest of this paper is organized as follows. Section 2 introduces the dataset we developed in detail and compares it with several highly rated datasets. The conditions, process, and results of the control experiment are reported in Section 3. Section 4 and Section 5 discuss the advantages and limitations of this study, respectively. The main contributions of this study are summarized in Section 6.

2. The Dataset of Irregular Facades (IRFs)

In order to train the classifier that is sensitive to the free facades of modern buildings with low regularity, a dataset named Irregular Facades (IRFs) was developed. Most of the facades in the IRFs are of modernist style and designed freely.

2.1. Overview of the IRFs

The IRFs contain 1057 images of building facades from 104 countries (1895–2023). These images were selected and downloaded from ArchDaily, a professional website of architecture, with official authorization.

2.1.1. Image Sizes

Images in the IRFs were cropped from the original version uploaded to ArchDaily. A total of 82.9% of the images have an aspect ratio in the range of 1/3 to 3, which means that the long side of the images would not be more than three times the short side. As many convolutional neural network (CNN) architectures only accept inputs of the same size, the moderate aspect ratio helps avoid excessive distortion when resizing the images. Moreover, to show the features of different elements clearly, most images (99.5%) are in high definition, with more than 100,000 pixels. The lowest pixel number in the IRFs is 62,040 (376 × 165 pixels). Figure 1 and Figure 2 show the distribution of aspect ratio and pixel number of the IRFs.
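These statistics can be reproduced with a few lines of code. The following is a minimal sketch, assuming the images are stored locally in a hypothetical folder irfs/images; it measures the aspect ratio and pixel count of each image with Pillow.

```python
from pathlib import Path
from PIL import Image

ratios, pixel_counts = [], []
for path in sorted(Path("irfs/images").glob("*.jpg")):   # hypothetical local folder
    with Image.open(path) as img:
        w, h = img.size
    ratios.append(max(w, h) / min(w, h))                  # long side / short side
    pixel_counts.append(w * h)

moderate = sum(r <= 3 for r in ratios)                    # aspect ratio within [1/3, 3]
high_def = sum(p >= 100_000 for p in pixel_counts)        # at least 100,000 pixels
print(f"aspect ratio within [1/3, 3]: {moderate / len(ratios):.1%}")
print(f"at least 100,000 pixels:      {high_def / len(pixel_counts):.1%}")
```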

2.1.2. Image Features

We specifically calculated the distribution of glass elements (except windows) in the IRFs, including glass doors and fences. As shown in Figure 3, the proportions of samples containing glass fence (158 + 15 images) and non-glass fence (280 + 15 images) are close. This is helpful for improving the sensitivity of the classifier to fences of different materials. On the contrary, non-glass door (165 + 46 images) only accounts for a small proportion compared to glass door (585 + 46 images). The different proportions make it easier to observe whether glass elements have an impact on classifier training.
According to the distributions of glass door and fence, the images in the IRFs can be categorized into four kinds, namely images with both glass doors and fences, with glass doors only, with glass fences only, and without glass doors or fences. Classifiers thus gain better adaptability to the coexistence of multiple glass elements.
Given the association of the environment with the visual characteristics of glass elements, four environmental features, including Time, Weather, Light, and Foreground, were annotated as the labels of each image. Specifically, Time and Weather indicate the degree to which glass elements reflect the external environment. Light determines whether a glass element reflects indoor objects. Foreground exerts a significant effect on how clearly the facades can be perceived. These labels help divide samples evenly for training. Figure 4 shows the proportions of different labels among the IRFs.

2.2. Classes and Annotations

2.2.1. Classes

We defined six common classes of facade images, including Background, Plant, Wall, Window, Door, and Fence. Classes that are too rare were ignored. Labelme v5.1.1 [29], an open-source annotation tool, was used to annotate the images according to these classes. The objects were annotated from back to front with layered polygons such that no boundary was marked more than once, and no pixel was unmarked. Each annotation thus implicitly provides a depth ordering of the objects in the scene. Figure 5 illustrates the process of sample annotation. The quality control of annotation required fifteen minutes on average for one image.
Note that only dense trees were annotated as Plant, while the region of sparse trees was annotated based on the object behind it. This strategy aims to train classifiers that are more robust to disturbing factors in the environment.
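For readers who rasterize the annotations themselves, the sketch below is a minimal example, assuming the standard Labelme JSON layout (keys "shapes", "points", "imageWidth", and "imageHeight"). Painting the polygons in their stored back-to-front order lets nearer objects overwrite the ones behind them, which reproduces the implicit depth ordering described above.

```python
import json
import numpy as np
from PIL import Image, ImageDraw

CLASSES = ["Background", "Plant", "Wall", "Window", "Door", "Fence"]
CLASS_ID = {name: i for i, name in enumerate(CLASSES)}

def labelme_to_mask(json_path: str) -> np.ndarray:
    """Rasterize a Labelme annotation into a single-channel class-index mask."""
    with open(json_path) as f:
        ann = json.load(f)
    mask = Image.new("L", (ann["imageWidth"], ann["imageHeight"]), CLASS_ID["Background"])
    draw = ImageDraw.Draw(mask)
    # Shapes are stored in annotation order (back to front); painting them in
    # sequence lets nearer objects cover the ones behind them.
    for shape in ann["shapes"]:
        polygon = [tuple(point) for point in shape["points"]]
        draw.polygon(polygon, fill=CLASS_ID[shape["label"]])
    return np.asarray(mask)
```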

2.2.2. Number of Pixels and Elements

The total pixel number and element number of the IRFs are 1.3 × 10^9 and 26,057, respectively. On average, each image consists of 1.26 × 10^6 pixels and 24.7 elements. The large amount of annotation (Figure 6) provides sufficient high-quality samples for classifier training.

2.3. Comparison to Existing Datasets

To indicate the quantity and quality of the IRFs, it was compared with five existing datasets of semantically annotated building facades, i.e., CMP Facade Database [15], ADE20K dataset [16], ECP Paris [17], eTRIMS Image Database [18], and ICG Graz50 [19].

2.3.1. Data Size

As shown in Figure 7, the IRFs have an obvious advantage in the number of samples compared to the existing datasets. The large number of samples can be flexibly divided according to various needs, thus enabling the IRFs to adapt to different networks widely. In terms of pixels per image, the IRFs reach 1.9–12.6 times as much as the existing datasets. That guarantees the quality of the IRF images.

2.3.2. Architectural Style

Because the feature of facade composition varies greatly among architectural styles, the style distribution of the dataset determines the applicable scope of the trained classifier to a large extent. In addition, for classical facades, glass brings almost no difficulty to semantic segmentation, as it only occurs on the elements that are small, regular-shaped, and neatly arranged. However, the interference of glass becomes serious when segmenting freely designed facades. As shown in Figure 8, the proportion of irregular facades in the IRFs is much higher than those of existing datasets. This is favorable for training the semantic classifiers for the free facade of modern buildings.

2.3.3. Class Proportion

Every dataset has a unique class system. To compare the IRFs and existing datasets, the classes of existing datasets were merged based on the six classes in the IRFs, as shown in Table 1. Although the IRFs’ definition of classes is not as detailed as that of existing datasets, it covers the main types of elements on building facades. Figure 9 shows the proportion of pixels and elements by class of the datasets.

2.3.4. Complexity

As shown in Figure 10, the number of classes and elements per image in the IRFs is comparable to existing datasets. At the dataset level (Table 2), the IRFs have much more pixels of glass doors and fences because most images in the existing datasets are about classical facades. The number and proportion of images containing glass doors and fences in the IRFs are also at a high level. Moreover, there are many facades covered by the foreground, mainly plants, in the IRFs. The higher complexity of the IRFs implies that the classifiers trained with the IRFs would understand the free facades in the real environment better.

3. Experiment

To demonstrate the performance of the IRFs for training the classifier for free facade segmentation, a multi-network cross-dataset control experiment was performed.

3.1. Datasets and Networks

3.1.1. Datasets

In addition to the IRFs, five existing datasets of pixel-level annotated facade images were introduced as competitors, including the CMP Facade Database [15], ADE20K dataset [16], ECP Paris [17], eTRIMS Image Database [18], and ICG Graz50 [19], whose basic information is analyzed comprehensively in Section 2.3. We chose the facade datasets according to the following standards:
  • Viewed from the front. Facades in the datasets should be viewed from the front orthographically or with only weak perspective distortion;
  • Openly available. Only the datasets that can be accessed from open sources were adopted in this experiment;
  • Widely used. The wide applications guarantee the dataset’s effectiveness in training classifiers for facade segmentation;
  • Highly rated. Good reputation indicates the high quality of the datasets.
Moreover, as some of these datasets were devised for multiple computer vision tasks, there are many other samples besides building facades, such as street and interior scenes. We removed the irrelevant samples, and only the building facades with pixel-level annotation remained for the experiment. Figure 11 shows the examples of the samples of each dataset adopted in the experiment.

3.1.2. Networks

To mitigate the effect of the possible bias of the networks on different datasets, five high-performance convolutional neural networks, U-Net [30], DeepLabv3+ [31], SegNeXt [32], HRNet [33], and PSPNet [34], were used to train classifiers independently. These networks were developed based on several classic backbones and recognized as state-of-the-art when proposed. Since the data size of some datasets is limited, all networks had been pretrained before the experiment to sufficiently utilize the samples.
Specifically, the networks were selected as their characteristics meet the needs of our facade segmentation experiment in different aspects, as shown below:
  • U-Net only needs very few annotated images to train classifiers. This makes it well adaptable to datasets with a small number of samples, including ECP Paris, eTRIMS, and Graz50;
  • DeepLabv3+ shows excellent performance on the images with reduced feature resolution, thus learning from the existing datasets effectively, especially ADE20K;
  • SegNeXt achieves efficient multi-scale information interaction at an acceptable computational cost, making it sensitive to objects of different sizes, like the elements on building facades;
  • HRNet maintains high-resolution representations through the whole training process so that the details of the boundaries between the facade elements of different classes can be perceived more accurately;
  • PSPNet fully uses the scenery context features that are particularly important for the free facade elements with irregular shapes and textures.

3.2. Process

The experiment was conducted in four steps, including sample splitting, pre-processing, training and testing, and assessment, as shown in Figure 12.

3.2.1. Sample Splitting

The samples of the existing datasets were allocated randomly into training (80%) and validation (20%) sets.
For the IRFs, the samples were split in the following steps:
  • In order to calculate the metrics of glass and non-glass elements from samples in which the materials were not specifically labeled, 61 samples in which a glass door (or fence) and a non-glass door (or fence) both appeared were excluded, so that each image contains the door (or fence) of only one material;
  • The remaining samples (n = 996) were assigned to four groups according to the distribution of glass elements, as shown in Table 3;
  • The samples of each group were split randomly into training (70%), validation (10%), and testing (20%) sets, as sketched below.
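The grouped split can be illustrated as follows; this is a sketch, assuming `samples` is a list of records carrying a "group" field with the values B, D, F, or N defined in Table 3.

```python
import random
from collections import defaultdict

def split_irfs(samples, seed=0):
    """Group-wise 70/10/20 split; each sample carries a 'group' field (B, D, F, or N)."""
    random.seed(seed)
    by_group = defaultdict(list)
    for sample in samples:
        by_group[sample["group"]].append(sample)
    train, val, test = [], [], []
    for group_samples in by_group.values():
        random.shuffle(group_samples)
        n_train = round(0.7 * len(group_samples))
        n_val = round(0.1 * len(group_samples))
        train += group_samples[:n_train]
        val += group_samples[n_train:n_train + n_val]
        test += group_samples[n_train + n_val:]
    return train, val, test
```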

3.2.2. Pre-Processing

Samples of the datasets were standardized to ensure the same environment for training. First, all images and annotations were resized to 512 × 512. Second, the classes of existing datasets were merged into at most six classes, as shown in Table 1.
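A minimal sketch of this standardization is given below, assuming images are stored as RGB files and annotations as single-channel index masks. Nearest-neighbor interpolation keeps the mask labels discrete; the id mapping shown is an illustrative fragment only, and the real merging rules are those of Table 1.

```python
import numpy as np
from PIL import Image

TARGET = (512, 512)
# Hypothetical id mapping (illustration only): original class id -> merged IRFs id.
OLD_TO_IRFS = {0: 0, 1: 0, 2: 0, 3: 0,   # e.g., Sky, Pavement, Road, Car -> Background
               4: 1,                     # e.g., Vegetation -> Plant
               5: 2, 6: 3, 7: 4}         # e.g., Building -> Wall, Window, Door

# Build a lookup table so the mask can be remapped in one vectorized step.
LUT = np.zeros(256, dtype=np.uint8)
for old_id, new_id in OLD_TO_IRFS.items():
    LUT[old_id] = new_id

def preprocess(image_path: str, mask_path: str):
    image = Image.open(image_path).convert("RGB").resize(TARGET, Image.BILINEAR)
    mask = Image.open(mask_path).resize(TARGET, Image.NEAREST)   # keep labels discrete
    return np.asarray(image), LUT[np.asarray(mask)]
```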

3.2.3. Training and Test

The training and validation set of six datasets, including the IRFs and five existing datasets, were adopted to train classifiers and tune hyperparameters. Thirty classifiers were trained through the pairwise combinations of six datasets and five selected networks. Each one of these classifiers was used to segment the images in the testing set split from the IRFs separately.
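Schematically, the 6 × 5 = 30 runs can be organized as in the sketch below. The helper train_classifier is a placeholder, not the actual training code (in practice, the runs were configured with mmsegmentation); the loop only illustrates how every dataset is paired with every network.

```python
from itertools import product

DATASETS = ["IRFs", "CMP", "ADE20K", "ECP Paris", "eTRIMS", "Graz50"]
NETWORKS = ["U-Net", "DeepLabv3+", "SegNeXt", "HRNet", "PSPNet"]

def train_classifier(dataset: str, network: str) -> str:
    """Placeholder: train `network` on `dataset` and return a checkpoint path."""
    raise NotImplementedError

checkpoints = {}
for dataset, network in product(DATASETS, NETWORKS):
    # 6 datasets x 5 networks = 30 classifiers; each is later evaluated
    # on the testing set split from the IRFs.
    checkpoints[(dataset, network)] = train_classifier(dataset, network)
```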

3.2.4. Assessment

Intersection over Union (IoU) [35] was introduced as the main metric to assess the prediction at the class and subclass levels. WMIoU, the mean IoU weighted by the number of pixels in each class, measured the prediction at the image level. In addition, accuracy and F1-score were adopted to assist in the assessment. Each classifier's performance in the test was measured by the metrics at three levels, as shown in Table 4.
The class-level metrics were computed directly according to the labels of six classes in the testing set. As no sample has both glass and non-glass elements of the same class, we applied the prediction of Door and Fence to compute the subclass-level metrics. The WMIoU and accuracy measured the performance of each classifier at the image level. Specifically, the metrics were computed as follows:
$$\mathrm{IoU} = \frac{\sum_{i=0}^{j-1} TP_i}{\sum_{i=0}^{j-1}\left(TP_i + FP_i + FN_i\right)}$$

$$w = \frac{P_C}{P_A}$$

$$\mathrm{WMIoU} = \sum_{m=0}^{n-1} \mathrm{IoU}_m \times w_m$$

$$\mathrm{Accuracy} = \frac{\sum_{i=0}^{j-1}\left(TP_i + TN_i\right)}{\sum_{i=0}^{j-1}\left(TP_i + TN_i + FP_i + FN_i\right)}$$

$$\mathrm{Precision} = \frac{\sum_{i=0}^{j-1} TP_i}{\sum_{i=0}^{j-1}\left(TP_i + FP_i\right)}$$

$$\mathrm{Recall} = \frac{\sum_{i=0}^{j-1} TP_i}{\sum_{i=0}^{j-1}\left(TP_i + FN_i\right)}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
where j and n represent the number of samples and of classes, respectively; TP, TN, FP, and FN represent the number of predicted pixels that were true positive, true negative, false positive, and false negative; PC represents the number of pixels of each class; and PA represents the number of all pixels.
Note that, as some classifiers cannot recognize all classes in the testing set, the pixels of unrecognized classes were not adopted for the calculation of their class- and image-level metrics.
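To make the formulas concrete, the following sketch computes them for a single predicted mask; `pred` and `gt` are assumed to be integer label masks of the same shape, and classes absent from a classifier's training data would simply be skipped (cf. the note above). The accuracy computed here is the overall pixel accuracy.

```python
import numpy as np

def evaluate(pred: np.ndarray, gt: np.ndarray, n_classes: int):
    """Per-class IoU/F1 and image-level WMIoU/accuracy for one predicted mask."""
    ious, f1s, weights = [], [], []
    for c in range(n_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        iou = tp / (tp + fp + fn) if (tp + fp + fn) else 0.0
        precision = tp / (tp + fp) if (tp + fp) else 0.0
        recall = tp / (tp + fn) if (tp + fn) else 0.0
        f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
        ious.append(float(iou))
        f1s.append(float(f1))
        weights.append(np.sum(gt == c) / gt.size)            # w = P_C / P_A
    wmiou = float(np.dot(ious, weights))                      # WMIoU = sum of IoU_m * w_m
    accuracy = float(np.mean(pred == gt))                     # overall pixel accuracy
    return {"IoU": ious, "F1": f1s, "WMIoU": wmiou, "accuracy": accuracy}
```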

3.3. Results

The complete experimental results and the examples of the segmented facade are shown in Table A1 and Figure 13, respectively. To focus on the effect of different datasets in classifier training, we calculated the average metrics of the classifiers trained with the same dataset by five networks as the brief experimental results shown in Figure 14.

3.3.1. Image Level

As shown in Figure 14a, the classifiers trained with the IRFs, whose A-WMIoU and A-accuracy reached 0.722 and 0.837, respectively, show a large advantage (A-WMIoU higher by 0.217–0.460; A-accuracy higher by 0.175–0.473) over those trained with existing datasets at the image level. This significant advantage demonstrates the importance of architectural style in training, and the highly similar trends of A-WMIoU and A-accuracy ensure its reliability. Classifiers that segment the free facade of modern buildings accurately must be trained with samples of the same style. Although the existing datasets have been applied to many computer vision tasks on building facades, the classifiers they trained were not effective in recognizing the facades of non-classical buildings.
Some other indications can be found in the comparison of the image-level metrics between the classifiers trained with existing datasets. The CMP Facade Database shows the weakest performance in training classifiers for free facade segmentation (A-WMIoU = 0.262; A-accuracy = 0.364), although it has the largest data size besides the IRFs. As it is the only dataset augmented by partially masking samples with color blocks, we prefer to assume that what the classifiers learn from the augmented samples negatively affects their understanding of the non-augmented images. The classifiers trained with ADE20K also segmented the testing samples imprecisely (A-WMIoU = 0.311; A-accuracy = 0.439). This is most likely due to the dataset's low resolution and strong perspective distortion.
On the contrary, the classifiers trained with ECP Paris, eTRIMS, and Graz50 reached higher A-WMIoU (maximum of 0.505) and A-accuracy (maximum of 0.662), though each of these datasets contains only a few dozen samples. This indicates that proper samples are much more important than data size for training classifiers, especially when using increasingly smart networks.

3.3.2. Class Level

The class-level metrics, shown in Figure 14b–g, help reveal the reason for the difference between the image-level metrics. Both the A-IoU and A-F1-score of all classes of IRFs-trained classifiers are much higher than those of the other classifiers. The basically consistent degrees of advantage at the class level and image level suggest that the advantage of IRFs-trained classifiers is comprehensive, reliable, and significant.
For the classifiers trained with each existing dataset, the A-IoU of class Wall is 0.079–0.180 higher than the A-WMIoU, while the A-IoUs of the other classes are much lower. In other words, because the difference among architectural styles is much greater for facade elements than for building outlines, classifiers can roughly learn the features of the building outline from classical-style samples but only a few details of the freely designed facade elements.
Another noteworthy phenomenon, shown in Figure 14e, is that the classifiers trained with ADE20K recognized almost no pixels of Classes Window (A-IoU = 0.000; A-F1-score = 0.000) and Door (A-IoU = 0.009; A-F1-score = 0.005) in the test, no matter which network they were trained by, as shown in Table A1. This is unexpected, given that ADE20K has been widely applied in many CV tasks. A possible explanation is that the large number of perspective images in ADE20K weakens the feature of windows. Its low resolution might further exacerbate the feature weakening of the window.

3.3.3. Subclass Level

Figure 14h–k show the IoU of doors and fences of different materials. As shown in Figure 14j,k, the IRFs-trained classifiers' metrics of subclasses Glass Fence (A-IoU = 0.603; A-F1-score = 0.747) and Non-Glass Fence (A-IoU = 0.601; A-F1-score = 0.750) are very close. This situation is understandable since the proportions of images containing glass and non-glass fences are close in the training and testing sets. However, a similar situation is also observed in the metrics of subclasses Glass Door (A-IoU = 0.361; A-F1-score = 0.527) and Non-Glass Door (A-IoU = 0.333; A-F1-score = 0.489), which are distributed unevenly among the samples of the IRFs, as shown in Figure 14h,i.
The datasets CMP and ECP Paris provide a possible explanation for this phenomenon. Because most images in these two datasets are of classical facades, there are many non-glass fences and almost no glass fences. The classifiers trained with these datasets showed a significant difference in the metrics of subclasses Glass Fence (CMP: A-IoU = 0.117, A-F1-score = 0.210; ECP: A-IoU = 0.053, A-F1-score = 0.100) and Non-Glass Fence (CMP: A-IoU = 0.306, A-F1-score = 0.465; ECP: A-IoU = 0.261, A-F1-score = 0.410) in the test.
The above analyses imply that the existence, rather than the quantity, of the facade elements of different materials is the key to improving the quality of samples. To train the classifier for free facade segmentation, it would be more effective to apply the datasets in which the elements of the same class are made of diverse materials.

4. Discussion

In this study, we developed the dataset IRFs that improve the performance of classifiers for free facade segmentation by annotating irregular facades as training samples. The results of our experiment demonstrate that classifiers trained with the irregular samples segment the free facade more accurately than those trained with the samples of classical facades. The advantage occurring among several classic networks shows that the effect of the developed dataset is reliable.
To cross-validate the results of our experiment, two similar experiments from previous studies were introduced. Lotte et al. used several classical facade datasets similar to ours to train classifiers and then tested the classifiers with their own dataset of free facades (SJC) [7]. As shown in Table 5, their segmentation accuracy, which is even lower than ours, also suggests that classifiers cannot segment the free facade accurately based on the knowledge learned from classical facades. On the contrary, B. Wang et al. applied samples of free facades to both training and testing in their study [20]. Although their CFP samples were mainly taken from the street view, they achieved experimental results very similar to ours (Table 6). Based on the comparisons with these studies, we can conclude that classifiers can only accurately segment facades of the same architectural style as their training samples.
As shown in Figure 15, based on the accurate segmentation, the design information can be further extracted as a graph of facade elements, which illustrates the facade composition more directly than the original image [36]. For architects, it is easier to discover and understand the compositional logic and characteristics from the graphs. Moreover, as structured data, the graphs are compatible with many analysis methods. For example, by applying clustering on the graphs of facade elements, architects can efficiently obtain the main compositional patterns of the existing facades within a specific scope. This is helpful for architectural design and research in many fields, such as architectural innovation, typological studies, etc. Faced with the great number of building images in the big data era, the accurate segmentation on free facades makes the quantitative analysis of modern building appearance feasible.
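As an illustration of this further use (a simple sketch, not the FDIE implementation of [36]), the code below derives a basic graph from a predicted mask: every connected region of a non-background class becomes a node carrying its class, area, and centroid, and an edge links two regions that touch each other.

```python
import numpy as np
from scipy import ndimage

CLASSES = ["Background", "Plant", "Wall", "Window", "Door", "Fence"]

def facade_graph(mask: np.ndarray):
    """Nodes = connected regions of non-background classes; edges = touching regions."""
    nodes, regions = [], []
    for class_id in range(1, len(CLASSES)):                  # skip Background
        labeled, n_regions = ndimage.label(mask == class_id)
        for i in range(1, n_regions + 1):
            region = labeled == i
            ys, xs = np.nonzero(region)
            nodes.append({"class": CLASSES[class_id],
                          "area": int(region.sum()),
                          "centroid": (float(ys.mean()), float(xs.mean()))})
            regions.append(region)
    edges = []
    for a, region_a in enumerate(regions):
        grown = ndimage.binary_dilation(region_a)            # touching = adjacent
        for b in range(a + 1, len(regions)):
            if np.any(grown & regions[b]):
                edges.append((a, b))
    return nodes, edges
```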
Moreover, this study suggests that it is necessary to develop more annotated architectural image datasets of diverse styles. As a classifier is trained with samples by networks, both datasets and networks are important to the performance of the classifier. Accurate semantic segmentation of building facades in various styles cannot be achieved by the progress of CV techniques independently. The advantages of the classifiers trained with the IRFs remind us again that the classifier can only recognize what it has been taught by the samples [37]. Although some intelligent methods can generate coarse labels under certain conditions [38], the precise ground truth still must be manually annotated by humans. For various intersectional fields of CV application, corresponding samples and datasets are always indispensable for training better classifiers.
In addition to architectural style, several following indications of the sample have also been noticed to affect the performance of the classifier trained to segment building facades.
  • The quality of the training sample is more important than the quantity in classifier training. With the progress of CV technology, the priority of sample quality may continue to increase. Besides resolution, sample quality can also be improved in terms of classes, perspectives, etc.;
  • Because of the obvious difference between buildings and environments, the knowledge of building outline is easier to transfer among different architectural styles than that of facade elements. To obtain the composition of single buildings from street or aerial viewed images that include multiple buildings in the visual field, the instance segmentation of buildings is more important than distinguishing buildings from the environment;
  • The existence, rather than the quantity, of the facade elements of the same class but different materials in samples is considered the key to improving the classifier’s adaptability. To train classifiers that understand the color, texture, and transparency of various materials, each material must be presented in the samples, but not necessarily in large quantities nor evenly distributed.
These indications are expected to guide the development of new semantic segmentation datasets of building facades. Further rigorous tests, however, must be conducted in advance to confirm the scope and effect of these indications.

5. Limitation

The testing set of the control experiment was only split from the IRFs. This was adverse to the classifiers trained with existing datasets. Strictly, the testing set ought to be split from the dataset that was not involved in classifier training. However, most existing finely annotated datasets are mainly about classical facades [15]. Although the CFP [20] and GFSD [21] contain a number of non-classical facade images, their perspective, regularity, and classes do not match the standards of the testing set. As there is no suitable dataset of free facade that can be applied in the test, we had to split both training and testing sets from the IRFs. To further validate the importance of architectural style to classifier training, more semantic segmentation datasets of free facades must be developed.
Another limitation is that the predictions of the classifiers trained with IRFs were not accurate enough for practical tasks (A-WMIoU = 0.722, A-accuracy = 0.837). This is because we only adopted the basic architecture of the selected networks, which are not the most advanced at present. No optimization technique was used to improve the classifiers’ prediction. Since the focus of our experiment is to compare the datasets’ effect of training classifiers, the brief experimental process helped to eliminate the factors that may interfere with the results. By adding specific supports targeted to the free facade of modern buildings, the IRFs-trained classifier’s WMIoU and accuracy are expected to increase to 0.85 and 0.90, or even higher.
Moreover, compared with the existing datasets adopted in this study, the IRFs only contain the superordinate classes of facade elements, e.g., Wall. The specific elements that are subordinate to these classes, such as decorations, light boxes, vertical greening, etc., are not annotated in our dataset. The lack of subordinate classes weakens the flexibility of the IRFs for diverse applications. Based on the six annotated classes, more subclasses would be defined to improve the dataset.
The perspective from which the IRF images were taken also limits the dataset’s application. Unlike the datasets ADE20K, CFP, and SJC, all images in our dataset were taken from an orthographic front viewpoint by professional photographers. Therefore, the IRFs-trained classifiers’ performance may deteriorate on the images with strong perspective distortion. In the future, it is necessary to fill the gap by data augmentation or image rectification.
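As one possible way to fill that gap (an illustrative sketch, not part of the IRFs pipeline), an image and its mask can be warped with a random homography so that mild perspective distortion is simulated during training; the `strength` parameter below controls how far the corners move.

```python
import numpy as np
import cv2

def random_perspective(image: np.ndarray, mask: np.ndarray, strength: float = 0.08):
    """Warp an image/mask pair with a random homography to simulate perspective."""
    h, w = image.shape[:2]
    corners = np.float32([[0, 0], [w, 0], [w, h], [0, h]])
    jitter = np.float32(np.random.uniform(-strength, strength, (4, 2)) * [w, h])
    homography = cv2.getPerspectiveTransform(corners, corners + jitter)
    warped_image = cv2.warpPerspective(image, homography, (w, h), flags=cv2.INTER_LINEAR)
    warped_mask = cv2.warpPerspective(mask, homography, (w, h), flags=cv2.INTER_NEAREST)
    return warped_image, warped_mask
```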

6. Conclusions

Building facade segmentation is gaining increasing attention from architects. Many image-based applications support intelligent programming, design, construction, and management. As the free facade has overturned the compositional logic of classical facades, training targeted classifiers is considered a precondition for accurate segmentation on the facade of modern buildings.
Focusing on the compositional characteristics, this study advocated producing samples of irregular facades, thereby training targeted classifiers for free facade segmentation. To achieve this goal, 1057 images of mostly irregular facades were gathered first. For each image, the pixels were annotated into six classes, namely Background, Plant, Wall, Window, Door, and Fence. Moreover, we labeled the materials of doors and fences as image features. The dataset Irregular Facades (IRFs) was developed based on the annotated images and related information. A multi-network cross-dataset control experiment was conducted to compare the impact of the training sample’s architectural style on classifier performance. The results indicate that the classifiers trained with irregular facades reached higher accuracy on the task of free facade segmentation than those trained with classical facades.
The main contributions of this study can be summarized as follows.
First, this study provides a pixel-level annotated dataset of IRFs that contains a large number of high-quality samples of irregular facades. It can support various image-based research and practice on the free facade of modern buildings.
Second, the results of our experiment demonstrate that architectural style plays an important role in semantic segmentation tasks of building facades. Classifiers segment the facades in the same style as training samples more accurately.
Third, several indications, including (i) the priority of sample quality over quantity, (ii) different recognition difficulty between classes, and (iii) the training strategy for the elements of the same class but different materials, were noticed in our experimental results. These indications are expected to guide future tasks of free facade segmentation.
Fourth, this study reminds us that both the improvements in samples and in networks contribute to training smarter classifiers. Especially for scholars of architecture, sample development is considered an accessible approach to making progress in the intersectional field of architecture and computer vision.
Our dataset also has its limitations, such as the lack of subclass annotations and the single perspective from the front view. These will be addressed in future updates by defining a detailed class system and using image augmentation. More advanced CV techniques will be applied to improve the performance of the IRFs-trained classifiers to meet the needs of practical tasks. Based on the newly developed similar datasets, the advantage of the IRFs should be further confirmed through a more rigorous control test.
In the era of AI, the development of training samples is a promising way to improve the effect of learning. By providing samples that are more consistent with the image to segment, it is easier for the classifiers to learn the features of the facades in specific styles. Given that the number of modern buildings is much greater and growing faster than the buildings of other styles, more annotated datasets of free facades are urgently needed in the near future. In addition to annotations, more explorations of the sample will be made from other aspects, such as sample split, pre- and post-processing, etc. Further efforts to establish a systematic framework of architectural sample annotation, augmentation, and processing are also expected.

Author Contributions

Data curation, J.W. and S.L.; Funding acquisition, S.Z. and S.L.; Methodology, J.W., S.Z., and S.L.; Project administration, S.L.; Resources, S.L.; Software, J.W.; Supervision, S.L.; Validation, J.W. and Y.H.; Writing—original draft, J.W. and Y.H.; Writing—review and editing, S.Z. and S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China [52108014], China Postdoctoral Science Foundation [2020M681565], the Humanities and Social Sciences Research Project of the Ministry of Education of the P. R. China [23YJC840039], and the Social Science Foundation of Jiangsu Province [22SHC008].

Data Availability Statement

The dataset Irregular Facades (IRFs) developed in this study is openly available in Kaggle at https://www.kaggle.com/datasets/liushuyuu/irregular-facades-irfs (accessed on 22 August 2024). The data that support the dataset development were gathered from ArchDaily with official authorization under a Creative Commons Attribution-NonCommercial-ShareAlike license (CC BY-NC-SA 4.0) https://creativecommons.org/licenses/by-nc-sa/4.0/ (accessed on 22 August 2024). The copyright of the data belongs to ArchDaily, and relevant architects or photographers are credited in the dataset.

Acknowledgments

The authors are grateful for the images and related data of building facades gathered from ArchDaily. Its generous sharing helped to develop the IRFs. We are also appreciative of the assistance from Zhihui Wang and Hanxin Zhang. They contributed much to the annotation of our samples. The networks that trained the classifiers in the experiment were provided by the mmsegmentation in GitHub at https://github.com/open-mmlab/mmsegmentation (accessed on 22 August 2024). This library made it more efficient to conduct the multi-network experiment.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A

Table A1. Results of the multi-network cross-dataset control experiment. The first metric column gives the image-level WMIoU/accuracy; all other cells give IoU/F1-score.

| Dataset for Training | Network for Training | WMIoU/Accuracy | Background | Plant | Wall | Window | Door | Fence | Glass Door | Non-Glass Door | Glass Fence | Non-Glass Fence |
| IRFs | U-Net | 0.708/0.828 | 0.561/0.719 | 0.546/0.707 | 0.809/0.895 | 0.618/0.765 | 0.272/0.428 | 0.525/0.689 | 0.309/0.473 | 0.195/0.327 | 0.560/0.718 | 0.593/0.745 |
| IRFs | DeepLabv3+ | 0.755/0.858 | 0.661/0.797 | 0.538/0.700 | 0.837/0.912 | 0.672/0.804 | 0.406/0.578 | 0.638/0.779 | 0.418/0.590 | 0.412/0.584 | 0.749/0.857 | 0.657/0.793 |
| IRFs | SegNeXt | 0.680/0.808 | 0.553/0.713 | 0.515/0.681 | 0.787/0.881 | 0.570/0.727 | 0.229/0.373 | 0.445/0.616 | 0.247/0.396 | 0.205/0.342 | 0.420/0.592 | 0.524/0.688 |
| IRFs | HRNet | 0.730/0.845 | 0.661/0.795 | 0.554/0.726 | 0.814/0.893 | 0.634/0.777 | 0.420/0.570 | 0.579/0.713 | 0.420/0.592 | 0.411/0.583 | 0.639/0.780 | 0.614/0.761 |
| IRFs | PSPNet | 0.736/0.846 | 0.648/0.787 | 0.547/0.708 | 0.820/0.901 | 0.646/0.785 | 0.406/0.578 | 0.570/0.726 | 0.412/0.584 | 0.440/0.611 | 0.647/0.786 | 0.619/0.765 |
| IRFs | On average | 0.722/0.837 | 0.617/0.762 | 0.540/0.704 | 0.813/0.896 | 0.628/0.772 | 0.347/0.505 | 0.551/0.705 | 0.361/0.527 | 0.333/0.489 | 0.603/0.747 | 0.601/0.750 |
| CMP | U-Net | 0.282/0.390 | 0.129/0.230 | / | 0.363/0.533 | 0.154/0.267 | 0.074/0.139 | 0.154/0.267 | 0.089/0.164 | 0.075/0.140 | 0.093/0.171 | 0.280/0.438 |
| CMP | DeepLabv3+ | 0.263/0.367 | 0.067/0.126 | / | 0.347/0.516 | 0.122/0.219 | 0.117/0.210 | 0.227/0.371 | 0.124/0.221 | 0.166/0.285 | 0.085/0.158 | 0.400/0.571 |
| CMP | SegNeXt | 0.233/0.333 | 0.041/0.079 | / | 0.311/0.475 | 0.133/0.235 | 0.077/0.143 | 0.127/0.225 | 0.083/0.153 | 0.095/0.174 | 0.159/0.276 | 0.214/0.353 |
| CMP | HRNet | 0.289/0.387 | 0.194/0.325 | / | 0.369/0.540 | 0.122/0.219 | 0.081/0.151 | 0.151/0.263 | 0.089/0.164 | 0.101/0.183 | 0.129/0.229 | 0.304/0.466 |
| CMP | PSPNet | 0.244/0.342 | 0.102/0.187 | / | 0.315/0.479 | 0.120/0.215 | 0.103/0.187 | 0.188/0.317 | 0.119/0.214 | 0.128/0.228 | 0.121/0.217 | 0.330/0.497 |
| CMP | On average | 0.262/0.364 | 0.107/0.189 | / | 0.341/0.509 | 0.130/0.231 | 0.090/0.166 | 0.169/0.289 | 0.100/0.183 | 0.113/0.202 | 0.117/0.210 | 0.306/0.465 |
| ADE20K | U-Net | 0.285/0.399 | 0.051/0.042 | 0.208/0.022 | 0.362/0.638 | 0.000/0.000 | 0.038/0.008 | 0.097/0.035 | 0.031/0.008 | 0.034/0.009 | 0.092/0.041 | 0.073/0.057 |
| ADE20K | DeepLabv3+ | 0.345/0.497 | 0.022/0.044 | 0.010/0.020 | 0.568/0.725 | 0.000/0.000 | 0.000/0.001 | 0.013/0.027 | 0.000/0.002 | 0.000/0.000 | 0.025/0.050 | 0.012/0.024 |
| ADE20K | SegNeXt | 0.276/0.373 | 0.013/0.026 | 0.017/0.035 | 0.454/0.625 | 0.000/0.000 | 0.001/0.003 | 0.013/0.027 | 0.002/0.004 | 0.000/0.001 | 0.004/0.050 | 0.030/0.059 |
| ADE20K | HRNet | 0.292/0.412 | 0.017/0.034 | 0.008/0.017 | 0.481/0.650 | 0.000/0.000 | 0.002/0.004 | 0.007/0.015 | 0.002/0.005 | 0.001/0.003 | 0.011/0.023 | 0.011/0.015 |
| ADE20K | PSPNet | 0.357/0.514 | 0.019/0.037 | 0.010/0.020 | 0.589/0.742 | 0.000/0.000 | 0.003/0.007 | 0.003/0.006 | 0.005/0.010 | 0.000/0.000 | 0.003/0.007 | 0.003/0.023 |
| ADE20K | On average | 0.311/0.439 | 0.024/0.037 | 0.051/0.023 | 0.491/0.676 | 0.000/0.000 | 0.009/0.005 | 0.027/0.022 | 0.008/0.006 | 0.007/0.003 | 0.027/0.034 | 0.026/0.036 |
| ECP Paris | U-Net | 0.411/0.547 | 0.321/0.487 | / | 0.509/0.675 | 0.243/0.392 | 0.049/0.094 | 0.094/0.172 | 0.039/0.075 | 0.091/0.167 | 0.016/0.032 | 0.161/0.278 |
| ECP Paris | DeepLabv3+ | 0.411/0.553 | 0.352/0.521 | / | 0.491/0.659 | 0.284/0.443 | 0.019/0.039 | 0.163/0.281 | 0.011/0.023 | 0.045/0.086 | 0.045/0.087 | 0.305/0.468 |
| ECP Paris | SegNeXt | 0.469/0.601 | 0.297/0.459 | / | 0.594/0.746 | 0.279/0.436 | 0.056/0.107 | 0.116/0.208 | 0.065/0.123 | 0.089/0.165 | 0.059/0.113 | 0.216/0.356 |
| ECP Paris | HRNet | 0.532/0.676 | 0.376/0.547 | / | 0.660/0.796 | 0.327/0.493 | 0.089/0.163 | 0.165/0.284 | 0.096/0.175 | 0.090/0.166 | 0.079/0.147 | 0.282/0.440 |
| ECP Paris | PSPNet | 0.492/0.635 | 0.368/0.538 | / | 0.602/0.752 | 0.315/0.480 | 0.032/0.064 | 0.207/0.343 | 0.029/0.056 | 0.044/0.084 | 0.064/0.121 | 0.341/0.509 |
| ECP Paris | On average | 0.463/0.602 | 0.343/0.510 | / | 0.571/0.726 | 0.290/0.449 | 0.049/0.093 | 0.149/0.258 | 0.048/0.090 | 0.072/0.134 | 0.053/0.100 | 0.261/0.410 |
| eTRIMS | U-Net | 0.422/0.578 | 0.297/0.459 | 0.211/0.350 | 0.555/0.714 | 0.185/0.313 | 0.036/0.071 | / | 0.037/0.073 | 0.041/0.079 | / | / |
| eTRIMS | DeepLabv3+ | 0.518/0.681 | 0.381/0.553 | 0.371/0.542 | 0.650/0.788 | 0.288/0.448 | 0.097/0.178 | / | 0.101/0.184 | 0.106/0.193 | / | / |
| eTRIMS | SegNeXt | 0.375/0.494 | 0.279/0.437 | 0.101/0.185 | 0.487/0.656 | 0.186/0.314 | 0.057/0.108 | / | 0.058/0.111 | 0.081/0.151 | / | / |
| eTRIMS | HRNet | 0.535/0.686 | 0.411/0.583 | 0.287/0.447 | 0.656/0.793 | 0.350/0.519 | 0.148/0.258 | / | 0.134/0.237 | 0.217/0.358 | / | / |
| eTRIMS | PSPNet | 0.499/0.655 | 0.392/0.564 | 0.240/0.388 | 0.620/0.766 | 0.301/0.463 | 0.099/0.182 | / | 0.070/0.132 | 0.189/0.319 | / | / |
| eTRIMS | On average | 0.470/0.619 | 0.352/0.519 | 0.242/0.382 | 0.594/0.743 | 0.262/0.411 | 0.087/0.159 | / | 0.080/0.147 | 0.127/0.220 | / | / |
| Graz50 | U-Net | 0.479/0.638 | 0.278/0.436 | / | 0.614/0.761 | 0.245/0.394 | 0.028/0.054 | / | 0.023/0.046 | 0.043/0.084 | / | / |
| Graz50 | DeepLabv3+ | 0.540/0.701 | 0.305/0.468 | / | 0.681/0.811 | 0.317/0.482 | 0.081/0.151 | / | 0.075/0.141 | 0.111/0.200 | / | / |
| Graz50 | SegNeXt | 0.471/0.626 | 0.269/0.424 | / | 0.595/0.746 | 0.279/0.436 | 0.028/0.056 | / | 0.016/0.030 | 0.064/0.121 | / | / |
| Graz50 | HRNet | 0.508/0.662 | 0.311/0.475 | / | 0.626/0.770 | 0.332/0.499 | 0.069/0.130 | / | 0.075/0.118 | 0.100/0.183 | / | / |
| Graz50 | PSPNet | 0.525/0.682 | 0.308/0.472 | / | 0.652/0.790 | 0.340/0.508 | 0.062/0.117 | / | 0.051/0.098 | 0.097/0.178 | / | / |
| Graz50 | On average | 0.505/0.662 | 0.294/0.455 | / | 0.634/0.776 | 0.303/0.464 | 0.054/0.102 | / | 0.048/0.087 | 0.083/0.153 | / | / |
Note: / represents that there is no relevant class in the dataset.

References

  1. Li, Z.; Guo, R.; Li, M.; Chen, Y.; Li, G. A review of computer vision technologies for plant phenotyping. Comput. Electron. Agric. 2020, 176, 105672. [Google Scholar] [CrossRef]
  2. Petrova, E.; Pauwels, P.; Svidt, K.; Jensen, R.L. Towards data-driven sustainable design: Decision support based on knowledge discovery in disparate building data. Archit. Eng. Des. Manag. 2019, 15, 334–356. [Google Scholar] [CrossRef]
  3. Li, Q.; Yang, G.; Gao, C.; Huang, Y.; Zhang, J.; Huang, D.; Zhao, B.; Chen, X.; Chen, B.M. Single drone-based 3D reconstruction approach to improve public engagement in conservation of heritage buildings: A case of Hakka Tulou. J. Build. Eng. 2024, 87, 108954. [Google Scholar] [CrossRef]
  4. Boulaassal, H.; Landes, T.; Grussenmeyer, P.; Tarsha-Kurdi, F. Automatic segmentation of building facades using Terrestrial Laser Data. In Proceedings of the Paper Presented at the ISPRS Workshop on Laser Scanning 2007 and SilviLaser 2007, Espoo, Finland, 12–14 September 2007; pp. 65–70. Available online: https://shs.hal.science/halshs-00264839 (accessed on 22 August 2024).
  5. Zhang, G.; Pan, Y.; Zhang, L. Deep learning for detecting building facade elements from images considering prior knowledge. Autom. Constr. 2022, 133, 104016. [Google Scholar] [CrossRef]
  6. Martinović, A.; Mathias, M.; Weissenberg, J.; Van Gool, L. A Three-Layered Approach to Facade Parsing. In Proceedings of the Computer Vision—ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, 7–13 October 2012; Proceedings, Part VII. Fitzgibbon, A., Lazebnik, S., Perona, P., Sato, Y., Schmid, C., Eds.; Springer: Berlin/Heidelberg, Germany, 2012; pp. 416–429. [Google Scholar] [CrossRef]
  7. Lotte, R.G.; Haala, N.; Karpina, M.; Aragão, L.E.O.E.C.D.; Shimabukuro, Y.E. 3D Facade Labeling over Complex Scenarios: A Case Study Using Convolutional Neural Network and Structure-From-Motion. Remote Sens. 2018, 10, 1435. [Google Scholar] [CrossRef]
  8. Neuhausen, M.; König, M. Automatic window detection in facade images. Autom. Constr. 2018, 96, 527–539. [Google Scholar] [CrossRef]
  9. Sun, Y.; Malihi, S.; Li, H.; Maboudi, M. DeepWindows: Windows Instance Segmentation through an Improved Mask R-CNN Using Spatial Attention and Relation Modules. ISPRS Int. J. Geo-Inf. 2022, 11, 162. [Google Scholar] [CrossRef]
  10. Mao, Z.; Huang, X.; Xiang, H.; Gong, Y.; Zhang, F.; Tang, J. Glass facade segmentation and repair for aerial photogrammetric 3D building models with multiple constraints. Int. J. Appl. Earth Obs. Geoinf. 2023, 118, 103242. [Google Scholar] [CrossRef]
  11. Oskouie, P.; Becerik-Gerber, B.; Soibelman, L. Automated Recognition of Building Facades for Creation of As-Is Mock-Up 3D Models. J. Comput. Civ. Eng. 2017, 31, 04017059. [Google Scholar] [CrossRef]
  12. Andrade, D.; Harada, M.; Shimada, K. Framework for automatic generation of facades on free-form surfaces. Front. Archit. Res. 2017, 6, 273–289. [Google Scholar] [CrossRef]
  13. Qi, F.; Tan, X.; Zhang, Z.; Chen, M.; Xie, Y.; Ma, L. Glass Makes Blurs: Learning the Visual Blurriness for Glass Surface Detection. IEEE Trans. Ind. Inform. 2024, 20, 6631–6641. [Google Scholar] [CrossRef]
  14. Shan, Q.; Curless, B.; Kohno, T. Seeing through Obscure Glass. In Proceedings of the Computer Vision—ECCV 2010: 11th Eu-ropean Conference on Computer Vision, Heraklion, Crete, Greece, 5–11 September 2010; Proceedings, Part VI. Daniilidis, K., Maragos, P., Paragios, N., Eds.; Springer: Berlin/Heidelberg, Germany, 2010; pp. 364–378. [Google Scholar] [CrossRef]
  15. Tyleček, R.; Šára, R. Spatial Pattern Templates for Recognition of Objects with Regular Structure. In Proceedings of the Pattern Recognition: 35th German Conference, GCPR 2013, Saarbrucken, Germany, 3–6 September 2013; Proceedings. Weickert, J., Hein, M., Schiele, B., Eds.; Springer: Berlin/Heidelberg, Germany, 2013; pp. 364–374. [Google Scholar] [CrossRef]
  16. Zhou, B.; Zhao, H.; Puig, X.; Fidler, S.; Barriuso, A.; Torralba, A. Scene Parsing through ADE20K Dataset. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; IEEE Publishing: New York, NY, USA, 2017; pp. 5122–5130. [Google Scholar] [CrossRef]
  17. Teboul, O.; Simon, L.; Koutsourakis, P.; Paragios, N. Segmentation of building facades using procedural shape priors. In Proceedings of the 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, USA, 13–18 June 2010; IEEE Publishing: New York, NY, USA, 2010; pp. 3105–3112. [Google Scholar] [CrossRef]
  18. Korc, F.; Förstner, W. eTRIMS Image Database for Interpreting Images of Man-Made Scenes; Technical Report TR-IGG-P-2009-01; Department of Photogrammetry, Institute of Geodesy and Geoinformation, University of Bonn: Bonn, Germany, 2009; Available online: http://www.ipb.uni-bonn.de/projects/etrims_db/ (accessed on 22 August 2024).
  19. Riemenschneider, H.; Krispel, U.; Thaller, W.; Donoser, M.; Havemann, S.; Fellner, D.; Bischof, H. Irregular lattices for complex shape grammar facade parsing. In Proceedings of the 2012 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Providence, RI, USA, 16–21 June 2012; IEEE Publishing: New York, NY, USA, 2012; pp. 1640–1647. [Google Scholar] [CrossRef]
  20. Wang, B.; Zhang, J.; Zhang, R.; Li, Y.; Li, L.; Nakashima, Y. Improving facade parsing with vision transformers and line integration. Adv. Eng. Inform. 2024, 60, 102463. [Google Scholar] [CrossRef]
  21. Mao, Z.; Huang, X.; Gong, Y.; Xiang, H.; Zhang, F. A Dataset and Ensemble Model for Glass Facade Segmentation in Oblique Aerial Images. IEEE Geosci. Remote Sens. Lett. 2022, 19, 6513305. [Google Scholar] [CrossRef]
  22. Xi, R.; Ma, T.; Chen, X.; Lyu, J.; Yang, J.; Sun, K.; Zhang, Y. Image Enhancement Using Adaptive Region-Guided Multi-Step Exposure Fusion Based on Reinforcement Learning. IEEE Access 2023, 11, 31686–31698. [Google Scholar] [CrossRef]
  23. Cotogni, M.; Cusano, C. Select & Enhance: Masked-based image enhancement through tree-search theory and deep reinforcement learning. Pattern Recognit. Lett. 2024, 183, 172–178. [Google Scholar] [CrossRef]
  24. Wang, H.; Zhang, W.; Bai, L.; Ren, P. Metalantis: A Comprehensive Underwater Image Enhancement Framework. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5618319. [Google Scholar] [CrossRef]
  25. Wang, H.; Zhang, W.; Ren, P. Self-organized underwater image enhancement. ISPRS J. Photogramm. Remote Sens. 2024, 215, 1–14. [Google Scholar] [CrossRef]
  26. Xi, R.; Lyu, J.; Ma, T.; Sun, K.; Zhang, Y.; Chen, X. Learning filter selection policies for interpretable image denoising in parametrised action space. IET Image Process. 2024, 18, 951–960. [Google Scholar] [CrossRef]
  27. Yu, K.; Wang, X.; Dong, C.; Tang, X.; Loy, C.C. Path-Restore: Learning Network Path Selection for Image Restoration. IEEE Trans. Pattern Anal. Mach. Intell. 2022, 44, 7078–7092. [Google Scholar] [CrossRef]
  28. Xu, Y.; Hou, J.; Zhu, X.; Wang, C.; Shi, H.; Wang, J.; Li, Y.; Ren, P. Hyperspectral Image Super-Resolution with ConvLSTM Skip-Connections. IEEE Trans. Geosci. Remote Sens. 2024, 62, 5519016. [Google Scholar] [CrossRef]
  29. Russell, B.C.; Torralba, A.; Murphy, K.P.; Freeman, W.T. LabelMe: A Database and Web-Based Tool for Image Annotation. Int. J. Comput. Vis. 2008, 77, 157–173. [Google Scholar] [CrossRef]
  30. Ronneberger, O.; Fischer, P.; Brox, T. U-Net: Convolutional Networks for Biomedical Image Segmentation. In Proceedings of the Medical Image Computing and Computer-Assisted Intervention—MICCAI 2015: 18th International Conference, Munich, Germany, 5–9 October 2015; Proceedings, Part III. Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F., Eds.; Springer: Cham, Switzerland, 2015; pp. 234–241. [Google Scholar] [CrossRef]
  31. Chen, L.C.; Zhu, Y.; Papandreou, G.; Schroff, F.; Adam, H. Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation. In Proceedings of the Computer Vision—ECCV 2018: 15th European Conference, Munich, Germany, 8–14 September 2018; Proceedings, Part VII. Ferrari, V., Hebert, M., Sminchisescu, C., Weiss, Y., Eds.; Springer: Cham, Switzerland, 2018; pp. 833–851. [Google Scholar] [CrossRef]
  32. Guo, M.H.; Lu, C.Z.; Hou, Q.; Liu, Z.; Cheng, M.M.; Hu, S.M. SegNeXt: Rethinking Convolutional Attention Design for Semantic Segmentation. In Proceedings of the Advances in Neural Information Processing Systems 35 (NeurIPS 2022), New Orleans, LA, USA, 28 November–9 December 2022; Koyejo, S., Mohamed, S., Agarwal, A., Belgrave, D., Cho, K., Oh, A., Eds.; Curran Associates: Red Hook, NY, USA, 2022. Available online: https://proceedings.neurips.cc/paper_files/paper/2022/hash/08050f40fff41616ccfc3080e60a301a-Abstract-Conference.html (accessed on 22 August 2024).
  33. Sun, K.; Xiao, B.; Liu, D.; Wang, J. Deep High-Resolution Representation Learning for Human Pose Estimation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 5693–5703. Available online: https://openaccess.thecvf.com/content_CVPR_2019/html/Sun_Deep_High-Resolution_Representation_Learning_for_Human_Pose_Estimation_CVPR_2019_paper.html (accessed on 22 August 2024).
  34. Zhao, H.; Shi, J.; Qi, X.; Wang, X.; Jia, J. Pyramid Scene Parsing Network. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 6230–6239. [Google Scholar] [CrossRef]
  35. Rahman, M.A.; Wang, Y. Optimizing Intersection-Over-Union in Deep Neural Networks for Image Segmentation. In Proceedings of the Advances in Visual Computing: 12th International Symposium, ISVC 2016, Las Vegas, NV, USA, 12–14 December 2016; Proceedings, Part I. Bebis, G., Boyle, R., Parvin, B., Koracin, D., Porikli, F., Skaff, S., Entezari, A., Min, J., Iwai, D., Sadagic, A., et al., Eds.; Springer: Cham, Switzerland, 2016; pp. 234–244. [Google Scholar] [CrossRef]
  36. Hu, Y.; Wei, J.; Zhang, S.; Liu, S. FDIE: A graph-based framework for extracting design information from annotated building facade images. J. Asian Archit. Build. Eng. 2024. Advance online publication. [Google Scholar] [CrossRef]
  37. Jing, L.; Chen, Y.; Tian, Y. Coarse-to-Fine Semantic Segmentation From Image-Level Labels. IEEE Trans. Image Process. 2020, 29, 225–236. [Google Scholar] [CrossRef] [PubMed]
  38. Zhou, Z.-H. A brief introduction to weakly supervised learning. Natl. Sci. Rev. 2017, 5, 44–53. [Google Scholar] [CrossRef]
Figure 1. The distribution of aspect ratio in the IRFs.
Figure 2. The distribution of pixel number in the IRFs.
Figure 3. Distribution of the glass elements in the IRFs (except windows).
Figure 4. The proportions of image features in the IRFs.
Figure 5. An example of the production of the IRF samples.
Figure 6. Number of annotated pixels and elements by class in the IRFs.
Figure 7. Data size of the IRFs and existing datasets.
Figure 8. Architectural style of the IRFs and existing datasets.
Figure 9. Class proportion of the IRFs and existing datasets.
Figure 10. Number of elements and classes per image in the IRFs and existing datasets.
Figure 11. Examples of the samples of the adopted datasets.
Figure 12. The process of the experiment.
Figure 13. Examples of the segmented facade. (left to right) IRFs, CMP, ADE20K, ECP Paris, eTRIMS, and Graz50; (top to bottom) U-Net, DeepLabv3+, SegNeXt, HRNet, and PSPNet.
Figure 14. The brief results of the experiment. The prefix "A-" means that the metric is an average of the results of the classifiers trained by five different networks.
Figure 15. Further use of the segmented facade.
Table 1. Class systems of the IRFs and existing datasets.

| IRFs | CMP | ADE20K | ECP Paris | eTRIMS | Graz50 |
| Background | Background | Background | Sky | Sky, Pavement, Road, Car | Sky |
| Plant | / | Plant, Plant Life, Flora, Tree | / | Vegetation | / |
| Wall | Facade, Cornice, Molding, Pillar, Deco | Building, Edifice | Wall, Roof, Chimney, Outlier | Building | Wall |
| Window | Window, Sill, Blind, Shop | Window | Window, Shop | Window | Window |
| Door | Door | Door | Door | Door | Door |
| Fence | Balcony | Balcony | Balcony | / | / |
Note: / represents that there is no relevant class in the dataset.
Table 2. The complexity of glass elements (except windows) and foreground of the IRFs and existing datasets. Each cell gives n (%) of pixels or images.

| Dataset | Glass Door: Pixels n (%) | Glass Door: Images n (%) | Glass Fence: Pixels n (%) | Glass Fence: Images n (%) | Plant: Pixels n (%) | Plant: Images n (%) |
| IRFs | 2.7 × 10^7 (2.0%) | 576 (54.5%) | 1.9 × 10^7 (1.4%) | 153 (14.5%) | 4.1 × 10^7 (3.1%) | 417 (39.5%) |
| CMP | 1.7 × 10^6 (0.7%) | 86 (22.8%) | / | / | 1.1 × 10^6 (0.5%) | 34 (9.0%) |
| ADE20K | 1.7 × 10^5 (0.6%) | 52 (20.9%) | 2.4 × 10^4 (0.1%) | 1 (0.4%) | 1.7 × 10^6 (5.7%) | 108 (43.4%) |
| ECP Paris | 1.8 × 10^5 (0.9%) | 69 (66.3%) | / | / | / | / |
| eTRIMS | 3.9 × 10^5 (6.7%) | 4 (6.7%) | / | / | 6.7 × 10^5 (11.4%) | 56 (93.3%) |
| Graz50 | 3.0 × 10^4 (0.4%) | 11 (22.0%) | / | / | 7.6 × 10^3 (0.1%) | 1 (2.0%) |
Note: / represents that there is no relevant element in the dataset.
Table 3. Sample splits of the IRFs (n = 996).

| Samples | B | D | F | N | Sum |
| Training | 64 | 338 | 42 | 251 | 695 |
| Validation | 10 | 49 | 7 | 36 | 102 |
| Testing | 18 | 97 | 12 | 72 | 199 |
| Sum | 92 | 484 | 61 | 359 | 996 |
Note: B = both glass door and fence, D = glass door only, F = glass fence only, N = no glass door or fence.
Table 4. The metrics for assessment.

| Level | Metrics | Description |
| Subclass level | IoU and F1-score of subclasses Glass Door, Non-Glass Door, Glass Fence, and Non-Glass Fence. | For single images, if a pixel of Glass Door (Glass Fence) or Non-Glass Door (Non-Glass Fence) was predicted as Door (Fence), the prediction was regarded as true at the subclass level. |
| Class level | IoU and F1-score of classes Background, Plant, Wall, Window, Door, and Fence. | The class-level metrics were computed based on the predictions and ground truth directly. |
| Image level | Weighted mean IoU (WMIoU) and accuracy. | The pixel number of each class was used as the weight of the class for the calculation of WMIoU. |
Table 5. Comparison to a previous study of the image-level accuracy of classifiers trained with classical facades and tested with free facades. Columns list the training and validation sets; values in the second row show the lowest and highest accuracy among the five selected networks in this study.

| Testing Set | CMP | ECP Paris | eTRIMS | Graz50 |
| SJC [7] | 0.36 | 0.25 | 0.45 | 0.37 |
| IRFs (ours) | 0.333–0.390 | 0.547–0.676 | 0.494–0.686 | 0.626–0.701 |
Table 6. Comparison to a previous study that applied free facades to both classifier training and testing. Only the best result of each metric is shown, as too many networks were used in the studies. Accuracy, mIoU, and WMIoU are image-level metrics; the IoU columns are class-level metrics.

| Dataset | Accuracy | mIoU | WMIoU | IoU (Wall) | IoU (Window) | IoU (Door) |
| CFP [20] | 0.888 | 0.620 | / | 0.855 | 0.653 | 0.547 |
| IRFs (ours) | 0.858 | / | 0.755 | 0.837 | 0.672 | 0.420 |
Note: / represents that the metric is not computed in the study.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

Wei, J.; Hu, Y.; Zhang, S.; Liu, S. Irregular Facades: A Dataset for Semantic Segmentation of the Free Facade of Modern Buildings. Buildings 2024, 14, 2602. https://doi.org/10.3390/buildings14092602
