1. Introduction
Urban renewal [1] typically involves the demolition of old buildings, the construction of new structures, and land re-planning. This comprehensive approach effectively addresses land shortages and optimizes urban spatial layouts and functional designations. Driven by factors such as urban pollution [2], land resources [3], and regional policies, the demolition and transformation of old buildings have become crucial means of meeting the requirements of urban renewal. Megacities in particular have an urgent need for urban renewal, which is a key development goal for urban planning, smart city construction, and sustainable growth [4].
Construction waste is an inevitable byproduct of the urban renewal process. Extensive, large-scale urban construction has significantly increased the amount of construction waste [5], causing severe environmental pollution, human health risks, and ecological pressure [6]. The European Union (EU) construction industry generates over 500 million tons of construction waste annually, accounting for 50% of all waste produced in the EU [7]. In the United States, the construction industry produces 700 million tons of construction waste each year [8]. China faces similarly formidable challenges: its urbanization rate has risen at an unprecedented pace, from 51.83% in 2011 to 65.22% in 2022 [9], and construction waste constitutes over 40% of urban solid waste [10]. Driven by this rapid urbanization, China's construction waste production surged from 470 million tons in 2006 to 3037 million tons in 2020, reached 3209 million tons in 2021, and is projected to surpass 4000 million tons by 2026 [11]. There is therefore an urgent need to accurately estimate the annual production of construction waste and to implement appropriate waste management, both to improve urban quality and to address resource consumption issues in the urban renewal process [12].
Numerous methods currently exist for estimating the annual production of construction waste. Wu et al. [13] classify them into six types: site visit (SV), generation rate calculation (GRC), lifetime analysis (LA), classification system accumulation (CSA), variable modeling (VM), and other methods. (1) The SV method requires on-site investigations, either direct measurements of weight and volume [14] or indirect measurements using other easily accessible indicators, such as waste transport tickets [15]. Direct measurements provide the most accurate waste yields but consume a significant amount of time, money, and labor, whereas indirect measurements only roughly reflect waste generation. (2) The GRC method calculates the total waste volume by multiplying the quantity of specific units by the corresponding generation rate. The per capita multiplier [16], the earliest GRC quantification method for construction and demolition waste, offers a simple way to quantify waste from construction and demolition activities in an area. However, economic conditions can cause construction and demolition activities to fluctuate significantly while the population remains almost constant, so a population-based rate cannot track these fluctuations. (3) The LA method assumes that all buildings are demolished after a certain lifespan and infers construction waste production by summing the weights of the buildings that reach the end of their lifespan [17]. However, this method requires appropriate assumptions about building lifespans and cannot provide detailed volume estimates at the material level. (4) The CSA method combines the GRC method with waste classification, offering a more detailed estimate by quantifying each specific material [18]. However, this method requires region-specific data and may not be suitable for industries with different construction technologies. (5) The VM method builds predictive models, such as multiple linear regression and gray prediction models, from accessible variables [19]. However, such methods are generally suitable only for short-term forecasting and cannot accurately estimate the annual production of construction waste. (6) Other methods mainly include the shallow machine learning (SML) methods that have become popular for prediction in recent years [20,21]. Overall, however, these traditional methods consume considerable manpower, resources, and time and lack high accuracy and efficiency.
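As a rough illustration of the GRC and LA estimators described above, the sketch below implements their textbook forms in Python. The function names, inputs, and numbers are hypothetical; the sketch does not reproduce any of the cited studies.

```python
def grc_estimate(quantities: dict, rates: dict) -> float:
    """Generation rate calculation (GRC): sum over activity types of
    quantity x generation rate.

    quantities: activity size per type (e.g. floor area in m^2, or population
    for a per-capita multiplier). rates: waste generated per unit of that
    quantity (t per unit). Keys are hypothetical labels.
    """
    return sum((quantities[key] * rates[key] for key in quantities), 0.0)


def la_estimate(stock: list, lifespan_years: int, year: int) -> float:
    """Lifetime analysis (LA): every building is assumed to be demolished
    exactly `lifespan_years` after construction; return the total demolition
    mass (t) of the buildings reaching end of life in `year`.

    stock: (construction_year, demolition_mass_t) pairs.
    """
    return sum((mass for built, mass in stock if built + lifespan_years == year), 0.0)


# Usage with made-up numbers (not data from this study):
per_capita = grc_estimate({"population": 100_000}, {"population": 1.2})
demolished = la_estimate([(1970, 500.0), (1980, 800.0), (1990, 300.0)],
                         lifespan_years=50, year=2020)
print(per_capita, demolished)
```

The per-capita case is simply GRC with population as the only activity quantity, which is why it cannot follow economically driven swings in construction activity.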
In recent years, with continuous advances in remote sensing technology and the widespread application of deep learning to target extraction, estimating annual construction waste production at a macroscopic level using high-resolution remote sensing images (HRSIs) has become a key focus of current research. HRSIs offer high spatial resolution, strong timeliness, and abundant information [22], making them suitable for large-scale macroscopic observation of changes in waste piles. However, construction waste consists of debris from the construction process and demolition waste from dismantling activities [23]. Apart from recycled waste, the remaining debris is sent to construction waste disposal sites for treatment. Hence, accurately estimating urban construction waste generation solely from changes in waste piles at disposal sites is challenging; it is essential to consider variations in both the building areas and the construction waste areas depicted in the images. Typical methods involve calculating construction waste production as area multiplied by a generation rate [24,25], subcategorizing engineering and demolition waste, and tracking changes in construction waste disposal sites. These approaches help eliminate errors caused by intermediate variables and enable analysis of urban construction waste landfill volume and resource conversion, which in turn improves the precision of estimating both the production of urban construction waste and the capacity for its disposal.
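The area-times-rate approach can be sketched as follows, assuming that engineering waste scales with newly added building area, demolition waste scales with demolished building area, and each generation rate is known only as a range. The function names and all rates and areas are hypothetical placeholders, not values from this study or from [24,25].

```python
def waste_range(area_m2: float, rate_low: float, rate_high: float) -> tuple:
    """Waste (t) for an area under a range of generation rates (t/m^2).

    Returns (low, high, midpoint).
    """
    low, high = area_m2 * rate_low, area_m2 * rate_high
    return low, high, (low + high) / 2


def construction_waste(new_area_m2: float, demolished_area_m2: float,
                       engineering_rates: tuple, demolition_rates: tuple) -> dict:
    """Engineering waste from newly added area plus demolition waste from
    removed area, each reported as a (low, high, midpoint) range."""
    eng = waste_range(new_area_m2, *engineering_rates)
    demo = waste_range(demolished_area_m2, *demolition_rates)
    total = tuple(e + d for e, d in zip(eng, demo))
    return {"engineering": eng, "demolition": demo, "total": total}


# Hypothetical inputs: 10,000 m^2 built, 2,000 m^2 demolished,
# engineering rates 0.03-0.05 t/m^2, demolition rates 1.0-1.5 t/m^2.
result = construction_waste(10_000, 2_000, (0.03, 0.05), (1.0, 1.5))
print(result["total"])
```

Reporting a low/high/midpoint triple, rather than a single number, keeps the uncertainty in the generation rates visible in the final estimate.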
Calculating changes in building area requires building identification as the first step. Since the introduction of convolutional neural networks (CNNs) [26], numerous building extraction methods have emerged [27]. For instance, Kang et al. [28] proposed EU-Net, an efficient end-to-end fully convolutional model for extracting buildings from optical remote sensing images. Shao et al. [29] introduced a building residual refine network (BRRNet) based on an encoder–decoder structure, enabling accurate and comprehensive extraction of complex-shaped buildings. He et al. [30] presented an automated building extraction method using fully convolutional networks (FCNs) and conditional random fields (CRFs) that is effective for extracting building targets from remote sensing images. Chen et al. [31] employed the encoder–decoder backbone of DeepLabV3+ and proposed a dense residual neural network (DR-Net) that combines densely connected convolutional neural network (DCNN) and residual network (ResNet) structures, achieving efficient building extraction. Wang et al. [32] combined UNet, residual learning, atrous spatial pyramid pooling, and focal loss to propose the residual UNet (RU-Net) for building extraction, which is three to four times more efficient than FastFCN and DeepLabV3+. Studies inspired by graph convolutional networks (GCNs), which can naturally model long-range spatial relations in HRSIs [33], have found that multi-scale feature fusion improves the accuracy of building identification. For instance, Wang et al. [34] proposed an end-to-end multi-scale boundary detection network (MBDNet) that combines a multi-level neural network structure with a boundary detector, improving building extraction through boundary perception. Zhang et al. [35] proposed the dual spatial attention transformer net (DSAT-Net), a high-precision building extraction model, and designed an efficient dual spatial attention transformer (DSAFormer) to address the shortcomings of the standard vision transformer. Zhang et al. [36] proposed the shunted dual skip connection UNet (SDSC-UNet) for high-precision building extraction; it introduces a novel shunted transformer that lets the model establish global dependencies while capturing multi-scale information internally.
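Focal loss, one of the components of RU-Net mentioned above, down-weights well-classified pixels so that training concentrates on hard ones, which helps with the foreground/background imbalance typical of building masks. The sketch below is the standard binary formulation of Lin et al. with its common defaults (alpha = 0.25, gamma = 2), written in plain Python; the exact variant and settings used in RU-Net are not stated here and are assumed.

```python
import math


def binary_focal_loss(probs, labels, alpha=0.25, gamma=2.0, eps=1e-7):
    """Mean binary focal loss over pixels.

    probs: predicted foreground probabilities; labels: 0/1 ground truth.
    FL(p_t) = -alpha_t * (1 - p_t) ** gamma * log(p_t)
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)          # clamp to avoid log(0)
        p_t = p if y == 1 else 1.0 - p           # probability of the true class
        a_t = alpha if y == 1 else 1.0 - alpha   # class-balancing weight
        total += -a_t * (1.0 - p_t) ** gamma * math.log(p_t)
    return total / len(probs)


# A confident correct pixel (0.9) costs far less than an uncertain one (0.6).
print(binary_focal_loss([0.9], [1]), binary_focal_loss([0.6], [1]))
```

With gamma = 0 the expression reduces to alpha-weighted binary cross-entropy; raising gamma increasingly suppresses the contribution of easy pixels.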
Research on tracking and identifying construction waste disposal sites in high-resolution images has also advanced significantly. For example, Chen et al. [37] proposed an optimized method combining morphological indices and hierarchical segmentation. By comparing differences in spectral, geometric shape, and texture features, this approach enhanced the separability of construction waste from surrounding ground objects, effectively addressing the spectral confusion between them. Davis et al. [38] designed a deep convolutional neural network for a realistic construction site scenario in which waste is difficult to classify on site, using digital images of waste deposited in construction site bins (artificial artifacts) to identify seven types of construction waste. Sun et al. [39] addressed the shortage of data for solid waste detection with a data augmentation strategy: they proposed an improved pix2pix model that generates sufficient high-quality synthetic images for solid waste detection, thereby establishing a landfill dataset. Li et al. [40] constructed a new solid waste detection dataset and proposed a location-guided key point network with multiple enhancements (LKN-ME) for urban solid waste detection. Xiaoyu et al. [41] used the DeepLabV3+ network model, whose encoder locates shallow and high-level semantic features, to identify the location, type, area, and volume of illegally accumulated construction waste in remote sensing images. Zhang et al. [42] proposed ConvLSR-Net, which appends a novel efficient vision transformer module, the long-short-range transformer (LSRFormer), to learn both local and global features at each stage, making the model applicable to the semantic segmentation of various types of aerial images and of construction waste in complex scenes.
The studies above indicate notable advancements in applying deep learning to identify buildings and construction waste in HRSIs. However, the following issues still exist in the current research:
Building recognition in complex scenes is challenging because buildings are often small and vary widely in size, shape, and type. These factors lower recognition accuracy and degrade segmentation quality [43]. Inadequate fusion of multi-scale features may lead to misclassification [44]. Traditional convolutions often fail to retain spatial details, blurring boundaries and overlooking small buildings, and fixed receptive fields often leave discontinuous gaps when extracting large buildings [45].
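The receptive-field limitation can be made concrete with the standard receptive-field recurrence. Atrous (dilated) convolution, the building block of ASPP-style modules, widens a kernel's span without adding parameters. The helper below is a minimal sketch; the rates 6, 12, and 18 are those of the original DeepLabV3 ASPP and serve only as an example, since the rates of the DS-ASPP module are not given in this section.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of a stack of conv layers.

    layers: sequence of (kernel_size, stride, dilation) tuples, input first.
    A dilated kernel of size k with rate d spans (k - 1) * d + 1 pixels.
    """
    rf, jump = 1, 1  # jump = input-pixel distance between adjacent outputs
    for kernel, stride, dilation in layers:
        effective_kernel = (kernel - 1) * dilation + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf


# Three plain 3x3 convolutions see only a 7x7 window ...
print(receptive_field([(3, 1, 1)] * 3))
# ... while single 3x3 atrous branches at rates 1, 6, 12, 18 span 3, 13, 25, 37.
print([receptive_field([(3, 1, d)]) for d in (1, 6, 12, 18)])
```

Running several such branches in parallel and fusing them is what lets ASPP-style modules cover both small and large buildings with the same number of weights per branch.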
Identifying construction waste in complex scenes presents multiple challenges: (1) Construction waste is often situated in environments with similar feature information, where the network has difficulty emphasizing less prominent characteristics. (2) Construction waste typically has irregular shapes and is fragmented and dispersed, which challenges the network’s capacity to capture spatial information and impedes accurate detection. (3) The unique attributes of construction waste, including color, shape, and texture, combined with its substantial distinctions from other ground objects in satellite image backgrounds, make simple CNN structures insufficient for identifying construction waste in complex environments. A deeper and more specialized network architecture is needed to accurately delineate construction waste areas in remote sensing images with complex backgrounds [46]. To reduce labor and time costs and improve the efficiency of estimating construction waste production, there is an urgent need for a flexible multi-scale target identification model that can identify subdivided construction waste areas and track changes in construction waste disposal sites.
The lack of datasets for construction waste disposal sites is a significant issue. Publicly available datasets designed explicitly for construction waste identification are severely lacking, and the existing datasets follow different standards. The datasets commonly used for construction waste extraction include the AerialWaste dataset [47] and the SWAD dataset [48]. However, they have several shortcomings in practical applications: (1) The AerialWaste dataset lacks the annotations required for semantic segmentation, because its labels indicate the presence of solid waste in an image rather than classifying each pixel. This limitation impedes quantitative analysis of waste production. (2) The SWAD dataset has a spatial resolution of 180 cm GSD (ground sampling distance), so it is not sub-meter imagery. Consequently, its images do not offer clear views of construction waste disposal sites, making typical features difficult to discern. (3) The color of construction waste may differ only slightly from its surroundings, and at this lower resolution such differences become imperceptible; as a result, the heights and shapes of waste piles are unclear, and activity at waste disposal sites cannot be estimated accurately.
To address these challenges, this study introduces a multi-scale target attention-enhanced network (MT-AENet) that extracts buildings and construction waste from complex backgrounds in HRSIs through semantic segmentation. The main contributions of this study are as follows:
- (1) A novel model, the MT-AENet, based on an encoder–decoder structure, is designed specifically for feature extraction in HRSIs. The encoder uses ResNet101 as the backbone network to extract high-dimensional features. The DS-ASPP and multi-scale feature fusion modules are integrated into the MT-AENet to extract features at both the local and global image levels, which helps to fuse contextual information. A dual-attention mechanism module further improves the accuracy and efficiency of detecting buildings and construction waste in HRSIs with complex backgrounds.
- (2) A method is proposed for calculating the annual production of construction waste by analyzing building changes in remote sensing images. Deep learning is used to extract the distribution of buildings in the same area at different times, so that the areas of newly added and demolished buildings can be obtained quickly and accurately and construction waste production can be estimated from them. In addition, analyzing changes in the landfill volume of construction waste disposal sites and assessing their landfill volume and resource conversion capacity yields the annual disposal capacity of urban construction waste.
- (3) A comprehensive dataset, the Construction Waste Disposal Sites Dataset (CWDSD), is established for identifying the internal areas of construction waste disposal sites. Using Changping and Daxing Districts, Beijing, China, as examples, the dataset is curated from sub-meter imagery from the GF-2 satellite and Google Earth. Detailed labeled images are provided, annotating the various areas within each disposal site, including vacant landfills, engineering facilities, and waste storage areas.
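Contribution (2) relies on comparing building masks of the same area at two dates. A minimal sketch of that comparison, assuming two co-registered binary building masks with the same ground sampling distance (GSD), counts pixels that change from non-building to building (newly added) and from building to non-building (demolished) and converts the counts to areas via GSD². The function name and the 0.5 m GSD are hypothetical, and a practical pipeline would likely add object-level filtering to suppress noisy single-pixel changes.

```python
def building_change_areas(mask_t1, mask_t2, gsd_m):
    """Return (added_m2, removed_m2) between two co-registered 0/1 masks.

    mask_t1, mask_t2: nested lists of rows (earlier and later date).
    gsd_m: ground sampling distance in meters per pixel.
    """
    if len(mask_t1) != len(mask_t2) or any(
            len(r1) != len(r2) for r1, r2 in zip(mask_t1, mask_t2)):
        raise ValueError("masks must have identical shape")
    added = removed = 0
    for row1, row2 in zip(mask_t1, mask_t2):
        for before, after in zip(row1, row2):
            if not before and after:
                added += 1      # building appears: new construction
            elif before and not after:
                removed += 1    # building disappears: demolition
    pixel_area_m2 = gsd_m ** 2
    return added * pixel_area_m2, removed * pixel_area_m2


# Toy 2x3 masks (hypothetical), 0.5 m GSD.
before = [[1, 1, 0], [0, 0, 0]]
after = [[0, 1, 1], [1, 0, 0]]
print(building_change_areas(before, after, gsd_m=0.5))
```

The added and removed areas are exactly the inputs needed for the area-times-rate estimate of engineering and demolition waste.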
The remainder of this study is organized as follows.
Section 2 provides the details of the proposed method.
Section 3 describes the dataset and experimental setup.
Section 4 introduces the experimental results, comparing the MT-AENet to traditional and current state-of-the-art methods. A detailed calculation of the construction waste yield is presented.
Section 5 discusses the differences between the existing work and our approach.
Section 6 summarizes the conclusions and future research.
6. Conclusions and Future Directions
6.1. Conclusions
This study investigates CNN-based semantic segmentation of HRSIs to identify and extract buildings and construction waste. To overcome the limitations of traditional recognition network architectures, a novel encoder–decoder structure is designed, yielding the Multi-scale Target Attention-Enhanced Network (MT-AENet), which integrates multi-scale features. The MT-AENet exploits contextual information in HRSIs more effectively, improving recognition accuracy. Buildings in the study area are extracted with the MT-AENet, and engineering and demolition waste are calculated from the increase and decrease in building area between HRSI data from two consecutive years, yielding the annual production of construction waste. Over the same period, the MT-AENet is also used to extract and analyze changes in construction waste at the construction waste disposal sites. On this basis, the annual production of construction waste and the annual resource conversion rate of regional construction waste are accurately estimated. The experimental results support the following conclusions.
First, for building identification and extraction, the MT-AENet outperforms traditional networks, improving Precision, Recall, F1, and IoU by 0.33%, 1.94%, 1.3%, 1.67%, and 2.45%, respectively, and reducing the BER by 0.79%. For construction waste identification and extraction, the MT-AENet improves Precision, Recall, F1, and IoU by 1.85%, 1.57%, 1.04%, 1.12%, and 0.81%, respectively, and reduces the BER by 0.7%. The MT-AENet is thus a high-precision, flexible model for dynamic recognition in HRSIs.
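The metrics above are not defined in this section. The sketch below uses their standard pixel-level definitions for binary segmentation; the BER formula (one minus the mean of recall and specificity) is the common balanced error rate and is an assumption about the definition used here. The confusion counts in the usage line are made up.

```python
def segmentation_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Binary segmentation metrics from pixel-level confusion counts.

    Assumes non-degenerate counts (no zero denominators).
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    iou = tp / (tp + fp + fn)
    specificity = tn / (tn + fp)
    ber = 1 - 0.5 * (recall + specificity)  # balanced error rate (assumed definition)
    return {"precision": precision, "recall": recall, "f1": f1,
            "iou": iou, "ber": ber}


def confusion_counts(pred, truth):
    """Count (TP, FP, FN, TN) over flattened 0/1 prediction and ground-truth masks."""
    tp = sum(1 for p, t in zip(pred, truth) if p and t)
    fp = sum(1 for p, t in zip(pred, truth) if p and not t)
    fn = sum(1 for p, t in zip(pred, truth) if not p and t)
    tn = sum(1 for p, t in zip(pred, truth) if not p and not t)
    return tp, fp, fn, tn


m = segmentation_metrics(tp=80, fp=20, fn=10, tn=890)  # hypothetical counts
print({name: round(value, 4) for name, value in m.items()})
```

Because BER averages the error on both classes, it stays informative when background pixels vastly outnumber building or waste pixels, which plain pixel accuracy does not.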
Second, by dividing the study area into smaller administrative regions, building changes can be monitored dynamically, and the increase or decrease in building area in each region can be analyzed quickly and accurately. The calculations show that from 2019 to 2020, approximately 3.03 km² of buildings in Changping District were dismantled and renovated. The engineering waste generated during urban renewal ranged from 24,916 to 66,443 tons, averaging approximately 45,679.5 tons, and the demolition waste generated ranged from 3,089,887 to 5,021,067 tons, averaging approximately 4,055,477 tons. Summing the two averages gives an estimated annual construction waste production of approximately 4,101,156.5 tons.
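The reported averages are the midpoints of the engineering and demolition waste ranges, and the annual production is the sum of the two averages. The following snippet reproduces that arithmetic from the reported figures only; the constant and function names are illustrative.

```python
ENGINEERING_T = (24_916, 66_443)        # reported range for Changping, 2019-2020 (t)
DEMOLITION_T = (3_089_887, 5_021_067)   # reported range for Changping, 2019-2020 (t)


def midpoint(bounds):
    """Average of a (low, high) range."""
    low, high = bounds
    return (low + high) / 2


eng_avg = midpoint(ENGINEERING_T)
demo_avg = midpoint(DEMOLITION_T)
annual = eng_avg + demo_avg
print(eng_avg, demo_avg, annual)  # 45679.5 4055477.0 4101156.5
```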
Third, construction waste can be extracted from the disposal sites in Changping District. In 2019, the landfill volume of construction waste at these sites was approximately 32,798,392.32 m³, weighing 59,037,106.176 tons; in 2020, it was approximately 34,049,423.36 m³, weighing about 61,288,962.048 tons. The landfilled construction waste in Changping District therefore increased by approximately 2,251,855.872 tons over the two years. Subtracting this increase from the estimated annual production of 4,101,156.5 tons gives approximately 1,849,300.628 tons of construction waste indirectly calculated as converted into resources, corresponding to an annual resource conversion rate of 45.09%.
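The disposal-site figures can be checked the same way. The reported volume and mass pairs imply a bulk density of 1.8 t/m³ (59,037,106.176 t / 32,798,392.32 m³), and the resource-converted quantity is the annual production minus the increase in landfilled waste. The sketch below reproduces these steps from the reported figures; treating everything not landfilled as resource-converted follows the indirect calculation described above.

```python
# Figures reported above for Changping District.
VOLUME_2019_M3 = 32_798_392.32
VOLUME_2020_M3 = 34_049_423.36
ANNUAL_PRODUCTION_T = 4_101_156.5
# Bulk density implied by the reported volume/mass pairs.
DENSITY_T_PER_M3 = 1.8

mass_2019 = VOLUME_2019_M3 * DENSITY_T_PER_M3
mass_2020 = VOLUME_2020_M3 * DENSITY_T_PER_M3
landfill_increase = mass_2020 - mass_2019            # waste that stayed in the sites
converted = ANNUAL_PRODUCTION_T - landfill_increase  # remainder counted as converted
conversion_rate = converted / ANNUAL_PRODUCTION_T

print(f"landfill increase: {landfill_increase:,.3f} t")
print(f"resource conversion: {converted:,.3f} t ({conversion_rate:.2%})")
```

Because the conversion figure is a residual, any error in the building-based production estimate or in the density factor propagates directly into the resource conversion rate.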
6.2. Future Directions
This study is currently limited to Changping District, Beijing. In future work, remote sensing images collected from other areas will be used to extend it to larger study areas, with the aim of estimating the annual production of construction waste for an entire city, province, country, or even larger region and providing comprehensive, timely data to support urban renewal. This will aid in formulating scientifically sound urban renewal plans, reducing cost risks, and achieving the sustainable development goals of urban renewal.