Article

Developing a Prediction Model of Demolition-Waste Generation-Rate via Principal Component Analysis

1 School of Science and Technology Acceleration Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
2 School of Architectural, Civil, Environmental and Energy Engineering, Kyungpook National University, Daegu 41566, Republic of Korea
3 Industry Academic Cooperation Foundation, Kyungpook National University, Daegu 41566, Republic of Korea
* Author to whom correspondence should be addressed.
Int. J. Environ. Res. Public Health 2023, 20(4), 3159; https://doi.org/10.3390/ijerph20043159
Submission received: 13 January 2023 / Revised: 8 February 2023 / Accepted: 9 February 2023 / Published: 10 February 2023

Abstract

Construction and demolition waste accounts for a sizable proportion of global waste and is harmful to the environment. Its management is therefore a key challenge in the construction industry. Many researchers have utilized waste generation data for waste management, and more accurate and efficient waste management plans have recently been prepared using artificial intelligence models. Here, we developed a hybrid model to forecast the demolition-waste-generation rate in redevelopment areas in South Korea by combining principal component analysis (PCA) with decision tree, k-nearest neighbors, and linear regression algorithms. Without PCA, the decision tree model exhibited the highest predictive performance (R2 = 0.872) and the k-nearest neighbors (Chebyshev distance) model exhibited the lowest (R2 = 0.627). The hybrid PCA–k-nearest neighbors (Euclidean uniform) model exhibited significantly better predictive performance (R2 = 0.897) than the non-hybrid k-nearest neighbors (Euclidean uniform) model (R2 = 0.664) and the decision tree model. The mean observed value was 987.06 kg·m−2, and the mean values predicted by the k-nearest neighbors (Euclidean uniform) and PCA–k-nearest neighbors (Euclidean uniform) models were 993.54 kg·m−2 and 991.80 kg·m−2, respectively. Based on these findings, we propose the k-nearest neighbors (Euclidean uniform) model using PCA as a machine-learning model for demolition-waste-generation rate predictions.

1. Introduction

Owing to recent developments in artificial intelligence, machine-learning models have been widely studied for waste generation prediction. Machine learning has been successfully applied in developing waste generation prediction models, owing to its excellent ability to model complex mechanisms [1,2]. Information on waste generation is essential in waste management strategies, such as the planning of landfill spaces, calculation of levies on pollution sources or polluters and of subsidies for recycling, and establishment of corporate waste management policies [3].
With the rapid growth of urban populations, waste management has become an important issue for urban quality of life [4,5]. Waste treatment imposes social costs and environmental burdens, affecting carbon emissions, traffic congestion, and air quality. Proper waste management is therefore essential in constructing sustainable and comfortable urban environments [6,7,8]. Construction and demolition waste (CDW) is a key by-product of urbanization and substantially contributes to environmental degradation [9,10,11]. The construction industry is considered harmful to the environment because it is a major consumer of natural resources, materials, and energy; furthermore, the demolition of buildings at the end of their service life generates substantial amounts of waste [12,13]. CDW accounts for 35 to 40% of waste generation worldwide [14], 36% in the EU, and 67% in the United States [15]. Additionally, 70 to 90% of CDW is attributed to demolition waste [16,17]. As a result of its social and environmental effects and the large and increasing quantities generated, CDW presents a major challenge for the construction industry and has been the subject of much research, particularly during periods of rapid development [18,19].
Attempts have been made to develop predictive models for waste management in relation to waste generation, and several machine-learning algorithms have been utilized. For the period 2004 to 2019, Abdallah et al. [20] reported that the most commonly used machine-learning algorithms for the development of artificial intelligence models related to waste management were artificial neural networks (ANNs), support vector machines (SVMs), linear regression (LR) analysis, decision trees (DTs), and genetic algorithms (GAs), while random forest (RF) and k-nearest neighbor (KNN) algorithms have also been used. The artificial intelligence models used in different studies, applying these algorithms individually, exhibited variable predictive performances even when developed using the same algorithm. This variability is due to differences between the studies in the data used (for instance, in sample size and input variable characteristics), the particular advantages and disadvantages of each machine-learning algorithm, and the selection of hyperparameters, factors that affect and limit machine-learning models’ predictive performance. The advantages and disadvantages of each machine-learning algorithm constrain the potential improvements in performance that they can achieve.
To overcome such limitations, research has been conducted on the development of hybrid models combining various artificial intelligence systems; such hybrid models show better predictive performance than single algorithms. For example, a hybrid model to predict solid waste generation, applying the wavelet denoising method and partial least squares to SVM, achieved better predictive performance than the SVM model [21,22]. When developing a CDW-prediction model, applying a gray model improved the predictive performance of a support vector regression (SVR) model [14]. A hybrid model combining a GA and ANN to predict municipal solid waste achieved significantly better predictive performance than the ANN alone, raising the R2 from 0.13 to 0.78 [23]. A categorical principal component analysis (CATPCA) hybrid model using six input variables and generating six principal components (PCs) was applied to ANN, SVR, and RF to predict the demolition-waste-generation rate (DWGR). The hybrid PCA–SVR model achieved significantly better performance than SVR alone (SVR, R2 = 0.007; hybrid, R2 = 0.594) [24].
Based on these findings, the use of hybrid models has enabled continuous improvement in waste-generation-predictive performance and provides a means to develop effective artificial intelligence models for waste generation prediction. To further improve this, we aimed to develop a hybrid machine-learning model to predict DWGR in redevelopment areas in South Korea, by testing machine-learning algorithms appropriate for the data, selecting optimal hyperparameters, and combining them with artificial intelligence approaches. We focused on the DWGR because as a tool for developing waste management strategies, the waste generation rate has been widely applied in CDW management and research [17,25,26,27].
The main purpose of this study is to develop optimal machine-learning (ML) models to forecast DWGR. More specifically, we aim not only to develop a DWGR prediction model but also to suggest ways to improve its predictive performance. To this end, we present a new approach to using PCA and develop a hybrid model with high predictive performance.

2. Methods and Materials

Figure 1 summarizes the research workflow. We first constructed a dataset by collecting DWGR data for 160 buildings in redevelopment areas in South Korea. LR, KNN, and DT algorithms were considered for developing the models, and the optimal hyperparameters were selected for each machine-learning algorithm. To improve model performance, hybrid models were developed by applying PCA. Model validation was conducted using leave-one-out cross-validation, and performance was evaluated using mean squared error (MSE), root mean square error (RMSE), coefficient of determination (R2), and mean absolute error (MAE).

2.1. Data Collection and Preprocessing

We collected information on building characteristics (location, structure, usage, wall type, roof type, gross floor area (GFA, in m2), and number of floors) by directly surveying each building before demolition in two South Korean cities’ redevelopment areas, namely Project A in Daegu (35.88 N latitude, 128.61 E longitude) and Project B in Busan (35.87 N latitude, 128.63 E longitude). The study areas were urban regeneration districts, consisting largely of aged buildings that are primarily low-rise. As Korea plans to demolish old buildings via urban regeneration projects, the amount of demolition waste is expected to increase significantly. Hence, the current study aims to provide useful information to aid governments and other stakeholders in their waste management approach.
Following demolition, DWGR information was obtained from truck transport information. Mean DWGR differed clearly with location, usage, structure, wall type, and roof type (Table 1), differing significantly between Projects A (741.6 kg·m−2) and B (1238.9 kg·m−2). The mean DWGR of mixed-use (residential and commercial) buildings was approximately 30% higher than that of residential buildings. These factors were therefore used as the input variables to predict DWGR. For most buildings, the GFA and DWGR were within 300 m2 and 1800 kg·m−2, respectively, because they were mostly old low-rise buildings in redevelopment areas. DWGR was defined as follows:
$$DWGR_i = \frac{A\ \text{of building}_i}{GFA\ \text{of building}_i}$$
where $DWGR_i$ is the demolition-waste-generation rate (kg·m−2) and $A$ is the waste generated (kg) for building $i$.
To improve the machine-learning model performance by using a uniform data scale, the data were first normalized as follows:
$$x_{norm} = \frac{x - x_{min}}{x_{max} - x_{min}}$$
where $x$ is the data value, and $x_{max}$ and $x_{min}$ are the maximum and minimum values, respectively.
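The two definitions above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code: the function names and the three example buildings are hypothetical, and the handling of a constant column (mapped to zeros) is an assumption the text does not address.

```python
def dwgr(waste_kg, gfa_m2):
    """Demolition-waste-generation rate in kg per m2 of gross floor area."""
    if gfa_m2 <= 0:
        raise ValueError("gross floor area must be positive")
    return waste_kg / gfa_m2


def min_max_normalize(values):
    """Rescale values linearly to [0, 1]: (x - x_min) / (x_max - x_min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant column carries no scale information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical buildings as (waste in kg, GFA in m2); not from the dataset.
buildings = [(90_000.0, 100.0), (150_000.0, 120.0), (60_000.0, 80.0)]
rates = [dwgr(w, a) for w, a in buildings]
scaled = min_max_normalize(rates)
```

After scaling, the smallest rate maps to 0 and the largest to 1, so variables measured in different units contribute on a comparable scale to distance-based models such as KNN.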

2.2. Applied Machine-Learning Algorithms

Our data were low dimensional, comprising few input variables (five categorical and two numerical). We first applied the DT algorithm, which can be applied regardless of the input variable type. The KNN algorithm is suitable for low-dimensional data [28] but has rarely been applied in studies of waste generation, and it achieved poor predictive performance (R2 = 0.51) in a prior study [29]. For these reasons, we selected it for use here.
The LR algorithm has been widely used in studies of waste generation, with variable predictive performance [21]. As it assumes linear relationships, however, this method is not optimal for modelling highly non-linear data [21,28]. Nonetheless, we assumed that it would be useful for testing whether its performance can be improved by applying PCA.

2.2.1. Principal Component Analysis

PCA is a multivariate statistical method that reduces the complexity introduced by the inclusion of multiple variables [30] by condensing many variables into fewer variables (PCs) that explain the variance in the data. PCA converts the input variables into mutually uncorrelated linear combinations (the PCs) [31], which can then be used as input variables for artificial intelligence model development. Noori et al. [31] developed a hybrid PCA–SVM model for predicting solid waste generation and compared an SVM model using thirteen input variables with a hybrid model using six PCs; the hybrid model did not improve performance (SVM alone, R2 = 0.78; hybrid, R2 = 0.75). In another study, applying PCA improved the R2 of an ANN model from 0.77 to 0.80 [32].
PCA has been used to convert categorical or nominal variables into numerical variables [24,33,34]. PCA improved the performance of logistic regression in terms of accuracy and sensitivity, based on validation testing [20]. Further, PCA has been used to score social capital, as well as a nominal and ordinal variable [28]. For these reasons, we applied it in developing machine-learning models with high predictive performance.
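As a rough illustration of how PCs and their explained-variance shares can be obtained, the sketch below computes PCA with NumPy through a singular value decomposition of the centred data. It is not the procedure used in the study: the function name `pca`, the toy matrix, and the one-hot encoding of the categorical column are assumptions made only for this example.

```python
import numpy as np


def pca(X):
    """Return (scores, components, explained_variance_ratio) for X.

    Rows of X are samples and columns are variables. Components are the
    rows of the returned matrix, ordered by decreasing explained variance.
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                 # centre each column
    # SVD of the centred data: the rows of Vt are the principal axes.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                      # sample coordinates on the PCs
    ratio = s**2 / np.sum(s**2)             # share of variance per PC
    return scores, Vt, ratio


# Toy data (hypothetical): two numeric columns plus a one-hot category.
X = np.array([
    [0.10, 0.0, 1.0, 0.0],
    [0.35, 0.5, 0.0, 1.0],
    [0.80, 1.0, 0.0, 1.0],
    [0.55, 0.5, 1.0, 0.0],
    [0.95, 1.0, 0.0, 1.0],
])
scores, components, ratio = pca(X)
```

The columns of `scores` are mutually uncorrelated, and the entries of `ratio` sum to one, which is the sense in which a full set of PCs explains 100% of the variance.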

2.2.2. Linear Regression

LR is a supervised learning method that fits a linear equation relating the input values to the output value [35,36]. LR models are easy to interpret and have low computation costs, although they are generally considered unsuitable for modeling nonlinear data [20] and are prone to bias [28]. Because of its benefits, LR has been consistently applied in developing models to predict waste generation, such as for municipal solid waste [36,37,38,39,40,41].

2.2.3. K-Nearest Neighbor

KNN is a supervised learning algorithm that, for a pre-defined value of k, computes the distances between a query point and the training data and bases its prediction on the k nearest training samples [42]. KNN has been widely used for regression and classification in various fields due to its simplicity and intuitiveness. Three studies [28,29,37] have applied it to predict municipal solid waste and CDW generation. KNN is considered more suitable for low-dimensional data than data with many input variables [28].
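A minimal sketch of KNN regression may help make the roles of k, the distance metric, and the weighting concrete. The implementation below is illustrative only, in plain Python; the function names and the toy training set are hypothetical, and the treatment of an exact match under distance weighting is an assumption.

```python
import math


def distance(a, b, metric="euclidean"):
    """Distance between two equal-length feature vectors."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if metric == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if metric == "manhattan":
        return sum(diffs)
    if metric == "chebyshev":
        return max(diffs)
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(X_train, y_train, x, k=3, metric="euclidean", weights="uniform"):
    """Predict a target for x from its k nearest training samples."""
    nearest = sorted(
        (distance(row, x, metric), target) for row, target in zip(X_train, y_train)
    )[:k]
    if weights == "uniform":
        return sum(t for _, t in nearest) / len(nearest)
    # Distance weighting: closer neighbours count more.
    exact = [t for d, t in nearest if d == 0.0]
    if exact:
        # Assumption: an exact match determines the prediction outright.
        return sum(exact) / len(exact)
    inv = [(1.0 / d, t) for d, t in nearest]
    return sum(w * t for w, t in inv) / sum(w for w, _ in inv)


# Hypothetical normalized features and DWGR targets (kg per m2).
X_train = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y_train = [700.0, 900.0, 1100.0, 1300.0]
uniform_pred = knn_predict(X_train, y_train, [0.2, 0.1], k=2)
```

With uniform weighting, the prediction is the plain average of the k nearest targets; with distance weighting, each neighbour contributes in proportion to the inverse of its distance.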

2.2.4. Decision Tree

DT algorithms predict the final dependent variable by constructing a complex decision-making process that combines several simple decision-making steps; this non-parametric model is used for both regression analysis and classification [43]. DT learning is a supervised predictive model that connects the observed and target values for each item and approximates a function by applying a series of hierarchical rules [44]. DT algorithms allow easy interpretation, have low computational costs [12], and can be applied to both numerical and categorical data, although their performance is reduced by overfitting. They have been applied to CDW prediction [37,45,46] and municipal solid waste prediction [47,48].

2.3. Hyperparameter Tuning

A model’s hyperparameters significantly affect its predictive performance, robustness, and generalizability [49]. We therefore performed hyperparameter (HP) tuning for the applied algorithms (DT, KNN, and LR) to optimize the models (Table 2). For the KNN model, HP tuning was conducted for the metrics (Euclidean, Manhattan, and Chebyshev), weighting (distance and uniform), and k values (also known as k_neighbors), and the model was optimized at different values depending on the metrics, weights, and k values (Figure 2). For the LR model, we considered ridge, lasso, and elastic net regularization in addition to the original LR, adjusting the regularization strength (alpha value) accordingly (Figure 3). For the DT model, we adjusted the maximum tree depth, minimum number of samples for each leaf, and split criteria to prevent overfitting and optimize the model (Figure 4).
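The KNN tuning described above amounts to an exhaustive search over a small grid of metrics, weightings, and k values. The sketch below shows one way such a search could look in plain Python; it is an assumption-laden illustration, not the study's tuning code. The scoring rule (leave-one-out mean absolute error), the epsilon used in distance weighting, the function names, and the toy data are all hypothetical choices.

```python
import itertools
import math

METRICS = {
    "euclidean": lambda a, b: math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b))),
    "manhattan": lambda a, b: sum(abs(x - y) for x, y in zip(a, b)),
    "chebyshev": lambda a, b: max(abs(x - y) for x, y in zip(a, b)),
}


def knn_predict(X, y, query, k, metric, weights):
    dist = METRICS[metric]
    nearest = sorted((dist(row, query), t) for row, t in zip(X, y))[:k]
    if weights == "uniform":
        return sum(t for _, t in nearest) / len(nearest)
    # "distance" weighting; a small epsilon avoids division by zero.
    w = [1.0 / (d + 1e-12) for d, _ in nearest]
    return sum(wi * t for wi, (_, t) in zip(w, nearest)) / sum(w)


def loocv_mae(X, y, k, metric, weights):
    """Mean absolute error when each sample is predicted from the others."""
    errors = []
    for i in range(len(X)):
        X_rest, y_rest = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        pred = knn_predict(X_rest, y_rest, X[i], k, metric, weights)
        errors.append(abs(pred - y[i]))
    return sum(errors) / len(errors)


def grid_search(X, y, ks, metrics, weightings):
    """Return (best_mae, best_params) over the full Cartesian grid."""
    results = [
        (loocv_mae(X, y, k, m, w), {"k": k, "metric": m, "weights": w})
        for k, m, w in itertools.product(ks, metrics, weightings)
    ]
    return min(results, key=lambda r: r[0])


# Hypothetical normalized inputs and DWGR targets.
X = [[0.0, 0.0], [0.1, 0.0], [0.2, 1.0], [0.3, 1.0], [0.4, 0.0], [0.5, 1.0]]
y = [700.0, 760.0, 1020.0, 1080.0, 940.0, 1200.0]
best_mae, best_params = grid_search(X, y, [1, 2, 3], list(METRICS), ["uniform", "distance"])
```

The same pattern extends to the other algorithms by swapping the predictor and the grid, for example tree depth and minimum leaf size for DT or the alpha value for regularized LR.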

2.4. Model Validation and Evaluation

To verify model performance, we applied leave-one-out cross-validation, a special case of k-fold cross-validation; this method can achieve more stable results than k-fold cross-validation for small datasets because it uses all samples for testing and training to ensure sufficient sample sizes [50,51,52,53,54,55].
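Leave-one-out cross-validation can be written as a short loop: each sample is held out once, the model is trained on the remaining samples, and the held-out sample is predicted. The sketch below is a generic illustration; the `fit_predict` callback interface and the mean-value placeholder model are assumptions for the example, not part of the study.

```python
def leave_one_out(X, y, fit_predict):
    """Return one out-of-sample prediction per sample.

    fit_predict(X_train, y_train, x_test) must train on the given samples
    and return a prediction for the single held-out sample.
    """
    predictions = []
    for i in range(len(X)):
        X_train = X[:i] + X[i + 1:]     # all samples except the i-th
        y_train = y[:i] + y[i + 1:]
        predictions.append(fit_predict(X_train, y_train, X[i]))
    return predictions


def mean_baseline(X_train, y_train, x_test):
    """Placeholder model: always predicts the training mean."""
    return sum(y_train) / len(y_train)


# Hypothetical data; any model with the same callback shape can be used.
X = [[0.0], [0.5], [1.0], [1.5]]
y = [800.0, 900.0, 1000.0, 1100.0]
preds = leave_one_out(X, y, mean_baseline)
```

With n samples the model is trained n times, so every building serves once as a test case while the other n − 1 form the training set.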
To evaluate DWGR prediction model performance we used the MAE, RMSE, and R2:
$$MAE = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n}$$
$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - x_i)^2}{n}}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (y_i - x_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$
where $x_i$ and $y_i$ are, respectively, the observed and predicted quantities of demolition waste generated; $\bar{x}$ is the average of the $x_i$; and $n$ is the number of samples. The performance of the models is considered higher as the R2 value increases and the MAE and RMSE values decrease.
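The three metrics can be computed directly from paired observed and predicted values. The sketch below is illustrative; the function names and the three-point example are hypothetical. R2 is computed in its conventional form, with the denominator taken as the total sum of squares of the observed values about their mean.

```python
import math


def mae(observed, predicted):
    """Mean absolute error."""
    return sum(abs(p - o) for o, p in zip(observed, predicted)) / len(observed)


def rmse(observed, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((p - o) ** 2 for o, p in zip(observed, predicted)) / len(observed)
    )


def r2(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((p - o) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted DWGR values (kg per m2).
observed = [800.0, 1000.0, 1200.0]
predicted = [850.0, 950.0, 1250.0]
scores = {
    "MAE": mae(observed, predicted),
    "RMSE": rmse(observed, predicted),
    "R2": r2(observed, predicted),
}
```

MAE and RMSE are expressed in the units of the target (here kg·m−2), whereas R2 is dimensionless.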

3. Results

3.1. Principal Component Analysis

We selected the thirteen PCs that explained 100% of the variance, with PC1 explaining 33.1% (Figure 5). These were used as new input variables to develop a hybrid machine-learning model for DWGR prediction. The variable coefficients (or eigenvectors) of PCs created by the PCA technique are shown in Table 3.

3.2. Input Variable Selection

We developed non-hybrid and hybrid models for DWGR prediction using the seven input variables and 13 PCs (Table 4). Among the non-hybrid models, the model using all seven input variables showed the best predictive performance, highlighting the importance of model testing. To generate the hybrid models, we used all seven input variables and thirteen PCs and selected the input variables that produced the best performance for each hybrid model.

3.3. Model Performance

Of the non-hybrid models, the KNN (Chebyshev distance) model performed the worst (R2 = 0.627, RMSE = 194.417, and MAE = 133.752), while the DT model performed the best (R2 = 0.872, RMSE = 113.976, and MAE = 71.432). The KNN (Euclidean uniform) model achieved intermediate performance (average predicted DWGR = 993.54 kg·m−2, R2 = 0.664, RMSE = 184.467, and MAE = 127.125) (Figure 6). The KNN algorithm performed significantly worse than the LR and DT models.
Overall, the hybrid models (R2: 0.836–0.897) performed significantly better than the non-hybrid models, with the LR and KNN hybrids performing better in terms of R2, RMSE, and MAE. The KNN hybrid models performed significantly better than the LR hybrid, while the hybrid PCA−DT model (R2 = 0.849, RMSE = 123.538, MAE = 80.472) performed worse than the DT model alone (R2 = 0.872). The KNN hybrid models achieved the best predictive performance, with Euclidean uniform KNN ranking highest (PCA−KNN (Chebyshev distance): R2 = 0.872, RMSE = 114.005, and MAE = 72.255; PCA−KNN (Euclidean uniform): average predicted DWGR = 991.80 kg·m−2, R2 = 0.897, RMSE = 102.116, and MAE = 69.013). Based on these results, we selected the PCA–KNN (Euclidean uniform) model as the optimal model for DWGR prediction.
Compared to previously reported PCA hybrid models [24,31,32], our PCA hybrid models achieved better performance. Previously, the PCA−SVM and PCA−ANN hybrid models performed slightly better (R2 = 0.78 and R2 = 0.80, respectively) than SVM and ANN alone (R2 = 0.75 and R2 = 0.77, respectively) [32,56]. These findings indicate that the improvement in performance achieved by using PCA varies significantly depending on the algorithm used. Similarly, hybrid models combining CATPCA with SVM, RF, and ANN were tested, with the hybrid models performing significantly better for SVM but worse for RF [24].

3.4. Optimal Model Performance

Including PCA significantly improved DWGR predictive performance for the KNN algorithm, particularly for the Euclidean uniform method (Figure 7), with the PCA−KNN (Euclidean uniform) model achieving more accurate predictions. Although many of the KNN (Euclidean uniform) model predictions fell outside the 20% error margin, most were within it. Although the KNN (Euclidean uniform) model’s DWGR predictions differed significantly from the observed values (Figure 8), they appear to provide a good approximation. This is supported by the fact that the KNN (Euclidean uniform) hybrid model has an error rate of 7% relative to the mean of the observed values, compared with 13% for the non-hybrid model.
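The text does not state how the 7% and 13% error rates were calculated. One reading that reproduces both figures, offered here as an assumption, is the MAE divided by the mean of the observed values, using the MAE values reported in Section 3.3 and the observed mean of 987.06 kg·m−2. The function name below is hypothetical.

```python
observed_mean = 987.06   # kg per m2, mean of the observed DWGR
mae_knn = 127.125        # non-hybrid KNN (Euclidean uniform)
mae_pca_knn = 69.013     # hybrid PCA-KNN (Euclidean uniform)


def relative_error_pct(mae, mean_observed):
    """MAE expressed as a percentage of the mean observed value."""
    return 100.0 * mae / mean_observed


rel_knn = relative_error_pct(mae_knn, observed_mean)
rel_pca_knn = relative_error_pct(mae_pca_knn, observed_mean)
```

Rounded to whole percentages, these ratios give 13% for the non-hybrid model and 7% for the hybrid model.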

3.5. Importance of Input Variables in the Optimal Model

We examined the importance of the input variables used in the KNN (Euclidean uniform) and PCA−KNN (Euclidean uniform) models using Pearson’s correlation analysis (Table 5). For the non-hybrid model, the number of floors, the floor area, and building usage were highly important, while location was negatively correlated with DWGR. For the hybrid model, PC1, the number of floors, and the floor area were highly correlated with DWGR, whereas PC4 and PC2 were negatively correlated with it; PC1 thus provided a new input variable that significantly affects DWGR predictions. In both models, the number of floors and the floor area were important factors.
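Pearson's correlation coefficient between one input variable and DWGR can be computed as follows. This is a minimal sketch with a hypothetical function name and hypothetical values, not the data in Table 5.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


# Hypothetical example: number of floors against DWGR (kg per m2).
floors = [1.0, 2.0, 2.0, 3.0, 4.0]
dwgr_values = [700.0, 900.0, 950.0, 1200.0, 1400.0]
r_floors = pearson_r(floors, dwgr_values)
```

A coefficient near +1 or −1 indicates a strong linear association with DWGR, and the sign gives its direction; the coefficient does not capture non-linear relationships.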

4. Discussion and Recommendations

This study examined the development of hybrid ML models that use PCA to enhance DWGR prediction with the DT, KNN, and LR algorithms. It differs from previous studies, which used the PCs generated by PCA as the only input variables. For instance, previous studies that employed PCA to predict municipal solid waste (MSW) and C&D waste generation [3,24,31,32,57] used a limited number of PCs generated through PCA as the model input variables. Cha et al. [24] developed PCA−SVM and PCA−ANN models using the six PCs generated by applying CATPCA to the six input variables of the raw data, and presented the SVM (6 PCs) model (R2 = 0.594) as the model with the best predictive performance. Lu et al. [3] applied PCA to a multiple linear regression (MLR) model to predict construction waste generation, developing a PCA–MLR model by converting five input variables (i.e., population, GDP per capita, total construction output, floor space of newly started buildings, and floor space) into PCs; the PCA−MLR model using all five PCs performed better than the other PCA−MLR models. Noori et al. [31] developed a PCA–SVM model to predict MSW generation and proposed a 6 PCs−SVM model (R2 = 0.752) as the best prediction model. Noori et al. [32] compared an ANN model with a PCA−ANN model and showed that the PCA−ANN model improved performance; in that study, the ANN (7 PCs) model (R2 = 0.80), which used 7 of the 13 PCs, performed best. Shi et al. [57] studied the factors affecting MSW generation in Africa by applying PCA to five input variables (i.e., GDP per capita, geographical location, urbanization rate, legal integrity, and law enforcement time), and showed a high correlation with PC1 in 54 African countries. PC1 had large loadings on three input variables (i.e., GDP per capita, geographical location, and urbanization rate), which were presented as key factors affecting MSW generation in African countries. Sun et al. [56] developed an MSW prediction model by applying PCA to regression analysis and presented the PCA regression model using 3 of 7 PCs as the best-performing model.
As mentioned above, previous studies utilized part or all of the PCs that were generated using PCA technology. On the other hand, our study developed a new hybrid ML model by combining the 13 PCs generated through PCA with the existing set of input variables (i.e., location, usage, structure, roof type, wall type, floor area, and number of floors). Through this method, we found that the KNN algorithm showed better predictive performance results than the LR and DT algorithms when combined with PCA. However, it should be noted that these results reflect the characteristics of the input variables used in this study (i.e., five categorical variables and two numerical variables). Additionally, as shown in Table 4, in order to develop an optimal performance ML prediction model, it is necessary to test the model by structuring various input variable groups using the PCs generated through PCA. In other words, empirical attempts through tests on various input variable sets and application of ML algorithms are vital to developing a hybrid ML model with superior predictive performance by applying PCA.
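The idea of supplementing the original input variables with PCs, rather than replacing them, can be sketched as a column-wise concatenation. The example below uses NumPy and is only an illustration under stated assumptions: the function name `augment_with_pcs`, the optional `n_components` argument, and the toy matrix are hypothetical, and the encoding of categorical variables is left out.

```python
import numpy as np


def augment_with_pcs(X, n_components=None):
    """Append principal-component scores of X to X itself."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                       # centre each column
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt.T                            # PC scores per sample
    if n_components is not None:
        scores = scores[:, :n_components]         # keep the leading PCs
    return np.hstack([X, scores])                 # original inputs + PCs


# Hypothetical normalized inputs: five samples, three variables.
X = np.array([
    [0.1, 0.0, 1.0],
    [0.4, 0.5, 0.0],
    [0.8, 1.0, 0.0],
    [0.5, 0.5, 1.0],
    [0.9, 1.0, 0.0],
])
augmented = augment_with_pcs(X)
```

Different candidate input sets (original variables only, PCs only, or subsets of both) can then be formed by selecting columns of the augmented matrix and compared under the same validation scheme.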

5. Conclusions

In this study, we developed three hybrid PCA−machine-learning models (using the DT, KNN, and LR algorithms) and optimized the hyperparameters to improve DWGR predictive performance. Among the non-hybrid models, DT achieved the best performance and KNN (Chebyshev distance) achieved the worst performance. Including PCA improved the predictive performance of the LR and KNN models, with the KNN (Euclidean uniform) hybrid model achieving the best performance.
It is typically difficult to collect DWGR data in the field and, particularly, sufficient data for machine-learning-model development. Using too few variables, or samples that are too small, limits the development of machine-learning models with high predictive performance. Based on our findings, PCA is effective for developing new input variable sets to supplement low-dimensional input data for DWGR prediction. Therefore, we expect that this method will assist companies and researchers in this field to overcome these obstacles and develop high-performance DWGR prediction models. Follow-up research on machine-learning algorithms, including empirical studies, will further improve the performance of machine-learning models for DWGR prediction.
This study has several limitations. First, the ML model development process demonstrated here is unfavorable from a cost perspective, which warrants the future development of a simpler and more cost-effective ML model that still secures predictive performance. In addition, because demolition waste varies in composition, the DWGR model should be developed alongside a waste management strategy that considers the different types of demolition waste.

Author Contributions

Conceptualization, methodology, validation, and supervision: G.-W.C. and S.-H.C.; Resources, writing—reviewing and editing, and funding acquisition: G.-W.C., S.-H.C., W.-H.H. and C.-W.P. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported in part by the National Research Foundation of Korea (NRF) grant, funded by the South Korean government (MSIT) (NRF-2019R1A2C1088446).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

All data included in this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

DT, decision tree; DWGR, demolition-waste-generation rate; HP, hyperparameter; KNN, k-nearest neighbors; LR, linear regression; MAE, mean absolute error; PCA, principal component analysis; R2, coefficient of determination; RMSE, root mean square error; WG, waste generation.

References

  1. Zamani Joharestani, M.; Cao, C.; Ni, X.; Bashir, B.; Talebiesfandarani, S. PM2. 5 prediction based on random forest, XGBoost, and deep learning using multisource remote sensing data. Atmosphere 2019, 10, 373. [Google Scholar] [CrossRef]
  2. Ye, Z.; Yang, J.; Zhong, N.; Tu, X.; Jia, J.; Wang, J. Tackling environmental challenges in pollution controls using artificial intelligence: A review. Sci. Total Environ. 2020, 699, 134279. [Google Scholar] [CrossRef]
  3. Lu, W.; Lou, J.; Webster, C.; Xue, F.; Bao, Z.; Chi, B. Estimating construction waste generation in the Greater Bay Area, China using machine learning. Waste Manag. 2021, 134, 78–88. [Google Scholar] [CrossRef] [PubMed]
  4. World Health Organization. Hidden Cities: Unmasking and Overcoming Health Inequities in Urban Settings; World Health Organization: Geneva, Switzerland, 2010. [Google Scholar]
  5. Leão, S.; Bishop, I.; Evans, D. Spatial-temporal model for demand and allocation of waste landfills in growing urban regions. Comput. Environ. Urban Syst. 2004, 28, 353–385. [Google Scholar] [CrossRef]
  6. Adeyemi, A.S.; Olorunfemi, J.F.; Adewoye, T.O. Waste Scavenging in third world cities: A case study in Ilorin, Nigeria. Environmentalist 2001, 21, 93–96. [Google Scholar] [CrossRef]
  7. Esin, T.; Cosgun, N. A study conducted to reduce construction waste generation in Turkey. Build. Environ. 2007, 42, 1667–1674. [Google Scholar] [CrossRef]
  8. Kontokosta, C.E.; Hong, B.; Johnson, N.E.; Starobin, D. Using machine learning and small area estimation to predict building-level municipal solid waste generation in cities. Comput. Environ. Urban Syst. 2018, 70, 151–162. [Google Scholar] [CrossRef]
  9. Lu, W.; Yuan, H. Exploring critical success factors for waste management in construction projects of China. Resour. Conserv. Recycl. 2010, 55, 201–208. [Google Scholar] [CrossRef]
  10. Coelho, A.; de Brito, J. Influence of construction and demolition waste management on the environmental impact of buildings. Waste Manag. 2012, 32, 532–541. [Google Scholar] [CrossRef]
  11. Lu, W.; Tam, V. Construction waste management policies and their effectiveness in Hong Kong: A longitudinal review. Renew. Sustain. Energy Rev. 2013, 23, 214–223. [Google Scholar] [CrossRef]
  12. Kulatunga, U.; Amaratunga, D.; Haigh, R.; Rameezdeen, R. Attitudes and perceptions of construction workforce on construction waste in Sri Lanka. Manag. Environ. Qual. Int. J. 2006, 17, 57–72. [Google Scholar] [CrossRef]
  13. Song, Y.; Wang, Y.; Liu, F.; Zhang, Y. Development of a hybrid model to predict construction and demolition waste: China as a case study. Waste Manag. 2017, 59, 350–361. [Google Scholar] [CrossRef]
  14. Wu, H.; Zuo, J.; Zillante, G.; Wang, J.; Yuan, H. Status quo and future directions of construction and demolition waste research: A critical review. J. Cleaner Prod. 2019, 240, 118163. [Google Scholar] [CrossRef]
  15. López-Ruiz, L.A.; Roca-Ramón, X.; Gassó-Domingo, S. The circular economy in the construction and demolition waste sector— A review and an integrative model approach. J. Clean. Prod. 2020, 248, 119238. [Google Scholar] [CrossRef]
  16. Butera, S.; Christensen, T.H.; Astrup, T.F. Composition and leaching of construction and demolition waste: Inorganic elements and organic compounds. J. Hazard. Mater. 2014, 276, 302–311. [Google Scholar] [CrossRef] [PubMed]
  17. Lu, W.; Yuan, H.; Li, J.; Hao, J.J.; Mi, X.; Ding, Z. An empirical investigation of construction and demolition waste generation rates in Shenzhen City, South China. Waste Manag. 2011, 31, 680–687. [Google Scholar] [CrossRef] [PubMed]
  18. Ma, Z.; Shen, J.; Wang, C.; Wu, H. Characterization of Sustainable Mortar Containing High-Quality Recycled Manufactured Sand Crushed from Recycled Coarse Aggregate. Cem. Concr. Compos. 2022, 132, 104629. [Google Scholar] [CrossRef]
  19. Wu, H.; Xu, J.; Yang, D.; Ma, Z. Utilizing thermal activation treatment to improve the properties of waste cementitious powder and its newmade cementitious materials. J. Clean. Prod. 2021, 322, 129074. [Google Scholar] [CrossRef]
  20. Abdallah, M.; Talib, M.A.; Feroz, S.; Nasir, Q.; Abdalla, H.; Mahfood, B. Artificial intelligence applications in solid waste management: A systematic research review. Waste Manag. 2020, 109, 231–246. [Google Scholar] [CrossRef]
  21. Abbasi, M.; Abduli, M.A.; Omidvar, B.; Baghvand, A. Forecasting municipal solid waste generation by hybrid support vector machine and partial least square model. Int. J. Environ. Res. 2013, 7, 27–38. [Google Scholar]
  22. Abbasi, M.; Abduli, M.A.; Omidvar, B.; Baghvand, A. Results uncertainty of support vector machine and hybrid of wavelet transform-support vector machine models for solid waste generation forecasting. Environ. Prog. Sustain. Energy. 2014, 33, 220–228. [Google Scholar] [CrossRef]
  23. Soni, U.; Roy, A.; Verma, A.; Jain, V. Forecasting municipal solid waste generation using artificial intelligence models—A case study in India. SN Appl. Sci. 2019, 1, 162. [Google Scholar] [CrossRef]
  24. Cha, G.W.; Moon, H.J.; Kim, Y.C. A hybrid machine-learning model for predicting the waste generation rate of building demolition projects. J. Cleaner Prod. 2022, 375, 134096. [Google Scholar] [CrossRef]
  25. Cochran, K.; Townsend, T.; Reinhart, D.; Heck, H. Estimation of regional building-related C&D debris generation and composition: Case study for Florida, US. Waste Manag. 2007, 7, 921–931. [Google Scholar] [CrossRef]
  26. Kartam, N.; Al-Mutairi, N.; Al-Ghusain, I.; Al-Humoud, J. Environmental management of construction and demolition waste in Kuwait. Waste Manag. 2004, 24, 1049–1059. [Google Scholar] [CrossRef] [PubMed]
  27. Martínez-Lage, I.; Martínez-Abella, F.; Vázquez-Herrero, C.; Pérez-Ordóñez, J.L. Estimation of the annual production and composition of C&D Debris in Galicia (Spain). Waste Manag. 2010, 30, 636–645. [Google Scholar] [CrossRef]
  28. Nguyen, X.C.; Nguyen, T.T.H.; La, D.D.; Kumar, G.; Rene, E.R.; Nguyen, D.D.; Chang, S.W.; Chung, W.J.; Nguyen, X.H.; Nguyen, V.K. Development of machine learning-based models to forecast solid waste generation in residential areas: A case study from Vietnam. Resour. Conserv. Recycl. 2021, 167, 105381. [Google Scholar] [CrossRef]
  29. Abbasi, M.; El Hanandeh, A. Forecasting municipal solid waste generation using artificial intelligence modelling approaches. Waste Manag. 2016, 56, 13–22. [Google Scholar] [CrossRef]
  30. Çamdeviren, H.; Demir, N.; Kanik, A.; Keskin, S. Use of principal component scores in multiple linear regression models for prediction of Chlorophyll-a in reservoirs. Ecol. Model. 2005, 181, 581–589. [Google Scholar] [CrossRef]
  31. Noori, R.; Abdoli, M.A.; Ameri-Ghasrodashti, A.; Jalili-Ghazizade, M. Prediction of municipal solid waste generation with combination of support vector machine and principal component analysis: A case study of Mashhad. Environ. Prog. Sustain. Energy 2008, 28, 249–258. [Google Scholar] [CrossRef]
  32. Noori, R.; Karbassi, A.; Salman Sabahi, M.S. Evaluation of PCA and gamma test techniques on ANN operation for weekly solid waste prediction. J. Environ. Manag. 2010, 91, 767–771. [Google Scholar] [CrossRef] [PubMed]
  33. Khikmah, L.; Wijayanto, H.; Syafitri, U.D. Modeling governance KB with CATPCA to overcome multicollinearity in the logistic regression. In Proceedings of the 3rd International Conference on Mathematics, Science and Education (ICMSE), Semarang, Indonesia, 3–4 September 2016. [Google Scholar]
  34. Saukani, N.; Ismail, N.A. Identifying the Components of Social Capital by Categorical Principal Component Analysis (CATPCA). Soc. Indic. Res. 2019, 141, 631–655. [Google Scholar] [CrossRef]
  35. Abdi, H. The method of least squares. In Encyclopedia of Measurement and Statistics; SAGE Publishing: Thousand Oaks, CA, USA, 2007. [Google Scholar]
  36. Azadi, S.; Karimi-Jashni, A. Verifying the performance of artificial neural network and multiple linear regression in predicting the mean seasonal municipal solid waste generation rate: A case study of Fars province, Iran. Waste Manag. 2016, 48, 14–23. [Google Scholar] [CrossRef]
  37. Cha, G.W.; Choi, S.H.; Hong, W.H.; Park, C.W. Development of machine learning model for prediction of demolition waste generation rate of buildings in redevelopment areas. Int. J. Environ. Res. Public Health 2023, 20, 107. [Google Scholar] [CrossRef]
  38. Chhay, L.; Reyad, M.A.H.; Suy, R.; Islam, M.R.; Mian, M.M. Municipal solid waste generation in China: Influencing factor analysis and multi-model forecasting. J. Mater. Cycles Waste Manag. 2018, 20, 1761–1770. [Google Scholar] [CrossRef]
  39. Fu, H.Z.; Li, Z.S.; Wang, R.H. Estimating municipal solid waste generation by different activities and various resident groups in five provinces of China. Waste Manag. 2015, 41, 3–11. [Google Scholar] [CrossRef]
  40. Golbaz, S.; Nabizadeh, R.; Sajadi, H.S. Comparative study of predicting hospital solid waste generation using multiple linear regression and artificial intelligence. J. Environ. Health Sci. Eng. 2019, 17, 41–51. [Google Scholar] [CrossRef]
  41. Kumar, A.; Samadder, S.R. An empirical model for prediction of household solid waste generation rate–A case study of Dhanbad, India. Waste Manag. 2017, 68, 3–15. [Google Scholar] [CrossRef] [PubMed]
  42. Guo, G.; Wang, H.; Bell, D.A.; Bi, Y.; Greer, K. KNN Model-Based Approach in Classification. In Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2003; Volume 2888, pp. 986–996. [Google Scholar] [CrossRef]
  43. Safavian, S.R.; Landgrebe, D. A survey of decision tree classifier methodology. IEEE Trans. Syst. Man Cybern. 1991, 21, 660–674. [Google Scholar] [CrossRef]
  44. Hastie, T.J.; Tibshirani, R.J.; Friedman, J.H. The Elements of Statistical Learning: Data Mining, Inference, and Prediction; Springer: New York, NY, USA, 2009. [Google Scholar]
  45. Cha, G.W.; Kim, Y.C.; Moon, H.J.; Hong, W.H. New approach for forecasting demolition waste generation using chi-squared automatic interaction detection (CHAID) method. J. Clean. Prod. 2017, 168, 375–385. [Google Scholar] [CrossRef]
  46. Kuritcyn, P.; Anding, K.; Linß, E.; Latyev, S.M. Increasing the safety in recycling of construction and demolition waste by using supervised machine learning. J. Phys. Conf. Ser. 2015, 588, 012035. [Google Scholar] [CrossRef]
  47. Huang, X.; Xu, X. Legal regulation perspective of eco-efficiency construction waste reduction and utilization. Urban Dev. Stud. 2011, 9, 90–94. [Google Scholar]
  48. Kannangara, M.; Dua, R.; Ahmadi, L.; Bensebaa, F. Modeling and prediction of regional municipal solid waste generation and diversion in Canada using machine learning approaches. Waste Manag. 2018, 74, 3–15. [Google Scholar] [CrossRef]
  49. Shawi, R.E.; Maher, M.; Sakr, S. Automated machine learning: State-of-the-art and open challenges. arXiv 2019, arXiv:1906.02287. [Google Scholar] [CrossRef]
  50. Cha, G.W.; Moon, H.J.; Kim, Y.M.; Hong, W.H.; Hwang, J.H.; Park, W.J.; Kim, Y.C. Development of a prediction model for demolition waste generation using a random forest algorithm based on small datasets. Int. J. Environ. Res. Public Health 2020, 17, 6997. [Google Scholar] [CrossRef]
  51. Cha, G.W.; Moon, H.J.; Kim, Y.C. Comparison of random forest and gradient boosting machine models for predicting demolition waste based on small datasets and categorical variables. Int. J. Environ. Res. Public Health 2021, 18, 8530. [Google Scholar] [CrossRef]
  52. Cheng, H.; Garrick, D.J.; Fernando, R.L. Efficient strategies for leave-one-out cross validation for genomic best linear unbiased prediction. J. Anim. Sci. Biotechnol. 2017, 8, 38. [Google Scholar] [CrossRef]
  53. Shao, Z.; Er, M.J. Efficient leave-one-out cross-validation-based regularized extreme learning machine. Neurocomputing 2016, 194, 260–270. [Google Scholar] [CrossRef]
  54. Cheng, J.; Dekkers, J.C.; Fernando, R.L. Cross-validation of best linear unbiased predictions of breeding values using an efficient leave-one-out strategy. J. Anim. Breed. Genet. 2021, 138, 519–527. [Google Scholar] [CrossRef] [PubMed]
  55. Wong, T.T. Performance evaluation of classification algorithms by k-fold and leave-one-out cross validation. Pattern Recognit. 2015, 48, 2839–2846. [Google Scholar] [CrossRef]
  56. Sun, N.C.S.; Chungpaibulpatana, S. Development of an appropriate model for forecasting municipal solid waste generation in Bangkok. Energy Procedia 2017, 138, 907–912. [Google Scholar] [CrossRef]
  57. Shi, Y.; Wang, Y.; Yue, Y.; Zhao, J.; Maraseni, T.; Qian, G. Unbalanced Status and Multidimensional Influences of Municipal Solid Waste Management in Africa. Chemosphere 2021, 281, 130884. [Google Scholar] [CrossRef] [PubMed]
Figure 1. Research workflow used in developing the demolition-waste-generation rate (DWGR) predictive model.
Figure 2. K-nearest neighbor (KNN) model performance according to hyperparameter values. (a) Manhattan distance, (b) Euclidean distance, (c) Chebyshev distance, (d) Manhattan uniform, (e) Euclidean uniform, and (f) Chebyshev uniform. The red diamond marks the hyperparameter value that gave the best performance.
Figure 3. Linear regression model performance according to hyperparameter values. (a) Ridge, (b) Lasso, and (c) Elastic net regularization. The red diamond marks the hyperparameter value that gave the best performance.
Figure 4. Decision tree model performance according to hyperparameter values (the minimum number of samples for each leaf [minimum samples leaf], split criteria, and maximum tree depth). The red diamond marks the hyperparameter value that gave the best performance.
Figure 5. Proportion of variance explained by the number of principal components used.
Figure 6. Performance of non-hybrid and hybrid machine-learning models for DWGR prediction.
Figure 7. Observed and predicted demolition-waste-generation rate values, comparing the KNN (Euclidean uniform) and PCA−KNN (Euclidean uniform) models. Gray lines: ±20% error on the observed values.
Figure 8. Comparison of prediction results by KNN (Euclidean uniform) and PCA−KNN (Euclidean uniform).
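The ±20% error band drawn in Figure 7 can be expressed as a simple check on each observed and predicted pair. The sketch below is illustrative only: the function names and the sample values are hypothetical and are not taken from the study's data.

```python
def within_band(observed, predicted, tolerance=0.20):
    """Return True if the prediction lies within +/- tolerance of the observed value."""
    return abs(predicted - observed) <= tolerance * abs(observed)


def share_within_band(observed_values, predicted_values, tolerance=0.20):
    """Fraction of predictions that fall inside the error band."""
    if len(observed_values) != len(predicted_values) or not observed_values:
        raise ValueError("inputs must be non-empty and of equal length")
    hits = sum(
        within_band(o, p, tolerance)
        for o, p in zip(observed_values, predicted_values)
    )
    return hits / len(observed_values)


# Hypothetical DWGR values in kg/m2 (not from the paper).
observed = [1000.0, 800.0, 1500.0, 600.0]
predicted = [1100.0, 1000.0, 1400.0, 590.0]

print(share_within_band(observed, predicted))
```

A higher share inside the band corresponds to points clustering between the gray lines of the scatter plot.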
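Table 1. Building description and statistical summary of the variables used.
| Category | Subcategory | Number of Buildings | Total Demolition Waste Generation (kg) | DWGR Min (kg·m−2) | DWGR Mean (kg·m−2) | DWGR Max (kg·m−2) | GFA (m2) |
|---|---|---|---|---|---|---|---|
| Location | Project A | 81 | 6,072,522 | 534.8 | 741.6 | 1137.3 | 8301.0 |
| Location | Project B | 79 | 18,194,868 | 795.2 | 1238.9 | 1729.1 | 13,627.3 |
| Usage | Residential | 135 | 17,875,051 | 534.8 | 938.0 | 1629.4 | 16,994.4 |
| Usage | Mixed-use (residential & commercial) | 25 | 6,392,340 | 750.9 | 1253.0 | 1729.1 | 4933.8 |
| Structure | Reinforced concrete | 35 | 12,573,029 | 795.2 | 1430.4 | 1637.0 | 8652.1 |
| Structure | Concrete block | 81 | 6,360,974 | 534.8 | 854.6 | 1180.5 | 7539.8 |
| Structure | Concrete brick | 15 | 3,050,583 | 717.0 | 1452.1 | 1729.1 | 2500.4 |
| Structure | Wood | 29 | 2,282,804 | 590.5 | 755.4 | 883.0 | 3236.0 |
| Wall type | Brick | 32 | 4,727,204 | 590.5 | 995.9 | 1729.1 | 4556.0 |
| Wall type | Block | 121 | 18,975,282 | 534.8 | 1000.8 | 1637.0 | 16,579.0 |
| Wall type | Soil | 7 | 564,904 | 668.0 | 711.8 | 759.8 | 4556.0 |
| Roof type | Slab | 37 | 9,062,146 | 717.0 | 1228.3 | 1729.1 | 7003.5 |
| Roof type | Slab and roofing tile | 33 | 6,939,655 | 931.0 | 1213.2 | 1637.0 | 5080.8 |
| Roof type | Slab & slate | 3 | 963,912 | 813.5 | 1310.3 | 1614.8 | 719.6 |
| Roof type | Slate | 13 | 841,899 | 534.8 | 599.7 | 681.8 | 1384.9 |
| Roof type | Roofing tile | 74 | 6,459,779 | 580.0 | 820.8 | 1576.5 | 7739.5 |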
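Table 2. Hyperparameters considered for developing machine-learning predictive models.
| Machine-Learning Algorithm | Variant | Hyperparameter | Tested Values | Selected |
|---|---|---|---|---|
| KNN | Euclidean distance | K neighbors | Range (1, 20) | 5 |
| KNN | Manhattan distance | K neighbors | Range (1, 20) | 4 |
| KNN | Chebyshev distance | K neighbors | Range (1, 20) | 12 |
| KNN | Euclidean uniform | K neighbors | Range (1, 20) | 4 |
| KNN | Manhattan uniform | K neighbors | Range (1, 20) | 4 |
| KNN | Chebyshev uniform | K neighbors | Range (1, 20) | 11 |
| LR | Ridge | alpha | Range (0.0001, 1000) | 0.5 |
| LR | Lasso | alpha | Range (0.0001, 1000) | 0.8 |
| LR | Elastic net | alpha | Range (0.0001, 1000) | 0.6 |
| DT | | Min samples leaf | Range (1, 10) | 2 |
| DT | | Split criteria | Range (1, 10) | 1 |
| DT | | Max depth | Range (1, 15) | 4 |
KNN: k-nearest neighbor; LR: linear regression; DT: decision-tree; Min samples leaf: the minimum number of samples for each leaf; Max depth: the maximum tree depth.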
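The KNN variants in Table 2 combine a distance metric (Euclidean, Manhattan, or Chebyshev) with a neighbor-weighting scheme, read here as uniform versus inverse-distance weighting. The following is a minimal pure-Python sketch of such a regressor, not the authors' implementation; the function names and toy data are hypothetical.

```python
import math


def distance(a, b, metric):
    """Distance between two equal-length feature vectors."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if metric == "euclidean":
        return math.sqrt(sum(d * d for d in diffs))
    if metric == "manhattan":
        return sum(diffs)
    if metric == "chebyshev":
        return max(diffs)
    raise ValueError(f"unknown metric: {metric}")


def knn_predict(train_x, train_y, query, k, metric="euclidean", weights="uniform"):
    """Predict a value from the k nearest training samples."""
    nearest = sorted(
        (distance(x, query, metric), y) for x, y in zip(train_x, train_y)
    )[:k]
    if weights == "uniform":
        # Plain average of the k nearest targets.
        return sum(y for _, y in nearest) / len(nearest)
    if weights == "distance":
        # Inverse-distance weighting; exact matches would divide by zero,
        # so they are averaged directly instead.
        exact = [y for d, y in nearest if d == 0.0]
        if exact:
            return sum(exact) / len(exact)
        total = sum(1.0 / d for d, _ in nearest)
        return sum(y / d for d, y in nearest) / total
    raise ValueError(f"unknown weights: {weights}")


# Hypothetical toy data: two scaled features per building, DWGR in kg/m2.
train_x = [[0.0, 0.0], [1.0, 0.0], [0.0, 3.0], [4.0, 4.0]]
train_y = [600.0, 800.0, 1000.0, 1600.0]
query = [1.0, 1.0]

for metric in ("euclidean", "manhattan", "chebyshev"):
    for weights in ("uniform", "distance"):
        pred = knn_predict(train_x, train_y, query, k=2, metric=metric, weights=weights)
        print(f"{metric:10s} {weights:8s} {pred:.1f}")
```

Tuning K over a range such as 1 to 20, as in Table 2, amounts to repeating this prediction for each K under a validation scheme and keeping the best-scoring value.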
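Table 3. Results of principal component analysis (loading of each variable on each principal component).
| Variable | PC 1 | PC 2 | PC 3 | PC 4 | PC 5 | PC 6 | PC 7 | PC 8 | PC 9 | PC 10 | PC 11 | PC 12 | PC 13 |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Location_project A | 0.29 | −0.28 | 0.00 | −0.29 | −0.04 | 0.05 | 0.02 | 0.17 | 0.36 | −0.26 | 0.13 | −0.08 | 0.02 |
| Location_project B | −0.29 | 0.28 | 0.00 | 0.29 | 0.04 | −0.05 | −0.02 | −0.17 | −0.36 | 0.26 | −0.13 | 0.08 | −0.02 |
| Floor area | 0.31 | −0.10 | 0.32 | 0.22 | −0.01 | −0.11 | −0.03 | −0.11 | −0.22 | −0.20 | −0.19 | −0.45 | −0.63 |
| Usage_residential | −0.30 | −0.15 | 0.17 | 0.01 | −0.27 | −0.24 | 0.39 | 0.21 | −0.02 | −0.11 | −0.11 | 0.04 | 0.00 |
| Usage_residential & commercial | 0.30 | 0.15 | −0.17 | −0.01 | 0.27 | 0.24 | −0.39 | −0.21 | 0.02 | 0.11 | 0.11 | −0.04 | 0.00 |
| Structure_concrete block | −0.24 | −0.24 | −0.43 | −0.02 | 0.10 | 0.06 | −0.12 | 0.18 | 0.11 | 0.10 | −0.22 | 0.11 | −0.43 |
| Structure_concrete brick | 0.21 | 0.27 | −0.33 | −0.07 | −0.13 | −0.07 | 0.06 | 0.16 | −0.51 | −0.45 | 0.05 | −0.19 | 0.29 |
| Structure_reinforced concrete | 0.28 | −0.19 | 0.36 | 0.22 | 0.06 | −0.10 | 0.03 | −0.07 | 0.12 | 0.30 | −0.27 | −0.12 | 0.49 |
| Structure_wood | −0.14 | 0.31 | 0.42 | −0.15 | −0.10 | 0.09 | 0.08 | −0.28 | 0.11 | −0.11 | 0.53 | 0.14 | −0.19 |
| Wall type_block | −0.13 | −0.44 | −0.01 | 0.17 | 0.16 | −0.17 | −0.15 | −0.09 | −0.21 | −0.10 | 0.36 | 0.02 | 0.09 |
| Wall type_brick | 0.17 | 0.39 | −0.16 | −0.15 | −0.01 | −0.01 | 0.38 | −0.17 | 0.26 | 0.07 | −0.32 | −0.02 | −0.10 |
| Wall type_soil | −0.07 | 0.18 | 0.33 | −0.06 | −0.32 | 0.38 | −0.44 | 0.52 | −0.08 | 0.07 | −0.15 | 0.00 | 0.00 |
| Roof type_roofing tile | −0.27 | 0.16 | 0.18 | −0.18 | 0.41 | −0.26 | −0.24 | 0.00 | 0.12 | −0.32 | −0.24 | −0.06 | 0.09 |
| Roof type_slab | 0.29 | 0.14 | −0.13 | 0.16 | −0.17 | −0.36 | 0.03 | 0.34 | 0.05 | 0.40 | 0.36 | 0.01 | −0.15 |
| Roof type_slab/roofing tile | 0.06 | −0.33 | 0.00 | −0.41 | −0.29 | 0.29 | 0.10 | −0.35 | −0.36 | 0.19 | −0.11 | 0.07 | −0.01 |
| Roof type_slab/slate | 0.09 | −0.03 | 0.14 | 0.15 | 0.54 | 0.48 | 0.48 | 0.34 | −0.18 | −0.02 | 0.09 | 0.11 | −0.05 |
| Roof type_slate | −0.10 | −0.02 | −0.18 | 0.61 | −0.32 | 0.37 | 0.02 | −0.18 | 0.31 | −0.31 | 0.01 | −0.07 | 0.09 |
| Number of floors | 0.36 | −0.02 | 0.10 | 0.15 | −0.06 | −0.14 | −0.10 | −0.03 | −0.08 | −0.27 | −0.20 | 0.82 | −0.10 |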
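In PCA, a component score for one building is conventionally the dot product of its preprocessed input vector with that component's loadings. The sketch below applies this to the PC 1 column of Table 3; the building vector `z` is hypothetical, and the preprocessing it assumes (one-hot categories, standardized numeric variables) is an assumption rather than something stated in the table.

```python
# PC 1 loadings copied from Table 3.
PC1_LOADINGS = {
    "Location_project A": 0.29,
    "Location_project B": -0.29,
    "Floor area": 0.31,
    "Usage_residential": -0.30,
    "Usage_residential & commercial": 0.30,
    "Structure_concrete block": -0.24,
    "Structure_concrete brick": 0.21,
    "Structure_reinforced concrete": 0.28,
    "Structure_wood": -0.14,
    "Wall type_block": -0.13,
    "Wall type_brick": 0.17,
    "Wall type_soil": -0.07,
    "Roof type_roofing tile": -0.27,
    "Roof type_slab": 0.29,
    "Roof type_slab/roofing tile": 0.06,
    "Roof type_slab/slate": 0.09,
    "Roof type_slate": -0.10,
    "Number of floors": 0.36,
}


def pc_score(loadings, z):
    """Dot product of a loading vector with a preprocessed input vector.

    Variables missing from z are treated as 0.
    """
    return sum(weight * z.get(name, 0.0) for name, weight in loadings.items())


# Loadings of a principal component should have unit length (up to rounding).
squared_norm = sum(w * w for w in PC1_LOADINGS.values())

# Hypothetical building (assumed preprocessing, not from the paper).
z = {
    "Location_project A": 1.0,
    "Usage_residential": 1.0,
    "Structure_concrete block": 1.0,
    "Wall type_block": 1.0,
    "Roof type_roofing tile": 1.0,
    "Floor area": 0.5,
    "Number of floors": -0.2,
}

print(round(squared_norm, 3), round(pc_score(PC1_LOADINGS, z), 3))
```

The near-unit squared norm is a quick sanity check that the column was read correctly from the table.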
Table 4. Input variables used to develop the non-hybrid and hybrid machine-learning predictive models.
| Type | Model | Input Variables |
|---|---|---|
| Non-hybrid | LR; LR (ridge); LR (lasso); LR (elastic net); KNN (Euclidean distance); KNN (Manhattan distance); KNN (Chebyshev distance); KNN (Euclidean uniform); KNN (Manhattan uniform); KNN (Chebyshev uniform); DT | location, usage, structure, wall type, roof type, number of floors, floor area |
| Hybrid | LR | PC 1, 2, 3, 4, 6, 8, 9, 10, 11, 13 |
| Hybrid | LR (ridge) | PC 1, 2, 3, 4, 6, 8, 9, 10, 11, 13 |
| Hybrid | LR (lasso) | PC 1, 2, 3, 4, 6, 8, 9, 10, 11, 13, location, number of floors |
| Hybrid | LR (elastic net) | PC 1, 2, 3, 4, 6, 8, 9, 10, 11, 13, location, number of floors |
| Hybrid | KNN (Euclidean distance) | PC 1, 2, location, structure, wall type, floor area |
| Hybrid | KNN (Manhattan distance) | PC 1, 2, 4, location, wall type, structure, floor area |
| Hybrid | KNN (Chebyshev distance) | PC 1, 2, 4, wall type, structure |
| Hybrid | KNN (Euclidean uniform) | PC 1, 2, 4, structure, wall type, floor area, number of floors |
| Hybrid | KNN (Manhattan uniform) | PC 1, 2, 4, location, wall type, structure, floor area, number of floors |
| Hybrid | KNN (Chebyshev uniform) | PC 1, 2, 4, location, structure, wall type, floor area |
| Hybrid | DT | PC 1, 2, 5, 10, 13 |
KNN: K-nearest neighbor; LR: linear regression; DT: decision-tree; PC: principal component.
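Table 5. Pearson’s correlation analysis of the input variables used to develop the KNN (Euclidean uniform) and PCA−KNN (Euclidean uniform) models.
| Model Type | Input Variable | Pearson’s Correlation |
|---|---|---|
| KNN (Euclidean uniform) | number of floors | 0.782 |
| KNN (Euclidean uniform) | floor area | 0.747 |
| KNN (Euclidean uniform) | usage | 0.359 |
| KNN (Euclidean uniform) | structure | 0.172 |
| KNN (Euclidean uniform) | roof type | 0.107 |
| KNN (Euclidean uniform) | wall type | −0.130 |
| KNN (Euclidean uniform) | location | −0.782 |
| PCA−KNN (Euclidean uniform) | PC1 | 0.783 |
| PCA−KNN (Euclidean uniform) | number of floors | 0.782 |
| PCA−KNN (Euclidean uniform) | floor area | 0.747 |
| PCA−KNN (Euclidean uniform) | structure | 0.172 |
| PCA−KNN (Euclidean uniform) | PC4 | −0.117 |
| PCA−KNN (Euclidean uniform) | wall type | −0.130 |
| PCA−KNN (Euclidean uniform) | PC2 | −0.377 |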
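Pearson's correlation coefficients such as those in Table 5 measure the linear association between two variables, presumably each input and the DWGR. A minimal sketch of the standard formula follows; the function name and sample values are hypothetical and do not reproduce the study's data.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical sample: number of floors versus DWGR in kg/m2.
floors = [1, 1, 2, 3, 4]
dwgr = [600.0, 700.0, 900.0, 1200.0, 1500.0]

print(round(pearson_r(floors, dwgr), 3))
```

Values near +1 or −1 indicate a strong linear relationship, while values near 0 indicate a weak one.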
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Cha, G.-W.; Choi, S.-H.; Hong, W.-H.; Park, C.-W. Developing a Prediction Model of Demolition-Waste Generation-Rate via Principal Component Analysis. Int. J. Environ. Res. Public Health 2023, 20, 3159. https://doi.org/10.3390/ijerph20043159
