Hybrid Intelligent Model for Estimating the Cost of Huizhou Replica Traditional Vernacular Dwellings

Huang, Jian; Huang, Wei; Quan, Wei; Xing, Yandong

doi:10.3390/buildings14092623


by Jian Huang ¹,²,*, Wei Huang ¹, Wei Quan ¹ and Yandong Xing ¹

¹ School of Architecture and Civil Engineering, Huangshan University, Huangshan 245041, China

² Anhui Institute for the Preservation and Inheritance of Hui-Style Architecture, Huangshan University, Huangshan 245041, China

* Author to whom correspondence should be addressed.

Buildings 2024, 14(9), 2623; https://doi.org/10.3390/buildings14092623

Submission received: 23 July 2024 / Revised: 14 August 2024 / Accepted: 22 August 2024 / Published: 24 August 2024

(This article belongs to the Special Issue Intelligence Techniques Applied in Infrastructure, Engineering and Construction)


Abstract:

Amidst the backdrop of rural revitalization and cultural renaissance, there is a surge in the construction demand for replica traditional vernacular dwellings. Traditional cost estimation methods struggle to meet the need for rapid and precise estimation due to the complexity inherent in their construction. To address this challenge, this study aims to enhance the accuracy and efficiency of cost estimation by innovatively developing an Adaptive Self-Explanatory Convolutional Neural Network (ASCNN) model, tailored to meet the specific cost estimation needs of replica traditional vernacular dwellings in the Huizhou region. The ASCNN model employs a Random Forest model to filter key features, inputs these into the CNN for cost estimation, and utilizes Particle Swarm Optimization (PSO) to optimize parameters, thereby improving predictive accuracy. The decision-making process of the model is thoroughly interpreted through SHAP value analysis, ensuring credibility and transparency. During the construction of the ASCNN model, this study collected and analyzed bidding control price data from 98 replica traditional vernacular dwellings. The empirical results demonstrate that the ASCNN model exhibits outstanding predictive performance on the test set, with a Root Mean Square Error (RMSE) of 9828.06 yuan, a Mean Absolute Percentage Error (MAPE) of 0.6%, and a Coefficient of Determination (R²) as high as 0.989, confirming the model’s high predictive accuracy and strong generalization capability. Through SHAP value analysis, this study further identifies key factors such as floor plan layout, roof area, and column material coefficient that are central to cost prediction. The ASCNN model proposed in this study not only significantly improves the accuracy of cost estimation for Huizhou replica traditional vernacular dwellings, but also enhances its transparency and credibility through model interpretation methods, providing a reliable basis for related investment decisions. 
The findings of this study also offer valuable references and insights for rapid and precise cost estimation of replica buildings in other regions worldwide.

Keywords:

replica traditional vernacular dwellings; cost estimation; convolutional neural network; particle swarm optimization; SHAP value analysis

1. Introduction and Literature Review

Replicas of traditional vernacular dwellings, constructed with either traditional or modern materials that embody cultural characteristics, differ significantly from the restoration of historical buildings [1]. These replicas are not only essential carriers of “cultural confidence” for the Chinese people but also play a crucial role in promoting “rural revitalization”. In recent years, driven by the dual benefits of cultural and rural revitalization policies, investments in constructing replicas of ancient architecture and gardens have surged across China.

The cost estimation of replica antique buildings is particularly challenging due to their complex designs, their intricate construction processes, and the difficulties in measurement and pricing. These factors often lead to the “three excesses” phenomenon, where the preliminary budget exceeds the estimate, the detailed budget exceeds the preliminary budget, and the final accounting exceeds the detailed budget [2]. Given that 75% to 90% of the costs in ancient replica construction projects are determined during the decision-making and design phases [3], rapid and accurate cost estimation is crucial to prevent the “three excesses” phenomenon. However, traditional cost estimation methods, such as the unit index method, expert experience method, and budget quota method, face issues like lagging index updates, insufficient experience, and low efficiency, which struggle to adapt to the complexity of cost composition in ancient replica construction [4]. These complexities are influenced by various factors such as architectural style, material selection, construction techniques, and structural types, which may involve nonlinear relationships with costs, leading to insufficient accuracy of traditional methods [5]. Therefore, there is an urgent need to develop new computational methods to improve the accuracy and efficiency of cost estimation in ancient replica construction projects to meet their specific requirements.

Artificial intelligence (AI) is gradually becoming a research hotspot in the field of engineering costs and has become a powerful tool for solving complex problems [6]. Compared with traditional methods, machine learning has a clear advantage in handling large amounts of data and discovering nonlinear relationships among data. In engineering cost estimation, AI methods are used to analyze historical cases, identify complex relationships between cost-influencing factors, and provide more accurate and reliable predictive results [7]. These methods include Backpropagation Neural Networks (BPNNs) [8,9,10], Support Vector Machines (SVMs) [11,12], Decision Trees (DTs) [13], Case-Based Reasoning (CBR) [14,15,16], and Ensemble Algorithms [17]. Studies have shown that these methods have been effectively verified in cost prediction applications for roads, tunnels, bridges, residential buildings, and public buildings, but there is little research in the field of ancient replica construction.

In the field of construction cost estimation, Deep Artificial Neural Networks (DNNs) have attracted attention for their ability to capture complex relationships among project variables, showing greater potential and value than traditional learning methods [18]. Even with limited data, DNNs can achieve highly accurate engineering valuation predictions [19,20,21]. For instance, Li et al. [22] introduced a DNN-based construction cost estimation approach that takes engineering characteristics and the bill of quantities as inputs and predicts the total bid price and associated taxes as outputs. This method demonstrated the DNN model’s potential to significantly improve the accuracy of construction cost forecasts, achieving a relative error of just 4.203% in predicting the total price, with relative errors for the composite unit prices V1 and V2 of 2.98% and 4.52%, respectively.

Convolutional Neural Networks (CNNs), proposed by Yann LeCun and others as a type of deep neural network [23], have achieved significant success in the field of image recognition and processing. CNNs’ ability to automatically learn and extract features from raw data is particularly important in the data-intensive and diverse field of construction engineering. Xue et al. [24] utilized a CNN algorithm to perform an in-depth analysis of cost prediction for expressway projects during the conceptual design phase. The study’s findings demonstrate the superior applicability and accuracy of the CNN model in handling the high-dimensional nonlinear complexities of expressway cost estimation, outperforming conventional models. Yi et al. [25] applied the Maximum Information Coefficient (MIC) to filter key indicators, combining it with CNNs to develop a cost prediction model tailored for civil engineering projects involving mountainous high-speed railways. By optimizing model parameters, the MIC-CNN model not only achieved a low average relative error of 5.476% but also exhibited a minimal prediction fluctuation of just 1.045%, outperforming the traditional CNN, BPNN, and Adaboost-SVR models in both accuracy and stability. To address the challenges of high-dimensional feature data processing and nonlinear relationships, researchers often integrate multiple predictive models within machine learning frameworks to enhance forecasting accuracy. For instance, Han et al. [26] developed an advanced construction cost estimation model, the NGO-CNN-SVM, specifically for high-standard farmland projects. This model integrates CNN with Support Vector Machines (SVMs) and is optimized using the Northern Goshawk Optimization (NGO) algorithm. It demonstrated exceptional predictive accuracy on 120 construction cost datasets, particularly in bridge and culvert projects, with an R² exceeding 0.970 and a relative error below 3.548%. 
The NGO-CNN-SVM model outperformed traditional neural networks and other hybrid models, underscoring the effectiveness of deep learning in enhancing the precision and reliability of construction cost predictions. These studies collectively suggest that the application of deep learning technologies significantly enhances the accuracy and reliability of construction cost forecasting, providing an efficient and scientifically robust tool for construction economic analysis. Nevertheless, research on the application of CNN in cost prediction in the field of ancient replica engineering is still scarce, indicating that this area needs further exploration and development.

In AI models, feature selection is crucial for ensuring model stability and reducing computational costs [27]. In response to the complexity of factors affecting the cost of ancient replica construction, this study uses the Random Forest (RF) algorithm for preliminary feature screening to improve model efficiency [28]. The performance of CNNs in visual recognition and data processing tasks depends on the selection of hyperparameters [29]. To optimize these parameters, this study introduces the Particle Swarm Optimization (PSO) algorithm, a heuristic algorithm effective in searching for optimal solutions in high-dimensional spaces [30]. PSO, which simulates social behavior, requires no gradient information and is less prone to becoming trapped in local minima, which can significantly enhance model performance [31].

The “black box” nature of deep learning models can lead to unpredictable results and biases [32], especially when applied to assist in decision-making in the construction engineering field. Therefore, ensuring the fairness and transparency of the model is crucial for effective and responsible application [33]. SHAP (SHapley Additive exPlanations) measures the marginal contribution of each feature to a prediction by calculating SHAP values, offering both global and local explanations of the model [34,35]. In this study, SHAP value analysis not only enhances the credibility of the model but also provides clear decision support for stakeholders, verifying the model’s ability to capture the association between covariates and cost.

This study proposes an Adaptive Self-Explanatory Convolutional Neural Network (ASCNN) model that integrates PSO, CNN, and SHAP interpretive analysis, taking replica traditional vernacular dwellings in the Huizhou region of China as a case study. The ASCNN model uses a CNN to establish a cost prediction model, PSO to optimize network structural parameters, and SHAP to provide interpretability of model decisions. The structure of this paper is as follows: first, it introduces the construction process and working principle of the ASCNN model in detail. Then, it verifies and analyzes the ASCNN model through empirical research. Finally, it presents conclusions and discusses the potential application value and advantages of the ASCNN model in the cost estimation of replica traditional vernacular dwellings, providing references and suggestions for research and practice in related fields.

2. Methodology

2.1. Convolutional Neural Network (CNN) Algorithm

CNNs utilize convolutional layers, activation layers, pooling layers, and fully connected layers to extract spatial features from input data and perform classification or regression tasks [36]. In this paper, the CNN performs feature extraction and regression through the following steps:

(1) Convolutional Layer: Uses convolution kernels to extract local features.
(2) Activation Layer: Introduces nonlinearity, typically employing the ReLU function to enhance the network’s learning capability.
(3) Pooling Layer: Reduces the dimensionality of features and strengthens their invariance, with max pooling adopted in this paper (Equation (1)).
(4) Fully Connected Layer: Maps the extracted features to the final output (Equation (2)).

The architecture and process of the CNN used in this paper are detailed in Figure 1.

\mathrm{maxpooling}_{(i, j)}^{(k, l)} = \max_{u, v} \left( \max \left( 0, \sum_{c = 1}^{C} \sum_{u = 1}^{h} \sum_{v = 1}^{w} ω_{(c, u, v)}^{(k, l)} \cdot x_{(c, i \cdot s + u - 1, j \cdot s + v - 1)}^{(l - 1)} + b^{(k, l)} \right) \right)

(1)

y = \phi (W \cdot X + b)

(2)

where x is the input data, i and j are the vertical and horizontal coordinates of the input features, k is the index of the convolutional kernel, l is the index of the convolutional group layer, C represents the number of channels, h and w represent the height and width of the convolutional kernel, u and v index positions along that height and width, ω is the weight of the convolutional kernel, s denotes the stride, X is the one-dimensional input feature that has been flattened from the higher-dimensional feature maps (the pooling output otherwise serves as the input to the subsequent convolutional layer), W is the weight matrix of the Fully Connected Layer, b is the bias term, and \phi is the activation function of the output layer, which is still set as the ReLU function.
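The four stages described above can be illustrated with a minimal pure-Python sketch. It is a one-dimensional, single-channel simplification of Equations (1) and (2), not the network used in this paper; the function names, kernel values, and fully connected weights are illustrative assumptions.

```python
def conv1d(x, kernel, bias=0.0, stride=1):
    """Slide one kernel over a 1-D feature vector (no padding)."""
    k = len(kernel)
    return [
        sum(kernel[u] * x[i + u] for u in range(k)) + bias
        for i in range(0, len(x) - k + 1, stride)
    ]


def relu(values):
    """Activation layer: max(0, v) applied element-wise."""
    return [max(0.0, v) for v in values]


def max_pool1d(values, size=2, stride=2):
    """Pooling layer: keep the largest value in each window."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, stride)]


def dense(x, weights, biases):
    """Fully connected layer: W . X + b, one output per weight row."""
    return [sum(w * a for w, a in zip(row, x)) + b for row, b in zip(weights, biases)]


def forward(x, kernel, fc_weights, fc_biases):
    feature_map = relu(conv1d(x, kernel))   # convolution + activation
    pooled = max_pool1d(feature_map)        # max pooling, as in Equation (1)
    return relu(dense(pooled, fc_weights, fc_biases))  # Equation (2) with ReLU output


# Hypothetical normalized input features and weights, for illustration only.
features = [0.1, 0.5, 0.2, 0.9, 0.4, 0.3]
prediction = forward(features, kernel=[1.0, -1.0, 0.5],
                     fc_weights=[[1.0, 1.0]], fc_biases=[0.1])
```

In a trained network the kernel and weight values would be learned rather than fixed by hand; the sketch only traces how one input vector flows through the four layer types.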

The parameters of the CNN in this paper are optimized by combining backpropagation (BP) with the Adaptive Moment Estimation (Adam) algorithm for learning rate optimization. Adam dynamically adjusts individual parameter learning rates based on first and second moment estimates, facilitating faster and more stable convergence in noisy environments or with sparse gradients [37]. For regression tasks with numerical targets, the Mean Squared Error (MSE) is chosen as the loss function (Equation (3)) to accurately measure the deviation between predicted and actual values.

MSE = \frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}

(3)

where n is the number of samples, y_{i} is the actual value, and {\hat{y}}_{i} is the predicted value.
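As a rough illustration of how Adam uses first and second moment estimates to minimize the MSE of Equation (3), the sketch below fits a single weight w in y ≈ w · x. The toy data, the function names, and the hyperparameter values (commonly used Adam defaults apart from the learning rate) are assumptions and do not come from this study.

```python
import math


def mse(y_true, y_pred):
    """Mean Squared Error, Equation (3)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def adam_fit_slope(xs, ys, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=500):
    """Fit y ~ w * x by minimizing MSE with Adam updates on the single weight w."""
    w, m, v = 0.0, 0.0, 0.0
    for t in range(1, steps + 1):
        # Gradient of the MSE with respect to w.
        grad = sum(2 * x * (w * x - y) for x, y in zip(xs, ys)) / len(xs)
        m = beta1 * m + (1 - beta1) * grad        # first moment estimate
        v = beta2 * v + (1 - beta2) * grad ** 2   # second moment estimate
        m_hat = m / (1 - beta1 ** t)              # bias correction
        v_hat = v / (1 - beta2 ** t)
        w -= lr * m_hat / (math.sqrt(v_hat) + eps)
    return w


# Toy data generated from y = 2x, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = adam_fit_slope(xs, ys)
final_loss = mse(ys, [w * x for x in xs])
```

In a CNN the same update is applied to every weight, with the gradients supplied by backpropagation.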

2.2. Particle Swarm Optimization (PSO)

PSO is a swarm intelligence optimization method that simulates the foraging behavior of bird flocks [38]. The algorithm progressively approximates the optimal solution through collaboration and information sharing among particles within the search space. The specific process is illustrated in Figure 2, as follows:

(1) Initialize the particle swarm: Randomly generate the initial positions and velocities of particles, recording the current position of each particle as its personal best and the best position in the particle swarm as the global optimum.

(2) Calculate the fitness value: Assess the fitness of each particle using the CNN prediction accuracy.

(3) Update the personal best position (pbest): If the current particle’s fitness is better than its historical best, update the personal best position.

(4) Update the global best position (gbest): If the current particle’s fitness is superior to the global optimum, update the global best position.

(5) Update particle velocity and position: Adjust the particle’s state based on personal best and global best through velocity update and position update formulas (Equations (4) and (5)), where the inertia weight, individual learning factor, and social learning factor collectively influence the new position and velocity of the particles.

Particle position update formula:

x_{i} (t + 1) = x_{i} (t) + v_{i} (t + 1)

(4)

Particle velocity update formula:

v_{i} (t + 1) = ω \cdot v_{i} (t) + c_{1} \cdot r_{1} \cdot ({pbest}_{i} - x_{i} (t)) + c_{2} \cdot r_{2} \cdot ({gbest} - x_{i} (t))

(5)

where ω is the inertia weight, controlling the size of the particle’s motion inertia, c₁ and c₂ are the individual learning factor and the social learning factor, respectively, controlling the influence of individual and group information on the particle’s movement, r₁ and r₂ are random numbers drawn uniformly from [0, 1], used to add randomness, v_{i} (t) is the velocity of particle i at iteration t, x_{i} (t) is its position, {pbest}_{i} is the personal best position of particle i, and {gbest} is the global best position of the swarm.

(6) Assess if the stopping criteria have been met: The algorithm halts and outputs the optimal solution if the change in fitness value is below the preset threshold; otherwise, it returns to step 2 to continue the iteration process.
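Steps (1) to (6) can be sketched in pure Python as follows. This is a generic PSO on a toy objective (the sphere function) rather than on CNN prediction error. The values of ω, c₁, and c₂, the swarm size, the clipping of positions to the search bounds, and the fixed iteration count used as the stopping rule are all assumptions made for illustration.

```python
import random


def pso_minimize(fitness, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize `fitness` over box `bounds`; returns (gbest, gbest_val, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    # Step 1: random positions and (small) random velocities.
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[rng.uniform(-(hi - lo), hi - lo) * 0.1 for lo, hi in bounds]
           for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [fitness(p) for p in pos]          # Step 2: fitness values
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):                        # Step 6: fixed iteration budget
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Step 5: velocity update, Equation (5).
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                # Position update, Equation (4), clipped to the search box.
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = fitness(pos[i])
            if val < pbest_val[i]:                 # Step 3: personal best
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:                # Step 4: global best
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


def sphere(p):
    """Toy objective with its minimum at the origin."""
    return sum(x * x for x in p)


best_pos, best_val, history = pso_minimize(sphere, bounds=[(-5.0, 5.0), (-5.0, 5.0)])
```

The `history` list records the best fitness found so far after each iteration, which is the quantity a convergence curve would plot.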

2.3. SHAP Algorithm

The SHapley Additive exPlanations (SHAP) algorithm provides a method for interpreting model predictions by decomposing them into the individual and interaction contributions of features [35]. The specific computational steps are as follows: (1) Define feature subsets: consider all possible feature subsets S that exclude the feature j of interest; (2) Calculate marginal contributions: for each subset S, compute the prediction difference before and after feature j is added to S (Equation (6)); (3) Weighted average of marginal contributions: calculate the weighted average of all marginal contributions, with weights based on the size of the subset and the total number of features (Equation (7)); (4) Sum Shapley values: the Shapley values of all features sum to the difference between the prediction and the average prediction, so each value can be read as the contribution of that feature to the prediction.

M = f (x_{S \cup \{j\}}) - f (x_{S})

(6)

{SHAP}_{j} = \sum_{S \subseteq P \setminus \{j\}} \frac{|S|! \, (|P| - |S| - 1)!}{|P|!} M

(7)

where P is the set of all features, S is a subset of P excluding j, M is the marginal contribution of feature j to the prediction, f (x_{S \cup \{j\}}) is the prediction with feature j included in set S, f (x_{S}) is the prediction with feature j excluded from set S, |S| is the size of set S, and |P| is the size of set P.
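For a handful of features, the Shapley value can be computed exactly by enumerating every subset, which makes the role of the marginal contribution and the subset weighting concrete. The sketch below uses the standard Shapley weight |S|!(|P| − |S| − 1)!/|P|! and a hypothetical value function `toy_cost`. The feature names echo those discussed in this paper, but the numbers are invented for illustration and the function stands in for a real model's prediction on a feature subset.

```python
from itertools import combinations
from math import factorial


def shapley_values(features, value):
    """Exact Shapley values by enumerating all subsets S that exclude feature j."""
    n = len(features)
    result = {}
    for j in features:
        others = [f for f in features if f != j]
        total = 0.0
        for size in range(n):  # |S| runs from 0 to n - 1
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                marginal = value(s | {j}) - value(s)  # marginal contribution
                total += weight * marginal            # weighted sum
        result[j] = total
    return result


def toy_cost(subset):
    """Hypothetical value function: predicted cost given the known features."""
    cost = 100.0
    if "roof_area" in subset:
        cost += 30.0
    if "floor_plan" in subset:
        cost += 20.0
    if "column_material" in subset:
        cost += 15.0
    if "roof_area" in subset and "floor_plan" in subset:
        cost += 10.0  # interaction, split equally between the two features
    return cost


features = ["floor_plan", "roof_area", "column_material"]
shap = shapley_values(features, toy_cost)
```

Exact enumeration grows exponentially with the number of features, so practical SHAP implementations rely on approximations; the sketch is only meant to show what is being approximated.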

2.4. ASCNN for Huizhou Ancient Replica Architecture Cost Estimation

Selecting and adjusting network architecture parameters (such as the number of layers, neurons, convolutional kernels, and their sizes) and training parameters (such as learning rate, batch size, number of iterations, and regularization coefficients) are crucial for the regression accuracy and generalization capability of the CNN model [39]. This study integrates the PSO algorithm with a CNN and introduces the SHAP algorithm to enhance the model’s interpretability. The PSO algorithm automatically searches for the optimal parameter combination, while SHAP values provide precise explanations for model predictions, enhancing the transparency of the model’s decision-making process. The flowchart of the ASCNN algorithm is shown in Figure 3, and the steps are as follows:

(1) Data Processing: The training dataset was first normalized to confine its value range to [0, 1]. After normalization, the RF algorithm was applied for feature selection, identifying and excluding factors with negligible impact on cost estimation for Huizhou replica traditional vernacular dwellings. The remaining significant factors were then used as input variables to construct the ASCNN predictive model. To meet the minimum sample size requirement for neural networks, the filtered data were randomly divided into three independent subsets, each containing at least 14 samples, ensuring adequate data for model training. During model construction, each subset was sequentially used as the validation set, while the remaining two subsets were combined to form the training set, facilitating effective training and optimization of the ASCNN predictive model.

(2) Parameter Initialization: To balance optimization effects and computational costs, this study selects five key parameters for optimization: learning rate (λ), the number of fully connected hidden layers (l_{n}) and their neuron counts (h_{n}), the number of convolutional kernels (k_{n}), and the size of the convolutional kernels (k_{s}). Considering that the learning rate is effectively adjusted in the Adam optimizer, this paper focuses on optimizing the last four parameters. Each particle’s position and velocity in the particle swarm are randomly initialized, with each particle’s position representing a set of CNN model parameters (l_{n}, h_{n}, k_{n}, k_{s}). Based on experience, it is recommended to set upper and lower limits for each parameter, with the following ranges for each parameter: l_{n} \in [1, 3], h_{n} \in [1, 100], k_{n} \in [3, 7], and k_{s} \in [1, 5], all of which are positive integers.

(3) CNN Training and Validation: For each particle, a CNN model is constructed based on the current position’s parameter configuration. The CNN model is trained using the training dataset, and its performance on the validation dataset is evaluated, with the evaluation results serving as the particle’s fitness value.

(4) Fitness Function: To balance the model’s generalization performance and complexity, Equation (8) is used as the fitness function for PSO search to identify the optimal parameter set [40].

Fitness = E_{train} + E_{validation}

(8)

where E_{train} and E_{validation} denote the average training error and the average validation error of the three prediction models, respectively.

Extensive research has shown that integrating artificial neural networks with optimization algorithms is effective, particularly in minimizing training error and avoiding local optima during the training phase. However, overfitting can occur when a model fits too closely to the training data, indicating excessive model complexity. To prevent overfitting, it is essential to evaluate predictive accuracy using validation data when constructing the inference model. The choice of the objective function is critical in balancing the model’s generalization ability with its complexity [40]. The optimization objective proposed in this paper is designed to strike an optimal balance between minimizing training error and enhancing the model’s generalizability.

(5) PSO Optimization: The PSO algorithm searches for the best tuning parameters of the model by optimizing the fitness function, iterating to find the parameters that minimize the target fitness function.

(6) Stopping Condition: Once the stopping condition is met, the optimization process is terminated. The number of generations (G_max) or the number of function evaluations (NFE) can be used as the termination criterion; this study adopts G_max as the termination condition.

(7) Optimized Parameters: When the termination criterion is met, the loop is stopped, and the optimized parameter set is used to train the model on the entire training dataset and predict the cost of Huizhou replica traditional vernacular dwellings.

(8) Perform Final Testing with the Optimal Model: The optimized model is subjected to final testing using a test dataset that was not utilized during the training or validation phases. This step is crucial for evaluating the model’s performance and ensuring its robustness in predicting unseen data.

(9) SHAP Model Interpretation: Calculate the SHAP values for each feature to evaluate the specific impact of each feature on the model’s prediction.
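The way a particle position is turned into a fitness value in steps (2) to (4) can be sketched as below. The parameter bounds are those listed in step (2). Everything else is an assumption for illustration: the helper names, the rounding rule, the shuffling seed, and especially `mean_baseline`, which is a hypothetical stand-in for building and training the CNN so that the sketch runs without a deep learning library.

```python
import random

# Search ranges from step (2); all parameters are positive integers.
BOUNDS = {"l_n": (1, 3), "h_n": (1, 100), "k_n": (3, 7), "k_s": (1, 5)}


def decode(position):
    """Map a real-valued particle position to integer CNN parameters within BOUNDS."""
    return {
        name: min(max(int(round(value)), lo), hi)
        for value, (name, (lo, hi)) in zip(position, BOUNDS.items())
    }


def three_folds(n_samples, seed=0):
    """Randomly split sample indices into three disjoint subsets."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[k::3] for k in range(3)]


def fitness(position, X, y, train_and_score):
    """Equation (8): average training error plus average validation error."""
    params = decode(position)
    folds = three_folds(len(y))
    e_train, e_val = 0.0, 0.0
    for k, val_idx in enumerate(folds):
        # Each subset serves once as the validation set; the other two form the training set.
        train_idx = [i for m, fold in enumerate(folds) if m != k for i in fold]
        tr, va = train_and_score(
            [X[i] for i in train_idx], [y[i] for i in train_idx],
            [X[i] for i in val_idx], [y[i] for i in val_idx], params)
        e_train += tr / 3
        e_val += va / 3
    return e_train + e_val


def mean_baseline(X_tr, y_tr, X_va, y_va, params):
    """Hypothetical stand-in for CNN training: predicts the training mean."""
    mu = sum(y_tr) / len(y_tr)

    def mse(ys):
        return sum((t - mu) ** 2 for t in ys) / len(ys)

    return mse(y_tr), mse(y_va)


# Toy data, for illustration only.
X = [[float(i)] for i in range(12)]
y = [2.0 * i for i in range(12)]
score = fitness([2.3, 40.1, 4.6, 3.4], X, y, mean_baseline)
```

In the full procedure, a PSO loop would call `fitness` for every particle in every generation, with `train_and_score` replaced by a routine that trains a CNN using the decoded parameters.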

3. Experimental Results, Analysis, and Discussion

3.1. Feature Analysis

The selection of features directly impacts the stability and accuracy of model fitting. Too many features can lead to the curse of dimensionality and feature redundancy, while too few can result in information loss and underfitting. Given the scarcity of research literature on the cost of traditional Huizhou residential construction, this paper divides the feature selection process into three steps. The first step involves using expert interviews to gather as many factors that primarily affect the cost of Huizhou residences as possible. The second step involves feature data processing, and the third step is the preliminary extraction of key features related to this task, aiming to achieve dimensionality reduction and enhance the model’s generalization capability.

3.1.1. Data Collection

In this study, face-to-face expert interviews were conducted to ensure the quality and credibility of the data collected. The experts interviewed have years of experience in the design and cost estimation of Hui-style ancient architecture, contributing to a rich repository of professional knowledge. The interviewer pre-determined a semi-structured format for the interviews to elicit comprehensive responses. The interviews involved open-ended questions such as: “Could you please elaborate on the factors that you believe may influence the cost in ancient replica construction projects? Please list and describe the extent of influence for each factor”. Through these expert interviews, the main factors affecting the cost of ancient replica construction were identified. Initially, eight factors were selected as the primary characteristics for the cost of Hui-style replica residential buildings, as detailed in Table 1.

Based on the eight influencing factors identified, this study conducted field research and consulted with experts in the field of Hui-style ancient architectural cost estimation, collecting original tender control price data for a total of 98 Huizhou replica traditional vernacular dwellings. Given the consistency in geographical location and construction timing of the projects from which the data were collected, there was no need for temporal or regional adjustments to the data.

The individual projects of Huizhou replica traditional vernacular dwellings can be categorized into two major divisions: ancient architectural sub-projects and civil engineering sub-projects. The ancient architectural sub-projects encompass key sub-projects such as large-scale timber work, wooden decoration, roofing, brickwork, and stonework, whereas the civil engineering sub-projects include sub-projects like earthwork, masonry, concrete, and wall decoration. After a meticulous analysis of the cost data for the 98 collected residential buildings, the results indicate that the cost of the ancient architectural sub-projects accounts for 88.24% of the total project cost. Within this division, the combined costs of the large-scale timber work, wooden decoration, roofing, brickwork, and stonework sub-projects constitute 80.04% of the expenses of the ancient construction division.

In accordance with the “80–20 rule”, which posits that the majority of effects are often generated by a minority of critical factors [41], this study selects ancient construction components such as large-scale timber work, wooden decoration, roofing, brickwork, and stonework as the primary feature inputs. Integrating the information obtained from the preliminary expert interviews, these main factors are defined as first-level features and are further refined into 19 second-level features, as detailed in Table 2.

3.1.2. Data Reduction

Data reduction is a crucial step in data analysis, aiding in reducing data dimensions, simplifying data representation, and extracting key information from the data. Data reduction is divided into two categories: quantification of qualitative features and attribute reduction.

1. Quantification of Qualitative Features

Feature factors include both qualitative and quantitative data types. Since predictive models require input features to be numerical, qualitative features must be converted into quantitative data. For example, Huizhou traditional vernacular dwellings have evolved from the traditional courtyard house form and, based on natural environmental conditions, have developed several planar types, such as “Ao”, “Hui”, “H”, and “Ri”, as shown in Figure 4 [42]. Different layouts result in varying numbers of side rooms, halls, and courtyards, significantly impacting the construction cost. Therefore, in processing qualitative data, “Ao”-shaped (concave) layouts are quantified as 1, “Hui”-shaped layouts are quantified as 2, and so on, with specific quantification values detailed in Table 2.

2. Feature Aggregation

In the cost analysis of ancient replica buildings, some characteristic factors within the data exhibit high discreteness. It is necessary to integrate information scattered across multiple attribute parameters, representing the same issue in the raw data, into a single comprehensive attribute parameter. For instance, in material selection for components in Huizhou traditional vernacular dwellings, cost considerations often lead to the use of different materials for the same type of component within different size ranges. For example, columns with a diameter less than 200 mm are typically made of Chinese fir, while those with a diameter of 200 mm or greater are made of materials such as camphorwood or mahogany.

In response, this study proposes using a weighted average method to deal with the material properties of large timber works and beams. This method integrates the attribute parameters of different materials into a unified component material coefficient (η), as shown in Equation (9), thereby accurately reflecting the impact of component material on cost. This processing not only improves the consistency and analyzability of the data but also lays the foundation for establishing a more accurate cost prediction model. After data reduction, the material coefficient feature is added, increasing the number of secondary features to 20, as shown in Table 3.

η = ω_{1} \times {TreeSpecies}_{1} + ω_{2} \times {TreeSpecies}_{2} + \dots + ω_{n} \times {TreeSpecies}_{n}

(9)

In the formula, ω_{1}, \dots, ω_{n} are the material weights of the tree species, determined by the ratio of the average market prices of the tree species over the past five years, and {TreeSpecies}_{i} is the proportion of tree species i, set based on the volume ratio of tree species i in similar components.
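A small worked example may make Equation (9) concrete. The price weights and volume shares below are hypothetical: Chinese fir is taken as the price baseline (weight 1.0) and camphorwood is assumed to cost 2.5 times as much. Neither figure comes from the data of this study.

```python
def material_coefficient(price_weights, volume_shares):
    """Equation (9): price-weighted average over the volume shares of each species."""
    if len(price_weights) != len(volume_shares):
        raise ValueError("one price weight is needed per tree species")
    if abs(sum(volume_shares) - 1.0) > 1e-9:
        raise ValueError("volume shares should sum to 1")
    return sum(w * s for w, s in zip(price_weights, volume_shares))


# Hypothetical column set: 60% Chinese fir (weight 1.0), 40% camphorwood (weight 2.5).
eta = material_coefficient(price_weights=[1.0, 2.5], volume_shares=[0.6, 0.4])
```

Under these assumptions η = 1.0 × 0.6 + 2.5 × 0.4 = 1.6, so a component built partly from the more expensive species receives a larger coefficient than one built entirely from the baseline species.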

3. Min–Max Normalization for Data

The quantitative data among the input features differ significantly in scale and order of magnitude, which can lead to slow and uneven model convergence and affect the predictive results [43]. Therefore, the 98 sets of data were linearly normalized to the interval [0, 1]. When applying the trained model to the test set for cost prediction, the output data are inversely normalized according to Equation (10).

x = y \times (x_{max} - x_{min}) + x_{min}

(10)

where y is the normalized output, x is the corresponding value on the original scale, and x_{max} and x_{min} are the maximum and minimum of the original data.
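The normalization and its inverse in Equation (10) amount to a pair of one-line functions. The sketch below uses hypothetical cost values and assumed function names to show the round trip.

```python
def fit_min_max(values):
    """Return the (minimum, maximum) used for scaling."""
    return min(values), max(values)


def normalize(x, x_min, x_max):
    """Map x linearly to [0, 1]."""
    return (x - x_min) / (x_max - x_min)


def denormalize(y, x_min, x_max):
    """Inverse normalization, Equation (10)."""
    return y * (x_max - x_min) + x_min


# Hypothetical costs, for illustration only.
costs = [50.0, 100.0, 150.0]
x_min, x_max = fit_min_max(costs)
scaled = [normalize(c, x_min, x_max) for c in costs]
restored = [denormalize(s, x_min, x_max) for s in scaled]
```

The same `x_min` and `x_max` must be reused when converting model outputs back to the original cost scale, which is why they are returned separately rather than recomputed.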

3.1.3. OPTICS Clustering for Outlier Detection

The OPTICS clustering algorithm is a density-based clustering method that automatically identifies clusters and outliers without the need to preset the number of clusters, exhibiting strong robustness [44]. The algorithm generates a reachability plot that illustrates the variation in data point density, where valleys represent the presence of clusters, and their depth indicates the tightness of the connections within the clusters, as shown in Figure 5. In this study, by setting a threshold based on the 95th percentile of reachability distances (r = 0.834), data points 60 and 92 in the plot, which are above this threshold, are identified as outliers because they exhibit a greater reachability distance compared to other points in the dataset.

3.1.4. Preliminary Extraction of Principal Features

In this study, we employed the Random Forest algorithm for feature selection within the dataset to eliminate features irrelevant to the learning task. There are primarily two methods for calculating the Variable Importance Measure (VIM) in a Random Forest: one based on the reduction of the Gini index in split nodes and the other based on the Out-of-Bag (OOB) prediction error rate. Considering that the dataset includes both continuous and categorical variables, we opted for the assessment method based on data permutation to evaluate feature importance: a variable whose permutation leaves the error rate unchanged contributes little to the prediction and can be screened out.

Using a Python program, we calculated and ranked the VIM values of each feature, as shown in Figure 6. The results indicate that, among the 20 feature attributes, the VIM values for Number of Door Hoods (X₁₇), Number of Floors (X₃), Construction Techniques (X₁₉), Ground Construction (X₁₅), and Partition Screen Material (X₁₀) are all zero, rendering them ineffective as features. The relative VIM values for the other features all exceeded 20%; hence, we selected these 15 features as input features for the cost estimation of replica Huizhou traditional vernacular dwellings. This approach significantly enhanced the efficiency of data organization and model computation, reducing the time and cost for cost engineers in data preparation and model calculation.

3.2. Computational Results

After removal of the two outliers identified in Section 3.1.3, a total of 96 valid datasets of bidding control prices for Huizhou replica traditional vernacular dwellings remained. The processed datasets were randomly divided into two subsets: 67 for constructing the training set of the inference model, and the remaining 29 serving as the test set for evaluating model performance. The RF algorithm was used to screen the 20 original features, yielding 15 key features as input variables for the ASCNN. Model accuracy was assessed using the Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE), and Coefficient of Determination (R²) as evaluation metrics (Equations (11) to (13)).

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_{i} - \hat{y}_{i})^{2}}

(11)

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i=1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right|

(12)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{\sum_{i = 1}^{n} {(y_{i} - \bar{y})}^{2}}

(13)

where y_{i} is the actual value, \hat{y}_{i} is the predicted value, \bar{y} is the average of the actual values, and n is the number of samples.
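Equations (11) to (13) translate directly into code. The sketch below uses only the Python standard library; the function names and the three-sample example are illustrative and are not the study's data.

```python
import math


def rmse(actual, predicted):
    """Equation (11): root mean squared error."""
    n = len(actual)
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / n)


def mape(actual, predicted):
    """Equation (12): mean absolute percentage error, in percent."""
    n = len(actual)
    return 100.0 / n * sum(abs((y - p) / y) for y, p in zip(actual, predicted))


def r_squared(actual, predicted):
    """Equation (13): coefficient of determination."""
    mean_y = sum(actual) / len(actual)
    ss_res = sum((y - p) ** 2 for y, p in zip(actual, predicted))
    ss_tot = sum((y - mean_y) ** 2 for y in actual)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 300.0]
scores = (rmse(actual, predicted), mape(actual, predicted),
          r_squared(actual, predicted))
```

For this example the RMSE is about 8.16, the MAPE is 5%, and R² is 0.99. RMSE keeps the unit of the target (yuan in this study), whereas MAPE and R² are unit-free.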

Predictions were performed using the model parameters listed in Table 4. Figure 7 illustrates the change in normalized MSE as the number of PSO iterations increases. The MSE declines markedly from iteration 0 to approximately iteration 75, as the PSO algorithm adjusts the model parameters and substantially reduces the prediction error. After 70 iterations, the value stabilizes without significant changes, indicating that the PSO algorithm has essentially converged on a near-optimal combination of model parameters. The final normalized MSE is approximately 0.015, well below the initial value, indicating that the PSO algorithm has significantly improved the model’s predictive accuracy.
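To make the shape of such a convergence curve concrete, the following is a minimal PSO sketch using the swarm settings of Table 4 (21 particles, 100 iterations, C1 = C2 = 2.0, W = 0.5). It is an assumption-laden illustration, not the study's implementation: a simple quadratic stands in for the CNN validation error, positions are clamped to hypothetical bounds, and the best-so-far value is recorded after every iteration.

```python
import random


def pso(objective, bounds, n_particles=21, n_iter=100,
        w=0.5, c1=2.0, c2=2.0, seed=0):
    """Minimize objective over box bounds; returns (best, best_val, history)."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=pbest_val.__getitem__)
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    history = [gbest_val]
    for _ in range(n_iter):
        for i in range(n_particles):
            for d, (lo, hi) in enumerate(bounds):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
        history.append(gbest_val)
    return gbest, gbest_val, history


# Stand-in for the validation MSE of a CNN with two hyperparameters.
def stand_in_loss(p):
    return (p[0] - 0.3) ** 2 + (p[1] + 0.2) ** 2


best, best_val, history = pso(stand_in_loss, bounds=[(-1.0, 1.0), (-1.0, 1.0)])
```

Because `history` stores the best value found so far, it can never increase, which is why a curve of this kind flattens once the swarm stops finding better parameter combinations.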

The predictive results of the ASCNN model are presented in Table 5. On the test dataset, the ASCNN model achieved an RMSE of 9828.06 yuan, a MAPE of 0.6%, and an R² of 0.989, indicating high predictive accuracy and strong generalization capability.

Figure 8a illustrates the comparison between the predicted and actual values on the test dataset, showing a good match between the predicted and actual values for the majority of the data points. Notably, the overall trend of the data is well-aligned, indicating the model’s proficiency in capturing the global trends of data changes. For instance, at data points 6, 9, 13, and 18, the model successfully detected peak changes in actual values, further evidencing its robust response to extreme values within the data and its ability to reflect the actual cost variations to a considerable extent.

Figure 8b displays the distribution of relative errors for the test set samples in the cost prediction of replica traditional vernacular dwellings. The relative errors of all samples are within ±3%: the maximum positive relative error is 2.37%, and the maximum negative error is −2.63%. This complies with the 3% precision range required for engineering budget preparation and is well below the ±10% cost error limit required during the detailed feasibility study phase. The relative errors for the majority of samples are within ±2%, with a balanced distribution and a small overall magnitude, suggesting that the model’s predictions are stable and reliable, without significant fluctuations. This stability is crucial for cost budgeting and control in practical applications. The errors alternate between positive and negative values without a noticeable bias, indicating that the model does not systematically overestimate or underestimate costs. It should be noted that the dataset collected in this study is relatively small; an increase in the quantity and quality of training samples is expected to further improve predictive accuracy.
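A check of this kind can be sketched in a few lines. The sign convention (predicted minus actual, divided by actual) is an assumption, since the section does not state it. The actual values are the tender control prices of projects 1 to 4 in Table 3, and the predictions are hypothetical numbers chosen for illustration.

```python
def relative_errors(actual, predicted):
    """Signed relative error of each prediction, in percent."""
    return [100.0 * (p - a) / a for a, p in zip(actual, predicted)]


def within_tolerance(errors, limit_percent):
    """True if every relative error lies inside +/- limit_percent."""
    return all(abs(e) <= limit_percent for e in errors)


actual = [150.63, 139.71, 138.67, 106.88]     # Table 3, ten thousand yuan
predicted = [152.00, 138.50, 139.10, 105.00]  # hypothetical model outputs
errors = relative_errors(actual, predicted)
meets_budget_precision = within_tolerance(errors, 3.0)
```

Keeping the sign of each error makes it possible to see whether the model leans toward overestimation or underestimation, which an absolute-error metric such as MAPE would hide.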

3.3. Results Analysis

To validate the effectiveness of the ASCNN model, comparisons were made with other AI models that have achieved good results in cost estimation. These algorithms include non-optimized CNN networks, BP-ANN, Radial Basis Function Neural Networks (RBF-ANNs), and Elastic Support Vector Machines (ELSVMs). The models were assessed using RMSE, MAPE, and R² as metrics for evaluating model accuracy.

Figure 9 presents the performance metrics of the various models, where the ASCNN outperforms all other models across all indicators, particularly in MAPE and R², demonstrating its strong predictive accuracy and fitting capability. Specifically, the RMSE of the ASCNN is the lowest in both the training and test sets, indicating the smallest error between predicted and actual values. By comparison, the RMSE of the unoptimized CNN is 26% higher and its MAPE is 102% higher. The other models show significantly higher RMSE than the ASCNN, with the ELSVM model performing the worst. The MAPE of the ASCNN is also the lowest in both the training and test sets, at only 0.6% on the test set. Moreover, the R² values of the ASCNN are close to 1 in both the training and test sets, indicating the best fit. The other models have lower R² values, with ELSVM having the lowest R² and the poorest fit.

It can be observed that ASCNN exhibits the best performance across all metrics, with its RMSE, MAPE, and R² superior to other models, especially with prominent performance in MAPE and R², indicating higher predictive accuracy and excellent fitting. Although other models such as RBF-ANN and BP-ANN show relatively good predictive performance, they are still inferior to ASCNN, particularly in MAPE and R². ELSVM performs the worst across all metrics, with the largest predictive error and the poorest fitting, making it unsuitable for cost prediction of replica traditional vernacular dwellings.

Therefore, based on this analysis, the ASCNN is the best-performing model for cost prediction of replica traditional vernacular dwellings, with high application value and reliability. In practical applications, it is recommended to prioritize the ASCNN model to achieve higher predictive accuracy and a better fit.

3.4. Model Interpretation

This study employs SHAP to elucidate the CNN model used in the task of cost estimation for replica traditional vernacular dwellings, verifying whether the model can reasonably capture the correlation between the parameters used in the training process and the cost of these dwellings. By thoroughly analyzing the model’s predictive logic, not only can the credibility of the model be enhanced, but also clearer and more persuasive decision support can be provided to relevant stakeholders.

Figure 10a presents the global feature importance plot based on SHAP values, showing the average absolute SHAP values of each feature to intuitively represent their importance. It can be observed that the average SHAP values of Plan Layout (X₄), Column Material Coefficient (X₇), Roof Area (X₁₃), Beam Material Coefficient (X₈), Screen Door Area (X₁₁), Partition Screen Area (X₉), and Building Area (X₁) rank in the top seven, indicating their greatest impact on the model’s prediction and further indicating that the Plan Layout, Tilework, Major Woodwork, Wood Decoration, and Building Area have the most significant influence on the cost of these dwellings.
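The quantity plotted in such a ranking, the mean absolute SHAP value per feature, can be illustrated on a toy problem. The sketch below computes exact Shapley values by enumerating feature subsets, with absent features set to a baseline (the sample mean). This is feasible only for a handful of features and is not the approximation used by SHAP tooling on the 15-feature CNN. The cost function and the samples are hypothetical.

```python
from itertools import combinations
from math import factorial


def shapley_values(model, x, baseline):
    """Exact Shapley values of model at x; absent features take baseline values."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return model(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for k in range(n):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for s in combinations(others, k):
                total += weight * (value(set(s) | {i}) - value(set(s)))
        phi.append(total)
    return phi


def mean_abs_shap(model, samples, baseline):
    """Global importance: mean absolute Shapley value of each feature."""
    per_sample = [shapley_values(model, x, baseline) for x in samples]
    return [sum(abs(p[i]) for p in per_sample) / len(per_sample)
            for i in range(len(baseline))]


# Hypothetical cost function with an interaction between features 0 and 2.
def toy_cost(z):
    return 2.0 * z[0] + 0.5 * z[1] + z[0] * z[2]


samples = [[1.0, 2.0, 0.0], [2.0, 1.0, 1.0], [3.0, 0.0, 2.0]]
baseline = [sum(col) / len(col) for col in zip(*samples)]
importance = mean_abs_shap(toy_cost, samples, baseline)
```

For each sample, the Shapley values sum to the difference between the prediction and the baseline prediction, which is what allows the per-feature contributions to be read as shares of the predicted cost.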

The SHAP value distribution illustrated in Figure 10b offers an in-depth insight into the contribution of each feature in the model’s prediction. The results show that the SHAP values of Plan Layout (X₄) and Roof Area (X₁₃) cover a wide range with significant differences, indicating that these two features play a notable role in the cost prediction of ancient dwellings. In particular, the broad distribution of SHAP values for Plan Layout (X₄) and Roof Area (X₁₃) suggests a potentially complex nonlinear relationship between these factors and the predicted outcomes.

The Column Material Coefficient (X₇), Building Area (X₁), Screen Door Area (X₁₁), Beam Material Coefficient (X₈), and Partition Screen Area (X₉) exhibit a moderate range of SHAP values, indicating a stable but significant impact on cost prediction. This finding aligns with common knowledge in the field of architecture that the material of wooden structures and wood decoration are key factors affecting the construction cost of these dwellings.

On the other hand, features such as Brick Ground Area (X₁₄) and Patio Area (X₁₈) have a more concentrated distribution of SHAP values, indicating their relatively minor contribution to the model’s prediction. This suggests that the Brickwork and Stonework parts of Huizhou replica traditional vernacular dwellings have a smaller impact on the cost. The impact of the other six features on the model’s prediction is relatively limited, indicating a degree of redundancy.

To further validate the robustness and interpretability of the ASCNN model, in-depth interviews were conducted with eight experts, each possessing extensive experience in cost estimation of traditional construction projects. The interview findings revealed a high degree of consistency between the SHAP value analysis and the experts’ professional insights, thereby significantly enhancing the credibility of the model’s predictive outcomes.

Delving into the contributions of features to model predictions and their mechanisms of impact is crucial for precisely controlling the construction costs of replica traditional vernacular dwellings. This analysis not only identifies the key factors that dominate cost predictions, but also reveals the nonlinear dynamics in cost influences, providing a solid foundation for accurate budget preparation. Through this process, cost engineers can gain profound insights into the mechanisms of ancient dwelling cost formation, quantify the specific impacts of various features on costs with precision, and thus comprehensively understand how architectural elements and material choices collectively affect construction costs. This in-depth understanding is of significant practical importance for formulating reasonable cost budgets, optimizing resource allocation, and enhancing the overall efficiency of construction projects.

4. Conclusions

Huizhou-style residences are celebrated for their exquisite architectural structures and intricate decorative craftsmanship. These characteristics involve the use of numerous materials and artisanal skills, establishing a complex nonlinear relationship with construction costs. This complexity significantly increases the difficulty of cost estimation, causing traditional methods to often fall short of the expected accuracy and leading to frequent budget overruns. In the current context, where the Engineering, Procurement, and Construction (EPC) model is increasingly popular and the profit margins of construction projects are gradually decreasing, accurately predicting construction costs is particularly crucial. To address this challenge, this study proposes a model that fully utilizes historical data, combining PSO with a CNN model, referred to as the ASCNN, to predict the costs of replica traditional vernacular dwellings in the Huizhou area. This model not only provides more precise cost estimation but also enhances the transparency and credibility of the model through interpretation of SHAP values, offering reliable support for investment decisions in replica traditional vernacular dwellings.

A total of 96 bidding control price datasets for newly constructed replica traditional vernacular dwellings were used to assess the performance of the ASCNN. The experimental results indicate that the ASCNN model can accurately predict the construction costs of Huizhou replica traditional vernacular dwellings, achieving the best results in RMSE, MAPE, and R²; the RMSE and MAPE of the second most accurate algorithm, the unoptimized CNN, were 26% and 102% higher than those of the ASCNN, respectively. Additionally, by removing five redundant input parameters, the ASCNN model saves time and effort in data updating and collection compared with other algorithms. Based on SHAP value calculations, this study identified Plan Layout (X₄), Roof Area (X₁₃), Column Material Coefficient (X₇), Screen Door Area (X₁₁), Beam Material Coefficient (X₈), Building Area (X₁), and Partition Screen Area (X₉) as the most important factors affecting cost prediction for Huizhou replica traditional vernacular dwellings, while finding that brickwork has a relatively minor impact on cost estimation.

Cost engineers can use this model to accurately identify and quantify the key factors affecting the cost of replica traditional vernacular dwellings, providing precise cost estimates. Through in-depth SHAP value analysis, this study reveals elements that significantly impact costs, enabling engineers to understand the internal mechanisms of cost composition more deeply and make precise adjustments accordingly.

This study has successfully developed an efficient artificial intelligence model aimed at providing a powerful tool for auxiliary decision-making and cost control in the field of cost estimation for replica Huizhou traditional dwellings. The proposed ASCNN model integrates the advantages of various artificial intelligence technologies, including: (1) adaptively adjusting the input parameters of the CNN to optimize performance; (2) providing high reliability and relatively accurate cost predictions for Huizhou replica traditional dwellings, enhancing the accuracy of estimation; (3) improving the objectivity and efficiency of the model by reducing manual operations; and (4) reducing the workload and time required for data updating and collection, improving the efficiency of data processing.

Despite the significant achievements of this study in the cost estimation of replica Huizhou traditional dwellings, some limitations need to be addressed in future research. (1) Geographical Limitations: The current study’s samples are mainly concentrated in specific areas of Huizhou traditional architecture. Future work should expand the geographical scope of the samples to verify the model’s universality and applicability in different regions. (2) Sample Diversity: The collected samples mainly come from a few single buildings of local cultural and tourism projects, and the construction time is relatively short, which limits the diversity of the samples and does not fully consider the impact of time factors on cost. Follow-up studies should consider the impact of time changes on the cost of replica traditional vernacular dwellings to improve the model’s dynamic adaptability and the accuracy of long-term predictions.

Author Contributions

Conceptualization, J.H.; resources, J.H.; formal analysis, J.H.; writing—original draft preparation, J.H.; writing—review and editing, J.H. and W.H.; validation, W.H.; funding acquisition, J.H. and W.Q.; supervision, W.Q. and Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

The research was supported by the Natural Science Research Key Project of Anhui Educational Committee (Grant No. 2022AH051956), the Natural Science Foundation of Huangshan University (2021xkjq003), and the Huizhou Culture Research Project of Huangshan University, China (Grant No. 2019xhwh005).

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the authors on request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Zhang, M.; Zhang, J.; Liu, Q.; Li, T.; Wang, J. Research on the Strategies of Living Conservation and Cultural Inheritance of Vernacular Dwellings—Taking Five Vernacular Dwellings in China’s Northern Jiangsu as an Example. Sustainability 2022, 14, 12503.
2. Yan, Y. Cost Management of Ancient Building Relocation Repair Projects. J. Fujian Univ. Technol. 2023, 21, 481–486.
3. Vitasek, S.; Macek, D. A Study of Factors Influencing the Compliance of Design Estimates at the Construction Stage of Residential Buildings. Buildings 2024, 14, 2010.
4. Liu, Y. Study on the Cost of Ancient Buildings Repair Project in South China. Master’s Thesis, Guangdong University of Technology, Guangzhou, China, 2019.
5. Yang, L. Research on Influencing Factors and Countermeasures of Renovation Cost of Historical Buildings. Master’s Thesis, Tianjin University of Science and Technology, Tianjin, China, 2022.
6. Elmousalami, H.H. Artificial Intelligence and Parametric Construction Cost Estimate Modeling: State-of-the-Art Review. J. Constr. Eng. Manag. 2020, 146, 03119008.
7. Seidu, R.; Young, B.; Clack, J.; Adamu, Z.; Robinson, H. Innovative Changes in Quantity Surveying Practice through BIM, Big Data, Artificial Intelligence and Machine Learning. Appl. Sci. Univ. J. Nat. Sci. 2020, 4, 37–47.
8. Emsley, M.W.; Lowe, D.J.; Duff, A.R.; Harding, A.; Hickson, A. Data Modelling and the Application of a Neural Network Approach to the Prediction of Total Construction Costs. Constr. Manag. Econ. 2002, 20, 465–472.
9. Lowe, D.J.; Emsley, M.W.; Harding, A. Predicting Construction Cost Using Multiple Regression Techniques. J. Constr. Eng. Manag. 2006, 132, 750–758.
10. Wilmot, C.G.; Mei, B. Neural Network Modeling of Highway Construction Costs. J. Constr. Eng. Manag. 2005, 131, 765–771.
11. An, S.H.; Park, U.Y.; Kang, K.I.; Cho, M.Y.; Cho, H.H. Application of Support Vector Machines in Assessing Conceptual Cost Estimates. J. Comput. Civ. Eng. 2007, 21, 259–264.
12. Cheng, M.Y.; Hoang, N.D.; Wu, Y.W. Hybrid Intelligence Approach Based on LS-SVM and Differential Evolution for Construction Cost Index Estimation: A Taiwan Case Study. Autom. Constr. 2013, 35, 306–313.
13. Moussa, M.; Ruwanpura, J.; Jergeas, G. Decision Tree Modeling Using Integrated Multilevel Stochastic Networks. J. Constr. Eng. Manag. 2006, 132, 1254–1266.
14. Doğan, S.Z.; Arditi, D.; Murat Günaydin, H. Using Decision Trees for Determining Attribute Weights in a Case-Based Model of Early Cost Prediction. J. Constr. Eng. Manag. 2008, 134, 146–152.
15. Ji, S.H.; Ahn, J.; Lee, E.B.; Kim, Y. Learning Method for Knowledge Retention in CBR Cost Models. Autom. Constr. 2018, 96, 65–74.
16. Kim, S. Hybrid Forecasting System Based on Case-based Reasoning and Analytic Hierarchy Process for Cost Estimation. J. Civ. Eng. Manag. 2013, 19, 86–96.
17. Cao, Y.; Ashuri, B.; Baek, M. Prediction of Unit Price Bids of Resurfacing Highway Projects through Ensemble Machine Learning. J. Comput. Civ. Eng. 2018, 32, 04018043.
18. Pan, Y.; Zhou, S.; Guan, J.; Wang, Q.; Ding, Y. Limited Field Images Concrete Crack Identification Framework Using PCA and Optimized Deep Learning Model. Buildings 2024, 14, 2054.
19. Wang, Y.; Ning, X.; Zhen, D.; Yong, W.; Zhang, H. Research on Construction Project Cost Prediction Model Based on Recurrent Neural Network. In SHS Web of Conferences, Proceedings of the 2023 International Conference on Digital Economy and Management Science, Kai Feng, China, 21–23 April 2023; EDP Sciences: Les Ulis, France, 2023; Volume 170, p. 02009.
20. Al-tawal, D.R.; Arafah, M.; Sweis, G.J. A Model Utilizing the Artificial Neural Network in Cost Estimation of Construction Projects in Jordan. Eng. Constr. Archit. Manag. 2021, 28, 2466–2488.
21. Saeidlou, S.; Ghadiminia, N. A Construction Cost Estimation Framework Using DNN and Validation Unit. Build. Res. Inf. 2024, 52, 38–48.
22. Li, B.; Xin, Q.; Zhang, L. Engineering Cost Prediction Model Based on DNN. Sci. Program. 2022, 2022, 3257856.
23. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
24. Xue, X.; Jia, Y.; Tang, Y. Expressway Project Cost Estimation with a Convolutional Neural Network Model. IEEE Access 2020, 8, 217848–217866.
25. Yi, M.; Zeng, Y.; Qin, Z.; Xia, Z.; He, Y. Cost Prediction Model of Mountain High-speed Railway Civil Engineering Based on MIC-CNN. Railw. Stand. Des. 2023, 67, 44.
26. Han, K.; Wang, W.; Huang, X. Predicting the Construction Cost of High Standard Farmland Irrigation Projects using NGO-CNN-SVM. Trans. Chin. Soc. Agric. Eng. 2024, 40, 62–72.
27. Jović, A.; Brkić, K.; Bogunović, N. A Review of Feature Selection Methods with Applications. In Proceedings of the 2015 38th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO), Opatija, Croatia, 25–29 May 2015.
28. Jaiswal, J.K.; Samikannu, R. Application of Random Forest Algorithm on Feature Subset Selection and Classification and Regression. In Proceedings of the 2017 World Congress on Computing and Communication Technologies (WCCCT), Tiruchirappalli, India, 2–4 February 2017.
29. Gaspar, A.; Oliva, D.; Cuevas, E.; Zaldívar, D.; Pérez, M.; Pajares, G. Hyperparameter Optimization in a Convolutional Neural Network Using Metaheuristic Algorithms. In Metaheuristics in Machine Learning: Theory and Applications; Springer International Publishing: Cham, Switzerland, 2021; pp. 37–59.
30. Kennedy, J.; Eberhart, R. Particle Swarm Optimization. In Proceedings of the ICNN’95-International Conference on Neural Networks, Perth, WA, Australia, 27 November 1995.
31. Jiyue, E.; Liu, J.; Wan, Z. A Novel Adaptive Algorithm of Particle Swarm Optimization Based on the Human Social Learning Intelligence. Swarm Evol. Comput. 2023, 80, 101336.
32. Montavon, G.; Samek, W.; Müller, K.R. Methods for Interpreting and Understanding Deep Neural Networks. Digital Signal Process. 2018, 73, 1–15.
33. Wieland, R.; Lakes, T.; Nendel, C. Using Shapley Additive Explanations to Interpret Extreme Gradient Boosting Predictions of Grassland Degradation in Xilingol, China. Geosci. Model Dev. 2021, 14, 1493–1510.
34. Zhu, P.; Cao, W.; Zhang, L.; Zhou, Y.; Wu, Y.; Ma, Z.J. Interpretable Machine Learning Models for Prediction of UHPC Creep Behavior. Buildings 2024, 14, 2080.
35. Lundberg, S.M.; Lee, S.I. A Unified Approach to Interpreting Model Predictions. Adv. Neural Inf. Process. Syst. 2017, 30, 4768–4777.
36. Dhillon, A.; Verma, G.K. Convolutional Neural Network: A Review of Models, Methodologies and Applications to Object Detection. Prog. Artif. Intell. 2020, 9, 85–112.
37. Kingma, D.P.; Ba, J. Adam: A Method for Stochastic Optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015.
38. Wang, D.; Tan, D.; Liu, L. Particle Swarm Optimization Algorithm: An Overview. Soft Comput. 2018, 22, 387–408.
39. Shafi, S.; Assad, A. Exploring the Relationship between Learning Rate, Batch Size, and Epochs in Deep Learning: An Experimental Study; Springer Nature Singapore: Singapore, 2023; pp. 201–209.
40. Cao, M.T.; Cheng, M.Y.; Wu, Y.W. Hybrid Computational Model for Forecasting Taiwan Construction Cost Index. J. Constr. Eng. Manag. 2015, 141, 04014089.
41. Le, C.; Jeong, H.D.; Damnjanovic, I.; Bukkapatnam, S. Pareto Principle in Scoping-Phase Cost Estimating: A Multiobjective Optimization Approach for Selecting and Applying Optimal Major Work Items. J. Constr. Eng. Manag. 2022, 148, 04022076.
42. Shan, D.Q. Chinese Vernacular Dwellings: People’s Daily Life with Their Houses; China Intercontinental Press: Beijing, China, 2010; pp. 35–52.
43. Liang, K.Y.; Zeger, S.L. Regression Analysis for Correlated Data. Annu. Rev. Public Health 1993, 14, 43–68.
44. Zhao, Y.; Li, H.; Yu, X.; Ma, N.; Yang, T.; Zhou, J. An Independent Central Point OPTICS Clustering Algorithm for Semi-supervised Outlier Detection of Continuous Glucose Measurements. Biomed. Signal Process. Control 2022, 71, 103196.

Figure 1. CNN model structure diagram.

Figure 2. PSO algorithm flowchart.

Figure 3. ASCNN structural diagram.

Figure 4. Floor plan of traditional Huizhou residential buildings: (a) “Ao”-shaped layout; (b) “Hui”-shaped layout; (c) “H”-shaped layout; (d) “Ri”-shaped layout.

Figure 5. OPTICS reachability distance plot.

Figure 6. Features’ relative importance plot.

Figure 7. PSO iteration process.

Figure 8. ASCNN model prediction results: (a) CNN model output results; (b) relative error of output results.

Figure 9. Results comparison of the models: (a) results comparison of RMSE; (b) results comparison of MAPE; (c) results comparison of R².

Figure 10. SHAP value: (a) feature importance ranking plot; (b) SHAP value violin plot.

Table 1. Summary of expert interviews on key factors for Huizhou replica traditional vernacular dwellings.

Main Factors
(1) Architectural Form	(5) Brickwork Quantity and Material
(2) Woodwork Quantity and Material	(6) Stonework Quantity and Material
(3) Wood Decoration Quantity and Material	(7) Wood Frame Structure Type
(4) Tilework Quantity and Material	(8) Construction Techniques

Table 2. Factors of Influence.

Serial Number	Primary Features	Secondary Features	Feature Attributes	Feature Encoding
1	Architectural Form	Building Area	--	m²
2		Architectural Style	Ming Dynasty	1
2		Architectural Style	Qing Dynasty	2
3		Number of Floors	--	Floor
4		Plan Layout	“Ao”-shaped Layout	1
			“Hui”-shaped Layout	2
			“H”-shaped Layout	3
			“Ri”-shaped Layout	4
5	Major Woodwork	Column Quantity	--	m³
6		Beam Quantity	--	m³
7		Major Woodwork Material	Chinese Fir	1
			Camphorwood	2
			Mahogany	3
8	Wood Decoration	Partition Screen Area	--	m²
9		Partition Screen Material	Chinese Fir	1
9		Partition Screen Material	African Teak	2
10		Screen Door Area	--	m²
11		Screen Door Material	Chinese Fir	1
11		Screen Door Material	African Teak	2
12	Tilework	Roof Area	--	m²
13	Brickwork	Brick Ground Area	--	m²
14		Ground Construction	Square-Brick Ground	1
			Slate Ground	2
			Composite Soil Ground	3
15		Door Hood Construction	Arched Door Hood	1
			Inscribed Door Hood	2
			Drooping Flower Door Hood	3
			Figure-eight Door Hood	4
16		Number of Door Hood	--	Unit
17	Stonework	Patios Area	--	m²
18	Construction	Construction Techniques	Traditional Techniques	1
			Non-traditional Techniques	2
			Combination of Traditional and Non-traditional Techniques	3
19	Wood Frame Structure	Wood Frame Type	Bracket-set Structure	1
			Combination of Bracket-set Structure and Beam-lift Structure	2
			Beam-lift Structure	3

Table 3. Samples and input features.

Description	Notation	Project Number
		1	2	3	4	5	6	…	97	98
Building Area	X₁	304.80	289.30	271.70	194.00	167.80	165.40	…	168.00	871.00
Architectural Style	X₂	2	2	2	2	2	2	…	2	1
Number of Floors	X₃	2	2	2	2	2	2	…	2	2
Plan Layout	X₄	2	2	2	1	2	1	…	1	4
Column Quantity	X₅	11.31	8.47	8.49	8.51	5.78	8.03	…	6.82	74.93
Beam Quantity	X₆	9.23	5.29	7.85	7.02	5.04	10.74	…	6.09	153.27
Column Material Coefficient	X₇	0.45	0.33	0.37	0.35	0.33	0.43	…	0.51	0.57
Beam Material Coefficient	X₈	0.43	0.49	0.45	0.48	0.40	0.42	…	0.67	0.71
Partition Screen Area	X₉	18.6	22.62	8.97	17.77	30.08	33.28	…	10.4	75.48
Partition Screen Material	X₁₀	2	2	2	2	2	2	…	2	2
Screen Door Area	X₁₁	114.49	165.09	83.91	64.02	62.67	176.81	…	61.30	75.48
Screen Door Material	X₁₂	1	1	1	1	1	1	…	1	1
Roof Area	X₁₃	191.09	166.68	170.38	106.67	104.54	101.52	…	92.84	188.34
Brick Ground Area	X₁₄	61.00	66.63	32.47	45.95	28.67	49.47	…	40.00	61.72
Ground Construction	X₁₅	1	1	1	1	1	1	…	1	1
Door Hood Construction	X₁₆	2	2	2	2	2	2	…	2	4
Number of Door Hoods	X₁₇	1	1	1	1	0	1	…	1	1
Patio Area	X₁₈	12.93	12.52	21.99	9.23	7.64	6.52	…	9.41	57.70
Construction Techniques	X₁₉	1	1	1	1	1	1	…	1	1
Wood Frame Type	X₂₀	1	1	1	1	1	1	…	1	3
Tender Control Price (ten thousand yuan)	Y	150.63	139.71	138.67	106.88	105.59	119.31	…	94.75	999.52

Table 4. ASCNN algorithm parameter settings.

Parameter Name	Parameter Value	Parameter Name	Parameter Value
Optimizer	Adam	Learning Rate	Adaptive
Data Split	0.7	L2 Regularization	0.01
Data Shuffling	Yes	Cross-validation	3
Activation Function	ReLU	Number of Neural Network Iterations	1000
PSO Particle Number	21	PSO Maximum Iterations	100
C1, C2, W	2.0, 2.0, 0.5	Batch Size	4

Table 5. Model results of ASCNN.

Data Set	RMSE	MAPE (%)	R²
Training set	3018.92	0.2	0.999
Cross-validation set	8080.94	0.53	0.992
Test set	9828.06	0.6	0.989

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
