Article

A Novel Prediction Model for the Sales Cycle of Second-Hand Houses Based on the Hybrid Kernel Extreme Learning Machine Optimized Using the Improved Crested Porcupine Optimizer

1 School of Mathematics and Computer Sciences, Nanchang University, Nanchang 330000, China
2 School of Infrastructure Engineering, Nanchang University, Nanchang 330000, China
3 School of Civil Engineering and Architecture, Wuhan University of Technology, Wuhan 430000, China
* Author to whom correspondence should be addressed.
Buildings 2025, 15(7), 1200; https://doi.org/10.3390/buildings15071200 (registering DOI)
Submission received: 23 February 2025 / Revised: 29 March 2025 / Accepted: 1 April 2025 / Published: 6 April 2025
(This article belongs to the Special Issue Study on Real Estate and Housing Management—2nd Edition)

Abstract

Second-hand housing transactions are an important part of the housing market. Because it is influenced by both location and price, the sales cycle of second-hand housing varies widely. As a result, residents selling or buying second-hand houses often cannot estimate the sales cycle quickly and accurately, and transactions fail. This paper therefore develops a prediction model for the second-hand housing sales cycle based on the hybrid kernel extreme learning machine (HKELM) optimized using an improved Crested Porcupine Optimizer (CPO), which achieves rapid and accurate prediction. First, this paper uses a Stimulus–Organism–Response model to identify 33 factors that affect the second-hand housing sales cycle from three aspects: policy factors, economic factors, and market supply and demand. Then, to address the slow convergence, tendency to fall into local optima, and insufficient optimization performance of the traditional CPO, this paper proposes an improved crested porcupine optimization algorithm (Cubic Chaos Mapping Crested Porcupine Optimizer, CMTCPO). Subsequently, this paper puts forward a prediction model of the second-hand housing sales cycle based on the improved CPO-HKELM. The model has a simple structure, is easy to implement, and computes quickly. Finally, this paper selects 400 second-hand houses in eight cities in China as case studies. The case study shows that the maximum relative error of the proposed model is only 0.0001784. Ten-fold cross-validation indicates that the model does not overfit and is highly reliable. In addition, this paper compares different chaotic maps for improving the CPO and shows that the algorithm combining chaotic maps, mixed mutation, and tangent flight performs best. Compared with classical meta-heuristic optimization algorithms, the improved CPO proposed in this paper has the smallest calculation error and the fastest convergence speed. Compared with a BPNN, LSSVM, RF, XGBoost, and LightGBM, the HKELM has advantages in prediction performance, handling high-dimensional complex data sets more effectively and significantly reducing the consumption of computing resources. The results of this paper help to predict the second-hand housing sales cycle more quickly and accurately.

1. Introduction

The real estate market has a significant economic impact and a broad range of social effects. It plays a crucial role in the global economy by fulfilling people’s fundamental housing needs. Additionally, it promotes economic growth, creates job opportunities, and enhances social well-being [1,2,3]. In recent years, second-hand housing has accounted for a significant proportion of real estate transactions. In 2023, second-hand housing accounted for 89%, 83%, and 74% of the total transaction volume in the United States, the United Kingdom, and France, respectively, and for 40%, 36%, and 42% in China, South Korea, and Japan, respectively. Second-hand housing is an integral part of the housing market. Such houses are usually located in relatively mature communities with complete supporting facilities and therefore give homebuyers more choices [4,5]. However, influenced by location and price, the sales cycle of second-hand housing varies widely. As a result, residents often cannot assess the sales cycle accurately and swiftly when selling or purchasing, which leads to failed transactions. Consequently, studying the sales cycle of second-hand housing has both practical and theoretical value.
From the perspective of the research object, current research on second-hand housing transactions focuses mainly on predicting prices, and few studies address the sales cycle. Additionally, most studies do not thoroughly identify the factors influencing second-hand housing. Instead, they often focus solely on individual attributes and regional characteristics, neglecting the effects of the broader macro environment and market performance [6]. Based on data from 38,363 second-hand housing transactions in Chengdu, China, collected from Ke.com, Zhang et al. [7] used a random forest model to predict the price of second-hand housing, with a model error of 2.61%. However, their research focuses on the attributes of the housing itself, and the indicators involved remain limited in diversity and comprehensiveness. Gao et al. [8] incorporated the housing location factor into their study and established a multi-task learning (MTL) model. This model offers significant advantages over the traditional model, yet it still focuses primarily on the characteristics of the second-hand house and its location. Li et al. [9] identified several influential factors for their research, including the basic characteristics of the housing, the structure information, the community environment, and the supporting facilities within the community. These factors were utilized to assess transactions involving second-hand homes. Lu et al. [10] combined various regression models to predict house prices, averaging the estimates of the individual models to produce the final result. However, the factors they considered were limited to the number of bedrooms, the size of the house, and the available amenities, among others. Park and Bae [11] examined only the physical characteristics and supporting facilities of houses. In their work, the selection of influencing factors was overly narrow, relying on the traditional hedonic pricing theory. This approach focuses on three main factors: the physical attributes of the housing, location characteristics, and the neighborhood environment.
Unlike the aforementioned scholars, DiPasquale and Wheaton [12] incorporated macroeconomic variables into their analysis to predict housing prices. They discovered that these macroeconomic factors can enhance the accuracy of housing price predictions. Additionally, Zhong et al. [13] underscored the strong connection between macroeconomic factors and the real estate market, noting that this relationship influences the market at various levels. Zhang [14] pointed out that traditional house price prediction models typically rely on numerical attributes, such as the number of rooms, but overlook the textual descriptions of the houses. To address this, he collected descriptive information from second-hand houses in five cities across Ontario, Canada, and developed a deep-learning model. The model achieved an accuracy of 0.7904, indicating that there is still room for improvement. He and Xia [15] highlighted the significance of individual psychological and emotional factors in the real estate market at the micro level. While these factors are challenging to quantify, they are crucial to understanding the second-hand housing market. Additionally, Gdakowicz et al. [16] pointed out that various internal and external factors interact throughout the process, from the initial listing of a house to the final transaction, ultimately influencing the duration of the sales cycle. These results indicate that a variety of macro and micro factors need to be considered to ensure comprehensive and accurate predictions.
The Stimulus–Organism–Response (SOR) model, proposed by Mehrabian and Russell, offers a comprehensive explanation of the mechanisms behind consumer purchasing behavior. In this model, ‘Stimulus’ (S) refers to the external environmental factors influencing consumers’ cognition and emotions. ‘Organism’ (O) serves as a mediating variable, representing the internal state of the consumers. Finally, ‘Response’ (R) indicates the external behavioral reactions of consumers, which are typically expressed as either approaching or avoiding certain stimuli. The framework suggests that various stimuli influence consumers’ purchasing behavior. These stimuli generate motivation, which drives consumers to make purchases. After purchase, consumers evaluate the products and the purchasing channels and manufacturers involved [17]. Considering second-hand housing transactions as a consumer purchasing behavior, the SOR theory is applicable. The SOR theory is introduced in this study to identify factors affecting the sales cycle of second-hand housing. In this context, ‘Stimulus’ refers to the macro and external environmental factors that affect the second-hand housing market. ‘Organism’ pertains to the characteristics of the second-hand house and its surrounding area. Finally, ‘Response’ represents the overall reaction of the second-hand housing market to these stimuli and organisms.
The authors note that no study proposes a quantitative method to accurately predict the sales cycle and comprehensively identify various influencing factors. Therefore, we have chosen to focus our research on the sales cycle of second-hand houses in order to address this gap in the field.
From the perspective of research methods, there is currently a lack of quantitative methods for studying the sales cycle. Therefore, the key to our research is collecting appropriate data, using appropriate methods to characterize the nonlinear relationships among influencing factors, and predicting accurately and quickly. The real estate transaction period is lengthy, and obtaining the necessary data can be challenging. Traditional data collection methods mainly rely on statistical data or personal surveys [18]. Roy [19] used statistical data compiled by government agencies to conduct an in-depth study of urban housing demand. Feng et al. [20] collected the original data of commercial residential buildings in Beijing through field investigations. Although these two methods ensure the authority of the data and the richness of the details, they are time consuming and labor intensive. Online real estate platforms offer an alternative [21]. Globally, online real estate platforms, such as Lianjia in China, Zillow in the United States, PropertyGuru in Southeast Asia, and 99acres in India, are popular, offering services such as home purchases, sales, and rentals while working on data reliability and immediacy [22]. A Web crawler is a program that simulates browser behavior to obtain data from the Internet automatically. It can effectively overcome the limitations of traditional data collection methods, reduce costs, and improve collection efficiency [23]. In the field of real estate, more and more researchers are using Web crawler technology to obtain relevant data from online platforms. Lee et al. [24] crawled the public information data provided by a real estate portal and predicted the value of a house accordingly. Yao et al. [25] collected housing price data from China’s largest online real estate market and generated the spatial distribution of housing prices in Shenzhen, China, with a spatial resolution of 5 m. Song and Ma [26] crawled their research data from Lianjia.com and geocoded the physical addresses with the Gaode Map API to correlate them spatially with other geographic data. Kang et al. [6] empirically analyzed housing prices based on crawled Web data. Based on these research findings, this study employs crawler technology to collect data, with the aim of improving the accuracy of sales cycle prediction.
Some studies utilize statistical methods for predictions. For instance, Lancaster et al. [27] introduced multiple regression analysis in the field of real estate price evaluation. Pior et al. [28] combined Geographic Information System (GIS) technology to assess the prices of second-hand housing. Xu and Zhang [29] employed multi-scale geographically weighted regression (MGWR). Recent studies have found that machine learning models outperform traditional statistical methods: they can handle complex nonlinear relationships and also have higher computational efficiency [30,31]. Adetuji et al. [32] established a random forest model to predict housing prices, with the error controlled within ±5%, showing the advantage of machine learning models in prediction accuracy. Xu and Zhang [33] used neural network models to predict housing prices, further demonstrating the potential of machine learning methods in efficiency and accuracy. Zhan et al. [34] combined Bayesian optimization with ensemble learning technology to bring new solutions to housing price forecasting. Ge et al. [35] compared artificial neural networks (ANNs) with logistic regression (LR) models and found that an ANN performed significantly better than LR in dealing with nonlinear relationships, which further supports the superiority of machine learning models in prediction problems. Although machine learning models are superior to statistical methods in prediction efficiency and accuracy, some models learn much more slowly than applications require. Huang et al. [36] noted this and proposed a new learning algorithm called the extreme learning machine (ELM), which can provide good generalization performance with an extremely fast learning speed. The kernel extreme learning machine (KELM) builds on the ELM by replacing the activation function of the hidden layer with a kernel function. In practical applications, the number of neurons in the hidden layer, the input parameters, and the system bias do not need to be set. Its generalization ability and learning speed are better than those of the ELM [37]. Because the KELM is highly effective for problems with multiple inputs and complex nonlinear relationships but relies strongly on the choice of kernel function, this paper linearly weights multiple kernel functions to establish a hybrid kernel extreme learning machine (HKELM). The model is intended to predict the sales cycle of second-hand houses.
The HKELM has multiple hyperparameters that need to be optimized. At present, the mainstream optimization algorithms are the GA [38], QPSO [39], SSA [40], WOA [41], GWO [42], BWO [43], HHO [44], DBO [45], etc. These algorithms and their variants are widely used in various engineering problems and parameter optimization. The Crested Porcupine Optimizer (CPO) was proposed by Abdel-Basset et al. [46]. It has been tested in several optimization problems. Compared with the most advanced heuristic algorithms and traditional methods, it has strong competitiveness and broad application prospects. Zang et al. [47] used the CPO to optimize the BiTCN-BiLSTM-SA model and predicted optical fiber network traffic based on this. Gao et al. [48] used the CNN-BiLSTM-Attention model optimized using the CPO to predict the temperature of a bridge girder. However, some scholars have pointed out that the CPO has problems such as slow convergence and insufficient optimization performance [49] and have improved the CPO. Zhang et al. [50] deleted the population reduction mechanism and improved the update formula of the defense phase. Based on this, the LSTM model was optimized to predict the deformation of soft rock tunnels. Wang et al. [51] introduced chaotic mapping to initialize the population to improve the CPO and optimized the DV-Hop algorithm based on this. Liu et al. [52] enhanced the CPO based on Cauchy mutation, adaptive weights, and other strategies. The above improvements have achieved good results. At the same time, we noticed that Ling et al. [53] introduced the Lévy flight strategy when improving the whale optimization algorithm (WOA), which effectively improved the algorithm’s convergence speed and global optimization ability. Wang et al. [54] introduced the tangent flight operator when improving the Harris Hawk optimizer (HHO), which is innovative. Based on the above research results, we propose the following improvements to the CPO:
  • Using a Cubic map to initialize the population;
  • Deleting cycle population reduction;
  • Introducing a hybrid mutation strategy;
  • Introducing a tangent flight strategy.
Therefore, this paper uses the CMTCPO-HKELM method to study the sales cycle of second-hand houses. The main contributions are as follows: (1) The SOR model is used for the first time to analyze the factors influencing the second-hand housing sales cycle, which provides a new perspective for research in this field. (2) This paper presents an improved Crested Porcupine Optimizer with better computational performance. Based on this, this paper also develops a novel prediction model of the second-hand housing sales cycle, which achieves rapid and accurate prediction.
The remainder of this article is organized as follows. Section 2 presents the research methods proposed in this paper in detail. Section 3 carries out case studies on eight cities in China with different economic conditions. Section 4 discusses the research details of this paper. Section 5 summarizes the research content and limitations of this paper.

2. Materials and Methods

2.1. Influencing Factors of the Sales Cycle of Second-Hand Housing

2.1.1. Factors Related to Stimulus (S)

The stimulus characterizes the external environmental factors of the second-hand housing market, which profoundly affect the sales cycle of second-hand housing. The factors related to the stimulus can be divided into three categories: policy, economy, and market supply and demand.
1. Policy factors:
Policies implemented by the government primarily encompass monetary policy and tax policy. They intervene in the real estate market at the transaction level, triggering fluctuations in house prices [55,56,57]. In second-hand housing transactions, buyers focus on purchase costs, so policy factors significantly influence the second-hand housing market. Relevant research shows that credit policies include loan interest rates, the determination of down payment ratios, and restrictions on loan terms and economic conditions. When credit policies are favorable to homebuyers, their willingness to buy improves significantly [58,59,60]. At the same time, tax fluctuations also impact house prices [61]. In this study, we selected ‘loan rate’ and ‘down payment ratio’ to characterize credit policy and ‘tax ratio’ to characterize tax policy.
2. Economic factors:
Economic factors stimulate the second-hand housing market in two ways: regional economic conditions and buyers’ purchasing power. Relevant research points out that the real estate market is obviously affected by economic factors, and the downward trend of regional economic conditions has dramatically weakened the purchasing power of consumers [62]. On the one hand, consumers’ desire to buy a house depends on their own housing needs and future purchase plans. On the other hand, it also depends on consumers’ actual disposable funds and consumption capacity. In this study, ‘regional GDP growth rate’ is selected to characterize the regional economic situation, and ‘per capita disposable income’ and ‘unemployment rate’ are selected to characterize the consumption ability of buyers.
3. Market supply and demand:
Second-hand housing transactions are often affected by the stock of new housing and the supply-and-demand relationship of second-hand housing. When the supply of new homes significantly affects the housing stock, the price of new homes will affect the price of existing homes [63]. At the same time, the supply and demand of second-hand housing can be imbalanced: a surge in listings over a period of time can stall second-hand housing transactions. In this study, ‘new house sales cycle’ is selected to characterize the stock of new houses, and ‘second-hand house listing volume’ and ‘second-hand house trading volume’ are selected to characterize the supply-and-demand relationship of second-hand houses.

2.1.2. Factors Related to Organism (O)

The organism characterizes the internal state of the second-hand housing market, that is, the attributes and additional characteristics of second-hand housing. According to the hedonic pricing theory, the factors associated with the organism can be divided into three distinct classes: housing characteristics, location attributes, and supporting facilities.
1. Housing characteristics:
When consumers purchase goods, they are satisfied by consuming and enjoying the characteristic utility of the goods, so the characteristic attribute of the goods determines the price. In the second-hand housing market, housing characteristics affect the second-hand housing transaction. It is noted that relevant research on housing prices usually uses factors such as ‘building area’, ‘house age’, ‘the number of living rooms’, ‘the number of bedrooms’, ‘the number of toilets’, ‘structural form’, ‘the total number of floors’, ‘floor number’, ‘orientation’, and ‘decoration degree’ to establish a prediction system. This study uses the above factors to characterize housing characteristics.
2. Location attributes and supporting facilities:
As a key element, the surrounding environment of the house is closely related to human behavior patterns and experiences [64]. Second-hand houses in superior locations are usually found in areas with convenient transportation and complete supporting facilities. They can provide buyers with convenient commuting conditions, high-quality educational resources, rich commercial facilities, and good medical services. These factors have an important impact on the living experience and the value of second-hand houses. Some scholars have introduced service accessibility, traffic accessibility, accessibility of high-quality educational resources, accessibility of shopping centers, greening rate, and other factors to establish an index system when studying second-hand housing transactions [8,9,10]. Therefore, this paper chooses ‘distance from the city center’, ‘distance from the subway station’, ‘the number of educational resources’, ‘greening rate’, and ‘floor area ratio’ to characterize the location attributes and supporting facilities of second-hand houses.

2.1.3. Factors Related to Response (R)

The response characterizes the overall reaction of the second-hand housing market to stimuli and organisms. Factors related to the response (R) can be divided into two groups: housing price and market performance.
1. Housing price:
The price of second-hand housing is divided into the listing price and transaction price. A change in the listed price significantly affects the trading results [65]. A higher listing price may reduce the purchase intention of buyers, and a reasonable listing price may attract more potential buyers. The transaction price also affects the seller’s pricing strategy and the buyer’s purchase intention. A decline in the transaction price may encourage more buyers to enter the market, and an increase in the transaction price indicates that the market competition is fierce, and sellers are more confident in holding the property in anticipation of higher returns. The rise and fall of housing prices directly affect the activity of housing transactions and market expectations. The sharp rise in house prices may increase buyers’ sense of urgency, and the expectation of further appreciation of the value of real estate prompts them to speed up their purchase decisions. When house prices begin to fall, buyers may wait and see, expecting prices to fall further [66]. Therefore, this study uses ‘listed price’, ‘transaction price’, and ‘the rise and fall of house prices’ to characterize housing prices.
2. Market performance:
The market performance of second-hand housing is closely related to the average listing time, the number of viewings, and the rate of change in trading volume. There is a correlation between the average listing time and the price of real estate transactions [67]. A shorter listing time may indicate that the property is of high quality or priced reasonably enough to satisfy buyers, whereas a longer listing time may mean that the property is less attractive or overpriced [68]. The number of viewings reflects the degree of interest of potential buyers in the property. Frequent viewings may increase a house’s exposure and thus the chance of a transaction, while a low viewing frequency may mean that the property’s attractiveness in the market is limited [69]. The volume change rate refers to the percentage increase or decrease in real estate trading volume over a certain period. An increase in trading volume may indicate that the market is active and demand is strong, while a decrease may indicate that the market is depressed and demand is weak. Therefore, this study selects ‘average listing time’, ‘number of times to watch’, and ‘volume change rate’ to characterize the market performance.
Based on the above analysis, we have designed a forecasting index system for the sales cycle of second-hand housing, including three primary and thirty-three secondary indexes. O6, O8, O9, and O10 are qualitative, while the others are quantitative. The specific details are shown in Table 1. Note that the exact floor number represented by the influencing factor O8 is private data. To protect customer privacy, the real estate platform only states whether the house is located on a low, middle, or high floor, so O8 is classified as a qualitative indicator.

2.2. Relevant Data Acquisition Methods

In this study, the secondary index data of ‘stimulus’ (S) and ‘response’ (R) mainly come from the official government website and the authoritative statistical yearbook, which are highly reliable. The secondary index data of ‘organism’ (O) is automatically collected from the Internet through Web crawler technology, which has a certain timeliness. Specifically, on the real estate information platform Lianjia (https://nc.lianjia.com/, accessed on 22 November 2024), we collected data from 400 second-hand housing transactions for Shanghai, Shenzhen, Guangzhou, Chengdu, Qingdao, Nanchang, Luoyang, and Guilin, eight representative cities. These data cover all aspects related to the house’s property, including but not limited to key information such as the area, price, construction age, and house decoration style. In addition, the information about the location attributes and supporting facilities of the house is obtained from the Gaode map (https://ditu.amap.com/, accessed on 7 December 2024), which covers the geographical location, traffic conditions, educational resources, medical facilities, and other aspects of the house.
The eight cities we selected have certain representativeness and are in line with China’s basic national conditions. Economically, these eight cities belong to the first-tier cities, second-tier cities, and third-tier cities. Geographically, Qingdao, Nanchang, and Shanghai are located in East China, Shenzhen, Guangzhou, and Guilin are located in South China, Luoyang is located in Central China, and Chengdu is located in Southwestern China. Politically, Shanghai, Guangzhou, Nanchang, and Chengdu are municipalities or provincial capitals, and the rest are prefecture-level cities. Figure 1 shows the eight cities we chose.

2.3. The Predictive Model Proposed in This Research

2.3.1. Hybrid Kernel Extreme Learning Machine

An extreme learning machine (ELM) has the advantages of a simple structure, easy implementation, and fast calculation speed. It generates the parameters between the input layer and the hidden layer randomly and then uses the Moore–Penrose generalized inverse to calculate the optimal parameters between the hidden layer and the output layer [70]. In some engineering applications, the performance of the ELM has been shown to be superior to that of the traditional BP neural network and the SVM model [36].
The ELM’s input parameters are randomly generated and fixed, so there is no need for an iterative solution. It only needs to calculate the parameters between the hidden and output layers, greatly improving the model’s calculation speed. Figure 2 shows the basic structure of the ELM.
Given a data set containing N samples, let each sample $x_i = (x_{i1}, x_{i2}, \ldots, x_{id})$ have d features, and let $y_i$ be its label. The single-layer feedforward neural network (SLFN) constructed by the ELM can be converted into Equation (1) to solve [36]:
$$\min_{\beta} \sum_{i=1}^{N} \left\| y_i - \sum_{j=1}^{L} \beta_j \, h(\omega_j \cdot x_i + b_j) \right\|^2, \tag{1}$$
where $L$ is the number of hidden layer neurons, $\beta_j$ is the weight between the jth hidden layer neuron and the output layer neuron, $\omega_j$ is the weight between the input layer neurons and the jth hidden layer neuron, $b_j$ is the bias term, and $h$ is the activation function.
For the convenience of calculation, Equation (1) can be expressed in the form of a matrix [36]:
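$$\min_{\beta} \left\| H\beta - Y \right\|^2, \tag{2}$$
where
$$H = \begin{bmatrix} h(\omega_1 \cdot x_1 + b_1) & \cdots & h(\omega_L \cdot x_1 + b_L) \\ \vdots & \ddots & \vdots \\ h(\omega_1 \cdot x_N + b_1) & \cdots & h(\omega_L \cdot x_N + b_L) \end{bmatrix}, \tag{3}$$
$$Y = \begin{bmatrix} y_1 & y_2 & \cdots & y_N \end{bmatrix}^{T}, \quad \beta = \begin{bmatrix} \beta_1 & \beta_2 & \cdots & \beta_L \end{bmatrix}^{T}. \tag{4}$$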
By using the Karush–Kuhn–Tucker (KKT) condition and singular value decomposition (SVD) to solve the optimization problem of Equation (2), we can obtain [36]:
$$\beta = H^{\dagger} Y, \quad H^{\dagger} = (H^{T}H)^{-1}H^{T}, \tag{5}$$
where $H^{\dagger}$ is the Moore–Penrose generalized inverse matrix of $H$.
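The ELM training step described above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the `tanh` activation, the 20 hidden neurons, the function names `elm_fit` and `elm_predict`, and the synthetic data are all hypothetical choices made for the example. The only part taken from the text is the structure: random, fixed input weights and biases, followed by output weights computed with the Moore–Penrose pseudoinverse.

```python
import numpy as np

def elm_fit(X, y, n_hidden=20, seed=0):
    """Fit a basic ELM: random fixed input weights, least-squares output weights."""
    rng = np.random.default_rng(seed)
    omega = rng.normal(size=(X.shape[1], n_hidden))  # input-to-hidden weights, never trained
    b = rng.normal(size=n_hidden)                    # hidden biases, never trained
    H = np.tanh(X @ omega + b)                       # N x L hidden-layer output matrix
    beta = np.linalg.pinv(H) @ y                     # output weights via the pseudoinverse of H
    return omega, b, beta

def elm_predict(X, omega, b, beta):
    """Apply the fixed hidden layer, then the learned output weights."""
    return np.tanh(X @ omega + b) @ beta

# Synthetic toy data (assumption: NOT the housing data set used in the paper).
rng = np.random.default_rng(1)
X = rng.uniform(-1.0, 1.0, size=(200, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] * X[:, 2]

omega, b, beta = elm_fit(X, y)
train_mse = float(np.mean((elm_predict(X, omega, b, beta) - y) ** 2))
print(f"training MSE: {train_mse:.6f}")
```

Because the hidden layer is fixed, training reduces to a single linear least-squares solve, which is where the speed advantage described in the text comes from.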
In order to make the generalization ability of the ELM stronger, Huang et al. added the regularization coefficient $C$ into Equation (2) and added $\|\beta\|^2$ as the penalty term to obtain the following problem [71]:
$$\min_{\beta} \left\| H\beta - Y \right\|^2 + \frac{1}{C}\left\| \beta \right\|^2. \tag{6}$$
Huang et al. proposed the kernel extreme learning machine (KELM) model by using the kernel function instead of the activation function of the hidden layer [37]. It can effectively avoid the curse of dimensionality encountered by the traditional ELM and improve the generalization ability [37]:
$$\Omega_{KELM} = HH^{T}, \quad \Omega_{i,j} = h(x_i) \cdot h(x_j) = \kappa(x_i, x_j), \tag{7}$$
where $\Omega_{KELM}$ is the $N \times N$ kernel matrix, and $\kappa(x_i, x_j)$ is the kernel function.
At this point, the solution to the regularized problem in Equation (6) is [71]:
$$\beta = \left( H^{T}H + \frac{I}{C} \right)^{-1} H^{T}Y. \tag{8}$$
The regularization coefficient and the kernel parameter determine the performance of the KELM. The regularization coefficient improves the model’s generalization ability, and the kernel parameter determines the effect of sample mapping on the high-dimensional feature space. Common kernel functions include the Radial Basis Function (RBF), polynomial kernel function, and linear kernel function. The RBF has good local learning ability and poor prediction for samples beyond a certain range. On the contrary, the polynomial kernel function, as a global kernel function, can predict samples in a global range. Therefore, this study combines the RBF and polynomial kernel functions to construct a hybrid kernel extreme learning machine (HKELM) model.
The expression of the hybrid kernel function is:
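$$\kappa(x_i, x_j) = \theta\cdot\exp\left(-\frac{\lVert x_i - x_j\rVert^{2}}{2\sigma^{2}}\right) + (1-\theta)\left(x_i\cdot x_j + a\right)^{b}, \tag{9}$$
where $\theta$ is the weight, $\sigma$ is the parameter of the RBF, and $a, b$ are the parameters of the polynomial kernel function.
Based on Equations (1)–(9), the predicted value $\hat{y}$ of sample label $y$ is:
$$\hat{y} = h(x)\beta = h(x)\left(H^{T}H + \frac{I}{C}\right)^{-1}H^{T}Y = \begin{bmatrix}\kappa(x,x_1) & \kappa(x,x_2) & \cdots & \kappa(x,x_N)\end{bmatrix}\left(\Omega_{KELM} + \frac{I}{C}\right)^{-1}Y. \tag{10}$$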
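To make the kernel form of the prediction concrete, the following is a minimal NumPy sketch of the hybrid kernel in Equation (9) and the prediction in Equation (10). The function names (`hybrid_kernel`, `hkelm_fit`, `hkelm_predict`) are illustrative choices rather than the authors' code, and the sketch assumes that $b$ is an integer so that the polynomial term is always defined.

```python
import numpy as np

def hybrid_kernel(A, B, theta, sigma, a, b):
    """Hybrid RBF + polynomial kernel between the rows of A (n x d) and B (m x d)."""
    inner = A @ B.T
    sq_dist = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * inner
    sq_dist = np.maximum(sq_dist, 0.0)  # guard against tiny negative round-off
    rbf = np.exp(-sq_dist / (2.0 * sigma ** 2))
    poly = (inner + a) ** b  # assumes integer b
    return theta * rbf + (1.0 - theta) * poly

def hkelm_fit(X, Y, C, theta, sigma, a, b):
    """Solve (Omega + I/C) coef = Y for the output coefficients."""
    omega = hybrid_kernel(X, X, theta, sigma, a, b)
    return np.linalg.solve(omega + np.eye(len(X)) / C, Y)

def hkelm_predict(X_new, X_train, coef, theta, sigma, a, b):
    """Kernel row vector [kappa(x, x_1), ..., kappa(x, x_N)] times the coefficients."""
    return hybrid_kernel(X_new, X_train, theta, sigma, a, b) @ coef
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse that appears in Equation (10); this is a numerical convenience and does not change the result.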

2.3.2. Crested Porcupine Optimizer

The CPO is a new heuristic algorithm inspired by the defense mechanism of the crested porcupine. When faced with threats, crested porcupines use four strategies: visual defense, auditory defense, olfactory defense, and physical attack to resist intruders. The basic process of the CPO is as follows [46]:
Step 1: population initialization:
CPO is a population-based heuristic algorithm. Each crested porcupine (CP) in the population is continuously updated in the search space as a candidate solution. Assuming that there are $N$ individuals in the CP population and that the search space is $d$-dimensional, the position matrix can be expressed using Equation (11) [46]:
$$X = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_N \end{bmatrix} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,j} & \cdots & x_{1,d} \\ \vdots & & \vdots & & \vdots \\ x_{i,1} & \cdots & x_{i,j} & \cdots & x_{i,d} \\ \vdots & & \vdots & & \vdots \\ x_{N,1} & \cdots & x_{N,j} & \cdots & x_{N,d} \end{bmatrix}. \tag{11}$$
Each individual $x_i$ can be initialized as Equation (12) [46]:
$$x_i = lb + rand \times (ub - lb), \quad i = 1, 2, \ldots, N, \tag{12}$$
where $lb$ and $ub$ are the lower and upper bounds of the search space of the solution, respectively, and $rand$ is a random number between [0, 1].
Step 2: cyclic population reduction (CPR):
The individuals in the crested porcupine population trigger the defense mechanism after being threatened, while the unthreatened individuals live normally. To simulate this process, the CPO introduces the cyclic population reduction technique to reduce the population size [46]:
$$N(t) = N_{min} + (N - N_{min}) \times \left(1 - \frac{t \,\%\, \frac{T_{max}}{T}}{\frac{T_{max}}{T}}\right), \tag{13}$$
where $t$ is the current generation, $N(t)$ is the population size of the $t$-th generation, $N_{min}$ is the minimum population size, $N$ is the initial population size, $T_{max}$ is the maximum number of iterations, $T$ is used to determine the number of cycles, and $\%$ is the remainder operation. The change in population size when $T = 1, 5, 10$ with $N_{min} = 30$ and $N = 50$ is shown in Figure 3.
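As an illustration of Equation (13), the short sketch below evaluates the population size for a given generation. The function name `population_size` is hypothetical, and the value is returned as a float; an implementation would presumably round it to an integer number of individuals.

```python
def population_size(t, n_init, n_min, t_max, cycles):
    """Population size N(t) under cyclic population reduction (returns a float)."""
    cycle_len = t_max / cycles  # length of one cycle, T_max / T
    return n_min + (n_init - n_min) * (1.0 - (t % cycle_len) / cycle_len)
```

With $N = 50$, $N_{min} = 30$, and $T_{max} = 1000$, the size falls linearly from 50 toward 30 within each cycle and jumps back to 50 when a new cycle starts.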
Step 3: exploration phase:
1. Visual defense:
When the crested porcupine encounters a predator, it erects and waves its sharp spines to warn the predator. This phenomenon can be expressed using Equation (14) [46]:
$$x_i(t+1) = x_i(t) + \tau_1 \times \left|2 \times \tau_2 \times x_{CP}(t) - y_i(t)\right|, \tag{14}$$
where $\tau_1$ is a random number obeying the normal distribution, $\tau_2$ is a random number between $[0, 1]$, $x_i(t)$ is the position of the $i$-th individual in the $t$-th generation, $x_{CP}(t)$ is the optimal position of the population in the $t$-th generation, and $y_i(t)$ is the position of the predator in the $t$-th generation. It can be defined according to Equation (15) [46]:
$$y_i(t) = \frac{x_i(t) + x_r(t)}{2}, \tag{15}$$
where $r$ is a random integer in $[1, N]$.
2. Auditory defense:
The crested porcupine produces noise to warn the predator. This process can be realized by Equation (16) [46]:
$$x_i(t+1) = (1 - U_1) \times x_i(t) + U_1 \times \left(y_i(t) + \tau_3 \times (x_{r_1}(t) - x_{r_2}(t))\right), \tag{16}$$
where $\tau_3$ is a random number between $[0, 1]$, $r_1$ and $r_2$ are two random integers between $[1, N]$, and $U_1$ represents a randomly generated binary vector.
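The two exploration updates in Equations (14)–(16) can be sketched as follows. This is an illustrative reading rather than the reference implementation: the function names are hypothetical, indices are 0-based (so random integers are drawn from $[0, N-1]$), and a NumPy `Generator` is assumed as the random source.

```python
import numpy as np

def predator_position(X, i, rng):
    """Midpoint between individual i and a randomly chosen individual."""
    r = rng.integers(len(X))
    return (X[i] + X[r]) / 2.0

def visual_defense(X, i, x_best, rng):
    """First defense strategy: move by a normally scaled absolute offset."""
    tau1 = rng.normal()   # normally distributed random number
    tau2 = rng.random()   # uniform random number in [0, 1)
    y = predator_position(X, i, rng)
    return X[i] + tau1 * np.abs(2.0 * tau2 * x_best - y)

def auditory_defense(X, i, rng):
    """Second defense strategy: a binary mask selects which dimensions are updated."""
    y = predator_position(X, i, rng)
    u1 = rng.integers(0, 2, size=X.shape[1])  # random binary vector U_1
    tau3 = rng.random()
    r1, r2 = rng.integers(len(X), size=2)
    return (1 - u1) * X[i] + u1 * (y + tau3 * (X[r1] - X[r2]))
```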
Step 4: exploitation phase:
3. Olfactory defense:
The crested porcupine emits a special smell, preventing the predator from approaching. This process can be realized by Equation (17) [46]:
$$x_i(t+1) = (1 - R_1) \times x_i(t) + R_1 \times \left(x_{r_1}(t) + S_i(t) \times (x_{r_2}(t) - x_{r_3}(t)) - \tau_4 \times \delta \times \gamma(t) \times S_i(t)\right), \tag{17}$$
where $\tau_4$ is a random number between $[0, 1]$, $r_3$ is a random integer between $[1, N]$, $R_1$ is a randomly generated binary vector, $S_i(t)$ is an odor diffusion factor defined by Equation (18), $\delta$ is defined by Equation (19) and is used to control the search direction, and $\gamma(t)$ is a defense factor defined by Equation (20).
$$S_i(t) = \exp\left(\frac{f(x_i(t))}{\sum_{k=1}^{N} f(x_k(t)) + \varepsilon}\right), \tag{18}$$
$$\delta = \begin{cases} +1, & \text{if } \overrightarrow{rand} \le 0.5 \\ -1, & \text{else} \end{cases}, \tag{19}$$
$$\gamma(t) = 2 \times rand \times \left(1 - \frac{t}{T_{max}}\right)^{\frac{t}{T_{max}}}, \tag{20}$$
Here, $f(\cdot)$ represents the value of the objective function, and $\varepsilon > 0$ is a very small value. $\overrightarrow{rand}$ is a random vector whose values are random numbers between $[0, 1]$, and $rand$ is a random number between $[0, 1]$.
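The three auxiliary factors in Equations (18)–(20) are simple to compute. The sketch below is a hedged illustration with hypothetical function names; `eps` stands for $\varepsilon$, and the direction vector is assumed to be drawn independently for each dimension.

```python
import numpy as np

def odor_diffusion(fitness, i, eps=1e-12):
    """Odor diffusion factor S_i: exponential of the fitness share of individual i."""
    return np.exp(fitness[i] / (np.sum(fitness) + eps))

def search_direction(d, rng):
    """Direction vector delta with entries +1 or -1 (one draw per dimension, an assumption)."""
    return np.where(rng.random(d) <= 0.5, 1.0, -1.0)

def defense_factor(t, t_max, rng):
    """Defense factor gamma(t); it reaches zero in the final generation."""
    return 2.0 * rng.random() * (1.0 - t / t_max) ** (t / t_max)
```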
4. Physical attack:
When the predator attacks the crested porcupine, the crested porcupine attacks the predator in return. This interaction can be modeled by the physical process of inelastic collision, which can be realized by Equation (21) [46]:
$$x_i(t+1) = x_{CP}(t) + (\alpha(1 - \tau_5) + \tau_5) \times (\delta \times x_{CP}(t) - x_i(t)) - \tau_6 \times \delta \times \gamma(t) \times F_i(t), \tag{21}$$
where $\tau_5$ and $\tau_6$ are random numbers between $[0, 1]$, $\alpha$ is the convergence factor, and $F_i(t)$ is the average force acting on the $i$-th predator, which is defined by Equations (22)–(24):
$$F_i(t) = \tau_7 \times m_i \times (v_i(t+1) - v_i(t)), \tag{22}$$
$$m_i = \exp\left(\frac{f(x_i(t))}{\sum_{k=1}^{N} f(x_k(t)) + \varepsilon}\right), \tag{23}$$
$$v_i(t) = x_i(t), \quad v_i(t+1) = x_r(t), \tag{24}$$
Here, $m_i$ is the mass of the $i$-th individual in the $t$-th generation, $v_i(t+1)$ is the final velocity, $v_i(t)$ is the initial velocity of the $i$-th individual in the $t$-th generation, and $\tau_7$ is a vector whose values are random numbers between $[0, 1]$.
Step 5: balance between the exploration phase and the exploitation phase:
In order to balance the exploration phase and exploitation phase of the CPO, random numbers $\varphi_1$–$\varphi_5$ from $[0, 1]$ and a constant $T_f$ from $[0, 1]$ are introduced. The position update formula is summarized as follows [46]:
$$x_i(t+1) = \begin{cases} \begin{cases} \text{Update by Equation (14)}, & \varphi_3 < \varphi_4 \\ \text{Update by Equation (16)}, & \text{else} \end{cases}, & \varphi_1 < \varphi_2 \\ \begin{cases} \text{Update by Equation (17)}, & \varphi_5 < T_f \\ \text{Update by Equation (21)}, & \text{else} \end{cases}, & \varphi_1 \ge \varphi_2 \end{cases}. \tag{25}$$
Step 6: global optimal solution update:
The global optimal solution can be updated as follows [46]:
$$x_{CP}(t+1) = \begin{cases} x_i(t+1), & \text{if } f(x_i(t+1)) < f(x_{CP}(t)) \\ x_{CP}(t), & \text{else} \end{cases}. \tag{26}$$
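The branching logic of Equations (25) and (26) can be summarized in a few lines. In this sketch the strategy labels and function names are illustrative only; `phi` is assumed to hold the five random numbers $\varphi_1$–$\varphi_5$, and the objective is minimized.

```python
def select_strategy(phi, t_f):
    """Pick one of the four defense strategies from five random numbers in [0, 1]."""
    p1, p2, p3, p4, p5 = phi
    if p1 < p2:  # exploration phase
        return "visual" if p3 < p4 else "auditory"
    # exploitation phase
    return "olfactory" if p5 < t_f else "physical"

def update_best(x_best, f_best, x_new, f_new):
    """Keep the new position only if it strictly improves the best fitness."""
    if f_new < f_best:
        return x_new, f_new
    return x_best, f_best
```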

2.3.3. CMTCPO: An Improved Crested Porcupine Optimizer

The traditional CPO suffers from slow convergence, a tendency to fall into local optima, and insufficient optimization performance. To address these problems, this research proposes an improved crested porcupine optimizer, CMTCPO. The specific improvement process is as follows.
1. Use chaos mapping to initialize populations:
In many meta-heuristic algorithms, the initial population is generated randomly from a probability distribution. Although this method is simple, it may lead to insufficient population diversity, affecting the algorithm’s search efficiency and global optimization ability. To solve this problem, some scholars have explored the introduction of chaotic mapping to improve meta-heuristic algorithms. Chaos refers to the extremely complex and unpredictable dynamic behavior of deterministic dynamic systems caused by their sensitivity to initial conditions. It has the characteristics of unpredictability, aperiodicity, and ergodicity [72]. In recent years, chaotic mapping has been widely adopted in optimization, where it has been shown to maintain population diversity and improve an algorithm’s global search ability [73].
Table 2 shows 10 common chaotic mapping functions: $C_1$–$C_3$ are based on polynomial functions, $C_4$–$C_6$ are based on trigonometric functions, $C_7$ is based on a remainder (modulo) function, $C_8$ and $C_9$ are piecewise functions, and $C_{10}$ is composed of various chaotic mappings.
Figure 4 shows the distribution of points generated by the different chaotic maps. Except for $C_1$ and $C_5$, the generated points fall within $[0, 1]$, which ensures diversity and comprehensiveness in the search process. The points generated by $C_7$ are evenly distributed and have good mathematical characteristics, but the points generated by some chaotic maps are unevenly distributed, resulting in performance degradation. Therefore, selecting the appropriate chaotic map is very important in practical applications. This study uses each of the above 10 chaotic maps to improve the CPO and comprehensively compares the resulting improvements. The best-performing chaotic map is the Cubic chaotic map. The specific process is discussed in Section 4.
The initialization of the CPO also depends on random generation. Therefore, this study introduces chaotic mapping to improve the CPO, increase the diversity of the population, and improve the search ability of the algorithm. The position $x_i$ of each CP individual can be initialized using Equation (27):
$$x_i = lb + C_l \times (ub - lb), \quad i = 1, 2, \ldots, N, \quad l = 1, 2, \ldots, 10, \tag{27}$$
where $lb$ and $ub$ are the lower and upper bounds of the search space of the solution, respectively, and $C_l$ is the $l$-th chaotic map in Table 2.
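A chaotic initialization in the spirit of Equation (27) might look as follows. The sketch assumes one common form of the Cubic map, $x_{k+1} = \rho x_k (1 - x_k^2)$ with $\rho = 2.595$; the exact definition used in this study is the one listed in Table 2, which may differ. The function names and the seed value `x0` are hypothetical.

```python
import numpy as np

def cubic_map_sequence(length, x0=0.3, rho=2.595):
    """Chaotic sequence from an assumed Cubic map x <- rho * x * (1 - x^2)."""
    seq = np.empty(length)
    x = x0
    for k in range(length):
        x = rho * x * (1.0 - x * x)
        seq[k] = x
    return seq

def chaotic_init(n, d, lb, ub, x0=0.3):
    """Initial population: chaotic values in (0, 1) scaled to [lb, ub]."""
    chaos = cubic_map_sequence(n * d, x0).reshape(n, d)
    return lb + chaos * (ub - lb)
```

For this assumed map and a seed in $(0, 1)$, the iterates stay inside $(0, 1)$, so every initial position lies within the search bounds.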
2. Remove the cyclic population reduction technique:
The CPO’s cyclic population reduction technique leads to a smaller search space in certain periods. In this study, the technique is removed, and the population size is kept unchanged to prevent the loss of potential optimal solutions.
3. Hybrid mutation strategy:
As the number of iterations increases, the traditional CPO easily falls into a local optimum in the later stage, and convergence is slow. This study proposes a hybrid mutation strategy to address this problem.
A convergence factor $A$ is defined to judge whether the population is in the early or late stage of iteration. Let $m, n$ be positive constants and $T_{max}$ be the maximum number of iterations; then, $A$ changes according to Equation (28):
$$A(t) = 2\exp\left\{-\left(\frac{m t}{T_{max}}\right)^{n}\right\}. \tag{28}$$
Figure 5 shows the changes of $A(t)$ under different parameter conditions. When $m = 1.5$, $n = 8$, $A(t)$ has very good properties: it keeps a large value and decreases slowly in the first 500 generations and then decreases rapidly to a very small value in the latter 500 generations. In this case, if $A(t) < 0.5$, the algorithm is in the late stage of iteration; otherwise, it is in the early stage. Therefore, this study sets the parameters of the convergence factor to $m = 1.5$, $n = 8$.
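The behavior of $A(t)$ described above can be checked numerically. The following sketch assumes the decaying form $A(t) = 2\exp\{-(mt/T_{max})^{n}\}$; the function names are illustrative.

```python
import math

def convergence_factor(t, t_max, m=1.5, n=8):
    """Convergence factor A(t): close to 2 early on, close to 0 late in the run."""
    return 2.0 * math.exp(-((m * t / t_max) ** n))

def is_late_stage(t, t_max, threshold=0.5):
    """True once A(t) has dropped below the threshold."""
    return convergence_factor(t, t_max) < threshold
```

With $T_{max} = 1000$, the factor is still about 1.8 at generation 500 and has dropped below 0.03 by generation 800, so the late-stage strategies are only triggered in the final part of the run.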
Step 1: differential evolution strategy:
The basic process of differential evolution is similar to that of the Genetic Algorithm (GA), including mutation, crossover, and selection. This strategy effectively solves the problem of insufficient vitality in the later stage of meta-heuristic algorithm iteration. Our study introduces this strategy into the first and third defense stages to improve the CPO’s convergence speed.
When $A(t) < 0.5$, let $X$ be the original population, $V$ be the mutated population, and $U$ be the population after the crossover operation; then, the following process occurs:
  • Mutation:
Let $F_1$ and $F_2$ denote the scaling factors. $F_1$ is a random number between $[0, 1]$, and $F_2$ is a random number that obeys the standard normal distribution $N(0, 1)$. In the $t$-th generation, if a CP individual’s position is $x_i(t)$, four individuals, $x_{r_1}(t)$, $x_{r_2}(t)$, $x_{r_3}(t)$, and $x_{r_4}(t)$, are randomly selected from the original population. The mutated individual $v_i(t)$ can be represented by Equation (29) [74]:
$$v_i(t) = F_1 \cdot (x_{CP}(t) - x_i(t)) + F_2 \cdot (x_{r_1}(t) - x_{r_2}(t) + x_{r_3}(t) - x_{r_4}(t)). \tag{29}$$
  • Crossover:
The crossover coefficient CR determines whether the individuals in the population perform crossover operation, which can be defined as:
$$CR = 0.3 + 0.2\sin\left(\frac{\pi}{2}\, t \cdot T_{max}\right). \tag{30}$$
Let $cr$ be a random number between $[0, 1]$. When $cr$ is less than or equal to the crossover coefficient $CR$, the crossover operation is performed. The individual $u_i(t)$ in the population $U$ can be expressed as [74]:
$$u_{ij}(t) = \begin{cases} v_{ij}(t), & \text{if } cr \le CR \\ x_{ij}(t), & \text{otherwise} \end{cases}. \tag{31}$$
  • Selection:
Individual $x_i(t)$ in the original population and individual $u_i(t)$ in the crossover population are compared according to the greedy algorithm, and the individual with better fitness is chosen as the next-generation individual [74]:
$$x_i(t) = \begin{cases} u_i(t), & \text{if } f(u_i(t)) \le f(x_i(t)) \\ x_i(t), & \text{otherwise} \end{cases}. \tag{32}$$
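The mutation, crossover, and selection steps above can be combined into one pass over the population, as in the following sketch. It is an illustration under stated assumptions: the crossover coefficient is passed in as a plain number (`cr_threshold`) rather than computed from Equation (30), the random number $cr$ is drawn once per dimension, the four random individuals are distinct, and the objective is minimized. The function name `de_step` is hypothetical.

```python
import numpy as np

def de_step(X, fitness_fn, x_best, cr_threshold, rng):
    """One differential-evolution pass over population X (n x d, n >= 4)."""
    n, d = X.shape
    X_next = X.copy()
    for i in range(n):
        f1 = rng.random()   # scaling factor F_1, uniform in [0, 1)
        f2 = rng.normal()   # scaling factor F_2, standard normal
        r1, r2, r3, r4 = rng.choice(n, size=4, replace=False)
        # Mutation, Equation (29)
        v = f1 * (x_best - X[i]) + f2 * (X[r1] - X[r2] + X[r3] - X[r4])
        # Crossover, Equation (31): take the mutant where cr <= CR
        cr = rng.random(d)
        u = np.where(cr <= cr_threshold, v, X[i])
        # Greedy selection, Equation (32)
        if fitness_fn(u) <= fitness_fn(X[i]):
            X_next[i] = u
    return X_next
```

Because of the greedy selection, no individual's fitness can get worse after a pass.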
Step 2: Cauchy mutation strategy:
Cauchy mutation is applied to improve the meta-heuristic algorithm. The basic idea is to use the Cauchy distribution to generate a disturbance around the current optimal solution in order to explore new positions in the search space. The Cauchy distribution has a relatively low peak at the origin but heavy tails on both sides, which allows the Cauchy mutation to explore new solutions over a wider range, helping the algorithm to escape from local optima and enhancing its global search ability. The probability density function is as follows [54]:
$$f(x; a, b) = \frac{1}{\pi}\,\frac{b}{(x - a)^{2} + b^{2}}, \tag{33}$$
where $a$ is the location parameter, and $b$ is the scale parameter. When $a = 0$, $b = 1$, the distribution represented by Equation (33) is called the standard Cauchy distribution.
Therefore, this study introduces the Cauchy mutation strategy as the fourth defense mechanism to increase the CPO’s global search ability in the later stage of iteration, help the algorithm move beyond the local optimal solution, and improve the diversity of solutions.
If $A(t) < 0.5$, in the $t$-th generation, the position $x_i(t)$ of the CP individual can be updated using Equation (34):
$$x_i(t+1) = x_i(t) + (x_{CP}(t) - x_i(t)) \cdot Cauchy(0, 1), \tag{34}$$
where $Cauchy(0, 1)$ represents a random number that obeys the standard Cauchy distribution.
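Equation (34) translates almost directly into code. The sketch below assumes that a single scalar Cauchy draw is shared by all dimensions of an individual; the function name is illustrative.

```python
import numpy as np

def cauchy_mutation(x, x_best, rng):
    """Move x along the line toward x_best by a standard-Cauchy-distributed step."""
    return x + (x_best - x) * rng.standard_cauchy()
```

The heavy tails of the Cauchy distribution mean that most steps are moderate, while occasional very large steps carry an individual far from its current position.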
4.
Tangent flight strategy:
Some studies have introduced Lévy flight operators to improve the performance of swarm intelligence algorithms [75,76,77]. Lévy flight is a random walk process that alternates between high-frequency, short-distance exploration and low-frequency, long-distance exploration, which can simultaneously avoid the local optimum and increase population diversity. Some scholars have recently proposed new algorithms and improvement strategies by imitating the Lévy flight process. For example, Layeb [78] proposed the tangent search algorithm. Wang et al. [54] noted that the tangent function tends to infinity at some points and is periodic and introduced the tangent flight operator to improve the HHO. Therefore, this study introduces a tangent flight operator when improving the CPO. The step length of the tangent flight can be controlled using Equation (35) [54]:
$$Tangent = \tan\left(\frac{\pi}{2}\delta\right), \tag{35}$$
where $\delta = randn(1, d)$ is a $d$-dimensional vector of random numbers.
The direction is expressed using Equation (36) [54]:
$$Direction = \mu\, sign\left[rand - \frac{1}{2}\right], \tag{36}$$
where $\mu$ and $rand$ are random numbers obeying the uniform distribution between [0, 1], and the value of $sign\left[rand - \frac{1}{2}\right]$ can only be $-1$, $0$, or $1$.
After performing all defense strategies in each round, we introduce tangent flight to update the positions again:
$$x_i(t+1) = x_{CP}(t) + (x_i(t) - x_{CP}(t)) \odot \mu\, sign\left[rand - \frac{1}{2}\right] \odot \tan\left(\frac{\pi}{2}\delta\right), \tag{37}$$
where $\odot$ represents element-wise multiplication.
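A minimal sketch of the tangent flight update in Equations (35)–(37) is given below. It assumes that $\delta$ is drawn from a standard normal distribution, as suggested by the `randn` notation, and that $\mu$ and the sign are scalars shared by all dimensions; the function name is hypothetical.

```python
import numpy as np

def tangent_flight(x, x_best, rng):
    """Tangent flight update of one individual around the best position."""
    d = x.shape[0]
    delta = rng.standard_normal(d)                           # delta = randn(1, d)
    step = np.tan(np.pi / 2.0 * delta)                       # step length, Equation (35)
    direction = rng.random() * np.sign(rng.random() - 0.5)   # direction, Equation (36)
    return x_best + (x - x_best) * direction * step          # update, Equation (37)
```

Because the tangent function grows without bound near odd multiples of $\pi/2$, the step length is occasionally very large, which plays the same role as the long jumps of a Lévy flight.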
Based on the above strategy, this study improved the CPO and named it the CMTCPO. The pseudo-code of the CMTCPO algorithm is shown in Figure S1.

2.4. The Proposed Prediction Model

The prediction model for the sales cycle of the second-hand housing proposed in this paper can be realized according to the following process:
Step 1: data crawling and preprocessing:
Based on the Scrapy framework, a Python 3.12.7 program crawls the latest second-hand housing transaction information of Shanghai, Shenzhen, Guangzhou, Chengdu, Qingdao, Nanchang, Luoyang, and Guilin in 2024 on Lianjia.com. The information related to their geographical location is obtained from the Gaode map.
The missing values in the crawled data were addressed. To ensure the authenticity of the data, all cases with missing values were removed. Additionally, since some indicators are qualitative, they need to be converted into numerical values to be incorporated into the model.
Machine learning models require high-quality data. Different indicators have varying dimensions and magnitudes, which can lead to decreased prediction accuracy and slower model convergence. To enhance data analysis, normalization is necessary. We will utilize Min–Max Scaling as the normalization method.
Let $x'_{ij}$ denote the normalized value of the data $x_{ij}$, $\max(x_j)$ denote the maximum value of the $j$-th index, and $\min(x_j)$ denote the minimum value of the $j$-th index. Then, the normalization method for a larger-is-better index is:
$$x'_{ij} = \frac{x_{ij} - \min(x_j)}{\max(x_j) - \min(x_j)}. \tag{38}$$
Similarly, the normalization method for a smaller-is-better index is:
$$x'_{ij} = \frac{\max(x_j) - x_{ij}}{\max(x_j) - \min(x_j)}. \tag{39}$$
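Both normalization formulas can be applied column by column, as in the sketch below. The function name and the `smaller_is_better` flag vector are illustrative; constant columns, which the formulas leave undefined, are mapped to zero here as an assumption.

```python
import numpy as np

def min_max_normalize(X, smaller_is_better=None):
    """Column-wise Min-Max scaling; flagged columns use the smaller-is-better form."""
    X = np.asarray(X, dtype=float)
    lo, hi = X.min(axis=0), X.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero for constant columns
    scaled = (X - lo) / span                # larger-is-better form, Equation (38)
    if smaller_is_better is not None:
        flip = np.asarray(smaller_is_better, dtype=bool)
        scaled[:, flip] = ((hi - X) / span)[:, flip]  # smaller-is-better form, Equation (39)
    return scaled
```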
Step 2: correlation analysis:
Before selecting the model used to quantify the sales cycle, it is essential to determine whether the model should be linear or nonlinear. The Pearson correlation coefficient is a useful measure for assessing the strength and direction of the linear relationship between two variables. Its formula is as follows [79]:
$$R = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^{2}\sum_{i=1}^{n}(y_i - \bar{y})^{2}}}, \tag{40}$$
where $x_i, y_i$ are the sample values of the two variables, and $\bar{x}, \bar{y}$ are the sample mean values.
In general, when $|R| > 0.8$, the correlation between variables $x, y$ is strong; when $0.7 < |R| < 0.8$, it is moderate; and when $|R| < 0.7$, the variables are regarded as uncorrelated.
In our study, we will calculate the correlation coefficient between each index and the sales cycle to evaluate the strength of the correlation. We will then decide whether to use a linear or nonlinear model based on these results. If more than 70% of the indicators demonstrate significant correlation, we will select the linear model for analyzing the sales cycle; otherwise, we will opt for the nonlinear model.
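The decision rule described above can be sketched as follows. The sketch assumes that an indicator counts as significantly correlated when $|R| \ge 0.7$, which is one reading of the thresholds given earlier; the function names and the two threshold parameters are illustrative.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

def choose_model_type(X, y, r_threshold=0.7, share_threshold=0.7):
    """Return 'linear' if more than 70% of the indicators reach |R| >= 0.7."""
    r = np.array([pearson_r(X[:, j], y) for j in range(X.shape[1])])
    share = np.mean(np.abs(r) >= r_threshold)
    return ("linear" if share > share_threshold else "nonlinear"), r
```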
Step 3: finding the optimal parameters of the HKELM based on the CMTCPO:
1.
Divide the data sets:
The partition ratio of the data set directly affects the training of the model. Common partition ratios include 8:2, 7:3, and 6:4, each with its own applicable scenarios. A larger test set helps to evaluate the model’s performance on unseen data more comprehensively.
2.
Set the model parameters:
The performance of the algorithm depends on the selection of basic parameters such as the population size $N$, the number of iterations $T_{max}$, the chaotic control parameter $\rho$, the convergence factor $\alpha$, and the constant $T_f$. For general optimization problems, the population size can be selected from $[30, 100]$, and the number of iterations is set to 100 to 200 times the problem dimension to ensure the coverage of the search space and to reduce the waste of computing resources. The convergence factor $\alpha$ and the constant $T_f$ are determined according to Reference [46], and $\rho$ is 2.595.
The parameters of the HKELM model include the regularization coefficient $C$, the weight $\theta$, the bandwidth $\sigma$ of the RBF kernel, and the parameters $a, b$ of the Poly kernel. These parameters are initialized according to Equation (12).
3.
Calculate the fitness:
After inputting the model parameters and the divided training set, the CMTCPO searches the multidimensional solution space and gradually approaches the global optimal solution through multiple iterations. The root mean square error (RMSE) can quantify the difference between the predicted value and the actual value of the model. The smaller the value is, the higher the prediction accuracy of the model is [80]. Therefore, we choose the RMSE as the fitness function to evaluate the fitness of each generation, aiming for improved prediction accuracy. The calculation method is outlined in Equation (41) [80]:
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(y_i - \hat{y}_i)^{2}}, \tag{41}$$
where $n$ is the number of samples, $y_i$ is the true value of the sample, and $\hat{y}_i$ is the predicted value of the sample.
4.
Update the optimal solution and value according to the fitness of each generation.
5.
Determine whether the termination condition is satisfied. If so, output the optimal parameters; otherwise, repeat Step 3.
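Two building blocks of Step 3, the data split and the RMSE fitness of Equation (41), can be sketched as follows. The function names, the random shuffling, and the fixed seed are assumptions made for illustration; the study does not state how the split was drawn.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between true and predicted values."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def split_train_test(X, y, train_share=0.7, seed=0):
    """Shuffle the samples and split them into training and test parts."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(round(train_share * len(X)))
    train, test = order[:cut], order[cut:]
    return X[train], y[train], X[test], y[test]
```

With 400 samples and a 7:3 ratio, this yields 280 training samples and 120 test samples.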
Step 4: establishing the HKELM model and predicting the sales cycle:
The optimal parameters found using the CMTCPO in Step 3 are substituted into the reconstructed HKELM model to predict the sales cycle of second-hand houses.
Step 5: analysis of the accuracy and reliability of the predicted results:
To prevent over-fitting of the model, we performed a ten-fold cross-validation. This involved randomly dividing the data into 10 groups, using 9 of them for training and the remaining group for testing. If the RMSE, MAPE, and maximum relative error of the prediction results are very similar across the 10 iterations, it indicates that the model’s predictions are accurate and that over-fitting is not present.
To assess the reliability of the predicted sales cycle, this study employs the Bland–Altman analysis method. If the difference between the predicted results and the actual values falls within the 95% confidence interval, the predictions are deemed more reliable.
If the model meets the accuracy and reliability requirements simultaneously, the prediction process is completed. Otherwise, Step 3 and Step 4 will be repeated, and the prediction results will be analyzed again until the conditions are met.
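The ten-fold partition used in Step 5 can be generated as in the sketch below. The generator name and the fixed seed are illustrative; each sample appears in exactly one test fold.

```python
import numpy as np

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, k)
    for i in range(k):
        test_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train_idx, test_idx
```

For each pair, the model would be trained on `train_idx` and the RMSE, MAPE, and maximum relative error computed on `test_idx`; similar values across the ten folds indicate that over-fitting is not present.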
The flow chart of this study is shown in Figure 6:
In summary, our study used the SOR model to identify 33 influencing factors, which are more comprehensive than those considered in previous studies. At the same time, the original crested porcupine optimizer is improved and used to optimize the hybrid kernel extreme learning machine. Based on this, a novel prediction model for the sales cycle of second-hand houses is proposed.

3. Case Analysis

3.1. Data Preprocessing and Correlation Analysis

Based on the crawler technology, this study selected 400 second-hand houses from cities including Shanghai, Shenzhen, Guangzhou, Chengdu, Qingdao, Nanchang, Luoyang, and Guilin. The aim was to verify the proposed prediction model’s accuracy, efficiency, and applicability. To maintain the integrity and authenticity of the data, any original samples with missing values were removed.
It should be noted that due to the privacy of second-hand housing transaction data, Lianjia.com has access restrictions on users from the same IP address to prevent malicious data acquisition. Therefore, it is difficult to conduct large-scale data acquisition. The 400 relevant data points crawled in this article laid the foundation for future research. We need to overcome the limitations of data acquisition and consider establishing cooperative relationships with real estate platforms to obtain richer data sets through this channel in the future. Some of the case data are shown in Table S1 in the Supplementary Materials.
Some indicators in the original data set are qualitative. Therefore, we need to digitize all the categorical data within the data set. The two indices, ‘floor number’ (low floor = 0, middle floor = 1, and high floor = 2) and ‘decoration degree’ (unrenovated = 0, simple = 1, and upgraded = 2), are ordinal data and can be assigned directly.
Considering that the ‘structure type’ and ‘orientation’ do not have a hierarchical order, they can be processed via one-hot encoding. In one-hot encoding, each category value is converted into a binary vector in which the position representing the category is 1 and all remaining positions are 0.
The process of encoding the ‘orientation’ is shown in Table 3.
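The two encoding schemes can be sketched as follows. The ordinal mappings follow the values given above, while the orientation category list in the usage note is a hypothetical example; the actual categories are those listed in Table 3.

```python
# Ordinal indicators: the order of the levels carries meaning.
FLOOR_LEVELS = {"low": 0, "middle": 1, "high": 2}
DECORATION_LEVELS = {"unrenovated": 0, "simple": 1, "upgraded": 2}

def one_hot(value, categories):
    """Binary vector with a single 1 at the position of `value` in `categories`."""
    if value not in categories:
        raise ValueError(f"unknown category: {value!r}")
    return [1 if value == c else 0 for c in categories]
```

For example, with the hypothetical category list `["east", "south", "west", "north"]`, the value `"south"` is encoded as `[0, 1, 0, 0]`.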
After this, we standardize all the numerical data by applying Equations (38) and (39). Subsequently, the standardized data are input into Equation (40) to determine the Pearson correlation coefficient between the input and output variables. In order to more intuitively show the correlation between the various indicators, we have drawn a heat map, as shown in Figure 7.
The color of the indicators with a strong correlation in the heat map is darker, and the color of the indicators with a weak correlation is lighter. Based on this, the most relevant indicators in the model establishment process can be screened out. The calculation results show that each influencing factor has a certain correlation with the sales cycle, and we selected the most relevant indicators, ignoring the unimportant indicators.
We also list the correlation coefficients between each input indicator and the predicted target in Table 4. The data presented in Table 4 show that the majority of correlation coefficients are below 0.7, suggesting that the secondary indicators do not have a significant correlation with the sales cycle. As a result, it is advisable to use a nonlinear modeling approach to predict the sales cycle for second-hand houses.

3.2. Prediction of the Sales Cycle of Second-Hand Houses

This section follows the process discussed in Section 2.4 to predict the sales cycle of second-hand houses. After data preprocessing, the data are divided into a training set and a test set at a ratio of 7:3 and input into the model. Then, the model parameters are initialized: the population size is set to $N = 30$, the number of iterations is set to $T_{max} = 1000$, and we take the convergence factor $\alpha = 0.2$, $T_f = 0.8$, and $\rho = 2.595$. The search range of the regularization coefficient $C$ is set to [0.01, 10], that of the RBF kernel bandwidth $\sigma$ is set to [10, 1000], that of the weight $\theta$ is set to [0.5, 1], that of the Poly kernel parameter $a$ is set to [0.01, 1], and the Poly kernel parameter $b$ is restricted to integers in [1, 5].
The processed data set and initialized parameters are used to initialize the CMTCPO-HKELM model. The CPU of the computer used in the case analysis is an Intel(R) Core(TM) i9-14900K @ 3.20 GHz, the memory is 128 GB, the operating system is Windows 10, the development environment is Spyder 5.5.1, and the programming language is Python. According to Equation (41), we calculate each generation’s fitness in the CMTCPO parameter optimization process and draw the fitness curve, as shown in Figure 8.
The CMTCPO converges near the 20th generation, and the optimal parameters of the HKELM are obtained. The optimal regularization coefficient is $C = 10$, the bandwidth is $\sigma = 992.672$, and the weight of the RBF kernel is $\theta = 0.5$; the parameters of the polynomial kernel are $a = 0.01$ and $b = 4$. The optimal RMSE is 0.00003190. The detailed process of the CMTCPO in parameter optimization is shown in Table 5.
According to Table 5, the change in the fitness value between the iterations before and after the 20th generation is 0.00000986, which is greater than 0.000001. However, the changes are less than 0.000001 for generations 21 and later. This shows that the CMTCPO has converged and found the optimal parameters at the 21st generation. We also measured the running time of the CMTCPO, repeating the run 100 times and taking the average, and obtained the optimal parameters of the CMTCPO after 78.62 optimization times on average.
After obtaining the optimal parameters, the test set is substituted into the HKELM model established using the optimal parameters. The prediction results of the model on the test set are shown in Table 6.
The optimal HKELM model predicts the 120 test samples accurately. The maximum relative error is 0.0001784, the MAPE is 0.00001235%, and the RMSE is 0.00002185.

3.3. Reliability Test of Prediction Results

Ten-fold cross-validation is a technique used to assess whether a model is experiencing over-fitting [81]. This method involves randomly dividing the data set into ten parts: nine parts are used as training sets, while one part serves as the test set. After conducting this process ten times, if the results from each iteration are similar, it can be concluded that the model is not showing over-fitting. Conversely, if the results vary significantly, it indicates that over-fitting may be present in the model. The outcomes of the ten-fold cross-validation are presented in Table 7.
In the 10 calculations, the RMSE, MAPE, and maximum relative error are very close, which indicates that the prediction results of this study are accurate and that there is no over-fitting phenomenon.
To verify the reliability of the predicted sales cycle for second-hand houses, this study employs Bland–Altman analysis. Introduced by Bland and Altman, this method evaluates the consistency between the results obtained from two different data calculation methods [82]. We utilize the CMTCPO-HKELM method to predict the sales cycle and compare these predictions with the actual values. If the differences between the predictions and the actual values fall within the 95% confidence interval, we can conclude that the calculation results are reliable. The findings from the Bland–Altman test are illustrated in Figure 9.
In Figure 9, the $y$-axis represents the difference between the actual value and the predicted value, while the $x$-axis represents the actual value, with $mean = 0.000004236$ and $SD = 0.00002144$. There are 116 groups of values falling within the range ($mean - 1.96\,SD$ to $mean + 1.96\,SD$), and only 4 groups of values are not within the range. The results of the Bland–Altman analysis indicate that the majority of predictions fall within the 95% confidence interval, demonstrating high reliability.
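The counts reported above follow from a simple computation on the differences between actual and predicted values, sketched below. The function name is illustrative, and the sample standard deviation (`ddof=1`) is an assumption, since the study does not state which estimator was used.

```python
import numpy as np

def bland_altman(actual, predicted):
    """Mean and SD of the differences, the mean +/- 1.96 SD range, and the count inside it."""
    diff = np.asarray(actual, dtype=float) - np.asarray(predicted, dtype=float)
    mean = float(diff.mean())
    sd = float(diff.std(ddof=1))  # sample standard deviation (assumption)
    lower, upper = mean - 1.96 * sd, mean + 1.96 * sd
    inside = int(np.sum((diff >= lower) & (diff <= upper)))
    return mean, sd, (lower, upper), inside
```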
Although the ten-fold cross-validation and Bland–Altman analysis show that the constructed prediction model for the sales cycle of second-hand houses has good reliability, the performance of the model on unseen data needs to be further investigated.
We collected information on 100 second-hand housing transactions from Lianjia.com as an unseen data set. These data are from Xi’an and Harbin, which are located in Northwestern China and Northeastern China, respectively. Cities in these regions were not considered in the above study. If the model also performs well on these data, this further shows that the model has good robustness.
The calculation time of the model on the unseen data set is 286.59 s, the RMSE is 0.003420, and the maximum relative error is 0.02142. We also perform Bland–Altman analysis on the data from the unseen data set, and the results are shown in Figure 10. There are 96 groups of values falling within the range ($mean - 1.96\,SD$ to $mean + 1.96\,SD$), and only 4 groups of values are not within the range ($mean = 0.007803$ and $SD = 0.003329$). The prediction results of the model on the unseen data set passed the Bland–Altman analysis.
It can be seen that the model basically meets the requirements and has good robustness.

3.4. Explanation of the Model

The interpretability of machine learning models is very important. Our study identified 33 influencing factors based on the SOR model. It is important to consider how each indicator affects the prediction of the sales cycle. Therefore, we introduce the partial dependence plot (PDP), which can isolate a single feature while keeping other features unchanged, and distinguish whether its influence on the prediction target is positive or negative. By drawing the PDPs, the machine learning model will be better explained, and the role of each feature in predicting the target will be demonstrated. The PDPs of some indicators we drew are shown in Figure 11. Figure S2 in the Supplementary Materials contains the full version.
It can be seen from Figure 11 that all the indicators fluctuate to some extent along the vertical axis, showing a relationship with the predicted target. For example, as the age of the house increases, the curve first rises and then falls. This can be explained by the fact that people tend to buy houses of moderate age: newer houses are usually in non-core areas, where the supporting facilities are relatively incomplete, whereas older houses were built earlier, may not have elevators, and have relatively old supporting facilities, so people are reluctant to buy them.
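For reference, a partial dependence curve of the kind shown in Figure 11 can be computed with a few lines of code. The sketch below is model-agnostic: `predict` is assumed to be any function that maps a feature matrix to predictions (for example, a trained HKELM), and the function name and arguments are illustrative.

```python
import numpy as np

def partial_dependence(predict, X, feature, grid):
    """Average prediction when one feature is fixed to each grid value in turn."""
    X = np.asarray(X, dtype=float)
    curve = []
    for value in grid:
        X_mod = X.copy()
        X_mod[:, feature] = value  # overwrite the feature, keep all others unchanged
        curve.append(float(np.mean(predict(X_mod))))
    return np.array(curve)
```

A rising curve indicates that larger values of the feature increase the predicted target on average, and a falling curve indicates the opposite.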

4. Discussion

4.1. Different Chaotic Maps Improve the Performance of the CPO

This study evaluated the performance of the improved CPO using 18 standard test functions, named $f_1$–$f_{18}$. The functions are categorized as follows: $f_1$–$f_5$ are unimodal functions, $f_6$–$f_{11}$ are multimodal functions, and $f_{12}$–$f_{18}$ are fixed-dimensional multimodal functions. Unimodal benchmark functions have a single optimal solution and are used to assess the exploitation (local search) capability of the algorithm. In contrast, multimodal benchmark functions contain multiple optimal solutions, which test the search performance of the algorithm in complex scenarios. Detailed information about each benchmark function can be found in Table S2.
In order to explore which chaotic map improves the CPO the most, we use ten chaotic maps to improve the CPO and test their performances on the 18 benchmark functions. Considering the limited space, we only show part of the calculation results here; the results of each algorithm running 100 times on $f_4$ are shown in Table 8. CCPO1–CCPO10 correspond to Logistic, Cubic, Singer, Sine, Sinusoidal, Chebyshev, Circle, Tent, Piecewise, and Tent–Logistic–Cosine, respectively. The complete results are shown in Supplementary Materials Table S3.
It can be seen from Table 8 that for the benchmark function $f_4$, the performance of each algorithm is ranked as follows: CCPO2 > CCPO4 > CCPO3 > CCPO7 > CCPO8 > CCPO10 > CPO > CCPO5 > CCPO9 > CCPO1 > CCPO6. CCPO2 has an average fitness of $6.17 \times 10^{-14}$ and a variance of $8.66 \times 10^{-14}$, both very close to the optimal value of 0. We also calculate the average ranking of each algorithm, as shown in Table 9.
Table S3 and Table 9 reveal that CCPO2 and CCPO3, which utilize the Cubic chaotic map and the Singer chaotic map to enhance the CPO, demonstrate a significant improvement in algorithm performance. In contrast, CCPO4, which employs the Sine chaotic map to improve the CPO, shows performance results that are comparable to the original CPO algorithm. While other chaotic maps do enhance the performance on several functions, their overall effectiveness is still inferior to that of the original CPO algorithm. This indicates that Cubic chaotic mapping has a good effect on improving the CPO algorithm.
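Chaotic population initialization with the Cubic map can be sketched as follows. This is a minimal illustration, assuming the Cubic map of Table 2 with $\rho = 2.595$ and a linear scaling of the chaotic values onto the search bounds; the function names, the seed value `x0`, and the scaling step are assumptions, not the authors' implementation.

```python
def cubic_map_sequence(x0, length, rho=2.595):
    """Iterate x_{n+1} = rho * x_n * (1 - x_n**2), starting from x0 in (0, 1)."""
    values = []
    x = x0
    for _ in range(length):
        x = rho * x * (1.0 - x * x)
        values.append(x)
    return values

def init_population(pop_size, dim, lower, upper, x0=0.3):
    """Build pop_size candidate solutions by scaling chaotic values to [lower, upper]."""
    seq = cubic_map_sequence(x0, pop_size * dim)
    population = []
    for i in range(pop_size):
        chunk = seq[i * dim:(i + 1) * dim]
        population.append([lower + c * (upper - lower) for c in chunk])
    return population

pop = init_population(pop_size=5, dim=3, lower=-10.0, upper=10.0)
for individual in pop:
    print(individual)
```

For $\rho = 2.595$ the map attains its maximum of about 0.9988 at $x = 1/\sqrt{3}$, so a sequence started inside (0, 1) stays inside (0, 1) and the scaled individuals stay within the bounds.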

4.2. Different Strategies to Improve the Performance of the CPO

To assess the effectiveness of the various strategies for improving the CPO algorithm, we designed and conducted a series of ablation experiments. The purpose of these experiments is to systematically remove or replace each component of the algorithm in order to evaluate its contribution to the overall performance. Table 10 presents the performance of the original CPO algorithm alongside seven variants, tested on $f_1$. In the variant names, C denotes the chaotic map, M the hybrid mutation strategy, and T the tangent flight strategy; CMTCPO therefore combines all three strategies to enhance the CPO algorithm. It should be noted that we only show part of the calculation results here. The complete results are shown in Supplementary Materials Table S4.
For the benchmark function $f_1$, the CMTCPO performs best, with an average fitness of $2.36 \times 10^{-70}$ and a variance of $6.26 \times 10^{-70}$, followed by the CTCPO and TCPO, with average fitness values of $5.24 \times 10^{-70}$ and $7.20 \times 10^{-62}$, respectively. The original CPO only reaches an average fitness of $5.32 \times 10^{-36}$, which indicates that the tangent flight (T) strategy greatly improves the performance of the CPO. The average fitness of the MCPO is $1.16 \times 10^{-61}$, about 25 orders of magnitude smaller (better) than that of the CPO, whereas the CCPO improves on the CPO by only about 6 orders of magnitude. In addition, we calculate the ranking of each algorithm, which is recorded in Table 11.
It can be seen from Table 11 that the ranking of the eight algorithms (the original CPO and its seven variants) from best to worst is CMTCPO > CTCPO > MTCPO > MCPO > TCPO > CCPO > CMCPO > CPO. The algorithm combining the three strategies of chaotic mapping, hybrid mutation, and tangent flight performs best. This is because the tangent flight strategy and the hybrid mutation strategy increase the ability of the algorithm to jump out of local optima, and Cubic chaotic mapping effectively improves the population diversity.
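An average ranking of the kind reported in Tables 9 and 11 can be computed along the following lines. This is a minimal sketch, assuming that algorithms are ranked on each benchmark function by mean fitness (lower is better) and that the per-function ranks are then averaged; ties receive no special treatment here. The algorithm names and numbers are hypothetical placeholders, not values from Table S4.

```python
def average_ranks(mean_fitness):
    """mean_fitness maps function name -> {algorithm: mean fitness}; lower is better."""
    totals = {}
    for scores in mean_fitness.values():
        ordered = sorted(scores, key=scores.get)  # best (smallest) first
        for rank, name in enumerate(ordered, start=1):
            totals[name] = totals.get(name, 0) + rank
    n_functions = len(mean_fitness)
    return {name: total / n_functions for name, total in totals.items()}

# Hypothetical placeholder data for three algorithms on two functions
demo = {
    "f1": {"A": 1e-3, "B": 1e-5, "C": 1e-1},
    "f2": {"A": 1.0, "B": 2.0, "C": 3.0},
}
print(average_ranks(demo))
```

Averaging ranks rather than raw fitness values keeps functions with very different scales from dominating the comparison.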

4.3. Performance of CMTCPO

In order to explore the performance difference between the CMTCPO algorithm and traditional optimization algorithms, we selected five algorithms (GA, QPSO, SSA, WOA, and GWO) for comparison with the CPO and the CMTCPO. This comparative experiment is again carried out on the 18 benchmark functions in Table S2. The maximum number of iterations is set to 1000, and the fitness curves are shown in Figure S3. Figure 12 shows only the fitness curves for $f_3$ and $f_{10}$; the complete results are given in the Supplementary Materials.
For the benchmark function $f_3$, the fitness of the CMTCPO decreases slowly in the early iteration stage, where the QPSO, SSA, GWO, and other algorithms are superior to the CMTCPO. In the later iteration stage, the CMTCPO repeatedly jumps out of local optima and obtains better results. This behavior is attributed to the hybrid mutation strategy and the tangent flight strategy introduced in the CMTCPO algorithm.
Figure 12 shows that the CMTCPO excels in unimodal and multimodal problems, successfully avoiding local optimal values on multiple occasions. In comparison to the CPO and other meta-heuristic algorithms, solutions obtained through the CMTCPO are closer to the theoretical optimal value, exhibiting superior convergence speed.
Given that the effectiveness of the CMTCPO in machine learning prediction has not been established, we selected six classical algorithms (GA, PSO, BO, SSA, WOA, and GWO) to optimize the HKELM model and determine its optimal parameters. Subsequently, we compared the performances of these six optimization algorithms with those of the CPO and CMTCPO. The calculation results for each algorithm, after running the simulations 100 times, are presented in Table 12.
Table 12 illustrates that the CMTCPO achieves the smallest calculation error and the fastest convergence speed when optimizing the HKELM model, outperforming other meta-heuristic algorithms. The order of prediction accuracy of each algorithm is CMTCPO > PSO > GWO = CPO > SSA > WOA > BO > GA, and the order of operation time is CMTCPO < CPO < PSO < GWO < SSA < BO < WOA < GA. The significant potential and advantages of the CMTCPO in machine learning prediction are evident. This is primarily due to its adoption of a hybrid mutation and tangent flight strategy, which can make the algorithm jump out of local optima multiple times and obtain more accurate results.

4.4. Performance of HKELM

In order to test the computational performance of the HKELM model, we compared it with a backpropagation neural network (BPNN) [83], a least squares support vector machine (LSSVM) [84], random forest (RF), LightGBM, and XGBoost [85], all of which are widely used in the field of prediction. All six prediction methods use the CMTCPO for parameter optimization. The results after 100 calculations are shown in Table 13.
Table 13 shows that the HKELM has the lowest RMSE and the shortest calculation time. Compared with the BPNN, LSSVM, RF, XGBoost, and LightGBM, the HKELM demonstrates superior prediction performance. It is more effective at handling high-dimensional, complex data sets and significantly reduces the consumption of computing resources. This may be because the HKELM is a single-hidden-layer feedforward neural network in which the input weights and biases are determined randomly and the output weights are calculated directly using the least squares method. This avoids a complex iterative training process and greatly reduces the amount of calculation.
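The non-iterative training step can be illustrated with a kernel ELM, whose output weights follow from a single regularized linear solve, $\alpha = (I/C + \Omega)^{-1} T$, where $\Omega$ is the kernel matrix of the training samples [37]. The sketch below is only an illustration under stated assumptions: the hybrid kernel is taken to be a weighted sum of an RBF kernel and a polynomial kernel, and the parameter names (`gamma`, `degree`, `coef0`, `weight`, `c`), their default values, and the toy data are hypothetical rather than the settings used in this paper.

```python
import math

def hybrid_kernel(u, v, gamma=1.0, degree=2, coef0=1.0, weight=0.5):
    """Weighted sum of an RBF kernel and a polynomial kernel (assumed form)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    dot = sum(a * b for a, b in zip(u, v))
    return weight * math.exp(-gamma * sq_dist) + (1.0 - weight) * (dot + coef0) ** degree

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def kelm_fit(xs, ts, c=1e6):
    """Output weights alpha = (I/C + Omega)^-1 T, obtained in one linear solve."""
    n = len(xs)
    omega = [[hybrid_kernel(xs[i], xs[j]) for j in range(n)] for i in range(n)]
    for i in range(n):
        omega[i][i] += 1.0 / c
    return solve(omega, ts)

def kelm_predict(x, xs, alpha):
    """Prediction f(x) = sum_i alpha_i * K(x, x_i)."""
    return sum(a * hybrid_kernel(x, xi) for a, xi in zip(alpha, xs))

# Toy data (hypothetical): one feature, four samples
xs = [[0.0], [1.0], [2.0], [3.0]]
ts = [1.0, 3.0, 2.0, 5.0]
alpha = kelm_fit(xs, ts)
print([kelm_predict(x, xs, alpha) for x in xs])
```

In a setting like the one described here, an optimizer such as the CMTCPO would search over the kernel parameters and the regularization coefficient, and each candidate would be evaluated with one such solve.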

5. Conclusions

In this paper, a novel prediction model of the second-hand housing sales cycle is proposed based on the improved CPO-HKELM method. Through in-depth analysis and data mining of the second-hand housing market, this model effectively improves the accuracy and reliability of cycle prediction. The experimental results using 400 groups of data from eight cities in China show that the maximum relative error of the improved CPO-HKELM model is 0.0001784, the MAPE is 0.00001235%, and the RMSE is 0.0002050. Three strategies of chaotic mapping, hybrid mutation, and tangent flight are used to improve the CPO. Specifically, chaotic mapping is added to the population initialization stage, which increases the population diversity. The hybrid mutation is added to the first, third, and fourth stages of the original CPO to avoid the algorithm falling into a local optimum, and a tangent flight strategy is added at the end to further broaden the search space of the algorithm. Compared with the classical CPO, GA, PSO, BO, SSA, WOA, and GWO, the improved CPO has the smallest calculation error and the fastest convergence speed. Compared with the BPNN, LSSVM, RF, XGBoost, and LightGBM, the HKELM has the lowest RMSE and the shortest computing time, handling high-dimensional complex data sets more effectively and significantly reducing the consumption of computing resources. In addition, with the help of the SOR model, we discuss many important factors that affect the second-hand housing sales cycle, including market supply and demand, economic indicators, policy changes, and so on. The above results show that the prediction model based on the improved CPO-HKELM can provide reliable theoretical support and a data basis for the research and practice of the second-hand housing market.
Future research can further explore how a wider range of variables influences model performance, in order to improve the forecasting ability of the model and promote the in-depth development of real estate market analysis.

Supplementary Materials

The following supporting information can be downloaded at https://www.mdpi.com/article/10.3390/buildings15071200/s1: Figure S1: The Pseudocode of CMTCPO; Figure S2: PDPs of the indicators; Figure S3: The performance of seven algorithms on 18 benchmark functions; Table S1: Some Case Data; Table S2: Detailed information about benchmark function; Table S3: Performance of CCPO1–CCPO10 on 18 benchmark functions; Table S4: The performance of each variant on 18 benchmark functions.

Author Contributions

Conceptualization, B.Y. and S.C.; methodology, B.Y. and H.W.; software, B.Y. and D.Y.; validation, B.Y., D.Y., H.W., J.W. and S.C.; formal analysis, B.Y. and H.W.; investigation, B.Y. and D.Y.; resources, B.Y. and S.C.; data curation, B.Y. and S.C.; writing—original draft preparation, B.Y., D.Y., H.W., J.W. and S.C.; writing—review and editing, B.Y., D.Y., H.W., J.W. and S.C.; visualization, B.Y. and S.C.; supervision, H.W.; project administration, H.W. and S.C.; funding acquisition, H.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The case analysis data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Miller, N.; Peng, L.; Sklarz, M. House Prices and Economic Growth. J. Real Estate Financ. Econ. 2011, 42, 522–541.
2. Strauss, J. Does housing drive state-level job growth? Building permits and consumer expectations forecast a state’s economic activity. J. Urban Econ. 2013, 73, 77–93.
3. Nong, H. Analyzing the Role of the Real Estate Sector in the Sectoral Network of the Chinese Economy. Struct. Change Econ. Dyn. 2024, 70, 567–580.
4. Xu, L.L.; Li, Z.W. A New Appraisal Model of Second-Hand Housing Prices in China’s First-Tier Cities Based on Machine Learning Algorithms. Comput. Econ. 2021, 57, 617–637.
5. Jiang, Y.; Zheng, L.; Wang, J. Research on external financial risk measurement of China real estate. Int. J. Financ. Econ. 2021, 26, 5472–5484.
6. Kang, Y.; Zhang, F.; Peng, W.; Gao, S.; Rao, J.; Duarte, F.; Ratti, C. Understanding house price appreciation using multi-source big geo-data and machine learning. Land Use Policy 2021, 111, 104919.
7. Zhang, Y.; Huang, J.; Zhang, J.; Liu, S.; Shorman, S. Analysis and prediction of second-hand house price based on random forest. Appl. Math. Nonlinear Sci. 2022, 7, 27–42.
8. Gao, G.; Bao, Z.; Cao, J.; Qin, A.K.; Sellis, T.K. Location-Centered House Price Prediction: A Multi-Task Learning Approach. ACM Trans. Intell. Syst. Technol. 2019, 13, 32.
9. Li, B.; Zhu, W.; Xu, Z.; Zhang, C. Two-sided matching theory-based second-hand house transaction evaluation and recommendation by the modified PLC-DEMATEL method. Appl. Soft Comput. 2024, 166, 112196.
10. Lu, S.; Li, Z.; Qin, Z.; Yang, X.; Goh, R.S.M. A hybrid regression technique for house prices prediction. In Proceedings of the 2017 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), Singapore, 10–13 December 2017; pp. 319–323.
11. Park, B.; Bae, J.K. Using machine learning algorithms for housing price prediction: The case of Fairfax County, Virginia housing data. Expert Syst. Appl. 2015, 42, 2928–2934.
12. DiPasquale, D.; Wheaton, W.C. Housing Market Dynamics and the Future of Housing Prices. J. Urban Econ. 1994, 35, 1–27.
13. Zhong, C.; Xie, L.; Shi, Y.; Xu, X. Macro-prudential policy, its alignment with monetary policy and house price growth: A cross-country study. Q. Rev. Econ. Financ. 2023, 90, 51–62.
14. Zhang, H.; Li, Y.; Branco, P. Describe the house and I will tell you the price: House price prediction with textual description data. Nat. Lang. Eng. 2023, 30, 661–695.
15. He, Y.; Xia, F. Heterogeneous traders, house prices and healthy urban housing market: A DSGE model based on behavioral economics. Habitat Int. 2020, 96, 102085.
16. Gdakowicz, A.; Putek-Szeląg, E. The influence of discretization of the flat’s area on its selling time. Procedia Comput. Sci. 2022, 207, 1881–1890.
17. Mehrabian, A.; Russell, J.A. An Approach to Environmental Psychology; The MIT Press: Cambridge, MA, USA, 1974; p. 266.
18. Krause, A.; Lipscomb, C. The Data Preparation Process in Real Estate: Guidance and Review. J. Real Estate Pract. Educ. 2016; in press.
19. Roy, D. Housing demand in Indian metros: A hedonic approach. Int. J. Hous. Mark. Anal. 2018, 13, 19–55.
20. Feng, C.; Li, W.; Zhao, F. Influence of rail transit on nearby commodity housing prices: A case study of Beijing Subway Line Five. Acta Geogr. Sin. 2011, 66, 1055–1062.
21. Saiz, A. Bricks, mortar, and proptech. J. Prop. Invest. Financ. 2020, 38, 327–347.
22. Zhang, X.; Lin, Z.; Zhang, Y.; Zheng, Y.; Zhang, J. Online property brokerage platform and prices of second-hand houses: Evidence from Lianjia’s entry. Electron. Commer. Res. Appl. 2021, 50, 101104.
23. Khder, M. Web Scraping or Web Crawling: State of Art, Techniques, Approaches and Application. Int. J. Adv. Soft Comput. Its Appl. 2021, 13, 145–168.
24. Lee, W. Machine Learning based Prediction of The Value of Buildings. KSII Trans. Internet Inf. Syst. 2018, 12, 3966–3991.
25. Yao, Y.; Zhang, J.; Hong, Y.; Liang, H.; He, J. Mapping fine scale urban housing prices by fusing remotely sensed imagery and social media data. Trans. GIS 2018, 22, 561–581.
26. Song, Y.; Ma, X. Exploration of intelligent housing price forecasting based on the anchoring effect. Neural Comput. Appl. 2023, 36, 2201–2214.
27. Lancaster, K.J. A New Approach to Consumer Theory. J. Political Econ. 1966, 74, 132–157.
28. Pior, M.Y.; Shimizu, E. GIS-aided evaluation system for infrastructure improvements: Focusing on simple hedonic and Rosen’s two-step approaches. Comput. Environ. Urban Syst. 2001, 25, 223–246.
29. Xu, S.; Zhang, Z. Spatial Differentiation and Influencing Factors of Second-Hand Housing Prices: A Case Study of Binhu New District, Hefei City, Anhui Province, China. J. Math. 2021, 2021, e8792550.
30. Soltani, A.; Heydari, M.; Aghaei, F.; Pettit, C.J. Housing price prediction incorporating spatio-temporal dependency into machine learning algorithms. Cities 2022, 131, 103941.
31. Rico-Juan, J.R.; de La Paz, P.T. Machine learning with explainability or spatial hedonics tools? An analysis of the asking prices in the housing market in Alicante, Spain. Expert Syst. Appl. 2021, 171, 114590.
32. Adetunji, A.B.; Akande, O.N.; Ajala, F.A.; Oyewo, O.; Akande, Y.F.; Oluwadara, G. House Price Prediction using Random Forest Machine Learning Technique. Procedia Comput. Sci. 2022, 199, 806–813.
33. Xu, X.; Zhang, Y. House price forecasting with neural networks. Intell. Syst. Appl. 2021, 12, 200052.
34. Zhan, C.; Liu, Y.; Wu, Z.; Zhao, M.; Chow, T.W.S. A hybrid machine learning framework for forecasting house price. Expert Syst. Appl. 2023, 233, 120981.
35. Ge, X.; Runeson, G. Modeling Property Prices Using Neural Network Model for Hong Kong. Int. Real Estate Rev. 2004, 7, 121–138.
36. Huang, G.-B.; Zhu, Q.-Y.; Siew, C.-K. Extreme learning machine: Theory and applications. Neurocomputing 2006, 70, 489–501.
37. Huang, G.B.; Zhou, H.; Ding, X.; Zhang, R. Extreme Learning Machine for Regression and Multiclass Classification. IEEE Trans. Syst. Man Cybern. Part B (Cybern.) 2012, 42, 513–529.
38. Holland, J.H. Genetic algorithms. Sci. Am. 2012, 7, 1482.
39. Sun, J.; Fang, W.; Wu, X.; Palade, V.; Xu, W. Quantum-Behaved Particle Swarm Optimization: Analysis of Individual Particle Behavior and Parameter Selection. Evol. Comput. 2012, 20, 349–393.
40. Xue, J.-K.; Shen, B. A novel swarm intelligence optimization approach: Sparrow search algorithm. Syst. Sci. Control Eng. 2020, 8, 22–34.
41. Mirjalili, S.; Lewis, A. The Whale Optimization Algorithm. Adv. Eng. Softw. 2016, 95, 51–67.
42. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey Wolf Optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
43. Zhong, C.; Li, G.; Meng, Z. Beluga whale optimization: A novel nature-inspired metaheuristic algorithm. Knowl.-Based Syst. 2022, 251, 109215.
44. Heidari, A.A.; Mirjalili, S.; Faris, H.; Aljarah, I.; Mafarja, M.; Chen, H. Harris hawks optimization: Algorithm and applications. Future Gener. Comput. Syst. 2019, 97, 849–872.
45. Xue, J.; Shen, B. Dung beetle optimizer: A new meta-heuristic algorithm for global optimization. J. Supercomput. 2023, 79, 7305–7336.
46. Abdel-Basset, M.; Mohamed, R.; Abouhawwash, M. Crested Porcupine Optimizer: A new nature-inspired metaheuristic. Knowl.-Based Syst. 2024, 284, 111257.
47. Zang, J.; Cao, B.; Hong, Y. Research on the Fiber-to-the-Room Network Traffic Prediction Method Based on Crested Porcupine Optimizer Optimization. Appl. Sci. 2024, 14, 4840.
48. Gao, Y.; Wang, J.; Yu, W.; Yi, L.; Guo, F. Crested Porcupine Optimizer-Optimized CNN-BiLSTM-Attention Model for Predicting Main Girder Temperature in Bridges. Appl. Sci. 2024, 14, 7356.
49. Liu, S.; Jin, Z.; Lin, H.; Lu, H. An improve crested porcupine algorithm for UAV delivery path planning in challenging environments. Sci. Rep. 2024, 14, 20445.
50. Zhang, C.; Liu, H.; Peng, Y.; Ding, W.; Cao, J. Intelligent Prediction and Application Research on Soft Rock Tunnel Deformation Based on the ICPO-LSTM Model. Build.-Basel 2024, 14, 2244.
51. Wang, H.; Zhang, L.; Liu, B. Research and Design of a Hybrid DV-Hop Algorithm Based on the Chaotic Crested Porcupine Optimizer for Wireless Sensor Localization in Smart Farms. Agriculture 2024, 14, 1226.
52. Liu, H.; Zhou, R.; Zhong, X.; Yao, Y.; Shan, W.; Yuan, J.; Xiao, J.; Ma, Y.; Zhang, K.; Wang, Z. Multi-Strategy Enhanced Crested Porcupine Optimizer: CAPCPO. Mathematics 2024, 12, 3080.
53. Ling, Y.; Zhou, Y.; Luo, Q. Lévy Flight Trajectory-Based Whale Optimization Algorithm for Global Optimization. IEEE Access 2017, 5, 6168–6186.
54. Wang, M.; Wang, J.-S.; Li, X.-D.; Zhang, M.; Hao, W.-K. Harris Hawk Optimization Algorithm Based on Cauchy Distribution Inverse Cumulative Function and Tangent Flight Operator. Appl. Intell. 2022, 52, 10999–11026.
55. Bernanke, B.; Gertler, M.J. Should Central Banks Respond to Movements in Asset Prices. Am. Econ. Rev. 2001, 91, 253–257.
56. Miles, W. The Housing Bubble: How Much Blame Does the Fed Really Deserve? J. Real Estate Res. 2014, 36, 41–58.
57. Duan, J.; Tian, G.; Yang, L.; Zhou, T. Addressing the macroeconomic and hedonic determinants of housing prices in Beijing Metropolitan Area, China. Habitat Int. 2021, 113, 102374.
58. Lerman, S.R. Location, housing, automobile ownership, and mode to work: A joint choice model. Transp. Res. Rec. 1976, 610, 6–11.
59. Jones, L.D. Current Wealth Constraints on the Housing Demand of Young Owners. Rev. Econ. Stat. 1990, 72, 424–432.
60. Linneman, P.D.; Wachter, S. The Impacts of Borrowing Constraints on Homeownership. Real Estate Econ. 1989, 17, 389–402.
61. Hou, S.; Wang, J.; Zhu, D. Has the Newly Imposed Property Tax Controlled Housing Prices? An Analysis of China’s 2009–2020 Interprovincial Panel Data. Sustainability 2022, 14, 14872.
62. Kaleka, A.; Morgan, N.A. How marketing capabilities and current performance drive strategic intentions in international markets. Ind. Mark. Manag. 2019, 78, 108–121.
63. Abelson, P. The Real Incidence of Imposts on Residential Land Development and Building. Econ. Pap. A J. Appl. Econ. Policy 2010, 18, 85–89.
64. Woo, A.; Han, J.; Shin, H.; Lee, S. Economic benefits of urban streetscapes: Analyzing the interrelationships between visual street environments and single-family property values in Seoul, Korea. Appl. Geogr. 2024, 163, 103182.
65. Chen, C.; Zhai, H.; Wang, Z.; Ma, S.; Sun, J.; Wu, C.; Zhang, Y.; Tramontana, F. Experimental Research on the Impact of Interest Rate on Real Estate Market Transactions. Discret. Dyn. Nat. Soc. 2022, 2022, e9946703.
66. Gordon, B.L.; Winkler, D.T. The Effect of Listing Price Changes on the Selling Price of Single-Family Residential Homes. J. Real Estate Financ. Econ. 2016, 55, 185–215.
67. Miller, N.G. Time on the Market and Selling Price. Real Estate Econ. 2003, 6, 164–174.
68. Beracha, E.; Seiler, M.J. The Effect of Listing Price Strategy on Transaction Selling Prices. J. Real Estate Financ. Econ. 2013, 49, 237–255.
69. Benefield, J.D.; Cain, C.L.; Johnson, K.H. On the Relationship Between Property Price, Time-on-Market, and Photo Depictions in a Multiple Listing Service. J. Real Estate Financ. Econ. 2009, 43, 401–422.
70. Huang, G.; Huang, G.-B.; Song, S.; You, K. Trends in extreme learning machines: A review. Neural Netw. 2015, 61, 32–48.
71. Huang, G.-B. An Insight into Extreme Learning Machines: Random Neurons, Random Features and Kernels. Cogn. Comput. 2014, 6, 376–390.
72. Pecora, L.; Carroll, T. Synchronization in chaotic system. Phys. Rev. Lett. 1990, 64, 821.
73. Coelho, L.d.S.; Mariani, V.C. Use of chaotic sequences in a biologically inspired algorithm for engineering design optimization. Expert Syst. Appl. 2008, 34, 1905–1913.
74. Liu, M.; Yao, X.; Li, Y. Hybrid whale optimization algorithm enhanced with Lévy flight and differential evolution for job shop scheduling problems. Appl. Soft Comput. 2020, 87, 105954.
75. Cui, Y.; Shi, R.; Dong, J. CLTSA: A Novel Tunicate Swarm Algorithm Based on Chaotic-Lévy Flight Strategy for Solving Optimization Problems. Mathematics 2022, 10, 3405.
76. Zhang, Y.; Fei, L.; Chui, C.K.; Chng, C.B.; Zhao, S.; Li, J. Machine Learning Surrogate Model Optimized by Improved Sparrow Search Algorithm for Multi-Objective Optimization of Permanent Magnet Synchronous Motor Direct-Drive Pump. IEEE Trans. Veh. Technol. 2024, 73, 12773–12786.
77. Zhang, J.; Wang, J.S. Improved Salp Swarm Algorithm Based on Levy Flight and Sine Cosine Operator. IEEE Access 2020, 8, 99740–99771.
78. Layeb, A. Tangent search algorithm for solving optimization problems. Neural Comput. Appl. 2022, 34, 8853–8884.
79. Weisburd, D.; Britt, C.; Wilson, D.B.; Wooditch, A. (Eds.) Measuring Association for Scaled Data: Pearson’s Correlation Coefficient. In Basic Statistics in Criminology and Criminal Justice; Springer International Publishing: Cham, Switzerland, 2020; pp. 479–530.
80. Sharma, D.K.; Chatterjee, M.; Kaur, G.; Vavilala, S. 3 - Deep learning applications for disease diagnosis. In Deep Learning for Medical Applications with Unique Data; Gupta, D., Kose, U., Khanna, A., Balas, V.E., Eds.; Academic Press: Cambridge, MA, USA, 2022; pp. 31–51.
81. Gunasegaran, T.; Cheah, Y.N. Evolutionary cross validation. In Proceedings of the 2017 8th International Conference on Information Technology (ICIT), Amman, Jordan, 17–18 May 2017; pp. 89–95.
82. Doğan, N.Ö. Bland-Altman analysis: A paradigm to understand correlation and agreement. Turk. J. Emerg. Med. 2018, 18, 139–141.
83. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning representations by back-propagating errors. Nature 1986, 323, 533–536.
84. Suykens, J.; Lukas, L.; Van, P.; De, D.; Vandewalle, J. Least Squares Support Vector Machine Classifiers: A Large Scale Algorithm. Eur. Conf. Circuit Theory Des. ECCTD 2000, 99, 839–842.
85. Kazemi, F.; Asgarkhani, N.; Jankowski, R. Optimization-based stacked machine-learning method for seismic probability and risk assessment of reinforced concrete shear walls. Expert Syst. Appl. 2024, 255, 124897.
Figure 1. Basic information of the eight cities.
Figure 2. The basic structure of the ELM.
Figure 3. The cyclic variation in population size.
Figure 4. The distribution of points generated by different chaotic maps.
Figure 5. The changes of $A(t)$ under different parameter conditions.
Figure 6. Flow chart of our study.
Figure 7. The heat map of numerical data.
Figure 8. Fitness curve of CMTCPO optimization.
Figure 9. The results of the Bland–Altman analysis on the data set.
Figure 10. The results of the Bland–Altman analysis on the unseen data set.
Figure 11. PDPs of some indicators.
Figure 12. The performances of seven algorithms on $f_3$ and $f_{10}$.
Table 1. Prediction index system of the sales cycle of second-hand housing.

| Primary Index | Secondary Index | Unit | Reference |
|---|---|---|---|
| S: Stimulus | S1: loan rate | % | [58,59,60] |
| | S2: down payment rate | % | [58,59,60] |
| | S3: tax rate | % | [61] |
| | S4: regional GDP growth rate | % | [62] |
| | S5: per capita disposable income | CNY | [22] |
| | S6: unemployment rate | % | \ |
| | S7: second-hand house listing volume | set | \ |
| | S8: second-hand house trading volume | set | \ |
| | S9: new house sales cycle | month | [22,63] |
| O: Organism | O1: building area | m² | [22] |
| | O2: house age | year | [22] |
| | O3: the number of living rooms | pcs | [22] |
| | O4: the number of bedrooms | pcs | [22] |
| | O5: the number of toilets | pcs | [22] |
| | O6: structural form | \ | [22] |
| | O7: the total number of floors | pcs | [22] |
| | O8: floor number | \ | [8,9,10] |
| | O9: orientation | \ | [8,9,10] |
| | O10: decoration degree | \ | [8,9,10] |
| | O11: distance from the city center | km | [8,9,10] |
| | O12: distance from the subway station | m | [8,9,10] |
| | O13: the number of bus lines | pcs | [8,9,10] |
| | O14: the number of educational resources | pcs | [8,9,10] |
| | O15: the number of medical facilities | pcs | [8,9,10] |
| | O16: the number of commercial facilities | pcs | [8,9,10] |
| | O17: floor area rate | % | [8,9,10] |
| | O18: greening rate | % | [8,9,10] |
| R: Response | R1: listed price | yuan/m² | [65] |
| | R2: transaction price | yuan/m² | \ |
| | R3: the rise and fall of house prices | % | [66] |
| | R4: average listing time | day | [67,68] |
| | R5: number of times to watch | pcs | [69] |
| | R6: volume change rate | % | \ |
Table 2. Different kinds of chaotic maps.

| | Name | Function | Parameter |
|---|---|---|---|
| C1 | Logistic | $x_{n+1} = a x_n (1 - x_n)$ | $a = 3.8$ |
| C2 | Cubic | $x_{n+1} = \rho x_n (1 - x_n^2)$ | $\rho = 2.595$ |
| C3 | Singer | $x_{n+1} = a (7.86 x_n - 23.31 x_n^2 + 28.75 x_n^3 - 13.302875 x_n^4)$ | $a = 1.07$ |
| C4 | Sine | $x_{n+1} = \frac{a}{4} \sin(\pi x_n)$ | $a = 4$ |
| C5 | Sinusoidal | $x_{n+1} = a x_n^2 \sin(\pi x_n)$ | $a = 2.3$ |
| C6 | Chebyshev | $x_{n+1} = \cos(a \cos^{-1}(x_n))$ | $a = 4$ |
| C7 | Circle | $x_{n+1} = \operatorname{mod}\left(x_n + b - \frac{a}{2\pi} \sin(2\pi x_n),\ 1\right)$ | $a = 0.5,\ b = 0.2$ |
| C8 | Tent | $x_{n+1} = \begin{cases} \dfrac{x_n}{a}, & x_n < a \\ \dfrac{1 - x_n}{1 - a}, & x_n \ge a \end{cases}$ | $a = 0.75$ |
| C9 | Piecewise | $x_{n+1} = \begin{cases} \dfrac{x_n}{p}, & 0 \le x_n < p \\ \dfrac{x_n - p}{0.5 - p}, & p \le x_n < 0.5 \\ \dfrac{1 - p - x_n}{0.5 - p}, & 0.5 \le x_n < 1 - p \\ \dfrac{1 - x_n}{p}, & 1 - p \le x_n < 1 \end{cases}$ | $p = 0.3$ |
| C10 | Tent–Logistic–Cosine | $x_{n+1} = \begin{cases} \cos\left(\pi\left(2 r x_n + 4 (1 - r) x_n (1 - x_n) - 0.5\right)\right), & \text{if } x_n < 0.5 \\ \cos\left(\pi\left(2 r (1 - x_n) + 4 (1 - r) x_n (1 - x_n) - 0.5\right)\right), & \text{else} \end{cases}$ | $r = 0.5$ |
Table 3. The process of encoding the ‘orientation’.

| Orientation | Orientation_1 | Orientation_2 | Orientation_3 |
|---|---|---|---|
| South/North–South | 1 | 0 | 0 |
| Southwest/Southeast | 0 | 1 | 0 |
| North | 0 | 0 | 1 |
Table 4. The correlation coefficients between the input indicators and the predicted target.

| Second index | $S_1$ | $S_2$ | $S_3$ | $S_4$ | $S_5$ | $S_6$ |
|---|---|---|---|---|---|---|
| Pearson correlation coefficient | −0.056 | −0.051 | 0.004 | 0.108 | −0.065 | −0.009 |
| Second index | $S_7$ | $S_8$ | $S_9$ | $O_1$ | $O_2$ | $O_3$ |
| Pearson correlation coefficient | −0.054 | −0.099 | −0.067 | 0.001 | −0.028 | 0.009 |
| Second index | $O_4$ | $O_5$ | $O_7$ | $O_{11}$ | $O_{12}$ | $O_{13}$ |
| Pearson correlation coefficient | 0.018 | −0.037 | −0.028 | −0.064 | −0.021 | 0.003 |
| Second index | $O_{14}$ | $O_{15}$ | $O_{16}$ | $O_{17}$ | $O_{18}$ | $R_1$ |
| Pearson correlation coefficient | −0.01 | 0.031 | −0.033 | −0.032 | 0.014 | −0.045 |
| Second index | $R_2$ | $R_3$ | $R_4$ | $R_5$ | $R_6$ | |
| Pearson correlation coefficient | 0.046 | 0.184 | 0.160 | 0.155 | 0.011 | |
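For reference, a Pearson coefficient of the kind reported in Table 4 can be computed as in the minimal sketch below. The function name and the toy data are assumptions for illustration; the source does not show how the authors computed their values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of co-deviations and the two root sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0 or dev_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (dev_x * dev_y)


# Toy example: one hypothetical indicator column against a hypothetical sales cycle.
indicator = [1.0, 2.0, 3.0, 4.0]
sales_cycle = [30.0, 45.0, 40.0, 60.0]
r = pearson(indicator, sales_cycle)
```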
Table 5. Detailed process of CMTCPO in parameter optimization.

| Iterations | Fitness (n − 1) | Fitness (n) | Fitness (n) − Fitness (n − 1) | Continues? |
|---|---|---|---|---|
| 19 | 0.00004176 | 0.00004176 | 0 < 0.000001 | Yes |
| 20 | 0.00004176 | 0.00003190 | 0.00000986 > 0.000001 | Yes |
| 21 | 0.00003190 | 0.00003190 | 0 < 0.000001 | Yes |
| 200 | 0.00003190 | 0.00003190 | 0 < 0.000001 | No |
Table 6. Training results of the model.

| Index | Sales Cycle | Prediction Value | Index | Sales Cycle | Prediction Value |
|---|---|---|---|---|---|
| 1 | 138 | 137.9999944343560 | 65 | 1509 | 1508.9998216241200 |
| 5 | 18 | 17.9999998553915 | 70 | 870 | 869.9999910180190 |
| 10 | 1055 | 1054.9999938703400 | 75 | 934 | 933.9999052902180 |
| 15 | 325 | 324.9999987755760 | 80 | 32 | 32.0000067648819 |
| 20 | 35 | 34.9999989408656 | 85 | 214 | 213.9999897121620 |
| 25 | 440 | 439.9999837009160 | 90 | 36 | 35.9999965745102 |
| 30 | 433 | 432.9999803560750 | 95 | 572 | 572.0000001090980 |
| 35 | 382 | 381.9999896302480 | 100 | 179 | 178.9999961658900 |
| 40 | 355 | 354.9999999811600 | 105 | 244 | 243.9999996364940 |
| 45 | 41 | 41.0000006213821 | 110 | 129 | 129.0000004478840 |
| 50 | 33 | 33.0000030636866 | 115 | 87 | 87.0000013263564 |
| 55 | 460 | 459.9999912310660 | 120 | 204 | 203.9999999556950 |
| 60 | 317 | 316.9999874981490 | - | - | - |
Table 7. The outcomes of the ten-fold cross-validation.

| Index | RMSE | MAPE | Maximum relative error |
|---|---|---|---|
| 1 | 0.00004263 | 0.000001584 | 0.0001476 |
| 2 | 0.000009694 | 0.000002994 | 0.00002917 |
| 3 | 0.000008194 | 0.000002320 | 0.00002352 |
| 4 | 0.000009993 | 0.000004305 | 0.00002154 |
| 5 | 0.000003059 | 0.000002395 | 0.000009050 |
| 6 | 0.00001948 | 0.000007391 | 0.00006376 |
| 7 | 0.000008909 | 0.000003757 | 0.00002498 |
| 8 | 0.000006076 | 0.000003270 | 0.00001767 |
| 9 | 0.00001332 | 0.000005783 | 0.00002647 |
| 10 | 0.00001946 | 0.000002322 | 0.00006625 |
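The three error measures in Table 7 and a ten-fold split can be sketched as follows. This is a minimal illustration under stated assumptions: MAPE and the maximum relative error are taken as plain fractions (not percentages), the fold assignment is a simple interleaved split, and all function names are hypothetical. The source does not specify how the authors partitioned or scaled the data.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a fraction (assumption)."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def max_relative_error(actual, predicted):
    """Largest relative error over all samples, expressed as a fraction."""
    return max(abs((a - p) / a) for a, p in zip(actual, predicted))


def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for k interleaved folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    for i in range(k):
        test = folds[i]
        train = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        yield train, test


# Example with the first three rows of Table 6 (sales cycle vs. prediction value).
actual = [138, 18, 1055]
predicted = [137.9999944343560, 17.9999998553915, 1054.9999938703400]
scores = (
    rmse(actual, predicted),
    mape(actual, predicted),
    max_relative_error(actual, predicted),
)
```

In a full cross-validation run, the model would be refitted on each `train` index set and the three measures evaluated on the matching `test` set, giving one row per fold as in Table 7.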
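Table 8. Performance of CCPO1–CCPO10 on $f_4$.

| $f_4$ | Mean | Std | Rank |
|---|---|---|---|
| CCPO1 | $4.67 \times 10^{-10}$ | $1.31 \times 10^{-9}$ | 10 |
| CCPO2 | $6.17 \times 10^{-14}$ | $8.66 \times 10^{-14}$ | 1 |
| CCPO3 | $2.35 \times 10^{-11}$ | $3.40 \times 10^{-11}$ | 3 |
| CCPO4 | $1.95 \times 10^{-11}$ | $2.48 \times 10^{-11}$ | 2 |
| CCPO5 | $1.79 \times 10^{-10}$ | $3.83 \times 10^{-10}$ | 8 |
| CCPO6 | $1.00 \times 10^{2}$ | 0.00 | 11 |
| CCPO7 | $3.66 \times 10^{-11}$ | $5.53 \times 10^{-11}$ | 4 |
| CCPO8 | $3.67 \times 10^{-11}$ | $7.34 \times 10^{-11}$ | 5 |
| CCPO9 | $1.92 \times 10^{-10}$ | $4.51 \times 10^{-10}$ | 9 |
| CCPO10 | $6.21 \times 10^{-11}$ | $8.33 \times 10^{-11}$ | 6 |
| CPO | $1.11 \times 10^{-10}$ | $2.68 \times 10^{-10}$ | 7 |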
Table 9. Average ranking of CCPO1–CCPO10.

| Name | Average rank | Overall rank |
|---|---|---|
| CCPO1 | 3.50 | 6 |
| CCPO2 | 2.83 | 1 |
| CCPO3 | 3.22 | 2 |
| CCPO4 | 3.33 | 3 |
| CCPO5 | 4.22 | 10 |
| CCPO6 | 4.94 | 11 |
| CCPO7 | 3.89 | 9 |
| CCPO8 | 3.83 | 8 |
| CCPO9 | 3.67 | 7 |
| CCPO10 | 3.39 | 5 |
| CPO | 3.33 | 3 |
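The step from average rank to overall rank in Table 9 can be sketched as below. The tie handling is inferred from the table itself (CCPO4 and CPO share an average of 3.33 and both receive rank 3, after which the next rank is 5), so treating it as standard competition ranking is an assumption, as are the function names.

```python
def average_rank(ranks_per_function):
    """Mean rank of each algorithm over a set of benchmark functions.

    `ranks_per_function` maps an algorithm name to its list of per-function ranks.
    """
    return {name: sum(r) / len(r) for name, r in ranks_per_function.items()}


def overall_rank(average_ranks):
    """Competition ranking: 1 + the number of strictly smaller average ranks."""
    values = list(average_ranks.values())
    return {
        name: 1 + sum(v < avg for v in values)
        for name, avg in average_ranks.items()
    }


# Average ranks copied from Table 9.
TABLE_9_AVERAGE_RANKS = {
    "CCPO1": 3.50, "CCPO2": 2.83, "CCPO3": 3.22, "CCPO4": 3.33,
    "CCPO5": 4.22, "CCPO6": 4.94, "CCPO7": 3.89, "CCPO8": 3.83,
    "CCPO9": 3.67, "CCPO10": 3.39, "CPO": 3.33,
}

table_9_overall = overall_rank(TABLE_9_AVERAGE_RANKS)
```

Because the published averages are rounded to two decimals, ties detected this way reflect the rounded values rather than the underlying per-function ranks.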
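Table 10. The performance of each variant on $f_1$.

| $f_1$ | Mean | Std | Rank |
|---|---|---|---|
| CPO | $5.32 \times 10^{-36}$ | $1.66 \times 10^{-35}$ | 8 |
| CCPO | $3.98 \times 10^{-42}$ | $5.75 \times 10^{-42}$ | 7 |
| MCPO | $1.16 \times 10^{-61}$ | $1.82 \times 10^{-61}$ | 4 |
| TCPO | $7.20 \times 10^{-62}$ | $2.21 \times 10^{-61}$ | 3 |
| MTCPO | $1.26 \times 10^{-58}$ | $3.63 \times 10^{-58}$ | 6 |
| CMCPO | $1.90 \times 10^{-59}$ | $5.95 \times 10^{-59}$ | 5 |
| CTCPO | $5.24 \times 10^{-70}$ | $1.23 \times 10^{-69}$ | 2 |
| CMTCPO | $2.36 \times 10^{-70}$ | $6.26 \times 10^{-70}$ | 1 |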
Table 11. Average ranking of each variant.

| Name | Average rank | Overall rank |
|---|---|---|
| CPO | 4.22 | 8 |
| CCPO | 3.44 | 6 |
| MCPO | 2.83 | 4 |
| TCPO | 3.11 | 5 |
| MTCPO | 2.78 | 3 |
| CMCPO | 3.50 | 7 |
| CTCPO | 2.50 | 2 |
| CMTCPO | 1.78 | 1 |
Table 12. Calculation results for each algorithm.

| Optimization Algorithm | Average RMSE | Average Calculation Time | Average Iteration Time at Convergence |
|---|---|---|---|
| GA | 0.004610 | 3283.17 | 656.64 |
| PSO | 0.001912 | 1141.74 | 199.80 |
| BO | 0.002085 | 1562.12 | 390.53 |
| SSA | 0.02017 | 1399.54 | 349.75 |
| WOA | 0.02059 | 1144.46 | 417.73 |
| GWO | 0.001939 | 1291.48 | 219.55 |
| CPO | 0.001939 | 1082.25 | 129.87 |
| CMTCPO | 0.00003190 | 786.98 | 78.71 |
Table 13. Results after 100 calculations.

| Model | Average RMSE | Average Calculation Time | Average Iteration Time at Convergence |
|---|---|---|---|
| CMTCPO-BPNN | 0.0006572 | 1844.37 | 368.87 |
| CMTCPO-LSSVM | 0.0002267 | 1495.26 | 237.28 |
| CMTCPO-RF | 0.0004573 | 1386.95 | 298.19 |
| CMTCPO-XGBoost | 0.00003637 | 912.43 | 118.62 |
| CMTCPO-LightGBM | 0.0008529 | 823.56 | 135.89 |
| CMTCPO-HKELM | 0.00003190 | 786.98 | 78.71 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

MDPI and ACS Style

Yu, B.; Yan, D.; Wu, H.; Wang, J.; Chen, S. A Novel Prediction Model for the Sales Cycle of Second-Hand Houses Based on the Hybrid Kernel Extreme Learning Machine Optimized Using the Improved Crested Porcupine Optimizer. Buildings 2025, 15, 1200. https://doi.org/10.3390/buildings15071200
