Article

Stacked Ensemble Model for the Automatic Valuation of Residential Properties in South Korea: A Case Study on Jeju Island

1 School of Business, Konkuk University, Seoul 05029, Republic of Korea
2 School of Management and Economics, Handong Global University, Pohang 37554, Republic of Korea
* Author to whom correspondence should be addressed.
Land 2024, 13(9), 1436; https://doi.org/10.3390/land13091436
Submission received: 9 August 2024 / Revised: 30 August 2024 / Accepted: 2 September 2024 / Published: 5 September 2024
(This article belongs to the Section Land Innovations – Data and Machine Learning)

Abstract

While the use of machine learning (ML) in automated real estate valuation is growing, research on stacking ML models into ensembles remains limited. In this paper, we propose a stacked ensemble model for valuing residential properties. By applying our models to a comprehensive dataset of residential real estate transactions from Jeju Island, spanning 2012 to 2021, we demonstrate that the predictive power of ML-based models can be enhanced. Our findings indicate that the stacked ensemble model, which combines predictions using ridge regression, outperforms all individual algorithms across multiple metrics. This model not only minimizes prediction errors but also provides the most stable and consistent results, as evidenced by the lowest standard deviation in both absolute errors and absolute percentage errors. Additionally, we employed the decision tree method to analyze the conditions under which the model yields more accurate or less reliable results. We observed that both the size and age of an apartment complex significantly affect prediction performance, with smaller and older complexes exhibiting lower accuracy and higher error rates.

1. Introduction

Based on the standards of the International Association of Assessing Officers (IAAO), “Mass appraisal is the process of valuing a group of properties as of a given date using common data, standardized methods, and statistical testing” [1]. Mass appraisal is commonly employed by local governments for property taxation purposes, as it allows them to assess the value of numerous properties within their jurisdiction in a timely and cost-effective manner. The authors of [2] stated that “a large number of tax base assessments of real estate need to be carried out in a relatively short period of time. At the same time, this assessment should conform to the law of asset assessment. In practice, it is essential to adopt a mass appraisal model which fits a specific country’s real estate market structure as well as is adaptive to its change over time”. Furthermore, the Basel II Accord, introduced by the Basel Committee on Banking Supervision (BCBS) in 2008, mandates that banks regularly monitor the value of collateral, with at least one evaluation per year. In markets where conditions are highly volatile, more frequent assessments are recommended. The accord allows for the use of statistical evaluation methods, such as referencing house price indices or conducting sampling, to update estimates or identify collateral that may have depreciated in value and require re-appraisal. A qualified professional must reassess the property when there is evidence that the collateral’s value has significantly decreased relative to the broader market or when a credit event, such as default, occurs. This regulation has led to more frequent property appraisals, which has consequently increased the associated costs in terms of time and money. Therefore, there is a growing need for a reliable, accurate, and efficient appraisal tool, and the mass appraisal model presents a viable solution to meet this demand.

In this study, we introduce a stacked ensemble approach for assessing the value of residential properties. Utilizing an extensive dataset of residential real estate transactions from Jeju Island, South Korea, covering the period from 2012 to 2021, we show that the predictive accuracy of machine learning-based models can be significantly improved. We also utilize the decision tree method to examine the circumstances under which certain features produce more accurate predictions or lead to less reliable results.

1.1. Literature Review

The origins of mass appraisal practices for property tax assessments can be traced back to the 1920s in the United States [3]. In response to the increasing demand for systematic and standardized methodologies within this domain, the International Association of Assessing Officers (IAAO) was established in 1934. Since its inception, the IAAO has played a pivotal role in advancing the discipline of mass appraisal through comprehensive educational programs, rigorous academic research, and the development of professional standards [4]. In addition to the IAAO, other prominent institutions such as the Royal Institution of Chartered Surveyors (RICS), the International Valuation Standards Council (IVSC), and the Appraisal Foundation (AF) have significantly contributed to the theoretical foundations and standardization of real estate mass appraisal. These organizations, alongside academic scholars, have developed and refined the criteria, processes, and methods employed in mass appraisal, providing detailed guidance on practical applications [2].
With the advancement of computer-assisted mass appraisal (CAMA) systems, models and standards have increasingly incorporated automated valuation methodologies for mass appraisal [5]. According to a survey conducted by the IAAO, automated valuation models (AVMs) have demonstrated significant benefits in the assessment process, including improved accuracy, enhanced uniformity, greater equity, reduced costs, and increased efficiency [6]. Countries such as Australia, Sweden, Northern Ireland, New Zealand, Singapore, Malaysia, and the United States have implemented CAMA systems based on automated appraisal models [7].
A wide range of techniques has been employed to tackle the challenges associated with implementing AVMs in mass appraisal contexts [5]. The most conventional model for the mass appraisal problem is the hedonic pricing model based on the multiple linear regression technique. The model stems from Lancaster’s consumer theory [8], which emphasizes the role of the characteristics or attributes of goods in determining consumer preferences and prices. A product is assumed to be perceived as a set of characteristics or attributes that contribute to the satisfaction or utility consumers derive from it. When consumers purchase a product, they acquire the characteristics within the product that can provide utility. Thus, the consumer’s optimal purchase is the bundle of goods that provides the highest level of utility given the resource constraints. According to the theory, an observed market price reflects a variety of characteristics within the product. While we cannot observe the value of individual attributes directly, it is possible to construct a model to estimate the implicit (hedonic) price of each attribute. From this perspective, as mentioned in [9], “a house is a heterogeneous good embodying a package of inherent characteristics relevant to location, property attributes, and environmental amenities”. Various characteristics of a property, such as its age, the number of rooms, and accessibility to the public transportation system, can be considered when estimating the property’s price.
Based on this theory, hedonic pricing models have been employed in various studies to explore the relationship between real estate prices and property characteristics. In [2,9], structural attributes, neighborhood attributes, and locational attributes are considered the factors that determine the price of residential property. The structural characteristics encompass not only the type, area, age, and number of bedrooms and restrooms, but also features such as the various amenities available within the property, which are considered in numerous studies. The authors of [10,11] established that the price of a house is positively correlated with the number of rooms it contains, as well as its overall floor area. In [12], it was demonstrated that the age of a property can have a negative effect on its price. In [9], the heating system and floor level of a property are considered in constructing a hedonic model. Neighborhood attributes are defined as characteristics shared by the entire apartment complex, including features such as the apartment brand, the number of units and buildings in the complex, the parking lot, the floor area ratio, and the building coverage ratio. In addition, various researchers have examined how locational attributes, including proximity to central business districts (CBDs), public transportation systems, and retail stores, influence house prices [13,14,15].
The hedonic pricing models based on multiple linear regression methods are advantageous due to their simplicity in estimating and interpreting regression coefficients. In [2], it is mentioned that “because the target of mass appraisal is a large number of properties, and the valuation results need to be explained to the public, the basic needs are convenient operation and simple understanding”. Since the linear regression-based models assume that the effects of each attribute are distinct and constant, the coefficients of the model are easy to interpret, offering clear insights into how changes in the predictors are expected to impact the target variable (dependent variable). However, they oversimplify the complexity or non-linearity of the actual real estate market, which may result in reduced accuracy of the model. Regarding this, [9] mentioned that “if the housing market is organized into a series of sub-markets by housing size or income group, or if there is non-linearity in household preferences for an attribute, the predictor obtained from a single regression would fail to capture the complexities”.
Against this backdrop, machine learning-based appraisal models have emerged as a promising alternative, and research in this area has been increasingly active. Compared to traditional linear regression methods, machine learning techniques offer distinct advantages for property valuation. They excel at handling complex, non-linear relationships within the data, which linear regression often fails to capture. Moreover, machine learning algorithms can automatically detect and model interactions between features, thereby enhancing prediction accuracy. Numerous prior studies have demonstrated that, although machine learning-based appraisal models may exhibit lower interpretability compared to linear regression-based models, they consistently achieve higher accuracy. For instance, in studies [16,17,18], automated valuation models utilizing machine learning techniques were employed to estimate property values. In [9], the characteristics of a house price prediction model using the random forest (RF) algorithm were examined and its performance was compared to that of a traditional hedonic pricing model based on multiple linear regression. Like [9], the majority of studies on appraisal models based on machine learning techniques indicate that these models achieve greater accuracy compared to those based on linear regression models. For more comprehensive information, please refer to [2,19,20,21,22,23,24,25,26].
In the field of computer science, several studies have demonstrated that combining multiple models can yield more accurate predictions than using individual models alone, because a well-constructed ensemble can mitigate the errors associated with the unique characteristics of each model. Techniques for creating model ensembles have been proposed in various fields, including simple averaging [27], weighted averaging [28,29], and probabilistic aggregation [30]. The effectiveness of these combinations heavily depends on the selection of models and the methods used for their integration. While numerous studies have explored the use of ensemble models for prediction and classification tasks in other domains, there is a notable lack of research dedicated to developing automated valuation models that combine multiple predictors. In the field of property valuation, several studies have employed hybrid models that combine multiple prediction methods for appraisal. However, while some have utilized a hybrid of the hedonic and repeat-sales methods—leveraging repeat transactions when available and using hedonic information to control for differences in quality, as in the case of [31]—these approaches primarily focus on supplementing the repeat-sales method with hedonic models or excluding the impact of events like pandemics on the real estate market [32]. Few studies, however, have aimed to enhance the performance of prediction techniques by combining the predictive results of multiple methods. To the best of our knowledge, [33] is the only study that demonstrates an improved real estate appraisal model synthesizing multiple predictive models. In that paper, it is mentioned that “Although the combined method is used in regression and classification problems in various fields, to our best knowledge, it has not been used in real estate appraisal problems”. To construct the combined model, support vector regression, random forest, XGBoost, LightGBM, and CatBoost algorithms were employed as single predictors, with naïve averaging, weighted averaging, and a machine learning-based voting method applied for aggregation.

1.2. Contribution and Organization

Motivated by the aforementioned lack of results, this paper proposes a stacked ensemble model for valuing residential properties. While stacking is a widely utilized ensemble technique in machine learning that aims to enhance model performance by integrating multiple predictive models, research applying it to the mass appraisal problem is rare. Unlike other ensemble methods such as bagging and boosting, stacking involves training multiple base models (weak learners) on a dataset and then using their predictions as inputs for a meta-learner. This meta-learner attempts to optimize the combination of the base model predictions to achieve superior predictive accuracy. As base models, we employ random forest, gradient boosting, XGBoost, CatBoost, AdaBoost, k-nearest neighbors (k-NN), and decision tree algorithms. Ridge regression is used as the meta-learner.

By applying our models to the entire residential real estate transaction dataset from the period of 2012 to 2021 in Jeju Island, we demonstrate that the predictive power of ML-based models can be improved. The metrics used for performance comparison include the coefficient of determination (R2), mean absolute percentage error (MAPE), mean absolute error (MAE), coefficient of variation of the root mean squared error (CVRMSE), and root mean squared error (RMSE). The stacked ensemble model exhibited superior performance across all metrics, achieving the lowest RMSE of 23,041,799.68, MAE of 13,604,466.61, MAPE of 0.0840, and CVRMSE of 11.2157, along with the highest R2 of 0.9725. Furthermore, the stacked ensemble model demonstrated reduced variance in errors compared to other models, with both the maximum and standard deviation (STDEV) of the absolute errors (AEs) and absolute percentage errors (APEs) being the lowest. This suggests that the stacked ensemble model provides more stable and reliable predictions, effectively leveraging the strengths of individual algorithms for enhanced predictive accuracy.

We also employ decision tree techniques to explore patterns of error occurrence and their underlying factors, analyzing the conditions under which the model’s predictions are inaccurate. Investigating the constructed decision tree, we observed that both the size and age of an apartment complex influence prediction accuracy: smaller and older complexes tend to show lower accuracy and higher error rates, suggesting that these factors significantly impact the model’s performance. This analysis provides practical insights into when automated valuation models should be utilized.
The rest of our paper is organized as follows. In Section 2, we outline the techniques employed in this study. The individual machine learning techniques used in this study—decision tree, random forest (RF), gradient boosting, XGBoost, CatBoost, AdaBoost, and k-NN algorithms—are introduced, along with the method for combining these predictive models to create a stacked ensemble model. Section 3 describes our dataset and presents basic statistical information. The results of our analysis are provided in Section 4, and the final section offers a summary of our conclusions.

2. Methodology

In this section, the machine learning algorithms used as individual predictors are briefly introduced, along with the stacking method employed to aggregate these algorithms.

2.1. k-Nearest Neighbor Algorithm (k-NN)

The k-nearest neighbor (k-NN) algorithm is a non-parametric, instance-based learning method widely utilized for regression tasks. This algorithm operates on the premise that similar data points yield similar target values. The methodology of k-NN regression involves storing the entire training dataset, encompassing both feature vectors and their corresponding target values. To predict the target value for a new input data point, the algorithm calculates the distance between this new point and all points in the training dataset. Common distance metrics include Euclidean, Manhattan, and Minkowski distances. Once distances are calculated, the algorithm identifies the k-nearest neighbors to the new data point. Upon identifying the nearest neighbors, k-NN regression aggregates their target values to produce a final prediction. Typically, this aggregation involves computing the mean of the neighbors’ target values, although methods such as the median can be used to mitigate the influence of outliers. This approach allows the algorithm to make predictions based on the local structure of the data.
The primary advantages of k-NN regression include its simplicity and ease of implementation. While the prediction phase can be computationally expensive for large datasets due to the need to calculate distances to all training points, it does not require a computationally intensive training phase, as it is a lazy learning algorithm. Despite these computational challenges, the k-NN algorithm remains an effective and practical choice for regression tasks, providing accurate predictions by leveraging the similarity between data points. For studies on applying the algorithm to mass appraisal problems, please refer to [34,35].
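To make the mechanics concrete, a minimal k-NN regression sketch using scikit-learn is shown below; the synthetic feature matrix and price vector are placeholders rather than the Jeju dataset, and the hyperparameter values are illustrative only.

```python
# Minimal k-NN regression sketch (scikit-learn); the data here are
# synthetic placeholders, not the Jeju transaction records.
import numpy as np
from sklearn.neighbors import KNeighborsRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8  # placeholder features/prices

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

# k = 5 neighbors under the Euclidean metric (Minkowski with p = 2);
# the prediction is the mean of the neighbors' target values.
knn = KNeighborsRegressor(n_neighbors=5, metric="minkowski", p=2)
knn.fit(X_tr, y_tr)          # "lazy" learner: fit just stores the data
y_hat = knn.predict(X_te)    # distances are computed at prediction time
```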

2.2. Decision Tree Algorithm (Regression Tree)

The regression tree algorithm builds a binary tree where each decision node splits the dataset based on specific feature values to reduce the variance of the target variable within the resulting subsets. This recursive partitioning process continues until a stopping criterion, such as a minimum number of samples per leaf or a maximum tree depth, is reached. To make predictions, the algorithm traverses the tree from the root to a leaf node, with the final prediction being the mean target value of the samples within the leaf node.
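As an illustration, the splitting and prediction behavior described above can be sketched with scikit-learn as follows; the stopping criteria shown (maximum depth, minimum samples per leaf) are illustrative values, not the settings used in this study.

```python
# Regression tree sketch (scikit-learn) on synthetic placeholder data.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8

# Splits are chosen to reduce the variance of the target within the
# resulting subsets; growth stops at the criteria given below.
tree = DecisionTreeRegressor(max_depth=8, min_samples_leaf=20, random_state=0)
tree.fit(X, y)
pred = tree.predict(X[:5])   # mean target value of each reached leaf
```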
In [36], it is highlighted that decision trees and their ensemble models (random forest (RF), gradient boosting, XGBoost, CatBoost, and AdaBoost) are particularly suitable for real estate appraisal problems. Firstly, these methods effectively handle categorical variables with many levels. In contrast to multiple regression or neural networks, which require a large number of parameters to accommodate qualitative variables, leading to overfitting, tree-based methods manage these variables more efficiently. Secondly, they perform well with missing data. When data is missing for an observation, the prediction is made using the existing structure of the tree, eliminating the need to exclude observations or impute missing values. Thirdly, predictions for new observations fall within the range of existing observations, which helps prevent significant overestimation or underestimation of real estate values.
Our dataset includes all real estate transactions in Jeju Island, South Korea, from 2012 to 2021, totaling 63,306 transactions. Several of its variables are categorical (such as the heating system and structure), and if records with missing values are removed, only 12,967 transactions remain, resulting in a significant loss of information. Therefore, as noted in the literature, decision tree-based algorithms are appropriate for this analysis, and we employ them throughout this study.

2.3. Random Forest Algorithm

Random forest (RF) regression is an ensemble learning technique that enhances prediction accuracy by aggregating the outputs of multiple decision trees. This method was initially introduced by [37], who demonstrated that combining tree-based predictors using hyperplane splits can improve predictive performance and prevent overfitting. In RF regression, multiple decision trees are independently constructed using different bootstrapped samples of the original dataset. Each tree is grown to its maximum extent without pruning. Unlike traditional decision trees, which use the best possible split at each node based on all available features, RF trees use a random subset of features for each split. This randomness increases the diversity of the trees in the ensemble. After generating a substantial number of trees, the RF model makes predictions by averaging the outputs of all individual trees. In [38], it was noted that “this somewhat counterintuitive strategy turns out to perform very well compared to many other classifiers, including discriminant analysis, support vector machines, and neural networks, and is robust against overfitting”. The robustness and high predictive performance of the RF algorithm have led to its widespread application in real estate valuation [9,36]. The method effectively handles large numbers of input variables, copes well with missing data, and provides insights into variable importance.
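A compact sketch of this procedure, again with placeholder data and illustrative hyperparameters, might look as follows.

```python
# Random forest sketch (scikit-learn): bootstrapped trees with a random
# feature subset per split, predictions averaged across trees.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8  # placeholder data

rf = RandomForestRegressor(
    n_estimators=500,      # number of trees in the ensemble
    max_features="sqrt",   # random subset of features at each split
    bootstrap=True,        # each tree sees a bootstrapped sample
    random_state=0,
)
rf.fit(X, y)
print(rf.feature_importances_)  # built-in variable-importance scores
```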

2.4. Gradient Boosting Algorithm

Gradient boosting improves the predictive performance of models by sequentially combining the outputs of several weak learners. Initially introduced by [39], it builds an ensemble of models in an additive manner, where each new model aims to correct the errors made by the previous ones, thereby incrementally improving performance. The algorithm constructs new base learners that align closely with the negative gradient of the loss function, which measures how well the model fits the data. This iterative process continues until a predefined number of iterations is reached or the model’s performance stabilizes.
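For squared loss, the negative gradient is simply the residual, so the additive procedure can be sketched from scratch in a few lines; this is a didactic simplification with placeholder data, not the tuned implementation used in our experiments.

```python
# From-scratch gradient boosting sketch for squared loss: each new
# shallow tree is fit to the residuals (the negative gradient) of the
# current ensemble, and its prediction is added with a learning rate.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8  # placeholder data

F = np.full(len(y), y.mean())   # initial model: the mean of the target
trees, lr = [], 0.1             # learning rate shrinks each correction
for _ in range(100):
    residuals = y - F                                   # negative gradient
    t = DecisionTreeRegressor(max_depth=3).fit(X, residuals)
    F += lr * t.predict(X)                              # additive update
    trees.append(t)

def predict(X_new):
    """Sum the initial constant and all shrunken tree corrections."""
    return y.mean() + lr * sum(t.predict(X_new) for t in trees)
```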
A significant distinction of gradient boosting is its flexibility in handling various loss functions, which can be tailored to specific applications. For instance, in real estate valuation, the loss function typically gauges the difference between actual and predicted property prices. The algorithm employs a gradient descent procedure to minimize the loss function as new trees are added, ensuring that each successive tree effectively reduces the overall prediction error. This method enhances the model’s ability to capture complex, non-linear relationships within the data.
Gradient boosting offers several mathematical advantages, including the reduction of bias and variance through iterative refinement. The algorithm’s capacity to focus on the most challenging data points and its robust handling of outliers make it a powerful tool for regression tasks. By incrementally improving the model’s predictions, gradient boosting creates a strong predictive model that generalizes well to new data. These advantages have led to the development and expansion of various techniques based on the principles of gradient boosting, such as XGBoost, CatBoost, and AdaBoost, each incorporating enhancements to improve performance and scalability.

2.5. XGBoost Algorithm

XGBoost, short for extreme gradient boosting, is a highly efficient and scalable implementation of the gradient boosting framework introduced by [40]. This algorithm enhances traditional gradient boosting principles with several advancements, making it particularly effective for regression tasks. One of the primary improvements is the integration of advanced regularization techniques, specifically L1 (Lasso) and L2 (Ridge) regularization, which help prevent overfitting and improve model generalization by penalizing model complexity and encouraging simpler, more robust trees.
XGBoost also excels in handling missing data and sparse features. The algorithm automatically determines the best way to manage missing values, maintaining predictive accuracy without the need for explicit imputation. It employs a sparsity-aware approach that efficiently processes datasets with high proportions of missing or zero values. This capability is crucial for real-world applications where incomplete data is common and traditional gradient boosting methods might struggle.
Additionally, XGBoost introduces significant improvements in computational efficiency and scalability. It optimizes memory usage and training speed through parallel processing and advanced data structures like column blocks and cache-aware prefetching. These enhancements substantially reduce training time, enabling XGBoost to manage large-scale datasets more effectively. This efficiency supports rapid experimentation and model iteration, which is essential for dynamic fields such as real estate valuation. For a comprehensive understanding of XGBoost, refer to the seminal paper by [40], which details its theoretical foundations and practical applications.
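The following sketch shows how the regularization and native missing-value handling discussed above surface in the XGBoost Python API; the parameter values are illustrative assumptions.

```python
# XGBoost sketch: reg_alpha / reg_lambda are the L1 / L2 penalties, and
# NaN entries are routed by the sparsity-aware split finding natively.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8
X[rng.random(X.shape) < 0.1] = np.nan   # simulate missing entries

model = xgb.XGBRegressor(
    n_estimators=500,
    learning_rate=0.05,
    reg_alpha=0.1,     # L1 (Lasso) regularization
    reg_lambda=1.0,    # L2 (Ridge) regularization
)
model.fit(X, y)        # no explicit imputation step required
```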

2.6. CatBoost Algorithm

CatBoost is an advanced implementation of the gradient boosting algorithm developed by [41], designed specifically to handle categorical data more effectively. CatBoost incorporates several key advancements that distinguish it from traditional gradient boosting methods, particularly in its approach to processing categorical features and preventing overfitting. Traditional gradient boosting methods often struggle with categorical data, necessitating transformation into numerical formats, which can increase dimensionality and potentially lead to information loss. CatBoost addresses this by employing a technique called “ordered target statistics”, which converts categorical features into numerical representations using statistical averages. This method preserves the natural ordering and relationships within the data, maintaining the integrity of the original information.
The algorithm also utilizes a unique approach called ordered boosting to combat overfitting and enhance model generalization. In standard gradient boosting, each new model is trained on the residuals of the entire training set, which can lead to overfitting, especially with smaller datasets. CatBoost mitigates this issue by building each tree using a permutation of the dataset, ensuring that each model learns only from past observations rather than future ones. This approach maintains the causal structure of the data and produces more robust models, reducing the risk of overfitting.
Additionally, CatBoost incorporates a symmetric tree structure, where splits at each level are determined simultaneously rather than sequentially. This symmetry simplifies the tree-building process and accelerates both training and prediction phases. Optimized for efficient computation on both CPUs and GPUs, CatBoost can scale effectively and handle large datasets, making it suitable for various applications.
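In practice, the categorical handling described above is exposed through the cat_features argument of the CatBoost API, as in the toy sketch below; the column names and rows are invented for illustration.

```python
# CatBoost sketch: cat_features marks categorical columns so they are
# encoded internally via ordered target statistics (no one-hot needed).
import pandas as pd
from catboost import CatBoostRegressor

df = pd.DataFrame({            # toy rows, not the study's data
    "net_area": [59.9, 84.9, 84.9, 114.7],
    "heating":  ["individual", "central", "individual", "district"],
    "price":    [2.1e8, 3.4e8, 3.2e8, 5.0e8],
})

model = CatBoostRegressor(iterations=300, depth=6, verbose=False)
model.fit(df[["net_area", "heating"]], df["price"],
          cat_features=["heating"])
```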

2.7. AdaBoost Algorithm

AdaBoost, short for adaptive boosting, is an ensemble learning algorithm developed by [42]. Although it shares foundational principles with gradient boosting, such as combining weak learners to form a strong learner, AdaBoost introduces unique mechanisms that set it apart. It is particularly effective for both classification and regression tasks and is known for its simplicity and robustness.
One of the primary distinctions of AdaBoost is its adaptive reweighting mechanism. In AdaBoost, simple decision trees are often used as weak learners. Initially, all training instances are assigned equal weights. After each weak learner is trained, AdaBoost increases the weights of incorrectly predicted instances, forcing subsequent learners to focus more on these difficult cases. This differs from gradient boosting, where the focus is on minimizing the residual errors of previous models through gradient descent.
Another significant feature of AdaBoost is how it forms its final model. AdaBoost does not rely on gradient descent to minimize a loss function. Instead, it aggregates the predictions from all weak learners by taking a weighted sum, where the weights are determined by the accuracy of each learner. This method ensures that the final model is a robust combination of all learners, each contributing based on its performance. While this approach effectively reduces both bias and variance, it can be sensitive to noisy data and outliers, as the adaptive weighting may amplify the influence of these instances. Despite its sensitivity to noise, AdaBoost remains a popular choice due to its effectiveness and straightforward implementation.
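A brief sketch of this setup with scikit-learn follows; shallow regression trees serve as the weak learners, and all values are illustrative (scikit-learn's regressor implements the AdaBoost.R2 variant).

```python
# AdaBoost sketch (scikit-learn's AdaBoost.R2): instance weights are
# increased on poorly predicted samples after every boosting round.
import numpy as np
from sklearn.ensemble import AdaBoostRegressor
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8  # placeholder data

ada = AdaBoostRegressor(
    estimator=DecisionTreeRegressor(max_depth=4),  # weak learner
    n_estimators=200,
    learning_rate=0.5,
    random_state=0,
)
ada.fit(X, y)
```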

2.8. Stacked Ensemble Model

Stacking, or stacked generalization, is an advanced ensemble learning technique introduced by Wolpert in 1992 [43]. It aims to improve predictive performance by combining multiple base models, leveraging their individual strengths through a meta-learner to produce a more accurate and robust final prediction. This approach is particularly effective for complex regression tasks where capturing diverse patterns and relationships in the data is crucial.
The stacking process begins with training several base models, often referred to as level 0 models, on the same dataset. In our study, the base models correspond to the algorithms mentioned above. Each base model generates predictions, which are then used as inputs for the meta-learner, also known as the level 1 model. The meta-learner is typically a simple model, such as linear regression, which is trained on the outputs of the base models to make the final prediction. Mathematically, let $h_1, h_2, \ldots, h_M$ represent the set of base models, where $M$ denotes the number of base models and $X$ denotes the set of input features. Each base model $h_i$ produces a prediction $h_i(X)$. The meta-learner $H$ takes these predictions as input and produces the final prediction. Formally, the stacked model can be represented as:

$$\hat{y} = H\left(h_1(X), h_2(X), \ldots, h_M(X)\right).$$
The meta-learner is trained to minimize the prediction error on the validation or training set, optimizing the combination of base models to reduce overall prediction error. This differentiates stacking from other ensemble methods like bagging, which reduces variance by averaging predictions from multiple resampled datasets, and boosting, which sequentially builds models to correct residual errors from previous ones. Stacking’s ability to integrate diverse models is crucial for its effectiveness as it allows the meta-learner to exploit the strengths of each base model while mitigating their individual weaknesses.
The primary advantage of stacking is its flexibility to combine heterogeneous models. By allowing various algorithms to serve as base models, stacking can capture a broader range of patterns and relationships within the data, leveraging the strengths of different algorithms. This diversity helps improve the overall generalization capability of the ensemble model. Additionally, stacking can help mitigate overfitting. While individual models may overfit specific parts of the data, the meta-learner in stacking can learn to down-weight the influence of these overfitted models, resulting in a more robust final prediction. This is particularly beneficial when dealing with complex and noisy datasets, as the meta-learner can effectively balance the contributions of each base model. From a practical standpoint, stacking simplifies the model selection process for practitioners. The combined model presented in this study evaluates the performance of individual algorithms based on the features of the data. When estimating the price of real estate, the model selects and applies the most appropriate techniques based on the learned performance of these algorithms, considering the specific characteristics of the property. This approach eliminates the need to search for a single best predictive model, as the stacking model assigns higher weights to the most suitable models. Models that perform poorly on specific data are given low weights or excluded, ensuring they have a minimal effect on the overall predictive performance. This strategy allows practitioners to confidently utilize various machine learning models with reduced concern about selecting the optimal one. Furthermore, because stacking combines various machine learning techniques, it can be effectively applied not only to the real estate market in a specific region but also to property valuations in other regions. This adaptability makes it a versatile tool in real estate appraisal, ensuring reliable predictions across diverse geographic locations.
As mentioned earlier, while the stacked model can automatically identify the most suitable model regardless of the base algorithms used, this study primarily employs decision tree models as the base algorithms. Decision tree algorithms and their ensemble models, such as random forest (RF), gradient boosting, XGBoost, CatBoost, and AdaBoost, are particularly well-suited for real estate valuation tasks, as discussed in [36]. Tree-based algorithms efficiently manage categorical variables with numerous levels. Unlike multiple regression or neural networks, which may require complex parameterization to incorporate qualitative data—thereby increasing the risk of overfitting—tree-based models naturally accommodate these variables through data splitting, avoiding excessive complexity. This feature-based data splitting can also be advantageous for accurately evaluating property values by effectively considering the location of real estate. Note that the latitude and longitude of properties are used as features in this study. Other algorithms typically represent a property’s locational characteristics through factors such as accessibility to public transportation or proximity to elementary schools, assuming a linear (or directly proportional or inversely proportional) relationship with property value. However, when observing the actual real estate market in Korea, property values may be influenced non-linearly by variables representing specific locational characteristics. In such cases, decision tree techniques, which are based on data splitting, can identify specific grids with different price distributions based on latitude and longitude, allowing for a more accurate estimation of property values. Since the ensemble models used in this study incorporate decision trees, they are particularly effective at capturing the non-linear effects of location on property value.

In addition, the models handle missing data effectively by using the existing tree structure to make predictions even when some information is missing, eliminating the need to discard observations or impute missing values. Our dataset comprises 63,306 real estate transactions, but removing records with missing data reduces this number to 12,967 transactions, resulting in substantial information loss. Thus, this approach preserves the integrity of the analysis and ensures that all available data is utilized.
We utilize linear regression, specifically ridge regression, as the meta-learner in our study. Given that the predictions from our base models serve as inputs, the meta-learner employs these seven predictions to generate the final output. This approach is akin to training a model on a dataset comprising seven features, indicating that algorithms designed for managing complex structures, such as neural networks, may be less suitable for this specific context. Linear regression is advantageous due to its computational efficiency and simplicity of implementation, making it particularly effective for large datasets and facilitating rapid model deployment. By assigning appropriate weights to each base model, linear regression effectively synthesizes their strengths, ensuring that contributions from models excelling in different scenarios are balanced to enhance overall performance. Additionally, in cases where all individual algorithms exhibit similar biases, the aggregation process via linear regression can adjust for these biases through the intercept term. The linear combination of predictions from multiple models also helps to mitigate overfitting, as linear regression introduces minimal complexity and typically generalizes more effectively compared to more intricate aggregation methods. Moreover, ridge regression is particularly well-suited when independent variables are highly correlated. Given the potential for correlation among the predictions generated by different algorithms, employing ridge regression as the meta-learner is an appropriate choice. This method helps to stabilize the model by effectively managing the multicollinearity among the input predictions, ensuring a more robust and reliable final output.
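A minimal sketch of this architecture, assuming fully numeric, imputed features and default hyperparameters rather than the tuned configurations reported later, can be written with scikit-learn's StackingRegressor; out-of-fold predictions from the seven base learners feed the ridge meta-learner.

```python
# Stacked ensemble sketch: seven base learners, ridge meta-learner.
# Out-of-fold base predictions (cv=5) train the meta-learner, so it is
# not fit on predictions the base models made for their own training data.
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor, StackingRegressor)
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor
from xgboost import XGBRegressor
from catboost import CatBoostRegressor

base_models = [
    ("rf",   RandomForestRegressor(n_estimators=300, random_state=0)),
    ("gb",   GradientBoostingRegressor(random_state=0)),
    ("xgb",  XGBRegressor(n_estimators=300)),
    ("cat",  CatBoostRegressor(iterations=300, verbose=False)),
    ("ada",  AdaBoostRegressor(random_state=0)),
    ("knn",  KNeighborsRegressor(n_neighbors=5)),
    ("tree", DecisionTreeRegressor(max_depth=10, random_state=0)),
]

stack = StackingRegressor(estimators=base_models,
                          final_estimator=Ridge(alpha=1.0),  # meta-learner
                          cv=5)

rng = np.random.default_rng(0)
X, y = rng.random((2000, 8)), rng.random(2000) * 1e8  # placeholder data
stack.fit(X, y)
y_hat = stack.predict(X)
```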

3. Data Set and Basic Statistics

Jeju Island, also known as Jeju-do, is the largest island off the coast of the Korean Peninsula and functions as a self-governing province of South Korea. With a population of approximately 700,000 and an area of 1846 square kilometers, Jeju Island is notable for its unique geographical and cultural features (Figure 1). While South Korea is known for its competitive real estate market, property prices on Jeju Island have risen significantly due to its growing popularity as both a tourist destination and a residential area. The average housing price in Jeju has increased substantially, driven by investments from both local and foreign sources. In recent years, real estate on Jeju Island has become highly desirable, particularly for its scenic beauty and relatively relaxed living conditions compared to the mainland.
Over the past decade, Jeju Island has experienced a significant number of real estate transactions, reflecting its appeal for both tourism and permanent residence. We collected 63,306 samples from apartment transaction records on Jeju Island from 2012 to 2021, provided by South Korea’s Ministry of Land, Infrastructure, and Transport (MOLIT). This dataset covers all transactions on Jeju Island during the period, offering a comprehensive overview of market trends and property values. Note that the objective of our study is to develop an automated valuation model for residential properties using the stacking method. Thus, our target variable is the price of the properties, and basic information about the features necessary for estimating the value is summarized in Table 1 and Table 2.
The features used to estimate apartment prices are categorized into three groups based on their characteristics. This classification follows the framework presented in [33]. The attributes of a property include its inherent characteristics, such as the floor area, the property’s age, and various other factors. Previous research has shown a positive correlation between property prices and attributes like the floor area [10]. Ref. [12] noted that a property’s age, or the number of years since its construction, can negatively impact its value. In our study, we utilize variables such as the occupancy approval date and transaction date to consider the property’s age. Additionally, the net floor area refers to the actual usable space within the apartment, whereas the gross floor area includes shared facilities such as gyms and swimming pools. In South Korean apartments, a significantly larger gross floor area compared to the net floor area indicates a higher likelihood that the apartment complex includes a variety of amenities. Both variables are considered in this study. The land share of an apartment represents the actual land area owned by each unit. As buildings age and their value depreciates over time, the importance of the land share increases. In the context of redevelopment or reconstruction, a larger land share can lead to higher compensation and an increase in the per-square-meter price. Additionally, variables such as floor level, heating system, the direction of the living room windows, and the presence of additional balconies, duplex layout, private gardens, and attics are used to construct the model as property attributes.
Apartment attributes refer to the characteristics that are common across all units within an apartment complex. For example, two properties within the same apartment complex may differ in size or the presence of an additional balcony (property attributes), but variables such as the total number of parking spots in the complex will have the same value for all properties within that complex. In South Korea, multi-family housing is categorized into three types based on the number of floors and the total floor area. Multiplex housing refers to buildings with four or fewer floors, whereas those with five or more floors are classified as apartments. Multiplex housing with a total floor area exceeding 660 square meters is referred to as “yeonlip” (Type 1), while those with a total floor area of 660 square meters or less are called “dasedae” (Type 2). Additionally, the construction method of the apartment is considered in the structure variable. To reflect the convenience of the apartment’s parking facilities, we use the total number of parking spots as a variable. Generally, apartment complexes with a larger number of households tend to have higher prices. To account for this, we included the number of buildings and households in the complex as variables. The floor area ratio (FAR) is determined by dividing the total gross floor area by the land area, while the building coverage ratio (BCR) is calculated by dividing the building area by the land area.
Locational attributes pertain to the characteristics associated with a property’s geographic location. According to references [9,44], these attributes encompass all external factors linked to a house’s location, including accessibility to the central business district (CBD), public transportation systems, schools, and proximity to environmental amenities. To integrate geographic location into our models, we use the property’s latitude and longitude coordinates and the nearest school information. Note that the nearest elementary, middle, and high schools are treated as categorical variables. In South Korea, children are assigned to schools based on the proximity to their residence. Given the high emphasis on children’s education in Korea, the specific school to which a property is zoned is crucial and can significantly impact the property’s value.

Zoning districts represent the most fundamental classification of land use, ensuring that all land across the nation is designated without overlap. These zones regulate land use to ensure its alignment with specific purposes and apply various restrictions based on the area’s characteristics. According to Article 6 of the National Land Planning and Utilization Act, land in South Korea is categorized into four types of zoning districts: urban areas, which require systematic development and management due to high or anticipated population and industrial density; control areas, which need urban-level management or conservation similar to agricultural and forest zones; agricultural and forest areas, designated for the promotion and conservation of agriculture and forestry outside urban zones; and natural environment conservation areas, essential for conserving natural environments, water resources, ecosystems, and cultural properties. Each category is further subdivided, with specific restrictions applied based on the detailed zoning designations. These subdivisions impose various limitations and regulations to ensure appropriate and efficient land use, tailored to the specific needs and characteristics of each zone. For example, the type 1 exclusive residential zone is restricted to a building coverage ratio of 50% and a floor area ratio of 100% or less. The types of buildings permitted in this zone are limited to single-family homes (excluding multi-family homes) and certain neighborhood facilities, such as community centers, communal workshops, and supermarkets. For specific restrictions related to detailed subdivisions, please refer to [45].
The basic statistics for the numerical variables in the dataset are presented in Table 2. Most variables have no missing values; however, approximately 25% of the data for apartment parking spaces is missing. There are also significant missing values for categorical variables, with about 70% of the data for the heating system being incomplete. Removing records with missing values would result in retaining only 13,065 out of 63,306 transactions, leading to a substantial loss of data. Therefore, instead of removing records with missing values, we utilize tree-based algorithms to handle the missing data. As mentioned earlier, tree-based algorithms are particularly well-suited for analyzing datasets with missing values. These algorithms can naturally handle missing data by using surrogate splits, which allow them to make the best possible decision even when some data is missing. This feature ensures that the integrity of the analysis is maintained without discarding a significant portion of the data.

4. Results and Post Analysis

In this section, we compare the performance of the individual algorithm predictors (i.e., k-NN, decision tree, random forest, gradient boosting, XGBoost, CatBoost, and AdaBoost) with that of the stacked ensemble model. To construct the ensemble model, we utilized the aforementioned algorithms as the base learners and employed ridge regression as the meta-learner. Linear regression, being computationally efficient and straightforward to implement, is advantageous for large datasets and facilitates rapid model deployment. By assigning appropriate weights to each base model, it effectively amalgamates their strengths, ensuring that if one model performs well in certain contexts and another excels in different scenarios, their contributions are balanced to enhance overall performance. Moreover, if all individual algorithms exhibit similarly biased predictions, the aggregation process can adjust for these biases through the intercept term. Additionally, the linear combination of predictions from multiple models helps mitigate overfitting, as linear regression introduces minimal complexity and typically generalizes better than more complex aggregation methods.
The performance comparison was conducted using 5-fold cross-validation to ensure a robust and reliable evaluation after optimizing the hyperparameters of the individual algorithms. In 5-fold cross-validation, the dataset is divided into five equal parts. Each part is used as a test set once, while the remaining four parts serve as the training set. This process is repeated five times, with each fold being used as the test set exactly once, and the results from each fold are averaged to provide a comprehensive assessment of the model’s performance. During this process, we fine-tuned the hyperparameters of each algorithm to identify the configuration that yielded the best performance. The algorithms with the highest performance were then selected to construct the ensemble model, leveraging their strengths to further enhance predictive accuracy. To compare the individual algorithms with the stacked ensemble model, results for several performance metrics were derived: the root mean squared error (RMSE), mean absolute error (MAE), mean absolute percentage error (MAPE), coefficient of determination ($R^2$), and coefficient of variation of the root mean squared error (CVRMSE). Let $n$ denote the number of samples in the test set, $y_p$ the actual value of the $p$-th sample, $\hat{y}_p$ the value predicted by the trained algorithm, and $\bar{y}$ the mean of $y_p$. The performance metrics are defined as follows:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{p=1}^{n}\left(\hat{y}_p - y_p\right)^2}{n}},$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{p=1}^{n}\left|\hat{y}_p - y_p\right|,$$

$$\mathrm{MAPE} = \frac{100}{n}\sum_{p=1}^{n}\left|\frac{\hat{y}_p - y_p}{y_p}\right|,$$

$$R^2 = 1 - \frac{\sum_{p=1}^{n}\left(y_p - \hat{y}_p\right)^2}{\sum_{p=1}^{n}\left(y_p - \bar{y}\right)^2},$$

$$\mathrm{CVRMSE} = \frac{\sqrt{\sum_{p=1}^{n}\left(\hat{y}_p - y_p\right)^2 / n}}{\bar{y}} \times 100.$$
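The five metrics can be implemented directly from these definitions, as in the short numpy sketch below.

```python
# Direct numpy implementations of the metrics defined above.
import numpy as np

def rmse(y, y_hat):
    return np.sqrt(np.mean((y_hat - y) ** 2))

def mae(y, y_hat):
    return np.mean(np.abs(y_hat - y))

def mape(y, y_hat):
    return 100 / len(y) * np.sum(np.abs((y_hat - y) / y))

def r2(y, y_hat):
    return 1 - np.sum((y - y_hat) ** 2) / np.sum((y - np.mean(y)) ** 2)

def cvrmse(y, y_hat):
    return rmse(y, y_hat) / np.mean(y) * 100
```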
The results for each performance metric are presented in Table 3.
The stacked ensemble model demonstrated the best performance across all metrics, achieving the lowest RMSE of 23,041,799.68, MAE of 13,604,466.61, MAPE of 0.0840, and CVRMSE of 11.2157, as well as the highest R2 of 0.9725. This indicates the smallest average error, highest precision, and best relative performance. XGBoost and AdaBoost also showed strong performance, with XGBoost achieving an RMSE of 24,175,049.55, MAE of 14,447,068.81, MAPE of 0.0894, R2 of 0.9697, and CVRMSE of 11.7673. AdaBoost followed closely with an RMSE of 25,515,035.69, MAE of 13,696,174.92, MAPE of 0.0847, R2 of 0.9663, and CVRMSE of 12.4195. In contrast, k-NN and decision tree exhibited higher error metrics and lower R2 values, indicating less accurate predictions. Specifically, k-NN had an RMSE of 60,906,942.70, MAE of 33,667,069.83, MAPE of 0.2029, R2 of 0.8079, and CVRMSE of 29.6468, while decision tree had an RMSE of 58,479,005.01, MAE of 34,265,276.34, MAPE of 0.2058, R2 of 0.8230, and CVRMSE of 28.4649. The results suggest that the stacked ensemble model effectively combines the strengths of individual algorithms, leading to superior predictive performance. Figure 2 shows the scatter plot between the actual transaction prices and the predicted values.
Additionally, the variance of errors in the stacked ensemble model is smaller than in the other models. The maximum and standard deviation (STDEV) of both the absolute errors (AEs) and the absolute percentage errors (APEs) for each technique are presented in Table 4. In the most extreme case (the maximum APE), the k-NN algorithm’s absolute percentage error is 11.6407, indicating that the prediction error is more than 11 times the actual transaction price. In contrast, the stacked ensemble model has the smallest maximum absolute percentage error of 3.8315 among all the algorithms. The standard deviations of both the absolute error and the absolute percentage error are also the lowest among the algorithms, implying that the stacked ensemble model provides more stable and reliable predictions.
Table 5, Table 6, Table 7, Table 8 and Table 9 report the probability that the model corresponding to the row has a higher score than the model corresponding to the column. The interpretation of a higher score varies depending on the metric; for metrics such as R2, a higher score indicates a better model, while for the error metrics, a higher score indicates a worse model. When a region of negligible difference is considered, the smaller number below each probability indicates the probability that the difference between the pair is negligible. This test is based on the Bayesian interpretation of the t-test. The results indicate that, for most performance measures, the stacked ensemble model is likely to outperform the other models.
To evaluate the contribution of each feature to the predictions of the stacked model, a sensitivity analysis can be conducted [46]. Here, we utilized the permutation feature importance technique. In this method, the values of a specific feature are permuted, disrupting the relationship between that feature and the target variable. After the permutation, the change in the model’s performance metric is measured: a significant decrease in performance indicates that the feature was important to the model’s predictions, while a minimal change suggests that the feature had little influence. Figure 3 presents the results of the feature importance analysis. The left side shows how the model’s R2 value changes when each variable is permuted, while the right side illustrates the changes in the mean absolute percentage error (MAPE). The results consistently indicate that the gross floor area, approval date, transaction date, net floor area, number of buildings in a complex, and number of households in a complex are the most important variables, in that order. This may be because, in the case of Korean real estate, many properties tend to have similar external appearances and internal layouts; once the area (gross and net floor area) and construction year are determined, the appearance and structure of the property and complex are likely to be similar. The transaction date likely reflects the economic conditions or real estate policies of that particular year. Variables such as latitude and longitude, which reflect locational value, follow in the order of importance.
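A sketch of this procedure using scikit-learn's permutation_importance helper is shown below; the model and data are placeholders standing in for the fitted stacked ensemble and the transaction features.

```python
# Permutation feature importance sketch: shuffle one column at a time
# and measure the resulting drop in the chosen score (here, R^2).
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
X, y = rng.random((1000, 8)), rng.random(1000) * 1e8  # placeholder data

model = RandomForestRegressor(random_state=0).fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10,
                                scoring="r2", random_state=0)
print(result.importances_mean)  # mean R^2 drop per permuted feature
```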
From a practical standpoint, understanding the conditions and factors that lead to inaccuracies is crucial for improving model reliability and performance. By identifying these sources of error, practitioners can make better decisions by determining when the models are appropriate to use. Figure 4 is a scatter diagram illustrating the percentage error by variable. Note that the percentage error (PE) is defined as the difference between the actual transaction price and the predicted value, divided by the actual transaction price. The scatter diagrams yield several insights regarding percentage errors.

Firstly, the errors show a similar distribution of overestimation and underestimation. Out of a total of 63,306 transactions, 33,015 cases had a percentage error of 0 or higher (underestimation), while 30,291 cases had a percentage error of less than 0 (overestimation). However, extreme predictions tend to be overestimations. In cases where the prediction differs from the actual transaction by more than 100%, no underestimations were observed (no cases with PE > 1), whereas there were 190 cases of overestimation (PE < −1).

Secondly, the percentage error varies depending on the feature. The top-left scatter diagram in Figure 4 depicts the percentage error according to the number of households. As shown, extreme predictions tend to be overestimations and are more likely to occur when the number of households in a complex is small. This can be interpreted in several ways. When the number of households is small, the frequency of transactions is generally low, leading to insufficient data for accurate model training. Additionally, small complexes may include both luxury villas and low-cost housing, so diverse property characteristics may be grouped into the same node after decision tree splitting, which in turn may lower prediction accuracy. Floor level can be interpreted similarly: high-rise apartments typically consist of many units, so properties located on higher floors are likely to be in complexes with a large number of households. Therefore, when appraising properties in apartment complexes with a small number of households, it is advisable to exercise caution in using the model for valuation. Conversely, when estimating the value of properties in complexes with a sufficient number of households, the model provides relatively stable performance. This information is useful in determining the appropriate conditions for using the model, such as evaluating the value of specific types of properties. The right-hand diagram shows that the percentage error is relatively higher for low-cost, small properties, indicating that the appraisal model is better suited to predicting properties above a certain size. Additionally, since the actual transaction price is used in the denominator of the percentage error calculation, the same absolute error has a greater impact on small, low-cost apartments.
To address the aforementioned issue, we used the decision tree method to analyze the conditions under which the model produces more accurate or less reliable results. The decision tree technique was particularly useful in this context because it allows us to explicitly observe the patterns where errors occur, rather than treating the model as a black box. We constructed a decision tree to classify cases where the absolute percentage error exceeds a certain threshold. For example, if the absolute percentage error is greater than 20%, the category variable is set to 1; otherwise, it is set to 0. The features used for this classification were the same as those used in the appraisal model, with the category variable serving as the outcome variable. The decision tree used to classify cases where the absolute percentage error is 20% or higher is presented in Figure 5. For ease of interpretation, the depth of the decision tree was limited to three.
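A minimal sketch of this error-classification procedure is given below. Here, X is a placeholder for a numerically encoded version of the appraisal model's feature matrix, and y_true and y_pred are placeholders for actual prices and predictions; names and settings are illustrative.

```python
# A sketch of the error-classification tree. `X` is a placeholder for the
# numerically encoded feature matrix used by the appraisal model, and
# `y_true`, `y_pred` are placeholders for actual prices and predictions.
import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Label each transaction: 1 if the absolute percentage error exceeds 20%.
ape = np.abs((y_true - y_pred) / y_true)
error_label = (ape > 0.20).astype(int)

# Depth is limited to three for interpretability, as in Figure 5; using a
# 0.50 threshold instead reproduces the Figure 6 analysis.
error_tree = DecisionTreeClassifier(max_depth=3, random_state=42)
error_tree.fit(X, error_label)

# Print the learned splits to see which conditions concentrate large errors.
print(export_text(error_tree, feature_names=list(X.columns)))
```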
Out of the total 63,306 transactions, 5613 (8.9%, the top node) have a prediction error of 20% or higher. However, the occurrence of errors varies depending on whether the apartment has an elevator. Where an elevator is present (right node), 2115 out of 41,735 transactions (5.1%) have errors of 20% or higher; for apartments without an elevator, the figure is 3498 out of 21,571 transactions (16.2%). Following the leftmost node down the decision tree reveals the conditions under which errors are most pronounced: when there is no elevator and the apartment building has two or fewer units, 2740 out of 13,651 transactions (20.1%) exhibit errors of 20% or higher, considerably above the overall rate of 8.9%. An interesting aspect of the decision tree analysis is that the ranking of features identified as important by the permutation feature importance method does not perfectly align with the features appearing at the top nodes of the tree. For instance, neither gross floor area nor net floor area appears in Figure 5, which suggests that while the size of a property is important for price prediction, it does not contribute significantly to prediction errors. Since gross floor area and net floor area have no missing values, this indicates that the model performs well across properties of all sizes. The decision tree does reveal, however, that the size of the apartment complex (number of units or buildings) and the age of the building significantly affect predictive performance. Variables such as the number of apartment buildings in a complex, the number of parking spaces, and the number of households in a complex all relate to the size of the complex, and the smaller the complex, the lower the prediction accuracy: at each branching point, the segments corresponding to smaller complexes show a higher proportion of errors exceeding 20%. Likewise, the older the approval date, the lower the predictive accuracy. Interestingly, the presence of an elevator encompasses both attributes: in complexes without elevators, the absence may be due to the buildings being old or the complex being small (comprising low-rise buildings). This decision tree analysis thus provides valuable insight into when the appraisal model may yield higher errors, thereby informing better usage of the model. Figure 6 presents the decision tree used to classify cases with an absolute percentage error of 50% or higher. As in the previous example, we can analyze the tree to identify the feature conditions that lead to large percentage errors. Out of the total 63,306 transactions, 1121 (1.8%, the top node) have a prediction error of 50% or higher. For properties with an approval date prior to 21 August 1970, 978 out of 29,036 transactions (3.4%) exhibited an error exceeding 50%; in contrast, for properties approved after that date, only 0.4% of transactions (143 out of 34,270) did. This indicates that properties with more recent approval dates tend to have relatively smaller prediction errors.
Following the left node of the decision tree, for properties in apartment complexes with fewer than 90 units, 738 out of 16,072 transactions (4.6%) had an absolute percentage error exceeding 50%. Conversely, following the right node, for properties approved after 21 August 1970 and equipped with elevators, only 94 out of 30,789 transactions (0.3%) had an absolute percentage error exceeding 50%.
This decision tree model provides valuable insight into when predictive techniques can be reliably applied. While efforts to improve prediction accuracy are always warranted, from a practical perspective it may be difficult to trust predictive techniques when their accuracy is low under certain conditions; in such cases, additional analysis through on-site inspections by experts or alternative methods may be needed. Conversely, where the predictions are sufficiently reliable under specific conditions, an automated valuation model can be applied. For example, among properties approved after 21 August 1970, with elevators and more than 19 units, only 53 out of 23,981 transactions (0.2%) exhibited a percentage prediction error exceeding 50%. Under these conditions, an automated valuation model can reasonably be considered trustworthy.
This approach allows practitioners to identify scenarios in which an automated model can be trusted, streamlining the appraisal process and reducing the need for costly manual assessments. Nevertheless, it is important to recognize the limitations of automated models, particularly where data are scarce or where complex, non-quantifiable factors play a significant role in property valuation. Our decision tree model shows that significant errors tend to occur under conditions of limited data. For instance, in the lower nodes of Figure 6, the number of transactions associated with certain conditions is small, resulting in larger errors. This suggests that insufficient data may hinder the model's ability to learn effectively, reducing prediction accuracy. In such cases, automated models may need to be supplemented with expert judgment or alternative methodologies to ensure a comprehensive and accurate assessment.

5. Conclusions

Based on the analysis presented in this study, it can be concluded that applying a stacked ensemble model significantly enhances the predictive accuracy of automated real estate valuation models. The research demonstrates that by integrating various machine learning algorithms through the stacking approach, with ridge regression as the meta-learner, it is possible to achieve superior performance across multiple metrics, including RMSE, MAE, MAPE, R2, and CVRMSE. The ensemble model not only minimizes prediction errors but also ensures greater stability and reliability, as evidenced by the lower variance in both absolute errors and absolute percentage errors. The key advantage of the stacking method is its ability to integrate a diverse set of models, leveraging the strengths of different algorithms to capture a broader range of patterns and relationships within the data. This diversity enhances the ensemble's ability to generalize to new data. From a practical perspective, this advantage simplifies model selection: when applied to real estate price estimation, the stacked model automatically assigns greater weight to the algorithms that perform best on the given dataset, eliminating the need to search for a single best predictive model, since the ensemble naturally prioritizes the most effective models and reduces the influence of those that perform poorly. Consequently, practitioners can confidently employ a variety of machine learning models, reducing uncertainty and improving overall predictive performance.
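For reference, the sketch below outlines such a stacking architecture with scikit-learn's StackingRegressor and a ridge meta-learner. The base learners and hyperparameters shown are illustrative rather than the exact configuration used in this study; boosting variants such as XGBoost and CatBoost can be added through their scikit-learn-compatible wrappers (XGBRegressor, CatBoostRegressor), and X_train, y_train, and X_test are placeholder names.

```python
# A sketch of a stacked ensemble with a ridge meta-learner. The base
# learners and hyperparameters are illustrative; `X_train`, `y_train`,
# and `X_test` are placeholders for prepared training and test data.
from sklearn.ensemble import (
    AdaBoostRegressor, GradientBoostingRegressor, RandomForestRegressor,
    StackingRegressor,
)
from sklearn.linear_model import Ridge
from sklearn.neighbors import KNeighborsRegressor
from sklearn.tree import DecisionTreeRegressor

base_learners = [
    ("knn", KNeighborsRegressor()),
    ("tree", DecisionTreeRegressor(random_state=42)),
    ("rf", RandomForestRegressor(random_state=42)),
    ("gb", GradientBoostingRegressor(random_state=42)),
    ("ada", AdaBoostRegressor(random_state=42)),
    # XGBRegressor() and CatBoostRegressor() could be appended here as well.
]

# Ridge regression combines the out-of-fold base predictions, shrinking
# the weights of weaker models while retaining the stronger ones.
stacked_model = StackingRegressor(
    estimators=base_learners,
    final_estimator=Ridge(),
    cv=5,  # base predictions for the meta-learner come from 5-fold CV
)

stacked_model.fit(X_train, y_train)
y_pred = stacked_model.predict(X_test)
```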
We also proposed a decision tree analysis for identifying the conditions under which certain features lower prediction accuracy. By employing a decision tree to classify instances where the absolute percentage error exceeds a defined threshold, we can determine the specific scenarios in which the model's predictive accuracy diminishes. The analysis revealed that the size and age of an apartment complex are crucial factors, with smaller and older complexes showing reduced accuracy and higher error rates. This finding provides valuable guidance for applying the model in real-world scenarios. For instance, the case study suggests that additional effort may be required to value apartments in small or old complexes accurately, whereas apartments in sufficiently large or newer complexes may yield predictions even more accurate than the overall performance metrics reported in this paper suggest.
Our research can be extended in several directions. First, incorporating a broader range of machine learning techniques could enhance the construction of ensemble models. In this study, most of the predictors were decision tree-based; the predictive power of the ensemble might improve by integrating algorithms grounded in different principles, such as artificial neural networks. Second, as our data contain a significant number of missing values, often because certain information is difficult to collect when constructing features for real estate appraisal models, future research could investigate methods for handling missing data and examine how different imputation techniques affect predictive performance. Understanding the effectiveness of these techniques would enhance the robustness and accuracy of appraisal models in situations where complete data collection is challenging. Third, exploring other methods of combining predictors, and applying the stacking method together with the decision tree technique for identifying error patterns to different regions, presents another direction for future work. A further direction is the development of a reliable housing price index using machine learning techniques. In [47], a price index for free-market transactions of mixed-use income properties was developed to analyze the impact of speculative bubbles on property prices; by assuming the existence of a speculative bubble during the property boom of 1985–1990 and removing this hypothetical bubble from the price index, the study showed that the proportion of explainable price variation increased when the bubble term was excluded. A similar methodology could be employed to analyze the effects of various economic conditions on the real estate market. As noted in [48], the performance of traditional valuation methods can decline under certain economic conditions; enhancing the model by incorporating variables that better capture these conditions could therefore be a promising direction. Lastly, as proposed in [49], the research could be extended to support appropriate investment decisions from the investor's perspective and to estimate the associated risks.

Author Contributions

Conceptualization and design of the data analysis, J.H.; methodology and data analysis, W.K.; interpretation, J.H. and W.K.; writing—original draft preparation, W.K.; supervision, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This paper was supported by Konkuk University in 2022.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from the Ministry of Land, Infrastructure, and Transport (MOLIT) of South Korea and are available with the permission of MOLIT.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. IAAO. Standard on Mass Appraisal of Real Property; IAAO: Kansas City, MO, USA, 2017. [Google Scholar]
  2. Wang, D.; Li, V.J. Mass appraisal models of real estate in the 21st century: A systematic literature review. Sustainability 2019, 11, 7006. [Google Scholar] [CrossRef]
  3. Zangerle, J.A. Principles of Real Estate Appraising; S. McMichael Pub. Organization: Cleveland, OH, USA, 1927; Volume 3. [Google Scholar]
  4. McCluskey, W.J.; Borst, R.A. Specifying the effect of location in multivariate valuation models for residential properties: A critical evaluation from the mass appraisal perspective. Prop. Manag. 2007, 25, 312–343. [Google Scholar] [CrossRef]
  5. d’Amato, M. A brief outline of AVM models and standards evolutions. In Advances in Automated Valuation Modeling: AVM after the Non-Agency Mortgage Crisis; Springer: Berlin/Heidelberg, Germany, 2017; pp. 3–21. [Google Scholar]
  6. Bidanset, P.E.; Rakow, R. Survey on the use of automated valuation models (AVMs) in government assessment offices: An analysis of AVM use, acceptance, and barriers to more widespread implementation. J. Prop. Tax Assess. Adm. 2022, 19, 3. [Google Scholar]
  7. Dimopoulos, T.; Moulas, A. A proposal of a mass appraisal system in Greece with CAMA system: Evaluating GWR and MRA techniques in Thessaloniki Municipality. Open Geosci. 2016, 8, 675–693. [Google Scholar] [CrossRef]
  8. Lancaster, K.J. A new approach to consumer theory. J. Political Econ. 1966, 74, 132–157. [Google Scholar] [CrossRef]
  9. Hong, J.; Choi, H.; Kim, W.S. A house price valuation based on the random forest approach: The mass appraisal of residential property in South Korea. Int. J. Strateg. Prop. Manag. 2020, 24, 140–152. [Google Scholar] [CrossRef]
  10. Fletcher, M.; Gallimore, P.; Mangan, J. Heteroscedasticity in hedonic house price models. J. Prop. Res. 2000, 17, 93–108. [Google Scholar] [CrossRef]
  11. Rodriguez, M.; Sirmans, C.F. Quantifying the value of a view in single-family housing markets. Apprais. J. 1994, 62, 600. [Google Scholar]
  12. Kain, J.F.; Quigley, J.M. Measuring the value of housing quality. J. Am. Stat. Assoc. 1970, 65, 532–548. [Google Scholar] [CrossRef]
  13. Adair, A.; McGreal, S.; Smyth, A.; Cooper, J.; Ryley, T. House prices and accessibility: The testing of relationships within the Belfast urban area. Hous. Stud. 2000, 15, 699–716. [Google Scholar] [CrossRef]
  14. Song, Y.; Sohn, J. Valuing spatial accessibility to retailing: A case study of the single family housing market in Hillsboro, Oregon. J. Retail. Consum. Serv. 2007, 14, 279–288. [Google Scholar] [CrossRef]
  15. Chen, J.H.; Ong, C.F.; Zheng, L.; Hsu, S.C. Forecasting spatial dynamics of the housing market using support vector machine. Int. J. Strateg. Prop. Manag. 2017, 21, 273–283. [Google Scholar] [CrossRef]
  16. McCluskey, W.; Davis, P.; Haran, M.; McCord, M.; McIlhatton, D. The potential of artificial neural networks in mass appraisal: The case revisited. J. Financ. Manag. Prop. Constr. 2012, 17, 274–292. [Google Scholar] [CrossRef]
  17. Zhou, G.; Ji, Y.; Chen, X.; Zhang, F. Artificial neural networks and the mass appraisal of real estate. Int. J. Online Eng. 2018, 14, 180–187. [Google Scholar] [CrossRef]
  18. Kontrimas, V.; Verikas, A. The mass appraisal of the real estate by computational intelligence. Appl. Soft Comput. 2011, 11, 443–448. [Google Scholar] [CrossRef]
  19. Mora-Garcia, R.T.; Cespedes-Lopez, M.F.; Perez-Sanchez, V.R. Housing price prediction using machine learning algorithms in COVID-19 times. Land 2022, 11, 2100. [Google Scholar] [CrossRef]
  20. Choy, L.H.; Ho, W.K. The use of machine learning in real estate research. Land 2023, 12, 740. [Google Scholar] [CrossRef]
  21. Gnat, S. Property mass valuation on small markets. Land 2021, 10, 388. [Google Scholar] [CrossRef]
  22. Li, S.; Jiang, Y.; Ke, S.; Nie, K.; Wu, C. Understanding the effects of influential factors on housing prices by combining extreme gradient boosting and a hedonic price model (XGBoost-HPM). Land 2021, 10, 533. [Google Scholar] [CrossRef]
  23. Bilgilioğlu, S.S.; Yılmaz, H.M. Comparison of different machine learning models for mass appraisal of real estate. Surv. Rev. 2023, 55, 32–43. [Google Scholar] [CrossRef]
  24. Dimopoulos, T.; Bakas, N. An artificial intelligence algorithm analyzing 30 years of research in mass appraisals. RELAND Int. J. Real Estate Land Plan. 2019, 2, 10–27. [Google Scholar]
  25. Matysiak, G.A. Assessing the accuracy of individual property values estimated by automated valuation models. J. Prop. Invest. Financ. 2023, 41, 279–289. [Google Scholar] [CrossRef]
  26. Baur, K.; Rosenfelder, M.; Lutz, B. Automated real estate valuation with machine learning models using property descriptions. Expert Syst. Appl. 2023, 213, 119147. [Google Scholar] [CrossRef]
  27. Taniguchi, M.; Tresp, V. Averaging regularized estimators. Neural Comput. 1997, 9, 1163–1178. [Google Scholar] [CrossRef]
  28. Krogh, A.; Vedelsby, J. Neural network ensembles, cross validation, and active learning. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1994; Volume 7. [Google Scholar]
  29. Merz, C.; Pazzani, M. Combining neural network regression estimates with regularized linear weights. In Advances in Neural Information Processing Systems; MIT Press: Cambridge, MA, USA, 1996; Volume 9. [Google Scholar]
  30. Kittler, J.; Hatef, M.; Duin, R.P.; Matas, J. On combining classifiers. IEEE Trans. Pattern Anal. Mach. Intell. 1998, 20, 226–239. [Google Scholar] [CrossRef]
  31. Case, B.; Pollakowski, H.O.; Wachter, S.M. On choosing among house price index methodologies. Real Estate Econ. 1991, 19, 286–307. [Google Scholar] [CrossRef]
  32. Renigier-Biłozor, M.; Źróbek, S.; Walacik, M.; Janowski, A. Hybridization of valuation procedures as a medicine supporting the real estate market and sustainable land use development during the COVID-19 pandemic and afterwards. Land Use Policy 2020, 99, 105070. [Google Scholar] [CrossRef]
  33. Hong, J.; Kim, W.S. Combination of machine learning-based automatic valuation models for residential properties in South Korea. Int. J. Strateg. Prop. Manag. 2022, 26, 362–384. [Google Scholar] [CrossRef]
  34. Yıldırım, H. Property value assessment using artificial neural networks, hedonic regression and nearest neighbors regression methods. Selçuk Üniversitesi Mühendislik Bilim Teknol. Derg. 2019, 7, 387–404. [Google Scholar]
  35. Mukhlishin, M.F.; Saputra, R.; Wibowo, A. Predicting house sale price using fuzzy logic, Artificial Neural Network and K-Nearest Neighbor. In Proceedings of the 1st International Conference on Informatics and Computational Sciences (ICICoS), Semarang, Indonesia, 15–16 November 2017. [Google Scholar]
  36. Antipov, E.A.; Pokryshevskaya, E.B. Mass appraisal of residential apartments: An application of Random forest for valuation and a CART-based approach for model diagnostics. Expert Syst. Appl. 2012, 39, 1772–1778. [Google Scholar] [CrossRef]
  37. Ho, T.K. Random decision forests. In Proceedings of the 3rd International Conference on Document Analysis and Recognition, Montreal, QC, Canada, 14–16 August 1995; Volume 1, pp. 278–282. [Google Scholar]
  38. Liaw, A.; Wiener, M. Classification and regression by random forest. R News 2002, 2, 18–22. [Google Scholar]
  39. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232. [Google Scholar] [CrossRef]
  40. Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13 August 2016; pp. 785–794. [Google Scholar]
  41. Prokhorenkova, L.; Gusev, G.; Vorobev, A.; Dorogush, A.V.; Gulin, A. CatBoost: Unbiased boosting with categorical features. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 3–8 December 2018; Curran Assoc. Inc.: Red Hook, NY, USA, 2018; Volume 31. [Google Scholar]
  42. Freund, Y.; Schapire, R.E. A decision-theoretic generalization of on-line learning and an application to boosting. J. Comput. Syst. Sci. 1997, 55, 119–139. [Google Scholar] [CrossRef]
  43. Wolpert, D.H. Stacked generalization. Neural Netw. 1992, 5, 241–259. [Google Scholar] [CrossRef]
  44. Yu, D. Modeling owner-occupied single-family house values in the city of Milwaukee: A geographically weighted regression approach. GIScience Remote Sens. 2007, 44, 267–282. [Google Scholar] [CrossRef]
  45. Korea Legislation Research Institute. Available online: https://elaw.klri.re.kr/eng_service/lawView.do?hseq=64982&lang=ENG (accessed on 22 July 2024).
  46. Dimopoulos, T.; Bakas, N. Sensitivity analysis of machine learning models for the mass appraisal of real estate. Case study of residential units in Nicosia, Cyprus. Remote Sens. 2019, 11, 3047. [Google Scholar] [CrossRef]
  47. Björklund, K.; Söderberg, B. Property cycles, speculative bubbles and the gross income multiplier. J. Real Estate Res. 1999, 18, 151–174. [Google Scholar] [CrossRef]
  48. DeLisle, J.; Grissom, T. Valuation procedure and cycles: An emphasis on down markets. J. Prop. Invest. Financ. 2011, 29, 384–427. [Google Scholar] [CrossRef]
  49. French, N. Predicted property investment returns: Risk and growth models. J. Prop. Invest. Financ. 2019, 37, 580–588. [Google Scholar] [CrossRef]
Figure 1. Jeju Island (Source: Wikipedia).
Figure 2. Scatter plots of the actual transaction prices (on the horizontal axis) and the predicted values (on the vertical axis).
Figure 3. Permutation feature importance (left: R2, right: MAPE).
Figure 4. Scatter diagrams illustrating the percentage error by variable.
Figure 5. Decision tree for classifying absolute percentage errors of 20% or higher.
Figure 6. Decision tree for classifying absolute percentage errors of 50% or higher.
Table 1. The variables used in our research.

| Category | Variable | Description and Units |
|---|---|---|
| Target variable | Transaction price | Korean won |
| Property attributes | Approval date | Date of first use (occupancy approval date) |
| | Transaction date | Transaction date (year, month, and day) |
| | Gross floor area | Total area of a property, including common areas (unit: m²) |
| | Net floor area | Total usable area of a property, excluding common areas (unit: m²) |
| | Land share | Portion of the total land area of a multi-family housing development allocated to a specific property (unit: m²) |
| | Floor | Floor level |
| | Heating system | Central/individual/local district (categorical) |
| | Facing | Direction of the living room window (categorical), e.g., south-facing, east-facing |
| | Additional balcony | Presence of an additional balcony, Y/N |
| | Duplex status | Whether it is a duplex, Y/N |
| | Private garden status | Presence of a private garden, Y/N |
| | Presence of attic | Whether there is an attic, Y/N |
| Apartment attributes | Type | Apartment/multiplex housing type 1/multiplex housing type 2 (Type 1 and Type 2 are classified based on whether the floor area is greater than or less than 660 m², respectively) |
| | Structure | Reinforced concrete structure/lightweight steel structure, etc. (categorical) |
| | Apartment parking spaces | Number of parking spaces in the apartment |
| | Number of households in complex | Unit: number |
| | Number of apartment buildings in complex | Unit: number |
| | Presence of elevator | Y/N |
| | Floor area ratio | Ratio of the gross floor area divided by the land area |
| | Building coverage ratio | Ratio of the building area divided by the land area |
| Locational attributes | Latitude | Latitude of a property |
| | Longitude | Longitude of a property |
| | The nearest elementary school | Name of elementary school |
| | The nearest middle school | Name of middle school |
| | The nearest high school | Name of high school |
| | Zoning district | Pre-designated land use by the government (categorical) |
Table 2. Descriptive statistics for numerical variables.

| Variable | Mean | Median | Standard Deviation | Min | Max | Missing |
|---|---|---|---|---|---|---|
| Transaction price | 205,442,166.1 | 180,000,000 | 138,999,633 | 10,000,000 | 3,441,576,456 | 0 (0%) |
| Approval date | - | 30 March 2010 | - | 22 January 1975 | 29 December 2020 | 0 (0%) |
| Transaction date | - | 6 July 2016 | - | 2 January 2012 | 9 September 2021 | 0 (0%) |
| Gross floor area | 82.6055 | 82.4 | 28.4205 | 15.2 | 580.806 | 0 (0%) |
| Net floor area | 68.2157 | 71.79 | 24.0101 | 14.386 | 517.548 | 0 (0%) |
| Land share | 64.3634 | 54.062 | 45.7675 | 0 | 910.233 | 0 (0%) |
| Floor | 3.95 | 3 | 2.64 | -1 | 20 | 0 (0%) |
| Apartment parking spaces | 149.7601 | 28 | 257.3708 | 0 | 1196 | 15,803 (25%) |
| Number of households in a complex | 156.2326 | 48 | 242.4379 | 2 | 1156 | 0 (0%) |
| Number of buildings in a complex | 4.3952 | 2 | 6.0079 | 1 | 46 | 0 (0%) |
| Floor area ratio | 155.3761 | 148.69 | 150.1591 | 0 | 1542.43 | 36 (0%) |
| Building coverage ratio | 29.4293 | 26.75 | 25.308 | 0 | 153.9 | 0 (0%) |
| Longitude | 126.5128 | 126.5195 | 0.1009 | 126.1647 | 126.953 | 0 (0%) |
| Latitude | 33.4324 | 33.485 | 0.1024 | 33.2114 | 33.9591 | 0 (0%) |
Table 3. Performance measures.

| Model | RMSE | MAE | MAPE | R2 | CVRMSE |
|---|---|---|---|---|---|
| k-NN | 60,906,942.697 | 33,667,069.829 | 0.2029 | 0.8079 | 29.6468 |
| Decision tree | 58,479,005.011 | 34,265,276.340 | 0.2058 | 0.8230 | 28.4649 |
| Random forest | 26,755,586.576 | 14,383,975.461 | 0.0873 | 0.9629 | 13.0234 |
| Gradient boosting | 43,257,558.391 | 27,992,793.679 | 0.1640 | 0.9032 | 21.0558 |
| XGBoost | 24,175,049.545 | 14,447,068.812 | 0.0894 | 0.9697 | 11.7673 |
| CatBoost | 29,416,413.785 | 18,908,384.362 | 0.1143 | 0.9552 | 14.3186 |
| AdaBoost | 25,515,035.693 | 13,696,174.919 | 0.0847 | 0.9663 | 12.4195 |
| Stacked ensemble model | 23,041,799.680 | 13,604,466.605 | 0.0840 | 0.9725 | 11.2157 |
Table 4. The standard deviation and maximum values of the absolute error (AE) and absolute percentage error (APE).

| Model | Maximum AE | STDEV of AE | Maximum APE | STDEV of APE |
|---|---|---|---|---|
| k-NN | 1,995,761,000 | 50,752,813.86 | 11.64069231 | 0.329260711 |
| Decision tree | 1,440,562,000 | 47,367,223.86 | 13.19953846 | 0.322507656 |
| Random forest | 1,265,093,000 | 22,649,574.79 | 3.96373913 | 0.141373689 |
| Gradient boosting | 1,047,533,000 | 33,547,877.77 | 4.621761538 | 0.199948192 |
| XGBoost | 934,233,000 | 19,502,897.29 | 4.277217391 | 0.135398364 |
| CatBoost | 1,485,270,000 | 22,558,702.97 | 4.201130435 | 0.15008249 |
| AdaBoost | 1,196,743,000 | 21,486,066.58 | 3.942857143 | 0.145567101 |
| Stacked ensemble model | 763,940,000 | 18,620,899.69 | 3.831478261 | 0.132033815 |
Table 5. Comparison of models by RMSE.

| | k-NN | Decision Tree | Random Forest | Gradient Boosting | XGBoost | CatBoost | AdaBoost | Stacked Ensemble |
|---|---|---|---|---|---|---|---|---|
| k-NN | | 0.838 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 |
| Decision tree | 0.162 | | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Random forest | 0.000 | 0.000 | | 0.001 | 0.806 | 0.167 | 0.997 | 0.943 |
| Gradient boosting | 0.001 | 0.000 | 0.999 | | 1.000 | 1.000 | 1.000 | 1.000 |
| XGBoost | 0.000 | 0.000 | 0.194 | 0.000 | | 0.001 | 0.326 | 0.887 |
| CatBoost | 0.000 | 0.000 | 0.833 | 0.000 | 0.999 | | 0.906 | 0.999 |
| AdaBoost | 0.000 | 0.000 | 0.003 | 0.000 | 0.674 | 0.094 | | 0.872 |
| Stacked ensemble | 0.000 | 0.000 | 0.057 | 0.000 | 0.113 | 0.001 | 0.128 | |
Table 6. Comparison of models by MAE.

| | k-NN | Decision Tree | Random Forest | Gradient Boosting | XGBoost | CatBoost | AdaBoost | Stacked Ensemble |
|---|---|---|---|---|---|---|---|---|
| k-NN | | 0.141 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Decision tree | 0.859 | | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Random forest | 0.000 | 0.000 | | 0.000 | 0.378 | 0.000 | 1.000 | 1.000 |
| Gradient boosting | 0.000 | 0.000 | 1.000 | | 1.000 | 1.000 | 1.000 | 1.000 |
| XGBoost | 0.000 | 0.000 | 0.622 | 0.000 | | 0.000 | 0.997 | 0.996 |
| CatBoost | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | | 1.000 | 1.000 |
| AdaBoost | 0.000 | 0.000 | 0.000 | 0.000 | 0.003 | 0.000 | | 0.847 |
| Stacked ensemble | 0.000 | 0.000 | 0.000 | 0.000 | 0.004 | 0.000 | 0.153 | |
Table 7. Comparison of models by MAPE.

| | k-NN | Decision Tree | Random Forest | Gradient Boosting | XGBoost | CatBoost | AdaBoost | Stacked Ensemble |
|---|---|---|---|---|---|---|---|---|
| k-NN | | 0.227 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Decision tree | 0.773 | | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Random forest | 0.000 | 0.000 | | 0.000 | 0.047 | 0.000 | 0.999 | 0.994 |
| Gradient boosting | 0.000 | 0.000 | 1.000 | | 1.000 | 1.000 | 1.000 | 1.000 |
| XGBoost | 0.000 | 0.000 | 0.953 | 0.000 | | 0.000 | 0.997 | 0.998 |
| CatBoost | 0.000 | 0.000 | 1.000 | 0.000 | 1.000 | | 1.000 | 1.000 |
| AdaBoost | 0.000 | 0.000 | 0.001 | 0.000 | 0.003 | 0.000 | | 0.779 |
| Stacked ensemble | 0.000 | 0.000 | 0.006 | 0.000 | 0.002 | 0.000 | 0.221 | |
Table 8. Comparison of models by R2.

| | k-NN | Decision Tree | Random Forest | Gradient Boosting | XGBoost | CatBoost | AdaBoost | Stacked Ensemble |
|---|---|---|---|---|---|---|---|---|
| k-NN | | 0.174 | 0.001 | 0.002 | 0.000 | 0.001 | 0.001 | 0.000 |
| Decision tree | 0.826 | | 0.000 | 0.001 | 0.000 | 0.001 | 0.000 | 0.000 |
| Random forest | 0.999 | 1.000 | | 1.000 | 0.204 | 0.811 | 0.004 | 0.081 |
| Gradient boosting | 0.998 | 0.999 | 0.000 | | 0.000 | 0.000 | 0.000 | 0.000 |
| XGBoost | 1.000 | 1.000 | 0.796 | 1.000 | | 0.999 | 0.681 | 0.116 |
| CatBoost | 0.999 | 0.999 | 0.189 | 1.000 | 0.001 | | 0.105 | 0.001 |
| AdaBoost | 0.999 | 1.000 | 0.996 | 1.000 | 0.319 | 0.895 | | 0.148 |
| Stacked ensemble | 1.000 | 1.000 | 0.919 | 1.000 | 0.884 | 0.999 | 0.852 | |
Table 9. Comparison of models by CVRMSE.

| | k-NN | Decision Tree | Random Forest | Gradient Boosting | XGBoost | CatBoost | AdaBoost | Stacked Ensemble |
|---|---|---|---|---|---|---|---|---|
| k-NN | | 0.838 | 1.000 | 0.999 | 1.000 | 1.000 | 1.000 | 1.000 |
| Decision tree | 0.162 | | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 | 1.000 |
| Random forest | 0.000 | 0.000 | | 0.001 | 0.805 | 0.169 | 0.997 | 0.941 |
| Gradient boosting | 0.001 | 0.000 | 0.999 | | 1.000 | 1.000 | 1.000 | 1.000 |
| XGBoost | 0.000 | 0.000 | 0.195 | 0.000 | | 0.001 | 0.326 | 0.886 |
| CatBoost | 0.000 | 0.000 | 0.831 | 0.000 | 0.999 | | 0.905 | 0.999 |
| AdaBoost | 0.000 | 0.000 | 0.003 | 0.000 | 0.674 | 0.095 | | 0.871 |
| Stacked ensemble | 0.000 | 0.000 | 0.059 | 0.000 | 0.114 | 0.001 | 0.129 | |