Article

Hedonic Models Incorporating Environmental, Social, and Governance Factors for Time Series of Average Annual Home Prices

by Jason R. Bailey *, W. Brent Lindquist and Svetlozar T. Rachev
Department of Mathematics & Statistics, Texas Tech University, Lubbock, TX 79409-1042, USA
* Author to whom correspondence should be addressed.
J. Risk Financial Manag. 2024, 17(8), 375; https://doi.org/10.3390/jrfm17080375
Submission received: 4 July 2024 / Revised: 13 August 2024 / Accepted: 20 August 2024 / Published: 21 August 2024
(This article belongs to the Special Issue Shocks, Public Policies and Housing Markets)

Abstract

Using data from 2000 through 2022, we analyze the predictive capability of the annual numbers of new home constructions and four available environmental, social, and governance (ESG) factors on the average annual price of homes sold in eight major U.S. cities. We contrast the predictive capability of a P-spline generalized additive model (GAM) against a strictly linear version of the commonly used generalized linear model (GLM). As the data for the annual price and predictor variables constitute non-stationary time series, we transform each time series appropriately to produce stationary series for use in the GAMs and GLMs in order to avoid spurious correlations in the analysis. While arithmetic returns or first differences are adequate transformations for the predictor variables, we utilize the series of innovations obtained from AR(q)-ARCH(1) fits for the average price response variable. Based on the GAM results, we find that the influence of ESG factors varies markedly by city and reflects geographic diversity. Notably, the presence of air conditioning emerges as a strong factor. Despite limitations on the length of available time series, this study represents a pivotal step toward integrating ESG considerations into predictive time series models for real estate.

1. Introduction

Hedonic models are employed to analyze and predict average real estate prices via intrinsic and extrinsic factors. The average home price in a city plays an important role in the calculations made by potential homebuyers, particularly for low- and fixed-income buyers. Undoubtedly, the impacts of climate change and extreme weather will also affect the decisions made by potential homebuyers (as well as current homeowners) as the century progresses. Much work has been carried out in quantifying and modeling residence-based (e.g., lot area and number of bedrooms) and neighborhood-based (e.g., school zoning and homeowners’ association fees) factors. The impact of recent developments in environmental, social, and governance (ESG) policies and factors on real estate prices has not been as well analyzed. This paper contributes to that analysis.
We begin by briefly describing work that has been carried out to assess the impact of ESG factors on homebuilding and consumer decision-making. The environmental, social, and governance components of ESG represent the sustainability factors of a property. Resiliency to global warming, the risk of a natural disaster, and the installation of renewable energy systems are examples of environmental factors. Noise pollution, construction worker labor standards, and homeowner satisfaction are examples of social factors. Legal issues related to property owner practices, regulatory compliance with standards set at all governmental levels, and overall transparency are examples of governance factors.
Ma et al. (2019) analyzed the impact of governmental policymaking processes on residential green energy additions and constructions. Even with a consumer base open to adopting environmentally friendly technologies, the cost bases, measured relative to non-green energy prices, play a large role in the adoption of such technologies. In particular, governmental policies on residential green energy subsidies that are too stringent can have an adverse effect on household installations.
Lauper et al. (2013) analyzed the green home acquisition and installation process from the point of view of a homebuilder. Social factors (e.g., behavioral control and social norms) have meaningful impacts on the energy-relevant decisions made in homebuilding. Social policies and norms, such as low energy consumption building certificates and awareness of available green technologies, have been shown to heighten consumer interest and spending on environmentally friendly appliances.
In addition to qualitative analyses based on consumer behavior, quantitative indices have been developed to provide guidance to consumers in assessing home prices. Environmental factors (e.g., average maximum temperatures and flood risk) can be expected to play a role in homebuyer decisions (and, therefore, real estate pricing). Mahanama et al. (2021) developed a natural disasters index to assess the level of future systemic risk caused by natural disasters. Their index used decades of property losses from the NOAA Storm Data to assess the main contributors to property losses. Although a homeowner’s thought process can be very subjective, a quantification of the risk of extreme weather events represents an important step in translating subjective thought processes into quantitative factors for use in modeling. The results of a survey of research at the intersection of climate risks, housing, and mortgage markets revealed that natural disasters are expected to continue to weigh heavily on home prices (Contat et al. 2023). Specifically, the risks of flooding and wildfires were shown to correlate inversely with home prices, as higher risks of floods and wildfires result in discounts on said prices.
Intrinsic and extrinsic factors, including some ESG factors, have been used to describe the variance in (the logarithm of) the expected sales price of homes (Bailey et al. 2022). When ESG factors (accessibility for the elderly and disabled, presence of central air conditioning, “green home” rating, and waterfront location) commonly available on real estate vendor websites were included, minor improvements were observed in the model adjusted $R^2$ values. Although the model results were city dependent, the potential impact that such ESG factors had in assessing home prices was established.
Such factors have also been shown to have different impacts on home valuations depending on a home’s value in the context of the local housing price distribution. For example, an analysis of 1366 home sales in Orem and Provo, Utah, from mid-1999 to mid-2000 found that segmenting houses by quantile was important in identifying the effects of several input factors (Zeitz et al. 2007). The number of bedrooms was found to be more significant in lower-priced homes than higher-priced homes, whereas the number of bathrooms was more significant in higher-priced homes than lower-priced homes. As another example, an analysis of 136,000 single-family home sales in Jacksonville, Florida, from 1990 to 2006 found that square footage and lot size were more significant at the upper level of home prices, whereas home age was more significant at lower home prices (Zeitz et al. 2008). The analysis also assessed the impact of home location relative to a body of water, such as whether the property was oceanfront or bordered the St. Johns River.
Furthermore, an analysis in a similar vein conducted in Changsha, China, found that homes near the lowest and highest quantiles of the price distribution were more affected by the prices of nearby properties than those in the middle (Liao and Wang 2012). As in the Jacksonville analysis, square footage was found to weigh more heavily as the home price increased. With regard to green areas, the study found that lower-priced units were positively impacted by the presence of green spaces, whereas the opposite was true for higher-priced units. Another study in China, focused on the capital financing patterns of A-listed Chinese companies as commercial enterprises, also found ESG factors to be significant, particularly for creditworthiness (Zahid et al. 2023). The relationship between the combined ESG score and market-based financial leverage was found to be negative and significant, which indicates that stronger ESG disclosures may have resulted in higher investor confidence in those companies. In related work, the same authors found a positive correlation between a company’s ESG performance and the managerial skills of its chief executive officer (Zahid et al. 2024). Furthermore, companies experiencing financial difficulties prioritized ESG performance more strongly than companies not experiencing financial difficulties. Thus, these analyses reveal that ESG factors are important considerations in non-Western countries, too.
Other ESG factors not currently featured on real estate vendor sites, such as the impacts of air pollution, have been explored. For example, an analysis of the closure of a toxic site leading to changes in atmospheric pollution levels found that the corresponding drop in SO2 levels correlated with an average house price increase of 6% (Lavaine 2019). However, the average price of flats decreased by 9%, suggesting that the impacts of refinery closures and changes in air pollution levels have heterogeneous effects on the subsamples.
The usual application of a hedonic home-pricing model is “cross-sectional”, consisting of a data set of response (price) and predictor (e.g., number of bedrooms, bathrooms, home size, etc.) variables for a sample of homes. Implicit in the cross-sectional analysis is the assumption that the data set represents independent and identically distributed random samples reflective of the pricing structure in a particular geographic area. By contrast, the application in this paper is to time series data. For a geographic area (specifically a city), the data set consists of the average annual home price, the number of homes sold per year, and yearly values for each of four available ESG factors. As we show (Appendix B), each time series is non-stationary and exhibits strong year-by-year trends, which can produce spurious correlations in fits by hedonic models.
This paper pursues three goals. The first is the determination of an appropriate transformation into a stationary form for each time series of annual data. The second is to evaluate the effectiveness and accuracy of the application, to these transformed series, of a P-spline-based generalized additive model (GAM) compared to a generalized linear model (GLM). The analysis used data from eight cities, and the cities are shown on the map in Figure 1 below.
These cities were chosen to represent variations in geography, primary economic activity, and population size and density. For example, the cities of Portland and Seattle represent the Pacific Coast, whereas the cities of Columbus and Oklahoma City represent the interior of the country. Furthermore, Austin has a strong economic base in the tech sector, whereas Nashville is a major center of the music industry. Finally, Atlanta has a high population density of over 1420 people per square kilometer, whereas Jacksonville has a much lower population density of about 490 people per square kilometer.
Each transformed time series is “de-trended”, as it represents values from a fixed-mean and fixed-variance random variable. Using principal component analysis, the third objective is to investigate the residual error time series (aggregated across cities) from each of the GAM and GLM fits to determine the presence of further latent random variables. As the model is deliberately parsimonious in terms of the number of predictor factors, we hypothesize the presence of additional latent variables.

2. Materials and Methods

2.1. Price and Factor Data

Price and factor data were acquired from Zillow¹. The data set is composed of completed sale transactions of homes² each year for the years 2000 through 2022 for eight cities³. For each year and city, the data set consists of the average home sale price (Av Price), the total number of homes constructed (New Homes), and four ESG factors: the number of homes with central air conditioning (Central AC), the number of green-rated homes (Green), the number of homes considered accessible to the elderly and disabled (Accessible), and the number of homes along a waterfront (Waterfront)⁴. The eight cities studied were Atlanta, GA (ATL), Austin, TX (AUS), Columbus, OH (COL), Jacksonville, FL (JAX), Nashville, TN (NAS), Oklahoma City, OK (OKC), Portland, OR (POR), and Seattle, WA (SEA). As an example, Table A2 in Appendix B summarizes the full data set for ATL.

2.2. Generalized Additive and Linear Models

A GAM relates a univariate response variable $Y_t$ to a set of predictor variables (factors) $x_{k,t}$, $k = 1, \dots, m$. (Here, the subscript $t = 1, \dots, \tau$ indicates the observed set of values of the response and predictor variables. In this application, $t$ indicates yearly time values.) Specifically, the GAM relates the expected value $\mu_t = E[Y_t]$ to the predictor values via
$$g(\mu_t) = \beta_0 + f_1(x_{1,t}) + f_2(x_{2,t}) + \cdots + f_m(x_{m,t}), \qquad t = 1, \dots, \tau. \tag{1}$$
The model assumes $Y_t \sim EF(\mu_t, \theta)$, where $EF(\mu_t, \theta)$ denotes the exponential family of distributions having mean $\mu_t$ and scale parameter $\theta$. The choice of the link function $g(\cdot)$ relates expected values of the average $\mu_t$ to the factors via
$$\mu_t = g^{-1}\left(\beta_0 + f_1(x_{1,t}) + f_2(x_{2,t}) + \cdots + f_m(x_{m,t})\right) + \varepsilon_t, \tag{2}$$
where $\varepsilon_t$ denotes the residual error that is not captured by the model. The identity function was used for $g(\cdot)$, and P-splines (Eilers and Marx 1996) were used for the functions $f_j(\cdot)$. Such P-splines minimize the penalized sum of squares
$$\sum_{t=1}^{\tau}\left(Y_t - \sum_{j=1}^{m} f_j(x_{j,t})\right)^2 + \sum_{j=1}^{m} \lambda_j \int f_j''(z)^2 \, dz, \tag{3}$$
where the tuning parameters $\lambda_j > 0$ determine the weight assigned to the smoothness of each function. The values $x_{j,t}$ are referred to as the knots for the function $f_j(\cdot)$.
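The trade-off controlled by the tuning parameters in the penalized sum of squares above can be illustrated with a deliberately simplified sketch. The code below is not the P-spline fit used in this study (which relied on the R gam package). It is a one-dimensional discrete analogue in which the integral of the squared second derivative is replaced by a sum of squared second differences of the fitted values. The function names `solve`, `smooth`, and `roughness` and the sample data are assumptions made purely for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are not modified.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def smooth(y, lam):
    """Minimize sum (y_i - z_i)^2 + lam * sum (z_{i+2} - 2 z_{i+1} + z_i)^2 over z."""
    n = len(y)
    # Second-difference operator D with n - 2 rows.
    d = [[0.0] * n for _ in range(n - 2)]
    for i in range(n - 2):
        d[i][i], d[i][i + 1], d[i][i + 2] = 1.0, -2.0, 1.0
    # Normal equations of the penalized problem: (I + lam * D^T D) z = y.
    a = [[(1.0 if i == j else 0.0)
          + lam * sum(d[k][i] * d[k][j] for k in range(n - 2))
          for j in range(n)] for i in range(n)]
    return solve(a, [float(v) for v in y])


def roughness(z):
    """Sum of squared second differences, the discrete stand-in for the penalty term."""
    return sum((z[i + 2] - 2 * z[i + 1] + z[i]) ** 2 for i in range(len(z) - 2))


y = [0.0, 2.0, 1.0, 3.0, 2.0, 4.0]   # made-up, wiggly data
light = smooth(y, 0.1)               # small penalty: stays close to the data
heavy = smooth(y, 1000.0)            # large penalty: close to a straight line
print(roughness(y), roughness(light), roughness(heavy))
```

A larger penalty weight yields a smoother fitted curve at the cost of fidelity to the data, which is the role the tuning parameters play for each smooth function in the GAM.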
The results acquired from this GAM were compared to those from a standard GLM of the form
$$g\left(E_Y[Y \mid X]\right) = \beta_0 + \beta_1 x_1 + \cdots + \beta_m x_m + \xi = X\beta + \xi. \tag{4}$$
In (4), matrix notation is used to represent the response and predictor values: $Y = (Y_1, \dots, Y_\tau)^T$ is the column vector of response values, $\beta = (\beta_0, \beta_1, \dots, \beta_m)^T$ is the column vector of unknown parameters, $\xi = (\varepsilon_1, \dots, \varepsilon_\tau)^T$ is the column vector of residuals, and $X$ is a $\tau \times (m + 1)$ matrix. The first column of $X$ is a vector of ones, while column $j + 1$, $j = 1, \dots, m$, is the vector $x_j = (x_{j,1}, \dots, x_{j,\tau})^T$ of values for factor $x_j$. As in the GAM, we used the identity function for $g(\cdot)$, which reduced (4) to a pure linear model.
In the GAM (1), the use of a non-linear function $f_j(\cdot)$ provides non-linear dependence on the time-dependent value of its argument $x_{j,t}$. In the GLM (4), the corresponding coefficient $\beta_j$ is constant, which results in linear dependence on the time-dependent value of $x_{j,t}$. Thus, a non-linear form for the $f_j(\cdot)$ enables a greater fitting accuracy for the GAM compared to the GLM. On the other hand, the GLM provides a superior model for prediction. The form of the function $f_j(\cdot)$ depends on the known values of its knots. Accurate prediction by GAM requires knowledge of future knot values to a much greater degree than the GLM⁵ does.
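With the identity link, fitting the linear model (4) amounts to ordinary least squares. The following minimal sketch solves the normal equations in plain Python to make the matrix description concrete. The fits in this study were obtained with the R lm function; the function name `ols` and the toy data are assumptions introduced only for illustration.

```python
def ols(columns, y):
    """Least-squares coefficients [b0, b1, ..., bm] for y = b0 + sum_j bj * x_j."""
    n = len(y)
    # Design matrix X: a leading column of ones, then one column per factor.
    x = [[1.0] + [float(col[t]) for col in columns] for t in range(n)]
    p = len(x[0])
    # Augmented normal equations [X^T X | X^T y].
    a = [[sum(x[t][i] * x[t][j] for t in range(n)) for j in range(p)]
         + [sum(x[t][i] * y[t] for t in range(n))] for i in range(p)]
    # Gauss-Jordan elimination with partial pivoting.
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(a[r][c]))
        a[c], a[piv] = a[piv], a[c]
        for r in range(p):
            if r != c:
                f = a[r][c] / a[c][c]
                a[r] = [a[r][k] - f * a[c][k] for k in range(p + 1)]
    return [a[i][p] / a[i][i] for i in range(p)]


# Toy data generated exactly from y = 1 + 2*x1 - 3*x2 (illustrative values only).
x1 = [0.0, 1.0, 2.0, 3.0, 4.0]
x2 = [1.0, 0.0, 2.0, 1.0, 3.0]
y = [1 + 2 * u - 3 * v for u, v in zip(x1, x2)]
beta = ols([x1, x2], y)
print(beta)
```

Each coefficient is a single constant per factor, which is the linear dependence that the GAM relaxes by replacing the product of coefficient and factor with a smooth function of the factor.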

2.3. Transformation to Stationary Time Series

We wish to compare the accuracy of models (1) and (4) in predicting the time series of the expected value $\mu_t = E[Y_t]$ of average annual home price $Y_t$ in terms of the time series of the five factors $x_{j,t}$, $j = 1, \dots, 5$, described earlier for each city. Each time series consists of 23 years of values. It is necessary that the time series for the response variable and each factor be stationary in order to avoid spurious correlations in the factor analyses. Stationarity of each time series was investigated via the augmented Dickey–Fuller (ADF) test applied to the random walk model⁶
$$\Delta y_t = \alpha y_{t-1} + \sum_{i=1}^{q} \delta_i \Delta y_{t-i} + \varepsilon_t. \tag{5}$$
The statistic $DF_\alpha = \hat{\alpha} / SE(\hat{\alpha})$⁷ is used to test the hypothesis $H_0: \alpha = 0$ (the existence of a unit root) against $H_A: \alpha < 0$. The rejection of the null hypothesis through a sufficiently small p-value suggests that no unit root is present and that stationarity can be assumed.
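As a rough illustration of how the test statistic is formed, the sketch below computes the estimate of the coefficient on the lagged level and its ratio to the standard error for the simplest case of the regression above, with no lagged difference terms (q = 0). It does not compute a p-value, since that requires the Dickey–Fuller reference distribution rather than the usual t tables. The function name `df_statistic` and the sample series are assumptions for illustration only.

```python
import math


def df_statistic(y):
    """Return (alpha_hat, DF statistic) for the regression dy_t = alpha * y_{t-1} + e_t."""
    lagged = y[:-1]
    diffs = [y[t] - y[t - 1] for t in range(1, len(y))]
    sxx = sum(v * v for v in lagged)
    # Least-squares slope through the origin.
    alpha_hat = sum(v * d for v, d in zip(lagged, diffs)) / sxx
    residuals = [d - alpha_hat * v for v, d in zip(lagged, diffs)]
    # One estimated parameter, so len(diffs) - 1 residual degrees of freedom.
    s2 = sum(e * e for e in residuals) / (len(diffs) - 1)
    standard_error = math.sqrt(s2 / sxx)
    return alpha_hat, alpha_hat / standard_error


series = [0.0, 1.0, 0.0, -1.0, 0.0, 1.0, 0.0, -1.0]   # made-up oscillating series
alpha_hat, stat = df_statistic(series)
print(alpha_hat, stat)
```

A strongly negative statistic is evidence against a unit root; in practice a statistical package would supply both the lag terms and the appropriate critical values.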
As illustrated in Figure A1 (Appendix B) for ATL and as verified by the ADF test, stationarity could not be inferred at any reasonable level of statistical significance for any factor or price time series for any of the eight cities. As discussed in Appendix B and illustrated for ATL in Figure A2, the use of the arithmetic return series for each predictor factor produced transformed time series that were acceptable. There were five exceptions for which the arithmetic return time series had to be replaced by simple first differences in order to avoid division by zero: the Accessible factor for Seattle, the Green factor for Columbus, and the Waterfront factor for three cities. As the first-difference time series for the Waterfront factor for all eight cities had very acceptable p-values (below 1.5%), we used the first-difference time series for the Waterfront factor for all cities for consistency. Thus, the predictor variables used in (1) and (4) represent transformed series (with the transformation being either arithmetic returns or first differences). The specific transformation used for each factor is summarized in Table A4 in Appendix B.
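The two transformations and the division-by-zero fallback can be sketched as follows. The helper names and the `counts` series are hypothetical; the `prices` values are the first four ATL average prices from Table A2. Note that the helper chooses per series, whereas the study additionally applied first differences to the Waterfront factor in all cities for consistency.

```python
def first_differences(x):
    """x_t - x_{t-1} for t = 1, ..., len(x) - 1."""
    return [x[t] - x[t - 1] for t in range(1, len(x))]


def arithmetic_returns(x):
    """(x_t - x_{t-1}) / x_{t-1}; undefined when a lagged value is zero."""
    return [(x[t] - x[t - 1]) / x[t - 1] for t in range(1, len(x))]


def transform(x):
    """Use arithmetic returns when defined; otherwise fall back to first differences."""
    if any(v == 0 for v in x[:-1]):
        return "first difference", first_differences(x)
    return "arithmetic return", arithmetic_returns(x)


prices = [174500, 187800, 196400, 203400]   # ATL Av Price, 2000 through 2003
counts = [3, 0, 2, 5]                       # made-up factor counts containing a zero

print(transform(prices))
print(transform(counts))
```

Either transformation shortens a series by one observation, which is why 23 yearly values yield 22 transformed values.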
Table 1 provides the p-values for the ADF tests computed on each of the transformed predictive factor series. The transformed time series for each factor is assumed stationary at a 5% significance level with four exceptions: New Homes and Central AC for ATL and Accessible for JAX and OKC. Only one (Accessible for JAX) is not significant at the 10% level.
Neither arithmetic returns nor first (nor second) differences were sufficient to achieve stationarity for the average price time series. To obtain stationarity, we resorted⁸ to fitting an AR($q$)-ARCH(1)-Student’s-$t$ model
$$r_t - \mu_r = \sum_{i=1}^{q} \varphi_i \left(r_{t-i} - \mu_r\right) + \epsilon_t, \qquad \epsilon_t = \sigma_t z_t, \qquad z_t \sim t_\nu, \qquad \sigma_t^2 = \omega + \alpha_1 \epsilon_{t-1}^2 \tag{6}$$
to the arithmetic return series $r_t = (Y_t - Y_{t-1}) / Y_{t-1}$ of the yearly average price. In (6), $t_\nu$ denotes Student’s $t$ distribution with $\nu$ degrees of freedom. A fit was judged satisfactory if the innovation series $z_t$ was determined to be stationary (as verified by the ADF test). For each city, we chose the smallest value of $q$ that produced a stationary innovation series. As detailed in Table A4 in Appendix B, the value $q = 1$ was sufficient for four cities, whereas $q = 2$ was required for the remaining four. The p-values obtained for the price innovation time series are also listed in Table 1. Four are significant at the 5% level, and the remaining four are significant at the 10% level. The resulting innovation time series $z_t$ served as the response-variable time series in the GAM and GLM fits.
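Once the parameters in (6) have been estimated, the innovation series follows from the two recursions. The sketch below shows only this filtering step, not the estimation itself. The function name `innovations`, the parameter values, and the sample returns are all hypothetical, and starting the variance recursion at the unconditional ARCH(1) variance is an assumption of this sketch rather than a detail taken from the study.

```python
import math


def innovations(r, mu, phi, omega, alpha1):
    """Recover z_t = eps_t / sigma_t from returns r under an AR(q)-ARCH(1) model.

    phi[i] multiplies the lag-(i + 1) deviation r_{t-i-1} - mu; alpha1 must be below 1.
    """
    q = len(phi)
    eps_prev = None
    z = []
    for t in range(q, len(r)):
        ar_part = sum(phi[i] * (r[t - 1 - i] - mu) for i in range(q))
        eps = (r[t] - mu) - ar_part
        if eps_prev is None:
            # Assumed starting value: the unconditional variance omega / (1 - alpha1).
            sigma2 = omega / (1.0 - alpha1)
        else:
            sigma2 = omega + alpha1 * eps_prev ** 2
        z.append(eps / math.sqrt(sigma2))
        eps_prev = eps
    return z


# Hypothetical parameters and returns, chosen so the arithmetic is easy to follow.
z = innovations([2.0, 1.0, 3.0], mu=0.0, phi=[0.5], omega=1.0, alpha1=0.5)
print(z)
```

The first q observations are consumed by the autoregressive lags, so the innovation series is shorter than the return series by q values.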

2.4. Principal Component Analysis for Additional Systematic Factors

As noted in the previous section, the GAMs and GLMs were applied to response variables consisting of “average-price innovation” time series and either arithmetic return or first-differenced transformed predictor variable time series. As a result of the transformations, each time series is reduced to 22 (rather than 23) years of observations (2001 through 2022). For each city, the difference between the AR($q$)-ARCH(1)-derived innovation and the regression model fit results in twenty-two residual error values (one per year). These residuals can be assembled in a matrix $R = [\varepsilon_{t,k}]$, $t = 1, \dots, 22$, $k = 1, \dots, 8$ ($k$ is the city index⁹). We performed a principal component analysis by computing the eigenvalues and eigenvectors of the variance–covariance matrix $R^T R$ in order to determine whether systemic factors remained in the residuals (Rachev et al. 2007). The eigenvectors correspond to the principal components (ordered in descending order). We refer to extreme value theory (de Haan and Ferreira 2006) to analyze the type of decay exhibited by the explained variances¹⁰ associated with the principal components. Consider the decaying discrete exponential distribution
$$f_1(x) = \frac{1}{\beta}\,(1 - \beta)^{(x - 1)} = \frac{1}{\beta (1 - \beta)}\, e^{x \ln(1 - \beta)}, \qquad x = 0, 1, \dots, \qquad \beta \in (0, 1), \tag{7}$$
and the decaying power-law zeta distribution
$$f_2(x) = \frac{1}{\zeta(b)}\, x^{-b}, \qquad x = 1, 2, \dots, \qquad b > 1, \tag{8}$$
where $\zeta(b)$ is the Riemann zeta function and $x$ is the index of the principal component. The relative changes with respect to $x$ in these two distributions are
$$R_1(x) = \frac{f_1(x + 1) - f_1(x)}{f_1(x)} = -\beta, \qquad R_2(x) = \frac{f_2(x + 1) - f_2(x)}{f_2(x)} = \left(\frac{x}{1 + x}\right)^{b} - 1. \tag{9}$$
As the magnitude of $R_1(x)$ is independent of $x$, each component of the exponential fit has the same relative drop in importance. On the other hand, the magnitude of $R_2(x)$ decreases as higher components are added for any $b > 1$; thus, additional components add less value to the model. Power decay suggests that noise dominates the residuals, whereas exponential decay suggests that systemic factors continue to be unaccounted for (de Haan and Ferreira 2006). If $f(x)$ represents the observed distribution of proportion of variance, plots of $\ln f(x)$ vs. $x$ compared with $\ln f(x)$ vs. $\ln(x)$ will distinguish between exponential and power-law tail behavior.
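The graphical comparison described above can also be summarized numerically. In the sketch below, a straight line is fitted to the logarithm of the proportions against the component index (exponential decay) and against the logarithm of the index (power-law decay), and the two coefficients of determination are compared. This is one simple diagnostic under stated assumptions and not necessarily the fitting procedure used to produce the figures in this paper. The function names and the synthetic proportions are hypothetical.

```python
import math


def r_squared(xs, ys):
    """Coefficient of determination of the least-squares line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    syy = sum((y - my) ** 2 for y in ys)
    return sxy * sxy / (sxx * syy)


def decay_diagnostics(proportions):
    """Return (R^2 of ln f vs x, R^2 of ln f vs ln x) for component indices 1, 2, ..."""
    index = list(range(1, len(proportions) + 1))
    log_f = [math.log(p) for p in proportions]
    semi_log = r_squared(index, log_f)                        # linear if decay is exponential
    log_log = r_squared([math.log(x) for x in index], log_f)  # linear if decay is a power law
    return semi_log, log_log


exponential = [0.5 ** x for x in range(1, 9)]   # synthetic geometric decay
power_law = [x ** -2.0 for x in range(1, 9)]    # synthetic power-law decay

print(decay_diagnostics(exponential))
print(decay_diagnostics(power_law))
```

Whichever of the two values is closer to one indicates the better description of the decay of the explained variances.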

3. Results

3.1. GLM and GAM Results

GLM and GAM fits were obtained, respectively, using the R lm (linear model) function and the gam package (Hastie 2023). Table 2 displays the p-values associated with the various factors for each city as fit by the GLM and GAM. Note that the p-values for ATL are identical under both the GAM and the GLM. For this city, the GAM P-splines simplified to linear terms and became identical to the GLM.
Because there is only a small set of factors, we evaluate the significance of each factor with a level of significance of 10%. Table 3 summarizes the number of significant factors for each city as well as the number of cities for which each factor was found to be significant. In either marginal view (i.e., by city or by factor), the number of significant quantities under the GAM equaled or exceeded that under the GLM. Notable differences in the number of significant factors occurred for COL, JAX, OKC, and POR. Increases in the significant number occurred for all five factors and particularly for New Homes and Central AC.
Table 2 also presents the adjusted $R^2$ values obtained from the model fits. These values are reflective of the marginal significance numbers summarized in Table 3. Figure 2 presents a box-and-whisker summary of the spread of the adjusted $R^2$ values for each model. The non-linear GAM produced consistently better values. The fact that some values are negative, particularly for the GLM, indicates model inappropriateness. Given the small number of predictor variables, the large adjusted $R^2$ values for the GAM were unexpected and indicate a direction for future investigation.
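To see why an adjusted $R^2$ can be negative, recall its standard definition for a linear model with an intercept, $n$ observations, and $p$ predictors. The worked case below assumes $n = 22$ and $p = 5$, matching the transformed series and the five factors of the linear fits; for the GAM, the effective degrees of freedom of the smooth terms would replace $p$, so the threshold shown applies to the linear model only.

```latex
\bar{R}^2 = 1 - \left(1 - R^2\right)\frac{n - 1}{n - p - 1},
\qquad
n = 22,\; p = 5:\quad
\bar{R}^2 = 1 - \frac{21}{16}\left(1 - R^2\right) < 0
\;\Longleftrightarrow\;
R^2 < \frac{5}{21} \approx 0.238 .
```

Under these assumptions, a linear fit that explains less than roughly 24% of the variance receives a negative adjusted value, which signals that the predictors add less than would be expected by chance.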
The results illustrate the potential for using ESG return (or first-difference) factors in modeling average home price innovation time series for cities. The GAM results indicate that such relationships are nonlinear. The nonlinear nature of a GAM is able to distinguish the predictive capability of the ESG factors (particularly the influence of central air conditioning) on average home prices. The results show conformity with the geographic locations and demographic profiles of the cities. As an example, consider the Waterfront factor. Water-body percentage (the percentage of the city area that is of a body of water) is a proxy (though not always an accurate one¹¹) for waterfront acreage. The percentage of each city’s area comprised of bodies of water is provided in Table 4.
Consider the scatterplot of the Waterfront GAM p-values versus water-body percentage shown in Figure 3. As the proxy is approximate, we look for a fuzzy relationship by dividing the plot into four quadrants. Significant occupancy in the (low, high) and (high, low) quadrants indicates a fuzzy inverse relationship between the water-body percentage and Waterfront p-value. Three cities (ATL, AUS, and OKC) occupy the (low, high) quadrant with three (COL, JAX, and NAS) occupying the (high, low) quadrant. Two cities (POR and SEA) occupy the (high, high) quadrant with POR lying very close to the (high, low) quadrant. SEA’s water-body percentage is a poor proxy for waterfront area since a large fraction of the water-body area (Puget Sound and Lake Washington) is distant from the shoreline.
Similarly, we consider the percentage of seniors living alone as a proxy for the Accessible factor. Table 4 also shows the 2010 census results on the percentage of seniors living alone in each of the eight cities, and Figure 3 shows the relevant scatter plot and quadrants. Again, three cities (ATL, AUS, and SEA) occupy the (low, high) quadrant with three (COL, JAX, and OKC) occupying the (high, low) quadrant. This indicates a fuzzy inverse relationship as established by six of the eight cities.
For comparison purposes, Figure 4 presents these quadrant plots for the GLM fits. Maintaining a p-value threshold of 0.01 leads to no meaningful inverse relationship between the Waterfront p-value and water-body percentage. One might argue that there is a fuzzy inverse relationship between the percentage of seniors living alone and the Accessible p-value in the GLM results, but this would be based upon the (high, low) quadrant occupancy of a single city (OKC).

3.2. Principal Component Analysis and Residuals Results

Table A5 and Table A6 in Appendix C provide the residual matrices $R$ for each model. Table 5 displays the proportion of variance obtained for each of the identified components for each model. The fits of the exponential and power-law decays (7) and (8) to the proportions of variance data in Table 5 are presented in Figure 5 along with the $R^2$ and mean squared error (MSE) results for each.
Visually and through the quantitative $R^2$ and MSE values, it is clear that the exponential fit does a better job of describing the data. Thus, we conclude that systemic factors not included in the models exist in the residuals. This was fully anticipated, as there was no expectation that a model for average annual home prices based only on new home constructions and four ESG factors would encompass all significant factors. Moreover, a comparison of the $R^2$ and MSE for the GAM and GLM exponential fits further supports our findings from the adjusted $R^2$ numbers and p-values that the GAM provides a model superior to the GLM.

4. Discussion

Our results demonstrate that P-spline GAMs built on ESG factors possess strong predictive capabilities for the expected value of the average annualized home sale price in major U.S. cities. The results stand in stark contrast to those of the GLMs. Although each factor in the GAM was significant for multiple cities, some factors (particularly central air conditioning) were especially prevalent. Overall, the results of the eight surveyed cities strongly suggest that the significance of ESG factors is very city-dependent.
As climate change continues to warm the planet, cities at more northerly latitudes which would otherwise not experience hotter temperatures will see higher rates of central air conditioning. Therefore, we expect that the significance of available central air conditioning will increase as the century progresses for both average annualized real estate prices and individual home prices. The cities of Columbus, Jacksonville, Nashville, Oklahoma City, and Portland each had significant p-values for the air conditioning factor. Four of these cities have annual temperature ranges in excess of forty degrees Fahrenheit and no more than two months out of the year with daily highs in excess of ninety degrees Fahrenheit. Because global temperatures have increased over the past few decades, residents in these cities will experience more days with excessive heat; thus, air conditioning units will continue to increase in demand.
Such a situation and its policy ramifications occurred in Portland and Seattle during the 2021 Western North America heat wave during late June and early July. Temperature anomalies well in excess of twenty degrees Fahrenheit above the historical average were present in these two cities. According to the U.S. Census Bureau, only 44% of homes in Seattle had air conditioning in 2019; that number increased to over 53% just two years later. Since the heat wave, the City of Seattle has enhanced its rebates and credits for air conditioning units, and these rebates and credits are particularly focused on energy-efficient and green energy implementations¹². Combined with the green energy credits of the Inflation Reduction Act passed at the federal level in 2022, it is clear that governmental financial incentives for consumers to install air conditioning units and green energy sources have encouraged many homeowners to do so.
One weakness of the current data set is the length of each time series. Yearly data points over 23 years result in first-difference, return, and innovation time series of length 22, which limits the effective sample size available for fits by the ARFIMA-GARCH-based model, GAM, and GLM. It would be better to have access to monthly data over the same 23-year time period. Unfortunately, we had no access to such data for this study. Additionally, the four selected ESG factors were the ones readily accessible from Zillow, and the incorporation of more ESG factors would have enhanced the analysis.

Author Contributions

Conceptualization, J.R.B. and S.T.R.; methodology, all authors; software, J.R.B.; validation, all authors; formal analysis, all authors; investigation, J.R.B. and W.B.L.; data curation, J.R.B.; writing—original draft preparation, J.R.B.; writing—review and editing, W.B.L.; visualization, J.R.B. and W.B.L.; supervision, S.T.R. and W.B.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data and source code utilized in this study are available upon request to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Filter Values

Table A1. Filter values used for Zillow data.
| Filter | Input | Filter | Input |
|---|---|---|---|
| Status | Sold | | |
| Price Range | MIN: USD 50k, MAX: USD 10M | | |
| Bedrooms | 1+ | | |
| Bathrooms | 1+ | | |
| Home Type | Houses, Townhomes, Multi-Family, and Condos/Co-ops | | |
| More Filters | | | |
| Max HOA | Any | Must have A/C | ESG ᵇ |
| Parking Spots | Any | Must have pool | NS |
| Square Feet | MIN: 500, MAX: NS ᵃ | Waterfront | ESG ᵇ |
| Lot Size | MIN: NS, MAX: NS | City | NS |
| Year Built | MIN: 2000, MAX: 2022 | Mountain | NS |
| Has basement | NS | Park | NS |
| Single-story only | ACC ᶜ | Water | NS |
| Hide 55+ communities | NS | Sold in Last | 36 months |
| Keywords | | | |
| “Green”, “Green Home” | ESG | “Accessible” | ACC |
ᵃ NS = not specified. ᵇ “Yes” when filtering for those houses and “NS” otherwise. ᶜ Single-story only and/or classified as “Accessible”.

Appendix B. Analysis of Stationarity

Table A2 provides the price and factor data for the city of Atlanta. Figure A1 plots these 23-year time series. Visual inspection shows clear, non-stationary trends in each time series. The p-values obtained from the ADF test run on each of these time series are provided in Table A3. All p-values are strongly indicative of non-stationary time series. A common method used to transform a non-stationary time series $x_t$ into a stationary one is through first differences $\Delta x_t = x_t - x_{t-1}$ or, alternatively, arithmetic returns $r_t = \Delta x_t / x_{t-1}$¹³. The use of arithmetic returns is preferred because applying them to different time series produces transformed series of comparable magnitudes. Figure A2 plots the time series of Figure A1 in terms of their arithmetic returns. Visually, the trends are eliminated or vastly reduced. Table A3 presents the resultant p-values from the ADF test. All are vastly improved, which indicates stationarity (at a threshold significance of 7.5%) with the exception of the price series. Fits of an AR($q$)-ARCH(1)-Student’s $t$ model to the price series for ATL produced the best results for $q = 2$. The resultant innovation time series is shown in Figure A3, and the p-value from the ADF test for the innovation time series is provided in Table A3.
Table A2. Price and factor data for ATL.
Year | Av Price | New Homes | Accessible | Central AC | Green | Waterfront
2000174,500456434542031
2001187,800718857163838
2002196,4009191128734679
2003203,400649386152643
2004211,7001267139123540107
2005222,0001842140181555142
2006229,2001775171171837139
2007233,800145112413863699
2008225,5009161459021952
2009212,000427414013074
2010195,600330463054731
2011180,5008928875
2012172,9001051114614
2013183,4001682174711
2014200,9001673182138
2015216,60028352771614
2016232,40031953051123
2017249,10040463941226
2018269,60045124322528
2019286,400629118233126
2020303,2001225319683275
2021351,30010915410195588
2022430,000740297605167
Figure A1. Time series for year-averaged home sale price (Av Price) and the factors New Homes, Accessible, Central AC, Green, and Waterfront for the city of Atlanta for the years 2000 through 2022. (Source: Zillow).
Table A3. Significance (p-value) of the time series for ATL.
| Factor | New Homes | Accessible | Central AC | Green | Waterfront | Price |
|---|---|---|---|---|---|---|
| raw data (Figure A1) | 0.729 | 0.368 | 0.757 | 0.544 | 0.632 | >0.990 |
| arithmetic return | 0.074 | 0.012 | 0.069 | ** | ** | 0.943 |
| AR(2)-ARCH(1) innovation | na (a) | na | na | na | na | 0.020 |
** indicates a p-value < 0.01. (a) Not applicable.
Figure A2. Arithmetic return series for the time series displayed in Figure A1.
Figure A3. Price innovation time series for ATL.
Table A4 summarizes the transformations that were used on each time series, and Table 1 summarizes the p-values obtained from the ADF tests.
Table A4. Type of transformed series used in the GAM and GLM fits.
| ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|
| rtn (a) | rtn | rtn | rtn | rtn | rtn | rtn | rtn |
| rtn | rtn | rtn | rtn | rtn | rtn | rtn | rtn |
| rtn | rtn | fd (b) | rtn | rtn | rtn | rtn | rtn |
| rtn | rtn | rtn | rtn | rtn | rtn | rtn | fd |
| fd | fd | fd | fd | fd | fd | fd | fd |
| q = 2 (c) | q = 1 (d) | q = 1 | q = 2 | q = 1 | q = 1 | q = 2 | q = 2 |
(a) rtn: arithmetic return. (b) fd: first difference. (c) AR(2)-ARCH(1). (d) AR(1)-ARCH(1).

Appendix C. GLM and GAM Residuals

Table A5. Residual values (USD) for the GLM fit on each city by year.
| Year | ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|---|
| 2001 | 0.068 | −0.229 | −0.082 | 0.050 | −0.101 | 0.001 | 0.218 | −0.025 |
| 2002 | −0.793 | −0.282 | −0.086 | 0.292 | −0.540 | −0.207 | −0.591 | −0.655 |
| 2003 | 0.246 | −0.608 | 0.394 | 0.244 | −0.332 | 0.253 | −0.100 | 0.773 |
| 2004 | −0.191 | 0.057 | −0.079 | 0.341 | −0.088 | −0.043 | 0.826 | 0.755 |
| 2005 | −0.411 | −0.034 | −0.004 | 0.005 | 0.828 | 0.322 | 1.577 | 0.700 |
| 2006 | −0.433 | −0.047 | −0.352 | −0.271 | 0.112 | 0.206 | −0.020 | −0.815 |
| 2007 | −0.053 | 0.251 | −0.089 | −0.198 | −0.481 | −0.577 | −0.196 | −0.780 |
| 2008 | −0.887 | −0.151 | 0.016 | −0.056 | −0.761 | −0.089 | 0.259 | −1.170 |
| 2009 | −0.416 | −0.354 | −0.188 | 0.289 | −0.243 | −0.485 | 0.294 | −0.067 |
| 2010 | −0.292 | −0.145 | −0.239 | −0.161 | 0.234 | −0.253 | −0.150 | −0.036 |
| 2011 | 0.319 | −0.039 | −0.341 | −0.171 | 0.319 | −0.024 | −0.699 | −0.275 |
| 2012 | 0.348 | 0.452 | 0.192 | −0.099 | 0.085 | 0.202 | −0.054 | 0.710 |
| 2013 | 1.857 | 0.326 | 0.333 | −0.018 | 0.137 | −0.407 | 0.180 | 0.150 |
| 2014 | −0.630 | 0.149 | 0.391 | −0.119 | 0.246 | 0.675 | −0.488 | −1.296 |
| 2015 | −0.948 | 0.072 | −0.092 | −0.300 | 0.616 | −0.312 | −0.718 | 0.347 |
| 2016 | 0.027 | −0.077 | −0.087 | 0.142 | −0.003 | 0.036 | 0.214 | −0.013 |
| 2017 | −0.151 | 0.228 | 0.219 | −0.158 | 0.406 | 0.070 | −0.116 | 0.921 |
| 2018 | −0.295 | −0.158 | −0.221 | 0.094 | −0.031 | 0.117 | −0.268 | −0.511 |
| 2019 | −0.387 | −0.026 | −0.156 | −0.006 | −0.498 | 0.273 | −0.561 | 0.225 |
| 2020 | 0.340 | −0.115 | −0.086 | −0.053 | −0.164 | −0.038 | −0.022 | 0.131 |
| 2021 | 2.152 | 0.657 | 0.259 | 0.189 | 0.148 | 0.144 | 0.717 | 1.027 |
| 2022 | 0.530 | 0.073 | 0.298 | −0.038 | 0.111 | 0.134 | −0.302 | −0.096 |
Table A6. Residual values (USD) for the GAM fit on each city by year.
| Year | ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|---|
| 2001 | 0.068 | −0.147 | 0.037 | −0.048 | −0.296 | 0.159 | −0.006 | 0.184 |
| 2002 | −0.793 | −0.530 | −0.232 | 0.119 | −0.463 | −0.369 | −0.052 | −0.686 |
| 2003 | 0.246 | −0.321 | 0.141 | 0.116 | −0.559 | −0.220 | −0.295 | 0.935 |
| 2004 | −0.191 | 0.195 | −0.126 | 0.601 | −0.304 | 0.000 | 1.535 | 0.945 |
| 2005 | −0.411 | 0.195 | −0.186 | −0.203 | 0.857 | 0.530 | 1.807 | 0.884 |
| 2006 | −0.433 | −0.019 | −0.591 | −0.293 | 0.045 | −0.481 | −0.784 | −0.704 |
| 2007 | −0.053 | 0.057 | −0.454 | −0.867 | −0.606 | −0.317 | −0.926 | −1.442 |
| 2008 | −0.887 | −0.190 | −0.397 | −0.025 | −0.967 | −0.607 | −0.414 | −1.151 |
| 2009 | −0.416 | −0.442 | −0.035 | 0.345 | −0.951 | −0.430 | −0.378 | −0.255 |
| 2010 | −0.292 | −0.404 | −0.195 | −0.034 | −0.164 | −0.536 | 0.167 | −0.122 |
| 2011 | 0.319 | −0.103 | −0.149 | 0.145 | 0.026 | −0.513 | −1.677 | −0.200 |
| 2012 | 0.348 | 0.443 | 0.392 | 0.015 | −0.324 | 0.654 | 0.363 | 0.707 |
| 2013 | 1.857 | 0.081 | 0.461 | 0.092 | 0.032 | −0.606 | 0.299 | 0.463 |
| 2014 | −0.630 | 0.172 | 0.516 | −0.078 | 0.688 | 0.971 | −0.090 | −1.100 |
| 2015 | −0.948 | −0.063 | 0.094 | −0.279 | 0.670 | −0.017 | 0.061 | 0.452 |
| 2016 | 0.027 | −0.220 | −0.370 | 0.329 | −0.141 | −0.481 | 0.483 | −0.051 |
| 2017 | −0.151 | −0.066 | 0.200 | −0.200 | 0.331 | −0.061 | −0.271 | 0.853 |
| 2018 | −0.295 | −0.450 | −0.053 | 0.009 | −0.348 | −0.383 | −0.289 | −0.469 |
| 2019 | −0.387 | 0.130 | −0.015 | −0.127 | −0.073 | 0.898 | −0.006 | 0.475 |
| 2020 | 0.340 | 0.036 | −0.113 | −0.051 | −0.308 | 0.032 | −0.385 | −0.006 |
| 2021 | 2.152 | 1.406 | 0.558 | 0.509 | 1.370 | 0.217 | 1.952 | 0.949 |
| 2022 | 0.530 | 0.240 | 0.517 | −0.044 | 1.483 | 1.560 | −1.092 | −0.660 |

Notes

1. Data from https://www.zillow.com/homes/ were collected by specifying the city in the search field and then the entries for all filters as provided in Table A1 in Appendix A.
2. Home types considered are specified in the appropriate filter in Table A1.
3. Note that the data apply only to homes constructed within the city boundaries and not to homes within the associated Metropolitan Statistical Area.
4. Specifically, three of the factors are environmental and one (accessibility) is social, although all four are often influenced by local policies.
5. Restated in the context of the P-spline-based GAM and the GLM used here, extrapolation using polynomials is much less accurate than extrapolation using a linear least-squares fit.
6. In (5), we use a generic notation $y_t$ to denote the time series being tested.
7. $SE(\cdot)$ denotes standard error.
8. We tested a variety of ARFIMA-GARCH models before settling on AR(q)-ARCH(1) with q = 1, 2. We desired an ARFIMA-GARCH model that was as parsimonious as possible in the number of coefficients to be fit.
9. We index the cities in alphabetical order.
10. The explained variance associated with each principal component is the ratio of its eigenvalue to the sum of all eigenvalues.
11. If two cities border a body of water in the United States, then the common city boundary often divides the body of water along a line medial to the city shorelines. Thus, two or more cities bordering a large and contained body of water can have large water-body percentages but relatively short shorelines.
12.
13. Higher-order differences may be required if the time series is integrated of an order higher than one.

References

  1. Bailey, Jason R., Davide Lauria, W. Brent Lindquist, Stefan Mittnik, and Svetlozar T. Rachev. 2022. Hedonic models of real estate prices: GAM models; environmental and sex-offender-proximity factors. Journal of Risk and Financial Management 15: 601.
  2. Contat, Justin, Carrie Hopkins, Luis Mejia, and Matthew Suandi. 2023. When Climate Meets Real Estate: A Survey of the Literature; Working Paper 23-05; Washington, DC: Federal Housing Finance Agency.
  3. de Haan, Laurens, and Ana Ferreira. 2006. Extreme Value Theory: An Introduction. Berlin: Springer.
  4. Eilers, Paul H. C., and Brian D. Marx. 1996. Flexible smoothing with B-splines and penalties. Statistical Science 11: 89–121.
  5. Hastie, T. 2023. Package ‘gam’. (V. 1.22-3). Available online: https://cran.r-project.org/web/packages/gam/gam.pdf (accessed on 3 July 2024).
  6. Lauper, Elisabeth, Susanne Bruppacher, and Ruth Kaufmann-Hayoz. 2013. Energy-relevant decisions of home buyers in new home construction. Umweltpsychologie 17: 109–23.
  7. Lavaine, Emmanuelle. 2019. Environmental risk and differentiated housing values: Evidence from the north of France. Journal of Housing Economics 44: 74–87.
  8. Liao, Wen-Chi, and Xizhu Wang. 2012. Hedonic house prices and spatial quantile regression. Journal of Housing Economics 21: 16–27.
  9. Mahanama, Thilini, Abootaleb Shirvani, and Svetlozar T. Rachev. 2021. A natural disasters index. Environmental Economics and Policy Studies 24: 263–84.
  10. Ma, Junhai, Aili Hou, and Yi Tian. 2019. Research on the complexity of green innovative enterprise in dynamic game model and governmental policy making. Chaos, Solitons & Fractals: X 2: 1000008.
  11. Rachev, Svetlozar T., Stefan Mittnik, Frank J. Fabozzi, Sergio M. Focardi, and Teo Jašić. 2007. Financial Econometrics. New York: Wiley.
  12. US Census Gazetteer Files, United States Census Bureau. 2023. Available online: https://www.census.gov/geographies/reference-files/time-series/geo/gazetteer-files.2023.html (accessed on 14 January 2023).
  13. US Census, United States Census Bureau. 2010. Available online: https://www.census.gov/content/dam/Census/library/publications/2011/dec/c2010br-09.pdf (accessed on 17 September 2023).
  14. Zahid, R. M. Ammar, Adil Saleem, and Umer Sahil Maqsood. 2023. ESG performance, capital financing decisions, and audit quality: Empirical evidence from Chinese state-owned enterprises. Environmental Science and Pollution Research 30: 44086–99.
  15. Zahid, R. M. Ammar, Muhammad Kaleem Khan, Umer Sahil Maqsood, and Marina Nazir. 2024. Environmental, social, and governance performance analysis of financially constrained firms: Does executives’ managerial ability make a difference? Managerial and Decision Economics 45: 2751–66.
  16. Zeitz, Joachim, Emily Norman Zietz, and G. Stacy Sirmans. 2007. Determinants of House Prices: A Quantile Regression Approach. The Journal of Real Estate Finance and Economics 37: 317–33.
  17. Zeitz, Joachim, G. Stacy Sirmans, and Greg T. Smersh. 2008. The Impact of Inflation on Home Prices and the Valuation of Housing Characteristics Across the Price Distribution. Journal of Housing Research 17: 119–38.
Figure 1. The eight selected cities plotted on a map of the continental United States.
Figure 2. Box-and-whisker summary of the adjusted $R^2$ values of Table 2 for the GLM and GAM fits.
Figure 3. (Left) Waterfront p-value versus water-body percentage. (Right) Accessible p-value versus percentage of seniors living alone for the GAM fits.
Figure 4. (Left) Waterfront p-value versus water-body percentage. (Right) Accessible p-value versus percentage of seniors living alone for the GLM fits.
Figure 5. (Left) Exponential and (right) power-law fits to the proportions of variance obtained for eight components arising from the principal component analysis for the GLM and GAM fits to all eight cities.
Table 1. ADF test p-values for each transformed time series by city.
| Factor | ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|---|
| New Homes | 0.074 | ** | 0.017 | 0.034 | ** | 0.021 | ** | 0.014 |
| Central AC | 0.069 | 0.015 | 0.013 | 0.032 | 0.012 | 0.036 | ** | ** |
| Green | ** | ** | ** | ** | ** | ** | ** | ** |
| Accessible | 0.012 | ** | ** | 0.244 | ** | 0.076 | ** | ** |
| Waterfront | 0.015 | ** | ** | ** | ** | ** | ** | ** |
| Av Price Innovations | 0.020 | ** | 0.036 | 0.053 | 0.095 | 0.055 | 0.020 | 0.097 |
** indicates a p-value < 0.01.
Table 2. Significance (p-value) of the factors in the GLM and GAM fits.
| Model | Factor | ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|---|---|
| GLM | New Homes | 0.747 | 0.189 | 0.184 | 0.103 | 0.025 | 0.515 | 0.176 | 0.632 |
| GLM | Accessible | 0.467 | 0.994 | 0.169 | 0.315 | 0.585 | ** | 0.353 | 0.320 |
| GLM | Central AC | 0.594 | 0.234 | 0.169 | 0.117 | 0.024 | 0.700 | 0.550 | 0.879 |
| GLM | Green | 0.500 | 0.100 | 0.249 | 0.633 | 0.247 | 0.116 | 0.191 | 0.457 |
| GLM | Waterfront | 0.629 | 0.975 | 0.838 | 0.807 | 0.041 | 0.929 | 0.855 | 0.242 |
| GLM | Adj. $R^2$ | 0.167 | 0.061 | 0.144 | 0.159 | 0.218 | 0.504 | 0.525 | 0.226 |
| GAM | New Homes | 0.747 | 0.151 | 0.100 | 0.017 | 0.091 | 0.152 | 0.017 | 0.555 |
| GAM | Accessible | 0.467 | 0.945 | 0.031 | 0.021 | 0.677 | ** | 0.720 | 0.169 |
| GAM | Central AC | 0.594 | 0.240 | 0.063 | 0.027 | 0.085 | 0.015 | 0.032 | 0.997 |
| GAM | Green | 0.500 | 0.019 | 0.356 | 0.188 | 0.102 | ** | 0.073 | 0.363 |
| GAM | Waterfront | 0.629 | 0.646 | 0.069 | 0.085 | ** | 0.984 | 0.123 | 0.462 |
| GAM | Adj. $R^2$ | 0.167 | 0.388 | 0.518 | 0.560 | 0.703 | 0.855 | 0.468 | 0.349 |
** indicates a p-value < 0.01.
Table 3. Summary of marginal significances in Table 2 using a p-value threshold of 10%.
Number of significant factors
| Model | ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|---|
| GLM | 0 | 1 | 0 | 0 | 3 | 1 | 0 | 0 |
| GAM | 0 | 1 | 4 | 4 | 3 | 3 | 3 | 0 |
Number of cities for which a factor is significant
| Model | New Homes | Accessible | Central AC | Green | Waterfront |
|---|---|---|---|---|---|
| GLM | 1 | 1 | 1 | 1 | 1 |
| GAM | 4 | 3 | 5 | 3 | 3 |
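The counts in Table 3 can be recomputed from the p-values in Table 2. The sketch below shows one way to do this in Python, under two assumptions made here for illustration rather than stated in the paper: entries reported as ** (p < 0.01) are stored as the placeholder value 0.005, and a factor counts as significant when p ≤ 0.10, so that the borderline 0.100 entries are included. All names in the code are hypothetical.

```python
CITIES = ["ATL", "AUS", "COL", "JAX", "NAS", "OKC", "POR", "SEA"]
THRESHOLD = 0.10  # 10% threshold from the Table 3 caption (inclusive here by assumption)
LT = 0.005        # assumed placeholder for entries reported as "**" (p < 0.01)

# p-values transcribed from Table 2; each list follows the order of CITIES.
P_VALUES = {
    "GLM": {
        "New Homes":  [0.747, 0.189, 0.184, 0.103, 0.025, 0.515, 0.176, 0.632],
        "Accessible": [0.467, 0.994, 0.169, 0.315, 0.585, LT,    0.353, 0.320],
        "Central AC": [0.594, 0.234, 0.169, 0.117, 0.024, 0.700, 0.550, 0.879],
        "Green":      [0.500, 0.100, 0.249, 0.633, 0.247, 0.116, 0.191, 0.457],
        "Waterfront": [0.629, 0.975, 0.838, 0.807, 0.041, 0.929, 0.855, 0.242],
    },
    "GAM": {
        "New Homes":  [0.747, 0.151, 0.100, 0.017, 0.091, 0.152, 0.017, 0.555],
        "Accessible": [0.467, 0.945, 0.031, 0.021, 0.677, LT,    0.720, 0.169],
        "Central AC": [0.594, 0.240, 0.063, 0.027, 0.085, 0.015, 0.032, 0.997],
        "Green":      [0.500, 0.019, 0.356, 0.188, 0.102, LT,    0.073, 0.363],
        "Waterfront": [0.629, 0.646, 0.069, 0.085, LT,    0.984, 0.123, 0.462],
    },
}


def factors_per_city(table):
    """Count, for each city, the factors with p-value at or below THRESHOLD."""
    return [
        sum(row[i] <= THRESHOLD for row in table.values())
        for i in range(len(CITIES))
    ]


def cities_per_factor(table):
    """Count, for each factor, the cities with p-value at or below THRESHOLD."""
    return {factor: sum(p <= THRESHOLD for p in row) for factor, row in table.items()}


for model, table in P_VALUES.items():
    print(model, "factors per city:", dict(zip(CITIES, factors_per_city(table))))
    print(model, "cities per factor:", cities_per_factor(table))
```

With the inclusive comparison, the printed counts agree with both halves of Table 3; a strict comparison (p < 0.10) would drop the two 0.100 entries (Green for AUS in the GLM and New Homes for COL in the GAM).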
Table 4. Percentage of water area and percentage of seniors living alone, by city.
| | ATL | AUS | COL | JAX | NAS | OKC | POR | SEA |
|---|---|---|---|---|---|---|---|---|
| Water Area (a) | 0.7 | 2.0 | 2.6 | 14.5 | 4.2 | 2.3 | 7.9 | 40.9 |
| Seniors (b) | 3.8 | 4.6 | 7.2 | 7.9 | 8.2 | 13.4 | 9.0 | 4.1 |
(a) Source: US Census Gazetteer (2023). (b) Source: US Census (2010).
Table 5. Proportion of explained variance by principal component (PC).
| Model | PC 1 | PC 2 | PC 3 | PC 4 | PC 5 | PC 6 | PC 7 | PC 8 |
|---|---|---|---|---|---|---|---|---|
| GLM | 0.453 | 0.205 | 0.111 | 0.087 | 0.061 | 0.041 | 0.028 | 0.014 |
| GAM | 0.319 | 0.209 | 0.144 | 0.138 | 0.075 | 0.050 | 0.039 | 0.028 |
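Figure 5 refers to exponential and power-law fits to the proportions in Table 5. The fitting procedure is not described in this part of the article, so the sketch below is only one plausible approach: ordinary least squares on log-transformed values, written in plain Python. The functional forms $v_k = a e^{-bk}$ and $v_k = a k^{-b}$ for the proportion $v_k$ of component $k$, and all function names, are assumptions made for illustration.

```python
import math

# Proportions of explained variance for PC 1 through PC 8, from Table 5.
PROPORTIONS = {
    "GLM": [0.453, 0.205, 0.111, 0.087, 0.061, 0.041, 0.028, 0.014],
    "GAM": [0.319, 0.209, 0.144, 0.138, 0.075, 0.050, 0.039, 0.028],
}


def ols_line(xs, ys):
    """Return (intercept, slope) of the least-squares line y = c0 + c1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def fit_exponential(v):
    """Fit v_k ~ a * exp(-b * k), k = 1..len(v), by regressing log(v_k) on k."""
    ks = list(range(1, len(v) + 1))
    c0, c1 = ols_line(ks, [math.log(p) for p in v])
    return math.exp(c0), -c1


def fit_power_law(v):
    """Fit v_k ~ a * k**(-b), k = 1..len(v), by regressing log(v_k) on log(k)."""
    log_ks = [math.log(k) for k in range(1, len(v) + 1)]
    c0, c1 = ols_line(log_ks, [math.log(p) for p in v])
    return math.exp(c0), -c1


for model, v in PROPORTIONS.items():
    a_exp, b_exp = fit_exponential(v)
    a_pow, b_pow = fit_power_law(v)
    print(f"{model}: exponential a = {a_exp:.3f}, b = {b_exp:.3f}; "
          f"power law a = {a_pow:.3f}, b = {b_pow:.3f}")
```

Fitting in log space weights relative rather than absolute errors, so the small proportions of the later components influence the fit as much as the large leading ones. A nonlinear least-squares fit on the untransformed proportions would generally give different coefficients.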
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Bailey, J.R.; Lindquist, W.B.; Rachev, S.T. Hedonic Models Incorporating Environmental, Social, and Governance Factors for Time Series of Average Annual Home Prices. J. Risk Financial Manag. 2024, 17, 375. https://doi.org/10.3390/jrfm17080375

