In the empirical section, we first present the data used in the study. In step 1, we pretest the data for outliers, and in step 2 we present descriptive statistics. In step 3, propensity scores are estimated and two different matching approaches are applied, before the final hedonic price equation is estimated in step 4 using Stata. In step 5, the capitalization effect is estimated both on average and across the price distribution. In step 6, we test the robustness of the estimates by estimating a spatial lag model and a spatial drift model. Finally, in step 7, we test whether the capitalization is higher in the northern part of Sweden.
Data
We used data from Sweden covering the period 2013 to 2018. In all, we were able to use more than 100,000 arm’s-length sales of single-family houses across all municipalities in Sweden. The data are from Mäklarstatistik AB, an association of brokers that covers about 80 percent of all brokered properties.
The dependent variable is the natural logarithm of the house price, based on the transaction price at the time the contract was signed. In addition to price (restricted to prices above SEK 340,000, equivalent to about USD 35,000), the data include the independent variables EPC, living area (less than 300 square meters), number of rooms (fewer than 10), plot size (larger than 1000 square meters and smaller than 25,220 square meters), age, and contract date (sale year after 2012), as well as location measured by longitude and latitude.
Our key variable is EPC grade. To make our results reasonably comparable to earlier studies, we tested the hypothesis of a price premium for EPC grades A–C relative to the default grades D–G. That is to say, A–C is the treatment group and D–G is the control group, in the terminology of the authors of [24]. Hence, the variable EPC is a binary variable that equals one for the treatment group and zero for the control group.
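The sample restrictions and the treatment coding described above can be sketched as follows. This is a minimal illustration, not the authors' code (the estimations in the paper were run in Stata); the class name `Sale` and its field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Sale:
    """One transaction; all field names are hypothetical."""
    price_sek: float
    living_area_m2: float
    rooms: int
    plot_m2: float
    sale_year: int
    epc_grade: str  # one of "A" to "G"


def in_sample(s: Sale) -> bool:
    """Apply the sample restrictions listed in the Data section."""
    return (
        s.price_sek > 340_000
        and s.living_area_m2 < 300
        and s.rooms < 10
        and 1_000 < s.plot_m2 < 25_220
        and s.sale_year > 2012
    )


def epc_treated(grade: str) -> int:
    """Binary EPC variable: 1 for grades A-C (treatment), 0 for D-G (control)."""
    return 1 if grade.upper() in {"A", "B", "C"} else 0
```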
Step 1: Pretest of data
A preliminary hedonic price equation is estimated, and the regression observation weights are calculated. As noted above, the natural logarithm of the price is the dependent variable, and the included covariates are EPC (a binary variable equal to one for grades A–C), the natural logarithms of living area and age, number of rooms, plot size, and location, as well as month and municipality fixed effects. The coefficients of the default hedonic model are presented in Table 3.
Figure 1 shows the distribution of the estimated regression observation weights.
The average regression weight is equal to 0.9, and the median is equal to 0.96. The standard deviation of the regression weights is equal to 0.13. At the percentile 0.25, the regression weight is equal to 0.75, which we used as a cut-off value. After deleting potential outliers (around 20,000 observations, or 20 percent of the original data), the data set consisted of almost 83,000 transacted single-family houses. Around 19,000 of them were treated, that is, had an EPC grade of A–C, and around 63,000 were in the control group.
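The text does not state how the regression observation weights were computed. As a hedged sketch, the code below assumes Huber-type weights derived from the residuals of the preliminary equation, with the residual scale estimated from the median absolute deviation, and then keeps only observations whose weight is at least the 0.75 cut-off. The tuning constant `k = 1.345` and the function names are assumptions made for illustration only.

```python
import statistics


def huber_weights(residuals, k=1.345):
    """Huber-type weights: 1 for small scaled residuals, k/|u| for large ones.

    The scale is the median absolute deviation divided by 0.6745
    (an assumed, conventional choice).
    """
    med = statistics.median(residuals)
    mad = statistics.median(abs(r - med) for r in residuals)
    scale = mad / 0.6745
    weights = []
    for r in residuals:
        u = abs(r) / scale if scale > 0 else 0.0
        weights.append(1.0 if u <= k else k / u)
    return weights


def drop_low_weight(rows, weights, cutoff=0.75):
    """Keep observations whose regression weight is at least the cut-off."""
    return [row for row, w in zip(rows, weights) if w >= cutoff]
```

With this rule, an observation whose residual is far from the bulk of the data receives a weight close to zero and is removed before the main estimation.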
Step 2: Descriptive analysis
Figure 2 shows the distribution of EPC grades in the sample. The average grade is equal to D, and the median grade is equal to E. There are relatively few houses with grade A in the sample, around 1 percent, with an additional 6 percent at grade B. However, among houses younger than 10 years, more than 3 percent have grade A, and almost 19 percent have grade B. The median grade among homes built in the last 10 years is C. This highlights the potential selection bias in the sample and motivates the use of the propensity score method.
Table 1 shows descriptive statistics for the dependent variable and the independent variables for the treatment group, the control group, and the full sample.
The price difference between the treatment group and the control group is small and not statistically significant. On the other hand, some of the housing attributes differ substantially between the two groups. For example, the average living area in the treated sample is around 141 square meters, compared to only 130 square meters in the control group. We can also observe that the average age is lower in the treatment group. That is, a newer, larger house is more likely to have a high EPC rating than an older, smaller one. This selection bias can potentially affect the estimated EPC effect on housing prices.
Step 3: Propensity score method
In the first step of the propensity score matching, we estimated the propensity score with a logistic regression in which treatment is the dependent variable and the independent variables are a set of housing attributes, such as living area, age, plot size, and number of rooms, as well as location attributes, such as longitude, latitude, municipality, and county. Time is also included in the propensity score model.
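To make the first step concrete, the sketch below fits a logistic regression by plain gradient ascent and returns the estimated propensity score, that is, the predicted probability of belonging to the treatment group. It is an illustration under simplifying assumptions (a few standardized covariates, no fixed effects, a fixed learning rate), not the estimation routine used in the paper; the function names are hypothetical.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logit(X, t, lr=0.1, n_iter=5000):
    """Maximise the logistic log-likelihood by gradient ascent.

    X: list of covariate rows (assumed standardized), t: list of 0/1 treatment flags.
    Returns [intercept, coefficient_1, ..., coefficient_k].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(n_iter):
        grad = [0.0] * (k + 1)
        for xi, ti in zip(X, t):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = ti - p
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity(beta, xi):
    """Estimated probability of treatment for one covariate row."""
    return sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
```

For example, with standardized age as the only covariate and younger houses more often treated, the fitted age coefficient is negative and younger houses receive higher propensity scores.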
Figure 3 displays the propensity score of the logistic regression.
The explanatory power is on the low side. However, the primary goal of the propensity score model is not to maximize the goodness of fit; the goal is to make the differences between treated and untreated observations as small as possible. The authors of [27] proposed to begin by comparing the similarity of the groups in the matched sample in terms of means, medians, and standard deviations. In the following step, we therefore applied propensity score matching. As noted earlier, we performed two different types of matching, namely nearest-neighbor matching and radius matching.
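The two matching approaches can be sketched as follows. The code assumes one-to-one nearest-neighbor matching with replacement and a radius of 0.05 on the propensity score scale; neither detail is stated in the text, so both are illustrative assumptions, as are the function names.

```python
def nearest_neighbor_match(treated, control):
    """Match each treated unit to the control unit with the closest score.

    treated, control: dicts mapping a unit id to its propensity score.
    Matching is with replacement (an assumption).
    """
    matches = {}
    for tid, ps in treated.items():
        matches[tid] = min(control, key=lambda cid: abs(control[cid] - ps))
    return matches


def radius_match(treated, control, radius=0.05):
    """Match each treated unit to all control units within the given radius.

    Treated units without any control unit inside the radius are dropped.
    """
    matches = {}
    for tid, ps in treated.items():
        within = [cid for cid, cps in control.items() if abs(cps - ps) <= radius]
        if within:
            matches[tid] = within
    return matches
```

Radius matching can discard treated observations that have no sufficiently similar control observation, which is one reason a matched sample can be much smaller than the original one.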
Table 2 shows the average values of the main attributes for the treated and control groups.
The first thing we can observe is that the differences between the matching methods are small; both produce almost the same matched transactions. Moreover, the number of treated transactions is nearly the same, and the control group is slightly smaller than the treated group. The total sample of matched transactions is considerably smaller than the total number of observations that we initially used. The differences in average values of the housing attributes are small, and none of them are statistically significant at the 5 percent significance level. The contrast between the matched sample and the unmatched sample is substantial. Compared with the statistics in Table 1, the matching procedure controls for all differences. Based on this, we are confident that we have balanced the data set effectively. Concerning the price, the difference in average price between the treated and the control group is not significant and, more surprisingly, has the wrong sign.
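A balance check of the kind reported in Table 2 can be sketched with a two-sample t statistic for the difference in means. The version below uses Welch's unequal-variance formula and the large-sample critical value 1.96 for the 5 percent level; both are assumptions, since the text does not specify which test was applied.

```python
import math
import statistics


def welch_t(a, b):
    """Welch t statistic for the difference in means between two samples."""
    va, vb = statistics.variance(a), statistics.variance(b)
    se = math.sqrt(va / len(a) + vb / len(b))
    return (statistics.fmean(a) - statistics.fmean(b)) / se


def is_balanced(a, b, critical=1.96):
    """True if the mean difference is not significant at roughly the 5 percent level."""
    return abs(welch_t(a, b)) < critical
```

Applied attribute by attribute (living area, age, plot size, and so on), a well-matched sample should pass this check for every covariate.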
Step 4: Hedonic price equation
The next step is to run the hedonic regressions. The default model is a model in which outliers and selection bias are uncontrolled. The second and third models control for outliers and selection bias by using the influence weights and by entering the propensity score directly into the hedonic price equation. We first included the propensity score as a covariate (Multivariate 1) and then used the inverse propensity score as sample weights (Multivariate 2). Next, we controlled for selection bias by using only the matched sample, obtained by nearest-neighbor matching (Matched 1) and by radius matching (Matched 2). Finally, we controlled for selection bias by using the estimated strata as fixed effects (Stratified).
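Two of these adjustments can be illustrated with a short sketch. The weighting function assumes the common inverse-probability form, 1/p for treated and 1/(1 - p) for control observations, and the strata are assumed to be equal-width bins of the propensity score; the text does not spell out either construction, so both are illustrative assumptions.

```python
def ipw_weight(treated: int, ps: float) -> float:
    """Inverse propensity score weight (assumed form).

    Treated units get 1/p, control units get 1/(1 - p).
    """
    return 1.0 / ps if treated == 1 else 1.0 / (1.0 - ps)


def stratum(ps: float, n_strata: int = 5) -> int:
    """Assign a propensity score to one of n equal-width strata (0-based)."""
    return min(int(ps * n_strata), n_strata - 1)
```

The weights would enter the hedonic regression as sample weights, while the stratum index would enter as a set of fixed effects.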
Table 3 shows the results from all the above models. All models were estimated as log-linear models in which the natural logarithm of the price is the dependent variable; the independent variables living area and age are also in natural logarithms, while the remaining independent variables are untransformed.
Overall goodness-of-fit (adjusted R²) is around 75–87 percent, which can be considered excellent and is comparable to other hedonic price studies. The difference between the default model and the models controlling for outliers and selection bias is substantial. The estimated coefficients for living area, number of rooms, and age are of reasonable magnitude, and they are all statistically significant. For example, one extra room is expected to increase the price by around 3 percent, and a single-family house that is one year older is expected to sell for 0.16 percent less. The estimates are also robust across specifications of the hedonic model. The estimated EPC premium is lowest in the Matched 2 model, at 3.36 percent (coefficient 0.033), compared with 5.27 percent (coefficient 0.0514) in the Default model. That is equivalent to almost SEK 100,000 (Euro 9,278) and approximately SEK 200,000 (Euro 18,556), respectively. Hence, failure to consider outliers and potential selection problems can have significant consequences and policy implications.
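The percentages quoted for the EPC variable follow from the usual transformation of a dummy-variable coefficient in a log-linear model, 100 × (exp(b) − 1). A short sketch reproduces the two figures from the reported coefficients; the function name is hypothetical.

```python
import math


def pct_effect(coef: float) -> float:
    """Percentage price effect of a dummy variable in a log-linear model."""
    return 100.0 * (math.exp(coef) - 1.0)


default_model = round(pct_effect(0.0514), 2)  # Default model coefficient
matched_2 = round(pct_effect(0.033), 2)       # Matched 2 model coefficient
```

For small coefficients the transformation changes little, which is why the percentage effect stays close to 100 times the coefficient.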
The low VIF (variance inflation factor) value for our EPC variable indicates that multicollinearity is a modest problem. Moreover, the Breusch–Pagan and Shapiro–Francia tests give no indication of heteroscedasticity or non-normality, respectively.
Step 5: Quantile regression
Using propensity score matching together with quantile regression, we can analyze whether the treatment effect varies across the price distribution.
Table 4 shows the estimated treatment effect, presented together with t-values in each percentile of the price distribution.
It is interesting to observe that the impact of high energy performance on house prices does not vary as much as expected. Our results do not seem to support the findings of, for example, [14]. They indicate that the impact of EPC is present in all housing price segments and that the percentage impact is almost the same. The effect only varies between 2.62 and 2.94 percent.
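As a reminder of what quantile regression estimates, the sketch below defines the check (pinball) loss that quantile regression minimizes and recovers a sample quantile as its minimizer. It then computes the gap between the treated and control quantiles, which is what the coefficient on a treatment dummy equals when the dummy is the only regressor. The models in Table 4 include further covariates, so this is a simplified illustration with hypothetical function names.

```python
def check_loss(residual: float, tau: float) -> float:
    """Check (pinball) loss for quantile level tau."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))


def quantile_by_check_loss(values, tau):
    """Sample quantile found as the data point minimising the total check loss."""
    return min(
        sorted(values),
        key=lambda q: sum(check_loss(v - q, tau) for v in values),
    )


def quantile_treatment_gap(treated, control, tau):
    """Difference between treated and control quantiles at level tau."""
    return quantile_by_check_loss(treated, tau) - quantile_by_check_loss(control, tau)
```

Applied to log prices in the matched sample, a gap that is similar at every quantile level corresponds to a treatment effect that is nearly constant across the price distribution.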
Step 6: Robustness of the model
Spatial dependency may be present even if we control for space by including location covariates, such as fixed effects for the municipality and the county, as well as longitude and latitude. We therefore tested the robustness of our results by controlling explicitly for spatial dependence. Two different spatial weight matrices were used, namely, inverse distance and inverse distance within 4 kilometers. We estimated a spatial error model (SEM), a spatial autoregressive model (SAR), and a spatial Durbin model (SDM), all by maximum likelihood. The total effects of the various models are presented in Table 5.
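The two spatial weight matrices can be sketched as follows. The code assumes great-circle (haversine) distances computed from latitude and longitude and row-standardized weights; the text mentions neither detail, so both are illustrative assumptions, as are the function names.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points given in degrees."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def inverse_distance_weights(coords, cutoff_km=None):
    """Row-standardized inverse-distance weight matrix.

    coords: list of (latitude, longitude) pairs.
    cutoff_km: if given, pairs farther apart than this get zero weight.
    """
    n = len(coords)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            d = haversine_km(*coords[i], *coords[j])
            if d > 0 and (cutoff_km is None or d <= cutoff_km):
                W[i][j] = 1.0 / d
        row_sum = sum(W[i])
        if row_sum > 0:
            W[i] = [w / row_sum for w in W[i]]
    return W
```

Calling `inverse_distance_weights(coords)` gives the unrestricted matrix, and `inverse_distance_weights(coords, cutoff_km=4)` gives the version restricted to neighbors within 4 kilometers; an observation with no neighbor inside the cut-off keeps a row of zeros.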
First, our estimate of the EPC effect is remarkably robust across spatial econometric specifications. The estimates only vary from 3.43 percent to 3.49 percent (not significantly different from each other). Second, spatial dependency is present, but it does not appear to bias the estimates. Third, the spatial component does not add to the goodness-of-fit. Hence, we concluded that the spatial dependency in the hedonic model does not have any severe impact on our estimates.
Step 7: Parameter heterogeneity
Sweden is a long country, stretching from about 55 to 68 degrees latitude (that is, closer to the North Pole than to the equator). Sweden has two different climatic zones, namely, the subarctic climate zone (from 60 degrees latitude and up) and the hemiboreal climate zone.
Figure 4 shows a map of Sweden and the differences in annual average temperatures. In the southern part of Sweden, the yearly temperature is around 7–10 degrees Celsius, and up in the north, it is around 0 degrees. The climate is warmer along the east coast than inland. Lower temperatures in the subarctic zone could potentially mean that EPC has a larger impact on house prices there, as energy costs are likely to be higher. As shown in [15], energy costs have an impact on people’s willingness to pay for high EPC ratings in Finland. To test this hypothesis, we created an interaction variable between the treatment variable and a binary variable that indicates latitudes higher than 60 degrees, that is, close to the subarctic climate zone.
If the estimated parameter concerning the interaction variable is significant, we can accept the hypothesis that there is more willingness to pay for the high rating in the subarctic climate zone.
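The interaction variable can be sketched as below. The 60-degree threshold comes from the text; the function names and the coefficient values in the comments are hypothetical and serve only to show how the effects in the two regions are read off the model.

```python
def north_dummy(latitude: float, threshold: float = 60.0) -> int:
    """1 if the house lies north of the threshold latitude, else 0."""
    return 1 if latitude > threshold else 0


def epc_north_interaction(epc: int, latitude: float) -> int:
    """Interaction between the EPC treatment dummy and the north dummy."""
    return epc * north_dummy(latitude)


def epc_effect(b_epc: float, b_interaction: float, north: int) -> float:
    """EPC coefficient implied for a region: base effect plus the interaction in the north."""
    return b_epc + b_interaction * north


# Hypothetical coefficients, for illustration only (not taken from Table 6).
south_effect = epc_effect(0.03, 0.02, north=0)  # base effect only
north_effect = epc_effect(0.03, 0.02, north=1)  # base effect plus interaction
```

A positive and significant interaction coefficient therefore means that the capitalization of a high EPC rating is larger north of the threshold than south of it.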
Table 6 displays the results. They indicate that the impact of EPC on house prices is higher in the northern part of Sweden (5 percent) than in the southern region (3 percent). Regardless of the matching method, the estimates are significantly different from zero, indicating, unsurprisingly, that we should reject the hypothesis that energy expenses have no impact on capitalization. Hence, climate influences energy cost and could therefore have an impact on the willingness to pay for high EPC ratings. One thing we have not tested here is whether the capitalization has a seasonal component. A plausible seasonal effect would be that a high EPC rating in northern Sweden is capitalized more during the winter months than in the summer. This issue is well suited to future research.