Appendix A. Econometric Estimation. Semiparametric Approach
A semiparametric model specifies the conditional mean of the dependent variable as two separate components, one parametric and one nonparametric. Such models are very attractive from an empirical point of view because of the flexibility they offer in balancing precision and robustness. On the one hand, they allow us to incorporate prior information from economic theory or past experience while retaining flexibility in the specification of the model. On the other hand, although the nonparametric part converges at a slower rate, the estimators of the parametric part exhibit the same statistical properties as if the whole model were fully parametric. This is the so-called $\sqrt{n}$-consistency property for cross-sectional models (see Robinson (1988) and Speckman (1988), for example). Finally, semiparametric models provide a solution to the well-known “curse of dimensionality” of fully nonparametric models. Several reviews on this topic exist, and we suggest that the interested reader consult Ai and Li (2008), Henderson and Parmeter (2015), Parmeter and Racine (2019), Rodriguez-Poo and Soberon (2017), and Su and Ullah (2011), among others.
A semiparametric panel data model with heterogeneous slopes, unknown functions, and cross-sectional dependence is given by
        
        where  denotes the dependent variable (i.e., the environmental quality measure in Equation (5)),  and  are  and  vectors of the explanatory variables of interest (i.e., in the case of Equation (5),  and ), respectively,  is an unknown smooth function, and  is a  vector of unknown population parameters.
Our aim is to obtain consistent estimators of  and , knowing that  is a  vector of observed common effects (including deterministic regressors such as intercepts or seasonal dummies),  is a  vector of unknown parameters,  is a  vector of unobserved common factors,  is the corresponding vector of factor loadings, and  are the individual-specific (idiosyncratic) errors, assumed to be distributed independently of . In general, however, the unobserved factors  could be correlated with , and to allow for such a possibility we adopt a fairly general model for the individual-specific regressors,
        
        where , , , and  are , , , and  factor loading matrices with fixed components, and  and  are the specific components of  and , respectively, distributed independently of the common effects and across i, but assumed to follow general covariance stationary processes.
With the aim of obtaining consistent estimators for  and , in what follows we show how, with modifications, the Common Correlated Effects (CCE) approach proposed in Pesaran (2006) can be applied to a semiparametric regression model.
Let  and  be zero matrices of dimension  and , respectively, and  and  identity matrices of dimension  and . If we combine (A1)–(A3) and rearrange terms, we can write
        
        where  and  are matrices of dimension  and , respectively, of the form
        
In order to show that using suitable proxies for the unobserved factors is enough to avoid having to use initial estimates of , we take the cross-sectional sample averages of (A4), obtaining
        
        where , , and . Furthermore, , , , and  are the cross-sectional averages of , , , and , respectively, and . Let , where , , and . Following Pesaran (2006), we can premultiply both sides of (A5) by  and solve for ,
        
        provided that
        
As , , , , and  for each t under weak conditions. It follows that
        
This last result suggests that we can use  as observable proxies for the unobservable factors, . Therefore, we can conclude that the Common Correlated Effects (CCE) approach proposed in Pesaran (2006) for fully parametric models can effectively be applied in a semiparametric setting with slight changes.
In this situation, we can estimate  and  by augmenting the semiparametric regression of  on  and  with , obtaining the following regression model
        
        where  captures possible approximation errors of the proxies. In addition,  is a  vector of proxies, where .
In order to obtain a -consistent estimator of , we follow Robinson (1988) to eliminate the unknown function . Taking conditional expectations of (A9) yields
        
        and subtracting (A10) from (A9) yields
        
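To make the logic of this step explicit, it can be summarised in generic notation for a simple partially linear model, abstracting from the heterogeneous slopes, the CCE augmentation, and the precise notation of (A9)–(A11); the symbols below are illustrative placeholders only.

        \begin{align*}
        y_{it} &= x_{it}'\beta + g(z_{it}) + u_{it}, \qquad E[u_{it} \mid x_{it}, z_{it}] = 0, \\
        E[y_{it} \mid z_{it}] &= E[x_{it} \mid z_{it}]'\beta + g(z_{it}), \\
        y_{it} - E[y_{it} \mid z_{it}] &= \bigl(x_{it} - E[x_{it} \mid z_{it}]\bigr)'\beta + u_{it},
        \end{align*}

so the unknown function drops out and the slope vector can be estimated by least squares applied to the (kernel-estimated) residuals.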
In order to obtain feasible estimators for , the conditional expectations, which are unknown, need to be estimated. To this end, Robinson (1988) proposed the use of (higher-order) Nadaraya-Watson kernel estimators. Later, Linton (1995) and Hamilton and Truong (1997), among others, pointed out that partial regression methods can be improved further by using local linear smoothers (see Fan and Gijbels (1996) for a deeper discussion of the desirable properties of these estimators). In light of these results, we propose to use local linear smoothers to estimate these conditional expectations.
Let , where (, or , or ). For a given point  and for  in a neighbourhood of z, we propose to minimize the following weighted local linear least-squares (LLLS) problem for ,
        
        where  is a product kernel function such as ,  is the lth component of u, and a is a positive bandwidth. Of course, a general diagonal or non-diagonal bandwidth matrix could be employed, but for the sake of simplicity, a single scalar bandwidth is used.
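As an illustration of how such a weighted local linear fit can be computed, the following sketch evaluates the estimated conditional expectation at a single point using a product kernel built from a univariate Epanechnikov kernel and a single scalar bandwidth. The function names epanechnikov and local_linear_fit, the array layout, and the choice of kernel for this sketch are assumptions for illustration; this is not the code used to produce the empirical results.

    import numpy as np

    def epanechnikov(u):
        """Univariate Epanechnikov kernel k(u) = 0.75 (1 - u^2) 1{|u| <= 1}."""
        return np.where(np.abs(u) <= 1.0, 0.75 * (1.0 - u ** 2), 0.0)

    def local_linear_fit(z_obs, v_obs, z0, a):
        """Local linear estimate of E[v | z = z0] (illustrative sketch).

        z_obs : (T, q) conditioning variable, v_obs : (T,) response,
        z0 : (q,) evaluation point, a : scalar bandwidth.
        """
        u = (z_obs - z0) / a                       # (T, q) scaled deviations
        w = np.prod(epanechnikov(u), axis=1)       # product-kernel weights
        X = np.column_stack([np.ones(len(v_obs)), z_obs - z0])  # local design
        XtW = X.T * w                              # weight each observation
        coef = np.linalg.solve(XtW @ X, XtW @ v_obs)
        return coef[0]                             # local intercept = fitted mean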
Using the resulting estimators of these conditional expectations in (A11) and writing the resulting expression in vector form yields
        
        where  and  are T-dimensional vectors,  and  are matrices of dimension  and , respectively, and  is a  diagonal matrix. Further, assuming that  is invertible,  is a  smoothing matrix associated with individual i of the form
        
        where  is a  matrix,  is a  vector having 1 in the first entry and all other entries 0, and  is a  diagonal matrix. Note that  is the new error term, which consists of three elements: (i) the original error term, (ii) the approximation error of the proxies, and (iii) the approximation error of the Taylor expansion.
By the formula for partitioned regression, the estimator of  in (A13) is given by
        
        where  is a projection matrix, for , , and .
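Combining the two previous sketches, a stylised version of this partitioned-regression (double-residual) step for a single cross-sectional unit could look as follows: the dependent variable and the augmented regressors (the original regressors stacked with the cross-sectional-average proxies) are each smoothed against the nonparametric covariate with the local_linear_fit helper defined above, and the slope coefficients are recovered by least squares on the residuals. All names and the exact stacking are assumptions of this illustration, not a description of our actual code.

    import numpy as np

    def cce_semiparametric_beta(y_i, x_i, z_i, proxies, a):
        """Double-residual (partitioned regression) estimate of the slopes
        for unit i in the proxy-augmented model (illustrative sketch;
        relies on local_linear_fit from the previous sketch).

        y_i : (T,), x_i : (T, k), z_i : (T, q), proxies : (T, m), a : bandwidth.
        """
        regressors = np.column_stack([x_i, proxies])       # (T, k + m)
        # residuals from local linear regressions on z, point by point
        y_res = y_i - np.array([local_linear_fit(z_i, y_i, z0, a) for z0 in z_i])
        x_res = regressors - np.array(
            [[local_linear_fit(z_i, regressors[:, j], z0, a)
              for j in range(regressors.shape[1])] for z0 in z_i])
        coef, *_ = np.linalg.lstsq(x_res, y_res, rcond=None)
        return coef[: x_i.shape[1]]                        # slopes on x_i only

In this sketch the remaining entries of coef correspond to the coefficients on the proxies, which mirrors the estimator discussed next.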
Following similar reasoning, the estimator of  in (A13) is given by
        
        where .
Focusing now on the nonparametric estimation of the smooth unknown function , we use the above estimator, so the corresponding weighted local linear least-squares problem to be minimized is of the following form
        
        where  is a product kernel defined as in (A12) and h is the new bandwidth.
Then, assuming that  is invertible, the resulting CCE nonparametric estimator of  is given by
        
        where  is a  diagonal matrix defined as , with h in place of a.
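Continuing the same stylised illustration, the unknown function can then be recovered by a second local linear regression of the partial residuals on the nonparametric covariate with the new bandwidth h, reusing the local_linear_fit helper from the earlier sketch; the argument names beta_hat and b_hat are again illustrative assumptions rather than our actual implementation.

    def cce_nonparametric_g(y_i, x_i, z_i, proxies, beta_hat, b_hat, z0, h):
        """Local linear estimate of the unknown function at z0 from the
        partial residuals of the augmented model (illustrative sketch)."""
        partial_resid = y_i - x_i @ beta_hat - proxies @ b_hat   # remove parametric part
        return local_linear_fit(z_i, partial_resid, z0, h)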
Under the conditions in Appendix B, one can show that the semiparametric CCE estimator, , is consistent and asymptotically normal as N and T tend to infinity. More precisely, following a proof scheme similar to that of Pesaran (2006), the following result is obtained.
Theorem A1. Consider the panel data model (A1), and suppose that , , , Assumptions A1–A3 and A4–A7 hold,  (in no particular order), and the rank condition (A7) is satisfied. Then,  and  are consistent estimators of  and , respectively. If it is further assumed that  as , then
        
        where , ,  and  are covariance matrices. Furthermore,  for , where  and  are vectors whose tth elements are such that  and .

Similarly, under the conditions in Appendix B, one can show that the nonparametric CCE estimator, , is consistent and asymptotically normal as N and T tend to infinity.
Theorem A2. Consider the panel data model (A1), and suppose that Assumptions A1–A9 hold. Given the -consistency of  and , as ,
        
        where  and  is the Hessian matrix of .

This theorem is proved following a proof scheme similar to that of Musolesi et al. (2020), so the proof is omitted here; a detailed proof is available upon request.
Finally, note that the variance estimates from the above theorems can be used to construct standard errors for  or confidence bands for . We use a standard multivariate kernel density estimator with an Epanechnikov kernel and Silverman’s rule of thumb to choose the bandwidth.
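For completeness, a rule-of-thumb bandwidth of the kind mentioned above can be sketched as follows; the normal-reference constant 1.06 and the per-dimension scaling are common choices and are assumptions of this sketch rather than the exact values used in the paper.

    import numpy as np

    def rule_of_thumb_bandwidth(z):
        """Silverman-type normal-reference bandwidth for a q-variate kernel
        density estimate (illustrative sketch; the 1.06 constant is an
        assumption, not necessarily the exact value used in the paper)."""
        n, q = z.shape
        sigma = z.std(axis=0, ddof=1)                  # per-dimension scale
        return 1.06 * sigma * n ** (-1.0 / (q + 4))    # one bandwidth per dimension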
Appendix B. Assumptions
In order to derive the asymptotic distribution of  and  obtained in Appendix A, the following notation is used. Denote  and , where  and . Furthermore, the following conditions are required.
Assumption A1. (Common Effects). The  vector of common effects, , is covariance stationary with absolutely summable autocovariances and is distributed independently of the individual-specific errors , , and  for all i, t, and .
 Assumption A2. (Individual-Specific Errors). The individual-specific errors , , and  are distributed independently for all i, j, t, and . Furthermore, for each i, , , and  follow linear stationary processes with absolutely summable autocovariances , , and , where  are  vectors of independently and identically distributed  random variables with mean zero, variance matrix , and finite fourth-order cumulants. In particular,
        
for all i and some constants , , and , where  and  and  are positive definite matrices.
 Assumption A3. (Identification of ). For each i,  and  are nonsingular  matrices and have finite second-order moments for all i. Furthermore, .
 Assumption A4. (Density function). The density of  satisfies  and is twice continuously differentiable in all its arguments with bounded second-order derivatives at any point of its support.
 Assumption A5. (Smoothness condition). Let  be the support of . The unknown functions , , and  are bounded and twice continuously differentiable at z in the interior of  with second-order derivatives bounded.
 Assumption A6. (Kernel function).  is a product kernel, and the univariate kernel function  is compactly supported and bounded such that , , and , where  and  are scalars and  is a  identity matrix. All odd-order moments of k vanish, that is, , for all non-negative integers  such that their sum is odd.
 Assumption A7. (Bandwidth). a and h are positive bandwidths such that as , ,  and . Furthermore, as , .
 Assumption A8. The map  is twice continuously differentiable at z in the interior of  with second-order derivatives bounded.
 Assumption A9. (Lyapounov). For some ,  exists and is bounded.