*3.3. Model 3: Four Components of the Model with Determinants of Inefficiency*

To fully satisfy the assumptions made in the model, we introduce a final model, proposed in [24] and [25], that overcomes some of the limitations of the earlier models. In this model, the error term is split into four components. In this paper's context, these capture province-specific heterogeneity, random noise, persistent (time-invariant) inefficiency, and time-varying inefficiency.


Our final model, based on these characteristics, is the Kumbhakar et al. [25] model, specified as:

$$ENE_{it} = \alpha_0 + f(\mathbf{x}_{it}; \boldsymbol{\beta}) + \mu_i + v_{it} + \eta_i + u_{it}$$

where $\mu_i$ is two-sided individual province heterogeneity, $v_{it}$ is a two-sided random error term, $\eta_i$ is one-sided time-invariant individual inefficiency, and $u_{it}$ is one-sided time-variant inefficiency. In production models, the signs in front of the inefficiency components are negative, reflecting production below the frontier output, while in cost or energy use models they are positive, reflecting cost or energy use above the minimum or frontier level.

Instead of using a single-stage ML estimation method based on distributional assumptions for the four components [32], we consider a simpler multi-step procedure and rewrite the model as:

$$ENE_{it} = \alpha_0^* + f(\mathbf{x}_{it}; \boldsymbol{\beta}) + \alpha_i + \varepsilon_{it}$$

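where $\alpha_0^* = \alpha_0 + E(\eta_i) + E(u_{it})$, $\alpha_i = \mu_i + \eta_i - E(\eta_i)$, and $\varepsilon_{it} = v_{it} + u_{it} - E(u_{it})$. The expectations enter the intercept with positive signs because the inefficiency terms are added in the energy use model; with this centering, both $\alpha_i$ and $\varepsilon_{it}$ have zero mean.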
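The reparameterization can be checked numerically. The following is a minimal sketch, not the estimation code used in the paper: it assumes the positive-sign (energy use) convention of the four-component equation, half-normal inefficiency terms, and purely hypothetical values for the panel size, the intercept, and the standard deviations. The function names are illustrative only.

```python
import math
import random


def half_normal_mean(sigma):
    """Mean of |Z| for Z ~ N(0, sigma^2): sigma * sqrt(2 / pi)."""
    return sigma * math.sqrt(2.0 / math.pi)


def simulate_components(n_units, n_periods, sigmas, seed=0):
    """Draw the four error components for a balanced panel.

    sigmas = (sigma_mu, sigma_v, sigma_eta, sigma_u); all values are hypothetical.
    """
    rng = random.Random(seed)
    s_mu, s_v, s_eta, s_u = sigmas
    mu = [rng.gauss(0.0, s_mu) for _ in range(n_units)]        # two-sided, time-invariant
    eta = [abs(rng.gauss(0.0, s_eta)) for _ in range(n_units)]  # one-sided, time-invariant
    v = [[rng.gauss(0.0, s_v) for _ in range(n_periods)] for _ in range(n_units)]
    u = [[abs(rng.gauss(0.0, s_u)) for _ in range(n_periods)] for _ in range(n_units)]
    return mu, v, eta, u


def reparameterize(alpha0, mu, v, eta, u, sigma_eta, sigma_u):
    """Center the one-sided terms so the new effect and error have zero mean."""
    e_eta = half_normal_mean(sigma_eta)
    e_u = half_normal_mean(sigma_u)
    alpha0_star = alpha0 + e_eta + e_u
    alpha = [m + h - e_eta for m, h in zip(mu, eta)]
    eps = [[v_it + u_it - e_u for v_it, u_it in zip(v_i, u_i)]
           for v_i, u_i in zip(v, u)]
    return alpha0_star, alpha, eps


# Hypothetical settings: 30 units, 8 periods, illustrative standard deviations.
n_units, n_periods = 30, 8
sigmas = (0.3, 0.2, 0.4, 0.25)
alpha0 = 1.0
mu, v, eta, u = simulate_components(n_units, n_periods, sigmas, seed=1)
alpha0_star, alpha, eps = reparameterize(alpha0, mu, v, eta, u, sigmas[2], sigmas[3])

# The composed error is unchanged by the reparameterization.
max_gap = max(
    abs((alpha0 + mu[i] + v[i][t] + eta[i] + u[i][t])
        - (alpha0_star + alpha[i] + eps[i][t]))
    for i in range(n_units) for t in range(n_periods)
)
print(max_gap)
```

The printed gap should be at floating-point rounding level, since the two forms of the model differ only by adding and subtracting the same expectations.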

This model can be estimated in three steps. In the first step, we use the standard random-effects panel regression to estimate $\hat{\boldsymbol{\beta}}$. This procedure also gives predicted values of $\alpha_i$ and $\varepsilon_{it}$, which we denote by $\hat{\alpha}_i$ and $\hat{\varepsilon}_{it}$. In the second step, we estimate the time-varying technical inefficiency, $u_{it}$, from $\hat{\varepsilon}_{it}$, and in the final step, we estimate $\eta_i$ from $\hat{\alpha}_i$, following a procedure similar to that in Step 2. Lastly, we compute the persistent efficiency, PE, as $PE = \exp(-\hat{\eta}_i)$. The residual efficiency, RE, is obtained as in Models 1 and 2, assuming a half-normal or truncated normal distribution for $u_{it}$. The overall efficiency, OE, following Kumbhakar et al. [28], is obtained as the product of PE and RE, that is, OE = PE × RE.
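The final computation of the efficiency scores can be sketched as follows. This is an illustration only. It assumes the usual convention for cost-type frontiers that an efficiency score is the exponential of minus the estimated inefficiency, and it takes the inefficiency estimates as given inputs. The function name and the input values are hypothetical.

```python
import math


def efficiency_scores(eta_hat, u_hat):
    """Compute persistent, residual, and overall efficiency.

    eta_hat: one persistent-inefficiency estimate per province.
    u_hat:   one list of time-varying inefficiency estimates per province.
    """
    pe = [math.exp(-h) for h in eta_hat]                   # PE_i = exp(-eta_i)
    re = [[math.exp(-x) for x in row] for row in u_hat]    # RE_it = exp(-u_it)
    oe = [[p * r for r in row] for p, row in zip(pe, re)]  # OE_it = PE_i * RE_it
    return pe, re, oe


# Hypothetical estimates for two provinces over two periods.
pe, re, oe = efficiency_scores(
    [0.0, math.log(2.0)],
    [[0.0, math.log(4.0)], [0.0, 0.0]],
)
```

Because both inefficiency terms are non-negative, all three scores lie between 0 and 1, and OE can never exceed either PE or RE.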

Table 1 gives the main characteristics of the three different efficiency models. The characteristics are related to the underlying assumptions of the different models, decomposition of the error components, time variation patterns of inefficiency, and the estimation procedure.


**Table 1.** Main characteristics of the different models.

Notes: Fixed-effects (Fixed), random-effects (Random), homoscedastic variance (Homosc.), time invariant efficiency (Time-inv.), zero truncated error term (Zero trunc.), and maximum likelihood (ML).

#### **4. Data**

The data used in this study are province-level observations for the period 2010–2017, obtained from the National Bureau of Statistics of China [1]. This dataset is the best available and is frequently used in research and planning. This section describes the data source, lists the key and control variables, and gives a descriptive analysis of the data.

## *4.1. Main Variables*

In this research, energy use is defined as the economic value of total energy used per capita. It covers all economic sectors and is reflected in both the price and quantity of energy, as well as in the value of production. The definition of energy use adopted here is close to the one used by [11], who defined energy efficiency as the production of the same amount of services or desirable outputs with fewer energy inputs and undesirable outputs. In the current study, undesirable output is controlled for by environmental stringency, carbon dioxide, and fine particulate matter. Provinces' per capita GDP is used as the main explanatory variable. It reflects labor productivity, the size or scale of the economy, and the opportunity for energy use or consumption.

It should be noted that income may be considered endogenously determined, in which case it can bias the estimation results. One way of addressing endogeneity is to use predicted income or lagged income as the explanatory variable. However, either approach may in turn introduce a bias. Here, we ignore the issue of endogeneity, arguing that we use province-level data, in which income is average per capita income and is not endogenous to private and public users. Variations in income levels within a province that could be a source of endogeneity are not observed. At the level of aggregate income, there is a one-to-one correspondence between income and expenditure; moreover, work is part of social life, and most people, regardless of their hourly income, work 40 h per week.
