3.1.2. Heckit Model

Heckman (1979) proposed a two-step estimation method to correct the sample selection bias that arises when only the observed (selected) sample is used. In the first step, a probit model is estimated on all observations, and the inverse Mills ratio (IMR) is calculated from the fitted values. In the second step, ordinary least squares is applied to the nonzero observations, with the IMR included as an additional explanatory variable, to estimate the coefficients of the outcome equation. The Heckit model comprises a selection equation and an outcome equation:

Selection equation:

$$d\_i^\* = z\_i \alpha + \mu\_i \,\,\mu\_i \sim \mathcal{N}(0, 1) \tag{9}$$

$$d\_i = \begin{cases} 1 & \text{if } d\_i^\* > 0 \\ 0 & \text{otherwise} \end{cases} \tag{10}$$

In Equation (9), *d<sub>i</sub>*<sup>∗</sup> is the latent variable, *z<sub>i</sub>* is the explanatory variable influencing participation and consumption, and *α* is the corresponding coefficient. Equations (9) and (10) together describe the relationship between *d<sub>i</sub>*<sup>∗</sup>, the latent variable of the selection mechanism, and *d<sub>i</sub>*, the dichotomous dummy variable actually observed (Huang and Wang 2016).
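As a minimal sketch of the first step, the snippet below computes the selection probability Φ(*z<sub>i</sub>α*) implied by Equations (9) and (10) and the IMR, taken here in its usual form *φ*(*z<sub>i</sub>α*)/Φ(*z<sub>i</sub>α*). The function names and the index values are illustrative assumptions, not part of the original study; in practice the index would come from a fitted probit model.

```python
from statistics import NormalDist

# Standard normal distribution, matching mu_i ~ N(0, 1) in Equation (9).
_STD_NORMAL = NormalDist()


def selection_probability(index: float) -> float:
    """Return P(d_i = 1) = Phi(z_i * alpha) for a given index z_i * alpha."""
    return _STD_NORMAL.cdf(index)


def inverse_mills_ratio(index: float) -> float:
    """Return the IMR, phi(z_i * alpha) / Phi(z_i * alpha)."""
    return _STD_NORMAL.pdf(index) / _STD_NORMAL.cdf(index)


# Hypothetical fitted index values (z_i * alpha_hat) for three observations.
for index in (-1.0, 0.0, 1.5):
    prob = selection_probability(index)
    imr = inverse_mills_ratio(index)
    print(f"index={index:+.1f}  P(d=1)={prob:.4f}  IMR={imr:.4f}")
```

The IMR decreases as the index grows, so observations that are very likely to be selected receive only a small correction in the second step.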

Outcome equation:

$$y\_i^\* = x\_i \beta + v\_i \quad v\_i \sim \mathcal{N}\left(0, \sigma^2\right) \tag{11}$$

$$y\_i = y\_i^\* \text{ if } \ d\_i = 1 \tag{12}$$

In Equation (11), *y<sub>i</sub>*<sup>∗</sup> is the latent consumption expenditure variable, *y<sub>i</sub>* is the observed consumption expenditure variable, *x<sub>i</sub>* is the variable influencing consumption expenditure, and *β* is the corresponding coefficient. The Heckit model assumes that the error terms (*µ<sub>i</sub>* and *v<sub>i</sub>*) of the selection equation and the outcome equation are correlated, with the degree of correlation being expressed by *ρ*. The normal distribution of the error terms of the two equations is represented in Equation (5).
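The role of the IMR in the second step can be made explicit with a short worked equation. Assuming the two error terms are jointly normal with the variances and correlation stated above, the standard result for a truncated bivariate normal gives the conditional mean of the observed outcome (the symbol *λ* for the IMR is introduced here for illustration):

$$E\left[y\_i \mid d\_i = 1\right] = x\_i \beta + E\left[v\_i \mid \mu\_i > -z\_i \alpha\right] = x\_i \beta + \rho \sigma \lambda(z\_i \alpha), \qquad \lambda(z\_i \alpha) = \frac{\phi(z\_i \alpha)}{\Phi(z\_i \alpha)}$$

Under these assumptions, an ordinary least squares regression of *y<sub>i</sub>* on *x<sub>i</sub>* alone omits the term *ρσλ*(*z<sub>i</sub>α*) and is biased whenever *ρ* ≠ 0. Including the estimated IMR as a regressor removes this omission, and its coefficient estimates *ρσ*.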

Apart from the two-step estimation method, the Heckit model can also adopt the maximum likelihood method to estimate the parameters, and its log-likelihood function is as follows (Aristei et al. 2008; Wodjao 2007):

$$\ln L = \sum\_{0} \ln \left[ 1 - \Phi(z\_i \alpha) \right] + \sum\_{+} \ln \left[ \Phi \left( \frac{z\_i \alpha + \frac{\rho}{\sigma} (y\_i - x\_i \beta)}{\sqrt{1 - \rho^2}} \right) \frac{1}{\sigma} \phi \left( \frac{y\_i - x\_i \beta}{\sigma} \right) \right] \tag{13}$$
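Equation (13) can be evaluated directly once the indices *z<sub>i</sub>α* and *x<sub>i</sub>β* are known. The following is a minimal sketch using only the Python standard library; the function name, the tuple layout of the observations, and the sample values are assumptions made for illustration. A full estimator would additionally maximize this function over *α*, *β*, *ρ*, and *σ* with a numerical optimizer.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def heckit_log_likelihood(observations, rho, sigma):
    """Evaluate the log-likelihood of Equation (13).

    observations: iterable of (d, z_alpha, y, x_beta) tuples, where z_alpha
    and x_beta are the already-computed indices z_i * alpha and x_i * beta.
    y and x_beta are ignored when d == 0.
    """
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")

    total = 0.0
    for d, z_alpha, y, x_beta in observations:
        if d == 0:
            # First sum: unselected observations contribute ln[1 - Phi(z*alpha)].
            total += math.log(1.0 - _STD_NORMAL.cdf(z_alpha))
        else:
            # Second sum: selected observations.
            resid = (y - x_beta) / sigma
            arg = (z_alpha + rho * resid) / math.sqrt(1.0 - rho ** 2)
            log_phi = -0.5 * resid ** 2 - 0.5 * math.log(2.0 * math.pi)
            total += math.log(_STD_NORMAL.cdf(arg)) - math.log(sigma) + log_phi
    return total


# Hypothetical data: one unselected and one selected observation.
sample = [(0, 0.0, None, None), (1, 0.0, 2.0, 2.0)]
print(heckit_log_likelihood(sample, rho=0.0, sigma=1.0))
```

When *ρ* = 0 the expression separates into a probit log-likelihood and a normal regression log-likelihood, which offers a simple sanity check for an implementation.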
