The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates

Otieno, Opeyo Peter; Cheng, Weihu

doi:10.3390/math11204338

Open AccessArticle

The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates

by

Opeyo Peter Otieno

^1,2,*

and

Weihu Cheng

¹

Department of Statistics, Beijing University of Technology, Beijing 100124, China

²

Department of Statistics and Computational Mathematics, The Technical University of Kenya, Nairobi P.O. Box 52428-00200, Kenya

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(20), 4338; https://doi.org/10.3390/math11204338

Submission received: 17 July 2023 / Revised: 1 September 2023 / Accepted: 8 September 2023 / Published: 18 October 2023

(This article belongs to the Special Issue New Advances in Statistics and Econometrics)

Download

Browse Figures

Versions Notes

Abstract

:

In estimating logistic regression models, convergence of the maximization algorithm is critical; however, this may fail. Numerous bias correction methods for maximum likelihood estimates of parameters have been conducted for cases of complete data sets, and also for longitudinal models. Balanced data sets yield consistent estimates from conditional logit estimators for binary response panel data models. When faced with a missing covariates problem, researchers adopt various imputation techniques to complete the data and without loss of generality; consistent estimates still suffice asymptotically. For maximum likelihood estimates of the parameters for logistic regression in cases of imputed covariates, the optimal choice of an imputation technique that yields the best estimates with minimum variance is still elusive. This paper aims to examine the behaviour of the Hessian matrix with optimal values of the imputed covariates vector, which will make the Newton–Raphson algorithm converge faster through a reduced absolute value of the product of the score function and the inverse fisher information component. We focus on a method used to modify the conditional likelihood function through the partitioning of the covariate matrix. We also confirm that the positive moduli of the Hessian for conditional estimators are sufficient for the concavity of the log-likelihood function, resulting in optimum parameter estimates. An increased Hessian modulus ensures the faster convergence of the parameter estimates. Simulation results reveal that model-based imputations perform better than classical imputation techniques, yielding estimates with smaller bias and higher precision for the conditional maximum likelihood estimation of nonlinear panel models.

Keywords:

maximum likelihood; fixed effects; conditional logit; Hessian matrix

MSC:

62F99; 65C99

1. Introduction

Parameter estimation is a key goal of inferential statistics, and most researchers attempt to fit data into models that would produce the best of all possible parameter estimates. The motivation behind parameter estimation is to make inferences about a study population using sample information, and this calls for very well-spelled-out ways of ensuring that unbiased and precise estimates are achieved in every parameter estimation technique applied. During sample data collection, researchers encounter missing values in the study variables, a problem that leads to complications in statistical analyses through inaccurate estimates that may eventually lead to incorrect inferences and policy actions.

Specifically, when the response variable is binary, problems of missing covariates are further compounded by the nonlinear treatment of the model specification. Studies on missingness and parameter estimation have shown that the most frequent techniques of imputation result into biased estimates with significant loss of power [1,2,3]. This problem cuts across every model, with no exception made for the logit model for binary choice response variables; several studies have made attempts to come up with reliable imputation techniques for missing observations, so as to reduce the estimates’ bias. For example, Fang and Jun [4] proposed a procedure for estimating the parameters of a generalized linear model (GLM) with missing dependent and independent variables, known as the iterative imputation estimation (IIE). This iterative method proved to be computationally faster and easier compared to maximum likelihood estimation (MLE) or weighted estimation equations, and it was therefore recommended for large samples with multiple covariates with missing values. IIE, however, proved to be less efficient than MLE, since it does not incorporate the present covariate values that correspond to missing response values when simplified computation is required. Another study by Horton and Laird [5] gave an in-depth review of the method of weights for GLMs, as was developed by Ibrahim [6] for missing discrete covariates. They also acknowledged that if nuisance parameter distribution is incorrectly specified, then the method of weights does not yield unbiased estimates of the regression model.

In narrowing down to a characterization of the association between dichotomous outcome variables and other model covariates, we often use logistic regression approaches. In a broad sense, the maximum likelihood estimation technique produces parameter estimates which yield the highest probability of achieving the observed data set. It is a general method for estimating the logistic regression parameters. When maximum likelihood estimates (MLE) are absent, the maximum likelihood technique (ML) is occasionally vulnerable to a convergence problem. The need to assess the behavior of parameter estimates for a logistic regression model using MLE is of great importance, and the applications of the logistic model stretch far and wide across research disciplines. There exist numerous works that discuss the convergence problem (on a logistic regression model, by Cox et al. [7], or bias reduction by Firth, Anderson and Richardson) [8,9]. Other studies outline many assumptions regarding the distributions of ML estimations resulting from the bias reduction technique, and the impact of varying sample size on MLE [10,11].

The asymptotic characteristics of the maximum likelihood estimator are crucial for statistical inference based on the logistic regression model, according to Lee [12]. Therefore, for the logistic regression parameters, the sampling distribution of the ML estimators is asymptotically normal and unbiased under large sample scenarios. On the other hand, due to unbiased estimates in small samples, the asymptotic properties of maximum likelihood estimations may not hold [12,13]. Other studies by Kyeongjun and Jung-In reveal that specific estimators, such as the pivot-based estimator, yield plausible mean square errors (MSE) and biases compared to MLEs and weighted least-square estimators [14]. Saeid et al. similarly compared the maximum likelihood estimates and the Bayes estimates for Gompertz distribution, but with no assumption of missing covariates [15]. Therefore, privy to the fact that MLE may not always be the best for all types of distributions and models, this study limits aims to investigate the performance of conditional MLE in panel data models. Firth’s method has been introduced as one of the penalization techniques to minimize the small sample bias of the ML estimators for the linear regression model [8,13]. Lee [9] performed a comparison of the performance of the standard MLE and that of Firth’s penalized MLE. The results of this comparison showed that the asymptotic MLE performed better than the penalized MLE in terms of the statistical power [12].

To prevent the problem of having to deal with extreme biases in estimates resulting from imputation of missing covariates, there is a need to try and establish the best imputation technique among those proposed in the literature. We propose the use of the Hessian matrix from the log-likelihood function to establish whether or not the used imputation technique yields parameter estimates that would maximize the conditional likelihood function of a logistic panel data model.

The present paper, therefore, aims to evaluate the susceptibility of the Hessian matrix to different imputation techniques by comparing the magnitudes of the determinants obtained from the Hessian matrix of the log-likelihood function with the imputed covariate vector.

In a bid to curb the incidental parameter problem, especially for logistic regression panel models, we adopt a conditional maximum likelihood estimator which analytically eliminates the individual fixed effects from the estimation algorithm. This we do in this first section, wherein we also lay down the basics of panel data econometrics.

After the introductory section, Section 2 of this paper gives the specification of the nonlinear binary choice panel data models, under the assumption that the response variable is dichotomous. Section 3 highlights the incidental parameter problem in estimating the logistic panel data model and shows how the conditional maximum likelihood approach circumvents it. In Section 4, we discuss parameter estimation for a logit panel data model, in which the covariate vector is partitioned into sample present values and missing or imputed values. This discerns the impact of missingness on the Hessian of the proposed estimator of the binary choice logistic panel model. In addition, we also present results from a Monte Carlo simulation which evaluates the effect of the imputation of the covariate vector on the determinants of the Hessian matrix and the parameter estimates. Section 5 concludes by summarizing the study’s findings and offering recommendations for more research based on its main findings.

2. Model Specification

2.1. Panel Data

Observing experimental subjects or units over a repeated number of times produces a set of data referred to as panel data, which provides two kinds of information: cross-sectional and time series. This unique characteristic allows panel data to account for individual differences that are time-invariant, through regression methods which adequately utilize all the information in the data set.

The logit panel data model then develops from the logistic regression into model binary choice response variables which have had wide applications in almost all research fields that conduct pretest and posttest studies, with the aim of discerning the impact of the test. As such, for

N

units each observed

T

times, we have a total of

N \times T

observations made.

2.2. The Logit Panel Data Model

Suppose that a stream of

T

binary responses are observed for every unit

i

of a population of size

N

, and that all observations are made. We can define a

T \times 1

vector of the dichotomous variable

Y

as

Y_{i} = {(Y_{i 1}, Y_{i 2}, \dots, Y_{i T})}^{'}

, where

Y_{i t} \in \{0,1\},

taking the value of 1 if an event is successful, and 0 otherwise. It then follows that

Y_{i t} ~ b e r n o u l l i (p_{i t})

and

E (Y_{i t}) = p_{i t}

. Similarly, for each time

t

, let

Y_{i t}

be predicted by a corresponding

1 \times k

vector of covariates

x_{i t} = ({x^{(1)}}_{i t}, {x^{(2)}}_{i t}, \dots, {x^{(K)}}_{i t})

and an individual specific, time-invariant parameter

c_{i}

. Given the binary nature of the response variable, it is plausible to model the association between

Y_{i t}

and the vector of covariates

x_{i t}

using the logistic regression function:

p_{i t} = \Pr (Y_{i t} = 1) = E (Y_{i t}| x_{i t}, c_{i}) = F (x_{i t}^{'} β + c_{i}),

(1)

where

β = {(β_{1}, β_{2}, \dots, β_{K})}^{'}

denotes the k × 1 vector of

k

regression coefficients of the covariate matrix

x_{i t}

. This is so because

F (\cdot)

, as a link function that relates the binary outcome to the functional forms of the explanatory variables, is a probability model.

Under random sampling,

P r (Y_{i t} = 1| x_{i t}, c_{i}) = E (Y_{i t} = 1| x_{i t}, c_{i}; β),

and when the binary response model is specified correctly, we can in turn specify the linear probability model (LPM):

\begin{matrix} \begin{matrix} P r (Y_{i t} = 1| x_{i t}, c_{i}) = x_{i t}^{'} β + c_{i} \end{matrix} \\ P r (Y_{i t} = 0| x_{i t}, c_{i}) = 1 - (x_{i t}^{'} β + c_{i}) \end{matrix}\} .

(2)

Adopting a linear probability model shows the absurdity of predicting the “probabilities” of the response variable as either less than zero or greater than one. This shortfall is, however, addressed by specifying the monotonically increasing function

F

such that

F (\cdot) : R \to [0,1]

and

\begin{matrix} P r (Y_{i t} = 1| x_{i t}, c_{i}) \to 1 a s x_{i t} β + c_{i} \to \infty \\ P r (Y_{i t} = 1| x_{i t}, c_{i}) \to 0 a s x_{i t} β + c_{i} \to - \infty \end{matrix}\}

(3)

This study adopts the logistic distribution as a nonlinear functional form of

F

:

F (x_{i t}^{'} β + c_{i}) = \frac{e^{x_{i t}^{'} β + c_{i}}}{1 + e^{x_{i t}^{'} β + c_{i}}},

(4)

such that

0 \leq F \leq 1

for all values of

x_{i t}^{'} β + c_{i}

. Equation (4) now depicts the (cumulative) logistic distribution function, which overcomes LPM’s drawbacks. Since

F,

as specified in (4), is nonlinear in both the parameters and covariates, OLS estimators for the parameters would be inappropriate. However, by obtaining the logs of the odds ratio, we can linearize Equation (4) to give the logit panel data regression model:

l n (\frac{P r (Y_{i t} = 1| x_{i t}, c_{i})}{P r (Y_{i t} = 0| x_{i t}, c_{i})}) = x_{i t}^{'} β + c_{i} .

(5)

There are several methods for estimating fixed-effects models, including (a) demeaning variables, (b) unconditional maximum likelihood estimation (also known as least-squares dummy variables, or LSDV), and (c) conditional maximum likelihood estimation, which is the method of choice for logistic regressions. The methods used to estimate panel data models with fixed effects and the continuous dependent variable

Y_{i t}

aim to compensate for the fixed effects,

c_{i}

, through elimination, so as to estimate the covariate coefficients,

β

. The conditional maximum likelihood estimation partials or “conditions” the fixed effects out of the likelihood function for categorical dependent variables when certain nonlinear functions that preserve the structure of the dependent variable are taken into account. This is achieved by relating the probability of the regressand to the total number of events observed for each category [16].

When the panel data sets are unbalanced due to cases of missing covariates, the estimation methods become computationally complicated and produce inefficient parameter estimates [17]. Several factors, such as delayed enrollment, early withdrawal, or intermittent non-response from a study unit, have been identified as causes of missingness in the literature. As a result, in these situations, methods for handling missing observations that have been presented in the literature are applicable. In this study, we examine the effects of missing data on the nonlinear panel data models’ conditional maximum likelihood estimation methods, in an effort to determine the most effective method.

3. Incidental Parameter Problem and MLE

3.1. Incidental Parameter Problem

As specified in model (1), the presence of individual effects

c_{i}

complicates the computation of parameter estimates greatly; thus, to obtain consistent estimates for the parameters for static linear models, we simply difference-out (differentiate) the fixed effects. The number of parameters

c_{i}

increases with the increase in sample size, a notion attributed to Neyman, Scott and Lancaster [18,19], which is referred to as the incidental parameter problem. For example, in the model (1), we have

K + N

parameters to be estimated, and

c_{i}

being dependent on

N

becomes incidental for the panel data model and we may not obtain their consistent estimates for nonlinear panels [20]. However, consistent estimators of the covariate coefficients for linear panel data models can be obtained after eliminating the incidental parameters through the difference

y_{i t} - E (Y_{i t})

. This is so because for the linear panel model, the MLE of

c_{i}

and

β

are asymptotically independent [21].

Chamberlain demonstrated that this differentiation does not work for panel models with dichotomous response variables, even when

T

is fixed [16]. In other studies, Greene [22] was able to force the MLE of the incidental parameters by using an equivalent number of dummy variables.

3.2. The Unconditional Log Likelihood Function

In logistic regression, the maximum likelihood estimation of parameters finds a specific vector

{\hat{β}}^{M L}

that yields the highest likelihood of obtaining the sample outcomes

\{y_{1}, y_{2}, \dots\}

given an observed vector of explanatory variables

x

.

By assumption,

p_{i t} = \Pr (Y_{i t} = 1) = E (Y_{i t}| x_{i t}, c_{i}) = F (x_{i t}^{'} β + c_{i})

and

{1 - p}_{i t} = \Pr (Y_{i t} = 0) = 1 - F (x_{i t}^{'} β + c_{i}),

therefore, the likelihood function for the entire sample is

L (y| x; β) = \prod_{i = 1}^{N} {(F (x_{i t} β + c_{i}))}^{y_{i}} {[1 - F (x_{i t} β + c_{i})]}^{1 - y_{i}} .

(6)

This gives the log likelihood function for the sample:

l n L (y| x; β) = \sum_{i = 1}^{N} \{y_{i} l n F (x_{i t} β + c_{i}) + (1 - y_{i}) l n [1 - F (x_{i t} β + c_{i})]\} .

(7)

The maximum likelihood estimator,

{\hat{β}}^{M L}

, optimizes Equation (7) with a maxima.

3.3. Conditional Log Likelihood Function for the Logistic Panel Data Model

In limiting the function

F

to being the logistic distribution function (4), Equation (7) becomes the log likelihood function for the logistic panel data model now expressed as:

l n L (y| x; β) = \sum_{i = 1}^{N} \{y_{i} l n (\frac{e^{x_{i t} β + c_{i}}}{1 + e^{x_{i t} β + c_{i}}}) + (1 - y_{i}) l n (\frac{1}{1 + e^{x_{i t} β + c_{i}}})\} .

(8)

The logistic model is preferred over the alternative probit model because it does not impose any assumptions on the relationship between

c_{i}

and

x_{i t}

in order to yield a consistent estimator of

β,

except under severe exogeneity. For the logistic distribution function, we are able to eliminate the fixed effects from the likelihood function by conditioning on the “minimal sufficient statistic” for the incidental parameters,

c_{i}

. The newly obtained likelihood function is the conditional likelihood function to be maximized.

Considering the conditional probabilities when

T = 2

, we know that:

P r (Y_{i 1} + Y_{i 2} = 1| x_{i 1}, x_{i 2}, c_{i},) = \frac{e^{x_{i 1} β + c_{i}} + e^{x_{i 2} β + c_{i}}}{[1 + e^{x_{i 1} β + c_{i}}] [1 + e^{x_{i 2} β + c_{i}}]},

(9)

P r (Y_{i 1} = 0, Y_{i 2} = 1| x_{i 1}, x_{i 2}, c_{i}, Y_{i 1} + Y_{i 2} = 1) = \frac{e^{x_{i 2} β + c_{i}}}{e^{x_{i 1} β + c_{i}} + e^{x_{i 2} β + c_{i}}} = \frac{e^{{(x}_{i 2} - x_{i 1}) β}}{1 + e^{{(x}_{i 2} - x_{i 1}) β}},

(10)

and

P r (Y_{i 1} = 1, Y_{i 2} = 0| x_{i 1}, x_{i 2}, Y_{i 1} + Y_{i 2} = 1) = \frac{1}{1 + e^{{(x}_{i 2} - x_{i 1}) β}} .

(11)

Note that conditioning is applied to

y_{i 1} + y_{i 2} = 1

, for which

y_{i t}

changes between the two time periods and ensures that the

c_{i}

’s are eliminated; therefore,

\sum_{t} y_{i t}

is a sufficient statistic for the fixed effects.

Probabilities (10) and (11) are conditional on

y_{i 1} + y_{i 2} = 1,

and do not depend on

c_{i}

.

We have the joint probability distribution function:

P r (Y_{i 1}, Y_{i 2}| x_{i 1}, x_{i 2}, Y_{i 1} + Y_{i 2} = 1) = \{\begin{matrix} 1 i f (y_{i 1}, y_{i 2}) = (0,0) o r (1,1) \\ \frac{1}{1 + e^{{(x}_{i 2} - x_{i 1}) β}} i f (y_{i 1}, y_{i 2}) = (1,0) \\ \frac{e^{{(x}_{i 2} - x_{i 1}) β}}{1 + e^{{(x}_{i 2} - x_{i 1}) β}} i f (y_{i 1}, y_{i 2}) = (0,1) \end{matrix} .

(12)

Using the joint probability distribution function (12) in Equation (8), we express the conditional log-likelihood function as:

l n L = \sum_{i = 1}^{N} \{d_{01 i} l n (\frac{e^{{(x}_{i 2} - x_{i 1}) β}}{1 + e^{{(x}_{i 2} - x_{i 1}) β}}) + d_{10 i} l n (\frac{1}{1 + e^{{(x}_{i 2} - x_{i 1}) β}})\},

(13)

where

d_{01 i}

chooses the units whose dependent variable changed from 0 to 1, while

d_{10 i}

chooses units wherein the response variable changed from 1 to 0.

Hence, by maximizing the conditional log likelihood function (13), we obtain consistent estimates of

β

, regardless of whether

c_{i}

and

x_{i t}

are correlated. Generally, for several imputation techniques, the conditional logit estimator yields fairer biases and root mean square errors than the unconditional logit estimator, especially for large

N

[23].

The trick is thus to condition the likelihood on the outcome series (

y_{i 1}, y_{i 2}

), and in the more general case, the general conditional probability of the response variable (

y_{i 1}, y_{i 2}, \dots ., y_{i T}

), given

\sum_{t} y_{i t},

is

P r (Y_{i 1}, Y_{i 2}, \dots ., Y_{i T}| X_{i}, \sum_{t} Y_{i t}) = \frac{e^{(\sum_{t} y_{i t} x_{i t} β)}}{\sum_{d \in D_{i}} e^{(\sum_{t} d_{i t} x_{i t} β)}},

(14)

where

D_{i} = \{(d_{i 1}, d_{i 2}, d_{i 3}, \dots, d_{i T})| d_{i t} = 0,1 and \sum_{t} d_{i t} = \sum_{t} y_{i t}\}

.

4. Parameter Estimation with the Imputed Covariate Sub-Matrix

4.1. Partitioned Covariate Matrix

In the presence of missing observations in the covariate vector

x_{i t}

, we express it as a sum of two vectors

x_{{i t}_{s}}

and

x_{{i t}_{I}}

for the sample‘s present covariate values and the missing covariate value, respectively. Therefore, we have the conditional probabilities (10) and (11):

P r (Y_{i 1} = 0, Y_{i 2} = 1| x_{i 1}, x_{i 2}, Y_{i 1} + Y_{i 2} = 1) = \frac{e^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}},

(15)

and

P r (Y_{i 1} = 1, Y_{i 2} = 0| x_{i 1}, x_{i 2}, Y_{i 1} + Y_{i 2} = 1) = \frac{e^{- Δ x_{i_{s}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}},

(16)

respectively, where

Δ x_{i_{I}} = (x_{{i 2}_{I}} - x_{{i 1}_{I}})

and

Δ x_{i_{s}} = (x_{{i 2}_{s}} - x_{{i 1}_{s}})

.

Equations (15) and (16) when used in Equation (13) now give the conditional log-likelihood function with imputed covariates:

l n L = \sum_{i = 1}^{N} \{d_{01 i} l n (\frac{e^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}}) + d_{10 i} l n (\frac{e^{- Δ x_{i_{s}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}})\} .

(17)

Consistent estimates of the parameters of Equation (17) are solved for with an iterative technique using the Newton–Raphson algorithm.

4.2. Newton-Raphson Algorithm and the Hessian Matrix Optimization of the Log Likelihood Function

Given a differentiable function

f

, Newton and Raphson proposed a non-analytic method of obtaining the roots of the function

f

through iterative approximations using the following relation:

x_{h + 1} = x_{n} - \frac{f (x_{h})}{f^{'} (x_{h})},

(18)

where

x_{h + 1}

are the

{(h + 1)}^{t h}

iteration. The goal of this method is to make the approximated result as close as possible to the exact result. If

f

is defined as the gradient function (score vector), then the first derivative of

f

gives the Hessian matrix, which is the matrix of the second-order derivatives of the likelihood function.

The Newton–Raphson algorithm for MLE involves fixing an initial estimate value

β^{(0)}

and using steps

h

to iterate for the next value:

β^{(h + 1)} = β^{(h)} + J {(β^{(h)})}^{- 1} s (β^{(h)}),

(19)

in which,

s (β) = \frac{\partial l n L}{\partial β}

is the score or gradient vector of the log likelihood function (17), and

J (β) = - \frac{\partial^{2} l n L}{\partial β \partial β^{'}}

is the observed information matrix, obtained as the negative of the computed Hessian matrix.

The score vector and observed Hessian matrix from the log likelihood function are, respectively,

s (β) = \frac{\partial l n L}{\partial β} = \sum_{i = 1}^{N} \{d_{01 i} [Δ {x'}_{i_{I}} - (\frac{- Δ {x^{'}}_{i_{s}} e^{- Δ x_{i_{s}} β} + {Δ {x'}_{i_{I}} e}^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}})] - d_{10 i} [Δ {x'}_{i_{s}} + (\frac{- Δ {x^{'}}_{i_{s}} e^{- Δ x_{i_{s}} β} + {Δ {x'}_{i_{I}} e}^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}})]\},

(20)

J (β) = - \frac{\partial^{2} l n L}{\partial β \partial β^{'}} = \sum_{i = 1}^{N} \{d_{01 i} [(\frac{Δ {x^{'}}_{i_{s}} Δ x_{i_{s}} e^{- Δ x_{i_{s}} β} + {Δ {x^{'}}_{i_{I}} Δ x_{i_{I}} e}^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}}) - D^{'} D] + d_{10 i} [(\frac{Δ {x^{'}}_{i_{s}} Δ x_{i_{s}} e^{- Δ x_{i_{s}} β} + {Δ {x^{'}}_{i_{I}} Δ x_{i_{I}} e}^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}}) - D^{'} D]\},

(21)

where

D = (\frac{- Δ {x^{'}}_{i_{s}} e^{- Δ x_{i_{s}} β} + {Δ {x^{'}}_{i_{I}} e}^{Δ x_{i_{I}} β}}{e^{- Δ x_{i_{s}} β} + e^{Δ x_{i_{I}} β}}) .

For well-defined parameter estimates of the log likelihood function, it is sufficient that (a) the log likelihood function must be concave, indicating that the model is identified; and (b) the Hessian matrix must be negative and semi-definite, yielding a negative curvature of the log likelihood plot. This means that we can equally depict the general Gaussian curvature of the likelihood function by evaluating the determinant of the Hessian matrix at a critical point of the function. The concavity of the log-likelihood function is easily established when all eigenvalues of its Hessian are negative. Therefore, the determinant of the Hessian matrix of the likelihood function should be non-negative, as a necessary condition for concavity.

In this study, we confirm that the conditional log likelihood function of the logit panel data model preserves its concavity, even when different imputation techniques are applied to the missing covariates matrix X. Establishing the concavity or convexity of the log-likelihood function becomes a necessary condition to help know whether the solutions or parameter estimates are optimally local or global. For the nonlinear logit panel data model, the maximum likelihood estimates are yielded when the Hessian matrix is negative semi-definite, resulting from a strictly concave log-likelihood function.

We use simulations to assess the relationship between the Hessian modulus and the properties of the parameter estimates for the conditional MLE of the logit panel data model with various imputation techniques for missing covariates.

4.3. Simulation Study

To investigate the concavity of the log-likelihood function through the behavior of the Hessian matrix when different imputation techniques are used to fill up for the missing covariates, we present Monte Carlo simulation results for a logistic panel data set. In this section, we focus on the N-R maximization of Equation (17), and use simulation results to compare properties of the Hessian matrices of the conditional log-likelihood function resulting from the new data sets obtained after imputation.

The simulation compares different sets of panel data generated by imputing covariates with imposed missingness patterns. This is achieved through substitution of the imputed covariate vector

x_{{i t}_{I}}

into Equation (17) for which both item-based and model-based imputation methods are used to fill up for the missing covariates. We consider a binary response variable that is specified by the relation model:

y_{i t} = 1 (x_{i t} β + c_{i} + ε_{i t} \geq 0) i = 1,2, 3, \dots, n t = 1,2, 3, \dots, T,

y_{i t} = 1 (c_{i} + {β_{1} x^{(1)}}_{i t} + β_{2} {x^{(2)}}_{i t} + β_{3} {x^{(3)}}_{i t} + β_{4} {x^{(4)}}_{i t} + β_{5} {x^{(5)}}_{i t} + ε_{i t} \geq 0) i = 1,2, 3, \dots, n t = 1,2, 3, \dots, T .

(22)

The covariate vector,

x_{i t}

contains five different variables, each having values drawn from normal, uniform or binomial distributions, as shown in Table 1.

ε_{i t}

is a disturbance term with a logistic distribution given by

ε_{i t} = l n |\frac{u_{i t}}{1 + u_{i t}}|

with

u_{i t} ~ N (0,1)

. The parameters

β_{1}

to

β_{5}

were fixed at

β_{1} = 1

,

β_{2} = - 1

,

β_{3} = 1

,

β_{4} = 1

and

β_{5} = 1

. We simulated the fixed effects

c_{i}

such that they depend partly on the sum of first covariate

x^{(1)}

and the time period

T

as

c_{i} = \frac{\sqrt{T} \sum x^{(1)}}{n} + α_{i}

with

α_{i} ~ N (0,1)

.

To establish the sample sizes, we imposed an expected probability of success at

P r (y_{i t} = 1| x_{i t}, c_{i}) = 0.5

and acceptable coefficients of variation values of

C o V = 0.2

0.2,

C o V = 0.14,

and

C o V = 0.09, r e s p e c t i v e l y

in the relation

\begin{matrix} N ≅ & \frac{1}{P r (Y) \times {(C o V)}^{2}} \end{matrix}

. These gave three different values of

N

(

N

= 50,

N

= 100 and

N = 250

) which were used for all sets of data fitted into the models to enable detailed comparisons and also to evaluate the impact of varying

N

on the determinant of the Hessian matrix of the log likelihood function. Further, to evaluate the impact of the proportion of missingness, we use two missingness proportions, 10% and 30%, by randomly inserting

N A

’s corresponding to the desired proportion of observations from the data set, and imputing them back accordingly for each value of

N

.

For each data set specified, we found the determinants of the Hessian matrices and plot against the corresponding data code for ease of comparison across sample sizes. We used the determinant of the Hessian matrix as a generalization of the second derivative test for univariate functions, where a positive determinant indicates an optimum value. This shows that the log likelihood is a concave function. The imputation techniques used herein are mean imputation; median imputation; last value carried forward; and Bayesian (Multiple Imputation with Chained Equations) imputation (Table 2, Table 3, Table 4, Table 5 and Table 6).

5. Discussion, Conclusions, and Recommendations

The simulated data when used to fit the logit panel model produced conditional maximum likelihood estimates with complete data which followed finite sample distributions as shown in Figure 1 and Figure 2. We note that the conditional MLE values from the complete data set are asymptotically normally distributed. By using different sample sizes, our results validate the asymptotic nature of the parameter bias. Similarly, the results show that the parameter estimates improve with increasing sample size (Figure 3). The precision of the estimates asymptotically increases, thereby making them more statistically significant.

The key objectives of this study were to focus on a method used to modify the conditional likelihood function through the partitioning of the covariate matrix in a bid to curb the incidental parameter problem and to assess the susceptibility of the Hessian matrix of the log likelihood function to the imputation techniques employed in completing a panel data set with missing covariates.

Undeniably, of all the classical imputation techniques, mean and median imputation do not introduce much undue bias into the data set, and therefore perform relatively better than the last value-carried-forward technique and mode of imputation. However, a model-based technique for imputation, like MICE, yields even better estimates, with even more reduced bias and precision [24]. Figure 1 shows the varying and reducing trends of the parameter estimates across the sample sizes and across imputation methods used in this study.

The value of

Δ x_{i_{I}}

inversely impacts the elements of the Hessian matrix and consequently its determinants. As seen, this study revealed that the smaller the determinant, the larger the parameter estimates, which signify an increased bias for smaller sample sizes. This indicates that by increasing the determinant of the Hessian matrix through a reduction in

Δ x_{i_{I}}

values, we reduce the product

J {(β^{(h - 1)})}^{- 1} s (β^{(h - 1)})

to zero.

From the N-R algorithm (19), therefore, the inverse of the Hessian

J

serves to reduce the product

J {(β^{(h - 1)})}^{- 1} s (β^{(h - 1)})

to yield convergence in the iterations of

β^{(h)}

. An increasing Hessian modulus therefore ensures faster convergence of the parameter estimates with more precision, as seen from Table 7 and Figure 4. The positive moduli of the Hessian for the conditional MLEs are sufficient for the concavity of the log likelihood function that gives the optimum estimates of the parameters.

Deriving estimators is crucial for improving their theoretical comprehension as well as for lowering the computational complexity involved in estimating logit panel data models. Unbalancedness in a data set leads to biased parameter estimates, as seen from the Monte Carlo results, and the various imputation methods used in this study react differently to the concavity of the Hessian matrix, which also affects the estimates’ bias and efficiency.

We can see from this study that when the within estimator becomes analytically cumbersome to use, the conditional maximum likelihood estimator becomes preferable over the unconditional MLE, since we are able to eliminate the fixed effects from the estimation process, thereby limiting our concentration on the parameter estimates only.

For further development of this study, we recommend consideration of panel models with multiple fixed effects, and panel data sets with study units observed over

T > 2

time periods. Real data from social and industrial settings can also be used to validate the findings herein.

Author Contributions

Conceptualization, O.P.O.; Methodology, O.P.O. and W.C.; Resources, W.C.; Writing—original draft, O.P.O.; Supervision, W.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding. The APC was funded by Beijing University of Technology.

Data Availability Statement

The data used in this research was simulated in R-console version R 4.2.2 and the codes used are available in https://drive.google.com/file/d/1EwvKG-Zb1N0WX1QqkJbq0gNyUiv3_tOO/view?usp=drive_link.

Conflicts of Interest

The authors declare no conflict of interest.

References

Janssen, K.J.M.; Donders, A.R.T.; Harrell, F.E.; Vergouwe, Y.; Chen, Q.; Grobbee, D.E.; Moons, K.G.M. Missing Covariate Data in Medical Research: To Impute Is Better than to Ignore. J. Clin. Epidemiol. 2010, 63, 721–727. [Google Scholar] [CrossRef] [PubMed]
Donders, A.R.T.; van der Heijden, G.J.M.G.; Stijnen, T.; Moons, K.G.M. Review: A Gentle Introduction to Imputation of Missing Values. J. Clin. Epidemiol. 2006, 59, 1087–1091. [Google Scholar] [CrossRef] [PubMed]
Knol, M.J.; Janssen, K.J.M.; Donders, A.R.T.; Egberts, A.C.; Heerdink, E.R.; Grobbee, D.E.; Moons, K.G.M.; Geerlings, M.I. Unpredictable Bias When Using the Missing Indicator Method or Complete Case Analysis for Missing Confounder Values: An Empirical Example. J. Clin. Epidemiol. 2010, 63, 728–736. [Google Scholar] [CrossRef] [PubMed]
Fang, F.; Shao, J. Iterated imputation estimation for generalized linear models with missing response and covariate values. Comput. Stat. Data Anal. 2016, 103, 111–123. [Google Scholar] [CrossRef]
Horton, N.J.; Laird, N.M. Maximum likelihood analysis of generalized linear models with missing covariates. Stat. Methods Med. Res. 1999, 8, 37–50. [Google Scholar] [CrossRef] [PubMed]
Ibrahim, J.G. Incomplete Data in Generalized Linear Models. J. Am. Stat. Assoc. 1990, 85, 765–769. [Google Scholar] [CrossRef]
Cox, D.R.; Hinkley, D.V. Theoretical Statistics; Chapman and Hall: London, UK, 1974. [Google Scholar]
Firth, D. Bias Reduction of Maximum Likelihood Estimates. Biometrika 1993, 80, 27–38. [Google Scholar] [CrossRef]
Anderson, J.A.; Richardson, S.C. Logistic Discrimination and Bias Correction in Maximum Likelihood Estimation. Technometrics 1979, 21, 71–78. [Google Scholar] [CrossRef]
McCullagh, P. The Conditional Distribution of Goodness-of-Fit Statistics for Discrete Data. J. Am. Stat. Assoc. 1986, 81, 104–107. [Google Scholar] [CrossRef]
Shenton, L.R.; Bowman, K.O. Maximum Likelihood Estimation in Small Samples; Lubrecht & Cramer Limited: New York, NY, USA, 1977. [Google Scholar]
Lee, S. Detecting Differential Item Functioning Using the Logistic Regression Procedure in Small Samples. Appl. Psychol. Meas. 2016, 41, 30–43. [Google Scholar] [CrossRef] [PubMed]
Puhr, R.; Heinze, G.; Nold, M.; Lusa, L.; Geroldinger, A. Firth’s Logistic Regression with Rare Events: Accurate Effect Estimates and Predictions? Stat. Med. 2017, 36, 2302–2317. [Google Scholar] [CrossRef] [PubMed]
Lee, K.; Seo, J.-I. Different Approaches to Estimation of the Gompertz Distribution under the Progressive Type-II Censoring Scheme. J. Probab. Stat. 2020, 2020, 3541946. [Google Scholar] [CrossRef]
Asadi, S.; Panahi, H.; Swarup, C.; Lone, S.A. Inference on Adaptive Progressive Hybrid Censored Accelerated Life Test for Gompertz Distribution and Its Evaluation for Virus-Containing Micro Droplets Data. Alex. Eng. J. 2022, 61, 10071–10084. [Google Scholar] [CrossRef]
Chamberlain, G. Analysis of Covariance with Qualitative Data. Rev. Econ. Stud. 1980, 47, 225. [Google Scholar] [CrossRef]
Mátyás, L.; Lovrics, L. Missing Observations and Panel Data. Econ. Lett. 1991, 37, 39–44. [Google Scholar] [CrossRef]
Neyman, J.; Scott, E.L. Consistent Estimates Based on Partially Consistent Observations. Econometrica 1948, 16, 1. [Google Scholar] [CrossRef]
Lancaster, T. The Incidental Parameter Problem since 1948. J. Econom. 2000, 95, 391–413. [Google Scholar] [CrossRef]
Baltagi, B.H. Econometric Analysis of Panal Data; John Wiley & Sons: Hoboken, NJ, USA, 2001. [Google Scholar]
Hsiao, C. Analysis of Panel Data; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2003. [Google Scholar]
Greene, W. The Behaviour of the Maximum Likelihood Estimator of Limited Dependent Variable Models in the Presence of Fixed Effects. Econom. J. 2004, 7, 98–119. [Google Scholar] [CrossRef]
Opeyo, P.O.; Olubusoye, O.E.; Odongo, L.O. Conditional Maximum Likelihood Estimation for Logit Panel Models with Non-Responses. Int. J. Sci. Res. 2014, 3, 2242–2254. [Google Scholar]
Opeyo, P.O.; Cheng, W.; Xu, Z. Superiority of Bayesian Imputation to Mice in Logit Panel Data Models. Open J. Stat. 2023, 13, 316–358. [Google Scholar] [CrossRef]

Figure 1. Empirical distributions of parameter estimates by conditional MLE for a complete unimputed panel data set.

Figure 2. Densities of parameter estimates by conditional MLE for a complete unimputed panel data set.

Figure 3. Comparative parameter estimates by conditional MLE for different imputed panel data sets with varying sample sizes and proportions of missingness.

Figure 4. Comparative determinants of the Hessian matrices across different imputed panel data sets with varying sample sizes.

Table 1. Description of variables.

Variable	$X^{(1)}$	$X^{(2)}$	$X^{(3)}$	$X^{(4)}$	$X^{(5)}$
Type	continuous	continuous	continuous	discrete	discrete
Distribution	N~(0, 1)	U~(0, 1)	N~(0.5, 0.5)	B~(nT, 2, 0.65)	Bernoulli

Table 2. Parameter estimates by conditional MLE for a complete unimputed panel data set.

		Complete Unimputed Data
		$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$	$β_{5}$
10% Missingness	n = 50	1.097358	−1.1006	1.162922	1.088189	1.135867
	n = 100	1.063353	−1.04996	1.060443	1.042268	1.07011
	n = 250	1.026624	−1.02073	1.020552	1.020266	1.006623
30% Missingness	n = 50	1.118411	−1.06917	1.112667	1.122587	1.10587
	n = 100	1.05757	−1.02386	1.074032	1.033003	1.052396
	n = 250	1.027265	−1.01406	1.008524	1.015673	0.993208

Table 3. Parameter estimates by conditional MLE for a mean imputed panel data set.

		Mean Imputed Data
		$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$	$β_{5}$
10% Missingness	n = 50	1.069795	−1.07215	1.12953	1.059192	1.245338
	n = 100	1.032826	−1.00675	1.023471	1.006381	1.154124
	n = 250	1.000421	−0.9956	0.988925	0.994209	1.120467
30% Missingness	n = 50	1.043838	−0.966003	0.968262	1.008104	1.329941
	n = 100	1.004478	−0.93376	0.970724	0.929846	1.262566
	n = 250	0.980142	−0.93882	0.922916	0.913546	1.238532

Table 4. Parameter estimates by conditional MLE for an LVCF imputed panel data set.

		LVCF Imputed Data
		$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$	$β_{5}$
10% Missingness	n = 50	0.943744	−0.92664	0.977888	0.923632	1.126666
	n = 100	0.895114	−0.87648	0.889337	0.878934	1.083107
	n = 250	0.869159	−0.8563	0.853653	0.864803	1.03423
30% Missingness	n = 50	0.682111	−0.60787	0.626598	0.665022	0.967133
	n = 100	0.660872	−0.60855	0.620407	0.604995	0.92563
	n = 250	0.633169	−0.59185	0.594952	0.592881	0.911108

Table 5. Parameter estimates by conditional MLE for a median imputed panel data set.

		Median Imputed Data
		$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$	$β_{5}$
10% Missingness	n = 50	1.098609	−1.06006	1.12113	1.042217	1.090476
	n = 100	1.062593	−0.99428	1.017271	0.988661	1.025071
	n = 250	1.030955	−0.98289	0.97916	0.974852	0.977667
30% Missingness	n = 50	1.104232	−0.94469	0.951333	0.965304	1.058915
	n = 100	1.061597	−0.92397	0.956362	0.890659	0.958509
	n = 250	1.034481	−0.92603	0.908687	0.879251	0.930022

Table 6. Parameter estimates by conditional MLE for a Bayesian (MICE) imputed panel data set.

		Bayesian (MICE) Imputed Data
		$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$	$β_{5}$
10% Missingness	n = 50	1.067832	−1.07822	1.27407	1.062909	1.141502
	n = 100	1.028601	−1.00793	1.02702	1.010346	1.026907
	n = 250	0.998693	−0.99862	0.994132	0.999532	1.000275
30% Missingness	n = 50	0.966065	−1.04179	0.986995	0.99603	0.878896
	n = 100	0.946108	−0.94759	0.95725	0.931001	0.834477
	n = 250	0.917728	−0.93417	0.934447	0.921759	0.862901

Table 7. Comparative parameter biases and determinants of the Hessian matrices across different imputed panel data sets with varying sample sizes.

		Determinant of Hessian $(\times 10^{- 6})$	Parameter Bias
		Determinant of Hessian $(\times 10^{- 6})$	$β_{1}$	$β_{2}$	$β_{3}$	$β_{4}$	$β_{5}$
Complete Unimputed Data	n = 50	10.33104	0.118411	0.069170	0.112667	0.122587	0.105870
	n = 100	13.67689	0.057570	0.023855	0.074032	0.033003	0.052396
	n = 250	15.00124	0.027265	0.014057	0.008524	0.015673	−0.006792
Mean Imputed Data	n = 50	2.509932	0.043838	−0.033998	−0.031738	0.008104	0.329941
	n = 100	3.186349	0.004478	−0.066237	−0.029276	−0.070154	0.262566
	n = 250	4.677663	−0.019858	−0.061184	−0.077084	−0.086455	0.238532
LVCF Imputed Data	n = 50	11.67492	−0.317889	−0.392132	−0.373402	−0.334978	−0.032867
	n = 100	18.52447	−0.339128	−0.391447	−0.379594	−0.395006	−0.074370
	n = 250	20.43683	−0.366831	−0.408155	−0.405048	−0.407119	−0.088892
Median Imputed Data	n = 50	4.156453	0.104232	−0.055310	−0.048667	−0.034696	0.058915
	n = 100	4.049792	0.061597	−0.076035	−0.043638	−0.109342	−0.041491
	n = 250	6.079974	0.034481	−0.073969	−0.091313	−0.120749	−0.069978
Bayesian (MICE) Imputed Data	n = 50	1.510900	−0.033935	0.041789	−0.013006	−0.003970	−0.121104
	n = 100	1.968110	−0.053892	−0.052413	−0.042750	−0.068999	−0.165523
	n = 250	3.926050	−0.082272	−0.065831	−0.065554	−0.078241	−0.137099

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Otieno, O.P.; Cheng, W. The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates. Mathematics 2023, 11, 4338. https://doi.org/10.3390/math11204338

AMA Style

Otieno OP, Cheng W. The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates. Mathematics. 2023; 11(20):4338. https://doi.org/10.3390/math11204338

Chicago/Turabian Style

Otieno, Opeyo Peter, and Weihu Cheng. 2023. "The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates" Mathematics 11, no. 20: 4338. https://doi.org/10.3390/math11204338

APA Style

Otieno, O. P., & Cheng, W. (2023). The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates. Mathematics, 11(20), 4338. https://doi.org/10.3390/math11204338

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

The Concavity of Conditional Maximum Likelihood Estimation for Logit Panel Data Models with Imputed Covariates

Abstract

1. Introduction

2. Model Specification

2.1. Panel Data

2.2. The Logit Panel Data Model

3. Incidental Parameter Problem and MLE

3.1. Incidental Parameter Problem

3.2. The Unconditional Log Likelihood Function

3.3. Conditional Log Likelihood Function for the Logistic Panel Data Model

4. Parameter Estimation with the Imputed Covariate Sub-Matrix

4.1. Partitioned Covariate Matrix

4.2. Newton-Raphson Algorithm and the Hessian Matrix Optimization of the Log Likelihood Function

4.3. Simulation Study

5. Discussion, Conclusions, and Recommendations

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI