1. Introduction
Parameter estimation is a key goal of inferential statistics, and researchers attempt to fit models that produce the best possible parameter estimates. The motivation behind parameter estimation is to make inferences about a study population using sample information, and this calls for clearly defined procedures to ensure that every estimation technique applied yields unbiased and precise estimates. During sample data collection, researchers encounter missing values in the study variables, a problem that complicates statistical analyses through inaccurate estimates and may eventually lead to incorrect inferences and policy actions.
Specifically, when the response variable is binary, problems of missing covariates are further compounded by the nonlinearity of the model specification. Studies on missingness and parameter estimation have shown that the most frequently used imputation techniques result in biased estimates with a significant loss of power [
1,
2,
3]. This problem cuts across every model, including the logit model for binary choice response variables; several studies have attempted to develop reliable imputation techniques for missing observations so as to reduce the bias of the estimates. For example, Fang and Jun [
4] proposed a procedure for estimating the parameters of a generalized linear model (GLM) with missing dependent and independent variables, known as iterative imputation estimation (IIE). This iterative method proved computationally faster and easier than maximum likelihood estimation (MLE) or weighted estimating equations, and it was therefore recommended for large samples with multiple covariates containing missing values. IIE, however, proved less efficient than MLE, since, to simplify computation, it does not incorporate the present covariate values that correspond to missing response values. Another study by Horton and Laird [
5] gave an in-depth review of the method of weights for GLMs, as was developed by Ibrahim [
6] for missing discrete covariates. They also acknowledged that if the nuisance parameter distribution is incorrectly specified, then the method of weights does not yield unbiased estimates of the regression model.
To characterize the association between dichotomous outcome variables and other model covariates, we often use logistic regression approaches. Broadly, the maximum likelihood estimation technique produces the parameter estimates that yield the highest probability of generating the observed data set, and it is the standard method for estimating logistic regression parameters. When maximum likelihood estimates (MLEs) do not exist, however, the maximum likelihood (ML) technique is vulnerable to convergence problems. Assessing the behavior of parameter estimates for a logistic regression model fitted by MLE is therefore of great importance, and applications of the logistic model stretch across research disciplines. Numerous works discuss the convergence problem in the logistic regression model (Cox et al. [
7]) and bias reduction (Firth; Anderson and Richardson) [
8,
9]. Other studies outline assumptions regarding the distributions of ML estimates resulting from the bias-reduction technique, and the impact of varying sample size on MLE [
10,
11].
The asymptotic characteristics of the maximum likelihood estimator are crucial for statistical inference based on the logistic regression model, according to Lee [
12]. Therefore, for the logistic regression parameters, the sampling distribution of the ML estimators is asymptotically normal and unbiased under large-sample scenarios. Conversely, due to biased estimates in small samples, the asymptotic properties of maximum likelihood estimators may not hold [
12,
13]. Other studies by Kyeongjun and Jung-In reveal that specific estimators, such as the pivot-based estimator, yield plausible mean square errors (MSE) and biases compared to MLEs and weighted least-square estimators [
14]. Saeid et al. similarly compared maximum likelihood estimates and Bayes estimates for the Gompertz distribution, but with no assumption of missing covariates [
15]. Therefore, given that MLE may not always be best for all types of distributions and models, this study aims to investigate the performance of conditional MLE in panel data models. Firth’s method was introduced as a penalization technique to minimize the small-sample bias of the ML estimators for the logistic regression model [
8,
13]. Lee [
9] compared the performance of the standard MLE with that of Firth’s penalized MLE; the comparison showed that the asymptotic MLE performed better than the penalized MLE in terms of statistical power [
12].
To prevent extreme biases in estimates resulting from imputation of missing covariates, the best imputation technique among those proposed in the literature needs to be established. We propose using the Hessian matrix of the log-likelihood function to establish whether or not the imputation technique used yields parameter estimates that maximize the conditional likelihood function of a logistic panel data model.
The present paper, therefore, aims to evaluate the susceptibility of the Hessian matrix to different imputation techniques by comparing the magnitudes of the determinants obtained from the Hessian matrix of the log-likelihood function with the imputed covariate vector.
In a bid to curb the incidental parameter problem, especially for logistic regression panel models, we adopt a conditional maximum likelihood estimator which analytically eliminates the individual fixed effects from the estimation algorithm. This we do in this first section, wherein we also lay down the basics of panel data econometrics.
After the introductory section,
Section 2 of this paper gives the specification of the nonlinear binary choice panel data models, under the assumption that the response variable is dichotomous.
Section 3 highlights the incidental parameter problem in estimating the logistic panel data model and shows how the conditional maximum likelihood approach circumvents it. In
Section 4, we discuss parameter estimation for a logit panel data model in which the covariate vector is partitioned into sample-present values and missing (imputed) values; this makes it possible to discern the impact of missingness on the Hessian of the proposed estimator of the binary choice logistic panel model. In addition, we present results from a Monte Carlo simulation which evaluates the effect of imputation of the covariate vector on the determinants of the Hessian matrix and on the parameter estimates.
Section 5 concludes by summarizing the study’s findings and offering recommendations for further research.
4. Parameter Estimation with the Imputed Covariate Sub-Matrix
4.1. Partitioned Covariate Matrix
In the presence of missing observations in the covariate vector $x_{it}$, we express it as a sum of two vectors, $x_{it}^{(p)}$ and $x_{it}^{(m)}$, holding the sample’s present covariate values and the missing (imputed) covariate values, respectively, so that $x_{it} = x_{it}^{(p)} + x_{it}^{(m)}$. Therefore, we have the conditional probabilities (10) and (11):
$$\Pr\left(y_{it}=1 \mid x_{it}, \alpha_i\right) = \frac{\exp\left\{\alpha_i + \left(x_{it}^{(p)} + x_{it}^{(m)}\right)'\beta\right\}}{1 + \exp\left\{\alpha_i + \left(x_{it}^{(p)} + x_{it}^{(m)}\right)'\beta\right\}}$$
and
$$\Pr\left(y_{it}=0 \mid x_{it}, \alpha_i\right) = \frac{1}{1 + \exp\left\{\alpha_i + \left(x_{it}^{(p)} + x_{it}^{(m)}\right)'\beta\right\}},$$
respectively, where $\alpha_i$ denotes the individual fixed effect and $\beta$ the vector of slope parameters.
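As an illustration, the partitioned covariate vector and the resulting logit probabilities can be sketched in Python; the function name and the numerical values here are hypothetical, chosen only to show the mechanics of the partition:

```python
import numpy as np

def logit_prob(x_present, x_missing, beta, alpha=0.0):
    """Success probability of the logit model when the covariate
    vector is the sum of the observed part and the imputed part."""
    x = x_present + x_missing        # x_it = x_p + x_m (partition)
    eta = alpha + x @ beta           # linear index with fixed effect alpha_i
    return 1.0 / (1.0 + np.exp(-eta))

# Observed entries sit in x_p (missing slots set to 0); the imputed
# values occupy the corresponding slots of x_m.
x_p = np.array([1.2, 0.0, 0.7])
x_m = np.array([0.0, 0.5, 0.0])
beta = np.array([0.4, -0.3, 0.8])
p1 = logit_prob(x_p, x_m, beta)      # Pr(y = 1 | x, alpha)
p0 = 1.0 - p1                        # Pr(y = 0 | x, alpha)
```

Any imputation technique only changes the entries of `x_m`; the functional form of the probabilities is unaffected.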
Substituting Equations (15) and (16) into Equation (13) gives the conditional log-likelihood function with imputed covariates, Equation (17). Consistent estimates of the parameters of Equation (17) are obtained iteratively using the Newton–Raphson algorithm.
4.2. Newton–Raphson Algorithm and Hessian Matrix Optimization of the Log-Likelihood Function
Given a differentiable function $f(\theta)$, Newton and Raphson proposed a numerical method of obtaining the roots of $f$ through iterative approximations using the following relation:
$$\theta^{(m+1)} = \theta^{(m)} - \frac{f\left(\theta^{(m)}\right)}{f'\left(\theta^{(m)}\right)},$$
where $\theta^{(m)}$ and $\theta^{(m+1)}$ are the $m$th and $(m+1)$th iterates. The goal of this method is to make the approximated result as close as possible to the exact result. If $U(\beta)$, the first derivative of the log-likelihood function, is defined as the gradient function (score vector), then the first derivative of $U(\beta)$ gives the Hessian matrix, the matrix of second-order derivatives of the likelihood function.
The Newton–Raphson algorithm for MLE involves fixing an initial estimate $\beta^{(0)}$ and using steps $m = 0, 1, 2, \ldots$ to iterate for the next value:
$$\beta^{(m+1)} = \beta^{(m)} + \left[I\left(\beta^{(m)}\right)\right]^{-1} U\left(\beta^{(m)}\right), \qquad (19)$$
in which $U(\beta)$ is the score or gradient vector of the log-likelihood function (17), and $I(\beta)$ is the observed information matrix, obtained as the negative of the computed Hessian matrix, $I(\beta) = -H(\beta)$.
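A minimal sketch of these Newton–Raphson iterations for a pooled logit log-likelihood (used here as a simplified stand-in for the conditional likelihood (17); the simulated data and function name are ours, for illustration only):

```python
import numpy as np

def newton_raphson_logit(X, y, tol=1e-8, max_iter=50):
    """Iterate beta_{m+1} = beta_m + I(beta_m)^{-1} U(beta_m), where
    U is the score vector and I = -H is the observed information."""
    beta = np.zeros(X.shape[1])                      # initial estimate beta^(0)
    for _ in range(max_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))          # fitted probabilities
        U = X.T @ (y - p)                            # score (gradient) vector
        H = -(X * (p * (1.0 - p))[:, None]).T @ X    # Hessian matrix
        step = np.linalg.solve(-H, U)                # I(beta)^{-1} U(beta)
        beta = beta + step
        if np.max(np.abs(step)) < tol:               # iterates have converged
            break
    return beta, H

rng = np.random.default_rng(0)
X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
true_beta = np.array([0.3, 0.8, -0.5])
y = (rng.random(500) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)
beta_hat, H_hat = newton_raphson_logit(X, y)
```

Because the logit log-likelihood is globally concave for a full-column-rank design, these iterations converge to the unique maximum from any starting value.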
The score vector $U(\beta)$ and the observed Hessian matrix $H(\beta)$ are obtained, respectively, as the first and second derivatives of the log-likelihood function (17) with respect to the parameter vector $\beta$.
For well-defined parameter estimates of the log-likelihood function, it is sufficient that (a) the log-likelihood function is concave, indicating that the model is identified; and (b) the Hessian matrix is negative semi-definite, yielding a negative curvature of the log-likelihood surface. This means that we can depict the general Gaussian curvature of the likelihood function by evaluating the determinant of the Hessian matrix at a critical point of the function. The concavity of the log-likelihood function is established when all eigenvalues of its Hessian are negative; in that case the determinant is nonzero, with sign $(-1)^k$ for $k$ parameters, so its modulus summarizes the curvature at the maximum.
In this study, we confirm that the conditional log-likelihood function of the logit panel data model preserves its concavity even when different imputation techniques are applied to the missing covariate matrix X. Establishing the concavity or convexity of the log-likelihood function is necessary for determining whether the parameter estimates correspond to a local or a global optimum. For the nonlinear logit panel data model, the maximum likelihood estimates are obtained when the Hessian matrix is negative semi-definite, resulting from a strictly concave log-likelihood function.
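The concavity check described above can be sketched as follows; the helper names are ours, and the pooled logit Hessian stands in for the conditional one (any full-column-rank design yields a negative definite logit Hessian):

```python
import numpy as np

def logit_hessian(X, beta):
    """Hessian of the pooled logit log-likelihood at beta."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return -(X * (p * (1.0 - p))[:, None]).T @ X

def concavity_report(H):
    """All eigenvalues negative => strictly concave log-likelihood;
    also report the modulus of the Hessian determinant."""
    eigenvalues = np.linalg.eigvalsh(H)
    return bool(np.all(eigenvalues < 0)), float(abs(np.linalg.det(H)))

rng = np.random.default_rng(42)
X = np.column_stack([np.ones(200), rng.normal(size=(200, 4))])
H = logit_hessian(X, beta=np.zeros(5))
is_concave, det_modulus = concavity_report(H)
```

The same report can be produced for each imputed data set, which is how the determinant comparisons of the next subsection proceed.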
We use simulations to assess the relationship between the Hessian modulus and the properties of the parameter estimates for the conditional MLE of the logit panel data model with various imputation techniques for missing covariates.
4.3. Simulation Study
To investigate the concavity of the log-likelihood function through the behavior of the Hessian matrix when different imputation techniques are used to fill in the missing covariates, we present Monte Carlo simulation results for a logistic panel data set. In this section, we focus on the N–R maximization of Equation (17) and use simulation results to compare properties of the Hessian matrices of the conditional log-likelihood function resulting from the new data sets obtained after imputation.
The simulation compares different sets of panel data generated by imputing covariates with imposed missingness patterns. This is achieved by substituting the imputed covariate vector $x_{it}$ into Equation (17), for which both item-based and model-based imputation methods are used to fill in the missing covariates. We consider a binary response variable specified by the latent-variable model
$$y_{it} = \mathbf{1}\left\{\alpha_i + x_{it}'\beta + \varepsilon_{it} > 0\right\}.$$
The covariate vector $x_{it}$ contains five different variables, each having values drawn from normal, uniform, or binomial distributions, as shown in Table 1. The disturbance term $\varepsilon_{it}$ follows a logistic distribution with cumulative distribution function $F(\varepsilon) = e^{\varepsilon}/(1 + e^{\varepsilon})$, with mean $0$ and variance $\pi^2/3$. The parameters $\beta_1$ to $\beta_5$ were fixed at preset true values. We simulated the fixed effects $\alpha_i$ such that they depend partly on the sum of the first covariate and the time period $T$.
To establish the sample sizes, we imposed an expected probability of success and acceptable coefficient-of-variation values of 0.2 in the sample-size relation. These gave three different values of N (including N = 50 and N = 100) which were used for all sets of data fitted to the models, to enable detailed comparisons and to evaluate the impact of varying N on the determinant of the Hessian matrix of the log-likelihood function. Further, to evaluate the impact of the proportion of missingness, we used two missingness proportions, 10% and 30%, obtained by randomly setting the desired proportion of observations in the data set to missing and imputing them back accordingly for each value of N.
For each data set specified, we found the determinants of the Hessian matrices and plotted them against the corresponding data code for ease of comparison across sample sizes. We used the determinant of the Hessian matrix as a generalization of the second-derivative test for univariate functions, where a nonzero determinant at a critical point, together with negative eigenvalues of the Hessian, indicates a maximum; this shows that the log-likelihood is a concave function. The imputation techniques used herein are mean imputation, median imputation, last value carried forward, and Bayesian imputation via multiple imputation by chained equations (MICE) (
Table 2,
Table 3,
Table 4,
Table 5 and
Table 6).
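The item-based imputation step of this design can be sketched as follows; `impute` is a hypothetical helper of ours, and the model-based MICE step requires an external package, so it is omitted from the sketch:

```python
import numpy as np
import pandas as pd

def impute(column, method):
    """Fill the missing entries of one covariate column."""
    s = pd.Series(column)
    if method == "mean":
        return s.fillna(s.mean()).to_numpy()
    if method == "median":
        return s.fillna(s.median()).to_numpy()
    if method == "locf":                        # last value carried forward
        return s.ffill().bfill().to_numpy()     # bfill covers a leading NaN
    raise ValueError(f"unknown method: {method}")

rng = np.random.default_rng(1)
x = rng.normal(size=100)
x_miss = x.copy()
missing_idx = rng.choice(100, size=10, replace=False)   # 10% missingness
x_miss[missing_idx] = np.nan                            # impose missingness

filled = {m: impute(x_miss, m) for m in ("mean", "median", "locf")}
```

Each completed column then replaces the corresponding covariate before Equation (17) is maximized, and the Hessian determinant is recorded per method.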
5. Discussion, Conclusions, and Recommendations
The simulated data, when used to fit the logit panel model, produced conditional maximum likelihood estimates from the complete data which followed finite-sample distributions, as shown in
Figure 1 and
Figure 2. We note that the conditional MLE values from the complete data set are asymptotically normally distributed. By using different sample sizes, our results validate the asymptotic nature of the parameter bias. Similarly, the results show that the parameter estimates improve with increasing sample size (
Figure 3). The precision of the estimates increases asymptotically, thereby making them more statistically significant.
The key objectives of this study were to focus on a method used to modify the conditional likelihood function through the partitioning of the covariate matrix in a bid to curb the incidental parameter problem and to assess the susceptibility of the Hessian matrix of the log likelihood function to the imputation techniques employed in completing a panel data set with missing covariates.
Undeniably, of all the classical imputation techniques, mean and median imputation introduce the least undue bias into the data set, and therefore perform relatively better than the last-value-carried-forward and mode imputation techniques. However, a model-based imputation technique such as MICE yields even better estimates, with further reduced bias and improved precision [
24].
Figure 1 shows the varying and reducing trends of the parameter estimates across the sample sizes and across imputation methods used in this study.
The elements of the Hessian matrix, and consequently its determinant, vary with the sample size. This study revealed that the smaller the determinant modulus, the larger the parameter estimates, signifying increased bias for smaller sample sizes. From the N–R algorithm (19), the inverse of the Hessian, $H^{-1}(\beta)$, serves to shrink the update term $\left[I(\beta)\right]^{-1} U(\beta)$ toward zero, yielding convergence in the iterations of $\beta$. An increasing Hessian modulus therefore ensures faster convergence of the parameter estimates with more precision, as seen from
Table 7 and
Figure 4. The positive moduli of the Hessian determinants for the conditional MLEs are consistent with the concavity of the log-likelihood function that yields the optimum parameter estimates.
Deriving estimators is crucial both for improving their theoretical comprehension and for lowering the computational complexity involved in estimating logit panel data models. Unbalancedness in a data set leads to biased parameter estimates, as seen from the Monte Carlo results, and the various imputation methods used in this study affect the concavity of the log-likelihood function (through the Hessian matrix) differently, which in turn affects the estimates’ bias and efficiency.
We can see from this study that when the within estimator becomes analytically cumbersome to use, the conditional maximum likelihood estimator is preferable to the unconditional MLE, since it eliminates the fixed effects from the estimation process, thereby allowing us to concentrate on the parameter estimates only.
For further development of this study, we recommend consideration of panel models with multiple fixed effects, and panel data sets with study units observed over time periods. Real data from social and industrial settings can also be used to validate the findings herein.