Generalized Information Matrix Tests for Detecting Model Misspecification

Golden, Richard M.; Henley, Steven S.; White, Halbert; Kashner, T. Michael

doi:10.3390/econometrics4040046

Open AccessArticle

Generalized Information Matrix Tests for Detecting Model Misspecification

by

Richard M. Golden

^1,*,

Steven S. Henley

^2,3,6,

Halbert White

^4,† and

T. Michael Kashner

^3,5,6,7

¹

School of Behavioral and Brain Sciences, GR4.1, 800 W. Campbell Rd., University of Texas at Dallas, Richardson, TX 75080, USA

²

Martingale Research Corporation, 101 E. Park Blvd., Suite 600, Plano, TX 75074, USA

³

Department of Medicine, Loma Linda University School of Medicine, Loma Linda, CA 92357, USA

⁴

Department of Economics, University of California San Diego, La Jolla, CA 92093, USA

⁵

Office of Academic Affiliations (10A2D), Department of Veterans Affairs, 810 Vermont Ave. NW (10A2D), Washington, DC 20420, USA

⁶

Center for Advanced Statistics in Education, VA Loma Linda Healthcare System, Loma Linda, CA 92357, USA

⁷

Department of Psychiatry, University of Texas Southwestern Medical Center at Dallas, Dallas, TX 75390, USA

^*

Author to whom correspondence should be addressed.

^†

Halbert White sadly passed away before this article was published.

Econometrics 2016, 4(4), 46; https://doi.org/10.3390/econometrics4040046

Submission received: 29 December 2015 / Revised: 13 September 2016 / Accepted: 26 October 2016 / Published: 15 November 2016

(This article belongs to the Special Issue Recent Developments of Specification Testing)

Download

Browse Figures

Versions Notes

Abstract

:

Generalized Information Matrix Tests (GIMTs) have recently been used for detecting the presence of misspecification in regression models in both randomized controlled trials and observational studies. In this paper, a unified GIMT framework is developed for the purpose of identifying, classifying, and deriving novel model misspecification tests for finite-dimensional smooth probability models. These GIMTs include previously published as well as newly developed information matrix tests. To illustrate the application of the GIMT framework, we derived and assessed the performance of new GIMTs for binary logistic regression. Although all GIMTs exhibited good level and power performance for the larger sample sizes, GIMT statistics with fewer degrees of freedom and derived using log-likelihood third derivatives exhibited improved level and power performance.

Keywords:

asymptotic theory; Information Matrix Test; specification analysis; logistic regression; simulation study; information ratio; misspecification

JEL Classification:

C12; C13; C15; C18; C52

1. Introduction

If a researcher’s probability model of the observed data is not correctly specified, then the interpretation of its parameter estimates may not be valid, leading to incomplete or incorrect conclusions. Thus, whether a model is correctly specified must be considered when analyzing and interpreting data (e.g., [1,2]). This issue is critically important in econometrics as well as more general scientific inquiry. For example, in health economics, estimates of the impact of clinical treatments [3,4], care systems [5], and health policy interventions on health outcomes [6] are dependent on the underlying assumption that the model to be tested is correctly specified. Further, model misspecification testing is essential for statistical analysis of randomized control trials [7,8] and observational studies [9,10]. For these reasons, this paper introduces a unified framework for identifying, classifying, and developing a wide range of specification tests.

1.1. Information Matrix Test Methods for Detection of Model Misspecification

Assume that the data

x_{1}, ..., x_{n}

observed in an experiment is a realization of a sequence of independent and identically distributed d-dimensional random vectors

X_{1}, ..., X_{n}

with a common data generating process density

p_{x}

. Let

M \equiv {f (x; θ) : θ \in Θ}

denote a proposed probability model that is a collection of probability densities indexed by a k-dimensional parameter vector

θ

. If

p_{x} \in M

, so that

p_{x} (x) = f (x; θ^{*})

a . e .

for some

θ^{*} \in Θ

, then

M

is correctly specified with respect to

p_{x}

.

When

M

is correctly specified with respect to

p_{x}

, the inverse of the asymptotic covariance matrix of the maximum likelihood estimator

{\hat{θ}}_{n} \equiv \arg \max \prod_{i = 1}^{n} f (X_{i}; θ)

is equal to both the inverse Hessian covariance matrix

A^{*} \equiv - \nabla^{2} E {\log f (X_{i}; θ^{*})}

and the inverse Outer Product Gradient (OPG) covariance matrix

B^{*} \equiv E {\nabla \log f (X_{i}; θ^{*}) {(\nabla \log f (X_{i}; θ^{*}))}^{T}}

. This classic result is called the Information Matrix Equality (see [1,2], and Theorem 4 of this paper for relevant reviews).

Let

u : R^{k} \to R

. The notation

\nabla u

refers to a k-dimensional column vector of functions called the gradient whose ith element is

\frac{\partial u}{\partial x_{i}}

,

i = 1, ..., k .

The notation

\nabla^{2} u

refers to a k-dimensional matrix-valued function which is called the Hessian of u. The element in the ith row and jth column of

\nabla^{2} u

is

\frac{\partial^{2} u}{\partial x_{i} \partial x_{j}}

,

i, j = 1, ..., k .

As described by White [1,2], the information matrix equality may be used as the basis for a test of model misspecification. White [1] proposed the Information Matrix Test (IMT) for testing the null hypothesis that the elements of the k-dimensional Hessian and k-dimensional Outer Product Gradient (OPG) inverse asymptotic covariance matrices (denoted by

A^{*}

and

B^{*}

respectively) are equal. That is, White [1] considered the null hypothesis:

H_{o} : v e c h (A^{*} - B^{*}) = 0_{k (k + 1) / 2}

where

0_{k (k + 1) / 2}

denotes a k(k + 1)/2-dimensional column vector of zeros. Rejection of this null hypothesis thus implies a violation of the information matrix equality and thus the presence of model misspecification. Moreover, as noted by White [1], it may be helpful to also consider situations where the null hypothesis is “directional.” If a directional null hypothesis is rejected, this implies

H_{o} : v e c h (A^{*} - B^{*}) = 0_{k (k + 1) / 2}

is rejected (but the converse of this latter statement does not hold). White [1], in particular, discussed directional IMTs that have the form:

H_{o} : S v e c h (A^{*} - B^{*}) = 0_{r}

where the selection matrix

S \in R^{r \times (k (k + 1) / 2)}

consists of r rows of a k(k + 1)/2-dimensional identity matrix. In some cases directional IMTs may have more statistical power because they are designed to identify specific types of model misspecification.

For many years, the IMT approach has not been widely used outside of linear regression modeling because various instabilities (possibly associated with large degrees of freedom) of the test were observed. Chesher [11] and Lancaster [12] demonstrated how the calculation of the third derivatives of the log-likelihood function could be avoided for the full IMT, but the effectiveness of their approach was shown in some cases to exhibit unacceptable performance in logistic regression and linear regression [13,14,15,16,17,18].

1.2. Recent Developments in Information Matrix Test Theory

An advance in the theory of information matrix testing was provided by Presnell and Boos [19] (also see, [20,21,22]), who introduced an IOS (in and out of sample) directional IMT and showed that it was effective in a variety of important situations through both theoretical analyses and simulation studies. More recently, Golden et al. [23] introduced a general unified theory for model specification testing based upon a nonlinear extension of White’s [1] approach to specification testing. The new IMTs developed within the framework of Golden et al. [23] are called Generalized Information Matrix Tests (GIMT).

In particular, Golden et al. [23] discussed the problem of testing the null hypothesis that a smooth nonlinear GIMT hypothesis function

s : R^{k \times k} \times R^{k \times k} \to R^{r}

of the Hessian and OPG inverse asymptotic covariance matrices is equal to an r-dimensional vector of zeros. That is, a GIMT tests the null hypothesis

H_{o} : s (A^{*}, B^{*}) = 0_{r}

. Golden et al. [23] emphasized that different choices of GIMT hypothesis function yield different types of directional and non-directional GIMT hypotheses. Although Golden et al. [23] did not provide explicit regularity conditions and a detailed analysis of their proposed general class of GIMTs, Golden et al. [23] introduced key formal definitions, provided an informal discussion of relevant theoretical results, and reported the results of a comprehensive simulation study of a realistic epidemiological analysis problem using logistic regression for six new GIMTs that exhibited appealing level and power performance. This approach for the detection of model misspecification has now been used in observational and randomized controlled trial studies [7,8,9,10].

Since the publication of Golden et al. [23], Cho and White [24] described an important class of non-directional GIMTs and showed that each of their three test statistics for model misspecification was asymptotically distributed as a squared Gaussian random variable under the null hypothesis. In addition, Cho and White [24] provided analyses of the power of their test statistics under local and global alternatives. Zhou et al. [25] proposed a non-directional GIMT statistic for the large important class of regression models where the distribution of the response variable conditioned upon the covariates is a member of the linear exponential family. Like Cho and White [24], they showed their misspecification test statistic has only a single degree of freedom and is asymptotically distributed as a squared Gaussian random variable under the null hypothesis. Huang and Prokhorov [26] also showed how the information matrix testing framework is useful for investigating goodness-of-fit using non-directional GIMT statistics for semi-parametric probability models that are specified by copulas. All of this previous work on GIMTs can be interpreted as special cases or variants of special cases of the general framework of Golden et al. [23] for finite-dimensional smooth probability models.

This paper provides a unified framework for addressing the detection of model misspecification using a variety of GIMT statistics for a large class of finite-dimensional smooth probability models. By presenting the details of the GIMT framework and explicitly presenting the relevant regularity assumptions, it establishes the foundation for supporting research into the further development of a large class of GIMTs as well as assisting in understanding the similarities and differences between different GIMTs in the existing published statistical literature.

Our paper is organized in the following manner. In Section 2, we provide the assumptions of the GIMT framework. In Section 3, we characterize the asymptotic distribution of a large family of GIMTs for a large class of finite-dimensional smooth probability models under the assumptions and definitions in Section 2. In Section 4, we investigate the performance of new GIMTs using simulation studies developed with respect to a particular logistic regression model intended to be representative of a commonly encountered problem of model misspecification detection. Conclusions are provided in Section 5.

2. GIMT Theoretical Framework: Definitions and Assumptions

In this section, we introduce the definitions and assumptions of our formal mathematical theory of Generalized Information Matrix Tests. In most practical applications, these assumptions are often satisfied for thrice continuously differentiable probability models with a fixed number of free parameters that have locally unique solutions. Throughout, it is assumed that observations are independent and identically distributed.

2.1. Data Generating Process

Let

B (R^{d})

be the Borel σ-field generated by the open subsets of

R^{d}

.

Assumption 1.

Data Generating Process (DGP). Let

X_{i}, i = 1, 2, ...

be a sequence of independent and identically distributed (i.i.d) random vectors where

X_{i}

has a common probability measure P on the measurable space

(R^{d}, B (R^{d}))

with completion

(R^{d}, F_{0}, P_{0})

.

Let the triplet

(Ω, F_{o}, P_{o})

be the probability space for the Data Generating Process (DGP).

In regression modeling applications, the first element of the d-dimensional real vector x_i (a realization of X_i) may be a particular value of the outcome (dependent) variable for a regression model associated with the ith data record, the second element of x_i may be the number 1 for the purpose of introducing an intercept parameter, and the remaining elements of x_i may be particular values for the predictor variables associated with the ith data record, i = 1, …, n.

Although Assumption 1 assumes that the observed data X_i, i = 1, 2, … are i.i.d., the theory presented here is also applicable to panel data analyses. For example, consider a situation where data are collected in a longitudinal study on a group of individuals over a period of time. Assume the observations across participants are assumed to be i.i.d., but the observations for a particular participant are neither necessarily identically distributed nor independent. Let X_it denote the observation associated with the measurement of the ith participant in the study at time index t for t = 1, … ,T (where T is a fixed finite number) and i = 1, …, n. The theory described in this article is applicable to evaluating the degree to which a probability model can account for the observed data

X_{i} \equiv [\begin{matrix} X_{i, 1} & \dots & X_{i, T} \end{matrix}]

, i = 1, …, n.

The following assumption of absolute continuity is now introduced to permit alternative representations of

P_{0}

in order to represent, construct, and manipulate probability densities for data generating processes involving data samples containing combinations of discrete and continuous random variables.

Assumption 2.

Absolute Continuity. Let

ν_{j} (x_{j})

be a σ-finite measure on the measurable space

(R, B (R))

, j = 1, … , d. Let

ν \equiv \otimes_{j = 1}^{d} ν_{j} (x_{j})

be a σ-finite product measure on the measurable space

(R^{d}, B (R^{d}))

. Assume

P_{0}

is absolutely continuous with respect to

ν

.

By the Radon-Nikodým Theorem, Assumption 2 guarantees the joint distribution of X_i, P₀, may be represented using a Radon-Nikodým density function. The Radon-Nikodým density

p_{x} \equiv d P_{0} / d ν

is common to the i.i.d. random variables

X_{i}

, i = 1, …, n on the measurable space

(R^{d}, B (R^{d}))

.

Assumption 2 allows the theoretical results developed here to be applicable to random vectors that contain both discrete and absolutely continuous components. If a random vector is a discrete random vector or an absolutely continuous random vector, then the Radon-Nikodým density becomes a probability mass function or an absolutely continuous probability density function and the associated measure theory notation may be avoided.

2.2. Probability Model

Let

s u p p X

denote the support of X.

Assumption 3.

Parametric Densities. (i) Let

Θ

be a compact and non-empty subset of

R^{k}

,

k \in ℕ

; (ii) Let

f : R^{d} \times Θ \to [0, \infty)

. For each θ in

Θ

,

f (\cdot; θ)

is a density with respect to v and

f (x; \cdot)

is continuous on

Θ

for each

x \in s u p p X

; (iii)

\log f (x; \cdot)

is continuously differentiable on

Θ

for each

x \in s u p p X

; (iv)

\log f (x; \cdot)

is twice continuously differentiable on

Θ

for each

x \in s u p p X

; (v)

\log f (x; \cdot)

is thrice continuously differentiable on

Θ

for each

x \in s u p p X

.

Definition.

Probability Model. Let

f

be defined as in Assumption 3(i) and Assumption 3(ii). Let

F : R^{d} \times Θ \to [0, 1]

be defined such that for each θ in

Θ

,

F (\cdot; θ) : R^{d} \to [0, 1]

is the probability distribution for X specified by density

f (\cdot; θ) .

The set

M \equiv {F (\cdot; θ) : R^{d} \to [0, 1] | θ \in Θ}

is the probability model on

Θ

specified by

f

.

Definition.

Misspecified Model. The probability model

M

is misspecified when

P_{0} \notin M

, otherwise

M

is correctly specified.

2.3. Hypothesis Function

Definition.

GIMT Hypothesis Function. Let

ϒ

be a compact and non-empty subset of

R^{k \times k}

,

k \in ℕ

. A Generalized Information Matrix Test (GIMT) Hypothesis function

s : ϒ \times ϒ \to R^{r}

has the property that if

A = B

, then

s (A, B) = 0_{r}

for every symmetric positive definite matrix

A \in ϒ

and for every symmetric positive definite matrix

B \in ϒ

.

Definition.

Nondirectional and directional GIMT Hypothesis Functions. Let

ϒ

be a compact and non-empty subset of

R^{k \times k}

,

k \in ℕ

. A nondirectional GIMT hypothesis function

s : ϒ \times ϒ \to R^{r}

has the property

A = B

if and only if

s (A, B) = 0_{r}

for all

(A, B) \in ϒ \times ϒ

. A directional GIMT hypothesis function is a GIMT hypothesis function that is not nondirectional.

When

A : R^{m \times n} \to R^{q \times r}

, let

\frac{d A}{d B} \equiv \frac{d v e c (A^{T})}{d v e c (B^{T})}

when it exists (e.g., [27]; also see [28,29]). Let

\nabla s : ϒ \times ϒ \to R^{r \times 2 k^{2}}

be defined such that for all

A, B \in ϒ

:

\nabla s (A, B) \equiv [\begin{matrix} \frac{\partial s (\cdot, B)}{\partial v e c (A)} & \frac{\partial s (A, \cdot)}{\partial v e c (B)} \end{matrix}]

when it exists.

Assumption 4.

Hypothesis Function Regularity Conditions. (i) Let

ϒ

be a compact and non-empty subset of

R^{k \times k}

,

k \in ℕ

. Let

s : ϒ \times ϒ \to R^{r}

be continuous on

ϒ \times ϒ

; (ii)

A^{*}

and

B^{*}

are in the interior of

ϒ \subseteq R^{k \times k}

; (iii)

\nabla s

exists and is continuous on

ϒ \times ϒ

; (iv)

\nabla s^{*}

has full row rank r on

ϒ \times ϒ

.

In practice, Assumption 4 provides a procedure for checking if the theory described here can be applied to a proposed GIMT hypothesis function.

Definition.

Antisymmetric GIMT Hypothesis Function. Let

s : ϒ \times ϒ \to R^{r}

be a GIMT hypothesis function satisfying Assumption 4(i), Assumption 4(ii), and Assumption 4(iii). If, in addition,

s (A, B) = - s (B, A)

for all

(A, B) \in ϒ \times ϒ

, then

s : ϒ \times ϒ \to R^{r}

is called an antisymmetric GIMT hypothesis function.

2.4. Notation

Let

g (x; θ) \equiv - \nabla \log f (x; θ)

. Let

{\bar{g}}_{n} (θ) \equiv (1 / n) \sum_{i = 1}^{n} g (X_{i}; θ)

. Let

{\hat{g}}_{n} \equiv {\bar{g}}_{n} ({\hat{θ}}_{n})

.

Let

{\bar{A}}_{n} (θ) \equiv - (1 / n) \sum_{i = 1}^{n} \nabla^{2} \log f (X_{i}; θ)

. Let

A (θ) \equiv \nabla^{2} l (θ)

.

Let

{\bar{B}}_{n} (θ) \equiv (1 / n) \sum_{i = 1}^{n} g (X_{i}; θ) {(g (X_{i}; θ))}^{T}

.

Let

B (θ) \equiv \int g (x; θ) {(g (x; θ))}^{T} p_{x} (x) d ν_{x} (x)

. Let

{\hat{A}}_{n} \equiv {\bar{A}}_{n} ({\hat{θ}}_{n})

. Let

{\hat{B}}_{n} \equiv {\bar{B}}_{n} ({\hat{θ}}_{n})

.

Let

A^{*} = A (θ^{*})

. Let

B^{*} = B (θ^{*})

. Let

d_{x, θ} (x; θ) \equiv [\begin{matrix} - v e c h (\nabla^{2} \log f (x; θ)) \\ v e c h (g (x; θ) {(g (x; θ))}^{T}) \end{matrix}]

.

Let

{\bar{d}}_{n} (θ) = (1 / n) \sum_{i = 1}^{n} d_{x, θ} (X_{i}; θ)

. Let

{\hat{d}}_{n} \equiv {\bar{d}}_{n} ({\hat{θ}}_{n})

. Let

{\bar{d}}_{n}^{*} \equiv {\bar{d}}_{n} (θ^{*})

.

Let

d (θ) \equiv [\begin{matrix} v e c h (A (θ)) \\ v e c h (B (θ)) \end{matrix}]

. Let

d^{*} \equiv d (θ^{*})

.

Let the notation

I_{k}

denote a k-dimensional identity matrix.

Let the duplication matrix

D_{k} : R^{k (k + 1) / 2} \to R^{k^{2}}

be defined such that:

D_{k} v e c h (A) = v e c (A)

and the inverse duplication matrix

D_{k}^{†} : R^{k^{2}} \to R^{k (k + 1) / 2}

be defined such that:

D_{k}^{†} v e c (A) = v e c h (A)

.

Let

D_{k}^{\otimes} \equiv I_{2} \otimes D_{k}

and let

D_{k}^{\otimes †} \equiv I_{2} \otimes D_{k}^{†}

.

Let

\nabla d : Θ \to R^{k (k + 1)}

where

\nabla d \equiv [\begin{matrix} \frac{d v e c h (A)}{d θ} \\ \frac{d v e c h (B)}{d θ} \end{matrix}] = D_{k}^{\otimes †} [\begin{matrix} \frac{d A}{d θ} \\ \frac{d B}{d θ} \end{matrix}]

.

Let

\nabla {\bar{d}}_{n} (θ) \equiv (1 / n) \sum_{i = 1}^{n} \nabla d_{x, θ} (X_{i}; θ)

. Let

\nabla {\hat{d}}_{n} \equiv \nabla {\bar{d}}_{n} ({\hat{θ}}_{n})

. Let

\nabla d^{*} \equiv \nabla d (θ^{*})

.

2.5. Regularity Conditions

The following Assumption 5 uses a matrix version of the standard definition of dominated by an integrable function (see Appendix A).

Assumption 5.

Domination Conditions

(i)(a)

\log f (x; θ)