Article

Strategic Interaction Model with Censored Strategies

Department of Economics, New York University, 19 West 4th Street, New York, NY 10012, USA
Econometrics 2015, 3(2), 412-442; https://doi.org/10.3390/econometrics3020412
Submission received: 24 March 2015 / Revised: 5 May 2015 / Accepted: 21 May 2015 / Published: 1 June 2015
(This article belongs to the Special Issue Spatial Econometrics)

Abstract

In this paper, we develop a new model of a static game of incomplete information with a large number of players. The model has two key distinguishing features. First, the strategies are subject to threshold effects, and can be interpreted as dependent censored random variables. Second, in contrast to most of the existing literature, our inferential theory relies on a large number of players, rather than a large number of independent repetitions of the same game. We establish existence and uniqueness of the pure strategy equilibrium, and prove that the censored equilibrium strategies satisfy a near-epoch dependence property. We then show that the normal maximum likelihood and least squares estimators of this censored model are consistent and asymptotically normal. Our model can be useful in a wide variety of settings, including investment, R&D, labor supply, and social interaction applications.

1. Introduction

Identification and estimation of strategic interaction models have recently received a great deal of attention in econometrics, owing to the growing interest and application of stochastic games in various fields including industrial organization, labor, political and international economics. Most of the existing literature has focused on discrete choice games, see [1,2] for a survey of recent results. In this literature, the observed data are assumed to arise from an equilibrium of a game played by a finite number of players, and therefore, to be correlated across players. Typically, the number of players is assumed to be fixed, and the asymptotic inferential theory relies on a large number of independent repetitions of the same game in different markets or in a single market at different points of time. Two notable exceptions are Menzel [3] and Xu [4], who develop the inferential theory of discrete choice games based on a large number of players.
In this paper, we develop a new model of a static game of incomplete information with a large number of players. The model has two key distinguishing features. First, the strategies are subject to threshold effects, and can be interpreted as dependent censored random variables, e.g., R&D investment and labor supply. Second, the game is played in a single market and is not repeated over time. To develop the asymptotic theory, we instead assume that the number of players grows unboundedly, and the players reside on an exogenously given lattice so that the vector of their choices and characteristics can be viewed as a dependent random field, which can be handled by the limit theorems for near-epoch dependent (NED) random fields established by Jenish and Prucha [5].
We derive this model explicitly in two game-theoretical applications: (i) R&D investment by firms under strategic complementarities; and (ii) labor supply decision by women under peer effects. The set-up is standard for a static game of incomplete information: each player’s payoff function depends on her choice and choices of other players, her commonly observed characteristics, and her private characteristic unobserved by other players; players move simultaneously based on their expectations about the choices of other players, and in equilibrium, players have self-consistent expectations, see [6], i.e., their subjective expectations coincide with the expectation based on the equilibrium distribution of strategies conditional on commonly observed variables. We assume that private characteristics are i.i.d. normal across players, and prove existence and uniqueness of the pure strategy equilibrium under some mild conditions. We then show that the censored equilibrium strategies also satisfy the NED property under the same conditions.
Under normality of private shocks, the equilibrium strategies boil down to a Tobit econometric model. However, in contrast to the standard Tobit model, our censored model involves a non-zero threshold parameter that needs to be estimated. Therefore, we use the following two-step semiparametric procedure: we first estimate the threshold by the minimum order statistic of the uncensored subsample, and then estimate the remaining parameters either by the maximum likelihood or least squares method. Unlike the standard Tobit model, the maximum likelihood estimator does not strictly dominate the least squares estimator in our model due to discontinuous dependence of the likelihood on the first-step estimator, which may amplify finite-sample biases stemming from the first-step estimation. This provides a rationale for considering the least squares estimation as an alternative to the maximum likelihood procedure. We establish consistency and asymptotic distributions of these estimators. The minimum order statistic is $n$-consistent and asymptotically exponentially distributed, while the maximum likelihood and least squares estimators are $\sqrt{n}$-consistent and asymptotically normal. A Monte Carlo study suggests that all these estimators perform well in finite samples.
Finally, we address the computational challenges of our game with a large number of players. The standard estimation of games involves computing the equilibrium for each alternative parameter value and then optimizing the objective function over parameter values, and thus presents a formidable computational burden. To tackle it, we use the constrained optimization algorithm proposed by Su and Judd [7], which treats the equilibrium equations as constraints and optimizes simultaneously over parameters and equilibrium variables, thereby avoiding calculation of the equilibrium at each iteration on the parameter value. Su and Judd [7] show equivalence of this constrained optimization problem to the original problem. Our simulations confirm viability and a significant computational efficiency of the Su-Judd algorithm in our model.
To our knowledge, the proposed censored model has not been considered in the existing literature. Most of the existing results have dealt with discrete choice games, e.g., [8]. Recently, Xu and Lee [9] analyzed a spatially autoregressive (SAR) Tobit model, which can be viewed as a censored version of the Cliff-Ord type linear SAR model with a known spatial weight matrix. Xu and Lee [9] establish the NED property as well as consistency and asymptotic normality of the maximum likelihood estimator using the limit theorems of Jenish and Prucha [5]. Though not explicitly demonstrated, this SAR Tobit model can be interpreted as an equilibrium of a static game of complete information, while our model is a game of incomplete information with a different concept of equilibrium and, consequently, qualitatively different implications. Moreover, the presence of latent endogenous variables and a non-zero threshold in our model poses additional statistical and computational difficulties. Thus, the two papers are complementary to each other.
The paper is organized as follows. Section 2 describes and derives the model in two examples. Section 3 establishes existence and uniqueness of the equilibrium, and proves the NED property of the equilibrium strategies. Section 4 discusses identification and estimation of the model. Consistency and asymptotic distributions of the estimators are established in Section 5. Section 6 contains a Monte Carlo study, and Section 7 concludes. All proofs are collected in the appendices.

2. Model

In this paper, we are concerned with estimation of the following econometric model:
$$Y_{in} = \begin{cases} b_0 + X_i \beta_0 + \sum_{j \in N_i} \alpha_{j0} E_i Y_{jn} + \varepsilon_i, & \text{if } b_0 + X_i \beta_0 + \sum_{j \in N_i} \alpha_{j0} E_i Y_{jn} + \varepsilon_i > \gamma_0 \\ 0, & \text{otherwise} \end{cases} \tag{1}$$
where $Y_{in}$, $i = 1, \ldots, n$, is the choice of agent $i$; $E_i Y_{jn} = E(Y_{jn} \mid \mathcal{F}_i)$ is agent $i$’s expectation, given its information set $\mathcal{F}_i$, about the choices of its neighbors, $Y_{jn}$, within the neighborhood of radius $r$, $N_i = N_i(r)$, containing a fixed number of neighbors $k = |N_i|$ that does not depend on $i$; $X_i$ is the vector of observed characteristics of agent $i$; $\varepsilon_i$ is agent $i$’s private characteristic observed only by agent $i$; and $(\alpha_0, \beta_0, b_0, \gamma_0)$, $\gamma_0 \ge 0$, is the vector of unknown coefficients. The distribution of private shocks is known to all players. It is assumed that the $\varepsilon_i$ are i.i.d. $N(0, \sigma_0^2)$ and independent of $X_i$. The information set of each player consists of the entire state vector, $\bar{X}_n = (X_1, \ldots, X_n)$, and its private information $\varepsilon_i$, i.e., $\mathcal{F}_i = (\bar{X}_n, \varepsilon_i)$.
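As a minimal sketch of the censoring rule in model (1), the following Python function maps a latent index to an observed choice. The function name and its arguments are illustrative assumptions and do not come from the paper; the expected neighbor choices are simply passed in as given numbers, and $X_i \beta_0$ is treated as a precomputed scalar.

```python
def observed_choice(b0, xb, alphas, expected_neighbors, eps, gamma0):
    """Censoring rule of model (1), as an illustrative sketch.

    xb is the scalar X_i * beta_0; alphas and expected_neighbors are
    equal-length sequences holding alpha_{j0} and E_i Y_{jn} for j in N_i.
    """
    # Latent index: intercept + own characteristics + expected neighbor
    # choices weighted by the interaction coefficients + private shock.
    latent = b0 + xb + sum(a * e for a, e in zip(alphas, expected_neighbors)) + eps
    # The choice is observed only when the latent index exceeds gamma0.
    return latent if latent > gamma0 else 0.0
```

With a non-zero threshold, observed choices are either exactly zero or strictly above `gamma0`, which is the feature the threshold estimator later exploits.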
The choice of player $i$ is assumed to be directly affected by its neighbors only in a fixed neighborhood of the known radius, $r > 0$, with respect to some socio-economic metric. However, it will be indirectly affected by all other players. The number of the neighbors within the $r$-neighborhood of each agent is assumed to be fixed and equal to $k$. To avoid the incidental parameters problem, the $k$ coefficients $\{\alpha_{j0}\}_{j \in N_i}$, measuring the effect of these $k$ neighbors, are assumed to depend only on the relative locations of $i$ and $j$, but not on $j$ or $i$. We formally specify the metric and the neighborhood structure in the following section.
The above assumptions seem reasonable in many empirical settings. For example, in its R&D decision, a firm would take into account the R&D of its neighbors within a certain distance in the geographic (or product characteristic) space, rather than that of all firms in the market. This is because technological spillovers, knowledge diffusion and labor mobility, the determinants of R&D diffusion, are usually confined to a limited geographical or technological area.
Aside from the unobserved heterogeneity captured by the private shocks $\varepsilon_i$, we do not allow individual heterogeneity in the parameters. The reason is that the model assumes only one repetition of the game, and the asymptotic theory relies on the number of agents growing to infinity. Clearly, allowing heterogeneous parameters across individuals would result in inconsistency.
Model (1) is fairly general for applications. It can arise as a system of best response functions of a static game of incomplete information among n players. Below, we derive these equations for two strategic interaction models: (i) R&D investment by firms; and (ii) labor supply by women. In these models, decisions of players exhibit strategic complementarities, and are subject to threshold effects.

2.1. Spillovers in R&D Investment

A large body of empirical evidence suggests presence of technology and R&D spillovers among firms, e.g., [10]. Audretsch and Feldman [10] find that knowledge spillovers are more prevalent in the industries that exhibit spatial clustering. Positive R&D spillovers may occur through several channels including knowledge transfers, labor mobility and imitation. Therefore, it is reasonable to expect the magnitude of such a spillover effect to depend on the geographical and technological distances between firms. As a result, firms’ R&D expenditures may be spatially correlated, and the magnitude of this correlation often decays with the distance between firms.
The literature distinguishes two major channels through which R&D can raise firms’ profits: cost-reducing and demand-creating effects. The former allows firms to carry out process improvements leading to efficiency gains and cost reduction, while the latter enables firms to improve the quality of their product and thereby boost the demand. Levin and Reiss [11] analyze a model of monopolistic competition with both demand-creating and cost-reducing R&D spillovers across n firms. Based on a sample of US manufacturing firms, the authors find statistically significant, sizeable spillovers in the cost-reducing R&D and insignificant, small spillovers in the demand-creating R&D in most industries. Levin and Reiss [11] also find the elasticity of product quality to firm’s own R&D to be much higher than that of cost to firm’s own R&D. Other theoretical models of R&D spillovers include d’Aspremont and Jacquemin [12], and Motta [13], among others.
Yet all these papers model the R&D investment as a continuous variable, thereby neglecting the strong empirical evidence that a sizeable proportion of firms do not undertake R&D activities, see, e.g., [14]. One plausible explanation is that the demand-creating effect of R&D is subject to threshold effects: the quality can be raised only after a certain minimum level of R&D investment is attained; R&D has no effect on the quality below this level. Thus, the R&D expenditure can be viewed as a censored decision variable whose optimal values below a certain threshold are unobserved. This type of model in the single-firm setup is analyzed by Gonzalez and Jaumandreu [15].
To study spatial spillovers in R&D investment, we develop a simple model of strategic interaction with a censored decision variable that incorporates the empirical findings discussed above. We consider a single, monopolistically competitive industry composed of a large number, $n$, of firms, each producing a brand of the same product differentiated by quality. Let $p_i$, $q_i$, $s_i$ and $y_i$ denote, respectively, the price, demand, product quality and R&D expenditure of firm $i$. To derive $q_i$, we employ a variant of the Dixit-Stiglitz [16] model of monopolistic competition in which the CES utility of a representative consumer is augmented with preference for quality:
$$u(q_1, \ldots, q_n) = \left[ \sum_{i=1}^{n} \left( q_i s_i^{\eta} \right)^{\rho} \right]^{1/\rho}, \quad 0 < \rho < 1$$
where $\eta > 0$ is a quality sensitivity parameter. Utility maximization yields the demand for firm $i$ of the form $q_i(p_i, s_i) = \bar{K} p_i^{-\nu} s_i^{\epsilon}$, where $\nu = 1/(1-\rho) > 1$ is the elasticity of substitution between the quality-adjusted goods, $\epsilon = \eta(\nu - 1)$, $\bar{K} = I \bar{p}^{-1}$ with $I$ being consumer’s income, and $\bar{p} = \sum_{i=1}^{n} (p_i / s_i^{\eta})^{1-\nu}$ is a quality-adjusted price index. To obtain non-increasing marginal demand for quality, $\partial^2 q_i / \partial s_i^2 \le 0$, suppose that $\eta \le 1/(\nu - 1)$. If the number of firms is large, it is reasonable to assume the effect of a single firm’s decision on the industry index $\bar{p}$ to be negligible, i.e., $\bar{K}$ is constant, and normalize $\bar{K} = 1$.
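The demand system above can be checked numerically. The sketch below, with hypothetical function and argument names, evaluates $q_i = I \bar{p}^{-1} p_i^{-\nu} s_i^{\epsilon}$ with $\nu = 1/(1-\rho)$, $\epsilon = \eta(\nu-1)$ and $\bar{p} = \sum_i (p_i/s_i^{\eta})^{1-\nu}$ before the normalization $\bar{K} = 1$; under these formulas total spending should exhaust income.

```python
def ces_quality_demand(prices, qualities, income, rho, eta):
    """Quality-augmented CES demands (illustrative sketch, names assumed)."""
    nu = 1.0 / (1.0 - rho)          # elasticity of substitution, > 1
    eps = eta * (nu - 1.0)          # elasticity of demand w.r.t. quality
    # Quality-adjusted price index p_bar.
    p_bar = sum((p / s ** eta) ** (1.0 - nu) for p, s in zip(prices, qualities))
    return [income / p_bar * p ** (-nu) * s ** eps
            for p, s in zip(prices, qualities)]


prices = [1.0, 2.0, 1.5]
qualities = [1.0, 1.5, 2.0]
demand = ces_quality_demand(prices, qualities, income=10.0, rho=0.5, eta=0.8)
spending = sum(p * q for p, q in zip(prices, demand))  # should equal income
```

The budget identity holds because the numerator terms $p_i^{1-\nu} s_i^{\eta(\nu-1)}$ sum to $\bar{p}$.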
Following Gonzalez and Jaumandreu [15], we assume that a firm’s own R&D expenditure affects only its product quality, subject to a technological constraint:
$$s_i = s(y_i) = \begin{cases} (y_i + 1)^{\delta}, & \text{if } y_i > \bar{y} \\ (\bar{y} + 1)^{\delta}, & \text{if } y_i \le \bar{y} \end{cases}$$
where $\bar{y}$ is the minimum investment required for quality improvements, and $0 < \delta < 1$ is the R&D sensitivity parameter. Throughout, we use $y_i + 1$ instead of $y_i$ to ensure that the logarithm of the censored investment is defined for zero values, and let $Y_i = \log(y_i + 1)$. It is a convenient normalization, which does not affect the results.
Furthermore, in light of the above empirical findings, we assume that other firms’ R&D has only a cost-reducing effect on firm $i$, and this effect is limited to the fixed $r$-neighborhood, $N_i(r)$, of firm $i$:
$$c_i = c_i(X_i, Y_{-i}, e_i) = \exp\left( X_i \beta + \alpha \sum_{j \in N_i} Y_j + e_i \right)$$
where $c_i$ and $X_i$ are, respectively, the marginal cost and vector of observed cost-determinants of firm $i$, $Y_i = \log(y_i + 1)$ is the log of firm $i$’s investment, $Y_{-i} = (Y_1, \ldots, Y_{i-1}, Y_{i+1}, \ldots, Y_n)$ is the vector of the log R&D choices of all firms except $i$, and $e_i$ is firm $i$’s idiosyncratic cost component. The coefficient $\alpha \le 0$ measures the strength of this spillover effect.
Suppose that all firms observe $\bar{X}_n = (X_1, \ldots, X_n)$, but $e_i$ is observed only by firm $i$. Given this uncertainty about the choices of other firms, following Durlauf [17], see also [6], we assume that each firm $i$ decides on its R&D investment based on its beliefs about the choices of the other firms, $E_i Y_{-i} = E(Y_{-i} \mid \bar{X}_n, e_i)$, which are formed as the conditional expectation given all the information available to firm $i$.
Based on these beliefs, firms simultaneously choose price, $p_i$, and R&D investment, $y_i$, to maximize their profits subject to a technological constraint, i.e., solve
$$\max_{p_i \ge 0,\, y_i \ge 0} \Pi(p_i, y_i, X_i, E_i Y_{-i}, e_i) = \left[ p_i - c_i(X_i, E_i Y_{-i}, e_i) \right] q_i(p_i, s_i(y_i)) - C(y_i) \tag{2}$$
$$\text{s.t. } q_i(p_i, s_i) = p_i^{-\nu} s_i^{\epsilon}, \quad s_i = \begin{cases} (y_i + 1)^{\delta}, & \text{if } y_i > \bar{y} \\ (\bar{y} + 1)^{\delta}, & \text{if } y_i \le \bar{y} \end{cases} \quad \text{and} \quad C(y_i) = \begin{cases} y_i + 1, & \text{if } y_i > \bar{y} \\ y_i, & \text{if } y_i \le \bar{y} \end{cases} \tag{3}$$
where $C(y_i)$ is the cost of investment. The nonstochastic threshold $\bar{y} \ge 0$ is assumed to be observed and constant across all firms.
Lemma 1. 
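The solution to optimization problem (2) and (3) is given by (1) with $\alpha_{j0} = \tau\alpha$, $j = 1, \ldots, k$, $\beta_0 = \tau\beta$, $\varepsilon_i = \tau e_i$, $\tau = -(\nu - 1)/(1 - \epsilon\delta)$, $b_0 = \log\left[ \epsilon\delta\, \nu^{-\nu} (\nu - 1)^{\nu - 1} \right]^{1/(1 - \epsilon\delta)}$ and $\gamma_0 = \log\left[ (1 - \epsilon\delta)^{-1/(\epsilon\delta)} (\bar{y} + 1) \right] > 0$.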
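The mapping in Lemma 1 can be sanity-checked numerically. The sketch below is not taken from the paper: the function names are hypothetical, and the expressions for $\tau$, $b_0$ and $\gamma_0$ are the ones obtained by working through the first-order conditions of problem (2) and (3) with the optimal price $p_i = \nu c_i/(\nu - 1)$, so they should be read as an assumption-laden reconstruction. It requires $\nu > 1$ and $0 < \epsilon\delta < 1$.

```python
import math


def lemma1_parameters(alpha, nu, eps, delta, y_bar):
    """Reduced-form parameters implied by the R&D model (sketch)."""
    ed = eps * delta
    tau = -(nu - 1.0) / (1.0 - ed)
    b0 = math.log(ed * nu ** (-nu) * (nu - 1.0) ** (nu - 1.0)) / (1.0 - ed)
    gamma0 = math.log((1.0 - ed) ** (-1.0 / ed) * (y_bar + 1.0))
    return {"tau": tau, "alpha0": tau * alpha, "b0": b0, "gamma0": gamma0}


def profit_above_threshold(y, log_cost, nu, eps, delta):
    """Profit at the optimal price when investment y exceeds y_bar."""
    scale = nu ** (-nu) * (nu - 1.0) ** (nu - 1.0)
    revenue_term = scale * math.exp((1.0 - nu) * log_cost) * (y + 1.0) ** (eps * delta)
    return revenue_term - (y + 1.0)


params = lemma1_parameters(alpha=-0.3, nu=2.0, eps=0.8, delta=0.5, y_bar=1.0)
log_cost = -3.6  # hypothetical value of X_i*beta + alpha*sum_j E_i Y_j + e_i
# Interior optimum implied by the mapping: Y* = b0 + tau * log(c_i).
y_star = math.exp(params["b0"] + params["tau"] * log_cost) - 1.0
```

A negative `alpha` (cost-reducing spillovers) yields a positive reduced-form interaction coefficient `alpha0`, which is the strategic complementarity discussed in the text.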
If $\alpha < 0$, i.e., R&D of the neighbors has a cost-reducing effect on firm $i$, then both the probability and intensity of firm $i$’s R&D increase with the expected R&D of its neighbors. In other words, there are strategic complementarities or positive externalities in the R&D decision of firms. Furthermore, the probability of R&D is also increasing in (i) the elasticity of demand with respect to quality, higher $\epsilon$; (ii) the elasticity of quality with respect to R&D, higher $\delta$; and (iii) the market power, lower $\nu$. The latter is consistent with the Schumpeterian argument that economies of scale make R&D more attractive to large firms than to small firms.

2.2. Peer Effects in Female Labor Supply

Our next example involves social interactions in the female labor supply. Suppose the utility of female $i$ is defined over her consumption, $c_i$, and leisure, $l_i$, as follows:
$$U(c_i, y_i, h_i) = \frac{c_i^{\delta}}{\delta} + h_i l_i$$
where $0 < \delta < 1$ is the parameter characterizing the relative preference for consumption over leisure. Let $y_i$ be the labor supply of female $i$, and let the weight on the leisure, $h_i$, capture the peer effects that depend on the labor supply decisions of female $i$’s peers in her social neighborhood, referred to as friends, as follows:
$$h_i = h_i(X_i, Y_{-i}, e_i) = \exp\left( X_i \beta + \alpha \sum_{j \in N_i} Y_j + e_i \right)$$
where $X_i$ is the vector of observed characteristics of woman $i$, $Y_j = \log(y_j + 1)$ is the log labor supply of woman $i$’s friends, and $e_i$ is her private characteristic unobserved by other women. As in the previous example, we use $y_i + 1$ instead of $y_i$ to ensure that the log of the censored labor supply is defined for zero values. In the presence of positive peer effects, $\alpha < 0$, which implies mutual reinforcement in the choices within the social group.
As before, all women observe $\bar{X}_n = (X_1, \ldots, X_n)$, but $e_i$ is observed only by $i$. Woman $i$ makes her decision based on her beliefs about the choices of her peer group, $E_i Y_{-i} = E(Y_{-i} \mid \bar{X}_n, e_i)$, which are formed as the conditional expectation given all the information available to woman $i$. Based on these beliefs, women simultaneously maximize their utility subject to threshold effects:
$$\max_{c_i \ge 0,\, y_i \ge 0} U(c_i, X_i, E_i Y_{-i}, e_i) = \frac{c_i^{\delta}}{\delta} + h_i(X_i, E_i Y_{-i}, e_i)\, l_i \tag{4}$$
$$\text{s.t. } c_i = \begin{cases} w(y_i + 1), & \text{if } w(y_i + 1) > \bar{c} \\ \bar{c}, & \text{if } w(y_i + 1) \le \bar{c} \end{cases}, \quad l_i = \begin{cases} T - y_i - 1, & \text{if } w(y_i + 1) > \bar{c} \\ T - y_i, & \text{if } w(y_i + 1) \le \bar{c} \end{cases} \tag{5}$$
where $w$ is the wage, $T$ is the time endowment, and $\bar{c}$ is the reservation labor income, which can be interpreted as welfare or other government transfers. The nonstochastic threshold $\bar{c} \ge 1$ is assumed to be observed and constant across women.
Lemma 2. 
The solution to optimization problem (4) and (5) is given by (1) with $\alpha_{j0} = -\alpha/(1 - \delta)$, $j = 1, \ldots, k$, $\beta_0 = -\beta/(1 - \delta)$, $\varepsilon_i = -e_i/(1 - \delta)$, $b_0 = \log w^{\delta/(1 - \delta)}$ and $\gamma_0 = \log\left[ (1 - \delta)^{-1/\delta}\, \bar{c} \right] > 0$.
If $\alpha < 0$, i.e., there are positive peer effects, then both the probability and magnitude of woman $i$’s labor supply increase with the expected labor supply of her peers.

3. Equilibrium: Characterization and Weak Dependence

We assume that in equilibrium, players have self-consistent expectations, i.e., their subjective expectations or beliefs coincide with the expectation based on the equilibrium distribution of strategies conditional on $\bar{X}_n$. That is,
$$E_i Y_j = E(Y_j \mid \bar{X}_n, \varepsilon_i) = E(Y_j \mid \bar{X}_n) := \tilde{Y}_{jn} \tag{6}$$
where the expectation $E(\cdot \mid \bar{X}_n)$ is taken with respect to the equilibrium conditional distribution of strategies. The last equality follows from independence of the $\varepsilon_i$ across players, and independence of $\varepsilon_i$ and $X_i$.
Suppose that the $\varepsilon_i$ are i.i.d. $N(0, \sigma_0^2)$. Taking conditional expectation of Equation (1) with respect to the equilibrium distribution of strategies, conditional on $\bar{X}_n$, yields:
$$\tilde{Y}_{in} = \Phi\left( \frac{\sum_{j \in N_i} \alpha_{j0} \tilde{Y}_{jn} + X_i \beta_0 + b_0 - \gamma_0}{\sigma_0} \right) \left[ \sum_{j \in N_i} \alpha_{j0} \tilde{Y}_{jn} + X_i \beta_0 + b_0 \right] + \sigma_0\, \phi\left( \frac{\sum_{j \in N_i} \alpha_{j0} \tilde{Y}_{jn} + X_i \beta_0 + b_0 - \gamma_0}{\sigma_0} \right) \tag{7}$$
where $\Phi$ and $\phi$ are, respectively, the c.d.f. and p.d.f. of the standard normal distribution.
Provided that they are well-defined, the strategies $Y_{in}$, $i = 1, \ldots, n$, are independent across $i$ conditional on $\bar{X}_n$ and have censored normal distributions with the means $\tilde{Y}_{in}$, $i = 1, \ldots, n$, the common variance $\sigma_0^2$ and the common nonstochastic threshold $\gamma_0$. In equilibrium, $\tilde{Y}_{in}$, $i = 1, \ldots, n$, satisfy system (7). If this system has a unique solution, the corresponding equilibrium strategies, $Y_{in}$, $i = 1, \ldots, n$, will also be unique with probability 1, since a censored normal variable is uniquely characterized by its mean, variance and threshold. This leads to the following characterization of equilibrium.
Definition 1. 
An equilibrium is a set of policy functions $\{Y_{in}, i = 1, \ldots, n\}$ whose conditional mean functions $\{\tilde{Y}_{in} = E(Y_{in} \mid \bar{X}_n)\}_{i=1}^{n}$ satisfy system (7).
A similar characterization of equilibrium in discrete games of incomplete information is used in [8].
An appealing feature of Equation (1) is that it reduces to the popular Tobit model, which is part of any regression package. However, the difficulty is that $Y_{in}$ depends on the latent regressors, $\tilde{Y}_{in}$. Thus, one would first need to obtain consistent estimates of the latent regressors, and then use any consistent estimation procedure for the Tobit model.
Since consistency of any estimation method hinges upon uniqueness of equilibrium, we first prove existence and uniqueness of the pure strategy equilibrium. To this end, we maintain the following assumption.
Assumption 1. 
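The shocks $\varepsilon_i$ are i.i.d. $N(0, \sigma_0^2)$ and $\lambda = k \left[ \phi(0) \gamma_0 / \sigma_0 + 1 \right] \bar{\alpha} < 1$, where $\gamma_0 \ge 0$, $\bar{\alpha} = \max_{1 \le j \le k} |\alpha_{j0}|$, and $\phi(\cdot)$ is the p.d.f. of the standard normal distribution.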
This assumption restricts the strength of interactions, captured by the coefficients $\alpha$: interactions must not be too strong for a stable equilibrium to exist. Intuitively, if the interactions are long-ranged and too strong, then the effect of remote neighbors is substantial and may lead to instability and multiple equilibria. Since the condition involves only estimable coefficients and the number of neighbors, $k$, which is typically known, the assumption is testable.
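To illustrate how the condition in Assumption 1 could be checked for given parameter values, the sketch below computes $\lambda = k[\phi(0)\gamma_0/\sigma_0 + 1]\bar{\alpha}$ with $\bar{\alpha} = \max_j |\alpha_{j0}|$. The function name is hypothetical and the parameter values are purely illustrative.

```python
import math


def contraction_modulus(alphas, gamma0, sigma0):
    """lambda from Assumption 1 for interaction coefficients alphas (sketch)."""
    k = len(alphas)                               # number of neighbors
    phi0 = 1.0 / math.sqrt(2.0 * math.pi)         # standard normal p.d.f. at 0
    alpha_bar = max(abs(a) for a in alphas)       # strongest interaction
    return k * (phi0 * gamma0 / sigma0 + 1.0) * alpha_bar


weak = contraction_modulus([0.2, 0.2], gamma0=1.0, sigma0=1.0)    # below 1
strong = contraction_modulus([0.5, 0.5], gamma0=1.0, sigma0=1.0)  # above 1
```

In practice one would plug in estimated coefficients; values of `lambda` at or above 1 indicate that the sufficient condition for uniqueness is not met.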
Assumption 1 is similar to Assumptions B and C in [4], which restrict the strength of interactions to obtain a unique equilibrium in a discrete choice game of social interactions.
Based on this assumption, we can now show existence and uniqueness of equilibrium.
Theorem 1. 
Under Assumption 1, there exists a unique equilibrium of model (1).
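Under Assumption 1, the right-hand side of system (7) is a contraction, so the equilibrium means can be approximated by simple fixed-point iteration. The following sketch is an illustration under strong simplifying assumptions that are not in the paper: players sit on a one-dimensional ring, all neighbors share a single coefficient `alpha`, and `index[i]` stands for the scalar $b_0 + X_i \beta_0$. All function names are hypothetical.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean(mu, gamma, sigma):
    """Mean of Y = (mu + eps) * 1(mu + eps > gamma), eps ~ N(0, sigma^2)."""
    z = (mu - gamma) / sigma
    return norm_cdf(z) * mu + sigma * norm_pdf(z)


def ring_neighbors(n, r):
    """Neighbors within distance r on a ring of n players (k = 2r)."""
    return [[(i + s) % n for s in range(-r, r + 1) if s != 0] for i in range(n)]


def solve_equilibrium(index, alpha, gamma, sigma, neighbors,
                      start=0.0, tol=1e-12, max_iter=10000):
    """Iterate system (7) until successive iterates differ by less than tol."""
    y = [start] * len(index)
    for _ in range(max_iter):
        new = [censored_mean(index[i] + alpha * sum(y[j] for j in neighbors[i]),
                             gamma, sigma)
               for i in range(len(index))]
        gap = max(abs(a - b) for a, b in zip(new, y))
        y = new
        if gap < tol:
            return y
    raise RuntimeError("fixed-point iteration did not converge")


nbrs = ring_neighbors(20, 1)
idx = [0.5 + 0.1 * i for i in range(20)]
from_zero = solve_equilibrium(idx, 0.2, 1.0, 1.0, nbrs, start=0.0)
from_five = solve_equilibrium(idx, 0.2, 1.0, 1.0, nbrs, start=5.0)
```

Starting the iteration from two different points and reaching the same solution is consistent with the uniqueness claim, although it is of course not a proof.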
In general, without restrictions on the parameters, multiple equilibria could occur. If one does not want to impose restrictions directly, one can use the Mathematical Program with Equilibrium Constraints (MPEC) routine to deal with multiple equilibria implicitly by choosing the equilibrium that maximizes the empirical likelihood.
In equilibrium, the policy variables will be correlated across players. To characterize their dependence, we assume that the process $W_{in} = (Y_{in}, \tilde{Y}_{in}, X_i, \varepsilon_i)$ is indexed by a vector of locations $t(i) = (t_1, \ldots, t_d) \in \mathbb{Z}^d$ on the lattice $\mathbb{Z}^d$, and hence can be viewed as a random field on $\mathbb{Z}^d$. In other words, $\{W_{in} = W_{t(i)n}, t(i) \in \Lambda_n, n \ge 1\}$ is a triangular array of vector-valued random fields defined on a probability space $(\Omega, \mathcal{F}, P)$ and observed on sample regions $\Lambda_n \subset \mathbb{Z}^d$. In the following, to simplify notation, we suppress the index $t$ and write $W_{in} = W_{t(i)n}$. Furthermore, we denote by $|\cdot|$ the Euclidean norm in $\mathbb{R}^d$ and by $\|\cdot\|_p = \left[ E|\cdot|^p \right]^{1/p}$ the $L_p$-norm.
Assumption 2. 
The data-generating process $\{W_{in} = W_{t(i)n}, t(i) \in \Lambda_n, n \ge 1\}$ is a triangular array of random fields indexed by $t(i) = (t_1, \ldots, t_d) \in \mathbb{Z}^d$, where the $\Lambda_n \subset \mathbb{Z}^d$ are the sample regions such that $|\Lambda_n| \to \infty$ as $n \to \infty$. The distance between players $i$ and $j$ is measured by the Euclidean metric: $dist(i, j) = |t(i) - t(j)|$.
This assumption implies that the players’ locations are exogenous, i.e., they are known and determined outside the model. Extensions to endogenous locations would require explicit modeling of the location choice, and would therefore considerably complicate the model. This extension is an interesting direction for future research.
Given Assumption 2, it turns out that the equilibrium policy variables satisfy a weak dependence condition known as near-epoch dependence (NED), see [5], under the same condition that ensures uniqueness of equilibrium. For ease of reference, we state the definition of NED random fields.
Definition 2. 
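The triangular array of random fields $\{W_{tn}, t \in \Lambda_n, n \ge 1\}$, $\|W_{tn}\|_2 < \infty$, is $L_2$-NED on $\{V_{tn}, t \in \Lambda_n, n \ge 1\}$ iff $\sup_{n, t \in \Lambda_n} \left\| W_{tn} - E(W_{tn} \mid \mathcal{F}_{tn}(m)) \right\|_2 \le \psi(m)$ for some sequence $\psi(m) \to 0$ as $m \to \infty$, where $\mathcal{F}_{tn}(m) = \sigma(V_{sn};\, s \in \Lambda_n : |t - s| \le m)$.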
Theorem 2. 
Suppose Assumptions 1 and 2 hold and $\|X\|_6 = \sup_i \|X_i\|_6 < \infty$. Then (i) $\tilde{Y}_{in}$ is $L_2$-NED on $\{X_i\}$ with the NED numbers $\psi(m) = 2\|Y\|_2 \lambda^{[m/r]}$, where $p = [m/r]$ is the integer part of $m/r$; (ii) $Y_{in}$ is $L_2$-NED on $\{X_i, \varepsilon_i\}$ with the NED numbers $c\,\psi^{1/12}(m)$ for a constant $c$ that does not depend on $m$; (iii) $\mathbf{1}(Y_{in} > \gamma_0)$ and $\mathbf{1}(Y_{in} = 0)$ are $L_2$-NED on $\{X_i, \varepsilon_i\}$ with the NED numbers $(2 + k\bar{\alpha})\psi^{1/3}(m)$.
The value of the constant c is given in the proof, but it is not important for what follows.

4. Identification and Estimation

We now discuss identification and estimation of our model. Let $Z_{in} = (\tilde{Y}_{jn}, j \in N_i, 1, X_i)$, and let $\delta = (\alpha_j, j \in N_i, b, \beta)$ denote the corresponding vector of the coefficients. Given the specification, it is natural to identify and estimate all unknown parameters based on the likelihood function. The log likelihood function of the model is
$$\log L(\theta, \gamma) = \sum_{i=1}^{n} \log f(Y_i \mid \bar{X}_n; \theta, \gamma) \quad \text{with} \quad \log f(Y_i \mid \bar{X}_n; \theta, \gamma) = \mathbf{1}(Y_i = 0) \log \Phi\left( \frac{\gamma - Z_{in}\delta}{\sigma} \right) + \mathbf{1}(Y_i > \gamma) \log\left[ \sigma^{-1} \phi\left( \frac{Y_i - Z_{in}\delta}{\sigma} \right) \right] \tag{8}$$
where $\theta = (\delta, \sigma)$. Likelihood function (8) involves an unknown threshold parameter, $\gamma$, which is in contrast to the standard Tobit model, where the threshold is assumed to be known and equal to zero. The maximum likelihood (ML) estimator of $\gamma$ is the minimum order statistic of the uncensored subsample. More specifically, partition the dependent variable and regressor matrix into two parts: $Y = (Y_0, Y_1)$ and $Z = (Z_{(0)}, Z_{(1)})$, where the subscript 0 indicates that observations come from the censored subsample and the subscript 1 that they come from the uncensored subsample, and let
$$\hat{\gamma} = \min\{ Y_i : Y_i > 0,\ i = 1, \ldots, n \} \tag{9}$$
As shown in Proposition 1 below, $\hat{\gamma}$ is a consistent estimator of $\gamma$. The ML estimators of the other parameters $\theta$ can then be obtained by the standard differentiation techniques.
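The first-step estimator is simple enough to write down directly. The sketch below computes the minimum order statistic of the uncensored subsample and, as an illustration only, applies it to simulated data from an i.i.d. censored normal model without interactions (a special case chosen for simplicity, not the paper's Monte Carlo design). Function names, the seed and the parameter values are assumptions.

```python
import random


def estimate_threshold(y):
    """Smallest strictly positive observation, i.e., min{Y_i : Y_i > 0}."""
    positive = [v for v in y if v > 0.0]
    if not positive:
        raise ValueError("no uncensored observations")
    return min(positive)


def simulate_censored(n, mu, sigma, gamma0, seed=0):
    """Draw Y_i = (mu + eps_i) * 1(mu + eps_i > gamma0), eps_i ~ N(0, sigma^2)."""
    rng = random.Random(seed)
    draws = (mu + rng.gauss(0.0, sigma) for _ in range(n))
    return [v if v > gamma0 else 0.0 for v in draws]


sample = simulate_censored(n=2000, mu=1.0, sigma=1.0, gamma0=0.5)
gamma_hat = estimate_threshold(sample)
```

By construction every uncensored observation lies strictly above the true threshold, so the estimate approaches it from above as the sample grows.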
Assumption 3. 
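Suppose (i) $\sup_{i,n} \|W_{in}\|_6 < \infty$; and (ii) $\{X_i\}$ is a stationary $\alpha$-mixing process with coefficient satisfying $\alpha(k, l, m) \le (k + l)^{\varsigma} \hat{\alpha}(m)$, $\varsigma \ge 0$, with $\hat{\alpha}(m)$ such that $\sum_{m=1}^{\infty} m^{d\varsigma/3 - 1}\, \hat{\alpha}^{1/6}(m) < \infty$.¹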
Proposition 1. 
Under Assumptions 1–3, $\lim_{n \to \infty} P\left( n(\hat{\gamma} - \gamma_0) < z \right) = 1 - \exp(-a z)$ for $z > 0$, where
$$a = E\left[ \Phi\left( \frac{Z_{in}\delta - \gamma}{\sigma} \right) \right] E\left[ \frac{\phi\left( (Z_{in,(1)}\delta - \gamma)/\sigma \right)}{\sigma\, \Phi\left( (Z_{in,(1)}\delta - \gamma)/\sigma \right)} \right]$$
and $Z_{in,(1)}$ is the regressor vector from the uncensored subsample.
Thus, $\hat{\gamma}$ is $n$-consistent and asymptotically exponentially distributed. For an i.i.d. sample, this result has been established by Carson and Sun [19]. So, Proposition 1 extends [19] to a spatially dependent case. The superconsistency of $\hat{\gamma}$ is a well-known consequence of the dependence of the support of $Y_i$ on $\gamma$.
Proposition 1 implies that $\gamma_0$ is identified. The remaining parameters, $\theta_0 = (\delta_0, \sigma_0)$, can now be identified from the likelihood function. Alternatively, one can identify $\theta_0$ from the conditional mean function and estimate it by the least squares procedure:
$$\varphi(Z_{in}, \theta, \gamma) = E(Y_i \mid \bar{X}_n) = \Phi\left( \frac{Z_{in}\delta - \gamma}{\sigma} \right) Z_{in}\delta + \sigma\, \phi\left( \frac{Z_{in}\delta - \gamma}{\sigma} \right) \tag{10}$$
In contrast to the standard Tobit model with zero threshold, the ML estimator does not strictly dominate the least squares (LS) estimator in our model due to the presence of the first-step estimator. The reason is that the LS objective function is continuous in $\gamma$, while the likelihood function is not. The latter implies that small finite-sample biases in $\hat{\gamma}$ may cause sizeable finite-sample biases in the ML estimates of $\theta$. This prediction is confirmed by the simulation results of Section 6, which suggest larger finite-sample biases in the ML than in the LS estimator. This is the main rationale for considering the LS procedure as an alternative to the ML procedure in our model.
Thus, estimation of model (1) could be carried out in two steps. First, estimate the threshold parameter $\gamma$ by the minimum order statistic of the uncensored subsample, $\hat{\gamma}$, and substitute it for the true $\gamma_0$ in (8) and (10). Then, estimate the remaining parameters $\theta$ in (8) and (10) by the ML or LS procedures, respectively. Note that the least squares estimator of $\gamma$ in (1) will be imprecise due to near multicollinearity of the intercept and threshold. Therefore, we use the first-step estimator $\hat{\gamma}$ in both procedures.
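The two second-step objective functions can be sketched as follows, taking the linear index $Z_{in}\delta$ as given for each observation. This is an illustration with hypothetical function names, not the authors' code. The likelihood follows the indicator structure of (8): censored observations ($Y_i = 0$) contribute $\log \Phi((\gamma - Z_{in}\delta)/\sigma)$, observations with $Y_i > \gamma$ contribute the log normal density, and any observation with $0 < Y_i \le \gamma$ contributes nothing, which is where the discontinuity in $\gamma$ comes from.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def log_likelihood(y, index, sigma, gamma):
    """Log likelihood (8); index[i] plays the role of Z_in * delta."""
    total = 0.0
    for yi, mu in zip(y, index):
        if yi == 0.0:                      # censored observation
            total += math.log(norm_cdf((gamma - mu) / sigma))
        elif yi > gamma:                   # uncensored observation
            total += math.log(norm_pdf((yi - mu) / sigma) / sigma)
    return total


def least_squares(y, index, sigma, gamma):
    """Average squared deviation of Y_i from the conditional mean (10)."""
    total = 0.0
    for yi, mu in zip(y, index):
        z = (mu - gamma) / sigma
        fitted = norm_cdf(z) * mu + sigma * norm_pdf(z)
        total += (yi - fitted) ** 2
    return total / len(y)
```

In a two-step procedure, `gamma` would be replaced by the first-step estimate, and either function would be optimized over the remaining parameters.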
We now present sufficient conditions for identification of θ.
Assumption 4. 
Suppose (i) at least one of the components of $X_i$ has the full support, $\mathbb{R}$; and (ii) $E Z_{in}' Z_{in}$ is positive definite.
Theorem 3. 
Under Assumptions 1–4, $Q(\theta, \gamma_0) = E\, m(W_{in}, \theta, \gamma_0)$ is uniquely maximized at $\theta_0$ for
A. 
$m(W_{in}, \theta, \gamma_0) = \log f(Y_i \mid \bar{X}_n; \theta, \gamma_0)$, where $\log f(Y_i \mid \bar{X}_n; \theta, \gamma)$ is as defined in (8), and
B. 
$m(W_{in}, \theta, \gamma_0) = -\left[ Y_i - \varphi(Z_{in}, \theta, \gamma_0) \right]^2$, where $\varphi(Z_{in}, \theta, \gamma)$ is as defined in (10).
Practically, the second-step estimation of $\theta$ could be implemented through the following nested fixed-point (NFXP) algorithm: (i) in an inner loop, for a given $\theta$, find the unique solution of the equilibrium equations (7) by the fixed-point algorithm; and (ii) in an outer loop, search over $\theta \in \Theta$ for the value that maximizes the objective function. Let $\tilde{Y}(\theta, \gamma) = (\tilde{Y}_1, \ldots, \tilde{Y}_n)$ be the solution of the equilibrium equations (7). Then, the resulting estimator can be represented as
$$\hat{\theta} = \arg\max_{\theta \in \Theta} Q_n\left( \theta, \tilde{Y}(\theta, \hat{\gamma}), \hat{\gamma} \right) = \arg\max_{\theta \in \Theta} \frac{1}{n} \sum_{i=1}^{n} m(W_{in}, \theta, \hat{\gamma}) \tag{11}$$
where $m(\cdot, \cdot, \cdot)$ is either the log likelihood function defined in (8) or minus the squared deviation of $Y_i$ from the conditional mean defined in (10). This formulation makes explicit the dependence of the equilibrium variables on the estimated parameters. Given superconsistency of the first-step estimator, the resulting second-step maximum likelihood or least squares estimators of $\theta$ will be $\sqrt{n}$-consistent, asymptotically normal and independent of $\hat{\gamma}$, as shown in Theorem 4 below. However, the NFXP algorithm will be computationally costly for large cross-sectional datasets, e.g., $n \ge 200$.
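The nested structure of the NFXP algorithm can be illustrated with a deliberately small example. The sketch below is not the authors' implementation: it assumes a ring lattice with a single common interaction coefficient `alpha`, treats the threshold, $\sigma$ and the index $b_0 + X_i\beta_0$ as known, uses the least squares criterion, and replaces a numerical optimizer with a grid search in the outer loop. As a sanity check, the pseudo-data are set equal to the equilibrium means at a chosen `alpha`, so the criterion is zero at that value.

```python
import math


def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def censored_mean(mu, gamma, sigma):
    z = (mu - gamma) / sigma
    return norm_cdf(z) * mu + sigma * norm_pdf(z)


def solve_equilibrium(index, alpha, gamma, sigma, neighbors,
                      tol=1e-12, max_iter=10000):
    """Inner loop: fixed-point iteration on the equilibrium system (7)."""
    y = [0.0] * len(index)
    for _ in range(max_iter):
        new = [censored_mean(index[i] + alpha * sum(y[j] for j in neighbors[i]),
                             gamma, sigma)
               for i in range(len(index))]
        gap = max(abs(a - b) for a, b in zip(new, y))
        y = new
        if gap < tol:
            return y
    raise RuntimeError("fixed-point iteration did not converge")


def nfxp_least_squares(y, index, gamma, sigma, neighbors, alpha_grid):
    """Outer loop: pick the grid value of alpha minimizing the LS criterion."""
    best_alpha, best_value = None, float("inf")
    for alpha in alpha_grid:
        means = solve_equilibrium(index, alpha, gamma, sigma, neighbors)
        value = sum((yi - m) ** 2 for yi, m in zip(y, means)) / len(y)
        if value < best_value:
            best_alpha, best_value = alpha, value
    return best_alpha, best_value


n = 30
neighbors = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
index = [0.5 + 0.1 * i for i in range(n)]
grid = [0.05 * j for j in range(7)]          # 0.00, 0.05, ..., 0.30
pseudo_y = solve_equilibrium(index, grid[4], 1.0, 1.0, neighbors)
alpha_hat, criterion = nfxp_least_squares(pseudo_y, index, 1.0, 1.0, neighbors, grid)
```

Every candidate value in the outer loop triggers a full equilibrium solve, which is the source of the computational burden noted above.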
To overcome this problem, we instead use the constrained optimization algorithm proposed by Su and Judd [7]. The idea is to solve the following constrained optimization problem:
$$\max_{\theta, \tilde{Y}} Q_n(\theta, \tilde{Y}, \hat{\gamma}) \quad \text{subject to} \quad h(\tilde{Y}, \theta, \hat{\gamma}) = 0 \tag{12}$$
where $h(\tilde{Y}, \theta, \gamma) = 0$ is the vector representation of the equilibrium system (7). Note that $\tilde{Y}$ in this formulation does not depend on $\theta$, and is chosen simultaneously with $\theta$ to maximize the objective function subject to the equilibrium constraints. This obviates the need to solve the multi-dimensional fixed point problem for $\tilde{Y}$ at each iteration on $\theta$.
Su and Judd [7] prove equivalence of problems (11) and (12) provided that the model is identified. They also demonstrate the computational advantage of this constrained optimization algorithm over the NFXP algorithm in the context of a single-agent dynamic discrete choice model. In particular, they show that the proposed algorithm leads, on average, to a ten-fold reduction of the computational time relative to the NFXP algorithm.
Since our model is identified by Theorem 3, the maximizer $\hat\theta$ of problem (11) equals the maximizer $\bar\theta$ of problem (12) by Proposition 1 of Su and Judd [7], and one can thus replace the computationally intensive problem (11) by the simpler problem (12). We investigate the performance of the constrained optimization algorithm for our model in the Monte Carlo study of Section 6.
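The difference between the two formulations can be illustrated with a minimal Python sketch. It is not the implementation used in the paper: the ring layout with two neighbours per player, the function names, and the parameter values are assumptions made only for illustration, and $\alpha$ is chosen small enough for the contraction condition of Assumption 1 to hold. The inner loop of NFXP solves $\tilde Y = G(\tilde Y)$ for a given $\theta$, whereas the constrained formulation only evaluates the residual $h(\tilde Y,\theta,\gamma) = \tilde Y - G(\tilde Y)$ and leaves it to the optimizer to drive that residual to zero.

```python
from statistics import NormalDist

_STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def best_response(y_tilde, x, theta, gamma):
    """Equilibrium map G: expected censored strategy of each player.

    Hypothetical layout: players sit on a ring and each player's
    neighbours are the two adjacent players.
    """
    alpha, beta, b, sigma = theta
    n = len(y_tilde)
    out = []
    for i in range(n):
        neigh = y_tilde[(i - 1) % n] + y_tilde[(i + 1) % n]
        mu = alpha * neigh + x[i] * beta + b  # mean of the latent strategy
        z = (mu - gamma) / sigma
        out.append(_STD.cdf(z) * mu + sigma * _STD.pdf(z))
    return out


def solve_equilibrium(x, theta, gamma, tol=1e-12, max_iter=10_000):
    """NFXP inner loop: iterate y <- G(y) until the update is below tol."""
    y = [0.0] * len(x)
    for _ in range(max_iter):
        y_new = best_response(y, x, theta, gamma)
        if max(abs(u - v) for u, v in zip(y_new, y)) < tol:
            return y_new
        y = y_new
    raise RuntimeError("fixed-point iteration did not converge")


def constraint_residual(y_tilde, x, theta, gamma):
    """h(y, theta, gamma) = y - G(y); the constrained problem imposes h = 0."""
    g = best_response(y_tilde, x, theta, gamma)
    return [u - v for u, v in zip(y_tilde, g)]


# Illustrative inputs (assumed, not taken from the paper's design).
x = [0.5, -1.0, 0.3, 1.2, -0.4, 0.0]
theta = (0.1, 1.0, 1.25, 1.0)  # (alpha, beta, b, sigma)
gamma = 2.0

y_eq = solve_equilibrium(x, theta, gamma)
max_resid = max(abs(r) for r in constraint_residual(y_eq, x, theta, gamma))
```

In the NFXP approach `solve_equilibrium` would be called at every trial value of $\theta$; in the constrained approach only `constraint_residual` is needed, since $\tilde Y$ is treated as an additional vector of decision variables.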

5. Consistency and Asymptotic Normality

We next show consistency and asymptotic normality of the maximum likelihood and least squares estimators. To this end, we need the following assumption.
Assumption 5. 
Suppose
(i) 
$\Theta = \Delta \times \Sigma$ and $\Gamma$ are compact, $\Sigma = [\sigma_1, \sigma_2]$, $\sigma_1 > 0$, $\theta_0 \in \mathrm{int}(\Theta)$, $\gamma_0 \in \mathrm{int}(\Gamma)$.
(ii) 
$\sup_{i,n} E \sup_{\Theta} \log^2 \Phi\big((\gamma_0 - Z_{in}\delta)/\sigma\big) < \infty$, $\sup_{i,n} E \sup_{\Theta} (Y_{in} - Z_{in}\delta)^2 < \infty$, $\sup_{i,n} E \sup_{\Theta} \left\| \partial m(W_{in},\theta,\gamma_0)/\partial\theta \right\| < \infty$.
Part (i) of Assumption 5 is the standard condition on the parameter space and the true parameter value; Part (ii) is used to verify uniform convergence of various sample functions. Generally, the above assumptions are slightly stronger than those in the fully parametric Tobit model with zero threshold, since our Tobit estimator of $\theta$ relies on a nonparametric first-step estimator of $\gamma$.
Theorem 4. 
Under Assumptions 1–5, the maximum likelihood and least squares estimators are both consistent and asymptotically normal, i.e., $\sqrt{n}\,(\hat\theta_n - \theta_0) \to_d N(0,\Omega)$, where $\Omega = H_0^{-1} S_0 H_0^{-1}$ with $H_0 = E\left[\frac{\partial^2}{\partial\theta\,\partial\theta'} m(W_{in},\theta_0,\gamma_0)\right]$ and $S_0 = \mathrm{Var}\left[\frac{\partial}{\partial\theta} m(W_{in},\theta_0,\gamma_0)\right]$.
Thus, both the maximum likelihood and least squares estimators of $\theta$ are $\sqrt{n}$-consistent and asymptotically normal. To conduct inference, it remains to obtain a consistent estimate of the covariance matrix $S_0$. For this purpose, one can employ the following spatial HAC estimator:
$$\hat S(\hat\theta,\hat\gamma) = \frac{1}{n} \sum_{i\in\Lambda_n} \; \sum_{j\in\Lambda_n:\,|i-j|\le h_n} K\big((i-j)/h_n\big)\, \frac{\partial}{\partial\theta} m(W_{in},\hat\theta,\hat\gamma)\, \frac{\partial}{\partial\theta'} m(W_{jn},\hat\theta,\hat\gamma)$$
where $K((i-j)/h_n) = K\big((i_1-j_1)/h_n,\ldots,(i_d-j_d)/h_n\big)$ is a d-dimensional symmetric kernel, and $h_n$ is a bandwidth parameter. Jenish [20] proves consistency of this estimator for more general nonparametric estimators of $\gamma$. In our model, consistency is achieved by bandwidth parameters satisfying $h_n = O(n^{1/(3d)})$.
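As a rough illustration of how such an estimator could be computed, the following Python sketch evaluates the double sum directly. Several choices here are assumptions rather than part of the paper: the product Bartlett kernel, the use of the max-norm for $|i-j|$, the function names, and the toy scores. The quadratic loop is meant for small samples only.

```python
def bartlett_product_kernel(u):
    """Product Bartlett kernel: K(u) = prod_k max(0, 1 - |u_k|) (assumed choice)."""
    w = 1.0
    for u_k in u:
        w *= max(0.0, 1.0 - abs(u_k))
    return w


def spatial_hac(locations, scores, h):
    """S_hat = (1/n) sum_i sum_{j: |i-j| <= h} K((i-j)/h) s_i s_j'.

    locations: list of integer coordinate tuples (lattice sites)
    scores:    list of score vectors evaluated at the estimates
    h:         bandwidth; |i-j| is taken as the max-norm distance here
    """
    n, p = len(scores), len(scores[0])
    s_hat = [[0.0] * p for _ in range(p)]
    for loc_i, s_i in zip(locations, scores):
        for loc_j, s_j in zip(locations, scores):
            diff = [a - b for a, b in zip(loc_i, loc_j)]
            if max(abs(d) for d in diff) > h:
                continue  # outside the bandwidth window
            w = bartlett_product_kernel([d / h for d in diff])
            for r in range(p):
                for c in range(p):
                    s_hat[r][c] += w * s_i[r] * s_j[c] / n
    return s_hat


# Toy example on a 2 x 2 lattice with scalar scores (made-up numbers).
locs = [(0, 0), (0, 1), (1, 0), (1, 1)]
toy_scores = [[1.0], [2.0], [-1.0], [0.5]]
s_h1 = spatial_hac(locs, toy_scores, 1)[0][0]  # only own terms get weight
s_h2 = spatial_hac(locs, toy_scores, 2)[0][0]  # neighbours enter with weight < 1
```

With bandwidth 1 the Bartlett weights vanish for all distinct sites, so the estimate reduces to the average squared score; with bandwidth 2 the cross-products of neighbouring sites enter with weights 0.5 and 0.25.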

6. Numerical Results

In this section, we examine the finite sample properties of the maximum likelihood (ML) and least squares (LS) estimators of our censored model, as well as the performance of the Su-Judd [7] algorithm.
Throughout, the data $W_{i_1,i_2}$ reside on the two-dimensional lattice $\mathbb{Z}^2$, where $(i_1,i_2) \in \mathbb{Z}^2$ denotes, for simplicity, the vector of coordinates. The data are simulated on a rectangular grid of $(m_1+300)\times(m_2+300)$ locations. To control for boundary effects, we discard the 300 outer boundary points along each of the axes and use the sample of size $n = m_1 m_2$ for estimation.
Our experiment consists of two stages: (i) simulation; and (ii) estimation. In the first stage, we first simulate two i.i.d. $N(0,1)$ processes $\varepsilon_{i_1,i_2}$ and $\eta_{i_1,i_2}$, which are independent of each other. Next, using the fixed-point algorithm, we generate the process $X_{i_1,i_2}$ according to:
$$X_{i_1,i_2} = 0.2\,\big(X_{i_1-1,i_2} + X_{i_1,i_2-1} + X_{i_1+1,i_2} + X_{i_1,i_2+1}\big) + \eta_{i_1,i_2}$$
and then the process $\tilde Y_{i_1,i_2}$ according to:
$$\begin{aligned}\tilde Y_{i_1,i_2} ={}& \Phi\!\left(\frac{\alpha\big(\tilde Y_{i_1-1,i_2} + \tilde Y_{i_1,i_2-1} + \tilde Y_{i_1+1,i_2} + \tilde Y_{i_1,i_2+1}\big) + X_{i_1,i_2}\beta + b - \gamma}{\sigma}\right) \\ &\cdot \Big[\alpha\big(\tilde Y_{i_1-1,i_2} + \tilde Y_{i_1,i_2-1} + \tilde Y_{i_1+1,i_2} + \tilde Y_{i_1,i_2+1}\big) + X_{i_1,i_2}\beta + b\Big] \\ &+ \sigma\,\phi\!\left(\frac{\alpha\big(\tilde Y_{i_1-1,i_2} + \tilde Y_{i_1,i_2-1} + \tilde Y_{i_1+1,i_2} + \tilde Y_{i_1,i_2+1}\big) + X_{i_1,i_2}\beta + b - \gamma}{\sigma}\right)\end{aligned} \tag{13}$$
with $\alpha = 0.2$, $\beta = 1$, $b = 1.25$, $\gamma = 2$ and $\sigma = 1$. Last, we form the process $Y_{i_1,i_2}$ according to:
$$Y_{i_1,i_2} = Y^*_{i_1,i_2}\,1\{Y^*_{i_1,i_2} > \gamma\}, \quad \text{where } Y^*_{i_1,i_2} = b + X_{i_1,i_2}\beta + \alpha\big(\tilde Y_{i_1-1,i_2} + \tilde Y_{i_1,i_2-1} + \tilde Y_{i_1+1,i_2} + \tilde Y_{i_1,i_2+1}\big) + \varepsilon_{i_1,i_2}$$
In the second stage, we first construct the minimum order statistic estimator of $\gamma$, $\hat\gamma$, and then use the Su-Judd [7] constrained optimization algorithm to estimate the remaining parameters $\theta = (\alpha,\beta,b,\sigma)$. As discussed in Section 4, we estimate the n-dimensional vector of endogenous variables $\tilde Y = \{\tilde Y_{i_1,i_2}\}$ jointly with $\theta$, instead of computing it at each iteration on $\theta$, i.e., we solve the constrained optimization problem:
$$\max_{\theta,\tilde Y} Q_n(\theta,\tilde Y,\hat\gamma) \quad \text{subject to} \quad h(\tilde Y,\theta,\hat\gamma) = 0$$
where $h(\tilde Y,\theta,\gamma) = 0$ is the vector representation of the equilibrium system (13).
The estimation results based on 1000 Monte Carlo repetitions are presented in Tables 1–5. Table 1 reports the estimates of the autoregressive parameter $\alpha = 0.2$.
Table 1. Estimation of auto-regressive parameter: α = 0.2 .
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
|---|---|---|---|---|---|---|---|
| Maximum Likelihood | | | | | | | |
| N = 200 | 0.1978 | 1.1212 | 0.0056 | 0.0060 | 0.1946 | 0.1979 | 0.2013 |
| N = 400 | 0.1989 | 0.5674 | 0.0030 | 0.0032 | 0.1970 | 0.1991 | 0.2008 |
| N = 600 | 0.1991 | 0.4255 | 0.0022 | 0.0024 | 0.1977 | 0.1992 | 0.2007 |
| N = 800 | 0.1994 | 0.3052 | 0.0018 | 0.0019 | 0.1983 | 0.1994 | 0.2006 |
| N = 1000 | 0.1995 | 0.2691 | 0.0015 | 0.0016 | 0.1984 | 0.1995 | 0.2005 |
| Least Squares | | | | | | | |
| N = 200 | 0.1989 | 0.5452 | 0.0059 | 0.0060 | 0.1953 | 0.1990 | 0.2027 |
| N = 400 | 0.1995 | 0.2578 | 0.0033 | 0.0034 | 0.1974 | 0.1996 | 0.2018 |
| N = 600 | 0.1996 | 0.1981 | 0.0023 | 0.0024 | 0.1982 | 0.1997 | 0.2013 |
| N = 800 | 0.1998 | 0.1162 | 0.0019 | 0.0019 | 0.1985 | 0.1999 | 0.2011 |
| N = 1000 | 0.1998 | 0.0787 | 0.0017 | 0.0017 | 0.1986 | 0.1998 | 0.2011 |
Both the ML and LS estimators of $\alpha$ behave well for all sample sizes. The finite-sample bias declines rapidly from about 1.1% ($n = 200$) to 0.3% ($n = 1000$) in the case of the ML estimator, and from 0.55% ($n = 200$) to 0.08% ($n = 1000$) in the case of the LS estimator. These results suggest that a five-fold increase in the sample size leads to a more than four-fold reduction in the ML bias, and a more than six-fold decrease in the LS bias, which is consistent with our asymptotic theory. The standard errors also fall off rapidly with the sample size. A similar pattern is observed for the estimates of the slope $\beta$, shown in Table 2.
Table 3 contains the minimum order statistic estimates of $\gamma$. The finite-sample bias diminishes from 4.5% ($n = 200$) to 0.8% ($n = 1000$), which means that a five-fold increase in the sample size is associated with more than a five-fold reduction in the bias. This is in line with the theoretical prediction of n-consistency of the minimum order statistic.
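The reduction factors quoted above can be recomputed from the Bias (%) columns of Tables 1 and 3. The short Python check below does only that arithmetic; the dictionary layout is an illustrative choice. For reference, a quantity of order $n^{-1/2}$ shrinks by a factor of $\sqrt{5} \approx 2.24$ when the sample size grows five-fold, and a quantity of order $n^{-1}$ shrinks by a factor of 5.

```python
# Bias (%) at N = 200 and N = 1000, copied from Tables 1 and 3.
bias_pct = {
    "alpha, ML": (1.1212, 0.2691),
    "alpha, LS": (0.5452, 0.0787),
    "gamma, min. order statistic": (4.4969, 0.8047),
}

# Reduction factor in the bias when the sample size grows from 200 to 1000.
reduction = {name: b200 / b1000 for name, (b200, b1000) in bias_pct.items()}

for name, factor in reduction.items():
    print(f"{name}: bias shrinks by a factor of {factor:.2f}")
```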
Table 2. Estimation of slope: β = 1.0 .
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
|---|---|---|---|---|---|---|---|
| Maximum Likelihood | | | | | | | |
| N = 200 | 1.0053 | 0.5286 | 0.0666 | 0.0668 | 0.9612 | 1.0072 | 1.0507 |
| N = 400 | 1.0037 | 0.3665 | 0.0419 | 0.0420 | 0.9756 | 1.0022 | 1.0295 |
| N = 600 | 1.0032 | 0.3248 | 0.0324 | 0.0326 | 0.9789 | 1.0038 | 1.0251 |
| N = 800 | 1.0022 | 0.2228 | 0.0254 | 0.0255 | 0.9844 | 1.0014 | 1.0197 |
| N = 1000 | 1.0019 | 0.1886 | 0.0232 | 0.0233 | 0.9870 | 1.0019 | 1.0169 |
| Least Squares | | | | | | | |
| N = 200 | 1.0050 | 0.5027 | 0.0712 | 0.0714 | 0.9538 | 1.0038 | 1.0522 |
| N = 400 | 1.0018 | 0.1758 | 0.0448 | 0.0449 | 0.9717 | 0.9988 | 1.0320 |
| N = 600 | 1.0015 | 0.1534 | 0.0340 | 0.0340 | 0.9775 | 1.0009 | 1.0241 |
| N = 800 | 1.0009 | 0.0909 | 0.0271 | 0.0271 | 0.9820 | 0.9999 | 1.0180 |
| N = 1000 | 1.0002 | 0.0241 | 0.0250 | 0.0250 | 0.9840 | 1.0007 | 1.0161 |
Table 3. Estimation of threshold: γ = 2.0 .
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
|---|---|---|---|---|---|---|---|
| Min. Order Statistics | | | | | | | |
| N = 200 | 2.0899 | 4.4969 | 0.1128 | 0.1443 | 2.0219 | 2.0528 | 2.1111 |
| N = 400 | 2.0438 | 2.1888 | 0.0462 | 0.0637 | 2.0123 | 2.0288 | 2.0610 |
| N = 600 | 2.0294 | 1.4715 | 0.0310 | 0.0427 | 2.0082 | 2.0198 | 2.0408 |
| N = 800 | 2.0189 | 0.9471 | 0.0193 | 0.0270 | 2.0050 | 2.0130 | 2.0260 |
| N = 1000 | 2.0161 | 0.8047 | 0.0166 | 0.0231 | 2.0045 | 2.0109 | 2.0219 |
Next, Table 4 and Table 5 present the estimates of the intercept b and the standard deviation σ, respectively.
The maximum likelihood estimates of b and σ exhibit larger biases than those of α and β. However, they decrease as the sample size increases: from 5.5% ( n = 200 ) to 1.4% ( n = 1000 ) in the case of b, and from 15.9% ( n = 200 ) to 6.9% ( n = 1000 ) in the case of σ. Thus, the biases still halve when the sample size increases four-fold, consistent with the asymptotic theory. Large small-sample biases could be explained by weak identification or near multicollinearity introduced by the inverse Mills ratio, which is approximately linear over a wide range of its argument.
Table 4. Estimation of intercept: b 0 = 1.25 .
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
|---|---|---|---|---|---|---|---|
| Maximum Likelihood | | | | | | | |
| N = 200 | 1.3185 | 5.4812 | 0.1352 | 0.1516 | 1.2353 | 1.3156 | 1.3950 |
| N = 400 | 1.2862 | 2.8955 | 0.0749 | 0.0832 | 1.2372 | 1.2804 | 1.3321 |
| N = 600 | 1.2778 | 2.2216 | 0.0539 | 0.0607 | 1.2415 | 1.2744 | 1.3130 |
| N = 800 | 1.2702 | 1.6191 | 0.0438 | 0.0482 | 1.2403 | 1.2698 | 1.2990 |
| N = 1000 | 1.2679 | 1.4353 | 0.0374 | 0.0415 | 1.2428 | 1.2668 | 1.2924 |
| Least Squares | | | | | | | |
| N = 200 | 1.2834 | 2.6748 | 0.1449 | 0.1487 | 1.1917 | 1.2808 | 1.3699 |
| N = 400 | 1.2671 | 1.3653 | 0.0843 | 0.0860 | 1.2121 | 1.2630 | 1.3197 |
| N = 600 | 1.2635 | 1.0803 | 0.0582 | 0.0598 | 1.2236 | 1.2625 | 1.3011 |
| N = 800 | 1.2582 | 0.6562 | 0.0483 | 0.0489 | 1.2252 | 1.2572 | 1.2896 |
| N = 1000 | 1.2560 | 0.4775 | 0.0427 | 0.0431 | 1.2253 | 1.2557 | 1.2852 |
Table 5. Estimation of standard deviation: σ = 1.0 .
| Sample Size | Mean | Bias (%) | SD | RMSE | 25th Pct | 50th Pct | 75th Pct |
|---|---|---|---|---|---|---|---|
| Maximum Likelihood | | | | | | | |
| N = 200 | 0.8405 | 15.9486 | 0.0575 | 0.1695 | 0.8010 | 0.8397 | 0.8798 |
| N = 400 | 0.8966 | 10.3429 | 0.0397 | 0.1108 | 0.8700 | 0.8972 | 0.9243 |
| N = 600 | 0.9179 | 8.2084 | 0.0316 | 0.0879 | 0.8962 | 0.9185 | 0.9392 |
| N = 800 | 0.9262 | 7.3842 | 0.0273 | 0.0787 | 0.9075 | 0.9267 | 0.9448 |
| N = 1000 | 0.9303 | 6.9696 | 0.0246 | 0.0739 | 0.9134 | 0.9305 | 0.9468 |
| Least Squares | | | | | | | |
| N = 200 | 0.9860 | 1.3956 | 0.4799 | 0.4801 | 0.7597 | 0.9956 | 1.2585 |
| N = 400 | 0.9598 | 4.0236 | 0.2901 | 0.2929 | 0.8227 | 0.9858 | 1.1342 |
| N = 600 | 0.9784 | 2.1637 | 0.2000 | 0.2012 | 0.8833 | 0.9881 | 1.1034 |
| N = 800 | 0.9828 | 1.7215 | 0.1594 | 0.1604 | 0.8930 | 0.9963 | 1.0809 |
| N = 1000 | 0.9303 | 0.8312 | 0.1241 | 0.1243 | 0.9114 | 0.9941 | 1.0732 |
Interestingly, the LS estimates of all parameters, including b and σ, have smaller finite-sample biases than the respective ML estimates. The reason is that the LS objective function is continuous in the first-step nonparametric estimator of γ, while the likelihood function is not, and small first-step biases in γ may get disproportionately amplified and translate into sizeable second-step biases in θ. Consequently, the biases in the ML estimates of b and σ due to weak identification are further exacerbated by discontinuity of the likelihood function. Nevertheless, as expected, the LS estimator has larger standard errors than the ML estimator across all parameters. Thus, in contrast to the standard Tobit model with zero-threshold, the ML estimator does not strictly dominate the LS estimator in our model.
Finally, Table 6 reports the computational time and the number of converged iterations for the Su-Judd [7] algorithm. The algorithm performs well for all sample sizes: on average, it converges in almost 99% of the iterations, and the run time is under two hours even for samples as large as $n = 1000$. For comparison, the NFXP algorithm would take about 130–150 hours to estimate the model for the same sample sizes. Thus, the Su-Judd [7] algorithm offers considerable time savings over standard nested fixed-point algorithms.
Table 6. Algorithm Performance
| Sample Size | N = 200 | N = 400 | N = 600 | N = 800 | N = 1000 |
|---|---|---|---|---|---|
| Maximum Likelihood | | | | | |
| Number of converged iterations | 1000 | 999 | 991 | 992 | 961 |
| Run Time | 2 min | 6 min | 27 min | 61 min | 119 min |
| Least Squares | | | | | |
| Number of converged iterations | 942 | 996 | 997 | 999 | 998 |
| Run Time | 2 min | 5 min | 24 min | 57 min | 112 min |
Overall, the simulation results are consistent with our asymptotic theory: the finite-sample biases and standard errors of the ML and LS estimators decay rapidly with the sample size. Moreover, the Su-Judd [7] constrained optimization algorithm seems to be a viable and effective numerical procedure for estimating games with a large number of players, including our model.

7. Conclusions

In this paper, we study identification and estimation of a static game of incomplete information with censored strategies. Specifically, we show existence and uniqueness of an equilibrium as well as its weak dependence property under a condition that restricts the strength of interactions among the players. We then show identification of the parameters and estimate them by maximum likelihood and least squares procedures. The resulting estimators are shown to be consistent and asymptotically normal. We also demonstrate application of our results to modeling spillovers in firms’ R&D investment and peer effects in female labor supply.
One direction for future research is to relax the normality assumption on the errors and obtain identification under more general error distributions whose conditional mean functions satisfy contraction mapping conditions, similar to the one used in the paper. Another extension could be to allow for random threshold effects in the outcome variable, using some parametric family of distributions. One can also allow for truncated strategies by slight modifications in the likelihood function and Assumption 1. Finally, instead of the regular lattice, one can consider players located at the nodes of some graph, which describes the network structure, as in the social interactions literature.

Acknowledgments

I would like to thank Konrad Menzel for his helpful comments.

Conflicts of Interest

The author declares no conflict of interest.

Appendix

A. Proofs for Section 2

Throughout the appendices, C denotes a generic constant that does not depend on n and may vary from line to line. Also, $\|A\| = [\mathrm{trace}(A'A)]^{1/2}$ denotes the norm of a nonrandom matrix A.
Proof of Lemma 1: The first-order conditions with respect to price are the same for both $y_i > \bar y$ and $y_i \le \bar y$. They imply the following optimal price: $\frac{\partial \Pi}{\partial p_i} = p_i^{-\nu} s_i^{\epsilon} - \nu\,(p_i - c_i)\,p_i^{-\nu-1} s_i^{\epsilon} = 0 \;\Rightarrow\; p_i^* = \frac{\nu}{\nu-1}\,c_i$.
First, if $y_i > \bar y$, then the first-order conditions with respect to investment imply the following optimal investment: $\frac{\partial \Pi}{\partial y_i} = \epsilon\delta\,(p_i - c_i)\,p_i^{-\nu}\,(y_i+1)^{\epsilon\delta-1} - 1 = 0 \;\Rightarrow\; y_i^* + 1 = B\,c_i^{-\tau}$, where $B = \left[\epsilon\delta\,\nu^{-\nu}(\nu-1)^{\nu-1}\right]^{1/(1-\epsilon\delta)}$ and $\tau = (\nu-1)/(1-\epsilon\delta) > 0$ since, by assumption, $0 < \epsilon\delta < 1$ and $\nu > 1$. The value of the profit at the optimal values of price and investment is $\Pi_1 = (1-\epsilon\delta)\,(p_i^*)^{1-\nu}\,(y_i^*+1)^{\epsilon\delta}/\nu$.
Next, if $y_i \le \bar y$, then $\Pi = (p_i - c_i)\,p_i^{-\nu}\,(\bar y+1)^{\epsilon\delta} - y_i$ and, hence, it is optimal to set $y_i^* = 0$. The value of the profit in the second case is $\Pi_2 = \Pi(p_i^*, 0) = (p_i^*)^{1-\nu}\,(\bar y+1)^{\epsilon\delta}/\nu$. Thus, firm i engages in R&D iff $\Pi_1 > \Pi_2 \;\Leftrightarrow\; y_i^* + 1 > (1-\epsilon\delta)^{-1/(\epsilon\delta)}\,(\bar y+1)$, and hence the optimal investment is given by:
Y i = b 0 X i τ β τ α j N i E i Y j τ e i , if b 0 X i τ β τ α j N i E i Y j τ e i > γ 0 0 , otherwise
where $b_0 = \log B$ and $\gamma_0 = \log\left[(1-\epsilon\delta)^{-1/(\epsilon\delta)}\,(\bar y+1)\right] > 0$ since $0 < \epsilon\delta < 1$ and $\bar y > 0$.
Proof of Lemma 2: First, if $w\,(y_i+1) > \bar c$, then $U = w^{\delta}(y_i+1)^{\delta}/\delta + h_i\,(T - y_i - 1)$ and the first-order conditions imply the following labor supply: $\frac{\partial U}{\partial y_i} = w^{\delta}(y_i+1)^{\delta-1} - h_i = 0 \;\Rightarrow\; y_i^* + 1 = w^{\delta/(1-\delta)}\,h_i^{1/(\delta-1)}$. The optimal utility is $U_1 = w^{\delta}(y_i^*+1)^{\delta}/\delta + h_i T - w^{\delta}(y_i^*+1)^{\delta}$.
Next, if $w\,(y_i+1) \le \bar c$, then $U = w^{\delta}\bar c^{\,\delta}/\delta + h_i\,(T - y_i)$ and, hence, it is optimal to set $y_i^* = 0$. The optimal utility in the second case is $U_2 = w^{\delta}\bar c^{\,\delta}/\delta + h_i T$. Thus, individual i supplies labor iff $U_1 > U_2 \;\Leftrightarrow\; y_i^* + 1 > (1-\delta)^{-1/\delta}\,\bar c$, and hence the optimal labor supply is given by:
Y i = b 0 X i β 1 δ α 1 δ j N i E i Y j e i 1 δ , if b 0 X i β 1 δ α 1 δ j N i E i Y j e i 1 δ > γ 0 0 , otherwise
where $b_0 = \log w^{\delta/(1-\delta)}$ and $\gamma_0 = \log\left[(1-\delta)^{-1/\delta}\,\bar c\right] > 0$ since $\bar c \ge 1$ and $0 < \delta < 1$.

B. Proofs for Section 3

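Proof of Theorem 1: In the following, we suppress dependence of variables on n and write $\tilde Y_i = \tilde Y_{in}$.
To prove the theorem, it suffices to show that the mapping $G = (G_1,\ldots,G_n): \mathbb{R}^n \to \mathbb{R}^n$ is a contraction mapping w.r.t. $\{\tilde Y_i\}_{i=1}^{n}$, with the components given by:
$$G_i = G\big(\{\tilde Y_j\}_{j\in N_i}, X_i\big) = \Phi\!\left(\frac{\sum_{j\in N_i}\alpha_{j0}\tilde Y_j + X_i\beta_0 + b_0 - \gamma_0}{\sigma_0}\right)\Big(\sum_{j\in N_i}\alpha_{j0}\tilde Y_j + X_i\beta_0 + b_0\Big) + \sigma_0\,\phi\!\left(\frac{\sum_{j\in N_i}\alpha_{j0}\tilde Y_j + X_i\beta_0 + b_0 - \gamma_0}{\sigma_0}\right) \tag{B1}$$
The result would then follow by the Banach Fixed Point Theorem.
To simplify notation, let $z = \big(\sum_{j\in N_i}\alpha_{j0}\tilde Y_j + X_i\beta_0 + b_0 - \gamma_0\big)\,\sigma_0^{-1}$. Differentiating $G_i$ w.r.t. $\tilde Y_j$ gives
$$\frac{\partial G_i}{\partial \tilde Y_j} = \frac{\alpha_{j0}}{\sigma_0}\,\phi(z)\,(\sigma_0 z + \gamma_0) + \alpha_{j0}\,\Phi(z) - \alpha_{j0}\,z\,\phi(z) = \frac{\alpha_{j0}\gamma_0}{\sigma_0}\,\phi(z) + \alpha_{j0}\,\Phi(z)$$
where we used $\phi'(z) = -z\,\phi(z)$. It then follows that $\sup\left|\partial G_i/\partial \tilde Y_j\right| \le |\alpha_{j0}|\,\big(\phi(0)\,\gamma_0\,\sigma_0^{-1} + 1\big)$. Hence, by the Mean Value Theorem, for any $X_i$ and any $Y, Y' \in \mathbb{R}^n$:
$$\left|G(Y,X_i) - G(Y',X_i)\right| \le \sum_{j\in N_i} \sup\left|\frac{\partial G}{\partial \tilde Y_j}\right| \cdot |Y_j - Y_j'| \le \lambda_0 \sum_{j\in N_i} |Y_j - Y_j'|, \quad \lambda_0 < 1 \tag{B2}$$
by Assumption 1, where $\lambda_0 = \bar\alpha\,\big(\phi(0)\,\gamma_0\,\sigma_0^{-1} + 1\big)$ and $\bar\alpha = \max_{1\le j\le k}|\alpha_{j0}|$.
Consequently, the vector mapping G satisfies the following Lipschitz condition:
$$\|G(Y) - G(Y')\| \le \sum_{i=1}^{n}\left|G(Y,X_i) - G(Y',X_i)\right| \le \lambda \sum_{i=1}^{n} |Y_i - Y_i'|$$
with the Lipschitz coefficient $\lambda = k\lambda_0 < 1$, by Assumption 1. ∎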
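The derivative formula and the bound used in this proof can be checked numerically. The Python sketch below is only an illustration under assumed values $\gamma_0 = 2$ and $\sigma_0 = 1$; the names `g` and `dg_dmu` are hypothetical. It writes $G_i$ as a function of the latent mean $\mu = \sum_{j}\alpha_{j0}\tilde Y_j + X_i\beta_0 + b_0$, so that $\partial G_i/\partial\tilde Y_j = \alpha_{j0}\,\partial g/\partial\mu$, compares the closed-form slope with a central finite difference, and confirms that the slope stays below $\phi(0)\gamma_0\sigma_0^{-1} + 1$ on a grid.

```python
from statistics import NormalDist

STD = NormalDist()  # standard normal: cdf is Phi, pdf is phi


def g(mu, gamma, sigma):
    """g(mu) = Phi(z) * mu + sigma * phi(z), with z = (mu - gamma) / sigma."""
    z = (mu - gamma) / sigma
    return STD.cdf(z) * mu + sigma * STD.pdf(z)


def dg_dmu(mu, gamma, sigma):
    """Closed-form slope from the proof: Phi(z) + (gamma / sigma) * phi(z)."""
    z = (mu - gamma) / sigma
    return STD.cdf(z) + (gamma / sigma) * STD.pdf(z)


gamma0, sigma0 = 2.0, 1.0  # assumed values, for illustration only
bound = STD.pdf(0.0) * gamma0 / sigma0 + 1.0

grid = [-10.0 + 0.01 * k for k in range(2001)]  # mu from -10 to 10
eps = 1e-6

# Largest gap between the closed form and a central finite difference.
fd_error = max(
    abs((g(mu + eps, gamma0, sigma0) - g(mu - eps, gamma0, sigma0)) / (2 * eps)
        - dg_dmu(mu, gamma0, sigma0))
    for mu in grid
)

# Largest slope on the grid; multiplying by |alpha_j0| gives |dG_i / dY_j|.
max_slope = max(dg_dmu(mu, gamma0, sigma0) for mu in grid)
```

The contraction condition then requires $k\,\bar\alpha$ times this bound to be below one.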
Proof of Theorem 2: The proof is similar to that of Proposition 1 of Jenish [18]. Let N i ( m ) = j , j i , 1 j n : t i t j m be the m-neighborhood of agent i that excludes i, and N i o ( m ) = N i ( m ) i be the neighborhood of agent i that includes i, where t i Z d is i’s location.
Let Y ˜ i n = F i X 1 , , X n , i = 1 , . . . , n , be the unique solution of the system Y ˜ i n = G ( Y ˜ j n j N i ( r ) , X i ) , i = 1 , . . . , n , defined in (B1). To simplify, we suppress dependence of variables on n, and write Y ˜ i = Y ˜ i n .
Fix i, and define η i ( m ) = X j j N i o ( m ) and ζ i ( m ) = X j j Λ n \ N i o ( m ) for any m N , so that we can partition X = X 1 , , X n = η i ( m ) ; ζ i ( m ) . Suppose that the underlying probability space is rich enough that there exists a random variable U uniformly distributed on 0 , 1 that is independent of X j . Then, by Lemma A1 in Jenish [18], there exists a function h ( U , η i ( m ) ) such that the process X ( m ) = X j m j = 1 n = η i ( m ) ; h ( U , η i ( m ) ) has the same distribution as X.
We now construct an approximation to Y ˜ i . Define
Y ˜ j ( m ) = F j X 1 ( m ) , , X n ( m ) = F j η i ( m ) ; h ( U , η i ( m ) )
Since X ( m ) has the same distribution as X , we have Y ˜ j ( m ) = G ( Y ˜ l ( m ) l N j r , X j ( m ) . If j N i ( m ) , then X j ( m ) = X j and by the Lipschitz condition (B2), we have
Y ˜ j Y ˜ j ( m ) 2 λ 0 l N j ( r ) Y ˜ l Y ˜ l ( m ) 2
Consider two cases: m r and m < r . If m r , recursive use of (B4) gives
Y ˜ i Y ˜ i ( m ) 2 λ 0 j 1 N i ( r ) Y ˜ j 1 Y ˜ j 1 ( m ) 2 λ 0 2 j 1 N i ( r ) j 2 N j 1 ( r ) Y ˜ j 2 Y ˜ j 2 ( m ) 2 . . . λ 0 p j 1 N i ( r ) . . . j p N j p 1 ( r ) Y ˜ j p Y ˜ j p ( m ) 2 2 Y 2 λ p
where λ = λ 0 k < 1 . If m < r , then p = m / r < 1 and again we will have the same bound on the error:
Y ˜ i Y ˜ i ( m ) 2 λ 0 j 1 N i ( r ) Y ˜ j 1 Y ˜ j 1 ( m ) 2 2 Y 2 λ 2 Y 2 λ p
Since i was arbitrary, using the same arguments, we can approximate the entire process Y ˜ i by Y ˜ i ( m ) , which has a similar functional form but a more tractable dependence structure. The approximation error is given by sup n , i Λ n Y ˜ i Y ˜ i ( m ) 2 2 Y 2 λ p .
We now verify the NED condition sup n , i Λ n Y ˜ i E ( Y ˜ i | F i n X ( m ) ) 2 0 as m , where F i n X ( m ) = σ ( X j , j N i o ( m ) ) . If 2 m > diameter Λ n , then Y ˜ i E ( Y ˜ i | F i n X ( m ) ) 2 = 0 .
If 2 m diameter Λ n , approximate Y ˜ i by Y ˜ i ( m ) as defined in (B3). Note that by construction, E [ Y ˜ i | F i n X ( m ) ] = E [ Y ˜ i ( m ) | F i n X ( m ) ] = 0 1 F i η i ( m ) ; h ( u , η i ( m ) ) d u . Then, by the Jensen inequality and independence of U and X i , we have
Y ˜ i E ( Y ˜ i | F i n X ( m ) ) ) 2 = F i η i ( m ) ; ζ i ( m ) 0 1 F i η i ( m ) ; h ( u , η i ( m ) ) d u 2 E 0 1 F i η i ( m ) ; ζ i ( m ) F i η i ( m ) ; h ( u , η i ( m ) ) 2 d u 1 / 2 = F i η i ( m ) ; ζ i ( m ) F i η i ( m ) ; h ( U , η i ( m ) ) 2 = Y ˜ i Y ˜ i ( m ) 2 ψ ( m ) = 2 Y 2 λ m / r 0 as m
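We now show that $Y_i$ is $L_2$-NED on $\{X_i,\varepsilon_i\}$. Let $\mathcal{F}_{in}(m) = \sigma(X_j,\varepsilon_j,\ j\in N_i^{o}(m))$ and define $Z_i$ as
$$Z_i = g\big(\{\tilde Y_j\}_{j\in N_i(r)}\big) = \begin{cases} b_0 + X_i\beta_0 + \sum_{j\in N_i(r)}\alpha_{j0}\tilde Y_j + \varepsilon_i, & \text{if } b_0 + X_i\beta_0 + \sum_{j\in N_i(r)}\alpha_{j0}\tilde Y_j + \varepsilon_i > \gamma_0 \\ \gamma_0, & \text{otherwise} \end{cases}$$
Then, $Y_i = Z_i\,1\{Z_i > \gamma_0\}$. The proof will proceed in three steps: (i) verify the NED property of $Z_i$; (ii) show the NED property of $1\{Y_i > 0\} = 1\{Z_i > \gamma_0\}$; and (iii) apply Proposition 3 of Jenish and Prucha [5] to the product of the two processes to show its NED property.
It can be easily verified that $g(y_1,\ldots,y_k)$ satisfies a Lipschitz inequality: $|g(y_1,\ldots,y_k) - g(y_1',\ldots,y_k')| \le \bar\alpha \sum_{j=1}^{k}|y_j - y_j'|$, where $\bar\alpha = \max_{1\le j\le k}|\alpha_{j0}|$. By the least mean squared property of the conditional mean,
$$\big\|Z_i - E[Z_i \mid \mathcal{F}_{in}(m)]\big\|_2 \le \big\|g(\tilde Y_i) - g\big(E[\tilde Y_i \mid \mathcal{F}_{in}(m)]\big)\big\|_2 \le \bar\alpha \sum_{j\in N_i(r)} \big\|\tilde Y_j - E[\tilde Y_j \mid \mathcal{F}^{X}_{in}(m)]\big\|_2 \le k\,\bar\alpha\,\psi(m) \to 0 \quad \text{as } m\to\infty$$
since $E[\tilde Y_j \mid \mathcal{F}_{in}(m)] = E[\tilde Y_j \mid \mathcal{F}^{X}_{in}(m)]$ by independence of $\varepsilon_i$ and $(X_i,\tilde Y_j)$.
Next, let $\varphi(z) = 1\{z > \gamma_0\}$, and let $a > 0$ be some positive scalar to be chosen later. Define the function:
$$\varphi_m(z) = \begin{cases} 1, & z > \gamma_0 + \psi^{a}(m) \\ \psi^{-a}(m)\,(z - \gamma_0), & \gamma_0 \le z \le \gamma_0 + \psi^{a}(m) \\ 0, & z < \gamma_0 \end{cases}$$
This piecewise linear function converges pointwise to $1\{z > \gamma_0\}$ as $m\to\infty$; but in contrast to $1\{z > \gamma_0\}$, it is Lipschitz-continuous with the Lipschitz coefficient $\psi^{-a}(m)$, i.e., for all $z_1, z_2$: $|\varphi_m(z_1) - \varphi_m(z_2)| \le \psi^{-a}(m)\,|z_1 - z_2|$. Moreover, observe that $\sup_z |\varphi(z) - \varphi_m(z)| \le 1$, and consequently,
$$\|\varphi(z) - \varphi_m(z)\|_2 \le \left[\int 1\{\gamma_0 \le z \le \gamma_0 + \psi^{a}(m)\}\,dF(z)\right]^{1/2} = \big[F(\gamma_0 + \psi^{a}(m)) - F(\gamma_0)\big]^{1/2} = \big[f(\tilde z)\,\psi^{a}(m)\big]^{1/2} \le C\,\psi^{a/2}(m)$$
where $f(\cdot)$ is the p.d.f. of z and $\tilde z \in [\gamma_0, \gamma_0 + \psi^{a}(m)]$. Using the above inequalities gives
$$\begin{aligned}\big\|\varphi(Z_i) - E[\varphi(Z_i) \mid \mathcal{F}_{in}(m)]\big\|_2 &\le \|\varphi(Z_i) - \varphi_m(Z_i)\|_2 + \big\|\varphi_m(Z_i) - E[\varphi_m(Z_i) \mid \mathcal{F}_{in}(m)]\big\|_2 + \big\|E[\varphi(Z_i) \mid \mathcal{F}_{in}(m)] - E[\varphi_m(Z_i) \mid \mathcal{F}_{in}(m)]\big\|_2 \\ &\le 2\,\|\varphi(Z_i) - \varphi_m(Z_i)\|_2 + \big\|\varphi_m(Z_i) - E[\varphi_m(Z_i) \mid \mathcal{F}_{in}(m)]\big\|_2 \\ &\le 2\,\psi^{a/2}(m) + k\,\bar\alpha\,\psi^{1-a}(m).\end{aligned}$$
Now, minimize the order of magnitude of the expression in the last line by setting $a = 2/3$. Thus, $1\{Z_i > \gamma_0\} = 1\{Y_i > 0\}$ is $L_2$-NED on $\{X_i,\varepsilon_i\}$ with the NED numbers $(2 + k\bar\alpha)\,\psi^{1/3}(m)$.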
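The choice $a = 2/3$ comes from balancing the two exponents in the last bound. The short derivation below spells out this step; it uses only the bound stated above and the fact that $\psi(m) \le 1$ for m large enough, since $\psi(m) \to 0$.

```latex
\[
2\,\psi^{a/2}(m) + k\bar\alpha\,\psi^{1-a}(m)
\;\le\; (2 + k\bar\alpha)\,\psi^{\min\{a/2,\;1-a\}}(m)
\qquad \text{whenever } \psi(m) \le 1 ,
\]
\[
\max_{0<a<1}\,\min\{a/2,\;1-a\} \text{ is attained where } \frac{a}{2} = 1-a
\;\Longrightarrow\; a = \frac{2}{3}, \qquad \min\{a/2,\;1-a\} = \frac{1}{3}.
\]
```

Any other value of a makes one of the two terms decay more slowly than $\psi^{1/3}(m)$.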
Next, observe that 1 Y i = 0 = 1 1 Y i > 0 . Hence, by Proposition 2 of Jenish and Prucha [5], 1 Y i = 0 is L 2 -NED on X i , ε i with the same NED numbers. Finally, noting that
z 1 z > γ 0 z 1 z > γ 0 z + φ ( z ) z z + φ ( z ) φ ( z )
and using Proposition 3 of Jenish and Prucha [5] with B ( z , z ) = z + φ ( z ) and r = 3 , we have that Y i is L 2 -NED on X i , ε i with the NED numbers c ψ 1 / 12 ( m ) , where c = 2 Y 6 3 / 2 Y 2 1 / 4 2 + k α ¯ 1 / 4 .

C. Proofs for Section 4

Proof of Proposition 1:
The proof of this proposition follows Carson and Sun [19]. The latter paper is not readily applicable to our model since it relies on an LLN for independent processes. Let $n_0$ and $n_1$ denote, respectively, the sizes of the censored and uncensored subsamples, and let $Y_{i,(0)}$ and $Y_{i,(1)}$ denote observations from the censored and uncensored subsamples, respectively. Conditional on the state variables $\bar X_n$, the uncensored subsample $\{Y_{i,(1)}\}$ is independent and follows a truncated normal distribution. Consequently,
$$\begin{aligned} P\big(\min Y_{(1)} > \gamma + z/n \mid \bar X_n, n_1\big) &= P\big(Y_{1,(1)} > \gamma + z/n, \ldots, Y_{n_1,(1)} > \gamma + z/n \mid \bar X_n, n_1\big) \\ &= \prod_{i=1}^{n_1}\left[1 - \frac{\Phi\big((\gamma + z/n - Z_{in,(1)}\delta)/\sigma\big) - \Phi\big((\gamma - Z_{in,(1)}\delta)/\sigma\big)}{1 - \Phi\big((\gamma - Z_{in,(1)}\delta)/\sigma\big)}\right] \\ &= \prod_{i=1}^{n_1}\left[1 - \frac{z}{n\sigma}\,\frac{\phi\big((\gamma - Z_{in,(1)}\delta)/\sigma\big) + o_p(1)}{1 - \Phi\big((\gamma - Z_{in,(1)}\delta)/\sigma\big)}\right] \end{aligned} \tag{C1}$$
where $o_p(1)$ holds uniformly over i since $\phi(\cdot)$ is uniformly continuous on $\mathbb{R}$. Now, let
$$\mu_{in} = \frac{1}{\sigma}\,\frac{\phi\big((\gamma - Z_{in,(1)}\delta)/\sigma\big)}{1 - \Phi\big((\gamma - Z_{in,(1)}\delta)/\sigma\big)} \quad \text{and} \quad \mu = E\,\mu_{in}$$
Next, we establish some inequalities for the r.v. μ i n . By Lemma D1, Z i n is L 2 -NED on X i with geometrically decaying coefficients. Then, by Proposition 3 of Jenish and Prucha [5], μ i n is also L 2 -NED on X i , which is mixing satisfying Assumption 3. Consequently, μ i n satisfies the LLN of Jenish and Prucha [5], i.e., 1 n 1 i = 1 n 1 μ i n = μ + o p 1 . By the Mill’s Ratio inequality, ϕ z / 1 Φ z K z for some K < , and hence, sup i , n E μ i n 6 M ¯ < , since sup i , n E Z i n 6 < by assumption. Next, let M k = max 1 i k μ i n , for k n and define sets A 1 ( c ) = M 1 > c , A k ( c ) = M k 1 c , μ k n > c for k = 2 , . . . , n . Clearly, M n > c k = 1 n A k ( c ) and A k ( c ) μ k n > c . Then, by the Markov inequality,
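$$P\Big(\max_{1\le i\le n}\mu_{in} > n^{1/6}C\Big) \le \sum_{k=1}^{n} P\big(A_k(n^{1/6}C)\big) \le \sum_{k=1}^{n} P\big(\mu_{kn} > n^{1/6}C\big) \le \frac{\bar M}{C^{6}}$$
which implies $\max_{1\le i\le n}\mu_{in} = O_p(n^{1/6})$. Now, using (C1) and the last inequality gives
$$\begin{aligned} \frac{1}{n_1}\log P\big(\min Y_{(1)} > \gamma + z/n \mid \bar X_n, n_1\big) &= \frac{1}{n_1}\sum_{i=1}^{n_1}\log\Big(1 - \frac{z}{n}\,\mu_{in} + o_p\Big(\frac{1}{n}\Big)\Big) \\ &= \frac{1}{n_1}\sum_{i=1}^{n_1}\Big[-\frac{z}{n}\,\mu_{in} + o_p\Big(\frac{1}{n}\Big) + O\Big(\frac{\mu_{in}^2}{n^2}\Big)\Big] \\ &= \frac{1}{n_1}\sum_{i=1}^{n_1}\Big[-\frac{z}{n}\,\mu_{in}\Big] + o_p\Big(\frac{1}{n}\Big) + O\Big(\frac{\max_{1\le i\le n}\mu_{in}^2}{n^2}\Big) \\ &= -\frac{z}{n}\,\frac{1}{n_1}\sum_{i=1}^{n_1}\mu_{in} + o_p\Big(\frac{1}{n}\Big) + O_p\big(n^{-5/3}\big) \\ &= -\frac{z}{n}\,\big(\mu + o_p(1)\big) + o_p\Big(\frac{1}{n}\Big) = -\frac{z\mu}{n}\,\big(1 + o_p(1)\big) \end{aligned}$$
Thus, $P\big(\min Y_{(1)} > \gamma + z/n \mid n_1\big) = \exp\big(-\tfrac{n_1}{n}\,z\mu\,(1 + o_p(1))\big)$. Moreover, by Theorem 2, $1\{Y_i > \gamma\}$ is $L_2$-NED on $\{X_i,\varepsilon_i\}$, and hence, satisfies the LLN of Jenish and Prucha [5]:
$$\frac{n_1}{n} = \frac{1}{n}\sum_{i=1}^{n} 1\{Y_i > \gamma\} \to_p E\big[1 - \Phi\big((\gamma - Z_{in}\delta)/\sigma\big)\big] := \kappa$$
Consequently, $P\big(\min Y_{(1)} > \gamma + z/n \mid n_1\big) \to \exp(-az)$, where $a = \kappa\mu$. As the right-hand side of the last expression does not depend on $n_1$, it is also the limit of the unconditional probability. Thus,
$$\lim_{n\to\infty} P\big(n(\hat\gamma - \gamma) \le z\big) = 1 - \exp(-az) \quad \text{for } z > 0$$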
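The exponential limit can be illustrated by simulation in a deliberately simplified special case, which is an assumption of this sketch and not the model of the paper: no regressors and no interaction, so that the latent outcomes are i.i.d. $N(\mu_Y,\sigma^2)$. The hazard term is then constant across observations and the rate $a = \kappa\mu$ reduces to the latent density evaluated at $\gamma$. All names and parameter values below are illustrative.

```python
import random
from statistics import NormalDist, fmean


def scaled_min_error(n, mean, sigma, gamma, rng):
    """Draw n latent outcomes, keep those above gamma, return n * (gamma_hat - gamma)."""
    uncensored = [y for y in (rng.gauss(mean, sigma) for _ in range(n)) if y > gamma]
    gamma_hat = min(uncensored)  # minimum order statistic of the uncensored subsample
    return n * (gamma_hat - gamma)


rng = random.Random(0)
n, reps = 500, 1000
mean, sigma, gamma = 2.0, 1.0, 2.0  # assumed i.i.d. design

draws = [scaled_min_error(n, mean, sigma, gamma, rng) for _ in range(reps)]

# Rate of the limiting exponential law: a = kappa * (hazard / sigma).
std = NormalDist()
z = (gamma - mean) / sigma
kappa = 1.0 - std.cdf(z)                     # probability of being uncensored
hazard = std.pdf(z) / (1.0 - std.cdf(z)) / sigma
a = kappa * hazard

sim_mean = fmean(draws)   # Monte Carlo mean of n * (gamma_hat - gamma)
limit_mean = 1.0 / a      # mean of the Exp(a) limit
```

Under this design the simulated mean of $n(\hat\gamma - \gamma)$ should be close to $1/a$, which is how the positive finite-sample bias of $\hat\gamma$ shrinks at rate $1/n$.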
Proof of Theorem 3:
By Proposition 1, $\gamma_0$ is identified, so it remains to prove identification of $\theta$ by showing that the population objective function $Q(\theta,\gamma_0) = E\,m(W_{in},\theta,\gamma_0)$ is uniquely maximized at $\theta_0$.

A. Identification in ML

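To prove identification in the ML case, it suffices to verify the Kullback-Leibler information inequality, i.e., for all $\theta \in \Theta$ s.t. $\theta \ne \theta_0$
$$\log f(Y_i \mid \bar X_n;\theta,\gamma_0) \ne \log f(Y_i \mid \bar X_n;\theta_0,\gamma_0) \quad \text{with positive probability.} \tag{C2}$$
Observe that
$$\begin{aligned}\log f(Y_i \mid \bar X_n;\theta,\gamma_0) - \log f(Y_i \mid \bar X_n;\theta_0,\gamma_0) ={}& 1\{Y_i = 0\}\left[\log\Phi\Big(\frac{\gamma_0 - Z_{in}\delta}{\sigma}\Big) - \log\Phi\Big(\frac{\gamma_0 - Z_{in}\delta_0}{\sigma_0}\Big)\right] - \frac{1}{2}\,1\{Y_i > \gamma_0\}\big[\log\sigma^2 - \log\sigma_0^2\big] \\ &- 1\{Y_i > \gamma_0\}\left[\frac{(Y_i - Z_{in}\delta)^2}{2\sigma^2} - \frac{(Y_i - Z_{in}\delta_0)^2}{2\sigma_0^2}\right].\end{aligned}$$
Clearly, $\log f(Y_i \mid \bar X_n;\theta,\gamma_0) \ne \log f(Y_i \mid \bar X_n;\theta_0,\gamma_0)$ with positive probability if $\sigma^2 \ne \sigma_0^2$. Suppose the opposite were true, i.e., $\log f(Y_i \mid \bar X_n;\theta,\gamma_0) = \log f(Y_i \mid \bar X_n;\theta_0,\gamma_0)$ w.p.1. Then, we would have $\log\frac{\sigma^2}{\sigma_0^2} - \frac{(Y_i - Z_{in}\delta_0)^2}{\sigma_0^2} + \frac{(Y_i - Z_{in}\delta)^2}{\sigma^2} = 0$ for $Y_i > \gamma_0$, which implies that the r.v. $(Y_i - Z_{in}\delta_0)^2$ and $(Y_i - Z_{in}\delta)^2$ would have a point mass, which is impossible since at least one of the regressors in $X_i$ has full support by Assumption 4. So, $\sigma_0^2$ is identified, and we can focus on the case: $\sigma^2 = \sigma_0^2$ but $\delta \ne \delta_0$.
If $\delta \ne \delta_0$, then $E(Z_{in}\delta - Z_{in}\delta_0)^2 = E[Z_{in}(\delta - \delta_0)]^2 = (\delta - \delta_0)'\,E[Z_{in}'Z_{in}]\,(\delta - \delta_0) > 0$, since $E[Z_{in}'Z_{in}]$ is positive definite by Assumption 4. Hence, $Z_{in}\delta \ne Z_{in}\delta_0$ with positive probability. Consequently, $Y_i - Z_{in}\delta \ne Y_i - Z_{in}\delta_0$ with positive probability. Since $Z_{in}\delta \ne Z_{in}\delta_0$ with positive probability, and $\Phi(\cdot)$ and $\log$ are strictly increasing functions, we also have $\log\Phi\big(\frac{\gamma_0 - Z_{in}\delta}{\sigma_0}\big) \ne \log\Phi\big(\frac{\gamma_0 - Z_{in}\delta_0}{\sigma_0}\big)$ with positive probability, which proves (C2).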

B. Identification in LS

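To prove identification in the LS case, it suffices to verify that for all $\theta \in \Theta$ s.t. $\theta \ne \theta_0$
$$\varphi(Z_{in},\theta,\gamma_0) \ne \varphi(Z_{in},\theta_0,\gamma_0) \quad \text{with positive probability.}$$
Recall $\varphi(Z_{in},\theta,\gamma_0) = \Phi\big(\frac{Z_{in}\delta - \gamma_0}{\sigma}\big)\,Z_{in}\delta + \sigma\,\phi\big(\frac{Z_{in}\delta - \gamma_0}{\sigma}\big)$. Re-parametrize: $\upsilon = \delta/\sigma$ and $\sigma = \sigma$. This mapping is one-to-one, and hence it suffices to prove identification of the parameters $\upsilon_0 = \delta_0/\sigma_0$ and $\sigma_0$.
By similar arguments as in part A, if $\upsilon \ne \upsilon_0$, then $Z_{in}\upsilon \ne Z_{in}\upsilon_0$ with positive probability. Denote $u = Z_{in}\upsilon$ and $u_0 = Z_{in}\upsilon_0$, and consider the function $\tilde\varphi(u,\sigma) = \sigma\big[u\,\Phi(u - \gamma_0/\sigma) + \phi(u - \gamma_0/\sigma)\big]$. It is strictly increasing in $\sigma$. To see this, let $\tilde\gamma = \gamma_0/\sigma$ and note that $\frac{\partial\tilde\varphi(u,\sigma)}{\partial\sigma} = u\,\Phi(u - \tilde\gamma) + (1 + \tilde\gamma^2)\,\phi(u - \tilde\gamma) > 0$, because if $u < 0$, then $-u \le \tilde\gamma - u < \frac{\phi(\tilde\gamma - u)}{1 - \Phi(\tilde\gamma - u)} = \frac{\phi(u - \tilde\gamma)}{\Phi(u - \tilde\gamma)}$ by the Mills ratio inequality since $\tilde\gamma = \gamma_0/\sigma \ge 0$. Moreover, $\tilde\varphi(u,\sigma)$ is also strictly increasing in u:
$$\frac{\partial\tilde\varphi(u,\sigma)}{\partial u} = \sigma\big[\Phi(u - \tilde\gamma) + u\,\phi(u - \tilde\gamma) - (u - \tilde\gamma)\,\phi(u - \tilde\gamma)\big] = \sigma\big[\Phi(u - \tilde\gamma) + \tilde\gamma\,\phi(u - \tilde\gamma)\big] > 0,$$
which proves identification of $\upsilon_0$ and $\sigma_0$, and thus completes the proof of the theorem. ∎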

D. Proofs for Section 5

The following lemma collects formulas and some properties of the score and Hessian functions for the ML and LS estimators, which are used throughout the proofs. Let m i n = m ( W i n , θ , γ ) , s ( W i n , θ , γ ) = m i n θ = m i n δ ; m i n σ 2 denote the score function, and let the Hessian matrix be denoted by:
H ( W i n , θ , γ ) = 2 m i n θ θ = 2 m i n δ δ 2 m i n δ σ 2 2 m i n σ 2 δ 2 m i n σ 2 2
Lemma D1. 
Let ϕ i n = ϕ γ Z i n δ σ , Φ i n = Φ γ Z i n δ σ and φ i n = φ ( Z i n , θ , γ ) .
A. 
For the ML estimator, the components of the score and Hessian are given by:
m i n δ = 1 Y i = 0 ϕ i n σ Φ i n Z i n + 1 Y i > γ 1 σ 2 Y i Z i n δ Z i n m i n σ 2 = 1 Y i = 0 ϕ i n 2 σ 3 Φ i n γ Z i n δ 1 2 σ 2 1 Y i > γ + 1 2 σ 4 1 Y i > γ Y i Z i n δ 2 2 m i n δ δ = 1 Y i = 0 ϕ i n σ Φ i n 2 1 σ ϕ i n 1 σ 2 γ Z i n δ Φ i n Z i n Z i n 1 Y i > γ 1 σ 2 Z i n Z i n 2 m i n σ 2 δ = 1 Y i = 0 ϕ i n 2 σ 3 Φ i n 2 1 σ γ Z i n δ ϕ i n Φ i n + 1 σ 2 γ Z i n δ 2 Φ i n Z i n 1 σ 4 1 Y i > γ Y i Z i n δ Z i n 2 m i n σ 2 2 = 1 Y i = 0 σ 5 ϕ i n 4 Φ i n 2 Z i n δ γ 3 σ 2 Φ i n 3 γ Z i n δ Φ i n Z i n δ γ 2 σ ϕ i n + 1 2 σ 4 1 Y i > γ 1 σ 6 1 Y i > γ Y i Z i n δ 2
B. 
For the LS estimator, the components of the score and Hessian are given by:
m i n δ = 2 Y i φ i n Φ i n + γ σ ϕ i n Z i n , m i n σ 2 = Y i φ i n ϕ i n γ γ Z i n δ γ σ 3 + 1 σ 2 m i n δ δ = 2 Φ i n + γ σ ϕ i n 2 Z i n Z i n + 2 Y i φ i n ϕ i n σ 1 Z i n δ γ γ σ 2 Z i n Z i n 2 m i n σ 2 δ = ϕ i n Φ i n + γ σ ϕ i n γ γ Z i n δ γ + σ 2 σ 3 Y i φ i n γ γ Z i n δ γ 2 + σ 2 γ Z i n δ γ σ σ 4 Z i n 2 m i n σ 2 2 = ϕ i n 2 γ 2 Z i n δ σ 3 + 1 σ 2 + 1 2 Y i φ i n ϕ i n Z i n δ γ 3 γ σ 7 + Z i n δ γ Z i n δ + 2 γ σ 5 1 σ 3
C. 
Under Assumptions 1–3(i), m ( W i n , θ , γ ) and H W i n , θ 0 , γ 0 are L 1 -NED on X i , ε i with coefficients decaying at geometric rates, and the score s ( W i n , θ 0 , γ 0 ) is L 2 -NED on X i , ε i with coefficients decaying at geometric rates.
The rates of the NED coefficients are required only for verification of the assumptions of the CLT and LLN for NED processes of Jenish and Prucha [5], which rely on polynomial decay rates. Clearly, NED coefficients declining at any geometric rate will automatically satisfy these theorems, and their exact orders of magnitude are unimportant for the proofs of our results.
Proof of Lemma D1:  Parts A and B follow by straightforward differentiation. To show part C, observe that by Theorem 2 and Proposition 3 of Jenish and Prucha [5], $1\{Y_{in} > \gamma_0\}$, $1\{Y_{in} = 0\}$, $\Phi\big(\frac{\gamma - Z_{in}\delta}{\sigma}\big)$, $\phi\big(\frac{\gamma - Z_{in}\delta}{\sigma}\big)$ and $(Y_i - Z_{in}\delta)^2$ are $L_2$-NED on $\{X_i,\varepsilon_i\}$ with the NED coefficients decaying at geometric rates. Then, by Theorem 17.9 of Davidson [21], all products and sums of $L_2$-NED terms are $L_1$-NED variables with the NED coefficients decaying at the slowest rate of the multiples or summands. Thus, $m_{in}$ is $L_1$-NED on $\{X_i,\varepsilon_i\}$ for both the ML and LS estimators. By analogous arguments, the Hessians $H(W_{in},\theta_0,\gamma_0)$ are $L_1$-NED on $\{X_i,\varepsilon_i\}$ for both the ML and LS estimators. Finally, the score of the ML estimator is $L_2$-NED on $\{X_i,\varepsilon_i\}$ with geometric decay rates by Example 17.17 of Davidson [21] as the sum of products of $1\{Y_i = 0\}$ or $1\{Y_i > \gamma\}$, which are bounded and $L_2$-NED, and some smooth functions, each of which is $L_2$-NED on $\{X_i,\varepsilon_i\}$ with geometric decay rates. By analogous arguments, the score of the LS estimator is also $L_2$-NED on $\{X_i,\varepsilon_i\}$ with geometric decay rates. ∎
Proof of Theorem 4: We first show consistency. Let $Q(\theta,\gamma)=E\,m(W_{in},\theta,\gamma)$. To prove consistency of $\hat\theta_n$, it suffices to verify the assumptions of Lemma A-1 of Andrews [22], namely: (a) $\sup_{\theta\in\Theta}\left|Q_n(\theta,\hat\gamma)-Q(\theta,\gamma_0)\right|\to_p 0$, and (b) for every neighborhood $\Theta_0$ of $\theta_0$, $\sup_{\theta\in\Theta\setminus\Theta_0}Q(\theta,\gamma_0)<Q(\theta_0,\gamma_0)$. Condition (b) is satisfied for both the ML and LS estimators by the identification theorem, Theorem 3. So, it only remains to verify condition (a).
A. We first verify condition (a) for the ML estimator. Note that
$$
\sup_{\theta\in\Theta}\left|Q_n(\theta,\hat\gamma)-Q(\theta,\gamma_0)\right| \le \sup_{\theta\in\Theta}\left|Q_n(\theta,\gamma_0)-Q(\theta,\gamma_0)\right| + \sup_{\theta\in\Theta}\left|Q_n(\theta,\hat\gamma)-Q_n(\theta,\gamma_0)\right| \tag{D3}
$$
We now show that both terms on the r.h.s. of this inequality go to zero in probability. Consider the first term. By Lemma D1, $m(W_{in},\theta,\gamma_0)$ is $L_1$-NED on $\{X_i,\varepsilon_i\}$ with the NED coefficients decaying at a geometric rate, and $\{X_i,\varepsilon_i\}$ is α-mixing satisfying Assumption 3(ii). Moreover, $E\left|m(W_{in},\theta,\gamma)\right|<\infty$. Then, by Theorem 1 of Jenish and Prucha [5], $m(W_{in},\theta,\gamma_0)$ satisfies a pointwise LLN on $\Theta$. Next, since $\Theta$ is compact and $m(W_{in},\theta,\gamma_0)$ is continuously differentiable in $\theta$, $m(W_{in},\theta,\gamma_0)$ satisfies a Lipschitz condition in $\theta$ with a coefficient satisfying $\sup_{i,n}\sup_{\Theta}E\left\|\partial m(W_{in},\theta,\gamma_0)/\partial\theta\right\|<\infty$, by Assumption 5. Thus, $m(W_{in},\theta,\gamma_0)$ is $L_1$-stochastically equicontinuous on $\Theta$, and hence, by the ULLN of Jenish and Prucha [23], the first term on the r.h.s. of (D3) converges to zero.
Next, write out the second term in (D3) as
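$$
Q_n(\theta,\hat\gamma)-Q_n(\theta,\gamma_0) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{Y_i=0\}\left[\log\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta}{\sigma}\right)-\log\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta}{\sigma}\right)\right] + \frac{1}{n}\sum_{i=1}^{n}\log\!\left[\sigma^{-1}\phi\!\left(\frac{Y_i-Z_{in}'\delta}{\sigma}\right)\right]\Big[\mathbf{1}\{Y_i>\hat\gamma\}-\mathbf{1}\{Y_i>\gamma_0\}\Big]
$$
By construction, $\hat\gamma=\min\{Y_i: Y_i>0\}\ge\gamma_0$, and hence,
$$
\mathbf{1}\{Y_i>\hat\gamma\}-\mathbf{1}\{Y_i>\gamma_0\} =
\begin{cases}
0, & \text{if } Y_i\ne Y_{(1)}=\hat\gamma,\\
-\mathbf{1}\{Y_i>\gamma_0\}, & \text{if } Y_i=\min\{Y_i: Y_i>0\}=\hat\gamma.
\end{cases}
\tag{D4}
$$
Note that the minimal value in the uncensored subsample, $\{Y_i>0\}$, is attained by only a single observation in the subsample. This is because the variables $Y_i$ are i.i.d. continuously distributed on $(\gamma_0,+\infty)$, so that the probability of the event $Y_i=Y_j$, $i\ne j$, is zero. Then,
$$
Q_n(\theta,\hat\gamma)-Q_n(\theta,\gamma_0) = \frac{1}{n}\sum_{i=1}^{n}\mathbf{1}\{Y_i=0\}\left[\log\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta}{\sigma}\right)-\log\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta}{\sigma}\right)\right] - \frac{1}{n}\,\mathbf{1}\{\hat\gamma>\gamma_0\}\log\!\left[\sigma^{-1}\phi\!\left(\frac{\hat\gamma-Z_{in,(1)}'\delta}{\sigma}\right)\right]
$$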
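Since $\log\Phi(z)$ is continuously differentiable on $\mathbb{R}$, we have by the Mean Value Theorem:
$$
\log\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta}{\sigma}\right)-\log\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta}{\sigma}\right) = \frac{\phi\!\left(\frac{\tilde\gamma-Z_{in}'\delta}{\sigma}\right)}{\sigma\,\Phi\!\left(\frac{\tilde\gamma-Z_{in}'\delta}{\sigma}\right)}\,(\hat\gamma-\gamma_0)
$$
where $\tilde\gamma$ is between $\hat\gamma$ and $\gamma_0$, and $\tilde\gamma\to_p\gamma_0$. Then,
$$
\sup_{\theta\in\Theta}\left|Q_n(\theta,\hat\gamma)-Q_n(\theta,\gamma_0)\right| \le \left|\hat\gamma-\gamma_0\right|\frac{1}{n}\sum_{i=1}^{n}\sup_{\theta\in\Theta}\frac{\phi\!\left(\frac{\tilde\gamma-Z_{in}'\delta}{\sigma}\right)}{\sigma\,\Phi\!\left(\frac{\tilde\gamma-Z_{in}'\delta}{\sigma}\right)} + \frac{1}{n}\,\mathbf{1}\{\hat\gamma>\gamma_0\}\sup_{\theta\in\Theta}\left|\log\!\left[\sigma^{-1}\phi\!\left(\frac{\hat\gamma-Z_{in,(1)}'\delta}{\sigma}\right)\right]\right| \to_p 0
$$
since $\hat\gamma\to_p\gamma_0$, $\sup_{x\in\mathbb{R}}\phi(x)/\Phi(x)\le C$, $\sigma\in[\sigma_1,\sigma_2]$ with $\sigma_1>0$, $E\sup_{\delta}(Y_{in}-Z_{in}'\delta)^2<\infty$ by Assumption 5, and, by Proposition 1, $E\,\mathbf{1}\{\hat\gamma>\gamma_0\}=P(\hat\gamma>\gamma_0)=P\big(n(\hat\gamma-\gamma_0)>0\big)\to 1$. Thus, $\sup_{\theta\in\Theta}\left|Q_n(\theta,\hat\gamma)-Q(\theta,\gamma_0)\right|\to_p 0$, and hence, the ML estimator satisfies $\hat\theta_n\to_p\theta_0$.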
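The behaviour of the threshold estimator $\hat\gamma=\min\{Y_i:Y_i>0\}$ used above can be illustrated with a small simulation. The sketch below is only an illustration under simplifying assumptions that are not part of the model: a single scalar regressor, independent normal errors (no strategic interaction), and hypothetical parameter values. The function name `threshold_estimate` is likewise introduced here for illustration.

```python
import numpy as np

def threshold_estimate(n, gamma0=1.0, delta0=1.5, sigma0=1.0, seed=0):
    """Simulate a censored sample and return gamma_hat = min{Y_i : Y_i > 0}."""
    rng = np.random.default_rng(seed)
    z = rng.normal(size=n)                       # hypothetical scalar regressor
    eps = rng.normal(scale=sigma0, size=n)       # independent errors (a simplification)
    y_star = delta0 * z + eps                    # latent strategy
    y = np.where(y_star > gamma0, y_star, 0.0)   # values at or below gamma0 are censored to 0
    return float(y[y > 0].min())

if __name__ == "__main__":
    gamma0 = 1.0
    for n in (100, 1_000, 10_000):
        g_hat = threshold_estimate(n, gamma0=gamma0)
        # g_hat never falls below gamma0, and n * (g_hat - gamma0) stays of moderate size
        print(n, g_hat, n * (g_hat - gamma0))
```

In this simplified design the estimate lies above $\gamma_0$ by construction and approaches it quickly as $n$ grows, which is the pattern the argument above relies on.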
B. We now verify condition (a) for the LS estimator. Using similar arguments as in part A, $m(W_{in},\theta,\gamma_0)$ is $L_1$-NED and $L_1$-stochastically equicontinuous on $\Theta$, and hence, by the ULLN of Jenish and Prucha [23], the first term on the r.h.s. of (D3) converges to zero.
As for the second term, observe that by the Mean Value Theorem:
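$$
\begin{aligned}
Q_n(\theta,\hat\gamma)-Q_n(\theta,\gamma_0) &= \frac{1}{n}\sum_{i=1}^{n}\big[m(W_{in},\theta,\hat\gamma)-m(W_{in},\theta,\gamma_0)\big]\\
&= \frac{1}{n}\sum_{i=1}^{n}\big[\varphi(W_{in},\theta,\gamma_0)-\varphi(W_{in},\theta,\hat\gamma)\big]\big\{-2Y_i+\varphi(\theta,\gamma_0)+\varphi(\theta,\hat\gamma)\big\}\\
&= (\hat\gamma-\gamma_0)\,\frac{1}{n}\sum_{i=1}^{n}\frac{\tilde\gamma}{\sigma}\,\phi\!\left(\frac{Z_{in}'\delta-\tilde\gamma}{\sigma}\right)\big\{-2Y_i+\varphi(\theta,\gamma_0)+\varphi(\theta,\hat\gamma)\big\} \to_p 0
\end{aligned}
$$
since $\hat\gamma\to_p\gamma_0$ and the r.v. in braces is $O_p(1)$, where $\tilde\gamma$ lies between $\hat\gamma$ and $\gamma_0$, $\varphi(W_{in},\theta,\gamma)=\Phi\!\left(\frac{Z_{in}'\delta-\gamma}{\sigma}\right)Z_{in}'\delta+\sigma\,\phi\!\left(\frac{Z_{in}'\delta-\gamma}{\sigma}\right)$ and $\partial\varphi/\partial\gamma=-\gamma\,\phi\big((Z_{in}'\delta-\gamma)/\sigma\big)/\sigma$. This completes the proof of consistency.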
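The regression function $\varphi$ and its derivative with respect to $\gamma$ quoted above can be checked numerically. The following sketch is an illustration only: it treats the index $Z_{in}'\delta$ as a fixed scalar `zd`, assumes an independent $N(0,\sigma^2)$ error, and uses hypothetical values and function names (`reg_fn`, `reg_fn_dgamma`, `mc_mean`). It compares the closed form with a Monte Carlo mean of the censored outcome and the analytic derivative with a central finite difference.

```python
import math
import numpy as np

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_cdf(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def reg_fn(zd, sigma, gamma):
    """Closed form: Phi(a) * zd + sigma * phi(a), with a = (zd - gamma) / sigma."""
    a = (zd - gamma) / sigma
    return norm_cdf(a) * zd + sigma * norm_pdf(a)

def reg_fn_dgamma(zd, sigma, gamma):
    """Analytic derivative with respect to gamma: -gamma * phi(a) / sigma."""
    a = (zd - gamma) / sigma
    return -gamma * norm_pdf(a) / sigma

def mc_mean(zd, sigma, gamma, n=400_000, seed=0):
    """Monte Carlo mean of Y = Y* if Y* > gamma, else 0, with Y* = zd + eps."""
    rng = np.random.default_rng(seed)
    y_star = zd + rng.normal(scale=sigma, size=n)
    return float(np.where(y_star > gamma, y_star, 0.0).mean())

zd, sigma, gamma = 0.8, 1.2, 0.5   # hypothetical values
h = 1e-6
fd = (reg_fn(zd, sigma, gamma + h) - reg_fn(zd, sigma, gamma - h)) / (2.0 * h)
print(reg_fn(zd, sigma, gamma), mc_mean(zd, sigma, gamma))
print(fd, reg_fn_dgamma(zd, sigma, gamma))
```

Both pairs of numbers should agree up to simulation and discretization error, which supports reading $\varphi$ as the conditional mean of the censored strategy in this simplified setting.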
Proof of asymptotic normality: Taking the mean value expansion of the first-order conditions around θ 0 gives:
$$
0 = \frac{\partial Q_n(\hat\theta,\hat\gamma)}{\partial\theta} = \frac{1}{n}\sum_{i=1}^{n}s(W_{in},\theta_0,\hat\gamma) + \frac{1}{n}\sum_{i=1}^{n}H(W_{in},\tilde\theta,\hat\gamma)\,(\hat\theta_n-\theta_0)
$$
where $\tilde\theta$ lies between $\hat\theta$ and $\theta_0$, and $\hat\theta_n\to_p\theta_0$. For both ML and LS estimators, we verify below that
$$
\frac{1}{n}\sum_{i=1}^{n}H(W_{in},\tilde\theta,\hat\gamma)\to_p H_0 = E\,H(W_{in},\theta_0,\gamma_0) \tag{D5}
$$
where $H_0$ is nonsingular by assumption. Then,
$$
\sqrt{n}\,(\hat\theta_n-\theta_0) = -H_0^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}\big[s(W_{in},\theta_0,\hat\gamma)-E\,s(W_{in},\theta_0,\hat\gamma)\big] - H_0^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}E\,s(W_{in},\theta_0,\hat\gamma) + o_p(1)
$$
Define $\nu_n(\gamma)=n^{-1/2}\sum_{i=1}^{n}\big[s(W_{in},\theta_0,\gamma)-E\,s(W_{in},\theta_0,\gamma)\big]$, and rewrite
$$
\sqrt{n}\,(\hat\theta_n-\theta_0) = -H_0^{-1}\nu_n(\gamma_0) - H_0^{-1}\big[\nu_n(\hat\gamma)-\nu_n(\gamma_0)\big] - H_0^{-1}\frac{1}{\sqrt{n}}\sum_{i=1}^{n}E\,s(W_{in},\theta_0,\hat\gamma) + o_p(1)
$$
By Lemma D1, $s(W_{in},\theta_0,\gamma_0)$ is $L_2$-NED on $\{X_i,\varepsilon_i\}$ with the NED coefficients decaying at a geometric rate for both the ML and LS estimators. In turn, $\{X_i,\varepsilon_i\}$ is mixing with α-mixing coefficients satisfying Assumption 3. Then, by the CLT of Jenish and Prucha [5], $\nu_n(\gamma_0)\to_d N(0,S_0)$, where $S_0=\mathrm{Var}\,s(W_{in},\theta_0,\gamma_0)$. So, to prove the theorem, it remains to verify (D5) and show that
$$
\nu_n(\hat\gamma)-\nu_n(\gamma_0)\to_p 0, \tag{D6}
$$
$$
n^{-1/2}\sum_{i=1}^{n}E\,s(W_{in},\theta_0,\hat\gamma)\to_p 0. \tag{D7}
$$

A. ML estimator

We first verify (D7). Note that by the population log likelihood equality
$$
E\,s(W_{in},\theta_0,\gamma_0) = E\,\frac{\partial}{\partial\theta}\log f(W_{in},\theta_0,\gamma_0) = 0 \tag{D8}
$$
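Next, partition $E\,s(W_{in},\theta_0,\gamma)=\Big(E\,\frac{\partial}{\partial\delta'}m(W_{in},\theta_0,\gamma);\;E\,\frac{\partial}{\partial\sigma^2}m(W_{in},\theta_0,\gamma)\Big)'$ and use formulas (D1) to find the conditional expectations for $\gamma\ge\gamma_0$:
$$
\begin{aligned}
h_\delta(W_{in},\theta_0,\gamma) &\equiv E\!\left[\frac{\partial}{\partial\delta}m(W_{in},\theta_0,\gamma)\,\Big|\,\bar X_n\right]\\
&= -\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)\frac{\phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{\sigma_0\,\Phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}Z_{in} + \frac{Z_{in}}{\sigma_0^2}\,E\big[\varepsilon_i\,\mathbf{1}\{\varepsilon_i>\gamma-Z_{in}'\delta_0\}\,\big|\,\bar X_n\big]\\
&= -\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)\frac{\phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{\sigma_0\,\Phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}Z_{in} + \phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)\frac{Z_{in}}{\sigma_0}
\end{aligned}
$$
$$
\begin{aligned}
h_\sigma(W_{in},\theta_0,\gamma) &\equiv E\!\left[\frac{\partial}{\partial\sigma^2}m(W_{in},\theta_0,\gamma)\,\Big|\,\bar X_n\right]\\
&= -E\big[\mathbf{1}\{Y_i=0\}\,\big|\,\bar X_n\big]\frac{\phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{2\sigma_0^3\,\Phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}\,(\gamma-Z_{in}'\delta_0) - \frac{1}{2\sigma_0^2}E\big[\mathbf{1}\{Y_i>\gamma\}\,\big|\,\bar X_n\big] + \frac{1}{2\sigma_0^4}E\big[\mathbf{1}\{Y_i>\gamma\}(Y_i-Z_{in}'\delta_0)^2\,\big|\,\bar X_n\big]\\
&= -\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)\frac{\phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{2\sigma_0^3\,\Phi\!\left(\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}\,(\gamma-Z_{in}'\delta_0) - \frac{1}{2\sigma_0^2}\Phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)\\
&\quad + \frac{1}{2\sigma_0^2}\,\Phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)\left[1+\frac{\gamma-Z_{in}'\delta_0}{\sigma_0}\,\frac{\phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)}{\Phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)}\right]
\end{aligned}
$$
since, by Amemiya [24], $E\big[\varepsilon_i^2\,\mathbf{1}\{\varepsilon_i>\gamma-Z_{in}'\delta_0\}\,\big|\,\bar X_n\big]=\sigma_0^2\,\Phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)\left[1-\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\,\phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)\Big/\Phi\!\left(\frac{Z_{in}'\delta_0-\gamma}{\sigma_0}\right)\right]$.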
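The indicator-weighted normal moments that enter $h_\delta$ and $h_\sigma$ can be verified by direct numerical integration. The sketch below is an illustration under the assumption $\varepsilon\sim N(0,\sigma^2)$, with $c$ standing for the cutoff $\gamma-Z_{in}'\delta_0$; the closed forms coded here are the standard facts $E[\varepsilon\,\mathbf{1}\{\varepsilon>c\}]=\sigma\phi(c/\sigma)$ and $E[\varepsilon^2\,\mathbf{1}\{\varepsilon>c\}]=\sigma^2\big[1-\Phi(c/\sigma)+(c/\sigma)\phi(c/\sigma)\big]$. Function names and numerical values are hypothetical.

```python
import math
import numpy as np

def norm_pdf(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def norm_sf(x):
    """Upper tail 1 - Phi(x) of the standard normal."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

def closed_form_moments(c, sigma):
    """E[eps 1{eps > c}] and E[eps^2 1{eps > c}] for eps ~ N(0, sigma^2)."""
    t = c / sigma
    m1 = sigma * norm_pdf(t)
    m2 = sigma ** 2 * (norm_sf(t) + t * norm_pdf(t))
    return m1, m2

def numeric_moments(c, sigma, width=12.0, num=200_001):
    """Trapezoid-rule integrals of x f(x) and x^2 f(x) over (c, far upper tail)."""
    x = np.linspace(c, max(c, 0.0) + width * sigma, num)
    dens = np.exp(-0.5 * (x / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

    def trapezoid(f):
        return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

    return trapezoid(x * dens), trapezoid(x ** 2 * dens)

c, sigma = 0.4, 1.3   # hypothetical cutoff and scale
print(closed_form_moments(c, sigma))
print(numeric_moments(c, sigma))
```

With $c=\gamma-Z_{in}'\delta_0$, the tail probability $1-\Phi(c/\sigma_0)$ equals $\Phi\big((Z_{in}'\delta_0-\gamma)/\sigma_0\big)$, which links these expressions to the notation used in the text.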
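By the Law of Iterated Expectations, $E\,s(W_{in},\theta_0,\gamma)=\big(E\,h_\delta(W_{in},\theta_0,\gamma)',\;E\,h_\sigma(W_{in},\theta_0,\gamma)\big)'$.
Next, observe that $h_\delta(W_{in},\theta_0,\gamma)$ and $h_\sigma(W_{in},\theta_0,\gamma)$ are uniformly continuous in $\gamma$ on $\Gamma$, since $\Gamma$ is compact. Moreover, $h_\delta(W_{in},\theta_0,\gamma)$ and $h_\sigma(W_{in},\theta_0,\gamma)$ both satisfy domination conditions in $\gamma$ in the ϵ-neighborhood of $\gamma_0$. Then, since $\hat\gamma\ge\gamma_0$ and $\hat\gamma\to\gamma_0$ in probability,
$$
\begin{aligned}
\operatorname*{plim}_{n\to\infty}E\,h_\delta(W_{in},\theta_0,\hat\gamma) &= E\operatorname*{plim}_{n\to\infty}h_\delta(W_{in},\theta_0,\hat\gamma) = E\,h_\delta(W_{in},\theta_0,\gamma_0) = 0,\\
\operatorname*{plim}_{n\to\infty}E\,h_\sigma(W_{in},\theta_0,\hat\gamma) &= E\operatorname*{plim}_{n\to\infty}h_\sigma(W_{in},\theta_0,\hat\gamma) = E\,h_\sigma(W_{in},\theta_0,\gamma_0) = 0
\end{aligned}
$$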
by (D8), which verifies (D7). We next verify (D6). Write
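$$
\nu_n(\hat\gamma)-\nu_n(\gamma_0) = n^{-1/2}\sum_{i=1}^{n}\big[s(W_{in},\theta_0,\hat\gamma)-s(W_{in},\theta_0,\gamma_0)\big] - n^{-1/2}\sum_{i=1}^{n}E\,s(W_{in},\theta_0,\hat\gamma)
$$
since $E\,s(W_{in},\theta_0,\gamma_0)=0$. It was already shown that $n^{-1/2}\sum_{i=1}^{n}E\,s(W_{in},\theta_0,\hat\gamma)\to_p 0$. It remains to show that $n^{-1/2}\sum_{i=1}^{n}\big[s(W_{in},\theta_0,\hat\gamma)-s(W_{in},\theta_0,\gamma_0)\big]\to_p 0$. Partition $s(W_{in},\theta_0,\hat\gamma)-s(W_{in},\theta_0,\gamma_0)=\big(s_{1i}(\hat\gamma,\gamma_0)',\,s_{2i}(\hat\gamma,\gamma_0)\big)'$, where
$$
\begin{aligned}
s_{1i}(\hat\gamma,\gamma_0) &= \sigma_0^{-1}\mathbf{1}\{Y_i=0\}\,Z_{in}\left[\frac{\phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}-\frac{\phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}\right] + \frac{1}{\sigma_0^2}(Y_i-Z_{in}'\delta_0)\,Z_{in}\big[\mathbf{1}\{Y_i>\hat\gamma\}-\mathbf{1}\{Y_i>\gamma_0\}\big];\\
s_{2i}(\hat\gamma,\gamma_0) &= -\frac{1}{2\sigma_0^3}\mathbf{1}\{Y_i=0\}\left[\frac{\phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}(\hat\gamma-Z_{in}'\delta_0)-\frac{\phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}(\gamma_0-Z_{in}'\delta_0)\right]\\
&\quad - \frac{1}{2\sigma_0^2}\big[\mathbf{1}\{Y_i>\hat\gamma\}-\mathbf{1}\{Y_i>\gamma_0\}\big] + \frac{1}{2\sigma_0^4}(Y_i-Z_{in}'\delta_0)^2\big[\mathbf{1}\{Y_i>\hat\gamma\}-\mathbf{1}\{Y_i>\gamma_0\}\big]
\end{aligned}
$$
Using (D4) and similar arguments as in the proof of consistency, we have
$$
n^{-1/2}\sum_{i=1}^{n}s_{1i}(\hat\gamma,\gamma_0) = n^{-1/2}\sum_{i=1}^{n}\sigma_0^{-1}\mathbf{1}\{Y_i=0\}\,Z_{in}\left[\frac{\phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}-\frac{\phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}\right] - n^{-1/2}\frac{1}{\sigma_0^2}\big(\hat\gamma-Z_{in,(1)}'\delta_0\big)Z_{in,(1)}\mathbf{1}\{\hat\gamma>\gamma_0\} := A_{1n}+A_{2n},
$$
$$
\begin{aligned}
n^{-1/2}\sum_{i=1}^{n}s_{2i}(\hat\gamma,\gamma_0) &= -\frac{1}{2\sigma_0^3}\,n^{-1/2}\sum_{i=1}^{n}\mathbf{1}\{Y_i=0\}\left[\frac{\phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)}(\hat\gamma-Z_{in}'\delta_0)-\frac{\phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}{\Phi\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)}(\gamma_0-Z_{in}'\delta_0)\right]\\
&\quad + \frac{1}{2\sigma_0^2}\,n^{-1/2}\mathbf{1}\{\hat\gamma>\gamma_0\} - \frac{1}{2\sigma_0^4}\,n^{-1/2}\big(\hat\gamma-Z_{in,(1)}'\delta_0\big)^2\mathbf{1}\{\hat\gamma>\gamma_0\}
\end{aligned}
$$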
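Since the function $g(z)=\phi(z)/\Phi(z)$ is continuously differentiable on $\mathbb{R}$, we have by the Mean Value Theorem: $g\!\left(\frac{\hat\gamma-Z_{in}'\delta_0}{\sigma_0}\right)-g\!\left(\frac{\gamma_0-Z_{in}'\delta_0}{\sigma_0}\right)=\sigma_0^{-1}g'\!\left(\frac{\tilde\gamma-Z_{in}'\delta_0}{\sigma_0}\right)(\hat\gamma-\gamma_0)$, where $\tilde\gamma$ is between $\hat\gamma$ and $\gamma_0$, and $\tilde\gamma\to_p\gamma_0$. Hence, the first term in $n^{-1/2}\sum_{i=1}^{n}s_{1i}(\hat\gamma,\gamma_0)$ satisfies
$$
A_{1n} = -\,n^{-1/2}\,\big[n(\hat\gamma-\gamma_0)\big]\;n^{-1}\sum_{i=1}^{n}\sigma_0^{-2}\,\mathbf{1}\{Y_i=0\}\,Z_{in}\,g'\!\left(\frac{\tilde\gamma-Z_{in}'\delta_0}{\sigma_0}\right) = n^{-1/2}\,O_p(1)\,O_p(1) = o_p(1)
$$
since $n(\hat\gamma-\gamma_0)=O_p(1)$ by Proposition 1. The second term satisfies
$$
E\left\|A_{2n}\right\| \le n^{-1/2}\,\sigma_0^{-2}\left\{E\left\|\big(\hat\gamma-Z_{in,(1)}'\delta_0\big)Z_{in,(1)}\right\|^2\right\}^{1/2}P^{1/2}\big(n(\hat\gamma-\gamma_0)>0\big)\to 0
$$
since $P\big(n(\hat\gamma-\gamma_0)>0\big)\to 1$ by Proposition 1. Hence, $n^{-1/2}\sum_{i=1}^{n}s_{1i}(\hat\gamma,\gamma_0)=o_p(1)$. Similarly, $n^{-1/2}\sum_{i=1}^{n}s_{2i}(\hat\gamma,\gamma_0)=o_p(1)$. Thus, $n^{-1/2}\sum_{i=1}^{n}\big[s(W_{in},\theta_0,\hat\gamma)-s(W_{in},\theta_0,\gamma_0)\big]=o_p(1)$, proving (D6).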
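The mean value step above uses the derivative of $g(z)=\phi(z)/\Phi(z)$. A standard identity, stated here as background rather than taken from the paper, is $g'(z)=-g(z)\,[z+g(z)]$, and this derivative lies in $(-1,0)$, which is consistent with the averaged term being $O_p(1)$ when $Z_{in}$ has finite moments. The sketch below checks the identity against a central finite difference on a hypothetical grid.

```python
import math

def g(z):
    """Ratio phi(z) / Phi(z) of the standard normal density to its cdf."""
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * math.erfc(-z / math.sqrt(2.0))
    return pdf / cdf

def g_prime(z):
    """Analytic derivative g'(z) = -g(z) * (z + g(z))."""
    gz = g(z)
    return -gz * (z + gz)

grid = [-5.0 + 0.25 * k for k in range(41)]   # hypothetical evaluation points in [-5, 5]
h = 1e-5
max_fd_error = max(
    abs((g(z + h) - g(z - h)) / (2.0 * h) - g_prime(z)) for z in grid
)
print(max_fd_error)
print(min(g_prime(z) for z in grid), max(g_prime(z) for z in grid))
```

On this grid the finite-difference and analytic derivatives coincide to high accuracy, and every value of $g'$ stays strictly between $-1$ and $0$.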
Finally, we verify (D5). Write
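$$
\left\|\frac{1}{n}\sum_{i=1}^{n}H_{in}(\tilde\theta,\hat\gamma)-H_0\right\| \le \left\|\frac{1}{n}\sum_{i=1}^{n}\big[H_{in}(\tilde\theta,\hat\gamma)-H_{in}(\theta_0,\gamma_0)\big]\right\| + \left\|\frac{1}{n}\sum_{i=1}^{n}H_{in}(\theta_0,\gamma_0)-H_0\right\|
$$
We need to show that each of the terms on the r.h.s. of the last inequality is $o_p(1)$. The second term is $o_p(1)$ by the LLN of Jenish and Prucha [5], since $H(W_{in},\theta_0,\gamma_0)$ is $L_1$-NED on $\{X_i,\varepsilon_i\}$.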
Now consider the first term. Using formulas (D1), we have
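$$
\frac{\partial^2 m_{in}(\tilde\theta,\hat\gamma)}{\partial\delta\,\partial\delta'}-\frac{\partial^2 m_{in}(\theta_0,\gamma_0)}{\partial\delta\,\partial\delta'} = \mathbf{1}\{Y_i=0\}\big[g_{1in}(\tilde\theta,\hat\gamma)-g_{1in}(\theta_0,\gamma_0)\big]Z_{in}Z_{in}' - \big[\mathbf{1}\{Y_i>\hat\gamma\}-\mathbf{1}\{Y_i>\gamma_0\}\big]\frac{1}{\sigma^2}Z_{in}Z_{in}'
$$
where $g_{1in}(\theta,\gamma)=\dfrac{\phi\!\left(\frac{\gamma-Z_{in}'\delta}{\sigma}\right)}{\sigma\,\Phi^2\!\left(\frac{\gamma-Z_{in}'\delta}{\sigma}\right)}\left[-\dfrac{1}{\sigma}\phi\!\left(\dfrac{\gamma-Z_{in}'\delta}{\sigma}\right)-\dfrac{1}{\sigma^2}\Phi\!\left(\dfrac{\gamma-Z_{in}'\delta}{\sigma}\right)(\gamma-Z_{in}'\delta)\right]$ is continuously differentiable in $(\theta,\gamma)$, and hence by the Mean Value Theorem
$$
g_{1in}(\tilde\theta,\hat\gamma)-g_{1in}(\theta_0,\gamma_0) = \frac{\partial g_{1in}(\theta^*,\gamma^*)}{\partial\theta'}(\tilde\theta-\theta_0) + \frac{\partial g_{1in}(\theta^*,\gamma^*)}{\partial\gamma}(\hat\gamma-\gamma_0)
$$
where $(\theta^*,\gamma^*)$ lies between $(\tilde\theta,\hat\gamma)$ and $(\theta_0,\gamma_0)$. Then, using (D4) and similar arguments as in the proof of consistency, we have
$$
\left\|\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial^2 m_{in}(\tilde\theta,\hat\gamma)}{\partial\delta\,\partial\delta'}-\frac{\partial^2 m_{in}(\theta_0,\gamma_0)}{\partial\delta\,\partial\delta'}\right]\right\| \le \big\|\tilde\theta-\theta_0\big\|\frac{1}{n}\sum_{i=1}^{n}\left\|\frac{\partial g_{1in}(\theta^*,\gamma^*)}{\partial\theta}\right\|\big\|Z_{in}Z_{in}'\big\| + \left|\hat\gamma-\gamma_0\right|\frac{1}{n}\sum_{i=1}^{n}\left|\frac{\partial g_{1in}(\theta^*,\gamma^*)}{\partial\gamma}\right|\big\|Z_{in}Z_{in}'\big\| + \frac{1}{n}\,\mathbf{1}\{\hat\gamma>\gamma_0\}\,\sigma^{-2}\big\|Z_{in,(1)}Z_{in,(1)}'\big\| \to_p 0
$$
since $\tilde\theta-\theta_0\to_p 0$, $\hat\gamma-\gamma_0\to_p 0$, and $E\,\mathbf{1}\{\hat\gamma>\gamma_0\}=P\big(n(\hat\gamma-\gamma_0)>0\big)\to 1$. By similar arguments,
$$
\left\|\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial^2 m_{in}(\tilde\theta,\hat\gamma)}{\partial(\sigma^2)^2}-\frac{\partial^2 m_{in}(\theta_0,\gamma_0)}{\partial(\sigma^2)^2}\right]\right\|\to_p 0 \quad\text{and}\quad \left\|\frac{1}{n}\sum_{i=1}^{n}\left[\frac{\partial^2 m_{in}(\tilde\theta,\hat\gamma)}{\partial\sigma^2\,\partial\delta}-\frac{\partial^2 m_{in}(\theta_0,\gamma_0)}{\partial\sigma^2\,\partial\delta}\right]\right\|\to_p 0
$$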
which verifies (D5), and hence, asymptotic normality of the MLE.

B. LS estimator

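The proof of (D5) for the LSE is analogous to that in the case of the MLE. Next, by (D2), the score $s_{in}(\theta_0,\gamma)$ is continuously differentiable in $\gamma$, and hence, by the Mean Value Theorem,
$$
\left\|n^{-1/2}\sum_{i=1}^{n}\big[s_{in}(\theta_0,\hat\gamma)-s_{in}(\theta_0,\gamma_0)\big]\right\| \le n^{-1/2}\,n\left|\hat\gamma-\gamma_0\right|\left\|\frac{1}{n}\sum_{i=1}^{n}\frac{\partial s_{in}(\theta_0,\tilde\gamma)}{\partial\gamma}\right\| \to_p 0
$$
where $\tilde\gamma$ is between $\hat\gamma$ and $\gamma_0$, since $n(\hat\gamma-\gamma_0)=O_p(1)$ and $n^{-1}\sum_{i=1}^{n}\partial s_{in}(\theta_0,\tilde\gamma)/\partial\gamma=O_p(1)$. Now, observe that $\nu_n(\hat\gamma)-\nu_n(\gamma_0)=n^{-1/2}\sum_{i=1}^{n}\big[s_{in}(\theta_0,\hat\gamma)-s_{in}(\theta_0,\gamma_0)\big]-n^{-1/2}\sum_{i=1}^{n}E\,s_{in}(\theta_0,\hat\gamma)$. By the population moment condition, $E\,s_{in}(\theta_0,\gamma_0)=E\,\frac{\partial}{\partial\theta}m(W_{in},\theta_0,\gamma_0)=0$, and hence
$$
\left\|n^{-1/2}\sum_{i=1}^{n}\big[E\,s_{in}(\theta_0,\hat\gamma)-E\,s_{in}(\theta_0,\gamma_0)\big]\right\| \le n^{-1/2}\,n\left|\hat\gamma-\gamma_0\right|\left\|\frac{1}{n}\sum_{i=1}^{n}E\,\frac{\partial s_{in}(\theta_0,\tilde\gamma)}{\partial\gamma}\right\| \to_p 0
$$
which verifies both (D7) and (D6). The proof of the theorem is thus complete. ∎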

References

  1. P. Bajari, H. Hong, and D. Nekipelov. “Game theory and econometrics: A survey of some recent research.” Working Paper, University of Minnesota, Minneapolis, MN, USA, 2010.
  2. A. De Paula. “Econometric analysis of games with multiple equilibria.” Annu. Rev. Econ. 5 (2013): 107–131.
  3. K. Menzel. “Inference for games with many players.” Working Paper, NYU, New York, NY, USA, 2012.
  4. H. Xu. “Social interactions: A game theoretical approach.” Working Paper, University of Texas at Austin, Austin, TX, USA, 2011.
  5. N. Jenish, and I.R. Prucha. “On spatial processes and asymptotic inference under near-epoch dependence.” J. Econom. 170 (2012): 178–190.
  6. W. Brock, and S. Durlauf. “Discrete choice with social interaction.” Rev. Econ. Stud. 68 (2001): 235–260.
  7. C.-L. Su, and K.L. Judd. “Constrained optimization approaches to estimation of structural models.” Econometrica 80 (2012): 2213–2230.
  8. P. Bajari, H. Hong, J. Krainer, and D. Nekipelov. “Estimating static models of strategic interactions.” J. Bus. Econ. Stat. 28 (2010): 469–482.
  9. X. Xu, and L.-F. Lee. “Maximum likelihood estimation of a spatial autoregressive Tobit model.” Working Paper, The Ohio State University, Columbus, OH, USA, 2014.
  10. D.B. Audretsch, and M.P. Feldman. “R&D spillovers and the geography of innovation and production.” Am. Econ. Rev. 86 (1996): 253–273.
  11. R. Levin, and P. Reiss. “Cost-reducing and demand-creating R&D with spillovers.” Rand J. Econ. 19 (1988): 538–556.
  12. C. d’Aspremont, and A. Jacquemin. “Cooperative and noncooperative R&D in duopoly with spillovers.” Am. Econ. Rev. 78 (1988): 1133–1137.
  13. M. Motta. “Cooperative R&D and vertical product differentiation.” Int. J. Ind. Org. 10 (1992): 643–661.
  14. W.M. Cohen, and S. Klepper. “The anatomy of industry R&D intensity distributions.” Am. Econ. Rev. 82 (1992): 773–788.
  15. X. Gonzalez, and J. Jaumandreu. “Threshold effects in product R&D decisions: Theoretical framework and empirical analysis.” Studies on the Spanish Economy, FEDEA. Available online: http://econpapers.repec.org/paper/fdafdaeee/45.htm (accessed on 10 October 2013).
  16. A. Dixit, and J. Stiglitz. “Monopolistic competition and optimum product diversity.” Am. Econ. Rev. 67 (1977): 297–308.
  17. S. Durlauf. “A framework for the study of individual behavior and social interactions.” Unpublished Manuscript, 2001.
  18. N. Jenish. “Nonparametric spatial regression under near-epoch dependence.” J. Econom. 167 (2012): 224–239.
  19. R. Carson, and Y. Sun. “The Tobit model with a non-zero threshold.” Econom. J. 10 (2007): 488–502.
  20. N. Jenish. “Spatial semiparametric model with endogenous regressors.” Econometric Theory, 2014, forthcoming.
  21. J. Davidson. Stochastic Limit Theory. Oxford, UK: Oxford University Press, 1994.
  22. D.W.K. Andrews. “Asymptotics for semi-parametric econometric models via stochastic equicontinuity.” Econometrica 62 (1994): 43–72.
  23. N. Jenish, and I.R. Prucha. “Central limit theorems and uniform laws of large numbers for arrays of random fields.” J. Econom. 150 (2009): 86–98.
  24. T. Amemiya. “Regression analysis when the dependent variable is truncated normal.” Econometrica 41 (1973): 997–1016.
  • 1 For the definition of mixing coefficients, see [18].

Share and Cite

MDPI and ACS Style

Jenish, N. Strategic Interaction Model with Censored Strategies. Econometrics 2015, 3, 412-442. https://doi.org/10.3390/econometrics3020412
