**1. Introduction**

Concerns about the inadequacy of standard practices in statistics and econometrics have been long-standing. Since the 1980s, criticisms of econometric practice, including those of Leamer (1983), Lovell (1983), and McCloskey (1985), have given rise to a large literature. Kim and Ji (2015) provide a survey. Even more significant within the economics profession was the Lucas (1976) critique of Keynesian economic modelling (Lucas and Sargent 1981).

Most of the concerns raised in these critiques apply with equal force to other social science disciplines and to fields such as public health and medical science. Ioannidis (2005) offered a particularly trenchant critique, concluding that "most published research findings are false".

The emergence of the "replication crisis", first in psychology and then in other disciplines, has attracted the broader public to some of these concerns. The applicability of the term "crisis" is largely due to the fact that, unlike previous debates of this kind, concern over replication failures has spilled over disciplinary boundaries and into the awareness of the educated public at large.

The simplest form of the replication crisis arises from the publication of a study suggesting the existence of a causal relationship between an outcome of interest *y* and a previously unconsidered explanatory variable *x*, followed by studies with a similar design that fail to find such a relationship.

Most commonly, this process takes place in the context of classical inference. In this framework, the crucial step is the rejection of a null hypothesis of no effect, at some specified significance level, typically 5 per cent or 10 per cent. In the most commonly used definition, a replication failure arises when a subsequent study testing the same hypothesis with similar methods but with a different population fails to reject the null.<sup>1</sup>

For example, Kosfeld et al. (2005) found that exposure to oxytocin increases trust in humans. This finding created substantial interest in possible uses of oxytocin to change mood, potentially for malign as well as benign purposes. Similar results were published by Mikolajczak et al. (2010). However, a subsequent attempt at replication by the same research team, Lane et al. (2015), was unsuccessful.

<sup>1</sup> As an anonymous referee points out, this characterisation of replication failure is too stringent since some failures to reject the null are to be expected. More stringent definitions of replication failure are that the null is not rejected using the pooled data or that the parameter estimates from the two studies are (statistically) significantly different.

The crisis was brought to the wider public's attention by the publication by Open Science Collaboration (2015) of a systematic attempt to replicate 100 experimental results published in three psychology journals. On average, replication effects were half the magnitude of the original effects, representing a substantial decline. Whereas ninety-seven per cent of original studies had statistically significant results, only thirty-six per cent of replications did.

A variety of responses have been offered to the replication crisis. These include tightening the default P-value threshold to 0.005 (Benjamin et al. 2018), procedural improvements such as the maintenance of data sets and the preregistration of hypotheses (Nosek and Lakens 2014), attempts to improve statistical practice within the classical framework, for example through bootstrapping (Lubke and Campbell 2016), and the suggestion that Bayesian approaches might be less vulnerable to these problems (Gelman 2015).

This paper begins with the observation that the constrained maximisation central to model estimation and hypothesis testing may be interpreted as a kind of profit maximisation. The output of estimation is a model that maximises some measure of model fit, subject to costs that may be interpreted as the shadow price of constraints imposed on the model. This approach recalls the observation of Johnstone (1988): "That research workers in applied fields continue to use significance tests routinely may be explained by forces of supply and demand in the market for statistical evidence, where the commodity traded is not so much evidence, but 'statistical significance'."<sup>2</sup>

In mainstream economics, an unsatisfactory market outcome is taken as prima facie evidence of a "market failure", in which prices are not equal to social opportunity costs.<sup>3</sup>

In this paper, we will consider the extent to which the replication crisis, along with broader problems in statistical and econometric practice, may be seen as a failure in the market that generates published research.

### **2. Model Selection as an Optimisation Problem**

In the general case, we consider data (*X*, *y*) where *y* is the variable (or vector of variables) of interest, and *X* is a set of potential explanatory variables. We consider a finite set of models *M*, with typical element *m*. The set of models may be partitioned into classes *M<sub>κ</sub>*, where *κ* = 1, ..., *K*. Typically, although not invariably, lower values of *κ* correspond to more parsimonious and, therefore, more preferred models.

For a given model *m*, the object of estimation is to choose parameters *β*∗ (*m*; *X*, *y*) to maximise a value function *V* (*β*; *X*, *y*), such as log likelihood or explained sum of squares. Define

$$V^{*}\left(m; X, y\right) = \max_{\beta} V\left(\beta; X, y\right) = V\left(\beta^{*}\left(m; X, y\right); X, y\right). \tag{1}$$

The model selection problem is to choose *m* to maximise the global objective function

$$
\Pi\left(m; X, y\right) = V^{*}\left(m; X, y\right) - C\left(m\right),
\tag{2}
$$

where *C* (*m*) is a cost function. Given the interpretation of *V* as a value function and *C* as a cost function, Π may be regarded as a profit function.
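To make this concrete, the following sketch (not from the paper: the data, the linear cost function *C*(*m*) = *c*·|*m*|, and all names are invented for illustration) enumerates candidate linear models, evaluates *V*\*(*m*) as the maximised reduction in the residual sum of squares, and selects the model maximising the "profit" Π(*m*):

```python
import numpy as np
from itertools import chain, combinations

rng = np.random.default_rng(0)

# Illustrative data: y depends on the first two of five candidate regressors.
N, K = 100, 5
X = rng.normal(size=(N, K))
y = X[:, 0] - 0.5 * X[:, 1] + rng.normal(size=N)

def V_star(m, X, y):
    """V*(m): maximised fit for model m (here, the reduction in residual
    sum of squares relative to the null model); m is a set of column indices."""
    rss0 = float(np.sum((y - y.mean()) ** 2))
    if not m:
        return 0.0
    Xm = X[:, sorted(m)]
    beta, *_ = np.linalg.lstsq(Xm, y, rcond=None)
    rss = float(np.sum((y - Xm @ beta) ** 2))
    return rss0 - rss

def profit(m, X, y, c=5.0):
    """Pi(m) = V*(m) - C(m), with a simple cost of c per included parameter
    (one possible cost function; the paper leaves C(m) general)."""
    return V_star(m, X, y) - c * len(m)

# Enumerate all subsets of regressors and pick the profit-maximising model.
models = chain.from_iterable(combinations(range(K), r) for r in range(K + 1))
best = max(models, key=lambda m: profit(set(m), X, y))
print(best)
```

Because the cost term penalises each included regressor, the maximiser trades off fit against parsimony, in the spirit of penalised criteria such as AIC.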

<sup>2</sup> I am indebted to an anonymous referee for pointing out this article, foreshadowing my central point.

<sup>3</sup> Quiggin (2019) extends this analysis to encompass issues such as unemployment and inequality.

### *2.1. Linear Case*

We will confine our attention to linear models with a single variable of interest *y* and *N* observations on a set of *K* potential explanatory variables *X* = (*x*<sub>1</sub>, ..., *x<sub>K</sub>*). The generic model of this form is

$$\begin{aligned} Y & = \mathbf{X}\beta + \varepsilon \\ R\beta & = \upsilon \end{aligned} \tag{3}$$

where

> *Y* is an *N* × 1 vector of observations on *y*;
>
> **X** is an *N* × *K* matrix of observations on *X*;
>
> *β* is a *K* × 1 vector of parameters;
>
> *ε* is an *N* × 1 error term;
>
> *R* is a *J* × *K* matrix of constraints, where *J* < *K* and *R* has full rank.

The special case of ordinary least squares (OLS) is that of *k* unconstrained explanatory variables.

In this case, *J* = *K* − *k*, *υ* = **0**, and *R* = [ **0**<sub>*J*×*k*</sub> **I**<sub>*K*−*k*</sub> ]. The model may be written without explicit constraints as

$$
Y = \mathbf{X}\beta + \varepsilon, \tag{4}
$$

where

> *Y* is an *N* × 1 vector of observations on *y*;
>
> **X** is an *N* × *k* matrix of observations on (*x*<sub>1</sub>, ..., *x<sub>k</sub>*);
>
> *β* is a *k* × 1 vector of parameters;
>
> *ε* is an *N* × 1 error term, distributed i.i.d. *N*(0, *σ*²).
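As a minimal numerical sketch of this special case (data and dimensions invented for illustration), the constraint matrix *R* = [**0** **I**] simply zeroes out the excluded coefficients, so the constrained problem of equation (3) reduces to the unconstrained regression of equation (4) on the first *k* columns:

```python
import numpy as np

rng = np.random.default_rng(1)

# Illustrative setup: K = 4 candidate regressors, of which the model uses
# the first k = 2; the remaining K - k coefficients are constrained to zero.
N, K, k = 50, 4, 2
X_full = rng.normal(size=(N, K))
beta_true = np.array([2.0, -1.0, 0.0, 0.0])
y = X_full @ beta_true + rng.normal(size=N)

# The J x K constraint matrix R = [0  I] selects the last K - k coefficients,
# and R beta = 0 imposes that they vanish.
J = K - k
R = np.hstack([np.zeros((J, k)), np.eye(J)])
assert R.shape == (J, K)

# Imposing R beta = 0 is equivalent to dropping the last K - k columns,
# which gives the unconstrained OLS model of equation (4).
X = X_full[:, :k]
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat)  # estimates should be near the true values (2.0, -1.0)
```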

### *2.2. Value Functions, Cost Functions, and Profit Functions*
