Article

Distributional Replication

School of Economics, University of Sydney, Sydney, NSW 2006, Australia
Entropy 2021, 23(8), 1063; https://doi.org/10.3390/e23081063
Submission received: 9 June 2021 / Revised: 12 August 2021 / Accepted: 14 August 2021 / Published: 17 August 2021
(This article belongs to the Section Multidisciplinary Applications)

Abstract

A function which transforms a continuous random variable such that it has a specified distribution is called a replicating function. We suppose that functions may be assigned a price, and study an optimization problem in which the cheapest approximation to a replicating function is sought. Under suitable regularity conditions, including a bound on the entropy of the set of candidate approximations, we show that the optimal approximation comes close to achieving distributional replication, and close to achieving the minimum cost among replicating functions. We discuss the relevance of our results to the financial literature on hedge fund replication; in this case, the optimal approximation corresponds to the cheapest portfolio of market index options which delivers the hedge fund return distribution.

1. Introduction

Suppose that X and Y are random variables. In this paper we consider estimating a function θ such that θ ( X ) and Y have the same distribution. Such a function is said to be a replicating function. Typically, there are many different replicating functions for a given pair of random variables X and Y. We suppose that to each function θ there corresponds a “price”, denoted p ( θ ) , and we seek to estimate the replicating function θ for which p ( θ ) is as small as possible. That is, we seek to estimate the cheapest replicating function for a given X and Y. To estimate this function from a sample of realizations of X and Y, we first obtain an estimate of the set of all replicating functions. The estimated set is formed by choosing a rich but manageable class of functions (i.e., a sieve space) and taking all those functions θ in that class for which the distance between the empirical distributions of θ ( X ) and Y is small. Our estimate of the cheapest replicating function is then obtained by minimizing p over the estimated set of replicating functions.
Our research is motivated by a literature in applied finance on “hedge fund replication”. The hedge fund replication literature is concerned with the possibility of achieving financial returns that resemble those of a particular hedge fund, fund of hedge funds, or index of hedge funds, by engaging in an investment strategy that does not involve a direct investment in the fund or funds in question. Ideally, the replicating strategy should involve trading assets that are highly liquid, thereby avoiding the barriers to entry, lock-in periods and high fees that are characteristic of hedge fund investments. Several major investment banks have launched hedge fund replication products, including Goldman Sachs and Merrill Lynch in 2006 and J.P. Morgan in 2007 [1]. Hedge fund replication strategies have also attracted the attention of the popular press, with articles appearing in The Wall Street Journal [2] and The New Yorker [3], among other outlets. Simonian and Wu [4] have recently described the proliferation of hedge fund replication strategies in investing as a “cottage industry”.
There are two broad streams of the hedge fund replication literature. In one stream, researchers have considered the direct approximation of hedge fund returns by investing in a portfolio of other assets. By direct approximation, we mean that the returns from the selected portfolio should be close to the hedge fund returns with high probability. Typically, the replicating strategy amounts to estimating a factor model for hedge fund returns, and then investing directly in the factors rather than in the hedge fund. Hasanhodzic and Lo [5] and Simonian and Wu [4] are representative of this stream of research. The second stream of the hedge fund replication literature is concerned with the distributional approximation of hedge fund returns, rather than their direct approximation. The aim here is to create a trading strategy that generates returns with the same statistical distribution as the hedge fund returns. This is a more modest goal than direct approximation, because in any given period the return generated by the replicating strategy need not resemble the return from the hedge fund. Key papers in this stream of the hedge fund replication literature include Amin and Kat [6], Kat and Palaro [7,8], and Kat [1]. The results in this paper concern the approach taken by these authors. Our aim is not to provide statistical methods ready to be applied to data, but rather to develop a mathematical framework for thinking about distributional replication.
Suppose that X represents the payoff after one month from a $1 investment in a market index, while Y represents the payoff after one month from a $1 investment in a hedge fund. Amin and Kat [6] propose to estimate a function θ such that θ(X) and Y have the same distribution function. Given a sample of n realizations of X and Y, their estimated replicating function is θ̂_n = Q̂_nY ∘ F̂_nX, where Q̂_nY is an estimate of Q_Y, the quantile function of Y, and F̂_nX is an estimate of F_X, the distribution function of X. Assuming continuity of F_X, the random variable Q_Y(F_X(X)) has the same distribution as Y, implying that Q_Y ∘ F_X is a replicating function. We might therefore expect θ̂_n(X) and Y to have similar distributions in large samples. The estimated function θ̂_n can be thought of as describing the payoff after one month of a derivative security written on the market index. Under suitable conditions, this payoff can be achieved using a continuously rebalanced self-financed portfolio of market shares and cash, as in the hedging strategy used to justify the celebrated Black-Scholes-Merton option pricing formula [9,10]. We let p(θ) denote the start-up cost of a hedging strategy with payoff θ(X), and refer to this quantity as the price of θ.
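As a concrete sketch of this plug-in estimator, the following Python snippet (our own illustration; the function names are not from the paper) composes the empirical quantile function of a sample of Y with the empirical distribution function of a sample of X:

```python
import numpy as np

def empirical_cdf(sample):
    """Return the empirical distribution function of `sample`."""
    s = np.sort(np.asarray(sample, dtype=float))
    n = len(s)
    return lambda x: np.searchsorted(s, x, side="right") / n

def empirical_quantile(sample):
    """Return the empirical quantile function of `sample`."""
    s = np.sort(np.asarray(sample, dtype=float))
    n = len(s)
    def q(u):
        # Map u in (0, 1] to the ceil(u * n)-th order statistic.
        idx = np.clip(np.ceil(np.asarray(u) * n).astype(int) - 1, 0, n - 1)
        return s[idx]
    return q

def amin_kat_replicator(x_sample, y_sample):
    """Plug-in estimate of Q_Y ∘ F_X from samples of X and Y."""
    F_hat = empirical_cdf(x_sample)
    Q_hat = empirical_quantile(y_sample)
    return lambda x: Q_hat(F_hat(x))
```

Applied to a fresh draw of X, the estimated function produces values whose empirical distribution approximates that of Y, as the text suggests should happen in large samples.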
It need not be the case that p(θ) = 1 when θ is a replicating function. This is because the distributional equivalence of θ(X) and Y does not imply the existence of an arbitrage opportunity when their initial investment costs differ. Indeed, two replicating functions need not have the same price. Amin and Kat [6] aim to estimate the particular replicating function Q_Y ∘ F_X because it is an increasing function of the market payoff X. In Dybvig [11,12] and Beare [13], it is shown under very general conditions that, given a collection of payoff functions that all achieve the same payoff distribution, the cheapest such function must allocate payoffs to states as a nonincreasing function of the state prices. Amin and Kat [6] observe that in a Black-Scholes world, the state price density (with respect to the true probability measure over states) is inversely related to X. Thus, the cheapest replicating function must be a nondecreasing function of X.
A key difference between the approach to distributional replication proposed in this paper, and the approach taken by Amin and Kat [6], is that we do not assume that the cheapest replicating function is nondecreasing. Instead, we search for the cheapest replicating function over a large space of functions, many of which are not monotone. Empirically, there is good reason to believe that the cheapest replicating function will not be monotone. Jackwerth [14] and Brown and Jackwerth [15] argue that the state price density (in their terminology, pricing kernel) implied by S&P500 options with one month to expiry changed dramatically after the stock market crash of 1987, becoming nonmonotone with respect to the return on the S&P500 index. See, in particular, Figure 2 in [15], in which the state price density is an increasing function of the market return for monthly return levels between approximately −3% and 3%, and decreasing elsewhere. Other empirical studies of the relationship between the state price density and market returns have largely confirmed that it is often nonmonotone [16,17,18,19,20,21,22]. See also [23] for a discussion of the relevance of such nonmonotonicity for constructing density forecasts of market returns. If the relationship between the state price density and the market return is not monotone, then the results of Dybvig [11,12] and Beare [13] imply that the cheapest replicating function θ will not be monotone. In this case, the approach to distributional replication taken here is advantageous.
There is a second major conceptual difference between the approach to distributional replication taken here, and the approach taken by Amin and Kat [6]. Amin and Kat propose to implement the desired payoff function θ by engaging in a continuous time hedging strategy, trading market shares and cash. In this paper, we propose to approximate θ by investing in a portfolio of European put and call options written on the market index at various strike prices. The portfolio may also include the market index itself, and risk-free zero-coupon bonds. A key advantage of our approach is that the price of the payoff function θ corresponding to such a portfolio may be calculated directly from observed option and bond prices. By comparison, Amin and Kat price θ by taking the risk neutral expected payoff of θ ( X ) under Black-Scholes conditions, and they require Black-Scholes conditions to hold in order for their hedging strategy to achieve the desired payoff. The empirical limitations of the Black-Scholes pricing model have been extensively documented. We avoid these difficulties by confining ourselves to functions θ for which the market price is directly observable, and which may be implemented in practice by investing directly in a portfolio of actively traded securities.
We embed our approach in the statistical framework of sieve estimation by assuming that the set of strike prices at which options may be traded becomes more dense as the sample size n increases, at a controlled rate. The payoff functions achievable using portfolios of this kind are continuous piecewise linear functions, with kinks at the allowable strike prices. We control the entropy (complexity) of this class of functions using the notion of VC-dimension [24], and are thereby able to bring the machinery of empirical process theory to bear in analyzing the asymptotic properties of our technique. The use of option payoff functions to form the basis for a sieve space is not entirely without precedent. Option payoffs appear as activation functions in the regularized neural network model studied by Corradi and White [25]: take m = 2 in their Equation (4.1). Those authors do not, however, explicitly discuss the connection to option payoffs and portfolio choice.
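A minimal sketch of such a portfolio (our own illustration; all names and arguments are ours): a static position in zero-coupon bonds, the index, and calls at the allowable strikes (puts can be synthesized via put-call parity) has a payoff that is continuous and piecewise linear in the index level, with kinks at the strikes, and its price is read directly off observed quotes.

```python
import numpy as np

def portfolio_payoff(x, bond, index, strikes, call_weights):
    """Payoff at index level x of a static portfolio: `bond` dollars face
    value of zero-coupon bonds, `index` units of the market index, and
    call_weights[i] units of the call struck at strikes[i].  The result
    is continuous and piecewise linear in x, with kinks at the strikes."""
    x = np.asarray(x, dtype=float)
    payoff = bond + index * x
    for k, w in zip(strikes, call_weights):
        payoff = payoff + w * np.maximum(x - k, 0.0)
    return payoff

def portfolio_price(bond, index, call_weights,
                    bond_price, index_price, call_prices):
    """The price of the payoff function is directly observable: it is the
    cost of assembling the portfolio at quoted bond, index and option prices."""
    return (bond * bond_price + index * index_price
            + float(np.dot(call_weights, call_prices)))
```

This is the sense in which confining attention to option portfolios makes p(θ) observable without any pricing model.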
The approach taken by Amin and Kat [6], and in this paper, aims to replicate the univariate distribution of Y. Typically, the joint distribution of θ ( X ) with any other asset payoff will differ from the joint distribution of Y and that asset payoff. In particular, the joint distribution of θ ( X ) and the market payoff X will differ from the joint distribution of Y and X, and for this reason we cannot expect investors to find θ ( X ) to be a perfect substitute for Y in general. Intuitively, if the correlation between X and Y is lower than the correlation between X and θ ( X ) , risk-averse investors may prefer a balanced portfolio formed from X and Y to a similar portfolio formed from X and θ ( X ) . In response to this issue, Kat and Palaro [7,8] extend the approach of Amin and Kat [6] to the replication of bivariate distributions. They introduce a “reserve asset” with payoff Z, and seek to find a bivariate function θ such that the joint distribution of θ ( X , Z ) and X is the same as the joint distribution of Y and X. This replicating payoff function is implemented in practice using a continuously rebalanced portfolio formed by trading market shares, cash, and the reserve asset. We do not follow that approach in this paper, in part because it is generally not feasible to approximate a wide class of bivariate functions using a portfolio formed from options written on individual assets. Confining ourselves to the replication of univariate distributions may not seem unreasonable if we modify our interpretation of the random variable Y. Rather than representing the payoff from a $1 investment in a hedge fund, Y could represent the payoff from a $1 investment in a portfolio partly invested in the hedge fund and partly in the market index. Amin and Kat [6] take this approach in their empirical study of hedge fund efficiency. More generally, Y could be the payoff from a $1 investment in a portfolio formed from any number of arbitrary assets. 
If θ ( X ) has the same distribution as Y, and the price of θ is less than $1, an investor should prefer to invest in the replicating portfolio.
The remainder of this paper is structured as follows. In Section 2 and Section 3 we develop a general approach to the estimation of replicating functions, without explicit reference to the financial application that serves as our motivation. In Section 2 we provide some basic mathematical tools for dealing with the notion of replicating functions, while in Section 3 we discuss the statistical estimation of replicating functions using the method of sieves. In Section 4 we explain how the mathematical material in Section 2 and Section 3 can be applied to the problem of hedge fund replication. Section 5 outlines some areas for future research, and concludes. Throughout the paper, there are several numbered assumptions and propositions. In the statement of each proposition, it should be understood that all assumptions introduced prior to the proposition hold. Proofs of all numbered propositions may be found in Appendix A.

2. Replicating Functions

In this section we formally introduce the notion of a replicating function. We construct a pseudometric on the set of Borel measurable functions mapping the support of one random variable to the support of another, and we define a criterion function that identifies the set of replicating functions. Some useful results relating to these objects are given.
Let X and Y be real valued random variables, and let P_X : B(ℝ) → [0, 1] and P_Y : B(ℝ) → [0, 1] denote the probability measures corresponding to X and Y, where B(ℝ) denotes the usual Borel σ-field on ℝ. Let F_X : ℝ → [0, 1] and F_Y : ℝ → [0, 1] denote the distribution functions of X and Y. Let R_X = cl({x ∈ ℝ : 0 < F_X(x) < 1}), and let R_Y = cl({y ∈ ℝ : 0 < F_Y(y) < 1}); here, cl(A) denotes the Euclidean closure of a set A ⊆ ℝ. The sets R_X and R_Y are intervals of the form [a, b], [a, ∞), (−∞, b] or ℝ, with a, b ∈ ℝ. We place the following condition on F_X and F_Y.
Assumption 1.
F X and F Y are continuous and strictly increasing on R X and R Y respectively.
Assumption 1 is stronger than is required to establish all of the results in this paper, but it will be convenient for us to maintain Assumption 1 throughout. Under Assumption 1, the restriction of F_X to R_X is a continuous and strictly increasing function, and therefore uniquely defines a continuous and strictly increasing inverse function Q_X : F_X(R_X) → R_X. We refer to this function as the quantile function of X. The quantile function of Y, denoted Q_Y : F_Y(R_Y) → R_Y, is defined in the same way. Note that F_X(R_X) and F_Y(R_Y) are equal to (0, 1], [0, 1), [0, 1] or (0, 1), depending on whether X and Y are almost surely bounded above, below, both, or neither.
Let Θ denote the set of all Borel measurable functions θ : R_X → R_Y. Though Θ depends on X and Y, we do not make this dependence explicit in our notation. We are interested in those functions θ ∈ Θ for which θ(X) and Y have the same distribution.
Definition 1.
A function θ ∈ Θ is called a replicating function for X and Y, or simply a replicating function or replicator, if P_X ∘ θ⁻¹(B) = P_Y(B) for all B ∈ B(ℝ).
Note that a replicating function does not describe a relationship between X and Y in the usual sense. θ ( X ) and Y may be perfectly correlated, or independent. All that matters is that they have the same marginal distribution. We will let Θ * denote the set of all replicating functions for X and Y. Again, the dependence of Θ * on X and Y is not made explicit in our notation.
Our first result concerns the cardinality of Θ * .
Proposition 1.
Θ* is uncountably infinite. Moreover, there exists an uncountable subset of Θ* in which no two functions are equal on a set of positive P_X-measure.
Remark 1.
One example of a replicating function is the composition Q_Y ∘ F_X, restricted to R_X. Clearly, if F_X is not continuous and F_Y is continuous, so that Assumption 1 is violated, Θ* is empty.
We will sometimes find it helpful to consider the special case where X and Y are both distributed uniformly on the unit interval. In this case, the composition θ = Q_Y ∘ F_X restricted to R_X = [0, 1] is the identity function, θ(x) = x. Another simple example of a replicating function is θ(x) = 1 − x, restricted to [0, 1]. Graphs of these functions, and of four other replicating functions, are provided in Figure 1. We will let Θ̃ denote the set of all Borel measurable functions θ : [0, 1] → [0, 1], and let Θ̃* denote the set of functions in Θ̃ that are replicators when X, Y ∼ U(0, 1).
As an aid to visualizing the functions in Θ̃*, a reader familiar with the concept of local time may find it helpful to think of each function θ ∈ Θ̃ as a (nonrandom) stochastic process on the unit interval. The functions θ ∈ Θ̃* are precisely those for which the local time at y is equal to one for each y ∈ (0, 1). That is, for θ ∈ Θ̃, we have θ ∈ Θ̃* if and only if

lim_{ε↓0} (1/(2ε)) ∫_0^1 1(|θ(x) − y| ≤ ε) dx = 1

for each y ∈ (0, 1). This can be shown by observing that the above limit is equal to the derivative of the distribution function of θ(X) at y when X ∼ U(0, 1).
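The local time characterization can be checked numerically. The sketch below (our own, with an ε and integration grid chosen purely for illustration) approximates the local time of two uniform-case replicators: the monotone function 1 − x and the nonmonotone sawtooth 2x mod 1, whose graph has two branches of slope 2.

```python
import numpy as np

def local_time(theta, y, eps=1e-4, grid=2_000_000):
    """Riemann-sum approximation of
    (1/(2*eps)) * ∫_0^1 1(|theta(x) - y| <= eps) dx  at the level y."""
    x = (np.arange(grid) + 0.5) / grid        # midpoint grid on [0, 1]
    return np.mean(np.abs(theta(x) - y) <= eps) / (2.0 * eps)

flip = lambda x: 1.0 - x          # a monotone replicator
saw = lambda x: (2.0 * x) % 1.0   # a nonmonotone replicator (two branches)
```

For the sawtooth, each of the two branches contributes an x-interval of length ε around the level y, so the total measure is 2ε and the local time is again one; this is a simple instance of a replicator that is far from monotone.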
We now introduce a pseudometric d on Θ. For θ_0, θ_1 ∈ Θ, let d : Θ × Θ → ℝ be given by

d(θ_0, θ_1) = ∫_{R_X} |F_Y(θ_1(x)) − F_Y(θ_0(x))| dF_X(x).

It is straightforward to verify that d satisfies the four axioms for a pseudometric: nonnegativity, symmetry, the triangle inequality, and the requirement that d(θ, θ) = 0 for all θ ∈ Θ. d is not a metric because we will have d(θ_0, θ_1) = 0 when θ_0 and θ_1 are equal on a set of P_X-measure one, even if the two functions are distinct. Note that when X, Y ∼ U(0, 1), d corresponds to the usual L^1-seminorm for functions on [0, 1]. When X and Y are not uniform, d(θ_0, θ_1) is equal to the L^1 distance between the deformed functions F_Y ∘ θ_0 ∘ Q_X and F_Y ∘ θ_1 ∘ Q_X.
We now introduce a nonnegative function M : Θ → ℝ that is intended to quantify the extent to which a function θ ∈ Θ achieves distributional replication. For θ ∈ Θ, let F_X(·; θ) denote the distribution function of θ(X); that is, for y ∈ ℝ and θ ∈ Θ, let F_X(y; θ) = P_X ∘ θ⁻¹((−∞, y]). Define

M(θ) = ∫_{R_Y} |F_X(y; θ) − F_Y(y)| dF_Y(y).
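In the uniform special case both objects reduce to simple quadratures, since F_X and F_Y are the identity on [0, 1]. A sketch (our own helper names; a fixed midpoint grid stands in for exact integration):

```python
import numpy as np

# Uniform case X, Y ~ U(0,1): d is the L^1 distance between functions on
# [0,1], and M(theta) = ∫_0^1 |F_X(y; theta) - y| dy.

_GRID = 200_000
_X = (np.arange(_GRID) + 0.5) / _GRID   # midpoint grid on [0, 1]

def d_uniform(theta0, theta1):
    """L^1 pseudometric between two functions on [0, 1] (uniform case)."""
    return float(np.mean(np.abs(theta1(_X) - theta0(_X))))

def M_uniform(theta):
    """Replication criterion M(theta) in the uniform case, by quadrature."""
    vals = np.sort(theta(_X))                            # values of theta(X)
    F = np.searchsorted(vals, _X, side="right") / _GRID  # CDF of theta(X)
    return float(np.mean(np.abs(F - _X)))
```

For example, the identity and x ↦ 1 − x are replicators (M vanishes up to discretization), whereas θ(x) = x² is not: θ(X) then has distribution function √y, giving M = ∫_0^1 |√y − y| dy = 1/6.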
Our pseudometric d endows M with a convenient smoothness condition. Specifically, M is Lipschitz continuous with respect to d, with Lipschitz constant no greater than one.
Proposition 2.
For all θ_0, θ_1 ∈ Θ, we have |M(θ_1) − M(θ_0)| ≤ d(θ_0, θ_1).
Our next result concerns the identification of the set of replicators Θ* using the criterion function M. It states that Θ* consists precisely of those functions θ ∈ Θ for which M(θ) = 0.
Proposition 3.
Θ* = {θ ∈ Θ : M(θ) = 0}.
Propositions 2 and 3 jointly imply that, if θ_1, θ_2, … is a sequence of elements of Θ converging to some θ* ∈ Θ* in the pseudometric d, then M(θ_n) → 0 as n → ∞. We would like to interpret this to mean that θ_n gets arbitrarily close to achieving distributional replication as n becomes larger. The next result makes this notion precise.
Proposition 4.
Let θ_1, θ_2, … be a sequence of elements of Θ. Then, as n → ∞, M(θ_n) → 0 if and only if F_X(y; θ_n) → F_Y(y) for each y ∈ ℝ.
Remark 2.
Note that, since F_Y is continuous, pointwise convergence of F_X(·; θ_n) to F_Y(·) is equivalent to the statement P_X ∘ θ_n⁻¹ ⇒ P_Y, where “⇒” denotes weak convergence of probability measures (see e.g., [26]), and P_X ∘ θ_n⁻¹ is the measure on B(ℝ) given by P_X ∘ θ_n⁻¹(B) = P_X({x ∈ R_X : θ_n(x) ∈ B}) for each B ∈ B(ℝ). We could also write this statement as θ_n(X) →_d Y, where “→_d” denotes convergence in distribution in the usual sense.
Our final result of this section is a modification of Proposition 4 that allows θ_1, θ_2, … to be random elements. An obvious first step towards defining such random elements would be to introduce a σ-field on Θ; however, such an approach leads to complications relating to the measurability of θ(X) when θ and X are both random. We will need to require each of the random elements θ_n, n ∈ ℕ, to be a random element of some subspace Θ_n ⊆ Θ. Each subspace Θ_n will be equipped with a σ-field T_n that is well behaved in the following sense.
Definition 2.
Given a collection of functions Θ′ ⊆ Θ, an admissible structure for Θ′ is a σ-field T of subsets of Θ′ such that the evaluation mapping (θ, x) ↦ θ(x) is a measurable map from (Θ′ × R_X, T ⊗ B(R_X)) to (ℝ, B(ℝ)).
Definition 2 is a version of a definition of admissibility given in Section 5.2 of [27]. B(R_X) denotes the Borel σ-field on R_X, while the notation T ⊗ B(R_X) refers to the product σ-field on Θ′ × R_X; that is, the σ-field on Θ′ × R_X generated by sets of the form A × B, with A ∈ T and B ∈ B(R_X). With Definition 2 in hand, we are now in a position to state the final result of this section.
Proposition 5.
Let Θ_1, Θ_2, … be a sequence of subsets of Θ, and for each n ∈ ℕ let T_n be an admissible structure for Θ_n and let P_{θ_n} be a probability measure on T_n. Let P_{θ_n(X)} be the probability measure on B(ℝ) given by P_{θ_n(X)}(B) = P_{θ_n} ⊗ P_X({(θ, x) ∈ Θ_n × R_X : θ(x) ∈ B}) for each B ∈ B(ℝ), where P_{θ_n} ⊗ P_X is the product measure on T_n ⊗ B(R_X). Then, as n → ∞, if ∫_{Θ_n} M dP_{θ_n} → 0 then also P_{θ_n(X)} ⇒ P_Y.
Remark 3.
It is possible to rephrase Proposition 5 in a somewhat less precise fashion that may be easier to interpret. For each n ∈ ℕ, we can think of the measure P_{θ_n} as corresponding to a random function θ_n taking values in Θ_n. The measure P_{θ_n(X)} describes the distribution of θ_n(X) when θ_n and X are both random, and θ_n is independent of X. The statement ∫_{Θ_n} M dP_{θ_n} → 0 can be written as E[M(θ_n)] → 0. Thus, the final statement of Proposition 5 could be written as follows: as n → ∞, if E[M(θ_n)] → 0 then also θ_n(X) →_d Y.

3. Sieve Estimation of Replicating Functions

In this section we turn our attention to the statistical estimation of a replicating function using a sample of observations {(X_i, Y_i) : 1 ≤ i ≤ n}.
Assumption 2.
{X_i : i ∈ ℕ} and {Y_i : i ∈ ℕ} are iid collections of real valued random variables defined on a complete probability space (Ω, F, P). Each X_i has distribution function F_X, and each Y_i has distribution function F_Y.
Remark 4.
The iid condition in Assumption 2 refers to the independence of X_i and X_j, and of Y_i and Y_j, when i ≠ j. X_i and Y_j may be dependent for any i, j.
Remark 5.
The assumption that ( Ω , F , P ) is complete will be useful later when we employ a result due to Stinchcombe and White [28] that provides conditions under which certain real valued functions on Ω are analytic (in the measure-theoretic sense). The interested reader may refer to that paper for the definition of an analytic function. When ( Ω , F , P ) is complete, real valued functions on Ω are analytic if and only if they are measurable.
We wish to use our observed sample {(X_i, Y_i) : 1 ≤ i ≤ n} to construct an estimate of a replicating function that has good properties when n is large. As was made clear in Proposition 1, the set of replicating functions is uncountably infinite in a nontrivial sense. We are thus confronted with the problem of partial identification: the distributional replication property does not uniquely identify the function we are seeking to estimate. The first step in our estimation procedure is to empirically discriminate between those functions that come close to achieving distributional replication, and those that do not. In the previous section, the function M : Θ → ℝ was used to quantify the extent to which a function θ ∈ Θ achieves distributional replication. We will construct an empirical analogue to M. Given a sample of size n, let F_nY : ℝ → [0, 1] denote the empirical distribution function of Y, and for θ ∈ Θ let F_nX(·; θ) : ℝ → [0, 1] denote the empirical distribution function of θ(X). That is,

F_nY(y) = (1/n) Σ_{i=1}^n 1(Y_i ≤ y),    F_nX(y; θ) = (1/n) Σ_{i=1}^n 1(θ(X_i) ≤ y).
Let the function M_n : Θ → ℝ be defined by

M_n(θ) = ∫ |F_nX(y; θ) − F_nY(y)| dF_nY(y) = (1/n) Σ_{i=1}^n |F_nX(Y_i; θ) − F_nY(Y_i)|.

M_n will serve as our empirical analogue to M. Note that we have suppressed the dependence of F_nY, F_nX and M_n on ω ∈ Ω in our notation.
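The empirical criterion transcribes directly into code. The sketch below is our own illustration (names are ours), assuming paired samples of equal size as in the sampling framework above:

```python
import numpy as np

def M_n(theta, x_sample, y_sample):
    """Empirical criterion
    M_n(theta) = (1/n) * sum_i |F_nX(Y_i; theta) - F_nY(Y_i)|."""
    y = np.asarray(y_sample, dtype=float)
    tx = np.sort(theta(np.asarray(x_sample, dtype=float)))
    sy = np.sort(y)
    F_nX = np.searchsorted(tx, y, side="right") / len(tx)  # F_nX(Y_i; theta)
    F_nY = np.searchsorted(sy, y, side="right") / len(sy)  # F_nY(Y_i)
    return float(np.mean(np.abs(F_nX - F_nY)))
```

Note that M_n(θ) can be exactly zero, e.g. when θ is the identity and the two samples coincide, and it is large when θ(X) fails badly to match the distribution of Y (such as a constant θ).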
We would like M_n to serve as a good approximation to M when n is large. Unfortunately, the space Θ is too rich for us to expect M_n to be close to M uniformly over Θ. We shall instead consider the approximation of M by M_n over a more manageable subset of the functions in Θ. We will consider a sequence of such subsets Θ_1 ⊆ Θ_2 ⊆ ⋯, with Θ_n becoming more complex as n grows, but at a slow enough rate to allow the uniform approximation error sup_{θ∈Θ_n} |M_n(θ) − M(θ)| to decay to zero in a suitable sense. Our approach may be regarded as a version of the method of sieve estimation. See [29] for a general discussion of sieve estimation in econometrics.
To control the entropy (complexity) of Θ n , we shall employ the notion of VC-major dimension. VC-major dimension is a characterization of complexity for classes of functions that is related to the notion of VC-dimension for classes of sets.
Definition 3.
Let C be a collection of subsets of ℝ. C is said to shatter a set of points D = {x_1, …, x_d} ⊆ ℝ, d ∈ ℕ, if all 2^d subsets of D can be written as the intersection of D with some set in C. C is said to be a VC-class if, for some d ∈ ℕ, C cannot shatter any set of size d. If C is a VC-class then the VC-dimension of C, written V(C), is defined to be the smallest d ∈ ℕ for which no set of size d is shattered by C. If C is not a VC-class, we set V(C) = ∞.
Definition 3 is standard in the literature on empirical processes; see e.g., Section 2.6.1 in [30]. Building on Definition 3, we define the VC-major dimension of a subset of Θ as follows.
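As a toy illustration of Definition 3 (our own construction), the brute-force check below verifies that the class of half-lines (c, ∞) shatters every singleton but no two-point set, so its VC-dimension is 2. Finitely many representative thresholds suffice, because the intersection of any half-line with a finite point set is a "suffix" of the sorted points.

```python
from itertools import combinations

def shatters(indicator_fns, points):
    """True if every subset of `points` equals the intersection of `points`
    with one of the sets represented by `indicator_fns`."""
    achieved = {frozenset(p for p in points if f(p)) for f in indicator_fns}
    needed = {frozenset(c) for r in range(len(points) + 1)
              for c in combinations(points, r)}
    return needed <= achieved

def halflines(points):
    """Representative sets (c, ∞): one threshold just below each point,
    plus one above all of them, realizing every possible intersection."""
    cs = [p - 1e-9 for p in points] + [max(points) + 1.0]
    return [lambda x, c=c: x > c for c in cs]
```

The failure on two points is immediate: if x_1 < x_2 and x_1 > c, then x_2 > c as well, so the subset {x_1} can never be picked out.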
Definition 4.
Consider a collection of functions Θ′ ⊆ Θ. A subset of ℝ is said to be majorized by Θ′ if it can be written as {x ∈ R_X : θ(x) > c} for some θ ∈ Θ′ and some c ∈ ℝ. Let C denote the collection of all sets majorized by Θ′. We say that Θ′ is a VC-major class if C is a VC-class. The VC-major dimension of Θ′, written V(Θ′), is defined to be the VC-dimension of C.
Remark 6.
The definition of VC-major dimension should not be confused with that of VC-subgraph dimension, which also appears frequently in the empirical process literature; in general, the two are different. When Θ′ is the set of indicator functions of a collection of sets C, the VC-major dimension and VC-subgraph dimension of Θ′ are both equal to the VC-dimension of C. Sections 2.6.2 and 2.6.4 in [30] provide discussions of VC-subgraph and VC-major classes respectively.
We will control the entropy of the spaces Θ_n by bounding the growth rate of their VC-major dimension. In addition, we will need to introduce some additional technical conditions to ensure the measurability of certain real valued functions on Ω. For Θ′ ⊆ Θ, let B(Θ′) denote the Borel σ-field on Θ′ induced by the pseudometric d.
Assumption 3.
For each n ∈ ℕ, Θ_n ⊆ Θ is a nonempty VC-major class. Further, B(Θ_n) is an admissible structure on Θ_n, and (Θ_n, B(Θ_n)) is a Souslin measurable space.
Remark 7.
Refer to Stinchcombe and White [28] for the definition of a Souslin measurable space, and further discussion. Here, we note only that for ( Θ n , B ( Θ n ) ) to be a Souslin measurable space, it suffices that ( Θ n , d ) is a Polish metric space; that is, ( Θ n , d ) is a metric space that is topologically isomorphic to a complete separable metric space.
The following result shows how the uniform approximation error sup_{θ∈Θ_n} |M_n(θ) − M(θ)| relates to V(Θ_n).
Proposition 6.
As n → ∞, we have E[sup_{θ∈Θ_n} |M_n(θ) − M(θ)|] = O(√(V(Θ_n)/n)).
Remark 8.
In the proof of Proposition 6 it is established that sup_{θ∈Θ_n} |M_n(θ) − M(θ)| is a measurable function from (Ω, F) to (ℝ, B(ℝ)). Thus, our statement of Proposition 6 uses the ordinary expectation operator. It is common in the empirical process literature to see results of this kind expressed in terms of outer expectation; see e.g., Section 1.2 in [30].
Proposition 6 indicates that, if V(Θ_n) = o(n), then when n is large we can use the empirical criterion function M_n to distinguish between those functions in Θ_n that are close to achieving distributional replication, and those that are not. We have yet to address the issue of partial identification: there may be many functions in Θ_n that are close to achieving distributional replication. We wish to entertain the possibility that not all replicating functions are created equal. Let p : Θ → ℝ be a function describing the “price” of each function θ ∈ Θ. Rather than seeking to estimate an arbitrary replicating function, we will seek to estimate a replicating function θ for which p(θ) is as small as possible.
Assumption 4.
The function p : Θ → ℝ is nonnegative, and continuous with respect to d.
Loosely speaking, we seek to estimate the cheapest, or optimal, replicating function. The following result concerns the selection of our estimated function θ ^ n . In it, we make the random nature of M n explicit by writing M n as a function of both ω Ω and θ Θ .
Proposition 7.
Let ϵ_1, ϵ_2, … and λ_1, λ_2, … be sequences of positive real numbers. For each n ∈ ℕ, there exists a measurable function θ̂_n from (Ω, F) to (Θ_n, B(Θ_n)) that satisfies θ̂_n(ω) ∈ Θ̂_n*(ω) and p(θ̂_n(ω)) ≤ inf_{θ∈Θ̂_n*(ω)} p(θ) + ϵ_n for all ω ∈ Ω, where

Θ̂_n*(ω) = {θ ∈ Θ_n : M_n(ω, θ) ≤ inf_{ϑ∈Θ_n} M_n(ω, ϑ) + λ_n}.
Remark 9.
The mathematical content of Proposition 7 is the existence of a random function θ̂_n satisfying the stated conditions. The proof applies the Sainte-Beuve measurable selection theorem (see Corollary 5.3.2 in [27]) and Theorem 2.17 of Stinchcombe and White [28], which concerns the measurability of the suprema of random functions over random sets. Proposition 7 also serves to define our estimated replicating function θ̂_n. That is, we take θ̂_n to be any random function satisfying the conditions given in Proposition 7.
Remark 10.
The random set Θ̂_n* can be viewed as our estimate of the set of replicators Θ*. It consists of all those functions θ ∈ Θ_n such that M_n(θ) comes close to achieving its infimum over Θ_n. Note that this infimum is not necessarily achieved by any θ ∈ Θ_n. The tuning parameter λ_n governs how close M_n(θ) must be to inf_{ϑ∈Θ_n} M_n(ϑ) before θ is admitted into the set Θ̂_n*. We will require that λ_n → 0 as n → ∞, but at a rate that is not too fast. θ̂_n is chosen such that θ̂_n(ω) ∈ Θ̂_n*(ω) for each ω ∈ Ω. Thus, if Θ̂_n* is an effective estimator of Θ*, we can expect θ̂_n to come close to achieving distributional replication.
Remark 11.
The sequence ϵ_1, ϵ_2, … should be thought of as converging to zero very quickly. We would like to choose θ̂_n such that p(θ̂_n(ω)) equals the infimum of p over Θ̂_n^*(ω) for each ω ∈ Ω, but in general this is not possible because the set Θ̂_n^*(ω) need not be compact. Instead, we choose θ̂_n such that p(θ̂_n(ω)) is within an arbitrarily small approximation error ϵ_n of the infimum of p over Θ̂_n^*(ω). This device corresponds closely to what Chen [29] (p. 5561) refers to as an approximate sieve extremum estimate. Though λ_n and ϵ_n appear to play similar roles in Proposition 7, from a more substantive perspective we wish ϵ_n to be as small as possible, while λ_n plays a more involved role in the asymptotic results to follow and must be chosen to converge to zero at a suitable rate.
Remark 12.
If there is no relevant notion of “price” over the space of functions Θ, we may simply take p to be constant over Θ. In this case, the sequence ϵ_1, ϵ_2, … and the function p play no role in Proposition 7, which then merely asserts the existence of a measurable function θ̂_n from (Ω, F) to (Θ_n, B(Θ_n)) satisfying θ̂_n(ω) ∈ Θ̂_n^*(ω) for each ω ∈ Ω.
It remains to show that our estimator θ ^ n has desirable asymptotic properties. To ensure that θ ^ n is well-behaved, the rate at which the sieve space Θ n expands, and at which the tuning parameter λ n decays, must be suitably controlled. The following assumption provides a sufficient condition of this kind.
Assumption 5.
As n → ∞, we have λ_n → 0, n^{-1} λ_n^{-2} V(Θ_n) → 0 and λ_n^{-1} inf_{θ ∈ Θ_n} d(θ, θ̄) → 0 for each θ̄ ∈ Θ̄, where Θ̄ is some dense subset of Θ^* under d.
Remark 13.
The requirement that n^{-1} λ_n^{-2} V(Θ_n) → 0 and λ_n^{-1} inf_{θ ∈ Θ_n} d(θ, θ̄) → 0 for each θ̄ ∈ Θ̄ places opposing constraints on the rate of expansion of Θ_n as n → ∞. The complexity of Θ_n must increase sufficiently fast for the sieve approximation error inf_{θ ∈ Θ_n} d(θ, θ̄) to tend to zero faster than λ_n for each θ̄ ∈ Θ̄, but not so fast that V(Θ_n) increases faster than n λ_n^2. On the other hand, the rate of decay of λ_n may be arbitrarily slow, provided that λ_n → 0.
Our final result of this section indicates that, when the above assumptions are satisfied, in large samples we can expect our estimated function to be close to achieving distributional replication, and close to achieving the minimum cost among replicators. We first require some additional notation. Let P_{θ̂_n} be the probability measure on B(Θ_n) given by P_{θ̂_n}(B) = P(θ̂_n^{-1}(B)) for each B ∈ B(Θ_n), and let P_{θ̂_n(X)} be the probability measure on B(R) given by P_{θ̂_n(X)}(B) = (P_{θ̂_n} ⊗ P_X){(θ, x) ∈ Θ_n × R_X : θ(x) ∈ B} for each B ∈ B(R). Note that for P_{θ̂_n(X)} to be well defined we need B(Θ_n) to be an admissible structure for Θ_n; this condition was given in Assumption 3. We can think of P_{θ̂_n(X)} as the probability distribution of θ̂_n(X) when θ̂_n and X are distributed independently of one another.
Proposition 8.
As n → ∞, P_{θ̂_n(X)} ⇒ P_Y and P{ω : p(θ̂_n(ω)) > inf_{θ ∈ Θ^*} p(θ) + ε} → 0 for any ε > 0.
Remark 14.
Proposition 8 indicates that θ ^ n can be expected to perform well with respect to the dual goals of distributional replication and cost minimization in large samples. This duality complicates any discussion of the optimal selection of the tuning parameter λ n . When λ n is large, we include functions in our estimated set Θ ^ n * for which the empirical evidence for distributional replication is weaker, but we also minimize the function p over a larger set. In applications, the best choice of λ n would depend on an individual’s relative preference for distributional replication, quantified by M ( θ ) , and cost minimization, quantified by p ( θ ) .

4. Distributional Replication Using Options

In this section we consider the problem of choosing a portfolio of options on some financial asset such that the payoff from our portfolio after a specified period of time has approximately the same statistical distribution as the payoff from a $1 investment in some other asset over the same time period. We would like to find the cheapest portfolio of options such that distributional replication is achieved; in particular, we would like the cost of the portfolio to be $1 or less. We will show how this problem of portfolio selection can be interpreted and solved using the machinery developed in the previous two sections.
We suppose that the random variables X and Y represent the dollar-denominated payoffs after one period from a $1 investment in each of two assets. The asset with payoff X will be referred to as the base asset, and the asset with payoff Y as the target asset. The price of a one-share investment in either asset is taken to be $1. We assume that X and Y are nonnegative and may be arbitrarily large with nonzero probability, so that R_X = R_Y = [0, ∞) under Assumption 1. We may thus replace Assumption 1 with the following more restrictive condition.
Assumption 6.
F_X and F_Y are continuous and strictly increasing on [0, ∞), and zero on (−∞, 0].
We find the payoff distribution of the target asset desirable, but we seek to achieve this distribution by investing in a portfolio composed of the base asset itself and a basket of European put and call options written on the base asset, with the options expiring after one period. The payoff of such a portfolio after one period is a nonrandom function of X; for instance, the payoff from a European call option with strike price s after one period is max{0, X − s}, while the payoff from a European put option with strike price s after one period is max{0, s − X}. We also allow our portfolio to include an investment in risk-free zero-coupon bonds with $1 par value, expiring after one period. The payoff from such a bond after one period is simply $1. We allow our portfolio to include long or short positions in each of the component assets, but the payoff from the complete portfolio must be nonnegative.
The payoff from a portfolio of options and bonds after one period is a nonrandom function of X. Thus, we can think of a portfolio as a function θ ∈ Θ, and write the payoff from the portfolio as θ(X). Suppose our portfolio includes options at m different strike prices s_1, …, s_m, with 0 < s_1 < ⋯ < s_m < ∞. Without loss of generality, we may consider all options to be call options, since the payoff function for a put option with strike price s_i can be replicated by selling one share of the base asset, purchasing a call option with strike price s_i, and purchasing s_i zero-coupon bonds. Suppose we form a portfolio by purchasing β_1 bonds, β_2 shares in the base asset, and β_{i+2} call options at strike price s_i, for i = 1, …, m. The payoff function corresponding to our portfolio is then given by
θ(x; β, s) = β_1 + β_2 x + ∑_{i=1}^{m} β_{i+2} max{0, x − s_i},
where x ∈ [0, ∞). For fixed s = (s_1, …, s_m), the collection of functions {θ(·; β, s) : β ∈ R^{m+2}} consists of all the continuous functions from [0, ∞) to R that are linear on each of the m + 1 subintervals (0, s_1), (s_1, s_2), …, (s_m, ∞). To ensure that the payoff from our portfolio is nonnegative, we require that β lie in a suitable subset of R^{m+2}. We will let Ψ_m(s) denote the collection of all continuous functions from [0, ∞) to [0, ∞) that are linear on each of the m + 1 subintervals (0, s_1), (s_1, s_2), …, (s_m, ∞), and let B(Ψ_m(s)) denote the Borel σ-field on Ψ_m(s) generated by d.
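As a concrete check on the payoff algebra above, the following sketch evaluates θ(·; β, s) and verifies the put-replication identity used in the text (selling one share, buying one call at strike s_i, and buying s_i bonds reproduces the put payoff); the numbers are hypothetical:

```python
import numpy as np

def payoff(x, beta, s):
    """theta(x; beta, s) = beta_1 + beta_2 x + sum_i beta_{i+2} max(0, x - s_i)."""
    x = np.asarray(x, dtype=float)
    out = beta[0] + beta[1] * x                   # bonds + shares
    for b, s_i in zip(beta[2:], s):
        out = out + b * np.maximum(0.0, x - s_i)  # calls at strike s_i
    return out

# Put at strike s_i  ==  s_i bonds - 1 share + 1 call at s_i.
s_i = 3.0
x = np.linspace(0.0, 10.0, 201)
put = np.maximum(0.0, s_i - x)
synthetic = payoff(x, beta=[s_i, -1.0, 1.0], s=[s_i])
assert np.allclose(put, synthetic)
```

The identity holds pointwise: for x < s_i the synthetic position pays s_i − x, and for x ≥ s_i the share and call legs cancel against the bonds, paying zero.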
Proposition 9.
For fixed s ∈ R^m with 0 < s_1 < ⋯ < s_m, we have (i) V(Ψ_m(s)) = m + 3; (ii) (Ψ_m(s), d) is a Polish metric space; and (iii) B(Ψ_m(s)) is an admissible structure for Ψ_m(s).
We can see from Proposition 9 and Remark 7 that Ψ_m(s) satisfies the conditions placed on Θ_n in Assumption 3. The main idea behind the application discussed in this section is that Ψ_m(s), the space of nonnegative payoff functions achievable using strike prices s, can play the role of the sieve space Θ_n described in the previous section. We obtain an expanding sequence of sieve spaces by assuming that the collection of strike prices s varies with the sample size n, becoming more dense (in a sense soon to be made precise) as n increases. Suppose that m_1, m_2, … is a nondecreasing sequence of natural numbers with m_n → ∞ and m_n/n → 0 as n → ∞. Let {s_{i,n} : i = 1, …, m_n; n ∈ N} be a triangular array of positive real numbers satisfying (i) 0 < s_{1,n} < ⋯ < s_{m_n,n} for each n ∈ N, and (ii) {s_{1,n}, …, s_{m_n,n}} ⊆ {s_{1,n+1}, …, s_{m_{n+1},n+1}} for each n ∈ N. We define our expanding sequence of sieve spaces by setting Θ_n = Ψ_{m_n}(s_{1,n}, …, s_{m_n,n}). Proposition 9 implies that this choice of Θ_n satisfies Assumption 3, with V(Θ_n) = m_n + 3.
In the context of the present application, the function p introduced in the previous section describes, literally, the price of each payoff function θ ∈ Θ. For a payoff function θ ∈ Θ_n, we can calculate the price p(θ) directly from the prices of bonds and options. Consider the function θ(x; β, s) = β_1 + β_2 x + ∑_{i=1}^{m} β_{i+2} max{0, x − s_i} defined earlier. Let p_1 denote the price of a bond, p_2 the price of a share in the base asset, and p_{i+2} the price of a call option with strike price s_i, for i = 1, …, m. Note that p_2 = 1 by assumption. The price of θ(·; β, s) is simply ∑_{i=1}^{m+2} p_i β_i. In this way we can calculate p(θ) for any θ ∈ Θ_n, provided we observe the bond price p_1 and the prices of call options at strike prices s_{1,n}, …, s_{m_n,n}.
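The price of a portfolio is then the same linear combination applied to observed quotes. A minimal sketch follows; the quoted prices are hypothetical, with p_2 = 1 reflecting the $1-per-share normalization in the text:

```python
def portfolio_price(beta, prices):
    """p(theta) = sum_i p_i beta_i over bond, share, and call components."""
    assert len(beta) == len(prices)
    return sum(p * b for p, b in zip(prices, beta))

# Hypothetical quotes: bond p_1 = 0.95, share p_2 = 1.00 (by assumption),
# and calls at two strikes priced 0.20 and 0.08.
prices = [0.95, 1.00, 0.20, 0.08]
beta = [0.5, 0.3, 1.0, -0.5]   # long bonds, shares, first call; short second call
cost = portfolio_price(beta, prices)
```

Because p is linear in β, minimizing it over the estimated replicator set is a well-posed (if nonsmooth) optimization over portfolio weights.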
Assumption 5 imposes a condition on the rate of decay of the sieve approximation error: we require that λ_n^{-1} inf_{θ ∈ Θ_n} d(θ, θ̄) → 0 for each θ̄ ∈ Θ̄, where Θ̄ is some dense subset of Θ^* under d. The following result shows how Θ̄ may be chosen such that this condition is satisfied when our sieve space corresponds to portfolios of options.
Proposition 10.
Let Θ̄ denote the set of all functions θ̄ ∈ Θ^* such that F_Y ∘ θ̄ ∘ Q_X is Lipschitz continuous. Then Θ̄ is dense in Θ^* under d. Also, when Θ_n = Ψ_{m_n}(s_{1,n}, …, s_{m_n,n}), as n → ∞ we have
inf_{θ ∈ Θ_n} d(θ, θ̄) = O( m_n [ sup_{0 ≤ i ≤ m_n} P_X((s_{i,n}, s_{i+1,n})) ]^2 )
for each θ̄ ∈ Θ̄, where s_{0,n} = 0 and s_{m_n+1,n} = ∞.
Proposition 10 reveals that our sequence of sieve spaces constructed using option payoffs can approximate replicating functions satisfying a deformed Lipschitz condition, provided that sup_{0 ≤ i ≤ m_n} P_X((s_{i,n}, s_{i+1,n})) decays to zero at a suitable rate. Further, that set of deformed Lipschitz continuous replicating functions is dense in the set of all replicating functions. If we could choose our strike prices such that P_X((s_{i,n}, s_{i+1,n})) was constant across i = 0, …, m_n, we would have inf_{θ ∈ Θ_n} d(θ, θ̄) = O(m_n^{-1}) for each θ̄ ∈ Θ̄.
Proposition 10 and part (i) of Proposition 9 show how the choice of strike prices is constrained by Assumption 5. Specifically, the conditions on Θ_n imposed by Assumption 5 may be rewritten as follows: n^{-1} λ_n^{-2} m_n → 0 and λ_n^{-1} m_n [sup_{0 ≤ i ≤ m_n} P_X((s_{i,n}, s_{i+1,n}))]^2 → 0 as n → ∞. If our strike prices are chosen such that P_X((s_{i,n}, s_{i+1,n})) is constant across i = 0, …, m_n, Assumption 5 will be satisfied provided that λ_n = o(1), m_n = o(n λ_n^2) and m_n^{-1} = o(λ_n). For instance, we could choose m_n ∼ n^a and λ_n ∼ n^{-b}, with 0 < b < a < 1 − 2b. As noted in Remark 14, it is difficult to see how an optimal choice of m_n and λ_n could be made in practice, because the two parameters may affect the twin criterion functions M(θ) and p(θ) differently, and one's relative preference between those two criteria may be idiosyncratic. It is perhaps best to experiment with a range of values for m_n and λ_n. Further, the choice of strike prices is likely to be constrained by the strike prices actively traded on the market.
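Under the equal-mass scheme just described, the rate conditions can be implemented mechanically. The sketch below is illustrative only: the exponents a = 0.35 and b = 0.2 are one admissible choice satisfying 0 < b < a < 1 − 2b, the lognormal sample is a stand-in for base-asset payoff data, and the strikes are placed at empirical quantiles of X so that the cells (s_{i,n}, s_{i+1,n}) carry roughly equal P_X-mass:

```python
import numpy as np

def tuning(n, a=0.35, b=0.2):
    """m_n ~ n^a options and lambda_n ~ n^(-b), with 0 < b < a < 1 - 2b."""
    assert 0 < b < a < 1 - 2 * b
    return int(np.ceil(n ** a)), n ** (-b)

def quantile_strikes(x_sample, m):
    """m strikes splitting the empirical distribution of X into m + 1 cells
    of approximately equal mass."""
    probs = np.arange(1, m + 1) / (m + 1)
    return np.quantile(x_sample, probs)

rng = np.random.default_rng(1)
n = 2000
m_n, lam_n = tuning(n)
strikes = quantile_strikes(rng.lognormal(size=n), m_n)
```

In practice the quantile grid would be snapped to the nearest strikes actually traded, as the final sentence above cautions.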

5. Conclusions

In this paper we have developed a mathematical framework for thinking about the estimation of a function θ such that θ ( X ) has the same distribution as Y. We have discussed the relevance of our results to financial applications in which one seeks to find the cheapest way to achieve a desired payoff distribution by trading liquid assets. We now briefly discuss two possible extensions of our results that may prove fruitful.
In terms of the relevance of our technical conditions in financial applications, the elephant in the room is clearly Assumption 2, which imposes an iid condition on the random variables (X_i, Y_i), i = 1, …, n. It is certainly the case that time series of financial returns typically do not behave as though they were distributed independently over time, as is clear from the voluminous literature on stochastic volatility. The iid condition comes into play in the proof of Proposition 6, in which results in empirical process theory are used to establish a uniform bound on the error in the approximation of M by M_n over our sieve space Θ_n. The results we apply are based on iid conditions, but generalizations suitable for dependent data are available [31,32,33]. It seems likely that, with some strengthening of the rate conditions in Assumption 5, the results in this paper could be adapted to allow for dependent data. However, by allowing for the possibility of serial dependence, a further question is raised. The methods we have proposed are designed such that the unconditional distribution of θ(X) is approximately equal to the unconditional distribution of Y. If data are serially dependent, the more relevant objects may be distributions that are conditional on past information. Though we acknowledge the importance of this issue, it goes beyond the scope of this paper.
A second potential extension of our results would be to consider the replication of multivariate distributions. As discussed in the introduction, Kat and Palaro [7,8] consider estimating a transformation θ of a pair of random variables X and Z such that X and θ ( X , Z ) have the same joint distribution as X and Y. The difficulty with adapting our own method to this approach is that the class of bivariate functions that can be approximated by portfolios of options written on individual assets is rather small. One possible solution would be to consider portfolios formed from derivative securities that are written on multiple underlying assets; another would be to forgo exact distributional replication, and seek the closest distributional match from a smaller class of multivariate payoff functions that is approximable using portfolios of simple options. We leave these possibilities for future research.

Funding

This research received no external funding.

Data Availability Statement

Data is contained within the article.

Conflicts of Interest

The author declares no conflict of interest.

Appendix A. Proofs

Proof of Proposition 1.
Choose a point c ∈ (0, 1), and let θ̃_c : [0, 1] → (0, 1) be defined by θ̃_c(u) = u/c for u ∈ (0, c), θ̃_c(u) = (u − c)/(1 − c) for u ∈ (c, 1), and θ̃_c(u) = 1/2 for u ∈ {0, c, 1}. Let θ_c : R_X → R_Y be defined by θ_c(x) = Q_Y(θ̃_c(F_X(x))). F_X is continuous under Assumption 1, so F_X(X) ∼ U(0, 1). Hence, for any a ∈ (0, 1), we have
P_X{x : θ̃_c(F_X(x)) ≤ a} = ∫_0^1 1{θ̃_c(u) ≤ a} du = ∫_0^1 1{u ≤ ac or c ≤ u ≤ c + a − ac} du = a,
implying that θ̃_c(F_X(X)) ∼ U(0, 1). Therefore, θ_c(X) ∼ Y, and so θ_c is a replicating function for X and Y. If we choose c_0, c_1 ∈ (0, 1) with c_0 ≠ c_1, then θ̃_{c_0}(u) ≠ θ̃_{c_1}(u) for u ∉ {0, c_0, c_1, 1}. Since Q_Y is strictly increasing under Assumption 1, it follows that θ_{c_0}(x) ≠ θ_{c_1}(x) for x ∉ {Q_X(0), Q_X(c_0), Q_X(c_1), Q_X(1)}, and so continuity of F_X implies that θ_{c_0}(x) ≠ θ_{c_1}(x) for all x in a set of P_X-measure one. Thus, by allowing c to vary over (0, 1), we obtain an uncountable collection of replicating functions, no two of which are equal on a set of positive P_X-measure. □
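The two-piece construction in this proof is easy to check by simulation. The sketch below (the sample size and evaluation grid are arbitrary) confirms numerically that θ̃_c maps a uniform variate to another uniform variate for several values of c, which is the heart of the argument:

```python
import numpy as np

def theta_tilde(u, c):
    """Two-piece map from the proof: u/c on (0, c), (u - c)/(1 - c) on (c, 1)."""
    u = np.asarray(u, dtype=float)
    return np.where(u < c, u / c, (u - c) / (1.0 - c))

rng = np.random.default_rng(2)
u = rng.uniform(size=100_000)
for c in (0.25, 0.5, 0.8):
    v = theta_tilde(u, c)
    # Empirical CDF of v should match that of U(0, 1) up to sampling error.
    grid = np.linspace(0.05, 0.95, 19)
    gap = max(abs((v <= g).mean() - g) for g in grid)
    assert gap < 0.01
```

Each choice of c yields a different function achieving the same output distribution, illustrating why replicating functions are only partially identified.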
Proof of Proposition 2.
For θ_0, θ_1 ∈ Θ we can use the triangle inequality to show that
|M(θ_1) − M(θ_0)| ≤ ∫_{R_Y} |F_X(y; θ_1) − F_X(y; θ_0)| dF_Y(y).
For each y ∈ R_Y we have
|F_X(y; θ_1) − F_X(y; θ_0)| = |P_X{x : θ_1(x) ≤ y} − P_X{x : θ_0(x) ≤ y}| = |P_X{x : θ_1(x) ≤ y < θ_0(x)} − P_X{x : θ_0(x) ≤ y < θ_1(x)}|,
and so applying the triangle inequality again we obtain
|M(θ_1) − M(θ_0)| ≤ ∫_{R_Y} P_X{x : θ_1(x) ≤ y < θ_0(x)} dF_Y(y) + ∫_{R_Y} P_X{x : θ_0(x) ≤ y < θ_1(x)} dF_Y(y).
Tonelli's theorem implies that
∫_{R_Y} P_X{x : θ_1(x) ≤ y < θ_0(x)} dF_Y(y) = ∫_{R_X} ∫_{R_Y} 1{θ_1(x) ≤ y < θ_0(x)} dF_Y(y) dF_X(x).
The distribution function F_Y is continuous under Assumption 1, and so we have
∫_{R_Y} 1{θ_1(x) ≤ y < θ_0(x)} dF_Y(y) = max{F_Y(θ_0(x)) − F_Y(θ_1(x)), 0}
for each x ∈ R_X. Therefore,
∫_{R_Y} P_X{x : θ_1(x) ≤ y < θ_0(x)} dF_Y(y) = ∫_{R_X} max{F_Y(θ_0(x)) − F_Y(θ_1(x)), 0} dF_X(x).
Similarly, we have
∫_{R_Y} P_X{x : θ_0(x) ≤ y < θ_1(x)} dF_Y(y) = ∫_{R_X} max{F_Y(θ_1(x)) − F_Y(θ_0(x)), 0} dF_X(x),
and so
|M(θ_1) − M(θ_0)| ≤ ∫_{R_X} |F_Y(θ_1(x)) − F_Y(θ_0(x))| dF_X(x) = d(θ_0, θ_1). □
Proof of Proposition 3.
It is obvious that M(θ) = 0 if θ ∈ Θ^*. We will prove the reverse implication. Suppose M(θ) = 0. Then F_X(y; θ) = F_Y(y) for all y in a set of P_Y-measure one. Suppose F_X(c_0; θ) ≠ F_Y(c_0) for some c_0 in the interior of R_Y. Right continuity of F_X(·; θ) and F_Y ensures that F_X(y; θ) ≠ F_Y(y) for all y in some open interval (c_0, c_1). Since F_Y is strictly increasing on R_Y under Assumption 1, (c_0, c_1) must have strictly positive P_Y-measure, leading to a contradiction. Thus it must be the case that F_X(y; θ) = F_Y(y) for all y in the interior of R_Y. Since F_X(·; θ) is nondecreasing and takes values between zero and one, and F_Y increases continuously from zero to one over R_Y, it follows that F_X(y; θ) = F_Y(y) for all y ∈ R, so that θ ∈ Θ^*. □
Proof of Proposition 4.
If F_X(y; θ_n) → F_Y(y) for each y ∈ R, then M(θ_n) → 0 by dominated convergence. Suppose F_X(c_0; θ_n) does not converge to F_Y(c_0) for some c_0 ∈ R. Then there must be an increasing sequence of natural numbers n_1, n_2, … and a real number ε > 0 (or ε < 0) such that F_X(c_0; θ_{n_k}) ≥ F_Y(c_0) + ε (resp. F_X(c_0; θ_{n_k}) ≤ F_Y(c_0) + ε) for all k sufficiently large. Suppose ε > 0. Since F_Y is continuous under Assumption 1, we may choose c_1 > c_0 such that F_Y(c_1) = F_Y(c_0) + ε/2. Monotonicity of F_X(·; θ_{n_k}) and F_Y then ensures that F_X(y; θ_{n_k}) ≥ F_Y(y) + ε/2 for all y ∈ [c_0, c_1], for all k sufficiently large. Consequently, we have
∫ |F_X(y; θ_{n_k}) − F_Y(y)| dF_Y(y) ≥ (ε/2) ∫_{c_0}^{c_1} dF_Y(y) = (ε/2)(F_Y(c_1) − F_Y(c_0)) = ε^2/4 > 0
for all k sufficiently large, implying that M(θ_n) does not converge to 0. □
Proof of Proposition 5.
Since (θ, x) ↦ θ(x) is T_n ⊗ B(R_X)-measurable, it follows that (θ, x, y) ↦ 1{θ(x) ≤ y} is T_n ⊗ B(R_X) ⊗ B(R)-measurable. Tonelli's theorem thus implies that (θ, y) ↦ ∫_{R_X} 1{θ(x) ≤ y} dP_X(x) = F_X(y; θ) is T_n ⊗ B(R)-measurable, justifying the following interchange of integrals:
∫_{Θ_n} M dP_{θ_n} = ∫_{Θ_n} ∫_{R_Y} |F_X(y; θ) − F_Y(y)| dF_Y(y) dP_{θ_n}(θ) = ∫_{R_Y} ∫_{Θ_n} |F_X(y; θ) − F_Y(y)| dP_{θ_n}(θ) dF_Y(y).
We thus have
∫_{Θ_n} M dP_{θ_n} ≥ ∫_{R_Y} | ∫_{Θ_n} F_X(y; θ) dP_{θ_n}(θ) − F_Y(y) | dF_Y(y).
Again using the T_n ⊗ B(R)-measurability of (θ, y) ↦ F_X(y; θ), Tonelli's theorem implies that
∫_{Θ_n} F_X(y; θ) dP_{θ_n}(θ) = ∫_{Θ_n} ∫_{R_X} 1{θ(x) ≤ y} dP_X(x) dP_{θ_n}(θ) = P_{θ_n(X)}((−∞, y])
for each y ∈ R. Letting F_{θ_n(X)} denote the cdf corresponding to P_{θ_n(X)}, we now have
∫_{Θ_n} M dP_{θ_n} ≥ ∫_{R_Y} |F_{θ_n(X)}(y) − F_Y(y)| dF_Y(y).
Arguing as in the proof of Proposition 4, we can show that ∫_{R_Y} |F_{θ_n(X)}(y) − F_Y(y)| dF_Y(y) → 0 if and only if F_{θ_n(X)}(y) → F_Y(y) for each y ∈ R. Thus, ∫_{Θ_n} M dP_{θ_n} → 0 implies F_{θ_n(X)} → F_Y pointwise, which is equivalent to P_{θ_n(X)} ⇒ P_Y. □
Proof of Proposition 6.
Elementary arguments can be used to show that, for all θ ∈ Θ,
|M_n(θ) − M(θ)| ≤ ∫ |F_n^Y(y) − F_Y(y)| dF_n^Y(y) + ∫ |F_n^X(y; θ) − F_X(y; θ)| dF_n^Y(y) + | ∫ |F_X(y; θ) − F_Y(y)| dF_n^Y(y) − ∫ |F_X(y; θ) − F_Y(y)| dF_Y(y) |.    (A1)
We will establish a uniform bound on the order of each of the three terms on the right-hand side of (A1). These bounds will be expressed in terms of the outer expectation operator E^*, denoting outer integration of nonnegative functions defined on the underlying probability space (Ω, F, P); see e.g., Section 1.2 in [30]. Obtaining a bound for the first term is simple, as it does not depend on θ: Donsker's theorem yields
E^* ∫ |F_n^Y(y) − F_Y(y)| dF_n^Y(y) ≤ E^* sup_{y ∈ R} |F_n^Y(y) − F_Y(y)| = O(n^{-1/2}).
For the second term on the right-hand side of (A1), we have
E^* sup_{θ ∈ Θ_n} ∫ |F_n^X(y; θ) − F_X(y; θ)| dF_n^Y(y) ≤ E^* sup_{θ ∈ Θ_n} sup_{y ∈ R} |F_n^X(y; θ) − F_X(y; θ)| = E^* sup_{g ∈ G_n} |P_n^X g − P_X g|,
where G_n is the class of indicator functions of sets of the form {x ∈ R_X : θ(x) ≤ y} with θ ∈ Θ_n and y ∈ R. Note that G_n is the collection of indicators of all complements of sets majorized by Θ_n. Since Θ_n is VC-major with dimension V(Θ_n), G_n must be VC-subgraph with dimension V(Θ_n). Hence Theorem 2.6.7 in [30] implies that, for any ε ∈ (0, 1) and any probability measure Q on B(R), there exists K < ∞ such that we have the uniform entropy bound N(ε, G_n, L_2(Q)) ≤ K V(Θ_n) (16e)^{V(Θ_n)} ε^{−2(V(Θ_n) − 1)}. Theorem 2.14.1 in [30] thus gives E^* sup_{g ∈ G_n} |P_n^X g − P_X g| = O(√(V(Θ_n)/n)), implying that
E^* sup_{θ ∈ Θ_n} ∫ |F_n^X(y; θ) − F_X(y; θ)| dF_n^Y(y) = O(√(V(Θ_n)/n)).
For the third term on the right-hand side of (A1), we have
E^* sup_{θ ∈ Θ_n} | ∫ |F_X(y; θ) − F_Y(y)| dF_n^Y(y) − ∫ |F_X(y; θ) − F_Y(y)| dF_Y(y) | = E^* sup_{h ∈ H_n} |P_n^Y h − P_Y h|,
where H_n is the class of functions {|F_X(·; θ) − F_Y(·)| : θ ∈ Θ_n}. Consider the simpler class of functions H_n^0 = {F_X(·; θ) : θ ∈ Θ_n}. Since H_n^0 is a subset of the collection of monotone increasing functions from R to [0, 1], Theorem 2.7.5 in [30] implies the existence of K < ∞ such that we have the uniform bracketing entropy bound log N_{[]}(ε, H_n^0, L_2(P_Y)) ≤ K ε^{−1} for all ε ∈ (0, 1). It is straightforward to show that N_{[]}(ε, H_n, L_2(P_Y)) ≤ N_{[]}(ε, H_n^0, L_2(P_Y)), and so Theorem 2.14.2 in [30] gives E^* sup_{h ∈ H_n} |P_n^Y h − P_Y h| = O(n^{-1/2}), implying that
E^* sup_{θ ∈ Θ_n} | ∫ |F_X(y; θ) − F_Y(y)| dF_n^Y(y) − ∫ |F_X(y; θ) − F_Y(y)| dF_Y(y) | = O(n^{-1/2}).
Collecting together these bounds on the order of the terms on the right-hand side of (A1), we obtain
E^* sup_{θ ∈ Θ_n} |M_n(θ) − M(θ)| = O(√(V(Θ_n)/n)).
It remains only to show that ω ↦ sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)| is F-measurable. Corollary 5.3.5 in [27] implies that ω ↦ sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)| is universally F-measurable if (Θ_n, B(Θ_n)) is a Souslin measurable space and (ω, θ) ↦ |M_n(ω, θ) − M(θ)| is F ⊗ B(Θ_n)-measurable. Since (Ω, F, P) is complete under Assumption 2, universal F-measurability implies F-measurability. Assumption 3 states that (Θ_n, B(Θ_n)) is a Souslin measurable space, and θ ↦ M(θ) is continuous (by Proposition 2) and hence B(Θ_n)-measurable, so it suffices for us to show that (ω, θ) ↦ M_n(ω, θ) is F ⊗ B(Θ_n)-measurable. Observe that
M_n(ω, θ) = (1/n) ∑_{i=1}^{n} | (1/n) ∑_{j=1}^{n} 1{θ(X_j(ω)) ≤ Y_i(ω)} − (1/n) ∑_{j=1}^{n} 1{Y_j(ω) ≤ Y_i(ω)} |.    (A2)
It is clear from (A2) that (ω, θ) ↦ M_n(ω, θ) is F ⊗ B(Θ_n)-measurable if (ω, θ) ↦ θ(X_j(ω)) is F ⊗ B(Θ_n)-measurable for each j. This condition is satisfied for each j since B(Θ_n) is an admissible structure for Θ_n under Assumption 3, and each X_j is F-measurable. □
Proof of Proposition 7.
We showed in the proof of Proposition 6 that (ω, θ) ↦ M_n(ω, θ) is F ⊗ B(Θ_n)-measurable. Since (Θ_n, B(Θ_n)) is a Souslin measurable space, it follows from Corollary 5.3.5 in [27] that ω ↦ inf_{θ ∈ Θ_n} M_n(ω, θ) is F-measurable. Consequently, we have
gr(Θ̂_n^*) = { (ω, θ) ∈ Ω × Θ_n : M_n(ω, θ) ≤ inf_{ϑ ∈ Θ_n} M_n(ω, ϑ) + λ_n } ∈ F ⊗ B(Θ_n),
where gr(Θ̂_n^*) denotes the graph of Θ̂_n^*. Therefore, since p is continuous and hence B(Θ_n)-measurable under Assumption 4, Theorem 2.17(a) of Stinchcombe and White [28] implies that ω ↦ inf_{θ ∈ Θ̂_n^*(ω)} p(θ) is F-analytic, and hence F-measurable given that (Ω, F, P) is complete under Assumption 2. Let H_n : Ω × Θ_n → R be the function defined by
H_n(ω, θ) = max{ M_n(ω, θ) − inf_{ϑ ∈ Θ_n} M_n(ω, ϑ) − λ_n, p(θ) − inf_{ϑ ∈ Θ̂_n^*(ω)} p(ϑ) − ϵ_n }.
We have shown that (ω, θ) ↦ M_n(ω, θ) is F ⊗ B(Θ_n)-measurable, ω ↦ inf_{θ ∈ Θ_n} M_n(ω, θ) is F-measurable, θ ↦ p(θ) is B(Θ_n)-measurable, and ω ↦ inf_{θ ∈ Θ̂_n^*(ω)} p(θ) is F-measurable, so it must be the case that (ω, θ) ↦ H_n(ω, θ) is F ⊗ B(Θ_n)-measurable. Observe that, for all ω ∈ Ω, there exists θ ∈ Θ_n such that H_n(ω, θ) ≤ 0; here we use the fact that M_n and p are bounded from below. The Sainte-Beuve selection theorem (see Theorem 5.3.2 in [27]) thus implies the existence of a measurable function θ̂_n from (Ω, F) to (Θ_n, B(Θ_n)) such that H_n(ω, θ̂_n(ω)) ≤ 0 for all ω ∈ Ω. Clearly, θ̂_n must therefore satisfy θ̂_n(ω) ∈ Θ̂_n^*(ω) and p(θ̂_n(ω)) ≤ inf_{θ ∈ Θ̂_n^*(ω)} p(θ) + ϵ_n for all ω ∈ Ω, as required. □
Proof of Proposition 8.
We first show that P_{θ̂_n(X)} ⇒ P_Y as n → ∞. For each ω ∈ Ω, we have θ̂_n(ω) ∈ Θ_n and θ̂_n(ω) ∈ Θ̂_n^*(ω). These two inclusions justify the following two inequalities respectively:
M(θ̂_n(ω)) ≤ M_n(ω, θ̂_n(ω)) + sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)| ≤ inf_{θ ∈ Θ_n} M_n(ω, θ) + λ_n + sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)|.
Since M_n(ω, θ) ≤ M(θ) + sup_{ϑ ∈ Θ_n} |M_n(ω, ϑ) − M(ϑ)| for all θ ∈ Θ_n, we therefore have
M(θ̂_n(ω)) ≤ inf_{θ ∈ Θ_n} M(θ) + λ_n + 2 sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)|.
Proposition 2 implies that M(θ) ≤ M(ϑ) + d(θ, ϑ) for any θ, ϑ ∈ Θ. Proposition 3 implies that M(θ̄) = 0 for an arbitrary θ̄ ∈ Θ̄, so we have M(θ) ≤ d(θ, θ̄). Consequently, inf_{θ ∈ Θ_n} M(θ) ≤ inf_{θ ∈ Θ_n} d(θ, θ̄), and so
M(θ̂_n(ω)) ≤ inf_{θ ∈ Θ_n} d(θ, θ̄) + λ_n + 2 sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)|.
Integrating both sides over Ω, we obtain
∫_Ω M(θ̂_n(ω)) dP(ω) ≤ inf_{θ ∈ Θ_n} d(θ, θ̄) + λ_n + 2 ∫_Ω sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)| dP(ω).    (A3)
The first two terms on the right-hand side of (A3) converge to zero as n → ∞ under Assumption 5. The third term is O(√(V(Θ_n)/n)) by Proposition 6, and therefore must also converge to zero under Assumption 5. Thus we have ∫_Ω M(θ̂_n(ω)) dP(ω) → 0 as n → ∞. But ∫_Ω M(θ̂_n(ω)) dP(ω) = ∫_{Θ_n} M(θ) dP_{θ̂_n}(θ), and so Proposition 5 implies that P_{θ̂_n(X)} ⇒ P_Y as n → ∞.
We next show that P{ω : p(θ̂_n(ω)) > inf_{θ ∈ Θ^*} p(θ) + ε} → 0 for any ε > 0 as n → ∞. Fix ε > 0, and choose θ̄ ∈ Θ̄ such that p(θ̄) ≤ inf_{θ ∈ Θ^*} p(θ) + ε/2. Choose a sequence θ_n ∈ Θ_n, n ∈ N, such that d(θ_n, θ̄) = O(inf_{θ ∈ Θ_n} d(θ, θ̄)). Observe that
P{ω : p(θ̂_n(ω)) > inf_{θ ∈ Θ^*} p(θ) + ε} ≤ P{ω : p(θ̂_n(ω)) > p(θ̄) + ε/2, θ_n ∈ Θ̂_n^*(ω)} + P{ω : θ_n ∉ Θ̂_n^*(ω)}.    (A4)
We will show that the two terms on the right-hand side of (A4) tend to zero as n → ∞. First we consider the first term. If θ_n ∈ Θ̂_n^*(ω), we must have p(θ̂_n(ω)) ≤ p(θ_n) + ϵ_n. Therefore,
P{ω : p(θ̂_n(ω)) > p(θ̄) + ε/2, θ_n ∈ Θ̂_n^*(ω)} ≤ P{ω : p(θ_n) + ϵ_n > p(θ̄) + ε/2, θ_n ∈ Θ̂_n^*(ω)} ≤ P{ω : p(θ_n) + ϵ_n > p(θ̄) + ε/2} = 1{p(θ_n) + ϵ_n > p(θ̄) + ε/2}.
Assumption 4 states that p is continuous. Therefore, since d(θ_n, θ̄) → 0 and ϵ_n → 0 as n → ∞, we must have 1{p(θ_n) + ϵ_n > p(θ̄) + ε/2} = 0 for all sufficiently large n. Thus the first term on the right-hand side of (A4) is zero for all sufficiently large n. Next we consider the second term on the right-hand side of (A4). We have
P{ω : θ_n ∉ Θ̂_n^*(ω)} ≤ P{ω : M_n(ω, θ_n) > λ_n} ≤ λ_n^{-1} ∫_Ω M_n(ω, θ_n) dP(ω),
using Markov's inequality to obtain the second inequality. Proposition 6 gives
∫_Ω M_n(ω, θ_n) dP(ω) ≤ M(θ_n) + ∫_Ω sup_{θ ∈ Θ_n} |M_n(ω, θ) − M(θ)| dP(ω) = M(θ_n) + O(√(V(Θ_n)/n)).
Proposition 2 implies that M(θ_n) ≤ M(θ̄) + d(θ_n, θ̄), while Proposition 3 implies that M(θ̄) = 0, so we have
∫_Ω M_n(ω, θ_n) dP(ω) ≤ d(θ_n, θ̄) + O(√(V(Θ_n)/n)) = O(inf_{θ ∈ Θ_n} d(θ, θ̄)) + O(√(V(Θ_n)/n)).
It now follows from Assumption 5 that the second term on the right-hand side of (A4) tends to zero as n → ∞. □
Proof of Proposition 9.
We first prove (i). Let Λ m ( s ) denote the collection of all continuous functions from [ 0 , ) to R that are linear on each of the m + 1 subintervals ( 0 , s 1 ) , ( s 1 , s 2 ) , , ( s m , ) . Λ m ( s ) is an ( m + 2 ) -dimensional real vector space of functions, and so Theorem 7.2 in [34] implies that Λ m ( s ) is a VC-major class with V ( Λ m ( s ) ) = m + 3 . Since Ψ m ( s ) Λ m ( s ) , it must be the case that V ( Ψ m ( s ) ) V ( Λ m ( s ) ) . Moreover, given any θ 0 Λ m ( s ) and any interval [ a , b ] R , we can find θ 1 Ψ m ( s ) and c R such that θ 1 ( x ) = θ 0 ( x ) + c for all x [ a , b ] . It is easy to see that this implies V ( Ψ m ( s ) ) V ( Λ m ( s ) ) . This proves (i).
We next prove (ii). First, observe that d is a metric (rather than merely a pseudometric) on Ψ m ( s ) , because any two distinct functions θ 0 , θ 1 Ψ m ( s ) must differ everywhere on some open interval, which must be of positive P X -measure under Assumption 6. Since F Y is strictly increasing on [ 0 , ) under Assumption 6, we will thus have F Y θ 0 and F Y θ 1 differing everywhere on the interval in question, forcing d ( θ 0 , θ 1 ) to be nonzero. It remains to show that the metric space ( Ψ m ( s ) , d ) is topologically isomorphic to a complete separable metric space. Each function θ Ψ m ( s ) can be written in the form θ ( x ) = i = 1 m + 2 β i f i ( x ) for some unique β Ψ ˜ m ( s ) , where f 1 ( x ) = 1 , f 2 ( x ) = x , f i + 2 ( x ) = max { 0 , x s i } for i = 1 , , m , and
Ψ ˜ m ( s ) = β R m + 2 : i = 1 m + 2 β i f i ( x ) 0 for all x R X .
Similarly, each β Ψ ˜ m ( s ) uniquely identifies a function θ Ψ m ( s ) . We will denote this bijection between Ψ m ( s ) and Ψ ˜ m ( s ) by S : Ψ m ( s ) Ψ ˜ m ( s ) . It is easy to see that Ψ ˜ m ( s ) is a complete separable metric space. We will show that S defines a topological isomorphism between ( Ψ m ( s ) , d ) and ( Ψ ˜ m ( s ) , d ˜ ) , where d ˜ is the usual Euclidean metric on Ψ ˜ m ( s ) . That is, we will show that S and S 1 are continuous. Suppose β 1 , β 2 , is a sequence in Ψ ˜ m ( s ) converging to some β * Ψ ˜ m ( s ) , and let θ * = S 1 β * and θ n = S 1 β n for each n N . For x [ 0 , ) , Cauchy’s inequality gives
θ n ( x ) θ * ( x ) = i = 1 m + 2 ( β n , i β i * ) f i ( x ) d ˜ ( β n , β * ) i = 1 m + 2 f i ( x ) 2 1 / 2 ,
and hence θ n converges to θ * pointwise. It then follows from dominated convergence that d ( θ n , θ * ) 0 , which proves that S 1 is continuous. Suppose now that β 1 , β 2 , does not converge to β * Ψ ˜ m ( s ) . Then we can choose a subsequence β n 1 , β n 2 , and a constant ε > 0 such that d ˜ ( β n k , β * ) > ε for all k. For x R X and all k, we have
θ n k ( x ) θ * ( x ) = d ˜ ( β n k , β * ) i = 1 m + 2 γ n k , i f i ( x ) ε i = 1 m + 2 γ n k , i f i ( x ) ,
where γ n k = d ˜ ( β n k , β * ) 1 ( β n k β * ) . The subsequence γ n 1 , γ n 2 , takes values in the unit sphere in R m + 2 , which is compact, and so we have a further subsequence γ n k 1 , γ n k 2 , that converges to some γ * in the unit sphere. Therefore, arguing as we did above with Cauchy’s inequality, we have i = 1 m + 2 γ n k , i f i ( x ) i = 1 m + 2 γ i * f i ( x ) pointwise in x as j . Noting that i = 1 m + 2 γ i * f i ( x ) 0 on a set of positive P X -measure, we conclude that the subsequence θ n k cannot contain a further subsequence that converges to θ * pointwise on a set of P X -measure one. Recall (see e.g., Theorem 9.2.1 in [35]) that a sequence of random variables converges in probability if and only if every subsequence contains a further subsequence that is almost surely convergent. It must therefore be the case that θ n ( X ) p θ * ( X ) . Since F Y has a continuous inverse, this implies that F Y ( θ n ( X ) ) p F Y ( θ * ( X ) ) , which implies that E | F Y ( θ n ( X ) ) F Y ( θ * ( X ) ) | 0 . That is, d ( θ n , θ * ) 0 . This establishes that S is continuous, which proves (ii).
To prove (iii), we must show that $(\theta, x) \mapsto \theta(x)$ is $\mathcal{B}(\Psi_m(s)) \otimes \mathcal{B}(R_X)$-measurable. By Theorem 12.2.1 in [35], it suffices to show that $(\Psi_m(s), d)$ is a separable metric space, and that $\theta \mapsto \theta(x)$ is continuous in $\theta$ for each $x \in R_X$. The former assertion was established in (ii). To demonstrate the latter assertion, we let $\theta_1, \theta_2, \ldots$ be a sequence in $\Psi_m(s)$ converging to some $\theta^* \in \Psi_m(s)$. Using the topological isomorphism $S$ introduced above, we identify our sequence in $\Psi_m(s)$ with another sequence $\beta_1, \beta_2, \ldots$ in $\tilde{\Psi}_m(s)$ converging to $\beta^* \in \tilde{\Psi}_m(s)$. As above, we have $|\theta_n(x) - \theta^*(x)| \le \tilde{d}(\beta_n, \beta^*) \left(\sum_{i=1}^{m+2} f_i(x)^2\right)^{1/2}$ for each $x \in R_X$, so that $\theta_n$ converges to $\theta^*$ pointwise. This proves that $\theta \mapsto \theta(x)$ is continuous in $\theta$ for each $x \in R_X$. □
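The Cauchy–Schwarz bound $|\sum_i (\beta_{n,i} - \beta_i^*) f_i(x)| \le \tilde{d}(\beta_n, \beta^*) (\sum_i f_i(x)^2)^{1/2}$, used at both steps of the proof, is easy to check numerically. The basis functions $f_i$ below are arbitrary placeholder ramps, not the paper's sieve basis.

```python
# Numerical check of the Cauchy-Schwarz inequality used in the proof:
# |sum_i (b_i - b*_i) f_i(x)| <= ||b - b*||_2 * (sum_i f_i(x)^2)^(1/2).
# The f_i are illustrative stand-ins for the paper's basis functions.
import math
import random

random.seed(0)
m = 4
fs = [lambda x, i=i: max(x - 0.2 * i, 0.0) for i in range(m + 2)]

def lhs(b, b_star, x):
    return abs(sum((bi - bsi) * f(x) for bi, bsi, f in zip(b, b_star, fs)))

def rhs(b, b_star, x):
    dist = math.sqrt(sum((bi - bsi) ** 2 for bi, bsi in zip(b, b_star)))
    return dist * math.sqrt(sum(f(x) ** 2 for f in fs))

for _ in range(1000):
    b = [random.uniform(-1, 1) for _ in range(m + 2)]
    b_star = [random.uniform(-1, 1) for _ in range(m + 2)]
    x = random.uniform(0.0, 2.0)
    assert lhs(b, b_star, x) <= rhs(b, b_star, x) + 1e-12
print("Cauchy-Schwarz bound holds on all draws")
```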
Proof of Proposition 10.
We first show that $\Theta$ is dense in $\Theta^*$ under $d$. Fix a function $\theta^* \in \Theta^*$. The function $\tilde{\theta}^* = F_Y \circ \theta^* \circ Q_X$ is a Borel measurable mapping from $[0,1)$ to $[0,1)$; we extend the domain and range of $\tilde{\theta}^*$ to $[0,1]$ by setting $\tilde{\theta}^*(1) = 1$. A well known consequence of Urysohn's lemma (see e.g., Lemma 2.6.3 in [35]) is that the continuous functions on $[0,1]$ form a dense subset of the Lebesgue integrable functions on $[0,1]$ under the $L^1$-seminorm. It is also well known (see e.g., Theorem 11.2.4 and the following example in [35]) that any continuous function on $[0,1]$ can be approximated arbitrarily well by a continuous piecewise linear function on $[0,1]$ with finitely many kinks. Such a function can in turn be approximated arbitrarily well by a continuous piecewise linear function on $[0,1]$, with finitely many kinks, for which the slope of the function is nonzero wherever it is defined. We will let $\tilde{\theta}_1, \tilde{\theta}_2, \ldots$ be a sequence of such functions, chosen such that $\lim_{k\to\infty} \int |\tilde{\theta}^*(u) - \tilde{\theta}_k(u)| \, du = 0$ and $0 \le \tilde{\theta}_k \le 1$ for each $k \in \mathbb{N}$.
Let $U$ be a random variable distributed uniformly on $[0,1]$, and let $F_U(\cdot\,; \tilde{\theta}_k)$ denote the distribution function of $\tilde{\theta}_k(U)$. The piecewise linearity and nonzero slope of each $\tilde{\theta}_k$ ensure that $F_U(\cdot\,; \tilde{\theta}_k)$ is Lipschitz continuous for each $k$. Specifically, for $a, b \in [0,1]$ we have $|F_U(b; \tilde{\theta}_k) - F_U(a; \tilde{\theta}_k)| \le \nu_k^{-1}(N_k + 1)|b - a|$, where $\nu_k$ is a lower bound on the absolute value of the derivative of $\tilde{\theta}_k$, and $N_k$ is the number of kinks. Let $\bar{\theta}_k : [0,1] \to [0,1]$ be defined by $\bar{\theta}_k(u) = F_U(\tilde{\theta}_k(u); \tilde{\theta}_k)$. Since $\tilde{\theta}_k$ and $F_U(\cdot\,; \tilde{\theta}_k)$ are Lipschitz continuous for each $k$, each $\bar{\theta}_k$ must also be Lipschitz continuous. Let $\theta_k = Q_Y \circ \bar{\theta}_k \circ F_X$, and observe that $\theta_k(X) = Q_Y(\bar{\theta}_k(F_X(X))) \overset{d}{=} Q_Y(\bar{\theta}_k(U)) = Q_Y(F_U(\tilde{\theta}_k(U); \tilde{\theta}_k)) \overset{d}{=} Q_Y(U) \overset{d}{=} Y$. This establishes that $\theta_k \in \Theta$ for each $k$. It remains to show that $\lim_{k\to\infty} d(\theta_k, \theta^*) = 0$. Since $d(\theta_k, \theta^*) = \int |\bar{\theta}_k(u) - \tilde{\theta}^*(u)| \, du$, it suffices to show $\lim_{k\to\infty} \int |\bar{\theta}_k(u) - \tilde{\theta}_k(u)| \, du = 0$. Observe that $\int |\bar{\theta}_k(u) - \tilde{\theta}_k(u)| \, du = \int |F_U(\tilde{\theta}_k(u); \tilde{\theta}_k) - \tilde{\theta}_k(u)| \, du \le \sup_{0 \le v \le 1} |F_U(v; \tilde{\theta}_k) - v|$. Since $\tilde{\theta}^*(U) \overset{d}{=} U$ and $\lim_{k\to\infty} \int |\tilde{\theta}^*(u) - \tilde{\theta}_k(u)| \, du = 0$, Propositions 2–4 jointly imply that $\tilde{\theta}_k(U) \overset{d}{\to} U$ as $k \to \infty$. But this implies that $\lim_{k\to\infty} \sup_{0 \le v \le 1} |F_U(v; \tilde{\theta}_k) - v| = 0$, which proves $\lim_{k\to\infty} \int |\bar{\theta}_k(u) - \tilde{\theta}_k(u)| \, du = 0$. We conclude that $\Theta$ is dense in $\Theta^*$ under $d$.
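The key move in this construction, composing a piecewise linear map with the distribution function of its own image of $U$ to recover a uniform distribution, can be sketched numerically with an empirical CDF. The one-piece map $\tilde{\theta}(u) = u/2$ below is an illustrative choice of mine, not one of the $\tilde{\theta}_k$ from the proof.

```python
# Sketch of the composition theta_bar(u) = F_U(theta_tilde(u); theta_tilde).
# With theta_tilde(u) = u/2, theta_tilde(U) ~ U(0, 1/2), its CDF is
# F(v) = min(2v, 1), and the composed map is the identity, so applying
# it to uniform inputs returns (approximately) uniform output. Here F
# is estimated by an empirical CDF built from simulated draws.
import bisect
import random

random.seed(1)

def theta_tilde(u):
    return u / 2.0                  # illustrative piecewise linear map

# Empirical stand-in for F_U( . ; theta_tilde), from 20,000 draws.
samples = sorted(theta_tilde(random.random()) for _ in range(20000))

def F(v):
    return bisect.bisect_right(samples, v) / len(samples)

def theta_bar(u):
    return F(theta_tilde(u))

# theta_bar should be close to the identity ...
print(round(theta_bar(0.3), 2))
# ... so theta_bar applied to uniform inputs yields a near-uniform
# output, with mean close to 1/2.
out = [theta_bar(u / 1000.0) for u in range(1000)]
print(sum(out) / len(out))
```

With a general piecewise linear $\tilde{\theta}_k$ the composed map is no longer the identity, but the same empirical-CDF device shows $\bar{\theta}_k(U)$ uniform by construction.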
Next we show that $\inf_{\theta' \in \Theta_n} d(\theta', \theta) = O\left(m_n \sup_{0 \le i \le m_n} P_X(s_{i,n}, s_{i+1,n})^2\right)$ for each $\theta \in \Theta$. Choose $\theta_n \in \Theta_n$ such that $\theta_n(s_{i,n}) = \theta(s_{i,n})$ for $i = 0, \ldots, m_n$, and such that $\theta_n(x)$ is constant for $x \ge s_{m_n,n}$. Then we have
$$d(\theta_n, \theta) = \sum_{i=0}^{m_n} \int_{s_{i,n}}^{s_{i+1,n}} \left|F_Y(\theta_n(x)) - F_Y(\theta(x))\right| \, dF_X(x) = \sum_{i=0}^{m_n} \int_{F_X(s_{i,n})}^{F_X(s_{i+1,n})} \left|F_Y \circ \theta_n \circ Q_X(u) - F_Y \circ \theta \circ Q_X(u)\right| \, du,$$
where $s_{m_n+1,n} = \infty$ and $F_X(\infty) = 1$. Our choice of $\theta_n$ and the Lipschitz property of $F_Y \circ \theta \circ Q_X$ ensure that
$$\sup_{F_X(s_{i,n}) < u < F_X(s_{i+1,n})} \left|F_Y \circ \theta_n \circ Q_X(u) - F_Y \circ \theta \circ Q_X(u)\right| \le K \left|F_X(s_{i+1,n}) - F_X(s_{i,n})\right|$$
for $i = 0, \ldots, m_n$, where $K$ is the Lipschitz coefficient of $F_Y \circ \theta \circ Q_X$. We thus have
$$d(\theta_n, \theta) \le K \sum_{i=0}^{m_n} \left|F_X(s_{i+1,n}) - F_X(s_{i,n})\right|^2 = K \sum_{i=0}^{m_n} P_X(s_{i,n}, s_{i+1,n})^2 \le K m_n \sup_{0 \le i \le m_n} P_X(s_{i,n}, s_{i+1,n})^2,$$
giving the desired result. □
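The final bound can be illustrated numerically in a simple special case of my choosing: $X$ uniform on $[0,1]$ and $\theta(x) = x^2$, so that $F_Y(y) = \sqrt{y}$, $F_Y \circ \theta \circ Q_X$ is the identity (Lipschitz with $K = 1$), and a uniform grid gives $P_X(s_{i,n}, s_{i+1,n}) = 1/m$. The interpolant below merely matches $\theta$ at the grid points, as the proof requires; it is not the paper's sieve.

```python
# Numerical illustration of the approximation bound: with X ~ U(0,1),
# theta(x) = x^2, and Y distributed as theta(X) (so F_Y(y) = sqrt(y)
# and K = 1), a piecewise linear theta_n matching theta on the grid
# {0, 1/m, ..., 1} should satisfy
#     d(theta_n, theta) <= K * sum_i P_X(interval_i)^2 = 1/m.
import math

def theta_n(x, m):
    # piecewise linear interpolant of theta(x) = x^2 on the grid i/m
    i = min(int(x * m), m - 1)
    a, b = i / m, (i + 1) / m
    return a * a + (x - a) * (a + b)    # chord of x^2 over [a, b]

def d(m, grid=20000):
    # midpoint Riemann sum for E|F_Y(theta_n(X)) - F_Y(theta(X))|,
    # which here equals the integral of |sqrt(theta_n(x)) - x| dx
    xs = ((j + 0.5) / grid for j in range(grid))
    return sum(abs(math.sqrt(theta_n(x, m)) - x) for x in xs) / grid

for m in (5, 10, 40):
    print(m, d(m) <= 1.0 / m)        # the bound holds at each m
```

The computed distances in fact shrink much faster than $1/m$ in this smooth example; the proof's bound only needs the slower rate.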

References

  1. Kat, H.M. Alternative routes to hedge fund return replication. J. Wealth Manag. 2007, 10, 29–39.
  2. Laise, E. The Hedge Fund ‘Clones’. The Wall Street Journal, 2007. Available online: https://www.wsj.com/articles/SB118497668204773569 (accessed on 2 July 2021).
  3. Cassidy, J. Hedge Clipping. The New Yorker, 2007. Available online: https://www.newyorker.com/magazine/2007/07/02/hedge-clipping (accessed on 2 July 2021).
  4. Simonian, J.; Wu, C. Factors in time: Fine tuning hedge fund replication. J. Portf. Manag. 2019, 45, 159–164.
  5. Hasanhodzic, J.; Lo, A.W. Can hedge-fund returns be replicated? The linear case. J. Invest. Manag. 2007, 5, 5–45.
  6. Amin, G.S.; Kat, H.M. Hedge fund performance 1990–2000: Do the “money machines” really add value? J. Financ. Quant. Anal. 2003, 38, 251–274.
  7. Kat, H.M.; Palaro, H.P. Hedge fund returns: You can make them yourself! J. Wealth Manag. 2005, 8, 62–68.
  8. Kat, H.M.; Palaro, H.P. Who Needs Hedge Funds? A Copula-Based Approach to Hedge Fund Return Replication. Alternative Investment Research Centre Working Paper No. 27, Cass Business School Research Papers; University of London: London, UK, 2005.
  9. Black, F.; Scholes, M. The pricing of options and corporate liabilities. J. Political Econ. 1973, 81, 637–654.
  10. Merton, R. An intertemporal capital asset pricing model. Econometrica 1973, 41, 867–887.
  11. Dybvig, P.H. Distributional analysis of portfolio choice. J. Bus. 1988, 61, 369–393.
  12. Dybvig, P.H. Inefficient dynamic portfolio strategies or how to throw away a million dollars in the stock market. Rev. Financ. Stud. 1988, 1, 67–88.
  13. Beare, B.K. Measure preserving derivatives and the pricing kernel puzzle. J. Math. Econ. 2011, 47, 689–697.
  14. Jackwerth, J.C. Recovering risk aversion from option prices and realized returns. Rev. Financ. Stud. 2000, 13, 433–451.
  15. Brown, D.P.; Jackwerth, J.C. The pricing kernel puzzle: Reconciling index option data and economic theory. In Derivative Securities Pricing and Modelling; Batten, J.A., Wagner, N., Eds.; Emerald: Bingley, UK, 2012; pp. 155–183.
  16. Aït-Sahalia, Y.; Lo, A.W. Nonparametric risk management and implied risk aversion. J. Econom. 2000, 94, 9–51.
  17. Rosenberg, J.V.; Engle, R.F. Empirical pricing kernels. J. Financ. Econ. 2002, 64, 341–372.
  18. Bakshi, G.; Madan, D.; Panayotov, G. Returns of claims on the upside and the viability of U-shaped pricing kernels. J. Financ. Econ. 2010, 97, 130–154.
  19. Golubev, Y.; Härdle, W.K.; Timofeev, R. Testing monotonicity of pricing kernels. Adv. Stat. Anal. 2014, 98, 305–326.
  20. Chaudhuri, R.; Schroder, M. Monotonicity of the stochastic discount factor and expected option returns. Rev. Financ. Stud. 2015, 28, 1462–1505.
  21. Härdle, W.K.; Okhrin, Y.; Wang, W. Uniform confidence bands for pricing kernels. J. Financ. Econom. 2015, 13, 376–413.
  22. Beare, B.K.; Schmidt, L.D.S. An empirical test of pricing kernel monotonicity. J. Appl. Econom. 2016, 31, 338–356.
  23. Beare, B.K.; Dossani, A. Option augmented density forecasts of market returns with monotone pricing kernel. Quant. Financ. 2018, 18, 623–635.
  24. Vapnik, V.N.; Červonenkis, A.Y. On the uniform convergence of relative frequencies of events to their probabilities. Theory Probab. Appl. 1971, 16, 264–280.
  25. Corradi, V.; White, H. Regularized neural networks: Some convergence rate results. Neural Comput. 1995, 7, 1225–1244.
  26. Billingsley, P. Convergence of Probability Measures; Wiley: New York, NY, USA, 1968.
  27. Dudley, R.M. Uniform Central Limit Theorems; Cambridge University Press: Cambridge, UK, 1999.
  28. Stinchcombe, M.B.; White, H. Some measurability results for extrema of random functions over random sets. Rev. Econ. Stud. 1992, 59, 495–512.
  29. Chen, X. Large sample sieve estimation of semi-nonparametric models. In Handbook of Econometrics; Heckman, J.J., Leamer, E.E., Eds.; Elsevier: Amsterdam, The Netherlands, 2007; Volume 6B, Chapter 76.
  30. van der Vaart, A.W.; Wellner, J.A. Weak Convergence and Empirical Processes; Springer: New York, NY, USA, 1996.
  31. Doukhan, P.; Massart, P.; Rio, E. Invariance principles for absolutely regular empirical processes. Ann. Inst. Henri Poincaré Probab. Stat. 1995, 31, 393–427.
  32. Rio, E. Processus empiriques absolument réguliers et entropie universelle. Probab. Theory Relat. Fields 1998, 111, 585–608.
  33. Rio, E. Théorie Asymptotique des Processus Aléatoires Faiblement Dépendants; Springer: Berlin, Germany, 2000.
  34. Dudley, R.M. Central limit theorems for empirical measures. Ann. Probab. 1978, 6, 899–929.
  35. Dudley, R.M. Real Analysis and Probability, 2nd ed.; Cambridge University Press: Cambridge, UK, 2002.
Figure 1. Examples of replicating functions when $X, Y \sim U(0,1)$.
Beare, B.K. Distributional Replication. Entropy 2021, 23, 1063. https://doi.org/10.3390/e23081063
