1. Introduction
In their seminal paper [1], Hastie and Stuetzle proposed the notion of principal curves as an elegant, geometric, non-linear generalization of factor models such as principal component analysis. A principal curve has the property of self-consistency, in the sense that it passes through the middle of a data set representing a sample of some random variable. More precisely, any point of the curve coincides with the expected value of the data projected onto it. This is a direct consequence of the fact that a principal curve is critical for the variance of the Euclidean distance between the data and any locally defined perturbation of the curve. In particular, a straight line is a principal curve if and only if its direction is an eigenvector of the covariance matrix of z, where z stands for the vector containing the observed data.
The original idea of Hastie and Stuetzle has led to relevant improvements, applications and extensions. We point out, however, that the criticality of a principal curve is usually defined in terms of the Euclidean distance. Hence, although the curve itself could represent non-Euclidean features of the model, some underlying least-squares approach is still in force. Our main contribution here is to rephrase the notion of principal curves (and, more generally, of principal p-submanifolds) in terms of a general statistical divergence, which replaces the Euclidean divergence, that is, the variance used in the original definition.
Considering statistical divergences such as the Kullback–Leibler or Bregman divergences allows us to deal with random variables whose probabilities are given by exponential and deformed exponential distributions. In the context of exponential and $\varphi$-exponential statistical families, straight lines are replaced by affine geodesics and the Hessian of the cumulant function plays the role of a generalized covariance.
As highlighted by Naudts, deformed exponentials play a central role in the foundations of the Generalized Thermostatistics formulated by Tsallis [2,3] and collaborators. This new approach to Thermodynamics has evolved over the last two decades into a wide range of applications to complex systems, particularly in Finance [4,5,6,7]. Indeed, Naudts’ work established deep and fruitful connections between Statistical Physics and Information Geometry [8,9,10,11]. For instance, both Rényi’s and Tsallis’ entropies are described by Naudts in terms of statistical divergences in the family of q-exponential distributions, which includes the q-Gaussian distributions defined in detail by Plastino and Vignat [11,12,13,14,15]. The analytic and geometric features of deformed exponentials suggest that they are well suited to model non-normally distributed returns of contingent claims. In this direction, for instance, a non-Gaussian option pricing theory has been successfully proposed in terms of diffusion processes associated with q-Gaussian distributions [4,7,16,17,18]. Other related developments are summarized in [6,19].
In [20,21], the authors elaborated some preliminary results towards a theory of portfolio optimization in the context of deformed exponentials. One of the cornerstones of modern Finance Theory, the classical Markowitz mean-variance model of portfolio selection, relies on the assumptions that the returns of assets are normally distributed and that investor preferences are described by a constant risk-aversion utility function. Other works have dealt with variations of the portfolio optimization problem. For instance, Zagrodny [22] proposed a convex programming solution to portfolio selection in a Hilbert space setting with a reinsurance approach. In a multi-objective model, Sawik [23] used the expected return as the performance metric and the expected worst-case return as the risk measure, providing a good interpretation of the role of the variance of the risks.
The traditional criticism of the normality assumption in Markowitz’s theory raises the need for alternative models that deal with non-Gaussian distributions. This question has since been addressed by different methods. In [24,25], Nock et al. extended Markowitz’s model to the wider family of exponential distributions, replacing the mean-variance model by a mean-divergence model. Bregman divergences replace the variance as risk measures for non-Gaussian distributions, possibly encompassing information from higher-order moments. On the other hand, since statistical divergences define geometric notions on the statistical manifold of exponential distributions, their method has a geometric interpretation in terms of a steepest descent by the natural gradient of the risk premium [26,27,28].
In [20], the authors proposed a model of portfolio selection of financial assets that explores the non-additivity and non-normality aspects of Tsallis’ Thermostatistics. More precisely, they extended the mean-divergence model in [24,25] to deformed exponential families.
Subsequently, the authors formulated in [21] a generalization of beta pricing models adapted to mean-divergence portfolio selection [29,30,31]. In particular, they present an extension of the Capital Asset Pricing Model (CAPM) flexible enough to be applied to financial returns with deformed exponential distributions. The method relies on a geometric approach to classical mean-variance analysis developed by LeRoy and Werner [32] and Luenberger [33] (see also [34]). The main results in [20,21] are summarized in Section 3.
This paper is structured as follows. In Section 2, we define the generalized notion of principal curve and principal submanifold in the geometric context of a given statistical divergence. The earlier contributions to portfolio selection and asset pricing in the case of financial returns distributed according to deformed exponential probability densities are summarized in Section 3 and Section 4. The application of the generalized notion of principal submanifolds, and of the corresponding version of principal component analysis, to obtain an explicit expression for optimal principal portfolios is provided in Section 5.
2. Statistical Divergences and Principal Curves
Let $\mathcal{M}$ be a space of random variables $z$ whose probability distributions are given by densities $p_\theta$ lying in an $n$-dimensional statistical manifold, where $\theta = (\theta^1, \dots, \theta^n)$ are statistical parameters ranging in some open subset $U$ of the $n$-dimensional Euclidean space $\mathbb{R}^n$. Let $D$ be a given statistical divergence in $\mathcal{M}$. Given a curve $\gamma : I \to \mathcal{M}$, a projection of $z$ on the trace of $\gamma$ is a point $\gamma(t_z)$, for some $t_z \in I$, such that
$$D(z \,\|\, \gamma(t_z)) = \min_{t \in I} D(z \,\|\, \gamma(t)).$$
In the following, we suppose that such a projection exists and is unique for any curve $\gamma$ we are going to consider. Under this assumption, we denote
$$D(z \,\|\, \gamma) = D(z \,\|\, \gamma(t_z)).$$
In this notation, we propose the following variational notion of principal curve relative to $D$:
Definition 1. A curve $\gamma$ is a principal curve in $\mathcal{M}$ if
$$\frac{d}{ds}\Big|_{s=0} D(z \,\|\, \gamma_s) = 0$$
for every one-parameter family of curves $\gamma_s$, $s \in (-\varepsilon, \varepsilon)$, such that $\gamma_0 = \gamma$.
Recall that a statistical divergence $D$ determines a dually flat structure in $\mathcal{M}$ for which affine geodesics are parameterized as straight lines of the form
$$\gamma(t) = a + t\, b,$$
where $a$ and $b$ are constant vectors. By definition, the projection of the random variable $z$ on a principal curve $\gamma$ minimizes the divergence among the projections on curves close to $\gamma$. Projections satisfy a Pythagorean theorem, one of the fundamental results in Information Geometry, which can be stated as follows.
Theorem 1 (Theorem 1.2 and Theorem 1.3, [26]). Given $z$, $w$, $o$ such that the dual affine geodesic connecting $z$ and $w$ is orthogonal to the affine geodesic connecting $w$ and $o$, the following generalized Pythagorean relation holds:
$$D(z \,\|\, o) = D(z \,\|\, w) + D(w \,\|\, o).$$
Similarly, if the affine geodesic connecting $z$ and $w$ is orthogonal to the dual affine geodesic connecting $w$ and $o$, we have the dual relation
$$D^*(z \,\|\, o) = D^*(z \,\|\, w) + D^*(w \,\|\, o),$$
where $D^*$ is the dual divergence, defined with respect to the dual connection as in [26].
In view of this theorem, it is natural to draw our attention to one-parameter families of affine geodesics.
Theorem 2. An affine geodesic $\gamma$ is a principal curve with respect to one-parameter families of affine geodesics if and only if its direction is an eigenvector of the Fisher metric associated to $D$.
Proof. Denote by
and
, respectively, the differential and Hessian of
D with respect to the second variable. Hence, we have for a fixed
that
where the derivative is computed at the critical value
and for a fixed value of
s,
is a geodesic parameterized by
which we can write
.
We may write
where
is the variational field that corresponds to
. Thus, we have
and
If
is a critical curve, we have
Setting
and
, one gets
Since
can be arbitrarily chosen in such a way that
and
are linearly independent, we conclude there exists
such that
This means that
is an eigenvector of the Fisher information metric at the point
associated to the divergence
D. This finishes the proof. ☐
A result concerning principal submanifolds similar to Theorem 2 follows easily as a scholium of its proof: we may consider the projection of the random variable z onto a p-dimensional affine submanifold parameterized by a smooth map whose differential has rank p. The submanifold is principal with respect to families of affine submanifolds if and only if it is spanned by p geodesics whose velocities are linearly independent and are eigenvectors of the Hessian matrix at the projection point.
A fundamental example of divergence is the squared Euclidean $L^2$-norm
$$D(z \,\|\, w) = \|z - w\|_{L^2}^2 = E\big[(z - w)^2\big],$$
on which both the least-squares method and principal component analysis are based. In their seminal work [1], Hastie and Stuetzle proved that a Euclidean straight line is a principal curve with respect to their definition if and only if its direction is an eigenvector of the covariance matrix of the random variable $z$.
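The Euclidean statement can be checked numerically. The following is a minimal sketch, assuming a small hypothetical two-dimensional sample (`POINTS`) and helper names chosen only for illustration. It measures the mean squared distance from the sample to lines through the sample mean and verifies that this quantity is critical in the direction of an eigenvector of the sample covariance matrix, and not critical in a nearby direction.

```python
import math

# Hypothetical two-dimensional sample, used only for illustration.
POINTS = [(0.0, 0.1), (1.0, 0.8), (2.0, 2.1), (3.0, 2.9), (4.0, 4.2), (5.0, 4.8)]


def mean_and_covariance(points):
    """Return the sample mean and the 2x2 covariance matrix (normalized by n)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    return (mx, my), ((sxx, sxy), (sxy, syy))


def mean_sq_distance(points, center, angle):
    """Mean squared distance to the line through `center` with direction `angle`."""
    ux, uy = math.cos(angle), math.sin(angle)
    total = 0.0
    for x, y in points:
        dx, dy = x - center[0], y - center[1]
        along = dx * ux + dy * uy  # signed length of the orthogonal projection
        total += dx * dx + dy * dy - along * along
    return total / len(points)


def leading_angle(cov):
    """Angle of the leading eigenvector of a symmetric 2x2 matrix."""
    (a, b), (_, c) = cov
    return 0.5 * math.atan2(2.0 * b, a - c)


def angular_derivative(points, center, angle, h=1e-5):
    """Central-difference derivative of the mean squared distance in the angle."""
    up = mean_sq_distance(points, center, angle + h)
    down = mean_sq_distance(points, center, angle - h)
    return (up - down) / (2.0 * h)


center, cov = mean_and_covariance(POINTS)
theta = leading_angle(cov)
print("derivative along the eigenvector:", angular_derivative(POINTS, center, theta))
print("derivative 0.3 rad away:", angular_derivative(POINTS, center, theta + 0.3))
```

Only straight lines through the mean are perturbed here, which is the restricted class of variations relevant to the straight-line statement, not the general class of curves.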
Now, we obtain an extension of this result by Hastie and Stuetzle valid in the context of non-Euclidean statistical divergences. In our setting, the role of the covariance matrix is played by its non-Euclidean and non-Gaussian counterpart, namely the Hessian matrix $\operatorname{Hess} K$, where $K$ is the cumulant generating function.
Corollary 1. Let $K$ be a convex function on $U$ and $D$ be the Bregman divergence determined by $K$. Then an affine geodesic is a principal curve with respect to one-parameter families of affine geodesics if and only if its direction is an eigenvector of the Hessian of $K$.
Proof. This follows directly from Theorem 2 once we observe that the Fisher metric in this case coincides with the Hessian of $K$. This is a well-known fact that may be deduced easily from the definition of the Bregman divergence itself as
$$D(\theta \,\|\, \theta') = K(\theta) - K(\theta') - \langle \nabla K(\theta'), \theta - \theta' \rangle.$$
For details, we refer the reader to [26]. ☐
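The fact used in this proof can be illustrated numerically. The sketch below assumes the standard Bregman form $D(\theta \,\|\, \theta') = K(\theta) - K(\theta') - \langle \nabla K(\theta'), \theta - \theta' \rangle$ and a hypothetical convex potential (the log-partition function of a three-outcome categorical family, chosen only as an example). It compares the second derivative of the divergence in its second argument, taken on the diagonal, with the quadratic form of the Hessian of $K$.

```python
import math


def K(theta):
    """Hypothetical convex potential: log-partition of a three-outcome family."""
    return math.log(1.0 + math.exp(theta[0]) + math.exp(theta[1]))


def grad_K(theta):
    """Gradient of K, available in closed form for this potential."""
    e0, e1 = math.exp(theta[0]), math.exp(theta[1])
    s = 1.0 + e0 + e1
    return (e0 / s, e1 / s)


def hess_K(theta):
    """Hessian of K, available in closed form for this potential."""
    p0, p1 = grad_K(theta)
    return ((p0 * (1.0 - p0), -p0 * p1), (-p0 * p1, p1 * (1.0 - p1)))


def bregman(theta, theta_ref):
    """D(theta || theta_ref) = K(theta) - K(theta_ref) - <grad K(theta_ref), theta - theta_ref>."""
    g = grad_K(theta_ref)
    return (K(theta) - K(theta_ref)
            - g[0] * (theta[0] - theta_ref[0])
            - g[1] * (theta[1] - theta_ref[1]))


def second_variation(theta, v, h=1e-3):
    """Second derivative of t -> D(theta || theta + t v) at t = 0 (central difference)."""
    plus = bregman(theta, (theta[0] + h * v[0], theta[1] + h * v[1]))
    minus = bregman(theta, (theta[0] - h * v[0], theta[1] - h * v[1]))
    return (plus - 2.0 * bregman(theta, theta) + minus) / (h * h)


def quadratic_form(matrix, v):
    """Evaluate v^T matrix v for a 2x2 matrix."""
    return sum(v[i] * matrix[i][j] * v[j] for i in range(2) for j in range(2))


theta = (0.3, -0.7)
direction = (1.0, 2.0)
print(second_variation(theta, direction), quadratic_form(hess_K(theta), direction))
```

The two printed numbers should agree up to the finite-difference error, which is the sense in which the metric induced by the divergence coincides with the Hessian of $K$.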
In the sequel, we are going to consider more general examples, not necessarily quadratic. For instance, we may fix the Kullback–Leibler divergence
$$D_{KL}(p \,\|\, q) = \int p(x) \ln \frac{p(x)}{q(x)}\, dx$$
or, more generally, the relative $\varphi$-entropy associated to $\varphi$-exponential distributions.
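For an exponential family, the Kullback–Leibler divergence is itself a Bregman divergence of the cumulant function, with the two arguments swapped. As a minimal sketch, take the Bernoulli family with natural parameter $\theta$ and cumulant function $K(\theta) = \ln(1 + e^{\theta})$ (a hypothetical example chosen for brevity, with function names of our own), for which $D_{KL}(p_{\theta_1} \,\|\, p_{\theta_2}) = K(\theta_2) - K(\theta_1) - K'(\theta_1)(\theta_2 - \theta_1)$.

```python
import math


def K(theta):
    """Cumulant function of the Bernoulli family in its natural parameter."""
    return math.log1p(math.exp(theta))


def mean(theta):
    """Success probability p = K'(theta)."""
    return 1.0 / (1.0 + math.exp(-theta))


def kl_bernoulli(p, q):
    """Kullback-Leibler divergence between Bernoulli(p) and Bernoulli(q)."""
    return p * math.log(p / q) + (1.0 - p) * math.log((1.0 - p) / (1.0 - q))


def bregman(theta_a, theta_b):
    """Bregman divergence of K: K(a) - K(b) - K'(b) * (a - b)."""
    return K(theta_a) - K(theta_b) - mean(theta_b) * (theta_a - theta_b)


theta_1, theta_2 = 0.4, -1.1
print(kl_bernoulli(mean(theta_1), mean(theta_2)), bregman(theta_2, theta_1))
```

Note the order of the arguments: the divergence from $p_{\theta_1}$ to $p_{\theta_2}$ corresponds to `bregman(theta_2, theta_1)`.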
3. The Space of Financial Assets
From now on, $\mathcal{M}$ stands for the linear span of financial assets traded in a securities market. More precisely, every point in $\mathcal{M}$ corresponds to the payoff $z$ of a contingent claim at a fixed time, say $t = 1$, a random variable $z = z(s)$, where $s$ are the states of the world with probability distribution specified by some density $p_\theta$. Recall that $\theta$ is the distribution parameter of a family of probability distributions in an $n$-dimensional statistical manifold.
In the following, we will consider the statistical manifold of $\varphi$-exponential probability densities $p_\theta$, built from a sufficient statistic $T$ of the random variable $z$, the cumulant generating function $K$ and a fixed reference density $p_0$ by means of the $\varphi$-exponential $\exp_\varphi$, defined as the inverse function of the $\varphi$-logarithm [8,9]
$$\ln_\varphi(u) = \int_1^u \frac{dv}{\varphi(v)},$$
where $\varphi$ is a strictly positive, nondecreasing and continuous real function. A particular case of this deformed exponential is given by the $q$-exponential function
$$\exp_q(u) = \left[1 + (1-q)\,u\right]_+^{\frac{1}{1-q}}$$
with $q \neq 1$, which corresponds to setting $\varphi(u) = u^q$. Hence, the $q$-logarithm is defined by
$$\ln_q(u) = \frac{u^{1-q} - 1}{1-q}.$$
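The deformed logarithm and exponential are easy to experiment with. The following is a minimal sketch, assuming the usual expressions $\ln_q(u) = (u^{1-q} - 1)/(1-q)$ and $\exp_q(x) = [1 + (1-q)x]_+^{1/(1-q)}$, with the ordinary logarithm and exponential used at $q = 1$. The function names and the treatment of the region where $1 + (1-q)x \le 0$ are choices made for this illustration. The helper `ln_phi` evaluates $\int_1^u dv / \varphi(v)$ for $\varphi(v) = v^q$ by the midpoint rule, so that the two definitions can be compared.

```python
import math


def ln_q(u, q):
    """q-logarithm of u > 0; reduces to the natural logarithm at q = 1."""
    if u <= 0.0:
        raise ValueError("ln_q is defined for u > 0")
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_q(x, q):
    """q-exponential of x; reduces to the ordinary exponential at q = 1."""
    if q == 1.0:
        return math.exp(x)
    base = 1.0 + (1.0 - q) * x
    if base <= 0.0:
        # Assumption of this sketch: cut-off at 0 for q < 1, +inf for q > 1.
        return 0.0 if q < 1.0 else math.inf
    return base ** (1.0 / (1.0 - q))


def ln_phi(u, q, steps=20000):
    """Midpoint-rule value of the integral of v**(-q) from 1 to u."""
    h = (u - 1.0) / steps
    return h * sum((1.0 + (i + 0.5) * h) ** (-q) for i in range(steps))


for q in (0.5, 1.5, 2.0):
    u = 2.5
    print(q, ln_q(u, q), ln_phi(u, q), exp_q(ln_q(u, q), q))
```

For each value of `q`, the second and third printed numbers should agree closely, and the last one should return the starting value of `u`.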
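The cumulant generating function $K$ defines a Bregman divergence given by
$$D(z \,\|\, w) = K(\theta_z) - K(\theta_w) - \langle \nabla K(\theta_w), \theta_z - \theta_w \rangle,$$
where the probability distributions of $z$ and $w$ are, respectively, given by the densities $p_{\theta_z}$ and $p_{\theta_w}$.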
3.1. Deformed Exponentials and Portfolio Selection
Setting $\varphi(u) = u$, one gets the family of exponential distributions, in particular multivariate Gaussian distributions. For this family, Nock et al. [
24,
25] represented the key concepts of Portfolio Selection theory in terms of the cumulant generating function and the associated Bregman divergence. More precisely, they proved that, for constant absolute risk aversion (CARA) utility functions, the certainty equivalent and risk premium of risky assets are, respectively, given by
and
where
is a risk-aversion parameter. Hence, they extended the classical mean-variance portfolio selection to a general mean-divergence model for which an optimal allocation
is a solution of the minimization problem
In the particular case of Gaussian distributed returns, they easily recover the classical Markowitz’s optimal portfolio allocation vector
where
is the variance-covariance matrix of the returns on the assets.
In [
20], the authors extended this approach to $\varphi$-exponential distributions, in particular to
q-exponential distributions. They proved that the optimal portfolio for their extended mean-divergence model is given in terms of the cumulant function by
Note that the Hessian of the (convex) function
K is positive-definite and plays the role of the variance-covariance matrix in the Gaussian case. In the particular case of
q-Gaussian distributions [
14], the optimal allocation portfolio is given by
where
with
and
Here,
is the determinant of
. We refer the reader to [
14] for further details on $q$-multivariate Gaussian distributions. One recovers Markowitz’s portfolio in the limit $q \to 1$ in Equation (10).
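For Gaussian returns, the mean-divergence problem reduces to the mean-variance one. As a minimal sketch of that classical limit, assume the textbook CARA-normal certainty equivalent $\mu^\top \alpha - \frac{a}{2}\, \alpha^\top \Sigma\, \alpha$, which is maximized by $\alpha = \frac{1}{a} \Sigma^{-1} \mu$. Normalizations and sign conventions may differ from those used in [24,25], and the two-asset data and function names below are hypothetical.

```python
def solve_2x2(matrix, rhs):
    """Solve a 2x2 linear system by Cramer's rule."""
    (a11, a12), (a21, a22) = matrix
    det = a11 * a22 - a12 * a21
    return ((a22 * rhs[0] - a12 * rhs[1]) / det,
            (a11 * rhs[1] - a21 * rhs[0]) / det)


def certainty_equivalent(alpha, mu, sigma, a):
    """Textbook CARA-normal certainty equivalent: mean minus (a/2) times variance."""
    mean = sum(w * m for w, m in zip(alpha, mu))
    variance = sum(alpha[i] * sigma[i][j] * alpha[j]
                   for i in range(2) for j in range(2))
    return mean - 0.5 * a * variance


def markowitz_allocation(mu, sigma, a):
    """Maximizer of the certainty equivalent: (1/a) * inverse(sigma) * mu."""
    x = solve_2x2(sigma, mu)
    return (x[0] / a, x[1] / a)


# Hypothetical expected returns, covariance matrix and risk aversion.
MU = (0.08, 0.05)
SIGMA = ((0.04, 0.01), (0.01, 0.02))
RISK_AVERSION = 3.0

ALPHA = markowitz_allocation(MU, SIGMA, RISK_AVERSION)
print(ALPHA, certainty_equivalent(ALPHA, MU, SIGMA, RISK_AVERSION))
```

In the generalized model, the covariance matrix in this computation is replaced by the Hessian of $K$, which is no longer constant, so a closed-form solve of this kind is in general only a local step.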
In view of Equation (9), the authors elaborated in [20] a steepest descent algorithm based on the natural (Riemannian) gradient of the risk premium. Some empirical support for the proposed method is provided by comparing the cumulative returns and the evolution of the divergence for optimal portfolios according to the mean-divergence model and the classical model of Markowitz. The numerical evaluations in [20] show that the proposal is able to track deep changes in the stock market, such as those present in crisis scenarios, more closely than the classical mean-variance strategy, while still producing a higher return.
3.2. Mean-Divergence Efficient Frontier
In Markowitz’s model, the optimal portfolio allocation lies on the mean-variance efficient frontier that bounds the feasible set of allowed returns and risks of traded risky portfolios. In [32,33], LeRoy, Werner and Luenberger developed a geometric approach to mean-variance analysis in terms of the geometry of orthogonal projections onto a mean-variance efficient frontier. From this approach, they easily deduce an elegant geometric interpretation of the celebrated Capital Asset Pricing Model (CAPM) as well as of other factor pricing models.
In [
21], the authors have extended the geometric pricing method to general divergence geometries in the asset space $\mathcal{M}$, instead of the Hilbert space $L^2$-norm.
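Since $K$ is a strictly convex function, its Hessian is positive-definite and therefore defines a Riemannian metric, that is, for each $\theta$, we define an inner product in the tangent space at $\theta$ by
$$\langle u, v \rangle_\theta = \sum_{i,j=1}^{n} \frac{\partial^2 K}{\partial \theta^i\, \partial \theta^j}(\theta)\, u^i v^j.$$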
This metric can be expanded in local coordinates around a fixed reference point
as
where quadratic terms are determined in terms of the Riemann curvature of the Riemannian manifold
, see [
35].
Denote by $k_e$ the expectation kernel, that is, an asset in the asset space $\mathcal{M}$ that yields the expected payoffs of the assets in $\mathcal{M}$. More precisely,
$$E[z] = \langle k_e, z \rangle$$
for any $z \in \mathcal{M}$. We define the pricing kernel $k_q$ as an asset in $\mathcal{M}$ that gives the price of any contingent claim $z$ as the expected discounted payoff
$$q(z) = E[m z],$$
where $m$ is a stochastic discount factor. Here, $q$ is the price functional, that is, the present value of the expected returns of the asset, discounted at rate $m$. The existence of this functional is one of the consequences of the Fundamental Theorem of Finance Theory, whose key assumption is that there are no arbitrage portfolios in $\mathcal{M}$. For a comprehensive treatment of those fundamentals of Finance, we refer the reader to [32,36].
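Denote by $\mathcal{E}$ the subspace of the asset space $\mathcal{M}$ spanned by the expectation kernel and the pricing kernel. The projection $z^{\mathcal{E}}$ of $z$ onto $\mathcal{E}$ is defined by
$$D(z \,\|\, z^{\mathcal{E}}) = \min_{w \in \mathcal{E}} D(z \,\|\, w).$$
It follows from the generalized Pythagorean Theorem for divergences (Theorem 1) that, fixing a reference point $o \in \mathcal{E}$, one has
$$D(z \,\|\, o) = D(z \,\|\, z^{\mathcal{E}}) + D(z^{\mathcal{E}} \,\|\, o). \qquad (16)$$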
In the case of the divergence given by the Euclidean $L^2$-norm, Equation (16) reduces to the Euclidean decomposition
where
is the variance, the classical risk measure in Portfolio Theory [
36,
37].
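The Euclidean case can be made concrete on a finite state space. The sketch below is only an illustration under stated assumptions: the states, probabilities and payoffs are hypothetical, the inner product is $\langle x, y \rangle = E[xy]$, and the two spanning assets are a riskless payoff (which acts as the expectation kernel for this inner product whenever it is traded) and an arbitrary second payoff standing in for the pricing kernel. It projects a payoff onto the span of the two assets and checks the Pythagorean decomposition of its second moment.

```python
def inner(x, y, probs):
    """L2 inner product E[x y] on a finite state space."""
    return sum(p * a * b for p, a, b in zip(probs, x, y))


def project(z, first, second, probs):
    """Orthogonal projection of z onto span{first, second}, plus the residual."""
    g11 = inner(first, first, probs)
    g12 = inner(first, second, probs)
    g22 = inner(second, second, probs)
    b1, b2 = inner(z, first, probs), inner(z, second, probs)
    det = g11 * g22 - g12 * g12  # Gram determinant, nonzero for independent assets
    c1 = (g22 * b1 - g12 * b2) / det
    c2 = (g11 * b2 - g12 * b1) / det
    projection = [c1 * a + c2 * b for a, b in zip(first, second)]
    residual = [zi - pi for zi, pi in zip(z, projection)]
    return projection, residual


# Hypothetical four-state market.
PROBS = [0.2, 0.3, 0.4, 0.1]
RISKLESS = [1.0, 1.0, 1.0, 1.0]   # expectation kernel for <x, y> = E[x y]
KERNEL = [1.2, 1.0, 0.9, 0.7]     # stand-in for the pricing kernel
PAYOFF = [3.0, 1.0, 2.0, 0.5]

PROJ, RESID = project(PAYOFF, RISKLESS, KERNEL, PROBS)
print(inner(PAYOFF, PAYOFF, PROBS),
      inner(PROJ, PROJ, PROBS) + inner(RESID, RESID, PROBS))
```

Because the residual is orthogonal to the riskless payoff, it has zero mean, so its second moment is exactly its variance. This is the quantity that the divergence-based risk measure generalizes.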
Motivated by the analogy between Equations (
16) and (
17), the authors proposed in [
21] the projection
as a novel risk measure for assets
. Since it depends on the whole information about the probability densities
, this measure encodes higher moments of
z instead of only the variance. Moreover, one easily verifies that
is the variance in the case of normally distributed returns and Euclidean divergence. Hence, we have defined a risk measure that embodies non-normality and non-Euclidean features of the returns of financial assets and the estimation of their statistical parameters, respectively.
The main result in [
21] is that the two reference assets
and
determine the efficient frontier for portfolios of assets in
with respect to the risk measure
. Indeed, we have the following theorem.
Theorem 3 (Theorem 2 in [
21])
. Let the subspace in spanned by the expectation and pricing kernels. Given , we haveandwhere is the projection of z onto . Since the efficient frontier is spanned by two assets, this last result can be regarded as a non-Gaussian and non-Euclidean version of the two-fund spanning theorem in Finance. Generalizing the mean-variance case, we can prove in the case of
$\varphi$-exponentials that the efficient mean-divergence frontier for portfolio selection is spanned by two portfolios
where
is the desired expected return of the portfolio.
4. Generalized Beta Pricing Models and CAPM
Denote by
and
the returns of
and
, respectively. In [
21], the authors have proved that the minimum divergence portfolio in
is given by
where
A similar expression holds replacing the basic assets
and
by two efficient assets
and
in
such that
This zero-covariance pair of assets is given by
and
where
is given by
Note that is well-defined if and only if is not the minimum divergence portfolio in .
We have obtained in [
21] a generalized beta pricing equation involving
and
for assets in
, where the generalized beta coefficient is given by
If there exists a risk-free asset
with return
in
, we fix
reducing Equation (
20) to
As in the classical CAPM, we can take
as the market return
since it is possible to prove under some assumptions that
is in the mean-divergence efficient frontier. More precisely, this is the case when every agent in the market has consumption preferences given by a time-separable utility function of the form
where
is strictly decreasing with respect to the second variable. Here,
is the agent’s consumption plan at time
and
is a random variable in
that describes the consumption plan of the agent at time
.
Under this assumption, we obtained in [
21] a generalized CAPM equation
where
is the return of the market portfolio and
is the generalized market beta. This coefficient measures the generalized covariance between the risk of the asset or portfolio and the market risk. Note that both Equations (
20) and (
24) define a generalized security market line [
32,
38].
The Fisher information metric
plays the role here of the covariance matrix. In the particular case when the returns of traded assets are distributed according to a
q-Gaussian distribution, it holds that
for every
, where the
q-variance matrix
is defined in
Section 3.1.
5. Generalized Principal Components Analysis (PCA) and Applications to Finance
The results we have quoted in
Section 3.2 and
Section 4 indicate that the Hessian information matrix $\operatorname{Hess} K$ plays a central role in the extension of portfolio selection and asset pricing models to the case of non-Gaussian returns. Even under the assumption of normality of the asset returns, $\operatorname{Hess} K$ can provide a more accurate risk measure since it is sensitive to higher moments of the underlying probability distributions.
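A portfolio composed of $N$ risky assets $z_1, \dots, z_N$ in $\mathcal{M}$ is determined by an allocation vector $\alpha = (\alpha_1, \dots, \alpha_N)$ through
$$z_\alpha = \alpha^\top D = \sum_{i=1}^{N} \alpha_i z_i,$$
where $D$ is the vector of payoffs $(z_1, \dots, z_N)$. We assume that the payoffs have probability distributions given by densities $p_{\theta_i}$, $i = 1, \dots, N$. The expected return of this portfolio is
$$E[z_\alpha] = \sum_{i=1}^{N} \alpha_i E[z_i],$$
whereas its generalized covariance is given by
$$\alpha^\top H \alpha = \sum_{i,j=1}^{N} \alpha_i H_{ij}\, \alpha_j.$$
The matrix $H = (H_{ij})$ is referred to as the generalized covariance matrix of the assets $z_1, \dots, z_N$. Thus, we consider the optimization problem
$$\max_{\alpha}\ \alpha^\top H \alpha$$
subject to the constraint
$$\alpha^\top \alpha = 1.$$
Setting the Lagrangian
$$L(\alpha, \lambda) = \alpha^\top H \alpha - \lambda\, (\alpha^\top \alpha - 1),$$
one easily verifies that the first order necessary condition for the optimal portfolio $\alpha$ is
$$H \alpha = \lambda \alpha,$$
that is, $\alpha$ is an eigenvector of the generalized covariance matrix $H$ relative to the eigenvalue $\lambda$. Supposing that $H$ has $N$ distinct eigenvalues and iterating this same optimization procedure in subspaces orthogonal to the span of the already given eigenvectors, one obtains the principal directions $\alpha^{(1)}, \dots, \alpha^{(N)}$ corresponding to the eigenvalues
$$\lambda_1 > \lambda_2 > \dots > \lambda_N.$$
We then define a matrix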
R by
in such a way that an arbitrary portfolio’s payoffs may be rewritten as
Next, we restrict ourselves to the projections of portfolios onto the (totally geodesic) affine subspace spanned by the first
principal directions
, taken as the most significant ones because they represent the largest
p diagonal elements in the generalized covariance matrix in diagonal form, that is,
Hence, we obtain a multi-factor linear model of the form
where
satisfies
The expected return of the
p-principal portfolio
is
and its generalized variance is given by
We claim that the
p-principal portfolio with expected return
and minimum generalized variance is determined by the weights
To prove this claim, we denote
and then we set the Lagrangian
with a constraint given by
The first order condition is
for all
. Taking traces and using the constraint condition, one gets
We conclude that
as claimed. In sum, we have proven the following theorem.
Theorem 4. The p-principal portfolio with minimum generalized variance is given bywhere and , , are, respectively, the expected return and the generalized variance of the first p eigenvectors of the generalized covariance matrix . This portfolio coincides with the projection of the random variable over the principal p-dimensional submanifold spanned by the eigenvectors.
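The Lagrangian argument behind Theorem 4 can be sketched numerically. The example below is an illustration under explicit assumptions, not a restatement of the theorem: the principal factors are taken to be uncorrelated with generalized variances $\lambda_i$ (the eigenvalues), the generalized variance of a portfolio with weights $w_i$ is $\sum_i w_i^2 \lambda_i$, and the only constraint is a target expected return $\sum_i w_i m_i = \mu$. Under these assumptions the first order condition gives $w_i = \mu\, (m_i / \lambda_i) / \sum_j (m_j^2 / \lambda_j)$. All numbers and names are hypothetical.

```python
def min_variance_weights(returns, variances, target):
    """Weights minimizing sum(w_i**2 * lambda_i) subject to sum(w_i * m_i) = target."""
    scale = sum(m * m / lam for m, lam in zip(returns, variances))
    return [target * (m / lam) / scale for m, lam in zip(returns, variances)]


def expected_return(weights, returns):
    """Expected return of the factor portfolio."""
    return sum(w * m for w, m in zip(weights, returns))


def generalized_variance(weights, variances):
    """Generalized variance for uncorrelated principal factors."""
    return sum(w * w * lam for w, lam in zip(weights, variances))


# Hypothetical expected returns and eigenvalues of the first p = 3 factors.
FACTOR_RETURNS = [0.06, 0.04, 0.02]
EIGENVALUES = [0.09, 0.04, 0.01]
TARGET = 0.05

WEIGHTS = min_variance_weights(FACTOR_RETURNS, EIGENVALUES, TARGET)
print(WEIGHTS)
print(expected_return(WEIGHTS, FACTOR_RETURNS),
      generalized_variance(WEIGHTS, EIGENVALUES))
```

Any other weight vector with the same expected return differs from `WEIGHTS` by a vector orthogonal to the factor returns, and adding such a vector can only increase the generalized variance, which is what the first order condition expresses.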