**3. Results**

First, we are going to spell out the saddle point conditions in full detail and reduce them to special cases later.

Let us bring the integral in (2) to a more convenient form by integrating by parts:

$$I = \frac{1}{\sqrt{\pi}} \int\_{-\infty}^{\infty} \mathrm{d}s \, e^{-s^2} g\left(\frac{\epsilon}{\Delta} + s\sqrt{\frac{2q\_0}{\Delta^2}}\right) = \frac{2q\_0}{\Delta^2} \left[ \mathcal{W}\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right) - \mathcal{W}\left(\frac{\epsilon}{\sqrt{q\_0}}\right) \right] - 1 - 2\frac{\epsilon}{\Delta} \,. \tag{9}$$

With this identity, the free energy becomes

$$f = \lambda - \frac{\alpha \epsilon}{r} - \Delta \hat{q}\_0 - \hat{\Delta} q\_0 - \frac{\Delta}{2r} + \frac{q\_0}{r \Delta} \left[ \mathcal{W} \left( \frac{\Delta + \epsilon}{\sqrt{q\_0}} \right) - \mathcal{W} \left( \frac{\epsilon}{\sqrt{q\_0}} \right) \right] + \langle \text{min} \, V \rangle\_{\sigma, z} \tag{10}$$

The function *W* in the above formulae, together with two related functions Φ and Ψ, will frequently appear in the following; they are integrals of the Gaussian $\frac{1}{\sqrt{2\pi}} e^{-x^2/2}$:

$$\Phi(x) \quad = \int\_{-\infty}^{x} \text{d}t \, \frac{1}{\sqrt{2\pi}} e^{-t^2/2} \tag{11}$$

$$\Psi(x) \quad = \int\_{-\infty}^{x} \mathrm{d}t \,\Phi(t) \tag{12}$$

$$\mathcal{W}(x) \quad = \int\_{-\infty}^{x} \mathrm{d}t \,\Psi(t) \,. \tag{13}$$
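For numerical work, the three functions can be written in closed form in terms of the Gaussian density and its cumulative distribution. The following is a minimal sketch; the closed forms for Ψ and *W* follow from integrating (12) and (13) by parts, and the function names are our own illustrative choices, not notation from the text.

```python
import math

def phi(x):
    """Standard normal density, the integrand of Eq. (11)."""
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    """Eq. (11): standard normal cumulative distribution.

    Written with erfc so that the lower tail keeps its relative accuracy.
    """
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def Psi(x):
    """Eq. (12): integral of Phi, in closed form x*Phi(x) + phi(x)."""
    return x * Phi(x) + phi(x)

def W(x):
    """Eq. (13): integral of Psi, in closed form."""
    return 0.5 * (x * x + 1.0) * Phi(x) + 0.5 * x * phi(x)
```

A quick consistency check is that a finite-difference derivative of `W` reproduces `Psi`, and that of `Psi` reproduces `Phi`.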

Now we evaluate the minimum of *V* in (3) and denote the "representative weight" where this minimum is located by *w*∗. It works out to be

$$w^\* = \frac{\lambda + \sigma z \sqrt{-2\hat{q}\_0} - \eta^+ \Theta(w^\*) + \eta^- \Theta(-w^\*)}{2\sigma^2 \hat{\Delta}},\tag{14}$$

or

$$w^\* = \begin{cases} \frac{\lambda + \sigma z \sqrt{-2\hat{q}\_{0}} - \eta^{+}}{2\sigma^{2}\hat{\Delta}}, & \text{if } z \ge \frac{\eta^{+} - \lambda}{\sigma \sqrt{-2\hat{q}\_{0}}} \\\\ 0, & \text{if } -\frac{\lambda + \eta^{-}}{\sigma \sqrt{-2\hat{q}\_{0}}} < z < \frac{\eta^{+} - \lambda}{\sigma \sqrt{-2\hat{q}\_{0}}} \\\\ \frac{\lambda + \sigma z \sqrt{-2\hat{q}\_{0}} + \eta^{-}}{2\sigma^{2}\hat{\Delta}}, & \text{if } z \le -\frac{\lambda + \eta^{-}}{\sigma \sqrt{-2\hat{q}\_{0}}}. \end{cases} \tag{15}$$
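The piecewise form of the representative weight is a soft-thresholding rule: the weight vanishes inside a band of *z* values and is shifted toward zero outside it. The sketch below evaluates it for given order parameters; the function and argument names are illustrative assumptions, and the sign conventions assumed are that the conjugate parameter of *q*0 is negative and that of Δ is positive.

```python
import math

def representative_weight(z, lam, q0_hat, delta_hat,
                          sigma=1.0, eta_plus=0.0, eta_minus=0.0):
    """Representative weight w* of Eq. (15) for one Gaussian draw z.

    Assumes q0_hat < 0 and delta_hat > 0 (hypothetical argument names).
    """
    field = lam + sigma * z * math.sqrt(-2.0 * q0_hat)
    denom = 2.0 * sigma ** 2 * delta_hat
    if field > eta_plus:        # upper branch: positive weight
        return (field - eta_plus) / denom
    if field < -eta_minus:      # lower branch: negative weight
        return (field + eta_minus) / denom
    return 0.0                  # inside the band the weight is set to zero
```

For example, with `lam=1`, `q0_hat=-0.5`, `delta_hat=0.5`, `eta_plus=0.2`, `eta_minus=0.3`, the draws `z = 0, -1, -2` fall in the upper branch, the zero band, and the lower branch, respectively.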

With this and (4), one can calculate *V*<sup>∗</sup>, the value of *V* at the minimum, and perform the double averaging to obtain

$$
\langle V^\* \rangle\_{\sigma, z} = \frac{\hat{q}\_0}{\hat{\Delta}} \frac{1}{N} \sum\_i \left[ \mathcal{W} \left( \frac{\lambda - \eta^+}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) + \mathcal{W} \left( -\frac{\lambda + \eta^-}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) \right]. \tag{16}
$$

Then, the fully explicit form of the free energy becomes

$$\begin{split} f &= \lambda - \frac{\alpha \epsilon}{r} - \Delta \hat{q}\_0 - \hat{\Delta} q\_0 - \frac{\Delta}{2r} + \frac{q\_0}{r \Delta} \left[ \mathcal{W} \left( \frac{\Delta + \epsilon}{\sqrt{q\_0}} \right) - \mathcal{W} \left( \frac{\epsilon}{\sqrt{q\_0}} \right) \right] \\ &\quad + \frac{\hat{q}\_0}{\hat{\Delta}} \frac{1}{N} \sum\_i \left[ \mathcal{W} \left( \frac{\lambda - \eta^+}{\sigma\_i \sqrt{-2 \hat{q}\_0}} \right) + \mathcal{W} \left( - \frac{\lambda + \eta^-}{\sigma\_i \sqrt{-2 \hat{q}\_0}} \right) \right]. \end{split} \tag{17}$$

It is now straightforward to take the derivatives of *f* with respect to the order parameters and derive the stationary conditions.

From *∂ f* /*∂λ* = 0, it follows that

$$1 = \frac{\sqrt{-2\hat{q}\_0}}{2\hat{\Delta}} \frac{1}{N} \sum\_{i} \frac{1}{\sigma\_i} \left[ \Psi \left( \frac{\lambda - \eta^+}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) - \Psi \left( -\frac{\lambda + \eta^-}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) \right]. \tag{18}$$

The derivative with respect to *q̂*0 yields

$$2\Delta\hat{\Delta} = \frac{1}{N} \sum\_{i} \left[ \Phi\left(\frac{\lambda - \eta^{+}}{\sigma\_{i}\sqrt{-2\hat{q}\_{0}}}\right) + \Phi\left(-\frac{\lambda + \eta^{-}}{\sigma\_{i}\sqrt{-2\hat{q}\_{0}}}\right) \right].\tag{19}$$

From the derivative with respect to Δ̂, we get

$$q\_0 = -\frac{\hat{q}\_0}{\hat{\Delta}^2} \frac{1}{N} \sum\_i \left[ \mathcal{W} \left( \frac{\lambda - \eta^+}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) + \mathcal{W} \left( -\frac{\lambda + \eta^-}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) \right]. \tag{20}$$

As mentioned before, *q*0 determines the out-of-sample estimate for ES and the estimation error. The derivative with respect to *q*0 leads to

$$2r\Delta\hat{\Delta} = \Phi\left(\frac{\Delta+\epsilon}{\sqrt{q\_0}}\right) - \Phi\left(\frac{\epsilon}{\sqrt{q\_0}}\right),\tag{21}$$

where use has been made of the identity

$$\mathcal{W}(x) = \frac{1}{2}x\Psi(x) + \frac{1}{2}\Phi(x)\,. \tag{22}$$

The condition for the derivative with respect to *ε* to vanish is

$$\alpha = \frac{\sqrt{q\_0}}{\Delta} \left[ \Psi \left( \frac{\Delta + \epsilon}{\sqrt{q\_0}} \right) - \Psi \left( \frac{\epsilon}{\sqrt{q\_0}} \right) \right]. \tag{23}$$

The derivation of the last equation takes a little more effort. Let us go back to the free energy in (2) and take the derivative with respect to Δ. Noticing that ⟨*V*∗⟩*σ*,*z* does not depend on Δ, and using the integral given in (9), we have

$$\frac{\partial f}{\partial \Delta} = -\hat{q}\_0 + \frac{1}{2r}I + \frac{\Delta}{2r}\frac{\partial I}{\partial \Delta} = 0\tag{24}$$

valid at the stationary point. From here we find

$$\frac{1}{2r}I\_{st} = \hat{q}\_0 + \frac{2q\_0}{r\Delta^2} \left[ \mathcal{W}\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right) - \mathcal{W}\left(\frac{\epsilon}{\sqrt{q\_0}}\right) \right] - \frac{\epsilon}{r\Delta} - \frac{\sqrt{q\_0}}{r\Delta} \Psi\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right),\tag{25}$$

where (9) was used again and we denoted by *I*st the integral *I* evaluated at the stationary point. Now we apply the identity (22) and the stationary conditions (23), (21) to arrive at

$$\frac{1}{2r}I\_{st} = \hat{q}\_0 + \frac{2q\_0\hat{\Delta}}{\Delta} - (1 - \alpha)\frac{\epsilon}{r\Delta},\tag{26}$$

which, combined with (9), finally leads to

$$\hat{q}\_0 + \frac{2q\_0\hat{\Delta}}{\Delta} + \alpha\frac{\epsilon}{r\Delta} + \frac{1}{2r} - \frac{q\_0}{r\Delta^2} \left[ \mathcal{W}\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right) - \mathcal{W}\left(\frac{\epsilon}{\sqrt{q\_0}}\right) \right] = 0. \tag{27}$$

The Equations (18)–(23) and (27) constitute the system of equations for the six order parameters. These equations are valid both for the regularized and (setting *η*<sup>+</sup> = *η*<sup>−</sup> = 0) for the unregularized cases.

Let us now work out the relationship between the free energy and the chemical potential. Comparing (16) and (20), we see that ⟨*V*∗⟩*σ*,*z* = −*q*0Δ̂, which with (10) and (27) results in the simple formula

$$f = \lambda \tag{28}$$

at the stationary point, as we anticipated before. In [12], we argued that the stationary value of *f* determines the in-sample estimate of ES through (1).

The last object to determine is the distribution of weights:

$$p(w) = \langle \delta(w - w^\*) \rangle\_{\sigma, z} \tag{29}$$

With (14), we find

$$p(w) \quad = \quad n\_0 \delta(w) + \frac{1}{N} \sum\_{i} \frac{1}{\sigma\_w^{(i)} \sqrt{2\pi}} \exp\left(-\frac{1}{2} \left(\frac{w - w\_i^+}{\sigma\_w^{(i)}}\right)^2\right) \theta(w) \tag{30}$$

$$+\quad\frac{1}{N}\sum\_{i}\frac{1}{\sigma\_{w}^{(i)}\sqrt{2\pi}}\exp\left(-\frac{1}{2}\left(\frac{w-w\_{i}^{-}}{\sigma\_{w}^{(i)}}\right)^{2}\right)\theta(-w)\,,\tag{31}$$

where *δ*(*w*) is the Dirac delta,

$$
\sigma\_w^{(i)} = \frac{\sqrt{-2\hat{q}\_0}}{2\hat{\Delta}\sigma\_i} \tag{32}
$$

is the width (standard deviation) of the Gaussian describing the (estimated) weight of asset *i*,

$$w\_i^+ = \frac{\lambda - \eta^+}{2\sigma\_i^2 \hat{\Delta}} \tag{33}$$

is the center of the Gaussian distribution of the (estimated) positive weight *i*,

$$w\_i^- = \frac{\lambda + \eta^-}{2\sigma\_i^2 \hat{\Delta}} \tag{34}$$

is the same for negative weight *i*, and finally,

$$n\_0 = \frac{1}{N} \sum\_{i} \left[ \Phi \left( \frac{\lambda + \eta^-}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) - \Phi \left( \frac{\lambda - \eta^+}{\sigma\_i \sqrt{-2\hat{q}\_0}} \right) \right] \tag{35}$$

is the density of the assets whose weights are set to zero by the regularizer. We wish to make an important remark here: the right-hand side of (19) is just 1 − *n*0. This will prove to be the key to the mapping between the regularized and unregularized cases.
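The remark that the right-hand side of (19) equals 1 − *n*0 rests only on the symmetry Φ(*x*) + Φ(−*x*) = 1, and is easy to confirm numerically. The sketch below evaluates both expressions for arbitrary (hypothetical) parameter values; the function names and the chosen numbers are illustrative, not taken from the paper.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution, Eq. (11)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def condensate_density(lam, q0_hat, sigmas, eta_plus, eta_minus):
    """Density of zero weights n0, Eq. (35); assumes q0_hat < 0."""
    total = 0.0
    for s in sigmas:
        scale = s * math.sqrt(-2.0 * q0_hat)
        total += Phi((lam + eta_minus) / scale) - Phi((lam - eta_plus) / scale)
    return total / len(sigmas)

def rhs_eq19(lam, q0_hat, sigmas, eta_plus, eta_minus):
    """Right-hand side of the stationarity condition (19)."""
    total = 0.0
    for s in sigmas:
        scale = s * math.sqrt(-2.0 * q0_hat)
        total += Phi((lam - eta_plus) / scale) + Phi(-(lam + eta_minus) / scale)
    return total / len(sigmas)

# Hypothetical values, chosen only to exercise the formulas.
params = dict(lam=0.8, q0_hat=-0.4, sigmas=[0.5, 1.0, 2.0],
              eta_plus=0.1, eta_minus=0.3)
n0 = condensate_density(**params)
surviving_fraction = rhs_eq19(**params)
```

Here `1 - n0` and `surviving_fraction` agree to rounding error for any admissible parameters.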

Let us record the condensate density *n*0 also for the special case when short positions are excluded (*η*<sup>−</sup> → ∞), but long positions are not penalized (*η*<sup>+</sup> = 0):

$$n\_0 = \frac{1}{N} \sum\_{i} \left[ 1 - \Phi\left(\frac{\lambda}{\sigma\_i \sqrt{-2\hat{q}\_0}}\right) \right]. \tag{36}$$

From (36), we can see that, since Φ(*x*) is monotonically increasing and, for *x* ≥ 0, concave, the contribution to *n*0 from assets with larger *σi* is larger than that from assets with smaller *σi*. This means that in the no-short limit, the ℓ1 regularizer eliminates more volatile assets with larger probability than the less volatile ones. Thus, we can think of the no-short constraint as a smooth upper cutoff in volatility. This is not true in the generic case (35), where the contributions of the small and large volatility items depend on the order parameters and the regularizer's slopes *η*<sup>+</sup> and *η*<sup>−</sup> in a complicated manner: the probability of an asset with volatility *σi* to be removed is given by the difference of the two terms in (35) under the sum. We do not wish to analyze this situation in detail, apart from the remark that a sufficiently large *η*<sup>−</sup> generally favors the elimination of large volatility items.
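To illustrate the "smooth upper cutoff in volatility", the following sketch evaluates the summand of (36), that is, the probability that an asset of volatility *σi* is removed in the no-short limit. The helper names and the sample values of *λ* and the conjugate parameter are assumptions made for the example.

```python
import math

def Phi(x):
    """Standard normal cumulative distribution, Eq. (11)."""
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def removal_probability(sigma, lam, q0_hat):
    """Summand of Eq. (36): chance that an asset with volatility sigma
    receives zero weight under the no-short constraint (q0_hat < 0)."""
    return 1.0 - Phi(lam / (sigma * math.sqrt(-2.0 * q0_hat)))

def no_short_condensate_density(sigmas, lam, q0_hat):
    """Eq. (36): average removal probability over the assets."""
    return sum(removal_probability(s, lam, q0_hat) for s in sigmas) / len(sigmas)

# Hypothetical order-parameter values with lam > 0.
sigmas = [0.5, 1.0, 2.0]
probs = [removal_probability(s, lam=1.0, q0_hat=-0.5) for s in sigmas]
```

With a positive *λ*, the entries of `probs` increase with the volatility, and for *λ* = 0 every asset is removed with probability 1/2.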

The integral of *p*(*w*) is, of course, 1. Its first moment, ⟨*w*∗⟩*σ*,*z*, works out to be the same as (18):

$$
\langle w^\* \rangle\_{\sigma, z} = 1 \,. \tag{37}
$$

The second moment of the weight distribution is readily obtained as

$$\langle (w^\*)^2 \rangle\_{\sigma, z} = -\frac{\hat{q}\_0}{\hat{\Delta}^2} \frac{1}{N} \sum\_i \frac{1}{\sigma\_i^2} \left[ \mathcal{W}\left(\frac{\lambda - \eta^+}{\sigma\_i \sqrt{-2\hat{q}\_0}}\right) + \mathcal{W}\left(-\frac{\lambda + \eta^-}{\sigma\_i \sqrt{-2\hat{q}\_0}}\right) \right].\tag{38}$$

The variance of the weight distribution is then

$$
\langle (w^\*)^2 \rangle\_{\sigma, z} - \left( \langle w^\* \rangle\_{\sigma, z} \right)^2,\tag{39}
$$

which is equal to *q*0 − 1, when the variances of the assets are all equal to 1. For a portfolio with different *σi*'s, however, the relevant quantity that determines the out-of-sample estimate of ES is not the second moment of the weight distribution, but the true variance of the *i*th asset multiplied by the estimated portfolio weights squared and summed over the different assets, that is

$$
\langle \sigma^2(w^\*)^2 \rangle\_{\sigma, z}\,, \tag{40}
$$

which is precisely *q*0 as given in (20), and this is the quantity (multiplied by the correction as in (7)) that enters the formula for the out-of-sample estimate of ES in (8). For a not too inhomogeneous portfolio, the difference between the second moment of the weight distribution and *q*0 is not significant, so we can think of *q*0 as a measure of the variance of the portfolio.

Now we are ready to consider various special cases.

#### *3.1. The Limit of Complete Information*

When we have many observations (very long time series, *T* → ∞) relative to the dimension *N* of the portfolio, we are in the *r* = *N*/*T* → 0 limit. As we have already mentioned, this also corresponds to the "chemical potential" *λ* going to infinity. Obviously, in this limit, the regularizer plays no role.

We need the asymptotic behavior of the functions appearing in our stationary conditions: for *x* → ∞, Φ(*x*) → 1, Ψ(*x*) ∼ *x*, and *W*(*x*) ∼ *x*²/2, while for *x* → −∞, all three vanish exponentially.

Then from (18) we have

$$1 = \frac{\lambda}{2\hat{\Delta}} \frac{1}{N} \sum\_{i} \frac{1}{\sigma\_i^2} \,. \tag{41}$$

From (19)

$$2\Delta\hat{\Delta} = 1.\tag{42}$$

Combining the two:

$$1 = \lambda \Delta \frac{1}{N} \sum\_{i} \frac{1}{\sigma\_i^2} \,. \tag{43}$$

We know from (1) and (28) that *λ* must be inversely proportional to *r* when *r* → 0. It follows that Δ ∼ *r* for small *r*.

Then, from (20) we find

$$q\_0 = \Delta^2 \lambda^2 \frac{1}{N} \sum\_{i} \frac{1}{\sigma\_i^2} \,. \tag{44}$$

Combined with the previous equation, this gives

$$q\_0 = \frac{1}{\frac{1}{N} \sum\_{i} \frac{1}{\sigma\_i^2}}.\tag{45}$$

The "true" (*r* → 0) value of the order parameter *q*0 is thus determined by the structural constant $\frac{1}{N} \sum\_i \frac{1}{\sigma\_i^2}$, which is given by the variances of the returns $\sigma\_i^2$. This is in accord with the corresponding result found in the case of the ℓ1-regularized variance risk measure [21,29]. The above result for *q*0 also means that the quantity *q̃*0 introduced in (7) is equal to 1, and according to (8) the out-of-sample estimate of ES is equal to its true value ES(0): the estimation error is zero, an obvious result for the case of complete information.

From (23) with Δ → 0 we obtain *α* = Φ(*ε*/√*q*0), or

$$
\epsilon = \Phi^{-1}(\alpha)\sqrt{q\_0}\ . \tag{46}
$$

Now from (21) we get $r = \Phi'\left(\frac{\epsilon}{\sqrt{q\_0}}\right) \frac{\Delta}{\sqrt{q\_0}}$, or

$$
\Delta = r \sqrt{q\_0} \frac{1}{\frac{1}{\sqrt{2\pi}} e^{-\epsilon^2/2q\_0}}.\tag{47}
$$

However, then we have found

$$
\lambda = \frac{q\_0}{\Delta} = \frac{1}{r} \frac{1}{\sqrt{2\pi}} e^{-\epsilon^2/2q\_0} \sqrt{q\_0} = \frac{1}{r} \frac{1}{\sqrt{2\pi}} e^{-\left(\Phi^{-1}(\alpha)\right)^2/2} \sqrt{q\_0} \,. \tag{48}
$$

Since *λ* = *f* and ES = *f r*/(1 − *α*), we have the *r* → 0 limit (the true value) of ES:

$$\text{ES}^{(0)} = \frac{1}{1 - \alpha} \frac{1}{\sqrt{2\pi}} e^{-\left(\Phi^{-1}(\alpha)\right)^2/2} \sqrt{q\_0} \ . \tag{49}$$
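Equations (46) and (49) are simple to evaluate with the Python standard library. The sketch below is only an illustration: the function name is ours, and *q*0 is treated as a given number (for example, computed from (45)).

```python
import math
from statistics import NormalDist

def complete_information_var_es(alpha, q0):
    """Return (epsilon, ES) in the r -> 0 limit, Eqs. (46) and (49).

    alpha is the confidence level and q0 the normalized portfolio variance.
    """
    std_normal = NormalDist()               # mean 0, standard deviation 1
    quantile = std_normal.inv_cdf(alpha)    # Phi^{-1}(alpha)
    epsilon = quantile * math.sqrt(q0)                                 # Eq. (46)
    es = std_normal.pdf(quantile) / (1.0 - alpha) * math.sqrt(q0)      # Eq. (49)
    return epsilon, es

epsilon, es = complete_information_var_es(alpha=0.975, q0=1.0)
```

At the regulatory confidence level *α* = 0.975 and *q*0 = 1, this gives *ε* ≈ 1.96 and ES ≈ 2.34; both scale with √*q*0.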

We record the *r* → 0 limits of the two auxiliary variables, Δ̂ and *q̂*0, for completeness:

$$
\hat{\Delta} = \frac{1}{2r\sqrt{q\_0}} \frac{1}{\sqrt{2\pi}} e^{-\epsilon^2/2q\_0} \tag{50}
$$

and

$$
\hat{q}\_0 \sim -\frac{1}{r},\tag{51}
$$

with a coefficient that will not be needed in the following.

Let us turn to the distribution of weights now.

In the *r* → 0 limit, the widths of the Gaussians in (30) all vanish, so the Gaussians become delta functions:

$$p = \frac{1}{N} \sum\_{i} \delta(w - w\_i^+) \theta(w) + \frac{1}{N} \sum\_{i} \delta(w - w\_i^-) \theta(-w) \,. \tag{52}$$

In the *r* → 0 limit, the weights are all positive, so the second sum disappears.

For the weights $w\_i^+$ we find

$$w\_i^+ \simeq \frac{\lambda}{2\sigma\_i^2 \hat{\Delta}} = \frac{\lambda \Delta}{\sigma\_i^2} = \frac{1}{\sigma\_i^2} \frac{1}{\frac{1}{N} \sum\_k \frac{1}{\sigma\_k^2}}.\tag{53}$$

They sum to *N*, as stipulated.

The variance of a linear combination of independent random variables with coefficients $w\_i^+$ and variances $\sigma\_i^2$ is

$$
\sigma\_p^2 = \sum\_i \left( w\_i^+ \right)^2 \sigma\_i^2 = \frac{N}{\frac{1}{N} \sum\_k \frac{1}{\sigma\_k^2}}.\tag{54}
$$

Now we recognize the meaning of the (true value of the) order parameter *q*0: it is the normalized (to O(1)) variance of the portfolio. This also explains the correction factor appearing in (7). We also see that (46) and (49) are the standard expressions for Value at Risk and Expected Shortfall indeed.
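The complete-information results (45), (53), and (54) can be checked on a small example. The sketch below assumes independent assets with given true variances; the function name and the sample variances are illustrative.

```python
def complete_information_portfolio(variances):
    """Return (q0, weights, portfolio_variance) in the r -> 0 limit.

    variances: true variances sigma_i^2 of the independent asset returns.
    """
    n = len(variances)
    mean_inverse_variance = sum(1.0 / v for v in variances) / n
    q0 = 1.0 / mean_inverse_variance                     # Eq. (45)
    weights = [q0 / v for v in variances]                # Eq. (53)
    portfolio_variance = sum(w * w * v                   # Eq. (54)
                             for w, v in zip(weights, variances))
    return q0, weights, portfolio_variance

q0, weights, portfolio_variance = complete_information_portfolio([1.0, 2.0, 4.0])
```

For the variances 1, 2, 4 this gives *q*0 = 12/7, weights 12/7, 6/7, 3/7 that sum to *N* = 3, and a portfolio variance of 36/7 = *N q*0, in line with the interpretation of *q*0 as the normalized portfolio variance.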

We emphasize again that all the results presented in this subsection are only valid in the *r* → 0 limit when we are dealing with a finite dimension *N* and infinitely long time series *T*.

For finite *r*, the sample fluctuations start to broaden the delta spikes in the distribution of weights, the condensation of zero weights begins, *λ* decreases, and all the formulae above become considerably more complicated. We turn to this situation in the next subsections.

By now, we have learned everything that was to be learned from keeping the variances *σi* different, in particular the tendency of the elimination of the most volatile assets by the regularizer in the case of restriction of short selling. In order to simplify the presentation and avoid the appearance of very large and hardly transparent formulae, henceforth we set all the *σi*'s equal to 1. We stress, however, that the main message of this paper, namely the existence of a mapping between the regularized and unregularized cases, depends only on the structure of the equations, and works also with different *σ*'s.

#### *3.2. Without Regularization*

In this subsection, we set *η*<sup>+</sup> = *η*<sup>−</sup> = 0, that is we consider our problem without regularization, and according to what has just been said, put *σi* = 1. We will make use of the identities

$$
\Phi(x) + \Phi(-x) \quad = \quad 1 \tag{55}
$$

$$\Psi(x) - \Psi(-x) \quad = \quad x \tag{56}$$

$$\mathcal{W}(x) + \mathcal{W}(-x) \quad = \quad \frac{1}{2}(x^2 + 1). \tag{57}$$

The free energy (17) becomes

$$f = \lambda - \frac{\alpha\epsilon}{r} - \Delta \hat{q}\_{0} - \hat{\Delta}q\_{0} - \frac{\Delta}{2r} + \frac{q\_{0}}{r\Delta} \left[ \mathcal{W}\left(\frac{\Delta + \epsilon}{\sqrt{q\_{0}}}\right) - \mathcal{W}\left(\frac{\epsilon}{\sqrt{q\_{0}}}\right) \right] - \frac{\lambda^{2}}{4\hat{\Delta}} + \frac{\hat{q}\_{0}}{2\hat{\Delta}}.\tag{58}$$

For the saddle point equations, we find:

$$1 = \frac{\lambda}{2\hat{\Delta}}\,, \tag{59}$$

$$2\Delta\hat{\Delta} = 1\,, \tag{60}$$

$$q\_0 = \frac{\lambda^2}{4\hat{\Delta}^2} - \frac{\hat{q}\_0}{2\hat{\Delta}^2} \,, \tag{61}$$

$$2r\Delta\hat{\Delta} = r = \Phi\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right) - \Phi\left(\frac{\epsilon}{\sqrt{q\_0}}\right),\tag{62}$$

$$\alpha = \frac{\sqrt{q\_0}}{\Delta} \left[ \Psi \left( \frac{\Delta + \epsilon}{\sqrt{q\_0}} \right) - \Psi \left( \frac{\epsilon}{\sqrt{q\_0}} \right) \right],\tag{63}$$

$$\hat{q}\_0 + \frac{2q\_0\hat{\Delta}}{\Delta} + \frac{\alpha\epsilon}{r\Delta} + \frac{1}{2r} - \frac{q\_0}{r\Delta^2} \left[ \mathcal{W}\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right) - \mathcal{W}\left(\frac{\epsilon}{\sqrt{q\_0}}\right) \right] = 0. \tag{64}$$

These equations are rather similar to their counterparts in the previous subsection, but of course *r* → 0 is not assumed here. As for their solutions, they were discussed and illustrated in several figures in [12], therefore we will not dwell upon them here. (Some results will be given in Section 3.6.) Instead, we write up the corresponding equations in the case where no short positions are allowed and make a term-by-term comparison between the two sets of equations.

#### *3.3. No Short Selling*

Short positions will be excluded by imposing an infinite penalty on them, that is, by letting *η*<sup>−</sup> go to infinity. The functions Φ(*x*), Ψ(*x*), and *W*(*x*) all vanish when *x* → −∞. Long positions will not be penalized, so we set *η*<sup>+</sup> = 0.

The free energy becomes

$$f = \lambda - \frac{\alpha\epsilon}{r} - \Delta \hat{q}\_{0} - \hat{\Delta}q\_{0} - \frac{\Delta}{2r} + \frac{q\_{0}}{r\Delta} \left[ \mathcal{W} \left( \frac{\Delta + \epsilon}{\sqrt{q\_{0}}} \right) - \mathcal{W} \left( \frac{\epsilon}{\sqrt{q\_{0}}} \right) \right] \tag{65}$$

$$+\quad\frac{\hat{q}\_0}{\hat{\Delta}}\mathcal{W}\left(\frac{\lambda}{\sqrt{-2\hat{q}\_0}}\right).\tag{66}$$

The stationary conditions now read as:

$$1 = \frac{\sqrt{-2\hat{q}\_0}}{2\hat{\Delta}} \Psi \left(\frac{\lambda}{\sqrt{-2\hat{q}\_0}}\right),\tag{67}$$

$$2\Delta\hat{\Delta} = \Phi\left(\frac{\lambda}{\sqrt{-2\hat{q}\_0}}\right),\tag{68}$$

$$q\_0 = -\frac{\hat{q}\_0}{\hat{\Delta}^2} \mathcal{W} \left( \frac{\lambda}{\sqrt{-2\hat{q}\_0}} \right),\tag{69}$$

$$2r\Delta\hat{\Delta} = \Phi\left(\frac{\Delta+\epsilon}{\sqrt{q\_0}}\right) - \Phi\left(\frac{\epsilon}{\sqrt{q\_0}}\right),\tag{70}$$

$$\alpha = \frac{\sqrt{q\_0}}{\Delta} \left[ \Psi \left( \frac{\Delta + \epsilon}{\sqrt{q\_0}} \right) - \Psi \left( \frac{\epsilon}{\sqrt{q\_0}} \right) \right],\tag{71}$$

$$r\left(\hat{q}\_0 + \frac{2q\_0\hat{\Delta}}{\Delta}\right) + \frac{\alpha\epsilon}{\Delta} + \frac{1}{2} - \frac{q\_0}{\Delta^2}\left[\mathcal{W}\left(\frac{\Delta + \epsilon}{\sqrt{q\_0}}\right) - \mathcal{W}\left(\frac{\epsilon}{\sqrt{q\_0}}\right)\right] = 0\,,\tag{72}$$

the last equation being the same as (64), just multiplied by *r*.

In the distribution of weights in (30), the second sum of Gaussians will disappear, because for *η*<sup>−</sup> → ∞, all the weights (34) go to infinity. The weights (33) become

$$w\_i^+ = \frac{\lambda}{2\hat{\Delta}}\,, \tag{73}$$

while the density of zero weights is now

$$n\_0 = 1 - \Phi\left(\frac{\lambda}{\sqrt{-2\hat{q}\_0}}\right),\tag{74}$$

which with (68) leads to

$$1 - n\_0 = 2\Delta \hat{\Delta}.\tag{75}$$

From (74), we see that *n*0 = 0 for *r* = 0 and increases as *λ* decreases, until it reaches its maximal value 1/2 when *λ* vanishes. Mathematically, there is nothing to prevent us from continuing to increase *r* and driving *λ* to negative values, which would allow *n*0 to grow beyond 1/2, up to *n*0 = 1, but a negative *λ* would cause the free energy and thus also ES to change sign—an extreme case of "in-sample optimism", entirely due to the lack of sufficient information. We consider such a situation "unphysical", and never go beyond the point where *λ* (or *λ* − *η*<sup>+</sup> if *η*<sup>+</sup> > 0) vanishes anywhere in this paper.

#### *3.4. No-Short Mapping*

We are now ready to spell out the mapping between the no-short case and the unregularized one.

The first point to notice is that the only difference between Equation (62), valid in the unregularized case, and its counterpart (70) in the no-short case (combined with (75)) appears on their left-hand sides: the terms *r* and (1 − *n*0)*r*, respectively. This suggests introducing an effective *r*:

$$r\_{\rm eff} = (1 - n\_0)r.\tag{76}$$

Now *r* = *N*/*T*, and *n*0 is the density of the assets removed by the regularizer, thus (1 − *n*0)*r* = (*N* − *N*0)/*T* is the number of surviving assets divided by the length of the time series. As *r*eff increases from zero to 1/2, *r* will increase between zero and 1.

Inspired by the connection between *r* and *r*eff, we compare the two sets of equations and recognize that, in fact, the whole system of saddle point equations can be mapped from the regularized case to the unregularized one. A variable that appears in all the subsequent equations is

$$z = \frac{\lambda}{\sqrt{-2\hat{q}\_0}},\tag{77}$$

where the variables *λ* and *q̂*0 are those that appear in the no-short equations.

Then the connection between the order parameters belonging to the two cases is the following:

$$q\_0 = q\_0^{\rm eff} \frac{z}{\Psi(z)}\,,\tag{78}$$

$$
\Delta = \Delta\_{\rm eff} \sqrt{\frac{z}{\Psi(z)}} \; , \tag{79}
$$

$$
\epsilon = \epsilon\_{\rm eff} \sqrt{\frac{z}{\Psi(z)}} \; , \tag{80}
$$

$$
\lambda = \lambda\_{\rm eff} \sqrt{\frac{z}{\Psi(z)}} \Phi(z) \,, \tag{81}
$$

$$
\hat{q}\_0 = \hat{q}\_0^{\mathrm{eff}} \Phi(z) \, , \tag{82}
$$

$$
\hat{\Delta} = \hat{\Delta}\_{\rm eff} \sqrt{\frac{\Psi(z)}{z}} \Phi(z) \,. \tag{83}
$$

A direct substitution shows that if the order parameters on the left-hand sides of the above equations satisfy the no-short equations, then the effective variables satisfy the unregularized ones, provided we also replace *r* with *r*eff. In particular, the contour maps of the unregularized order parameters presented in [12] can be taken over and simply blown up by a factor 1/(1 − *n*0) to obtain the contour maps of the no-short variables. Given the relation between *q*0 and the estimation error, we see that the mapping also means that a given error belongs to a larger *r* in the no-short case than in the unregularized one. In other words, for the same error the no-short constrained problem needs less data than the unregularized one: the required length of the time series is reduced by the factor 1 − *n*0.
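The direct substitution can also be checked numerically for the three conditions that differ between the two cases. The sketch below builds a point that satisfies (67)–(69) by construction, applies the inverse of the map (78)–(83), reading (82) and (83) as relations for the two conjugate ("hatted") variables, and evaluates the residuals of (59)–(61). All function names and the sample values of *z* and the conjugate parameter are assumptions made for the illustration; *z* > 0 is assumed.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def Psi(x):
    return x * Phi(x) + phi(x)

def W(x):
    return 0.5 * (x * x + 1.0) * Phi(x) + 0.5 * x * phi(x)

def no_short_point(z, q0_hat):
    """Order parameters solving (67)-(69) for chosen z > 0 and q0_hat < 0."""
    root = math.sqrt(-2.0 * q0_hat)
    lam = z * root                          # definition (77)
    delta_hat = 0.5 * root * Psi(z)         # solves (67)
    delta = Phi(z) / (2.0 * delta_hat)      # solves (68)
    q0 = -q0_hat * W(z) / delta_hat ** 2    # solves (69)
    return lam, q0, delta, delta_hat

def to_effective(z, lam, q0, delta, q0_hat, delta_hat):
    """Invert the map (78), (79), (81)-(83) to get the effective variables."""
    c = z / Psi(z)
    return {
        "lam": lam / (math.sqrt(c) * Phi(z)),
        "q0": q0 / c,
        "delta": delta / math.sqrt(c),
        "q0_hat": q0_hat / Phi(z),
        "delta_hat": delta_hat * math.sqrt(c) / Phi(z),
    }

def unregularized_residuals(e):
    """Residuals of the unregularized conditions (59), (60), (61)."""
    return (
        1.0 - e["lam"] / (2.0 * e["delta_hat"]),
        1.0 - 2.0 * e["delta"] * e["delta_hat"],
        e["q0"] - e["lam"] ** 2 / (4.0 * e["delta_hat"] ** 2)
        + e["q0_hat"] / (2.0 * e["delta_hat"] ** 2),
    )

z, q0_hat = 0.7, -0.5                       # hypothetical sample point
lam, q0, delta, delta_hat = no_short_point(z, q0_hat)
residuals = unregularized_residuals(
    to_effective(z, lam, q0, delta, q0_hat, delta_hat))
```

All three residuals vanish to rounding error. The remaining conditions depend on the order parameters only through combinations that the map leaves unchanged, apart from the replacement of *r* by *r*eff.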

One may wonder whether this mapping expresses some symmetry of the problem, that is whether the free energy functional is invariant under this mapping. The answer is no: the mapping works only in the saddle point equations, it is a property of the stationary point.

It is important to learn the range of this transformation. In the limit *r* → 0, the transformation is the identity, but this is trivial: when we have complete information, the regularizer does not play any role. It is more interesting to consider the vicinity of the phase transition in the unregularized case, where *q*0eff and Δeff diverge. These divergences are removed by the mapping: no singularity is found in the no-short case. This is in accord with [18]: the infinite penalty on short positions precludes the phase transition and no singularity shows up in *q*0, Δ, or *ε*. Mathematically, we can continue the unregularized solutions into the non-feasible region beyond the phase boundary, but they make no sense there (for example, *q*0 changes sign, Δ and *ε* become imaginary, etc.), while their mapped counterparts continue to behave reasonably. According to (76), when *r*eff reaches the critical point *rc*(*α*), the corresponding value of *r* in the no-short problem will be twice as large, so the whole phase diagram is multiplied by a factor 2. Beyond the mapped phase boundary the regularized solutions still survive, but their meaning becomes questionable, because the free energy, and hence also ES, changes sign. As noted in the previous Subsection, we refrain from the discussion of this unphysical region.

#### *3.5. Mapping for Generic ℓ1 Constraint*

The mapping between the generic ℓ1-constrained ES optimization and the unregularized one is a straightforward generalization of the results in the previous Subsection. The mapping is made more complicated because of the sums and differences of the Ψ, Φ, and *W* functions appearing on the right-hand side of Equations (18)–(20). We introduce the following notation for these combinations:

$$A\_{\Psi} = \Psi \left( \frac{\lambda - \eta^{+}}{\sqrt{-2\hat{q}\_{0}}} \right) - \Psi \left( -\frac{\lambda + \eta^{-}}{\sqrt{-2\hat{q}\_{0}}} \right), \tag{84}$$

$$A\_{\Phi} = \Phi\left(\frac{\lambda - \eta^{+}}{\sqrt{-2\hat{q}\_{0}}}\right) + \Phi\left(-\frac{\lambda + \eta^{-}}{\sqrt{-2\hat{q}\_{0}}}\right),\tag{85}$$

and

$$A\_W = \mathcal{W}\left(\frac{\lambda - \eta^+}{\sqrt{-2\hat{q}\_0}}\right) + \mathcal{W}\left(-\frac{\lambda + \eta^-}{\sqrt{-2\hat{q}\_0}}\right),\tag{86}$$

where we have set all the *σi* = 1.

In terms of these quantities the generic map reads as

$$q\_0 = q\_0^{\rm eff} \frac{2A\_W - A\_\Phi}{(A\_\Psi)^2},\tag{87}$$

$$
\Delta = \Delta\_{\rm eff} \frac{\sqrt{2A\_W - A\_{\Phi}}}{A\_{\Psi}},
\tag{88}
$$

$$
\varepsilon = \varepsilon\_{\rm eff} \frac{\sqrt{2A\_W - A\_{\Phi}}}{A\_{\Psi}} \,, \tag{89}
$$

$$
\lambda = \lambda\_{\rm eff} \frac{zA\_{\Phi}}{\sqrt{2A\_{W} - A\_{\Phi}}},
\tag{90}
$$

$$
\hat{q}\_0 = \hat{q}\_0^{\text{eff}} A\_{\Phi}\,, \tag{91}
$$

$$
\hat{\Delta} = \hat{\Delta}\_{\rm eff} \frac{A\_{\Phi} A\_{\Psi}}{\sqrt{2A\_{W} - A\_{\Phi}}}\,. \tag{92}
$$

For the condensate density *n*0, we have

$$1 - n\_0 = A\_{\Phi}\,, \tag{93}$$

and for the effective aspect ratio

$$r\_{\rm eff} = 2r\Delta\hat{\Delta} = rA\_{\Phi} = (1 - n\_0)r.\tag{94}$$

As before, if the order parameters satisfy the regularized stationarity conditions (18)–(27) (with *σi* = 1), then the effective parameters will satisfy the unregularized Equations (59)–(64), and vice versa.
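As a consistency check of the generic map, the combinations (84)–(86) can be evaluated numerically and compared with the no-short factors of the previous Subsection in the limit of a very large *η*<sup>−</sup>. The sketch below does this for the factor multiplying the effective *q*0 in (87); the function names and the sample values are assumptions made for the illustration, with all *σi* = 1.

```python
import math

def phi(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def Phi(x):
    return 0.5 * math.erfc(-x / math.sqrt(2.0))

def Psi(x):
    return x * Phi(x) + phi(x)

def W(x):
    return 0.5 * (x * x + 1.0) * Phi(x) + 0.5 * x * phi(x)

def generic_map_factors(lam, q0_hat, eta_plus, eta_minus):
    """Combinations (84)-(86) and two derived quantities (q0_hat < 0)."""
    root = math.sqrt(-2.0 * q0_hat)
    a = (lam - eta_plus) / root
    b = -(lam + eta_minus) / root
    A_psi = Psi(a) - Psi(b)      # Eq. (84)
    A_phi = Phi(a) + Phi(b)      # Eq. (85)
    A_w = W(a) + W(b)            # Eq. (86)
    return {
        "q0_factor": (2.0 * A_w - A_phi) / A_psi ** 2,   # ratio q0 / q0_eff, Eq. (87)
        "surviving_fraction": A_phi,                      # 1 - n0, Eq. (93)
    }

# Hypothetical sample point; eta_minus = 1e3 mimics the no-short limit.
lam, q0_hat = 0.7, -0.5
factors = generic_map_factors(lam, q0_hat, eta_plus=0.0, eta_minus=1e3)
z = lam / math.sqrt(-2.0 * q0_hat)
no_short_q0_factor = z / Psi(z)                           # ratio q0 / q0_eff, Eq. (78)
```

For a very large *η*<sup>−</sup> the second arguments in (84)–(86) go to −∞, where all three functions vanish, so `factors["q0_factor"]` coincides with `no_short_q0_factor` and the surviving fraction reduces to Φ(*z*).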

Note that the above equations remain invariant if we redefine *λ* as *λ* − *η*<sup>+</sup> and *η*<sup>−</sup> as *η*<sup>−</sup> + *η*<sup>+</sup>. So we can set *η*<sup>+</sup> = 0 and *η*<sup>−</sup> + *η*<sup>+</sup> = *η* without loss of generality. We will use this setup in the following, in order to reduce the number of parameters when solving the stationarity equations.

#### *3.6. Solutions for the Order Parameters*

Except for a few special points, it is impossible to obtain the solutions of the stationarity equations in closed, analytical form, but it is perfectly possible to get them numerically. (The case of *α* = 1 is exceptional in several respects and will not be considered here.) In the following, the solutions will be presented in graphical form.

Figure 1 exhibits three special lines, belonging to three different cases: the unregularized case, the one with a finite regularizer, and the one with a no-short constraint.

**Figure 1.** The boundary of the region where the optimization of ES is feasible in the unregularized case (nr); its image under the map for a finite *η*<sup>−</sup> = 0.05, *η*+ = 0 regularizer; and the same under the no-short map (ns).

The blue line is the upper boundary of the region where the optimization of unregularized ES is feasible. This line was first determined in [10]. It is a phase boundary, along which a phase transition takes place: *q*0, Δ, and *ε* diverge here, while *λ* becomes zero. The unregularized equations can be solved also above this line, up to the horizontal line at *r* = 1 (not shown in the Figure), but the solutions are meaningless: *q*0 is negative, while *λ*, Δ, and *ε* become imaginary. The unregularized equations do not have any solution above *r* = 1.

The green line is the image of the unregularized phase boundary under the mapping described in the previous Subsection, and corresponds to a one-sided regularizer with *η*<sup>−</sup> = 0.05, *η*<sup>+</sup> = 0. There is no phase transition when we cross this line, the order parameters remain smooth, finite quantities, but *λ* (along with the free energy and the in-sample estimate of ES) changes sign, rendering the solution in the region above the green line "unphysical". Nevertheless, if we keep following the solutions beyond the green line we can go up to the image of the *r* = 1 line (mapped into *r* → ∞), where *q*0 and Δ will ultimately diverge. The region between the green line and the image of the *r* = 1 line has an intricate structure, but because it corresponds to negative risk, it is of no interest for us in the present context.

In the no-short case, there is always a solution with the order parameters remaining finite all the way up to infinity, which is the image of the *r* = 1 line under the no-short map. However, as we cross the orange line, *λ* changes sign, and the region beyond it is meaningless again. The orange line is the unregularized phase boundary (blue line) blown up by a factor 1/(1 − *n*0) = 2. All this is in accord with the picture described in [18] in that the no-short constraint eliminates the critical line. The solutions becoming unphysical beyond a certain *r*-range could not be foreseen on the basis of the analysis in [18].

Figure 2 shows the *η*-dependence of *q*0 and the density of the zero weights *n*0 at criticality, and that of the value of the critical *r*. In the unregularized case (*η* → 0), *q*0 → ∞, while in the no-short case (*η* → ∞) *q*0 → *π*. At *α* = 0.975, the value of the critical *rc* increases from *rc* ≈ 1/2 in the unregularized case to ≈1 for the no-short case. The proportion of the assets eliminated from the portfolio (the condensate density) goes from zero for *η* = 0 to 1/2 for large *η*.

**Figure 2.** Dependence of *q*0 at *rc* (**left**), critical point (**middle**), and proportion of zero weights at *rc* (**right**) as a function of the regularization strength, *η*<sup>−</sup> = *η* (*η*<sup>+</sup> = 0). Note the logarithmic scale in the left panel.

In Figure 3, we display the *r*-dependence of *q*0, Δ, and *λ* for the three cases: unregularized, regularized, and no-short. Without regularization, *q*0 and Δ increase with *r* and diverge at an *rc* slightly less than 12 ; while *λ* decreases from infinity at *r* = 0 to zero at *rc*. (The confidence limit *α* is set at its regulatory value 0.975 in these figures.) Under the regularizer *η*<sup>−</sup> = 0.05, *η*<sup>+</sup> = 0, *q*0, and Δ increases up to the *r* where *λ* vanishes. The situation is similar for an infinitely strong (no-short) regularizer, with the limiting value of *q*0 = *π* and *λ* = 0 at *r* ≈ 1.

**Figure 3.** Dependence of *q*0 (**left**), Δ (**middle**), and "chemical potential" *λ* (**right**) on *r* = *N*/*T*, for the unregularized (blue), *η*<sup>−</sup> = 0.05, *η*<sup>+</sup> = 0 regularized (green), and no-short (yellow) cases.

The left panel in Figure 4 shows the relative out-of-sample estimation error, which is related to the out-of-sample estimate of ES by (8) (*q*˜0 = *q*0 now, as we have set all the *σi* =1). These curves are similar to the curves of *q*0 in the previous Figure. It can be seen that the curves of the relative estimation error run very close to each other for small values of *r*: there is no substantial reduction of the error in this range. Where they fan out and the effect of regularization starts to be felt (say around *r* = 0.1), the relative error is already about 20%.

**Figure 4.** Dependence of the out-of-sample estimation error (**left**), proportion of zero weights (**center**), and in-sample ES (**right**) on *r* = *N*/*T*, for the non-regularized (blue), *η*<sup>−</sup> = *η* (*η*<sup>+</sup> = 0) regularized (green), and no-short (orange) cases.

The middle panel in Figure 4 shows the behavior of the density of zero weights as a function of *r* for the finite *η*-regularized and the no-short cases. In the no-short case, *n*0 reaches its maximal value 1/2 at *r* ≈ 1 (for *α* = 0.975), where *λ* vanishes. For a regularizer of finite strength, it always remains below 1/2.

The right panel in Figure 4 displays the behavior of the in-sample estimate of ES for the three cases. This quantity is directly related to *λ* through (1) and (28). The monotonic and fast decay of these curves demonstrates what is called in-sample optimism, a strong underestimation of risk.

### **4. Discussion**

In the preceding Section, we compared the behavior of the order parameters in the three instances considered in this paper: the unregularized, the *ℓ*<sub>1</sub>-regularized, and the no-short constrained Expected Shortfall optimization. We have seen that without regularization, there is a phase transition as we cross the phase boundary *rc*(*α*) shown in Figure 1 with Δ, *q*0, and diverging here, as known since the paper [10]. In contrast, the infinite penalty on short positions suppresses this phase transition, while an *ℓ*<sub>1</sub> regularizer with finite slopes only shifts the phase boundary. These facts were also known from earlier work [14,18]. However, the picture has turned out to be more complicated than envisaged in [18]. The numerical solution for the order parameters performed in this paper has revealed that new characteristic lines emerge both in the case of finite regularization and the no-short constraint, along which the order parameter *λ* and, consequently, the free energy and the in-sample estimate of Expected Shortfall change sign. We have determined the position of these new characteristic lines: in the no-short case, the new line is the curve 2*rc*(*α*); for a finite regularizer, it is *rc*(*α*)/(1 − *n*0), where *n*0 ≤ 1/2. We have omitted the detailed analysis of the regions above these lines, where the estimated risk becomes negative. Instead, we confined ourselves to merely pointing out that the critical line for the no-short constraint is projected out to infinity, so the phase transition is indeed removed, while for a finite slope regularizer the critical line is shifted into the unphysical, negative risk region, where for some values of the regularizer's strength *η*, it even develops two branches.
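The position of the new characteristic lines can be made concrete with a short sketch. The function name `characteristic_r` and the rounded input *rc* = 0.5 are illustrative assumptions (the text only states that *rc* is slightly less than 1/2 at *α* = 0.975); the rescaling factor 1/(1 − *n*0) and the bound *n*0 ≤ 1/2 are taken from the paragraph above.

```python
def characteristic_r(r_c, n0):
    """Rescale the unregularized critical ratio r_c by the factor 1 / (1 - n0).

    r_c : critical ratio N/T of the unregularized problem (supplied by the caller).
    n0  : density of zero weights, bounded by 1/2 according to the text.
    """
    if not 0.0 <= n0 <= 0.5:
        raise ValueError("n0 is expected to lie between 0 and 1/2")
    return r_c / (1.0 - n0)


# Assumed, rounded value of r_c at alpha = 0.975 (hypothetical input).
r_c = 0.5

for n0 in (0.0, 0.25, 0.5):
    # n0 = 0 reproduces r_c itself; n0 = 1/2 (no-short limit) gives 2 * r_c.
    print(f"n0 = {n0:.2f} -> characteristic r = {characteristic_r(r_c, n0):.3f}")
```

With *n*0 = 1/2 the helper returns 2*rc*, the no-short line, and with *n*0 = 0 it returns the unregularized boundary, so the finite-regularizer lines fall between the two.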

We have also found the behavior of the various order parameters, most notably that of *q*0 that determines the out-of-sample estimation error of ES, the free energy that gives the in-sample estimator, and the susceptibility-like quantity Δ, and displayed their behavior for the three cases studied here. It is satisfactory to see that *q*0 and Δ remain finite up to the new characteristic lines, that is, the regularizer acts as expected: it suppresses the divergent sample fluctuations in the optimization of ES. Unfortunately, this suppression is not strong enough to bring down the estimation error to acceptable values, except for the range of small *r* = *N*/*T* ratios where it demands far too long time series for any realistic *N*, and where *r* is small already without any regularization.

What is the meaning of this phase transition? As analyzed in [8,26], it follows from the coherence axioms that coherent risk measures, including ES, are unstable in the sense that whenever an asset or a combination of assets in the portfolio stochastically dominates the others in a given sample, the investor can take an extremely large long position in the dominant asset and compensate this with an appropriately large short position, without violating the budget constraint. This means that the weight of the dominant asset runs away practically to infinity, resulting in an arbitrarily large negative value of the risk measure. This is a mirage of an arbitrage, which can disappear in the next sample, or change into another arbitrage with a different weight running away to infinity. In practice, there are always constraints that prevent such a divergence from taking place. The ban on short selling is just this sort of constraint. The runaway solutions try to escape, but get arrested at the walls constituted by the constraint; in the case of a no-short ban, these are the coordinate planes. This is how the condensate of zero weights builds up. The larger the ratio *r* = *N*/*T*, the stronger this mechanism.
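The runaway mechanism can be illustrated with a toy computation. This is a minimal sketch under stated assumptions, not the replica calculation of the paper: the return series are hypothetical, dominance is taken in its strongest form (asset A beats asset B in every period of the sample), the confidence level is set to 0.75 only to suit the tiny sample, and `sample_es` is a simple empirical estimator (the mean of the worst (1 − *α*) fraction of losses).

```python
import math


def sample_es(losses, alpha):
    """Empirical ES: mean of the worst (1 - alpha) fraction of the losses."""
    k = max(1, math.ceil((1.0 - alpha) * len(losses)))
    worst = sorted(losses, reverse=True)[:k]
    return sum(worst) / k


def portfolio_es(returns_a, returns_b, s, alpha):
    """ES of the portfolio with weights (1 + s, -s), which always sum to 1."""
    losses = [-((1.0 + s) * ra - s * rb) for ra, rb in zip(returns_a, returns_b)]
    return sample_es(losses, alpha)


# Hypothetical sample: asset A outperforms asset B in every period.
returns_b = [-0.03, 0.01, -0.02, 0.04, 0.00, -0.01, 0.02, -0.04]
returns_a = [rb + 0.01 for rb in returns_b]
alpha = 0.75

# Unconstrained case: ever larger long/short positions keep lowering the ES.
unconstrained = [
    portfolio_es(returns_a, returns_b, s, alpha) for s in (0.0, 1.0, 10.0, 100.0)
]
print("ES for growing leverage:", unconstrained)

# No-short ban: both weights non-negative, i.e. -1 <= s <= 0.
grid = [-i / 10 for i in range(11)]
best_s = min(grid, key=lambda s: portfolio_es(returns_a, returns_b, s, alpha))
print("no-short optimum weights:", (1.0 + best_s, -best_s))
```

Without the constraint, the estimated ES decreases without bound as the long/short position grows. With the no-short ban, the optimum sits on the wall where the weight of the dominated asset is zero, which is the toy counterpart of the condensate of zero weights.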

There is nothing surprising about solutions sitting on the constraint walls or at corners in a linearly programmable problem, such as the optimization of ES. In the usual applications of linear programming, the constraints typically express some physical limitation like a finite amount of resources, material or labor, etc. In the present finance problem, such a finite resource would be the limited budget, but if short selling is not constrained, the budget in itself cannot prevent runaway solutions. The ban on short positions corresponds to an infinitely strong *ℓ*<sub>1</sub> regularizer, which, combined with the budget constraint, is already sufficient to take care of the runaway solutions. So, with a no-short ban on, we can increase *r* (that is, increase the dimension or decrease the amount of data) without any mathematical contradiction showing up; neither *q*0 nor Δ will diverge. It is clear, however, that the solution based on less and less information becomes increasingly meaningless. In these circumstances, the optimization will not tell us anything useful about the structure of the market; it will be determined more and more by the constraint.

What we regard as the most intriguing result of this paper is the existence of a mapping between the regularized and the unregularized problems.

**Author Contributions:** Conceptualization, G.P., I.K. and F.C.; formal analysis, G.P. and I.K.; funding acquisition, F.C.; investigation, G.P., I.K. and F.C.; writing—original draft preparation, I.K.; writing— review and editing, G.P. and F.C.; visualization, G.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Acknowledgments:** We are indebted to Susanne Still and Matteo Marsili for collaboration and useful discussions years ago on joint works preceding the present one. Although they did not participate in this work, their ideas have remained a source of inspiration for us. I.K. is obliged to Risi Kondor for several enlightening discussions.

**Conflicts of Interest:** The authors declare no conflict of interest.
