*Article* **Reducing the Bias of the Smoothed Log Periodogram Regression for Financial High-Frequency Data**

#### **Erhard Reschenhofer \* and Manveer K. Mangat**

Department of Statistics and Operations Research, University of Vienna, Oskar-Morgenstern-Platz 1, 1090 Vienna, Austria; manveer.mangat@univie.ac.at

**\*** Correspondence: erhard.reschenhofer@univie.ac.at; Tel.: +43-1-4277-38646

Received: 8 March 2020; Accepted: 27 September 2020; Published: 10 October 2020

**Abstract:** For typical sample sizes occurring in economic and financial applications, the squared bias of estimators for the memory parameter is small relative to the variance. Smoothing is therefore a suitable way to improve the performance in terms of the mean squared error. However, in an analysis of financial high-frequency data, where the estimates are obtained separately for each day and then combined by averaging, the variance decreases with the sample size but the bias remains fixed. This paper proposes a method of smoothing that does not entail an increase in the bias. This method is based on the simultaneous examination of different partitions of the data. An extensive simulation study is carried out to compare it with conventional estimation methods. In this study, the new method outperforms its unsmoothed competitors with respect to the variance and its smoothed competitors with respect to the bias. Using the results of the simulation study for the proper interpretation of the empirical results obtained from a financial high-frequency dataset, we conclude that significant long-range dependencies are present only in the intraday volatility but not in the intraday returns. Finally, the robustness of these findings against daily and weekly periodic patterns is established.

**Keywords:** long-range dependence; log periodogram regression; smoothed periodogram; subsampling; intraday returns

**JEL Classification:** C13; C14; C22; C58

#### **1. Introduction**

After Mandelbrot (1971) had discussed the possibility that the strength of the statistical dependence of stock prices decreases very slowly, several researchers investigated this issue empirically. For example, Greene and Fielitz (1977) found indications of long-range dependence when they applied a technique called range over standard deviation (R/S) analysis (Hurst 1951; Mandelbrot and Wallis 1969; Mandelbrot 1972, 1975) to daily stock return series. This technique is based on the R/S statistic *Qn*, which is defined as the range of all partial sums of a time series of length *n* from its mean divided by its standard deviation. For a large class of short-range dependent processes, *Qn*/*n<sup>H</sup>* converges to a non-degenerate random variable if *H* = 0.5. An analogous result with *H* ≠ 0.5 holds for long-range dependent processes. The parameter *H* is called the Hurst coefficient and is used as a measure of long-range dependence. However, Lo (1991) pointed out that the results obtained with this technique may be misleading because of the sensitivity of *Qn* to short-range dependence (see also Davis and Harte 1987; Hauser and Reschenhofer 1995) and proposed, therefore, a Newey and West (1987) type modification for the denominator of the R/S statistic, which is appropriate for general forms of short-range dependence. Contrary to the findings of Greene and Fielitz (1977) and others, he found no evidence of long-range dependence in daily and monthly index returns once the possible short-range dependence was properly taken care of. A disadvantage of Lo's (1991) modified R/S analysis is its dependence on an important tuning parameter, namely the truncation lag *q*, which determines the number of included autocovariances. The general conditions that ensure the consistency of the Newey and West estimator provide little guidance in selecting *q* in finite samples. Additionally, Andrews's (1991) data-dependent rule for choosing *q* is based on asymptotic arguments.

Long-range dependence can not only be characterized by a Hurst coefficient *H* ≠ 0.5 but also by a slowly decaying autocorrelation function ρ or a spectral density *f* that is steep in a small neighborhood of frequency zero, i.e.,

$$
\rho(k)k^{1-2d} \to \ c \text{ as } k \to \infty, \ c > 0, \ d \neq 0,\tag{1}
$$

and:

$$f(\omega) \sim c\omega^{-2d}, \,\omega \in (0, \varepsilon), \, c > 0, \, d \neq 0,\tag{2}$$

respectively. The parameter *d* is called a memory parameter (or fractional differencing parameter) and is related to *H* by *d* = *H* − 0.5. It can be estimated by replacing the unknown spectral density *f* in (2) by the periodogram (Geweke and Porter-Hudak 1983) or a more sophisticated estimate of *f* (Hassler 1993; Peiris and Court 1993; Reisen 1994), taking the log of both sides, and regressing the log estimate on a deterministic regressor. Robustness against short-range dependence can be achieved by using only the *K* = *n*<sup>α</sup> lowest Fourier frequencies in the regression. A popular choice for the tuning parameter α is 0.5. For the purpose of testing, the asymptotic error variance is used. Applying the log periodogram regression method of Geweke and Porter-Hudak (1983) to the daily returns of the 30 components of the Dow Jones Industrials index and several indices, Barkoulas and Baum (1996) found no convincing evidence in favor of long-range dependence, which is not surprising in light of the finding of Mangat and Reschenhofer (2019) that the test based on the asymptotic error variance has very low power. Unfortunately, using the standard variance formula of the least squares estimator of the slope in a simple linear regression instead of the asymptotic error variance is also problematic because it leads to overrejecting the true null hypothesis (see Mangat and Reschenhofer 2019).

The negative results of Lo (1991) and Barkoulas and Baum (1996) are in line with the results obtained by Cheung and Lai (1995) with both modified R/S analysis and log periodogram regression for stock return data from eighteen countries and by Crato (1994) with fractionally integrated ARMA (ARFIMA) models (Granger and Joyeux 1980; Hosking 1981) for stock indices of the G-7 countries. Using not only the log periodogram regression with the asymptotic error variance but additionally also nonparametric techniques such as R/S analysis and modified R/S analysis as well as parametric techniques, Grau-Carles (2000) also found little evidence of long-range dependence in index returns but strong evidence of persistence in volatility measured as squared returns and absolute returns, respectively, which corroborates earlier findings of Crato and de Lima (1994) and Lobato and Savin (1998). In general, results obtained with ARFIMA models must be treated with caution. Firstly, the true model dimension is unknown in practice and reliable inference after automatic model selection is illusory. Secondly, Pötscher (2002) has shown that the problem of estimating the memory parameter *d* falls into the category of ill-posed estimation problems when the class of data generating processes is too rich. For example, Grau-Carles (2000) considered all ARFIMA(p,q) processes with *p* ≤ 3 and *q* ≤ 3, which is possibly an unnecessarily large class for return series.

While the bulk of empirical research focused on major capital markets, Barkoulas et al. (2000) examined an emerging capital market, namely the Greek stock market, with the log periodogram regression and obtained significant estimates of *d* in the range between 0.20 and 0.30 for values of the tuning parameter α between 0.5 and 0.6. However, their sample period is relatively short and the sampling frequency is weekly rather than daily. Even less confidence-inspiring are the positive results obtained by Henry (2002) with monthly data from several international stock markets. Clearly, methods that have been designed for large samples should not be applied to small and medium samples. Recently, small-sample tests for testing hypotheses about the memory parameter *d* have been proposed (Mangat and Reschenhofer 2019; Reschenhofer and Mangat 2020). When applied to asset returns, these tests produced negative results throughout. Cajueiro and Tabak (2004), Carbone et al. (2004), Batten and Szilagyi (2007), Batten et al. (2008), Souza et al. (2008), Batten et al. (2013), and

Auer (2016a, 2016b) observed time-variability of the Hurst exponent in stock returns, currency prices, and the prices of precious metals, respectively. These apparent changes were occasionally interpreted as indications of changing market efficiency or even used for the construction of trading strategies. Although it cannot be ruled out that some erratic estimator for the memory parameter *d* catches signals that are useful for trading purposes even when in fact there is no long-range dependence, there still seems to be a need for a more efficient estimator that actually allows one to obtain some information about the true nature of the data generating process.

In general, there is always a trade-off between bias and variance. Estimators for the memory parameter *d* that are based on a smooth estimate of the spectral density have typically a smaller variance and a larger bias than those based on the periodogram (Chen et al. 1994; Reschenhofer et al. 2020), which is advantageous in situations where the squared bias is small relative to the variance. However, in the case of high-frequency financial data, there are usually gaps between the individual trading sessions, which make it necessary to estimate *d* separately for each trading session and compute the final estimate by averaging the individual estimates. Here, the variance decreases with the number of trading sessions but the bias remains fixed; hence, conventional smoothing methods, which achieve a reduction in the variance at the expense of an increase in the bias, are of no use. The goal of this paper is therefore to introduce a new method of smoothing that does not systematically have a negative impact on the bias. This method will be described in detail in the next section. Section 3 presents the results of an extensive simulation study, which compares the performance of various estimators for the memory parameter in terms of bias, variance, and root-mean-square error (RMSE). Using limit order book data obtained from Lobster, Section 4 searches for indications of long-range dependence both in the intraday volatility and in the intraday returns. Section 5 provides a conclusion.

#### **2. Methods**

#### *2.1. Log Periodogram Regression*

Fractionally integrated white noise satisfies the difference equation:

$$y_t = (1 - L)^{-d} u_t, \tag{3}$$

where *L* is the lag operator and *ut* is white noise with mean zero and variance σ<sup>2</sup> (Adenstedt 1974). Its spectral density is given by:

$$f(\omega) = \frac{\sigma^2}{2\pi} \left| 1 - e^{-i\omega} \right|^{-2d} = \frac{\sigma^2}{2^{1+2d}\pi} \left( \sin^2(\frac{\omega}{2}) \right)^{-d}.\tag{4}$$

The memory parameter *d*, which represents the degree of long memory if *d* > 0, can be estimated by regressing the log periodogram of the time series *y*1, ... , *yn* on a deterministic regressor (Geweke and Porter-Hudak 1983). Indeed, we have:

$$L_j = \log I(\omega_j) = c + d\,x_j + v_j, \tag{5}$$

where:

$$I(\omega) = \frac{1}{2\pi n} \left| \sum_{t=1}^{n} y_t e^{-i\omega t} \right|^2 \tag{6}$$

is the periodogram,

$$\omega_j = 2\pi j/n, \; j = 1, \dots, K \le m = [(n-1)/2], \tag{7}$$

are the first *K* Fourier frequencies between 0 and π,

$$\mathbf{x}\_{j} = -2\log\left(\sin\left(\frac{\omega\_{j}}{2}\right)\right) \tag{8}$$


is a deterministic regressor,

$$c = \log\left(\sigma^2 / (2^{1+2d} \,\, \pi)\right) \tag{9}$$

is a constant, and

$$v_j = \log\left( I(\omega_j) / f(\omega_j) \right) \tag{10}$$

are random perturbations. Choosing *K* < *m* rather than *K* = *m* is advisable when it is suspected that not only long-term dependencies are present but also short-term dependencies, e.g., when the data come from an ARFIMA process:

$$y\_t = \left(1 - \phi\_1 L - \dots - \phi\_p L^p\right)^{-1} (1 - L)^{-d} \left(1 + \theta\_1 L + \dots + \theta\_q L^q\right) u\_t \tag{11}$$

(Granger and Joyeux 1980; Hosking 1981), where the parameter *d* takes care of the former dependencies and the parameters φ1, ... , φ*p*, θ1, ... , θ*q* take care of the latter. It is assumed that *d* < 0.5 (stationarity condition), *d* > −0.5 (invertibility condition), and all roots of the lag operator polynomials Φ(*L*) = 1 − φ1*L* − ... − φ*pL<sup>p</sup>* and Θ(*L*) = 1 + θ1*L* + ... + θ*qL<sup>q</sup>* lie outside the unit circle (causality condition and invertibility condition, respectively).

In the special case of *d* = *p* = *q* = 0 and Gaussianity, the ratios *I*(ω*j*)/*f*(ω*j*) are independent and identically distributed (i.i.d.) standard exponential and *v*1, ... , *vm* are, therefore, i.i.d. Gumbel with mean −γ and variance π<sup>2</sup>/6, where γ = 0.57721 ... is Euler's constant. The variance of the Geweke and Porter-Hudak (GPH) estimator *d̂GPH* is then identical to the variance of the ordinary least squares (OLS) estimator for the slope in a simple regression model, i.e.,

$$var(\hat{d}\_{\rm GPH}) = \frac{\sigma\_v^2}{S\_{\rm xx}} = \frac{\pi^2}{6S\_{\rm xx}},\tag{12}$$

where:

$$S\_{\text{xx}} = \sum\_{t=1}^{K} \left(\mathbf{x}\_t - \overline{\mathbf{x}}\right)^2. \tag{13}$$
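To make the regression (5)–(13) concrete, a minimal R sketch of the raw log periodogram regression might look as follows; the function and argument names are illustrative, not the authors' code (R is also the software used for the article's computations).

```r
# Minimal sketch of the GPH log periodogram regression (5)-(8); illustrative only.
gph <- function(y, K) {
  n  <- length(y)
  j  <- 1:K
  wj <- 2 * pi * j / n                              # Fourier frequencies (7)
  I  <- Mod(fft(y))^2 / (2 * pi * n)                # periodogram (6) at all Fourier frequencies
  Lj <- log(I[j + 1])                               # log periodogram at omega_1, ..., omega_K
  xj <- -2 * log(sin(wj / 2))                       # deterministic regressor (8)
  fit <- lm(Lj ~ xj)
  list(d   = unname(coef(fit)[2]),                  # slope = estimate of d
       var = pi^2 / (6 * sum((xj - mean(xj))^2)))   # asymptotic variance (12)
}

set.seed(1)
gph(rnorm(390), K = 20)$d                           # white noise: true d = 0
```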

In a neighborhood of frequency zero:

$$
\sin(\omega) \approx \omega, \tag{14}
$$

Hence:

$$S\_{\text{xx}} \approx 4 \sum\_{t=1}^{K} \left( \log(t) - \overline{\log(t)} \right)^2. \tag{15}$$

Furthermore:

$$\int_{1}^{K} \log^{2}(t)\,dt - \frac{1}{K}\left(\int_{1}^{K} \log(t)\,dt\right)^{2} = K\log^{2}(K) - 2K\log(K) + 2(K-1) - \frac{1}{K}\big(K\log(K) - (K-1)\big)^{2} = K + o(K). \tag{16}$$

Indeed, we have:

$$S_{\rm xx} = 4(K + o(K))\tag{17}$$

If:

$$K \log(K) / n \to 0 \tag{18}$$

(see Hurvich and Beltrao 1993); hence, the variance formula (12) becomes:

$$
var(\hat{d}_{\rm GPH}) \approx \frac{\pi^2}{24K} \tag{19}$$

in line with the asymptotic result:

$$\sqrt{\mathsf{K}} \Big( \hat{d}\_{\mathrm{GPH}} - d \Big) \xrightarrow{d} \mathrm{N} \Big( 0, \frac{\pi^2}{24} \Big) \tag{20}$$

derived by Hurvich et al. (1998) under the assumption that *K* = *o*(*n*<sup>4/5</sup>) and log<sup>2</sup>(*n*) = *o*(*K*) for a class of stationary Gaussian long-memory processes with spectral densities of the form:

$$f(\omega) = \left| 1 - e^{-i\omega} \right|^{-2d} f^\*(\omega),\tag{21}$$

which includes all stationary ARFIMA processes.

If *d* ≠ 0, the ratios *I*(ω*j*)/*f*(ω*j*) are neither independent nor identically distributed, not even asymptotically (Robinson 1995). The problem is the irregular behavior of the spectral density in the neighborhood of frequency zero, i.e., *f*(ω) → ∞ as ω → 0 if *d* > 0 and *f*(ω) → 0 as ω → 0 if *d* < 0. Robinson (1995), therefore, proposed to remove the lowest Fourier frequencies from the log periodogram regression. Künsch (1986) showed that in the case of ARFIMA processes, the ratios *I*(ω*j*)/*f*(ω*j*), *j* = *H* + 1, ... , *H* + *K*, are indeed asymptotically i.i.d. standard exponential provided that (*H* + 1)/√*n* → ∞ and (*H* + *K*)/*n* → 0. However, Reisen et al. (2001) and Mangat and Reschenhofer (2019) found that even the removal of only the first Fourier frequency already has a negative effect on the performance of the estimator *d̂GPH*.

#### *2.2. Smoothing the Periodogram*

An obvious possibility to further develop the estimator *d̂GPH* is to smooth the periodogram before it is used in the regression (5) (Hassler 1993; Peiris and Court 1993; Reisen 1994). In order to illustrate the effect of smoothing, we consider the simple case of *K*/3 non-overlapping averages:

$$\left(I(\omega_{j-1}) + I(\omega_j) + I(\omega_{j+1})\right)/3, \; j = 2, 5, 8, \dots, K - 1. \tag{22}$$

In this case, the sample size is divided by three but at the same time the variance of the error term decreases approximately from:

$$\text{var}\left(\log\left(\frac{I(\omega\_{\bar{j}})}{f(\omega\_{\bar{j}})}\right)\right) \approx \frac{\pi^2}{6} \tag{23}$$

to the variance of the log chi-square distribution with 6 degrees of freedom because:

$$\operatorname{var}\left(\log\left(\frac{I(\omega_{j-1}) + I(\omega_{j}) + I(\omega_{j+1})}{3 f(\omega_{j})}\right)\right) \approx \operatorname{var}\left(\log\left(\frac{2 I(\omega_{j-1})}{f(\omega_{j-1})} + \frac{2 I(\omega_{j})}{f(\omega_{j})} + \frac{2 I(\omega_{j+1})}{f(\omega_{j+1})}\right)\right).\tag{24}$$

Noting that the mean (first cumulant) and the variance (second cumulant) of the log chi-square distribution with *k* degrees of freedom are given by:

$$\kappa\_1 = \log(2) + \psi(k/2) \tag{25}$$

and:

$$
\kappa\_2 = \psi'(k/2),
\tag{26}
$$

respectively, we obtain for *k* = 6: κ1 = 1.615932 and κ2 = 0.3949341. Here, ψ is the digamma function and ψ′ is its first derivative. Overall, the (approximate) variance of the least squares estimator of the memory parameter *d* decreases from

$$\frac{\pi^2}{6} \frac{1}{4K} = 1.644934 \frac{1}{4K} \tag{27}$$

to

$$
\psi'(3)\frac{1}{4K/3} = 1.184802\frac{1}{4K},\tag{28}
$$

where we have assumed that

$$\frac{1}{K/3} \sum\_{t=2,5,\dots} (\mathbf{x}\_t - \overline{\mathbf{x}})^2 \approx \frac{1}{K} \sum\_{t=1}^K (\mathbf{x}\_t - \overline{\mathbf{x}})^2 \approx 4. \tag{29}$$
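These quantities can be checked directly in R; a brief sketch:

```r
# Cumulants (25)-(26) of the log chi-square distribution with k = 6 degrees of
# freedom, and the variance factors appearing in (27) and (28).
k <- 6
log(2) + digamma(k / 2)        # kappa_1 = 1.615932
trigamma(k / 2)                # kappa_2 = 0.3949341
c(pi^2 / 6, 3 * trigamma(3))   # 1.644934 (27) vs. 1.184802 (28)
```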

The limited practical relevance of asymptotic results such as (20) can be seen when the asymptotic values are confronted with the actual values obtained by simulations. In the simplest case of Gaussian white noise, we do not have to safeguard against short-range dependence and can therefore choose a value of α slightly below 4/5. Choosing α = 0.7 and *K* ≈ *n*<sup>α</sup>, we obtain 0.00857 (27) and 0.00617 (28) vs. 0.01148 and 0.00885 (simulated) for *n* = 250 and *K* = 48, 0.00326 and 0.00235 vs. 0.00381 and 0.00282 for *n* = 1000 and *K* = 126, 0.00065 and 0.00047 vs. 0.00068 and 0.00050 for *n* = 10,000 and *K* = 630, and 0.00021 and 0.00015 vs. 0.00021 and 0.00015 for *n* = 50,000 and *K* = 1947. Obviously, huge sample sizes are required for good agreement. In the case of a nontrivial ARFIMA process, this problem will become even more serious because a smaller value of α must be chosen.
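A small-scale version of this comparison can be reproduced with a sketch like the following, which reuses the gph function sketched above and smooths with non-overlapping averages of three ordinates as in (22); the replication count is kept small for illustration.

```r
# Asymptotic variances (27)/(28) vs. simulated variances for Gaussian white noise.
gph_sm3 <- function(y, K) {                  # smoothing as in (22)
  n    <- length(y)
  I    <- Mod(fft(y))^2 / (2 * pi * n)
  j    <- seq(2, K - 1, by = 3)              # centres of the non-overlapping triples
  Ibar <- (I[j] + I[j + 1] + I[j + 2]) / 3   # ordinates at omega_{j-1}, omega_j, omega_{j+1}
  xj   <- -2 * log(sin(pi * j / n))
  unname(coef(lm(log(Ibar) ~ xj))[2])
}

set.seed(1)
n <- 1000; K <- round(n^0.7)                 # K = 126
sim <- replicate(2000, { y <- rnorm(n); c(gph(y, K)$d, gph_sm3(y, K)) })
c(asymptotic = pi^2 / (24 * K), asymptotic_sm = 3 * trigamma(3) / (4 * K))
apply(sim, 1, var)                           # simulated counterparts
```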

More sophisticated further developments of the estimator *d̂GPH* are obtained by using more than three periodogram ordinates, allowing for overlaps, and introducing weights, or, equivalently, by using a lag-window estimator of the form:

$$\hat{f}(\omega_j) = \frac{1}{2\pi} \sum_{s=-m}^{m} w(s/m)\, \hat{\gamma}(s)\, e^{-i\omega_j s}, \; j = 1, \dots, K,\tag{30}$$

where γ̂(*s*) denotes the sample autocovariance at lag *s* and the lag window *w* satisfies *w*(0) = 1, *w*(*s*) ≤ 1, and *w*(−*s*) = *w*(*s*) (see Hassler 1993; Peiris and Court 1993; Reisen 1994). A disadvantage of these estimation procedures is that they require the specification of a second tuning parameter, namely the length of the weighted averages in the former case and *m* ≤ *n* − 1 in the latter case, in addition to *K*. Of course, suitable weights and a suitable lag window, respectively, must be chosen too. Carrying out an extensive simulation study to compare various frequency-domain estimators for *d*, Reschenhofer et al. (2020) found that too strong smoothing, e.g., caused by choosing a too small value for *m*, entails an extremely large bias. Hunt et al. (2003) derived an approximation for the bias and observed generally a good agreement between their approximation and the corresponding value obtained by simulations when *d* > 0. However, the practical relevance of this approximation is limited because of its dependence on characteristics of the data generating process, which are unknown in practice.

#### *2.3. Using Subsamples*

A simple method of smoothing without introducing a bias is to average estimates obtained from different subsamples. Assume, for example, that the final estimate *d̂* is obtained by averaging over *N* preliminary estimates *d̂*1, ... , *d̂N* obtained from independent subsamples *y*11, ... , *yn*1, ... , *y*1*N*, ... , *ynN*; then, the variance of *d̂* vanishes as *N* increases while the bias remains unchanged. Of course, artificially splitting a long, homogeneous time series into non-overlapping subseries does not necessarily have a positive effect. For illustration, consider the simplest case where the time series *y*1, ... , *yn* is split into two disjoint subseries *y*1, ... , *yn*/2 and *yn*/2+1, ... , *yn* of equal length. To allow a fair comparison, the frequency range (0, ω*K*] is kept constant, which implies that in the case of the two subseries the number of used Fourier frequencies is *K*/2. Under the simplistic and mostly unrealistic assumption that the two subseries are independent, the (approximate) variance of the mean of the two GPH estimators based on the two subseries is given by:

$$\frac{1}{4} \left( \frac{\pi^2}{6} \frac{1}{4K/2} + \frac{\pi^2}{6} \frac{1}{4K/2} \right) = \frac{\pi^2}{6} \frac{1}{4K} \tag{31}$$

which is, therefore, of the same size as that of the original estimator, which is based on the whole time series. However, there is still room for improvement. A reduction in the variance may be achieved by

allowing for overlaps between the subseries, e.g., with a rolling estimation window or a combination of different partitions.

At first glance, the idea of improving an OLS estimator by averaging the OLS estimators obtained from the whole sample and the first and second halves, respectively, seems to be at odds with the Gauß-Markov theorem because the combined estimator is still linear. However, the crucial point here is that only the observations are partitioned and not the log periodogram, which is used as dependent variable in the regression and is obtained from the observations through nonlinear transformations. For illustration, consider an estimator of the form:

$$
\overline{d}\_2 = (1 - 2\lambda)\hat{d}\_1 + \lambda\hat{d}\_{21} + \lambda\hat{d}\_{22} \tag{32}
$$

where *d̂*1, *d̂*21, *d̂*22 are the OLS estimators for *d* based on the log periodograms *L*1, *L*21, *L*22 of the whole sample and the first and second halves, respectively. In the special case of Gaussian white noise with variance 2π, the constant *c* in the regression (5) vanishes, and we may, therefore, use the simpler estimators:

$$\breve{d}_1 = \frac{\sum_{j=1}^{K} \underline{x}_j L_j^1}{\sum_{j=1}^{K} \underline{x}_j^2} \approx \frac{1}{4K} \sum_{j=1}^{K} \underline{x}_j L_j^1 \tag{33}$$

and:

$$\breve{d}_{2s} = \frac{\sum_{j=1}^{K/2} \underline{x}_{2j} L_j^{2s}}{\sum_{j=1}^{K/2} \underline{x}_{2j}^2} \approx \frac{1}{2K} \sum_{j=1}^{K/2} \underline{x}_{2j} L_j^{2s}, \; s = 1, 2,\tag{34}$$

where *x̲j* = *xj* − *x̄*. For the variances of the simplistic estimators *d̆*1 and:

$$\breve{d}_2 = (1 - 2\lambda)\breve{d}_1 + \lambda \breve{d}_{21} + \lambda \breve{d}_{22}, \tag{35}$$

we obtain approximately:

$$
var\left(\breve{d}_1\right) \approx \left(\frac{1}{4K}\right)^2 \sum_{j=1}^K \underline{x}_j^2\, \frac{\pi^2}{6} \approx \frac{\pi^2}{24K} \tag{36}$$

and:

$$\begin{split} var(\breve{d}_2) &\approx \frac{\pi^2}{24K}\left((1-2\lambda)^2 + 4\lambda^2\right) + 4\lambda(1-2\lambda)\,cov(\breve{d}_1, \breve{d}_{21})\\ &\approx \frac{\pi^2}{24K}\left((1-2\lambda)^2 + 4\lambda^2 + 4\lambda(1-2\lambda)(\rho_0 + \rho_1)\right) \\ &\approx 0.69\, \frac{\pi^2}{24K} \ \text{ if } \lambda = \frac{1}{4}, \end{split} \tag{37}$$

respectively, where we have used that *cov*(*d̆*1, *d̆*21) = *cov*(*d̆*1, *d̆*22) and *cov*(*d̆*21, *d̆*22) = 0 as well as the rough approximations:

$$\sum\_{j=1}^{\frac{K}{2}} \underline{\mathbf{x}}\_{2j}^{2} \approx \sum\_{j=1}^{\frac{K}{2}} \underline{\mathbf{x}}\_{2j} \underline{\mathbf{x}}\_{2j-1} \approx \sum\_{j=1}^{\frac{K}{2}-1} \underline{\mathbf{x}}\_{2j} \underline{\mathbf{x}}\_{2j+1} \approx 2K,\tag{38}$$

$$cor\left(L_{j}^{1}, L_{k}^{2s}\right) \approx \begin{cases} \rho_{0} = 0.35, & \text{if } 2k = j, \\ \rho_{1} = 0.13, & \text{if } |2k - j| = 1, \\ 0, & \text{else}, \end{cases} \tag{39}$$


(see Table 1), and:

$$\begin{split} cov(\breve{d}_{1}, \breve{d}_{21}) &\approx \frac{1}{8K^{2}}\, cov\!\left(\sum_{j=1}^{K/2} \underline{x}_{2j} L_{2j}^{1} + \sum_{j=1}^{K/2} \underline{x}_{2j-1} L_{2j-1}^{1},\ \sum_{k=1}^{K/2} \underline{x}_{2k} L_{k}^{21}\right) \\ &\approx \frac{1}{8K^{2}} \left( \sum_{j=1}^{K/2} \underline{x}_{2j}^{2}\, cov\!\left(L_{2j}^{1}, L_{j}^{21}\right) + \sum_{j=1}^{K/2} \sum_{k=1}^{K/2} \underline{x}_{2j-1}\, \underline{x}_{2k}\, cov\!\left(L_{2j-1}^{1}, L_{k}^{21}\right) \right) \\ &\approx \frac{1}{8K^{2}} \left( \rho_{0}\, \frac{\pi^{2}}{6} \sum_{j=1}^{K/2} \underline{x}_{2j}^{2} + \rho_{1}\, \frac{\pi^{2}}{6} \sum_{j=1}^{K/2} \underline{x}_{2j}\, \underline{x}_{2j-1} \right) \\ &\approx \frac{\pi^{2}}{24K} (\rho_{0} + \rho_{1}) \end{split} \tag{40}$$

For a further reduction of the variance, we may consider more general estimators of the form:

$$\widetilde{d}_k = \frac{1}{k} \left(\hat{d}_1 + \sum_{j=2}^k \frac{1}{j} \left(\hat{d}_{j1} + \dots + \hat{d}_{jj}\right) \right) \tag{41}$$

which are based on *k* partitions. The next section examines whether this possible reduction actually materializes and whether it is accompanied by an increase in the bias. All computations are carried out with the free statistical software R (R Core Team 2018).
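A minimal R sketch of the partition estimator (41), reusing the gph function from Section 2.1, might look as follows; the frequency range (0, ω*K*] is held fixed, so the subseries of the *j*-th partition each use about *K*/*j* Fourier frequencies, and all names are illustrative.

```r
# Sketch of the partition estimator (41): average GPH estimates over k partitions.
d_tilde <- function(y, K, k = 2) {
  n <- length(y)
  level <- numeric(k)
  for (j in 1:k) {
    len <- floor(n / j)                      # length of each of the j subseries
    Kj  <- floor(K / j)                      # Fourier frequencies per subseries
    d.j <- sapply(1:j, function(s) gph(y[((s - 1) * len + 1):(s * len)], Kj)$d)
    level[j] <- mean(d.j)                    # (d_hat_j1 + ... + d_hat_jj) / j
  }
  mean(level)                                # (1/k) * sum over the k partition levels
}

set.seed(2)
d_tilde(rnorm(390), K = 20, k = 5)
```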

**Table 1.** Sample correlations between *L*<sup>1</sup>*j*, *j* = 1, ... , 20, and *L*<sup>21</sup>*k*, *k* = 1, ... , 10, obtained from 10,000,000 realizations of Gaussian white noise (*n* = 400).
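The type of correlations reported in Table 1 can be approximated on a much smaller scale with a sketch like the following (Gaussian white noise, far fewer replications than the 10,000,000 used for the table):

```r
# Correlations between the log periodogram of the full sample (L^1_j) and that
# of its first half (L^21_k); small-scale illustration of Table 1.
set.seed(3)
n <- 400
sims <- replicate(20000, {
  y  <- rnorm(n)
  I1 <- Mod(fft(y))^2 / (2 * pi * n)
  I2 <- Mod(fft(y[1:(n / 2)]))^2 / (2 * pi * (n / 2))
  c(log(I1[2:21]), log(I2[2:11]))            # L^1_j, j = 1..20, and L^21_k, k = 1..10
})
C <- cor(t(sims))[1:20, 21:30]
round(c(rho0 = C[2, 1], rho1 = C[1, 1]), 2)  # 2k = j and |2k - j| = 1, cf. (39)
```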


#### **3. Simulations**

In this section, we compare the new estimator *d̃k* (41) for *k* = 2, 3, 5, 10 with Geweke and Porter-Hudak's (1983) estimator *d̂GPH*, which is based on the log periodogram regression (5), and the estimators *d̂sm* and *d̂*<sup>β</sup>*smP*, which are obtained by replacing the periodogram ordinates in (5) by simple moving averages of neighboring periodogram ordinates and lag-window estimates of the form (30) with truncation lags *m* = [*n*<sup>β</sup>], β = 0.5, 0.7, 0.9, 1, respectively. In the latter case, the Parzen window is used, which is given by:

$$w(z) = \begin{cases} 1 - 6z^2 + 6|z|^3, & |z| < \frac{1}{2}, \\ 2(1 - |z|)^3, & \frac{1}{2} \le |z| \le 1. \end{cases} \tag{42}$$
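A sketch of the lag-window version *d̂*<sup>β</sup>*smP*, i.e., the estimate (30) with the Parzen window (42) plugged into the log periodogram regression, could look as follows (illustrative names, not the authors' code):

```r
# Smoothed log periodogram regression with a Parzen lag-window estimate (30)/(42).
parzen <- function(z) {
  az <- abs(z)
  ifelse(az <= 0.5, 1 - 6 * z^2 + 6 * az^3, ifelse(az <= 1, 2 * (1 - az)^3, 0))
}

gph_parzen <- function(y, K, beta = 0.7) {
  n  <- length(y)
  m  <- min(floor(n^beta), n - 1)                    # truncation lag
  g  <- acf(y, lag.max = m, type = "covariance", plot = FALSE)$acf[, 1, 1]
  wj <- 2 * pi * (1:K) / n
  fhat <- sapply(wj, function(w)                     # lag-window estimate (30)
    (g[1] + 2 * sum(parzen((1:m) / m) * g[-1] * cos(w * (1:m)))) / (2 * pi))
  xj <- -2 * log(sin(wj / 2))
  unname(coef(lm(log(fhat) ~ xj))[2])
}

set.seed(4)
gph_parzen(rnorm(390), K = 20, beta = 0.7)
```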

With a view to the later application of the estimators to 1-min intraday returns in Section 4, the sample size *n* = 390 is chosen for our simulation study because there are 390 min in a regular

trading session for U.S. stocks, which starts at 9:30 a.m. and ends at 4:00 p.m. The number *K* of Fourier frequencies included in the log periodogram regression is defined by setting *K* = 20 ≈ [*n*<sup>α</sup>], α = 0.5. For *k* = 2, the first *K*/*k* = 10 Fourier frequencies of the two disjoint subseries of length *n*/*k* = 195 are given by ω2, ω4, ... , ω*K*, and for *k* = 10, the first *K*/*k* = 2 Fourier frequencies of the 10 disjoint subseries of length *n*/*k* = 39 are given by ω10, ω*K*. Clearly, we cannot go beyond *k* = 10 because at least two frequencies are required to carry out the log periodogram regression. Additionally, using frequencies outside the interval (0, ω*K*] is not an option because this would amount to an unfair advantage, particularly when there are no short-term dependencies which have to be taken into account.

With the help of the R-package 'fracdiff', 10,000 realizations of length *n* = 390 of ARFIMA(1,*d*,0) processes with standard normal innovations and parameter values *d* = −0.25, −0.1, 0, 0.1, 0.25 and φ1 = −0.25, −0.1, 0, 0.1, 0.25, respectively, are generated using a burn-in period of 10,000. For each realization, the estimators *d̂GPH*, *d̂sm*, *d̂*<sup>β</sup>*smP*, β = 0.5, 0.7, 0.9, 1, and *d̃k*, *k* = 2, 3, 5, 10, are employed for the estimation of the memory parameter *d*. The competing estimators are compared with respect to bias (Table 2), variance (Table 3), and RMSE (Table 4). Table 3 shows that *d̃*2 has indeed a smaller variance than *d̂GPH* = *d̃*1. The variance keeps decreasing as the number of partitions increases from two to 10. Table 2 shows that this improvement does, in general, not come at the cost of a greater bias. In contrast, the reduction in the variance achieved in the case of the estimator *d̂*<sup>β</sup>*smP* by increasing the degree of smoothing from β = 0.9 to β = 0.5 is for *d* ≠ 0 accompanied by a dramatic increase in the bias. Overall, in terms of the RMSE, the best results are obtained with *d̂*<sup>0.5</sup>*smP* for small values of *d* and with *d̂*<sup>0.7</sup>*smP* for larger values of *d*. However, this is only relevant in the standard case where only a single time series is available. When a large number of time series are examined simultaneously (as in the empirical study of Section 4), the bias is the decisive factor and the new estimators *d̃k* are therefore more appropriate than the conventional estimators *d̂*<sup>β</sup>*smP*.
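The design of this Monte Carlo experiment can be sketched as follows; fracdiff.sim from the 'fracdiff' package generates the ARFIMA(1,*d*,0) series as in the article, while the estimator functions from the earlier sketches and the small replication count are illustrative.

```r
# Sketch of the simulation design of Section 3 (much smaller replication count).
library(fracdiff)

set.seed(5)
n <- 390; K <- 20; reps <- 500; d.true <- 0.25; phi <- 0.1
est <- replicate(reps, {
  y <- fracdiff.sim(n, ar = phi, d = d.true, n.start = 10000)$series  # burn-in
  c(gph = gph(y, K)$d, smP07 = gph_parzen(y, K, 0.7), part5 = d_tilde(y, K, 5))
})
rowMeans(est) - d.true              # bias     (cf. Table 2)
apply(est, 1, var)                  # variance (cf. Table 3)
sqrt(rowMeans((est - d.true)^2))    # RMSE     (cf. Table 4)
```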

Since values of β such as 0.5, 0.7, or 0.9 are usually chosen to minimize the MSE for a single sample, we may suspect that the estimator *d̂*<sup>β</sup>*smP* becomes more competitive in the case of multiple samples when the averaging is taken into account. This can be done by further reducing the degree of smoothing. Unfortunately, there is a limit to what can be achieved by increasing the value of β. Table 2 shows that large biases are still obtained with the maximum possible value of β, i.e., β = 1. This is due to the fact that global smoothing inevitably causes local distortions and cutting off higher-order sample autocovariances is not the only source of smoothing. Downweighting the sample autocovariances with the Parzen window also has a strong smoothing effect, even when all sample autocovariances are used.

**Table 2.** Bias of the estimators *d̂GPH* (log periodogram regression), *d̂sm* (simple smoothing), *d̂*<sup>β</sup>*smP*, β = 1, 0.9, 0.7, 0.5 (smoothing with Parzen window and truncation lag *m* = [*n*<sup>β</sup>]), and *d̃k*, *k* = 2, 3, 5, 10 (*k* partitions) obtained from 10,000 realizations (length: *n* = 390, number of used Fourier frequencies: *K* = 20) of Gaussian ARFIMA(1,*d*,0) processes with *d* = −0.25, −0.1, 0, 0.1, 0.25 and φ1 = −0.25, −0.1, 0, 0.1, 0.25.




**Table 3.** Variance of the estimators *d̂GPH* (log periodogram regression), *d̂sm* (simple smoothing), *d̂*<sup>β</sup>*smP*, β = 1, 0.9, 0.7, 0.5 (smoothing with Parzen window and truncation lag *m* = [*n*<sup>β</sup>]), and *d̃k*, *k* = 2, 3, 5, 10 (*k* partitions) obtained from 10,000 realizations (length: *n* = 390, number of used Fourier frequencies: *K* = 20) of Gaussian ARFIMA(1,*d*,0) processes with *d* = −0.25, −0.1, 0, 0.1, 0.25 and φ1 = −0.25, −0.1, 0, 0.1, 0.25.


**Table 4.** RMSE of the estimators *d̂GPH* (log periodogram regression), *d̂sm* (simple smoothing), *d̂*<sup>β</sup>*smP*, β = 1, 0.9, 0.7, 0.5 (smoothing with Parzen window and truncation lag *m* = [*n*<sup>β</sup>]), and *d̃k*, *k* = 2, 3, 5, 10 (*k* partitions) obtained from 10,000 realizations (length: *n* = 390, number of used Fourier frequencies: *K* = 20) of Gaussian ARFIMA(1,*d*,0) processes with *d* = −0.25, −0.1, 0, 0.1, 0.25 and φ1 = −0.25, −0.1, 0, 0.1, 0.25.


#### **4. Empirical Results**

In this section, we employ the estimators discussed in the previous sections for the search of possible long-range dependencies in intraday returns and absolute intraday returns. For this purpose, the limit order book data from 27 June 2007 to 30 April 2019 (2980 trading days) of the iShares Core S&P 500 ETF (IVV) are downloaded from Lobster (https://lobsterdata.com). In the process of data cleaning, 27 early-closure days (the day before Independence Day, the day after Thanksgiving, and Christmas Eve) are removed as well as 9 January 2019 because of a large number of missing values. For each of the remaining days, the first mid-quotes (midpoints of the best bid and ask quotes) in each minute and the last mid-quote in the last minute are computed and subsequently used to obtain 1-min log returns. Finally, another three days are omitted because of extreme returns, namely 19 September 2008, 6 May 2010, and 24 August 2015, which leaves 2949 days for our analysis. Estimates are computed for each day, divided by the number of days, and plotted cumulatively; hence, the last values correspond to the averages of the estimates. The validity of these values is reinforced by the striking linearity of the curves. This linearity also implies that the possible long-range dependence is not changing over time; hence, there appears to be no such thing as fractal dynamics. Figure 1a suggests that *d* is close to zero in case of the 1-min log returns. The large negative values obtained with *d̂*<sup>0.9</sup>*smP* and *d̂*<sup>0.7</sup>*smP* as well as the comparatively inconspicuous values obtained with *d̂*<sup>0.5</sup>*smP* can be explained with the help of the results of our simulation study. According to Table 2, they are indicative of *d* = 0. In contrast, there is strong evidence of long-range dependence in the volatility (see Figure 1b). Most estimators suggest

that the memory parameter *d* is approximately in the range between 0.3 and 0.4. Only the estimator *d̂*<sup>0.5</sup>*smP*, which is severely downward biased in case of positive *d* (see Table 2), favors a smaller value.
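The per-day estimation and the cumulative averaging behind Figure 1 can be sketched as follows; the Lobster data are not reproduced here, so a 390 × (number of days) matrix ret of 1-min log returns is assumed, and only a single estimator is shown.

```r
# Sketch of the day-by-day estimation and cumulative averaging used in Figure 1.
# 'ret' is assumed to hold one trading day of 1-min log returns per column.
cumulative_estimates <- function(ret, K = 20) {
  days  <- ncol(ret)
  d.day <- apply(ret, 2, function(y) gph(y, K)$d)   # one estimate per day
  cum   <- cumsum(d.day) / days                     # divided by the number of days
  plot(cum, type = "l", xlab = "trading day", ylab = "cumulative estimate")
  cum[days]                                         # last value = average estimate
}
```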

**Figure 1.** Cumulative plots of the estimates obtained by applying *d̂GPH* (blue), *d̂sm* (darkgreen), *d̂*<sup>1</sup>*smP* (green), *d̂*<sup>0.9</sup>*smP* (gold), *d̂*<sup>0.7</sup>*smP* (red), *d̂*<sup>0.5</sup>*smP* (orange), *d̃*2 (pink), *d̃*3 (magenta), *d̃*5 (turquoise), *d̃*10 (yellowgreen) to the (**a**) 1-min intraday log returns *rt*(*s*), *s* = 1, ... , 390, (**b**) absolute 1-min intraday log returns |*rt*(*s*)|, (**c**) *rt*(*s*) − *rt*−1(*s*), (**d**) |*rt*(*s*) − *rt*−1(*s*)|, (**e**) *rt*(*s*) − *rt*−5(*s*), (**f**) |*rt*(*s*) − *rt*−5(*s*)|.

Visual significance of the differences between certain estimates can be ascertained just by observing the large differences between the slopes of the corresponding lines in Figure 1 and noting the striking stability of these lines over time. However, we still might want to augment our visual analysis with a formal statistical test. A simple way to accomplish that is to calculate the difference between two estimates separately for each trading day and compare the number of positive differences to the number of negative differences (sign test). Not surprisingly, the resulting *p*-values are infinitesimal. For example, even in the case of the two neighboring lines corresponding to *d̂GPH* and *d̃*2 in Figure 1b, the *p*-value is less than 2.2 × 10<sup>−16</sup>. It is still less than 9.7 × 10<sup>−8</sup> when we omit most of the trading days and use only Wednesdays in order to ensure approximate independence of the subsamples. Note that there are 4 × 390 = 1560 1-min returns between the last 1-min return of some Wednesday

and the first 1-min return of the next Wednesday plus five overnight breaks and a whole weekend. Even for a relatively large value of the memory parameter such as *d* = 0.3, the autocorrelation of an ARFIMA(0,*d*,0) process at lag *j* = 1561 is quite small, i.e.,

$$\rho(j) = \frac{\Gamma(j+d)\Gamma(1-d)}{\Gamma(j-d+1)\Gamma(d)} = \prod\_{h=1}^{j} \frac{h-1+d}{h-d} \approx 0.023.\tag{43}$$
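This value can be verified directly in R:

```r
# Autocorrelation (43) of an ARFIMA(0,d,0) process at lag j = 1561 for d = 0.3.
d <- 0.3; j <- 1561
exp(lgamma(j + d) + lgamma(1 - d) - lgamma(j - d + 1) - lgamma(d))  # approx. 0.023
prod((1:j - 1 + d) / (1:j - d))                                     # same, via the product form
```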

Finally, in order to check the robustness of our findings against daily and weekly periodic patterns, we repeat the graphical analyses with suitably transformed data. Replacing the 1-min log returns *rt*(*s*), *s* = 1, ... , 390, by the daily differences *rt*(*s*) − *rt*−1(*s*) and the weekly differences *rt*(*s*) − *rt*−5(*s*), respectively, ensures that any daily or weekly periodic patterns are erased while long-range dependencies remain unaffected. Figure 1c,e are very similar to Figure 1a, which shows that the insights gained from Figure 1a are genuine. Analogously, comparing Figure 1d,f with Figure 1b, we see that the same is true for the absolute returns.

#### **5. Discussion**

In this paper, we have introduced a new estimator for the memory parameter *d*, which is based on running a log periodogram regression repeatedly for different partitions of the data. In contrast to conventional smoothing methods, which manage to achieve a reduction in the variance at the expense of an increase in the bias, our approach does not systematically have a negative impact on the bias, which makes it particularly useful for applications where the bias is the decisive factor. For example, intraday returns are usually only available during trading hours and estimation must therefore be carried out separately for each trading day. When the individual estimates are eventually combined by averaging, the variance decreases as the sample size increases, but the bias does not change. The results of an extensive simulation study confirm the good performance of the new estimator. It outperforms all of its competitors when both bias and variance are taken into account, but the bias is weighted more heavily.

The importance of results obtained with the help of simulations is due to the fact that reliable inference on the memory parameter *d* is not possible under general conditions. Some asymptotic results can be obtained under very restrictive conditions though. Unfortunately, convergence is typically very slow (recall the discussion in Section 2.2). Indeed, Pötscher (2002) showed that many common estimation problems in statistics and econometrics, which include the estimation of *d*, are ill-posed in the sense that the minimax risk is bounded from below by a positive constant independent of *n* and does, therefore, not converge to zero as *<sup>n</sup>* → ∞. In particular, he found that for any estimator <sup>ˆ</sup> *dn* for *d* based on a sample of size *n* from a Gaussian process with spectral density *f*:

$$\sup\_{f \in \mathcal{F}} E\left|\hat{d}\_n - d\right|^r \ge \frac{1}{2^r} > 0,\tag{44}$$

where 1 ≤ *r* < ∞ and F is the set of all ARFIMA spectral densities (*p* ≥ 0, *q* ≥ 0), ARFI spectral densities (*p* ≥ 0, *q* = 0), or FIMA spectral densities (*p* = 0, *q* ≥ 0). Furthermore, he showed that for every *f*<sup>0</sup> ∈ F , (44) holds also "locally," when the supremum is taken over an arbitrarily small *L*1-neighborhood of *f*0. Finally, he established that confidence intervals for *d* coincide with the entire parameter space for *d* with high probability and are therefore uninformative. Nevertheless, it may be possible to formally derive the statistical properties of our new estimator for a rather narrow class of processes such as low order ARFI processes. However, this is left for future research. The current paper provides just a proof of concept.

In our empirical investigation of high-frequency data of an index ETF, we have applied the competing estimators to 1-min log returns and absolute 1-min log returns separately for each day. The results are quite stable over time and across estimation methods. The few deviations are due to conventional smoothing methods and can easily be explained by the size of their bias as shown

in Table 2. We may, therefore, safely conclude that significant long-range dependencies are present only in the intraday volatility but not in the intraday returns. These findings are genuine and not just due to daily or weekly periodic patterns because similar results are obtained when daily and weekly differences are investigated instead of the original intraday returns.

**Author Contributions:** Both authors contributed equally to the paper. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research received no external funding.

**Acknowledgments:** The authors thank the academic editor and three anonymous reviewers for helpful comments and suggestions. Open Access Funding by the University of Vienna.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

### *Article* **Regularized Maximum Diversification Investment Strategy †**

**N'Golo Koné**

Department of Economics, Queen's University, 94 University Avenue Kingston, Kingston, ON K7L 3N6, Canada; ngk1@queensu.ca; Tel.: +1-514-652-1005

† I am greatly indebted to Marine Carrasco for her invaluable guidance. I thank Georges Dionne,

Tony S. Wirjanto, Jade Wei, and two anonymous referees for their helpful comments.

**Abstract:** The maximum diversification portfolio has been shown in the literature to depend on the vector of asset volatilities and the inverse of the covariance matrix of asset returns. In practice, these two quantities need to be replaced by their sample statistics. The estimation error associated with the use of these sample statistics may be amplified due to (near) singularity of the covariance matrix, in financial markets with many assets. This, in turn, may lead to the selection of portfolios that are far from optimal with respect to standard portfolio performance measures of the financial market. To address this problem, we investigate three regularization techniques, including the ridge, the spectral cut-off, and the Landweber–Fridman approaches, in order to stabilize the inverse of the covariance matrix. These regularization schemes involve a tuning parameter that needs to be chosen. In light of this fact, we propose a data-driven method for selecting the tuning parameter. We show that the selected portfolio by regularization is asymptotically efficient with respect to the diversification ratio. In empirical and Monte Carlo experiments, the resulting regularized rules are compared to several strategies, such as the most diversified portfolio, the target portfolio, the global minimum variance portfolio, and the naive 1/N strategy in terms of in-sample and out-of-sample Sharpe ratio performance, and it is shown that our method yields significant Sharpe ratio improvements.

**Keywords:** portfolio selection; maximum diversification; regularization

**JEL Classification:** G11; C16; C52

#### **1. Introduction**

Since the seminal work of Markowitz (1952), which offers an essential basis for portfolio selection, diversification issues have been at the center of many problems in the financial market. According to Markowitz's portfolio theory, a portfolio is diversified if its variance could not be reduced any further at the same level of expected return. The fundamental objective of this diversification is to construct a portfolio with various assets that earns the highest return for the least volatility and that may be a good alternative to market cap-weighted portfolios. In fact, there is evidence that market portfolios are not as efficient as assumed by Sharpe (1964) in his Capital Asset Pricing Model (CAPM). The CAPM model as introduced by Sharpe (1964) implies that the tangency portfolio is the only efficient one and should produce the greatest returns relative to risk. Nonetheless, several empirical studies have shown that investing in the minimum variance portfolio yields better out-of-sample results than does an investment in the tangency portfolio (see Haugen and Baker 1991; Choueifaty et al. 2013; Lohre et al. 2014).

Even if these surprising results seem to be due to the high estimation risk associated with the expected returns (according to Kempf and Memmel (2006)), the efficiency of the market capitalization-weighted index has been questioned, motivating numerous investment alternatives (see Arnott et al. (2005); Clarke et al. (2006); Maillard et al. (2010)). Subsequently, Choueifaty (2011) introduced the concept of maximum diversification, via


a formal definition of portfolio diversification: the diversification ratio (DR) and claimed that portfolios with maximal DRs were maximally diversified and provided an efficient alternative to market cap-weighted portfolios.

This optimal maximum diversification portfolio is shown to be a function of the inverse of the covariance matrix of asset returns (see Theron and Van Vuuren 2018), which is unknown and needs to be estimated. Solving for the maximum diversification portfolio requires estimating the covariance matrix of returns and taking its inverse. This results in estimation error, amplified due to (near) singularity of the covariance matrix, in financial markets with many assets. This, in turn, may lead to the selection of portfolios that are far from optimal with respect to standard portfolio performance measures of the financial market. Therefore, Choueifaty et al. (2013) propose the most diversified portfolio (MDP) by imposing a non-negative constraint on the maximum diversification problem<sup>1</sup>. However, this ad hoc constraint suggests that the MDP is unlikely to represent the final word on diversification. Without the ability to short securities, it may be impossible to unlock the full range of uncorrelated risk sources present in the market (see Maguire et al. 2014). Therefore, this paper proposes a more general method to control for estimation error in the covariance matrix of asset returns without restricting the ability to short sell in the financial market. This method is fundamentally based on different ways to stabilize the inverse of the covariance matrix, which is particularly useful when the number of assets in the financial market increases considerably compared with the estimation window. More precisely, as in Carrasco (2012) and Carrasco and Tchuente (2015), we investigate three regularization techniques including the spectral cut-off, the Tikhonov, and Landweber–Fridman approaches in order to stabilize the inverse of the covariance matrix. This procedure has been used by Carrasco et al. (2019) to stabilize the inverse of the covariance matrix in the mean-variance portfolio.

These regularization schemes involve a tuning parameter that needs to be chosen. Therefore, we propose a data-driven method for selecting the tuning parameter that minimizes the distance between the inverse of the estimated covariance matrix and the inverse of the population covariance matrix.

We show, under appropriate regularity conditions, that the selected strategy by regularization is asymptotically efficient with respect to the diversification ratio for a wide choice of the tuning parameter. This means that, even if the optimal diversified portfolio is unknown, there exists a feasible portfolio obtained by regularization capable of reaching a similar level of performance in terms of the diversification ratio.

To evaluate the performance of our procedures, we implement a simulation exercise based on a three-factor model calibrated on real data from the US financial market. We find by simulation that our procedure significantly improves the performance of the proposed strategy with respect to the Sharpe ratio. Moreover, the regularized rules are compared to several strategies such as the most diversified portfolio, the target portfolio, the global minimum variance portfolio, and the naive 1/N strategy in terms of in-sample and out-of-sample Sharpe ratio, and it is shown that our method yields significant Sharpe ratio improvements. To confirm our simulations, we conduct an empirical analysis using Kenneth R. French's 30-industry portfolios, 100 portfolios formed on size and book-to-market, and a subset of the S&P500 index constituents. The empirical results show that by stabilizing the inverse of the covariance matrix in the maximum diversification portfolio, we considerably improve the performance of the selected strategy in terms of maximizing the Sharpe ratio.

The main finding of this paper is that by stabilizing the inverse of the covariance matrix in the maximum diversification portfolio, we considerably improve the performance of the selected portfolio with respect to several statistics in the financial market including the diversification ratio, the Sharpe ratio, and the rebalancing costs (turnover) as shown by extensive simulations and empirical study. Therefore, our methods are highly

<sup>1</sup> The objective is to reduce the effect of estimation error on the performance of selected maximum diversification portfolio.

recommended for investors in the sense that these procedures help them to select very effective strategies with lower rebalancing cost.

The rest of the paper is organized as follows. Section 2 presents the economy. The regularized portfolio is presented in Section 3. Section 4 gives some asymptotic properties of the selected strategy and proposes a data-driven method to select the tuning parameter. Section 5 presents some simulation results and an empirical study. Section 6 concludes the paper.

#### **2. The Model**

We consider a simple economy with *N* risky assets with random return vector *Rt*+1 and a risk-free asset. Let us denote by *Rf* the gross return on this risk-free asset. Empirically, with monthly data, *Rf* is calibrated to be the mean of the one-month Treasury-Bill (T-B) rate observed in the data. The number of risky assets in our economy, *N*, is assumed to be large for diversification purposes.

We assume that the excess returns *rt*+1 = *Rt*+1 − *Rf* 1*N* are independent and identically distributed with the mean and the covariance matrix given by *μ* and Σ = (*σi*,*j*)*i*,*j*∈*N*, respectively. Let us denote by *ω* = (*ω*1, ..., *ωN*)′ the vector of portfolio weights that represents the amount of capital to be invested in the risky assets; the remainder 1 − *ω*′1*N* is allocated to the risk-free asset. Short-selling is allowed in the financial market, i.e., some of the weights *ωi* could be negative. Let us denote by *σ* = (*σ*1,1, ..., *σN*,*N*)′ the vector of asset volatilities.

According to Choueifaty (2011), the diversification ratio (DR) of any portfolio *ω* is given by

$$DR(\omega) = \frac{\omega^{'}\sigma}{\sqrt{\omega^{'}\Sigma\omega}}\tag{1}$$

which is the ratio of the weighted average of asset volatilities to the portfolio volatility.

Using the relation in Equation (1), the maximum diversification portfolio is obtained by solving the following optimization problem

$$\max\_{\omega} DR(\omega). \tag{2}$$

As the DR is invariant under scalar multiplication (see, for instance, Choueifaty et al. (2013)), solving the problem in Equation (2) is equivalent to solving the following new problem according to Theron and Van Vuuren (2018)

$$\min\_{\omega' \sigma = 1} \frac{1}{2} \omega' \Sigma \omega. \tag{3}$$

This new optimization problem is very close to the global minimum variance portfolio. The only difference is that the constraint *ω*′1 = 1 in the global minimum variance problem is replaced by *ω*′*σ* = 1. The solution of this new optimization problem is given by

$$
\omega = \frac{\Sigma^{-1}\sigma}{\sigma^{\prime}\Sigma^{-1}\sigma} = \left(\sigma^{\prime}\Sigma^{-1}\sigma\right)^{-1} \left(\Sigma^{-1}\sigma\right). \tag{4}
$$
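Setting aside for the moment the estimation issues discussed next, a minimal R sketch shows how the weights (4) and the diversification ratio (1) could be computed from sample estimates; the simulated return matrix stands in for real data and all names are illustrative.

```r
# Sketch: maximum diversification weights (4) and diversification ratio (1).
set.seed(1)
Tobs <- 120; N <- 10
r <- matrix(rnorm(Tobs * N, sd = rep(seq(0.01, 0.05, length.out = N), each = Tobs)),
            Tobs, N)                              # simulated excess returns
Sigma <- cov(r)
sigma <- sqrt(diag(Sigma))                        # vector of asset volatilities
w <- solve(Sigma, sigma)                          # Sigma^{-1} sigma
w <- w / drop(crossprod(sigma, w))                # divide by sigma' Sigma^{-1} sigma
DR <- function(w, sigma, Sigma) drop(crossprod(w, sigma)) / sqrt(drop(t(w) %*% Sigma %*% w))
DR(w, sigma, Sigma)                               # maximal diversification ratio
```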

The solution in (4) is unknown because it depends on the covariance matrix of asset returns and the vector of volatilities, which are unknown and need to be estimated from the available data set. We need, in particular, to estimate the covariance matrix and take its inverse. The sample covariance may not be appropriate because it may be nearly singular, and sometimes not invertible. The issue of an ill-conditioned covariance matrix must be

addressed because inverting such a matrix dramatically increases the estimation error and thus makes the maximum diversification portfolio unreliable. Many techniques have been proposed in the literature to stabilize the inverse of the covariance matrix in the solution in (4). According to Carrasco et al. (2007), an interesting way to stabilize the inverse of the covariance matrix consists of dampening the explosive effect of the inversion of the singular values of Σ̂. It consists in replacing the sequence {1/*λj*} of explosive inverse singular values by a sequence {*q*(*α*, *λj*)/*λj*}, where the damping function *q*(*α*, *λ*) is chosen such that


where *α* is the regularization parameter. The damping function is specific to each regularization.

In this paper, we propose a consistent way to estimate the solution in (4) using three regularization schemes based on three different ways of inverting the covariance matrix of asset returns. These regularization techniques are the spectral cut-off, the Tikhonov, and the Landweber–Fridman. The spectral cut-off regularization scheme is based on principal components whereas the Tikhonov's one is based on Ridge regression (also called Bayesian shrinkage) and the last one is an iterative method.

#### **3. The Regularized Portfolio**

The regularization methods used in this paper are drawn from the literature on inverse problems (see Kress (1999)). They are designed to stabilize the inverse of Hilbert–Schmidt operators (operators for which the eigenvalues are square summable). These regularization techniques will be applied to the sample covariance matrix of asset returns to stabilize its inverse in the selected portfolio.

Let λ̂²1 ≥ λ̂²2 ≥ ... ≥ λ̂²*N* ≥ 0 be the eigenvalues of the sample covariance matrix Σ̂. By spectral decomposition, we have that Σ̂ = *PDP*′ with *PP*′ = *IN*, where *P* is the matrix of eigenvectors and *D* the diagonal matrix with the eigenvalues λ̂²*j* on the diagonal. Furthermore, let Σ̂<sup>α</sup> be the regularized inverse of Σ̂:

$$
\hat{\Sigma}^{\alpha} = P D^{\alpha} P^{\prime}
$$

where *D*<sup>α</sup> is the diagonal matrix with elements *q*(*α*, λ̂²*j*)/λ̂²*j*. The positive parameter *α* is the regularization parameter, a kind of smoothing parameter which is unknown and needs to be selected. *q*(*α*, λ̂²*j*) is the damping function that depends on the regularization scheme used.

#### *3.1. Tikhonov Regularization (TH)*

This regularization scheme is close to the well-known ridge regression used in the presence of multicollinearity to improve the properties of OLS estimators. In Tikhonov's regularization scheme, the real function *q*(*α*, λ̂²*j*) is given by

$$q(\alpha, \hat{\lambda}_j^2) = \frac{\hat{\lambda}_j^2}{\hat{\lambda}_j^2 + \alpha}$$

#### *3.2. The Spectral Cut-Off (SC)*

It consists in selecting the eigenvectors associated with the eigenvalues greater than some threshold.

$$q(\alpha, \hat{\lambda}_j^2) = I\left\{\hat{\lambda}_j^2 \geq \alpha\right\}$$

The explosive influence of the factor $1/\hat{\lambda}_j^2$ is filtered out by imposing $q(\alpha, \hat{\lambda}_j^2) = 0$ for small $\hat{\lambda}_j^2$, that is, for $\hat{\lambda}_j^2 < \alpha$. Here, $\alpha$ is a positive regularization parameter such that no bias is introduced when $\hat{\lambda}_j^2$ exceeds the threshold $\alpha$. Another version of this regularization scheme is the principal components (PC) scheme, which consists in using a certain number of eigenvectors to compute the inverse of the operator. The PC and the SC are perfectly equivalent; only the definition of the regularization term $\alpha$ differs. In the PC, $\alpha$ is the number of principal components. In practice, both methods give the same estimator.

#### *3.3. Landweber–Fridman Regularization (LF)*

In this regularization scheme, $\hat{\Sigma}^{\alpha}$ is computed by an iterative procedure with the formula

$$\begin{cases} \hat{\Sigma}^{\alpha}_{l} = \left(I_N - c\hat{\Sigma}\right)\hat{\Sigma}^{\alpha}_{l-1} + cI_N, & l = 1, 2, \dots, 1/\alpha - 1, \\ \hat{\Sigma}^{\alpha}_{0} = cI_N. \end{cases}$$

The constant $c$ must satisfy $0 < c < 1/\hat{\lambda}_1^2$. Alternatively, we can compute this regularized inverse with

$$q(\alpha, \hat{\lambda}\_j^2) = 1 - \left(1 - c\hat{\lambda}\_j^2\right)^{\frac{1}{\alpha}}$$

The basic idea behind this procedure is similar to the spectral cut-off method but with a smooth bias function.
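For concreteness, the three damping functions can be written as plain Python functions that plug into the `regularized_inverse` sketch above (again an illustration under our own naming conventions; the Landweber–Fridman constant $c$ must satisfy $0 < c < 1/\hat{\lambda}_1^2$ and is supplied by the user):

```python
def q_tikhonov(alpha, lam2):
    # Tikhonov / ridge damping: lambda^2 / (lambda^2 + alpha)
    return lam2 / (lam2 + alpha)

def q_spectral_cutoff(alpha, lam2):
    # Spectral cut-off: keep only eigenvalues above the threshold alpha
    return 1.0 if lam2 >= alpha else 0.0

def make_q_landweber_fridman(c):
    # Landweber-Fridman damping: 1 - (1 - c*lambda^2)^(1/alpha), with 0 < c < 1/lambda_1^2
    def q_lf(alpha, lam2):
        return 1.0 - (1.0 - c * lam2) ** (1.0 / alpha)
    return q_lf
```

Any of these can be passed as the `damping` argument of the earlier sketch.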

See Carrasco et al. (2007) for more details about these regularization techniques. The regularized diversified portfolio for a given regularization scheme is

$$
\hat{\omega}_{\alpha} = \frac{\hat{\Sigma}^{\alpha}\hat{\sigma}}{\hat{\sigma}'\hat{\Sigma}^{\alpha}\hat{\sigma}} = \left(\hat{\sigma}'\hat{\Sigma}^{\alpha}\hat{\sigma}\right)^{-1}\hat{\Sigma}^{\alpha}\hat{\sigma}.\tag{5}
$$

This regularized portfolio depends on an unknown tuning parameter that needs to be selected through a data-driven method.
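A minimal sketch of Equation (5), reusing the `regularized_inverse` helper above and assuming a $T \times N$ array of excess returns (illustrative code rather than the author's implementation):

```python
import numpy as np

def regularized_diversified_weights(returns, alpha, damping):
    """Regularized maximum diversification weights of Equation (5)."""
    sigma_hat = np.cov(returns, rowvar=False)       # N x N sample covariance matrix
    vol_hat = np.sqrt(np.diag(sigma_hat))           # estimated vector of asset volatilities
    sigma_alpha = regularized_inverse(sigma_hat, alpha, damping)
    w = sigma_alpha @ vol_hat
    # denominator of Equation (5): sigma_hat' Sigma^alpha sigma_hat, with sigma_hat the volatility vector
    return w / (vol_hat @ sigma_alpha @ vol_hat)
```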

#### **4. Asymptotic Properties of the Selected Portfolio**

In this section, we will look at the efficiency of the regularized portfolio with respect to the diversification ratio. We will also propose a data-driven method to select the tuning parameter.

#### *4.1. Efficiency of the Regularized Diversified Portfolio*

To obtain the efficiency of the selected portfolio, we need to impose some regularity conditions, in particular we will need the following assumption.

#### **Assumption A:** $\frac{\Sigma}{N}$ is a trace class operator.

A trace class operator $K$ is a compact operator with a finite trace, i.e., $Tr(K) = O(1)$. This assumption is more realistic than assuming that $\Sigma$ is a Hilbert–Schmidt operator. Moreover, Carrasco et al. (2019) show that Assumption A holds when the returns are generated from a standard factor model.

Under Assumption A, the following proposition presents information about the asymptotic property of the diversification ratio associated with the selected portfolio.

**Proposition 1.** *Under Assumption A we have that*

$$DR(\hat{\omega}_{\alpha}) \to_p DR(\omega_t) \tag{6}$$

*if* $\frac{N}{\alpha\sqrt{T}} \to 0$ *as T goes to infinity.*

#### **Proof.** In Appendix A.

**Comment on Proposition 1.** The regularity condition behind Proposition 1 implies several things. First, $\alpha\sqrt{T} \to +\infty$ implies that the estimation window should go to infinity faster than the optimal tuning parameter goes to zero. Second, $\frac{N}{\alpha\sqrt{T}} \to 0$ implies that $\alpha\sqrt{T}$ should go to infinity faster than the number of assets in the financial market. Therefore, the number of assets should be limited asymptotically compared with the estimation window. Because the regularization parameter $\alpha$ lies in $(0, 1)$, the condition $\frac{N}{\alpha\sqrt{T}} \to 0$ requires in particular that $\frac{N}{\sqrt{T}} \to 0$. However, the condition $\frac{N}{\sqrt{T}} \to 0$ seems to be more restrictive than assuming that $\frac{N}{T} \to \text{Constant}$. One way to avoid this regularity condition would be to assume that the covariance matrix of asset returns is a trace class operator or that it is a Hilbert–Schmidt operator; these assumptions seem to be more restrictive than assuming that $\frac{N}{\sqrt{T}} \to 0$, which seems to be close to reality asymptotically. Moreover, $\frac{N}{\sqrt{T}} \to 0$ is only an asymptotic assumption, and we do not need $\frac{N}{\sqrt{T}}$ to be close to zero in practice to obtain good performance with the regularized portfolio. In particular, in finite samples, $N$ could be larger than $T$ or close to $T$. Proposition 1 shows that the regularized diversified portfolio is asymptotically efficient in terms of the diversification ratio for a wide choice of the tuning parameter. This means that, even if the optimal diversified portfolio in Equation (4) is unknown, there exists a feasible portfolio obtained by regularization capable of reaching a similar level of performance in terms of the diversification ratio.

#### *4.2. Data-Driven Method for Selecting the Tuning Parameter*

We showed in the previous sections that the selected portfolio depends on a certain smoothing parameter $\alpha \in (0, 1)$. We derived the efficiency of the selected portfolio assuming that this tuning parameter is given. However, in practice, the regularization parameter is unknown and needs to be selected. Therefore, we propose a data-driven selection procedure to obtain an approximation of this parameter.

Our objective here is to select the tuning parameter that minimizes the distance between the inverse of the estimated covariance matrix and the inverse of the true covariance matrix. According to Ledoit and Wolf (2003), most of the existing shrinkage estimators from finite-sample statistical decision theory, as well as the one in Frost and Savarino (1986), break down when $N \geq T$ because their loss functions involve the inverse of the sample covariance matrix, which is singular in this situation. Therefore, to avoid this problem, they propose a loss function that does not depend on this inverse, namely a quadratic measure of distance between the true and the estimated covariance matrices based on the Frobenius norm. Unlike Ledoit and Wolf (2003), we will use a loss function that depends on the inverse of the covariance matrix, under the assumption that the true covariance matrix is invertible. One important thing to notice here is that the regularized covariance matrix is always invertible even if $N \geq T$, meaning that our loss function exists for $N \geq T$. In fact, we know that the optimal diversified portfolio given by Equation (4) depends on the inverse of the covariance matrix of asset returns. Moreover, because our objective is to stabilize the inverse of this covariance matrix in the estimated portfolio by regularization, we propose to use a loss function based on a quadratic distance between the regularized inverse and the inverse of the true covariance matrix.

The loss function we consider here is given by

$$\mu^{'}\left[\left(\hat{\Sigma}^{a} - \Sigma^{-1}\right)^{'}\Sigma\left(\hat{\Sigma}^{a} - \Sigma^{-1}\right)\right]\mu\tag{7}$$

where $\mu$ is the expected excess return. The choice of this specific quadratic distance is useful to obtain a criterion that can easily be approximated by a generalized cross-validation approach.

Therefore, the objective is to select the tuning parameter that minimizes

$$E\left\{\mu^{'}\left[\left(\hat{\Sigma}^{a}-\Sigma^{-1}\right)^{'}\Sigma\left(\hat{\Sigma}^{a}-\Sigma^{-1}\right)\right]\mu\right\}.\tag{8}$$

It implies that

$$\hat{\alpha} = \arg\min_{\alpha \in H_T} E\left\{\mu'\left[\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)'\Sigma\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)\right]\mu\right\}. \tag{9}$$

To obtain a better approximation of the tuning parameter based on a generalized cross-validation criterion, we need additional assumptions. Therefore, let us start with some useful notations. 

We denote by $\Omega = E(r_t r_t') = E(X'X)/T$ and $\beta = \Omega^{-1}\mu = E(X'X)^{-1}E(X'1_T)$, where $r_t$, $t = 1, \dots, T$, are the observations of the excess returns and $X$ is the $T \times N$ matrix with $t$th row given by $r_t'$.

#### **Assumption B**

For some *ν* > 0, we have that

$$\sum\_{j=1}^N \frac{<\beta, \phi\_j >^2}{\eta\_j^{2\nu}} < \infty$$

where $\phi_j$ and $\eta_j^2$ denote the eigenvectors and eigenvalues of $\frac{\Omega}{N}$.

The regularity condition in Assumption B can be found in Carrasco et al. (2007) and Carrasco (2012). Moreover, Carrasco et al. (2019) show that Assumption B holds if the returns are generated by a factor model. Assumption B is used in combination with Assumption A to derive the rate of convergence of the mean squared error of the OLS estimator of $\beta$. These two assumptions imply in particular that $\|\beta\|^2 < +\infty$, such that we have the following relations,

$$\left\|\beta - \beta_{\alpha}\right\|^2 = \begin{cases} O(\alpha^{\nu}) & \text{for SC, LF} \\ O\left(\alpha^{\min(\nu, 2)}\right) & \text{for TH} \end{cases}$$

where $\beta_{\alpha}$ is the regularized version of $\beta$.

The following result gives a convenient equivalent of the objective function, to which a cross-validation approximation procedure can easily be applied.

**Proposition 2.** *Under Assumptions A and B we have that*

$$E\left\{\mu'\left[\left(\hat{\Sigma}^a - \Sigma^{-1}\right)'\Sigma\left(\hat{\Sigma}^a - \Sigma^{-1}\right)\right]\mu\right\} \sim E\left\{\left(\hat{\Sigma}^a \hat{\mu} - \Sigma^{-1}\mu\right)'\Sigma\left(\hat{\Sigma}^a \hat{\mu} - \Sigma^{-1}\mu\right)\right\}$$

*if* $\frac{1}{\alpha^2 T} \to 0$ *and* $\sqrt{N}\alpha^{\min\left(\frac{\nu}{2}, 1\right)} \to 0$ *as T goes to infinity.*

**Proof.** In Appendix B.

We obtain the following corollary from this proposition.

**Corollary 1.** *Under Assumptions A and B we have that*

$$E\left\{\mu'\left[\left(\hat{\Sigma}^a - \Sigma^{-1}\right)'\Sigma\left(\hat{\Sigma}^a - \Sigma^{-1}\right)\right]\mu\right\} \sim \frac{1}{T}E\left\|X\left(\hat{\beta}\_a - \beta\right)\right\|^2 + \frac{\left(\mu'\left(\beta\_a - \beta\right)\right)^2}{\left(1 - \mu'\beta\right)}$$

*if* $\frac{1}{\alpha^2 T} \to 0$ *and* $\sqrt{N}\alpha^{\min\left(\frac{\nu}{2}, 1\right)} \to 0$ *as T goes to infinity.*

The result in Corollary 1 is obtained by using Proposition 2 combined with Proposition 1 in Carrasco et al. (2019).

From Corollary 1, it follows that minimizing $E\left\{\mu'\left[\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)'\Sigma\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)\right]\mu\right\}$ is equivalent to minimizing

$$\frac{1}{T}E\left\|X\left(\hat{\beta}_{\alpha}-\beta\right)\right\|^2\tag{10}$$

$$+\frac{\left(\mu'(\beta\_{\alpha}-\beta)\right)^{2}}{\left(1-\mu'\beta\right)}.\tag{11}$$

Terms (10) and (11) depend on the unknown *β*, and therefore need to be approximated. The approximation of these two quantities is borrowed from Carrasco et al. (2019). More precisely, the rescaled MSE

$$\frac{1}{T}E\left[\left\|X\left(\hat{\beta}_{\alpha}-\beta\right)\right\|^2\right]$$

can be approximated by the following generalized cross-validation criterion:

$$GCV(\boldsymbol{\alpha}) = \frac{1}{T} \frac{\left\| \left( I\_T - M\_T(\boldsymbol{\alpha}) \right) \mathbf{1}\_T \right\|^2}{\left( 1 - tr(M\_T(\boldsymbol{\alpha})) / T \right)^2}.$$

Using the fact that

$$
\hat{\mu}'(\beta_{\alpha} - \beta) = \frac{1_T'}{T}\left(M_T(\alpha) - I_T\right)X\beta_{\alpha},
$$

(11) can be estimated by plug-in:

$$\frac{\left(1_T'\left(M_T(\alpha) - I_T\right)X\hat{\beta}_{\tilde{\alpha}}\right)^2}{T^2\left(1 - \hat{\mu}'\hat{\beta}_{\tilde{\alpha}}\right)}\tag{12}$$

where $\hat{\beta}_{\tilde{\alpha}}$ is an estimator of $\beta$ obtained for some consistent $\tilde{\alpha}$ ($\tilde{\alpha}$ can be obtained by minimizing $GCV(\alpha)$).

The optimal value of $\alpha$ is then defined as

$$\hat{\alpha} = \arg\min_{\alpha \in H_T}\left\{GCV(\alpha) + \frac{\left(1_T'\left(M_T(\alpha) - I_T\right)X\hat{\beta}_{\tilde{\alpha}}\right)^2}{T^2\left(1 - \hat{\mu}'\hat{\beta}_{\tilde{\alpha}}\right)}\right\}$$

where $H_T = \{1, 2, \dots, T\}$ for the spectral cut-off and Landweber–Fridman schemes and $H_T = (0, 1)$ for ridge (Tikhonov).
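The following sketch illustrates one way to implement this selection rule. It assumes, as an interpretation of the notation above, that $M_T(\alpha) = X\hat{\Omega}^{\alpha}X'/T$ is the regularized hat matrix with $\hat{\Omega} = X'X/T$, that $\hat{\beta}_{\alpha} = \hat{\Omega}^{\alpha}X'1_T/T$, and that the candidate grid `alpha_grid` plays the role of $H_T$ (its interpretation depends on the scheme); these choices, like the function names, are ours and not taken verbatim from the paper.

```python
import numpy as np

def gcv(X, alpha, damping):
    """GCV(alpha) with M_T(alpha) = X Omega^alpha X' / T (assumed definition)."""
    T = X.shape[0]
    ones = np.ones(T)
    omega_alpha = regularized_inverse(X.T @ X / T, alpha, damping)
    M = X @ omega_alpha @ X.T / T
    resid = (np.eye(T) - M) @ ones
    return (resid @ resid / T) / (1.0 - np.trace(M) / T) ** 2

def select_alpha(X, damping, alpha_grid):
    """Two-step data-driven selection of alpha: GCV pilot value, then full criterion."""
    T = X.shape[0]
    ones = np.ones(T)
    omega_hat = X.T @ X / T
    mu_hat = X.T @ ones / T

    # pilot value minimizing GCV alone, used to form beta_tilde
    alpha_pilot = min(alpha_grid, key=lambda a: gcv(X, a, damping))
    beta_tilde = regularized_inverse(omega_hat, alpha_pilot, damping) @ mu_hat

    def criterion(alpha):
        omega_alpha = regularized_inverse(omega_hat, alpha, damping)
        M = X @ omega_alpha @ X.T / T
        plug_in = (ones @ (M - np.eye(T)) @ X @ beta_tilde) ** 2 \
                  / (T ** 2 * (1.0 - mu_hat @ beta_tilde))
        return gcv(X, alpha, damping) + plug_in

    return min(alpha_grid, key=criterion)
```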

#### **5. Simulations and Empirical Study**

We start this section with a simulation exercise to assess the performance of our procedure and compare it with existing methods. In particular, we compare our method to the most diversified portfolio proposed by Choueifaty and Coignard (2008). More precisely, we focus on how our procedure performs in terms of the Sharpe ratio and the diversification ratio. We end this section with an analysis of the out-of-sample performance of the selected portfolio.

#### *5.1. Data*

In our simulations and empirical analysis, various forms of monthly data will be used from July 1980 to June 2016. The one-month Treasury-Bill (T-Bill) rate is used as a proxy for the risk-free rate, and *Rf* is calibrated to be the mean of the one-month Treasury-Bill rate observed in the data. We use monthly returns of Fama–French three factors and of 30 industry portfolios from the Kenneth R. French data library in order to calibrate unknown parameters of the simulation model. In the empirical study, we also use monthly data for the 100 portfolios formed on size and book-to-market from the Kenneth R. French data Library and the CRSP monthly data for the S&P500 index constituents.

#### *5.2. Simulation*

We implement a simple simulation exercise to assess the performance of our procedure and compare it with the existing procedures. Let us consider for this purpose a simple economy with *N* ∈ {10, 20, 40, 60, 80, 90, 100} risky assets. We use several values of *N* to see how the size of the financial market (defined by the number of assets in the economy) could affect the performance of the selected strategy. Let *T* be the sample size used to estimate the unknown parameters in the investment process. Following Chen and Yuan (2016) and Carrasco et al. (2019), we simulate the excess returns at each simulation step from the following three-factor model for *i* = 1, ..., *N* and *t* = 1, ..., *T*

$$r\_{it} = b\_{i1}f\_{1t} + b\_{i2}f\_{2t} + b\_{i3}f\_{3t} + \varepsilon\_{it} \tag{13}$$

$f_t = (f_{1t}, f_{2t}, f_{3t})'$ is the vector of common factors, $b_i = (b_{i1}, b_{i2}, b_{i3})'$ is the vector of factor loadings associated with the $i$th asset, and $\varepsilon_{it}$ is the idiosyncratic component of $r_{it}$ satisfying $E(\varepsilon_{it} \mid f_t) = 0$. We assume that $f_t \sim N(\mu_f, \Sigma_f)$, where $\mu_f$ and $\Sigma_f$ are calibrated on the monthly data of the market portfolio and the Fama–French size and book-to-market portfolios from July 1980 to June 2016. Moreover, we assume that $b_i \sim N(\mu_b, \Sigma_b)$ with $\mu_b$ and $\Sigma_b$ calibrated using data on the 30 industry portfolios from July 1980 to June 2016. The idiosyncratic terms $\varepsilon_{it}$ are assumed to be normally distributed. The covariance matrix of the residual vector is assumed to be diagonal, $\Sigma_{\varepsilon} = \mathrm{diag}(\sigma_1^2, \dots, \sigma_N^2)$, with the diagonal elements drawn from a uniform distribution between 0.10 and 0.30 to yield an average cross-sectional volatility of 20%.

In compact form, (13) can be written as follows,

$$R = BF + \epsilon \tag{14}$$

where $B$ is an $N \times 3$ matrix whose $i$th row is $b_i'$. The covariance matrix of the vector of excess returns $r_t$ is given by

$$
\Sigma = B\Sigma_f B' + \Sigma_{\varepsilon}.
$$

The mean of the excess return is given by $\mu = B\mu_f$. The return on the risk-free asset $R_f$ is calibrated to be the mean of the one-month T-Bill rate observed in the data from July 1980 to June 2016.

The calibrated parameters used in our simulation process are given in Table 1. The gross return on the risk-free asset calibrated on the data is $R_f = 1.0036$. Once generated, the factor loadings are kept fixed over replications, while the factors differ across simulations and are drawn from a trivariate normal distribution.

**Table 1.** Calibrated parameters.
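A sketch of this simulation design is given below. The numbers used for $\mu_f$, $\Sigma_f$, $\mu_b$, and $\Sigma_b$ are placeholders (the calibrated values belong in Table 1), and the uniform draws are treated as idiosyncratic volatilities, consistent with the stated 20% average cross-sectional volatility; both choices are assumptions of the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T = 100, 120                                   # assets and estimation window

# Placeholder values standing in for the calibrated parameters of Table 1
mu_f = np.array([0.006, 0.002, 0.003])            # factor means (hypothetical)
Sigma_f = np.diag([0.0020, 0.0010, 0.0010])       # factor covariance (hypothetical)
mu_b = np.array([1.0, 0.3, 0.2])                  # mean factor loadings (hypothetical)
Sigma_b = 0.05 * np.eye(3)                        # covariance of loadings (hypothetical)

B = rng.multivariate_normal(mu_b, Sigma_b, size=N)     # N x 3 loadings, kept fixed over replications
sigma_eps = rng.uniform(0.10, 0.30, size=N)            # idiosyncratic volatilities, ~20% on average

F = rng.multivariate_normal(mu_f, Sigma_f, size=T)     # T x 3 factor draws
eps = rng.normal(0.0, sigma_eps, size=(T, N))          # idiosyncratic shocks
R = F @ B.T + eps                                      # T x N excess returns, Equation (13)

# Population moments implied by the factor model, used for the actual Sharpe ratio
mu = B @ mu_f
Sigma = B @ Sigma_f @ B.T + np.diag(sigma_eps ** 2)
```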


Let SR($\omega_t$) be the Sharpe ratio associated with the optimal portfolio $\omega_t$; then SR($\omega_t$) is given as follows,

$$SR(\omega_t) = \left[\mu'\Sigma^{-1}\mu\right]^{1/2}$$

To evaluate the performance of our procedure in terms of the Sharpe ratio, we focus on the actual Sharpe ratio associated with the selected portfolio. The actual Sharpe ratio at time point *t* is given by

$$SR(\hat{\omega}_t) = \frac{\hat{\omega}_t'\mu}{\left[\hat{\omega}_t'\Sigma\hat{\omega}_t\right]^{1/2}}$$

We consider the following portfolio selection procedures; a code sketch of the closed-form benchmarks is given after the list.

• The sample-based diversified portfolio (SbDP). This strategy is obtained using sample moments to estimate the unknown parameters in the maximum diversification portfolio.

$$SbDP = \frac{\hat{\Sigma}^{-1}\hat{\sigma}}{\hat{\sigma}'\hat{\Sigma}^{-1}\hat{\sigma}}$$

• The most diversified portfolio (MDP) proposed by Choueifaty et al. (2013). This strategy is obtained by solving the optimization problem in Equation (2) under the following constraint,

$\omega_i \geq 0$ for $i = 1, \dots, N$.

The closed form associated with this new optimization problem is given as follows,

$$MDP = \mathrm{diag}(\Sigma)^{-1} C^{-1}\mathbf{1}$$

where $\mathrm{diag}(\Sigma)$ is the diagonal matrix of asset volatilities, $C$ the correlation matrix, and $\mathbf{1}$ an $N \times 1$ vector of ones. The MDP is then estimated by replacing the unknown parameters by their empirical counterparts.

• The global minimum variance portfolio (GMVP) obtained by minimizing the variance of the return on the optimal selected portfolio. By solving this optimization problem, the following closed form is obtained,

$$GMVP = \frac{\Sigma^{-1}\mathbf{1}}{\mathbf{1}'\Sigma^{-1}\mathbf{1}}$$

This solution is then estimated by replacing the covariance matrix by the sample covariance matrix.


• The target portfolio (TgP), whose closed form is given by

$$TgP = \frac{\Sigma^{-1}\mu}{\mu'\Sigma^{-1}\mathbf{1}}$$

This portfolio is also estimated using sample moments such as the sample mean and the sample covariance matrix to estimate the unknown parameters.

• The linear factor-based shrinkage estimator proposed by Ledoit and Wolf (2003) (LWP). It consists of replacing the sample covariance matrix in the selected portfolio by an optimally weighted average of two existing estimators: the sample covariance matrix and the single-index covariance matrix. This method also involves a tuning parameter that is unknown and needs to be selected. The selection procedure proposed in Ledoit and Wolf (2003) is based on minimizing the distance between the population covariance matrix and the regularized one, so the way they select the tuning parameter differs from our data-driven method. Therefore, the LWP will be considered here as a very good benchmark (and it will be the only benchmark of this type that we consider) to evaluate the ability of our data-driven method to deliver additional performance compared with other data-driven methods.
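The closed-form benchmarks above can be sketched in a few lines (illustrative code with names of our own choosing; a pseudo-inverse guards against singular sample covariance matrices, and the long-only MDP is omitted because it requires a numerical optimizer). The actual Sharpe ratio of any weight vector is then evaluated under the true moments.

```python
import numpy as np

def benchmark_weights(returns):
    """Sample-moment versions of the closed-form benchmark portfolios."""
    N = returns.shape[1]
    mu_hat = returns.mean(axis=0)
    sigma_hat = np.cov(returns, rowvar=False)
    sigma_inv = np.linalg.pinv(sigma_hat)           # pseudo-inverse for (near-)singular cases
    vol_hat = np.sqrt(np.diag(sigma_hat))
    ones = np.ones(N)
    return {
        "SbDP": sigma_inv @ vol_hat / (vol_hat @ sigma_inv @ vol_hat),  # sample-based diversified
        "GMVP": sigma_inv @ ones / (ones @ sigma_inv @ ones),           # global minimum variance
        "TgP":  sigma_inv @ mu_hat / (ones @ sigma_inv @ mu_hat),       # target (tangency) portfolio
        "EW":   ones / N,                                               # equal-weighted 1/N portfolio
    }

def actual_sharpe_ratio(w, mu, Sigma):
    """Actual Sharpe ratio of a weight vector under the true moments mu and Sigma."""
    return (w @ mu) / np.sqrt(w @ Sigma @ w)
```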

We perform 1000 simulations and estimate our statistics over replications. We obtain the following results about the actual Sharpe ratio.

Table 2 contains the results for the average monthly Sharpe ratio obtained by simulation. The results show that the sample-based diversified portfolio performs very poorly in terms of maximizing the Sharpe ratio in financial markets with a large number of assets. This result is essentially due to the fact that the estimation error from estimating the vector of asset volatilities is amplified by using a sample covariance matrix of asset returns that is close to a singular matrix when $N$ becomes too large compared with the sample size. Therefore, even if this strategy is supposed to be the maximum diversification strategy with the highest Sharpe ratio, the SbDP is dominated by several other strategies such as the GMVP, the 1/N portfolio, and the TgP. Consequently, this strategy cannot be considered as the maximum diversification strategy in practice. To solve this problem, Choueifaty et al. (2013) propose the most diversified portfolio (MDP), obtained by maximizing the diversification ratio under a non-negativity constraint on the portfolio weights. This additional constraint in the investment process may help to reduce the effect of estimation error on the performance of the selected portfolio. The results of this analysis are in Table 2. By imposing the non-negativity constraint, investors considerably improve the performance of the selected portfolio in terms of the Sharpe ratio. This new strategy even outperforms the global minimum variance portfolio. However, this procedure is still dominated by the target portfolio and the equal-weighted portfolio, meaning that much remains to be done about finding the maximum diversification strategy in practice. One explanation for this result is that imposing the non-negativity constraint on the portfolio weights may limit the ability of the selected portfolio to be fully diversified. Therefore, one needs a more general estimation procedure for the maximum diversification portfolio that allows for short selling.


**Table 2.** The average monthly Actual Sharpe ratio from optimal strategies using a three-factor model as a function of the number of assets in the economy with the sample size *n* = 120, over 1000 replications. True SR is the true actual Sharpe ratio.

For this purpose, we propose a new way to estimate the optimal diversified portfolio by stabilizing the inverse of the sample covariance matrix without imposing a non-negativity constraint on the portfolio weights in the investment process. The results of these methods are also in Table 2. The first thing to point out about these results is that the regularized diversified portfolio outperforms the most diversified portfolio in terms of maximizing the Sharpe ratio. For instance, we obtain an average Sharpe ratio of 0.2514, 0.2587, 0.2592, and 0.2605 for the MDP, the RdgDP, the SCDP, and the LFDP, respectively, when only 10 assets are considered in the economy. The difference in actual Sharpe ratio performance between our procedure and the most diversified portfolio significantly increases with the number of assets in the financial market. For example, for 100 assets, the average Sharpe ratio is about 0.1935, 0.2991, 0.2853, and 0.2980 for the MDP, the RdgDP, the SCDP, and the LFDP, respectively. This result may be due to the fact that, when the number of assets in the economy increases, the degree of diversification of the selected strategy may deteriorate under non-negativity constraints, which reduces the ability to find a strategy that maximizes the Sharpe ratio. Moreover, the regularized diversified portfolio outperforms the target strategy and the equal-weighted portfolio when the number of assets in the financial market exceeds 40. Nonetheless, for 10 assets in the economy, the target portfolio outperforms the RdgDP and the SCDP but is dominated by the LFDP. With 20 assets, the target portfolio dominates the RdgDP and the LFDP and is dominated by the SCDP. The equal-weighted portfolio outperforms some regularized strategies such as the RdgDP and the SCDP only for 10 assets in the financial market. The fact that the regularized strategies give very interesting results in terms of maximizing the Sharpe ratio (compared with the existing strategies) for large $N$ is because these methods are essentially designed to address estimation issues in large dimensional problems. The performance of these procedures seems to be independent of the size of the financial market. In fact, with a reasonable choice of the tuning parameter, each of these methods can achieve satisfactory performance in terms of the Sharpe ratio even if the number of assets in the economy is large.

Our regularized portfolio also outperforms the strategy obtained using the linear shrinkage estimator of Ledoit and Wolf (2003) for the covariance matrix of asset returns. The difference in performance between these two portfolios tends to become large when the number of assets considered in the economy increases. This result may be due to the fact that the estimation error associated with estimating the single-index covariance matrix may be substantial for a very large number of assets. Another element that could explain this result is that our tuning parameter is selected to minimize the distance between the regularized inverse of the covariance matrix and the inverse of the population covariance matrix. Because the optimal portfolio depends on the inverse of the covariance matrix, selecting a tuning parameter that minimizes the estimation error in the inverse of the covariance matrix seems more appropriate than choosing this parameter to minimize the estimation error in the covariance matrix itself. One important thing to point out is that the ridge regularized portfolio is a special case of the selected portfolio with the linear shrinkage estimation of the covariance matrix, in which the structural covariance matrix is replaced by the identity to avoid the potential estimation error associated with this covariance matrix.

Similar results are obtained when choosing the estimation window to be 1000 and increasing the number of assets in the economy from 150 to 999 ($N \in \{150, 250, 400, 550, 700, 850, 950, 999\}$), as given in Table 3.


**Table 3.** The average monthly Actual Sharpe ratio from optimal strategies using a three-factor model as a function of the number of assets in the economy with the sample size *n* = 1000, over 1000 replications. True SR is the true actual Sharpe ratio.

To analyze the statistical significance of the regularized portfolio over the other strategies, we implement the following test procedure about the Sharpe ratio,

$$H_0: RSR \leq SR_0 \quad \text{vs.} \quad H_1: RSR > SR_0$$

where $RSR$ is the regularized Sharpe ratio and $SR_0$ the Sharpe ratio of the portfolio under comparison. This test is conducted using the same procedure as in Ao et al. (2019). For more information about this test procedure, see Jobson and Korkie (1981) and Memmel (2003). The fundamental objective of this test procedure is to confirm the domination of our method over the existing strategies with a statistical test. The *p*-values associated with this test procedure for each of the regularized portfolios are given in Tables 4–6. According to these results, our regularized portfolio dominates the other strategies in terms of maximizing the Sharpe ratio at the 5% significance level. In particular, our method outperforms the LW portfolio in the large financial market setting.

**Table 4.** The *p*-value associated with performance hypothesis testing with the Sharpe ratio from Ridge regularized strategy using a three-factor model as a function of the number of assets in the economy with the sample size *n* = 1000, over 1000 replications.



**Table 5.** The *p*-value associated with Performance hypothesis testing with the Sharpe ratio from Landweber–Fridman regularized strategy using a three-factor model as a function of the number of assets in the economy with the sample size *n* = 1000, over 1000 replications.

**Table 6.** The *p*-value associated with performance hypothesis testing with the Sharpe ratio from spectral cut-off regularized strategy using a three-factor model as a function of the number of assets in the economy with the sample size *n* = 1000, over 1000 replications.


We also compute in Table 7 the average monthly diversification ratio associated with the selected portfolio. We obtain results similar to those in Table 2. The regularized portfolio performs well in terms of maximizing the diversification ratio and dominates most of the existing methods in the large financial market. The diversification ratio that we obtain with our method is very close to the true one. This implies that, in addition to the asymptotic results obtained in Section 4, the regularized portfolio has very good finite sample properties. This result shows that we do not need $N/\sqrt{T}$ to be close to zero to obtain good finite sample performance of the selected portfolio.

**Table 7.** The average monthly actual diversification ratio from optimal strategies using a three-factor model as a function of the number of assets in the economy with the sample size *n* = 120, over 1000 replications. True DR is the true diversification ratio.


#### *5.3. Empirical Study*

In this empirical section, our objective is to use real data (unlike in the simulation part) to estimate the unknown parameters of the optimal portfolio and then to evaluate the performance of each estimation procedure based on the same statistics as in the simulation section. Note that our purpose in this paper lies not in forecasting but in proposing a consistent way to correctly estimate the portfolio in Equation (4) in a large dimensional setting.

We apply our method to several sets of portfolios from Kenneth R. French's website. In particular, we apply our procedure to the following portfolios: the 30-industry portfolios and the 100 portfolios formed on size and book-to-market. We allow investors to rebalance their portfolios every month. This implies that the optimal portfolio is constructed at the end of each month for a given estimation window *M* by maximizing the diversification ratio. The investor holds this portfolio for one month, realizes gains and losses, updates information, and then recomputes optimal portfolio weights for the next period using the same estimation window. This procedure is repeated each month, generating a time series of out-of-sample returns. This time series can then be used to analyze the out-of-sample performance of each strategy based on several statistics such as the out-of-sample Sharpe ratio. For this purpose, we use data from July 1980 to June 2018.
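The rolling-window procedure just described can be sketched as follows (a hedged illustration: `weight_fn` stands for any of the estimation strategies discussed in this paper, and `M` is the estimation window of the text):

```python
import numpy as np

def out_of_sample_returns(returns, M, weight_fn):
    """Monthly-rebalanced out-of-sample returns with a rolling window of M months."""
    T = returns.shape[0]
    oos = []
    for t in range(M, T):
        w = weight_fn(returns[t - M:t])      # estimate weights on the past M months
        oos.append(w @ returns[t])           # hold for one month and realize the return
    return np.array(oos)

# Example: out-of-sample Sharpe ratio of the GMVP benchmark
# r_oos = out_of_sample_returns(R, 120, lambda win: benchmark_weights(win)["GMVP"])
# oos_sharpe = r_oos.mean() / r_oos.std(ddof=1)
```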

Table 8 contains the results of the out-of-sample analysis in terms of the Sharpe ratio for two different data sets: the FF30 and the FF100. The empirical results in this table confirm what we obtained in the simulation part. According to these results, by stabilizing the inverse of the covariance matrix in the maximum diversification portfolio, we considerably improve the performance of the selected strategy in terms of maximizing the Sharpe ratio. Moreover, our regularized strategies outperform the most diversified strategy, the target portfolio, the LW portfolio, and the global minimum variance portfolio for each data set. The most diversified strategy outperforms the global minimum variance portfolio but is dominated by the equal-weighted portfolio for each data set. These results for the most diversified portfolio can essentially be explained by the fact that, by imposing a non-negativity constraint in the investment process, one cannot fully diversify the optimal portfolio. The LWP outperforms the other benchmark strategies; in particular, this method dominates the most diversified strategy of Choueifaty et al. (2013). The return of the regularized portfolio is less volatile than what we obtain with the most diversified portfolio, the target one, and the LW strategy.

**Table 8.** Out-of-sample performance in terms of the Sharpe ratio applied on the 30 industry portfolios (FF30) and the 100 portfolios formed on size and book-to-market (FF100) with a rolling window of 120.


We are also interested in how our procedure performs in terms of minimizing the rebalancing cost in a given period. The rebalancing cost at time $t$ can be naturally measured by

$$Cost\_t = \sum\_{j=1}^{N} |\omega\_{t,j} - \omega\_{t-1,j}|.$$

This measure of the trading cost is, in fact, the turnover. Transaction costs can be measured using the turnover in the sense that these costs are positively related to it. Therefore, in the rest of the paper, the turnover will be referred to as the transaction cost. The average trading cost over the investment horizon is given by

$$\text{TradingCost} = \frac{1}{Q} \sum\_{t=1}^{Q} \text{Cost}\_{t}$$

where $Q$ is the number of rebalancing periods. This quantity can be interpreted as the average percentage of wealth traded in each period. The average monthly rebalancing costs are given in Table 9. These results show that, by stabilizing the inverse of the covariance matrix by regularization, we help investors select strategies that significantly reduce the rebalancing cost. The regularized portfolio outperforms the other strategies in terms of minimizing the trading costs faced by investors in their investment process.
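For completeness, a small sketch of the turnover measure defined above (the weights are assumed to be stored one row per rebalancing date; the function name is ours):

```python
import numpy as np

def average_turnover(weights):
    """Average rebalancing cost: mean over t of sum_j |w_{t,j} - w_{t-1,j}|.

    weights : Q x N array with one row of portfolio weights per rebalancing date
    """
    return np.abs(np.diff(weights, axis=0)).sum(axis=1).mean()
```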


**Table 9.** Out-of-sample performance in terms of rebalancing cost (turnover) applied on the 30 industry portfolios (FF30) and the 100 portfolios formed on size and book-to-market (FF100) for two different rolling windows.

The evolution of the share of the selected assets in the optimal portfolio in Figure 1 shows that by regularizing the covariance matrix, we considerably reduce extreme positions in the selected strategy. Therefore, we significantly reduce the transaction costs faced by investors when they decide to take positions in the financial market. Moreover, the return on the selected portfolio becomes less volatile in such a situation.

**Figure 1.** The evolution of the selected assets in the optimal portfolio. We obtain this figure using the 30 industry portfolios with an estimation window of *n* = 120.

Tables 10 and 11 contain the Fama–French monthly regression coefficients for the 100 portfolios formed on size and book-to-market and the 30-industry portfolios, respectively. Monthly data are used from July 1990 to June 2018. According to the results in Table 10, only the return on the equal-weighted portfolio can be explained by the Fama–French three-factor model for the 100 portfolios formed on size and book-to-market. The return obtained with the regularized portfolios and the most diversified portfolio can be explained only with the return on the market portfolio (a one-factor model) through a positive relation. However, the return of the most diversified portfolio and the global minimum variance portfolio can be explained with a two-factor model when using the 30-industry portfolios. The return of the other strategies, such as the regularized portfolios, the equal-weighted portfolio, and the target portfolio, can be explained by the Fama–French three-factor model.

**Table 10.** Fama–French Monthly Regression Coefficients for the 100 portfolios formed on size and book-to-market from July 1990 to June 2018.


**Table 11.** Fama–French Monthly Regression Coefficients for the 30-industry portfolios from July 1990 to June 2018.


As portfolio optimization is generally based on individual stocks instead of aggregate portfolios such as the Fama–French portfolios, we also apply our method to a subset of the S&P500 index constituents to see how our method performs in such a universe. We use for this purpose monthly data from March 1986 to December 2019. At the beginning of this empirical analysis, we randomly form pools of 100 or 150 stocks from the S&P500 index constituents for which there are complete return data for the prior 120 or 240 months. The optimal portfolio is then constructed using the same procedure as before. We then compute the out-of-sample performance in terms of the Sharpe ratio and the turnover. The results of this empirical analysis are given in Tables 12 and 13. We obtain results similar to those for the Fama–French portfolios, showing that our method also performs well when the optimal portfolio is formed with individual stocks from the S&P500.

**Table 12.** Out-of-sample performance in terms of Sharpe ratio applied on two subsets of S&P500 constituents for two different rolling windows.


**Table 13.** Out-of-sample performance in terms of rebalancing cost (turnover) applied on two subsets of S&P500 constituents for two different rolling windows.


#### **6. Conclusions**

This paper addresses the estimation issue that arises in the maximum diversification portfolio framework in large financial markets. We propose to stabilize the inverse of the covariance matrix in the diversified portfolio using regularization techniques from the inverse problem literature. These regularization techniques, namely the ridge, the spectral cut-off, and the Landweber–Fridman schemes, involve a regularization parameter or penalty term whose optimal value is selected to minimize the expected distance between the inverse of the estimated covariance matrix and the inverse of the true covariance matrix. We show, under appropriate regularity conditions, that the strategy selected by regularization is asymptotically efficient with respect to the diversification ratio for a wide choice of the tuning parameter. This means that, even if the optimal diversified portfolio is unknown, there exists a feasible portfolio obtained by regularization capable of reaching a similar level of performance in terms of the diversification ratio.

To evaluate the performance of our procedures, we implement a simulation exercise based on a three-factor model calibrated on real data from the US financial market. We find by simulation that our procedure significantly improves the performance of the selected strategy with respect to the Sharpe ratio. Moreover, the regularized rules are compared to several strategies such as the most diversified portfolio, the target portfolio, the global minimum variance portfolio, and the naive 1/N strategy in terms of in-sample and out-of-sample Sharpe ratio, and it is shown that our method yields significant Sharpe ratio improvements. To confirm our simulations, we conduct an empirical analysis using Kenneth R. French's 30-industry portfolios and 100 portfolios formed on size and book-to-market. According to these empirical results, by stabilizing the inverse of the covariance matrix in the maximum diversification portfolio, we considerably improve the performance of the selected strategy in terms of maximizing the Sharpe ratio.

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** The data presented in this study are available on request from the corresponding author.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **Appendix A. Proof of Proposition 1**

By definition we have that

$$DR(\hat{\omega}_{\alpha}) = \frac{\hat{\omega}_{\alpha}'\sigma}{\sqrt{\hat{\omega}_{\alpha}'\Sigma\hat{\omega}_{\alpha}}}.$$

Let us first look at $\hat{\omega}_{\alpha}'\Sigma\hat{\omega}_{\alpha}$:

$$\begin{split} \boldsymbol{\hat{\omega}}\_{a}^{'} \boldsymbol{\Sigma} \boldsymbol{\hat{\omega}}\_{a} &= \quad \left[ \left( \boldsymbol{\hat{\omega}}\_{a} - \boldsymbol{\omega} \right) + \boldsymbol{\omega} \right]^{'} \boldsymbol{\Sigma} \big[ \left( \boldsymbol{\hat{\omega}}\_{a} - \boldsymbol{\omega} \right) + \boldsymbol{\omega} \big] \\ &= \quad \boldsymbol{\omega}^{'} \boldsymbol{\Sigma} \boldsymbol{\omega} + \underbrace{\left( \boldsymbol{\hat{\omega}}\_{a} - \boldsymbol{\omega} \right)^{'} \boldsymbol{\Sigma} (\boldsymbol{\hat{\omega}}\_{a} - \boldsymbol{\omega})}\_{\left( a \right)} + 2 \underbrace{\left( \boldsymbol{\hat{\omega}}\_{a} - \boldsymbol{\omega} \right)^{'} \boldsymbol{\Sigma} \boldsymbol{\omega}}\_{\left( b \right)}. \end{split}$$

Now we are going to look at the properties of (a) and (b). We know that

$$\hat{\omega}_{\alpha} = \Big(\underbrace{\hat{\sigma}'\hat{\Sigma}^{\alpha}\hat{\sigma}}_{(c)}\Big)^{-1}\underbrace{\hat{\Sigma}^{\alpha}\hat{\sigma}}_{(d)}, \qquad (c) = \sigma'\hat{\Sigma}^{\alpha}\sigma + (\hat{\sigma}-\sigma)'\hat{\Sigma}^{\alpha}(\hat{\sigma}-\sigma) + 2(\hat{\sigma}-\sigma)'\hat{\Sigma}^{\alpha}\sigma.$$

$$\begin{split}\left\|(\hat{\sigma}-\sigma)'\hat{\Sigma}^{\alpha}(\hat{\sigma}-\sigma)\right\| &= \left\|\frac{(\hat{\sigma}-\sigma)'}{\sqrt{N}}\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\frac{(\hat{\sigma}-\sigma)}{\sqrt{N}}\right\| \\ &= O_p\left(\frac{\|\hat{\sigma}-\sigma\|^2}{N\alpha}\right) = O_p\left(\frac{\left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\|^2}{\alpha}\right).\end{split}$$

By Assumption A, $\left\|\frac{\sigma}{\sqrt{N}}\right\| = O(1)$. Therefore, we obtain that

$$\begin{split}\left\|(\hat{\sigma}-\sigma)'\hat{\Sigma}^{\alpha}\sigma\right\| &= \left\|\frac{(\hat{\sigma}-\sigma)'}{\sqrt{N}}\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\frac{\sigma}{\sqrt{N}}\right\| \\ &= O_p\left(\frac{\|\hat{\sigma}-\sigma\|}{\sqrt{N}\alpha}\right) = O_p\left(\frac{\left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\|}{\alpha}\right).\end{split}$$

Using this information combined with the fact that $\hat{\Sigma}^{\alpha} = \hat{\Sigma}^{\alpha} - \Sigma^{\alpha} + \Sigma^{\alpha}$, we have that

$$\begin{split}(c) &= \sigma'\Sigma^{\alpha}\sigma + \sigma'\left(\hat{\Sigma}^{\alpha} - \Sigma^{\alpha}\right)\sigma + O_p\left(\frac{\left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\| + \left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\|^2}{\alpha}\right), \\ \left\|\sigma'\left(\hat{\Sigma}^{\alpha} - \Sigma^{\alpha}\right)\sigma\right\| &= \left\|\frac{\sigma'}{\sqrt{N}}\left[\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right]\frac{\sigma}{\sqrt{N}}\right\| \\ &\leq \left\|\frac{\sigma}{\sqrt{N}}\right\|^2\left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right\| = O_p\left(\left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right\|\right).\end{split}$$

$$\left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right\| \leq \left\|\left(\frac{\Sigma}{N}\right)^{\alpha}\right\|\left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\right\|\left\|\frac{\hat{\Sigma}}{N} - \frac{\Sigma}{N}\right\|.$$

Hence,

$$\left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right\| = O_p\left(\frac{\left\|\frac{\hat{\Sigma}}{N} - \frac{\Sigma}{N}\right\|}{\alpha}\right)$$

which implies that

$$(c) = \sigma'\Sigma^{\alpha}\sigma + O_p\left(\frac{\left\|\frac{\hat{\Sigma}}{N} - \frac{\Sigma}{N}\right\| + \left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\| + \left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\|^2}{\alpha}\right).$$

As $T \to \infty$ we have that $\alpha \to 0$, so that

$$(c) = \sigma'\Sigma^{-1}\sigma + O_p\left(\frac{\left\|\frac{\hat{\Sigma}}{N} - \frac{\Sigma}{N}\right\| + \left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\| + \left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\|^2}{\alpha}\right).$$

Using Assumption A combined with Theorem 4 of Carrasco and Florens (2000), we have that

$$\left\|\frac{\hat{\Sigma}}{N} - \frac{\Sigma}{N}\right\| = O_p\left(\frac{1}{\sqrt{T}}\right).$$

Moreover, as $\left\|\frac{\hat{\sigma}-\sigma}{\sqrt{N}}\right\|^2 = O_p\left(\frac{1}{T}\right)$ by Assumption A, we have that

$$(c) = \sigma^{'} \Sigma^{-1} \sigma + O\_p\left(\frac{1}{\alpha \sqrt{T}}\right).$$

$$(d) = \hat{\Sigma}^{\alpha}\hat{\sigma} = \hat{\Sigma}^{\alpha}\sigma + \hat{\Sigma}^{\alpha}(\hat{\sigma}-\sigma) = \Sigma^{\alpha}\sigma + \left(\hat{\Sigma}^{\alpha} - \Sigma^{\alpha}\right)\sigma + \hat{\Sigma}^{\alpha}(\hat{\sigma}-\sigma).$$

As *α* → 0 as *T* → ∞, we have that

$$(d) = \Sigma^{-1}\sigma + \left(\hat{\Sigma}^{\alpha} - \Sigma^{\alpha}\right)\sigma + \hat{\Sigma}^{\alpha}(\hat{\sigma}-\sigma).$$

We know that

$$\begin{split}\left\|\hat{\Sigma}^{\alpha}(\hat{\sigma}-\sigma)\right\| &= \left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\frac{(\hat{\sigma}-\sigma)}{N}\right\| \\ &\leq \left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\right\|\left\|\frac{(\hat{\sigma}-\sigma)}{N}\right\| = O_p\left(\frac{1}{\alpha\sqrt{TN}}\right).\end{split}$$

Using the fact that

$$\begin{split}\left\|\left(\hat{\Sigma}^{\alpha} - \Sigma^{\alpha}\right)\sigma\right\| &= \left\|\left[\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right]\frac{\sigma}{N}\right\| \\ &\leq \left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha} - \left(\frac{\Sigma}{N}\right)^{\alpha}\right\|\left\|\frac{\sigma}{N}\right\| = O_p\left(\frac{\left\|\frac{\hat{\Sigma}}{N} - \frac{\Sigma}{N}\right\|}{\alpha\sqrt{N}}\right) = O_p\left(\frac{1}{\alpha\sqrt{TN}}\right)\end{split}$$

we obtain that

$$(d) = \Sigma^{-1} \sigma + O\_p\left(\frac{1}{\alpha \sqrt{TN}}\right).$$

Under the assumption that $\frac{1}{\alpha\sqrt{T}} \to 0$, we have that

$$
\hat{\omega}_{\alpha} = \omega + o_p(1). \tag{A1}
$$

By Assumption A, we have that $\|\Sigma\| = O(N)$. Therefore, using (A1), we obtain that

$$
\boldsymbol{\hat{\omega}}\_{\mathfrak{a}}^{'} \boldsymbol{\Sigma} \boldsymbol{\hat{\omega}}\_{\mathfrak{a}} = \boldsymbol{\omega}^{'} \boldsymbol{\Sigma} \boldsymbol{\omega} + o\_{\mathfrak{p}}(1) \tag{A2}
$$

if $\frac{N}{\alpha\sqrt{T}} \to 0$. Therefore,

$$DR(\hat{\omega}_{\alpha}) \to_p DR(\omega_t).$$

**Appendix B. Proof of Proposition 2**

$$(A) = \mu'\left[\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)'\Sigma\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)\right]\mu$$

We also know that *μ* = *μ*ˆ + (*μ* − *μ*ˆ), so

$$\begin{split}(A) &= \mu'\left[\left(\hat{\Sigma}^{\alpha}-\Sigma^{-1}\right)'\Sigma\left(\hat{\Sigma}^{\alpha}-\Sigma^{-1}\right)\right]\mu \\ &= \left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu}) + \left(\hat{\Sigma}^{\alpha}\hat{\mu}-\Sigma^{-1}\mu\right)\right]'\Sigma\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu}) + \left(\hat{\Sigma}^{\alpha}\hat{\mu}-\Sigma^{-1}\mu\right)\right] \\ &= \left(\hat{\Sigma}^{\alpha}\hat{\mu}-\Sigma^{-1}\mu\right)'\Sigma\left(\hat{\Sigma}^{\alpha}\hat{\mu}-\Sigma^{-1}\mu\right) + \left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right] \\ &\quad + 2\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma\left(\hat{\Sigma}^{\alpha}\hat{\mu}-\Sigma^{-1}\mu\right)\end{split}$$

Let us denote $x = \Sigma^{-1}\mu$ and $\hat{x} = \hat{\Sigma}^{\alpha}\hat{\mu}$; therefore,

$$(A) = (\hat{x}-x)'\Sigma(\hat{x}-x) + \left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right] + 2\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma(\hat{x}-x).$$

As $\|\mu-\hat{\mu}\|^2 = O_p\left(\frac{N}{T}\right)$ and $\left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\right\|^2 = O_p\left(\frac{1}{\alpha^2}\right)$, we have that

$$\left\|\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right\| = \left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\frac{(\mu-\hat{\mu})}{N}\right\| \leq \left\|\left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\right\|\left\|\frac{(\mu-\hat{\mu})}{N}\right\| = O_p\left(\frac{1}{\alpha\sqrt{NT}}\right)$$

$$\begin{aligned} \left[\hat{\Sigma}^a(\mu-\hat{\mu})\right]' \Sigma \left[\hat{\Sigma}^a(\mu-\hat{\mu})\right] &=& O\_p\left(\left\|\left[\hat{\Sigma}^a(\mu-\hat{\mu})\right]'\right\|^2 \left\|\Sigma\right\|\right) \\ &=& O\_p\left(\frac{\left\|\Sigma\right\|}{a^2TN}\right) \end{aligned}$$

Using the fact that $\|\Sigma\| = O(N)$ by Assumption A, we obtain that

$$\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma\left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right] = O_p\left(\frac{N}{\alpha^2 TN}\right) = O_p\left(\frac{1}{\alpha^2 T}\right).$$

Moreover, $\hat{x} - x = \hat{\Sigma}^{\alpha}\hat{\mu} - \Sigma^{-1}\mu$ and $\hat{\mu} = (\hat{\mu}-\mu) + \mu$, so that

$$\hat{x} - x = \hat{\Sigma}^{\alpha}(\hat{\mu}-\mu) + \left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)\mu \;\Rightarrow\; \left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma(\hat{x}-x) = \left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma\left[\hat{\Sigma}^{\alpha}(\hat{\mu}-\mu)\right] + \left[\hat{\Sigma}^{\alpha}(\mu-\hat{\mu})\right]'\Sigma\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)\mu,$$

$$\left(\hat{\Sigma}^{\alpha} - \Sigma^{-1}\right)\mu = \left(\frac{\hat{\Sigma}}{N}\right)^{\alpha}\left[\frac{\Sigma}{N} - \frac{\hat{\Sigma}}{N}\right]\left(\frac{\Sigma}{N}\right)^{-1}\frac{\mu}{N} = O_p\left(\frac{1}{\alpha\sqrt{NT}}\right),$$

which implies that

$$\begin{split}(A) &= (\hat{x}-x)'\Sigma(\hat{x}-x) + O_p\left(\frac{2}{\alpha^2 T}\right) \\ &= (\hat{x}-x)'\Sigma(\hat{x}-x) + O_p\left(\frac{1}{\alpha^2 T}\right)\end{split}$$

Therefore, we obtain that

$$\begin{split} &E\left\{\mu^{'}\left[\left(\hat{\Sigma}^{\mathfrak{a}}-\Sigma^{-1}\right)^{'}\Sigma\left(\hat{\Sigma}^{\mathfrak{a}}-\Sigma^{-1}\right)\right]\mu\right\} \\ &\sim \quad E\left\{\left(\hat{\mathfrak{x}}-\mathfrak{x}\right)^{'}\Sigma\left(\hat{\mathfrak{x}}-\mathfrak{x}\right)\right\} \end{split}$$

if $\frac{1}{\alpha^2 T} \to 0$.

#### **References**


Choueifaty, Yves. 2011. Methods and Systems for Providing an Anti-Benchmark Portfolio. U.S. Patent 7,958,038, June 7.


