1. Introduction
In 1952, Harry M. Markowitz [1] published the classic “Portfolio Selection” in The Journal of Finance, which ushered in a new era of financial mathematical analysis. Markowitz pointed out that investors who care about return and risk should hold portfolios located on the mean-variance efficient frontier, which is the famous mean-variance portfolio (MVP) selection model. Since then, many portfolio selection strategies have been proposed by referring to the MVP and its variants. However, MVPs exhibit instability due to estimation errors in the input parameters [2], especially in large-scale settings. Instability means that a solution that is optimal for a given sample may no longer be optimal, from the perspective of risk, once the sample fluctuates. For more comments on this model, we refer to [
3,
4,
5,
6] and the references therein.
This paper focuses on sample fluctuations and parameter uncertainty in the portfolio selection problem. We now review some relevant methods for handling parameter uncertainty. Among various approaches, an attractive one is the robust portfolio (RP), which corresponds to robust optimization, since it does not use any information about the probability distribution of the uncertain parameters. The RP we consider is a conservative approach that minimizes the loss function under the worst-case scenario within an uncertainty set. In the last two decades, robust portfolio selection problems have attracted increasing interest from researchers, who constructed well-known optimal portfolios from the perspective of robust optimization [
7,
8,
9,
10]. In this vein, Goldfarb and Iyengar [11] formulated and solved RP problems. They introduced uncertainty structures for the input parameters and showed that the resulting RP problems can be reformulated as second-order cone programs, and that these uncertainty structures correspond to the confidence regions used to estimate the market parameters. Given the uncertainty in the mean and covariance matrix of the asset return, Lobo and Boyd [
12] computed the maximum risk of a portfolio in a numerically efficient way. They proved that this is a semi-definite programming problem and is readily solved by interior-point methods for convex optimization. Min et al. [
13] proposed the hybrid RP models under ellipsoidal uncertainty sets, and they considered both the best-case and the worst-case counterparts. Won and Kim [
14] considered RP problems involving a trade-off between the worst-case utility and the worst-case regret, or the largest difference between the best utility achievable under the model and that achieved by a given portfolio. They showed that the entire optimal trade-off curve can be found via solving a series of semi-definite programs under the ellipsoidal uncertainty model. Some research works [
15,
16] concentrated on the application of robust optimization to basic mean-variance, mean value-at-risk (mean-VaR), and mean conditional-value-at-risk (mean-CVaR) problems, but did not consider variants of the problem such as robust index tracking or robust and sparse portfolio selection. More relevant works can be found in [
17,
18,
19,
20] and the references therein.
RPs have a wide range of applications, and one essential step in all of them is the construction of uncertainty sets. Two types of uncertainty sets are widely used, namely the box uncertainty set and the ellipsoidal uncertainty set. Tütüncü and Koenig [21] used symmetric box uncertainty sets defined as $U_{\mu} = \{\mu : \mu^{L} \le \mu \le \mu^{U}\}$ and $U_{\Sigma} = \{\Sigma : \Sigma^{L} \le \Sigma \le \Sigma^{U},\ \Sigma \succeq 0\}$, where $\mu^{L}$ and $\mu^{U}$ are the lower and upper bounds of the mean vector $\mu$, $\Sigma^{L}$ and $\Sigma^{U}$ are the lower and upper bounds of the covariance matrix $\Sigma$, respectively, and $\Sigma$ is positive semi-definite. Khodamoradi et al. [
22] used box uncertainty sets for a cardinality-constrained mean-variance portfolio problem that allows short selling. Swain and Ojha [
10] analyzed the robust version of the mean-variance portfolio problem and mean-semi-variance portfolio problem with box uncertainty sets. Alternatively, Fabozzi et al. [
23] defined an ellipsoidal uncertainty set for the expected asset return as
, where
is the nominal asset return and
is a small scalar that controls the size of the uncertainty set. However, they did not consider the uncertainty of the covariance matrix, so the solution was robust only against perturbations in the asset return vector. Pınar [24] developed a multi-period robust mean-variance portfolio problem with an ellipsoidal uncertainty set while allowing short selling. It is well known that the optimal portfolio is more sensitive to estimation errors in the mean vector than to those in the covariance matrix. Moreover, dealing with uncertainty in the covariance matrix is more complicated than dealing with uncertainty in the mean vector. Thus, in this paper, we consider two types of uncertainty sets for the mean vector.
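To illustrate why box and ellipsoidal sets for the mean vector are convenient, the following minimal sketch evaluates the worst-case expected return of a fixed portfolio in closed form. It assumes a box of the form $|\mu_i - \hat{\mu}_i| \le \delta_i$ and an ellipsoid of the form $(\mu - \hat{\mu})^{\top} \Sigma^{-1} (\mu - \hat{\mu}) \le \kappa^2$. These parameterizations, the function names, and all numbers are illustrative assumptions and are not taken from the models in this paper.

```python
import numpy as np


def worst_case_return_box(w, mu_hat, delta):
    """Smallest value of mu @ w over the box |mu_i - mu_hat_i| <= delta_i (assumed form)."""
    # The adversary shifts each mu_i against the sign of w_i.
    return float(mu_hat @ w - delta @ np.abs(w))


def worst_case_return_ellipsoid(w, mu_hat, sigma, kappa):
    """Smallest value of mu @ w over (mu - mu_hat)^T sigma^{-1} (mu - mu_hat) <= kappa^2."""
    # The minimizer is mu_hat - kappa * sigma @ w / sqrt(w^T sigma w).
    return float(mu_hat @ w - kappa * np.sqrt(w @ sigma @ w))


# Illustrative numbers only (not from the paper's data sets).
mu_hat = np.array([0.010, 0.006, 0.008])
sigma = np.array([[0.040, 0.006, 0.000],
                  [0.006, 0.020, 0.004],
                  [0.000, 0.004, 0.030]])
w = np.array([0.5, 0.3, 0.2])
delta = np.array([0.002, 0.001, 0.003])
kappa = 0.1

box_value = worst_case_return_box(w, mu_hat, delta)
ell_value = worst_case_return_ellipsoid(w, mu_hat, sigma, kappa)
print(box_value, ell_value)
```

In both cases the inner minimization over the uncertain mean collapses to the nominal return minus a penalty on the weights: a weighted $\ell_1$ term for the box and a weighted Euclidean term for the ellipsoid.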
Financial data have some remarkable features, such as multicollinearity and heavy tails. Therefore, the perturbations of these data should not be underestimated. Following Brodie et al. [2], who transformed the MVP into a Lasso-type portfolio, we consider the perturbations in the asset return matrix and design its uncertainty set. In addition, from the perspective of transaction costs and administrative expenses, more assets are not always better. Therefore, it is also necessary to consider sparsity when constructing a portfolio [
25,
26,
27]. After these discussions, a natural question follows: How do we find better RPs that not only reduce the undesired impact of parameter uncertainty, but also improve sparsity and reduce cost?
Following the above considerations, this paper proposes a sparsity-constrained robust portfolio optimization model with parameter uncertainty and data perturbation. Specifically, we consider the perturbation in the asset return matrix and the parameter uncertainty in the expected asset return. By using the equivalence of robustness and regularization, the Lasso-type objective function can be converted into the sum of a square root and the norm. We consider two kinds of uncertainty sets: the box uncertainty set and the ellipsoidal uncertainty set. For the associated penalty model, we define three types of stationary points: the Karush–Kuhn–Tucker (KKT) point, the strong KKT point, and the partial minimizer. Under a mild constraint qualification (CQ), we prove that any local minimizer of the penalty model is a KKT point. Moreover, the global minimizer of the penalty model is proven to be a partial minimizer and, in turn, a strong KKT point under Slater’s CQ. Finally, a penalty alternating direction method is proposed to obtain a portfolio, and its convergence is established. We confirm the effectiveness of our approach by comparing it with nine widely studied portfolio models on seven real-world data sets. The numerical results show that the proposed portfolios have lower volatility, that is, lower risk. Moreover, our portfolio strategies can yield higher Sharpe ratios when appropriate parameters are selected.
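For orientation, the kind of equivalence between robustness and regularization referred to above can be sketched with a known result on robust least squares under column-wise bounded perturbations. The symbols $R$ (asset return matrix), $y$ (target vector), $w$ (portfolio weights), $\Delta$ (perturbation with columns $\delta_i$), and $c_i$ (perturbation budgets) are illustrative assumptions, and the exact formulation used in this paper may differ.

```latex
\min_{w}\ \max_{\Delta \in \mathcal{U}}\ \bigl\| y - (R + \Delta)\, w \bigr\|_2
\;=\;
\min_{w}\ \bigl\| y - R\, w \bigr\|_2 + \sum_{i=1}^{n} c_i\, |w_i| ,
\qquad
\mathcal{U} = \bigl\{ (\delta_1, \dots, \delta_n) : \|\delta_i\|_2 \le c_i,\ i = 1, \dots, n \bigr\}.
```

With a common budget $c_i = \lambda$, the right-hand side is a square-root (unsquared) residual plus $\lambda \|w\|_1$, which is how a worst-case treatment of data perturbation produces a sparsity-inducing penalty.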
This paper is organized as follows. Some notations and preliminaries used in this paper are given in the next section. The model of robust and sparse portfolios and the analysis of their optimization theory are stated in
Section 3. Two types of uncertainty sets of mean vectors are presented in
Section 4. The optimization algorithm named the penalty alternating direction method is established in
Section 5. Extensive numerical experiments are conducted in
Section 6. Conclusions are drawn in
Section 7.
2. Notation and Preliminaries
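We use $\mathbb{R}$, $\mathbb{R}^{n}$, and $\mathbb{R}^{m\times n}$ to denote the set of real numbers, the $n$-dimensional Euclidean space, and the $m\times n$-dimensional Euclidean space, respectively. We use boldfaced lowercase letters to denote vectors, e.g., $\mathbf{x}\in\mathbb{R}^{n}$ is a column vector with $n$ elements $x_i$, $i=1,\dots,n$. The transpose of $\mathbf{x}$ is denoted by $\mathbf{x}^{\top}$, which is a row vector. In particular, $\mathbf{1}_n$ is the vector of all ones of size $n$. For a vector $\mathbf{x}$, we define its absolute value vector by $|\mathbf{x}| = (|x_1|,\dots,|x_n|)^{\top}$. We use capital letters to denote matrices, e.g., $A\in\mathbb{R}^{m\times n}$, and $A_{ij}$ denotes the $(i,j)$-th entry of $A$. Given an index set $I\subseteq\{1,\dots,n\}$, $\mathbf{x}_I$ denotes the sub-vector of $\mathbf{x}$ indexed by $I$. We write the Euclidean norm of $\mathbf{x}$ as $\|\mathbf{x}\|_2$, the $\ell_1$ norm as $\|\mathbf{x}\|_1$, and the infinity norm as $\|\mathbf{x}\|_\infty$. For two vectors $\mathbf{x}$ and $\mathbf{y}$, $\langle\mathbf{x},\mathbf{y}\rangle = \mathbf{x}^{\top}\mathbf{y}$ denotes the standard inner product.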
We now provide some existing optimization results that are crucial for the theory of this paper. For convenience, we define the following convex programming problem:
where
is a nonempty convex set,
f is a convex function, and the
s are concave functions. For problem (
1), Slater’s CQ builds a bridge between its solution and the KKT point (the point satisfying the conditions in Theorem 1).
Definition 1 ([
28], Definition 4.17).
Slater’s CQ holds in problem (1) if there exists such that for all .
Theorem 1 ([28], Theorem 4.18).
Suppose that Slater’s CQ holds in problem (1). Then, is an optimal solution to problem (1) if and only if there exist non-negative Lagrange multipliers such that and for all , where denotes the classical sub-differential set ([28], Definition 2.30) of f at and denotes the classical normal cone ([28], Definition 2.9) of # at .
We also introduce some crucial terminology and results for sparsity nonlinear programming:
where
f is a convex function and
g and
h are continuously differentiable. A restricted linear independence constraint qualification (R-LICQ) used for sparsity nonlinear programming (
2) was defined by [
29] as follows.
Definition 2 ([
29], Definition 2.4).
We say that the R-LICQ holds at , where is feasible for the problem (2):When , , , , are linearly independent.
When , , , , are linearly independent.
Based on the R-LICQ, the following decomposition result holds.
Theorem 2 ([
29], Proposition 2.5).
Let be a feasible point of problem (2) and the R-LICQ hold at . Then, where , , and denotes the Fréchet normal cone ([30], Definition 6.3) of # at , which reduces to the classical normal cone described in Theorem 1 if # is a convex set.
For the partial problem (
10) of the portfolio model (
6) in
Section 3.1, the R-LICQ holds automatically at
, where
,
, and
. Next, we establish the relationship between the local minimizer of problem (
2) and its KKT point (the point satisfying the KKT system in Theorem 3).
Theorem 3. Suppose that is a local minimizer of problem (2) and the R-LICQ holds at . Then, there exist non-negative Lagrange multipliers and such that
Proof. It follows from Theorem 6.12 of [
30] that
Combining Theorem 2 with the proof of Theorem 3.2 of [
29], this result holds. □
This result differs from the corresponding theorem in [29] in that we allow the objective function of problem (2) to be non-differentiable. Apart from this, the proof follows the same argument as the one in [29].
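Sparsity-constrained problems of this kind are usually handled numerically through the Euclidean projection onto the set $\{\mathbf{z} : \|\mathbf{z}\|_0 \le s\}$, which keeps the $s$ entries of largest magnitude and sets the rest to zero. The following is a minimal sketch of that projection only. The function name is hypothetical, the budget and other portfolio constraints are ignored, and it is not claimed to be a step of the algorithm developed in this paper.

```python
import numpy as np


def project_onto_sparsity(x, s):
    """Euclidean projection of x onto {z : ||z||_0 <= s}: keep the s largest |x_i|."""
    x = np.asarray(x, dtype=float)
    if s >= x.size:
        return x.copy()
    z = np.zeros_like(x)
    if s <= 0:
        return z
    # Indices of the s entries with largest magnitude (ties broken arbitrarily).
    keep = np.argpartition(np.abs(x), x.size - s)[x.size - s:]
    z[keep] = x[keep]
    return z


x = np.array([0.30, -0.05, 0.45, 0.02, -0.28])
z = project_onto_sparsity(x, s=3)
print(z)  # entries 0, 2, and 4 are kept; the others are set to zero
```

The projection is not unique when several entries tie in magnitude, which is one reason why stationarity concepts for such problems need more care than in the convex case.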
6. Numerical Results
This section presents extensive numerical experiments. In Section 6.1, we first present six real data sets, describe the existing models used for comparison, and define the performance measures. In Section 6.2, we demonstrate that our methods lead to robust and sparse portfolios. In Section 6.3, we compare our portfolios with nine popular portfolios in terms of out-of-sample (OOS) performance measures. Finally, in Section 6.4, we show the cumulative return of different portfolio strategies. All computations are conducted in Matlab R2019a on a PC with an Intel(R) Core(TM) i5-7200U CPU (2.50 GHz, 4 CPUs) and 4 GB of RAM.
6.1. Models of Comparison, Data, and Performance Measures
(a) Eleven portfolio models compared. We compare the OOS performance of 11 portfolio models across six real data sets of weekly and monthly returns. Those models are well studied, and we divide them into four groups, which are summarized in
Table 1. The first group is the robust and sparse portfolio strategies developed in this paper. The second group includes some well-studied portfolio strategies. The third group includes three benchmark portfolio strategies. The last group consists of two portfolios that use the shrinkage technique to estimate the covariance matrix.
(c) Measuring the OOS performance and its setup. We largely follow the “rolling-window” procedure in [2, 37] to conduct our comparison. Let $T$ be the length of a data set and $\tau$ be the window length used by a model to construct the optimal portfolio. In each period $t$, $t=\tau,\dots,T-1$, we compute the different portfolios using the data from the previous $\tau$ periods. We then compute the OOS return in the $(t+1)$-th period based on the obtained portfolio. We repeat this procedure until we reach the end of the data set. In this way, we obtain a series of $T-\tau$ portfolio vectors for each model listed in Table 1. To make this precise, let $\mathbf{w}_t^{s}$ be the optimal portfolio obtained by the portfolio strategy $s$ over the data from periods $t-\tau+1$ to $t$. The OOS return in the $(t+1)$-th period is computed as $r_{t+1}^{s} = (\mathbf{w}_t^{s})^{\top}\mathbf{r}_{t+1}$, where $\mathbf{r}_{t+1}$ is the return in the $(t+1)$-th period. Thus, we obtain a time series of $T-\tau$ OOS returns for every strategy. Note that we use the traditional “rolling-window” procedure for the numerical analysis; some newer methods could provide new ideas for the analysis of portfolio selection problems, see [48].
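The rolling-window procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions: the function names are hypothetical, the data are synthetic, and an equally weighted rule stands in for the portfolio strategies of Table 1. The experiments in this paper were run in Matlab, so this Python sketch is not the original implementation.

```python
import numpy as np


def equal_weight(window_returns):
    """Placeholder strategy: 1/N weights, ignoring the window's data."""
    n_assets = window_returns.shape[1]
    return np.full(n_assets, 1.0 / n_assets)


def rolling_window_oos(returns, window, strategy):
    """Return (weights, oos_returns) from a rolling-window backtest.

    returns  : (T, N) array of per-period asset returns
    window   : number of past periods used to fit each portfolio
    strategy : callable mapping a (window, N) array to N portfolio weights
    """
    T = returns.shape[0]
    weights, oos = [], []
    for t in range(window, T):
        w = strategy(returns[t - window:t])  # fit on the previous `window` periods
        weights.append(w)
        oos.append(float(returns[t] @ w))    # realized return in the next period
    return np.array(weights), np.array(oos)


rng = np.random.default_rng(0)
returns = rng.normal(0.001, 0.02, size=(60, 4))  # synthetic data, illustration only
weights, oos = rolling_window_oos(returns, window=48, strategy=equal_weight)
print(weights.shape, oos.shape)  # (12, 4) (12,)
```

With 60 periods and a window of 48, the loop produces 60 − 48 = 12 portfolios and 12 OOS returns, matching the count of portfolio vectors stated in the text.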
The OOS performance of each portfolio strategy is assessed by using four quantities: (i) the OOS portfolio variance (
), (ii) the OOS portfolio Sharpe ratio (
), (iii) portfolio turnover (
), and (iv) the average short positions (
). The specific definitions can be found in DeMiguel et al. [
6], Yen and Yen [
38], and Zhao et al. [
37]. We also evaluate the cumulative return (CR), which measures the total payoff yielded by an investment strategy across the investment periods without considering any risk or cost; see Shen et al. [
49]. We also consider some quantities studied in [
38] on the profiles of the portfolio weights: PAP represents the proportion of active positions and PZP is the proportion of zero positions, respectively, defined as
,
, where
and
.
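As a rough guide to how such quantities are typically computed from the OOS returns and the sequence of portfolio weights, the following sketch uses simplified textbook definitions. These definitions are assumptions rather than the exact formulas of the cited references: turnover here ignores the drift of weights between rebalancing dates, the Sharpe ratio is taken without a risk-free rate, and the tolerance `tol` used to decide whether a position is zero is hypothetical.

```python
import numpy as np


def oos_variance(oos):
    """Sample variance of the OOS return series."""
    return float(np.var(oos, ddof=1))


def oos_sharpe(oos):
    """Mean OOS return divided by its sample standard deviation (no risk-free rate)."""
    return float(np.mean(oos) / np.std(oos, ddof=1))


def turnover(weights):
    """Average sum of absolute weight changes between consecutive portfolios."""
    return float(np.mean(np.sum(np.abs(np.diff(weights, axis=0)), axis=1)))


def average_short_position(weights):
    """Average total magnitude of negative weights per period."""
    return float(np.mean(np.sum(np.abs(np.minimum(weights, 0.0)), axis=1)))


def pap_pzp(w, tol=1e-8):
    """Proportions of active (|w_i| > tol) and zero positions in one portfolio."""
    active = np.abs(w) > tol
    return float(active.mean()), float((~active).mean())


# Tiny hand-checkable example (illustrative numbers only).
oos = np.array([0.01, 0.03, -0.01, 0.01])
weights = np.array([[0.5, 0.5, 0.0, 0.0],
                    [0.6, 0.0, 0.5, -0.1]])
print(oos_variance(oos), oos_sharpe(oos))
print(turnover(weights), average_short_position(weights))
print(pap_pzp(weights[0]))
```

By construction the two proportions returned by `pap_pzp` sum to one, so a portfolio that meets a sparsity level $s$ among $n$ assets has an active proportion of at most $s/n$.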
6.2. Robust and Sparse Portfolio
This section shows the weight of robust and sparse portfolios. We use the DJIA data set and the sparse levels and . The parameter , and the value of varies from to .
Figure 1 shows the portfolio weights, PAP, and PZP. The two plots in the top panel correspond to a robust portfolio under the quadratic uncertainty set, and the sparsity is
. The two plots in the bottom panel correspond to a robust portfolio under the absolute uncertainty set, and the sparsity is
. As the penalty parameter
increases, the portfolio weights tend to become sparse. The PAP and PZP indicate that we can obtain sparse portfolios that satisfy the specified sparsity.
Figure 2 shows the sparse portfolio. We use four different data sets. The sparsity level on DJIA is
, on NASDAQ and FF100 is
, and on Russell2000 is
. We solve the robust portfolio under the quadratic uncertainty set to show the results. We obtain the portfolio with the specified sparsity and the distribution of different asset weight values.
6.3. Out-of-Sample Performance
The Sharpe ratio accounts for return and risk at the same time, so it is a comprehensive measure of portfolio performance. Thus, we first test the Sharpe ratio of different portfolio strategies. We use the SP100 data set. The parameter and the value of varies from 10 to 10. The sparsity level and .
Figure 3 compares RSQ and RSA with two benchmark portfolios and shows that they can produce a higher Sharpe ratio when a suitable penalty parameter is chosen.
Table 3 reports the OOS performance by using four quantities defined in
Section 6.1. We set
and the sparsity level
(on the DJIA, NASDAQ, SP500, and FF100 data sets) and
(on the Russell2000 and Russell3000 data sets). We can observe that the RSA and RSQ portfolios achieve the smallest variances across all portfolio strategies, i.e., on average with
and
, respectively. This means they are less volatile, i.e., less risky. SU, SC1F, and SCID have the highest variance on average,
,
and
in this setting. The variance of the remaining portfolio strategies is
(L1),
(EN),
(L12),
(SC),
(EW), and
(Box), respectively. In addition, we observe that the average Sharpe ratios of the various portfolios are 12.34% (SC), 11.82% (EW), 11.77% (RSA), 11.70% (RSQ), 11.61% (L12), 11.21% (EN), 11.18% (L1), 10.58% (SCID), 10.09% (SC1F), 9.23% (Box), and 7.92% (SU). The OOS Sharpe ratios of the RSA and RSQ portfolios are not significantly different from those of SC and EW, and they are higher than those of the remaining portfolio strategies.
As for the portfolio turnover, unsurprisingly, the EW portfolio strategy exhibits the lowest turnover of all portfolio strategies, amounting to 3.20%. The RSA and RSQ portfolio strategies have moderate levels of turnover, 13.49% and 13.20% on average. The highest average turnover is generated by the SU portfolio, followed by the Box portfolio, amounting on average to 225.81% and 193.10%, which makes them very costly. The turnover of the remaining portfolio strategies is 11.85% (L12), 25.76% (L2), 16.66% (L1), and 13.98% (EN), respectively. The high turnover of SU and Box is reflected in their enormous average short positions of over 283.52% and 333.52% on average across the six data sets. The next two highest average short positions belong to SCID and SC1F, amounting to 164.29% and 132.81%, respectively. The average short positions of the SC and EW portfolios are approximately 0% across the six data sets, and those of the RSQ and RSA portfolio strategies also tend to zero. Therefore, considering their moderate turnover and small average short positions, the proposed RSQ and RSA strategies represent a practically implementable method that outperforms the other portfolio strategies listed in
Table 1.
6.4. Cumulative Return
In this subsection, we show the CR of several portfolio strategies. We use the FF100 data set. The sparsity level . The parameter . According to the OOS performance, we choose RSQ, RSA, L12, L1, EN, EW, and SC to compare the CR.
Figure 4 shows the curves of the CR over the corresponding investment periods for the different portfolio strategies. RSQ and RSA clearly outperform the others by visible margins, while the difference between RSA and RSQ themselves is not significant. This result suggests that, compared with the other portfolios, the sparse portfolios RSA and RSQ grow more steadily, with reduced volatility, across most of the investment periods.
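A cumulative return curve of this kind can be obtained from a series of OOS returns by compounding. The sketch below assumes that the CR is the wealth generated by one unit of initial capital with no transaction costs. The function name is hypothetical, and the exact definition used for Figure 4 follows the cited reference and may differ.

```python
import numpy as np


def cumulative_return(oos):
    """Wealth of one unit invested, compounding the per-period OOS returns."""
    return np.cumprod(1.0 + np.asarray(oos, dtype=float))


curve = cumulative_return([0.10, -0.05, 0.02])
print(curve)  # approximately [1.1, 1.045, 1.0659]
```

Because the curve ignores risk and cost, it complements rather than replaces the variance, Sharpe ratio, and turnover comparisons of Section 6.3.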