Robust and Sparse Portfolio: Optimization Models and Algorithms

Zhao, Hongxin; Jiang, Yilun; Yang, Yizhou

doi:10.3390/math11244925


by Hongxin Zhao ¹,*, Yilun Jiang ² and Yizhou Yang ³

¹ School of Mathematics and Statistics, Beijing Jiaotong University, Beijing 100044, China

² Department of Economic Management, Shijiazhuang Institute of Railway Technology, Shijiazhuang 050000, China

³ Personnel Department, Shijiazhuang University, Shijiazhuang 050035, China

* Author to whom correspondence should be addressed.

Mathematics 2023, 11(24), 4925; https://doi.org/10.3390/math11244925

Submission received: 8 November 2023 / Revised: 7 December 2023 / Accepted: 8 December 2023 / Published: 11 December 2023

(This article belongs to the Special Issue Modeling, Analysis and Optimization for Mathematical Finance, Economics and Risks)


Abstract: The robust and sparse portfolio selection problem is one of the most popular and frequently studied problems in the optimization and financial literature. By accounting for the uncertainty of the parameters, the goal is to construct a sparse portfolio with low volatility and decent returns, subject to other investment constraints. In this paper, we propose a new portfolio selection model, which considers the perturbation in the asset return matrix and the parameter uncertainty in the expected asset return. We define three types of stationary points of the penalty problem: the Karush–Kuhn–Tucker point, the strong Karush–Kuhn–Tucker point, and the partial minimizer. We analyze the relationship between these stationary points and the local/global minimizer of the penalty model under mild conditions. We design a penalty alternating-direction method to obtain the solutions. Extensive numerical experiments on seven real-world datasets, in which our model is compared with several existing portfolio models, demonstrate its robustness and effectiveness in generating lower volatility.

Keywords: portfolio optimization; robustness; sparsity; uncertainty set; penalty-alternating-direction method

MSC: 91G10; 90C90; 90C30

1. Introduction

In 1952, Harry M. Markowitz [1] published the classic “Portfolio Selection” in The Journal of Finance, which ushered in a new era of mathematical analysis in finance. Markowitz pointed out that investors who care about return and risk should hold portfolios located on the mean-variance efficient frontier, which leads to the famous mean-variance portfolio (MVP) selection model. Since then, many portfolio selection strategies have been proposed based on the MVP and its variants. However, MVPs are unstable because of estimation errors in the input parameters [2], especially in large-scale settings. This instability means that a solution that is optimal for a given sample may no longer be optimal from the perspective of risk once the sample fluctuates. For more comments on this model, we refer to [3,4,5,6] and the references therein.

This paper focuses on sample fluctuations and parameter uncertainty in the portfolio selection problem. We now review some relevant methods for handling parameter uncertainty. Among the various approaches, an attractive one is the robust portfolio (RP), which corresponds to robust optimization, since it does not use any information about the probability distribution of the uncertain parameters. The RP we consider is a conservative approach that minimizes the loss function under the worst-case scenario within an uncertainty set. In the last two decades, robust portfolio selection problems have attracted increasing interest from researchers, who have constructed well-known optimal portfolios from the perspective of robust optimization [7,8,9,10]. Along these lines, Goldfarb and Iyengar [11] formulated and solved RP problems. They introduced uncertainty structures for the input parameters and showed that the resulting RP problems can be cast as second-order cone programs and that these uncertainty structures correspond to the confidence regions employed to estimate the market parameters. Given the uncertainty in the mean and covariance matrix of the asset return, Lobo and Boyd [12] computed the maximum risk of a portfolio in a numerically efficient way. They proved that this is a semi-definite programming problem that is readily solved by interior-point methods for convex optimization. Min et al. [13] proposed hybrid RP models under ellipsoidal uncertainty sets and considered both the best-case and the worst-case counterparts. Won and Kim [14] considered RP problems involving a trade-off between the worst-case utility and the worst-case regret, that is, the largest difference between the best utility achievable under the model and that achieved by a given portfolio. They showed that the entire optimal trade-off curve can be found by solving a series of semi-definite programs under the ellipsoidal uncertainty model. Some research works [15,16] concentrated on the application of robust optimization to basic mean-variance, mean value-at-risk (mean-VaR), and mean conditional-value-at-risk (mean-CVaR) problems, but did not consider variants such as robust index tracking or robust and sparse portfolio selection. More relevant works can be found in [17,18,19,20] and the references therein.

RPs have a wide range of applications, and one essential step in all of them is the construction of uncertainty sets. Two types of uncertainty sets are widely used, namely the box uncertainty set and the ellipsoidal uncertainty set. Tütüncü and Koenig [21] used symmetric box uncertainty sets defined as $U_{μ} = \{μ \in R^{n} \mid μ^{L} \leq μ \leq μ^{U}\}$ and $U_{Σ} = \{Σ \in R^{n \times n} \mid Σ^{L} \leq Σ \leq Σ^{U}, Σ ⪰ 0\}$, where $μ^{L} \in R^{n}$ and $μ^{U} \in R^{n}$ are the lower and upper bounds of the mean vector $μ$, $Σ^{L} \in R^{n \times n}$ and $Σ^{U} \in R^{n \times n}$ are the lower and upper bounds of the covariance matrix $Σ$, respectively, and $Σ ⪰ 0$ means that $Σ$ is positive semi-definite. Khodamoradi et al. [22] used box uncertainty sets for a cardinality-constrained mean-variance portfolio problem that allows short selling. Swain and Ojha [10] analyzed the robust versions of the mean-variance and mean-semi-variance portfolio problems with box uncertainty sets. Alternatively, Fabozzi et al. [23] defined an ellipsoidal uncertainty set for the expected asset return as $U_{μ} = \{μ \mid (μ - \bar{μ})^{⊤} Σ^{-1} (μ - \bar{μ}) \leq ϵ^{2}\}$, where $\bar{μ}$ is the nominal asset return and $ϵ^{2}$ is a small scalar that controls the size of the uncertainty set. However, they did not consider the uncertainty of the covariance matrix; thus, the solution was robust only against perturbations in the asset return vector. Pınar [24] developed a multi-period robust mean-variance portfolio problem with an ellipsoidal uncertainty set while allowing short selling. It is well known that portfolios are more sensitive to estimation errors in the mean vector than to those in the covariance matrix. On the other hand, dealing with uncertainty in the covariance matrix is more complicated than dealing with uncertainty in the mean vector. Thus, in this paper, we consider two types of uncertainty sets for the mean vector.

Financial data have some remarkable features, such as multicollinearity and a heavy tail. Therefore, the perturbations of these data should not be underestimated. By referring to Brodie et al. [2], who transferred the MVP into a Lasso-type portfolio, we consider the perturbations in the asset return matrix and design its uncertainty set. In addition, from the perspective of transaction costs and administrative expenses, more assets are not always better. Therefore, it is also necessary to consider sparsity when constructing a portfolio [25,26,27]. After these discussions, a natural question follows: How do we find better RPs that not only reduce the undesired impact of parameter uncertainty, but also improve sparsity and reduce cost?

Following the above considerations, this paper proposes a sparsity-constrained robust portfolio optimization model with parameter uncertainty and data perturbation. Specifically, we consider the perturbation in the asset return matrix and the parameter uncertainty in the expected asset return. By using the equivalence of robustness and regularization, the Lasso-type objective function can be converted into the sum of a square-root loss term and an $ℓ_{1}$ norm term. We consider two kinds of uncertainty sets: the box uncertainty set and the ellipsoidal uncertainty set. For the penalty model, we define three types of stationary points: the Karush–Kuhn–Tucker (KKT) point, the strong KKT point, and the partial minimizer. Under a mild constraint qualification (CQ), we prove that any local minimizer of the penalty model is a KKT point. Moreover, the global minimizer of the penalty model is proven to be a partial minimizer and, hence, a strong KKT point under Slater’s CQ. Finally, a penalty alternating direction method is proposed to obtain a portfolio, and its convergence is established. We confirm the effectiveness of our approach by comparing it with nine widely studied portfolio models on seven real-world datasets. The numerical results show that the proposed portfolios have lower volatility, that is, lower risk. Moreover, our portfolio strategies can yield higher Sharpe ratios when appropriate parameters are selected.

This paper is organized as follows. Some notations and preliminaries used in this paper are given in the next section. The model of robust and sparse portfolios and the analysis of their optimization theory are stated in Section 3. Two types of uncertainty sets of mean vectors are presented in Section 4. The optimization algorithm named the penalty alternating direction method is established in Section 5. Extensive numerical experiments are conducted in Section 6. Conclusions are drawn in Section 7.

2. Notation and Preliminaries

We use $R$, $R^{n}$, and $R^{m \times n}$ to denote the set of real numbers, the n-dimensional Euclidean space, and the space of $m \times n$ real matrices, respectively. We use boldfaced small letters to denote vectors, e.g., $w \in R^{n}$ is a column vector with n elements $w_{i}$, $i = 1, \dots, n$. The transpose of $w$ is denoted as $w^{T}$, which is a row vector. In particular, $1_{n}$ is the vector of all ones of size n. For a vector $a \in R^{n}$, we define its absolute value vector by $| a | := (| a_{1} |, \dots, | a_{n} |)$. We use capital letters to denote matrices, e.g., $A \in R^{m \times n}$, and $a_{i j}$ denotes the $(i, j)$-th entry of A. Given an index set $Γ \subset \{1, \dots, n\}$, $a_{Γ}$ denotes the sub-vector of $a$ indexed by $Γ$. We write the Euclidean norm of $w$ as ${∥ w ∥}_{2}$, the $ℓ_{1}$ norm as ${∥ w ∥}_{1}$, and the infinity norm as ${∥ w ∥}_{\infty}$. For two vectors $a \in R^{n}$ and $b \in R^{n}$, $〈 a, b 〉$ denotes the standard inner product.

We now provide some existing results of optimization that are crucial for the theory of this paper. For the convenience of expression, we define the following convex programming:

\begin{matrix} min_{x \in R^{n}} & f (x), \\ s . t . & g_{i} (x) \geq 0, i = 1, \dots, m, x \in Ω, \end{matrix}

(1)

where $Ω$ is a nonempty convex set, f is a convex function, and the functions $g_{i}$ are concave. For problem (1), Slater’s CQ builds a bridge between its solution and the KKT point (the point satisfying the conditions in Theorem 1).

Definition 1 ([28], Definition 4.17). Slater’s CQ holds in problem (1) if there exists $u \in Ω$ such that $g_{i} (u) > 0$ for all $i = 1, \dots, m$.

Theorem 1 ([28], Theorem 4.18). Suppose that Slater’s CQ holds in problem (1). Then, $x^{*}$ is an optimal solution to problem (1) if and only if there exist non-negative Lagrange multipliers $(λ_{1}, \dots, λ_{m}) \in R^{m}$ such that

0 \in \partial f (x^{*}) - \sum_{i = 1}^{m} λ_{i} \partial g_{i} (x^{*}) + N (x^{*}; Ω)

and $λ_{i} g_{i} (x^{*}) = 0$ for all $i = 1, \dots, m$, where $\partial f (x^{*})$ denotes the classical sub-differential set ([28], Definition 2.30) of f at $x^{*}$ and $N (x^{*}; Ω)$ denotes the classical normal cone ([28], Definition 2.9) of $Ω$ at $x^{*}$.

We also introduce some crucial terminology and results for sparse nonlinear programming:

\begin{matrix} min_{x \in R^{n}} & f (x) \\ s.t. & g_{i} (x) \geq 0, i = 1, \dots, m, \\ & h_{j} (x) = 0, j = 1, \dots, l, \\ & {∥ x ∥}_{0} \leq s, \end{matrix}

(2)

where f is a convex function and the functions $g_{i}$ and $h_{j}$ are continuously differentiable. A restricted linear independence constraint qualification (R-LICQ) for the sparse nonlinear program (2) was defined in [29] as follows.

Definition 2 ([29], Definition 2.4). Let $x^{*}$ be feasible for problem (2), let $I (x^{*})$ denote the index set of the inequality constraints that are active at $x^{*}$, and let $Γ^{*}$ denote the support of $x^{*}$. We say that the R-LICQ holds at $x^{*}$ if the following conditions hold:

When ${∥ x^{*} ∥}_{0} = s$, the vectors $\nabla g_{i} (x^{*})$, $i \in I (x^{*})$, and $\nabla h_{j} (x^{*})$, $j = 1, \dots, l$, are linearly independent.
When ${∥ x^{*} ∥}_{0} < s$, the vectors $\nabla_{Γ^{*}} g_{i} (x^{*})$, $i \in I (x^{*})$, and $\nabla_{Γ^{*}} h_{j} (x^{*})$, $j = 1, \dots, l$, are linearly independent.

Based on the R-LICQ, the following decomposition result holds.

Theorem 2 ([29], Proposition 2.5). Let $x^{*}$ be a feasible point of problem (2), and let the R-LICQ hold at $x^{*}$. Then,

\hat{N} (x^{*}; S \cap Q) = \hat{N} (x^{*}; S) + \hat{N} (x^{*}; Q),

where $S := \{x : {∥ x ∥}_{0} \leq s\}$, $Q := \{x : g_{i} (x) \geq 0, i = 1, \dots, m, h_{j} (x) = 0, j = 1, \dots, l\}$, and $\hat{N} (x^{*}; C)$ denotes the Fréchet normal cone ([30], Definition 6.3) of a set $C$ at $x^{*}$, which reduces to the classical normal cone described in Theorem 1 if $C$ is a convex set.

For the partial problem (10) of the portfolio model (6) in Section 3.1, the R-LICQ holds automatically at $x^{*}$, where $m = 0$, $l = 1$, and $h (x) := 1^{T} x - 1$. Next, we establish the relationship between the local minimizer of problem (2) and its KKT point (the point satisfying the KKT system in Theorem 3).

Theorem 3. Suppose that $x^{*}$ is a local minimizer of problem (2) and the R-LICQ holds at $x^{*}$. Then, there exist Lagrange multipliers $(λ_{1}^{*}, \dots, λ_{m}^{*}) \in R_{+}^{m}$ and $(μ_{1}^{*}, \dots, μ_{l}^{*}) \in R^{l}$ such that

\left\{ \begin{matrix} 0 \in \partial f (x^{*}) - \sum_{i = 1}^{m} λ_{i}^{*} \nabla g_{i} (x^{*}) + \sum_{j = 1}^{l} μ_{j}^{*} \nabla h_{j} (x^{*}) + \hat{N} (x^{*}; S), \\ g_{i} (x^{*}) \geq 0, λ_{i}^{*} g_{i} (x^{*}) = 0, i = 1, \dots, m, \\ h_{j} (x^{*}) = 0, j = 1, \dots, l, \\ {∥ x^{*} ∥}_{0} \leq s . \end{matrix} \right.

(3)

Proof. It follows from Theorem 6.12 of [30] that

0 \in \partial f (x^{*}) + \hat{N} (x^{*}; S \cap Q) .

Combining Theorem 2 with the proof of Theorem 3.2 of [29], we obtain the result. □

This result differs from Theorem 3.2 of [29] in that we allow the objective function of problem (2) to be non-differentiable. The analysis is otherwise identical to that of Theorem 3.2 of [29].

3. Model and Optimization Theory

In this section, we first propose a robust and sparse portfolio model (4) with an uncertainty set constraint and a sparsity constraint. For the convenience of the numerical calculation, we consider its $ℓ_{1}$ norm penalization variant (6). We define three types of stationary points of the penalization variant: the KKT point, the strong KKT point, and the partial minimizer. The relationships between these stationary points and the local/global minimizers of the penalization problem (6) are established in Section 3.2.

3.1. Robust and Sparse Portfolio Model

Consider n risky assets, and denote the asset returns at period t by $r_{t} = (r_{1}, \dots, r_{n})^{⊤} \in R^{n}$. The expected return vector of the assets is denoted by $E (r_{t}) = μ$, and the covariance matrix is denoted by $E [(r_{t} - μ) (r_{t} - μ)^{⊤}] = V$. In the traditional Markowitz portfolio selection problem, the portfolio construction is based on the trade-off between risk and return. For a given level of acceptable portfolio return $ρ = w^{⊤} μ$, the mean-variance optimization can be formulated as

\begin{matrix} min_{w \in R^{n}} & \frac{1}{2} w^{⊤} V w \\ s.t. & w^{⊤} μ = ρ, w^{⊤} 1_{n} = 1, \end{matrix}

and its aim is to find a portfolio that has minimal risk for a given expected return. A significant model that has been developed from the Markowitz model is the Lasso-type portfolio proposed by Brodie et al. [2], which is given as

\begin{matrix} min_{w \in R^{n}} & \frac{1}{T} {∥ ρ 1_{T} - R w ∥}_{2}^{2} + α {∥ w ∥}_{1} \\ s.t. & w^{⊤} \bar{μ} = ρ, w^{⊤} 1_{n} = 1, \end{matrix}

where $\bar{μ} = \frac{1}{T} \sum_{t = 1}^{T} r_{t}$, $α$ is the penalty parameter, and $R \in R^{T \times n}$ is the asset return matrix. Brodie et al. [2] confirmed that the $ℓ_{1}$ norm penalty can produce a sparse portfolio and stabilize the problem. In this paper, we start with the square-root Lasso-type portfolio and additionally take into account the perturbation in the asset return matrix R and the parameter uncertainty in $μ$. We propose the following robust and sparse portfolio selection model:

\begin{matrix} min_{w \in R^{n}} max_{Δ \in U_{0}} & {∥ ρ 1_{T} - (R + Δ) w ∥}_{2} \\ s.t. & min_{μ \in U} w^{T} μ \geq ρ, \\ & w^{T} 1_{n} = 1, w \geq 0, \\ & {∥ w ∥}_{0} \leq s, \end{matrix}

(4)

where $Δ$ is the data perturbation matrix, $Δ_{i}$ denotes its i-th column, and $U_{0} = \{Δ \in R^{T \times n} : {∥ Δ_{i} ∥}_{2} \leq α, \forall i \in \{1, \dots, n\}\}$. The uncertainty set of the expected asset return is denoted by U, and we discuss two choices of U in Section 4.

The equivalence of robustness and regularization was shown in [31] (Chapter 2). Specifically, the authors precisely characterized the conditions on the model of uncertainty and the loss function under which robustness is equivalent to regularization for linear regression.

Definition 3. Let $g : R^{T} \to R$ and $h : R^{n} \to R$ be norms. Then, the induced norm ${∥ \cdot ∥}_{(h, g)}$ is defined as

{∥ Δ ∥}_{(h, g)} = max_{w \in R^{n}, w \neq 0} \frac{g (Δ w)}{h (w)} .

Theorem 4 ([31], Chapter 2). If $r, q \in [1, \infty]$, then

min_{w} max_{Δ \in U_{(ℓ_{q}, ℓ_{r})}} {∥ y - (R + Δ) w ∥}_{r} = min_{w} {∥ y - R w ∥}_{r} + α {∥ w ∥}_{q},

where $U_{(ℓ_{q}, ℓ_{r})} = \{Δ : {∥ Δ ∥}_{(ℓ_{q}, ℓ_{r})} \leq α\}$. Moreover, if $U_{0} = \{Δ : {∥ Δ_{i} ∥}_{2} \leq α, \forall i \in \{1, \dots, n\}\}$, then $U_{(ℓ_{1}, ℓ_{2})} = U_{0}$, and this implies

min_{w} max_{{∥ Δ_{i} ∥}_{2} \leq α} {∥ y - (R + Δ) w ∥}_{2} = min_{w} {∥ y - R w ∥}_{2} + α {∥ w ∥}_{1} .
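The last identity of Theorem 4 can be checked numerically for a fixed $w$. The following Python sketch is only an illustration on synthetic data and is not part of the original experiments; the sizes, the random seed, and all variable names are our own assumptions. It builds the perturbation whose columns are $-α\, sign(w_{i})\, u$, with $u$ the normalized residual, and compares the resulting robust loss with the regularized loss.

```python
import numpy as np

rng = np.random.default_rng(0)
T, n, alpha = 50, 8, 0.3          # hypothetical sizes and perturbation radius
R = rng.normal(size=(T, n))       # synthetic stand-in for the return matrix
y = rng.normal(size=T)            # stands in for rho * 1_T
w = rng.normal(size=n)            # any fixed weight vector

residual = y - R @ w
u = residual / np.linalg.norm(residual)   # unit vector along the residual

# Candidate worst-case perturbation: column i equals -alpha * sign(w_i) * u,
# so every column has Euclidean norm at most alpha.
delta_worst = -alpha * np.outer(u, np.sign(w))

robust_value = np.linalg.norm(y - (R + delta_worst) @ w)
regularized_value = np.linalg.norm(residual) + alpha * np.linalg.norm(w, 1)

def random_feasible_delta():
    """Random perturbation whose columns all have Euclidean norm <= alpha."""
    d = rng.normal(size=(T, n))
    d /= np.linalg.norm(d, axis=0)              # normalize each column
    return alpha * rng.uniform(size=n) * d      # shrink each column separately

# No sampled feasible perturbation should exceed the regularized value.
sampled_max = max(
    np.linalg.norm(y - (R + random_feasible_delta()) @ w) for _ in range(500)
)
print(robust_value, regularized_value, sampled_max)
```

For this particular perturbation the two printed losses agree, while randomly sampled feasible perturbations stay below them, which is consistent with the inner maximum being attained on the boundary of $U_{0}$.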

From the relationship of the robustness and the regularization, problem (4) can be rewritten as

\begin{matrix} min_{w \in R^{n}} & {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} \\ s.t. & min_{μ \in U} w^{T} μ \geq ρ, \\ & w^{T} 1_{n} = 1, w \geq 0, \\ & {∥ w ∥}_{0} \leq s . \end{matrix}

(5)

Under this transformation, problem (5) inherits the robustness of problem (4). We plan to use an alternating penalty method to solve problem (5). To this end, we add a copy constraint $w = v$ to problem (5) and then move it into the objective function by means of an $ℓ_{1}$ norm penalty, which gives the penalized formulation

\begin{matrix} min_{w, v \in R^{n}} & f (w, v) := {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} + β {∥ w - v ∥}_{1} \\ s.t. & w \in Ω_{1} := \{w \mid min_{μ \in U} w^{T} μ \geq ρ, w \geq 0\}, \\ & v \in Ω_{2} := \{v \mid v^{T} 1_{n} = 1, {∥ v ∥}_{0} \leq s\} . \end{matrix}

(6)

We conduct its optimality analysis in the next subsection.

3.2. Optimization Theory

We now analyze the optimality of the penalization problem (6). The objective function of problem (6) is lower semi-continuous and coercive, and Theorem 5 below establishes the existence of optimal solutions. The selection of the uncertainty set U is discussed in Section 4.

This subsection provides several theoretical results for problem (6), including the existence of a solution and three classes of first-order necessary optimality conditions.

Theorem 5. For any given $α \in R_{+}$ and $β \in R_{+}$, the optimal solutions of problem (6) can be attained.

Proof. It is clear that f is a proper, closed, and coercive function and that $Ω_{1} \times Ω_{2}$ is a nonempty closed set satisfying $(Ω_{1} \times Ω_{2}) \cap dom (f) \neq \emptyset$. It follows from Theorem 2.14 of [32] that this theorem holds. □

We now define a class of KKT points of problem (6). For convenience and generality, we write the constraint ${min}_{μ \in U} w^{T} μ \geq ρ$ as $g (w) \geq 0$ and suppose that g is a concave function that is not necessarily differentiable. Indeed, the quadratic uncertainty set and the absolute uncertainty set introduced in Section 4 both satisfy these assumptions.

Definition 4. The point $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ is called a KKT point of problem (6) if there exist Lagrange multipliers $λ_{1}^{*} \in R_{+}$ and $λ_{2}^{*} \in R$ such that the following system holds:

\left\{ \begin{matrix} 0 \in \partial_{w} f (w^{*}, v^{*}) - λ_{1}^{*} \partial_{w} g (w^{*}) + N (w^{*}; R_{+}^{n}), \\ 0 \in \partial_{v} f (w^{*}, v^{*}) - λ_{2}^{*} 1 + \hat{N} (v^{*}; S), \\ g (w^{*}) \geq 0, λ_{1}^{*} g (w^{*}) = 0, \\ 1^{T} v^{*} = 1, {∥ v^{*} ∥}_{0} \leq s . \end{matrix} \right.

(7)

The functions g corresponding to the quadratic uncertainty set and the absolute uncertainty set introduced in Section 4 are both concave, may be non-differentiable, and satisfy Slater’s CQ automatically. Nevertheless, we still state Slater’s CQ as an assumption of Theorem 6 for the sake of generality. Moreover, as stated in Section 2, the R-LICQ of $Ω_{2}$ holds at every point. Therefore, Slater’s CQ alone suffices to establish the relationship between the local minimizers of problem (6) and its KKT points.

Theorem 6. Let $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ be a local minimizer of problem (6). If Slater’s CQ holds on $Ω_{1}$, then it is a KKT point of problem (6).

Proof. On the one hand, since $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ is a local minimizer of problem (6), $w^{*}$ is a local minimizer of the following optimization problem:

\begin{matrix} min_{w \in R^{n}} & f (w, v^{*}) = {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} + β {∥ w - v^{*} ∥}_{1} \\ s.t. & w \in Ω_{1} . \end{matrix}

(8)

Notice that $f (w, v^{*})$ is a convex function of $w$ and $Ω_{1}$ is a convex set, so problem (8) is a convex optimization problem. Since Slater’s CQ holds on $Ω_{1}$, it follows from Theorem 1 that there exists a Lagrange multiplier $λ_{1}^{*} \in R_{+}$ such that

\left\{ \begin{matrix} 0 \in \partial_{w} f (w^{*}, v^{*}) - λ_{1}^{*} \partial_{w} g (w^{*}) + N (w^{*}; R_{+}^{n}), \\ g (w^{*}) \geq 0, λ_{1}^{*} g (w^{*}) = 0 . \end{matrix} \right.

(9)

On the other hand, $v^{*}$ is a local minimizer of the following optimization problem:

\begin{matrix} min_{v \in R^{n}} & f (w^{*}, v) = {∥ ρ 1_{T} - R w^{*} ∥}_{2} + α {∥ w^{*} ∥}_{1} + β {∥ w^{*} - v ∥}_{1} \\ s.t. & v \in Ω_{2} . \end{matrix}

(10)

Since the R-LICQ of $Ω_{2}$ holds at every point, it follows from Theorem 3 that there exists a Lagrange multiplier $λ_{2}^{*} \in R$ such that

\left\{ \begin{matrix} 0 \in \partial_{v} f (w^{*}, v^{*}) - λ_{2}^{*} 1 + \hat{N} (v^{*}; S), \\ 1^{T} v^{*} = 1, {∥ v^{*} ∥}_{0} \leq s, \end{matrix} \right.

(11)

where the sign in front of $λ_{2}^{*}$ is immaterial because $λ_{2}^{*}$ is unrestricted in sign. Combining systems (9) and (11), we obtain system (7), and the theorem holds. □

Problem (10) can be written more simply as

\begin{matrix} min_{v \in R^{n}} & {∥ w^{*} - v ∥}_{1} \\ s.t. & v \in Ω_{2}, \end{matrix}

and it has a closed-form solution (see [33]), namely

v_{i}^{*} = \left\{ \begin{matrix} \frac{w_{i}^{*}}{{(w_{s}^{*})}^{T} 1_{s}}, & if i \in I_{s}^{*}, \\ 0, & otherwise, \end{matrix} \right.

(12)

where $I_{s}^{*}$ is the index set of the s largest entries of $w^{*}$ in absolute value, ordered as $w_{〈 1 〉}^{*} \geq \dots \geq w_{〈 s 〉}^{*}$, and $w_{〈 i 〉}^{*}$ denotes the i-th largest absolute value among the n elements of $w^{*}$. Thus, we can define a class of strong KKT points of problem (6) as follows.
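To make the update (12) concrete, the following Python sketch keeps the s entries of $w^{*}$ that are largest in absolute value and rescales them so that they sum to one. It is a minimal illustration under our own reading of (12), in which $w_{s}^{*}$ is taken to be the sub-vector of $w^{*}$ indexed by $I_{s}^{*}$; the function name `sparse_simplex_step` and the example vector are hypothetical.

```python
import numpy as np

def sparse_simplex_step(w, s):
    """Keep the s largest-magnitude entries of w and rescale them to sum to one."""
    w = np.asarray(w, dtype=float)
    keep = np.argsort(-np.abs(w), kind="stable")[:s]   # indices forming I_s
    total = w[keep].sum()                               # (w_s)^T 1_s in (12)
    if total == 0.0:
        raise ValueError("selected entries sum to zero; rescaling is undefined")
    v = np.zeros_like(w)
    v[keep] = w[keep] / total
    return v

# Hypothetical long-only weights with five assets, of which three are kept.
w_star = np.array([0.05, 0.40, 0.00, 0.30, 0.25])
v_star = sparse_simplex_step(w_star, s=3)
print(v_star)
```

By construction, the returned vector satisfies both constraints of $Ω_{2}$: it sums to one and has at most s non-zero entries.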

Definition 5. The point $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ is called a strong KKT point of problem (6) if there exists a Lagrange multiplier $λ^{*} \in R_{+}$ such that the following system holds:

\left\{ \begin{matrix} 0 \in \partial_{w} f (w^{*}, v^{*}) - λ^{*} \partial_{w} g (w^{*}) + N (w^{*}; R_{+}^{n}), \\ g (w^{*}) \geq 0, λ^{*} g (w^{*}) = 0, \\ v_{i}^{*} = \left\{ \begin{matrix} \frac{w_{i}^{*}}{{(w_{s}^{*})}^{T} 1_{s}}, & if i \in I_{s}^{*}, \\ 0, & otherwise . \end{matrix} \right. \end{matrix} \right.

(13)

It is easy to prove that, if $(w^{*}, v^{*})$ is a strong KKT point of problem (6), then it is a KKT point of problem (6). The following result provides the relationship between the global minimizer of problem (6) and the strong KKT point of problem (6).

Theorem 7. Let $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ be a global minimizer of problem (6). If Slater’s CQ holds on $Ω_{1}$, then it is a strong KKT point of problem (6).

Proof. The part of (13) concerning $w^{*}$ follows from (7). We only need to discuss the part of (13) concerning $v^{*}$. Since $v^{*}$ is the global minimizer of (10), it follows from (12) that the part of (13) concerning $v^{*}$ holds. □

Note that the local minimizer of the problem (6) cannot be guaranteed to be a strong KKT point.

Finally, we introduce the third stationary point of the problem (6), which is called the partial minimizer.

Definition 6. The point $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ is called a partial minimizer of problem (6) if it satisfies

f (w^{*}, v^{*}) \leq f (w, v^{*}), \forall w \in Ω_{1}, and f (w^{*}, v^{*}) \leq f (w^{*}, v), \forall v \in Ω_{2} .

Clearly, any global minimizer of problem (6) is a partial minimizer. Moreover, on the one hand, the partial problem (8) is a convex optimization problem, and Slater’s CQ ensures that its KKT points and global minimizers coincide. On the other hand, the partial problem (10) has a closed-form solution. Thus, the equivalence between the strong KKT points of problem (6) and its partial minimizers can be established under Slater’s CQ.

Theorem 8. Let $(w^{*}, v^{*}) \in Ω_{1} \times Ω_{2}$ be a feasible point of problem (6). Suppose that Slater’s CQ holds on $Ω_{1}$. Then, $(w^{*}, v^{*})$ is a partial minimizer of problem (6) if and only if $(w^{*}, v^{*})$ is a strong KKT point of problem (6).

Proof. Suppose that $(w^{*}, v^{*})$ is a strong KKT point of problem (6). Then, there exists $λ^{*} \in R_{+}$ such that

0 \in \partial_{w} f (w^{*}, v^{*}) - λ^{*} \partial_{w} g (w^{*}) + N (w^{*}; R_{+}^{n}), g (w^{*}) \geq 0, and λ^{*} g (w^{*}) = 0 .

Since Slater’s CQ holds on $Ω_{1}$, Theorem 1 implies that $w^{*}$ is a global minimizer of problem (8). Then, we have that $f (w^{*}, v^{*}) \leq f (w, v^{*})$, $\forall w \in Ω_{1}$. Moreover, it follows from the definition of the strong KKT point of problem (6) that $v^{*}$ is a global minimizer of problem (10). Then, we have that $f (w^{*}, v^{*}) \leq f (w^{*}, v)$, $\forall v \in Ω_{2}$. Thus, $(w^{*}, v^{*})$ is a partial minimizer of problem (6). The converse clearly holds. □

4. The Uncertainty Set U

In Section 3.2, we rewrote the uncertainty set constraint as $g (w) \geq 0$, where g is a concave function that is not necessarily differentiable. This section introduces two mainstream formulations of the uncertainty set for the asset mean return vector $μ$ (see [34]), namely the quadratic uncertainty set and the absolute uncertainty set.

4.1. The Quadratic Uncertainty Set

The first one is the quadratic formulation, $U = \{μ \mid (μ - \bar{μ})^{T} Ω^{-1} (μ - \bar{μ}) \leq κ^{2}\}$, where $\bar{μ}$ is the nominal expected return and $κ$ is the error bound. Assume that asset returns are independent and identically distributed and that $μ - \bar{μ}$ follows a normal distribution with mean $0$ and covariance matrix $Ω$, that is, $Ω$ is the covariance matrix of the errors in the expected asset return. Yin et al. [35] discussed the choice of the uncertainty matrix $Ω$ in the quadratic uncertainty set and proposed selection criteria. In the quadratic uncertainty case, computing ${min}_{μ \in U} w^{T} μ$ in problem (5) is equivalent to solving the following problem:

max_{μ \in U} w^{T} \bar{μ} - w^{T} μ .

Solving the above problem, we obtain

μ = \bar{μ} - \sqrt{\frac{κ^{2}}{w^{T} Ω w}} Ω w .

Then, problem (5) is rewritten as

\begin{matrix} min_{w \in R^{n}} & {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} \\ s.t. & \bar{μ}^{T} w - κ \sqrt{w^{T} Ω w} \geq ρ, \\ & w^{T} 1_{n} = 1, w \geq 0, \\ & {∥ w ∥}_{0} \leq s . \end{matrix}

(14)
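The worst-case return used in the first constraint of (14) can be verified numerically. The Python sketch below is an illustration on synthetic data, not part of the original study; it assumes the ellipsoid is taken in the form $(μ - \bar{μ})^{T} Ω^{-1} (μ - \bar{μ}) \leq κ^{2}$, which is the form under which the minimizer stated above is attained, and every numerical value and variable name is hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)
n, kappa = 6, 0.5                               # hypothetical dimension and radius
mu_bar = rng.normal(0.01, 0.005, size=n)        # synthetic nominal returns
A = rng.normal(size=(n, n))
Omega = A @ A.T + n * np.eye(n)                 # synthetic positive definite matrix
w = rng.uniform(size=n)
w /= w.sum()                                    # long-only weights summing to one

scale = np.sqrt(w @ Omega @ w)
mu_worst = mu_bar - kappa / scale * (Omega @ w)     # stated worst-case mean vector
worst_return = mu_bar @ w - kappa * scale           # left-hand side of the constraint plus rho

# The worst-case vector should lie on the boundary of the ellipsoid.
diff = mu_worst - mu_bar
boundary_value = diff @ np.linalg.solve(Omega, diff)

# Sample other points of the ellipsoid: mu = mu_bar + kappa * L z with ||z||_2 <= 1.
L = np.linalg.cholesky(Omega)
z = rng.normal(size=(5000, n))
z /= np.maximum(1.0, np.linalg.norm(z, axis=1, keepdims=True))
sampled_min = np.min((mu_bar + kappa * z @ L.T) @ w)
print(mu_worst @ w, worst_return, sampled_min)
```

In this check, the return at the stated minimizer equals $\bar{μ}^{T} w - κ \sqrt{w^{T} Ω w}$, and no sampled point of the ellipsoid yields a lower return.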

Here, $g (w) = \bar{μ}^{T} w - κ \sqrt{w^{T} Ω w} - ρ$. The penalization form of problem (14) can be written as

\begin{matrix} min_{w, v \in R^{n}} & {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} + β {∥ w - v ∥}_{1} \\ s.t. & w \in Ω_{1} = \{w \mid κ {∥ \sqrt{Ω} w ∥}_{2} \leq \bar{μ}^{T} w - ρ, w \geq 0\}, \\ & v \in Ω_{2} = \{v \mid v^{T} 1_{n} = 1, {∥ v ∥}_{0} \leq s\} . \end{matrix}

Following the proposal of Yin et al. [35], we choose $Ω \propto diag (V)$. This choice of uncertainty matrix is expected to reduce the sensitivity to the inputs while keeping the original volatility unchanged.

4.2. The Absolute Uncertainty Set

Fabozzi et al. [23] used the absolute uncertainty set for mean returns, which requires that the sum of the absolute deviations between the estimated and the possible mean returns not be too large. The absolute formulation is $U = \{μ \mid \sum_{i} | μ_{i} - \bar{μ}_{i} | \leq \frac{κ \bar{σ}}{\sqrt{T}}\}$. In this case,

μ^{T} w - \bar{μ}^{T} w \geq - \left( \sum_{i} | μ_{i} - \bar{μ}_{i} | \right) max_{i} | w_{i} | \geq - \frac{κ \bar{σ}}{\sqrt{T}} max_{i} | w_{i} |;

thus,

μ^{T} w \geq \bar{μ}^{T} w - \frac{κ \bar{σ}}{\sqrt{T}} max_{i} | w_{i} | .

Then, problem (5) is equivalent to

\begin{matrix} min_{w \in R^{n}} & {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} \\ s.t. & \bar{μ}^{T} w - \frac{κ \bar{σ}}{\sqrt{T}} max_{i} | w_{i} | \geq ρ, \\ & w^{T} 1_{n} = 1, w \geq 0, \\ & {∥ w ∥}_{0} \leq s . \end{matrix}

(15)
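The lower bound behind the first constraint of (15) is attained, which can be seen by spending the whole deviation budget on the asset with the largest weight. The Python sketch below illustrates this on synthetic data; it is not part of the original experiments, and the variable `radius`, which plays the role of $κ \bar{σ} / \sqrt{T}$, as well as all numerical values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
n, radius = 6, 0.02                   # radius stands in for kappa * sigma_bar / sqrt(T)
mu_bar = rng.normal(0.01, 0.005, size=n)
w = rng.uniform(size=n)
w /= w.sum()                          # long-only weights summing to one

# Spend the whole l1 budget on the asset with the largest absolute weight.
j = int(np.argmax(np.abs(w)))
mu_worst = mu_bar.copy()
mu_worst[j] -= radius * np.sign(w[j])
worst_return = mu_bar @ w - radius * np.max(np.abs(w))

# Sample other mean vectors whose l1 distance to mu_bar is at most radius.
e = rng.laplace(size=(5000, n))
e *= radius * rng.uniform(size=(5000, 1)) / np.abs(e).sum(axis=1, keepdims=True)
sampled_min = np.min((mu_bar + e) @ w)
print(mu_worst @ w, worst_return, sampled_min)
```

The return at the constructed vector coincides with $\bar{μ}^{T} w - \frac{κ \bar{σ}}{\sqrt{T}} max_{i} | w_{i} |$, and the sampled mean vectors never fall below it, so the bound is tight.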

Here, $g (w) = \bar{μ}^{T} w - κ \bar{σ} max_{i} | w_{i} | / \sqrt{T} - ρ$. Similarly, the penalization form of problem (15) can be written as

\begin{matrix} min_{w, v \in R^{n}} & {∥ ρ 1_{T} - R w ∥}_{2} + α {∥ w ∥}_{1} + β {∥ w - v ∥}_{1} \\ s.t. & w \in Ω_{1} = \{w \mid | w_{i} | \leq \frac{\sqrt{T}}{κ \bar{σ}} (w^{T} \bar{μ} - ρ), i = 1, \dots, n, w \geq 0\}, \\ & v \in Ω_{2} = \{v \mid v^{T} 1_{n} = 1, {∥ v ∥}_{0} \leq s\} . \end{matrix}

5. Optimization

This section introduces a penalty alternating direction method (PADM) to solve problem (5).

5.1. Alternating Direction Methods

We first discuss the optimization of problem (6). Because this problem is nonconvex and involves two coupled blocks of variables, alternating direction methods (ADMs) are a natural choice for solving it. The framework of ADMs is described in Algorithm 1.

The general convergence result of Algorithm 1 is stated in Theorem 9; we refer to Geissler et al. [36] (Theorem 8) for a proof and for further details about this method.

Algorithm 1 ADM: Alternating Direction Method.

1: Set the problem parameters $α, κ, ρ, T > 0$, the asset return matrix $R \in R^{T \times n}$, and the nominal expected return vector $\bar{μ} \in R^{n}$. Initialize $ε > 0$, $(w^{0}, v^{0})$, and the penalty parameter $β > 0$. Set the iteration index $k := 0$.
2: Compute

$w^{k + 1} \in arg min_{w} \{f (w, v^{k}) : w \in Ω_{1}\},$

(16)

and

$v^{k + 1} \in arg min_{v} \{f (w^{k + 1}, v) : v \in Ω_{2}\} .$

(17)
3: If $\frac{{∥ w^{k + 1} - w^{k} ∥}_{2} + {∥ v^{k + 1} - v^{k} ∥}_{2}}{{∥ w^{k} ∥}_{2} + {∥ v^{k} ∥}_{2}} \leq ε$, then stop with $(w^{k}, v^{k})$ being an output point of (6); otherwise, set $k := k + 1$ and return to Step 2.

Theorem 9. Let $\{(w^{k}, v^{k})\}$ be a sequence generated by Algorithm 1. Then, the following holds:

(a): $\{(w^{k}, v^{k})\}$ is bounded.
(b): Any limiting point $(w^{*}, v^{*})$ of $\{(w^{k}, v^{k})\}$ is a partial minimizer of problem (6).
(c): If Slater’s CQ holds on $Ω_{1}$, the limiting point of $\{(w^{k}, v^{k})\}$ is also a strong KKT point of problem (6).

Proof.

(a) It follows from Algorithm 1 that

f (w^{k + 1}, v^{k + 1}) \leq f (w^{k + 1}, v^{k}) \leq f (w^{k}, v^{k}) .

Since $f$ is coercive, its level sets are bounded. Thus, $\{(w^{k}, v^{k})\}$ is bounded.

(b) Clearly, $\{f (w^{k}, v^{k})\}$ is a nonincreasing sequence and $f (w^{k}, v^{k}) \geq 0$, so there exists a value $f^{*}$ such that $\lim_{k \to \infty} f (w^{k}, v^{k}) = f^{*}$. Suppose that $(w^{*}, v^{*})$ is a limit point of $\{(w^{k}, v^{k})\}$. Then, there exists a subsequence $\{k_{j}\}$ such that $\lim_{j \to \infty} k_{j} = \infty$, $\lim_{j \to \infty} (w^{k_{j}}, v^{k_{j}}) = (w^{*}, v^{*})$, and $\lim_{j \to \infty} f (w^{k_{j}}, v^{k_{j}}) = f (w^{*}, v^{*}) = f^{*}$, where the last equality holds since $\lim_{k \to \infty} f (w^{k}, v^{k}) = f^{*}$. Without loss of generality, let $\lim_{k \to \infty} (w^{k}, v^{k}) = (w^{*}, v^{*})$. It follows from Algorithm 1 that

w^{k + 1} \in \arg\min_{w \in Ω_{1}} f (w, v^{k}) .

Since $f$ is continuous with respect to $w$, taking $k \to \infty$ gives

w^{*} \in \arg\min_{w \in Ω_{1}} f (w, v^{*}) .

Similarly,

v^{*} \in \arg\min_{v \in Ω_{2}} f (w^{*}, v) .

Thus, the limit point $(w^{*}, v^{*})$ of $\{(w^{k}, v^{k})\}$ is a partial minimizer of problem (6).

(c) Under Slater’s CQ, a partial minimizer of problem (6) is a strong KKT point of problem (6), and vice versa. Thus, the result holds. □

5.2. The Optimization for the Partial Problem (8)

We now discuss the optimization of the partial problem (8) at the k-th iteration of the ADM. Some non-exact penalty methods and smoothing methods can be used to solve this problem. Here, we obtain $w^{k + 1}$ by solving the following optimization problem:

\begin{matrix} \min_{w \in R^{n}} & f (w, v^{k}) = ∥ ρ 1_{T} - R w ∥_{2} + α ∥ w ∥_{1} + β ∥ w - v^{k} ∥_{1} + γ | g (w)_{-} | \\ s.t. & w \geq 0, \end{matrix}

(18)

where $γ > 0$ is a penalty parameter. Let

ψ_{μ} (t) = \begin{cases} | t |, & | t | \geq μ, \\ \frac{t^{2}}{2 μ} + \frac{μ}{2}, & | t | < μ, \end{cases} \qquad ϕ_{μ} (t) = \frac{1}{2} (t + \sqrt{t^{2} + μ}),

where $μ > 0$ is a smoothing parameter. Then, a smoothed version of problem (18) can be given as follows:

\begin{matrix} \min_{w \in R^{n}} & f_{μ} (w, v^{k}) = \sqrt{∥ ρ 1_{T} - R w ∥_{2}^{2} + μ} + α \sum_{i = 1}^{n} ψ_{μ} (w_{i}) + β \sum_{i = 1}^{n} ψ_{μ} (w_{i} - v_{i}^{k}) + γ ϕ_{μ} (- g (w)) \\ s.t. & w \geq 0 . \end{matrix}

(19)
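The two smoothing functions can be checked numerically. The sketch below implements $ψ_{μ}$ and $ϕ_{μ}$ exactly as defined above and measures how far they are from $| t |$ and $\max (t, 0)$ on a grid; the function names `psi` and `phi`, the grid, and the tested values of $μ$ are illustrative assumptions.

```python
import numpy as np

def psi(t, mu):
    """Huber-type smoothing of |t|: quadratic on (-mu, mu), exact outside."""
    t = np.asarray(t, dtype=float)
    return np.where(np.abs(t) >= mu, np.abs(t), t**2 / (2.0 * mu) + mu / 2.0)

def phi(t, mu):
    """Smooth approximation of max(t, 0)."""
    t = np.asarray(t, dtype=float)
    return 0.5 * (t + np.sqrt(t**2 + mu))

t = np.linspace(-1.0, 1.0, 201)
for mu in (1e-1, 1e-2, 1e-4):
    gap_abs = np.max(np.abs(psi(t, mu) - np.abs(t)))
    gap_max = np.max(np.abs(phi(t, mu) - np.maximum(t, 0.0)))
    print(f"mu={mu:g}: gap to |t| = {gap_abs:.2e}, gap to max(t,0) = {gap_max:.2e}")
```

Both gaps shrink as $μ$ decreases, at the cost of steeper curvature near the kink.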

The projection gradient method (PGM) can be used to solve problem (19), and its iteration formula is

w^{j + 1, k} = P_{R_{+}^{n}} (w^{j, k} - η \nabla f_{μ} (w^{j, k}, v^{k})),

where $η > 0$ denotes the step length at the j-th iteration of the PGM within the k-th iteration of the ADM, and $P_{C} (t)$ denotes the projection of $t$ onto a set $C$. The resulting method is called the penalty projection gradient method (PPGM) and is described in Algorithm 2.

Algorithm 2 PPGM: Penalty Projection Gradient Method.

1: Set the problem parameters: $α, κ, ρ, T, β, γ_{max} > 0$, asset return matrix $R \in R^{T \times n}$, and nominal expected return vector $\bar{μ} \in R^{n}$. Initialize the penalty parameters $γ^{0} > 0$, $τ > 1$. Set the iteration index $j := 0$.
2: Compute $w^{j + 1, k} = P_{R_{+}^{n}} (w^{j, k} - η \nabla f_{μ} (w^{j, k}, v^{k}))$.
3: If $w^{j + 1, k}$ satisfies $\frac{∥ w^{j + 1, k} - w^{j, k} ∥_{2}}{∥ w^{j, k} ∥_{2}} \leq ε$ and $g (w^{j, k}) \geq - ε$, then stop and set $w^{k} = w^{j, k}$.
4: If $\frac{∥ w^{j + 1, k} - w^{j, k} ∥_{2}}{∥ w^{j, k} ∥_{2}} \leq ε$ and $g (w^{j, k}) < - ε$, then choose the new penalty parameter $γ = \min \{τ γ, γ_{max}\}$. Then, set $j := j + 1$ and return to Step 2.
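The core update of Algorithm 2, a gradient descent step followed by projection onto the nonnegative orthant, can be sketched as below. To keep the example short and verifiable, it minimizes a toy least-squares objective rather than $f_{μ}$, uses a fixed step length $1 / L$ with $L$ the largest eigenvalue of $A^{T} A$, and omits the penalty update for $γ$; these choices and the name `projected_gradient` are assumptions for illustration.

```python
import numpy as np

def projected_gradient(grad, w0, eta, eps=1e-10, max_iter=10_000):
    """Minimize a smooth function over w >= 0 by projected gradient descent."""
    w = np.maximum(np.asarray(w0, dtype=float), 0.0)
    for _ in range(max_iter):
        # Descent step along the negative gradient, then projection onto w >= 0.
        w_new = np.maximum(w - eta * grad(w), 0.0)
        if np.linalg.norm(w_new - w) <= eps * max(np.linalg.norm(w), 1e-12):
            return w_new
        w = w_new
    return w

# Toy problem (hypothetical): minimize 0.5*||A w - b||^2 subject to w >= 0.
A = np.array([[2.0, 0.0], [0.0, 1.0]])
b = np.array([2.0, -2.0])
grad = lambda w: A.T @ (A @ w - b)
eta = 1.0 / np.linalg.norm(A, 2) ** 2   # 1/L, L = squared spectral norm of A
w_star = projected_gradient(grad, np.ones(2), eta)
print(w_star)
```

The second coordinate is driven to the boundary because its unconstrained minimizer is negative, which is the situation the projection is meant to handle.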

5.3. Penalty Alternating Direction Method

At the end of this section, we describe the PADM for the general problem (5). At iteration $l$, we set the value of the penalty parameter $β^{l}$ and obtain $(w^{l}, v^{l})$ by the ADM with $β^{l}$. If the inequality $∥ w^{l} - v^{l} ∥_{1} \leq tol$ holds, where $tol$ is a small positive constant, we stop with a feasible solution of problem (6). Otherwise, the penalty parameter $β^{l}$ is updated to $β^{l + 1}$. In this way, the PADM generates a sequence of partial minimizers of problem (6), one for each $β^{l}$. The framework of the PADM is formally stated in Algorithm 3.

Algorithm 3 PADM: Penalty Alternating Direction Method.

1: Set the problem parameters: $α, κ, ρ, T, β_{max} > 0$, asset return matrix $R \in R^{T \times n}$, and nominal expected return vector $\bar{μ} \in R^{n}$. Initialize the penalty parameters $β^{0} > 0$, $τ > 1$. Set the iteration index $l := 0$.
2: Obtain $(w^{l}, v^{l})$ by the ADM with $β^{l}$.
3: If $(w^{l}, v^{l})$ satisfies $∥ w^{l} - v^{l} ∥_{1} \leq tol$, then stop with $(w^{l}, v^{l})$. Otherwise, choose the new penalty parameter $β^{l + 1} = \min \{τ β^{l}, β_{max}\}$, set $l := l + 1$, and return to Step 2.
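The outer loop of Algorithm 3 can be illustrated on a one-dimensional toy problem in which both partial minimizations have closed forms. Everything in this sketch is hypothetical: the objective $(w - c)^{2} + β | w - v |$, the finite candidate set for $v$ (standing in for the nonconvex set $Ω_{2}$), and the extra safeguard that stops once $β$ reaches $β_{max}$.

```python
def soft_threshold(x, thr):
    """Shrink x toward zero by thr."""
    return max(abs(x) - thr, 0.0) * (1.0 if x >= 0 else -1.0)

def adm(beta, w, v, candidates, target, eps=1e-12, max_iter=100):
    """Inner ADM for the toy objective (w - target)^2 + beta * |w - v|."""
    for _ in range(max_iter):
        w_new = v + soft_threshold(target - v, beta / 2.0)      # exact w-update
        v_new = min(candidates, key=lambda c: abs(w_new - c))   # exact v-update
        done = abs(w_new - w) + abs(v_new - v) <= eps
        w, v = w_new, v_new
        if done:
            break
    return w, v

def padm(beta0, tau, beta_max, tol, w0, v0, candidates, target):
    """Outer penalty loop: increase beta until w and v agree up to tol."""
    beta, w, v = beta0, w0, v0
    history = []
    while True:
        w, v = adm(beta, w, v, candidates, target)
        history.append((beta, w, v))
        # Stop when the coupling constraint holds (or, as a safeguard, at beta_max).
        if abs(w - v) <= tol or beta >= beta_max:
            return w, v, history
        beta = min(tau * beta, beta_max)

w, v, history = padm(beta0=1.0, tau=2.0, beta_max=100.0, tol=1e-8,
                     w0=0.0, v0=0.0, candidates=(0.0, 1.0), target=2.0)
for beta, w_l, v_l in history:
    print(f"beta={beta:g}: w={w_l:g}, v={v_l:g}, |w - v|={abs(w_l - v_l):g}")
```

With the smaller penalty the two blocks disagree; after one increase of $β$ they coincide and the loop stops, which mirrors the stopping test $∥ w^{l} - v^{l} ∥_{1} \leq tol$.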

6. Numerical Results

This section reports extensive numerical experiments. In Section 6.1, we first present seven real data sets, explain the existing models used for comparison, and describe the performance measures. In Section 6.2, we demonstrate that our methods lead to robust and sparse portfolios. In Section 6.3, we compare our portfolios with nine popular portfolios in terms of out-of-sample (OOS) performance measures. Finally, in Section 6.4, we show the cumulative return of different portfolio strategies. All computations were conducted in Matlab R2019a on a PC with an Intel(R) Core(TM) i5-7200U CPU (2.50 GHz, 4 CPUs) and 4 GB of RAM.

6.1. Models of Comparison, Data, and Performance Measures

(a) Eleven portfolio models compared. We compare the OOS performance of 11 portfolio models across six real data sets of weekly and monthly returns. Those models are well studied, and we divide them into four groups, which are summarized in Table 1. The first group is the robust and sparse portfolio strategies developed in this paper. The second group includes some well-studied portfolio strategies. The third group includes three benchmark portfolio strategies. The last group consists of two portfolios that use the shrinkage technique to estimate the covariance matrix.

(b) Seven data sets tested. Table 2 lists some real-world data sets: DJIA [42], NASDAQ [43], S&P [44,45], Russell2000 [46], Russell3000 [47], and FF100 [38]. All the data are obtained from Yahoo finance (https://finance.yahoo.com/, accessed on 10 January 2023) and Ken French’s website (https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html, accessed on 10 January 2023). In all cases, we remove those assets that have missing values.

(c) Measuring the OOS performance and its setup. We largely follow the “rolling-window” procedures in [2,37] to conduct our comparison. Let $T$ be the length of a data set and $τ$ be the window length (e.g., $τ = 120$) used to construct the optimal portfolio by a model. In each period $(t + 1)$, $t = τ, \dots, T - 1$, we compute the different portfolios over the previous $τ$ periods. We then compute the OOS return in the $(t + 1)$-th period based on the obtained portfolio. We repeat this procedure until we reach the end of the data set. In this way, we obtain a series of $(T - τ)$ portfolio vectors for each model listed in Table 1. To make this precise, let $w_{t}^{s}$ be the optimal portfolio obtained by the portfolio strategy $s$ over the periods $t - τ + 1, \dots, t$. The OOS return in the $(t + 1)$-th period is computed as $r_{t + 1}^{s} = {w_{t}^{s}}^{⊺} r_{t + 1}$, where $r_{t + 1}$ is the vector of asset returns in the $(t + 1)$-th period. Thus, we obtain a time series of $(T - τ)$ OOS returns for each strategy, one per portfolio vector. Note that we use the traditional “rolling-window” procedures for the numerical analysis, and some new methods could provide new ideas for the analysis of portfolio selection problems; see [48].
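The rolling-window procedure can be sketched as follows. This is a minimal illustration under stated assumptions: the returns are synthetic, the strategy is the equally weighted portfolio (any function mapping a window of returns to a weight vector could be substituted), and the names `rolling_oos_returns` and `equal_weight` are hypothetical.

```python
import numpy as np

def rolling_oos_returns(returns, window, strategy):
    """returns: (T, n) array of asset returns; strategy: (window, n) sample -> n weights."""
    T = returns.shape[0]
    oos = []
    for t in range(window, T):              # 0-based row t holds period t + 1
        sample = returns[t - window:t]      # the previous `window` periods
        w = strategy(sample)                # portfolio built from in-sample data only
        oos.append(float(w @ returns[t]))   # realized return in the next period
    return np.array(oos)

def equal_weight(sample):
    """1/N portfolio; ignores the sample and serves only as a placeholder strategy."""
    n = sample.shape[1]
    return np.full(n, 1.0 / n)

rng = np.random.default_rng(0)
R = rng.normal(0.001, 0.02, size=(150, 5))  # synthetic returns, not real data
oos = rolling_oos_returns(R, window=120, strategy=equal_weight)
print(oos.shape)
```

Each portfolio is estimated strictly from data preceding the period in which it is evaluated, which is what makes the resulting returns out-of-sample.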

The OOS performance of each portfolio strategy is assessed by using four quantities: (i) the OOS portfolio variance ($\hat{σ}^{2}$), (ii) the OOS portfolio Sharpe ratio ($\widehat{SR}$), (iii) the portfolio turnover (TURN), and (iv) the average short positions (ASP). The specific definitions can be found in DeMiguel et al. [6], Yen and Yen [38], and Zhao et al. [37]. We also evaluate the cumulative return (CR). The CR of a portfolio scores the total payoffs that are yielded by the investment strategy across the investment periods without considering any risk or cost; see Shen et al. [49]. Finally, we consider some quantities studied in [38] on the profiles of the portfolio weights: PAP represents the proportion of active positions and PZP is the proportion of zero positions, defined respectively as $PAP_{t} = \frac{| S_{t}^{1} |}{N}$ and $PZP_{t} = \frac{| S_{t}^{0} |}{N}$, where $S_{t}^{1} = \{ i : w_{i, t} \neq 0 \}$, $S_{t}^{0} = \{ i : w_{i, t} = 0 \}$, and $N$ is the number of assets.
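A minimal sketch of some of these measures is given below. PAP and PZP follow the definitions stated above. The variance and Sharpe ratio use the usual sample mean and unbiased sample variance of the OOS returns, which is an assumed convention here, since the exact definitions are deferred to the cited references. The function names and the tolerance parameter `atol` are hypothetical.

```python
import numpy as np

def pap_pzp(w, atol=0.0):
    """Proportion of active and of zero positions in a weight vector."""
    w = np.asarray(w, dtype=float)
    active = np.abs(w) > atol    # atol > 0 treats tiny numerical weights as zero
    pap = active.sum() / w.size
    return float(pap), float(1.0 - pap)

def oos_variance_and_sharpe(oos_returns):
    """Sample variance (ddof=1, assumed) and mean-to-standard-deviation ratio."""
    r = np.asarray(oos_returns, dtype=float)
    var = r.var(ddof=1)
    return float(var), float(r.mean() / np.sqrt(var))

pap, pzp = pap_pzp([0.5, 0.0, 0.5, 0.0])
var, sharpe = oos_variance_and_sharpe([0.01, 0.03, 0.02])
print(pap, pzp, var, sharpe)
```

In practice a small positive `atol` may be preferable, because numerically computed weights are rarely exactly zero.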

6.2. Robust and Sparse Portfolio

This section shows the weights of the robust and sparse portfolios. We use the DJIA data set and the sparsity levels $s_{1} = ⌈ 30\% n ⌉$ and $s_{2} = ⌈ 50\% n ⌉$. The parameters are set to $α = β = 10 λ$, and the value of $λ$ varies from $10^{- 2}$ to $10^{1}$.

Figure 1 shows the portfolio weights, PAP, and PZP. The two plots in the top panel correspond to a robust portfolio under the quadratic uncertainty set with sparsity $s_{1}$. The two plots in the bottom panel correspond to a robust portfolio under the absolute uncertainty set with sparsity $s_{2}$. As the penalty parameter $λ$ increases, the portfolio weights tend to become sparse. The PAP and PZP indicate that we can obtain sparse portfolios that satisfy the specified sparsity.

Figure 2 shows the sparse portfolios on four different data sets. The sparsity level on DJIA is $s_{1} = ⌈ 30\% n ⌉$, on NASDAQ and FF100 it is $s_{2} = ⌈ 10\% n ⌉$, and on Russell2000 it is $s_{3} = ⌈ 1\% n ⌉$. We solve the robust portfolio problem under the quadratic uncertainty set to produce these results. In each case, we obtain a portfolio with the specified sparsity, together with the distribution of the asset weight values.

6.3. Out-of-Sample Performance

The Sharpe ratio accounts for return and risk at the same time, which makes it a comprehensive measure of portfolio performance. Thus, we first test the Sharpe ratio of different portfolio strategies. We use the SP100 data set. The parameters are set to $α = 2 β$, and the value of $β$ varies from $10$ to $10^{1.5}$. The sparsity levels are $s_{1} = ⌈ 15\% n ⌉$ and $s_{2} = ⌈ 5\% n ⌉$.

Figure 3 compares RSQ and RSA with two benchmark portfolios and shows that both can produce a higher Sharpe ratio when a suitable penalty parameter is chosen.

Table 3 reports the OOS performance by using the four quantities defined in Section 6.1. We set $α = β = 10$ and the sparsity level $s_{1} = ⌈ 30\% n ⌉$ (on the DJIA, NASDAQ, SP500, and FF100 data sets) and $s_{2} = ⌈ 30\% n ⌉$ (on the Russell2000 and Russell3000 data sets). We can observe that the RSA and RSQ portfolios achieve the smallest average variances among all portfolio strategies, namely $10.84 (\%)^{2}$ and $11.09 (\%)^{2}$, respectively. This means they are less volatile, i.e., less risky. SU, SC1F, and SCID have the highest variances on average, $995.73 (\%)^{2}$, $442.81 (\%)^{2}$, and $404.20 (\%)^{2}$, respectively, in this setting. The average variances of the remaining portfolio strategies are $11.94 (\%)^{2}$ (L1), $11.98 (\%)^{2}$ (EN), $16 (\%)^{2}$ (L12), $27.78 (\%)^{2}$ (SC), $28.71 (\%)^{2}$ (EW), and $71.11 (\%)^{2}$ (Box). In addition, we observe that the average Sharpe ratios of the various portfolios are 12.34% (SC), 11.82% (EW), 11.77% (RSA), 11.70% (RSQ), 11.61% (L12), 11.21% (EN), 11.18% (L1), 10.58% (SCID), 10.09% (SC1F), 9.23% (Box), and 7.92% (SU). We see that the RSA and RSQ portfolios do not result in a significantly different OOS Sharpe ratio when compared with SC and EW; however, their average Sharpe ratios are higher than those of the remaining portfolio strategies.
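The average variances quoted above are arithmetic means over the six data-set columns of Table 3. The short check below recomputes a few of them from the tabulated values; only four strategies are included to keep the example brief, and the dictionary layout is an illustrative choice.

```python
# OOS variances copied from Table 3, in the column order
# DJIA, NASDAQ, SP500, Russell2000, Russell3000, FF100.
variance = {
    "RSA": [5.3741, 5.9065, 11.5604, 12.1813, 12.7837, 17.2215],
    "RSQ": [5.3949, 5.8633, 12.5936, 12.7041, 12.7702, 17.2097],
    "L1": [8.0373, 6.7078, 13.1228, 14.2679, 13.0166, 16.4937],
    "SU": [9.9181, 15.0721, 5.91e3, 12.1721, 9.9999, 17.2369],
}

# Arithmetic mean across the six data sets for each strategy.
average = {name: sum(vals) / len(vals) for name, vals in variance.items()}

for name, avg in sorted(average.items(), key=lambda kv: kv[1]):
    print(f"{name}: {avg:.2f}")
```

The SU average is dominated by its single SP500 entry, which is worth keeping in mind when comparing averages across strategies.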

As for the portfolio turnover, unsurprisingly, the EW portfolio strategy exhibits the lowest turnover of all portfolio strategies, amounting to 3.20%. The RSA and RSQ portfolio strategies have moderate levels of turnover on average, 13.49% and 13.20%. The highest average turnover is generated by the SU portfolio, followed by the Box portfolio, amounting on average to 225.81% and 193.10%, meaning that they are very costly. The average turnovers of the remaining portfolio strategies are 11.85% (L12), 25.76% (L2), 16.66% (L1), and 13.98% (EN). The high turnover of SU and Box is reflected in their enormous average short positions of 283.52% and 333.52% on average across the six data sets. The next two highest average short positions are those of SCID and SC1F, amounting to 164.29% and 132.81%, respectively. The average short positions of the SC and EW portfolios are approximately 0% across the six data sets. The average short positions of the RSQ and RSA portfolio strategies also tend to zero. Therefore, considering the moderate turnover and the average short positions, the proposed RSQ and RSA strategies represent a practically implementable method that outperforms the portfolio strategies listed in Table 1.

6.4. Cumulative Return

In this subsection, we show the CR of several portfolio strategies. We use the FF100 data set, the sparsity level $s = ⌈ 10\% n ⌉$, and the parameters $α = β = 10$. According to the OOS performance, we choose RSQ, RSA, L12, L1, EN, EW, and SC to compare the CR.

Figure 4 shows the curves of the CR over the corresponding investment periods for the different portfolio strategies. RSQ and RSA outperform the others with visible margins, whereas RSA and RSQ do not differ significantly from each other. This result suggests that, compared with the other portfolios, the sparse portfolios RSA and RSQ grow more steadily, with reduced volatility across most of the investment periods.
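For reference, a CR curve of the kind plotted in Figure 4 can be computed from a series of OOS returns as sketched below. The compounding rule (a running product of one plus the period return, with returns expressed as decimal fractions) is an assumption, since the precise definition is deferred to Shen et al. [49]; the function name and the sample returns are hypothetical.

```python
import numpy as np

def cumulative_return(oos_returns):
    """Running product of (1 + r_t): wealth from one unit invested, ignoring costs."""
    r = np.asarray(oos_returns, dtype=float)
    return np.cumprod(1.0 + r)

cr = cumulative_return([0.10, -0.05, 0.02])   # hypothetical period returns
print(cr)
```

A curve that rises smoothly, rather than in a few large jumps, corresponds to the steadier growth described above.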

7. Conclusions

Portfolio selection has been a fertile area for robust optimization techniques. We proposed a robust and sparse portfolio selection optimization model by considering the perturbation in the asset return matrix and the parameter uncertainty in the expected asset return. We used the equivalence of robustness and regularization to deal with the perturbation in the asset return matrix. To deal with the uncertainty in the expected asset return, we considered two kinds of uncertainty sets and solved the worst-case scenario. We defined three types of stationary points of the penalty problem and then analyzed the relationship between these stationary points and local/global minimizers. Then, we designed the penalty alternating direction method to solve each problem. Although there is no theoretical guarantee for the equivalence between problems (5) and (6), or between problems (8) and (18), we confirmed the effectiveness of our approach by comparing it with nine widely studied portfolio models on seven real-world data sets. Extensive numerical experiments confirm that the proposed portfolios have lower volatility, that is, less risk. Moreover, our portfolio strategies can yield higher Sharpe ratios when appropriate parameters are selected.

We note that robust optimization (RO) mainly considers the uncertainty sets of the parameters and thus does not use any distributional information about the data. This characteristic makes RO attractive, but at the same time, the method loses a comprehensive characterization of the data. Recently, distributionally robust optimization (DRO) has attracted widespread attention and research. Although DRO takes into account the distributional information of the data, the price is that the resulting problems are difficult to solve. We will consider how to apply DRO to sparse portfolio problems, accounting for the distributional information of financial data while improving the sparsity. The most direct extension is distributionally robust portfolio optimization with the $ℓ_{0}$ norm constraint, which is a worthwhile and challenging issue.

Author Contributions

Conceptualization, H.Z.; methodology, H.Z. and Y.J.; software, H.Z., Y.J., and Y.Y.; formal analysis, H.Z. and Y.J.; writing—original draft preparation, H.Z.; writing—review and editing, H.Z., Y.J., and Y.Y.; visualization, H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data source has been presented in Section 6 of the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Markowitz, H. Portfolio selection. J. Financ. 1952, 7, 142–149.
2. Brodie, J.; Daubechies, I.; De Mol, C.; Giannone, D.; Loris, I. Sparse and stable markowitz portfolios. Proc. Natl. Acad. Sci. USA 2009, 106, 12267–12272.
3. Ledoit, O.; Wolf, M. Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. J. Empir. Financ. 2003, 10, 603–621.
4. Ledoit, O.; Wolf, M. Nonlinear shrinkage of the covariance matrix for portfolio selection: Markowitz meets goldilocks. Rev. Financ. Stud. 2017, 30, 4349–4388.
5. Jagannathan, R.; Ma, T. Risk reduction in large portfolios: Why imposing the wrong constraints helps. J. Financ. 2003, 58, 1651–1683.
6. DeMiguel, V.; Garlappi, L.; Nogales, F.J.; Uppal, R. A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Manag. Sci. 2009, 55, 798–812.
7. Ben-Tal, A.; Nemirovski, A.; Roos, C. Robust solutions of uncertain quadratic and conic-quadratic problems. SIAM J. Optim. 2002, 13, 535–560.
8. El Ghaoui, L.; Oustry, F.; Lebret, H. Robust solutions to uncertain semidefinite programs. SIAM J. Optim. 1998, 9, 33–52.
9. Lee, Y.; Kim, M.J.; Kim, J.H.; Jang, J.R.; Chang, K.W. Sparse and robust portfolio selection via semi-definite relaxation. J. Oper. Res. 2020, 71, 687–699.
10. Swain, P.; Ojha, A.K. Robust approach for uncertain portfolio allocation problems under box uncertainty. In Recent Trends in Applied Mathematics: Select Proceedings of AMSE; Springer: Singapore, 2019; pp. 347–356.
11. Goldfarb, D.; Iyengar, G. Robust portfolio selection problems. Math. Oper. Res. 2003, 28, 1–38.
12. Lobo, M.S.; Boyd, S. The Worst-Case Risk of a Portfolio. Unpublished Manuscript. Available online: http://faculty.fuqua.duke.edu/(2000)%7Emlobo/bio/researchfiles/rsk-bnd.pdf (accessed on 13 July 2022).
13. Min, L.; Dong, J.; Liu, J.; Gong, X. Robust mean-risk portfolio optimization using machine learning-based trade-off parameter. Appl. Soft Comput. 2021, 113, 107948.
14. Won, J.H.; Kim, S.J. Robust trade-off portfolio selection. Optim. Eng. 2020, 21, 867–904.
15. Fabozzi, F.J.; Huang, D.; Zhou, G. Robust portfolios: Contributions from operations research and finance. Ann. Oper. Res. 2010, 176, 191–220.
16. Scutella, M.G.; Recchia, R. Robust portfolio asset allocation and risk measures. Ann. Oper. Res. 2013, 204, 145–169.
17. Xidonas, P.; Steuer, R.; Hassapis, C. Robust portfolio optimization: A categorized bibliographic review. Ann. Oper. Res. 2020, 292, 533–552.
18. Ghahtarani, A.; Saif, A.; Ghasemi, A. Robust portfolio selection problems: A comprehensive review. Oper. Res. 2022, 22, 3203–3264.
19. Leyffer, S.; Menickelly, M.; Munson, T.; Vanaret, C.; Wild, S.M. A survey of nonlinear robust optimization. INFOR: Inf. Syst. Oper. Res. 2020, 58, 342–373.
20. Zhao, Z.; Xu, F.; Du, D.; Meihua, W. Robust portfolio rebalancing with cardinality and diversification constraints. Quant. Financ. 2021, 21, 1707–1721.
21. Tütüncü, R.H.; Koenig, M. Robust asset allocation. Ann. Oper. Res. 2004, 132, 157–187.
22. Khodamoradi, T.; Salahi, M.; Najafi, A.R. Robust CCMV model with short selling and risk-neutral interest rate. Phys. A Stat. Mech. Its Appl. 2020, 547, 124429.
23. Fabozzi, F.J.; Kolm, P.N.; Pachamanova, D.A.; Focardi, S.M. Robust portfolio optimization. J. Portf. Manag. 2007, 33, 40.
24. Pınar, M.Ç. On robust mean-variance portfolios. Optimization 2016, 65, 1039–1048.
25. Busse, J.A.; Chordia, T.; Jiang, L.; Tang, Y. Transaction costs, portfolio characteristics, and mutual fund performance. Manag. Sci. 2021, 67, 1227–1248.
26. Hautsch, N.; Voigt, S. Large-scale portfolio allocation under transaction costs and model uncertainty. J. Econom. 2019, 212, 221–240.
27. Yu, J.R.; Chiou, W.J.P.; Lee, W.Y.; Lin, S.J. Portfolio models with return forecasting and transaction costs. Int. Rev. Econ. Financ. 2020, 66, 118–130.
28. Mordukhovich, B.S.; Nguyen, M.M. An Easy Path to Convex Analysis and Applications; Morgan and Claypool Publishers Series: San Rafael, CA, USA, 2014.
29. Pan, L.; Xiu, N.; Fan, J. Optimality conditions for sparsity nonlinear programming. Sci. China Math. 2017, 60, 759–776.
30. Rockafellar, R.T.; Wets, R.J. Variational Analysis; Springer: Berlin/Heidelberg, Germany, 1998.
31. Bertsimas, D.; Dunn, J. Machine Learning under a Modern Optimization Lens; Dynamic Ideas LLC: Charlestown, MA, USA, 2019.
32. Beck, A. First-Order Methods in Optimization; Society for Industrial and Applied Mathematics and Mathematical Optimization Society: Philadelphia, PA, USA, 2017.
33. Costa, C.M.; Kreber, D.; Schmidt, M. An alternating method for cardinality-constrained optimization: A computational study for the best subset selection and sparse portfolio problems. Informs J. Comput. 2022, 34, 2968–2988.
34. Heckel, T.; de Carvalho, R.; Lu, X.; Perchet, R. Insights into robust optimization: Decomposing into mean-variance and risk-based portfolios. J. Invest. Strateg. 2016, 6, 1–24.
35. Yin, C.; Perchet, R.; Soupé, F. A practical guide to robust portfolio optimization. Quant. Financ. 2021, 21, 911–928.
36. Geissler, B.; Morsi, A.; Schewe, L.; Schmidt, M. Penalty alternating direction methods for mixed-integer optimization: A new view on feasibility pumps. SIAM J. Optim. 2017, 27, 1611–1636.
37. Zhao, H.; Kong, L.; Qi, H.D. Optimal portfolio selections via ℓ₁₂-norm regularization. Comput. Optim. Appl. 2021, 80, 853–881.
38. Yen, Y.M.; Yen, T.J. Solving norm constrained portfolio optimization via coordinate-wise descent algorithms. Comput. Stat. Data Anal. 2014, 76, 737–759.
39. Behr, P.; Guettler, A.; Miebs, F. On portfolio optimization: Imposing the right constraints. J. Bank. Financ. 2013, 37, 1232–1242.
40. DeMiguel, V.; Garlappi, L.; Uppal, R. Optimal versus naive diversification: How in-efficient is the 1/n portfolio strategy? Rev. Financ. Stud. 2009, 22, 1915–1953.
41. Ledoit, O.; Wolf, M. A well-conditioned estimator for large-dimensional covariance matrices. J. Multivar. Anal. 2004, 88, 365–411.
42. Lai, Z.R.; Yang, P.Y.; Fang, L.; Wu, X. Short-term sparse portfolio optimization based on alternating direction method of multipliers. J. Mach. Learn. Res. 2018, 19, 2547–2574.
43. Chou, R.K.; Chung, H. Decimalization, trading costs, and information transmission between etfs and index futures. J. Futur. Mark. 2006, 26, 131–151.
44. Kan, R.; Wang, X.; Zhou, G. Optimal portfolio choice with estimation risk: No risk-free asset case. Manag. Sci. 2022, 68, 2047–2068.
45. Mutunge, P.; Haugland, D. Minimizing the tracking error of cardinality constrained portfolios. Comput. Oper. Res. 2018, 90, 33–41.
46. Fan, J.; Zhang, J.; Yu, K. Vast portfolio selection with gross-exposure constraints. J. Am. Stat. Assoc. 2012, 107, 592–606.
47. Teng, Y.; Yang, L.; Yu, B.; Song, X. A penalty PALM method for sparse portfolio selection problems. Optim. Methods Softw. 2017, 32, 126–147.
48. Wang, Y.; Gao, S.; Yu, Y.; Cai, Z.; Wang, Z. A gravitational search algorithm with hierarchy and distributed framework. Knowl.-Based Syst. 2021, 218, 106877.
49. Shen, W.; Wang, J.; Ma, S. Doubly regularized portfolio with risk minimization. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014.

Figure 1. Portfolio weights.

Figure 2. Sparse solutions.

Figure 3. The Sharpe ratio.

Figure 4. The cumulative return.

Table 1. List of portfolio strategies considered.

Group	Model	Abbr.	Refer.
(1)	Robust and sparse portfolios with
	quadratic uncertainty set	RSQ	this paper
	absolute uncertainty set	RSA	this paper
(2)	Some well-studied portfolio strategies with
	$ℓ_{1}$ regularization	L1	Brodie et al. [2]
	$ℓ_{1, 2}$ regularization	L12	Zhao et al. [37]
	Elastic Net regularization	EN	Yen and Yen [38]
	upper and lower bound	Box	Behr et al. [39]
(3)	Benchmark portfolio strategies with
	short-sales constrained	SC	Jagannathan and Ma [5]
	short-sales unconstrained	SU	Jagannathan and Ma [5]
	equally weighted (1/N) portfolio	EW	DeMiguel et al. [40]
(4)	Shrinkage of covariance
	sample covariance and identity matrix	SCID	Ledoit and Wolf [41]
	sample covariance and 1-factor matrix	SC1F	Ledoit and Wolf [3]

Table 2. Information of the seven real data sets.

#	Data Sets	Stocks	Time Period	Source	Frequency
1	DJIA	29	01/10/2017–30/10/2022	Yahoo finance	Weekly
2	NASDAQ	95	01/10/2017–30/10/2022	Yahoo finance	Weekly
3	SP500	336	01/10/2017–30/10/2022	Yahoo finance	Weekly
4	Russell2000	1340	01/10/2017–30/10/2022	Yahoo finance	Weekly
5	Russell3000	2166	01/10/2017–30/10/2022	Yahoo finance	Weekly
6	SP100	71	01/10/2017–30/10/2022	Yahoo finance	Weekly
7	FF100	100	11/1999–06/2022	K.French	Monthly

Table 3. Portfolio out-of-sample variance ($\hat{σ}^{2}$, in $(\%)^{2}$), Sharpe ratio ($\widehat{SR}$), turnover (TURN), and the average short positions (ASP).

Model	Measure	DJIA	NASDAQ	SP500	Russell2000	Russell3000	FF100
		n = 29	n = 95	n = 336	n = 1340	n = 2166	n = 100
RSA	var	5.3741	5.9065	11.5604	12.1813	12.7837	17.2215
	SR	0.0703	0.1807	0.1008	0.0063	0.0267	0.3215
	TURN	0.1194	0.1721	0.1430	0.1033	0.1021	0.1698
	ASP	−1.11e-18	2.22e-18	0	−3.08e-18	−4.01e-18	−6.28e-18
RSQ	var	5.3949	5.8633	12.5936	12.7041	12.7702	17.2097
	SR	0.0737	0.1889	0.0863	0.0059	0.0258	0.3216
	TURN	0.1063	0.1711	0.1334	0.1112	0.1003	0.1698
	ASP	−6.66e-18	−3.33e-18	−1.43e-17	−9.25e-18	−9.25e-18	0
L12	var	9.5931	7.1442	32.1907	14.8927	13.7879	18.3918
	SR	0.0860	0.1768	0.1000	0.0072	0.0295	0.2971
	TURN	0.0220	0.0286	0.0369	0.0675	0.0605	0.0296
	ASP	−0.0216	−0.0209	−0.0240	−0.0057	−0.0050	−0.0276
L1	var	8.0373	6.7078	13.1228	14.2679	13.0166	16.4937
	SR	0.0864	0.1831	0.0414	0.0042	0.0279	0.3279
	TURN	0.1063	0.0677	0.0369	0.1250	0.1136	0.0629
	ASP	−0.0036	−0.0012	−0.0124	−0.0039	0.0020	−0.0195
EN	var	8.5871	6.5362	12.9449	14.2074	13.0817	16.5823
	SR	0.0911	0.1790	0.0418	0.0029	0.0402	0.3181
	TURN	0.0256	0.0472	0.0408	0.1208	0.1181	0.0510
	ASP	−0.0257	−0.0198	−0.0272	0.0015	0.0011	−0.0257
Box	var	9.9181	8.7749	3.59e+02	7.3636	7.0004	34.5878
	SR	0.0250	0.0768	−0.1204	−0.0495	−0.0023	0.6244
	TURN	0.8322	1.7146	3.2709	0.2820	0.2973	5.1895
	ASP	1.1172	2.7850	6.5137	0.5744	0.5886	8.4327
SC	var	10.0132	8.0089	86.1291	24.3110	16.9741	21.2942
	SR	0.0871	0.1891	0.1202	0.0233	0.0399	0.2812
	TURN	0.0388	0.0313	0.0411	0.0512	0.0481	0.0247
	ASP	1.38e-16	1.23e-16	−1.52e-16	3.12e-16	−4.19e-16	0
SU	var	9.9181	15.0721	5.91e+03	12.1721	9.9999	17.2369
	SR	0.0250	0.1800	−0.1270	−0.0482	−0.0167	0.4624
	TURN	0.8322	2.7947	3.9150	0.3919	0.3722	5.2429
	ASP	1.1172	2.5580	5.8870	0.5542	0.5494	6.3456
EW	var	11.1339	8.1780	89.8252	21.3370	19.5742	22.2346
	SR	0.0701	0.1790	0.1151	0.0194	0.0337	0.2922
	TURN	0.0208	0.0253	0.0400	0.0429	0.0381	0.0252
	ASP	1.13e-16	1.13e-16	−1.12e-16	4.52e-16	−3.39e-16	0
SCID	var	6.9600	6.9782	2.38e+03	7.3676	7.0304	16.8727
	SR	0.0294	0.1097	−0.1259	−0.0497	−0.0025	0.6740
	TURN	0.4115	0.8589	0.9940	0.2803	0.2884	1.5393
	ASP	0.6362	1.6735	2.8275	0.5727	0.5871	3.5605
SC1F	var	6.2670	6.5347	2.6140e+03	7.3790	7.0367	15.6203
	SR	0.0380	0.0902	−0.1253	−0.0510	−0.0034	0.6570
	TURN	0.3032	1.1648	0.8972	0.2964	0.2962	1.5858
	ASP	0.4690	1.3556	2.1771	0.5722	0.5863	2.8086

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

