Sharpe-Ratio Portfolio in Controllable Markov Chains: Analytic and Algorithmic Approach for Second Order Cone Programming

Lesly Lisset Ortiz-Cerezo; Alin Andrei Carsteanu; Julio Bernardo Clempner

doi:10.3390/math10183221

Abstract

The Sharpe ratio is a measure based on mean-variance theory; it measures the performance of a portfolio when the risk can be measured through the standard deviation. This paper suggests a Sharpe-ratio portfolio solution using second-order cone programming (SOCP). We use the penalty-regularized method to represent the nonlinear portfolio problem. We present a computationally tractable way to determine the Sharpe-ratio portfolio. A Markov chain structure is employed to represent the underlying asset price process. In order to determine the optimal portfolio in Markov chains, a new hybrid optimization programming method for SOCP is proposed. The suggested method’s efficiency and efficacy are demonstrated using a numerical example.

Keywords:

portfolio; Sharpe ratio; Markowitz; fractional programming; Markov chains; optimization

MSC:

91G10; 60J20; 60J22

1. Introduction

1.1. Brief Review

Markowitz [1] proposed the mean-variance portfolio (MVP) which is a fundamental contribution in the field of finance; currently, models based on it continue to be developed. In its most basic version, it assumes n risky assets. Their returns over the period are modeled as a random vector

Z = {(Z_{i})}_{i = 1, \dots, n} \in R^{n}

such that

μ = E [Z]

represents the mean and

Σ = E [(Z - μ) {(Z - μ)}^{⊺}]

is the covariance, where

E

denotes the expectation operator. The mathematical formulation’s decision variable is

ω = {(ω_{i})}_{i = 1, \dots, n}

, which indicates the percentage of the available budget invested in asset i. If

ω_{i} \geq 0

, it means that short selling is not allowed.

The return of a portfolio

ω = {(ω_{i})}_{i = 1, \dots, n} \in R^{n}

is a scalar (random variable) given by

ω^{T} Z = \sum_{i = 1}^{n} ω_{i} Z_{i}

, then the mean return of

ω

is

ω^{T} μ = ω^{T} E [Z]

, and the risk, measured by the variance, is given by

ω^{T} Σ ω

. We suppose that an admissible portfolio

ω = {(ω_{i})}_{i = 1, \dots, n} \in R^{n}

is restricted to being contained within a closed convex set

W_{a d m} \subseteq R^{n}

.
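As a minimal numerical sketch of the quantities just defined, the following Python snippet evaluates the mean return and the variance of a portfolio. The three-asset values of `mu`, `Sigma`, and `w` are illustrative assumptions and are not data from this paper.

```python
import numpy as np

# Hypothetical inputs (assumptions for illustration only).
mu = np.array([0.08, 0.12, 0.05])            # expected returns, mu = E[Z]
Sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.004],
                  [0.002, 0.004, 0.010]])    # covariance matrix of Z
w = np.array([0.5, 0.3, 0.2])                # weights with w^T e = 1 and w_i >= 0


def portfolio_mean(w, mu):
    """Mean return w^T mu."""
    return float(w @ mu)


def portfolio_variance(w, Sigma):
    """Risk measured by the variance w^T Sigma w."""
    return float(w @ Sigma @ w)


mean_return = portfolio_mean(w, mu)
variance = portfolio_variance(w, Sigma)
```

The weights here satisfy the budget and no-short-selling conditions, so `w` is an admissible portfolio in the sense used in this section.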

The selection of a portfolio is a risk–return trade-off. The minimum-variance MVP problem defines the optimal trade-off as

\begin{matrix} ω^{T} Σ ω \to \underset{ω \in W_{adm}}{\mathrm{minimize}} \\ \mathrm{s.t.} \\ ω^{T} μ \geq r \end{matrix}

where r is the minimum required expected rate of return,

W_{adm} = \{ω \mid ω^{T} e = 1, ω_{i} \geq 0, i = 1, \dots, n\}

and e is a vector of size n whose components are ones. In this problem, we determine the portfolio that minimizes the risk while still meeting the asset allocation and portfolio budget constraints.

The objective of the Markowitz framework is to achieve a balance between the average return of the portfolio and its risk, which is measured by the variance, that is, we look for the highest return with the lowest risk that may exist among all the possibilities.

Another type of mean-variance analysis called the risk-adjusted expected return is expressed as

\begin{matrix} ω^{T} μ - λ ω^{T} Σ ω \to \underset{ω \in W_{adm}}{\mathrm{maximize}} \\ \mathrm{s.t.} \\ ω^{T} e = 1 \end{matrix}

The dual objectives of this formulation are to maximize the portfolio expected return while minimizing variance, where

λ

is a risk-aversion coefficient determined by the investors, and

ω^{T} e = 1

is the capital budget constraint.

Let us denote the function

Ψ

of the Pareto frontier

Ψ (ω, σ, μ) = sup_{ω \in W_{a d m}} inf_{σ} ω^{T} μ

where the trajectory of the optimal solution defines a concave curve increasing over

σ = {(ω^{T} Σ ω)}^{1 / 2}

(standard deviation) for which

σ = inf \{{(ω^{T} Σ ω)}^{1 / 2}| sup_{ω \in W_{a d m}} ω^{T} μ, ω \in W_{a d m}\}

The best risk–return trade-off of the assets

{(Z_{i})}_{i = 1, \dots, n}

is found in the strictly concave section of the curve, which is known as the efficient frontier. A portfolio

ω

is efficient if for any other portfolio

ϱ

having the same expected return, its variance satisfies

ω^{T} Σ ω \leq ϱ^{T} Σ ϱ

. The performance of a portfolio

ω

with an uncertainty model is characterized by the set of return–risk pairs computed using the variables across the set

P (σ, u) = \{(σ, u) \in R^{2} |σ = inf {(ω^{⊺} Σ ω)}^{1 / 2}, u = sup_{ω \in W_{a d m}} ω^{T} μ\}

If there is no asset allocation restriction (except for the portfolio budget constraint), the two-fund theorem states that the efficient frontier is a hyperbola and that every efficient portfolio can be expressed, in terms of the mean and the variance, as a combination of two efficient funds (portfolios) [2]. To obtain the efficient frontier of a portfolio, the average return is needed, which is given by (

ω^{⊺} μ

) and the standard deviation given by (

{(ω^{⊺} Σ ω)}^{1 / 2}

). The efficient frontier can be computed using the Sharpe-ratio maximization [3,4]

\begin{matrix} \frac{ω^{⊺} μ - r_{f}}{{(ω^{⊺} Σ ω)}^{1 / 2}} \to \underset{ω \in W_{adm}}{\mathrm{maximize}} \\ \mathrm{s.t.} \\ ω^{⊺} e = 1 \end{matrix}

(1)

where

r_{f}

is the return of a risk-free asset (a risk-free asset is typically regarded to have no risk or variance). The Sharpe-ratio maximization problem takes its name from the objective of Equation (1), which measures the excess return (

ω^{T} μ - r_{f}

) normalized by the standard deviation (

{(ω^{⊺} Σ ω)}^{1 / 2}

).
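The objective of Equation (1) can be evaluated directly for any candidate portfolio. The sketch below is a minimal illustration under assumed inputs: the two-asset values of `mu`, `Sigma`, and `r_f`, as well as the function name `sharpe_ratio`, are hypothetical and chosen only to show the computation.

```python
import numpy as np


def sharpe_ratio(w, mu, Sigma, r_f):
    """Excess return (w^T mu - r_f) normalized by the standard deviation (w^T Sigma w)^(1/2)."""
    excess = float(w @ mu) - r_f
    std = float(np.sqrt(w @ Sigma @ w))
    return excess / std


# Hypothetical two-asset market (assumptions for illustration only).
mu = np.array([0.10, 0.06])
Sigma = np.array([[0.04, 0.00],
                  [0.00, 0.01]])
r_f = 0.02

sharpe_mixed = sharpe_ratio(np.array([0.5, 0.5]), mu, Sigma, r_f)   # diversified portfolio
sharpe_single = sharpe_ratio(np.array([1.0, 0.0]), mu, Sigma, r_f)  # all budget in asset 1
```

In this assumed market the diversified portfolio attains the larger ratio, which is the kind of comparison that the maximization in Equation (1) carries out over all admissible portfolios.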

The efficient frontier is defined as the set of points on the frontier of the portfolio with expected returns greater than the expected return of the portfolio with the smallest variance. The portfolio with the least variance among all portfolios is the most efficient. All portfolios in the efficient frontier are optimal according to the risk profile of the investor, which is the choice of parameters.

1.2. Related Work

The Markowitz [1] single-period mean-variance portfolio is defined as a model that maximizes the terminal wealth while minimizing risk using variance as a criterion. The idea is to enable an investor to seek the maximum potential return by determining a risk tolerance threshold. Trends, which are seen as inclinations of securities to move in a specific manner over time, control the markets. Mathematical models, which detect patterns when the price meets support and resistance levels throughout time, are used by investors to forecast securities movements [5].

We structure the Markowitz mean-variance portfolio as a system whose variables are represented by a discrete-time Markov chain to solve these challenges. We look at a specific type of discrete-time Markov mean-variance portfolio model and find the portfolio strategy that minimizes total risk, given a fixed anticipated return.

The Sharpe ratio is a measure of portfolio performance as long as the risk can be adequately measured through the standard deviation, and it is usually effective for normally distributed returns. However, some works, such as Zakamouline and Koekebakker [6], have generalized the evaluation of portfolio performance using the Sharpe ratio. Lu and Li [7] identified a theoretically reasonable value of the Sharpe ratio; to this end, they proposed a formula to estimate an expected value for the Sharpe ratio, bounding it in the option pricing model. Kourtis [8] showed how to assess the value of efficient portfolios. Portfolio optimization for Markov chains with restrictions has a significant body of work. See [9,10] for a survey of the impact of transaction costs on portfolio optimization. Sanchez et al. [11] proposed a novel mean-variance customer portfolio optimization approach for a class of ergodic-finite controllable Markov chains, according to citation percent. Sanchez et al. [12] built on the work of [11] by presenting a recurrent reinforcement-learning strategy for controlled Markov chains that adapts policies based on preprocessing and an actor–critic architecture. Clempner and Poznyak [13] investigated the applicability of the penalty regularized expected utilities approach for solving the mean-variance Markowitz customer portfolio optimization problem. In controlled partially observable Markov decision processes, Asiain et al. [14] presented a reinforcement-learning method for calculating the customer portfolio with transaction costs. Garcia-Galicia et al. [15] looked at a continuous-time portfolio strategy for continuous-time discrete-state Markov decision processes with transaction costs requiring temporal penalization.
Using the extraproximal technique restricted to finite, discrete-time, ergodic, and controlled Markov chains, Dominguez and Clempner [16] solved the multi-period mean-variance customer-constrained Markowitz portfolio optimization problem. Garcia-Galicia et al. [17] looked at policy optimization in the context of continuous-time reinforcement learning for financial portfolio management, where the underlying asset portfolio process is assumed to have a continuous-time discrete-state Markov chain structure with simplex and ergodicity constraints. The portfolio problem’s purpose is to redistribute a fund across various financial assets. Meghwani and Thakur [18] developed a tri-objective portfolio optimization model with risk, return, and transaction cost as the objectives, as well as a method for successfully handling equality constraints. Vazquez and Clempner [19] developed a portfolio technique based on a Lagrangian regularization method. The literature differs depending on whether one uses continuous or discrete time, a finite or infinite horizon, and so on [20,21,22,23,24,25,26].

Tikhonov’s regularization has gained a lot of interest in application sectors [27,28]. It is one of the most prominent ways to solve discrete ill-posed minimization problems. The use of Tikhonov’s regularization to create successful algorithms is still a developing topic. To solve the Markowitz MV portfolio model, for example, several strategies based on Tikhonov’s regularization have been devised [11,12,13,19,29,30].

1.3. Main Results

This paper proposes a solution to the Sharpe-ratio portfolio optimization problem, which is based on a market model and allows for the formulation of risk reduction, security returns, and performance assessment. We assume that securities trading occurs in discrete time steps. We assume that the financial market is arbitrage free, meaning that no arbitrage portfolio exists. The premise of an arbitrage-free market is adopted with the goal of obtaining a pricing system that is compatible with the market’s principal asset price.

The main results are summarized as follows:

Consider the problem of Sharpe-ratio portfolio selection.
Formulate a regularization approach based on the penalty technique.
Compute the optimal Sharpe-ratio portfolio using the new algorithm approach.
Propose a financial mathematical method that is combined with increased computing capacity to produce a powerful solution to the problem.

1.4. Organization of the Paper

The remainder of the paper is organized as follows. Section 2 describes the Sharpe-ratio solver. Section 3 suggests a Markov approach for solving the proposed problem. A numerical example is given in Section 4. Our conclusions and final comments are described in Section 5.

2. Sharpe-Ratio Solver

If we denote

σ = {(ω^{⊺} Σ ω)}^{1 / 2} = {(ω^{⊺} {(Σ^{1 / 2})}^{⊺} Σ^{1 / 2} ω)}^{1 / 2} = {∥Σ^{1 / 2} ω∥}_{2}

, where

{∥\cdot∥}_{2}

is the Euclidean norm and

Σ^{1 / 2}

satisfies that

{(Σ^{1 / 2})}^{⊺} (Σ^{1 / 2}) = Σ

, the Sharpe-ratio portfolio problem [3,4] can be expressed in minimization form as

\begin{matrix} \frac{{∥Σ^{1 / 2} ω∥}_{2}}{ω^{⊺} μ - r_{f}} \to \underset{ω \in W_{adm}}{\mathrm{minimize}} \\ \mathrm{s.t.} \\ ω^{⊺} e = 1 . \end{matrix}

(2)
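The identity between the standard deviation and the Euclidean norm used in Equation (2) can be checked numerically. In the sketch below, the matrix factor is obtained from a Cholesky decomposition, which is one possible choice satisfying the stated condition on the square-root factor; the covariance matrix and weights are hypothetical values chosen for illustration.

```python
import numpy as np

# Hypothetical covariance matrix and portfolio (assumptions for illustration only).
Sigma = np.array([[0.040, 0.006, 0.002],
                  [0.006, 0.090, 0.004],
                  [0.002, 0.004, 0.010]])
w = np.array([0.5, 0.3, 0.2])

# np.linalg.cholesky returns a lower-triangular L with L @ L.T == Sigma,
# so Sigma_half = L.T satisfies Sigma_half.T @ Sigma_half == Sigma.
L = np.linalg.cholesky(Sigma)
Sigma_half = L.T

sigma_quadratic = float(np.sqrt(w @ Sigma @ w))        # (w^T Sigma w)^(1/2)
sigma_norm = float(np.linalg.norm(Sigma_half @ w))     # ||Sigma^(1/2) w||_2
```

Both quantities agree up to floating-point error, which is what allows the ratio in Equation (2) to be written with a norm in the numerator.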

Let us introduce the variable

α

such that

\begin{matrix} α \to \underset{ω \in W_{adm}, α > 0}{\mathrm{minimize}} \\ \mathrm{s.t.} \\ \frac{{∥Σ^{1 / 2} ω∥}_{2}}{(ω^{⊺} μ - r_{f})} \leq α \\ ω^{⊺} e - 1 = 0 \end{matrix}

such that

α - \frac{{∥Σ^{1 / 2} ω∥}_{2}}{(ω^{⊺} μ - r_{f})} \geq 0

with

ω^{⊺} μ - r_{f} > 0 .

Now, consider the following second-order cone programming (SOCP) for the Sharpe-ratio portfolio

\begin{matrix} ω \to \underset{ω \in W_{adm}, α > 0}{\mathrm{minimize}} \\ \mathrm{s.t.} \\ {∥Σ^{1 / 2} ω∥}_{2} \leq α (ω^{⊺} μ - r_{f}) \\ ω^{⊺} e - 1 = 0 \end{matrix}

Under an affine mapping, the collection of points meeting a second-order cone constraint is the inverse image of the unit second-order cone given by

\begin{matrix} {∥Σ^{1 / 2} ω∥}_{2} \leq α (ω^{⊺} μ - r_{f}) & \Leftrightarrow & [\begin{matrix} Σ^{1 / 2} \\ α μ \end{matrix}] ω + [\begin{matrix} 0 \\ - α r_{f} \end{matrix}] \in K_{n + 1} \end{matrix}

The standard or unit second-order cone of dimension

n + 1

is defined as

K_{n + 1} = \{[\begin{matrix} ω \\ α \end{matrix}]| ω \in W_{a d m}, α \in R, {∥ω∥}_{2} \leq α\}
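To make the cone constraint concrete, the following sketch tests whether a pair (ω, α) satisfies the second-order cone constraint of the SOCP. It assumes the closed-cone convention (non-strict inequality), in line with the constraint as written in the SOCP; the function names and the numerical values are hypothetical.

```python
import numpy as np


def in_second_order_cone(x, t, tol=1e-12):
    """Return True if (x, t) satisfies ||x||_2 <= t (closed cone, small tolerance)."""
    return bool(np.linalg.norm(x) <= t + tol)


def sharpe_cone_constraint_holds(w, alpha, mu, Sigma_half, r_f):
    """Check ||Sigma^(1/2) w||_2 <= alpha * (w^T mu - r_f)."""
    return in_second_order_cone(Sigma_half @ w, alpha * (float(w @ mu) - r_f))


# Hypothetical data (assumptions for illustration only).
mu = np.array([0.10, 0.06])
Sigma_half = np.diag([0.2, 0.1])   # square-root factor of diag(0.04, 0.01)
r_f = 0.02
w = np.array([0.5, 0.5])

feasible_large_alpha = sharpe_cone_constraint_holds(w, 2.0, mu, Sigma_half, r_f)
feasible_small_alpha = sharpe_cone_constraint_holds(w, 1.0, mu, Sigma_half, r_f)
```

For this portfolio the ratio of standard deviation to excess return is about 1.86, so the constraint holds for α = 2 and fails for α = 1; the smallest feasible α is exactly the objective value of problem (2).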

Remark 1.

The constraints, which are analogous to requiring the affine function to lie in the second-order cone in

K_{n + 1}

lead to the SOCP.

It is possible that the minimizing solution is not unique. We employ the penalization method and introduce a Tikhonov regularizer with regularization parameters

q, δ > 0

to solve the ill-posed problem, which consists of

\begin{matrix} {\tilde{Ψ}}_{q, δ} (α, ω, β) = \\ ω + q [\frac{1}{2} {∥ω^{⊺} e - 1∥}^{2} + \frac{1}{2} {∥({∥Σ^{1 / 2} ω∥}_{2} - α (ω^{⊺} μ - r_{f})) + β∥}^{2} + \frac{δ}{2} {∥α∥}^{2} + \frac{δ}{2} {∥ω∥}^{2} + \frac{δ}{2} {∥β∥}^{2}] \end{matrix}

(3)

Clearly, the optimization problem

{\tilde{Ψ}}_{q, δ} (α, ω, β) \to \underset{ω \in W_{adm}, α > 0, β \geq 0}{\mathrm{minimize}}

has a unique solution since the optimized function (3) is strongly convex if

δ > 0

. Considering

ϱ = q^{- 1} > 0

, the following property holds:

arg min_{ω \in W_{a d m}, α > 0, β \geq 0} {\tilde{Ψ}}_{q, δ} (α, ω, β) = arg min_{ω \in W_{a d m}, α > 0, β \geq 0} Ψ_{ϱ, δ} (α, ω, β)

and we have

\begin{matrix} Ψ_{ϱ, δ} (α, ω, β) = \\ ϱ ω + \frac{1}{2} {∥ω^{⊺} e - 1∥}^{2} + \frac{1}{2} {∥({∥Σ^{1 / 2} ω∥}_{2} - α (ω^{⊺} μ - r_{f})) + β∥}^{2} + \frac{δ}{2} {∥α∥}^{2} + \frac{δ}{2} {∥ω∥}^{2} + \frac{δ}{2} {∥β∥}^{2} \end{matrix}

The concept behind the portfolio’s function

Ψ_{ϱ, δ} (α, ω, β)

is as follows: if the penalty parameter

ϱ

approaches zero in a specific way, we may suppose that

α_{ϱ, δ}^{*}, ω_{ϱ, δ}^{*}

and

β_{ϱ, δ}^{*}

, which solve the portfolio optimization problem

Ψ_{ϱ, δ} (α, ω, β) \to \underset{ω \in W_{adm}, α > 0, β \geq 0}{\mathrm{minimize}}

tend toward the set

W_{a d m}^{*}

of all the portfolio solutions to the original portfolio optimization problem (2), i.e.,

d \{α_{ϱ, δ}^{*}, ω_{ϱ, δ}^{*}, β_{ϱ, δ}^{*}; W_{a d m}^{*}\} \underset{ϱ, δ ↓ 0}{\to} 0

where the distance is defined as

d \{y; W_{a d m}^{*}\} = min_{ω^{*} \in W_{a d m}^{*}} {∥y - ω^{*}∥}^{2}

Solver method for the Sharpe-ratio portfolio

\begin{matrix} ω_{t + 1} = arg min_{ω \in W_{a d m}} \{\frac{1}{2} {∥ω - ω_{t}∥}^{2} + γ_{ω, t} Ψ_{ϱ_{t}, δ_{t}} (α_{t}, ω, β_{t})\} \\ β_{t + 1} = {[β_{t} - γ_{β, t} \nabla_{β} Ψ_{ϱ_{t}, δ_{t}} (α_{t}, ω_{t}, β_{t})]}_{+} \\ α_{t + 1} = \frac{{∥Σ^{1 / 2} ω_{t}∥}_{2}}{(ω_{t}^{⊺} μ - r_{f})} \end{matrix}\}

where for variable c

\begin{matrix} {[c]}_{+} = ({[c_{1}]}_{+}, \dots, {[c_{n}]}_{+}) \\ {[c_{i}]}_{+} = \{\begin{matrix} c_{i} & \mathrm{if} & c_{i} \geq 0 \\ 0 & \mathrm{if} & c_{i} < 0 \end{matrix} \end{matrix}
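The projection operator and the β-update of the solver can be sketched as follows. The gradient with respect to β is derived here from the β-dependent terms of the penalized function, namely the squared constraint residual and the Tikhonov term; this derivation, the function names, and the numerical values are assumptions made for illustration and should be checked against the intended objective.

```python
import numpy as np


def positive_part(c):
    """Componentwise operator [c]_+ : keeps c_i if c_i >= 0 and returns 0 otherwise."""
    return np.maximum(c, 0.0)


def beta_step(beta, w, alpha, mu, Sigma_half, r_f, delta, gamma_beta):
    """One projected gradient step on the slack variable beta (scalar case)."""
    # Residual of the cone constraint: ||Sigma^(1/2) w||_2 - alpha * (w^T mu - r_f).
    residual = float(np.linalg.norm(Sigma_half @ w)) - alpha * (float(w @ mu) - r_f)
    # Assumed gradient: d/d(beta) of 0.5*(residual + beta)^2 + 0.5*delta*beta^2.
    grad = (residual + beta) + delta * beta
    return float(positive_part(beta - gamma_beta * grad))


# Hypothetical data (assumptions for illustration only).
mu = np.array([0.10, 0.06])
Sigma_half = np.diag([0.2, 0.1])
w = np.array([0.5, 0.5])

beta_small_step = beta_step(0.1, w, 2.0, mu, Sigma_half, 0.02, 0.3, 0.05)
beta_large_step = beta_step(0.1, w, 2.0, mu, Sigma_half, 0.02, 0.3, 10.0)
projected = positive_part(np.array([-1.0, 0.0, 2.5]))
```

With a small step size the slack decreases slightly but stays positive, while an overly large step is clipped to zero by the projection, which keeps β nonnegative throughout the iteration.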

3. Markov Approach for the Sharpe-Ratio Portfolio

3.1. Markov Model

Let us consider a discrete-time problem in which n takes integer values, i.e.,

n \in N

. Assume

u_{n}

is a control variable whose value is determined at time n. The partial sequence of controls (or decisions) taken throughout the first n phases is denoted by

U_{n} = (u_{1}, \dots, u_{n})

. The control variable

u_{n}

is chosen based on the knowledge that

U_{n} = {(u_{n})}_{n \in N}

(which determines everything else). However, a more cost-effective portrayal of the past is frequently adequate. For instance, we may not require knowledge about the full path traveled up to time n, but merely the location to which it has led us. The concept behind a state variable

x \in R^{d}

is that its value at time n called

x_{n}

, can be calculated using known values and follows a plant equation (or law of motion)

x_{n + 1} = a (x_{n}, u_{n}),

n \in N

. The optimal

u_{n}

is a function only of

x_{n}

, i.e.,

u_{n} = u (x_{n})

.

Consider a stochastic evolution where the x and u histories at time n are denoted by

X_{n} = {(x_{n})}_{n \in N}

and

U_{n} = {(u_{n})}_{n \in N}

. As previously stated, the state structure is defined by the fact that the process development is specified by a state variable x, which has the value

x_{n}

at time

n \in N

.

A discrete-time Markov chain is a tuple

M C = (X, U, P)

, where X is the state space, U is the action space and

P (x_{n + 1} | X_{n}, U_{n}) = P (x_{n + 1} | x_{n}, u_{n})

is the transition probability distribution (i.e., the stochastic version of the plant equation). A Markov decision process is the tuple defined by

M D P = (M C, f)

, where

f (x_{n + 1}, x_{n}, u_{n})

is the immediate utility function by choice of controls

{(u_{n})}_{n \in N}

.

The transition function

P (x_{n + 1} | x_{n}, u_{n})

and the common prior distribution

P (x_{0})

perfectly describe the behavior represented by a Markov chain, where

P (x_{n}) \in Δ X

, where

Δ X

denotes the set of all probability distributions over X. The Markov chains are self-contained. The absolute values of the utility function

f (x_{n + 1}, x_{n}, u_{n})

are bounded by some constant. We assume that each Markov chain

(P (x_{n}), P (x_{n + 1} | x_{n}, u_{n}))

is irreducible, recurrent and aperiodic (ergodic), and that P is its unique invariant distribution. Then, we have

P (x_{n + 1}) = \sum_{x_{n} \in X} P (x_{n + 1} | x_{n}) P (x_{n})

. As well, there exists a state

x^{*}

which is recurrent for every distribution P.

To formulate the optimization problem associated with the MDP, let

π : X \to Δ (U)

be a stationary policy, where

Δ (U)

is the U-simplex, which maps state–space X to a probability distribution on action–space U and determines randomized actions based on the current state

x_{n}

. As a result, we obtain that under policy

π

, the action

u_{n}

is chosen based on the probability distribution

π (x_{n})

. Let

Π_{a d m}

be the admissible set of Markov policies, i.e.,

Π_{a d m} = \{π (u_{n} | x_{n}) |\sum_{u_{n} \in U} π (u_{n} | x_{n}) = 1, x_{n} \in X, u_{n} \in U\} .

In this model, the current value of the state is observable, i.e., when selecting

u_{n}

,

x_{n}

is known. We assume that

H_{n} = (X_{n}, U_{n})

, where

H_{n}

is the observed history at time n.

Given the Markovian structure of the state processes, the utility at state vector

x_{n}

with policy

π (u_{n} | x_{n})

and probability distribution

P (x_{n})

can be written as

\begin{matrix} F (π) = \sum_{x_{n} \in X} \sum_{u_{n} \in U} (\sum_{x_{n + 1} \in X} f (x_{n + 1}, x_{n}, u_{n}) P (x_{n + 1} | x_{n}, u_{n})) π (u_{n} | x_{n}) P (x_{n}) = \\ \sum_{x_{n} \in X} \sum_{u_{n} \in U} F (x_{n}, u_{n}) π (u_{n} | x_{n}) P (x_{n}), \end{matrix}

such that

F (x_{n}, u_{n}) = \sum_{x_{n + 1} \in X} f (x_{n + 1}, x_{n}, u_{n}) P (x_{n + 1} | x_{n}, u_{n})

.
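The averaged utilities F(x, u) and F(π) reduce to tensor contractions. The sketch below stores the transition kernel and the immediate utility as arrays indexed by (next state, state, action); this storage layout and the two-state, two-action values are assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical model with 2 states and 2 actions (assumptions for illustration only).
q = np.array([[0.2, 0.6],
              [0.5, 0.9]])             # q[x, u] = P(x' = 1 | x, u)
P = np.stack([1.0 - q, q], axis=0)     # P[x_next, x, u], sums to one over x_next
f = np.zeros((2, 2, 2))
f[1, :, :] = 1.0                       # utility 1 when the next state is 1, else 0

pi = np.full((2, 2), 0.5)              # pi[x, u] = pi(u | x), uniform policy
p = np.array([0.5, 0.5])               # p[x] = P(x)

# F(x, u) = sum_{x'} f(x', x, u) P(x' | x, u)
F_xu = np.einsum('yxu,yxu->xu', f, P)

# F(pi) = sum_x sum_u F(x, u) pi(u | x) P(x)
F_pi = float(np.einsum('xu,xu,x->', F_xu, pi, p))
```

Because the assumed utility is the indicator of reaching state 1, `F_xu` coincides with `q`, which gives a simple way to confirm that the contraction indices are in the right order.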

A policy

{\{π_{n}\}}_{n \geq 0}

is called optimal if, for each

n \geq 0

, it maximizes the conditional mathematical expectation of the utility function

F (π)

given the history of the process up to period n,

H_{n} = (X_{n}, U_{n})

, where the observed history

H_{n}

is fixed and cannot be changed thereafter, i.e., the optimal policy is obtained by solving the conditional optimization problem given by

π^{*} : = arg max_{π \in Π_{a d m}} E \{F (π)| H_{n}\}

(4)

where

F (π)

is the average utility function. Under the previous assumptions the admissible set

Π_{a d m}

is nonempty, therefore there exists an optimum policy

π^{*}

in the class of stationary Markovian policies.

The variance

V a r (F (π))

is given by

\begin{matrix} V a r (F (π)) : = \sum_{x_{n} \in X} \sum_{u_{n} \in U} {[F (x_{n}, u_{n}) - F (π)]}^{2} π (u_{n} | x_{n}) P (x_{n}) = \\ \sum_{x_{n} \in X} \sum_{u_{n} \in U} F^{2} (x_{n}, u_{n}) π (u_{n} | x_{n}) P (x_{n}) - F^{2} (π) \end{matrix}

Finally, the portfolio is defined by

\begin{matrix} Φ (π) = F (π) - λ V a r (F (π)) = \\ \sum_{x_{n} \in X} \sum_{u_{n} \in U} F (x_{n}, u_{n}) π (u_{n} | x_{n}) P (x_{n}) + λ (F^{2} (π) - \sum_{x_{n} \in X} \sum_{u_{n} \in U} F^{2} (x_{n}, u_{n}) π (u_{n} | x_{n}) P (x_{n})) \end{matrix}

The distribution vector

P (x_{n})

is defined as

P (x_{n + 1}) = \sum_{x_{n} \in X} (\sum_{u_{n} \in U} P (x_{n + 1} | x_{n}, u_{n}) π (u_{n} | x_{n})) P (x_{n})

(5)

In the ergodic case, which we are dealing with, these probabilities converge exponentially fast to the stationary distribution, that is,

P (x_{n}) \underset{n \to \infty}{\to} P (x)

. From now on, we consider stationary distributions (we are considering the one-period portfolio).
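The convergence to the stationary distribution can be illustrated by iterating the one-step update of the state distribution under a fixed policy. The following sketch is a plain power iteration; the array layout (next state, state, action), the function name, and the two-state example are assumptions made for illustration.

```python
import numpy as np


def stationary_distribution(P, pi, n_iter=500, tol=1e-13):
    """Iterate p_{n+1}(y) = sum_x [sum_u P(y | x, u) pi(u | x)] p_n(x) until it settles.

    P has shape (X_next, X, U); pi has shape (X, U) with rows summing to one.
    """
    # Policy-induced transition matrix T[x, y] = sum_u P[y, x, u] * pi[x, u].
    T = np.einsum('yxu,xu->xy', P, pi)
    p = np.full(T.shape[0], 1.0 / T.shape[0])   # start from the uniform distribution
    for _ in range(n_iter):
        p_next = p @ T
        if np.max(np.abs(p_next - p)) < tol:
            return p_next
        p = p_next
    return p


# Hypothetical 2-state, 2-action chain (assumptions for illustration only).
q = np.array([[0.2, 0.6],
              [0.5, 0.9]])             # q[x, u] = P(x' = 1 | x, u)
P = np.stack([1.0 - q, q], axis=0)
pi = np.full((2, 2), 0.5)              # uniform policy

p_star = stationary_distribution(P, pi)
```

For this assumed chain the induced transition matrix has rows (0.6, 0.4) and (0.3, 0.7), whose invariant distribution is (3/7, 4/7); the iteration reaches it within a few dozen steps, consistent with geometric convergence in the ergodic case.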

3.2. Portfolio Model’s Compliance with Markov

Consider a variable v defined as

v (u, x) : = π (u | x) P (x)

such that

\begin{matrix} V_{a d m} = \{v (u, x) |\sum_{x \in X} \sum_{u \in U} v (u, x) = 1, \sum_{u \in U} v (u, x) > 0, \sum_{x \in X} \sum_{u \in U} [κ_{x^{'}, x} - P (x^{'} | x, u)] v (u, x) = 0, x^{'} \in X\} \end{matrix}

where

κ_{x^{'}, x}

is the Kronecker delta. The following relationship holds true in the ergodic case

\sum_{u \in U} v (u, x) > 0 .

It is straightforward to determine that

v (u, x)

belongs to the simplex

S

S : = \{v (u, x) |\sum_{x \in X} \sum_{u \in U} v (u, x) = 1\}

The utility function in terms of v-variables is determined by

\tilde{F} (v) = \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x),

To obtain the variables of interest after the portfolio model is solved, we have a stationary distribution

P (x)

and the policy (portfolio)

π (u | x)

may be recovered using the following formulae

\begin{matrix} P (x) = \sum_{u \in U} v (u, x) & π (u | x) = \frac{v (u, x)}{\sum_{u \in U} v (u, x)} \end{matrix}
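The recovery formulae translate into two array operations. In the sketch below the joint variable is stored with states along the first axis and actions along the second; this layout, the function name, and the three-state example are assumptions made for illustration.

```python
import numpy as np


def recover_distribution_and_policy(v):
    """Recover P(x) and pi(u | x) from v, where v[x, u] stores v(u, x) = pi(u | x) P(x).

    Assumes sum_u v(u, x) > 0 for every state, as holds in the ergodic case.
    """
    p = v.sum(axis=1)            # P(x) = sum_u v(u, x)
    pi = v / p[:, None]          # pi(u | x) = v(u, x) / sum_u v(u, x)
    return p, pi


# Hypothetical stationary distribution and policy (assumptions for illustration only).
p_true = np.array([0.2, 0.5, 0.3])
pi_true = np.array([[0.1, 0.9],
                    [0.5, 0.5],
                    [0.7, 0.3]])
v = pi_true * p_true[:, None]    # build the joint variable v(u, x)

p_rec, pi_rec = recover_distribution_and_policy(v)
```

The round trip returns the original distribution and policy, and the entries of `v` sum to one, so `v` lies in the simplex described above.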

Associating these variables with the notions above, we define the vector

ω = ω (v) : = (v (u, x)) \in W_{a d m}, v \in V_{a d m}

such that the regularized portfolio return is defined as

\begin{matrix} Φ_{δ} (v) = \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) + λ \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v (u^{'}, x^{'}) - \\ λ \sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v (u, x) + \frac{δ}{2} ∥v (u, x)∥^{2} \end{matrix}

and satisfies

Ψ_{δ} (ω (v)) = Φ_{δ} (v)

then,

\begin{matrix} ω^{*} \in A r g max_{ω \in W_{a d m}} Ψ_{δ} (ω (v)) ⟺ v^{*} \in A r g max_{v \in V_{a d m}} Φ_{δ} (v) \\ ω^{*} = ω (v^{*}) \end{matrix}

As a result,

\frac{{(ω Σ ω)}^{1 / 2}}{ω^{⊺} μ - r_{f}} = \frac{{[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v (u^{'}, x^{'})]}^{1 / 2}}{\sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) - {\tilde{r}}_{f}}

where

{\tilde{r}}_{f}

is the return of the risk-free asset expressed in terms of Markov chains.
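The ratio above can be evaluated directly from the joint variable v and the averaged utilities F(x, u). The sketch below computes the standard deviation over the excess return, i.e., the quantity to be minimized; the function name, the array layout with states on the first axis, and the numerical values are assumptions made for illustration.

```python
import numpy as np


def inverse_sharpe_ratio(F_xu, v, r_f):
    """Standard deviation of the utility divided by its excess over the risk-free return.

    F_xu[x, u] stores F(x, u); v[x, u] stores v(u, x) with all entries summing to one.
    """
    mean = float(np.sum(F_xu * v))                 # sum_x sum_u F(x, u) v(u, x)
    second_moment = float(np.sum(F_xu ** 2 * v))   # sum_x sum_u F^2(x, u) v(u, x)
    std = np.sqrt(second_moment - mean ** 2)
    return float(std / (mean - r_f))


# Hypothetical utilities and joint distribution (assumptions for illustration only).
F_xu = np.array([[0.2, 0.6],
                 [0.5, 0.9]])
v = np.full((2, 2), 0.25)
r_f_tilde = 0.05

ratio = inverse_sharpe_ratio(F_xu, v, r_f_tilde)
```

With these assumed values the mean utility is 0.55 and the variance is 0.0625, so the ratio equals 0.25 / 0.5; a smaller ratio corresponds to a larger Sharpe ratio.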

3.3. Solver for Markov Chains

Let us consider

\begin{matrix} \frac{{[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v (u^{'}, x^{'})]}^{1 / 2}}{\sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) - {\tilde{r}}_{f}} \to min_{v \in V_{a d m}} \\ \mathrm{s.t.} \\ \sum_{x \in X} \sum_{u \in U} v (u, x) = 1, \\ \sum_{u \in U} v (u, x) > 0, \\ \sum_{x \in X} \sum_{u \in U} [κ_{x^{'}, x} - P (x^{'} | x, u)] v (u, x) = 0 \end{matrix}

Considering the SOCP for the Sharpe-ratio approach, we have

\begin{matrix} v \to min_{v \in V_{a d m}, α > 0} \\ \mathrm{s.t.} \\ {[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v (u^{'}, x^{'})]}^{1 / 2} \leq \\ α (\sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) - {\tilde{r}}_{f}) \\ \sum_{x \in X} \sum_{u \in U} v (u, x) = 1, \\ \sum_{u \in U} v (u, x) > 0, \\ \sum_{x \in X} \sum_{u \in U} [κ_{x^{'}, x} - P (x^{'} | x, u)] v (u, x) = 0 \end{matrix}

We have

\begin{matrix} Φ_{ϱ, δ} (v, α, β) = ϱ v + \frac{1}{2} {∥\sum_{x \in X} \sum_{u \in U} v (u, x) - 1∥}^{2} + \\ \frac{1}{2} ∥{[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v (u^{'}, x^{'})]}^{1 / 2} - \\ {α (\sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) - {\tilde{r}}_{f}) + β∥}^{2} - \frac{1}{2} {∥\sum_{u \in U} v (u, x)∥}^{2} + \\ \frac{1}{2} {∥\sum_{x \in X} \sum_{u \in U} [κ_{x^{'}, x} - P (x^{'} | x, u)] v (u, x)∥}^{2} + \frac{δ}{2} {∥α∥}^{2} + \frac{δ}{2} {∥v (u, x)∥}^{2} + \frac{δ}{2} {∥β∥}^{2} \end{matrix}

The optimization problem becomes

Φ_{ϱ, δ} (v, α, β) \to \underset{v \in V_{adm}, α > 0, β \geq 0}{\mathrm{minimize}}

Solver method for the Sharpe ratio in Markov chains

\begin{matrix} v_{n + 1} = arg min_{v \in V_{a d m}} \{\frac{1}{2} {∥v - v_{n}∥}^{2} + γ_{v, n} Φ_{ϱ, δ} (v, α_{n}, β_{n})\} \\ β_{n + 1} = {[β_{n} - γ_{β, n} \nabla_{β} Φ_{ϱ, δ} (v_{n}, α_{n}, β_{n})]}_{+} \\ α_{n + 1} = \frac{{[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v_{n} (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v_{n} (u^{'}, x^{'})]}^{1 / 2}}{α_{n} (\sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) - {\tilde{r}}_{f})} \end{matrix}

4. Numerical Example

Under the one-period horizon, we assume that investors expect the same probability distribution of returns and target the portfolio with the lowest risk. We assume that there is no inflation or interest rate shift, and that the markets are in a state of equilibrium. To get closer to real-world conditions, we assume that trading has transaction costs and that investors can trade limitless quantities on an arbitrage-free market.

The proposed method implies that

v_{n + 1} = arg min_{v \in V_{a d m}} \{\frac{1}{2} {∥v - v_{n}∥}^{2} + γ_{v, n} Φ_{ϱ, δ} (v, α, β)\}

Developing further, we have

\begin{matrix} v_{n + 1} = arg min_{v \in V_{a d m}} \{\frac{1}{2} {∥v (u, x)∥}^{2} - v (u, x) v_{n} (u, x) + \frac{1}{2} {∥v_{n} (u, x)∥}^{2} + γ_{v, n} [ϱ v + \frac{1}{2} {∥\sum_{x \in X} \sum_{u \in U} v (u, x) - 1∥}^{2} + \\ \frac{1}{2} ∥{[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v (u^{'}, x^{'})]}^{1 / 2} - \\ {α (\sum_{x \in X} \sum_{u \in U} F (x, u) v (u, x) - {\tilde{r}}_{f}) + β∥}^{2} - \frac{1}{2} {∥\sum_{u \in U} v (u, x)∥}^{2} + \\ \frac{1}{2} {∥\sum_{x \in X} \sum_{u \in U} [κ_{x^{'}, x} - P (x^{'} | x, u)] v (u, x)∥}^{2} + \frac{δ}{2} {∥α∥}^{2} + \frac{δ}{2} {∥v (u, x)∥}^{2} + \frac{δ}{2} {∥β∥}^{2}]\} \end{matrix}

\begin{matrix} β_{n + 1} = {[β_{n} - γ_{β, n} \nabla_{β} Φ_{ϱ, δ} (v_{n}, α_{n}, β_{n})]}_{+} \end{matrix}

Hence,

\begin{matrix} β_{n + 1} = [β_{n} - γ_{β, n} \nabla_{β} [ϱ v + \frac{1}{2} {∥\sum_{x \in X} \sum_{u \in U} v_{n} (u, x) - 1∥}^{2} + \\ \frac{1}{2} ∥{(\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v_{n} (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v_{n} (u^{'}, x^{'}))}^{1 / 2} - \\ {α (\sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) - {\tilde{r}}_{f}) + β_{n}∥}^{2} - \frac{1}{2} {∥\sum_{u \in U} v (u, x)∥}^{2} + \\ {\frac{1}{2} {∥\sum_{x \in X} \sum_{u \in U} [κ_{x^{'}, x} - P (x^{'} | x, u)] v (u, x)∥}^{2} + \frac{δ}{2} {∥α_{n}∥}^{2} + \frac{δ}{2} {∥v_{n} (u, x)∥}^{2} + \frac{δ}{2} {∥β_{n}∥}^{2})]}_{+} = \\ [β_{n} - γ_{β, n} \{{(\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v_{n} (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v_{n} (u^{'}, x^{'}))}^{1 / 2} - \\ {α (\sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) - {\tilde{r}}_{f}) + β_{n}\} + δ β_{n}]}_{+} \end{matrix}

Finally,

α_{n + 1} = \frac{{[\sum_{x \in X} \sum_{u \in U} F^{2} (x, u) v_{n} (u, x) - \sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) \sum_{x^{'} \in X} \sum_{u^{'} \in U} F (x^{'}, u^{'}) v_{n} (u^{'}, x^{'})]}^{1 / 2}}{α_{n} (\sum_{x \in X} \sum_{u \in U} F (x, u) v_{n} (u, x) - {\tilde{r}}_{f})}

For the proposed problem, we have that the set X has eight states and the set U three controls. The initial parameters

γ_{v}

and

γ_{β}

are set to be

γ_{v, 0} = 5 \times 10^{- 3}

and

γ_{β, 0} = 5 \times 10^{- 2}

. For the method, the initial point portfolio

π

is set to be in the middle of the simplex, as well as the initial distribution. The value of

δ

is set to be

δ_{0} = 0.3049

. As well,

α_{0} = 0.5

and

β_{0} = 0.1

.

The resulting portfolio is given by

π (u | x) = [\begin{matrix} 0.3255 & 0.3363 & 0.3382 \\ 0.2741 & 0.3631 & 0.3628 \\ 0.3717 & 0.3958 & 0.2325 \\ 0.3401 & 0.3682 & 0.2917 \\ 0.2452 & 0.3836 & 0.3712 \\ 0.3487 & 0.3211 & 0.3302 \\ 0.5057 & 0.0080 & 0.4863 \\ 0.3250 & 0.3575 & 0.3175 \end{matrix}]
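Since each row of the reported matrix is a probability distribution over the three controls for one of the eight states, a quick consistency check is that every row sums to one. The snippet below copies the values from the matrix above and performs this check; the variable names are hypothetical.

```python
import numpy as np

# Policy pi(u | x) as reported above: 8 states (rows) and 3 controls (columns).
policy = np.array([
    [0.3255, 0.3363, 0.3382],
    [0.2741, 0.3631, 0.3628],
    [0.3717, 0.3958, 0.2325],
    [0.3401, 0.3682, 0.2917],
    [0.2452, 0.3836, 0.3712],
    [0.3487, 0.3211, 0.3302],
    [0.5057, 0.0080, 0.4863],
    [0.3250, 0.3575, 0.3175],
])

row_sums = policy.sum(axis=1)                       # should all equal one
is_valid_policy = bool(np.all(policy >= 0.0) and np.allclose(row_sums, 1.0))
```

The check passes for all eight rows, so the reported strategy belongs to the admissible set of Markov policies.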

The investor’s primary purpose is to make a profit. A rational investor tries to choose the portfolio with the lowest risk that achieves this goal. To achieve this purpose, we create a mean-variance diagram with all of the conceivable risky asset portfolios, where the points indicate the returns

F

and the risk

V a r

(variance) of the portfolios. Figure 1 presents the convergence of the utility, Figure 2 shows the variance and Figure 3 plots the convergence of the functional. Figure 4 shows the convergence of the portfolio strategies.

Figure 1. Utility value of the portfolio.

Figure 2. Variance value of the portfolio.

Figure 3. Functional value of the portfolio.

Figure 4. Convergence of the strategies.

5. Conclusions

Financial market research has grown in importance, owing to the adoption of advanced mathematical tools for improved decision making. The need for more appropriate modeling techniques to handle the portfolio optimization problem has risen due to the enormous expansion in the diversity of financial assets. The Sharpe ratio is a popular performance indicator used to optimize the trade-off between rewards and risks. The Sharpe ratio can be applied to a variety of situations, including performance evaluation, risk management, and market efficiency testing.

This paper proposes a Sharpe-ratio portfolio solution. To ensure strong convexity and the existence of a unique solution involving equality and inequality requirements, we employed a penalty function approach. The penalty regularized technique was employed to represent the nonlinear portfolio problem. For the proposed model, we suggest a computationally tractable way to determine the Sharpe-ratio portfolio. A Markov chain structure was used to model the underlying asset price process. In order to determine the optimal portfolio in Markov chains, a new hybrid optimization programming method was proposed. The suggested method’s efficiency and efficacy were demonstrated using a numerical example.

Author Contributions

Writing and original draft, L.L.O.-C. and J.B.C.; Writing, review and editing, A.A.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Secretaría de Investigación y Posgrado, Instituto Politécnico Nacional.

Institutional Review Board Statement

Ethical review and approval were waived for this study, since no human or animal resources were involved or mentioned in the study.

Informed Consent Statement

Not applicable.

Data Availability Statement

No real data sets were used; only simulation examples are given.

Conflicts of Interest

The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

Markowitz, H. Portfolio selection. J. Financ. 1952, 7, 77–98.
Merton, R. An analytic derivation of the efficient portfolio frontier. J. Financ. Quant. Anal. 1972, 4, 1851–1872.
Sharpe, W.F. Mutual Fund Performance. J. Bus. 1966, 39, 119–138.
Sharpe, W.F. The Sharpe Ratio. J. Portf. Manag. 1994, 21, 49–58.
Caruso, G.; Gattone, S.; Fortuna, F.; Di Battista, T. Cluster Analysis for mixed data: An application to credit risk evaluation. Soc.-Econ. Plan. Sci. 2021, 73, 100850.
Zakamouline, V.; Koekebakker, S. Portfolio performance evaluation with generalized Sharpe ratios: Beyond the mean and variance. J. Bank. Financ. 2009, 33, 1242–1254.
Lu, J.R.; Li, X.Y. Identifying the fair value of Sharpe ratio by an option valuation approach. Q. Rev. Econ. Financ. 2021, 82, 63–70.
Kourtis, A. The Sharpe ratio of estimated efficient portfolios. Financ. Res. Lett. 2016, 17, 72–78.
Samuelson, P. Lifetime portfolio selection by dynamic stochastic programming. Rev. Econ. Stat. 1969, 51, 239–246.
Constantinides, G. Multiperiod consumption and investment behavior with convex transaction costs. Manag. Sci. 1979, 25, 1127–1137.
Sánchez, E.M.; Clempner, J.B.; Poznyak, A.S. Solving The Mean-Variance Customer Portfolio in Markov Chains Using Iterated Quadratic/Lagrange Programming: A Credit-Card Customer-Credit Limits Approach. Expert Syst. Appl. 2015, 42, 5315–5327.
Sánchez, E.M.; Clempner, J.B.; Poznyak, A.S. A priori-knowledge/actor-critic reinforcement learning architecture for computing the mean–variance customer portfolio: The case of bank marketing campaigns. Eng. Appl. Artif. Intell. 2015, 46 Pt A, 82–92.
Clempner, J.B.; Poznyak, A.S. Sparse mean–variance customer Markowitz portfolio optimization for Markov chains: A Tikhonov’s regularization penalty approach. Optim. Eng. 2018, 19, 383–417.
Asiain, E.; Clempner, J.B.; Poznyak, A.S. A Reinforcement Learning Approach for Solving the Mean Variance Customer Portfolio in Partially Observable Models. Int. J. Artif. Intell. Tools 2018, 27, 1850034.
Garcia-Galicia, M.; Carsteanu, A.A.; Clempner, J. Continuous-Time Mean Variance Portfolio with Transaction Costs: A Proximal Approach Involving Time Penalization. Int. J. Gen. Syst. 2019, 48, 91–111.
Domínguez, F.; Clempner, J.B. Multiperiod Mean-Variance Customer Constrained Portfolio Optimization For Finite Discrete-Time Markov Chains. Econ. Comput. Econ. Cybern. Stud. Res. 2019, 1, 39–56.
Garcia-Galicia, M.; Carsteanu, A.A.; Clempner, J. Continuous-Time Learning Method For Customer Portfolio with Time Penalization. Expert Syst. Appl. 2019, 129, 27–36.
Meghwani, S.; Thakur, M. Multi-objective heuristic algorithms for practical portfolio optimization and rebalancing with transaction cost. Appl. Soft Comput. 2018, 67, 865–894.
Vazquez, E.; Clempner, J.B. Customer Portfolio Model Driven By Continuous-Time Markov Chains: An L2 Lagrangian Regularization Method. Econ. Comput. Econ. Cybern. Stud. Res. 2020, 2, 23–40.
Akian, M.; Sulem, A.; Taksar, M. Dynamic optimization of long-term growth rate for a portfolio with transaction costs and logarithmic utility. Math. Financ. 2001, 11, 152–188.
Cvitanic, J.; Karatzas, I. Hedging and portfolio optimization under transaction costs: A martingale approach. Math. Financ. 1996, 6, 133–166.
Davis, M.; Norman, A. Portfolio selection with transaction costs. Math. Oper. Res. 1990, 15, 676–713.
Liu, H. Optimal consumption and investment with transaction costs and multiple risky assets. J. Financ. 2005, 59, 289–338.
Ziemba, W.; Vickson, R. Stochastic Optimization Models in Finance; World Scientific: Singapore, 2006.
Nowak, P.; Romaniuk, M. Valuing catastrophe bonds involving correlation and CIR interest rate model. Comp. Appl. Math. 2018, 37, 365–394.
Mwanakatwe, P.; Song, L.; Hagenimana, E.; Wang, X. Management strategies for a defined contribution pension fund under the hybrid stochastic volatility model. Comp. Appl. Math. 2019, 38, 1–19.
Tikhonov, A.N.; Arsenin, V.Y. Solution of Ill-Posed Problems; Winston & Sons: Washington, DC, USA, 1977.
Tikhonov, A.; Goncharsky, A.; Stepanov, V.; Yagola, A.G. Numerical Methods for the Solution of Ill-Posed Problems; Kluwer Academic Publishers: Alphen aan den Rijn, The Netherlands, 1995.
Carrasco, M.; Noumon, N. Optimal Portfolio Selection Using Regularization; Citeseer: Gaithersburg, MD, USA, 2010.
Fastrich, B.; Paterlini, S.; Winker, P. Constructing optimal sparse portfolios using regularization methods. Comput. Manag. Sci. 2015, 12, 417–434.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
