Article

Shortfall-Based Wasserstein Distributionally Robust Optimization

1 Department of Statistics and Finance, University of Science and Technology of China, Hefei 230052, China
2 School of Mathematics and Finance, Chuzhou University, Chuzhou 239000, China
* Author to whom correspondence should be addressed.
Mathematics 2023, 11(4), 849; https://doi.org/10.3390/math11040849
Submission received: 31 December 2022 / Revised: 26 January 2023 / Accepted: 31 January 2023 / Published: 7 February 2023

Abstract
In this paper, we study a distributionally robust optimization (DRO) problem with affine decision rules. In particular, we construct an ambiguity set based on a new family of Wasserstein metrics, the shortfall–Wasserstein metrics, which apply normalized utility-based shortfall risk measures to summarize the transportation cost random variables. We demonstrate that the multi-dimensional shortfall–Wasserstein ball can be affinely projected onto a one-dimensional one. A noteworthy consequence of this reformulation is that our program enjoys a finite sample guarantee that does not depend on the dimension of the nominal distribution. The resulting distributionally robust optimization problem is also computationally tractable: we provide a dual formulation and verify strong duality, which enables a direct and concise reformulation of the problem. Our results offer a new DRO framework that can be applied in numerous contexts such as regression and portfolio optimization.

1. Introduction

In the literature of operations research (OR) and machine learning (ML), stochastic optimization problems of the following form have been widely studied:
\[
\inf_{\omega\in\mathcal{D}} \rho_P\big(l(\omega^\top\xi)\big), \tag{1}
\]
where $\xi\in\mathbb{R}^n$ is a random vector with distribution $P$, $\omega$ is the decision variable restricted to the set $\mathcal{D}$, $l(\cdot)$ represents a cost/loss function, and $\rho_P(\cdot)$ is a measure of risk under the constraint that the distribution of the random variable is $P$. In ML applications, the functional $\rho_P(\cdot)$ typically takes the form $\rho_P(X)=\mathbb{E}_P[X]$. In OR, $l(\cdot)$ often typifies a disutility function, and $\mathbb{E}_P[l(X)]$ is used as a tool to quantify risks.
In practice, the true distribution $P$ of $\xi$ is often unknown. To overcome the lack of knowledge of this distribution, distributionally robust optimization (DRO) was proposed as an alternative modeling paradigm. It seeks a decision variable $\omega$ that minimizes the worst-case expected loss $\sup_{P\in\mathcal{P}}\mathbb{E}_P[l(\omega^\top\xi)]$, where $\mathcal{P}$ is referred to as the ambiguity set, characterized through some known properties of the true distribution $P$. The choice of $\mathcal{P}$ is of great significance, and there are typically two ways to construct it. The first is the moment-based ambiguity set, which contains distributions whose moments satisfy certain conditions [1]. The other, more popular approach is the discrepancy-based ambiguity set, which is generally taken as a ball that contains distributions close to a nominal distribution with respect to a statistical distance. Popular choices of the statistical distance include Kullback–Leibler (KL) divergence [2,3], the Wasserstein metric [4], etc. Since the Wasserstein metric can be defined between a discrete distribution and a continuous distribution, the ambiguity set based on the Wasserstein metric includes richer distributions than that based on divergence [5]. This makes the Wasserstein metric more popular in modeling the ambiguity set. However, the authors of [5] point out that while the ball $B_\varepsilon^{d_W}(\hat{P}_N)$ is desirable in that it contains various forms of distributions, the flip side is that it may be overly conservative, as distributions differing greatly from the empirical distribution may also be included.
Motivated by this, we extend the classic Wasserstein metrics to shortfall–Wasserstein metrics by applying utility-based shortfall risk measures to summarize the distribution of the transportation cost random variable. The formal definition will be given in Section 2. It is worth noting that the properties of utility-based shortfall risk measures are widely researched in the risk measure literature [6,7] and that this family naturally includes the expectation as a special case. Based on the shortfall–Wasserstein metrics, we define an ambiguity set and formulate a new DRO problem. The utility-based shortfall risk measure has been applied in DRO before; see, e.g., [8,9]. However, in that literature, the shortfall risk measure serves as the objective risk measure. In this paper, we instead employ a utility-based shortfall risk measure to construct the ambiguity set, which is novel. Moreover, we study the tractability of the new problem: we reformulate it and reduce it to a tractable convex problem. For the new metric, a finite sample guarantee is also established. A key ingredient in obtaining these results is a projection result for the ambiguity set based on the shortfall–Wasserstein metric. Such projection results for different ambiguity sets have been studied in [10,11]. In this paper, a necessary and sufficient condition for the projection result of the new ambiguity set is given.
The main contributions of the paper can be summarized as follows.
  • A new family of Wasserstein metrics based on utility-based shortfall risk measures, called shortfall–Wasserstein metrics, is introduced. We propose a data-driven DRO problem based on the shortfall–Wasserstein metric and show that the new DRO model enjoys the desirable properties of a finite sample guarantee and computational tractability.
  • For the new shortfall–Wasserstein metric, we define the corresponding uncertainty set, called the shortfall–Wasserstein ball, and give an equivalent characterization of its projection onto a one-dimensional ball. Based on the projection result, we show that the multi-dimensional constraint of our distributionally robust optimization model can be reformulated as a one-dimensional one. Based on this reformulation, we establish a finite sample guarantee for the DRO problem that is free from the curse of dimensionality.
  • We obtain a dual formulation for the robust optimization problem and verify strong duality. In addition, the dual form admits a reformulation that can be completely characterized when the discrete empirical distribution is taken as the center of the ambiguity set.

Related Work

The central idea behind data-driven DRO (DD-DRO) is to model the distribution of uncertain parameters by constructing an uncertainty set from the sample. Designing a good uncertainty set is crucial. There are two kinds of sets in the literature: the moment-based ambiguity set [1] and a "ball" structure based on a statistical distance [4]. Popular choices of the statistical distance include Kullback–Leibler (KL) divergence [2,3], the Wasserstein metric [4,12,13] and the CVaR– and expectile–Wasserstein metrics [5]. The shortfall–Wasserstein metrics proposed in this paper employ a utility-based shortfall risk measure to construct the ambiguity set and include the expectile–Wasserstein metric as a special case. We also point out that there are uncertainty sets defined based on stochastic dominance; see [14].
Notational conventions. Throughout this paper, we denote by $\bar{\mathbb{R}} := \mathbb{R}\cup\{+\infty\}$ the extended reals and $\mathbb{R}_+ := [0,\infty)$. Let $\mathcal{M}(\mathbb{R}^n)$ denote the set of all distributions supported on $\mathbb{R}^n$. For $p\ge 1$, $\|\cdot\|_p$ denotes the $\ell_p$ norm on $\mathbb{R}^n$. By $\delta_\xi$ we denote the Dirac distribution concentrating unit mass at $\xi$, and $x_+ := \max\{x,0\}$ and $x_- := \max\{-x,0\}$. For any $A\subseteq S\times S$, let us denote $\mathrm{Proj}_1(A) = \{x_1 : (x_1,x_2)\in A\}$. Moreover, by $S$ we denote a Polish space, by $\mathcal{B}(S)$ the corresponding $\sigma$-algebra, and we let $\mathcal{M}_b(S)$ be the set of all finite signed measures on $(S,\mathcal{B}(S))$. We use $C_b(S)$ to denote the set of all bounded continuous functions from $S$ to $\mathbb{R}$. $\mathcal{P}(S)$ denotes the set of all probability measures. For any $\mu\in\mathcal{P}(S)$, we use $\mathcal{B}_\mu(S)$ to denote the completion of $\mathcal{B}(S)$ with respect to $\mu$. The extension of $\mu$ to $\mathcal{B}_\mu(S)$ is unique, and we interpret $\mu$ in $\int\varphi\,d\mu$ as this extension defined on $\mathcal{B}_\mu(S)$ when $\varphi:(S,\mathcal{B}(S))\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ is not measurable but $\varphi:(S,\mathcal{B}_\mu(S))\to(\mathbb{R},\mathcal{B}(\mathbb{R}))$ is measurable [15]. For any $\mu\in\mathcal{P}(S)$ and $m\ge 1$, by $L^m(d\mu)$ we denote the set of all Borel-measurable functions $f:S\to\mathbb{R}$ satisfying $\int|f|^m\,d\mu<\infty$. Let $\mathcal{U}(S) := \bigcap_{\mu\in\mathcal{P}(S)}\mathcal{B}_\mu(S)$; then we use $m_{\mathcal{U}}(S;\bar{\mathbb{R}})$ to denote the set of all measurable functions $\varphi:(S,\mathcal{U}(S))\to(\bar{\mathbb{R}},\mathcal{B}(\bar{\mathbb{R}}))$.

2. Shortfall–Wasserstein Metric

In data-driven settings, the center $P_0$ of the ambiguity set is usually chosen as the empirical distribution. Let us denote the training dataset by $\hat{\Xi}_N := \{\hat\xi_i\}_{i\le N}\subseteq\mathbb{R}^n$ and the empirical distribution by $\hat{P}_N = \frac{1}{N}\sum_{i=1}^N\delta_{\hat\xi_i}$, with $\delta_\xi$ denoting the Dirac distribution at $\xi\in\mathbb{R}^n$. In the DRO models introduced above, the choice of the probability metric $d$ plays a significant role in constructing the ambiguity set $B_\varepsilon^{d}(\hat{P}_N)$. One of the most widely used probability metrics is the (type-1) Wasserstein metric [16]
\[
d_W(P_1,P_2) := \inf\Big\{ \mathbb{E}_\Pi\big[\|\xi_1-\xi_2\|_p\big] \;\Big|\; \Pi \text{ is a joint distribution of } \xi_1 \text{ and } \xi_2 \text{ with marginals } P_1 \text{ and } P_2\text{, respectively}\Big\},
\]
which is applicable to any distributions $P_1, P_2\in\mathcal{M}(\mathbb{R}^n)$ with finite first moments. In this paper, we take the $\ell_p$ norm $\|\cdot\|_p$ with $p>1$. The ambiguity set based on the Wasserstein metric is naturally defined as
\[
B_\varepsilon^{d_W}(\hat{P}_N) := \big\{ P\in\mathcal{M}(\mathbb{R}^n) : d_W(\hat{P}_N, P)\le\varepsilon \big\},
\]
which is called the Wasserstein ball centered at $\hat{P}_N$ with radius $\varepsilon$. In the literature, the distributionally robust problem with this Wasserstein ball as the ambiguity set has been widely studied [4,12,13] and is known as the Wasserstein data-driven distributionally robust optimization model (W-DD-DRO),
\[
\inf_{\omega\in\mathcal{D}} \sup_{P\in B_\varepsilon^{d_W}(\hat{P}_N)} \mathbb{E}_P\big[l(\omega^\top\xi)\big].
\]
As argued by [5], the ball $B_\varepsilon^{d_W}(\hat{P}_N)$ is desirable as it contains various forms of distributions; the flip side, however, is that it may be considered overly conservative, as distributions differing greatly from the empirical distribution may also be included. Motivated by this, they proposed the CVaR–Wasserstein and expectile–Wasserstein balls and studied the reformulation and tractability of the corresponding DRO problems. In this paper, we extend the expectile–Wasserstein metrics to shortfall–Wasserstein metrics by applying normalized utility-based shortfall risk measures to evaluate the transportation cost.
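For intuition, the type-1 Wasserstein metric between two one-dimensional empirical distributions with equally many atoms admits a closed form: the optimal coupling matches order statistics. The following Python sketch (an illustration of ours, not an algorithm from the paper) computes $d_W$ and can be used to test membership in the ball $B_\varepsilon^{d_W}(\hat{P}_N)$:

```python
def wasserstein_1d(xs, ys):
    """Type-1 Wasserstein distance between two empirical distributions on R
    with equally many atoms: the optimal transport plan matches sorted order,
    so d_W = (1/N) * sum_i |x_(i) - y_(i)|."""
    assert len(xs) == len(ys)
    pairs = zip(sorted(xs), sorted(ys))
    return sum(abs(a - b) for a, b in pairs) / len(xs)

# A distribution P lies in the Wasserstein ball around P_hat iff d_W(P_hat, P) <= eps.
print(wasserstein_1d([0.0, 1.0, 2.0], [1.0, 2.0, 3.0]))  # 1.0
```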

2.1. Risk Measures

We first introduce some notions of risk measures. Let $(\Omega,\mathcal{B},\mathbb{P})$ be a probability space, where $\mathcal{B}$ is the $\sigma$-algebra on $\Omega$. Following the convention in mathematical finance, we describe a risk by a random variable $X:\Omega\to\mathbb{R}$. A risk measure is a functional $\rho$ mapping from a set of risks to $\mathbb{R}$. We list some desired properties as follows:
(P1) (Translation invariance) $\rho(X+a) = \rho(X)+a$ for any $a\in\mathbb{R}$;
(P2) (Positive homogeneity) $\rho(cX) = c\rho(X)$ for any $c>0$;
(P3) (Monotonicity) $\rho(X_1)\le\rho(X_2)$ for any $X_1\le X_2$;
(P4) (Subadditivity) $\rho(X_1+X_2)\le\rho(X_1)+\rho(X_2)$;
(P5) (Convexity) $\rho(\alpha X+(1-\alpha)Y)\le\alpha\rho(X)+(1-\alpha)\rho(Y)$ for any $\alpha\in(0,1)$;
(P6) (Law invariance) $\rho(X_1)=\rho(X_2)$ whenever $X_1\overset{d}{=}X_2$.
A risk measure satisfying properties (P1) and (P3) is called a monetary risk measure, and a risk measure satisfying properties (P1)–(P4) is called a coherent risk measure, which has been viewed as one of the most important classes of risk measures since the seminal work [17]. A risk measure $\rho$ is called a convex risk measure if it satisfies properties (P1), (P3) and (P5). We next introduce the definition of utility-based shortfall risk measures.
Definition 1 
(Utility-based shortfall risk measures). Let $u:\mathbb{R}\to\mathbb{R}$ be non-decreasing and continuous, satisfying $u(0)=0$. For a random variable $X$, the utility-based shortfall risk measure is defined as
\[
S_u(X) = \inf\big\{ t\in\mathbb{R} : \mathbb{E}[u(X-t)]\le u(0) \big\}.
\]
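Since $t\mapsto\mathbb{E}[u(X-t)]$ is non-increasing when $u$ is non-decreasing, $S_u$ can be evaluated on a sample by bisection. A minimal Python sketch (the function name and tolerances are our own):

```python
import numpy as np

def shortfall_risk(sample, u, lo=-1e6, hi=1e6, tol=1e-9):
    """Empirical S_u(X) = inf{ t : E[u(X - t)] <= u(0) = 0 }.
    Bisection works because t -> E[u(X - t)] is non-increasing in t."""
    x = np.asarray(sample, dtype=float)
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if np.mean(u(x - mid)) <= 0.0:
            hi = mid  # constraint satisfied: the infimum is at or below mid
        else:
            lo = mid
    return hi

# With the linear utility u(x) = x, S_u reduces to the expectation.
print(shortfall_risk([1.0, 2.0, 3.0], lambda x: x))  # ~2.0
```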
It is well known that any utility-based shortfall risk measure satisfies the monotonicity and translation invariance, and thus is a monetary risk measure. Based on the utility-based shortfall risk measure, we now formally give the definition of shortfall–Wasserstein metric as follows.
Definition 2 
(Shortfall–Wasserstein metric). A metric $d_u(\cdot,\cdot):\mathcal{M}(\mathbb{R}^n)\times\mathcal{M}(\mathbb{R}^n)\to\mathbb{R}\cup\{+\infty\}$ is called the shortfall–Wasserstein metric if it has the form
\[
d_u(P_1,P_2) := \inf\Big\{ S_u^{\Pi}\big(\|\xi_1-\xi_2\|_p\big) \;\Big|\; \Pi \text{ is a joint distribution of } \xi_1 \text{ and } \xi_2 \text{ with marginals } P_1 \text{ and } P_2\text{, respectively}\Big\},
\]
where $p>1$ and $\|\cdot\|_p$ is the $\ell_p$ norm on $\mathbb{R}^n$.
We first study the basic properties of the shortfall–Wasserstein metric.
Proposition 1. 
Let $u:\mathbb{R}\to\mathbb{R}$ be a convex, non-decreasing and continuous function satisfying $u(0)=0$. We have the following statements:
(i) $d_u$ satisfies the identity of indiscernibles, i.e., $d_u(P_1,P_2)=0$ if and only if $P_1=P_2$;
(ii) $d_u$ satisfies symmetry, that is, $d_u(P_1,P_2)=d_u(P_2,P_1)$ for any $P_1,P_2$;
(iii) $d_u$ satisfies non-negativity, i.e., $d_u(P_1,P_2)\ge 0$ for any $P_1,P_2$;
(iv) If $u$ is positively homogeneous, then $d_u$ satisfies the triangle inequality: $d_u(P_1,P_3)\le d_u(P_1,P_2)+d_u(P_2,P_3)$ for any $P_1,P_2,P_3$.
Proof. 
(i) Denote by $\Pi(P_1,P_2)$ the set of all distributions on $\mathcal{M}(\mathbb{R}^{2n})$ with marginals $P_1$ and $P_2$. To see the sufficiency, note that when $P_1=P_2=P$ and $\xi\sim P$, the set $\Pi(P_1,P_2)$ contains the joint distribution of $(\xi,\xi)$. Therefore, $d_u(P_1,P_2)=0$ as $S_u(\|\xi-\xi\|_p)=0$. To see the necessity, note that under the stated condition, $S_u$ satisfies convexity. By Theorem 4.2 of [18], we have $S_u\ge\mathbb{E}$, so that $d_W(P_1,P_2)\le d_u(P_1,P_2)=0$. Since the Wasserstein metric satisfies the identity of indiscernibles, we have $P_1=P_2$.
(ii) The symmetry follows immediately.
(iii) Note that $u(0)=0$ and $u$ is non-decreasing, so $u(x)\ge 0$ for every $x\ge 0$. As the norm is non-negative and $u$ is non-decreasing, we have $S_u(\|\xi_1-\xi_2\|_p)\ge 0$, which means that $d_u$ satisfies non-negativity.
(iv) From the definition, $S_u$ is convex and positively homogeneous as $u$ is convex and positively homogeneous. It follows that for any $\varepsilon>0$, there exist $\Pi_1\in\Pi(P_1,P_2)$ and $\Pi_2\in\Pi(P_2,P_3)$ such that
\[
d_u(P_1,P_2)\ge S_u^{\Pi_1}\big(\|\xi_1-\xi_2\|_p\big)-\varepsilon, \qquad d_u(P_2,P_3)\ge S_u^{\Pi_2}\big(\|\xi_2-\xi_3\|_p\big)-\varepsilon.
\]
From Theorem 6.10 of [19], there exist $\xi_1^*,\xi_2^*,\xi_3^*$ such that $(\xi_1^*,\xi_2^*)$ has the joint distribution $\Pi_1$ and $(\xi_2^*,\xi_3^*)$ has the joint distribution $\Pi_2$. As a result,
\[
d_u(P_1,P_2)+d_u(P_2,P_3) \ge S_u\big(\|\xi_1^*-\xi_2^*\|_p\big)+S_u\big(\|\xi_2^*-\xi_3^*\|_p\big)-2\varepsilon = \tfrac{1}{2}S_u\big(2\|\xi_1^*-\xi_2^*\|_p\big)+\tfrac{1}{2}S_u\big(2\|\xi_2^*-\xi_3^*\|_p\big)-2\varepsilon \ge S_u\big(\|\xi_1^*-\xi_3^*\|_p\big)-2\varepsilon \ge d_u(P_1,P_3)-2\varepsilon,
\]
where the equality is the result of the positive homogeneity of $S_u$, and the second-to-last inequality holds due to the convexity of $S_u$, the subadditivity of the norm and the monotonicity of $S_u$. Letting $\varepsilon\to 0$ yields the triangle inequality. □
Proposition 1 tells us that when $u$ is increasing, convex and positively homogeneous with $u(0)=0$, the metric $d_u$ satisfies all the desired properties of a distance metric.

2.2. Formulation of DRO Problems Based on the Shortfall–Wasserstein Metric

Based on the shortfall–Wasserstein metric, we define the following ambiguity set
\[
B_\varepsilon^{d_u}(\hat{P}_N) := \big\{ P\in\mathcal{M}(\mathbb{R}^n) : d_u(\hat{P}_N,P)\le\varepsilon \big\},
\]
which is called the shortfall–Wasserstein ball centered at the empirical distribution $\hat{P}_N$. We consider the following problem:
\[
\inf_{\omega\in\mathcal{D}} \sup_{P\in B_\varepsilon^{d_u}(\hat{P}_N)} \mathbb{E}_P\big[l(\omega^\top\xi)\big], \tag{2}
\]
where $\omega$ is the decision vector, $\mathcal{D}$ is the feasible set of the decision vector, and $l$ is the loss function. The above problem (2) is called the shortfall–Wasserstein data-driven distributionally robust optimization model (SW-DD-DRO).
To end the section, we present two applications of stochastic optimization (1): regression and risk minimization.

2.2.1. Regression

Consider a linear regression problem, in which our purpose is to find a linear predictor function $\beta^\top X$ with $\beta$ the regression coefficient vector. Our attention here is to find an accurate estimator of $\beta$ that is robust to adversarial perturbations of the data. Thus, distributionally robust regression models are constructed in the following form:
\[
\inf_{\beta\in\bar{\mathcal{D}}} \sup_{P\in B_\varepsilon^{d}(\hat{P}_N)} \mathbb{E}_P\big[l(Y-\beta^\top X)\big].
\]
In the literature, the loss function often takes the form $l(Y,X)=|Y-\beta^\top X|^p$ with $p\ge 1$, and the regression model is known as least-squares regression when $p=2$. For convenience, we denote $\xi := (Y,X^\top)^\top$ and $\mathcal{D} := \{(1,-\beta^\top)^\top : \beta\in\bar{\mathcal{D}}\}$; then, the problem can be reformulated in the form of (2).
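To make this change of variables concrete, the following Python sketch (with toy data of our own) checks that setting $\xi=(Y,X)$ and $\omega=(1,-\beta)$ turns the regression residual $Y-\beta^\top X$ into the generic form $\omega^\top\xi$ of problem (2):

```python
import numpy as np

# Map the regression loss l(Y - beta'X) into the generic form l(omega' xi):
# set xi = (Y, X) and omega = (1, -beta).
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 2))
Y = X @ np.array([2.0, -1.0]) + 0.1 * rng.normal(size=5)
beta = np.array([2.0, -1.0])

xi = np.column_stack([Y, X])            # xi_i = (Y_i, X_i)
omega = np.concatenate([[1.0], -beta])  # omega = (1, -beta)

residuals = xi @ omega                  # equals Y - X @ beta
print(np.allclose(residuals, Y - X @ beta))  # True
```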

2.2.2. Portfolio Optimization

If we denote by $\xi$ a random vector of returns of $n$ different financial assets and by $\omega$ the allocation vector, then the random variable $\omega^\top\xi$ is the total return of the portfolio. Thus, problem (1) can be viewed as a portfolio optimization problem. As an example, the risk measure $\rho_P(\omega^\top\xi) = \mathbb{E}_P\big[\big((\omega^\top\xi-c)_+\big)^p\big]$ is a well-known class of downside risk measures, as the loss function takes the form $l(X) = ((X-c)_+)^p$, $p\ge 1$.
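As a numerical illustration of this downside risk measure (toy numbers of our own), with realized portfolio values $\omega^\top\xi_i$, threshold $c=0$ and $p=2$:

```python
import numpy as np

# Empirical version of rho_P(omega' xi) = E_P[ ((omega' xi - c)_+)^p ]
# on three realized portfolio values (illustrative data only).
port = np.array([-1.0, 0.5, 2.0])   # realized values of omega' xi
c, p = 0.0, 2
risk = np.mean(np.maximum(port - c, 0.0) ** p)
print(risk)  # (0.5**2 + 2**2) / 3 = 1.4166...
```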

3. Shortfall–Wasserstein Data-Driven DRO

3.1. Reformulation of the Shortfall–Wasserstein DRO

To solve problem (2), the key step is to solve the inner maximization problem of (2); that is, for fixed $\omega\in\mathcal{D}$ and $\varepsilon\ge 0$,
\[
\sup_{P\in\mathcal{M}(\mathbb{R}^n)} \mathbb{E}_P\big[l(\omega^\top\xi)\big] \quad \text{subject to} \quad d_u(P,\hat{P}_N)\le\varepsilon. \tag{4}
\]
By the definition of the metric, the above problem can be rewritten as
\[
\begin{aligned}
\sup_{P\in\mathcal{M}(\mathbb{R}^n)}\quad & \mathbb{E}_P\big[l(\omega^\top\xi)\big]\\
\text{subject to}\quad & S_u^{\Pi}\big(\|\xi-\hat\xi\|_p\big)\le\varepsilon,\\
& \Pi\in\mathcal{M}(\mathbb{R}^n\times\mathbb{R}^n) \text{ is a joint distribution of } \xi \text{ and } \hat\xi \text{ with marginals } P \text{ and } \hat{P}_N\text{, respectively}.
\end{aligned} \tag{5}
\]
As $S_u(\cdot)$ satisfies translation invariance, for fixed $\varepsilon\ge 0$ the constraint $S_u^{\Pi}(\|\xi-\hat\xi\|_p)\le\varepsilon$ is equivalent to $S_u^{\Pi}(\|\xi-\hat\xi\|_p-\varepsilon)\le 0$, that is, $\mathbb{E}_{\Pi}[u(\|\xi-\hat\xi\|_p-\varepsilon)]\le 0$. Therefore, we can further rewrite problem (5) as
\[
\begin{aligned}
\sup_{P\in\mathcal{M}(\mathbb{R}^n)}\quad & \mathbb{E}_P\big[l(\omega^\top\xi)\big]\\
\text{subject to}\quad & \mathbb{E}_{\Pi}\big[u(\|\xi-\hat\xi\|_p-\varepsilon)\big]\le 0,\\
& \Pi\in\mathcal{M}(\mathbb{R}^n\times\mathbb{R}^n) \text{ is a joint distribution of } \xi \text{ and } \hat\xi \text{ with marginals } P \text{ and } \hat{P}_N\text{, respectively}.
\end{aligned} \tag{6}
\]
To further simplify this problem, we first define the following notation:
\[
B_{p,\varepsilon}^{n}(P_0) := \big\{ P : \mathbb{E}_{\Pi}\big[u(\|\xi-\xi_0\|_p-\varepsilon)\big]\le 0,\ \Pi(d\xi,\mathbb{R}^n)=P(d\xi),\ \Pi(\mathbb{R}^n,d\xi_0)=P_0(d\xi_0)\big\} \tag{7}
\]
and
\[
B_{\omega,p,\varepsilon}(P_0) := \big\{ F_{\omega^\top Z} : F_Z\in B_{p,\varepsilon}^{n}(P_0)\big\}. \tag{8}
\]
We study the projection result of the shortfall–Wasserstein ball. Throughout the following subsections, $\mathcal{U}$ denotes the set of all strictly increasing continuous functions $u$ on $\mathbb{R}$ with $u(0)=0$. In the following theorem, we show that the constraint of problem (6) can be conveniently converted to a univariate setting.
Theorem 1. 
Assume a random vector $X$ with distribution function $F_X\in\mathcal{M}(\mathbb{R}^n)$, $u\in\mathcal{U}$, $p>1$ and $1/p+1/q=1$. Then $B_{\omega,p,\varepsilon}(F_X)=B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})$ holds for any $\varepsilon\ge 0$ and $\omega\in\mathbb{R}^n$ if and only if there exist $\beta,a_1,a_2>0$ such that
\[
u(x)=\begin{cases} a_1 x^{\beta}, & x\ge 0,\\ -a_2(-x)^{\beta}, & x<0. \end{cases} \tag{9}
\]
Proof. 
To see the sufficiency, assume that $u(\cdot)$ is given by (9) with $\beta,a_1,a_2>0$. Then one can verify that $u(\lambda x)=\lambda^{\beta}u(x)$ for any $x\in\mathbb{R}$ and $\lambda\ge 0$. For any $F\in B_{\omega,p,\varepsilon}(F_X)$, there exists $Z\in\mathbb{R}^n$ such that $F_Z\in B_{p,\varepsilon}^{n}(F_X)$ satisfies $\mathbb{E}_{\Pi}[u(\|Z-X\|_p-\varepsilon)]\le 0$ and $F=F_{\omega^\top Z}$. Due to the increasing property of $u(\cdot)$ and the Hölder inequality, we have
\[
\mathbb{E}\big[u(|\omega^\top Z-\omega^\top X|-\|\omega\|_q\varepsilon)\big] \le \mathbb{E}\big[u(\|\omega\|_q\|Z-X\|_p-\|\omega\|_q\varepsilon)\big] = \|\omega\|_q^{\beta}\,\mathbb{E}\big[u(\|Z-X\|_p-\varepsilon)\big] \le 0.
\]
Thus, $F\in B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})$, and hence $B_{\omega,p,\varepsilon}(F_X)\subseteq B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})$.
The converse direction of the set inclusion is presented next. For any $F\in B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})$, there exists $Z\sim F$ satisfying $\mathbb{E}[u(|Z-\omega^\top X|-\|\omega\|_q\varepsilon)]\le 0$. Let $Y=Z-\omega^\top X$ and $\tilde{Z}=X+(\omega^{q/p}Y)/(\omega^\top\omega^{q/p})$, in which $\omega^{q/p}$ is defined as
\[
\omega^{q/p} := \big(\mathrm{sign}(\omega_1)|\omega_1|^{q/p},\ldots,\mathrm{sign}(\omega_n)|\omega_n|^{q/p}\big)^\top,
\]
and $\mathrm{sign}:\mathbb{R}\to\{-1,1\}$ is the sign function. Thus, we compute $\omega^\top\omega^{q/p} = \sum_{i=1}^n|\omega_i|\,|\omega_i|^{q/p} = \sum_{i=1}^n|\omega_i|^{1+q/p} = \sum_{i=1}^n|\omega_i|^{q} = \|\omega\|_q^{q}$ and $\|\omega^{q/p}Y\|_p = \big(\sum_{i=1}^n|\omega_i|^{q}|Y|^{p}\big)^{1/p} = |Y|\big(\sum_{i=1}^n|\omega_i|^{q}\big)^{1/p} = |Y|\,\|\omega\|_q^{q/p}$. Then,
\[
\mathbb{E}\big[u(\|\tilde{Z}-X\|_p-\varepsilon)\big] = \mathbb{E}\Big[u\Big(\big\|(\omega^{q/p}Y)/(\omega^\top\omega^{q/p})\big\|_p-\varepsilon\Big)\Big] = \mathbb{E}\Big[u\Big(\tfrac{1}{\|\omega\|_q^{q}}\|\omega^{q/p}Y\|_p-\varepsilon\Big)\Big] = \mathbb{E}\Big[u\Big(\tfrac{\|\omega\|_q^{q/p}}{\|\omega\|_q^{q}}|Y|-\varepsilon\Big)\Big] = \mathbb{E}\Big[u\Big(\tfrac{1}{\|\omega\|_q}|Y|-\varepsilon\Big)\Big] = \tfrac{1}{\|\omega\|_q^{\beta}}\,\mathbb{E}\big[u(|Z-\omega^\top X|-\|\omega\|_q\varepsilon)\big] \le 0,
\]
where the fourth equality comes from $q(1-1/p)=1$. As $\omega^\top\tilde{Z}=Z$, we obtain $F\in B_{\omega,p,\varepsilon}(F_X)$. This implies $B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})\subseteq B_{\omega,p,\varepsilon}(F_X)$. Hence, we conclude that $B_{\omega,p,\varepsilon}(F_X)=B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})$.
To see the necessity, suppose $B_{\omega,p,\varepsilon}(F_X)=B^1_{\|\omega\|_q\varepsilon}(F_{\omega^\top X})$ holds for any $\varepsilon\ge 0$, $\omega\in\mathbb{R}^n$. From (7) and (8), we obtain that
\[
\mathbb{E}\big[u(\|Z-X\|_p-\varepsilon)\big]\le 0 \iff \mathbb{E}\big[u(|\omega^\top Z-\omega^\top X|-\|\omega\|_q\varepsilon)\big]\le 0, \quad \forall\,\omega\in\mathbb{R}^n. \tag{10}
\]
Specifically, choose $Z-X=Ye$, where $Y$ is a random variable on $\mathbb{R}$ and $e\in\mathbb{R}^n$ with $\|e\|_p=1$. Let $\omega=\lambda e/\|e\|_q$ with $\lambda>0$; then we have
\[
\mathbb{E}\big[u(|\omega^\top Z-\omega^\top X|-\|\omega\|_q\varepsilon)\big] = \mathbb{E}\Big[u\Big(\frac{\lambda|e^\top e|}{\|e\|_q}|Y|-\lambda\varepsilon\Big)\Big] = \mathbb{E}\big[u(\lambda|Y|-\lambda\varepsilon)\big],
\]
where the second equality comes from $|e^\top e|=\|e\|_p\|e\|_q$ and $\|e\|_p=1$. Noting that $\mathbb{E}[u(\|Z-X\|_p-\varepsilon)]=\mathbb{E}[u(|Y|-\varepsilon)]$, from (10) we obtain that $\mathbb{E}[u(\lambda|Y|-\lambda\varepsilon)]\le 0$ if and only if $\mathbb{E}[u(|Y|-\varepsilon)]\le 0$ for any $\lambda>0$. By the arbitrariness of the random variable $Y$ and of $\varepsilon\ge 0$, we obtain, for any random variable $X$ on $\mathbb{R}$,
\[
\mathbb{E}[u(X)]\le 0 \iff \mathbb{E}[u(\lambda X)]\le 0, \quad \forall\,\lambda>0.
\]
It follows that $S_u(\cdot)$ is positively homogeneous. By Proposition 2.9 of [20], we obtain the representation (9). This completes the proof. □
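The piecewise power form (9) is exactly the class for which $u(\lambda x)=\lambda^{\beta}u(x)$ for all $\lambda\ge 0$, the homogeneity that drives the projection argument above. A quick numerical check (parameter values of our own):

```python
def make_u(a1, a2, beta):
    """Utility of the form (9): u(x) = a1*x**beta for x >= 0 and
    u(x) = -a2*(-x)**beta for x < 0; strictly increasing with u(0) = 0."""
    def u(x):
        return a1 * x ** beta if x >= 0 else -a2 * (-x) ** beta
    return u

u = make_u(a1=1.0, a2=2.0, beta=1.5)
lam, x = 3.0, -0.7
# Degree-beta positive homogeneity: u(lam * x) == lam**beta * u(x).
print(abs(u(lam * x) - lam ** 1.5 * u(x)) < 1e-9)  # True
```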
By Theorem 1, a multi-dimensional ball can be affinely projected onto a one-dimensional ball; thus, the multi-dimensional constraint of (6) can be simplified to the following one-dimensional one, which makes the problem much more tractable.
Corollary 1. 
Under the conditions of Theorem 1, for $u$ given by (9), problem (6) is equivalent to
\[
\sup_{G\in\mathcal{M}(\mathbb{R})} \mathbb{E}_G\big[l(Y)\big] \quad \text{subject to} \quad \mathbb{E}_G\big[u(|Y-\omega^\top\hat\xi|-\|\omega\|_q\varepsilon)\big]\le 0. \tag{11}
\]

3.2. Finite Sample Guarantee

Next, we demonstrate that SW-DD-DRO enjoys a finite sample guarantee. In practice, the optimal solution $\hat\omega_N$ is constructed from the training dataset $\hat{\Xi}_N$, but it is the out-of-sample performance of the optimizer that is of primary interest. As the true distribution $P_0$ is unknown, the out-of-sample performance is hard to evaluate directly. Thus, we hope to establish a tight upper bound that provides a performance guarantee for the solution, as our main concern is to control the costs from above. To clarify our analysis, we first introduce some notation and assumptions.
  • $V^*$: the optimal risk we target, i.e.,
\[
V^* := \inf_{\omega\in\mathcal{D}} \mathbb{E}_{P_0}\big[l(\omega^\top\xi)\big],
\]
    where $P_0$ is the true distribution.
  • $\hat{V}_N$: the in-sample risk achieved by SW-DD-DRO, i.e.,
\[
\hat{V}_N := \inf_{\omega\in\mathcal{D}} \sup_{P\in B^{d_u}_{\varepsilon_u}(\hat{P}_N)} \mathbb{E}_P\big[l(\omega^\top\xi)\big], \tag{13}
\]
    where $\hat{P}_N$ is the empirical distribution.
  • $\hat{V}^*$: the out-of-sample risk achieved by the SW-DD-DRO solution, i.e.,
\[
\hat{V}^* := \mathbb{E}_{P_0}\big[l(\hat\omega_N^\top\xi)\big],
\]
    where $\hat\omega_N$ is an optimal solution of problem (13).
From (11), the ambiguity set of the inner maximization problem of (13) can be reformulated as
\[
B^1_{\|\omega\|_q\varepsilon_u}(F_{\omega^\top\hat\xi}) := \big\{ G\in\mathcal{M}(\mathbb{R}) : \mathbb{E}_G\big[u(|Y-\omega^\top\hat\xi|-\|\omega\|_q\varepsilon_u)\big]\le 0 \big\}.
\]
To establish the result of finite sample guarantee, the following assumptions are needed.
Assumption 1 
(Light-tailed distribution). With $\beta$ identified in the function $u(\cdot)$ defined in Theorem 1, there exists an exponent $a>\beta$ such that $A := \mathbb{E}_{P_0}\big[\exp(\|\xi\|_p^{a})\big] = \int\exp(\|\xi\|_p^{a})\,P_0(d\xi) < \infty$.
Assumption 2. 
The feasible set $\mathcal{D}\subseteq\mathbb{R}^n$ is bounded such that $H_{\mathcal{D}} := \sup_{\omega\in\mathcal{D}}\|\omega\|_q < +\infty$, where $1/p+1/q=1$.
Assumption 3. 
The feasible set $\mathcal{D}\subseteq\mathbb{R}^n$ is bounded away from the origin such that $L_{\mathcal{D}} := \inf_{\omega\in\mathcal{D}}\|\omega\|_q > 0$, where $1/p+1/q=1$.
Assumption 1 is a common light-tail condition that restricts the rate of decay of the tail of the distribution $P_0$. Assumptions 2 and 3 are requirements on the feasible set $\mathcal{D}$, which are also imposed in [21,22].
Proposition 2 
(Finite sample guarantee). Suppose Assumptions 1–3 are in force and $u(\cdot)$ has the form defined in Theorem 1. Let $\hat{V}_N$ and $\hat\omega_N$ denote the optimal value and an optimizer of problem (13). Define the radius of the ambiguity set as follows, with $\eta\in(0,1)$ and a constant $C$ depending on $a_1,a_2,\beta$:
\[
\varepsilon_N^{u}(\eta) = \begin{cases} C^{-1/\beta}L_{\mathcal{D}}^{-1}\Big(\dfrac{\log(c_1\eta^{-1})}{c_2 N}\Big)^{1/(2\beta)}, & \text{if } N\ge\dfrac{\log(c_1\eta^{-1})}{c_2},\\[2mm] C^{-1/\beta}L_{\mathcal{D}}^{-1}\Big(\dfrac{\log(c_1\eta^{-1})}{c_2 N}\Big)^{1/a}, & \text{if } N<\dfrac{\log(c_1\eta^{-1})}{c_2}, \end{cases}
\]
where $c_1,c_2$ depend only on $a$, $A$, $H_{\mathcal{D}}$ and $\beta$. Then the finite sample guarantee holds:
\[
\mathbb{P}^N\{\hat{V}^*\le\hat{V}_N\} \ge 1-\eta.
\]
Proof. 
Denote $I_1 := \mathbb{I}\{X\ge\varepsilon\}$ and $I_2 := \mathbb{I}\{X<\varepsilon\}$. For all nonnegative random variables $X$,
\[
\mathbb{E}[u(X-\varepsilon)] = \mathbb{E}[u(X-\varepsilon)I_1] + \mathbb{E}[u(X-\varepsilon)I_2] = a_1\mathbb{E}[(X-\varepsilon)^{\beta}I_1] - a_2\mathbb{E}[(\varepsilon-X)^{\beta}I_2] \le a_1\mathbb{E}[X^{\beta}I_1] - a_2\mathbb{E}[(\varepsilon-X)^{\beta}I_2] = a_1\mathbb{E}[X^{\beta}] - \mathbb{E}\big[(a_1X^{\beta}+a_2(\varepsilon-X)^{\beta})I_2\big].
\]
Define $f(x) = a_1x^{\beta} + a_2(\varepsilon-x)^{\beta}$ for $x\in[0,\varepsilon)$; then $f'(x) = a_1\beta x^{\beta-1} - a_2\beta(\varepsilon-x)^{\beta-1}$ and $f''(x) = a_1\beta(\beta-1)x^{\beta-2} + a_2\beta(\beta-1)(\varepsilon-x)^{\beta-2}$. Solving the equation $f'(x)=0$, we obtain
\[
x_0 = \frac{a_2^{1/(\beta-1)}}{a_1^{1/(\beta-1)}+a_2^{1/(\beta-1)}}\,\varepsilon.
\]
Denote $c_0 := a_2^{1/(\beta-1)}/(a_1^{1/(\beta-1)}+a_2^{1/(\beta-1)})$; then $x_0\in(0,\varepsilon)$ as $0<c_0<1$. When $0<\beta<1$, we obtain $f''(x)<0$, so $f$ is concave; on the interval $[0,\varepsilon)$, $f$ attains its infimum at the endpoints, which means $f(x)\ge\min\{a_1,a_2\}\varepsilon^{\beta}$. When $\beta>1$, we have $f''(x)>0$, so $f$ is convex and attains its infimum at $x_0$: $f(x)\ge f(c_0\varepsilon) = (a_1c_0^{\beta}+a_2(1-c_0)^{\beta})\varepsilon^{\beta}$. Then $f(x)\ge\min\{a_1c_0^{\beta}+a_2(1-c_0)^{\beta},\min\{a_1,a_2\}\}\varepsilon^{\beta}$ holds for all $\beta>0$. Denoting $C := \min\{a_1c_0^{\beta}+a_2(1-c_0)^{\beta},\min\{a_1,a_2\}\}/a_1$, we therefore have $\mathbb{E}[(a_1X^{\beta}+a_2(\varepsilon-X)^{\beta})I_2]\ge a_1C\varepsilon^{\beta}$, and
\[
\mathbb{E}[u(X-\varepsilon)] \le a_1\mathbb{E}[X^{\beta}] - a_1C\varepsilon^{\beta}.
\]
As a result, for any $\varepsilon>0$, $\mathbb{E}[u(X-\varepsilon)]\le 0$ if $\mathbb{E}[X^{\beta}]\le C\varepsilon^{\beta}$. The $\beta$-type Wasserstein metric is defined as
\[
d_{W_\beta}(P_1,P_2) := \inf\Big\{ \mathbb{E}_\Pi\big[\|\xi_1-\xi_2\|_p^{\beta}\big] \;\Big|\; \Pi \text{ is a joint distribution of } \xi_1 \text{ and } \xi_2 \text{ with marginals } P_1 \text{ and } P_2\text{, respectively}\Big\}.
\]
Therefore, we have
\[
B^{d_{W_\beta}}_{C\varepsilon^{\beta}}(F_{\omega^\top\hat\xi}) \subseteq B^{d_u}_{\varepsilon}(F_{\omega^\top\hat\xi}). \tag{17}
\]
By Assumption 1, $a>\beta$ and, for $\xi_0\sim P_0$, $A := \mathbb{E}_{P_0}[\exp(\|\xi_0\|_p^{a})]<\infty$. We have
\[
\mathbb{E}_{F_0}\big[\exp\big(H_{\mathcal{D}}^{-a}|\omega^\top\xi_0|^{a}\big)\big] \le \mathbb{E}_{P_0}\big[\exp(\|\xi_0\|_p^{a})\big] < \infty,
\]
where $F_0 := F_{\omega^\top\xi_0}$ and the first inequality is due to the Hölder inequality. The above inequality implies that $F_{\omega^\top\xi_0}$ satisfies the condition of Theorem 2 of [23]. As the corresponding empirical distribution of $F_{\omega^\top\xi_0}$ is $F_{\omega^\top\hat\xi}$, we know that the finite sample guarantee holds when $d_{W_\beta}$ is applied to summarize the transportation cost random variable and $\varepsilon_N^{W}(\eta)$ is specified; that is,
\[
\mathbb{P}\Big( F_{\omega^\top\xi_0}\in B^{d_{W_\beta}}_{\varepsilon_N^{W}}(F_{\omega^\top\hat\xi}) \Big) \ge 1-\eta, \quad \text{where } \varepsilon_N^{W}(\eta) := \begin{cases} \Big(\dfrac{\log(c_1\eta^{-1})}{c_2 N}\Big)^{1/2}, & \text{if } N\ge\dfrac{\log(c_1\eta^{-1})}{c_2},\\[2mm] \Big(\dfrac{\log(c_1\eta^{-1})}{c_2 N}\Big)^{\beta/a}, & \text{if } N<\dfrac{\log(c_1\eta^{-1})}{c_2}, \end{cases}
\]
where $c_1,c_2$ depend only on $a$, $A$, $H_{\mathcal{D}}$ and $\beta$. We define $\varepsilon_N^{u}(\eta)$ for the shortfall–Wasserstein metric $d_u$ by
\[
\varepsilon_N^{u}(\eta) := \frac{1}{L_{\mathcal{D}}}\Big(\frac{\varepsilon_N^{W}(\eta)}{C}\Big)^{1/\beta} = \begin{cases} C^{-1/\beta}L_{\mathcal{D}}^{-1}\Big(\dfrac{\log(c_1\eta^{-1})}{c_2 N}\Big)^{1/(2\beta)}, & \text{if } N\ge\dfrac{\log(c_1\eta^{-1})}{c_2},\\[2mm] C^{-1/\beta}L_{\mathcal{D}}^{-1}\Big(\dfrac{\log(c_1\eta^{-1})}{c_2 N}\Big)^{1/a}, & \text{if } N<\dfrac{\log(c_1\eta^{-1})}{c_2}. \end{cases}
\]
From (17), we obtain
\[
\mathbb{P}(\hat{V}^*\le\hat{V}_N) \ge \mathbb{P}\big(P_0\in B^{d_u}_{\varepsilon_N^{u}}(\hat{P}_N)\big) = \mathbb{P}\Big(F_{\omega^\top\xi_0}\in B^{d_u}_{\|\omega\|_q\varepsilon_N^{u}}(F_{\omega^\top\hat\xi})\Big) \ge \mathbb{P}\Big(F_{\omega^\top\xi_0}\in B^{d_{W_\beta}}_{(\|\omega\|_q/L_{\mathcal{D}})^{\beta}\varepsilon_N^{W}}(F_{\omega^\top\hat\xi})\Big) \ge \mathbb{P}\Big(F_{\omega^\top\xi_0}\in B^{d_{W_\beta}}_{\varepsilon_N^{W}}(F_{\omega^\top\hat\xi})\Big) \ge 1-\eta,
\]
where $\xi_0\sim P_0$ and $\hat\xi\sim\hat{P}_N$. The first inequality comes from the fact that $P_0\in B^{d_u}_{\varepsilon_N^{u}}(\hat{P}_N)$ implies $\hat{V}^*\le\hat{V}_N$. This completes the proof. □
We show in Proposition 2 that, when the radius of the ambiguity set is properly calibrated, the out-of-sample performance of $\hat\omega_N$ can be bounded by the optimal value $\hat{V}_N$ with a prescribed confidence level. It is noteworthy that the order of the radius $\varepsilon$ in this paper is $O(N^{-1/(2\beta)})$, independent of the dimension of the nominal distribution, while the order of the radius suffers severely from the curse of dimensionality in [4].
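To illustrate the dimension-free decay of the radius, the schedule of Proposition 2 can be coded directly. The constants $c_1,c_2$ and $C$ are not available here in closed form, so the values below are placeholders of our own:

```python
import math

def radius(N, eta, beta, a, C, c1, c2, L_D):
    """epsilon_N^u(eta) from Proposition 2.  c1, c2 (concentration constants)
    and C are unspecified by the statement; pass illustrative values."""
    r = math.log(c1 / eta) / c2
    expo = 1.0 / (2.0 * beta) if N >= r else 1.0 / a
    return C ** (-1.0 / beta) / L_D * (r / N) ** expo

# For beta = 1 the radius decays like N**(-1/2), whatever the dimension n:
args = dict(eta=0.05, beta=1.0, a=2.0, C=0.5, c1=2.0, c2=1.0, L_D=1.0)
print(radius(N=40000, **args) / radius(N=10000, **args))  # ~0.5
```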

4. Worst-Case Expectation under the Shortfall–Wasserstein Metric

This section studies the tractability of solving (4). For notational convenience, denote $c(x,y) := u(|y-x|-\varepsilon^*)$, and let $X$ have a known probability distribution $\mu\in\mathcal{P}(\mathbb{R})$. We define the primal problem in the following form:
\[
\bar{I} := \sup\Big\{ \int l(y)\,d\pi(x,y) : \pi\in\Phi_\mu \Big\},
\]
where $\Phi_\mu := \{\pi\in\bigcup_{\nu\in\mathcal{P}(\mathbb{R})}\Pi(\mu,\nu) : \int c(x,y)\,d\pi(x,y)\le 0\}$, and we write $I(\pi) := \int l(y)\,d\pi(x,y)$. Let $C := \{(x,y)\in\mathbb{R}\times\mathbb{R} : c(x,y)<+\infty\}$ and $\Lambda_{c,l} := \{(\lambda,\varphi) : \lambda\ge 0,\ \varphi\in m_{\mathcal{U}}(\mathbb{R};\bar{\mathbb{R}}),\ \varphi(x)+\lambda c(x,y)\ge l(y) \text{ for all } (x,y)\in C\}$ [15]. For every such $(\lambda,\varphi)\in\Lambda_{c,l}$, define
\[
J(\lambda,\varphi) := \int_C \varphi\,d\mu.
\]
For any $\pi\in\Phi_\mu$ and $(\lambda,\varphi)\in\Lambda_{c,l}$, we have $\pi(C)=1$ as $\int c\,d\pi$ is finite. As a result, for every measurable $g:(\mathbb{R}\times\mathbb{R},\mathcal{B}(\mathbb{R}\times\mathbb{R}))\to(\bar{\mathbb{R}},\mathcal{B}(\bar{\mathbb{R}}))$, $\int_C g\,d\pi = \int g\,d\pi$. Thus we have
\[
J(\lambda,\varphi) = \int_C\varphi\,d\mu \ge \int_C\big(l(y)-\lambda c(x,y)\big)\,d\pi(x,y) = \int l(y)\,d\pi(x,y) - \lambda\int c(x,y)\,d\pi(x,y) \ge \int l(y)\,d\pi(x,y) = I(\pi).
\]
Following the convention in optimization theory, we refer to the following problem as the dual of the primal problem:
\[
\underline{J} := \inf\Big\{ \int\varphi\,d\mu : (\lambda,\varphi)\in\Lambda_{c,l} \Big\}.
\]
Consequently, weak duality holds:
\[
\underline{J} \ge \bar{I}.
\]
To further identify the equivalence between $\bar{I}$ and $\underline{J}$, for every $\lambda\ge 0$, we define $\varphi_\lambda:\mathbb{R}\to\bar{\mathbb{R}}$ as follows:
\[
\varphi_\lambda(x) := \begin{cases} \sup_{y\in\mathbb{R}}\{l(y)-\lambda c(x,y)\}, & \text{if } \lambda>0,\\ \sup_{y\in\mathbb{R}}\{l(y) : c(x,y)<+\infty\}, & \text{if } \lambda=0. \end{cases} \tag{20}
\]
To simplify the notation, we write $\lambda c(x,y)=+\infty$ whenever $\lambda=0$ and $c(x,y)=+\infty$; thus, we have $\varphi_\lambda(x) = \sup_{y\in\mathbb{R}}\{l(y)-\lambda c(x,y)\}$ for all $x\in\mathbb{R}$. In Theorem 2, we show that, for loss functions $l$ satisfying Assumption 4, $\bar{I}$ equals $\underline{J}$.
Assumption 4. 
The function $l:\mathbb{R}\to\mathbb{R}$ is upper semicontinuous with $l\in L^1(d\mu)$.
Theorem 2. 
Under Assumption 4 and $c(x,y)=u(|y-x|-\varepsilon^*)$, we can conclude:
(a) $\bar{I} = \underline{J}$;
(b) A dual optimizer of the form $(\lambda^*,\varphi_{\lambda^*})$ exists for some $\lambda^*\ge 0$ and $\varphi_{\lambda^*}(\cdot)$ defined as in (20). Moreover, feasible solutions $\pi^*$ and $(\lambda^*,\varphi_{\lambda^*})$ are optimizers of the primal and dual problems, satisfying $I(\pi^*) = J(\lambda^*,\varphi_{\lambda^*})$, if and only if
\[
l(y)-\lambda^* c(x,y) = \sup_{z\in\mathbb{R}}\{l(z)-\lambda^* c(x,z)\}, \quad \pi^*\text{-a.s.}, \tag{21}
\]
\[
\lambda^*\int c(x,y)\,d\pi^*(x,y) = 0. \tag{22}
\]
Additionally, if the primal optimizer $\pi^*$ exists and, for $\mu$-almost every $x\in\mathbb{R}$, there is exactly one $y\in\mathbb{R}$ attaining the supremum in $\sup_{y\in\mathbb{R}}\{l(y)-\lambda^* c(x,y)\}$, then $\pi^*$ is unique.
Remark 1. 
From (21), if the optimal measure $\pi^*$ exists, then $\pi^*\big\{(x,y)\in\mathbb{R}\times\mathbb{R} : y\in\arg\max_{z\in\mathbb{R}}\{l(z)-\lambda^* c(x,z)\}\big\} = 1$, which means the worst-case joint probability is identified by a transport plan that moves mass from $x$ to an optimizer of the local optimization problem $\sup_{z\in\mathbb{R}}\{l(z)-\lambda^* c(x,z)\}$. Furthermore, when $\lambda^*>0$, we obtain $\int c(x,y)\,d\pi^*(x,y) = 0$.
Remark 2. 
If there exists a subset $A\subseteq\mathbb{R}$ with $\mu(A)>0$ such that, for any $x\in A$, the loss $l(y)$ grows to $+\infty$ faster than $c(x,y)$, then $\bar{I} = \underline{J} = +\infty$. The reason is that for every $\lambda\ge 0$ and $x\in A$, $\varphi_\lambda(x) = \sup_{z\in\mathbb{R}}\{l(z)-\lambda c(x,z)\} = +\infty$; thus, $\int\varphi_\lambda(x)\,d\mu(x) = +\infty$. Therefore, it may sometimes be necessary to require that the loss function $l$ not grow faster than $c$.
The proof of Theorem 2 is provided in Appendix A.
Corollary 2. 
Under Assumption 4 and $c(X,Y)=u(|Y-X|-\varepsilon^*)$, we can conclude
\[
\bar{I} = \inf_{\lambda\ge 0}\ \mathbb{E}_\mu\Big[\sup_{y\in\mathbb{R}}\big\{l(y)-\lambda u(|y-X|-\varepsilon^*)\big\}\Big]. \tag{23}
\]
Proof. 
From the proof of Theorem 2, we can see that there always exists $\lambda^*\in[0,+\infty)$ such that $\underline{J} = \inf_{\lambda\ge 0}\{\int\varphi_\lambda(x)\,d\mu(x)\} = \int\varphi_{\lambda^*}(x)\,d\mu(x)$. Since $\bar{I} = \underline{J}$, we conclude
\[
\bar{I} = \underline{J} = \inf_{\lambda\ge 0}\int\varphi_\lambda(x)\,d\mu(x) = \inf_{\lambda\ge 0}\ \mathbb{E}_\mu\Big[\sup_{y\in\mathbb{R}}\{l(y)-\lambda c(X,y)\}\Big] = \inf_{\lambda\ge 0}\ \mathbb{E}_\mu\Big[\sup_{y\in\mathbb{R}}\big\{l(y)-\lambda u(|y-X|-\varepsilon^*)\big\}\Big].
\]
The proof is complete. □
Remark 3. 
It is of great significance that the optimal value of the multi-dimensional primal problem can be obtained by focusing on the univariate reformulation on the right-hand side of (23). Moreover, the right-hand side of (23) is completely characterized given X := ω⊤ξ̂ and the training dataset {ξ̂_i}_{i≤N}.
Looking back at problem (11), with X taking values in {ω⊤ξ̂_1, ω⊤ξ̂_2, …, ω⊤ξ̂_N} with equal probability and ε* := ‖ω‖_q ε, we obtain
I¯_N = inf_{λ≥0} (1/N) ∑_{i=1}^N sup_{y∈ℝ} {l(y) − λ u(|y − ω⊤ξ̂_i| − ‖ω‖_q ε)}.
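This univariate program can be approximated by brute force on grids. The sketch below uses only illustrative assumptions: l(y) = |y|, the piecewise-linear u(t) = αt_+ − βt_− of Example 1 below, two projected samples standing in for ω⊤ξ̂_i, and ‖ω‖_q ε = 0.5.

```python
import numpy as np

def shortfall_dual(a, q_eps, alpha, beta):
    """Grid approximation of I_N: inf over lambda >= 0 of the empirical average of
    sup_y { |y| - lambda*u(|y - a_i| - q_eps) }, where u(t) = alpha*t_+ - beta*t_-.
    A rough numerical sketch, not an exact solver."""
    lam_grid = np.linspace(0.0, 5.0, 1001)
    y = np.linspace(-50.0, 50.0, 20001)
    u = lambda t: alpha * np.maximum(t, 0.0) + beta * np.minimum(t, 0.0)
    best = np.inf
    for lam in lam_grid:
        val = np.mean([np.max(np.abs(y) - lam * u(np.abs(y - ai) - q_eps)) for ai in a])
        best = min(best, val)
    return best

# two projected samples omega^T xi_i (assumed values)
approx = shortfall_dual(np.array([1.0, -2.0]), q_eps=0.5, alpha=0.5, beta=1.0)
```

For these inputs the grid value is close to 2.5, matching the closed form (1/N)∑|ω⊤ξ̂_i| + (β/α)‖ω‖_q ε derived in Example 1 below.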
Indeed, this result implies that the multi-dimensional optimization problem in (5) can also be solved by working with the completely characterized reformulation (24). Consider the problem with the ambiguity set constructed from the classic Wasserstein metric (the norm defining the metric is again ‖·‖_p with p > 1),
inf ω D sup P B ε d W ( P ^ N ) E P l ( ω ξ ) .
Denoting the inner maximum problem by I¯_N^W = sup_{P ∈ B_ε^{d_W}(P̂_N)} E_P[l(ω⊤ξ)], we can obtain a similar result by applying Theorem 7 of [10] and Theorem 1 of [15], that is,
I¯_N^W = inf_{λ≥0} {λ‖ω‖_q ε + (1/N) ∑_{i=1}^N sup_{y∈ℝ} {l(y) − λ|y − ω⊤ξ̂_i|}}.
Comparing the formulas above, we find that I¯_N = I¯_N^W when u(x) = x_+ − x_− = x. The result (24) is much more flexible than (25), as the form of u(·) is optional.
To elaborate on the way our results help to solve practical problems, several simulations are introduced in the following.

5. Simulation

In this section, we show how Corollary 2 applies to regression and portfolio optimization problems. Moreover, several simulations are conducted to investigate the performance of our model.

5.1. Regression Model

As mentioned in Section 2, our result can help to find an accurate estimator of the regression coefficient vector in linear regression problems
inf_{β∈D̄} sup_{P ∈ B_ε^d(P̂_N)} E_P[l(Y − β⊤X)],
with ξ := (Y, X) and D := {(1, −β): β ∈ ℝ^{d−1}}. In the literature, loss functions of the form {l: l(Y, X) = |Y − β⊤X|^p, p ≥ 1} are widely studied, especially when p = 1 and p = 2. To elaborate on how this result resolves the tractability of regression programs, an example is presented below.
Example 1. 
The Least Absolute Deviation (LAD) regression model seeks the regression coefficient estimator that minimizes the sum of absolute residuals ∑_{i=1}^N |y_i − x_i⊤β|. Take l(y) := |y|, and more specifically, take ε = 1 and u(x) = αx_+ − βx_−, 0 < α < β. Then we have
I¯_N = inf_{λ≥0} (1/N) ∑_{i=1}^N sup_{y∈ℝ} {l(y) − λ u(|y − ω⊤ξ̂_i| − ‖ω‖_q)} = inf_{λ≥0} (1/N) ∑_{i=1}^N sup_{y∈ℝ} {|y| − αλ(|y − ω⊤ξ̂_i| − ‖ω‖_q)_+ + βλ(|y − ω⊤ξ̂_i| − ‖ω‖_q)_−}.
Let h_i(y) = |y| − αλ(|y − ω⊤ξ̂_i| − ‖ω‖_q)_+ + βλ(|y − ω⊤ξ̂_i| − ‖ω‖_q)_−; then h_i(y) can be broken down into the following form:
h_i(y) = |y| − αλ(y − ω⊤ξ̂_i − ‖ω‖_q) for y ≥ ω⊤ξ̂_i + ‖ω‖_q; |y| + βλ(‖ω‖_q − |y − ω⊤ξ̂_i|) for ω⊤ξ̂_i − ‖ω‖_q < y < ω⊤ξ̂_i + ‖ω‖_q; |y| − αλ(ω⊤ξ̂_i − y − ‖ω‖_q) for y ≤ ω⊤ξ̂_i − ‖ω‖_q.
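This three-branch decomposition can be checked numerically against the direct definition (scalar stand-ins a for ω⊤ξ̂_i and q for ‖ω‖_q, with assumed values):

```python
import numpy as np

a, q, lam, alpha, beta = 0.3, 0.8, 2.0, 0.5, 1.5   # illustrative parameters

def h_direct(y):
    # h_i(y) = |y| - lam*u(|y - a| - q), with u(t) = alpha*t_+ - beta*t_-
    t = np.abs(y - a) - q
    return np.abs(y) - lam * (alpha * np.maximum(t, 0.0) + beta * np.minimum(t, 0.0))

def h_piecewise(y):
    # the three branches displayed above
    out = np.empty_like(y)
    hi, lo = y >= a + q, y <= a - q
    mid = ~(hi | lo)
    out[hi] = np.abs(y[hi]) - alpha * lam * (y[hi] - a - q)
    out[mid] = np.abs(y[mid]) + beta * lam * (q - np.abs(y[mid] - a))
    out[lo] = np.abs(y[lo]) - alpha * lam * (a - y[lo] - q)
    return out

y = np.linspace(-5.0, 5.0, 4001)
gap = np.max(np.abs(h_direct(y) - h_piecewise(y)))   # numerically zero
```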
Considering the case y → +∞, we have
h_i(y) = y − αλ(y − ω⊤ξ̂_i − ‖ω‖_q) = (1 − αλ)y + αλ(ω⊤ξ̂_i + ‖ω‖_q).
To keep the supremum bounded, it is necessary that 1 − αλ ≤ 0, that is, λ ≥ 1/α. Similarly, when y → −∞, we can also derive λ ≥ 1/α. We next consider the following three cases.
Case A. 
Considering y ≥ ω⊤ξ̂_i + ‖ω‖_q, we can see that h_i(y) is non-increasing whether ω⊤ξ̂_i + ‖ω‖_q is non-negative or non-positive; thus, h_i(y) attains its supremum at y = ω⊤ξ̂_i + ‖ω‖_q, with h_i(ω⊤ξ̂_i + ‖ω‖_q) = |ω⊤ξ̂_i + ‖ω‖_q|.
Case B. 
Considering ω⊤ξ̂_i − ‖ω‖_q < y < ω⊤ξ̂_i + ‖ω‖_q, we must further examine the form of h_i(y) under the cases ω⊤ξ̂_i + ‖ω‖_q ≤ 0 and ω⊤ξ̂_i + ‖ω‖_q > 0.
(Case B.1) If ω⊤ξ̂_i + ‖ω‖_q ≤ 0,
h_i(y) = −(1 + βλ)y + βλ(ω⊤ξ̂_i + ‖ω‖_q) for ω⊤ξ̂_i ≤ y < ω⊤ξ̂_i + ‖ω‖_q; (βλ − 1)y − βλ(ω⊤ξ̂_i − ‖ω‖_q) for ω⊤ξ̂_i − ‖ω‖_q < y < ω⊤ξ̂_i.
The supremum in this case is −ω⊤ξ̂_i + βλ‖ω‖_q.
(Case B.2) If ω⊤ξ̂_i + ‖ω‖_q > 0 and ω⊤ξ̂_i − ‖ω‖_q > 0,
h_i(y) = (1 − βλ)y + βλ(ω⊤ξ̂_i + ‖ω‖_q) for ω⊤ξ̂_i ≤ y < ω⊤ξ̂_i + ‖ω‖_q; (1 + βλ)y − βλ(ω⊤ξ̂_i − ‖ω‖_q) for ω⊤ξ̂_i − ‖ω‖_q < y ≤ ω⊤ξ̂_i.
The supremum in this case is ω⊤ξ̂_i + βλ‖ω‖_q.
(Case B.3) If ω⊤ξ̂_i + ‖ω‖_q > 0, ω⊤ξ̂_i − ‖ω‖_q ≤ 0, and ω⊤ξ̂_i > 0,
h_i(y) = (1 − βλ)y + βλ(ω⊤ξ̂_i + ‖ω‖_q) for ω⊤ξ̂_i < y < ω⊤ξ̂_i + ‖ω‖_q; (1 + βλ)y − βλ(ω⊤ξ̂_i − ‖ω‖_q) for 0 < y ≤ ω⊤ξ̂_i; (βλ − 1)y − βλ(ω⊤ξ̂_i − ‖ω‖_q) for ω⊤ξ̂_i − ‖ω‖_q < y ≤ 0.
The supremum in this case is ω⊤ξ̂_i + βλ‖ω‖_q.
(Case B.4) If ω⊤ξ̂_i + ‖ω‖_q > 0, ω⊤ξ̂_i − ‖ω‖_q ≤ 0, and ω⊤ξ̂_i ≤ 0,
h_i(y) = (1 − βλ)y + βλ(ω⊤ξ̂_i + ‖ω‖_q) for 0 < y < ω⊤ξ̂_i + ‖ω‖_q; −(1 + βλ)y + βλ(ω⊤ξ̂_i + ‖ω‖_q) for ω⊤ξ̂_i < y ≤ 0; (βλ − 1)y − βλ(ω⊤ξ̂_i − ‖ω‖_q) for ω⊤ξ̂_i − ‖ω‖_q < y ≤ ω⊤ξ̂_i.
The supremum in this case is −ω⊤ξ̂_i + βλ‖ω‖_q.
Combining the above four subcases, we conclude that the supremum in Case B is |ω⊤ξ̂_i| + βλ‖ω‖_q. Moreover, in every situation, h_i(y) first monotonically increases and then monotonically decreases.
Case C. 
Considering y ≤ ω⊤ξ̂_i − ‖ω‖_q, similar to Case A, h_i(y) is non-decreasing and attains its supremum at y = ω⊤ξ̂_i − ‖ω‖_q, with h_i(ω⊤ξ̂_i − ‖ω‖_q) = |ω⊤ξ̂_i − ‖ω‖_q|.
Combining the above three cases, h_i(y) first monotonically increases and then monotonically decreases in all situations, and its supremum is |ω⊤ξ̂_i| + βλ‖ω‖_q. As a result, we have the optimal value
I¯_N = inf_{λ≥1/α} (1/N) ∑_{i=1}^N [|ω⊤ξ̂_i| + βλ‖ω‖_q] = (1/N) ∑_{i=1}^N |ω⊤ξ̂_i| + (β/α)‖ω‖_q.
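This closed form can be verified numerically (a sketch with assumed scalar inputs standing in for the ω⊤ξ̂_i and ‖ω‖_q): at the minimizing λ = 1/α, the gridded supremum of h_i matches |ω⊤ξ̂_i| + βλ‖ω‖_q for every sample, and averaging gives the optimal value above.

```python
import numpy as np

alpha, beta, q = 0.5, 1.5, 0.4          # 0 < alpha < beta; q stands for ||omega||_q
a = np.array([0.7, -1.2, 0.1])          # projected samples omega^T xi_i (assumed)

def u(t):
    return alpha * np.maximum(t, 0.0) + beta * np.minimum(t, 0.0)  # alpha*t_+ - beta*t_-

lam = 1.0 / alpha                        # the minimizing lambda
y = np.linspace(-20.0, 20.0, 400001)
sup_vals = np.array([np.max(np.abs(y) - lam * u(np.abs(y - ai) - q)) for ai in a])
closed_form = np.abs(a) + beta * lam * q         # |a_i| + beta*lam*q for each sample
optimum = sup_vals.mean()                # approximates mean|a_i| + (beta/alpha)*q
```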
We can also consider this problem under the classic Wasserstein metric. A similar discussion yields
I¯_N^W = (1/N) ∑_{i=1}^N |ω⊤ξ̂_i| + ‖ω‖_q.
Although the two results share similar formulas, the result under the shortfall–Wasserstein metric is more flexible as it is possible to adjust the parameters α , β to achieve better performance.
The model that we simulated here is as follows:
y_i = β⊤x_i + e_i, β = (1, 2, 2, 3), x_i = (x_{1i}, x_{2i}, x_{3i}, x_{4i}), e_i ∼ N(0, 1).
Applying the above result under the shortfall–Wasserstein metric to this practical problem, we seek an estimator β̂ that minimizes
(1/N) ∑_{i=1}^N |y_i − β⊤x_i| + (β/α)(1 + ‖β‖_q).
When α = β, it reduces to the case of the classic Wasserstein metric. To illustrate their performance, we generate six hundred sets of samples, with the x_i drawn from the distribution U(0, 1) and the e_i from the distribution N(0, 1), independently and identically. The trend of the MSE as λ := β/α ranges from one to ten is presented below.
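A minimal sketch of this experiment follows; the sample size, random seed, optimizer choice, and the factor `lam_pen` (standing for β/α, here set to a small illustrative value with q = 2) are all assumptions, not the paper's exact settings.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
N, coef_true = 400, np.array([1.0, 2.0, 2.0, 3.0])   # coefficients from the model above
X = rng.uniform(0.0, 1.0, size=(N, 4))
y = X @ coef_true + rng.standard_normal(N)
lam_pen = 0.1                                        # stands for beta/alpha (assumed small)

def objective(b):
    # shortfall-Wasserstein robust LAD criterion with q = 2
    return np.mean(np.abs(y - X @ b)) + lam_pen * (1.0 + np.linalg.norm(b, 2))

res = minimize(objective, x0=np.zeros(4), method="Powell")
coef_hat = res.x
```

The fitted `coef_hat` should land near the data-generating coefficients, and its objective value should be no worse than that of the true coefficients.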
In Figure 1, we find that a smaller MSE can be achieved by adjusting the value of λ. Moreover, when q = 2 and q = 3, the value of λ that minimizes the MSE is larger than 1, which means that the shortfall–Wasserstein robust regression can achieve a better prediction effect than the Wasserstein robust regression.
To test the robustness of our model, we further introduce some outliers and compare the performance with the least-squares regression (LSR) model and the ridge regression model. We set α = β for the shortfall–Wasserstein robust regression model and a regularization coefficient of 1 for the ridge regression. We conduct one hundred iterations, producing one hundred sets of samples each time, half of which serve as the training set while the rest serve as the testing set. Moreover, ten sets of outlier samples are added to the training set at each iteration. The result is presented below.
In Figure 2, the MSE under the shortfall–Wasserstein robust regression model is much smaller and less volatile than under the other two models, which means the shortfall–Wasserstein robust regression model is better at resisting large deviations in the predictors. As a result, the result based on the shortfall–Wasserstein metric is reliable, stable, and superior to the one based on the Wasserstein metric.

5.2. Portfolio Optimization

With ξ being a random vector of returns on n different financial assets and ω the allocation vector, the problem becomes a portfolio optimization problem. In this subsection, we take l(X) = (X − c)_+ to characterize the downside risk, and the problem we are interested in takes the following form:
inf_{ω∈D} sup_{P ∈ B_ε^{d_u}(P̂_N)} E_P[(ω⊤ξ − c)_+],
where c is a constant. By Corollary 2, with u(x) = αx_+ − βx_−, 0 < α < β, the inner maximum problem can be reformulated as
(1/N) ∑_{i=1}^N (ω⊤ξ̂_i − c)_+ + (β/α)‖ω‖_q ε.
With the ambiguity set constructed from the classic Wasserstein metric, the result takes the same form as the above formula with α = β. For the simulation, we choose four MSCI index assets: the MSCI Denmark index, the MSCI Turkey index, the MSCI Greece index, and the MSCI Norway index. We collect the daily closing prices of these indexes from 1 January 2020 to 31 December 2022 from cn.investing.com. It is noteworthy that the COVID-19 outbreak began in 2020, and the distribution of assets is highly uncertain during this period. Based on these data, we assume the initial capital is USD 1000 and that there is no short sale. We use a 30-day sliding time window: the optimal weight is calculated from the previous 30 days of historical data, and this result is taken as the investment decision for the next day. As the time window rolls, the cumulative return curves under different strategies are presented below.
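Before turning to the real data, the one-step decision can be sketched on synthetic returns. Everything below is an illustrative assumption (Gaussian daily data, c = 0, β/α = 9, ε = 0.01, q = 2, and ξ read so that large positive values of ω⊤ξ are unfavorable), not the paper's exact setup.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
losses = rng.normal(-0.0005, 0.01, size=(30, 4))     # 30 days, 4 assets (synthetic)
c, ratio, eps = 0.0, 9.0, 0.01                       # ratio stands for beta/alpha

def robust_objective(w):
    # empirical average of (omega^T xi_i - c)_+ plus the shortfall-Wasserstein penalty
    return np.mean(np.maximum(losses @ w - c, 0.0)) + ratio * eps * np.linalg.norm(w, 2)

res = minimize(robust_objective, x0=np.full(4, 0.25),
               bounds=[(0.0, 1.0)] * 4,
               constraints=({"type": "eq", "fun": lambda w: w.sum() - 1.0},),
               method="SLSQP")
w_star = res.x   # long-only weights summing to one
```

In the rolling-window study, `w_star` would be recomputed each day from the trailing 30 observations; with a penalty weight this large, the norm term dominates and the solution stays close to the equal-weight portfolio.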
The four curves represent the cumulative returns under the model constructed with the shortfall–Wasserstein metric with β/α = 9, the model constructed with the classic Wasserstein metric, the mean-variance model, and the 1/n portfolio model, respectively. The mean-variance model, which aims to minimize investment variance under a given expected return, was proposed by [24]. The 1/n portfolio model divides the money equally among the assets, and [25] found that the 1/n portfolio performs well when the overall distribution of assets is highly uncertain.
In Figure 3, we can see that the first two cumulative return curves, under the robust optimization models, outperform the curves under the mean-variance model and the 1/n portfolio model. Because the models under the shortfall–Wasserstein metric and the classic Wasserstein metric produce similar results, the first two curves behave essentially the same. Since 2020, the global economy has been in recession due to the pandemic; in 2022, the epidemic was relatively stable and the economy slowly began to recover. During this period, the distribution of the four assets is highly uncertain. All four models perform well, as the curves climb steadily, but the robust models take the variability into account and make better decisions, especially during the recovery period.
In general, in regression problems, our results are more reliable and robust when the sample is contaminated; in portfolio optimization problems, when the distribution of assets is highly uncertain, our results perform significantly better. Moreover, our result can be applied more widely than the result under the Wasserstein model, as it accommodates many complex forms of the function l. Even when the loss function l(·) is relatively simple, our model can achieve the same or even better performance than the classic Wasserstein model by adjusting the form of the function u(·).

6. Conclusions

In this paper, we propose a new DRO framework by extending the classic Wasserstein metrics to the shortfall–Wasserstein metrics. We study the tractability and reformulations of shortfall–Wasserstein DRO problems for loss functions that are linear in the decision vector. This class of objective functions covers many applications, such as regression and portfolio selection. One interesting result of the paper is an equivalent characterization of the projection onto a one-dimensional ball. Based on this projection result, we show that the multi-dimensional constraint of our distributionally robust models can be reformulated as a one-dimensional one, and we establish a finite sample guarantee for the DRO problem that is free from the curse of dimensionality. We present the application of our model to regression and provide simulation results to illustrate its performance. In addition, we present a real-data analysis of portfolio selection to illustrate the performance of our new DRO model. Since this paper focuses on the linear loss function l(ω⊤x), a possible future study is to consider the general loss function l(ω, x) and to study the reformulations and tractability of the resulting general DRO problems.

Author Contributions

Conceptualization, R.L.; methodology, R.L., W.L. and T.M.; software, R.L.; validation, R.L., W.L. and T.M.; formal analysis, R.L. and W.L.; investigation, W.L. and T.M.; resources, W.L. and T.M.; data curation, R.L.; writing—original draft preparation, R.L.; writing—review and editing, W.L. and T.M.; visualization, R.L. and W.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China (Nos. 71671176, 71871208) and the Anhui Natural Science Foundation (No. 2208085MA07).

Data Availability Statement

The data that support the analysis of this study are openly available at https://cn.investing.com/.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A. Proof of Theorem 2

The proof of Theorem 2 follows an idea similar to that of Theorem 1 of [15]. For completeness, we include it in this Appendix. We first verify the strong duality on a compact Polish space S and then generalize to ℝ.

Appendix A.1. Strong Duality in Compact Spaces

To prepare for the verification, denote C_S := {(x, y) ∈ S × S: c(x, y) < +∞} and Λ_{S,c,l} := {(λ, φ): λ ≥ 0, φ ∈ m_U(S; ℝ̄), φ(x) + λc(x, y) ≥ l(y) for all (x, y) ∈ C_S}, and for η ∈ ℝ, define
I¯_η = sup{∫ l(y) dπ(x, y): π ∈ ∪_{ν∈P(S)} Π(μ, ν), ∫ c dπ ≤ η},
J̲_η = inf{λη + ∫ φ dμ: (λ, φ) ∈ Λ_{S,c,l}}.
Proposition A1. 
Suppose S is a compact Polish space, l: S → ℝ satisfies Assumption 4, and c(X, Y) = u(|Y − X| − ε*). Then for any η ∈ ℝ, I¯_η = J̲_η, and a primal optimizer π*_η satisfying I(π*_η) = I¯_η exists.
Proof. 
To prepare to apply the Fenchel duality theorem (see Theorem 1 in Chapter 7 of [26]), denote X = C b ( S × S ) and its topological dual X * = M b ( S × S ) , which is a consequence of the Riesz representation theorem. Next, let
C := {g(x, y) ∈ X: g(x, y) = φ(x) + λc(x, y) for all x, y, for some φ ∈ C_b(S), λ ≥ 0},
D := {g(x, y) ∈ X: g(x, y) ≥ l(y) for every x, y}.
In the set C, each g is identified with the pair (λ, φ) through
φ(x) = g(x, x) − λc(ε*), λ = [g(x, y) − g(x, x) + λc(ε*)] / c(x, y),
for some x, y in S with c(x, y) ≠ 0. Define the functionals Φ: C → ℝ and Γ: D → ℝ as below:
Φ(g) := λη + ∫ φ dμ, Γ(g) := 0.
From the definitions, it is evident that the functional Φ is convex and Γ is concave, and
inf_{g∈C∩D} {Φ(g) − Γ(g)} = inf{λη + ∫ φ dμ: φ(x) + λc(x, y) ≥ l(y) for all x, y, for some φ ∈ C_b(S), λ ≥ 0}.
The corresponding conjugate sets and functions Φ*: C* → ℝ and Γ*: D* → ℝ are as below:
C* = {π ∈ X*: sup_{g∈C} [⟨g, π⟩ − Φ(g)] < +∞} = {π ∈ X*: sup_{g∈C} {∫ g dπ − Φ(g)} < +∞},
D* = {π ∈ X*: inf_{g∈D} [⟨g, π⟩ − Γ(g)] > −∞} = {π ∈ X*: inf_{g∈D} ∫ g dπ > −∞},
Φ*(π) = sup_{g∈C} {∫ g dπ − Φ(g)}, Γ*(π) = inf_{g∈D} ∫ g dπ.
For every π ∈ M_b(S × S),
sup_{g∈C} {∫ g dπ − Φ(g)} = sup_{(λ,φ)∈ℝ_+×C_b(S)} {∫ (φ(x) + λc(x, y)) dπ(x, y) − ∫ φ(x) dμ(x) − λη} = sup_{(λ,φ)∈ℝ_+×C_b(S)} {λ(∫ c(x, y) dπ(x, y) − η) + ∫ φ(x)(dπ(x, y) − dμ(x))}, which equals 0 if ∫ c dπ ≤ η and π(A × S) = μ(A) for all A ∈ B(S), and +∞ otherwise.
Therefore,
C* = {π ∈ M_b(S × S): ∫ c dπ ≤ η, π(A × S) = μ(A) for all A ∈ B(S)}, Φ*(π) = 0.
Next, to determine D*, observe that if a measure π ∈ M_b(S × S) is not non-negative, then inf_{g∈D} ∫ g dπ = −∞ (for more details, see Lemma B.6 of [15]). When π ∈ M_b(S × S) is non-negative, as l is an upper semicontinuous function bounded from above, we can approximate l pointwise by a monotonically decreasing sequence of continuous functions. By the monotone convergence theorem, we have the following equality:
inf{∫ g(x, y) dπ(x, y): g(x, y) ≥ l(y) for all x, y} = ∫ l(y) dπ(x, y).
Then,
D* = {π ∈ M_b(S × S): π ≥ 0, ∫ l(y) dπ(x, y) > −∞}, Γ*(π) = ∫ l(y) dπ(x, y).
Consequently,
Γ*(π) − Φ*(π) = ∫ l(y) dπ(x, y),
C* ∩ D* = {π ∈ ∪_{ν∈P(S)} Π(μ, ν): ∫ c dπ ≤ η, ∫ l(y) dπ(x, y) > −∞}.
Thus,
I¯_η = sup{Γ*(π) − Φ*(π): π ∈ C* ∩ D*}.
Because the set C ∩ D contains points of the relative interiors of C and D, the epigraph of Φ has a nonempty interior, and inf_{g∈C∩D} {Φ(g) − Γ(g)} is finite, Fenchel's duality theorem yields
inf_{g∈C∩D} {Φ(g) − Γ(g)} = sup_{π∈C*∩D*} {Γ*(π) − Φ*(π)} = I¯_η.
Moreover, some π*_η attaining the supremum on the right-hand side exists. As C_b(S) ⊆ m_U(S; ℝ̄), then
J̲_η ≤ inf{λη + ∫ φ dμ: φ(x) + λc(x, y) ≥ l(y) for all x, y, for some φ ∈ C_b(S), λ ≥ 0} = I¯_η.
Similar to the proof of weak duality, we can verify that I¯_η ≤ J̲_η. Then we obtain I¯_η = J̲_η, and some π*_η exists such that I(π*_η) = I¯_η. The proof is complete. □

Appendix A.2. Strong Duality in Non-Compact Spaces

Proposition A1 has established the strong duality on compact Polish spaces. We now extend the strong duality to ℝ; the verification is broken down into several steps, of which Proposition A2 is the first.
For convenience, we first introduce some notations. For any π P ( R × R ) , denote
S_π := Spt(π_X) ∪ Spt(π_Y),
in which Spt(π_X) and Spt(π_Y) denote the supports of the marginals π_X(·) := π(· × ℝ) and π_Y(·) := π(ℝ × ·). The set S_π × S_π is σ-compact, since every probability measure defined on a Polish space has σ-compact support; thus, in Proposition A2, S_π × S_π can be expressed as the union of an increasing sequence of compact subsets (S_n × S_n: n ≥ 1) [15]. We can then apply the results of Proposition A1 via a sequential argument. For any closed subset V ⊆ ℝ, let
Λ(V × V) := {(λ, φ): λ ≥ 0, φ ∈ m_U(V; ℝ̄), φ(x) + λc(x, y) ≥ l(y) for all (x, y) ∈ (V × V) ∩ C},
where C := {(x, y) ∈ ℝ × ℝ: c(x, y) < ∞} and m_U(V; ℝ̄) denotes the set of measurable functions φ: (V, U(V)) → (ℝ̄, B(ℝ̄)). With this notation, Λ(ℝ × ℝ) = Λ_{c,l}. Furthermore, the function φ_λ(x) := sup_{y∈ℝ} {l(y) − λc(x, y)}: (ℝ, U(ℝ)) → (ℝ̄, B(ℝ̄)) is measurable; a more detailed explanation can be found in [15]. Finally, let us denote
E := {π ∈ ∪_{ν∈P(ℝ)} Π(μ, ν): ∫ c(x, y) dπ(x, y) < +∞, ∫ |l(y)| dπ(x, y) < +∞}.
Proposition A2. 
If Assumption 4 holds and c(X, Y) = u(|Y − X| − ε*), then for any π ∈ E,
inf_{(λ,φ)∈Λ(S_π×S_π)} J(λ, φ) ≤ I¯.
Proof. 
This proof is similar to that in [15]; we give it for completeness. From the discussion preceding the proposition, we know that S_π × S_π is σ-compact. By the definition of σ-compactness, an increasing sequence of compact subsets S_n × S_n exists such that Spt(π) ⊆ S_π × S_π = ∪_{n≥1} (S_n × S_n). As ∫ |l(y)| dπ(x, y) and ∫ c(x, y) dπ(x, y) are finite, one is able to find (S_n × S_n, n ≥ 1) and η ∈ (0, +∞) satisfying
p_n := π(S_n × S_n) ≥ 1 − 1/n,
∫ c(x, y) 1_{(S_n×S_n)^c} dπ(x, y) ≤ (η/n)(1 − 1/n),
∫ |l(y)| 1_{(S_n×S_n)^c} dπ(x, y) ≤ 1/n,
where (S_n × S_n)^c := (ℝ × ℝ) \ (S_n × S_n). Define π_n ∈ P(S_n × S_n) and its corresponding marginal μ_n ∈ P(S_n) as below:
π_n(·) := π(· ∩ (S_n × S_n)) / p_n, μ_n(·) := π_n(· × S_n).
For every n ≥ 1, define
I¯_{η,n} := sup{∫ l(y) γ(dx, dy): γ ∈ ∪_{ν∈P(S_n)} Π(μ_n, ν), ∫ c(x, y) dγ(x, y) ≤ −η/n},
J̲_{η,n} := inf{−λη/n + ∫ φ dμ_n: (λ, φ) ∈ Λ(S_n × S_n)},
where the measures involved have supports in S_n × S_n. As S_n is compact, from Proposition A1, we know that a γ*_{η,n} ∈ P(S_n × S_n) exists satisfying
∫ l(y) γ*_{η,n}(dx, dy) = I¯_{η,n} = J̲_{η,n}.
Construct a measure π̃ ∈ P(ℝ × ℝ),
π̃(·) = p_n γ*_{η,n}(· ∩ (S_n × S_n)) + π(· ∩ (S_n × S_n)^c).
It can be verified that π̃ ∈ Π(μ, ν) for some ν ∈ P(ℝ). From the definition of π̃, we have
π̃(· × ℝ) = p_n γ*_{η,n}((· ∩ S_n) × S_n) + π((· × ℝ) ∩ (S_n × S_n)^c) = p_n μ_n(· ∩ S_n) + π((· × ℝ) ∩ (S_n × S_n)^c) = π((· × ℝ) ∩ (S_n × S_n)) + π((· × ℝ) ∩ (S_n × S_n)^c) = π(· × ℝ) = μ(·),
where the third equality holds because μ_n(·) = π(· × S_n)/p_n, so p_n μ_n(· ∩ S_n) = π((· × ℝ) ∩ (S_n × S_n)). Furthermore,
∫ c dπ̃ = p_n ∫_{S_n×S_n} c dγ*_{η,n} + ∫_{(S_n×S_n)^c} c dπ ≤ −p_n η/n + (η/n)(1 − 1/n) ≤ −(1 − 1/n)(η/n) + (η/n)(1 − 1/n) = 0.
Then, π̃ ∈ Φ_μ, and consequently,
I¯ ≥ ∫ l(y) π̃(dx, dy) = p_n ∫_{S_n×S_n} l dγ*_{η,n} + ∫_{(S_n×S_n)^c} l dπ.
Thus, we have I¯ ≥ p_n I¯_{η,n} − 1/n, and as I¯_{η,n} = J̲_{η,n}, we have J̲_{η,n} ≤ (1 − 1/n)^{-1}(I¯ + 1/n). Proposition A2 already holds when I¯ = +∞, so we take I¯ < +∞; for every n ≥ 1 and any ε > 0, take an ε-optimal solution (λ_n, φ_n) for J̲_{η,n},
−λ_n η/n + ∫ φ_n dμ_n ≤ J̲_{η,n} + ε.
As (λ_n, φ_n) belongs to Λ(S_n × S_n), by definition φ_n(x) ≥ sup_{z∈S_n} {l(z) − λ_n c(x, z)}. Then, for every n, we have
−λ_n η/n + ∫ sup_{z∈S_n} {l(z) − λ_n c(x, z)} dμ_n(x) ≤ J̲_{η,n} + ε.
Combining this with the fact that μ_n(·) = π(· × S_n)/p_n, we further have
lim sup_n {−λ_n η/n + ∫ sup_{z∈S_n} {l(z) − λ_n c(x, z)} (1_{S_n}(x) · 1_{S_n}(y) / p_n) dπ(x, y)} ≤ I¯ + ε. (A1)
Since c(x, x) = c(ε*) > −∞, l(x) − λ_n c(ε*) is a lower bound of the integrand above on S_n × S_n, and we also have l ∈ L¹(dμ). We obtain the following two results:
(a) From (A1) and the fact that λ_n ≥ 0, both lim inf_n λ_n and lim sup_n λ_n are finite, so {λ_n: n ≥ 1} has convergent subsequences; that is, at least one subsequence {n_k: k ≥ 1} exists such that λ_{n_k} → λ* as k → ∞ for some λ* ∈ [0, ∞);
(b) By Fatou's lemma and the dominated convergence theorem, we obtain
I¯ + ε ≥ lim inf_k {−λ_{n_k} η/n_k + ∫ sup_{z∈S_{n_k}} {l(z) − λ_{n_k} c(x, z)} (1_{S_{n_k}}(x) · 1_{S_{n_k}}(y) / p_{n_k}) dπ(x, y)} ≥ ∫_{S_π×S_π} sup_{z∈S_π} {l(z) − λ* c(x, z)} dπ(x, y) = ∫_{S_π} sup_{z∈S_π} {l(z) − λ* c(x, z)} dμ(x).
Here, these facts are used: η/n → 0, λ_{n_k} → λ* as k → ∞, p_n → 1, and ∪_{n≥1} S_n = ∪_{k≥1} S_{n_k} = S_π. We also used that lim inf_k sup_{z∈S_{n_k}} {l(z) − λ_{n_k} c(x, z)} ≥ sup_{z∈S_π} {l(z) − λ* c(x, z)} (see Lemma B.7 in Appendix B of [15]). If we let φ*(x) = sup_{z∈S_π} {l(z) − λ* c(x, z)}, then (λ*, φ*) ∈ Λ(S_π × S_π), and due to the arbitrariness of ε, it follows that
J(λ*, φ*) ≤ I¯.
This completes the proof. □
Proposition A3. 
Suppose that Assumption 4 is in force and c(X, Y) = u(|Y − X| − ε*). Then for any λ ≥ 0,
sup_{π∈E} ∫ {l(y) − λc(x, y)} dπ(x, y) = ∫ sup_{z∈ℝ} {l(z) − λc(x, z)} dμ(x).
Proof. 
Let g(x, y) := l(y) − λc(x, y); for n ≥ 1 and k ≤ n², define
A_{k,n} := {(x, y) ∈ ℝ × ℝ: (k − 1)/n ≤ g(x, y) ≤ k/n},
B_{k,n} := Proj_1(A_{k,n}) \ ∪_{j>k} Proj_1(A_{j,n}).
Noting that g is upper semicontinuous and the sets A_{k,n} are Borel-measurable, their projections B_{k,n} are also measurable subsets of ℝ. Additionally, from the definition, the collection (B_{k,n}: k ≤ n²) is disjoint. Then, due to the Jankov–von Neumann selection theorem [27], for each k ≤ n² a universally measurable function γ_k(x): Proj_1(A_{k,n}) → ℝ exists such that (x, γ_k(x)) ∈ A_{k,n}, satisfying
(k − 1)/n ≤ g(x, γ_k(x)) ≤ k/n.
Next, as B_{k,n} ⊆ Proj_1(A_{k,n}) and (B_{k,n}: k ≤ n²) is disjoint, we define Γ_n: ℝ → ℝ by
Γ_n(x) := γ_k(x) if x ∈ B_{k,n} for some k ≤ n², and Γ_n(x) := x otherwise.
Since each γ_k(x) is measurable, Γ_n(x) is also measurable. For x ∈ ℝ, if a k exists such that x ∈ B_{k,n}, then g(x, Γ_n(x)) = g(x, γ_k(x)) ≥ (k − 1)/n, and sup{g(x, y): g(x, y) ≤ n, y ∈ ℝ} − 1/n ≤ k/n − 1/n = (k − 1)/n. If no such k exists, then g(x, Γ_n(x)) = g(x, x) and {(x, y): g(x, y) ≤ n, y ∈ ℝ} is an empty set. In both cases, we obtain
sup{g(x, y): g(x, y) ≤ n, y ∈ ℝ} − 1/n ≤ g(x, Γ_n(x)) ≤ n. (A2)
Letting n → ∞, we have
sup{g(x, y): y ∈ ℝ} ≤ lim inf_n g(x, Γ_n(x)).
Define the family of probability measures (π_n: n ≥ 1) as
dπ_n(x, y) = dμ(x) · dδ_{Γ_n(x)}(y).
From (A2), g(x, y) ≤ n, π_n-almost surely; thus, ∫ |c(x, y)| dπ_n < +∞ and ∫ |l(y)| dπ_n < +∞. Moreover, since π_n(· × ℝ) = μ(·), then π_n ∈ E. Finally, since g(x, Γ_n(x)) ≥ l(x) − λc(ε*) − 1 > −∞, by Fatou's lemma,
lim inf_n ∫ g(x, y) dπ_n(x, y) = lim inf_n ∫ g(x, Γ_n(x)) dμ(x) ≥ ∫ lim inf_n g(x, Γ_n(x)) dμ(x) ≥ ∫ sup_{y∈ℝ} g(x, y) dμ(x).
Then, sup_{π∈E} ∫ g dπ ≥ ∫ sup_{y∈ℝ} g(x, y) dμ(x) holds. Since ∫ g dπ ≤ ∫ sup_{y∈ℝ} g(x, y) dπ(x, y) = ∫ sup_{y∈ℝ} g(x, y) dμ(x) for any π ∈ E, the proof is complete. □
Proof of Theorem 2. 
If I¯ = +∞, then by weak duality J̲ ≥ I¯ = +∞, and the proof is complete. Consider the case I¯ < +∞. As a result of Proposition A2, for every π ∈ E, we have
I¯ ≥ inf_{(λ,φ)∈Λ(S_π×S_π)} ∫ φ(x) dμ(x) ≥ inf_{λ≥0} ∫ sup_{y∈S_π} {l(y) − λc(x, y)} dμ(x). (A3)
For any π ∈ E and λ ≥ 0, define
T(λ, π) := ∫ sup_{y∈S_π} {l(y) − λc(x, y)} dμ(x).
As c(x, x) = c(ε*),
T(λ, π) ≥ ∫ l dμ − λc(ε*). (A4)
Since T(λ, π) > I¯ for every λ > λ_max := (∫ l dμ − I¯)/c(ε*), we can restrict attention to the compact subset [0, λ_max]; from (A3), we can see
I¯ ≥ inf_{λ∈[0,λ_max]} T(λ, π). (A5)
As sup_{y∈S_π} {l(y) − λc(x, y)} is a lower semicontinuous function of λ, we can verify that T(λ, π) is lower semicontinuous in λ as well: for any λ_n → λ, as (A4) holds, we can apply Fatou's lemma,
lim inf_n T(λ_n, π) ≥ ∫ lim inf_n sup_{y∈S_π} {l(y) − λ_n c(x, y)} dμ(x) ≥ ∫ sup_{y∈S_π} {l(y) − λc(x, y)} dμ(x) = T(λ, π).
Moreover, T(λ, π) is a convex function of λ for fixed π. Additionally, for any α ∈ (0, 1) and π_1, π_2 ∈ E,
T(λ, απ_1 + (1 − α)π_2) = ∫ sup_{y∈S_{απ_1+(1−α)π_2}} {l(y) − λc(x, y)} dμ(x) ≥ max_{i=1,2} ∫ sup_{y∈S_{π_i}} {l(y) − λc(x, y)} dμ(x) ≥ αT(λ, π_1) + (1 − α)T(λ, π_2),
which means that T(λ, π) is concave in π for fixed λ. By applying the minimax theorem, we can conclude
sup_{π∈E} inf_{λ∈[0,λ_max]} T(λ, π) = inf_{λ∈[0,λ_max]} sup_{π∈E} T(λ, π).
In conjunction with (A5), this yields
I¯ ≥ sup_{π∈E} inf_{λ∈[0,λ_max]} T(λ, π) = inf_{λ∈[0,λ_max]} sup_{π∈E} ∫ sup_{y∈S_π} {l(y) − λc(x, y)} dμ(x). (A6)
Now, since ∫ sup_{y∈S_π} {l(y) − λc(x, y)} dμ(x) ≥ ∫ (l(y) − λc(x, y)) dπ(x, y) for any π ∈ E, from Proposition A3 we have
sup_{π∈E} ∫ sup_{y∈S_π} {l(y) − λc(x, y)} dμ(x) = ∫ sup_{y∈ℝ} {l(y) − λc(x, y)} dμ(x) = ∫ φ_λ(x) dμ(x).
In conjunction with (A6), we further obtain
I¯ ≥ inf_{λ∈[0,λ_max]} ∫ φ_λ(x) dμ(x) ≥ inf_{λ≥0} ∫ φ_λ(x) dμ(x). (A7)
Let g(λ) := ∫ φ_λ(x) dμ(x). By Fatou's lemma, lim inf_n g(λ_n) ≥ g(λ) as λ_n → λ, so g(·) is lower semicontinuous; moreover, g(λ) → +∞ as λ → ∞, since g(λ) ≥ ∫ l(x) dμ(x) − λc(ε*) and −c(ε*) > 0. As a result, the sets {λ: λ ≥ 0, g(λ) ≤ m} are compact for every m. Therefore, g(·) attains its infimum; that is, we can find λ* ∈ [0, ∞) such that inf_{λ≥0} {∫ φ_λ(x) dμ(x)} = ∫ φ_{λ*}(x) dμ(x). As φ_{λ*}(x) + λ* c(x, y) ≥ l(y), we have (λ*, φ_{λ*}) ∈ Λ_{c,l}; thus, J̲ ≤ ∫ φ_{λ*}(x) dμ(x). In conjunction with (A7), we have I¯ ≥ J̲, and the proof of assertion (a) is complete.
From the above verification, it is known that an optimizer (λ*, φ_{λ*}) of the dual problem always exists. Next, if π* exists satisfying I¯ = I(π*) = J(λ*, φ_{λ*}), which means it is an optimizer of the primal problem, then we have
∫ l(y) dπ*(x, y) = ∫ φ_{λ*}(x) dμ(x).
In addition, as π* ∈ Φ_μ and φ_{λ*}(x) = sup_{y∈ℝ} {l(y) − λ* c(x, y)}:
∫ l(y) dπ*(x, y) = ∫ (l(y) − λ* c(x, y)) dπ*(x, y) + λ* ∫ c(x, y) dπ*(x, y) ≤ ∫ φ_{λ*}(x) dπ*(x, y) + λ* ∫ c(x, y) dπ*(x, y) ≤ ∫ φ_{λ*}(x) dπ*(x, y) = ∫ l(y) dπ*(x, y).
Thus, both inequalities must hold as equalities, and we can conclude
l(y) − λ* c(x, y) = sup_{z∈ℝ} {l(z) − λ* c(x, z)}, π*-a.s., (A8)
λ* ∫ c(x, y) dπ*(x, y) = 0. (A9)
Alternatively, if (A8) and (A9) are satisfied by some π* ∈ Φ_μ and (λ*, φ_{λ*}) ∈ Λ_{c,l}, then
∫ l(y) dπ*(x, y) = ∫ (l(y) − λ* c(x, y)) dπ*(x, y) + λ* ∫ c(x, y) dπ*(x, y) = ∫ sup_{z∈ℝ} {l(z) − λ* c(x, z)} dπ*(x, y) = ∫ φ_{λ*}(x) dμ(x),
which means that π* and (λ*, φ_{λ*}) are optimal for the primal and dual problems. The proof of the uniqueness of the primal optimizer π* is similar to the proof of Theorem 1(b) in [15] and is thus omitted here. This completes the proof. □

References

  1. Wiesemann, W.; Kuhn, D.; Sim, M. Distributionally robust convex optimization. Oper. Res. 2014, 62, 1358–1376. [Google Scholar] [CrossRef]
  2. Jiang, R.; Guan, Y. Data-driven chance constrained stochastic program. Math. Program. 2016, 158, 291–327. [Google Scholar] [CrossRef]
  3. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar] [CrossRef]
  4. Mohajerin Esfahani, P.; Kuhn, D. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Program. 2018, 171, 115–166. [Google Scholar] [CrossRef]
  5. Li, J.Y.M.; Mao, T. A general Wasserstein framework for data-driven distributionally robust optimization: Tractability and applications. arXiv 2022, arXiv:2207.09403. [Google Scholar] [CrossRef]
  6. Föllmer, H.; Schied, A. Convex measures of risk and trading constraints. Financ. Stoch. 2002, 6, 429–447. [Google Scholar] [CrossRef]
  7. Föllmer, H.; Schied, A. Stochastic Finance: An Introduction in Discrete Time, 4th ed.; Walter de Gruyter: Berlin, Germany, 2016. [Google Scholar]
  8. Delage, E.; Guo, S.; Xu, H. Shortfall risk models when information on loss function is incomplete. Oper. Res. 2022, forthcoming. [Google Scholar] [CrossRef]
  9. Guo, S.; Xu, H. Distributionally robust shortfall risk optimization model and its approximation. Math. Program. 2019, 174, 473–498. [Google Scholar] [CrossRef]
  10. Mao, T.; Wang, R.; Wu, Q. Model Aggregation for Risk Evaluation and Robust Optimization. arXiv 2022, arXiv:2201.06370. [Google Scholar] [CrossRef]
  11. Popescu, I. Robust mean-covariance solutions for stochastic optimization. Oper. Res. 2007, 55, 98–112. [Google Scholar] [CrossRef]
  12. Hanasusanto, G.A.; Kuhn, D. Conic programming reformulations of two-stage distributionally robust linear programs over Wasserstein balls. Oper. Res. 2018, 66, 849–869. [Google Scholar] [CrossRef]
  13. Kuhn, D.; Esfahani, P.M.; Nguyen, V.A.; Shafieezadeh-Abadeh, S. Wasserstein distributionally robust optimization: Theory and applications in machine learning. In Operations Research & Management Science in the Age of Analytics; Informs: Catonsville, MD, USA, 2019; pp. 130–166. [Google Scholar]
  14. Peng, C.; Delage, E. Data-driven optimization with distributionally robust second order stochastic dominance constraints. Oper. Res. 2022. [Google Scholar] [CrossRef]
  15. Blanchet, J.; Murthy, K. Quantifying distributional model risk via optimal transport. Math. Oper. Res. 2019, 44, 565–600. [Google Scholar] [CrossRef]
  16. Kantorovich, L.V.; Rubinshtein, S.G. On a space of totally additive functions. Vestn. St. Petersburg Univ. Math. 1958, 13, 52–59. [Google Scholar]
  17. Artzner, P.; Delbaen, F.; Eber, J.-M.; Heath, D. Coherent measures of risk. Math. Financ. 1999, 9, 203–228. [Google Scholar] [CrossRef]
  18. Bäuerle, N.; Müller, A. Stochastic orders and risk measures: Consistency and bounds. Insur. Math. Econ. 2006, 38, 132–148. [Google Scholar] [CrossRef]
  19. Kallenberg, O.; Kallenberg, O. Foundations of Modern Probability; Springer: Berlin/Heidelberg, Germany, 1997; Volume 2. [Google Scholar]
  20. Mao, T.; Cai, J. Risk measures based on behavioural economics theory. Financ. Stoch. 2018, 22, 367–393. [Google Scholar] [CrossRef]
  21. Shafieezadeh-Abadeh, S.; Kuhn, D.; Esfahani, P.M. Regularization via mass transportation. J. Mach. Learn. Res. 2019, 20, 1–68. [Google Scholar]
  22. Wu, Q.; Li, J.Y.M.; Mao, T. On generalization and regularization via wasserstein distributionally robust optimization. arXiv 2022, arXiv:2212.05716. [Google Scholar] [CrossRef]
  23. Fournier, N.; Guillin, A. On the rate of convergence in wasserstein distance of the empirical measure. Probab. Theory Relat. Fields 2015, 162, 707–738. [Google Scholar] [CrossRef]
  24. Markovitz, H.M. Portfolio selection. J. Financ. 1952, 7, 77–91. [Google Scholar]
  25. Plyakha, Y.; Uppal, R.; Vilkov, G. Why Does an Equal-Weighted Portfolio Outperform Value-and Price-Weighted Portfolios? 2012. Available online: https://ssrn.com/abstract=2724535 (accessed on 16 September 2017).
  26. Luenberger, D.G. Optimization by Vector Space Methods; John Wiley & Sons: Hoboken, NJ, USA, 1997. [Google Scholar]
  27. Bertsekas, D.; Shreve, S.E. Stochastic Optimal Control: The Discrete-Time Case; Athena Scientific: Belmont, MA, USA, 1996; Volume 5. [Google Scholar]
Figure 1. MSE with respect to λ for different values of q.
Figure 2. The red, blue, and green curves represent the MSE under the SW-DD-DRO, the least-squares methods, and the ridge regression, respectively.
Figure 3. Cumulative return curves under different strategies.
Li, R.; Lv, W.; Mao, T. Shortfall-Based Wasserstein Distributionally Robust Optimization. Mathematics 2023, 11, 849. https://doi.org/10.3390/math11040849
