**Optimization Problem 4**

• *Step 1. Find an optimal vector C*<sup>∗</sup> *by minimizing deviation from the mixed-quantile quadrangle:*

$$\min\_{\mathbf{C}\in\mathbb{R}^m} \mathcal{D}(Z\_0(\mathbf{C})) $$

• *Step 2. Assign:*

$$
C\_0^\* \in \mathcal{S}(Z\_0(\mathbf{C}^\*))
$$

Corollaries 1 and 2 can be used for constructing a linear regression for estimating CVaR. Let *Y* be a random vector of factors for estimating the random value *V*. We assume that the linear regression function *f*(*Y*) = *C*<sub>0</sub> + *C*<sup>T</sup>*Y* approximates the CVaR of *V*, where *C*<sub>0</sub> ∈ R and *C* ∈ R<sup>*m*</sup> are the variables of the linear regression. The residuals are denoted by *Z*(*C*<sub>0</sub>, *C*) = *V* − *C*<sub>0</sub> − *C*<sup>T</sup>*Y* and *Z*<sub>0</sub>(*C*) = *V* − *C*<sup>T</sup>*Y*.
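As a concrete illustration of these definitions, here is a minimal sketch with hypothetical data and coefficients (the sizes, factor loadings, and noise model are all illustrative, not from the case study below). It computes the residuals *Z*(*C*<sub>0</sub>, *C*) and *Z*<sub>0</sub>(*C*) and an empirical CVaR of *Z*<sub>0</sub>(*C*) for a sample of equally probable observations:

```python
# Residual definitions with hypothetical data:
# Z(C0, C) = V - C0 - C^T Y and Z0(C) = V - C^T Y, for nu equally
# probable observations. The empirical CVaR_alpha of such a sample is
# the average of the largest ceil(nu * (1 - alpha)) atoms.
import numpy as np

rng = np.random.default_rng(0)
nu, m = 1000, 4                          # sample size and number of factors
Y = rng.normal(size=(nu, m))             # observations of the factor vector Y
true_C = np.array([0.3, 0.2, -0.1, 0.4])
V = 0.5 + Y @ true_C + rng.normal(size=nu)

C0, C = 0.5, true_C                      # hypothetical regression coefficients
Z = V - C0 - Y @ C                       # residual Z(C0, C)
Z0 = V - Y @ C                           # residual Z0(C); note Z = Z0 - C0

alpha = 0.9
k = int(np.ceil(nu * (1 - alpha)))       # number of atoms in the alpha-tail
cvar_Z0 = np.sort(Z0)[-k:].mean()        # empirical CVaR_alpha(Z0(C))
```

Here *Z*<sub>0</sub>(*C*) equals 0.5 plus standard normal noise, so the empirical CVaR at α = 0.9 should be close to its theoretical value of about 2.25.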

Further, we provide a lemma about linear regression problems based on Corollary 1 with the Set 1 of parameters. The main statement here is that Optimization Problems 3 and 4 for the mixed-quantile quadrangle can be used to solve linear regression problems for estimating CVaR. This is the case because the CVaR and mixed-quantile quadrangles have the same Statistic and Deviation.

**Lemma 4.** *Let the residual random value Z*(*C*<sub>0</sub>, *C*) = *V* − *C*<sub>0</sub> − *C*<sup>T</sup>*Y be discretely distributed with* ν *equally probable atoms. Let us consider the CVaR quadrangle with error* E<sub>α</sub>(*Z*(*C*<sub>0</sub>, *C*))*, deviation* D<sub>α</sub>(*Z*<sub>0</sub>(*C*))*, and statistic* S<sub>α</sub>(*Z*<sub>0</sub>(*C*<sup>∗</sup>))*. Let us also consider the mixed-quantile quadrangle with error* E(*Z*(*C*<sub>0</sub>, *C*))*, deviation* D(*Z*<sub>0</sub>(*C*))*, and statistic* S(*Z*<sub>0</sub>(*C*<sup>∗</sup>))*, with parameters defined by Set 1 and r* = ν − να + 1*,* λ<sub>*k*</sub> = *p*<sub>να−1+*k*</sub>*,* α<sub>*k*</sub> = γ<sub>να−1+*k*</sub>*, k* = 1, ... , *r. Then, Optimization Problems 1–4 are equivalent, i.e., the sets of optimal vectors of these optimization problems coincide. Moreover, let C*<sup>∗</sup><sub>0</sub>, *C*<sup>∗</sup> *be a solution vector of the equivalent Optimization Problems 1–4. Then:*

$$C\_0^\* = \mathrm{CVaR}\_\alpha(Z\_0(\mathbf{C}^\*))$$

$$\overline{\mathcal{E}}\_\alpha(Z(C\_0^\*, \mathbf{C}^\*)) = \mathcal{E}(Z(C\_0^\*, \mathbf{C}^\*)) = \mathcal{D}(Z\_0(\mathbf{C}^\*)) = \overline{\mathcal{D}}\_\alpha(Z\_0(\mathbf{C}^\*))$$

**Proof.** This lemma is a direct corollary of the decomposition Theorem 1 and Corollary 1 of Lemma 1. Indeed, Corollary 1 implies that Optimization Problems 2 and 4 are equivalent. Further, the decomposition theorem implies that Optimization Problems 1 and 2, as well as Optimization Problems 3 and 4, are equivalent. □
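For concreteness, when να is an integer, the intercept formula in Lemma 4 reduces to a tail average of the ordered residual atoms. This is a standard fact about discrete CVaR, stated here for readability; it is not part of the lemma:

$$\mathrm{CVaR}\_\alpha(Z\_0(\mathbf{C}^\*)) = \frac{1}{\nu(1-\alpha)}\sum\_{i=\nu\alpha+1}^{\nu} z\_{(i)}$$

where *z*<sub>(1)</sub> ≤ ... ≤ *z*<sub>(ν)</sub> denote the ordered atoms of *Z*<sub>0</sub>(*C*<sup>∗</sup>).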

Further, we provide a lemma about linear regression problems based on Corollary 2 with the Set 2 of parameters. The main statement is that Step 1 of Optimization Problem 4 for the mixed-quantile quadrangle can be used to solve the linear regression problem for estimating CVaR. This is the case because the CVaR and mixed-quantile quadrangles have the same Deviation. After obtaining the vector of coefficients *C*<sup>∗</sup>, the intercept is calculated as *C*<sup>∗</sup><sub>0</sub> = *CVaR*<sub>α</sub>(*Z*<sub>0</sub>(*C*<sup>∗</sup>)).

**Lemma 5.** *Let the residual random value Z*(*C*<sub>0</sub>, *C*) = *V* − *C*<sub>0</sub> − *C*<sup>T</sup>*Y be discretely distributed with* ν *equally probable atoms. Let the mixed-quantile quadrangle with deviation* D(*Z*<sub>0</sub>(*C*)) *be defined by the parameters of Set 2 and r* = ν − να + 1*,* λ<sub>*k*</sub> = *q*<sub>να−2+*k*</sub>*,* α<sub>*k*</sub> = β<sub>να−2+*k*</sub>*, k* = 1, ... , *r. Then, C*<sup>∗</sup><sub>0</sub>, *C*<sup>∗</sup> *is a solution of Optimization Problem 1 if and only if C*<sup>∗</sup><sub>0</sub>, *C*<sup>∗</sup> *is a solution of the following two-step procedure:*

• *Step 1. Find an optimal vector C*<sup>∗</sup> *by minimizing deviation from the mixed-quantile quadrangle:*

$$\min\_{\mathbf{C}\in\mathbb{R}^m} \mathcal{D}(Z\_0(\mathbf{C})) $$

• *Step 2. Calculate C*<sup>∗</sup><sub>0</sub> = *CVaR*<sub>α</sub>(*Z*<sub>0</sub>(*C*<sup>∗</sup>)).

**Proof.** This lemma is a direct corollary of the decomposition Theorem 1 and Corollary 2 of Lemma 2. Indeed, Corollary 2 implies that Optimization Problems 2 and 4 are equivalent. Further, since the deviations of the CVaR and mixed-quantile quadrangles coincide, we can use Step 1 to calculate the optimal coefficients *C*<sup>∗</sup>. Finally, the intercept is calculated as *C*<sup>∗</sup><sub>0</sub> = *CVaR*<sub>α</sub>(*Z*<sub>0</sub>(*C*<sup>∗</sup>)) because CVaR is the statistic in the CVaR quadrangle. □

For the Set 2 of parameters, the deviations in the CVaR and mixed-quantile quadrangles coincide. The two-step procedure in Optimization Problem 4 can therefore be used to solve linear regression problems with Set 2 parameters for the mixed-quantile deviation. Minimization of the Rockafellar error with the Set 2 of parameters also results in a correct vector *C*<sup>∗</sup>. However, the CVaR statistic is only one element of the (set-valued) statistic of the mixed-quantile quadrangle. Therefore, the optimization of the Rockafellar error with the Set 2 of parameters may lead to a wrong value of the intercept *C*<sup>∗</sup><sub>0</sub>. This potential incorrectness is fixed by assigning *C*<sup>∗</sup><sub>0</sub> = *CVaR*<sub>α</sub>(*Z*<sub>0</sub>(*C*<sup>∗</sup>)).
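The two-step procedure above can be sketched numerically. The following is a minimal illustration on synthetic data, not the paper's PSG implementation: Step 1 minimizes the CVaR deviation D<sub>α</sub>(*Z*<sub>0</sub>(*C*)) = CVaR<sub>α</sub>(*Z*<sub>0</sub>(*C*) − E[*Z*<sub>0</sub>(*C*)]) via the standard linear programming reformulation of CVaR, using `scipy.optimize.linprog`; Step 2 assigns *C*<sup>∗</sup><sub>0</sub> = CVaR<sub>α</sub>(*Z*<sub>0</sub>(*C*<sup>∗</sup>)). The helper `cvar` assumes equally probable atoms.

```python
# Minimal sketch of the two-step CVaR regression (synthetic data, not the
# paper's PSG code). Step 1: minimize D_alpha(Z0(C)) = CVaR_alpha(Z0(C) -
# E[Z0(C)]) via the LP reformulation of CVaR. Step 2: C0* = CVaR_alpha(Z0(C*)).
import numpy as np
from scipy.optimize import linprog

def cvar(z, alpha):
    # Empirical CVaR_alpha of equally probable atoms: average of the
    # largest ceil(nu * (1 - alpha)) values.
    z = np.sort(np.asarray(z, dtype=float))
    k = int(np.ceil(len(z) * (1.0 - alpha)))
    return z[-k:].mean()

def cvar_regression(Y, v, alpha):
    nu, m = Y.shape
    # Centering: Z0(C) - E[Z0(C)] = (v - mean(v)) - (Y - mean(Y)) C.
    vc = v - v.mean()
    Yc = Y - Y.mean(axis=0)
    # Variables x = (C, t, u); minimize t + sum(u) / ((1 - alpha) * nu)
    # subject to u_i >= vc_i - Yc_i C - t and u_i >= 0.
    c = np.concatenate([np.zeros(m), [1.0],
                        np.full(nu, 1.0 / ((1.0 - alpha) * nu))])
    A_ub = np.hstack([-Yc, -np.ones((nu, 1)), -np.eye(nu)])
    res = linprog(c, A_ub=A_ub, b_ub=-vc,
                  bounds=[(None, None)] * (m + 1) + [(0, None)] * nu)
    C = res.x[:m]                       # Step 1: optimal slope vector C*
    C0 = cvar(v - Y @ C, alpha)         # Step 2: intercept C0* = CVaR_alpha(Z0(C*))
    return C0, C

# Noiseless check: for v = 1 + 2y the deviation vanishes at C = 2 (deviation
# is insensitive to constants), and Step 2 recovers CVaR of the constant
# residual Z0(C*) = 1.
Y = np.linspace(-1.0, 1.0, 20).reshape(-1, 1)
v = 1.0 + 2.0 * Y[:, 0]
C0, C = cvar_regression(Y, v, alpha=0.75)
```

Because the deviations of the CVaR and mixed-quantile quadrangles coincide for the parameter sets above, the same *C*<sup>∗</sup> also solves the mixed-quantile deviation minimization; only the intercept requires the explicit CVaR assignment of Step 2.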

### **6. Case Study: Estimation of CVaR with Linear Regression and Style Classification of Funds**

The case study described in this section is posted online (see Case Study (2016)). The codes and data are available for downloading and verification. Every optimization problem is presented in three formats: Text, MATLAB, and R. Calculations were done on a PC with a 3.14 GHz processor.

We applied CVaR regression to the return-based style classification of a mutual fund. We regress a fund return on several indices used as explanatory factors. The estimated coefficients represent the fund's style with respect to each of the indices.

A similar problem with a standard regression based on the mean squared error was considered by Carhart (1997) and Sharpe (1992). They estimated the conditional expectation of a fund return distribution (under the condition that a realization of explanatory factors is observed). Bassett and Chen (2001) extended this approach and conducted style analysis of quantiles of the return distribution. This extension is based on the quantile regression suggested by Koenker and Bassett (1978). The Case Study (2014), "Style Classification with Quantile Regression," implemented this approach and applied quantile regression to the return-based style classification of a mutual fund.

For the numerical implementation of CVaR linear regression, we used the Portfolio Safeguard (2018) package. Portfolio Safeguard (PSG) can solve nonlinear and mixed-integer nonlinear optimization problems. A special feature of PSG is that it includes precoded nonlinear functions: CVaR2 error (cvar2\_err) and CVaR2 deviation (cvar2\_dev) from the CVaR quadrangle, Rockafellar error (ro\_err) from the mixed-quantile quadrangle, and CVaR deviation (cvar\_dev) from the quantile quadrangle.

We implemented the following equivalent variants of CVaR regression:


PSG automatically converts the analytic problem formulations to mathematical programming codes and solves them. We included in Appendix A the convex and linear programming problems for the minimization of the Rockafellar error with the Set 1 of parameters. These formulations are provided for verification purposes and can be implemented with standard commercial software. For instance, the linear programming formulation can be implemented with the Gurobi optimization package. If Gurobi is installed on the computer, PSG can use the Gurobi code as a subsolver. With the CARGRB solver in PSG, by setting the linearize option to 1, it is possible to solve the linear programming problem with Gurobi. However, this conversion degrades performance compared to the default PSG solver VAN. For small problems the difference is not noticeable, but for problems with a large number of scenarios (e.g., with 10<sup>8</sup> observations), the standard PSG solver VAN dramatically outperforms the Gurobi linear programming implementation. In this case, Gurobi may not even start on a small PC because of a shortage of memory. Nevertheless, if the number of observations is small (e.g., 10<sup>3</sup>) and the number of factors is very large (e.g., 10<sup>7</sup>), it is recommended that the linear programming formulation be used.

We regressed the CVaR of the return distribution of the Fidelity Magellan Fund on the explanatory variables: Russell 1000 Growth Index (RLG), Russell 1000 Value Index (RLV), Russell 2000 Value Index (RUJ), and Russell 2000 Growth Index (RUO). The dataset includes 1264 historical daily returns of the Magellan Fund and the indices, which were downloaded from the Yahoo Finance website. The data (design matrix for the regression) is posted on the Case Study (2016) website.

The CVaR regression was done with the confidence levels α = 0.75 and α = 0.9. Calculation results are in Tables 1 and 2, respectively. Here is the description of the columns of the tables:


**Table 1.** Optimization outputs: estimating CVaR with the linear regression, α = 0.75.


**Table 2.** Optimization outputs: estimating CVaR with the linear regression, α = 0.9.


Tables 1 and 2 show calculation results for the considered equivalent problems. We observe that regression coefficients coincide for all problems in Tables 1 and 2. This confirms the correctness of theoretical results and the numerical implementation. Also, we want to point out that the regression coefficients are quite similar for α = 0.75 (Table 1) and α = 0.9 (Table 2).

The calculation time in the majority of cases was around 0.02–0.04 s, except for the case with the mixed CVaR deviation for Set 1, which took 0.11 s. The PSG calculation times were quite low because the solver "knows" the analytical expressions for the functions and can take advantage of this knowledge.
