Article

Estimating Asset Pricing Models in the Presence of Cross-Sectionally Correlated Pricing Errors

1 Department of Computer Science and Engineering, Sogang University, Seoul 04107, Republic of Korea
2 BlueAlpha Advisors, Seoul, Republic of Korea
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(21), 3442; https://doi.org/10.3390/math12213442
Submission received: 25 September 2024 / Revised: 24 October 2024 / Accepted: 29 October 2024 / Published: 4 November 2024
(This article belongs to the Special Issue Advances in Machine Learning Applied to Financial Economics)

Abstract

In this study, we propose an adversarial learning approach to the asset pricing model estimation problem, which aims to find estimates of factors and loadings that capture time-series covariations while minimizing the worst-case cross-sectional pricing errors. The proposed estimator is defined by a novel min-max optimization problem for which finding a solution is known to be difficult. This contrasts with other related estimators that admit a well-defined analytic solution but do not effectively account for correlations among the pricing errors. To this end, we propose an approximate algorithm based on an alternating optimization procedure and empirically demonstrate that the proposed adversarial estimation framework outperforms other existing factor models, especially when the explanatory power of the pricing model is limited.

1. Introduction

In this paper, we study one of the central problems in the field of finance, namely the estimation of a multifactor model for asset pricing, which explains how the prices of various assets, e.g., stocks and bonds, are determined by a number of factors that is small compared to the number of assets. In this section, we explain the importance of the estimation problem and discuss how it can be cast as a problem of unsupervised learning. To begin with, we describe factor pricing models in their simplest form.

1.1. Factor Pricing Models

Let us consider a bivariate linear mapping
E(R^{e_i}) = β_i E(f) + α_i  for i ∈ {1, 2, …, N}  (1)
from a pair of real numbers (α_i, β_i) to a real number E(R^{e_i}), where the natural number N denotes the number of assets and E(·) denotes the expectation operator applied to the random variables R^{e_i} and f, which represent the excess return of asset i and the pricing factor, respectively. The second term in Equation (1), α_i, represents the pricing error. Moreover, let us consider the N constraints where α_i in Equation (1) equals zero for all i, i.e.,
α_i = 0  for i ∈ {1, 2, …, N}.  (2)
Here, we write the constraint on the α_i’s explicitly instead of substituting it into Equation (1) in order to clarify that finding a model with small pricing errors, the α_i’s, is one of the two objectives that the estimator proposed in this paper pursues. Given a set of realizations R_1^{e_i}, R_2^{e_i}, …, R_T^{e_i} ∈ ℝ of the random variable R^{e_i} generated at times t ∈ {1, 2, …, T} for every i ∈ {1, 2, …, N}, the estimation problem of our interest consists of finding β_i and E(f) that satisfy Equations (1) and (2) as closely as possible.
The factor pricing model, described by Equations (1) and (2), postulates that asset i’s expected excess return depends linearly on its factor loading β_i. The model shows how the expected excess return of an arbitrary asset is valued based on the asset’s factor loading. If we find that the α_i’s differ significantly from 0, we can draw either of two conclusions. First, the factor pricing model is wrong; for example, the underlying factor f is poorly chosen. This leads us to conclude that a better pricing factor should be chosen or that more pricing factors should be added to the model, i.e., the model needs to be improved. Second, the world is wrong. This means that the expected excess returns are not explained by the model because the corresponding assets are “mis-priced”, which might present practical trading opportunities for shrewd investors [1]. This is one of many reasons that correctly estimating the underlying pricing factor is a central problem in the field of finance, one that has led to more than 300 published pricing factor candidates [2]. Furthermore, the exploration of new factors driving asset returns continues to be an active area of interest in both academic research and practical applications [3,4,5,6,7].
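To make the estimation task concrete, the following sketch simulates a single-factor world and recovers (α_i, β_i) for each asset by ordinary least squares time-series regressions. All data and parameter values here are synthetic and purely illustrative; this is not the estimator proposed later in the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
T, N = 60, 5

# Synthetic single-factor data: excess returns driven by one factor
# plus idiosyncratic noise (all numbers are illustrative).
f = rng.normal(0.5, 2.0, size=T)             # factor realizations f_1, ..., f_T
beta = rng.uniform(0.5, 1.5, size=N)         # true loadings beta_i
R = np.outer(f, beta) + 0.1 * rng.normal(size=(T, N))  # T x N excess returns

# OLS time-series regression of each asset on the factor recovers
# (alpha_i, beta_i); alpha_i is the estimated pricing error of Equation (1).
X = np.column_stack([np.ones(T), f])         # design matrix [1, f_t]
coef, *_ = np.linalg.lstsq(X, R, rcond=None)
alpha_hat, beta_hat = coef[0], coef[1]
```

Because the simulated data satisfy the constraint (2) exactly, the estimated α̂_i cluster near zero while the β̂_i recover the true loadings.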

1.2. Definition of Multifactor Models

In this paper, we consider multifactor models for the following reason. The factor pricing model described above can be generalized by adding more pricing factors, and it is more sensible to use K (>1) pricing factors instead of a single factor to explain the expected excess returns of a large number of arbitrary assets [8,9]. The multifactor models amount to a (K+1)-variate linear mapping
E(R^{e_i}) = Σ_{k=1}^{K} β_{i,k} E(f_k) + α_i  for i ∈ {1, 2, …, N}  (3)
from a tuple of real numbers (α_i, β_{i,1}, β_{i,2}, …, β_{i,K}) to a real number E(R^{e_i}), where the natural number K denotes the number of pricing factors used to model the expected excess returns across assets, or the cross-section of expected excess returns. Usually, K is set to be much smaller than N.
Let f = [f_1, …, f_K]′ and β_i = [β_{i,1}, …, β_{i,K}]′ be K × 1 column vectors of the pricing factors and the factor loadings, respectively (we let A′ denote the transpose of a matrix A). If Equations (2) and (3) are satisfied and β_i is computed as
β_i = Σ_f^{−1} cov(f, R^{e_i})  for i ∈ {1, 2, …, N}  (4)
where Σ_f is the K × K covariance matrix of f and cov(f, R^{e_i}) denotes the K × 1 column vector whose k-th entry is cov(f_k, R^{e_i}), then we say that there is a multifactor model with factors f_1, f_2, …, f_K [10]. We refer to the multifactor model described by Equations (2)–(4) as the K factor model. As an illustrative example, we present a specific type of multifactor model in Section 2 in order to clarify the motivation of our work.
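Equation (4) can be checked numerically. In the sketch below (synthetic factors and returns, illustrative numbers only), the loadings recovered from sample moments via β_i = Σ_f^{−1} cov(f, R^{e_i}) agree with the loadings used to generate the data; this is a sanity check of the definition, not the estimator proposed in this paper.

```python
import numpy as np

rng = np.random.default_rng(1)
T, N, K = 5000, 4, 2

# Illustrative K-factor data with known loadings (synthetic numbers).
F = rng.normal(size=(T, K)) @ np.array([[1.0, 0.3],
                                        [0.0, 1.0]])   # correlated factors
B = rng.uniform(-1, 1, size=(N, K))                    # true loadings
R = F @ B.T + 0.05 * rng.normal(size=(T, N))

# Equation (4): beta_i = Sigma_f^{-1} cov(f, R^{e_i}), evaluated with
# sample moments for all assets at once.
Sigma_f = np.cov(F, rowvar=False)          # K x K factor covariance
C = np.cov(F.T, R.T)[:K, K:]               # K x N cross-covariances cov(f_k, R^{e_i})
B_hat = np.linalg.solve(Sigma_f, C).T      # N x K estimated loadings
```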

1.3. Unsupervised Learning Problem for the Multifactor Models

The main objective of this study is to propose a new method for estimating the pricing factors and the factor loadings of the multifactor model based on a given set of training data D = {R_t^{e_i} : t = 1, 2, …, T, i = 1, 2, …, N}. The estimator aims to provide the T estimates of the pricing factors f̂_1, f̂_2, …, f̂_T ∈ ℝ^{K×1} and the N estimates of the factor loadings β̂_1, β̂_2, …, β̂_N ∈ ℝ^{K×1} that fit the conditions in Equations (2)–(4) well. Note that the estimator studied in this paper has nothing to do with predicting, say, an expected excess return, from unseen data. In other words, the estimation problem studied in this paper is an unsupervised learning problem.

1.4. Notation

We summarize the notation used in this paper as follows. Let N, T, and K be the number of assets, the number of time-series observations, and the number of factors, respectively. We assume that K < N and K < T throughout the paper. Let m, n ∈ ℕ. I_n is the n × n identity matrix. 1_n is the n × 1 column vector of ones. For a matrix A ∈ ℝ^{m×n}, we let rank(A) denote the rank of A, A′ its transpose, ‖A‖_F its Frobenius norm, and vec(A) ∈ ℝ^{mn×1} the vectorization of A. For a square matrix S, tr(S) denotes the trace of S. For matrices A and B, A ⊗ B denotes the Kronecker product of A and B. We let S₊ⁿ denote the set of all n × n symmetric positive-semidefinite matrices. For V ∈ S₊ⁿ, we define ‖·‖_V : ℝ^{n×1} → [0, ∞) by ‖x‖_V = √(x′Vx) for all x ∈ ℝ^{n×1}. The set of training data D is compactly represented by a T × N matrix X whose (t, i)-th entry is R_t^{e_i}. We define the T × T matrix P₁ = (1/T) 1_T 1_T′, which is the projection matrix onto the linear subspace spanned by 1_T, and the T × T matrix M₁ = I_T − P₁, which annihilates the component parallel to that subspace, i.e., 1_T′(M₁x) = 0 for all x ∈ ℝ^{T×1}. We widely use the fact that P₁ and M₁ are symmetric and idempotent.
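The two projection matrices carry most of the algebra later in the paper, so a small numerical check is worthwhile. The sketch below constructs P₁ and M₁ and verifies the properties stated above (mean replication, mean annihilation, symmetry, and idempotence).

```python
import numpy as np

T = 6
ones = np.ones((T, 1))

# P1 projects onto span{1_T}; M1 = I_T - P1 removes the time-series mean.
P1 = ones @ ones.T / T
M1 = np.eye(T) - P1

x = np.arange(T, dtype=float)
assert np.allclose(P1 @ x, np.full(T, x.mean()))   # P1 x replicates the mean
assert np.isclose((M1 @ x).sum(), 0.0)             # 1_T'(M1 x) = 0
# Both matrices are symmetric and idempotent, as used throughout the paper.
assert np.allclose(M1 @ M1, M1) and np.allclose(P1 @ P1, P1)
assert np.allclose(P1, P1.T) and np.allclose(M1, M1.T)
```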

2. Motivation

In this section, we elucidate a critical issue that motivates the estimator proposed in this paper, supported by preliminary experiments conducted using real-world data [11]. We consider the three factor model described by the time-series regression
R_t^{e_i} = α_i + β_{i,Mkt−RF} R_t^{Mkt−RF} + β_{i,1} f_t^1 + β_{i,2} f_t^2 + ε_t^i  (5)
of excess returns R_t^{e_i} on the pricing factors (R_t^{Mkt−RF}, f_t^1, f_t^2) for t ∈ {1, 2, …, T}. We fix one of the three factors as the market’s excess return R_t^{Mkt−RF}, which represents the entire US stock market, and choose the remaining two from five candidates, {SMB, HML, CMA, RMW, Mom}, which accounts for 10 combinations. We consider N = 25 portfolios formed on size and book-to-market equity ratio, often called the 5 × 5 Size-B/M portfolios, as test assets indexed by i ∈ {1, 2, …, 25}. It is known that they are well explained when (f^1, f^2) = (SMB, HML) [8]. For each i, we run the time-series regression from January 2017 to December 2021 (60 months) and obtain estimates of the factor loadings (β̂_{i,Mkt−RF}, β̂_{i,1}, β̂_{i,2}), the pricing error α̂_i, and the residuals ε̂_t^i.
A caveat is that the estimation problem described in the previous section does not exactly align with this time-series regression analysis. In the former case, both the factors and the regression coefficients are estimated, whereas in the latter, predefined factors are used and only the coefficients are estimated. Nevertheless, by fixing the factors and exploiting the established knowledge of the explanatory power of various factor combinations, we gain a more intuitive understanding of the essential characteristics that “appropriate” factors, and consequently factor models, necessarily possess. These characteristics form the basis for designing the criterion of the proposed factor model estimator.
Figure 1 (top) illustrates the absolute values of the sample correlation coefficients between ε̂_1^i, …, ε̂_T^i and ε̂_1^j, …, ε̂_T^j, indicated by the brightness of the (i, j)-th cell according to the color bar. Figure 1 (bottom) presents the average absolute values of the sample correlation coefficients and Bartlett’s sphericity test statistic [12] for the hypothesis that the correlation matrix equals the identity matrix. The figure reveals that the choice (f^1, f^2) = (SMB, HML) indeed results in the smallest correlation. This is shown in the top-left subfigure of Figure 1 (top), which has the darkest surface, and is also observed in the smallest values of the leftmost two bars in Figure 1 (bottom). The figure demonstrates that excluding either SMB or HML leads to a rise in cross-sectional correlation, while excluding both SMB and HML increases the cross-sectional correlations further.
These observations suggest that “appropriately” chosen pricing factors regress out covariations in the test assets’ returns well, making the unexplained part of the test assets, i.e., α̂_i + ε̂_t^i, less likely to be correlated across assets. Conversely, when the chosen pricing factors are “inappropriate” for explaining the test assets, it becomes more probable that α̂_i + ε̂_t^i is cross-sectionally correlated.
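The diagnostic behind Figure 1 can be sketched as follows. With synthetic residuals (illustrative only), an omitted common factor leaking into the residuals raises the average absolute off-diagonal correlation, mimicking the effect of dropping SMB or HML.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N = 60, 25

def avg_abs_corr(E):
    """Average absolute off-diagonal sample correlation of T x N residuals."""
    C = np.corrcoef(E, rowvar=False)
    mask = ~np.eye(C.shape[0], dtype=bool)
    return np.abs(C[mask]).mean()

# "Appropriate" factors: residuals close to cross-sectionally independent.
E_good = rng.normal(size=(T, N))
# "Inappropriate" factors: an omitted common factor leaks into the residuals.
common = rng.normal(size=(T, 1))
E_bad = E_good + 1.5 * common @ rng.uniform(0.5, 1.0, size=(1, N))

assert avg_abs_corr(E_bad) > avg_abs_corr(E_good)
```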
In this context, evaluating an estimated factor model typically involves testing the null hypothesis H 0 : α 1 = = α N = 0 using a test statistic of the form:
q = c α̂′ Ŵ^{−1} α̂  (6)
where α̂ = [α̂_1, …, α̂_N]′, Ŵ ∈ S₊^N is the estimated residual covariance matrix, and c is a positive constant independent of α̂ and Ŵ [13,14]. Evaluation of the model is conducted by checking whether |q| ≤ δ for a predefined threshold δ, in which case it is concluded that the factor model is correctly estimated. Conversely, if |q| > δ, it is inferred that the factor model is not correctly estimated. Therefore, it is important to identify a model capable of achieving a small value of |q| that passes the test for the null H_0. However, as shown in Figure 1, Ŵ can deviate significantly from I_N, especially when “inappropriate” factors are chosen. In such cases, the value of q computed under the assumption Ŵ = I_N may differ unpredictably from the value obtained without this assumption. This observation motivates the incorporation of q and Ŵ into the estimation criterion such that models with smaller |q| are preferred.
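A minimal numerical sketch of Equation (6) follows. The constant c and the matrix Ŵ here are placeholders chosen for illustration; in the cited tests, c depends on the sample size and factor moments, and Ŵ is the estimated residual covariance matrix.

```python
import numpy as np

rng = np.random.default_rng(3)
N = 25

# Placeholder ingredients of q = c * alpha' W^{-1} alpha (Equation (6)).
alpha_hat = 0.01 * rng.normal(size=N)     # illustrative pricing errors
A = rng.normal(size=(N, N))
W_hat = A @ A.T / N + np.eye(N)           # a generic positive-definite W
c = 1.0                                   # placeholder constant

q = c * alpha_hat @ np.linalg.solve(W_hat, alpha_hat)

# Assuming W = I_N instead can change the statistic substantially
# when the true W is far from the identity.
q_identity = c * alpha_hat @ alpha_hat
```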

3. Related Work and Our Contributions

The arbitrage pricing theory of Ross [15] pioneered a line of research in which a statistical factor structure in the covariances of excess returns between assets is considered first, from which it is derived that the cross-section of expected excess returns is explained by the multifactor model (Equations (2)–(4)). The estimation problem of interest to us, in which both the pricing factors and the factor loadings are latent and must be estimated, resorts to principal component analysis (PCA), justified by Chamberlain and Rothschild [16] and Connor and Korajczyk [17], which has been the dominant form of estimator in the literature, as evidenced by the large number of publications in recent years [18,19,20,21,22,23,24].
The conventional PCA due to Chamberlain and Rothschild [16] and Connor and Korajczyk [17] removes the sample mean from the data by using X̃ = M₁X instead of X and applies the eigen-decomposition to Σ̂ = (1/T) X′M₁X = (1/T) X̃′X̃ so as to explain as much time-series variation in the de-meaned data X̃ as possible. This is explained in Box A in Figure 2, wherein the objective is to find the factors F̃ ∈ ℝ^{T×K} and the loadings Λ ∈ ℝ^{N×K} that well approximate X̃. The conventional PCA does not explicitly address the pricing error information represented in Boxes B and C.
The conventional PCA discussed above assumes that the mean of the data matrix X is equal to zero. The risk-premium PCA (RP-PCA) was proposed by removing this assumption, based on the observation that it might be restrictive if the means carry information about the factor structure. The RP-PCA differs from regularized PCA estimators employed in other applications, e.g., low-rank matrix approximation [25,26,27] and matrix completion [28,29,30], in the sense that it adds an economically motivated regularization term tailored to account for cross-sectional pricing errors.
The RP-PCA explicitly takes into account Box B in Figure 2, wherein the risk premia of the assets and the factor risk premia are compared. In other words, Box B aims to find F̄ ∈ ℝ^{K×1} and Λ ∈ ℝ^{N×K} that minimize the pricing errors, which are estimated by the difference between X̄ and ΛF̄, measured by ‖X̄ − ΛF̄‖_{I_N}. By simultaneously considering the time-series variations (Box A) and the cross-section of pricing errors (Box B) in the framework of a regularized minimization problem, Lettau and Pelger [22,31] showed that their estimator can find pricing factors that cannot be detected by the conventional PCA and that it can estimate factors more efficiently than the conventional PCA in the presence of “weak” factors.
Our factor model estimator is developed to address a weakness inherent in the RP-PCA. Specifically, a crucial problem of the RP-PCA is its lack of explicit consideration for the real-world scenario, characterized by the correlated pricing errors observed in Section 2. This circumstance can result in substantial differences between ‖X̄ − ΛF̄‖_{I_N} and ‖X̄ − ΛF̄‖_V, with the latter being the more suitable distance function for pricing model estimation when V equals the precision matrix of the pricing errors; cf. the definition of the test statistic q in Equation (6). Despite its significance in handling real financial data, however, the incorporation of ‖·‖_V for a general V into pricing model estimation has not been explored in the literature.
A significant challenge in utilizing the precision matrix arises from the estimation errors inherent in the covariance structure matrix V, which must itself be estimated. To tackle this issue, we propose a method that integrates all potential matrices V within predefined ranges into the estimation process. This is accomplished by formulating a novel min-max optimization problem designed to absorb the estimation errors linked to V, ultimately leading to more robust estimates in the factor model estimation framework.

Our Contributions

First, we propose a new estimator for the pricing factors and the associated factor loadings, defined in the framework of adversarial machine learning. To this end, we restate the estimation problem for the asset pricing model as a min-max optimization problem. It is important to mention that the factor model examined in this paper is static, meaning that factor loadings remain constant over time. This static nature simplifies the relationships between factors and asset returns, allowing for a more straightforward analysis of the underlying structure of the data without the need for dynamic adjustments. The estimates of the pricing factors and factor loadings aim to closely approximate the time-series fluctuations in the training set of excess returns, while ensuring that the cross-section of pricing errors remains jointly small, even in the presence of pricing errors correlated across assets. Specifically, we extend an existing PCA-based factor model estimator by allowing the distance to be defined by a seminorm ‖·‖_V for an arbitrary V ∈ S₊^N, which entails the consideration of cross-sectional correlations.
Second, we provide an optimization method that approximately solves the proposed min-max problem. We employ an alternating optimization procedure, which solves the minimization and maximization problems iteratively. In particular, we explain why the minimization part of the problem is challenging to solve, and we prove that the proposed algorithm converges and generates well-defined iterates. By doing so, we introduce a novel computational framework for examining factor pricing model estimators designed to cover the estimation errors inherent in the covariance structure matrix V, which allows for a broader and more general measurement of pricing errors.

4. The Proposed Estimator of the K Factor Models

In this section, we present our proposed estimator of the K factor models, which is defined by a min-max optimization problem. We explain the objective function of the proposed optimization problem and relate it to existing factor model estimators.

4.1. The Proposed Min-Max Problem

Based on the preliminary empirical findings in Section 2, it appears that correlations between pricing errors of distinct assets may not be negligible if inappropriate pricing factors are employed to explain asset excess returns. Drawing on these preliminary results and the concept outlined in Figure 2, we propose a novel estimator of the K factor model. The estimates of F and Λ are determined as solutions to the following min-max problem.
min_{Λ ∈ ℝ^{N×K}, F ∈ ℝ^{T×K}}  max_{V ∈ 𝒱}  ψ(Λ, F, V) ≡ (1/(NT)) ‖X̃ − F̃Λ′‖_F² + (η/N) ‖X̄ − ΛF̄‖_V²  (7)
The objective function ψ : ℝ^{N×K} × ℝ^{T×K} × 𝒱 → [0, ∞) is specified by a non-negative real number η, which we call the regularization parameter, and the set 𝒱 of symmetric positive-semidefinite matrices defined as
𝒱 = {V ∈ ℝ^{N×N} : V ∈ S₊^N, V_l ≤ V ≤ V_u}.  (8)
Here, the inequalities are evaluated element-wise and V l and V u are pre-specified matrices used to handle estimation errors in V.
This min-max optimization problem can be regarded as a zero-sum game, wherein one player, representing the minimization part, aims to uncover the factors and loadings that best explain the time-series variation in asset returns (the first term in Equation (7)) as well as the cross-sectional variation in their means transformed by V (the second term in Equation (7)). For instance, when a fixed matrix V is decomposed via the Cholesky decomposition as V = Q′Q, the cross-section of pricing errors to be minimized can be computed as ‖X̄ − ΛF̄‖_V², which is equivalent to ‖Q(X̄ − ΛF̄)‖².
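The Cholesky identity used above is easy to verify numerically. The sketch below (with a random positive-definite V and a random vector standing in for X̄ − ΛF̄) checks that e′Ve = ‖Qe‖² when V = Q′Q.

```python
import numpy as np

rng = np.random.default_rng(4)
N = 10

# A positive-definite V and an arbitrary pricing-error vector (illustrative).
A = rng.normal(size=(N, N))
V = A @ A.T + np.eye(N)
e = rng.normal(size=N)                 # stands in for X_bar - Lambda F_bar

# ||e||_V^2 = e' V e equals ||Q e||^2 when V = Q'Q.
# numpy's Cholesky gives V = L L', so we take Q = L'.
L = np.linalg.cholesky(V)
Q = L.T
assert np.isclose(e @ V @ e, np.sum((Q @ e) ** 2))
```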
Concurrently, the other player, representing the maximization part, aims to construct a hypothesis test that poses the greatest challenge for its counterpart in generating a good estimate of the pair (Λ̂, F̂). This is accomplished by finding the V that maximizes ‖X̄ − ΛF̄‖_V in Equation (7), i.e., maximizing the pricing errors generated by the factor model currently estimated by its counterpart. The maximization part thus forces the minimization part to uncover a pair (Λ, F) that “works” across a wide range of adverse environments. Therefore, the interaction between the two players drives an iterative process, fostering the discovery of an improved and robust estimator, similar to the “generator” and the “discriminator” in the celebrated generative adversarial network (GAN) [32].
The algorithm to find the estimates, which are defined as a solution to the min-max problem (7), is based on an alternating procedure between the minimization and maximization, similar to the methodology employed in adversarial deep learning approaches, e.g., [32,33]. The reason for employing this alternating method, which yields an approximate solution, is to address the numerical instability highlighted by [34]. By utilizing the alternating procedure, we aim to mitigate the numerical challenges associated with solving such optimization problems. Algorithm 1 presents our proposed estimator. The minimization step, for a fixed V, involves finding the estimates F ^ and Λ ^ by employing the alternating least squares method. This method updates F and Λ iteratively, with one fixed while the other is updated, as explained in the next subsection. The maximization step, for fixed Λ and F, can be reformulated as follows:
max_V tr[(X̄ − ΛF̄)(X̄ − ΛF̄)′V]  s.t.  V ∈ S₊^N, V_l ≤ V ≤ V_u.  (9)
This problem can be solved numerically using a semi-definite programming solver, such as the one provided by [35].
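To see why Problem (9) is tractable, note that its objective is linear in V. As a hedged illustration (an assumption made here purely for exposition, not the general SDP that the paper actually solves), the sketch below restricts V to diagonal matrices with elementwise bounds 0 ≤ V ≤ diag(V_u): a diagonal matrix is positive-semidefinite iff its entries are non-negative, and since S = (X̄ − ΛF̄)(X̄ − ΛF̄)′ has a non-negative diagonal, the linear objective is maximized at the upper bound.

```python
import numpy as np

rng = np.random.default_rng(5)
N = 8

# S = (X_bar - Lambda F_bar)(X_bar - Lambda F_bar)' is PSD, so diag(S) >= 0
# and the objective tr(S V) is linear in V.
e = rng.normal(size=N)                  # stands in for X_bar - Lambda F_bar
S = np.outer(e, e)

# Illustrative special case: V diagonal with bounds 0 <= V <= diag(Vu).
# The linear objective is maximized by pushing each diagonal entry of V
# to its upper bound whenever the matching diagonal entry of S is >= 0.
Vu = rng.uniform(1.0, 2.0, size=N)
V_star = np.diag(np.where(np.diag(S) >= 0.0, Vu, 0.0))

obj = np.trace(S @ V_star)              # equals diag(S) . Vu here
```

The general case, with a full matrix variable and the semidefinite constraint, is handed to an off-the-shelf SDP solver as stated above.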
Algorithm 1: Proposed factor model estimator
[Pseudocode of Algorithm 1 appears as an image in the published version.]

4.2. The Minimization Part

Unlike the maximization part of (7), which is a convex optimization problem, its minimization counterpart is non-convex. This necessitates the development of an algorithm specifically designed to address the challenges of non-convex optimization. We propose a method based on the alternating least squares method as described in lines 5–9 of Algorithm 1. Specifically, let us consider the minimization part of Problem (7) for a fixed V:
min_{Λ, F} ϕ(Λ, F) ≡ (1/(NT)) ‖X̃ − F̃Λ′‖_F² + (η/N) ‖X̄ − ΛF̄‖_V²  (10)
The following reformulation of the objective function ϕ : ℝ^{N×K} × ℝ^{T×K} → [0, ∞),
ϕ(Λ, F) = (1/(NT)) tr[ M₁(X − FΛ′)(X − FΛ′)′M₁ + η P₁(X − FΛ′)V(X − FΛ′)′P₁ ],  (11)
is useful; it is derived in Appendix B.
If ϕ in Equation (11) only retains the first term, i.e., η = 0, then Problem (10) reduces to the conventional PCA estimator, in which the pricing factors and factor loadings are estimated by computing the eigen-decomposition of X′M₁X. If we replace η with 1 + γ and V with I_N, then our proposed objective function ϕ becomes the function ϕ_RP : ℝ^{N×K} × ℝ^{T×K} → [0, ∞) defined by
ϕ_RP(Λ, F) = (1/(NT)) tr[ M₁(X − FΛ′)(X − FΛ′)′M₁ + (1 + γ) P₁(X − FΛ′)(X − FΛ′)′P₁ ]  (12)
for γ ∈ [−1, ∞), which is exactly the objective function that defines the RP-PCA [22,31]. Consequently, our estimator defined by Problem (10), and more generally by (7), subsumes the conventional PCA and the RP-PCA as special cases.
It is difficult to solve Problem (10) due to the presence of the matrix V, which can be an arbitrary matrix in S₊^N. On the other hand, the conventional PCA (or our estimator with η = 0) and RP-PCA estimators can find an exact solution simply by applying the eigen-decomposition to a matrix of the form X′(I_T + (γ/T) 1_T 1_T′)X, because the first-order optimality conditions of their corresponding minimization problems imply that F in Equation (12) can be eliminated using the relation F = XΛ(Λ′Λ)^{−1}. This substitution is impossible for Problem (10) due to the presence of V in the second term in Equation (11). Appendix B offers a detailed explanation of this issue.
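The closed form for the special cases can be sketched as follows. On synthetic data (illustrative only; scaling conventions may differ from [22,31] by a rotation of the factors), the loadings are taken as the top-K eigenvectors of X′(I_T + (γ/T) 1_T 1_T′)X, and the factors are recovered via F = XΛ(Λ′Λ)^{−1}; setting γ = −1 recovers the conventional demeaned PCA, since I_T − (1/T) 1_T 1_T′ = M₁.

```python
import numpy as np

rng = np.random.default_rng(6)
T, N, K, gamma = 120, 20, 3, 10.0

X = rng.normal(size=(T, N))             # synthetic return panel

# Eigen-decomposition of X'(I_T + (gamma/T) 1 1') X; the top-K
# eigenvectors serve as the loading estimates.
ones = np.ones((T, 1))
W = np.eye(T) + (gamma / T) * (ones @ ones.T)
Sigma = X.T @ W @ X / T
eigval, eigvec = np.linalg.eigh(Sigma)  # eigenvalues in ascending order
Lam = eigvec[:, ::-1][:, :K]            # top-K eigenvectors as loadings

# Factors via F = X Lambda (Lambda'Lambda)^{-1}; Lambda is orthonormal here.
F = X @ Lam
```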
In order to find an approximate solution to Problem (10), we employ the alternating least squares method where ϕ ( Λ , F ) is minimized for one variable at a time with the other variable fixed. In the update step for pricing factors, we find the minimum of ϕ ( Λ * , F ) over F with a fixed Λ * by solving
(Λ*′ ⊗ I_T)[I_N ⊗ M₁ + η(V ⊗ P₁)] vec(X) = (Λ*′ ⊗ I_T)[I_N ⊗ M₁ + η(V ⊗ P₁)](Λ* ⊗ I_T) vec(F)  (13)
for F, and in the update step for factor loadings, we find the minimum of ϕ ( Λ , F * ) over Λ with a fixed F * by solving
(I_N ⊗ F*′)[I_N ⊗ M₁ + η(V ⊗ P₁)] vec(X) = (I_N ⊗ F*′)[I_N ⊗ M₁ + η(V ⊗ P₁)](I_N ⊗ F*) vec(Λ′)  (14)
for Λ . Sufficient conditions for existence and uniqueness of solutions to Equations (13) and (14) are given in the following proposition, which is proved in Appendix C.
Proposition 1.
Suppose that V ∈ S₊^N. Then, there exist solutions to Equations (13) and (14). If it is additionally assumed that V is positive-definite, η > 0, and Λ* and F* have full column rank, i.e., rank(Λ*) = rank(F*) = K, then the solutions are unique.
Algorithm 2 summarizes the alternating least squares method, which defines the estimates as Λ̂ = Λ_{n_alt} and F̂ = F_{n_alt}. For the iterates (Λ_n, F_n) generated by the algorithm for any V ∈ S₊^N and η ∈ [0, ∞), the sequence {ϕ(Λ_n, F_n)} is monotonically non-increasing and bounded below by zero, and is therefore convergent. Indeed, the function F ↦ ϕ(Λ*, F) is convex for any Λ*; thus, the F_n that satisfies the first-order optimality condition (Equation (15)) is a global minimum of the convex function F ↦ ϕ(Λ_{n−1}, F). The same holds for the function Λ ↦ ϕ(Λ, F*).
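The update equations (13) and (14) can be implemented directly with Kronecker products, at least for small problems. The sketch below uses random synthetic data (a practical implementation would exploit structure rather than forming the NT × NT matrix) and checks the monotone decrease of ϕ along the iterates numerically.

```python
import numpy as np

rng = np.random.default_rng(7)
T, N, K, eta = 20, 6, 2, 5.0

X = rng.normal(size=(T, N))                   # synthetic data panel
A = rng.normal(size=(N, N))
V = A @ A.T + np.eye(N)                       # a positive-definite V

ones = np.ones((T, 1))
P1 = ones @ ones.T / T                        # projection onto span{1_T}
M1 = np.eye(T) - P1
# Quadratic-form matrix acting on vec(X - F Lambda'), as in Equation (11).
G = np.kron(np.eye(N), M1) + eta * np.kron(V, P1)

def phi(Lam, F):
    v = (X - F @ Lam.T).flatten(order="F")    # vec of the residual matrix
    return v @ G @ v / (N * T)

Lam, F = rng.normal(size=(N, K)), rng.normal(size=(T, K))
vals = [phi(Lam, F)]
vecX = X.flatten(order="F")
for _ in range(10):
    # Equation (13): exact minimization over F for fixed Lambda,
    # using vec(F Lambda') = (Lambda kron I_T) vec(F).
    B = np.kron(Lam, np.eye(T))
    F = np.linalg.solve(B.T @ G @ B, B.T @ G @ vecX).reshape((T, K), order="F")
    # Equation (14): exact minimization over Lambda for fixed F,
    # using vec(F Lambda') = (I_N kron F) vec(Lambda').
    C = np.kron(np.eye(N), F)
    Lam = np.linalg.solve(C.T @ G @ C, C.T @ G @ vecX).reshape((K, N), order="F").T
    vals.append(phi(Lam, F))
```

Each step is an exact minimization of a convex quadratic, so the recorded objective values decrease monotonically, consistent with the convergence argument above.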
Algorithm 2: Alternating minimization
[Pseudocode of Algorithm 2 appears as an image in the published version.]
We empirically demonstrate the convergence of Algorithm 2 through experiments conducted on real-world data. As in Section 2, we use the monthly excess returns of the 5 × 5 Size-B/M portfolios, i.e., N = 25, over a period of T = 60 months, from January 2017 to December 2021, represented by a T × N matrix X. We set the regularization parameter η = 10 in Equation (11) and fix K = 4, as in the simulation study of Lettau and Pelger [22], while varying the input V ∈ S₊^N to handle cross-sectional correlation in the pricing errors.
First, we consider the case V = I_N, where Problem (10) reduces to the RP-PCA, so the exact solution can be found. Let us denote the exact solution by (Λ*, F*) and the minimum value of the objective function by ϕ* = ϕ(Λ*, F*). For a set of iterates {(Λ_n, F_n) : n = 0, 1, …, n_alt} created by Algorithm 2, let us define ϕ_n = ϕ(Λ_n, F_n). We run the algorithm five times with five different random initializations (Λ_0, F_0) and exhibit the suboptimality, measured by ϕ_n − ϕ*, and the distance between the iterates and the exact solution, ‖F_nΛ_n′ − F*Λ*′‖_F, in Figure 3a,b, where we can clearly see that the suboptimality and the distance both converge to zero for every random initialization.
Next, we consider a V ∈ S₊^N computed as follows. We run the time-series regressions (Equation (5)) of X on {Mkt−RF, SMB, HML} and {Mkt−RF, RMW, CMA}, obtain the estimated residuals, and compute their sample covariance matrices, denoted by Σ₁, Σ₂ ∈ S₊^N. Then, we normalize Σ₁ and Σ₂ by dividing them by tr(Σ₁)/N and tr(Σ₂)/N, respectively, so that the normalized covariance matrices satisfy tr(Σ₁) = tr(Σ₂) = N, which equals the trace of I_N. We run Algorithm 2 for V ∈ {Σ₁^{−1}, Σ₂^{−1}} and plot the objective function values ϕ_n for V = Σ₁^{−1} and V = Σ₂^{−1} in Figure 3c,d, respectively. We can observe that only five iteration steps are enough for the objective function values to converge for every random initialization.

5. Experiments

We evaluate the performance of the proposed estimator and other related estimators using a set of empirical data comprising the monthly returns of portfolios of stocks listed in the Center for Research in Security Prices (CRSP). The stocks are divided into deciles based on 37 characteristics that were also considered in [31,36], and we use the first and the tenth decile portfolios for each characteristic. Each decile portfolio was constructed as a value-weighted, long-only portfolio comprising the US stocks within the corresponding decile. Consequently, our dataset consists of N = 74 portfolios in the cross-section. The dataset covers a sample period extending from November 1963 to December 2019, totaling 674 months. We consider the Fama–French three-factor (FF3) and five-factor (FF5) models [8,9], as well as the conventional principal component analysis (PCA) [37,38] and the RP-PCA [22], as benchmarks for our evaluation. We assess their performance across three criteria, both in-sample and out-of-sample, as investigated in [31].
For the in-sample analysis, we use the entire dataset from November 1963 to December 2019. For the out-of-sample (OOS) analysis, we employ a rolling estimation method with a 240-month estimation period, i.e., T = 240, and a 1-month prediction period, moving the estimation period forward by 1 month at a time. Our first estimation period spans from November 1963 to October 1983, and the first OOS prediction is made for November 1983. As a result, we have a total of 434 OOS observations.
The first criterion is the maximum Sharpe ratio, obtained by linearly combining the factors with the weights w ∈ ℝ^K calculated as w = Σ̂_F^{−1} μ̂_F. Here, Σ̂_F ∈ ℝ^{K×K} and μ̂_F ∈ ℝ^K are, respectively, the sample covariance matrix and the sample mean of the estimated factors F̂_{t−239}, …, F̂_t ∈ ℝ^K. As a result, w represents the weights that maximize the Sharpe ratio of the portfolio composed of the estimated factors. The OOS returns of the maximum Sharpe ratio portfolio, corresponding to the weights w, are computed as r_{t+1}^{oos} = w′F̂_{t+1} and are gathered for each prediction period t + 1 to compute the OOS maximum Sharpe ratio (OOS-SR). Second, we consider the root mean squared (RMS) pricing error across the N test assets, RMS_α = √(α̂′α̂/N), where α̂ is the vector of ordinary least squares (OLS) intercepts estimated by regressing the returns of the test assets on the estimated factors. Third, we evaluate the average unexplained variance across the N test assets, σ̄_e² = (1/N) Σ_{i=1}^{N} σ_{ê_i}²/σ_{R_i}², where ê_i ∈ ℝ^{T×1} represents the residual estimated by regressing the returns of asset i on the estimated factors, and σ_{ê_i}² and σ_{R_i}² represent the variances of the residual and the return of asset i, respectively.
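As a sketch, the three criteria can be computed from a window of estimated factors and test-asset returns as follows; the factor and return series here are synthetic placeholders, not the CRSP data used in the paper.

```python
import numpy as np

rng = np.random.default_rng(8)
T, N, K = 240, 10, 3

# Synthetic placeholders for a window of estimated factors and asset returns.
F_hat = rng.normal(0.3, 1.0, size=(T, K))
R = rng.normal(size=(T, N))

# Maximum-Sharpe-ratio weights over the estimated factors: w = Sigma_F^{-1} mu_F.
Sigma_F = np.cov(F_hat, rowvar=False)
mu_F = F_hat.mean(axis=0)
w = np.linalg.solve(Sigma_F, mu_F)

# Time-series OLS of each test asset on the estimated factors gives the
# intercepts (pricing errors) and residuals used by the other two criteria.
Z = np.column_stack([np.ones(T), F_hat])
coef, *_ = np.linalg.lstsq(Z, R, rcond=None)
alpha_hat = coef[0]
resid = R - Z @ coef

rms_alpha = np.sqrt(alpha_hat @ alpha_hat / N)        # RMS pricing error
unexpl = np.mean(resid.var(axis=0) / R.var(axis=0))   # average unexplained variance
```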

6. Discussion

Table 1 and Table 2 present findings that offer several observations for 15 factor model specifications with η ∈ {1, 10, 20} and K ∈ {3, 5, 7, 10, 15}. These hyperparameter settings were selected based on the findings of [31], where they demonstrated superior performance within this range. We begin our discussion with the OOS findings presented in Table 2. First, increasing the number of factors K leads to improvements in the Sharpe ratio, the pricing error RMS_α, and the idiosyncratic unexplained variance σ̄_e² across all three PCA-based estimators. This result suggests that a larger number of factors provides a better approximation of the stochastic discount factor (SDF), enhances the ability to explain pricing information, and captures variations in asset returns. These findings align with the observations drawn in [31,36], providing additional support for the notion that a higher value of K contributes to enhanced factor model estimation.
Second, the results consistently demonstrate the superior OOS performance of our method relative to the other factor models, particularly in terms of OOS-SR. Our method achieves the highest SR among all factor model configurations, indicating a better approximation of the SDF. Notably, when the number of factors K is small, such as K = 3 or 5, the advantage of our method in terms of SR becomes more pronounced. For example, for K = 3 in Panel A of Table 2, the RP-PCA shows some advantage over the conventional PCA in terms of SR, but our proposed method outperforms both estimators by a significant margin. This underscores the robustness and reliability of our approach in capturing relevant pricing information precisely in cases wherein the explanatory power of the factor models is inherently constrained.
Third, we observe that increasing the value of η results in higher OOS-SR for both the RP-PCA and our proposed method. However, it is important to note that the magnitude of incremental changes differs between the two approaches. Our proposed estimator consistently achieves higher SR even at lower values of η for all K and exhibits stable changes in OOS-SR with respect to increases in the value of η in contrast to the unpredictable changes observed for the RP-PCA. For instance, in Panel B of Table 2, when K = 5 , our method shows an increase in the OOS-SR by 16.4% from 0.422 to 0.491, while the RP-PCA exhibits an increase of 80% from 0.270 to 0.486 as η varies from 1 to 10. This observation can be attributed to the unique design of our estimator, particularly the utilization of a min-max optimization framework. The maximization part incorporates the specification of η , leading to smaller changes in the outcomes with respect to changes in the value of η . Conversely, the increment in SR for RP-PCA is more pronounced, suggesting a greater sensitivity to changes in η .
Furthermore, our proposed estimator consistently demonstrates a comparative advantage in terms of OOS-RMSα among the PCA-based factor models. Specifically, our method consistently achieves smaller RMSα values than RP-PCA; for K ≥ 10 and η ≥ 10, the two methods yield identical values up to the first decimal place. Additionally, both our method and RP-PCA outperform the conventional PCA, indicating that regularizing with pricing error information when estimating the factor model is indeed effective. In contrast, for σ̄²_e, PCA outperforms the other two methods in 7 out of 15 configurations, which can be attributed to the fact that the PCA criterion takes only the time-series variation into account.
Finally, the in-sample performance presented in Table 1 improves in all three metrics for all estimators as the number of factors K increases; indeed, all metrics improve with increasing K in all but one case. In terms of SR, our proposed method outperforms the other estimators in all configurations of (K, η), indicating its ability to approximate the SDF within the sample better than the other methods. Both the RP-PCA and our method outperform PCA in terms of RMSα, except in one case, which can be attributed to the fact that RP-PCA and our method take the first-order information into account during estimation. PCA, by construction, measures only the time-series variation and consequently performs best in terms of in-sample σ̄²_e in all cases.

7. Conclusions

In this paper, we introduce a novel estimator for factor pricing models by presenting a min-max optimization problem that effectively incorporates both the time-series variations of realized excess returns and the cross-sectional pricing errors. We also present an algorithm designed to approximately solve this optimization problem, which utilizes an iterative method widely employed in adversarial machine learning.
Through extensive empirical experiments using real-world data, we demonstrate that our proposed estimator consistently outperforms existing static factor model estimators. Specifically, the portfolios implied by the factors estimated through our method exhibit larger in-sample and out-of-sample Sharpe ratios compared to those of portfolios implied by other related estimators. This highlights the superior risk-adjusted returns achievable via our approach.
Furthermore, our estimator shows comparable performance in terms of out-of-sample pricing errors and unexplained variations, indicating its effectiveness in maintaining accuracy while addressing the complexities inherent in financial modeling. Overall, our findings suggest that this novel estimator represents a significant advancement in the field of factor pricing models.

Author Contributions

Conceptualization, H.K. and S.K.; Funding acquisition, S.K.; Investigation, H.K.; Methodology, H.K.; Software, H.K.; Supervision, S.K.; Validation, H.K. and S.K.; Visualization, H.K. and S.K.; Writing—original draft, H.K.; Writing—review and editing, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2020R1F1A1A0106053811).

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

The author Saejoon Kim runs a financial consulting firm BlueAlpha Advisors. The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Appendix A. Basic Facts

In this section, we present some basic facts that are used in proofs presented in this paper. We use the notation [ A : B ] to denote the concatenation of matrices A and B that have the same number of rows.
Lemma A1.
A T × T matrix S_η defined as S_η = M_1 + η P_1 for a real number η satisfies
rank(S_η) = T − 1 if η = 0, and rank(S_η) = T if η ≠ 0.
Proof. 
Since P_1 and M_1 are idempotent matrices and M_1 = I_T − P_1, we have that rank(P_1) = tr(P_1) = 1 and that rank(M_1) = T − rank(P_1) = T − 1 [39].
If η = 0, then rank(S_η) = rank(M_1) = T − 1. Now, suppose that η ≠ 0. Since M_1 is a symmetric matrix, its eigen-decomposition can be written as
M_1 = U L U′,
where the orthonormal matrix U = [u_1 : u_2 : ⋯ : u_{T−1}] ∈ R^{T×(T−1)} consists of the eigenvectors of M_1 in its columns and L = diag(λ_1, λ_2, …, λ_{T−1}) ∈ R^{(T−1)×(T−1)} is a diagonal matrix of the associated (non-zero) eigenvalues. It is clear that 1_T′ u_t = 0 for all t ∈ {1, 2, …, T − 1}. Thus, we can write the eigen-decomposition of the matrix S_η as
S_η = M_1 + η P_1 = U L U′ + η (1_T/√T)(1_T/√T)′ = Ũ L̃ Ũ′,
where Ũ and L̃ can be written as
Ũ = [U : 1_T/√T] ∈ R^{T×T} and L̃ = diag(λ_1, λ_2, …, λ_{T−1}, η) ∈ R^{T×T}.
Ũ is orthogonal and all of the diagonal entries of L̃ are non-zero, implying that rank(S_η) = T. □
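Lemma A1 is easy to check numerically. The following sketch builds M_1 and P_1 for a small T and verifies both rank cases:

```python
import numpy as np

T = 8
ones = np.ones((T, 1))
P1 = ones @ ones.T / T      # projection onto the span of 1_T
M1 = np.eye(T) - P1         # de-meaning (annihilator) matrix

for eta in (0.0, 1.0, 10.0):
    S = M1 + eta * P1
    r = np.linalg.matrix_rank(S)
    # Lemma A1: rank is T - 1 when eta = 0 and T otherwise
    assert r == (T - 1 if eta == 0 else T)
```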
Lemma A2.
For any m, n ∈ N and A ∈ R^{m×n}, we have
rowsp(A′A) =(a) colsp(A′A) =(b) colsp(A′) =(c) rowsp(A).
Proof. 
(a) Let x ∈ colsp(A′A) be given. Then, there exists y ∈ R^n such that x = A′A y. Now, we have that x′ = y′A′A, which implies that x′ ∈ rowsp(A′A). Thus, colsp(A′A) ⊆ rowsp(A′A). Similarly, it can be proved that rowsp(A′A) ⊆ colsp(A′A). (b) We refer to Magnus and Neudecker [40] (Chapter 1.7). (c) is clear from the definitions of column spaces and row spaces. □
Lemma A3.
Let m, n ∈ N, A ∈ R^{m×n} and V ∈ R^{m×m}. If V is symmetric and non-singular, then
colsp(A′VA) = colsp(A′A).
Proof. 
To see colsp(A′VA) ⊆ colsp(A′A), let x ∈ colsp(A′VA) be given. Then there exists y ∈ R^n such that x = (A′VA)y = A′(VAy), so x ∈ colsp(A′). By Lemma A2, x ∈ colsp(A′A).
To see colsp(A′VA) ⊇ colsp(A′A), consider the eigen-decomposition V = U L U′ and put Q = L^{1/2}U′. Then Q is an m × m invertible matrix that satisfies V = Q′Q. Now, let x ∈ colsp(A′A). By Lemma A2, x ∈ colsp(A′), i.e., there exists z ∈ R^m such that
x = A′z = A′Q′(Q′)^{−1}z.
Thus, x ∈ colsp(A′Q′), and by Lemma A2, x ∈ colsp(A′Q′QA) = colsp(A′VA). □
Lemma A4.
Let m, n ∈ N, A ∈ R^{m×n} and V ∈ R^{m×m}. If V is symmetric, then
colsp(A′VA) = colsp(A′V).
Proof. 
Let rank(V) = r ≤ m. Without loss of generality, we can assume that r ≥ 1, as the case r = 0 is clear. Consider the eigen-decomposition V = U L U′, where L ∈ R^{r×r} is a diagonal matrix of non-zero real eigenvalues and U ∈ R^{m×r} consists of r orthonormal eigenvectors, i.e., U′U = I_r. Then, we have that
colsp(A′VA) = colsp(A′U L U′A) = colsp(A′U L^{1/2} L^{1/2} U′A) =(a) colsp(A′U L^{1/2} L L^{1/2} U′A) = colsp(A′U L L U′A) =(b) colsp(A′U L U′U L U′A) = colsp(A′V V A) =(c) colsp(A′V),
where (a) and (b) hold because L and U′U = I_r are symmetric non-singular matrices, in conjunction with Lemma A3. (c) holds due to Lemma A2. □
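The column-space identity of Lemma A4 can be sanity-checked numerically. The helper `same_colspace` below is an illustrative construction (not from the paper) that compares two column spaces via ranks: colsp(X) = colsp(Y) iff rank(X) = rank(Y) = rank([X : Y]). A rank-deficient V is used so the equality is non-trivial.

```python
import numpy as np

rng = np.random.default_rng(1)
m, n = 6, 4
A = rng.normal(size=(m, n))
B = rng.normal(size=(m, 2))
V = B @ B.T                      # symmetric, rank 2 < m

def same_colspace(X, Y):
    """Two matrices span the same column space iff concatenating
    them does not increase the rank of either one."""
    rX = np.linalg.matrix_rank(X)
    rY = np.linalg.matrix_rank(Y)
    return rX == rY == np.linalg.matrix_rank(np.hstack([X, Y]))

# Lemma A4: colsp(A' V A) = colsp(A' V) for symmetric V
assert same_colspace(A.T @ V @ A, A.T @ V)
```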

Appendix B. Derivations of the First-Order Optimality Conditions

In this section, we derive the first-order optimality conditions for our proposed optimization problem (10) based on basic facts of matrix calculus [39,40]. We use the symbol d f to denote the differential of a function f of matrices. Throughout this section, we assume that the N × N matrix V is symmetric but assume neither non-singularity nor positive-definiteness.
Reformulation of the proposed objective function.
To simplify notation, we multiply the objective function ϕ by NT and rewrite it as follows:
NTϕ(Λ, F) = ‖X̃ − F̃Λ′‖² + Tη‖X̄ − ΛF̄‖²_V
= ‖M_1(X − FΛ′)‖² + Tη‖(1/T)(X − FΛ′)′1_T‖²_V
= tr[M_1(X − FΛ′)(X − FΛ′)′M_1] + (η/T)·1_T′(X − FΛ′)V(X − FΛ′)′1_T
= tr[M_1(X − FΛ′)(X − FΛ′)′M_1] + η·tr[P_1(X − FΛ′)V(X − FΛ′)′P_1].
Differentials of the proposed objective function.
In order to compute the derivatives of ϕ, we first derive the differential of the matrix-valued function (Λ, F) ↦ (X − FΛ′)W(X − FΛ′)′, where W is assumed to be an N × N symmetric matrix, as follows:
d[(X − FΛ′)W(X − FΛ′)′] =(a) [d(X − FΛ′)]W(X − FΛ′)′ + (X − FΛ′)W[d(X − FΛ′)]′
=(b) −[(dF)Λ′ + F(dΛ)′]W(X − FΛ′)′ − (X − FΛ′)W[Λ(dF)′ + (dΛ)F′]
= −A_W′ − A_W,
where A_W := (X − FΛ′)WΛ(dF)′ + (X − FΛ′)W(dΛ)F′. Here, (a) holds since d(UWV) = (dU)WV + U(dW)V + UW(dV) and dW = 0 for arbitrary variables U, V and an arbitrary constant W, and (b) holds since
d(X − FΛ′) = −(dF)Λ′ − F(dΛ)′.
The last equality is a rearrangement that uses the assumption that W is symmetric.
Using Equation (A2), we derive the differential of each term of NTϕ in Equation (A1) as follows. For the first term:
d‖X̃ − F̃Λ′‖² = d tr[M_1(X − FΛ′)(X − FΛ′)′M_1]
=(a) tr[M_1 d{(X − FΛ′)(X − FΛ′)′} M_1]
=(b) −tr[M_1(A_{I_N}′ + A_{I_N})M_1]
= −2 tr[M_1 A_{I_N} M_1]
= −2 tr[M_1{(X − FΛ′)Λ(dF)′ + (X − FΛ′)(dΛ)F′}M_1]
= −2 tr[M_1(X − FΛ′)Λ(dF)′] − 2 tr[F′M_1(X − FΛ′)(dΛ)].
Here, (a) holds since d tr(AXB) = tr(A(dX)B) for an arbitrary variable X and constants A and B, (b) is due to Equation (A2), and the remaining equalities are clear from properties of the trace operator, namely tr(A) = tr(A′) and tr(AB) = tr(BA) for matrices A, B of appropriate order, together with the fact that M_1 is a symmetric and idempotent matrix.
The differential of the second term of Equation (A1) is derived similarly to the derivation for the first term:
d(T‖X̄ − ΛF̄‖²_V) = d tr[P_1(X − FΛ′)V(X − FΛ′)′P_1]
= tr[P_1 d{(X − FΛ′)V(X − FΛ′)′} P_1]
= −tr[P_1(A_V′ + A_V)P_1]
= −2 tr[P_1 A_V P_1]
= −2 tr[P_1{(X − FΛ′)VΛ(dF)′ + (X − FΛ′)V(dΛ)F′}P_1]
= −2 tr[P_1(X − FΛ′)VΛ(dF)′] − 2 tr[F′P_1(X − FΛ′)V(dΛ)].
Combining Equations (A3) and (A4), we obtain the differential of NTϕ as follows:
d(NTϕ(Λ, F)) = −2 tr[M_1(X − FΛ′)Λ(dF)′] − 2 tr[F′M_1(X − FΛ′)(dΛ)] − 2η tr[P_1(X − FΛ′)VΛ(dF)′] − 2η tr[F′P_1(X − FΛ′)V(dΛ)].
The first-order optimality condition underlying the update step for pricing factors.
Based on the differential of NTϕ in Equation (A5), we can write the first-order optimality condition for F, i.e., ∇_F ϕ(Λ, F) = 0, as follows:
M_1(X − FΛ′)Λ + ηP_1(X − FΛ′)VΛ = 0
⇔ M_1XΛ + ηP_1XVΛ = M_1FΛ′Λ + ηP_1FΛ′VΛ
⇔ [X + P_1X(ηV − I_N)]Λ = [FΛ′ + P_1FΛ′(ηV − I_N)]Λ
⇔ [Λ′ ⊗ M_1 + η(Λ′V ⊗ P_1)] vec(X) = [Λ′ ⊗ M_1 + η(Λ′V ⊗ P_1)] vec(FΛ′)
= [Λ′ ⊗ M_1 + η(Λ′V ⊗ P_1)](Λ ⊗ I_T) vec(F)
⇔ (Λ′ ⊗ I_T)[I_N ⊗ M_1 + η(V ⊗ P_1)] vec(X) = (Λ′ ⊗ I_T)[I_N ⊗ M_1 + η(V ⊗ P_1)](Λ ⊗ I_T) vec(F),
where the first three lines are simple reformulations of Equation (A5), and the remaining lines follow from the assumption that V is a symmetric matrix and the identity vec(ABC) = (C′ ⊗ A) vec(B) for matrices A, B and C of proper orders.
The optimality condition for F when V = I N and η > 0 .
For the case when V = I_N and η > 0, the first-order optimality condition yields the relation
F = XΛ(Λ′Λ)^{−1},
which allows for substituting for F. Indeed, substituting V = I_N into the second line of Equation (A6) leads to
M_1FΛ′Λ + ηP_1FΛ′Λ = M_1XΛ + ηP_1XΛ
⇔ (M_1 + ηP_1)FΛ′Λ = (M_1 + ηP_1)XΛ
⇔ F = XΛ(Λ′Λ)^{−1},
where the second line is a simple rearrangement of the equation and the last line follows from the assumption that η > 0 together with Lemma A1 in Appendix A.
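The closed form can be verified against the first-order condition numerically. All matrices below are random stand-ins; the check exploits that (X − FΛ′)Λ vanishes exactly when F = XΛ(Λ′Λ)^{−1}.

```python
import numpy as np

rng = np.random.default_rng(2)
T, N, K, eta = 12, 8, 3, 10.0

X = rng.normal(size=(T, N))
Lam = rng.normal(size=(N, K))        # loadings Lambda, full column rank
ones = np.ones((T, 1))
P1 = ones @ ones.T / T
M1 = np.eye(T) - P1

# Closed-form factor update for the case V = I_N:
F = X @ Lam @ np.linalg.inv(Lam.T @ Lam)

# First-order condition: M1 (X - F Lam') Lam + eta P1 (X - F Lam') Lam = 0
G = (M1 + eta * P1) @ (X - F @ Lam.T) @ Lam
assert np.allclose(G, 0)
```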
The first-order optimality condition underlying the update step for factor loadings.
Similarly to the way of deriving the optimality condition for the pricing factors, the first-order optimality condition for Λ, i.e., ∇_Λ ϕ(Λ, F) = 0, is given by
F′M_1(X − FΛ′) + ηF′P_1(X − FΛ′)V = 0
⇔ F′M_1X + ηF′P_1XV = F′M_1FΛ′ + ηF′P_1FΛ′V
⇔ F′[X + P_1X(ηV − I_N)] = F′[FΛ′ + P_1FΛ′(ηV − I_N)]
⇔ [I_N ⊗ F′M_1 + η(V ⊗ F′P_1)] vec(X) = [I_N ⊗ F′M_1 + η(V ⊗ F′P_1)] vec(FΛ′)
= [I_N ⊗ F′M_1 + η(V ⊗ F′P_1)](I_N ⊗ F) vec(Λ′)
⇔ (I_N ⊗ F′)[I_N ⊗ M_1 + η(V ⊗ P_1)] vec(X) = (I_N ⊗ F′)[I_N ⊗ M_1 + η(V ⊗ P_1)](I_N ⊗ F) vec(Λ′).

Appendix C. Proof of Proposition 1

The goal of this section is to prove Proposition 1. To this end, we first prove the following lemma.
Lemma A5.
Assume that V ∈ R^{N×N} is symmetric and has the eigen-decomposition V = Ũ L̃ Ũ′, where Ũ ∈ R^{N×r}, L̃ = diag(λ_1, …, λ_r) ∈ R^{r×r} with λ_1, …, λ_r ≠ 0, and rank(V) = r ≤ N. Let η be any real number. Then, the following matrix diagonalization holds:
P := I_N ⊗ M_1 + η(V ⊗ P_1) = (U ⊗ U_1) D (U ⊗ U_1)′.
Here, D is an NT × NT diagonal matrix whose diagonal entries are 0, 1, and ηλ_i. Specifically, the number of 0s is N − r, the number of 1s is N(T − 1), and the number of entries ηλ_i is r. U = [Ũ : Ũ_⊥] for some matrix Ũ_⊥ ∈ R^{N×(N−r)} such that U′U = I_N, and U_1 = [1_T/√T : Ũ_1] for some matrix Ũ_1 ∈ R^{T×(T−1)} such that U_1′U_1 = I_T.
Proof. 
Without loss of generality, we assume that r < N. We can choose an orthonormal basis of the null space of V to construct Ũ_⊥; then, U′U = I_N. In the same way, from the fact that 1_T/√T is the unit eigenvector of the rank-1 matrix P_1 corresponding to the eigenvalue 1, we can find Ũ_1 that satisfies U_1′U_1 = I_T. Then, the following equalities are satisfied:
V = U L U′, P_1 = U_1 L_1 U_1′, U′U = UU′ = I_N, U_1′U_1 = U_1U_1′ = I_T,
where L = diag(λ_1, …, λ_r, 0, …, 0) ∈ R^{N×N}, λ_1, …, λ_r ≠ 0, and L_1 = diag(1, 0, …, 0) ∈ R^{T×T}. Then, we have that
P = I_N ⊗ M_1 + η(V ⊗ P_1)
= I_N ⊗ I_T − I_N ⊗ P_1 + η(V ⊗ P_1)
= I_N ⊗ I_T + (ηV − I_N) ⊗ P_1
= (UU′) ⊗ (U_1U_1′) + (ηULU′ − UU′) ⊗ (U_1L_1U_1′)
= (U ⊗ U_1)(U ⊗ U_1)′ + (U(ηL − I_N)U′) ⊗ (U_1L_1U_1′)
= (U ⊗ U_1)(I_N ⊗ I_T)(U ⊗ U_1)′ + (U ⊗ U_1)((ηL − I_N) ⊗ L_1)(U ⊗ U_1)′
= (U ⊗ U_1)[(I_N ⊗ I_T) + ((ηL − I_N) ⊗ L_1)](U ⊗ U_1)′.
Now, define the NT × NT diagonal matrix D as
D = (I_N ⊗ I_T) + ((ηL − I_N) ⊗ L_1).
It is diagonal since a Kronecker product of diagonal matrices is diagonal and a sum of diagonal matrices is diagonal. It has N blocks of T × T diagonal matrices on its diagonal, and the i-th block, for i ∈ {1, …, N}, is
I_T + (ηλ_i − 1)L_1 = diag(ηλ_i, 1, …, 1) if i ≤ r, and I_T − L_1 = diag(0, 1, …, 1) otherwise. □
Proposition 1, which is given below, is proved using Lemma A4 in Appendix A and Lemma A5.
Proposition A1 (Proposition 1 in the main text).
Suppose that V ∈ S_+^N. Then, there exist solutions to Equations (13) and (14). If, additionally, V is positive-definite, η > 0, and Λ* and F* have full column rank, i.e., rank(Λ*) = rank(F*) = K, then the solutions are unique.
Proof. 
Equations (13) and (14) can be rewritten as
A′P vec(X) = A′PA vec(F), B′P vec(X) = B′PB vec(Λ′),
where P is defined in Equation (A9), A := Λ* ⊗ I_T and B := I_N ⊗ F*. Clearly, P is a symmetric matrix. Using Lemma A4, we have that A′P vec(X) ∈ colsp(A′P) = colsp(A′PA) and B′P vec(X) ∈ colsp(B′P) = colsp(B′PB), which, in turn, implies that solutions to Equation (A11) exist.
Let us additionally assume that V is positive-definite, η > 0, and Λ* and F* have full column rank. Note that, if V is a symmetric positive-definite matrix and η > 0, then Equation (A9) in Lemma A5 is the eigen-decomposition of P, whose eigenvalues are either 1 or ηλ_i, both of which are positive. This implies that P is positive-definite. Furthermore, if Λ* and F* have full column rank, then so do A and B. This implies that A′PA ∈ R^{KT×KT} and B′PB ∈ R^{NK×NK} have full rank, i.e., they are invertible, implying that the solutions are unique. □
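Under the positive-definiteness assumptions of Proposition 1, the factor update reduces to a single linear solve. The sketch below is an illustrative implementation of the F-update equation A′PA vec(F) = A′P vec(X), with random stand-ins for X, Λ* and V (not the paper's data or Algorithm 2 itself):

```python
import numpy as np

rng = np.random.default_rng(4)
N, T, K, eta = 6, 8, 2, 5.0

ones = np.ones((T, 1))
P1 = ones @ ones.T / T
M1 = np.eye(T) - P1

X = rng.normal(size=(T, N))
Lam = rng.normal(size=(N, K))          # stand-in for Lambda*, full column rank
B0 = rng.normal(size=(N, N))
V = B0 @ B0.T + np.eye(N)              # symmetric positive-definite

P = np.kron(np.eye(N), M1) + eta * np.kron(V, P1)
A = np.kron(Lam, np.eye(T))            # A := Lambda* (x) I_T

# With V positive-definite and eta > 0, A'PA is invertible, so the
# update has a unique solution for vec(F) (column-major stacking).
lhs = A.T @ P @ A                      # KT x KT
rhs = A.T @ P @ X.flatten(order="F")   # vec(X)
vecF = np.linalg.solve(lhs, rhs)
F = vecF.reshape((T, K), order="F")
```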
Furthermore, Lemma A5 implies the following corollary that is not used in the paper, but might be useful for sanity checks when implementing Algorithm 2.
Corollary A1.
Assume V ∈ R^{N×N} is a symmetric matrix with rank(V) = r ≤ N. Then, the rank of the matrix P ∈ R^{NT×NT} defined in Equation (A9) satisfies
rank(P) = N(T − 1) if η = 0, and rank(P) = N(T − 1) + r if η ≠ 0.
Furthermore, P is non-singular if and only if V is non-singular and η ≠ 0.
Proof. 
By counting the number of non-zero entries on the diagonal of D in Equation (A10), we can see that the equality in Equation (A12) holds.
Next, suppose that P is non-singular. To arrive at a contradiction, assume that V is singular or η = 0. First, suppose that V is singular; then r < N, and rank(P) ≤ max{N(T − 1), N(T − 1) + r} = N(T − 1) + r = NT − (N − r) < NT, which contradicts the non-singularity of P. Second, suppose that η = 0; then rank(P) = N(T − 1) < NT, which again contradicts the non-singularity of P. Conversely, suppose that V is non-singular and η ≠ 0. Then, r = N, implying that rank(P) = N(T − 1) + r = NT. Thus, P is non-singular. □
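As noted above, the rank formula provides a sanity check when implementing Algorithm 2. A minimal numerical check with a deliberately rank-deficient V:

```python
import numpy as np

rng = np.random.default_rng(3)
N, T, r = 4, 5, 2

ones = np.ones((T, 1))
P1 = ones @ ones.T / T
M1 = np.eye(T) - P1

B = rng.normal(size=(N, r))
V = B @ B.T                     # symmetric, rank r < N

for eta in (0.0, 0.7):
    P = np.kron(np.eye(N), M1) + eta * np.kron(V, P1)
    # Corollary A1: rank is N(T-1) when eta = 0 and N(T-1) + r otherwise
    expected = N * (T - 1) + (0 if eta == 0 else r)
    assert np.linalg.matrix_rank(P) == expected
```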

References

  1. Cochrane, J.H. Asset Pricing, 2nd ed.; Princeton University Press: Princeton, NJ, USA, 2005. [Google Scholar]
  2. Harvey, C.R.; Liu, Y.; Zhu, H. … and the Cross-Section of Expected Returns. Rev. Financ. Stud. 2016, 29, 5–68. [Google Scholar] [CrossRef]
  3. Asness, C.S.; Frazzini, A.; Pedersen, L.H. Quality Minus Junk. Rev. Account. Stud. 2019, 24, 34–112. [Google Scholar] [CrossRef]
  4. Kim, S. Enhanced Factor Investing in the Korean Stock Market. Pac. Basin Financ. J. 2021, 67, 101558. [Google Scholar] [CrossRef]
  5. Kim, S. Factor Investing: A Unified View. Appl. Econ. 2023, 55, 1567–1580. [Google Scholar] [CrossRef]
  6. Van Gelderen, E.; Huij, J.; Kyosev, G. Factor Investing from Concept to Implementation. J. Portf. Manag. 2019, 45, 123–140. [Google Scholar] [CrossRef]
  7. Yan, J.; Yu, J. Cross-Stock Momentum and Factor Momentum. J. Financ. Econ. 2023, 150, 103716. [Google Scholar] [CrossRef]
  8. Fama, E.F.; French, K.R. Common Risk Factors in the Returns on Stocks and Bonds. J. Financ. Econ. 1993, 33, 3–56. [Google Scholar] [CrossRef]
  9. Fama, E.F.; French, K.R. A Five-Factor Asset Pricing Model. J. Financ. Econ. 2015, 116, 1–22. [Google Scholar] [CrossRef]
  10. Back, K.E. Asset Pricing and Portfolio Choice Theory, 2nd ed.; Oxford University Press: Oxford, UK, 2017. [Google Scholar]
  11. French, K.R. Kenneth French’s Data Library. 2022. Available online: https://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html (accessed on 1 October 2022).
  12. Bartlett, M.S. The Effect of Standardization on a χ2 Approximation in Factor Analysis. Biometrika 1951, 38, 337–344. [Google Scholar] [CrossRef]
  13. Fama, E.F.; MacBeth, J.D. Risk, Return, and Equilibrium: Empirical Tests. J. Political Econ. 1973, 81, 607–636. [Google Scholar] [CrossRef]
  14. Gibbons, M.R.; Ross, S.A.; Shanken, J. A Test of the Efficiency of a Given Portfolio. Econometrica 1989, 57, 1121–1152. [Google Scholar] [CrossRef]
  15. Ross, S.A. The Arbitrage Theory of Capital Asset Pricing. J. Econ. Theory 1976, 13, 341–360. [Google Scholar] [CrossRef]
  16. Chamberlain, G.; Rothschild, M. Arbitrage, Factor Structure, and Mean-Variance Analysis on Large Asset Markets. Econometrica 1983, 51, 1281–1304. [Google Scholar] [CrossRef]
  17. Connor, G.; Korajczyk, R.A. Performance Measurement with the Arbitrage Pricing Theory: A New Framework for Analysis. J. Financ. Econ. 1986, 15, 373–394. [Google Scholar] [CrossRef]
  18. Fan, J.; Liao, Y.; Wang, W. Projected Principal Component Analysis in Factor Models. Ann. Stat. 2016, 44, 219–254. [Google Scholar] [CrossRef]
  19. Kozak, S.; Nagel, S.; Santosh, S. Interpreting Factor Models. J. Financ. 2018, 73, 1183–1223. [Google Scholar] [CrossRef]
  20. Kelly, B.T.; Pruitt, S.; Su, Y. Characteristics Are Covariances: A Unified Model of Risk and Return. J. Financ. Econ. 2019, 134, 501–524. [Google Scholar] [CrossRef]
  21. Pukthuanthong, K.; Roll, R.; Subrahmanyam, A. A Protocol for Factor Identification. Rev. Financ. Stud. 2019, 32, 1573–1607. [Google Scholar] [CrossRef]
  22. Lettau, M.; Pelger, M. Estimating Latent Asset-Pricing Factors. J. Econom. 2020, 218, 1–31. [Google Scholar] [CrossRef]
  23. Giglio, S.; Xiu, D. Asset Pricing with Omitted Factors. J. Political Econ. 2021, 129, 1947–1990. [Google Scholar] [CrossRef]
  24. Bryzgalova, S.; DeMiguel, V.; Li, S.; Pelger, M. Asset-Pricing Factors with Economic Targets. SSRN Electron. J. 2023. [Google Scholar] [CrossRef]
  25. Srebro, N.; Jaakkola, T. Weighted Low-Rank Approximations. In Proceedings of the International Conference on Machine Learning, Washington DC, USA, 21–24 August 2003. [Google Scholar]
  26. Recht, B.; Re, C.; Tropp, J.; Bittorf, V. Factoring Nonnegative Matrices with Linear Programs. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012. [Google Scholar]
  27. Udell, M.; Horn, C.; Zadeh, R.; Boyd, S. Generalized Low Rank Models. Found. Trends® Mach. Learn. 2016, 9, 1–118. [Google Scholar] [CrossRef]
  28. Keshavan, R.; Montanari, A.; Oh, S. Matrix Completion from Noisy Entries. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009. [Google Scholar]
  29. Wright, J.; Ganesh, A.; Rao, S.; Peng, Y.; Ma, Y. Robust Principal Component Analysis: Exact Recovery of Corrupted Low-Rank Matrices via Convex Optimization. In Proceedings of the Advances in Neural Information Processing Systems, Vancouver, BC, Canada, 7–10 December 2009. [Google Scholar]
  30. Candès, E.J.; Tao, T. The Power of Convex Relaxation: Near-Optimal Matrix Completion. IEEE Trans. Inf. Theory 2010, 56, 2053–2080. [Google Scholar] [CrossRef]
  31. Lettau, M.; Pelger, M. Factors That Fit the Time Series and Cross-Section of Stock Returns. Rev. Financ. Stud. 2020, 33, 2274–2325. [Google Scholar] [CrossRef]
  32. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative Adversarial Nets. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 8–13 December 2014; pp. 1–9. [Google Scholar]
  33. Arjovsky, M.; Chintala, S.; Bottou, L. Wasserstein Generative Adversarial Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; Volume 70, pp. 214–223. [Google Scholar]
  34. Kapsos, M.; Christofides, N.; Rustem, B. Robust Risk Budgeting. Ann. Oper. Res. 2018, 266, 199–221. [Google Scholar] [CrossRef]
  35. O’Donoghue, B.; Chu, E.; Parikh, N.; Boyd, S. Conic Optimization via Operator Splitting and Homogeneous Self-Dual Embedding. J. Optim. Theory Appl. 2016, 169, 1042–1068. [Google Scholar] [CrossRef]
  36. Kozak, S.; Nagel, S.; Santosh, S. Shrinking the Cross-Section. J. Financ. Econ. 2020, 135, 271–292. [Google Scholar] [CrossRef]
  37. Bai, J.; Ng, S. Determining the Number of Factors in Approximate Factor Models. Econometrica 2002, 70, 191–221. [Google Scholar] [CrossRef]
  38. Stock, J.H.; Watson, M.W. Forecasting Using Principal Components from a Large Number of Predictors. J. Am. Stat. Assoc. 2002, 97, 1167–1179. [Google Scholar] [CrossRef]
  39. Lütkepohl, H. Handbook of Matrices; Wiley: Hoboken, NJ, USA, 1996. [Google Scholar]
  40. Magnus, J.R.; Neudecker, H. Matrix Differential Calculus with Applications in Statistics and Econometrics, 3rd ed.; John Wiley & Sons: Hoboken, NJ, USA, 2019. [Google Scholar]
Figure 1. Statistics for residuals of 5 × 5 Size-B/M portfolio returns regressed on three factor models. One factor is fixed to the market’s excess return and the remaining two are written in subfigure title (top) and on the x-axis (bottom). Factors with * are related to the test assets.
Figure 2. Estimators of the K-factor models. The conventional PCA aims to find F̃ and Λ that well approximate the de-meaned time-series variations in X̃ (Box A). The RP-PCA adds a regularization term to explicitly address the pricing error (Box B). The PCA-XC further extends this by allowing the pricing errors to be calculated in a more general way, represented by V (Box C), which indicates that the pricing error is measured by ‖X̄ − ΛF̄‖_V for an arbitrary V ∈ S_+^N.
Figure 3. Convergence of Algorithm 2. (a,b) display results for V = I_N, while (c,d) show results for V = Σ_1^{−1} and V = Σ_2^{−1}, respectively. Each curve represents one random initialization.
Table 1. In-sample performance. The top-performing estimator is highlighted in bold for each ( K , η ) .
                        SR      RMSα    σ̄e² (%)
Panel A: K = 3
FF3                     0.193   0.31    17.48
PCA      η = 1          0.216   3.18    13.71
         η = 10         0.216   3.18    13.71
         η = 20         0.216   3.18    13.71
RP-PCA   η = 1          0.257   2.89    13.72
         η = 10         0.361   2.75    13.84
         η = 20         0.397   2.80    13.91
Ours     η = 1          0.421   3.05    13.87
         η = 10         0.442   2.89    14.01
         η = 20         0.445   2.89    14.02
Panel B: K = 5
FF5                     0.317   0.26    16.02
PCA      η = 1          0.317   2.52    10.31
         η = 10         0.317   2.52    10.31
         η = 20         0.317   2.52    10.31
RP-PCA   η = 1          0.383   2.34    10.32
         η = 10         0.575   1.91    10.44
         η = 20         0.601   1.86    10.46
Ours     η = 1          0.603   1.92    10.45
         η = 10         0.630   2.00    10.45
         η = 20         0.629   1.90    10.47
Panel C: K = 7
PCA      η = 1          0.381   2.22    8.89
         η = 10         0.381   2.22    8.89
         η = 20         0.381   2.22    8.89
RP-PCA   η = 1          0.449   2.03    8.90
         η = 10         0.596   1.78    8.96
         η = 20         0.618   1.74    8.98
Ours     η = 1          0.618   1.79    8.97
         η = 10         0.646   1.83    8.98
         η = 20         0.644   1.76    8.98
Panel D: K = 10
PCA      η = 1          0.410   2.07    7.25
         η = 10         0.410   2.07    7.25
         η = 20         0.410   2.07    7.25
RP-PCA   η = 1          0.461   1.97    7.25
         η = 10         0.602   1.75    7.31
         η = 20         0.625   1.71    7.32
Ours     η = 1          0.697   2.35    7.30
         η = 10         0.651   1.79    7.33
         η = 20         0.650   1.73    7.33
Panel E: K = 15
PCA      η = 1          0.507   1.70    5.42
         η = 10         0.507   1.70    5.42
         η = 20         0.507   1.70    5.42
RP-PCA   η = 1          0.590   1.51    5.43
         η = 10         0.714   1.19    5.46
         η = 20         0.726   1.15    5.46
Ours     η = 1          0.733   1.20    5.46
         η = 10         0.740   1.14    5.47
         η = 20         0.740   1.12    5.47
Table 2. Out-of-sample performance. The top-performing estimator is highlighted in bold for each ( K , η ) .
                        SR      RMSα    σ̄e² (%)
Panel A: K = 3
FF3                     0.150   0.25    16.47
PCA      η = 1          0.107   3.04    15.78
         η = 10         0.107   3.04    15.78
         η = 20         0.107   3.04    15.78
RP-PCA   η = 1          0.133   2.96    15.70
         η = 10         0.296   2.53    15.35
         η = 20         0.325   2.45    15.32
Ours     η = 1          0.302   2.52    15.39
         η = 10         0.299   2.61    15.42
         η = 20         0.327   2.45    15.39
Panel B: K = 5
FF5                     0.302   0.19    13.85
PCA      η = 1          0.235   2.21    11.98
         η = 10         0.235   2.21    11.98
         η = 20         0.235   2.21    11.98
RP-PCA   η = 1          0.270   2.12    11.97
         η = 10         0.486   1.75    12.04
         η = 20         0.498   1.70    12.06
Ours     η = 1          0.422   1.78    11.98
         η = 10         0.491   1.72    12.05
         η = 20         0.500   1.69    12.06
Panel C: K = 7
PCA      η = 1          0.298   2.19    10.62
         η = 10         0.298   2.19    10.62
         η = 20         0.298   2.19    10.62
RP-PCA   η = 1          0.368   2.07    10.64
         η = 10         0.482   1.76    10.76
         η = 20         0.489   1.74    10.77
Ours     η = 1          0.459   1.75    10.73
         η = 10         0.490   1.74    10.76
         η = 20         0.493   1.72    10.76
Panel D: K = 10
PCA      η = 1          0.346   1.84    8.97
         η = 10         0.346   1.84    8.97
         η = 20         0.346   1.84    8.97
RP-PCA   η = 1          0.433   1.73    8.96
         η = 10         0.502   1.55    9.00
         η = 20         0.506   1.53    9.01
Ours     η = 1          0.476   1.59    9.01
         η = 10         0.507   1.53    9.01
         η = 20         0.508   1.53    9.01
Panel E: K = 15
PCA      η = 1          0.372   1.61    6.98
         η = 10         0.372   1.61    6.98
         η = 20         0.372   1.61    6.98
RP-PCA   η = 1          0.488   1.41    6.97
         η = 10         0.549   1.23    6.97
         η = 20         0.552   1.21    6.97
Ours     η = 1          0.525   1.26    6.97
         η = 10         0.552   1.21    6.97
         η = 20         0.553   1.21    6.97
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Kim, H.; Kim, S. Estimating Asset Pricing Models in the Presence of Cross-Sectionally Correlated Pricing Errors. Mathematics 2024, 12, 3442. https://doi.org/10.3390/math12213442

