1. Introduction
The efficiency of financial allocations plays a key role in empirical asset pricing frameworks, with theoretical and practical importance in financial markets. A fundamental question is to verify empirically whether allocations are efficient conditional on the full set of available information. Approaches to constructing efficiency tests from this conditional point of view have developed rapidly, with the work of Ferson and Siegel [1] being a fundamental reference. The use of conditional information in efficiency tests has several advantages over traditional tests. The first is the incorporation of additional information into the definition of the tests. This allows us to verify whether the allocation was efficient based on the whole set of information available, and not only the information contained in the returns and a limited set of factors. This structure also allows us to verify the impact of dynamic nonlinear strategies on the efficiency of the portfolio, which is not possible in tests based on fixed-weight combinations of the tested asset returns, as discussed in Ferson and Siegel [1].
Although this conditional structure of efficiency tests has several advantages in comparison to traditional tests, it introduces some additional complications in terms of statistical inference. The incorporation of conditional information is accomplished through the use of an additional set of instruments in the estimation and testing procedures. We need estimators that allow this additional information to be incorporated into the parametric structure of the model, which in practice corresponds to the use of additional moment conditions. Thus, we are restricted to moment-based estimators that allow for overidentification, that is, a number of moment conditions greater than the number of parameters of the model. The natural candidate for this problem is the GMM estimator [2], which is a generalization of the method of moments to the overidentified case. Since GMM estimators impose no restrictions on the data distribution, relying only on assumptions about the moments, the method is widely used in finance. In this article, we discuss the use of generalized empirical likelihood estimators [3], which can be seen as a generalization of the GMM estimators in which a non-parametric estimate of the likelihood function is used as a weighting function in the construction of the expected value of the moment conditions.
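To fix ideas, in one common formulation (see, e.g., [3]), the GEL estimator solves the saddle-point problem

$$\hat{\theta}_{GEL} = \arg\min_{\theta \in \Theta}\; \sup_{\lambda \in \hat{\Lambda}_T(\theta)}\; \frac{1}{T}\sum_{t=1}^{T} \rho\!\left(\lambda^{\prime} g_t(\theta)\right),$$

where $g_t(\theta)$ is the vector of moment conditions evaluated at observation $t$ and $\rho(\cdot)$ is a concave carrier function; the choices $\rho(v)=\ln(1-v)$, $\rho(v)=-e^{v}$, and $\rho(v)=-v-v^{2}/2$ deliver, respectively, empirical likelihood, exponential tilting, and the continuous-updating (Euclidean likelihood) estimator. The implied probabilities attached to each observation play the role of the non-parametric likelihood weights mentioned above. The notation here is introduced for illustration and may differ from the formulation used in Section 2.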
Cochrane [
4] even argues that the GMM structure fits naturally with the stochastic discount factor formulation of asset pricing theories because of the ease with which sample moments can be used in place of population moments. However, the performance of these estimators and of the derived tests can be negatively affected under the conditions in which conditional tests are performed.
The first difficulty is the use of a large number of instruments related to the incorporation of conditional information in the efficiency tests. An important result is that, in instrumental variable estimations using the two-stage and iterated GMM estimators, there is a finite sample bias term that is proportional to the number of moment conditions, as shown in Newey and Smith [3]. Thus, efficiency tests based on conditional information using GMM estimators are subject to a bias component that grows with the number of moment conditions (conditional information) incorporated into the tests. Hence, the great advantage of conditional tests, namely the incorporation of information, is undermined by this bias component, damaging the statistical properties of these tests.
Financial data, in particular stock returns, are subject to several problems, such as the presence of conditional heteroscedasticity, non-Gaussian/asymmetric distributions, and even measurement error due to the impact of transaction costs and the trading structure itself, known as market microstructure noise. GMM estimators are partially adapted to these problems: because of their semi-parametric nature, they do not need to assume a known parametric distribution, and the possibility of using weighting matrix estimators that are robust to serial correlation and heteroscedasticity makes the method less sensitive to serial dependence in the first two conditional moments. However, GMM estimators can be suboptimal in the presence of data contaminations such as outliers and heavy tails. The use of higher-order moment conditions makes these estimators sensitive to such effects (e.g., [5]), and thus they are not robust to these problems.
This study analyzes the use of generalized empirical likelihood (GEL) estimators, proposed by Qin and Lawless [6], to circumvent the deficiencies of the usual estimators in testing portfolio efficiency in the presence of conditional information. This class of estimators has special characteristics that confer better statistical properties, such as robustness to outliers and heavy-tailed distributions, and better finite sample properties compared to the usual methods based on least squares and the generalized method of moments. In generalized empirical likelihood and related methods, the bias does not increase as the number of moment conditions grows (e.g., [7]), and it is precisely this number that grows when conditional information is used. Another important feature is that some estimation methods in the GEL family have better properties in terms of robustness to contaminations such as outliers, heavy tails, and other forms of incorrect specification (e.g., [
5]). Generalized empirical likelihood estimators are related to information and entropy-based estimation methods, as discussed by Judge and Mittelhammer [
8], and share some of the good properties of these estimators (see [
5,
8] for a detailed discussion on the relationship between GEL and other classes of estimators).
Our work contributes to the portfolio efficiency testing literature by proposing an econometric structure suitable for the special features introduced by the use of conditional information in the model. This inference method is not subject to the finite sample bias problem generated by the use of additional moment conditions, and by using a non-parametric estimator for the likelihood function, it is more robust to problems with the incorrect specification of the process distribution and is efficient in the class of semiparametric models (in the sense of Bickel et al. [
9]). These theoretical characteristics suggest that this method is an interesting alternative to the traditional GMM method used in the construction of efficiency tests with conditional information incorporated in the form of moment conditions.
This issue is quite relevant in practical portfolio management: for fund managers, it is essential to verify whether asset allocations efficiently exploit all the information available in the market. In the conditional setting, this is made possible by adding moments conditional on the realization of other variables relevant to financial management, such as Treasury-bill and corporate bond yields, inflation, and growth rates in industrial production. In this way, our work contributes by analyzing the applied performance of the GEL estimator in the construction of conditional efficiency tests.
We study the robustness of the tests with the use of GMM and GEL estimators in a finite sample context. With Monte Carlo experiments, we assess the effects that data contaminations, such as outliers and the presence of heavy tails in the innovation structure, can have on the results of efficiency tests. In general, we see that GEL has better performance when heavy tails are present, whereas regarding the presence of outliers, both the GMM and GEL can have better robustness depending on the data-generating process (DGP) we use.
We show that under the null hypothesis, tests using either GEL or GMM estimators have a tendency to over-reject the hypothesis of efficiency in finite samples. We also evaluate how efficiency tests based on GEL and GMM estimations can lead to different decisions using real datasets. The results indicate that, in general, efficiency tests using GEL generate lower estimates (higher p-values) compared to tests using the standard GMM-based approach. Moreover, for the case that most closely resembles the sample sizes typically available in finance, the results of the efficiency tests conflict between the GEL and GMM methodologies. All these results indicate that efficiency tests based on estimators from the GEL class perform differently from those based on GMM, especially in small samples.
Table 1 presents an overview of recent studies, grouped into broad topics, on how empirical likelihood and related methods have been employed in the financial economics literature. Empirical likelihood methods have been incorporated into this field over time, and a few papers have explored this family of estimators with this audience in mind [10,11]. This family of estimators has been employed in specific asset pricing contexts, such as risk valuation and option pricing [12,13,14,15,16], and specifically in portfolio theory [17,18,19]. On the other hand, to address some of the issues present in the standard estimation methods of the portfolio theory literature, Bayesian approaches have also been introduced [20,21]. Alternatively, other studies have focused on the statistical tests used in portfolio theory [22,23,24,25,26].
The structure of this paper is as follows. The next section introduces the methodology, presenting the asset pricing theory and the econometric models for portfolio efficiency tests for the GMM and GEL estimation methods, with an emphasis on the latter.
Section 3 provides an overview of the data used.
Section 4 provides the simulation experiments to evaluate the robustness of the tests under both methods of estimation.
Section 5 presents the empirical results. Finally,
Section 6 concludes the paper.
3. Data
The data employed can be grouped into instruments, factors, and portfolios. The common maximum time span for all our data is 720 months (60 years) prior to December 2014. As for the instruments, we used a set of five standard variables commonly employed in this type of analysis to measure the state of the economy and form our set of conditional information; the chosen lagged variables are part of a fairly standard set of instruments for this purpose. The first was the lagged value of the 3-month Treasury-bill yield [34]. The second was the spread between corporate bond yields with different ratings, derived from the difference between Moody’s Baa and Aaa corporate bond yields [1,35]. Another instrument was the spread between the 10-year and 1-year constant-maturity Treasury yields [1,36]. Following [34], we included the U.S. inflation rate, measured by the percentage change in the Consumer Price Index (CPI). Lastly, the monthly growth rate of seasonally adjusted industrial production was also used, measured by the Industrial Production Index [34]. All data were extracted from the historical time series provided by the Federal Reserve.
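As an illustration of how these series can be assembled, the minimal sketch below pulls candidate series from FRED with pandas-datareader; the FRED codes shown (TB3MS, BAA, AAA, GS10, GS1, CPIAUCSL, INDPRO) are assumptions that match the descriptions above and are not necessarily the exact series or vintages used here.

```python
# Illustrative sketch: building the five lagged instruments from FRED series.
# The FRED codes below are assumed matches for the variables described in the text.
import pandas_datareader.data as web

start, end = "1955-01-01", "2014-12-31"
codes = ["TB3MS", "BAA", "AAA", "GS10", "GS1", "CPIAUCSL", "INDPRO"]
raw = web.DataReader(codes, "fred", start, end)

instruments = raw[["TB3MS"]].rename(columns={"TB3MS": "tbill3m"})
instruments["default_spread"] = raw["BAA"] - raw["AAA"]        # Baa - Aaa corporate yield spread
instruments["term_spread"] = raw["GS10"] - raw["GS1"]          # 10y - 1y constant-maturity spread
instruments["inflation"] = raw["CPIAUCSL"].pct_change() * 100  # monthly CPI inflation (%)
instruments["ip_growth"] = raw["INDPRO"].pct_change() * 100    # industrial production growth (%)

# Instruments enter the tests lagged by one month relative to the returns.
instruments = instruments.shift(1).dropna()
```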
Given that we focused on the CAPM and Fama–French three-factor model, we extracted the factors for both approaches from the Kenneth R. French website (
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/data_library.html accessed on 11 November 2022). The market portfolio is the value-weighted return of all companies listed on the NYSE, AMEX, and NASDAQ. More precisely, it consists of the value-weighted returns of all CRSP firms incorporated in the US and listed on the NYSE, AMEX, or NASDAQ that have a CRSP share code of 10 or 11 at the beginning of month t, good shares and price data at the beginning of month t, and good return data for month t. The SMB and HML factors are computed in accordance with [37]. SMB is the average return of the three small portfolios minus the average return of the three big portfolios, whereas HML is the average return of the two high book-to-market portfolios minus the average return of the two low book-to-market portfolios.
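Using the usual Fama–French 2 × 3 convention (the portfolio labels below are notation introduced here, with S/B for small/big size and G/N/V for growth, neutral, and value book-to-market groups), these definitions can be written as

$$\mathrm{SMB}=\tfrac{1}{3}\,(SG+SN+SV)-\tfrac{1}{3}\,(BG+BN+BV), \qquad \mathrm{HML}=\tfrac{1}{2}\,(SV+BV)-\tfrac{1}{2}\,(SG+BG).$$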
Figure 1 and
Figure 2, respectively, present the complete historical series of the lagged state variables and factors used. From the plots of the five instruments, important events in the 60-year range of our data can be easily seen through the peaks and valleys. The oil crisis and the 2008 Great Recession are examples of events that impacted the lagged variables of the economy.
Table 2 shows some descriptive statistics for the instruments and factors for the 720-month period. The first-order autocorrelation shows that the instruments were highly persistent, whereas this was not observed for the factors. Note that for most of the five instruments, the first-order autocorrelation was 97% or higher. The only instrument that could not be considered persistent was the
Industrial Production Index growth rate, which had a first-order autocorrelation of 37%. The three factors had first-order autocorrelations lower than 20%.
We made use of the six portfolios selected with equal weights by size and book-to-market (
6 Portfolios Formed on Size and Book-to-Market (2 × 3)). The six portfolios are constructed at the end of each June as the intersections of two portfolios formed on market equity (size) and three portfolios formed on the ratio of book equity to market equity, and include all NYSE, AMEX, and NASDAQ stocks with available market equity and positive book equity data, as regularly reported by Kenneth R. French (see
http://mba.tuck.dartmouth.edu/pages/faculty/ken.french/Data_Library/six_portfolios.html for further details—accessed on 11 November 2022).
Table 2 also shows the descriptive statistics of the monthly returns of these six portfolios for the same sample period. The lagged variables were used to compute the $R^2$ statistic. Note that the mean ranged from 0.5% to 1.2% and the standard deviation from 4.7% to 7.2%. The table also presents the first-order autocorrelations, which were generally low, between 12% and 26%, as well as the $R^2$ from the regressions of the returns on the five instruments. Note that this coefficient of determination was very low for all six assets, on the order of 2%.
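A minimal sketch of how these predictive $R^2$ values can be computed (assuming `returns` and `instruments` are aligned monthly pandas DataFrames, e.g., as built above) is:

```python
# Sketch: R^2 of each portfolio return regressed on the five lagged instruments.
# Assumes `returns` (T x 6) and `instruments` (T x 5) are aligned pandas DataFrames.
import numpy as np
import pandas as pd

def predictive_r2(returns: pd.DataFrame, instruments: pd.DataFrame) -> pd.Series:
    X = np.column_stack([np.ones(len(instruments)), instruments.values])  # add a constant
    r2 = {}
    for col in returns.columns:
        y = returns[col].values
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # OLS coefficients
        resid = y - X @ beta
        r2[col] = 1.0 - resid.var() / y.var()          # coefficient of determination
    return pd.Series(r2)
```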
4. Evaluating Robustness with Monte Carlo Simulations
In order to evaluate robustness, we assessed the statistical properties of the efficiency test statistics using GMM and GEL estimators in a finite sample context. The main goal here was to analyze the size of the Wald and GRS tests under different specifications. The robustness properties were of special interest since contaminations such as heavy tails and outliers may be present in this type of data. Specifically, we were interested in assessing their robustness under (i) finite samples; (ii) data contaminations, such as the presence of outliers and heavy tails in the data; and (iii) increasing numbers of moment conditions.
In our Monte Carlo experiments, we restricted the DGP of the artificial returns to be efficient. This was achieved by defining our generating process to be a function of a specific number of factors with no intercepts (i.e., setting $\alpha = 0$). By defining different processes for the disturbance term in this DGP, we can generate data with the specific features that we are interested in assessing. We constructed four different scenarios to try to incorporate some patterns seen in real financial data. Then, we analyzed the robustness of the estimators through the size properties of the tests presented in Section 2.2.
To build a dataset of artificial returns, we used the actual returns from the six portfolios based on size and book-to-market and the factors from the Fama–French three-factor model. Seeking to analyze the behavior of our estimators in a finite sample context, we set the sample size to T = 120, using monthly data spanning the 120 months (10 years) prior to December 2014. We worked with managed portfolios to assess the impact of a higher number of moment conditions during the estimation process. A HAC covariance matrix was used for the GMM. In order to deal with serially correlated data, we used smoothed moment conditions for GEL, as in Equation (9). We used the set of five instruments from Section 3 to form our set of conditional information.
For each portfolio, we ran OLS regressions of the excess returns on the three factors from the Fama–French model, yielding three estimated factor loadings for each asset. Using these estimates, we built six artificial series of returns with 120 observations each, defining a process for the disturbance term. In summary, our simulations shared the following common structure:

$$r^{*}_{i,t} = \hat{\beta}_{i,\mathrm{Mkt}}\,\mathrm{Mkt}_t + \hat{\beta}_{i,\mathrm{SMB}}\,\mathrm{SMB}_t + \hat{\beta}_{i,\mathrm{HML}}\,\mathrm{HML}_t + \varepsilon_{i,t}, \qquad i = 1,\ldots,6,\quad t = 1,\ldots,120.$$

All four scenarios used this generating process, and only the disturbance term $\varepsilon_{i,t}$ differentiated them. We carried out 500 simulated artificial return datasets for each of the four scenarios. We chose to run 500 simulations due to the computational burden of estimating the parameters for the efficiency tests, since GEL, in particular, has a high computational cost. Below, we describe the four scenarios considered for defining the process of the error term; a code sketch combining this common structure with the scenario-specific disturbances is given after the scenario descriptions.
Scenario 1—Gaussian Shocks: The first scenario was our baseline. We sought to assess the efficiency tests for both estimators (GMM and GEL) in the presence of Gaussian innovations, with the disturbance term drawn from a normal distribution.
Scenario 2—Shocks from a t distribution: In the second scenario, we evaluated the efficiency tests under the presence of heavy tails. As heavy tails are characterized by more extreme values in the disturbance term, an appropriate way to model them is to draw the innovations from a Student's t distribution. We set this distribution to have 4 degrees of freedom in order to obtain fatter tails.
Scenario 3—Outlier on a fixed date: The third and fourth simulation scenarios sought to evaluate the Wald and GRS tests when outliers were present in the data. In the third case, we modeled the generating process to insert a large-magnitude shock on a fixed date in our sample; arbitrarily, we chose to place this error in the middle of the sample. Following the structure of the previous scenarios, the beta coefficients of each asset in the portfolio were estimated by OLS, and at that date the disturbance received a negative shock of 5 standard deviations, randomly drawn from a normal distribution with the variance calculated using the original data.
Scenario 4—Outlier with 5% probability: The fourth scenario took another approach to simulating outliers. We used a probability process for extreme events, arbitrarily assuming that the probability of an outlier occurring in each period was 5%. In the case of an occurrence, we added an outlier of 5 standard deviations, randomly drawn from a normal distribution with the variance estimated from the original data.
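The following minimal sketch illustrates how the common generating structure and the four disturbance processes can be implemented. It is a sketch under stated assumptions rather than the original code: the function and variable names are ours, the baseline disturbance in Scenarios 3 and 4 is taken to be Gaussian, and the shocks are scaled by each asset's residual standard deviation estimated from the original data, which is our reading of the description above.

```python
# Sketch of the Monte Carlo DGP: OLS factor loadings (no intercept in the
# simulated returns) plus a scenario-specific disturbance term.
# Assumptions: `excess_returns` is a T x 6 array of portfolio excess returns and
# `factors` is a T x 3 array with the Mkt, SMB, and HML factors, both with T = 120.
import numpy as np

rng = np.random.default_rng(123)

def estimate_loadings(excess_returns, factors):
    """OLS slopes of each portfolio on the three factors (intercept estimated but discarded)."""
    T = factors.shape[0]
    X = np.column_stack([np.ones(T), factors])
    coefs, *_ = np.linalg.lstsq(X, excess_returns, rcond=None)   # (1 + 3) x 6
    resid = excess_returns - X @ coefs
    return coefs[1:, :], resid.std(axis=0)                       # loadings and residual std devs

def disturbance(scenario, T, sigma):
    """Scenario-specific disturbances; sigma holds one residual std dev per asset."""
    N = sigma.shape[0]
    if scenario == 1:                                # Gaussian baseline
        return rng.normal(0.0, sigma, size=(T, N))
    if scenario == 2:                                # heavy tails: Student's t with 4 d.o.f.
        return sigma * rng.standard_t(df=4, size=(T, N))
    eps = rng.normal(0.0, sigma, size=(T, N))        # Gaussian baseline for Scenarios 3 and 4 (assumed)
    if scenario == 3:                                # one negative 5-sigma shock in the middle of the sample
        eps[T // 2, :] -= 5.0 * np.abs(rng.normal(0.0, sigma))
    if scenario == 4:                                # 5-sigma shocks occurring with 5% probability each period
        hits = rng.random((T, N)) < 0.05
        eps += hits * 5.0 * rng.normal(0.0, sigma, size=(T, N))
    return eps

def simulate_dataset(scenario, excess_returns, factors):
    """One artificial dataset: efficient returns (no alpha) under the chosen scenario."""
    betas, sigma = estimate_loadings(excess_returns, factors)
    T = factors.shape[0]
    return factors @ betas + disturbance(scenario, T, sigma)     # T x 6 artificial returns
```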
5. Results
5.1. Sampling Distributions of the Test Statistics
To analyze the results of the Monte Carlo experiments, we used the graphical method proposed by Davidson and MacKinnon [
38]. First, we assessed the p-value plot, which reports the empirical distribution function $\hat{F}(x_i)$ of the p-values from the Wald and GRS tests against $x_i$, for any point $x_i$ in the $(0,1)$ interval. The empirical distribution function in this case is given by

$$\hat{F}(x_i) = \frac{1}{R}\sum_{j=1}^{R} \mathbf{1}\left(p_j \le x_i\right),$$

where $R$ is the number of Monte Carlo replications (500 here) and $p_j$ is the p-value of the test in replication $j$, i.e., of either the Wald or the GRS test. If the distributions used to calculate the p-values are correct, then each $p_j$ must be uniformly distributed on $(0,1)$. This implies that the chart of $\hat{F}(x_i)$ against $x_i$ should be as close as possible to a 45° line. Hence, with a p-value plot, it is possible to quickly evaluate statistical tests that systematically over-reject, under-reject, or reject about the right proportion of the time. With the actual size on the vertical axis and the nominal size on the horizontal axis, the p-value plot of a well-behaved test should always lie close to the 45° line for any nominal size, as the actual size of the test should be close to its nominal size, with small deviations equally likely (thus, close to a uniform distribution). This feature makes it easy to distinguish between tests that work well and tests that work badly. Additionally, since these plots show how a given test performs for all nominal sizes, they are particularly useful for comparing tests that systematically over-reject, under-reject, or a combination of both, as one can easily identify the ranges of nominal sizes over which a test deteriorates.
For situations where the test statistics under study behave close to expectation, i.e., with graphs close to the 45° line, the authors proposed the p-value discrepancy plot, which plots $\hat{F}(x_i) - x_i$ against $x_i$. According to the authors, this representation has advantages and disadvantages. Among the advantages, it conveys more information than the p-value plot when the test statistics are well behaved. However, part of this information can be spurious, as it may simply reflect the randomness of the experiments conducted. Furthermore, there is no natural scale for the vertical axis, which can cause some difficulty of interpretation. As before, if the distribution used to compute the p-values is correct, then each $p_j$ must be uniformly distributed on $(0,1)$, and the graph of $\hat{F}(x_i) - x_i$ against $x_i$ should lie near the horizontal axis.
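A minimal sketch of how the two diagnostic plots can be produced from a vector of simulated p-values (one per Monte Carlo replication; the names used here are illustrative) is:

```python
# Sketch: p-value plot and p-value discrepancy plot (Davidson and MacKinnon style)
# from a vector of simulated p-values, one per Monte Carlo replication.
import numpy as np
import matplotlib.pyplot as plt

def pvalue_plots(pvals: np.ndarray, n_grid: int = 100) -> None:
    x = np.linspace(0.0, 1.0, n_grid)
    # Empirical distribution function of the p-values evaluated on the grid.
    F_hat = np.array([(pvals <= xi).mean() for xi in x])

    fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
    ax1.plot(x, F_hat, label="test")
    ax1.plot(x, x, linestyle="--", label="45° line")       # ideal behavior
    ax1.set(xlabel="nominal size", ylabel="actual size", title="p-value plot")
    ax1.legend()

    ax2.plot(x, F_hat - x)
    ax2.axhline(0.0, linestyle="--")                        # ideal behavior
    ax2.set(xlabel="nominal size", ylabel="discrepancy", title="p-value discrepancy plot")
    plt.tight_layout()
    plt.show()
```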
The results for the first simulated scenario derived from a Gaussian disturbance are shown in
Figure 3. By analyzing the
p-value plot, we can see that GEL provided better
p-values than the GMM for both the Wald and GRS tests under the null hypothesis. We can see that both GEL and the GMM over-rejected for any nominal size. For instance, taking a
nominal size for the Wald test, the GMM showed an actual size (proportion of rejections under the validity of the null hypothesis) of
, whereas the size of GEL was less than half of this (
). For the same
nominal size, the GRS test derived for the finite samples indeed performed better for both the GMM and GEL. However, GEL still had better performance. Regarding the
p-value discrepancy plot, we can observe similar results. Based on these graphs, it is possible to observe the superiority of GEL compared to the GMM in estimating the parameters for the Wald and GRS tests when Gaussian shocks exist.
The results for the second scenario with shocks from a
t distribution are presented in
Figure 4. The structure of the graphs is the same. In this scenario, by adding a shock from a
t distribution, we investigated the tests’ robustness for data with heavy-tail distributions. Clearly, the tests based on the GMM performed badly in the finite samples for distributions with long tails. For a
nominal size, the Wald test using the GMM had an actual size of
, whereas that using GEL was slightly more than half of this (
). For the GRS test, the performance of both estimators improved. For the same
nominal size, the GMM had an actual size of
and that of GEL was
. However, although one can say that the GMM performed poorly in finite samples with heavy tails compared to GEL, these results cannot hide the fact that both estimators generally over-rejected under these circumstances. Even if we consider that GEL performed better, having an actual size of nearly 5 times the
nominal size for the Wald test, and an actual size of more than 3 times the
nominal size for the GRS test, we cannot necessarily conclude that their performance was satisfactory.
Figure 5 shows the results for the third scenario, with a large-magnitude shock in the middle of the sample. The goal was to check robustness in the presence of outliers. Here, the evidence was similar, indicating that the GMM had worse performance than GEL under the null hypothesis. Note that both estimators always over-rejected when we added a random shock of 5 standard deviations in the middle of the sample.
Finally, in
Figure 6, we can see the results for the fourth scenario, in which we also sought to evaluate robustness to outliers. Here, we obtained interesting results that differed from the earlier ones. The Wald and GRS tests based on the GMM estimations showed better results than those based on GEL for any nominal size we chose. However, note that this superiority was tenuous, being more discernible for nominal values below
. Taking a
nominal size, the Wald test with the GMM has an actual size of
, whereas that using GEL was
. For the GRS test, assuming the same
nominal size, the size of the GMM was
and that of GEL was
. By analyzing the
p-value discrepancy plots, we can observe a similar pattern with an important feature; for both tests, both the GMM and GEL estimations tended to consistently improve performance after reaching a peak of discrepancy around a nominal size of
.
In summary, by analyzing all the results presented in this section, it is possible to observe that efficiency tests based on GEL estimations tend to perform better in finite samples than those based on the GMM. Furthermore, tests using GEL are more robust to the presence of heavy tails. Regarding robustness to outliers, depending on the generating process assumed, either the GMM or GEL can be advantageous. However, these results also demonstrate that, whichever estimator and test we evaluate, the Wald and GRS tests generally have a tendency to over-reject.
5.2. Empirical Analysis
Briefly, in this section, we show how efficiency tests based on the GEL and GMM estimations can lead to different decisions using real datasets. We evaluated both methods (i) with no conditional information and (ii) when a managed-portfolio structure was used. To do so, the analysis was conducted by comparing the test results across different sample sizes and across the two asset pricing models (CAPM and the Fama–French three-factor model), employing the efficiency tests defined in Section 2.2. For all portfolios, testing their efficiency should be seen as testing whether the factors from each asset pricing model explain the portfolios’ average returns. For the CAPM, the interpretation was made by assessing whether using the individual historical returns with a single risk factor (i.e., the
Mkt factor) yielded an efficient portfolio (i.e., when the estimated intercepts are not jointly statistically significant), whereas for the Fama–French three-factor model, we evaluated whether the three risk factors used in Equation (
11) (namely,
Mkt,
SMB, and
HML) yielded a similar statistical conclusion when jointly evaluating the vector of the estimated alphas.
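For reference, with notation introduced here ($N$ test assets, $K$ factors, $T$ observations, $\hat{\alpha}$ the vector of estimated intercepts, $\hat{\Sigma}$ the residual covariance matrix, and $\bar{f}$, $\hat{\Omega}$ the sample mean and covariance matrix of the factors), the joint tests of the alphas take their standard forms:

$$W = T\,\hat{\alpha}^{\prime}\,\widehat{\operatorname{Var}}(\hat{\alpha})^{-1}\,\hat{\alpha}\;\overset{a}{\sim}\;\chi^{2}_{N}, \qquad GRS = \frac{T-N-K}{N}\,\bigl(1+\bar{f}^{\prime}\hat{\Omega}^{-1}\bar{f}\bigr)^{-1}\,\hat{\alpha}^{\prime}\hat{\Sigma}^{-1}\hat{\alpha}\;\sim\;F_{N,\,T-N-K}.$$

The exact covariance estimator entering the Wald form depends on whether the parameters are estimated by the GMM or GEL, as defined in Section 2.2.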
Table 3 presents the estimation results of the GMM and GEL when no conditional information was used in the asset pricing moments, for an increasing sequence of months, starting with the last 60 months and extending the window up to 1020 months. Each sample begins in January of a given year and ends in December 2014. The table also presents the estimations of the two asset pricing models of interest for each time interval, the capital asset pricing model (CAPM) and the Fama–French (FF) three-factor model. Initially, by examining the test results using either the GMM or GEL, we noticed that for all periods over 180 months, both the CAPM and the Fama–French model showed strong evidence for rejecting the hypothesis of efficiency. However, for a short T, we observed strong disagreement between the two methodologies, whereas for T = 60 (i.e., 5 years), we saw no evidence for rejecting efficiency using either the GMM or GEL for both models, and in the tests for
,
, and
, the GMM and GEL pointed in opposite directions.
For 90 months, the GMM rejected efficiency at a significance level for the CAPM using either the Wald or GRS tests. We did not observe the same results using GEL for the same sample size. For the Fama–French model, we did not see such a strong disagreement between them. For 120 months, we saw similar results. With GEL, the p-values for the Wald and GRS tests were and , respectively, for the CAPM model. With the GMM, these p-values were much smaller and provided evidence against the null hypothesis that the alphas were jointly equal to zero at a standard significance level. For the Fama–French model, the p-values generated by the GMM and GEL were very similar: and (Wald) and and (GRS), respectively. For months, the same pattern was repeated. The p-value for the CAPM using GEL of the Wald statistic was , whereas the p-value of the F distribution under the assumption of normality given by the GRS test was . The GMM provided much smaller p-values, with both tests showing evidence for rejecting the efficiency hypothesis for a significance level of . For the Fama–French model, the difference between the p-values using either GEL or the GMM was smaller. Thus, the divergence between them was more tenuous.
In
Table 3, overall, we can see some evidence to endorse the simulation results presented in
Section 4, as the GMM over-rejected the null hypothesis compared to tests conducted via GEL, especially in a finite sample context.
Table 4 presents the results of the efficiency tests for the
multiplicative approach. Here, we used managed portfolios in which the five lagged variables served as instruments. In the
Appendix A, we extend the analysis to different portfolios with higher numbers of assets (e.g., N = 25 and N = 49).
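Schematically, in this multiplicative construction the conditional information enters by scaling the pricing errors with the lagged instruments. With $z_{t-1}$ denoting a vector containing a constant and the five lagged instruments, and $u_t(\theta)$ the $N\times 1$ vector of pricing errors of the base portfolios (notation introduced here for illustration), the moment conditions take the form

$$E\bigl[u_t(\theta)\otimes z_{t-1}\bigr]=0,$$

so that $N$ base assets generate $N\times(L+1)$ managed portfolios when $L$ instruments are used, which is why the number of moment conditions grows quickly with the conditioning set.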
A quick inspection of the results of the tests shows us compelling evidence for rejecting the efficiency for all intervals of 180 months and above for all tests and models based on estimations from either the GMM or GEL. Although for longer periods the
p-values were virtually zero, for
,
, and
months, the inference tests using the GMM and GEL were conflicting. Singularity problems may have occurred during the estimations, impeding the inversion of the covariance matrix. These cases are shown as “NA”. For
, we could not perform the tests for both models using the GMM. Even though we obtained estimates for the CAPM coefficients using the GMM, we were not able to invert the covariance matrix and perform the tests. For the CAPM, GEL showed no indication to reject the efficiency (for
and
), whereas the GMM did (
p-values were practically zero for the Wald and GRS tests). The results are similar to the case in
Table 3 where no instruments were used.
With the use of instruments, the efficiency tests for the Fama–French model did not necessarily provide different inferences regarding the rejection of the null hypothesis. We still saw, however, that the GMM generated smaller p-values than GEL for both tests. Nevertheless, for , the GMM and GEL strongly disagreed, with the GMM generating p-values higher than and GEL p-values practically equal to zero.
In order to connect these results with those from the Monte Carlo experiments performed under the different data contamination scenarios from
Section 4, there are some particularities to be taken into consideration, as the results shown in Table 4 might be influenced by features that the controlled Monte Carlo experiments do not share. In fact, there is a range of complexities that would have to be controlled in order to make a fair comparison. First, embodied in our empirical results is the fact that the true DGP that generated the real data used in this analysis is unknown; we simply relied on the most common factor specifications for the two models employed. If the set of risk factors is incomplete, this inherently affects the results of any of the tests, as power and size may be impacted in distinctive ways, independently of the estimation procedure employed. Similarly, the correct test specification is fundamental (see [39] for a discussion of an alternative formulation of the GRS test). All of these issues could naturally lead to conclusions in either direction with regard to the observed rejections, given the true unknown DGP. In light of these points, the results for the comparable cases in both analyses, in which we used managed portfolios with a sample size of T = 120 evaluated under the Fama–French model, show only marginal differences (slightly higher GMM
p-values than GEL ones). Given this magnitude of divergence in the
p-values, one cannot argue in favor of the validation or not of the previous results solely based on these cases.
6. Conclusions
We evaluate the behavior of the GMM and GEL estimators in tests of portfolio efficiency. We argue that both estimators have different statistical features, and therefore, tests of portfolio efficiency based on them may reflect these differences.
First, we assess the robustness of the tests with the use of the GMM and GEL estimators in a finite sample context. Defining different DGPs to incorporate different specifications, we perform several Monte Carlo experiments to examine the effects that distortions in the data can have on tests of efficiency, and consequently, on decisions based on these results. In general, we see evidence that GEL estimators have better performance when heavy tails are present. Depending on the characteristics of the DGP chosen, both the GMM and GEL can have better robustness to outliers. However, under the null hypothesis, for both estimators, the Wald and GRS tests have a tendency to over-reject the hypothesis of efficiency in finite samples.
Using returns from real datasets in our analysis, we see that (i) in general, efficiency tests using GEL generate lower estimates (higher p-values) and (ii) when the sample is small, with low N and T, the results conflict between the two methodologies. These results may be evidence that estimators from the GEL class perform differently in small samples. In addition, they show that tests based on the GMM have a tendency to over-reject the null hypothesis of efficiency.
The results obtained in our work indicate some limitations of the use of GEL in the construction of efficiency tests, especially in empirical applications. Although the use of this method leads to improvements in properties in finite samples and greater robustness in relation to the presence of heavy tails, as discussed in
Section 5.1, the GEL-based tests still show the over-rejection tendency that is also present in the GMM-based tests. Another possible limitation is the possibility of local optima in the numerical maximization procedures. As discussed in Anatolyev and Gospodinov [5], numerical optimization with respect to the structural parameters in empirical likelihood models can be hampered by the presence of local minima, possible singularities, and convergence problems, since the Hessian is not guaranteed to be positive definite during the optimization. Although it is possible to use optimization methods that are more robust to these problems, especially in empirical analyses, there is a risk of converging to a local optimum due to the dependence on the choice of initial values.
An interesting generalization of our work is the construction of portfolio efficiency tests in the presence of conditional information using other estimators related to the empirical likelihood approach. As discussed in Anatolyev and Gospodinov [
5], empirical likelihood can be viewed as a member of a general family of minimum contrast estimators, especially the class of power-divergence-based estimators. By placing restrictions and some modifications on the general Cressie–Read [
40] divergence function, it is possible to obtain the empirical likelihood, exponential tilting, Euclidean likelihood, GMM estimator with continuous updating, exponentially tilted empirical likelihood, and a version of the Hellinger distance estimator as particular cases. Although these classes of estimators are asymptotically equivalent, their properties in finite samples can be different, especially in relation to robustness to general forms of misspecification. In this aspect, the exponentially tilted empirical likelihood and Hellinger distance estimator classes have some theoretical robustness properties, which can be potentially relevant in the analysis of financial data.
Other possibilities for building and evaluating efficient portfolios involve the use of data envelopment analysis methods [
41,
42,
43]. A comparison between the DEA methods and our analysis would require modifying the DEA methods to use conditional information in the form of moment conditions or instruments, which is not yet fully developed for this class of applications.
An important limitation of our work is the limited number of factors considered in our analysis, as we do not consider the impact of possible high dimensionality in the set of candidate risk factors. The recent financial literature has discussed the possibility of a huge number of candidate risk factors, a phenomenon known as the factor zoo, as discussed, for example, by Harvey and Liu [
44] and Feng et al. [
45]. High dimensionality in the number of candidate risk factors would affect our analysis in several ways. The inclusion of a greater number of factors in the estimation of portfolio risk premiums would lead to a large increase in the number of moment conditions, especially when conditional information is incorporated; in this case, the use of GEL estimators would be advantageous in the sense that this method does not suffer from the finite sample bias, proportional to the number of moment conditions, that impairs the performance of the GMM estimator. Note that our analysis assumes the usual estimation conditions, in which the sample size is greater than the number of parameters of the conditional mean of the returns; a setting with more factors than observations would require combining the GMM and GEL estimators with some form of shrinkage, which, to the best of our knowledge, has not yet been developed. The results of our empirical analysis also assume that the specification of the risk factors included in the model is correct, and thus the empirical results, in particular the observed rejections, may reflect both the possible inefficiency of the portfolios in relation to the included factors and the impact of omitted factors on the power and size of the tests. A relevant development would be to adapt the portfolio efficiency tests with conditional information to allow for the possible omission of factors, in line with the methods developed by Giglio and Xiu [
46] for the pricing of assets with omitted factors.