Evidence Based Investing is Dead. Long Live Evidence Based Investing! Part 1

Michael Edesess’s article, “The Trend that is Ruining Finance Research,” makes the case that financial research is flawed. In this two-part series, we examine the points that Michael raises in some detail. We find that his arguments have some merit. Importantly, however, his article fails to undermine the value of finance research in general. Rather, his points highlight that finance is a real profession, one that requires skills, education, and experience that differentiate professionals from laymen.

Thoughtful, educated finance professionals equipped with the right tools can use evidence based finance to make much better decisions.

Michael’s case against evidence based investing rests on three general assertions. First, there is a very real issue with using a static t-statistic threshold when the number of independent tests becomes very large. Second, financial research is often conducted on a universe of securities that includes a large number of micro-cap and nano-cap stocks. These stocks often do not trade regularly, and exhibit large overnight jumps in prices. They are also illiquid and costly to trade. Third, the regression models used in most financial research are poorly calibrated to form conclusions on non-stationary financial data with large outliers.

This article will explore the issues around the latter two challenges. Our next article will tackle the “p-hacking” issue in finance, and propose a framework to help those who embrace evidence based investing to make judicious decisions based on a more thoughtful interpretation of finance research.

An un-investable investment universe

A large proportion of finance studies perform their analysis on a universe of stocks that is practically un-investable for most investors, because the universe includes stocks with very small market capitalizations. In fact, the top 1000 stocks by market capitalization represent over 93% of the total aggregate market capitalization of all U.S. stocks. This means the bottom 3000 or so stocks account for just 7% of total market capitalization. The median market cap of a stock in the bottom half of the market capitalization distribution is just over $1 billion.

Figure 1. Cumulative proportion of U.S. market capitalization

Source: Blackrock

Mathematically, only a very small portion of investment capital can be deployed outside the top 1000 or so stocks. These smaller stocks are also much less liquid, with less frequent trading, wider bid-ask spreads, and larger overnight jumps. Moreover, these companies tend to trade at low prices, which means trading costs are larger for institutions that pay commissions on a per-share basis.

For these reasons, practitioner oriented studies should include sections on how inefficiencies manifest among larger and smaller companies in isolation. And many do. In particular, many of the papers from AQR break down the performance of anomalies into effects among large (top 30% by market cap), mid (middle 40% by market cap) and small (lowest 30% by market cap) companies. The paper “The Role of Shorting, Firm Size, and Time on Market Anomalies” by Israel and Moskowitz at AQR focuses specifically on this topic. Figure 2 below shows the results for traditional value and momentum factor portfolios for five different market capitalization buckets from 1926 – 2011.

Figure 2. Performance of value and momentum factor portfolios conditioned on market capitalization

Source: Israel, R., and T. Moskowitz. “The Role of Shorting, Firm Size, and Time on Market Anomalies.” Journal of Financial Economics, Vol. 108, No. 2 (2013)

Many readers may be surprised by the results. Notable effects in Figure 2 are highlighted in circles of different colors. Red circles show the long-short factor returns for the largest 20% of firms by market capitalization. The value factor implemented on the largest capitalization bucket produced average annual excess returns of 3.7% with a t-stat of just 1.9, which is not quite statistically significant. Momentum, on the other hand, produced 7.49% average annual excess returns with a highly significant t-stat of 2.95 (more on t-stats below). Regression alphas were grimmer for large-cap value, with a t-stat of just 1.14, while large-cap momentum produced over 10% average annual alpha with a very significant t-stat of 4.23 (more on regression below).

The blue circles in Figure 2 examine whether the difference in factor alphas between the lowest and highest market capitalization buckets is statistically significant. The value factor produced over 10% greater average annual alpha in the smallest capitalization stocks than in large-cap stocks. This is a highly statistically significant effect, with a t-statistic of 3.21 (top blue circle). In contrast, the difference in alphas between the lowest and highest capitalization buckets was relatively small (2.88%) and insignificant (t-stat = 1.31) for the momentum factor.
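For readers who want to see where t-statistics like these come from, the sketch below computes the t-statistic of an average return as the mean divided by its standard error. The return series is made up purely for illustration (it is not the Israel and Moskowitz data), and the function name `mean_return_tstat` is our own. The regression alphas in Figure 2 come from a fuller model, so treat this as the simplest version of the idea rather than the paper's method.

```python
import math
import statistics

def mean_return_tstat(returns):
    """t-statistic for the hypothesis that the true mean return is zero."""
    n = len(returns)
    mean = statistics.fmean(returns)
    stdev = statistics.stdev(returns)  # sample standard deviation (n - 1 divisor)
    standard_error = stdev / math.sqrt(n)
    return mean / standard_error

# Hypothetical annual long-short excess returns, in percent (illustrative only).
hypothetical_returns = [12.0, -8.0, 15.0, 3.0, -5.0, 9.0, 7.0, -11.0, 14.0, 1.0]

t_stat = mean_return_tstat(hypothetical_returns)
print(f"mean = {statistics.fmean(hypothetical_returns):.2f}%, t-stat = {t_stat:.2f}")
```

By convention, an absolute t-statistic of roughly 2 corresponds to the 5% significance threshold, which is why a t-stat of 1.9 reads as "not quite" significant. Note that the same average return earns a higher t-statistic when it is observed over more periods or with less volatility.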

It’s worth noting that the analysis in Figure 2 did not account for trading frictions. After accounting for the cost of liquidity, which might be substantial for small-cap stocks, but inconsequential for large-cap stocks, the gap between large- and small-cap factor performance would almost certainly close, perhaps significantly. In addition, those practitioners who are fond of small- or mid-cap value should feel well validated, as value factor performance is strong and significant for every market capitalization quintile other than the largest cap stocks.

To summarize this section, investors must be aware of the practical implications of the universe chosen for investment research. Practitioners should focus on observed effects among mid- and large-capitalization stocks, where results in practice may be expected to align more closely with academic findings.

Regression is a blunt tool

Researchers in empirical finance use linear regression to determine whether, and to what extent, an effect that they are investigating is already explained by previously documented effects. For example, academics use linear regression to determine how well a factor model explains differences in the cross-section of securities prices. Researchers in search of novel return premia use linear regression to determine how much value a newly proposed factor adds above what is explained by already well-known factors. Advisors, consultants and investors use regression to determine if an active investment product or strategy has delivered significant excess risk-adjusted performance, above what they could achieve through inexpensive exposure to factor products.

Unfortunately, linear regression is a very blunt tool when it comes to dealing with complex financial data. The following example will highlight one important reason why. Note that I poached this example from Larry Swedroe, because it is so perfect and surprising.

Consider two strategies, A and B, and their returns over a 10-year period. Their return series are shown in the table below.

Period 1:

Strategy Year 1 Year 2 Year 3 Year 4 Year 5 Year 6 Year 7 Year 8 Year 9 Year 10
A 12% 8% 12% 8% 12% 8% 12% 8% 12% 8%
B 8% 12% 8% 12% 8% 12% 8% 12% 8% 12%

Both strategies have an average annual return of 10 percent. Whenever A’s return is above its average of 10, B’s return is below its average of 10. And whenever A’s return is below its average of 10, B’s return is above its average of 10. Thus, regressing strategy A’s returns on strategy B’s returns over this period will show that they are negatively correlated. Note that they are negatively correlated even though they both always produced positive returns.

Now imagine that the same strategies produced the following returns in a different 10-year period.

Period 2:

Strategy Year 1 Year 2 Year 3 Year 4 Year 5 Year 6 Year 7 Year 8 Year 9 Year 10
A 2% -2% 2% -2% 2% -2% 2% -2% 2% -2%
B -2% 2% -2% 2% -2% 2% -2% 2% -2% 2%

Over this period, the same strategies have an average annual return of 0 percent. Perhaps the styles went out of favor. However, whenever A’s return is above its average of 0, B’s return is below its average of zero. And whenever A’s return is below its average of 0, B’s return is above its average of zero. Thus, regressing A on B will render the conclusion that they are negatively correlated.

Now let’s string together the two 10-year periods so that we have a 20-year period. Thus, the return series looks like this:

Strategy A: 12, 8, 12, 8, 12, 8, 12, 8, 12, 8, 2, -2, 2, -2, 2, -2, 2, -2, 2, -2.

Strategy B: 8, 12, 8, 12, 8, 12, 8, 12, 8, 12, -2, 2, -2, 2, -2, 2, -2, 2, -2, 2.

Recall that both A and B had average returns in the first 10 years of 10 percent, and average returns of 0 percent in the second 10 years. Thus, their average return for the full 20 years in both cases is 5 percent. Now: Are A and B positively or negatively correlated?

A closer inspection reveals that, over the full 20-year period whenever A’s return was above its average of 5, B’s return was also above its average of 5. And whenever A’s return was below its average of 5, B’s return was also below its average of 5. Thus, we see that despite the fact that A and B were negatively correlated over each of the two 10-year periods independently, over the full 20-year period they were positively correlated.
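The sign flip is easy to verify numerically. The sketch below uses the returns from the two tables above and computes the Pearson correlation for each 10-year period and for the combined 20-year period. It relies on NumPy, and the helper name `correlation` is our own.

```python
import numpy as np

# Returns from the two tables above, in percent.
a_period1 = np.array([12, 8] * 5, dtype=float)
b_period1 = np.array([8, 12] * 5, dtype=float)
a_period2 = np.array([2, -2] * 5, dtype=float)
b_period2 = np.array([-2, 2] * 5, dtype=float)

def correlation(x, y):
    """Pearson correlation coefficient between two return series."""
    return float(np.corrcoef(x, y)[0, 1])

corr_p1 = correlation(a_period1, b_period1)
corr_p2 = correlation(a_period2, b_period2)

# String the two periods together into one 20-year series per strategy.
a_full = np.concatenate([a_period1, a_period2])
b_full = np.concatenate([b_period1, b_period2])
corr_full = correlation(a_full, b_full)

print(f"Period 1:   {corr_p1:+.2f}")
print(f"Period 2:   {corr_p2:+.2f}")
print(f"Full 20 yr: {corr_full:+.2f}")
```

The correlation is -1 within each 10-year period, yet roughly +0.72 over the full 20 years, because the shift in the average return between the two periods dominates the year-to-year zigzag.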

This example highlights an omnipresent but rarely discussed challenge with financial time-series: the measured relationship between variables can change dramatically across time. This effect is not isolated to observations over two distinct periods of time; rather, we observe similar dynamics at play when time series are observed at different frequencies. In fact, variables can appear to be negatively correlated at one frequency (say, daily) and yet be positively correlated at another (say, monthly)!
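A stylized construction shows how the frequency effect can arise. In the sketch below, two synthetic daily series share a slow-moving monthly drift but are hit by fast daily noise of opposite sign. All of the numbers, including the assumed 20 trading days per month, are our own illustrative choices and not market data.

```python
import numpy as np

DAYS_PER_MONTH = 20  # assumed number of trading days per month
N_MONTHS = 12

# Slow component: one drift value per month, shared by both series.
monthly_drift = np.linspace(-0.2, 0.2, N_MONTHS)
shared = np.repeat(monthly_drift, DAYS_PER_MONTH)

# Fast component: alternating +1/-1 daily noise that nets to zero within each
# month and enters the two series with opposite signs.
noise = np.tile([1.0, -1.0], DAYS_PER_MONTH * N_MONTHS // 2)

daily_a = shared + noise
daily_b = shared - noise

# Aggregate daily returns to monthly returns by summing within each month.
monthly_a = daily_a.reshape(N_MONTHS, DAYS_PER_MONTH).sum(axis=1)
monthly_b = daily_b.reshape(N_MONTHS, DAYS_PER_MONTH).sum(axis=1)

daily_corr = float(np.corrcoef(daily_a, daily_b)[0, 1])
monthly_corr = float(np.corrcoef(monthly_a, monthly_b)[0, 1])

print(f"daily correlation:   {daily_corr:+.2f}")
print(f"monthly correlation: {monthly_corr:+.2f}")
```

At the daily frequency the opposing noise dominates and the correlation is strongly negative. Once the noise nets out within each month, only the shared drift remains and the monthly correlation is strongly positive.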

There are other reasons to be skeptical of results from financial time-series regression analysis. One reason relates to factor specification. Most regression analyses in the finance literature use a common set of risk factors, such as the Fama-French 3-Factor model, the Fama-French-Carhart 4-Factor model, the Fama-French 5-Factor model, or a few other variations that include factors like quality, low volatility, and term structure.

Let’s unpack the most common 3-Factor model from Fama and French. This model seeks to explain returns using a combination of a market factor (MKT), a size factor (SMB) and a value factor (HML). Fama and French define the value factor using the Book-to-Price ratio. Specifically, at the end of each June they sort stocks based on the Book-to-Price ratio observed at the end of December of the previous year. So when a value-oriented investment strategy is regressed on the 3-Factor model, if the strategy employs the Book-to-Price ratio, and rebalances on the same dates as the value strategy in the Fama-French model, the regression will show a strong value tilt[1].

However, “value” can be defined in many ways. Some practitioners use Book-to-Price; others use Earnings-to-Price, or Sales-to-Price, or Cash-Flow-to-Price, or other metrics. Portfolios have different numbers of holdings and are rebalanced at different times. Many managers use several factors at once to measure value. All of these deviations from the traditional value factor specification will lead the regression model to observe weak exposure to the “value” factor, even though the other value specifications and methods are equally useful.

The AQR Style Premia Alternative Fund (QSPIX) offers an informative case study. The fund purports to invest in pure, market-neutral value, momentum, carry, and “defensive” factor strategies applied to individual stocks and bonds, as well as stock and bond indexes and other asset classes around the world.

Using the fantastic PortfolioVisualizer web application, we ran a linear regression analysis to determine the fund’s exposures to the ubiquitous Fama-French factors. We started with the 3-factor model (market beta (Mkt), small-cap (SMB), and value (HML)), then proceeded to the 4-factor model (adding momentum (UMD)), and finally to the 5-factor model (removing UMD and adding profitability (RMW) and investment (CMA)). The results are shown in Figure 3.

Figure 3. Linear regression factor attribution analysis of AQR Style Premia Fund (QSPIX) using Fama-French factor models

  1. Regression on Fama French 3-Factor Model
  2. Regression on Fama-French-Carhart 4-Factor Model
  3. Regression on Fama-French 5-Factor Model

Source: PortfolioVisualizer

Unpacking the results in Figure 3, we see that when the fund returns were regressed on the 3-Factor model (part 1), the fund had no meaningful loading on the HML value factor (t-statistic of 0.5, p-value of 0.617). However, when the fund returns were regressed on the 4-factor model in part 2, adding momentum (UMD), the analysis surfaced an extremely significant loading on the exact same value factor, along with a very significant loading on momentum. Then, when momentum was replaced with the profitability and investment factors in part 3, the value loading disappeared again. In fact, in that regression the fund returns appear not to load meaningfully on any of the factors!
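One mechanism that can produce this kind of pattern is omitted-variable bias: if a fund loads on two factors that are negatively correlated with each other, leaving one of them out of the regression can mask the loading on the other. The sketch below illustrates the mechanism on synthetic data. It is not a reconstruction of the QSPIX regressions, we do not claim it is what drives the Figure 3 results, and every number in it (the loadings, the correlation, the sample length) is an assumption chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_months = 5000  # long synthetic sample so the estimates are stable

# Synthetic factor returns: "value" and "momentum" are negatively correlated.
value = rng.normal(0.0, 1.0, n_months)
momentum = -0.8 * value + 0.6 * rng.normal(0.0, 1.0, n_months)

# Synthetic fund that truly loads 0.5 on each factor, plus idiosyncratic noise.
fund = 0.5 * value + 0.5 * momentum + 0.5 * rng.normal(0.0, 1.0, n_months)

def ols_betas(y, *factors):
    """OLS slope coefficients of y on the given factors (intercept included)."""
    X = np.column_stack([np.ones(len(y)), *factors])
    coefs, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coefs[1:]  # drop the intercept

# Value loading when momentum is omitted versus when it is included.
beta_value_alone = ols_betas(fund, value)[0]
beta_value_joint, beta_momentum_joint = ols_betas(fund, value, momentum)

print(f"value loading, momentum omitted:  {beta_value_alone:.2f}")
print(f"value loading, momentum included: {beta_value_joint:.2f}")
print(f"momentum loading:                 {beta_momentum_joint:.2f}")
```

With momentum omitted, the estimated value loading falls far below the true 0.5, because the value factor is partly standing in for the negatively correlated momentum exposure. Including both factors recovers loadings close to the true values.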

Finally, we ran a regression using AQR’s own factor specifications. Specifically, we regressed on the market, AQR’s value factor (HML-Devil), momentum, and quality (QMJ). Per Figure 4, this regression surfaced very statistically significant loadings on all of the factors that one might expect given the fund’s mandate. (Note: when we included the Betting Against Beta (BAB) factor, neither the QMJ nor the BAB factor was statistically significant, because these factors are highly correlated.)

Figure 4. Linear regression factor attribution analysis of AQR Style Premia Fund (QSPIX) using AQR factor model

Source: PortfolioVisualizer

Given the challenges described above with the use of linear regression models, many practitioners may be tempted to abandon the process altogether. Worse, investors may resort to comparing simple returns, with no awareness of the exposures to risk factors beneath the hood. However, those investors who persist in finding better tools for analysis are likely to be richly rewarded with better calibrated models, and a clearer understanding of the factors that drive investment returns.

To address the challenges raised above, specifically the fact that relationships between variables change over time (non-stationarity) and the issues around how explanatory variables are specified, researchers should employ more robust regression methods. For example, k-fold cross-validation, where the regression is repeatedly fit on part of the data and evaluated on the held-out remainder, helps control for the non-stationarity issue. Constrained regressions like LASSO, sequential, and ridge regression allow researchers to include many correlated variables in their analyses, such as different specifications of “value”, or both the QMJ and BAB factors, which would otherwise corrupt a linear regression analysis. Admittedly, these tools require technical knowledge and advanced education, but this is the nature of a true profession. Those who wish to learn more about these methods could do a lot worse than this book (h/t to Dave Cantor for the book reference).
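To make two of these tools concrete, the sketch below implements ridge regression in closed form and scores it with a simple k-fold cross-validation. The data are synthetic: two nearly identical "value" definitions drive a made-up return series. The function names, penalty grid, and fold count are our own assumptions. We show ridge only because LASSO has no closed-form solution. The folds are contiguous blocks rather than a walk-forward scheme, so this is a minimal illustration and not a recommended procedure for real time-series data.

```python
import numpy as np

def ridge_fit(X, y, penalty):
    """Closed-form ridge coefficients (no intercept); penalty=0 gives plain OLS."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)

def kfold_mse(X, y, penalty, k=5):
    """Average out-of-sample mean squared error over k contiguous folds."""
    folds = np.array_split(np.arange(len(y)), k)
    errors = []
    for test_idx in folds:
        train_mask = np.ones(len(y), dtype=bool)
        train_mask[test_idx] = False  # hold this fold out of the fit
        beta = ridge_fit(X[train_mask], y[train_mask], penalty)
        residuals = y[test_idx] - X[test_idx] @ beta
        errors.append(np.mean(residuals ** 2))
    return float(np.mean(errors))

# Synthetic, roughly zero-mean data: two almost collinear "value" factors.
rng = np.random.default_rng(1)
n_obs = 120
value_a = rng.normal(size=n_obs)
value_b = value_a + 0.05 * rng.normal(size=n_obs)
X = np.column_stack([value_a, value_b])
y = 0.5 * value_a + 0.5 * value_b + rng.normal(size=n_obs)

beta_ols = ridge_fit(X, y, penalty=0.0)
beta_ridge = ridge_fit(X, y, penalty=10.0)

# Pick the penalty with the lowest cross-validated error from a small grid.
candidate_penalties = [0.0, 0.1, 1.0, 10.0, 100.0]
cv_errors = {p: kfold_mse(X, y, p) for p in candidate_penalties}
best_penalty = min(cv_errors, key=cv_errors.get)

print("OLS coefficients:  ", np.round(beta_ols, 2))
print("Ridge coefficients:", np.round(beta_ridge, 2))
print("Best penalty by 5-fold CV:", best_penalty)
```

With near-collinear regressors, the unpenalized coefficients can swing to large offsetting values, while the ridge penalty shrinks them toward a more stable split. Cross-validation then chooses the penalty by out-of-sample fit rather than in-sample fit.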


Finance research suffers from a variety of challenges that make it difficult for practitioners to make informed decisions. Many papers examine financial effects by including in their analysis stocks with very small market capitalizations, which would probably not be tradable in practice. When the same studies are conducted on a larger capitalization universe of stocks, which investors could trade with reasonable costs and scale, researchers often arrive at different results. We highlighted one example, which showed that, while the popular “value” factor exhibits a large and significant effect when applied to mid- and small-cap U.S. companies, it renders a statistically insignificant result when applied to a large-cap investment universe. Thus there appears to be little value in exposing investors to value tilts in large-cap portfolios.

Another important consideration for evidence-based investors is that the most common tool for investigation, linear regression, is not well designed to deal with noisy and evolving financial data. As a case study, we performed several factor attribution regression analyses on a pure factor-oriented product, the AQR Style Premia Alternative Fund (QSPIX). Our results show that these types of regression analyses can be highly sensitive to which factors are included as explanatory variables, and how those factors are specified. We suggested several advanced regression methods that address the key challenges of traditional regression analysis, but warned that meaningful research will require a greater depth of knowledge about advanced statistical techniques.

In our next article, we will explore the issue of scalability in financial research. Advances in computational power and an explosion of new data sources make it easy to test thousands of potential relationships among financial variables. Just as a billion monkeys typing randomly for thousands of years will eventually produce a Shakespearean sonnet, thousands of researchers running tests on millions of combinations of economic variables will inevitably stumble onto spurious relationships. We take this issue head-on, and show that the most robust factors easily survive this statistical challenge. We also propose a framework to help investment professionals make judicious decisions based on finance research.

[1] Fama and French performed other machinations to create their factor portfolios, which confound regression analysis for attribution on real investment strategies. For example, to create the value factor returns they perform the sort on large-cap stocks, and again on small-cap stocks, and average the results from the two sorts.