Over Two Centuries of Global Factor Premiums
Paper Summary

Hot off the press, a new paper by Guido Baltussen, Laurens Swinkels and Pim van Vliet of Dutch quant powerhouse Robeco covers global multi-asset factor premiums over an unprecedented sample of 217 years. We thought the topics and findings were important and timely enough to warrant a summary.

The new paper, titled “Global Factor Premiums,” examines global equity indexes, 10-year government bond indexes, commodities and currency markets to understand how well the most pervasive, persistent, economically significant and investable style premia hold up on a very long out-of-sample dataset. Specifically, the authors study global multi-asset trend, momentum, value, carry, seasonality, and betting-against-beta (BAB) premia on monthly data back to 1799.

The authors focus exclusively on multi-asset factors since several authors have already published extensive research on factor persistence in individual securities. For example, Golez and Koudijs examined stock and bond returns back to 1629; Goetzmann and Huang analyzed stock momentum in Imperial Russia from 1865-1914; and Geczy and Samanov (2013) showed “Two Centuries of Price Return Momentum” in U.S. securities.

The novel contributions from this paper pertain to the following:

  • Replication of seminal studies dealing with multi-asset premia over the period 1981 – 2011 using a uniform testing methodology to mitigate the potential impact of p-hacking from different implementation methodologies.
  • The introduction of more rigorous statistical tests based on the Bayesian perspectives on p-values advocated by Cam Harvey.
  • An application of uniform testing methodologies to examine factor premia over a very long out-of-sample dataset including the period 1799 – 1980 and 2012-2016.

Within each asset class the authors construct factor portfolios using the following uniform methodologies:

  • Trend and momentum are both defined as the 12-month-minus-1-month excess return, that is, the trailing 12-month excess return excluding the most recent month. This is consistent with traditional measures of cross-sectional momentum. However, “trend” as proposed by Moskowitz, Ooi and Pedersen, and by Hurst, Ooi and Pedersen, does not impose a skip month. The current authors make the case that a skip month is more conservative because it allows for a delay in data diffusion in very early periods.
  • Value is defined as dividend yield for equity indices; real yield for bonds; 5-year reversal in spot prices for commodities; and absolute and relative purchasing power parity for currencies.
  • Carry is simply the implied ex ante yield on each instrument.
  • Return seasonality is computed as the mean excess return produced by a market in a certain month over the prior 20-year period.
  • Betting-against-Beta is long the low-beta assets and short the high beta assets with positions neutralized for ex ante beta relative to the global asset class portfolio return.
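The momentum and seasonality definitions above are specific enough to sketch in code. The following is a minimal illustration, not the authors' implementation. The function names are hypothetical, and compounding the monthly returns (rather than summing them) is our assumption, since the summary does not say which the paper uses.

```python
from statistics import mean


def momentum_12m_skip_1m(excess_returns):
    """12-month-minus-1-month signal from monthly excess returns.

    `excess_returns` is ordered oldest first; the last element is the most
    recent month, which is skipped. Compounding is an assumption.
    """
    if len(excess_returns) < 12:
        raise ValueError("need at least 12 monthly observations")
    window = excess_returns[-12:-1]  # 11 months: t-12 through t-2
    growth = 1.0
    for r in window:
        growth *= 1.0 + r
    return growth - 1.0


def seasonality_signal(excess_returns, lookback_years=20):
    """Mean excess return of the upcoming calendar month in prior years.

    The month being forecast follows the last element of `excess_returns`,
    so the same calendar month one year earlier sits 12 positions from the
    end, two years earlier 24 positions from the end, and so on.
    """
    same_month = excess_returns[-12::-12][:lookback_years]
    if not same_month:
        raise ValueError("need at least 12 monthly observations")
    return mean(same_month)
```

With fewer than 20 years of history, this sketch simply averages whatever same-month observations exist. How the paper handles short histories is not covered in this summary.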

Replication study

For the purpose of the uniform tests, portfolios were formed at the end of each month. The trend portfolio takes long positions in markets where the trend is positive and short positions in markets where it is negative. All other tests are cross-sectional: the authors rank markets in each universe based on the factor measure and take a position in each market equal to its rank minus the cross-sectional average rank. Note that this procedure is distinct from the methodology applied in many academic papers, which sort on target characteristics and take market-cap or equal-weight positions in securities that exceed certain quantiles in either direction. The authors note that their results hold in general for this alternative approach.
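The two position rules described above can be sketched as follows. This is an illustrative sketch only. The function names are hypothetical, the flat position for a signal of exactly zero is our assumption, and the weights are left unscaled because the summary does not describe how the paper scales or handles ties.

```python
def trend_positions(signals):
    """Time-series rule: +1 where the signal is positive, -1 where negative.

    `signals` maps market name -> trend signal. A signal of exactly zero
    gets a flat position (an assumption, not taken from the paper).
    """
    return {m: (1 if s > 0 else -1 if s < 0 else 0) for m, s in signals.items()}


def rank_weights(signals):
    """Cross-sectional rule: each market's rank minus the average rank.

    Ranks run from 1 (lowest signal) to N (highest). Ties are broken by
    sort order, which is a simplification.
    """
    ordered = sorted(signals, key=signals.get)
    ranks = {market: i + 1 for i, market in enumerate(ordered)}
    average_rank = sum(ranks.values()) / len(ranks)
    return {market: rank - average_rank for market, rank in ranks.items()}
```

Because the weights are deviations from the average rank, they sum to zero by construction, so the cross-sectional portfolios are long-short within each universe.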

Under the uniform specification, the authors find that Sharpe ratios averaged 0.41 in the replication study. Half of the factor premiums were significant at the traditional 5% threshold, while one-third of the strategies were significant at the stricter 1% threshold. The multi-asset versions produced Sharpe ratios between 0.39 (BAB) and 1.15 (Carry).

Figure 1 below illustrates the Sharpe ratios observed from tests in the original papers (Panel A) versus results from the uniform replication tests (Panel B) performed by the authors. The grey dashed line shows a traditional α threshold of 5% (t-stat of 1.96) and the black dashed line shows a more conservative α threshold of 1% (t-stat of 3). Numbers above the bars represent Bayesian p-values using a 4:1 prior odds ratio, consistent with a threshold Cam Harvey classifies as “perhaps” sufficient to address p-hacking concerns.¹ From our perspective, the Bayesian p-values are almost absurdly conservative, especially since many of the factors under discussion were documented long before the introduction of modern computing capabilities.

¹ Bayesian p-value = (MBF × prior odds) / (1 + MBF × prior odds), where MBF = −e × p-value × ln(p-value) is the minimum Bayes factor.
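To make the footnote formula concrete, here is a small sketch that evaluates it directly. The function names are ours, and the code only restates the expression in the footnote. It makes no claim about how the paper's own figures were computed or rounded.

```python
import math


def min_bayes_factor(p_value):
    """Minimum Bayes factor from the footnote: -e * p * ln(p)."""
    if not 0.0 < p_value < 1.0:
        raise ValueError("p_value must be strictly between 0 and 1")
    return -math.e * p_value * math.log(p_value)


def bayesian_p_value(p_value, prior_odds=4.0):
    """Bayesianized p-value for given prior odds (4:1 in the paper's Figure 1)."""
    weighted = min_bayes_factor(p_value) * prior_odds
    return weighted / (1.0 + weighted)
```

For example, `bayesian_p_value(0.05)` is roughly 0.62 and `bayesian_p_value(0.04)` is roughly 0.58, which shows how quickly a conventionally "significant" result loses force at 4:1 prior odds.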

Figure 1. Global factor returns: modern period

Panel A: Original documentation

Panel B: Replicating factors 1981-2011

Source: Baltussen, Swinkels and van Vliet (2019) “Global Factor Premiums”

From Figure 1 it’s clear that trend and carry are dominant factors in both the original and the replication samples, with statistical significance in excess of conservative thresholds in all asset categories. The uniform multi-asset versions of trend and carry produced very impressive Sharpe ratios of 1.09 and 1.15, respectively, with t-stats above 6. In addition, equity indexes and commodities exhibited strong momentum effects in both the original and replication samples.

Interestingly, the current authors found that value, seasonality and BAB showed only marginally significant effects in the replicating sample, even against the more tolerant frequentist thresholds (traditional p-values of 0.06, 0.05 and 0.04 respectively). The effects fell well below more conservative thresholds with respective Bayesian p-values of 0.64, 0.6 and 0.58. More troubling, these three well-known factors also failed to show Bayesian significance at the multi-asset level, even after accounting for the benefits of diversification between same factor returns across asset class categories. However, the authors did observe that:

The main purpose of this paper is to provide more robust and rigorous long-term evidence of the historical presence of global return factors, utilizing their most simple or basic definitions as put forward in influential papers analyzing recent samples. In this light, this study does not examine smarter and possibly better definitions

Global return factors since 1800

After discussing the results of the replication study over the modern period, the authors applied the same factor analyses to their novel long-term dataset. New markets were introduced as data became available. The sample consisted of 13 global markets in 1800, increasing to 18 markets in 1822; 36 markets by 1870; 50 markets by 1914; and 66 markets by 1974. In 1999 the total number of markets declined from 68 to 63 because of the introduction of the Euro currency. Consistent with similar studies, the authors employed a number of methods to screen for bad data and outliers.

Results were presented both for the out-of-sample period alone (1800-1980 and 2012-2016), independent of the replicating sample, and for the entire period from 1800-2016. Interestingly, the authors found an average Sharpe ratio across factors of 0.41 in the out-of-sample period, identical to what they found in the replicating sample from 1981-2011. However, t-values were much larger because of the larger sample, so that 19 of 24 combinations of factors and asset categories produced t-values above 3.
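The link between sample length and t-values can be illustrated with a common back-of-the-envelope approximation: if returns are independent and identically distributed, the t-statistic of the mean return is roughly the annualized Sharpe ratio times the square root of the number of years. The sketch below applies that approximation (our simplifying assumption, with a hypothetical function name) to the average Sharpe ratio of 0.41. It is not a reproduction of the paper's t-values.

```python
import math


def approx_t_stat(annual_sharpe, years):
    """Approximate t-stat of the mean return, assuming i.i.d. returns."""
    if years <= 0:
        raise ValueError("years must be positive")
    return annual_sharpe * math.sqrt(years)


replication_years = 2011 - 1981 + 1                          # 31 years
out_of_sample_years = (1980 - 1800 + 1) + (2016 - 2012 + 1)  # 186 years

t_replication = approx_t_stat(0.41, replication_years)       # about 2.3
t_out_of_sample = approx_t_stat(0.41, out_of_sample_years)   # about 5.6
```

The same Sharpe ratio that only just clears a t-stat of 2 over 31 years comfortably exceeds 3 over 186 years.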

In addition to confirming the strong economic significance of trend and carry, the larger sample surfaced a highly significant seasonality effect. Return seasonality in government bonds and currencies was especially strong in the new data. The authors observed statistically significant momentum and value effects for three out of four asset classes. Momentum in commodities and value in currencies were notable exceptions, though the sample for commodity value only extended to 1968. Results for BAB were insignificant for all but the equity index category.

Figure 2. Statistical perspectives on global return factors, 1800-2016

Panel B: 1800-2016

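| Asset class | Statistic | Trend | Momentum | Value | Carry | Seasonality | BAB |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Equities | p-value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 |
| Equities | Bayesian-p | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 |
| Equities | BE-odds | >9,999 | >9,999 | 13.18 | >9,999 | >9,999 | >9,999 |
| Bonds | p-value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.66 |
| Bonds | Bayesian-p | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.75 |
| Bonds | BE-odds | >9,999 | 1,040.34 | 16.36 | >9,999 | >9,999 | 0.06 |
| Commodities | p-value | 0.00 | 0.48 | 0.00 | 0.00 | 0.00 | 0.48 |
| Commodities | Bayesian-p | 0.00 | 0.79 | 0.00 | 0.05 | 0.00 | 0.79 |
| Commodities | BE-odds | 4,424.81 | 0.05 | >9,999 | 3.78 | >9,999 | 0.05 |
| FX | p-value | 0.00 | 0.00 | 0.28 | 0.00 | 0.00 | 1.00 |
| FX | Bayesian-p | 0.00 | 0.00 | 0.79 | 0.00 | 0.00 | 1.00 |
| FX | BE-odds | >9,999 | 1,090.15 | 0.05 | >9,999 | 79.16 | 0.00 |
| Multi Asset | p-value | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 |
| Multi Asset | Bayesian-p | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.23 |
| Multi Asset | BE-odds | >9,999 | >9,999 | >9,999 | >9,999 | >9,999 | 0.63 |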

Source: Baltussen, Swinkels and van Vliet (2019) “Global Factor Premiums”

The authors were aware that factor specification can play a material role in results, even over long sample horizons. They performed tests on a variety of methodological variations to test for robustness. For example, they removed the liquidity screen (increased Sharpe ratios); formed equal-weight tertile portfolios (no change); eliminated volatility scaling (over-weighted high volatility instruments and lowered aggregate Sharpe ratios); lagged signals by one month (small decay in most strategies but completely eliminated seasonality effect by construction); rebalanced quarterly (small decay in most strategies except value); and trimmed extreme returns (slight increase in performance). Overall the authors concluded that results were robust to alternative specifications.

Economic and risk explanations

Investors may be interested in whether the factors under consideration represent compensation for well-known economic or financial risk factors. An advantage of a long-term data series is that it makes it easier to identify explanations that relate to relatively infrequent observations such as downside or macroeconomic risks.

The authors presented several comprehensive analyses to address common factor variation and sensitivity to known risks. They found low correlations across factors, and that most individual correlation coefficients between multi-asset factor series and each factor-asset-class series are also close to zero. The authors conclude that the 24 return factors are unique drivers of returns and share little common variation. This stands in contrast to other studies, such as Asness, Moskowitz and Pedersen (2013) “Value and Momentum Everywhere” which found that global value and momentum effects across assets and securities showed significant common covariance effects.

In spanning tests, the authors confirm the findings of Moskowitz, Ooi and Pedersen that trend returns subsume momentum returns. In other words, cross-sectional momentum effects are insignificant when controlling for trend effects. We would note that in a July 2017 paper, “Cross-Sectional and Time-Series Tests of Return Predictability: What Is the Difference?”, Goyal and Jegadeesh show that trend and momentum are not distinct sources of return, and that differences in returns stem from the time-varying net long exposure to risky assets invoked by trend strategies.

The new dataset provides a relatively large sample of 43 equity bear markets and 218 downside market states (where returns are below -1 standard deviation from the average). In contrast to other studies, the authors find that downside risk explains at best a part of the global factor returns, most notably carry. When beta is replaced by downside beta in the Fama and MacBeth regressions the authors find an insignificant cross-sectional risk premium of 0.05 percent (t-value = 0.26). As a result, the authors conclude that the factor premia are not explained by downside risk in the long-term sample.
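The downside-state definition above (returns more than one standard deviation below the average) is simple to express in code. The sketch below flags those states and computes one possible downside beta, namely an ordinary regression slope estimated only within the flagged periods. That beta definition and the function names are our assumptions for illustration. The summary does not spell out the paper's exact construction or its Fama and MacBeth setup.

```python
from statistics import mean, pstdev


def downside_states(market_returns):
    """Flag periods where the market return is below mean minus one std dev."""
    threshold = mean(market_returns) - pstdev(market_returns)
    return [r < threshold for r in market_returns]


def downside_beta(factor_returns, market_returns):
    """Slope of factor on market returns, using downside states only.

    This conditional-slope definition is an assumption for illustration.
    """
    flags = downside_states(market_returns)
    f = [x for x, down in zip(factor_returns, flags) if down]
    m = [x for x, down in zip(market_returns, flags) if down]
    if len(m) < 2:
        raise ValueError("need at least two downside observations")
    f_bar, m_bar = mean(f), mean(m)
    covariance = sum((a - f_bar) * (b - m_bar) for a, b in zip(f, m))
    variance = sum((b - m_bar) ** 2 for b in m)
    if variance == 0:
        raise ValueError("market returns do not vary across downside states")
    return covariance / variance
```

A factor whose returns are uncorrelated with the market in these states would show a downside beta near zero.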

The authors also calculate the contemporaneous annual factor returns for “good” and “bad” market states, and the return difference between the states, and perform several other regressions against common macroeconomic variables. They conclude, “In summary, our tests reveal very limited evidence of a link between macroeconomic risk and global return factors.”

Summary

The paper “Global Factor Premiums” analyzes well-known return anomalies by employing long-term data not previously considered in the literature. The authors replicate seminal studies with a uniform methodology and introduce robust statistical tests that are resilient to p-hacking.

The authors find that trend and carry factors dominate in the replication studies with multi-asset Sharpe ratios of 1.09 and 1.15 respectively, exceeding even the strictest significance thresholds. Other factors exhibit inconsistent results.

In the extended dataset the authors observed highly significant results from trend, carry and seasonality premiums, bolstered by large sample sizes. Value was significant for most asset categories excepting currencies, and BAB produced significant results only for equity indexes.

Surprisingly, the authors did not surface a meaningful relationship between return premiums and major risk factors. The anomalies, while exhibiting extremely significant, pervasive, and persistent results across centuries of data, are still largely unexplained by contemporary theories of risk.

Finally, given that the premia have produced strong returns with low sensitivity to markets and traditional risk factors, allocations to these factor premia have the potential to substantially expand the efficient frontier and add value to most portfolios.