From Fragility to Robustness: The Value of Ensembles – A Case Study in Robust Equity Momentum

Google dictionary defines the word robust thusly:

  • sturdy in construction
  • able to withstand or overcome adverse conditions

… and offers the following definitions for the word fragile:

  • easily broken or damaged
  • flimsy or insubstantial; easily destroyed
  • not strong or sturdy; delicate and vulnerable

How can an investment model be “sturdy in construction” and “able to withstand or overcome adverse conditions”? How might we tell when an investment model is “easily broken or damaged” or “delicate and vulnerable”?

Why does it matter?

In this brief case study we will explore the concept of fragility using a slimmed-down version of the Newfound/ReSolve Robust Equity Momentum Index (NRROMOT), which rotates between regional equity indexes and bonds based on trend and momentum indicators. We use a slimmed-down version for computational tractability, since we will be performing a large number of simulations.

Supervised Human Learning

It is useful to think about the construction of systematic investment strategies as a machine learning process. For the purpose of this article a human (yours truly) will perform much of the analysis that would be performed by machines, but the process is the same.

Specifically, the process we will follow in this article is akin to supervised machine learning because we are attempting to train a model to deliver on a specific objective. We want to predict which markets will produce the highest returns in the next period so that we can compound our wealth at the highest rate with manageable losses.

A model requires explanatory variables that are used to inform predictions. Consistent with the NRROMOT index, we will use measures of trend and momentum to predict the optimal asset to hold for each period.

Trend and momentum are close cousins. Trend measures the direction of an asset's movement (up or down), while momentum compares the strength of trends between assets. We use the following trend/momentum-oriented explanatory variables to fit our model:

  • Time-series momentum (TS) with lookbacks of 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300 days.
  • Price relative to moving average (PMA) with lookbacks of 30, 45, 60, 75, 90, 105, 120, 135, 150, 165, 180, 195, 210, 225, 240, 255, 270, 285, 300 days.
  • Short-term moving average relative to long-term moving average (DMA) with short/long lookback pairs of 8/30, 11/45, 15/60, 19/75, 22/90, 26/105, 30/120, 34/135, 38/150, 41/165, 45/180, 49/195, 52/210, 56/225, 60/240, 64/255, 68/270, 71/285, 75/300 days.
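The text does not give formulas for the three feature families. Below is a minimal sketch, assuming the conventional definitions (TS as the trailing total return, PMA as price over its simple moving average, DMA as short over long simple moving average) and a sign-based signal. The helper names `feature_specs` and `trend_signal` are hypothetical. As a check on the list above, every short/long DMA pair equals the long lookback divided by four, rounded half-to-even.

```python
import numpy as np
import pandas as pd

# Long lookbacks 30, 45, ..., 300 days (19 values), as listed above
LONG_LOOKBACKS = list(range(30, 301, 15))


def feature_specs():
    """Enumerate the 57 (family, short, long) trend specifications listed above."""
    specs = []
    for n in LONG_LOOKBACKS:
        specs.append(("ts", None, n))
        specs.append(("pma", None, n))
        # Python's round() rounds halves to even, which reproduces every listed DMA pair
        specs.append(("dma", round(n / 4), n))
    return specs


def trend_signal(prices: pd.Series, family: str, short, long: int) -> pd.Series:
    """+1 if the trend measure is positive, -1 if negative, NaN while the window warms up.

    Assumed definitions (not stated in the article):
      ts  = trailing return over `long` days
      pma = price / SMA(long) - 1
      dma = SMA(short) / SMA(long) - 1
    """
    if family == "ts":
        raw = prices / prices.shift(long) - 1.0
    elif family == "pma":
        raw = prices / prices.rolling(long).mean() - 1.0
    elif family == "dma":
        raw = prices.rolling(short).mean() / prices.rolling(long).mean() - 1.0
    else:
        raise ValueError(f"unknown family: {family}")
    return np.sign(raw)
```

Each of the 57 specifications then yields its own daily signal series, which is what the strategy simulations below are built on.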

The idea behind NRROMOT is that we want to own the regional equity index with the highest momentum so long as global equities are in a positive trend. When global equities are in a negative trend, we will own either short or intermediate-term Treasuries based on which of these has the strongest trend. Figure 1 describes the basic logic to determine the optimal holding at each rebalance.

Figure 1: Strategy Decision Tree

Source: Newfound Research. For illustrative purposes only
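Figure 1 is not reproduced here, so the sketch below encodes only the decision logic described in the paragraph above. It assumes that one trend specification produces a numeric score per asset (positive meaning an uptrend). The asset keys and the function name `choose_holding` are hypothetical.

```python
def choose_holding(scores: dict) -> str:
    """Pick a single holding from per-asset trend/momentum scores (hypothetical keys)."""
    if scores["global_equities"] > 0:
        # Global equities trending up: hold the regional index with stronger momentum
        return max(("us_equities", "foreign_equities"), key=scores.__getitem__)
    # Global equities trending down: hold the Treasury sleeve with the stronger trend
    return max(("treasuries_1_3y", "treasuries_7_10y"), key=scores.__getitem__)
```

For example, with a positive global equity score and a higher score for US than for foreign equities, the function returns `"us_equities"`.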

We have daily total return data for US equities (S&P 500), foreign equities (EAFE), global equities (ACWI), 7-10 year Treasury bonds, and 1-3 year Treasury bonds back to about 1990. Allowing for priming periods, our simulations therefore start in 1992. Table 1 summarizes the performance of the individual assets over our test horizon.

Table 1: Performance summary for constituent asset classes, 1992 to 2019.

| | 1-3 Year Treasuries | 7-10 Year Treasuries | Global Equities | US Equities | Foreign Equities |
|---|---|---|---|---|---|
| Start Date | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 |
| Annualized Return | 2.67% | 6.14% | 6.11% | 10.05% | 3.63% |
| Annualized Volatility | 2.00% | 6.60% | 17.20% | 17.60% | 17.80% |
| Sharpe Ratio | 0.46 | 0.67 | 0.33 | 0.53 | 0.19 |
| Max Drawdown | -7.20% | -11.40% | -59.00% | -55.50% | -63.60% |

Source: Data from Bloomberg and CSI Data. Data extensions available upon request at author’s discretion.
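Statistics of the kind shown in Table 1 can be computed from a daily return series roughly as follows. This is a sketch under stated assumptions: 252 trading days per year, annualized return as the compound annual growth rate, and Sharpe ratio as that return minus a constant risk-free rate divided by annualized volatility. The article does not state its exact conventions, so this will not necessarily reproduce the table's figures; `performance_summary` is a hypothetical name.

```python
import numpy as np


def performance_summary(daily_returns, periods_per_year=252, rf_annual=0.0):
    """Annualized return, volatility, Sharpe ratio and max drawdown from daily simple returns."""
    r = np.asarray(daily_returns, dtype=float)
    # Wealth path starting at 1.0, so a loss on the first day counts as a drawdown
    wealth = np.concatenate(([1.0], np.cumprod(1.0 + r)))
    years = len(r) / periods_per_year
    ann_return = wealth[-1] ** (1.0 / years) - 1.0  # compound annual growth rate
    ann_vol = r.std(ddof=1) * np.sqrt(periods_per_year)
    sharpe = (ann_return - rf_annual) / ann_vol if ann_vol > 0 else float("nan")
    # Drawdown: distance of wealth below its running peak
    max_dd = (wealth / np.maximum.accumulate(wealth) - 1.0).min()
    return {"annualized_return": ann_return, "annualized_volatility": ann_vol,
            "sharpe_ratio": sharpe, "max_drawdown": max_dd}
```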

Bias / Variance Tradeoff

The purpose of a model is to make predictions with minimal error. But the concept of error is not well understood by finance practitioners.

Folks in data science describe error in terms of bias and variance. Models with high bias tend to be simple and generalize well out of sample, but they may leave some explanatory power on the table (i.e. middling backtest but live results are more likely to resemble simulated results). Models with high variance are more tightly coupled with the training data. They are highly explanatory in-sample but may not generalize well on unseen data (i.e. great backtest, poor live results).

To boil it down, data scientists understand that model engineering requires a tradeoff between model complexity (want to explain as much of the effect as possible) and model robustness (want a model that works well on data that may be slightly different from what it was trained on).

Consider a junior analyst attempting to use our trend and momentum features to engineer a trading model. In our experience, less experienced quants will start out by testing the performance of each individual strategy over the full sample period. Figure 2 plots the compound annual growth rates (CAGR) for strategies specified on each of our 57 trend definitions, traded weekly in five tranches¹.

Figure 2: Ordered compound annual growth rates of individual strategy specifications.

Source: Data from Bloomberg and CSI. Analysis by ReSolve Asset Management.

In scrutinizing the results in Figure 2, our junior analyst (and many experienced financial engineers!) might be tempted to conclude that the dma_38,150 indicator is the optimal predictor. After all, this indicator produced the highest returns out of all the features that were tested.

However, had our junior analyst been trained in data science rather than financial engineering, he would realize that he has made a serious error in his analysis: he has specified his model based entirely on in-sample performance. In other words, he has chosen the model that best fit what actually happened in the past. But we have no idea whether the chosen model is likely to be optimal when applied to data that the model hasn’t seen yet.

Walk-forward analysis

At the heart of this issue is the question of whether the best performing model in the past will go on to be the best performing model in the future. While we obviously can’t know how markets will unfold in the future, we can imagine having to decide on an optimal model to trade at times in the past, and observe how those decisions would have played out in subsequent years.

One systematic way to explore this approach – “walk-forward” analysis – follows this process:

  1. Run simulations for all strategies over the full sample period
  2. At each rebalance, use all of the available returns for each strategy up to that date to find the top-performing strategies
  3. In the following period, allocate only to the strategies with the best performance up to that date

In essence, at each point in time we are going to make a decision about which models are “optimal” based on all available data up to that date, and then we will hold those models in the next period.

This prompts the question of how we should judge which models are optimal. It is common (among junior quants anyway) to choose the strategy with the highest returns. More experienced quants might choose strategies with the highest risk-adjusted returns, measured by Sharpe ratio for example. More sophisticated analysts might seek the portfolio of strategies that maximizes the meta-strategy Sharpe ratio. We also employed an approach that bootstrapped the returns up to each rebalance date: for each bootstrap sample we found the max-Sharpe strategy weights over a five-year holding horizon, and then averaged the weights across samples.
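The bootstrapped max-Sharpe approach in the last sentence could look roughly like this. The text does not specify the resampling scheme, constraints or optimizer, so this sketch assumes an i.i.d. day-level bootstrap (whole rows drawn with replacement, so strategies stay aligned by date), a five-year horizon of 5 × 252 days, long-only weights summing to one, and SciPy's SLSQP solver. `max_sharpe_weights` and `bootstrap_max_sharpe` are hypothetical names.

```python
import numpy as np
from scipy.optimize import minimize


def max_sharpe_weights(returns):
    """Long-only, fully invested weights maximizing the in-sample Sharpe ratio (assumption)."""
    mu = returns.mean(axis=0)
    cov = np.cov(returns, rowvar=False)
    k = len(mu)

    def neg_sharpe(w):
        return -(w @ mu) / np.sqrt(w @ cov @ w)

    res = minimize(neg_sharpe, np.full(k, 1.0 / k), method="SLSQP",
                   bounds=[(0.0, 1.0)] * k,
                   constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}])
    return res.x


def bootstrap_max_sharpe(hist_returns, n_boot=100, horizon=5 * 252, seed=0):
    """Average the max-Sharpe weights across bootstrap samples of the history so far."""
    rng = np.random.default_rng(seed)
    n_days = hist_returns.shape[0]
    draws = []
    for _ in range(n_boot):
        idx = rng.integers(0, n_days, size=horizon)  # resample whole days (rows)
        draws.append(max_sharpe_weights(hist_returns[idx]))
    return np.mean(draws, axis=0)
```

Averaging across samples tends to spread weight over several strategies rather than concentrating it in whichever one happened to have the best single-sample Sharpe ratio.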

Before we discuss the results of our walk-forward analysis, however, we should decide how we might judge performance. What is an unbiased, neutral benchmark against which we can determine whether our walk-forward approach is effective?

The most neutral model that we can construct from our features is one that gives each feature equal weight. We’ll call this strategy the ‘ensemble’ model. If our dynamic methods are able to select certain specifications that materially outperform our ensemble on a walk-forward basis, this would indicate that there is some persistence in top performing specifications. If not, we can reject the theory that specifications that happened to work best on the in-sample period are more likely to produce better results in the future.
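A minimal sketch of the walk-forward process and of the equal-weight ensemble benchmark, assuming daily returns for every specification have already been simulated (step 1). The 21-day rebalance interval, one-year minimum history and function names are illustrative assumptions rather than parameters from the study, and the Return/Ulcer, combo and max-Sharpe variants are omitted.

```python
import numpy as np


def equal_weight_ensemble(returns):
    """Naive benchmark: equal weight in every specification, reset daily (a simplification)."""
    return np.asarray(returns, dtype=float).mean(axis=1)


def walk_forward_top_n(returns, n_top=10, rebalance_every=21, min_history=252, score="cagr"):
    """At each rebalance, equal-weight the n_top specifications with the best score to date.

    returns: (T, K) array of daily returns, one column per specification.
    Returns a (T,) array of meta-strategy returns (NaN during the initial history window).
    """
    r = np.asarray(returns, dtype=float)
    n_days, k = r.shape
    out = np.full(n_days, np.nan)
    weights = np.zeros(k)
    for t in range(min_history, n_days):
        if (t - min_history) % rebalance_every == 0:
            hist = r[:t]  # only returns observed before day t: no look-ahead
            if score == "cagr":
                s = np.log1p(hist).sum(axis=0)  # same ordering as CAGR
            elif score == "sharpe":
                s = hist.mean(axis=0) / hist.std(axis=0, ddof=1)
            else:
                raise ValueError(f"unknown score: {score}")
            top = np.argsort(s)[-n_top:]
            weights = np.zeros(k)
            weights[top] = 1.0 / len(top)
        # Same target weights applied each day until the next rebalance (no drift)
        out[t] = r[t] @ weights
    return out
```

Comparing `walk_forward_top_n(...)` with `equal_weight_ensemble(...)` over the same dates is the kind of test summarized in Figure 3.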

Let’s examine the results from our walk-forward tests, described in Figure 3.

Figure 3. Performance of walk-forward simulations.

| | Ensemble | Walk-Forward Top 10 CAGR | Walk-Forward Top 10 Sharpe | Walk-Forward Top 10 Return/Ulcer | Walk-Forward Combo | Walk-Forward Max Sharpe |
|---|---|---|---|---|---|---|
| Start Date | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 | Jan 03, 1992 |
| Annualized Return | 11.36% | 11.34% | 11.26% | 10.78% | 10.97% | 11.15% |
| Sharpe Ratio | 0.90 | 0.84 | 0.84 | 0.80 | 0.82 | 0.87 |
| Annualized Volatility | 10.60% | 11.50% | 11.50% | 11.50% | 11.40% | 10.80% |
| Max Drawdown | -14.90% | -18.20% | -17.30% | -16.60% | -17.90% | -16.40% |
| Positive Rolling Yrs | 89.30% | 88.40% | 89.40% | 87.70% | 87.60% | 90.70% |
| Growth of $100 ($) | 2017.19 | 2007.15 | 1966.89 | 1744.09 | 1831.39 | 1914.01 |

Source: Data from Bloomberg and CSI. Analysis by ReSolve Asset Management.

It doesn’t appear that the walk-forward methods add any value beyond the naive ensemble. The historical performance of individual strategy specifications does not seem to provide enough information to choose a subset of “optimal” models that could be expected to outperform our naive equal-weight ensemble. As a result, choosing a single model based on its in-sample explanatory power yields low bias error but high variance error, since the chosen model does not generalize to out-of-sample data.

Jitter Resampling

The only way to determine the optimal bias/variance tradeoff for model selection is to evaluate models on data that they haven’t seen before.

Walk-forward testing is useful because it recreates the analyst’s choices at each point in the past. However, its drawback is that we don’t use all of the data for our evaluation; we only use the data available up to each point in time.

Another way to examine the robustness of model specifications is to run the models on brand new data. This is not a trivial exercise as the new data needs to preserve the characteristics of the original data that our models were trained on while introducing enough randomness to tease out potential model fragility.

We propose a method we call “jitter resampling”, which creates new data by subtly shuffling the order of returns in the local neighborhood of each daily data point. This approach largely preserves the mean and volatility, as well as the trend behavior, of each market, which is central to our trend equity thesis.

Specifically, for each daily return we replace the return at time t with a sample return drawn from returns at t-3 through t+3. There is a 40 percent probability that we sample the same return; a 30 percent probability that we replace with the return at t±1; a 20 percent probability that we replace with the return at t±2; and a 10 percent probability that we replace with the return at t±3. We perform this resampling by row so that the cross-sectional relationships between assets are preserved in each sample.
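The resampling rule above can be sketched in NumPy as follows. The text gives the probability for each distance (0, ±1, ±2, ±3) but not how it splits between earlier and later days, so this sketch assumes an even split; it also assumes that source indices near the start and end of the sample are clipped into range. `jitter_resample` is a hypothetical name.

```python
import numpy as np

# Offsets relative to day t and their probabilities: 40% same day, 30% for t±1,
# 20% for t±2, 10% for t±3 (each distance split evenly between - and +, an assumption)
OFFSETS = np.array([0, -1, 1, -2, 2, -3, 3])
PROBS = np.array([0.40, 0.15, 0.15, 0.10, 0.10, 0.05, 0.05])


def jitter_resample(returns, rng):
    """Return one synthetic (T, N) return matrix built from nearby dates."""
    r = np.asarray(returns, dtype=float)
    n_days = r.shape[0]
    offsets = rng.choice(OFFSETS, size=n_days, p=PROBS)
    # Clip source indices into range at the edges of the sample (assumption)
    src = np.clip(np.arange(n_days) + offsets, 0, n_days - 1)
    # Whole rows (dates) are copied, so cross-sectional relationships are preserved
    return r[src]
```

Calling it repeatedly with one generator, for example `[jitter_resample(r, rng) for _ in range(1000)]` with `rng = np.random.default_rng(seed)`, yields a set of synthetic universes. Because draws are made with replacement, a given day's return can appear more than once or not at all in a sample, which is why the mean and volatility are preserved approximately rather than exactly.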

We created a thousand synthetic data sets for our asset class universe and re-ran our simulations for all model specifications on each synthetic universe. This produced a thousand simulations for each model specification, where all specifications were tested on the same data set in each sample.

We were specifically interested in the distribution of terminal wealth across models, since this is a useful proxy for how robust a model is to small changes in sequences of returns. For each model in each sample we found the percent rank of terminal wealth over the full investment horizon (i.e. we found the rank of each model’s terminal wealth relative to all other models on that sample, and then standardized to a value between zero and 100). Figure 4 plots the distribution of model ranks for the top models of each type in the original sample alongside the ensemble.

Figure 4: Distribution of percent ranks of terminal wealth for select strategies relative to all other strategies across one thousand samples.

Source: Data from Bloomberg and CSI. Analysis by ReSolve Asset Management.
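The percent-rank calculation described above can be sketched like this, assuming the simulated daily returns of every specification on every synthetic sample are stacked into one array. Ties are broken by position rather than averaged (an assumption), and the function name is hypothetical.

```python
import numpy as np


def terminal_wealth_percent_ranks(sim_returns):
    """sim_returns: (S, T, K) array of S synthetic samples, T days, K model specifications.

    Returns an (S, K) array of percent ranks in [0, 100] within each sample.
    """
    wealth = np.prod(1.0 + sim_returns, axis=1)  # (S, K) terminal wealth per sample and model
    # argsort twice turns values into 0-based ranks: 0 = worst ... K-1 = best in that sample
    ranks = wealth.argsort(axis=1).argsort(axis=1)
    return 100.0 * ranks / (wealth.shape[1] - 1)
```

Each column of the result is one model's distribution of percent ranks across samples, the quantity plotted in Figure 4.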

We selected for examination the models that had produced the best performance in the original test, and compared the distribution of outcomes for these models against the distribution of outcomes for the ensemble. The performance of top in-sample models exhibited extremely wide rank dispersion on the out-of-sample data, suggesting high variance error.

On the other hand, the ensemble model produced returns above the median (dashed line) on average. Of greater importance, the ensemble produced a much tighter distribution of relative terminal wealth suggesting low variance error. There were no extremely negative outcomes.


In creating investment models (or any models, for that matter), investors must seek out models that are most likely to produce the performance they need given the unknowable sequence of returns they will experience once the model goes live with real funds.

Our case study revealed that, while certain models delivered materially better outcomes on the in-sample data used to evaluate them in hindsight, our walk-forward analysis showed that we could not use this relative performance to reliably select model specifications that are more likely to perform well out of sample.

Our jitter resampling analysis subjected each model to slightly modified data that preserved the distribution and trending character of the original returns. Consistent with the true objectives of investors, we evaluated how effectively each model navigated these changes in the data by measuring the dispersion of terminal wealth. Strategies that performed best on the in-sample data struggled with the out-of-sample data, delivering a wide range of outcomes.

In contrast, the ensemble strategy consistently produced better-than-average results and presented a more manageable probability of material adverse outcomes.

  1. We rebalance one-fifth of the portfolio every day to approximate the effect of running five strategies in parallel, each rebalanced on a different day of the week. For the purpose of this case study, we want to isolate the impact of the choice of parameter specification versus the ensemble.

We’ve spent a great deal of time in past articles discussing the merits of portfolio optimization. In this article we will examine the merits and challenges of portfolio optimization in the context of one of the most challenging investment universes: Managed Futures.

Futures exhibit several features that make them challenging from a portfolio optimization perspective. In particular, there can be mathematical issues with large correlation matrices, and certain futures markets may exhibit high correlation in certain periods. However, for practitioners that are willing to make the effort, the extreme diversity offered by futures markets represents a lucrative opportunity to improve results through portfolio optimization.

Seeking diversity

Why are we convinced that diversity produces opportunity? We are motivated by the Fundamental Law of Active Management described by (Grinold 1989), which states that the risk-adjusted performance of a strategy is a mathematical function of skill and the square root of breadth.

    \[IR=IC \times \sqrt{breadth}\]

where IR is the information ratio, IC is the information coefficient and breadth is the number of independent bets placed by the manager. For our purposes we can substitute the Sharpe ratio for the information ratio because we are focused on absolute performance, not performance relative to a benchmark. The information coefficient quantifies skill by measuring the correlation between a strategy’s signals and subsequent results.

Breadth is a more nebulous concept. Grinold described breadth as the number of securities times the number of trades. However, (Polakow and Gebbie 2006) raise the issue that, “The square root of N in mathematical statistics implies ‘independence’ amongst statistical units (here bets) rather than simply the notion of ‘separate bets’ as is most often implied” in the finance literature.

It is therefore insufficient to simply add more securities in an effort to increase breadth and expand one’s Sharpe ratio. Rather, investors must account for the fact that correlated securities are, by definition, not independent. This prompts questions about how to quantify breadth – the number of independent sources of risk or “bets” – in the presence of correlations.

Quantifying breadth

All things equal, investors seeking to improve results should seek to maximize the breadth that is available to them. We will show that traditional methods do a relatively poor job of maximizing breadth and diversification, and that a portfolio’s maximum potential can usually only be reached through optimization.

It’s illustrative to examine the number of independent bets that are expressed when portfolios are formed using traditional versus more advanced optimization methods. We will quantify the number of independent bets by taking the square of the Diversification Ratio of the portfolio.

(Choueifaty and Coignard 2008) showed that the Diversification Ratio of a portfolio is the ratio of the weighted sum of asset volatilities to the portfolio volatility after accounting for diversification.
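Written out with σ as the vector of asset volatilities and Σ as the covariance matrix (the same notation used for the maximum diversification objective later in this article), the ratio and the implied number of independent bets, N, are:

\[DR(w)=\frac{w^T\cdot\sigma}{\sqrt{w^T\cdot\Sigma\cdot w}}, \qquad N=DR(w)^2\]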


This is intuitive: if all of the assets in the portfolio are perfectly correlated, the weighted sum of their volatilities equals the portfolio volatility and the Diversification Ratio is 1. As the assets become less correlated, portfolio volatility declines due to diversification while the weighted sum of constituent volatilities stays the same, causing the ratio to rise. At the point where all assets are uncorrelated (zero pairwise correlations), every asset in the portfolio represents an independent source of risk.

(Choueifaty, Froidure, and Reynier 2012) demonstrate that the number of independent risk factors expressed in a portfolio is equal to the square of the Diversification Ratio of the portfolio. Thus, we can find the number of independent risk factors in a portfolio as a function of the weights in each asset and the asset covariances, which allow us to calculate the portfolio volatility.
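A small NumPy sketch of both quantities, assuming long-only weights as in the Diversification Ratio definition. The function names are hypothetical.

```python
import numpy as np


def diversification_ratio(w, cov):
    """Weighted sum of asset volatilities divided by portfolio volatility."""
    w = np.asarray(w, dtype=float)
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))
    return (w @ vols) / np.sqrt(w @ cov @ w)


def independent_bets(w, cov):
    """Number of independent bets, taken as the square of the Diversification Ratio."""
    return diversification_ratio(w, cov) ** 2
```

For two equally volatile assets held in equal weight, the result is 2 bets when they are uncorrelated and 1 bet when they are perfectly correlated, matching the intuition above.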

There are many ways to form futures portfolios. Futures are defined by their underlying exposures, which can range from extremely low volatility instruments like Japanese government bonds (JGB) to very high volatility commodities like natural gas. With such large differences in risk between futures contracts, few managers would choose to hold contracts in equal weight.

Maximizing breadth

Traditional risk weighting

Perhaps the most common portfolio formation method among futures managers is to weight assets by the inverse of their volatility subject to a target risk. Contracts with low volatility would receive a larger capital allocation and vice versa. Exposure to each asset is calculated in the following way:

\[w_i = \frac{\sigma_T}{\sigma_i}\]
where \(w_i\) is the portfolio’s weight in market \(i\), \(\sigma_T\) is the target volatility and \(\sigma_i\) is the estimated volatility of market \(i\).

Consider a portfolio of JGB futures with an annualized volatility of 4% and natural gas futures with an annualized volatility of 50%. A manager targeting 10% annualized volatility from each instrument would hold 10%/4% = 250% of portfolio exposure in JGBs and 10%/50% = 20% in natural gas.
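The inverse volatility rule is a one-liner. This hypothetical helper reproduces the JGB/natural gas example, returning exposures of about 2.5 (250%) and 0.2 (20%).

```python
import numpy as np


def inverse_vol_weights(vols, target_vol=0.10):
    """Exposure per market so that each contributes `target_vol` on a standalone basis."""
    return target_vol / np.asarray(vols, dtype=float)


# JGB (4% vol) and natural gas (50% vol) at a 10% target per instrument
weights = inverse_vol_weights([0.04, 0.50], target_vol=0.10)
```

Note that the rule uses only volatilities: correlations between markets play no role, which is the weakness discussed next.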

When we apply this equal volatility weighting methodology to a diversified universe of 47 futures markets across equities, fixed income, commodities and currencies we produce 6.85 independent sources of return.

Diverse correlations

The popular inverse volatility weighting method described above has the goal of dividing risk equally among diverse futures markets. Unfortunately, inverse volatility weighted portfolios are effective at diversifying risk only when all pairwise correlations between markets are equal (learn why in our seminal whitepaper, The Portfolio Optimization Machine: A General Framework for Portfolio Choice).

(Baltas 2015) observed that inverse volatility weighted portfolios are susceptible to major shifts in portfolio concentration as correlations change through time. This issue has become more important in the policy-driven markets subsequent to the 2008 financial crisis. Figure 1 illustrates how pairwise correlations between futures markets shifted higher over the past decade.

Figure 1: Rolling average annual daily pairwise correlations across 47 futures markets.

Source: Analysis by ReSolve Asset Management. Data from CSI.

It is rarely the case that markets exhibit homogeneous correlations. Rather, segments of futures markets such as certain equity markets and some relatively fungible commodity markets like WTI and Brent crude tend to have high correlations, while other markets have low or even negative correlations. Figure 2 illustrates just how diverse pairwise correlations can be across futures markets. On average, the most correlated markets have had correlations of 0.56, while the least correlated markets have had correlations of -0.26.

Figure 2: Rolling 95th and 5th percentile pairwise correlations across 47 futures markets.

Source: Analysis by ReSolve Asset Management. Data from CSI.

Given that correlations exhibit such large dispersion, inverse volatility weighted portfolios will fail to produce optimally diversified portfolios. Methods for forming portfolios of futures that account for correlations require optimization¹.

Toward optimal diversification

The objective of portfolio optimization is to maximize the opportunity for diversification (i.e. maximize breadth) when correlations vary over a wide range. Some assets in our futures universe have strong positive correlations while others are negatively correlated. For example, over the past year Eurostoxx and DAX have experienced a correlation of 0.93 while British Pound and Gilt futures have experienced a correlation of -0.44. All things equal, assets with low or negative correlations relative to most other assets should earn a larger weight in portfolios.

Risk parity

There are several optimization methods to choose from. (Baltas 2015) proposed using a risk parity optimization, where all assets contribute the same target risk to the portfolio after accounting for diversification. The weights for the risk parity portfolio can be found using several methods. The following method was formulated by (Spinu 2013):

    \[w^{ERC}=\operatorname*{arg\,min} \frac{1}{2}w^T\cdot\Sigma\cdot w - \frac{1}{n}\sum_{i=1}^n\ln(w_i)\]
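Spinu's paper solves this objective with a Newton-type method. The sketch below minimizes the same objective with cyclical coordinate descent instead (a swapped-in technique that needs only NumPy), then rescales the solution so the weights sum to one. Setting the gradient to zero gives \((\Sigma x)_i = 1/(n\,x_i)\), so every asset's risk contribution \(x_i(\Sigma x)_i\) is equal at the optimum. `risk_parity_weights` is a hypothetical name.

```python
import numpy as np


def risk_parity_weights(cov, tol=1e-12, max_iter=10_000):
    """Equal-risk-contribution weights from Spinu's objective via coordinate descent."""
    cov = np.asarray(cov, dtype=float)
    n = cov.shape[0]
    b = 1.0 / n  # equal risk budgets
    x = 1.0 / np.sqrt(np.diag(cov))  # start from inverse-volatility weights
    for _ in range(max_iter):
        x_old = x.copy()
        for i in range(n):
            # Zero partial derivative in x_i gives cov_ii * x_i^2 + c * x_i - b = 0,
            # where c = sum over j != i of cov_ij * x_j; take the positive root
            c = cov[i] @ x - cov[i, i] * x[i]
            x[i] = (-c + np.sqrt(c * c + 4.0 * cov[i, i] * b)) / (2.0 * cov[i, i])
        if np.max(np.abs(x - x_old)) < tol:
            break
    return x / x.sum()  # the objective fixes x only up to scale; normalize to sum to one
```

With uncorrelated assets this collapses to inverse volatility weighting, which is why risk parity portfolios tend to resemble traditional ones.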

The advantage of risk parity optimization is that all assets with non-zero expected returns will earn non-zero weights. Thus, the portfolio will resemble what might be produced by a traditional inverse volatility weighted approach, and its composition will be more intuitive.

However, the risk parity portfolio will not maximize portfolio diversification, and will not explicitly maximize the expected return of the portfolio with minimal risk. A risk parity weighted portfolio of 47 futures markets would produce 10.25 independent sources of return. This is a non-trivial improvement over the traditional method’s 6.85 bets.

Mean-variance optimization

As mentioned above, (Choueifaty, Froidure, and Reynier 2012) described why the square of the portfolio’s Diversification Ratio quantifies the number of independent bets available in a portfolio. We can solve for the portfolio weights that maximize the Diversification Ratio, and thus portfolio breadth, using a form of mean-variance optimization:

    \[w^{MD}=\operatorname*{arg\,max}_{w}\frac{w^T\cdot\sigma}{\sqrt{w^T\cdot\Sigma\cdot w}}\]

where σ and Σ denote the vector of asset volatilities and the covariance matrix, respectively.

The optimization maximizes the ratio of the weighted sum of asset volatilities to portfolio volatility after accounting for diversification. When we solve for the most diversified portfolio of futures using this method, we obtain 13.01 independent sources of return.

Figure 3: Number of independent bets from futures portfolios formed using different methods. Simulated results.

Source: Analysis by ReSolve Asset Management. Data from CSI. Simulated results.
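The maximum diversification problem above can be sketched with SciPy's SLSQP solver. The text does not state the constraints used, so this assumes long-only weights that sum to one; because the ratio is scale-invariant, the budget constraint only fixes the scale. `most_diversified_weights` is a hypothetical name.

```python
import numpy as np
from scipy.optimize import minimize


def most_diversified_weights(cov):
    """Long-only, fully invested weights maximizing the Diversification Ratio (assumption)."""
    cov = np.asarray(cov, dtype=float)
    vols = np.sqrt(np.diag(cov))
    n = len(vols)

    def neg_dr(w):
        # Minimize the negative ratio of weighted volatilities to portfolio volatility
        return -(w @ vols) / np.sqrt(w @ cov @ w)

    res = minimize(
        neg_dr,
        x0=np.full(n, 1.0 / n),
        method="SLSQP",
        bounds=[(0.0, 1.0)] * n,
        constraints=[{"type": "eq", "fun": lambda w: w.sum() - 1.0}],
        options={"ftol": 1e-12, "maxiter": 1000},
    )
    return res.x
```

For two uncorrelated markets with volatilities of 20% and 50%, the solution is close to weights of 5/7 and 2/7 (inverse-volatility weights), giving two independent bets. The methods diverge only when correlations differ across pairs, which is the situation shown in Figure 2.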

Sharpe multiplier (M*)

Let’s take a moment to understand the importance of the results in Figure 3. Greater breadth, in the form of independent bets, is a force-multiplier on Sharpe ratios.

Recall from our discussion of the Fundamental Law of Active Management above that expected Sharpe ratio is a function of \(\sqrt{breadth}\). As such, portfolios formed using the risk parity method should produce Sharpe ratios \(\sqrt{\frac{10.25}{6.85}}\approx 1.22\) times higher than the Sharpe ratio of inverse volatility weighted portfolios. And mean-variance optimized portfolios can produce Sharpe ratios \(\sqrt{\frac{13.01}{6.85}}\approx 1.38\) times higher. We will call this quantity the “Sharpe multiplier” M*.


To put this in perspective, if traditional diversified managed futures strategies have Sharpe ratios of 1, moving from traditional formation methods to optimization-based methods could boost Sharpe ratios to 1.38. At 10% target volatility, this boosts a strategy with a 10% expected annualized excess return to a 13.8% expected return. Over ten years, a $1 million portfolio would be expected to grow to $2.59 million using the traditional inverse volatility weighting method, but it might grow to $3.64 million if portfolios were constructed to maximize diversification. That’s an extra $1 million of wealth from applying exactly the same method to select securities, combined with more thoughtful portfolio construction.
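The multipliers and the ten-year wealth figures can be checked with a few lines of arithmetic. Like the text, this treats the expected excess return as the compounding rate and holds volatility fixed at 10%. The variable and function names are illustrative.

```python
import math


def sharpe_multiplier(bets_optimized, bets_naive):
    """M* = sqrt(ratio of breadth), from IR = IC * sqrt(breadth) at constant skill."""
    return math.sqrt(bets_optimized / bets_naive)


m_rp = sharpe_multiplier(10.25, 6.85)  # risk parity vs. inverse volatility, about 1.22
m_md = sharpe_multiplier(13.01, 6.85)  # maximum diversification vs. inverse volatility, about 1.38

target_vol, base_sharpe, years = 0.10, 1.0, 10
base_return = base_sharpe * target_vol            # 10% expected excess return
boosted_return = base_sharpe * m_md * target_vol  # about 13.8%
wealth_base = (1 + base_return) ** years          # in $ millions, about 2.59
wealth_boosted = (1 + boosted_return) ** years    # in $ millions, about 3.64
```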

Breadth changes through time

Up to this point we have been calculating independent bets based on long-term average, pairwise-complete correlations: each correlation is estimated from the returns of a pair of futures over their common history, that is, since the inception of the shorter-running contract in the pair.

Of course, not all futures contracts have data all the way back to 1988 and correlations are not stable through time. It is more useful to examine the true breadth – measured as number of independent bets – at each period based on point-in-time correlation estimates. Figure 4 tracks the number of independent bets produced by the three example portfolio formation methods at the end of each calendar year from 1988 through July 2018, derived from trailing 252-day (1-year) correlations.

Figure 4: Rolling number of independent bets produced by different portfolio formation methods, smoothed by trailing 252-day average. Simulated results.

Source: Analysis by ReSolve Asset Management. Data from CSI. Simulated results.
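A rolling version of the breadth calculation might look like the sketch below, shown for the inverse volatility method; other formation methods would plug in by swapping the weighting step. pandas `DataFrame.cov()` excludes missing values pairwise, which matches the pairwise-complete estimates used for the long-term figures. Within each trailing 252-day window, however, this sketch simply drops markets without a full window of data (an assumption). The function names and the 21-day step are hypothetical.

```python
import numpy as np
import pandas as pd


def inverse_vol_bets(cov):
    """Independent bets (squared Diversification Ratio) for inverse-volatility weights."""
    vols = np.sqrt(np.diag(cov))
    w = 1.0 / vols  # the ratio is scale-invariant, so no target-vol scaling is needed
    return ((w @ vols) / np.sqrt(w @ cov @ w)) ** 2


def rolling_bets(returns: pd.DataFrame, window=252, step=21):
    """Bets from trailing `window`-day covariance matrices, evaluated every `step` days."""
    out = {}
    for end in range(window, len(returns) + 1, step):
        win = returns.iloc[end - window:end].dropna(axis=1, how="any")
        if win.shape[1] < 2:
            continue  # need at least two markets with full history in the window
        out[returns.index[end - 1]] = inverse_vol_bets(win.cov().to_numpy())
    return pd.Series(out)
```

With a 21-day step, something like `rolling_bets(df).rolling(12).mean()` approximates the trailing 252-day smoothing described in the Figure 4 caption.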

The correlation structure of our futures universe, and commensurate breadth, has fluctuated materially over the past thirty years. Breadth historically contracts when markets enter crisis periods, and expands when markets are functioning normally. In all cases however, portfolios that are optimized to maximize diversification produce considerably more breadth than traditional portfolios that ignore correlations altogether.

Notice that portfolios appear to produce greater breadth on average when they are regularly reconstituted to reflect point-in-time correlations. This is partly because the long-term average smooths away the many different economic environments experienced by markets over the past three decades, each of which produced its own distinct correlation structure. However, short-term sample correlation matrices also typically understate “true” correlations for mathematical reasons that are beyond the scope of this discussion. In practice, one should adjust sample correlation matrices to account for these sample biases².

Across rolling annual periods, traditional inverse volatility weighted futures portfolios produced an average of 10.28 independent bets, while risk parity and optimization weighted portfolios produced an average of 16.02 and 21.91 bets respectively.

Sharpe multiplier through time

Recall that when we increase breadth with more thoughtful portfolio formation methods, we also increase the expected Sharpe ratio of the portfolio by a factor equal to the Sharpe multiplier, M*. As the ratio of the number of bets produced by optimization versus traditional naive methods fluctuates over time, so does M*. Figure 5 plots the evolution of this effect.

Figure 5: Rolling Sharpe multiplier (M*). Simulated results.

Source: Analysis by ReSolve Asset Management. Data from CSI. Sharpe multiplier (M*) is calculated as the square-root of the ratio of the number of bets produced through optimization versus the naive Inverse Volatility method. Number of bets are calculated from rolling 252-day correlation matrices, smoothed by trailing 252-day average.

The average M* over the past three decades from applying mean-variance optimization rather than traditional inverse volatility weights is 1.51, implying a 51% increase in expected Sharpe ratio. However, intervention in currencies and bonds by global government entities over the past decade caused a meaningful shift in correlation structure as certain markets became more highly correlated while other markets became more negatively correlated. As a result, the average M* over the past decade has been 1.77 and it averaged 1.92 over the highly interventionary period from 2008 through 2013. The multiplier has recently retreated to levels observed in the late 1990s and early 2000s as central banks have retreated from their interventionary policies.

Summary and next steps

Grinold’s Fundamental Law of Active Management implies that equally skilled managers should seek to maximize breadth to boost expected performance.

Managers of futures portfolios have traditionally employed naive portfolio formation methods that ignore information about correlations. These methods, which weight markets in proportion to the inverse of volatility, are optimally diversified only if markets all have equal pairwise correlations.

Figure 2 clearly shows that correlations deviate substantially from the assumption of equality over all historical periods. As a result, traditional portfolio formation methods will render overly concentrated portfolios most of the time.

Some authors have recently proposed risk parity weighting as a solution to the problem of diverse correlations. Risk parity seeks to form portfolios such that markets contribute equal volatility after accounting for diversification. We show that risk parity produces greater breadth than traditional methods on our futures universe. However, risk parity leaves a material amount of breadth on the table.

Portfolio optimization is the only way to extract the maximum amount of breadth when markets have diverse correlations. We show that optimization produces greater breadth than both traditional methods and risk parity at every time step over the past thirty years.

The Sharpe Multiplier quantifies the expected boost to strategy performance as a result of higher breadth. Figure 5 demonstrates that optimization-based methods provide a consistent boost to expected performance. On average, optimized portfolios offered expected Sharpe ratios roughly 50% higher than those of portfolios constructed using traditional means.

Our analysis prompts at least one obvious question. If optimization has such great potential to improve performance, why do most futures managers avoid it? We’ll answer it in the next article in this series. To get the most out of the articles in this series, read The Portfolio Optimization Machine: A General Framework for Portfolio Choice.


Baltas, Nick. 2015. “Trend-Following, Risk-Parity and the Influence of Correlations.”

Bun, Joël, Jean-Philippe Bouchaud, and Marc Potters. 2016. “Cleaning large correlation matrices: tools from random matrix theory.”

Choueifaty, Yves, and Yves Coignard. 2008. “Toward Maximum Diversification.” Journal of Portfolio Management 35 (1). 40–51.

Choueifaty, Yves, Tristan Froidure, and Julien Reynier. 2012. “Properties of the Most Diversified Portfolio.” Journal of Investment Strategies 2 (2). 49–70.

Grinold, Richard. 1989. “The Fundamental Law of Active Management.” Journal of Portfolio Management 15 (3). 30–37.

Polakow, Daniel, and Tim Gebbie. 2006. “How many independent bets are there?”

Spinu, Florin. 2013. “An Algorithm for Computing Risk Parity Weights.” SSRN.

  1. Many futures managers use heuristic methods to help enhance diversification. For example, they may form sub-portfolios of equity markets, fixed income, currencies, and commodities. Other managers divide commodities into sub-sectors like “energies”, “grains”, “metals” or “softs”. Markets within a sector are weighted by inverse volatility, and then sector portfolios are weighted in the final portfolio by inverse volatility. This may enhance diversification properties somewhat, but pairwise correlations can vary substantially within and across sectors, suggesting this approach provides only marginal benefits relative to naive inverse volatility weighting.
  2. See (Bun, Bouchaud, and Potters 2016) for an overview of biases in sample correlation matrices.