
1 March 2016 CFA Institute Journal Review

Backtesting (Digest Summary)

Clifford S. Ang, CFA

Because of the potential for data mining and multiple testing, it is common practice to haircut the reported Sharpe ratios of backtested trading strategies by 50%. The authors propose an approach that instead calculates a haircut that accounts for data mining and multiple testing. The result is a “haircut Sharpe ratio” that penalizes strategies with high Sharpe ratios less than strategies with marginal Sharpe ratios.

What’s Inside?

The reported results of trading strategies are typically the product of data mining and multiple testing. Because data mining examines many strategies until one is found that works, a common rule of thumb is to discount the Sharpe ratios of backtested strategies by 50%. The authors propose a method for reducing reported Sharpe ratios to account for data mining and multiple testing. The end result is what the authors call the “haircut Sharpe ratio.” Strategies with high Sharpe ratios are penalized less, and strategies with low Sharpe ratios are penalized more. The haircut Sharpe ratio can then be translated into a minimum profitability hurdle that a proposed strategy must exceed.

How Is This Research Useful to Practitioners?

The results reported in academic and practitioner research on trading strategies often suffer from data mining. For example, researchers attempting to develop trading strategies are likely to try many tests until one eventually works and then report only that result. In such situations, a t-statistic cutoff of 2.0 (i.e., a p-value of roughly 5%), which is appropriate in a single-test framework, may not be the appropriate threshold for statistical significance under multiple testing. The authors propose a method by which p-values can be adjusted to reflect multiple testing, yielding a haircut Sharpe ratio that takes into account the effects of data mining.
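To make the idea of adjusting p-values concrete, the following Python sketch applies two textbook multiple-testing corrections, Bonferroni and Holm, to a list of single-test p-values. The choice of these two procedures, the function names `bonferroni` and `holm`, and the sample p-values are illustrative assumptions; this summary does not specify which adjustments the authors' code uses.

```python
def bonferroni(pvals):
    """Bonferroni: multiply each p-value by the number of tests (capped at 1)."""
    m = len(pvals)
    return [min(1.0, m * p) for p in pvals]


def holm(pvals):
    """Holm step-down: the k-th smallest p-value (k = 0, 1, ...) is multiplied
    by (m - k); a running maximum keeps the adjusted p-values monotone."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for k, i in enumerate(order):
        running_max = max(running_max, min(1.0, (m - k) * pvals[i]))
        adjusted[i] = running_max
    return adjusted


# Hypothetical single-test p-values from four backtested strategies.
pvals = [0.01, 0.04, 0.02, 0.005]
print("single test:", pvals)
print("Bonferroni: ", [round(p, 3) for p in bonferroni(pvals)])
print("Holm:       ", [round(p, 3) for p in holm(pvals)])
```

At a 5% cutoff, all four hypothetical p-values pass as single tests. After the Bonferroni adjustment only two remain below 5%; Holm's step-down procedure is less conservative and, in this example, keeps all four.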

Another benefit of the authors’ research for practitioners is a method for determining the minimum profitability hurdle for a proposed strategy. The authors provide their computer code (available on the lead author’s website). Its inputs are the desired significance level, the number of observations, the strategy volatility, and the assumed number of tests; its output is the minimum average monthly return that the proposed strategy must exceed. Providing the code allows practitioners to implement the concepts presented and allows investors to make timely decisions about a proposed strategy’s viability.
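As a rough illustration of how such a hurdle can be computed from those four inputs, the following Python sketch assumes a Bonferroni adjustment, a two-sided test, a normal approximation for the t-statistic, and volatility supplied in annualized terms. The function name `min_monthly_return` and these modelling choices are hypothetical; this is not the authors' published code.

```python
import math
from statistics import NormalDist


def min_monthly_return(alpha, n_months, vol_annual, n_tests):
    """Hypothetical hurdle: the minimum average monthly return that is
    significant at level alpha after a Bonferroni adjustment for n_tests
    tests (two-sided, normal approximation).

    vol_annual is the annualized volatility of monthly returns (0.10 = 10%).
    """
    # Bonferroni: each test is run at alpha / n_tests, split over two tails.
    t_crit = NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))
    vol_monthly = vol_annual / math.sqrt(12.0)
    # t = mean / (vol_monthly / sqrt(n_months)), so solve for the mean.
    return t_crit * vol_monthly / math.sqrt(n_months)


# Example: 20 years of monthly data, 10% annualized volatility, 5% level.
for m in (1, 10, 100):
    hurdle = min_monthly_return(alpha=0.05, n_months=240, vol_annual=0.10, n_tests=m)
    print(f"{m:>3} tests: mean monthly return must exceed {hurdle:.3%}")
```

The hurdle scales with the critical t-statistic, so assuming more tests raises the bar the strategy's average return must clear.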

The authors acknowledge several caveats to their method. First, because it is not known how many tests were actually conducted by the researchers behind a strategy, an assumption has to be made about the number of tests performed. Second, because the method relies on Sharpe ratios, it shares their limitations, such as nonlinearities in the trading strategy or variance not being a complete measure of risk.

How Did the Authors Conduct This Research?

The framework of the research relies on the statistical concept of multiple testing. In a single test, requiring a t-statistic greater than 2.0 may be appropriate. In multiple testing, however, the usual single-test p-value no longer reflects a strategy’s statistical significance, and it is not clear what the appropriate cutoff for statistical significance should be. The authors propose a method that attempts to answer this question.
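The following Python sketch illustrates why the single-test cutoff breaks down. Assuming, hypothetically, that the tests are independent and that every strategy tested is truly worthless, it computes the chance that at least one of them clears a 5% single-test cutoff, alongside the t-statistic cutoff implied by a simple Bonferroni adjustment (normal approximation, two-sided). The independence assumption, the choice of Bonferroni, and the function names are illustrative, not the authors' method.

```python
from statistics import NormalDist


def familywise_error(alpha, n_tests):
    """Chance that at least one of n_tests independent tests of worthless
    strategies is 'significant' at the single-test level alpha."""
    return 1.0 - (1.0 - alpha) ** n_tests


def bonferroni_t_cutoff(alpha, n_tests):
    """Two-sided t-statistic cutoff when each test is run at alpha / n_tests."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_tests))


for m in (1, 10, 100, 300):
    print(f"M={m:>3}  P(at least one false positive)={familywise_error(0.05, m):.3f}  "
          f"Bonferroni t cutoff={bonferroni_t_cutoff(0.05, m):.2f}")
```

With 10 independent tests the chance of at least one false positive is already about 40%, which is why the significance cutoff has to rise with the number of tests.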

Their method explicitly takes into account that hundreds of strategies have been proposed and tested in the past, and it adjusts the Sharpe ratio to account for these multiple tests. The adjusted value, the haircut Sharpe ratio, can be interpreted as the equivalent Sharpe ratio for a single test.
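A minimal Python sketch of a haircut calculation follows, under stated assumptions: monthly returns, an annualized Sharpe ratio converted to a t-statistic via t = SR × √(years), a two-sided normal p-value, and a Bonferroni adjustment for the number of tests. The adjusted p-value is converted back into a t-statistic and then into a Sharpe ratio. The function name `haircut_sharpe` is hypothetical, and Bonferroni is used only because it is the simplest adjustment; the authors' own procedure may differ.

```python
import math
from statistics import NormalDist


def haircut_sharpe(sr_annual, n_months, n_tests):
    """Hypothetical sketch: return (haircut Sharpe ratio, haircut fraction)
    for a positive annualized Sharpe ratio estimated from n_months of monthly
    returns, using a Bonferroni adjustment for n_tests tests."""
    if sr_annual <= 0:
        raise ValueError("sketch assumes a positive Sharpe ratio")
    years = n_months / 12.0
    t_stat = sr_annual * math.sqrt(years)          # t = SR * sqrt(years)
    p_single = math.erfc(t_stat / math.sqrt(2.0))  # two-sided p, single test
    p_adj = min(1.0, n_tests * p_single)           # Bonferroni adjustment
    if p_adj == 0.0:                               # p underflowed: no haircut
        return sr_annual, 0.0
    # Convert the adjusted p-value back to a t-statistic, clamped at zero.
    t_adj = max(0.0, -NormalDist().inv_cdf(p_adj / 2.0))
    hsr = t_adj / math.sqrt(years)
    return hsr, 1.0 - hsr / sr_annual


# Two hypothetical strategies, 20 years of monthly data, 100 assumed tests.
for sr in (1.0, 0.5):
    hsr, cut = haircut_sharpe(sr, n_months=240, n_tests=100)
    print(f"SR={sr:.2f} -> haircut SR={hsr:.2f} (haircut {cut:.0%})")
```

With these inputs, the strategy with a Sharpe ratio of 1.0 keeps roughly three quarters of it, while the strategy with a Sharpe ratio of 0.5 is haircut to zero. Unlike a flat 50% rule of thumb, the penalty depends on how strong the original evidence is.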

The authors apply their multiple testing adjustment to the following three strategies: earnings-to-price ratio, momentum, and the betting against beta factor. These three strategies cover three different investment styles—value, trend following, and potential distortions induced by leverage, respectively. The authors do not account for transaction costs, so the Sharpe ratios and t-statistics they report are overstated.

They also provide a method to calculate the minimum profitability hurdle the trading strategy must exceed given the desired significance level, the number of observations, the strategy volatility, and the assumed number of tests. The resulting minimum profitability hurdle is intended to allow investors to make timely decisions about the viability of a particular trading strategy.

Abstractor’s Viewpoint

The authors highlight an effect of data mining that investors may not be aware of but that is prevalent in much of the empirical work produced by academics and practitioners. Because of data mining, many purportedly profitable investment strategies work in backtests but perform worse when implemented. The procedures proposed by the authors appear to help investors identify which investment strategies are more likely to be profitable in real time.