Backtesting Strategies for Quantitative Trading
Quant Trading Ideas from This is the Trade
In the last section, we covered how to select quant strategies that suit you, based on Ernest P. Chan’s book Quantitative Trading. This section covers the basics of backtesting, one of the most important topics in the book. I am currently building a GitHub repository where the code and resources will be shared in addition to the summaries here.
Backtesting is important as it serves several purposes:
Thorough understanding of the strategy
Correct implementation of the strategy in the trading system
Due diligence on the results produced to ensure there are no errors prior to using real money
Refinement and improvement of the strategy during the backtesting and tinkering process
Common Backtesting Platforms
There are many backtesting platforms, some expensive and some free. The common ones are:
Excel: The most basic and common tool for both retail and institutional traders. Excel becomes more powerful with Visual Basic macros. Its strength is that what you see is what you get, and it is less prone to look-ahead bias because you can easily align the dates with the various data columns and signals on a spreadsheet. Another advantage is that Excel allows you to backtest and live trade from the same spreadsheet, eliminating any duplication of programming effort. The main drawback is that it is only suitable for the simplest models.
Python: Python has become the main backtesting language. Many third-party packages make Python much easier to use for machine learning, data visualization, and interaction. Its drawbacks are that version conflicts between packages can throw a wrench into the production system when it runs on different computers, that it is fairly slow, and that it has no customer support since it is free. For the purpose of this blog, I will focus on Python only, as it is the most widely used.
Matlab: Matlab is one of the most common backtesting platforms used by quantitative analysts and traders in large institutions. It is easier to use and faster than Python and has full customer support from the vendor. However, it is not free, so it is not as accessible to most people.
R: R is great for classical statistical and econometric analyses for trading but has fewer implementations of machine learning algorithms than Python and Matlab. R is good for strategy exploration, but like Python the IDE is primitive and there is no customer support.
QuantConnect: QuantConnect is a web-based algorithmic trading platform providing research, backtesting, and live trading tools to support strategy creation in C# or Python. QuantConnect also features a large database of financial instruments supporting 7 asset classes: equities, equity-options, forex, CFD, crypto, futures, and future-options. QuantConnect takes into account trade fills, slippage, margin, transaction costs, and bid-ask spread to provide realistic backtest results. The platform also allows users to transition from backtesting to live trading with no code changes.
Blueshift: Blueshift is an integrated platform for research, backtesting, and trading from QuantInsti, with live trading in the US, India, and FX markets. It also includes minute-level data free of cost across those markets. A trading strategy can be developed in Python or through a visual, non-programming interface builder. Moving a strategy to live trading is turnkey, which ensures that what is backtested is exactly how it will be traded.
Historical Databases
If you are looking for a specific type of historical data, search for that type of data (example search): “free historical intraday futures data”. Many free or low-cost historical databases are available on the internet; some are listed below.
Yahoo Finance - Free, dividend/split adjusted, but has survivorship bias and can only download 1 symbol at a time. Daily stock data only.
Sharadar - Survivorship bias free. Daily stock data only.
Algoseek - Moderately priced. Tick data provided and enriched with identifiers and tags. Daily and intraday stock and futures data.
CSIdata - Low cost and is the source of Yahoo Finance and Google’s historical data. The software enables the download of multiple symbols at once. Does have survivorship bias, though the history of delisted stocks can be purchased. Daily stock and futures data.
Tickdata - Institutional quality intraday stock and futures data, but expensive.
Interactive Brokers - Daily and intraday forex data which is free if you have an account.
Important Considerations on the Backtesting Data
Are the data split and dividend adjusted? After a 2-for-1 split of a stock worth $100, shareholders hold 2 shares of $50 instead of 1 share of $100, so for an N-for-1 split all prices before the ex-date need to be multiplied by 1/N. Similarly, if a company pays a dividend of d per share, multiply all prices before the ex-date T by the ratio (Close(T-1) - d) / Close(T-1), where Close(T-1) is the closing price on the trading day before the ex-date. I recommend getting data that is already adjusted for dividends and splits to minimize erroneous trading signals.
Are the data survivorship-bias free? If the data has survivorship bias, delisted stocks are not factored into the performance of the strategy, so a strategy with an actual performance of -40% may appear to have a total return of over 300% because only the stocks that survived the market downturn are counted. If survivorship-bias-free data is not available, it is best to backtest the strategy on more recent data so that the results are not distorted by too many missing stocks.
Does the strategy use high and low data? A backtest that relies on high and low prices instead of open and close prices may be less reliable, as far fewer transactions happen at the high and low of the day than at the open and close.
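The split and dividend adjustments described above can be sketched in a few lines of Python. This is only an illustration using the standard library: the function names `adjustment_factor` and `back_adjust`, the list-based price layout, and the sample prices are assumptions made for this example and do not come from the book or from any data vendor.

```python
def adjustment_factor(prev_close, dividend=0.0, split_ratio=1.0):
    """Multiplier applied to every price before the ex-date T.

    prev_close  -- Close(T-1), the last close before the ex-date
    dividend    -- cash dividend per share d (0 if none)
    split_ratio -- N for an N-for-1 split (1 if none)
    """
    return (prev_close - dividend) / prev_close / split_ratio


def back_adjust(closes, ex_index, dividend=0.0, split_ratio=1.0):
    """Return a copy of closes with all prices before ex_index adjusted.

    ex_index is the position of the ex-date bar in the list (hypothetical
    layout: one close per trading day, oldest first).
    """
    if ex_index < 1:
        return list(closes)  # nothing precedes the ex-date
    factor = adjustment_factor(closes[ex_index - 1], dividend, split_ratio)
    return [c * factor for c in closes[:ex_index]] + list(closes[ex_index:])


# Hypothetical data: a 2-for-1 split takes effect on the third bar (index 2).
split_closes = [100.0, 100.0, 50.0, 51.0]
split_adjusted = back_adjust(split_closes, ex_index=2, split_ratio=2.0)
print(split_adjusted)  # the artificial 50% "drop" on the ex-date disappears

# Hypothetical data: a $1 dividend with the ex-date on the third bar (index 2).
div_closes = [50.0, 50.0, 49.0, 49.5]
div_adjusted = back_adjust(div_closes, ex_index=2, dividend=1.0)
print(div_adjusted)
```

Without the adjustment, a signal based on daily returns would see a one-day loss of 50% in the first series and 2% in the second, even though shareholders lost nothing.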
Measure the Backtest Performance
Quantitative traders use a myriad of performance measures and, per Ernest, the most important ones to focus on are:
Sharpe Ratio
What is the risk-free rate in a dollar-neutral portfolio? The answer is 0. A dollar-neutral portfolio is self-financing, meaning the cash you get from selling short pays for the purchase of the long securities, so the financing cost is small and can be neglected for backtesting purposes. For a long-only day trading strategy that does not hold positions overnight, there is no financing cost, and so the risk-free rate is also 0. In general, you need to subtract the risk-free rate from your strategy return in calculating the Sharpe ratio only if your strategy incurs financing costs.
To facilitate comparison across strategies, most traders annualize the Sharpe Ratio by multiplying the per-period ratio by the square root of the number of trading periods in a year. The same approach applies to hourly returns. If the trading strategy holds positions only during NYSE market hours (6.5 hours a day, so 252 * 6.5 = 1638 trading hours a year), the average hourly return is R, and the standard deviation of the hourly return is s, then the annualized Sharpe Ratio is sqrt(1638) * R/s.
Maximum Drawdown
MAR Ratio
Divide the compound annual growth rate (CAGR) since inception by the maximum drawdown over the same period. The higher the ratio, the better the risk-adjusted returns. The MAR ratio is similar to the Calmar ratio, except that the Calmar ratio only looks at the past 36 months instead of the whole history since inception.
CAGR
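As a rough illustration of the four measures above, the following Python sketch computes an annualized Sharpe Ratio, maximum drawdown, CAGR, and MAR ratio using only the standard library. The function names, the sample hourly returns, and the sample equity curve are assumptions made for this example and are not from the book. Maximum drawdown is taken here as the largest peak-to-trough decline of the equity curve, expressed as a fraction of the peak, and the risk-free rate defaults to 0 as discussed above.

```python
import math
import statistics

HOURS_PER_YEAR = 252 * 6.5  # 1638 NYSE trading hours, assuming 252 trading days


def annualized_sharpe(returns, periods_per_year, risk_free_per_period=0.0):
    """sqrt(periods per year) * mean excess return / sample std of excess return."""
    excess = [r - risk_free_per_period for r in returns]
    return math.sqrt(periods_per_year) * statistics.mean(excess) / statistics.stdev(excess)


def max_drawdown(curve):
    """Largest peak-to-trough decline of an equity curve, as a fraction of the peak."""
    peak = curve[0]
    worst = 0.0
    for value in curve:
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def cagr(curve, periods_per_year):
    """Compound annual growth rate of an equity curve sampled at regular periods."""
    years = (len(curve) - 1) / periods_per_year
    return (curve[-1] / curve[0]) ** (1.0 / years) - 1.0


def mar_ratio(curve, periods_per_year):
    """CAGR since inception divided by the maximum drawdown."""
    return cagr(curve, periods_per_year) / max_drawdown(curve)


# Hypothetical hourly returns of an intraday strategy.
hourly_returns = [0.001, -0.0005, 0.002, 0.0, -0.001, 0.0015]
sharpe = annualized_sharpe(hourly_returns, HOURS_PER_YEAR)

# Hypothetical equity curve with one value per year (periods_per_year=1).
curve = [1.0, 1.2, 0.9, 1.5]
print(sharpe, max_drawdown(curve), cagr(curve, 1), mar_ratio(curve, 1))
```

For a Calmar ratio, the same `mar_ratio` function could be called on only the last 36 months of the equity curve.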
Common Backtesting Pitfalls to Avoid
An erroneous backtest will give a better historical performance than what can be expected in actual trading.
Look-Ahead Bias: This error occurs when you use information that only became available after the instant the trade was made. Ex: If your trade entry rule is to buy the stock if it is within 1% of the day’s low, you have introduced a look-ahead bias, as you could not have known the day’s low before the market closed. To avoid the bias, use lagged historical data for calculating signals at every opportunity. Lagging a series of data means you calculate all quantities, such as moving averages, highs and lows, or even volume, based on data up to the close of the previous trading period only. Look-ahead bias is easy to detect in Excel but harder in programming languages. A test Ernest suggests is to run the program and save all the recommended positions to file A. Then truncate the historical data so that the most recent N days are removed: if the last day in the original data is T, the last day in the truncated data should be T-N. Run the backtest program again with the truncated data and save the resulting positions to file B. Truncate the positions in file A so that they cover the same period, from 0 to T-N, and check whether they are identical to those in file B. If they differ, there is look-ahead bias in the backtest program.
Data-Snooping Bias: This is caused by over-optimizing the parameters of the model on transient noise in the historical data. Ernest recommends employing no more than 5 parameters, including quantities such as entry and exit thresholds, holding period, or the lookback period for computing moving averages. Also, the more tweaks are made using historical data to obtain optimal performance, the more the backtest performance should be discounted, which can be quantified with the deflated Sharpe Ratio developed by Bailey et al.
Sample Size: The most basic safeguard against data-snooping bias is to ensure you have a sufficient amount of backtest data relative to the number of free parameters you want to optimize. Bailey et al. provided some rigorous mathematical results on the minimum backtest length. The idea is that, with a finite amount of data, any backtest’s Sharpe Ratio is only an estimate of the true Sharpe Ratio (the actual performance if implemented). If you want to be statistically confident (95% level) that your true Sharpe Ratio is equal to or greater than 0, you need a backtest Sharpe Ratio of 1 and a sample size of 681 data points (2.71 years of daily data). The higher the Sharpe Ratio, the smaller the sample size needed. If your backtest Sharpe Ratio is 2 or more, then you need only 174 data points (0.69 years of daily data) to be confident that your true Sharpe Ratio is equal to or greater than 0. If you want to be confident your true Sharpe Ratio is equal to or greater than 1, then you need a backtest Sharpe Ratio of at least 1.5 and a sample size of 2,739 (10.87 years of daily data).
Out-of-Sample Testing: Divide the historical data into 2 parts and save the second part for out-of-sample testing. The minimum size of the out-of-sample test set should follow the same requirement as the sample size above. Ideally, the set of optimal parameters and decisions for the first part (the training part) of the backtest period is also the optimal set for the test period, but things don’t always turn out perfectly. A more computationally intensive method of out-of-sample testing is moving optimization of the parameters, where the parameters constantly adapt to the changing historical data, so that data-snooping bias with respect to the parameters can be eliminated. (Check out Conditional Parameter Optimization (CPO), which makes use of machine learning to determine optimal parameters to use for each trade for each day, and this YouTube video by Ernest.) The final out-of-sample test is paper trading, which runs the model on actual unseen data.
Sensitivity Analysis: Once the parameters are optimized and the backtest results look good, vary these parameters and make small qualitative changes in the features of the model to see how the performance changes on both the training and test sets. If the performance drops drastically, the model most likely suffers from data-snooping bias. Pruning unnecessary conditions, constraints, and parameters is the best practice as long as there is no significant decrease in performance on the test set, even if it decreases performance on the training set.
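The truncation test for look-ahead bias described above (file A versus file B) can be sketched in Python as follows. Everything here is a toy illustration: the two strategies, the function name `has_look_ahead_bias`, and the price series are hypothetical and only serve to show the mechanics of the comparison. In-memory lists stand in for the two position files.

```python
def lagged_ma_strategy(closes, lookback=3):
    """Long (1) on day t if the previous close is above its trailing average.

    Only closes up to day t-1 are used, so the signal is properly lagged.
    """
    positions = []
    for t in range(len(closes)):
        history = closes[max(0, t - lookback):t]  # closes up to t-1 only
        if len(history) < lookback:
            positions.append(0)  # not enough history yet
        else:
            positions.append(1 if closes[t - 1] > sum(history) / lookback else 0)
    return positions


def peeking_strategy(closes):
    """Long (1) whenever the close is below the full-sample average.

    The average uses every close in the data set, including future ones,
    so this strategy has look-ahead bias by construction.
    """
    full_mean = sum(closes) / len(closes)
    return [1 if c < full_mean else 0 for c in closes]


def has_look_ahead_bias(strategy, closes, n_truncate):
    """Compare positions from the full data (A) and the truncated data (B)."""
    positions_a = strategy(closes)                # "file A"
    positions_b = strategy(closes[:-n_truncate])  # "file B", last N days removed
    # Truncate A to the same period as B and check that they are identical.
    return positions_a[:len(positions_b)] != positions_b


# Hypothetical daily closes.
closes = [100, 101, 102, 101, 103, 104, 102, 105, 120, 130]

print(has_look_ahead_bias(peeking_strategy, closes, n_truncate=2))
print(has_look_ahead_bias(lagged_ma_strategy, closes, n_truncate=2))
```

A mismatch proves that some position depends on data from after its own date. Identical positions for one value of N are a good sign but not a proof, so it can be worth repeating the check with a few different values of N.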
Transaction Costs
It is important to include transaction costs when evaluating the final Sharpe Ratio, because a strategy that looks profitable before costs can become highly unprofitable once they are included.
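To make the effect concrete, here is a minimal Python sketch that charges a proportional cost each time the position changes and compares the Sharpe Ratio before and after costs. The cost model (a fixed fraction per unit of turnover), the 10 basis point figure, and the sample returns and positions are all assumptions for illustration; real costs also depend on commissions, bid-ask spread, slippage, and market impact.

```python
import math
import statistics


def net_returns(gross_returns, positions, cost_per_unit_turnover):
    """Subtract a proportional cost whenever the position changes.

    Turnover on each day is |position today - position yesterday|,
    starting from a flat (0) position.
    """
    net = []
    prev_position = 0.0
    for gross, position in zip(gross_returns, positions):
        turnover = abs(position - prev_position)
        net.append(gross - cost_per_unit_turnover * turnover)
        prev_position = position
    return net


def annualized_sharpe(returns, periods_per_year=252):
    """Annualized Sharpe Ratio of daily returns with a risk-free rate of 0."""
    return math.sqrt(periods_per_year) * statistics.mean(returns) / statistics.stdev(returns)


# Hypothetical strategy that flips between long (+1) and short (-1) every day.
positions = [1, -1, 1, -1, 1, -1, 1, -1]
gross = [0.002, 0.001, 0.003, -0.001, 0.002, 0.001, 0.002, 0.0]

# Assumed one-way cost of 10 basis points per unit of turnover.
net = net_returns(gross, positions, cost_per_unit_turnover=0.001)

print("Sharpe before costs:", annualized_sharpe(gross))
print("Sharpe after costs: ", annualized_sharpe(net))
```

In this made-up example the high turnover is what hurts: flipping from long to short trades two units every day, so even a small per-trade cost outweighs the average daily gain and the Sharpe Ratio turns negative.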
Strategy Refinement
Refining a strategy is more art than science, but the guiding principle is always the same: whatever improves performance on the training set should also improve it on the test set. Even for well-known strategies, some minor variations may improve the performance. Examples include excluding pharmaceutical stocks from a technical trading program because of the dramatic impact of news on their prices, or excluding stocks that have pending merger/acquisition deals. Other ideas are changing the entry and exit timing or the frequency of the trades. Another is to reconsider the selection of the stock universe; for example, a strategy that has a good Sharpe Ratio on small-cap stocks may become unprofitable when applied to large-cap stocks. To minimize data-snooping bias, strategy refinements should be based on fundamental economics or a well-studied market phenomenon instead of arbitrary rules found by trial and error.
That’s all for backtesting, folks. We will explore setting up an individual quant trading business in the next section.