There are three mainstream approaches to backtesting strategies: using actual price data divided into three groups; bootstrapping, which uses actual price data but resamples it; and Monte Carlo simulation. Theoretical issues divide system builders over which method is best. What is important for the trader is that he correctly applies at least one of these methods to his system before trusting his trading capital to it. A critical issue in choosing a backtesting strategy is the number of trades generated; at least 1,000 trades are needed in each phase of the system builder’s work.
Using actual price data, divided into three parts, is the usual starting point for most system builders. The system is created using the first one-third of the data. By the end of this phase, the builder has found algorithms that appear to generate enough profit with small enough risk to offer good prospects. The second one-third of the data is used to optimize the system.
After the system has been optimized, it is applied to the remaining one-third of the data. This is called out-of-sample testing, and it is where most systems fail. If the system still has good results across at least 1,000 trades, the system builder has a viable system. If the system generates fewer than 1,000 trades in the out-of-sample testing, the builder should consider another backtesting strategy.
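The three-way split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `split_in_thirds` and `enough_trades` are hypothetical, the price series is placeholder data rather than real market data, and the split is chronological (oldest third first), which the text implies but does not state outright.

```python
from typing import List, Sequence, Tuple

MIN_TRADES = 1_000  # threshold named in the text for each phase


def split_in_thirds(
    prices: Sequence[float],
) -> Tuple[List[float], List[float], List[float]]:
    """Split a chronological series into build, optimize, and out-of-sample parts."""
    n = len(prices)
    first_cut = n // 3
    second_cut = 2 * n // 3
    return (
        list(prices[:first_cut]),
        list(prices[first_cut:second_cut]),
        list(prices[second_cut:]),
    )


def enough_trades(trade_count: int, minimum: int = MIN_TRADES) -> bool:
    """Return True when a phase produced at least the minimum number of trades."""
    return trade_count >= minimum


# Placeholder data: 2,500 made-up closing prices, not real market data.
prices = [100.0 + 0.1 * i for i in range(2_500)]
build, optimize, out_of_sample = split_in_thirds(prices)
print(len(build), len(optimize), len(out_of_sample))
```

The trade count passed to `enough_trades` would come from running the system on each part; if the out-of-sample part yields fewer than 1,000 trades, that is the signal to consider one of the other two methods.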
Bootstrapping is a method of drawing a sample from the full data set, testing on it, returning the data to the pool, and then drawing again (resampling) and retesting. The ideal number of resamples is n^n, or n to the nth power, where n is the number of data points in the original sample. For a trader who is likely dealing with at least 2,500 data points (250 trading days a year across 10 years), that is not practical. Fortunately, 100 resamples will provide a high level of confidence that the bootstrap sample will mirror the original data, making the results reliable. If 100 resamples do not produce the needed 1,000 trades, the trader needs to continue resampling until that goal is met if he expects the system, rather than just the resampling of the data, to be reliable.
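A minimal sketch of the resampling step follows. It assumes, for illustration only, that the quantity being resampled is a list of daily returns (here randomly generated placeholder values, not real data) and that each resample is drawn with replacement at the same size as the original. The helper name `bootstrap_resamples` is hypothetical. Note that this simple form of bootstrap ignores any dependence between consecutive days.

```python
import random
from statistics import mean
from typing import List, Sequence


def bootstrap_resamples(
    data: Sequence[float], n_resamples: int = 100, seed: int = 0
) -> List[List[float]]:
    """Draw n_resamples samples with replacement, each as long as the original."""
    rng = random.Random(seed)
    n = len(data)
    return [rng.choices(data, k=n) for _ in range(n_resamples)]


# Placeholder input: 2,500 made-up daily returns (10 years x 250 days).
source_rng = random.Random(42)
daily_returns = [source_rng.gauss(0.0005, 0.01) for _ in range(2_500)]

resamples = bootstrap_resamples(daily_returns, n_resamples=100, seed=1)

# Summarize how much the average return varies from one resample to the next.
resample_means = sorted(mean(sample) for sample in resamples)
low, high = resample_means[2], resample_means[97]  # rough central band
print(f"original mean: {mean(daily_returns):.6f}")
print(f"resampled means mostly fall between {low:.6f} and {high:.6f}")
```

In practice the system would be run on each resample and the trades tallied across all of them; if the total falls short of 1,000 trades, `n_resamples` can be raised until it does not.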
The last method of backtesting strategies is Monte Carlo (MC) simulation. This method uses a computer to generate simulated data, and the system is then tested on that data. One advantage of MC simulation is that it can create limitless amounts of data, so the builder can generate 10,000 trades or any other number of trades. Another advantage is that each new data set is out of sample. This offers the opportunity to do repeated optimization and testing runs: simply optimize on one data set, then apply those system parameters to the next data set the computer generates.
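The optimize-then-test loop described above might look like the following sketch. Everything specific here is an assumption made for illustration: the price generator uses normally distributed daily log returns, the trading rule is a toy moving-average filter, and the names `simulate_prices` and `strategy_log_return` as well as the drift, volatility, and candidate lookbacks are hypothetical.

```python
import math
import random
from typing import List


def simulate_prices(
    n_days: int,
    rng: random.Random,
    start: float = 100.0,
    drift: float = 0.0003,
    vol: float = 0.01,
) -> List[float]:
    """Generate a price path from normally distributed daily log returns (assumed)."""
    prices = [start]
    for _ in range(n_days - 1):
        prices.append(prices[-1] * math.exp(rng.gauss(drift, vol)))
    return prices


def strategy_log_return(prices: List[float], lookback: int) -> float:
    """Toy rule: hold for the next day when today's close is above the
    average of the previous `lookback` closes; otherwise stay out."""
    total = 0.0
    for t in range(lookback, len(prices) - 1):
        average = sum(prices[t - lookback:t]) / lookback
        if prices[t] > average:  # signal uses only data up to day t
            total += math.log(prices[t + 1] / prices[t])
    return total


rng = random.Random(7)
candidates = [5, 10, 20, 50]  # hypothetical lookback values to optimize over

# Optimize on one simulated data set...
in_sample = simulate_prices(2_500, rng)
best_lookback = max(candidates, key=lambda lb: strategy_log_return(in_sample, lb))

# ...then apply the chosen parameter to freshly generated data.
fresh = simulate_prices(2_500, rng)
out_of_sample_result = strategy_log_return(fresh, best_lookback)
print(best_lookback, round(out_of_sample_result, 4))
```

Because the generator can be called as often as needed, the last four lines can be repeated in a loop to accumulate as many out-of-sample trades as the builder wants.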
A disadvantage of MC simulation is that the data may not have exactly the same probability distribution function that trading data have, which could skew the results. In the best of all possible worlds, all three backtesting strategies should be used in the process of vetting the system. Success in all three should offer a very high likelihood of success in real-world trading.
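The distribution concern raised above can be checked, at least roughly, by comparing summary statistics of the simulated data against those of the data the trader actually has. The sketch below compares excess kurtosis, one common measure of tail weight. Both series here are placeholders: the "simulated" series is Gaussian, and the second series is an artificial mixture with occasional high-volatility days, standing in for real returns that the trader would supply. The function name `excess_kurtosis` is hypothetical.

```python
import random
from typing import Sequence


def excess_kurtosis(values: Sequence[float]) -> float:
    """Population excess kurtosis: about 0 for normal data, higher for fat tails."""
    n = len(values)
    mu = sum(values) / n
    m2 = sum((v - mu) ** 2 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    return m4 / (m2 * m2) - 3.0


rng = random.Random(11)

# Simulated returns as a plain Gaussian generator would produce them.
simulated = [rng.gauss(0.0, 0.01) for _ in range(10_000)]

# Stand-in for actual returns: 10% of days drawn with three times the volatility.
stand_in_actual = [
    rng.gauss(0.0, 0.03 if rng.random() < 0.1 else 0.01) for _ in range(10_000)
]

print(f"simulated excess kurtosis: {excess_kurtosis(simulated):.2f}")
print(f"stand-in excess kurtosis:  {excess_kurtosis(stand_in_actual):.2f}")
```

A large gap between the two figures would suggest that the simulated data understate extreme moves, which is the kind of mismatch that could skew MC results.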