Good resources for learning automated trading backtesting

Research Backtesting Environments in Python with pandas

By Michael Halls-Moore on January 16th, 2014

from: https://www.quantstart.com/articles/Research-Backtesting-Environments-in-Python-with-pandas

Backtesting is the research process of applying a trading strategy idea to historical data in order to ascertain past performance. A backtest makes no guarantee about the future performance of the strategy. Backtests are, however, an essential component of the strategy research pipeline, allowing strategies to be filtered out before being placed into production.

In this article (and those that follow it) a basic object-oriented backtesting system written in Python will be outlined. This early system will primarily be a “teaching aid”, used to demonstrate the different components of a backtesting system. As we progress through the articles, more sophisticated functionality will be added.

Backtesting Overview

The process of designing a robust backtesting system is extremely difficult. Effectively simulating all of the components that affect the performance of an algorithmic trading system is challenging. Poor data granularity, opaqueness of order routing at a broker, order latency and a myriad of other factors conspire to alter the “true” performance of a strategy versus the backtested performance.

When developing a backtesting system it is tempting to want to constantly “rewrite it from scratch” as more factors are found to be crucial in assessing performance. No backtesting system is ever finished and a judgement must be made at a point during development that enough factors have been captured by the system.

With these concerns in mind the backtester presented here will be somewhat simplistic. As we explore further issues (portfolio optimisation, risk management, transaction cost handling) the backtester will become more robust.

Types of Backtesting Systems

There are generally two types of backtesting system that will be of interest. The first is research-based, used primarily in the early stages, where many strategies will be tested in order to select those for more serious assessment. These research backtesting systems are often written in Python, R or MATLAB, as speed of development is more important than speed of execution in this phase.

The second type of backtesting system is event-based. That is, it carries out the backtesting process in an execution loop similar (if not identical) to the trading execution system itself. It will realistically model market data and the order execution process in order to provide a more rigorous assessment of a strategy.

The latter systems are often written in a high-performance language such as C++ or Java, where speed of execution is essential. For lower frequency strategies (although still intraday), Python is more than sufficient to be used in this context.
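To make the distinction concrete, a rough sketch of an event-driven loop is given below. The event names (MARKET, SIGNAL, ORDER, FILL), the toy trading rule, the fixed order size and the price list are all hypothetical, chosen purely for illustration. A research backtester, by contrast, would process the whole price history at once using vectorised operations.

```python
from collections import deque

# Hypothetical toy price series; a real system would stream bars or ticks.
prices = [100.0, 101.0, 99.5, 102.0]

events = deque()   # FIFO queue of (event_type, payload) tuples
fills = []         # record of simulated fills as (quantity, price)
last_price = None

for price in prices:
    # Each new bar arrives as a MARKET event, as it would in live trading.
    events.append(("MARKET", price))
    while events:
        kind, payload = events.popleft()
        if kind == "MARKET":
            # Toy rule (assumption): long if the price rose, short if it fell.
            if last_price is not None and payload != last_price:
                direction = 1 if payload > last_price else -1
                events.append(("SIGNAL", (direction, payload)))
            last_price = payload
        elif kind == "SIGNAL":
            # A portfolio component would size the order; fixed here at 100 units.
            direction, px = payload
            events.append(("ORDER", (direction * 100, px)))
        elif kind == "ORDER":
            # Simulated execution: filled at the bar price with no slippage.
            events.append(("FILL", payload))
        elif kind == "FILL":
            fills.append(payload)

print(fills)
```

The same loop could, in principle, be driven by a live market data feed instead of a list of historical prices, which is why event-based backtesters resemble the execution system itself.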

Object-Oriented Research Backtester in Python

The design and implementation of an object-oriented research-based backtesting environment will now be discussed. Object orientation has been chosen as the software design paradigm for the following reasons:

  • The interfaces of each component can be specified upfront, while the internals of each component can be modified (or replaced) as the project progresses
  • By specifying the interfaces upfront it is possible to effectively test how each component behaves (via unit testing)
  • When extending the system new components can be constructed upon or in addition to others, either by inheritance or composition

At this stage the backtester is designed for ease of implementation and a reasonable degree of flexibility, at the expense of true market accuracy. In particular, this backtester will only be able to handle strategies acting on a single instrument. Later the backtester will be modified to handle sets of instruments. For the initial backtester, the following components are required:

  • Strategy – A Strategy class receives a pandas DataFrame of bars, i.e. a list of Open-High-Low-Close-Volume (OHLCV) data points at a particular frequency. The Strategy will produce a list of signals, which consist of a timestamp and an element from the set {1, 0, −1} indicating a long, hold or short signal respectively.
  • Portfolio – The majority of the backtesting work will occur in the Portfolio class. It will receive a set of signals (as described above) and create a series of positions, allocated against a cash component. The job of the Portfolio object is to produce an equity curve, incorporate basic transaction costs and keep track of trades.
  • Performance – The Performance object takes a portfolio and produces a set of statistics about its performance. In particular it will output risk/return characteristics (Sharpe, Sortino and Information Ratios), trade/profit metrics and drawdown information.
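As a rough illustration of the kind of statistics a Performance component might report, the sketch below computes an annualised Sharpe ratio and a maximum drawdown from an equity curve. The function names, the zero risk-free rate, the 252 trading periods per year and the sample equity values are all assumptions made for this example.

```python
import numpy as np
import pandas as pd


def annualised_sharpe(returns, periods_per_year=252):
    """Annualised Sharpe ratio of a returns Series, assuming a zero risk-free rate."""
    return np.sqrt(periods_per_year) * returns.mean() / returns.std()


def max_drawdown(equity):
    """Largest peak-to-trough decline of an equity curve, as a negative fraction."""
    running_peak = equity.cummax()
    drawdown = equity / running_peak - 1.0
    return drawdown.min()


# Made-up equity curve (assumption), one value per bar.
equity = pd.Series([100.0, 110.0, 99.0, 120.0, 108.0])
returns = equity.pct_change().dropna()

print(annualised_sharpe(returns))
print(max_drawdown(equity))
```

Here the equity curve falls by 10% from a peak on two separate occasions, so the maximum drawdown is approximately -0.1.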

What’s Missing?

As can be seen this backtester does not include any reference to portfolio/risk management, execution handling (i.e. no limit orders) nor will it provide sophisticated modelling of transaction costs. This isn’t much of a problem at this stage. It allows us to gain familiarity with the process of creating an object-oriented backtester and the Pandas/NumPy libraries. In time it will be improved.

Implementation

We will now proceed to outline the implementations for each object.

Strategy

The Strategy object must be quite generic at this stage, since it will be handling forecasting, mean-reversion, momentum and volatility strategies. The strategies being considered here will always be time series based, i.e. “price driven”. An early requirement for this backtester is that derived Strategy classes will accept a list of bars (OHLCV) as input, rather than ticks (trade-by-trade prices) or order-book data. Thus the finest granularity being considered here will be 1-second bars.

The Strategy class will also always produce signal recommendations. This means that it will advise a Portfolio instance in the sense of going long/short or holding a position. This flexibility will allow us to create multiple Strategy “advisors” that provide a set of signals, which a more advanced Portfolio class can accept in order to determine the actual positions being entered.

The interface of the classes will be enforced by utilising an abstract base class methodology. An abstract base class is an object that cannot be instantiated and thus only derived classes can be created. The Python code is given below in a file called backtest.py. The Strategy class requires that any subclass implement the generate_signals method.

In order to prevent the Strategy class from being instantiated directly (since it is abstract!) it is necessary to use the ABCMeta and abstractmethod objects from the abc module. We set a property of the class, called __metaclass__ to be equal to ABCMeta and then decorate the generate_signals method with the abstractmethod decorator.
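A minimal sketch of such an abstract base class follows. It is written for Python 3, where the metaclass is passed as a keyword in the class statement; the __metaclass__ attribute described above is the Python 2 spelling and has no effect in Python 3. The AlwaysHoldStrategy subclass is hypothetical and exists only to show that a derived class can be instantiated once generate_signals is implemented.

```python
from abc import ABCMeta, abstractmethod


class Strategy(metaclass=ABCMeta):
    """Abstract interface: every subclass must implement generate_signals."""

    @abstractmethod
    def generate_signals(self):
        """Return the long, hold or short signal for each bar."""
        raise NotImplementedError("Should implement generate_signals()!")


class AlwaysHoldStrategy(Strategy):
    """Hypothetical concrete strategy that never trades."""

    def __init__(self, bars):
        self.bars = bars

    def generate_signals(self):
        # One 'hold' signal (0) per bar.
        return [0 for _ in self.bars]


try:
    Strategy()  # abstract, so this raises TypeError
except TypeError as exc:
    print("Cannot instantiate:", exc)

print(AlwaysHoldStrategy([1, 2, 3]).generate_signals())
```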

While the above interface is straightforward it will become more complicated when this class is inherited for each specific type of strategy. Ultimately the goal of the Strategy class in this setting is to provide a list of long/short/hold signals for each instrument to be sent to a Portfolio.

Portfolio

The Portfolio class is where the majority of the trading logic will reside. For this research backtester the Portfolio is in charge of determining position sizing, risk analysis, transaction cost management and execution handling (i.e. market-on-open, market-on-close orders). At a later stage these tasks will be broken down into separate components. Right now they will be rolled into one class.

This class makes ample use of pandas and provides a great example of where the library can save a huge amount of time, particularly with regard to “boilerplate” data wrangling. As an aside, the main trick with pandas and NumPy is to avoid iterating over any dataset using the for d in … syntax. This is because NumPy (which underlies pandas) carries out such loops in compiled code via vectorised operations, which is far faster than looping in the Python interpreter. Thus you will see few (if any!) direct iterations when utilising pandas.
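The difference can be illustrated with a small example that computes bar-to-bar percentage returns twice, once with an explicit loop and once with a single vectorised call. The price values are made up for the illustration.

```python
import numpy as np
import pandas as pd

# Made-up closing prices (assumption).
close = pd.Series([10.0, 10.5, 10.2, 10.8, 11.0])

# Explicit Python loop: one interpreter-level iteration per bar.
loop_returns = [np.nan]
for i in range(1, len(close)):
    loop_returns.append(close.iloc[i] / close.iloc[i - 1] - 1.0)

# Vectorised equivalent: the iteration happens inside pandas/NumPy.
vector_returns = close.pct_change()

# Both approaches agree (the first entry is NaN in each and is skipped).
print(np.allclose(loop_returns[1:], vector_returns.iloc[1:]))
```

On a handful of bars the two are indistinguishable in speed, but the vectorised form tends to be both shorter and much faster once the dataset grows to many thousands of bars.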

The goal of the Portfolio class is to ultimately produce a sequence of trades and an equity curve, which will be analysed by the Performance class. In order to achieve this it must be provided with a list of trading recommendations from a Strategy object. Later on, this will be a group of Strategy objects.

The Portfolio class will need to be told how capital is to be deployed for a particular set of trading signals, how to handle transaction costs and which forms of orders will be utilised. The Strategy object is operating on bars of data and thus assumptions must be made in regard to prices achieved at execution of an order. Since the high/low price of any bar is unknown a priori it is only possible to use the open and close prices for trading. In reality it is impossible to guarantee that an order will be filled at one of these particular prices when using a market order, so it will be, at best, an approximation.

In addition to assumptions about orders being filled, this backtester will ignore all concepts of margin/brokerage constraints and will assume that it is possible to go long and short in any instrument freely without any liquidity constraints. This is clearly a very unrealistic assumption, but is one that can be relaxed later.

The following listing continues backtest.py:
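As a minimal sketch of that interface, in Python 3 syntax, with the method names generate_positions and backtest_portfolio assumed for illustration:

```python
from abc import ABCMeta, abstractmethod


class Portfolio(metaclass=ABCMeta):
    """Abstract interface for turning Strategy signals into positions and an equity curve."""

    @abstractmethod
    def generate_positions(self):
        """Decide how much of each instrument to hold at each bar, given the signals."""
        raise NotImplementedError("Should implement generate_positions()!")

    @abstractmethod
    def backtest_portfolio(self):
        """Combine positions, prices and cash into holdings, total equity and returns."""
        raise NotImplementedError("Should implement backtest_portfolio()!")
```

Any concrete portfolio must implement both methods before it can be instantiated, which keeps the capital allocation logic separate from the accounting that produces the equity curve.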

At this stage the Strategy and Portfolio abstract base classes have been introduced. We are now in a position to generate some concrete derived implementations of these classes, in order to produce a working “toy strategy”.

We will begin by generating a subclass of Strategy called RandomForecastStrategy, the sole task of which is to produce randomly chosen long/short signals! While this is clearly a nonsensical trading strategy, it will serve our needs by demonstrating the object oriented backtesting framework. Thus we will begin a new file called random_forecast.py, with the listing for the random forecaster as follows:
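A minimal sketch of such a random forecaster is shown here. To keep the example self-contained it does not derive from the abstract Strategy class, as it would in the full framework. The seed argument, the "SPY" symbol and the synthetic bars are assumptions added for this illustration.

```python
import numpy as np
import pandas as pd


class RandomForecastStrategy:
    """Sketch of a strategy that emits random long/short signals."""

    def __init__(self, symbol, bars, seed=None):
        self.symbol = symbol
        self.bars = bars
        # The seed is a hypothetical convenience for reproducible runs.
        self.rng = np.random.default_rng(seed)

    def generate_signals(self):
        signals = pd.DataFrame(index=self.bars.index)
        # Draw +1.0 (long) or -1.0 (short) with equal probability for every bar.
        signals["signal"] = self.rng.choice([1.0, -1.0], size=len(signals))
        return signals


# Synthetic bars (assumption): five business days of made-up opening prices.
index = pd.date_range("2014-01-01", periods=5, freq="B")
bars = pd.DataFrame({"Open": [100.0, 101.0, 99.0, 102.0, 103.0]}, index=index)

signals = RandomForecastStrategy("SPY", bars, seed=42).generate_signals()
print(signals)
```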

Now that we have a “concrete” forecasting system, we must create an implementation of a Portfolio object. This object will encompass the majority of the backtesting code. It is designed to create two separate DataFrames, the first of which is a positions frame, used to store the quantity of each instrument held at any particular bar. The second, portfolio, actually contains the market price of all holdings for each bar, as well as a tally of the cash, assuming an initial capital. This ultimately provides an equity curve on which to assess strategy performance.

The Portfolio object, while extremely flexible in its interface, requires specific choices regarding how to handle transaction costs, market orders etc. In this basic example I have assumed that it is possible to go long or short an instrument easily with no restrictions or margin, that orders are filled directly at the open price of the bar, and that transaction costs (encompassing slippage, fees and market impact) are zero. I have also specified directly the quantity of stock to purchase for each trade.

Here is the continuation of the random_forecast.py listing:
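A sketch of the accounting involved is given below as a single function rather than a full Portfolio subclass. It follows the assumptions stated above: fills at the open price, zero transaction costs and a fixed quantity per trade. The function name, the default quantity of 100 units, the "SPY" symbol and the sample data are hypothetical.

```python
import pandas as pd


def backtest_market_on_open(symbol, bars, signals, initial_capital=100000.0, quantity=100):
    """Return an equity-curve DataFrame for fixed-size trades filled at the open."""
    # Positions frame: units held at each bar (+quantity long, -quantity short).
    positions = pd.DataFrame(index=signals.index)
    positions[symbol] = quantity * signals["signal"]

    # Units traded at each bar; the first bar's trade is the full opening position.
    trades = positions.diff().fillna(positions)

    open_price = bars["Open"]
    portfolio = pd.DataFrame(index=signals.index)
    # Market value of the holdings, marked at the open of each bar.
    portfolio["holdings"] = positions.mul(open_price, axis=0).sum(axis=1)
    # Cash falls by the cost of units bought and rises by the proceeds of units sold.
    portfolio["cash"] = initial_capital - trades.mul(open_price, axis=0).sum(axis=1).cumsum()
    portfolio["total"] = portfolio["cash"] + portfolio["holdings"]
    portfolio["returns"] = portfolio["total"].pct_change()
    return portfolio


# Synthetic data (assumption): long for two bars, then short for two bars.
index = pd.date_range("2014-01-01", periods=4, freq="B")
bars = pd.DataFrame({"Open": [100.0, 102.0, 101.0, 105.0]}, index=index)
signals = pd.DataFrame({"signal": [1.0, 1.0, -1.0, -1.0]}, index=index)

portfolio = backtest_market_on_open("SPY", bars, signals)
print(portfolio)
```

With this data the total column reads 100000, 100200, 100100 and 99700: the long position gains 200 and then loses 100, after which the short position loses 400 as the price rises from 101 to 105.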

This gives us everything we need to generate an equity curve based on such a system. The final step is to tie it all together with a __main__ function:

The output of the program is as follows. Yours will differ from the output below depending upon the date range you select and the random seed used:

In this instance the strategy lost money, which is unsurprising given the stochastic nature of the forecaster! The next steps are to create a Performance object that accepts a Portfolio instance and provides a list of performance metrics upon which to base a decision to filter the strategy out or not.

We can also improve the Portfolio object to have a more realistic handling of transaction costs (such as Interactive Brokers commissions and slippage). We can also straightforwardly include a forecasting engine into a Strategy object, which will (hopefully) produce better results. In the following articles we will explore these concepts in more depth.

Backtesting a Moving Average Crossover in Python with pandas

By Michael Halls-Moore on January 21st, 2014

In the previous article on Research Backtesting Environments In Python With Pandas we created an object-oriented research-based backtesting environment and tested it on a random forecasting strategy. In this article we will make use of the machinery we introduced to carry out research on an actual strategy, namely the Moving Average Crossover on AAPL.

Moving Average Crossover Strategy

The Moving Average Crossover technique is an extremely well-known simplistic momentum strategy. It is often considered the “Hello World” example for quantitative trading.

The strategy as outlined here is long-only. Two separate simple moving average filters are created, with varying lookback periods, of a particular time series. Signals to purchase the asset occur when the shorter lookback moving average exceeds the longer lookback moving average. If the longer average subsequently exceeds the shorter average, the asset is sold back. The strategy works well when a time series enters a period of strong trend and then slowly reverses the trend.

For this example, I have chosen Apple, Inc. (AAPL) as the time series, with a short lookback of 100 days and a long lookback of 400 days. This is the example provided by the zipline algorithmic trading library. Thus if we wish to implement our own backtester we need to ensure that it matches the results in zipline, as a basic means of validation.

Implementation

Make sure to follow the previous tutorial, which describes how the initial object hierarchy for the backtester is constructed; otherwise the code below will not work. For this particular implementation I have used the following libraries:

  • Python – 2.7.3
  • NumPy – 1.8.0
  • pandas – 0.12.0
  • matplotlib – 1.1.0

The implementation of ma_cross.py requires backtest.py from the previous tutorial. The first step is to import the necessary modules and objects:

As in the previous tutorial we are going to subclass the Strategy abstract base class to produce MovingAverageCrossStrategy, which contains all of the details on how to generate the signals when the moving averages of AAPL cross over each other.

The object requires a short_window and a long_window on which to operate. The values have been set to defaults of 100 days and 400 days respectively, which are the same parameters used in the main example of zipline.

The moving averages are created by using the pandas rolling_mean function on the bars['Close'] closing price of the AAPL stock. Once the individual moving averages have been constructed, the signal Series is generated by setting the column equal to 1.0 when the short moving average is greater than the long moving average, or 0.0 otherwise. From this the positions orders can be generated to represent trading signals.
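The signal logic can be sketched as a standalone function. This version uses the Series.rolling(...).mean() method, which replaced the rolling_mean function in later pandas releases. The function name, the use of min_periods=1, the handling of the warm-up period and the synthetic prices are assumptions made for this example.

```python
import numpy as np
import pandas as pd


def moving_average_cross_signals(close, short_window=100, long_window=400):
    """Long-only crossover signals from a Series of closing prices."""
    signals = pd.DataFrame(index=close.index)
    # min_periods=1 lets both averages start from the first bar (an assumption).
    signals["short_mavg"] = close.rolling(short_window, min_periods=1).mean()
    signals["long_mavg"] = close.rolling(long_window, min_periods=1).mean()

    # 1.0 while the short average is above the long average, 0.0 otherwise.
    signals["signal"] = np.where(signals["short_mavg"] > signals["long_mavg"], 1.0, 0.0)
    # Ignore the warm-up period before the short window is full.
    signals.iloc[:short_window, signals.columns.get_loc("signal")] = 0.0

    # Differencing the signal gives orders: +1.0 buy, -1.0 sell, 0.0 no trade.
    signals["positions"] = signals["signal"].diff().fillna(0.0)
    return signals


# Synthetic closing prices (assumption): a rise followed by a fall.
close = pd.Series([1, 2, 3, 4, 5, 6, 5, 4, 3, 2, 1, 0.5], dtype=float)
signals = moving_average_cross_signals(close, short_window=2, long_window=4)
print(signals)
```

With these toy windows of 2 and 4 bars, a single buy order appears once the short average moves above the long average during the rise, and a single sell order appears after the price turns down.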

The MarketOnClosePortfolio is subclassed from Portfolio, which is found in backtest.py. It is almost identical to the implementation described in the prior tutorial, with the exception that the trades are now carried out on a Close-to-Close basis, rather than an Open-to-Open basis. For details on how the Portfolio object is defined, see the previous tutorial. I’ve left the code in for completeness and to keep this tutorial self-contained:

Now that the MovingAverageCrossStrategy and MarketOnClosePortfolio classes have been defined, a __main__ function will be called to tie all of the functionality together. In addition the performance of the strategy will be examined via a plot of the equity curve.

The pandas DataReader object downloads OHLCV prices of AAPL stock for the period 1st Jan 1990 to 1st Jan 2002, at which point the signals DataFrame is created to generate the long-only signals. Subsequently the portfolio is generated with a 100,000 USD initial capital base and the returns are calculated on the equity curve.

The final step is to use matplotlib to plot a two-figure plot of both AAPL prices, overlaid with the moving averages and buy/sell signals, as well as the equity curve with the same buy/sell signals. The plotting code is taken (and modified) from the zipline implementation example.

The graphical output of the code is as follows. I made use of the IPython %paste command to put this directly into the IPython console while in Ubuntu, so that the graphical output remained in view. The pink upticks represent purchasing the stock, while the black downticks represent selling it back:

AAPL Moving Average Crossover Performance from 1990-01-01 to 2002-01-01

As can be seen the strategy loses money over the period, with five round-trip trades. This is not surprising given the behaviour of AAPL over the period, which was on a slight downward trend, followed by a significant upsurge beginning in 1998. The lookback period of the moving average signals is rather large and this impacted the profit of the final trade, which otherwise may have made the strategy profitable.

In subsequent articles we will create a more sophisticated means of analysing performance, as well as describing how to optimise the lookback periods of the individual moving average signals.

from: https://www.quantstart.com/articles/Backtesting-a-Moving-Average-Crossover-in-Python-with-pandas