Mechanical.Trading: pairs


Basics

Let us define a pair of companies as follows: two competing companies that are equally efficient in running an identical business. For example, PepsiCo, Inc. and The Coca-Cola Company are likely to be such a pair. Each part of a pair returns the same profit or loss per $1 invested. Mathematically, this can be expressed as follows:
dY - a dX = 0,    (1)
where dY and dX stand for the changes in the stock prices of the pair components, Y and X, over the time interval dt. Integrating (1) with a constant a gives the relationship Y = aX + b, where b is an integration constant; this relationship should hold if both parts of the pair are fairly priced. Let us introduce a synthetic underlying, R(t) = Y(t) - aX(t). Since market forces are expected to ensure <R(t) - b> = 0 (<...> stands for averaging over time), R is an ideal vehicle for mean-reverting/statistical arbitrage strategies.

Pair trading strategies seek to profit from a temporary divergence of R from its mean value, b.

For example, when Y rises while aX rises more slowly, stays flat, or falls, R can become larger than its stationary value b. A trader expects this divergence to revert to the mean. In this example, the trader ought to sell the outperformer, Y, and buy the underperformer, X. The bet that R will converge to b relies only on the relative changes in X and Y, which means that pair trading is, in principle, insensitive to whether the market trends up, trends down, or moves sideways.
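A minimal Python sketch of why the position depends only on relative moves. It assumes one unit of the pair is 1 share of Y held against a shares of X, ignores transaction costs and financing, and the function name `pair_pnl` and all prices are hypothetical. The P&L of the pair equals the change in R = Y - aX (with the sign of the bet), so any market-wide move that keeps dY = a dX leaves it unchanged.

```python
def pair_pnl(y0, y1, x0, x1, a, side):
    """P&L of one pair unit held from time 0 to time 1.

    side = -1: short 1 unit of Y, long a units of X (bet that R falls back to b)
    side = +1: long 1 unit of Y, short a units of X (bet that R rises back to b)
    The result equals side * (change in R), where R = Y - a*X.
    """
    change_in_r = (y1 - y0) - a * (x1 - x0)
    return side * change_in_r


# Market-wide moves with dY = a*dX leave the pair P&L at zero ...
print(pair_pnl(100, 110, 50, 55, a=2, side=-1))   # rally: 0
print(pair_pnl(100, 90, 50, 45, a=2, side=-1))    # sell-off: 0
# ... while a reversion of R from above pays off for the short-Y/long-X leg.
print(pair_pnl(104, 100, 50, 50, a=2, side=-1))   # 4
```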

Why Trade Pairs?

Hedging is an intrinsic property of pair trading: each long position is offset by a short position in a related asset. Thus, if properly diversified, a portfolio of pairs is not prone to market crashes.

What Pairs Are Suitable for a Retail Trader?

Creating a market-neutral portfolio of stock pairs requires a significant investment of capital and of analytical and computational resources, which is usually beyond the reach of a retail trader. This does not mean that a retail trader should avoid pair trading altogether. In retail, it is reasonable to trade pairs made of index ETFs/futures, currency ETFs/futures, and commodity ETFs/futures, where the causal relationship between X and Y is of a general economic/financial nature while X and Y are diversified by their very design.

How to Generate Y = aX + b

Several quantitative methods described in the public domain (correlation, distance, stochastic, stochastic differential residual, and cointegration) explain how to select a pair and how to generate the corresponding residual series.

Retail Perspective on Cointegration

Cointegration is a recognized technique that mathematically expresses the basic idea of pair trading.

Method

Two nonstationary time series {X(t), Y(t)} form a cointegrated pair if there exists a stationary process R(t) that is a linear combination of X(t) and Y(t):
R(t) = Y(t) - aX(t) - b,    <R(t)> = 0,
where <...> stands for averaging over time. Note that the cointegration of {X(t), Y(t)} is meaningful, and hence the pair is tradable, only when it is a consequence of a causal relationship rather than an artifact of spurious correlation. The cointegration method is as follows:

1. Calculate a and b using the ordinary least squares (OLS) regression method;
2. Generate the R(t) = Y(t) - aX(t) - b time series;
3. Check that R(t) is a stationary process;
4. Calculate the standard deviation, S(t), of R(t);
5. Generate trade signals using the Z-score Z(t) = R(t) / S(t).
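The five steps above can be sketched in Python with NumPy alone. This is a minimal illustration under stated assumptions, not the author's code: the helper names `dickey_fuller_t` and `cointegration_signals` are hypothetical, step 3 is approximated by a plain Dickey-Fuller t-statistic (no lagged differences), and step 4 uses a single full-sample standard deviation, as in the PEP/KO example, rather than a rolling one.

```python
import numpy as np


def dickey_fuller_t(r):
    """t-statistic of gamma in dR_t = alpha + gamma * R_{t-1} + e_t.

    Plain Dickey-Fuller regression without lagged differences; a strongly
    negative value indicates mean reversion, i.e. a stationary R(t).
    """
    dr = np.diff(r)
    lagged = r[:-1]
    design = np.column_stack([np.ones_like(lagged), lagged])
    coef, *_ = np.linalg.lstsq(design, dr, rcond=None)
    resid = dr - design @ coef
    n, k = design.shape
    sigma2 = resid @ resid / (n - k)
    cov = sigma2 * np.linalg.inv(design.T @ design)
    return coef[1] / np.sqrt(cov[1, 1])


def cointegration_signals(x, y):
    """Steps 1-5 of the cointegration method for price arrays x and y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a, b = np.polyfit(x, y, 1)   # Step 1: OLS fit of y = a*x + b
    r = y - a * x - b            # Step 2: residual series R(t)
    df_t = dickey_fuller_t(r)    # Step 3: stationarity statistic for R(t)
    s = r.std(ddof=1)            # Step 4: standard deviation S of R(t)
    z = r / s                    # Step 5: Z-score that drives the signals
    return a, b, df_t, z


if __name__ == "__main__":
    # Hypothetical data: a random-walk "price" x and a partner y tied to it.
    rng = np.random.default_rng(0)
    x = 50 + rng.standard_normal(500).cumsum()
    y = 2.0 * x + 10 + rng.standard_normal(500)
    a, b, df_t, z = cointegration_signals(x, y)
    print(f"a={a:.3f}, b={b:.3f}, DF t-stat={df_t:.2f}, last Z={z[-1]:.2f}")
```

Because R(t) comes from an estimated regression, the statistic should be compared with Engle-Granger critical values (about -3.34 at the 5% level for two series, asymptotically) rather than the standard Dickey-Fuller table. The `coint` and `adfuller` functions in `statsmodels.tsa.stattools` provide the full tests with lag selection and p-values.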

Let us consider the time series of daily closing prices of PepsiCo, Inc. and The Coca-Cola Company between 5/01/18 and 9/30/18.

Figure 1 Linear regression of the daily closing prices of PepsiCo, Inc. and The Coca-Cola Company between 5/01/18 and 9/30/18, obtained by the OLS method.

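The calculated linear regression,
PEP(t) = 3.8 KO(t) - 61.3,
is shown in Figure 1; the corresponding residual is R(t) = PEP(t) - 3.8 KO(t) + 61.3 (a = 3.8, b = -61.3). The R-squared of the regression is 0.87, and the calculated standard deviation of R(t) is approximately 2.3.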

Figure 2 The residual time series, R(t), generated from the PepsiCo, Inc. and The Coca-Cola Company stock price series.

The time dependence of the residual series, R(t), is shown in Figure 2. When R(t) is below zero, a trader buys PEP and simultaneously sells KO shares in the ratio 1 : 3.8, for example, buys 10 shares of PEP and sells 38 shares of KO. When R(t) is above zero, the trader sells PEP and buys KO in the same ratio.
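The sign rule above can be written as a small function. This is a sketch, assuming a = 3.8 and S of approximately 2.3 from the regression above and a base lot of 10 PEP shares; the function name `pep_ko_order` and the example value of R are hypothetical. It also reports the Z-score R/S used in step 5 of the method.

```python
def pep_ko_order(r, a=3.8, s=2.3, pep_lot=10):
    """Return (PEP shares, KO shares, Z-score) for a residual value r.

    Positive share counts mean buy, negative mean sell. R < 0: buy PEP and
    sell KO; R > 0: sell PEP and buy KO; both legs in the ratio 1 : a.
    """
    ko_lot = round(pep_lot * a)   # 10 * 3.8 -> 38 shares of KO
    z = r / s                     # Z-score of the current residual
    if r < 0:
        return pep_lot, -ko_lot, z
    if r > 0:
        return -pep_lot, ko_lot, z
    return 0, 0, z


print(pep_ko_order(-4.6))   # (10, -38, -2.0)
```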

Discussion

Cointegration looks for a stationary relationship between two different investment assets. Apart from spurious correlation, this is possible only when the assets have the same return on investment (ROI). Pairs selected by the cointegration method typically perform well in backtesting. However, the calculated parameters a and b often break down in forward testing or live trading. The reason for such a breakdown is that, in reality, there are no companies with identical ROI. For example, General Motors went bankrupt in 2009, during the Great Recession, while Ford survived without filing for bankruptcy. There is no way to predict such a fundamental divergence from statistical analysis of historical price data alone. Large funds can mitigate this problem by running a diversified portfolio of pairs.
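The breakdown described above can be reproduced with a small synthetic experiment. The sketch assumes two hypothetical, noise-free price paths that grow at different constant rates (0.04% and 0.06% per bar, arbitrary values), i.e. assets with different ROI. It fits Y = aX + b on the first half of the data, as a backtest would, and then evaluates the residual on the second half.

```python
import numpy as np

# Hypothetical noise-free prices with different constant growth rates (ROIs).
t = np.arange(500)
x = 100 * np.exp(0.0004 * t)
y = 100 * np.exp(0.0006 * t)

in_sample = t < 250                                 # "backtest" window
a, b = np.polyfit(x[in_sample], y[in_sample], 1)    # fit Y = aX + b
r = y - a * x - b                                   # residual R(t) on all bars

r_in, r_out = r[in_sample], r[~in_sample]
print(f"largest |R| in sample: {np.abs(r_in).max():.3f}")
print(f"R at the last bar:     {r_out[-1]:.3f}")
```

In sample, the residual stays small; out of sample, it grows steadily instead of oscillating around zero, which is the kind of drift that makes the fitted a and b fail in forward testing.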

A retail trader, on the other hand, should select pairs made of index ETFs/futures, currency ETFs/futures, and commodity ETFs/futures, where the long-term relationship between the assets is of a general nature and diversification is built into the design of the trading vehicle. Moreover, R(t) has to be treated as a nonstationary process. In this case, the main goal of a pair-selection method is to find an R(t) whose time dependence is weak relative to that of the pair's components.

A practical example of pair trading suitable for a retail trader is given in "Correlation: Statistical Arbitrage of the Russell 2000 - Nasdaq 100".

Correlation: Statistical Arbitrage of the Russell 2000 - Nasdaq 100

"Though the Russell 2000 and Nasdaq 100 are highly correlated, these indexes do not form a cointegrated time series..."

The Russell 2000 is a small-cap index comprising the 2,000 smallest stocks among the top 3,000 stocks in the U.S. market. Although small-cap stocks are riskier than large-cap stocks, small-cap companies returned 12.1 percent annually between 1926 and 2017, compared with 10.2 percent for large-cap stocks over the same period. As a growth story, the Russell 2000 is highly correlated with the Nasdaq 100 Index, which is composed of companies from various sectors, excluding financial services. Technology is the largest sector of the Nasdaq 100, accounting for 54% of the index's weight; the index also includes consumer services, healthcare, industrials, and telecommunications. Like the Russell 2000, the Nasdaq 100 offers high returns that compensate for growth-related risks.

The Russell 2000 and Nasdaq 100 are an example of a correlated but not cointegrated pair: these indexes have different ROIs. This means that any linear combination of the Russell 2000 and Nasdaq 100 will drift with time instead of reverting to the mean. However, in a stationary macroeconomic environment, the difference between the returns of the Russell 2000 and Nasdaq 100 indexes is expected to be constant. Mathematically, this condition can be approximated as follows:

dY - a dX = b dt,    (1)

where dY and dX stand for the changes in Y and X, respectively; a is the ratio (dollar value per point of X)/(dollar value per point of Y); b is determined by the difference in ROIs; and dt is the time differential. Integrating (1) gives Y = aX + bt + c, where c is an integration constant. This relationship holds if both X and Y are fairly priced and b remains constant, i.e., when the difference in ROI is constant. Let us introduce the time series R(t) = Y(t) - aX(t) - bt - c. Market forces are expected to ensure <R(t)> = 0 (<...> stands for averaging over time). Thus, R(t) is a stationary process that can be used as an indicator for a statistical arbitrage strategy.
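For completeness, the integration step behind Y = aX + bt + c can be written out, assuming a and b are constant over the period considered:

```latex
dY - a\,dX = b\,dt
\quad\Longrightarrow\quad
\int_0^t dY - a\int_0^t dX = b\int_0^t dt'
\quad\Longrightarrow\quad
Y(t) - aX(t) = bt + c, \qquad c = Y(0) - aX(0).
```

In practice b and c are estimated by regression rather than read off the first observation, and P(t) = Y(t) - aX(t) = R(t) + bt + c carries the linear trend bt on top of the mean-reverting part R(t).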

The generic part of a statistical arbitrage (pair trading) algorithm is as follows:

1. Calculate b and c using the ordinary least squares (OLS) regression method;
2. Generate R(t) = Y(t) - aX(t) - bt -c time series;
3. Check that R(t) is a stationary process;
4. Calculate the standard deviation, S(t), of R(t);
5. Generate trade signals using Z-score Z(t) = R(t) / S(t).
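Steps 1-5 for this drifting model can be sketched in Python as follows. Assumptions: a is supplied by the caller (it is fixed by the dollar value per point of each instrument, not fitted), time is measured in bars, S is a single full-sample standard deviation, and the function name `detrended_signals` is hypothetical. The stationarity check of step 3 is left out to keep the sketch short.

```python
import numpy as np


def detrended_signals(x, y, a):
    """Steps 1-5 for the drifting-pair model R(t) = Y(t) - a*X(t) - b*t - c.

    a is fixed in advance (dollar value per point of X / dollar value per
    point of Y); b and c are estimated by OLS of P(t) = Y(t) - a*X(t) on t.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    t = np.arange(x.size, dtype=float)   # time measured in bars (assumption)
    p = y - a * x                        # tradable spread P(t)
    b, c = np.polyfit(t, p, 1)           # Step 1: OLS fit of P = b*t + c
    r = p - b * t - c                    # Step 2: detrended residual R(t)
    # Step 3 (stationarity check of R) is omitted in this sketch.
    s = r.std(ddof=1)                    # Step 4: standard deviation of R(t)
    z = r / s                            # Step 5: Z-score for the signals
    return b, c, p, z
```

Note that the function returns P(t) as well as Z(t): the Z-score is computed from the detrended R(t), but only P(t) can actually be traded.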

Although steps 1-5 are essentially the same as in the cointegration method, keep in mind that R(t) cannot be expressed as a combination of Y(t) and X(t) alone, because it also contains the deterministic trend bt + c; hence, R(t) cannot be traded directly. Let us use P(t) = Y(t) - aX(t) as the trading vehicle instead:

5. if (Z(t) > Zthresh)  sell P(t);
if (Z(t) < -Zthresh) buy P(t);

Since P(t) trends with time, the stop-loss has to be defined in terms of both equity loss and time. Setting an equity stop-loss, Equity_stop, is a routine problem in trading; setting a time stop-loss, Time_stop, is an open question. My solution is based on the empirical observation that R(t) is quasi-periodic, while P(t), on the characteristic time scale, exhibits a weak drift relative to its deviations from the trend (see the figure above). In the proposed method, the Fourier transform of R(t) and variational optimization are used to find the dominant harmonic in R(t). The period of this harmonic is then used as Time_stop:

6. if ((duration of a trade > Time_stop) or (equity loss > Equity_stop)) exit.
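Steps 5-6 and the time stop can be sketched in Python. This is a simplified illustration, not the author's implementation: `dominant_period` only picks the strongest peak of the discrete Fourier power spectrum of R(t) and does not reproduce the variational optimization mentioned above, and `simulate_pair` trades one unit of P(t) with caller-supplied, hypothetical parameter values. Closing a trade on the opposite step-5 signal is an added assumption, since the text specifies only the entry rules and the two stops.

```python
import numpy as np


def dominant_period(r, dt=1.0):
    """Period of the strongest non-zero-frequency component of R(t) (FFT peak)."""
    r = np.asarray(r, dtype=float)
    power = np.abs(np.fft.rfft(r - r.mean())) ** 2
    freqs = np.fft.rfftfreq(r.size, d=dt)
    k = 1 + np.argmax(power[1:])          # skip the zero-frequency bin
    return 1.0 / freqs[k]


def simulate_pair(p, z, z_thresh, time_stop, equity_stop):
    """Steps 5-6 on one unit of P(t) = Y(t) - a*X(t); returns closed-trade P&Ls.

    Entries: short P when z > z_thresh, long P when z < -z_thresh.
    Exits: duration > time_stop bars, loss > equity_stop, or (assumption) the
    opposite step-5 signal. A trade still open at the end is not reported.
    """
    trades = []
    position, entry_price, entry_index = 0, 0.0, 0
    for i, (price, score) in enumerate(zip(p, z)):
        if position != 0:
            pnl = position * (price - entry_price)
            opposite = score > z_thresh if position == 1 else score < -z_thresh
            if i - entry_index > time_stop or -pnl > equity_stop or opposite:
                trades.append(pnl)
                position = 0
            continue                      # no new entry on an exit bar
        if score > z_thresh:
            position, entry_price, entry_index = -1, price, i
        elif score < -z_thresh:
            position, entry_price, entry_index = 1, price, i
    return trades
```

A typical call would set `time_stop = round(dominant_period(r))` from the detrended residual R(t) and pass the Z-score series alongside P(t).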

Backtesting was carried out using QQQ and IWM data starting in 2000. Risk management and bet optimization were implemented as described in "Leverage using the Kelly criterion". Live simulations are published here. The results of the algorithm implementation are shown below. On average, the algorithm produces 5 entries per year and doubles equity every 1.5 years.
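The stated doubling time can be translated into an annual growth rate. Assuming equity compounds at a constant rate (a simplification), doubling every 1.5 years corresponds to 2^(1/1.5) - 1, or about 58.7% per year. The function name below is hypothetical.

```python
def implied_annual_growth(years_to_double):
    """Constant annual growth rate that doubles equity in `years_to_double` years."""
    return 2.0 ** (1.0 / years_to_double) - 1.0


print(f"{implied_annual_growth(1.5):.1%}")   # 58.7%
```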

Disclaimer: Simulated performance does not represent actual performance and should not be interpreted as an indication of such performance. All material presented within is for research purposes only.