Pair Trading Guide to Identifying and Profiting from Stock Pairs

Tracking the pair data

In this last chapter on pair trading we will review a live example and analyze the elements that affect the trade.

Here is a quick recap of pre-trade theory –

Basic overview of linear regression and how to perform one

Using linear regression, you can assess the relation between an independent variable X and a dependent variable Y.

Linear regression produces the intercept, slope, residuals, standard error and the standard error of the intercept as its output.

The choice to label a stock as either Y (dependent) or X (independent) is based on the error ratio.

The ratio of standard error of intercept to the standard error can be defined as the error ratio.

The error ratio is computed by switching X and Y. The combination which offers the least error ratio will determine which stock is set as X and which one as Y.

The residuals resulting from the regression must be stationary for us to deduce that the two stocks are co-integrated.

If the stocks are cointegrated, they exhibit a tendency to track each other’s movements.

Stationarity of a series can be assessed by means of an ADF test.

For an ideal pair, the ADF value (the p-value of the test) should be under 0.05.

Over the past few chapters, we have discussed in depth which pairs are worth considering for pair trading. To summarise, we take two stocks (from a similar industry), run a linear regression between them, and look at the error ratio to determine which is X and which is Y. We then run an ADF test on the residuals. A pair must have an ADF value of 0.05 or lower to be tracked and potentially traded. After that, we watch the residuals each day for trading opportunities.

A pair trade opportunity arises when –

The residuals have dropped to -2 standard deviations (-2SD), which signals a long position on the pair: we buy Y and sell X.

The residuals have risen to +2 standard deviations (+2SD), which signals a short position on the pair: we sell Y and buy X.

Once the trade is set in motion, I like to keep my stop loss at -3SD for long and +3SD for short trades, as well as a target of -1SD and +1SD respectively. Naturally, you will need to monitor the residual value to make any necessary adjustments. We can go into more detail about this later on.

– Note for the programmers

In an earlier chapter, I showed the ‘Pair Data’ sheet, which is an output of the Pair Trading Algo. The algorithm does the following –

Obtain the last 200-day closing prices of the underlying from NSE’s bhavcopy, or even automate the process by running a script.

We’ve already categorised our stocks by sector, making the download more organised.

A series of regressions are run and an ‘error ratio’ is determined for each. For example, suppose we’re considering RBL Bank and Kotak Bank; the regression module would take RBL (X) and Kotak (Y), as well as Kotak (X) and RBL (Y). The combination with the lowest error ratio is selected, while the other is discarded.

We run the ADF test on the residuals from the combination with the lower error ratio.

A report is generated with all the viable X-Y pairs and their respective values such as intercepts, beta, adf value, standard error, and sigma. Sigma still needs to be discussed. I will do so shortly.

If you are a programmer, I advise you to use this as a guide in creating your own pair trading algo.
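To make the regression and error ratio steps concrete, here is a minimal sketch in plain Python. It is not the actual Pair Trading Algo; the function names `ols` and `pick_direction` and the sample prices are my own assumptions for illustration. The ADF step is left out here; in practice it could be run on the residuals with a library routine such as `adfuller` from statsmodels.

```python
import math

def ols(x, y):
    """Simple linear regression y = intercept + beta * x, with standard errors."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    beta = sxy / sxx
    intercept = mean_y - beta * mean_x
    residuals = [yi - (intercept + beta * xi) for xi, yi in zip(x, y)]
    # Standard error of the residuals (n - 2 degrees of freedom)
    sigma = math.sqrt(sum(r * r for r in residuals) / (n - 2))
    # Standard error of the intercept
    se_intercept = sigma * math.sqrt(1.0 / n + mean_x ** 2 / sxx)
    return {
        "intercept": intercept,
        "beta": beta,
        "sigma": sigma,
        "se_intercept": se_intercept,
        "residuals": residuals,
        "error_ratio": se_intercept / sigma,   # std error of intercept / std error
        "std_err": residuals[-1] / sigma,      # latest residual in units of sigma
    }

def pick_direction(prices_a, prices_b, name_a="A", name_b="B"):
    """Regress both ways and keep the direction with the lower error ratio."""
    a_on_b = ols(prices_b, prices_a)   # A is Y, B is X
    b_on_a = ols(prices_a, prices_b)   # B is Y, A is X
    if a_on_b["error_ratio"] <= b_on_a["error_ratio"]:
        return {"y": name_a, "x": name_b, **a_on_b}
    return {"y": name_b, "x": name_a, **b_on_a}

# Made-up closing prices, for illustration only (not real market data)
stock_a = [100.0, 104.0, 111.0, 112.0, 121.0, 118.0]
stock_b = [10.0, 11.0, 12.5, 13.0, 15.0, 14.2]
row = pick_direction(stock_a, stock_b, "A", "B")
print(row["y"], row["x"], round(row["beta"], 3), round(row["error_ratio"], 3))
```

The returned dictionary holds the same kind of fields as a row of the report (intercept, beta, sigma, std_err); a real version would loop over every pair within a sector and append the ADF value.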

In the previous chapters, I provided a brief overview of how to read the data in the Pair Data Excel sheet. Now it’s time to take a closer look. Here is a snapshot of the Pair Data output –

Examine the highlighted data. Bajaj Auto (Y) against TVS (X) has the lower error ratio; the reverse pair, TVS as Y and Bajaj Auto as X, has a greater error ratio and is not advisable. Consequently, the reverse pairing does not appear in this report.

The report does not only let you distinguish X from Y, it also provides informative data.

Intercept – 1172.72

Beta – 2.804

ADF value – 0.012

Std_err – -0.77

Sigma – 103.94

I assume familiarity with the intercept, Beta, and ADF value, so I won’t reiterate them. Now, let’s briefly discuss the remaining two variables.

In the report, Standard Error (Std_err) is the ratio of today’s residual to the standard error of the residuals. This may be confusing, since two different standard errors are involved. Let me explain with an example.

Examine the image below –

The regression output summary of Yes Bank versus South Indian Bank includes the standard error (22.776). This is the standard error of the residuals, which was discussed earlier in this module.

The second noteworthy figure is 20.914, which is today’s residual.

The std_err in the report is simply the ratio –

Today’s residual / Standard error of the residual

= 20.914 / 22.776

≈ 0.918

This figure expresses today’s residual in units of its standard error, in other words where it sits within its usual distribution. It is pivotal for trading decisions: a long position is initiated when the number falls to -2.5 or below, with -3.0 as the stop loss, and a short position is initiated when it rises to +2.5 or above, with +3.0 as the stop loss. The target for a long is reached when the number recovers to -1, and for a short when it falls back to +1.

This means that the std_err needs to be calculated and tracked every day in order to discover potential trading opportunities. We will go over this topic in more detail shortly.

In the pair data report, the standard error of the residual is denoted by the sigma value, which in this instance is 22.776.
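The daily check described above can be sketched in a few lines of Python. This is only an illustration: the function names are made up, and I am assuming that an entry is taken only while the reading sits between the entry level (2.5) and the stop-loss level (3.0).

```python
def std_err(residual_today, sigma):
    """Today's residual expressed in units of the residuals' standard error."""
    return residual_today / sigma

def signal(z, entry=2.5, stop=3.0):
    """Map a std_err reading to an entry signal (thresholds are assumptions)."""
    if -stop < z <= -entry:
        return "long pair: buy Y, sell X"
    if entry <= z < stop:
        return "short pair: sell Y, buy X"
    return "no trade"

# Yes Bank vs South Indian Bank figures quoted in the text
z = std_err(20.914, 22.776)
print(round(z, 3), signal(z))   # prints: 0.918 no trade
```

A reading of about 0.918 is well inside the usual range, so there is nothing to trade; a reading such as -2.54 would return the long signal.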

By reading through the pair data sheet, you will gain a full comprehension of the details.

Let’s move on to trading now!

– Live example

I ran the pair trading algo, and an opportunity presented itself on 10th May 2018. A snapshot of the pair data is provided here, and the sheet can be downloaded at the end of this chapter. Keep in mind that the closing prices from 10th May were used to generate this output.

Examine the data featured in red; this provides information on Tata Motors Ltd as Y (dependent) and Tata Motors DVR as X (independent).

The ADF value of 0.0179 is highly satisfactory as it is lower than the threshold of 0.05; this implies that the residual is stationary, which is precisely what we are hoping to achieve.

The std_err of -2.54 suggests that the residuals have diverged from the mean significantly enough to warrant a long trade. Hence, it was necessary to buy Tata Motors and short Tata Motors DVR. Ideally I should have placed the trade on 11th May (Friday) morning; however, this did not happen and I had to place it on 14th May (Monday), though at a slightly less favourable rate. The main objective here was for me to showcase the trade rather than chase profit.

Here are the trade execution details –

At this stage, you may have two questions. Let me address them –

Question – Did I look up stock prices, check support/resistance levels, or consult RSI values before taking the trade, or did I execute it without considering any of these?

Answer – I did not consider any of that; I looked only at the current value of the residual.

Question – On what basis did I choose to trade 1 lot each? Why can’t I trade 2 lots of TM and 3 lots of TMD?

Answer – To determine the number of shares needed for a beta-neutral position, we use the beta of the pair. Beta neutrality dictates that for every share of Y, one should hold beta shares of X. Take Tata Motors (Y) and Tata Motors DVR (X) as an example, with a beta of 1.59. This means that for every share of Tata Motors (Y), I need 1.59 shares of Tata Motors DVR (X).

Given that the lot size of Tata Motors (Y) is 1500, matching this proportion requires 1.59*1500 = 2385 shares of Tata Motors DVR (X). This is quite close to one lot of 2400 shares, so I went ahead with 1 lot each. As a result, the Tata Motors DVR leg is 15 shares (2400 - 2385) larger than beta neutrality requires.

This lot-size constraint also prevents us from trading pairs when the beta is negative, so not every opportunity can be taken.
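The lot arithmetic above is easy to check in code. Here is a small sketch; the function name `lot_mismatch` is my own, and the beta and lot sizes are the ones quoted in the text.

```python
def lot_mismatch(beta, lot_y, lot_x, lots_y=1, lots_x=1):
    """Compare the beta-neutral quantity of X with what whole lots allow."""
    required_x = beta * lot_y * lots_y   # shares of X needed for beta neutrality
    traded_x = lot_x * lots_x            # shares of X actually traded in lots
    return required_x, traded_x, traded_x - required_x

# Tata Motors (Y, lot 1500) vs Tata Motors DVR (X, lot 2400), beta 1.59
required, traded, excess = lot_mismatch(beta=1.59, lot_y=1500, lot_x=2400)
print(round(required), traded, round(excess))   # prints: 2385 2400 15
```

Trying other lot combinations (for example 2 lots of Y against 3 lots of X) shows how quickly the mismatch grows, which is why 1 lot each was the natural choice here.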

I initiated this trade when the std_err was -2.54 and planned to keep the position open until it reached either the target of -1 or the stop loss of -3. This was essentially a waiting game.

I’ve created a basic Excel tracker to follow the position live. While those with programming expertise could do more with additional tools, my limited knowledge has resulted in this spreadsheet; I’ve included a snapshot here. The download link can be found below.

The position tracker contains all the important information about the pair. It should be straightforward to use. I have structured it in a way that when you enter the current values of X & Y, the latest Z score is automatically calculated, along with P&L. I suggest you experiment with this sheet, or even better create your own! ☺
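If you would rather build the tracker in code than in Excel, the two calculations it needs can be sketched as follows. This is only a sketch under my own assumptions: the function names are made up, and the numbers in the example are invented round figures, not the values from the actual trade.

```python
def z_score(price_y, price_x, intercept, beta, sigma):
    """Latest residual of Y against X, in units of the residuals' standard error."""
    residual = price_y - (intercept + beta * price_x)
    return residual / sigma

def pair_pnl(entry_y, entry_x, now_y, now_x, qty_y, qty_x, long_pair=True):
    """P&L of a pair position. Long pair = long Y, short X; short pair is the reverse."""
    pnl_y = (now_y - entry_y) * qty_y   # gain on Y if held long
    pnl_x = (entry_x - now_x) * qty_x   # gain on X if held short
    total = pnl_y + pnl_x
    return total if long_pair else -total

# Invented numbers, for illustration only
z = z_score(price_y=150.0, price_x=100.0, intercept=10.0, beta=1.5, sigma=4.0)
pnl = pair_pnl(entry_y=150.0, entry_x=100.0, now_y=156.0, now_x=101.0,
               qty_y=1500, qty_x=2400)
print(z, pnl)   # prints: -2.5 6600.0
```

Feeding in the latest prices of X and Y a few times a day gives the same z-score and P&L columns as the spreadsheet.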

After the position was taken, all that remained was to track the z-score of the residual. To that end, Faisal and I logged the values every day (barring the 14th and 15th). Here are the records –

As seen, the current values were tracked and the latest z-score was computed several times a day. The position stayed open for almost 7 trading sessions, which is quite normal in pair trading. I have seen positions stay open for 22-25 trading sessions. In any case, as long as your calculations are accurate, you just have to wait until the target or stop loss is hit.

On the morning of 23rd May, the z-score reached the target, creating an opportunity to close the trade. Here is a glimpse –

When we take a pair trade like this one, we can never be sure in advance which of the two legs, Tata Motors or Tata Motors DVR, will bring in the profit. We can only expect one of them to move in our favour, with no guarantee as to which. The idea is that the gain on one leg can outweigh the loss on the other.

The position of our final day (23rd May) was as follows –

The P&L came to approximately Rs.14,000/-; a generous profit for a decision with limited downside.

– Final words on Pair Trading

Alright, in the last 13 chapters we have gone over everything I know about pair trading. It’s an exciting way to trade compared to just blindly speculating. Although it is less risky than an outright directional trade, it still has its own specific risks, which you must be aware of. One of the main ways you can lose money is when the two stocks continue to drift apart after your position is established, leaving you with a substantial loss. The margin requirements are also slightly higher, since two contracts are involved. This means that you’ll need some extra funds in your account to cover any daily M2M charges.

On 23rd May, a signal pointed to a trade that involved the spot market: short Allahabad Bank (Y) and long Union Bank (X). The z-score was 2.64 and the corresponding beta was 0.437.

In line with beta neutrality, I need 0.437 shares of Union Bank (X) for each share of Allahabad Bank (Y). As the lot size of Allahabad Bank is 10,000, this translates to 4370 shares of Union Bank. But since the lot size of Union Bank is 4000, I had to buy the remaining 370 shares in the spot market.
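The split between futures and spot can be worked out mechanically. Here is a minimal sketch using the figures from the text; the function name `split_futures_and_spot` is my own, and rounding the requirement to whole shares is an assumption.

```python
def split_futures_and_spot(beta, lot_y, lot_x, lots_y=1):
    """Split the beta-neutral quantity of X into whole futures lots plus spot shares."""
    required_x = round(beta * lot_y * lots_y)   # whole shares of X needed
    futures_lots_x = required_x // lot_x        # whole lots of X in futures
    spot_shares_x = required_x - futures_lots_x * lot_x   # remainder in the spot market
    return required_x, futures_lots_x, spot_shares_x

# Allahabad Bank (Y, lot 10,000) vs Union Bank (X, lot 4000), beta 0.437
print(split_futures_and_spot(beta=0.437, lot_y=10000, lot_x=4000))   # prints: (4370, 1, 370)
```

So the Union Bank leg is one futures lot of 4000 shares plus 370 shares bought in the spot market.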


When I didn’t have knowledge of programming, I found someone who did, and assured them that this venture would bring in money. That’s exactly how I proceeded.


We use a linear regression between Stock A and Stock B to determine if their residuals are stationary, to check for cointegration.


What if Stock A is stationary, but instead Stock B and C are considered together as a single entity?

Beyond pair trading lies multivariate regression, which is far from straightforward. However, once you understand it, you will find that the game changes completely.