Pair Trading Calculating Correlation and Understanding Statistical Concepts

[wpcode id="20363"]
[wpcode id="28109"]
[wpcode id="28030"]
[wpcode id="28110"]

Correlation and its types

I want to mention ‘Trading Pairs’ by Mark Whistler here. This book got me into pair trading and over time, I delved deeper into the strategy beyond what Whistler suggested. Later on in this module we will discuss those techniques, but right now I’m taking you through the same journey that I went through when I learnt pair trading.

At the end of the preceding chapter, correlation and how it can be studied was introduced. We will move forward on this topic to undertake a calculation of correlation between two stocks by utilizing Excel. As you can imagine, computing correlation between two stocks is critical for pair trading.

For this case, Axis Bank and ICICI Bank have been taken into account. Although the two are private sector banking institutions with similar origins, it is expected that their stocks should show a high level of correlation.

I have obtained the closing price of Axis Bank and ICICI Bank for two years, which amounts to 496 days of trading. This information was acquired starting from 4th Dec 2015 up to 4th Dec 2017.

Before continuing, it is important to note the data we are using.

Ensure your data sets are equivalent by comparing the number of data points. For instance, if Stock A is 400 points, make sure Stock B has the same amount corresponding to the same dates.

Ensure that the data has been processed correctly with reference to any corporate changes such as bonus/stock splits.

The image above shows that, in addition to ICICI and Axis, I have obtained data for BPCL, HPCL, and HDFC Bank. This data can be utilized to construct and check other correlations.

At this point, all we have to work with is the date and closing price of the stock. We can go ahead and compute the daily returns. I assume you are familiar with this calculation as it has been included in our previous courses.

The daily return can be calculated as

= [today’s closing price / previous day’s closing price] – 1

I have taken into account the data from both ICICI and Axis Bank for this calculation.

Now, correlation can be calculated on the basis of two parameters –

The daily closing price

The daily return series

The daily closing price correlation necessitates a computation of the relationship between two stocks’ closing values. I don’t find it particularly pleasing to have to assess their correlation in this manner, however we’ll proceed with this for now.

To calculate correlation in Excel, simply utilise the ‘= Correl()’ function on the daily closing prices. I’m running the calculation on a new sheet labelled as ‘Pair Data’.

This is a glimpse of the situation:

The correlation between the closing prices of ICICI Bank and Axis bank is 0.51, which isn’t a particularly strong bond, but it’s bearable for the time being. Though our intuition suggested that these two organisations would be closely linked due to their business backgrounds, the figures don’t appear to agree.

We will now use the correl function to examine the correlation between the two stock’s daily % returns. I’ve already calculated this % return, so all that’s left is to run the correlation. 

This is definitely not uplifting news, however, it will suffice for the present.

Many traders compute the correlation on the absolute daily change, which is calculated as ‘Today’s stock price minus yesterday’s stock price’. Personally, I am not a great supporter of this idea. Nevertheless, I will be introducing it to you.

By running correlation calculations on Axis Bank and ICICI Bank, I obtained the same results that could be expected if I had tried to calculate the correlation between ICICI and Axis. Generally, the correlation between two sets of data remains constant, no matter which order they are shown in.

In this trading pairs approach, the correlation number should ideally be above 0.75. ICICI and Axis don’t meet that requirement, however we can manage despite that.

– Setting up the datasheet

In the preceding chapter, we talked over three variables associated with the pairs, the spread, differential, and ratios. Moving forward, we will compute these variables on the two stocks we are analysing. We’ll do this on a new sheet in the same workbook and entitle it ‘Data Sheet’. Here’s what it looks like –

In the prior chapter, I put forward a way to work out these variables easily.

This version of pair trading employs basic statistics. Thus, three commonly employed statistical variables shall be outlined.

– Basic stats

At this juncture, I’d like to cover three core statistical concepts. These are essential for understanding pair trading, and I’m confident you may have learned about them in your high school maths class. Anyway, they’re straightforward, so it’ll be easy to quickly become familiar with them.

I’ve generated a list of random scores achieved by batsmen in 10 cricket matches, which should help you comprehend these jargons more easily.

The mean, otherwise known as the arithmetic average, is the average of a group of numbers. To get this figure, you simply add up all the values and divide by their total amount.

To get the average in this example, I need to add up all the scores and divide it by 10 (the number of observations).

Mean (Average) = 626/10


It is easy to calculate the average of any set of numbers on Excel; all you need to do is use the ‘=Average()’ function.

The median is the middle data point or the average of the two middle data points, when the data is sorted in numerical order. Here, as we are dealing with an even number of points, the median is computed by taking the mean of the two central values.

Let me put the data points in their numerical order.

23, 34, 44, 51, 55, 65, 72, 82, 100, 100

Considering there are an even amount of observations, I will look at the middle two values – 55 and 65 – with their mean as the median.

Median = (55 + 65)/2


The excel function to calculate median is ‘=Median()’.

When viewed together, the mean and median can provide much insight into a trend. Further discussion on this will be addressed later.

The mode of a data series is the data point that shows up most frequently. In this example, 100 has two occurrences and no other number appears more than once, thus making it the mode.

To compute Mode, you can use the excel function ‘=Mode()’.

In the upcoming section, we will explore how to implement these function in excel and comprehend their significance for pair trading.

[wpcode id="28030"]