Normal Distribution and Descriptive Statistics in Pair Trading Analysis

Revisiting the Normal Distribution
The general theory of the normal distribution that you should know –
  • Approximately 68% of the data lies within 1 standard deviation of the mean.
  • Approximately 95% of the data lies within 2 standard deviations of the mean.
  • Almost all of the data (99.7%) lies within 3 standard deviations of the mean.
Data can also be distributed in other ways, for example as a uniform, binomial or exponential distribution.

Descriptive Statistics
In the preceding chapter, we discussed basic statistical measures such as Mean, Median and Mode. Now we will apply the same computations to our pair data: the differential, spread and ratio. To do that, we'll use a few Excel functions. Please note that this chapter continues with the Excel sheet from the previous chapter; you can download the updated version from the link at the end of this chapter.

The Excel functions are as follows –
  • Mean – ‘=average()’
  • Median – ‘=median()’
  • Mode – ‘=mode.mult()’

The results are as follows –
[Table: Mean, Median and Mode of the Differential, Ratio and Spread]
The figures carried over from the previous chapter are the correlation calculations.

With the data set up, we need to introduce one more measure: the standard deviation. Standard deviation quantifies how spread out the data is around its average. By the standard definition, it is a measure used in statistics “to capture the amount of variation or dispersion of a set of data values”. In layman's terms, it tells us how far the data points typically lie from their average.

To put this into context with our pair data, recall the differential data we computed earlier. There are 496 data points in all, and earlier in this chapter we computed their average as 228.52. Why is it important to know how far the data points lie from their average? Because without a sense of the data's variability, it is difficult to make a valid assessment of how it behaves.
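For readers who prefer code to spreadsheets, the Excel functions above have close counterparts in Python's standard-library `statistics` module. This is a minimal sketch under stated assumptions: the list of values is made up for illustration and is not the article's differential data, and `statistics.multimode` assumes Python 3.8 or later. `pstdev` is the population standard deviation, the same quantity Excel's `=STDEV.P()` returns.

```python
import statistics

# Hypothetical sample values standing in for the differential series;
# these are NOT the 496 data points from the article's Excel sheet.
differential = [220.5, 231.0, 228.0, 240.5, 228.0, 215.0, 236.0]

mean = statistics.mean(differential)        # counterpart of =AVERAGE()
median = statistics.median(differential)    # counterpart of =MEDIAN()
modes = statistics.multimode(differential)  # counterpart of =MODE.MULT(): all most-frequent values
sd = statistics.pstdev(differential)        # counterpart of =STDEV.P(): population standard deviation

print(f"mean={mean:.2f} median={median:.2f} modes={modes} sd={sd:.3f}")
```

One behavioural difference worth knowing: if no value repeats, Excel's `MODE.MULT` returns `#N/A`, whereas `statistics.multimode` returns every value in the list.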
For example, when the 498th data point arrives, we can evaluate it only if we know how far it is from the mean, that is, within which range of the data it falls. This, in fact, forms the crux of pair trading, and standard deviation is what lets us quantify that variation.

I believe standard deviation is a sufficient measure; however, some traders prefer to work with another metric known as ‘Absolute Deviation’. Both statistics indicate how variable the data is, but they approach it in different ways. I found a clear and concise explanation on Investopedia of the difference between standard deviation and absolute deviation, so I'm sharing it here –

When measuring variability within a set of data, two of the most common methods are standard deviation and average deviation. While the calculations for each are similar, one fundamental difference between them is how they are interpreted, which makes it essential for finance professionals such as accountants, investors and economists to know both. Determining range and volatility is especially important in their respective fields. Standard deviation is a widely used measure of variability, especially for gauging the volatility of stock markets or other investments. It is calculated by first finding the variance: subtract the mean from each data point, square the differences, then sum them and take the average. Variance illustrates variability, and the greater the variance, the more widely the underlying data is dispersed. Standard deviation takes this one step further by taking the square root of the variance, which expresses the result in the original unit of measurement and makes it much easier to interpret and use in further calculations. Average deviation, also known as mean absolute deviation, is another measure of variability.
It avoids the problem of negative differences between the data and the mean by using absolute values instead of squares. To calculate it, subtract the mean from each value, take the absolute value of each difference, sum them, and then take the average. It is used less often than standard deviation because absolute values make further calculations more complex.

We will calculate both the standard deviation and the absolute deviation for all three data series: differential, ratio and spread. I'm also transposing the table, so Mean, Median and Mode now run down the rows (the Y-axis) and Differential, Ratio and Spread run across the columns (the X-axis). As a result, the table will look slightly different from the images shared earlier; apologies for any confusion.

The Excel functions for these values are –
  • Standard Deviation – ‘=Stdev.p()’
  • Absolute Deviation – ‘=avedev()’

Together, Mean, Median, Mode, Standard Deviation and Absolute Deviation are known as basic descriptive statistics.

The Standard Deviation Table
The standard deviation gives us a measure of the variation in the data. We will now quantify this further to understand where individual values sit relative to the mean, which lets us judge whether a value is notably above or below it. For example, if the 498th differential value is 275, we can easily work out how far that value is from the average and then decide whether to buy or short the pair. We will address the finer details later.

To quantify this, we construct a standard deviation table, in which we determine the values 1, 2 and 3 standard deviations above and below the mean for the spread, differential and ratio. Let us consider the spread as an example. We know the mean is 0.064 and the standard deviation (SD) is 8.075.
The values above the mean are –
  • +1 SD: 0.064 + 8.075 = 8.139
  • +2 SD: 0.064 + (2 × 8.075) = 16.214
  • +3 SD: 0.064 + (3 × 8.075) = 24.289

We can use the same approach to determine the values below the mean –
  • -1 SD: 0.064 - 8.075 = -8.011
  • -2 SD: 0.064 - (2 × 8.075) = -16.086
  • -3 SD: 0.064 - (3 × 8.075) = -24.161

Doing the same for the differential, we can immediately see that the 498th differential value sits at around +2 standard deviations, which gives us about 95% confidence that subsequent data points will be no higher than 315.

At this point, we have collected nearly all the information needed to analyse the pair and potentially spot a trading opportunity. We will do just that in the next chapter, which I'll begin with a brief summary of our discussion so far to make sure everyone is up to date.
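The standard deviation table boils down to computing mean ± k × SD for k = 1, 2 and 3. Below is a minimal Python sketch, assuming the spread's mean (0.064) and SD (8.075) quoted above. The function names `sd_table` and `z_score` and the sample observation 16.5 are hypothetical, chosen only for illustration. It also computes a z-score (how many standard deviations a value sits from the mean), a standard measure this chapter does not use by name but which expresses the same "distance from the mean in SD units".

```python
# Spread statistics quoted in the text; everything else here is illustrative.
SPREAD_MEAN = 0.064
SPREAD_SD = 8.075


def sd_table(mean, sd, levels=(1, 2, 3)):
    """Return {k: (mean - k*sd, mean + k*sd)} for each SD level k."""
    return {k: (mean - k * sd, mean + k * sd) for k in levels}


def z_score(value, mean, sd):
    """How many standard deviations `value` lies above (+) or below (-) the mean."""
    return (value - mean) / sd


if __name__ == "__main__":
    # Print the lower and upper bound for each SD level.
    for k, (lower, upper) in sd_table(SPREAD_MEAN, SPREAD_SD).items():
        print(f"±{k} SD: {lower:.3f} to {upper:.3f}")

    # Hypothetical new spread observation, used only to show the z-score.
    new_spread = 16.5
    print(f"z-score of {new_spread}: {z_score(new_spread, SPREAD_MEAN, SPREAD_SD):.2f}")
```

The printed bounds match the hand calculation above (for example, -16.086 to 16.214 at ±2 SD). Placing the 498th differential value of 275 the same way would require the differential's own mean and SD; its SD is not stated in this chapter, so that case is left out here.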