The Error Ratio
– Who is X and who is Y?
The last chapter should have given you an overview of linear regression and how to apply it to two sets of data using MS Excel. Recall that this involves two variables: X and Y.
X could be considered the independent variable while Y serves as the dependent variable. If you’ve put some thought into this, then undoubtedly you can surmise X and Y will be two distinct stocks.
Let us take two stocks, say HDFC Bank and ICICI Bank, run a linear regression on them, and examine the results.
Let’s begin by setting ICICI Bank as X and HDFC Bank as Y. Before we go further, a few points about the data are worth noting.
Be sure your information is precise – take into account any corporate actions, such as stock splits or bonuses.
Verify that the data corresponds to the exact dates – for example, I have compiled stock records between 4th December 2015 and 4th December 2017.
This is the data breakdown.
I’m going to apply a linear regression to these stocks, following the instructions I provided in the prior chapter. Please note that I’m running the analysis on stock prices, not stock returns.
The outcome of the linear regression is this:
The equation between ICICI and HDFC is clear: ICICI is independent whereas HDFC is dependent.
HDFC = Price of ICICI * 7.613 – 663.677
I am assuming you are familiar with this equation; if not, I recommend rereading the previous two chapters. As a quick summary, the equation tries to predict the price of HDFC from the price of ICICI.
In other words, we are expressing the price of HDFC in terms of the price of ICICI.
Let us turn this around, making ICICI dependent and HDFC independent.
This is what the outcomes are –
The equation is –
ICICI = HDFC * 0.09 + 142.4677
So for these two stocks, you can run the regression in two different ways, simply by switching which stock is the dependent variable and which is the independent variable.
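To make the two orderings concrete, here is a minimal sketch that fits a line both ways. The helper `fit_line` and the price lists `stock_a` and `stock_b` are invented for illustration; they are not the ICICI and HDFC series used in this chapter, and the chapter itself does this step in Excel.

```python
from statistics import fmean

def fit_line(x, y):
    """Least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar

# Made-up closing prices for two hypothetical stocks (not real market data).
stock_a = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0]
stock_b = [1240.0, 1275.0, 1320.0, 1290.0, 1360.0, 1385.0]

slope_ab, intercept_ab = fit_line(stock_a, stock_b)  # stock_a is X, stock_b is Y
slope_ba, intercept_ba = fit_line(stock_b, stock_a)  # roles swapped

print(f"stock_b = stock_a * {slope_ab:.3f} + ({intercept_ab:.3f})")
print(f"stock_a = stock_b * {slope_ba:.3f} + ({intercept_ba:.3f})")
```

Note that the second equation is not just the first one rearranged. Unless the two series are perfectly correlated, each ordering gives its own slope and intercept, which is why the choice of X and Y matters.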
The dilemma, then, is determining which variable should be noted as dependent and which as independent. Additionally, one must consider what sequence best fits the scenario.
The answer to this depends on three things –
Standard Error
Standard Error of the intercept
The ratio of the above two variables.
The equation above illustrates how the price of ICICI varies with HDFC. This explanation or description will never be completely accurate, otherwise there would be no elements of chance.
The equation should be robust enough to explain the fluctuation of the dependent variable as determined by the independent variable; this is the optimal condition.
We then come to the issue of determining the strength of the linear regression equation. The ratio is relevant here.
To understand the ratio of Standard Error of Intercept and Standard Error, we must grasp the basics of both the numerator and denominator.
– Back to residuals
Recall the linear regression equation with ICICI as the independent variable and HDFC as the dependent variable:
HDFC = Price of ICICI * 7.613 – 663.677
The equation gives us an idea of what the price of HDFC should be for a given price of ICICI, but the actual price of HDFC will usually differ from this prediction. The gap between the predicted price and the actual price is called the ‘Residual’.
Here are the residuals we get when we try to explain the price of HDFC with ICICI as the independent variable.
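The residual calculation can be sketched in a few lines. The slope and intercept are the ones from the equation in this chapter; the three days of prices in `icici` and `hdfc` are hypothetical values chosen only to show the arithmetic, and the function name `residuals` is my own.

```python
def residuals(x_prices, y_prices, slope, intercept):
    """Actual Y price minus the Y price predicted from X, day by day."""
    return [y - (slope * x + intercept) for x, y in zip(x_prices, y_prices)]

# Slope and intercept from the chapter's equation (X = ICICI, Y = HDFC).
SLOPE, INTERCEPT = 7.613, -663.677

# Hypothetical closing prices for three days, for illustration only.
icici = [300.0, 310.0, 295.0]
hdfc = [1650.0, 1680.0, 1600.0]

for day, res in enumerate(residuals(icici, hdfc, SLOPE, INTERCEPT), start=1):
    print(f"day {day}: residual = {res:.3f}")
```

A positive residual means HDFC closed above the price implied by ICICI on that day, and a negative residual means it closed below.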
When I discuss the regression equation and the residuals, I’m often asked the same question – why should we trust an equation if there are residuals each time? In other words, how can we put our faith in a formula that fails to give reliable predictions?
This is certainly a legitimate query. Examining the residuals, we can see that their values range from -288 to 548; as such, using this equation to calculate any kind of price forecast is unrealistic.
But then, this was never really about forecasting the price of the dependent stock from the price of the independent stock. It was always about the residuals!
I’ll give you the heads-up: The residuals show a certain behaviour. If we can find out what it is and identify the pattern within it, then we can construct a pair trade by reversing it – that is, buying and selling the two stocks at the same time.
In the upcoming sections, we will go into further detail. For now, let’s discuss ‘Standard Error’, the denominator in the ratio of Standard Error of Intercept to Standard Error.
The standard error is one of the metrics reported when you run a linear regression. Here is a visual representation –
The standard error is simply the standard deviation of the residuals. This can be determined by taking the time series array of the residuals and calculating its standard deviation.
Let me compute the standard deviation of the residuals manually for X = ICICI and Y = HDFC.
Excel reports the standard deviation as 152.665, while the summary output reflects a slightly different number of 152.819; however, this minor discrepancy can be ignored.
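The small gap between the two numbers most likely comes from the divisor. A sample standard deviation divides the squared residuals by n - 1, whereas the standard error of a regression with one X variable divides by n - 2. The sketch below shows both on made-up prices; `residual_spread` is a hypothetical helper, not an Excel function.

```python
import math
from statistics import fmean, stdev

def residual_spread(x, y):
    """Return (sample std dev of residuals, regression standard error)."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    slope = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
             / sum((xi - x_bar) ** 2 for xi in x))
    intercept = y_bar - slope * x_bar
    res = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
    sample_sd = stdev(res)  # divides the squared deviations by n - 1
    regression_se = math.sqrt(sum(r * r for r in res) / (n - 2))  # divides by n - 2
    return sample_sd, regression_se

# Made-up prices, not real market data.
x = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0]
y = [1240.0, 1275.0, 1320.0, 1290.0, 1360.0, 1385.0]

sd, se = residual_spread(x, y)
print(f"std dev of residuals = {sd:.3f}, standard error = {se:.3f}")
```

The two values differ by a factor of sqrt((n - 1) / (n - 2)). With roughly 500 daily observations that factor is about 1.001, which is in line with the difference between 152.665 and 152.819.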
The ‘Standard Error of the Intercept’ is a little trickier. It is also included in the regression report; the value shown here is for X = ICICI and Y = HDFC.
Recall the regression equation:
y = M*x + C
where,
M = Slope
C = Intercept
Both M and C here are estimates: the regression algorithm derives them from historical data. Since this data can contain noise and outliers, the estimates themselves can be off.
The Standard Error of the Intercept measures how much the estimated intercept could vary. In that sense, it is similar to the ‘Standard Error’ itself. To sum up –
Standard Error of Intercept – The variability of the intercept estimate
Standard Error – The variability of the residuals (their standard deviation)
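For readers who want to see where the reported number comes from, here is a minimal sketch using the standard textbook formula for ordinary least squares: the standard error multiplied by sqrt(1/n + mean(x)^2 / Sxx), where Sxx is the sum of squared deviations of X from its mean. I am assuming this is what the regression report shows; the function name `regression_errors` and the prices are invented for illustration.

```python
import math
from statistics import fmean

def regression_errors(x, y):
    """Return (standard error, standard error of the intercept) for y = M*x + C."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    intercept = y_bar - slope * x_bar
    # Sum of squared residuals around the fitted line.
    ssr = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    standard_error = math.sqrt(ssr / (n - 2))
    intercept_se = standard_error * math.sqrt(1 / n + x_bar ** 2 / sxx)
    return standard_error, intercept_se

# Made-up prices, not real market data.
x = [250.0, 255.0, 260.0, 258.0, 265.0, 270.0]
y = [1240.0, 1275.0, 1320.0, 1290.0, 1360.0, 1385.0]

std_err, std_err_intercept = regression_errors(x, y)
print(f"standard error = {std_err:.3f}")
print(f"standard error of intercept = {std_err_intercept:.3f}")
```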
Having defined both variables, the ‘Error Ratio’ should be considered. This is a term I have devised for ease of understanding.
The error ratio is defined as –
Error Ratio = Standard Error of Intercept / Standard Error
It is simply the standard error of the intercept divided by the standard error, and it gives an indication of how precise the estimated equation is.
I have calculated the error ratio for both combinations –
ICICI as X and HDFC as Y = 0.401
HDFC as X and ICICI as Y = 0.227
The decision on which stock is X and which is Y depends on the error ratio. The combination with HDFC as X and ICICI as Y has the lower error ratio (0.227 against 0.401), so we will make HDFC the independent variable (X) and ICICI the dependent variable (Y).
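The selection rule can be written down as a short sketch. The two ratios are the ones quoted in this chapter; the function names `error_ratio` and `pick_ordering` and the dictionary labels are my own, introduced only for illustration.

```python
def error_ratio(intercept_se, standard_error):
    """Error Ratio = Standard Error of Intercept / Standard Error."""
    return intercept_se / standard_error

def pick_ordering(ratios):
    """Return the label of the X/Y combination with the lowest error ratio."""
    return min(ratios, key=ratios.get)

# Error ratios quoted in the text for the two combinations.
ratios = {
    "X = ICICI, Y = HDFC": 0.401,
    "X = HDFC, Y = ICICI": 0.227,
}

best = pick_ordering(ratios)
print(f"Use {best} (error ratio {ratios[best]})")
```

In practice you would feed `error_ratio` the two values from each regression report and compare the results in the same way.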
For now, remember to calculate the error ratio and use it to decide which stock is the dependent variable and which is the independent variable.