Introduction to Linear Regression
In the previous chapter, we established the fundamentals of a straight line equation, using a simple example to demonstrate how two variables can be connected. To further illustrate this idea, we presented a table of two arrays of numbers and challenged the readers to discern if there was any relationship between them and if yes, how could one put it into an equation form. The additional task was finding out what the intercept and constant should be.
In this chapter, we will look into forming a link and progress to relative value trading. For ease, I’ll post the table with two sets of numbers again –
A casual perusal of the two sets of numbers won’t give a clue as to how they might be related. Unless you’re some kind of superhuman, it’s not an effective way to discern the connection.
In this situation, we turn to the ‘Linear Regression’ method. This statistical operation takes an array of two sets of numbers as input and returns various parameters, like the intercept and constant required for forming a straight line equation.
We will utilize Excel to perform the linear regression procedure. Here is the comprehensive guide for performing a standard linear regression on two data arrays. Be ready for plenty of screenshots and instructions
Step 1 – Install the Plugin
Create a new Excel spreadsheet, utilizing the values of X and Y from the table. Refer to the example below –
This data set is ours to use. Recall that Y, the ‘Dependent’ variable, takes its value based on X, the independent one. Both of them will be our input variables for the linear regression operation.
To access the Data ribbon, click on the highlighted tab in red on the Excel sheet. The tab can be seen below.
The data ribbon should display the ‘Data Analysis’ option, highlighted in blue. If you do not see it, don’t worry – just follow these instructions.
To access the file menu, select ‘File’ on the top bar.
By clicking on the new window, you will notice a selection to choose from in the left-hand side panel – ‘Option’.
To add the ‘Data Analysis’ option to the data ribbon, you need to click on the ‘Options’ tab. Navigate to the left-hand side panel and select ‘Add-Ins’, then press ‘Go’ and finally hit ‘Ok’. That’s it!
Shut your Excel document and restart your system, and you should be ready to go.
Step 2 – Enter the values
Assuming your excel sheet is equipped with the data analysis pack, the next step is to invoke the linear regression function. Click ‘Data’ in the ribbon and choose Data Analysis. This will open a pop-up presenting a range of statistical operations you can conduct on data sets; select ‘Regression’.
When you choose regression and hit OK, a pop-up will appear
You will notice a variety of fields in this area. Take particular notice of the ‘Input Y Range’ and ‘Input X Range’ in the initial section. These are for the dependent and independent variables, respectively.
To feed in the X and Y series data, click on the input channel and select the corresponding ranges.
I have checked the label box, indicating that cells A2 and B2 both have labels – X and Y, respectively.
For the time being, it would be best to overlook the extra inputs.
Check that you have gone through the steps mentioned below –
When you select ‘New worksheet’, the output data will show up on a different sheet. Additionally, please ensure Residuals and Standardised Residuals are both ticked; I’ll provide more details about them at a later point.
This will enable you to do the linear regression operation. Simply hit the ‘Ok’ button located in the top-right corner.
Excel will accept these inputs and complete the linear regression, then generate a new sheet within the workbook with the results.
Linear Regression Output
This is how the linear regression output looks; as expected, a summary of the result has been provided in a new sheet.
The summary output is intimidating at first glance, since it contains such a huge amount of information. We’ll break it up gradually as we go along.
Right now, let us focus on discovering our incline and intercept. I have highlighted this for you in the image underneath –
The data points coloured red contain the coefficients we seek: the intercept (or constant) and the slope (denoted by x).
You may be confused by the use of x to denote the slope, rather than M as it would typically go with a straight-line equation. I understand that this may be misleading, but unfortunately we’ll have to use x for our purposes.
So,
Slope of the equation = 1.885
Intercept (or constant) = -7.859813.
Given this, the straight-line equation for the arbitrary set of data is –
y = 1.885*x + (-7.859813) or
y = 1.885*x – 7.859813
In other words, what is the implication of this?
Recall from the preceding chapter, this equation enables us to ascertain the value of y or the dependent variable for a given x. For ease, I will post the table here again –
I’ve included a 15 to the x data, and now we can utilise the slope and intercept to estimate y. Let’s get on with it!
y = 1.885 * 15 – 7.859813
= 28.275 – 7.859813
= 20.415
Considering x is 15, the most probable value of y should be around 20.415.
You might be wondering how reliable this projection is.
That is simply what one might expect. To illustrate, if x had a value of 18 (as mentioned in the preceding data point), then according to the straight line equation, y should be -.
y = 1.885*18 – 7.859813
= 33.93 – 7.859813
= 26.07019
However, the actual value of y is 22.
This leads us two values of y –
The straight line equation predicts the value of y.
Actual value of y
The residual for y, when x = 18, is the difference between the actual and predicted y.
26.07019 – 22
= 4.070187
The summary output when linear regression is carried out includes the residuals, as shown in the snapshot below –
The residual when x equals 18 is discernible, as calculated previously.
By signing up, you agree to receive transactional messages on WhatsApp. You may also receive a call from a BP Wealth representative to help you with the account opening process
Disclosures and Disclaimer: Investment in securities markets are subject to market risks; please read all the related documents carefully before investing. The securities quoted are exemplary and are not recommendatory. Past performance is not indicative of future results. Details provided in the above newsletter are for educational purposes and should not be construed as investment advice by BP Equities Pvt. Ltd. Investors should consult their investment advisor before making any investment decision. BP Equities Pvt Ltd – SEBI Regn No: INZ000176539 (BSE/NSE), IN-DP-CDSL-183-2002 (CDSL), INH000000974 (Research Analyst), CIN: U45200MH1994PTC081564. Please ensure you carefully read the Risk Disclosure Document as prescribed by SEBI | ICF
Attention Investors
Issued in the interest of Investors
Communications: When You use the Website or send emails or other data, information or communication to us, You agree and understand that You are communicating with Us through electronic records and You consent to receive communications via electronic records from Us periodically and as and when required. We may communicate with you by email or by such other mode of communication, electronic or otherwise.
Investor Alert:
BP Equities Pvt Ltd – SEBI Regn No: INZ000176539 (BSE/NSE), INZ000030431 (MCX/NCDEX), IN-DP-CDSL-183-2002 (CDSL),
INH000000974 (Research Analyst) CIN: U45200MH1994PTC081564
BP Comtrade Pvt Ltd – SEBI Regn No: INZ000030431 CIN: U45200MH1994PTC081564
For complaints, send email on investor@bpwealth.com