MS Excel – Simple Linear Regression

Linear regression is something rather useful for forecasting analysis. If you want to understand more about linear regression, you may go to wikipedia, there you have a wonderful article about it.

Long story short, linear regression is an approach, modeling relationship between one dependent and one or more explanatory variable(s). With simple words, if we imagine that every year the petrol prices in Bulgaria are increasing (you really do not need a lot of imagination for this), you may make a linear regression analysis, trying to make a model of the increase. Thus, your dependent variable will be the price per liter and your explanatory variable will be the year. Thus, with some history, you may make a good forecast for the next year, based on the line of the linear regression.

In this article I will just show how to make a good forecasting with linear regression, using MS Excel. Actually, this article does not show something new or something unknown to the community of statisticians / forecasters / IT specialists, the article is used only to sum up my knowledge on the subject.

Many companies give simple or multiple linear regression examples in their assessments, so it is something useful and good to know if you are interested in the IT business.


Let’s imagine that you have opened a small lemonade stand. The stand is in a good place, but so far not a lot of customers know it. Anyway, you are a good guy and as such you are calculating every day your profit. At some time, you decide to buy a refridgirator for soft drinks, but you do not have enough money yet. So you decide to forecast, in order to have a clue when you will have enough money. You simply sit down and write down your profits per day for the last 8 weeks. Then you make an average of daily profit per week and you get the following table:

Table

Then you start doing linear analysis in order to calculate the profits for the next three weeks. Below is explained how you do it just with a chart.


1. Select the table at A1:B9.

2. Insert>Charts>Scatter>

Scatter

3.  Once you get the chart, you can edit it easily with the tabs Design, Layout, Format on the Chart Tools menu. There you may add legend, axes, gridlines and plenty of other stuff you would never think that you need.Chart Design

4. What do you actually need is hidden in the menu Format Trendline. In order to access it, select somewhere close to the blue dots on the chart and rightclick. Then on the menu select “Format Trendline”

TrendlineFormat

When you open the new menu, put a check on the last two options:

  • Display Equation on chart
  • Display R-squared value on chart

Once you are ready, you will see Equation and R-squared appearing on the chart. In the given example, the equation is y=3.1071x +35.5. This means that every period on average is expected to have an increase of 3.1, compared to the previous period. The 35.5 is representing the profit per day for week number 0. As we see from our table, the profit per day in that week is 35.0, which is pretty close, but is definitely not the same. So, if we want to forecast with linear regression the profit per day in week number 8, with the so created chart, we should simply calculate the following:

y=3.1071x + 35.5 or y = 3.1071*8 + 35.5
Thus, we will see that the expected profit for this day is 60.36 EUR. This can be done for any period, just by switching the 8 with the number of the period. Of course, the profit is not dependent only by the week number and this is very optimistic model, but this is the idea how this functions. Especially in the case of a lemonade stand, which is a good example for seasonal product, the forecasting should not be done like this for long periods. The idea of the article was only to show how the simple linear regression functions, not to propose a model for lemonade forecasting.

At the end I would like to mention something about the R squared, a.k.a. coefficient of determination. In wikipedia there is again a very good article about it, but if you want the story short, it just indicates how well data points fit a line or a curve. If in our example, all the dots were lying exactly on the line, we would have had a R squared of 1. Our result, 0.9477 shows quite good determination.


I really hope that you have enjoyed this article! It’s idea was to show you that linear regression is not something frightening and man should just read a little more to be able to use it. If you want to see the file with the screenshots, you may download it from here!

Tagged with: ,