In 2013 and 2014 (wow, already 7 years ago!) I wrote two articles about linear regression with Excel. Now I am getting more and more interested in Python, so I guess it would be interesting to remake the article as a Python one. So, this is our input, the daily profit per week:
So, starting and loading the data looks like this:
import numpy as np
from sklearn.linear_model import LinearRegression
import pandas as pd
import matplotlib.pyplot as plt

x = np.array([x for x in range(0, 8)]).reshape(-1, 1)
y = np.array([35, 36, 43, 47, 50, 51, 52, 57])
The .reshape(-1, 1) is required, because scikit-learn expects x as a two-dimensional array, essentially a list of one-element lists (one row per observation).
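Just to illustrate the difference, a quick sketch of the flat array versus the reshaped one:

flat = np.array(range(0, 8))      # shape (8,)  - a plain 1-D array
column = flat.reshape(-1, 1)      # shape (8, 1) - one row per observation
print(flat.shape, column.shape)   # (8,) (8, 1)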
Now, starting with the model, the following 2 lines do the magic:
model = LinearRegression()
model.fit(x, y)
The model is now “fitted”. This means that a line is produced which “fits” the dots in such a way that the sum of the squared vertical distances between the dots and the line (the residuals) is as small as possible. This is how to produce the fitted line and the scattered points:
plt.scatter(x, y, color = 'black')
line = model.coef_*x + model.intercept_
plt.plot(x, line, 'r', label = f' y = {model.coef_} x + {model.intercept_}')
plt.legend(fontsize = 22)
plt.show()
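To make the “smallest squared distance” idea a bit more tangible, here is a small sketch that computes the sum of the squared residuals for the fitted line; this is the quantity the least-squares fit minimizes, so no other slope and intercept would give a smaller number:

residuals = y - (model.coef_ * x.ravel() + model.intercept_)   # vertical distances to the line
print((residuals ** 2).sum())                                  # the minimized quantity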
The more interesting part of linear regression is the “Prediction”. It is like asking: “If we only had that tiny red line from the plot above, where would we have put our values for a given period?” And the answer is actually quite simple: “On that red line!” This is how to do it. First, generate the predicted values:
y_pred = model.predict(x) |
They look like this:
array([35.5, 38.60714286, 41.71428571, 44.82142857, 47.92857143, 51.03571429, 54.14285714, 57.25]) |
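As a quick sanity check (just a sketch, nothing more), model.predict(x) returns exactly what the line equation gives us:

manual = model.coef_ * x.ravel() + model.intercept_   # y = slope * x + intercept
print(np.allclose(manual, y_pred))                    # True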
And they are quite different from the original values. How different? See for yourself:
fig = plt.figure()
ax1 = fig.add_subplot()
ax1.scatter(x, y, color = "black", label = "real data")
ax1.scatter(x, y_pred, color = "red", label = "prediction")
ax1.set_xlabel('periods', fontsize=10)
ax1.set_ylabel('money', fontsize='large')
plt.legend(loc='best')
plt.rcParams["figure.figsize"] = (10,5)
fig.suptitle('Real vs Predicted', fontsize=16)
plt.show()
Alternatively, we may use fewer lines to produce the same plot, without the add_subplot() part from the code above. But I guess it is less fun:
plt.scatter(x, y, color ='black', marker='x', label='real data')
plt.scatter(x, y_pred, c='red', marker='o', label='prediction')
plt.legend(loc='upper left')
plt.rcParams["figure.figsize"] = (10,5)
plt.show()
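And since pandas is already imported at the top (and otherwise unused), here is a small sketch that puts the real and predicted values side by side, together with their differences:

comparison = pd.DataFrame({'period': x.ravel(), 'real': y, 'predicted': y_pred})
comparison['difference'] = comparison['real'] - comparison['predicted']
print(comparison)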
And if we want to finish with something that makes the article a bit more statistical, here are these linear regression features:
- coefficient of determination (or r^2) – how much of the variation in y the line explains
- intercept – this is the a in the formula Y = a + bX, the value of y at x = 0
- slope – this is the b in the formula, i.e. how much y changes for every unit of x. If the slope is 7 and the intercept is 0, it means that for x = [1, 2, 3] we get y = [7, 14, 21].
r_sq = model.score(x,y)
print(f'coefficient of determination: {r_sq}')
print(f'intercept: {model.intercept_}')
print(f'slope: {model.coef_}')
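If you are curious where the coefficient of determination comes from, here is a hand-made sketch of the same number, using the usual formula r^2 = 1 - SS_res / SS_tot:

ss_res = ((y - y_pred) ** 2).sum()      # residual sum of squares
ss_tot = ((y - y.mean()) ** 2).sum()    # total sum of squares
print(1 - ss_res / ss_tot)              # same value as model.score(x, y)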
The code is available here. Enjoy!