Python – Linear Regression – YouTube video

There is something interesting about linear regression. I have just noticed that I already wrote quite a few articles on it some years ago, but today I wanted to make a YouTube video as well. So what is different this time? I really hope I have become a bit better at explaining basic theory and plotting data.

A plot from the code below

If not, then no worries: the third article will probably come in another 4 years, if you follow the linear regression trend. Here is some minimal code from the video, in case you need to impress someone:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

import warnings
warnings.filterwarnings('ignore')

# Define five manual data points
x_values = np.array([1, 2, 3, 4, 5]).reshape(-1, 1)  # Reshape for sklearn
y_true = np.array([2.3, 2.9, 3.6, 4.1, 5.0])

# Define a manually set linear equation for prediction
manual_slope = 0.5
manual_intercept = 2.0
# Use the flattened x so the prediction is 1-D, with the same shape as y_true
y_pred_manual = manual_slope * x_values.ravel() + manual_intercept

# Create and fit the linear regression model
model = LinearRegression()
model.fit(x_values, y_true)

# Get the predicted values and coefficients from the linear regression model
y_pred_best_fit = model.predict(x_values)

# Calculate residuals and squared residuals for manual prediction
residuals_manual = y_true.flatten() - y_pred_manual.flatten()
squared_residuals_manual = residuals_manual ** 2

# Calculate residuals and squared residuals for best fit line
residuals_best_fit = y_true.flatten() - y_pred_best_fit
squared_residuals_best_fit = residuals_best_fit ** 2

# Plot the actual values, manual prediction line, and best fitting line
plt.figure(figsize=(12, 8))
plt.scatter(x_values, y_true, color='blue', label='Actual values (y_true)')
plt.plot(x_values, y_pred_manual, color='orange', linestyle='--', label='Manual Prediction Line')
plt.plot(x_values, y_pred_best_fit, color='red', label='Best Fitting Line', linewidth=2)

# Draw vertical lines for residuals and squares for squared residuals for both lines
ax = plt.gca()
for i in range(len(x_values)):
    # Convert to plain floats so matplotlib gets scalars, not one-element arrays
    x_i = float(np.ravel(x_values)[i])
    y_i = float(y_true[i])
    y_manual_i = float(np.ravel(y_pred_manual)[i])
    y_best_i = float(y_pred_best_fit[i])

    # Residuals for the manual prediction line
    plt.vlines(x_i, y_manual_i, y_i, color='green', linestyle='dotted')
    square_side_manual = np.abs(residuals_manual[i])
    ax.add_patch(plt.Rectangle((x_i - square_side_manual / 2, y_manual_i),
                               square_side_manual, square_side_manual, color='purple', alpha=0.3))
    plt.text(x_i + 0.2, (y_manual_i + y_i) / 2, f'{squared_residuals_manual[i]:.2f}',
             color='purple')

    # Residuals for the best fitting line
    plt.vlines(x_i, y_best_i, y_i, color='cyan', linestyle='dotted')
    square_side_best_fit = np.abs(residuals_best_fit[i])
    ax.add_patch(plt.Rectangle((x_i - square_side_best_fit / 2, y_best_i),
                               square_side_best_fit, square_side_best_fit, color='magenta', alpha=0.3))
    plt.text(x_i - 1.0, (y_best_i + y_i) / 2, f'{squared_residuals_best_fit[i]:.2f}',
             color='magenta')

# Adding labels and titles
plt.title('Comparison of Manual Prediction Line and Best Fitting Line with Residuals')
plt.xlabel('Independent Variable (X)')
plt.ylabel('Dependent Variable (y)')
plt.legend()

# Show plot
plt.grid(True)
plt.show()
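
The squared residuals drawn in the plot can also be compared as plain numbers. Here is a minimal sketch that sums them for both lines. It assumes `np.polyfit` as a stand-in for the sklearn model (both perform an ordinary least-squares fit of a straight line), and the names `sse_manual` and `sse_best_fit` are made up for this example:

```python
import numpy as np

# Same five points and manual line as in the plotting script
x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2.3, 2.9, 3.6, 4.1, 5.0])
manual_slope, manual_intercept = 0.5, 2.0

# Assumption: np.polyfit with degree 1 stands in for LinearRegression here
fit_slope, fit_intercept = np.polyfit(x, y_true, 1)

# Sum of squared residuals (SSE) for each line
sse_manual = float(np.sum((y_true - (manual_slope * x + manual_intercept)) ** 2))
sse_best_fit = float(np.sum((y_true - (fit_slope * x + fit_intercept)) ** 2))

print(f"SSE manual line:   {sse_manual:.3f}")
print(f"SSE best-fit line: {sse_best_fit:.3f}")
```

For these five points the totals come out at roughly 0.32 for the manual line and 0.03 for the fitted one. That is the expected direction, since least squares picks the slope and intercept that minimise exactly this sum.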

What does the plotting script do? It produces the plot shown at the top, drawing the regression line in two ways. The first uses a manually defined intercept and slope, and the second uses the line fitted by the sklearn library. Run the code below to see the slope and intercept of the fitted line:

# Define a manually set linear equation for prediction
manual_slope = 0.5
manual_intercept = 2.0

# Slope and intercept found by the fitted sklearn model
best_line_slope = model.coef_[0]
best_line_intercept = model.intercept_

print(best_line_slope, best_line_intercept)
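
If you are curious where those two numbers come from, here is a small sketch of the textbook closed-form solution for simple linear regression, written in plain Python without any libraries. This is not necessarily how sklearn computes the fit internally; the variable names (`s_xy`, `s_xx` and so on) are only for illustration:

```python
# Same five points as in the script above
x = [1, 2, 3, 4, 5]
y = [2.3, 2.9, 3.6, 4.1, 5.0]

n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean) ** 2)
s_xy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
s_xx = sum((xi - x_mean) ** 2 for xi in x)
slope = s_xy / s_xx

# The least-squares line always passes through the point (x_mean, y_mean)
intercept = y_mean - slope * x_mean

print(f"slope={slope:.2f}, intercept={intercept:.2f}")  # slope=0.66, intercept=1.60
```

The printed values should agree with `best_line_slope` and `best_line_intercept` up to floating-point rounding.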

The rest is in the YouTube video. I hope you enjoy it 🙂

Python - Linear Regression

The GitHub code is here:

https://github.com/Vitosh/Python_personal/blob/master/YouTube/016_Python-Linear-Regression/Linear-Regression.ipynb

🙂