Python – Linear Regression – YouTube video
There is something interesting about linear regression. I just noticed that I actually wrote quite a few articles on it some years ago, but today I wanted to make a YouTube video as well. So what is different this time? I really hope I have become a bit better at explaining basic theory and plotting data.

If not, then no worries – the third article will probably come in another 4 years, if you follow the linear regression. Here is some minimal code from the video, in case you need to impress someone:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
import warnings
warnings.filterwarnings('ignore')
# Define five manual data points
x_values = np.array([1, 2, 3, 4, 5]).reshape(-1, 1) # Reshape for sklearn
y_true = np.array([2.3, 2.9, 3.6, 4.1, 5.0])
# Define a manually set linear equation for prediction
manual_slope = 0.5
manual_intercept = 2.0
y_pred_manual = manual_slope * x_values + manual_intercept
# Create and fit the linear regression model
model = LinearRegression()
model.fit(x_values, y_true)
# Get the predicted values and coefficients from the linear regression model
y_pred_best_fit = model.predict(x_values)
# Calculate residuals and squared residuals for manual prediction
residuals_manual = y_true.flatten() - y_pred_manual.flatten()
squared_residuals_manual = residuals_manual ** 2
# Calculate residuals and squared residuals for best fit line
residuals_best_fit = y_true.flatten() - y_pred_best_fit
squared_residuals_best_fit = residuals_best_fit ** 2
# Plot the actual values, manual prediction line, and best fitting line
plt.figure(figsize=(12, 8))
plt.scatter(x_values, y_true, color='blue', label='Actual values (y_true)')
plt.plot(x_values, y_pred_manual, color='orange', linestyle='--', label='Manual Prediction Line')
plt.plot(x_values, y_pred_best_fit, color='red', label='Best Fitting Line', linewidth=2)
# Draw vertical lines for residuals and squares for squared residuals for both lines
for i in range(len(x_values)):
    # Residuals for the manual prediction line
    plt.vlines(x_values[i], y_pred_manual[i], y_true[i], color='green', linestyle='dotted')
    square_side_manual = np.abs(residuals_manual[i])
    plt.gca().add_patch(plt.Rectangle((x_values[i] - square_side_manual / 2, y_pred_manual[i]),
                                      square_side_manual, square_side_manual, color='purple', alpha=0.3))
    plt.text(x_values[i] + 0.2, (y_pred_manual[i] + y_true[i]) / 2, f'{squared_residuals_manual[i]:.2f}',
             color='purple')
    # Residuals for the best fitting line
    plt.vlines(x_values[i], y_pred_best_fit[i], y_true[i], color='cyan', linestyle='dotted')
    square_side_best_fit = np.abs(residuals_best_fit[i])
    plt.gca().add_patch(plt.Rectangle((x_values[i] - square_side_best_fit / 2, y_pred_best_fit[i]),
                                      square_side_best_fit, square_side_best_fit, color='magenta', alpha=0.3))
    plt.text(x_values[i] - 1.0, (y_pred_best_fit[i] + y_true[i]) / 2, f'{squared_residuals_best_fit[i]:.2f}',
             color='magenta')
# Adding labels and titles
plt.title('Comparison of Manual Prediction Line and Best Fitting Line with Residuals')
plt.xlabel('Independent Variable (X)')
plt.ylabel('Dependent Variable (y)')
plt.legend()
# Show plot
plt.grid(True)
plt.show()
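What does the code do? It plots the picture above the code, with two regression lines drawn through the same five points. The first line uses a manually defined intercept and slope, and the second uses the intercept and slope fitted by the sklearn library. Each dotted vertical line is a residual, and each shaded square shows the corresponding squared residual. Run the code below to print the fitted slope and intercept and compare them with the manual ones: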
# Define a manually set linear equation for prediction
manual_slope = 0.5
manual_intercept = 2.0
# Read the fitted slope and intercept from the sklearn model
best_line_slope = model.coef_[0]
best_line_intercept = model.intercept_
print(best_line_slope, best_line_intercept)
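If you want to put a number on "which line is better", here is a minimal sketch that sums the squared residuals for both lines. It is not the code from the video: it uses only NumPy and the textbook closed-form least-squares formulas for a single feature instead of sklearn's `LinearRegression`, and the helper name `sum_squared_residuals` is my own assumption for illustration.

```python
import numpy as np

# Same five data points as in the code above
x = np.array([1, 2, 3, 4, 5], dtype=float)
y_true = np.array([2.3, 2.9, 3.6, 4.1, 5.0])


def sum_squared_residuals(slope, intercept):
    """Sum of squared vertical distances between the points and a line (hypothetical helper)."""
    residuals = y_true - (slope * x + intercept)
    return float(np.sum(residuals ** 2))


# Closed-form least-squares estimates for one feature:
# slope = sum((x - x_mean) * (y - y_mean)) / sum((x - x_mean)^2)
# intercept = y_mean - slope * x_mean
x_mean, y_mean = x.mean(), y_true.mean()
fit_slope = np.sum((x - x_mean) * (y_true - y_mean)) / np.sum((x - x_mean) ** 2)
fit_intercept = y_mean - fit_slope * x_mean

# Manual line from the post: slope 0.5, intercept 2.0
sse_manual = sum_squared_residuals(0.5, 2.0)
sse_fit = sum_squared_residuals(fit_slope, fit_intercept)

print(f"fitted slope={fit_slope:.2f}, intercept={fit_intercept:.2f}")
print(f"SSE manual={sse_manual:.3f}, SSE fitted={sse_fit:.3f}")
```

On these five points the fitted line should come out with a noticeably smaller sum of squared residuals than the manual one, which is the same thing the shaded squares in the plot show visually: the total area of the squares is what the fitted line minimizes.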
The rest is in the YouTube video. I hope you enjoy it 🙂
The GitHub code is here:
🙂