Python – Data Wrangling with Excel and Pandas

Data wrangling with Excel and Pandas is actually quite useful tool in the belt of any Excel professional, financial professional, data analyst or a developer. Really, everyonecan benefit from the well defined libraries that ease people’s lifes. These are the libraries used:

import pandas as pd              # Main data manipulation
from openpyxl import Workbook    # Excel writing
from openpyxl.styles import Font # Excel formatting (bold, colors)
import glob                      # File path handling
from datetime import datetime

Additionally, a function for making a unique Excel name is used:

def make_unique_name():
    timestamp = datetime.now().strftime('%Y%m%d_%H%M%S')
    return f'{timestamp}__report.xlsx'
An example of the video, where Jupyter Notebook is used.

In the YT video below, the following 8 points are discussed:

# Trick 1 – Simple reading of worksheet from Excel workbook

excel_file_name = "financial_data.xlsx"
df = pd.read_excel(excel_file_name,
                  sheet_name = "Transactions",
                  parse_dates = ["Date"],
                  dtype={"InvoiceID":str})

# Trick 2 – Combine Reports

income = pd.read_excel(excel_file_name, sheet_name="Income")
expenses = pd.read_excel(excel_file_name, sheet_name="Expenses")

combined = pd.concat([
    income.assign(From_Worksheet="Income"),
    expenses.assign(From_Worksheet="Expenses")
])

# Trick 3 – Fix Missing Values

combined["Amount"] = combined["Amount"].fillna(combined["Amount"].mean())

# Trick 4 – Formatting the exported Excel file

with pd.ExcelWriter(new_worksheet, engine="openpyxl") as writer:
    combined.to_excel(writer, index=False)
    
    workbook = writer.book
    worksheet=writer.sheets["Sheet1"]
    
    for cell in worksheet["1:1"]:
        cell.font = Font(bold=True)
        cell.font = Font(color="FFFF22")

# Trick 5 – Merging Excel Files

files = glob.glob("sales12/sales_*.xlsx")
annual_data = pd.concat([pd.read_excel(f) for f in files])

# Trick 6 – Smart Filtering

web_design_only = annual_data[
    (annual_data["Description"]=="Web Design"
    )]
small_transactions = annual_data[
    (annual_data["Amount"] < 200
    )]

# Trick 7 – Mergining Tables

df_transactions = pd.read_excel(
            excel_file_name,
            sheet_name="Transactions")
df_customers = pd.read_excel(
            excel_file_name,
            sheet_name="Customers")
merged = pd.merge(
    df_transactions,
    df_customers,
    on = "CustomerID"
    )

# Trick 8 – Export Dataframe to Excel

with pd.ExcelWriter(new_worksheet, engine="openpyxl") as writer:
    merged.to_excel(writer)

The whole code with the Excel files is available in GitHub here.

Python And Excel Data Wrangling

Enjoy it!