Statsmodels is a statistics library for Python that makes statistical data processing and analysis easier. The library is available at statsmodels.org, where the official installation guide is kept up to date.

The live coding video covers the following points:

importing the libraries and loading the macrodata dataset into pandas
displaying all columns and all rows of the dataset
inserting an index into the data
plotting the data with matplotlib
getting the trend and the cycle of the data with hp_filter.hpfilter (https://www.statsmodels.org/dev/generated/statsmodels.tsa.filters.hp_filter.hpfilter.html)
presenting the data and zooming in to specific dates

Python - Statsmodels Example Matplotlib and Pandas


Importing the libraries and loading the macrodata dataset into pandas is the standard start:

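import pandas as pd
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Jupyter-only magic: render plots inline in the notebook
%matplotlib inline

# US macroeconomic data, quarterly from 1959Q1 to 2009Q3 (203 rows)
df = sm.datasets.macrodata.load_pandas().data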

Displaying all columns and all rows of the dataset takes one line per dimension: one option for the columns and one for the rows:

# None means "no limit": print every column and every row
pd.options.display.max_columns = None
pd.options.display.max_rows = None
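Setting pd.options changes the display for the rest of the session. If the full table is only needed once, pandas' pd.option_context can apply the same two options temporarily and restore the previous values on exit. A minimal sketch; the helper name show_all and the demo frame are hypothetical, for illustration only:

```python
import pandas as pd


def show_all(frame: pd.DataFrame) -> str:
    """Return the repr of `frame` with no row/column limits.

    `show_all` is a hypothetical helper: the limits are lifted only inside
    the `with` block, so global display settings stay untouched.
    """
    with pd.option_context("display.max_columns", None, "display.max_rows", None):
        return repr(frame)


# Hypothetical demo frame with more rows than the default display limit
frame = pd.DataFrame({"x": range(100)})
full = show_all(frame)      # every one of the 100 rows is printed
truncated = repr(frame)     # outside the context, the usual limits apply again
```

The design choice here is scoping: the notebook keeps its default, compact output for every other cell, and only this one call prints everything.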

Inserting an index into the data is a standard trick that can be done in different ways, e.g. from a start period and a length, or from a start and an end period:

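# Option 1: start period plus the number of observations
index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', length=203))

# Option 2: start and end period (also 203 quarterly dates)
index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))
df.index = index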
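If you would rather not depend on sm.tsa.datetools, plain pandas can build an index covering the same quarters. This is a minimal sketch using pd.period_range, offered as an alternative and not as the method used in the video. Both variants, by length and by from-to period, yield the same 203 quarters:

```python
import pandas as pd

# Variant 1: start quarter plus the number of observations
by_length = pd.period_range("1959Q1", periods=203, freq="Q")

# Variant 2: start and end quarter (both ends included)
by_range = pd.period_range("1959Q1", "2009Q3", freq="Q")

same = by_length.equals(by_range)  # True: identical quarterly periods

# to_timestamp() converts the periods to a DatetimeIndex
# (quarter start dates by default) if timestamps are preferred
quarter_starts = by_range.to_timestamp()
```

Either index has 203 entries, so it could be assigned with df.index = by_range. Note that the dates differ from the statsmodels helper's, since these are periods or quarter-start timestamps.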

Plotting the data with matplotlib is the interesting part. This is where pandas and Jupyter shine: a single df.plot() call draws the columns with matplotlib, and with %matplotlib inline the chart appears right below the cell.

df['realgdp'].plot()  # 'trend' and 'cycle' are only added by the HP filter step below
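The macrodata columns live on very different scales, and plotting several of them on one axis can flatten the smaller series. The sketch below gives each column its own panel via DataFrame.plot(subplots=True). It assumes a run outside Jupyter, hence the Agg backend and savefig. It also uses a small synthetic frame whose column names and output file name are hypothetical:

```python
from pathlib import Path

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; not needed inside Jupyter

import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical demo data: two quarterly series on very different scales
index = pd.period_range("1959Q1", periods=40, freq="Q").to_timestamp()
demo = pd.DataFrame(
    {
        "level": np.linspace(2700.0, 4000.0, 40),   # large-valued series
        "rate": np.sin(np.arange(40) / 4.0) + 5.0,  # small-valued series
    },
    index=index,
)

# One panel per column; pandas returns an array of matplotlib Axes
axes = demo.plot(subplots=True, figsize=(8, 5))
plt.tight_layout()

out = Path("demo_subplots.png")  # hypothetical output file
plt.savefig(out)
plt.close("all")
```

In a notebook, the first, savefig and close lines can be dropped. The chart is rendered inline instead.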

Getting the trend and the cycle of the dataset with hp_filter.hpfilter comes down to unpacking the tuple the filter returns, (cycle, trend) in that order, and storing both parts as new columns of the dataset:

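# hpfilter returns a (cycle, trend) tuple; lamb defaults to 1600, the usual value for quarterly data
cycle, trend = sm.tsa.filters.hpfilter(df["realgdp"])
# store both components as new columns next to the original data
df['trend'] = trend
df['cycle'] = cycle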
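Here is what the filter computes. The Hodrick-Prescott trend tau minimizes sum((y_t - tau_t)^2) + lambda * sum((tau_{t+1} - 2*tau_t + tau_{t-1})^2). The solution is tau = (I + lambda * K^T K)^(-1) y, where K is the (T-2) x T second-difference matrix. The cycle is y - tau. The NumPy sketch below solves that system with a dense matrix. It illustrates the textbook formula, not statsmodels' own code, and the function name hp_filter_dense is hypothetical. lambda = 1600 matches hpfilter's default:

```python
import numpy as np


def hp_filter_dense(y, lamb=1600.0):
    """Return (cycle, trend) for a 1-D series via the HP filter formula.

    Hypothetical helper: solves (I + lamb * K.T @ K) trend = y with a dense
    matrix, which is fine for a few hundred observations.
    """
    y = np.asarray(y, dtype=float)
    n = y.size
    # K is the (n-2) x n second-difference operator with rows [1, -2, 1]
    k = np.zeros((n - 2, n))
    for i in range(n - 2):
        k[i, i : i + 3] = [1.0, -2.0, 1.0]
    trend = np.linalg.solve(np.eye(n) + lamb * k.T @ k, y)
    cycle = y - trend
    return cycle, trend  # same order as statsmodels' hpfilter


t = np.arange(203, dtype=float)
linear = 3.0 + 0.5 * t
# A straight line has zero second differences, so it is its own trend
cyc_lin, trend_lin = hp_filter_dense(linear)

# A line plus a short oscillation: the trend keeps the line, the cycle the wiggle
noisy = linear + np.sin(t / 3.0)
cyc, trend = hp_filter_dense(noisy)
```

A larger lamb penalizes curvature more strongly and gives a smoother trend. lamb = 0 returns the data itself as the trend.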

Presenting the data and zooming in to specific dates is straightforward once you know pandas slicing: with a date index, partial date strings such as "1985" and "1999" select whole years.

df.loc["1985":"1999", ['trend', 'cycle']].plot()
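The zoom works because label-based slicing on a DatetimeIndex accepts partial date strings. Unlike positional list slicing, it also includes the end label. A small sketch on synthetic quarterly data (the frame and its column name value are hypothetical) shows that "1985":"1999" keeps all 60 quarters from 1985 through 1999:

```python
import pandas as pd

# Hypothetical quarterly frame with a DatetimeIndex (quarter start dates)
index = pd.period_range("1959Q1", "2009Q3", freq="Q").to_timestamp()
frame = pd.DataFrame({"value": range(len(index))}, index=index)

# Partial strings select whole years; both ends of the label slice are kept
zoomed = frame.loc["1985":"1999"]

first, last = zoomed.index[0], zoomed.index[-1]
# first -> 1985-01-01, last -> 1999-10-01, len(zoomed) -> 60
```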

The Jupyter notebook is available on GitHub.

