Statsmodels is a Python statistics library that simplifies data processing and analysis. It is available at statsmodels.org, where the official installation guide is kept up to date.
The live coding video covers the following points:
- importing libraries and loading the macrodata dataset into pandas
- displaying all columns and all rows of the dataset
- adding a date index to the data
- plotting the data with matplotlib
- getting the trend and the cycle of the data with hp_filter.hpfilter (https://www.statsmodels.org/dev/generated/statsmodels.tsa.filters.hp_filter.hpfilter.html)
- presenting the data, zooming to specific dates
Importing the libraries and loading the macrodata dataset into pandas is the standard start:
    import pandas as pd
    import matplotlib.pyplot as plt
    import statsmodels.api as sm
    # Jupyter magic: render plots inline in the notebook
    %matplotlib inline
    df = sm.datasets.macrodata.load_pandas().data
Displaying all columns and all rows of the dataset takes just two one-line settings, one for the columns and one for the rows:
    pd.options.display.max_columns = None
    pd.options.display.max_rows = None
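Setting these options changes the display for the rest of the session. As a minimal sketch of an alternative, pandas also offers `pd.option_context`, which lifts the limits only inside a `with` block. The `small_df` frame below is a hypothetical stand-in, not the macrodata dataset.

```python
import pandas as pd

# Remember the current limit so we can confirm it is restored afterwards.
rows_before = pd.get_option("display.max_rows")

# Hypothetical stand-in for the dataset: 100 rows, one column.
small_df = pd.DataFrame({"x": range(100)})

with pd.option_context("display.max_rows", None, "display.max_columns", None):
    rows_inside = pd.get_option("display.max_rows")  # None means "no limit"
    full_text = repr(small_df)  # every row is rendered inside the block

# Outside the block the previous setting is back in place.
rows_after = pd.get_option("display.max_rows")
```

This is handy when only one cell of a notebook should print the full table.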
Adding the index to the data is a standard step that can be done in different ways, e.g. by giving a start period and a length, or a start period and an end period:
    # Option 1: start period plus the number of quarters
    index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', length=203))
    # Option 2: start period and end period
    index = pd.Index(sm.tsa.datetools.dates_from_range('1959Q1', '2009Q3'))
    df.index = index
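The two variants describe the same range because there are exactly 203 quarters from 1959Q1 through 2009Q3. A small sketch to check that count, using pandas' `period_range` instead of the statsmodels helper (the variable names are illustrative only):

```python
import pandas as pd

# All quarterly periods from 1959Q1 to 2009Q3, both ends included.
quarters = pd.period_range("1959Q1", "2009Q3", freq="Q")
n_quarters = len(quarters)

# The same count by hand: 50 full years of 4 quarters,
# plus the step from Q1 to Q3, plus 1 to include both endpoints.
by_hand = (2009 - 1959) * 4 + (3 - 1) + 1
```

Both `n_quarters` and `by_hand` should match the `length=203` argument used above.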
Plotting the data with matplotlib is the interesting part. In general, this is why pandas and Jupyter are powerful: displaying the data is fast, and a single df.plot() call is enough to get started.
    # The 'trend' and 'cycle' columns are created in the hpfilter step; run that step first
    df[['trend', 'cycle']].plot()
Getting the trend and the cycle of the dataset with hp_filter.hpfilter comes down to unpacking the returned tuple into two new columns of the dataset:
    # hpfilter returns a (cycle, trend) tuple, in that order
    cycle, trend = sm.tsa.filters.hpfilter(df["realgdp"])
    df['trend'] = trend
    df['cycle'] = cycle
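To give a feel for what the filter computes, here is a minimal NumPy sketch of the Hodrick-Prescott idea: the trend is the series that stays close to the data while keeping its second differences small, and the cycle is what remains. The function `hp_filter_sketch` is a hypothetical illustration, not the statsmodels implementation, and the input is synthetic data rather than `realgdp`. The default of 1600 is the value customarily used for quarterly data.

```python
import numpy as np


def hp_filter_sketch(y, lamb=1600.0):
    """Illustrative HP filter; returns (cycle, trend) like hpfilter does."""
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Second-difference operator: row i computes t[i] - 2*t[i+1] + t[i+2].
    D = np.zeros((n - 2, n))
    for i in range(n - 2):
        D[i, i:i + 3] = [1.0, -2.0, 1.0]
    # Minimising sum((y - trend)**2) + lamb * sum((D @ trend)**2)
    # leads to the linear system (I + lamb * D'D) trend = y.
    trend = np.linalg.solve(np.eye(n) + lamb * (D.T @ D), y)
    cycle = y - trend
    return cycle, trend


# Synthetic series: a straight line plus a wiggle.
t = np.arange(40.0)
series = 0.5 * t + np.sin(t)
cycle, trend = hp_filter_sketch(series)

# A perfectly linear series has zero second differences,
# so the filter returns it unchanged as the trend.
line_cycle, line_trend = hp_filter_sketch(np.arange(20.0))
```

By construction `cycle + trend` adds back up to the original series, which is also why the two columns can be plotted side by side in the post.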
Presenting the data zoomed to specific dates is straightforward once you are comfortable with slicing notation. On a date index the slice bounds can be given as year strings, and both ends are included:
    df[['trend', 'cycle']].loc["1985":"1999"].plot()
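The slicing behaviour can be tried without the dataset. The sketch below builds a hypothetical frame `demo` with a quarterly date index and made-up values, then selects 1985 through 1999. It assumes quarter-start timestamps, which may differ from the dates produced by `dates_from_range`.

```python
import numpy as np
import pandas as pd

# Quarterly timestamps for 1959Q1..2009Q3 (quarter-start dates, an assumption).
index = pd.period_range("1959Q1", "2009Q3", freq="Q").to_timestamp()

# Made-up values standing in for the real 'trend' and 'cycle' columns.
demo = pd.DataFrame(
    {
        "trend": np.arange(len(index), dtype=float),
        "cycle": np.zeros(len(index)),
    },
    index=index,
)

# Year strings select whole years; both endpoints are inclusive.
zoomed = demo.loc["1985":"1999", ["trend", "cycle"]]
n_rows = len(zoomed)  # 15 years of 4 quarters each
```

Calling `zoomed.plot()` afterwards would draw only that window.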