Python – Plotting DataFrame and Working with Pandas – Video

Working with pandas is like working with Excel on steroids: it can do a lot of things fast, but somehow the easy things get complicated. This video and article tutorial covers:

loading data to a dataframe
how to select data from a dataframe in pandas
how to change and format the index of the dataframe
how to do basic operations with pandas
plotting data in matplotlib
summing columns to a new column


The data used in the first part is taken from the statsmodels datasets, which we reach through import statsmodels.api as sm . Once the data is loaded into a dataframe, it can be accessed in one of the following ways:

df
df.head()
df['YEAR']

Each of these returns the corresponding data: the whole dataframe, its first five rows, or the YEAR column.

Changing the index of the dataframe is quite an easy task. First, we need to produce a new index:

import pandas as pd
index = pd.Index(sm.tsa.datetools.dates_from_range('1700','2008'))

Once the index is produced, we may decide to format it. If we only need the year, this line does it:

index = pd.to_datetime(index, format = "%m%d%Y").strftime("%Y")
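The article does not show the step that attaches the new index to the dataframe. A small sketch of how that could look, using a hand-made three-row frame as a stand-in for the sunspot data (the values and the decision to drop the YEAR column are assumptions for illustration):

```python
import pandas as pd

# Hypothetical stand-in for the sunspots frame; the values are illustrative only.
df = pd.DataFrame({"YEAR": [1700.0, 1701.0, 1702.0],
                   "SUNACTIVITY": [5.0, 11.0, 16.0]})

# Build year-end dates, then keep only the year as a string label.
index = pd.to_datetime(["1700-12-31", "1701-12-31", "1702-12-31"]).strftime("%Y")

df.index = index               # attach the new index to the frame
df = df.drop(columns="YEAR")   # the year now lives in the index

print(df.loc["1701"])          # look a row up by its year label
```

Note that strftime returns strings, so the rows are looked up with "1701" rather than the number 1701.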

Selecting data from the dataframe is done with slicing, much like slicing a list. There are plenty of ways to do so, but the easiest ones are these. Both return rows 3 and 4 of the second column:

df[df.columns[1:2]][3:5]
df.iloc[3:5,1:2]
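To check that the two forms select the same cells, here is a short sketch on a made-up frame (the column names a, b and c are hypothetical):

```python
import pandas as pd

# Hypothetical frame with three columns and six rows.
df = pd.DataFrame({"a": range(6), "b": range(10, 16), "c": range(20, 26)})

chained = df[df.columns[1:2]][3:5]   # pick the second column, then rows 3 and 4
by_position = df.iloc[3:5, 1:2]      # the same selection in a single call

print(by_position)
print(chained.equals(by_position))
```

The iloc form is usually the safer habit, since it does the row and column selection in one step instead of two chained lookups.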

Basic operations with pandas are quite trivial. Python's built-in sum() works on a column, and pandas has its own methods such as .sum() and .mean(), which could be used for these:

sum(df['YEAR'])
df['SUNACTIVITY'].mean()
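A short sketch of the same operations on a small stand-in frame (the values are illustrative, not the real dataset), including the pandas .sum() method and .agg() for several statistics at once:

```python
import pandas as pd

# Hypothetical stand-in for the sunspots frame; the values are illustrative only.
df = pd.DataFrame({"YEAR": [1700.0, 1701.0, 1702.0],
                   "SUNACTIVITY": [5.0, 11.0, 16.0]})

total = df["YEAR"].sum()             # pandas method, same result as sum(df["YEAR"])
average = df["SUNACTIVITY"].mean()   # arithmetic mean of the column

# Several statistics in one call, returned as a Series indexed by name.
summary = df["SUNACTIVITY"].agg(["min", "max", "mean"])

print(total, average)
print(summary)
```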

Creating our own dataframe and plotting data with matplotlib is quite easy, in a Jupyter notebook as well. We may generate a few lists in a single loop and put them in the dataframe:

dfx = pd.DataFrame()
ww, xx, yy, zz = [],[],[],[]

for n in range(100):
    w = n * 10
    if n % 13 == 0:
        # x, y and z change only at multiples of 13 and keep their value in between
        x = n * 2
        y = n ** 1.4
        z = n * 10.5

    ww.append(w)
    xx.append(x)
    yy.append(y)
    zz.append(z)

dfx['n * 10'] = ww
dfx['n * 2'] = xx
dfx['n ** 1.4'] = yy
dfx['n * 10.5'] = zz
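As an alternative sketch, not the author's version, the same frame can be built in one step by passing a dictionary of lists to pd.DataFrame instead of assigning columns to an empty frame:

```python
import pandas as pd

ww, xx, yy, zz = [], [], [], []

for n in range(100):
    ww.append(n * 10)
    if n % 13 == 0:
        # Refreshed only at multiples of 13, so these columns form steps.
        x, y, z = n * 2, n ** 1.4, n * 10.5
    xx.append(x)
    yy.append(y)
    zz.append(z)

# The dictionary keys become the column names.
dfx = pd.DataFrame({"n * 10": ww, "n * 2": xx, "n ** 1.4": yy, "n * 10.5": zz})

print(dfx.iloc[12:15])
```

Because x, y and z keep their last value between multiples of 13, row 14 still holds the values computed at n = 13, which is what produces the stepped lines in the plot.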

Once this is carried out, the plotting of the data is a piece of cake:

import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [10,10]
dfx.plot()
plt.ylabel("Values")
plt.xlabel("N")
print(dfx.index.tolist())

Summing columns in Python into a new column is not science fiction. The trick is to remember that axis = 1 sums across the columns and gives one total per row, while axis = 0 sums down the rows and gives one total per column:

for n in range(5):
    dfx["n Sum "+ str(n+2)] = dfx.sum(axis = 1, numeric_only = True)
dfx.plot()
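The effect of the axis argument, and of summing repeatedly in a loop, is easier to see on a tiny frame. A minimal sketch with made-up column names:

```python
import pandas as pd

small = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

# axis=1 adds the values across the columns of each row: a + b.
small["total"] = small.sum(axis=1, numeric_only=True)

# Summing again now includes "total" itself, so the new column is a + b + total.
small["total 2"] = small.sum(axis=1, numeric_only=True)

# axis=0 adds the values down each column instead.
col_sums = small.sum(axis=0)

print(small)
print(col_sums)
```

This is why each new column in the loop grows quickly: every pass sums all existing numeric columns, including the totals added by earlier passes.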

That is pretty much all. The Jupyter notebook is available on GitHub here: https://github.com/Vitosh/Python_personal/blob/master/JupyterNotebook/sunspots.load_pandas.ipynb
