Working with Pandas is like working with Excel on steroids – it can do a lot of things really fast, but somehow the easy things get complicated. In this video and article tutorial I present:

- loading data into a dataframe
- how to select data from a dataframe in pandas
- how to change and format the index of the dataframe
- how to do basic operations with pandas
- plotting data with matplotlib
- summing columns into a new column

To **load data** for the first part, we use the sunspots dataset from the statsmodels datasets, which becomes available after the magic `import statsmodels.api as sm`. Once the data is loaded into a dataframe, it can be accessed in one of the following ways:

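```python
import statsmodels.api as sm

df = sm.datasets.sunspots.load_pandas().data  # columns YEAR and SUNACTIVITY
df          # the whole dataframe
df.head()   # the first five rows
df['YEAR']  # a single column, as a Series
```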

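Run one at a time (for example, each in its own notebook cell), these return the corresponding data: the whole dataframe, its first five rows, or the `YEAR` column.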
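To make the difference between these access styles concrete without downloading anything, here is a minimal sketch on a small stand-in dataframe. The column names mirror the sunspots data, but the values are synthetic placeholders, not real sunspot counts.

```python
import pandas as pd

# Synthetic stand-in with the same column names as the sunspots data
df = pd.DataFrame({
    "YEAR": [1700.0, 1701.0, 1702.0, 1703.0, 1704.0, 1705.0],
    "SUNACTIVITY": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],  # placeholder values
})

print(df.head())      # first five rows (head() defaults to n=5)
print(df.head(2))     # first two rows
year = df["YEAR"]     # single brackets -> Series
years = df[["YEAR"]]  # double brackets -> one-column DataFrame
print(type(year).__name__, type(years).__name__)
```

The last line prints `Series DataFrame`, which is the usual source of confusion when a method works on one selection but not the other.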

**Changing the index** of the dataframe is quite an easy task. First, we need to produce a new index:

```python
import pandas as pd
index = pd.Index(sm.tsa.datetools.dates_from_range('1700', '2008'))
```
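If statsmodels is not installed, a comparable yearly index can be built with pandas alone. This is a sketch of an alternative, not what the notebook uses: it parses each year string with `format="%Y"`, so every date falls on January 1st, which may differ from the day of the year that `dates_from_range` picks.

```python
import pandas as pd

# One string per year from 1700 to 2008 inclusive (range's end is exclusive)
years = [str(y) for y in range(1700, 2009)]
index = pd.to_datetime(years, format="%Y")  # DatetimeIndex, each date on January 1st

print(len(index))  # 309 yearly entries, one per row of the sunspots data
print(index[0], index[-1])
```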

Once the index is created, we may want to format it. If we only need the year, this line does the trick:

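```python
index = pd.to_datetime(index).strftime("%Y")  # dates are already parsed; keep only the year, as text
df.index = index                              # assign the new index to the dataframe
```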
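With year strings as the index, rows can be looked up by label with `.loc`. A minimal sketch on a synthetic three-row frame (the values are placeholders, not real data):

```python
import pandas as pd

# Synthetic frame standing in for the real data
df = pd.DataFrame({"SUNACTIVITY": [1.0, 2.0, 3.0]})

# Yearly DatetimeIndex, then keep only the year as text
dates = pd.to_datetime(["1700", "1701", "1702"], format="%Y")
df.index = dates.strftime("%Y")

print(df.loc["1701"])         # row selected by its year label
print(df.loc["1700":"1701"])  # label slices include both ends
```

Unlike positional slices, label slices with `.loc` include the end label, so `"1700":"1701"` returns two rows.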

**Selecting data** from the dataframe is done with indexing and slicing. There are plenty of ways to do so, but the easiest ones are these:

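```python
df[df.columns[1:2]][3:5]  # take the second column (SUNACTIVITY), then rows 3 and 4
df.iloc[3:5, 1:2]         # the same selection by integer position
```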
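Both selections return the same block: rows 3 and 4 (the end of a positional slice is excluded) of the second column. A sketch that checks this on a synthetic frame and adds the label-based `.loc` and a boolean-mask filter for comparison (the data and the threshold of 20 are made up for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "YEAR": [1700.0, 1701.0, 1702.0, 1703.0, 1704.0, 1705.0],
    "SUNACTIVITY": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],  # synthetic values
})

a = df[df.columns[1:2]][3:5]       # column slice first, then a positional row slice
b = df.iloc[3:5, 1:2]              # rows 3-4, column 1, by integer position
c = df.loc[3:4, ["SUNACTIVITY"]]   # by label: with this RangeIndex, 3:4 includes 4

print(a.equals(b), b.equals(c))

# Boolean mask: keep only rows whose value exceeds an arbitrary threshold
busy = df[df["SUNACTIVITY"] > 20]
```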

**Basic operations with pandas** are quite trivial – there are built-in methods such as `sum()` and `mean()`, which can be used for these:

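```python
sum(df['YEAR'])           # Python's built-in sum; df['YEAR'].sum() is the pandas equivalent
df['SUNACTIVITY'].mean()  # the average sunspot activity
```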
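A small sketch with synthetic numbers, showing that Python's built-in `sum()` and the pandas `.sum()` method agree on a numeric column, plus `describe()` for several statistics at once:

```python
import pandas as pd

df = pd.DataFrame({
    "YEAR": [1700.0, 1701.0, 1702.0, 1703.0],
    "SUNACTIVITY": [10.0, 20.0, 30.0, 40.0],  # synthetic values
})

builtin_total = sum(df["YEAR"])       # iterates over the Series element by element
pandas_total = df["YEAR"].sum()       # vectorised pandas method, skips NaN by default
average = df["SUNACTIVITY"].mean()
stats = df["SUNACTIVITY"].describe()  # count, mean, std, min, quartiles, max

print(builtin_total, pandas_total, average)
```

On large columns the pandas method is the better habit: it is vectorised and ignores missing values, whereas the built-in `sum()` returns NaN as soon as one value is missing.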

Creating our own dataframe and **plotting data with matplotlib** is quite easy, in a Jupyter notebook as well. We may generate a few lists in a single loop and put them in the dataframe:

```python
import pandas as pd

dfx = pd.DataFrame()
ww, xx, yy, zz = [], [], [], []
for n in range(100):
    w = n * 10
    if n % 13 == 0:
        # x, y and z only change when n is a multiple of 13, so they form steps
        x = n * 2
        y = n ** 1.4
        z = n * 10.5
    ww.append(w)
    xx.append(x)
    yy.append(y)
    zz.append(z)
dfx['n * 10'] = ww
dfx['n * 2'] = xx
dfx['n ** 1.4'] = yy
dfx['n * 10.5'] = zz
```
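Reading the loop so that x, y and z only change when n is a multiple of 13 (while w changes at every step), the same four columns can be computed without a loop using NumPy arrays. This is an alternative sketch, not the notebook's code; `n - n % 13` rounds each n down to the previous multiple of 13.

```python
import numpy as np
import pandas as pd

n = np.arange(100)
step = n - n % 13  # last multiple of 13 that is <= n

dfx_vec = pd.DataFrame({
    "n * 10": n * 10,          # changes at every step
    "n * 2": step * 2,         # the step columns only change at multiples of 13
    "n ** 1.4": step ** 1.4,
    "n * 10.5": step * 10.5,
})
```

Building the dataframe from a dictionary of arrays in one go also avoids starting from an empty `DataFrame()` and adding columns one by one.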

Once this is done, plotting the data is a piece of cake:

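```python
import matplotlib.pyplot as plt

plt.rcParams['figure.figsize'] = [10, 10]  # width and height in inches
dfx.plot()
plt.ylabel("Values")
plt.xlabel("N")
print(dfx.index.tolist())  # the row labels 0 to 99, used on the x-axis
```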
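Outside Jupyter, or when the figure should be saved, the axes returned by `DataFrame.plot()` can be labelled directly and the figure written to disk. A sketch assuming matplotlib is installed; the `Agg` backend, the two sample columns and the file name `sums.png` are choices made for this example only:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, no window needed
import matplotlib.pyplot as plt
import pandas as pd

dfx = pd.DataFrame({"n * 10": [n * 10 for n in range(100)],
                    "n * 2": [n * 2 for n in range(100)]})

ax = dfx.plot(figsize=(10, 10))  # returns a matplotlib Axes
ax.set_xlabel("N")
ax.set_ylabel("Values")

out_path = os.path.join(tempfile.gettempdir(), "sums.png")  # hypothetical file name
ax.figure.savefig(out_path)
plt.close(ax.figure)  # free the figure once it is saved
```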

Summing columns in Python into a new column is not science fiction – the trick is to remember that `axis=1` sums across the columns of each row, while `axis=0` sums down each column:

```python
for n in range(5):
    # each pass sums all numeric columns, including the sum columns added before
    dfx["n Sum " + str(n + 2)] = dfx.sum(axis=1, numeric_only=True)
dfx.plot()
```
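Two things are worth seeing on a tiny frame with made-up numbers: `axis=1` produces one total per row while `axis=0` produces one total per column, and because each pass of the loop sums every numeric column, including the sum columns added in earlier passes, each new column is twice the previous one. If only the original columns should be summed, select them first:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2], "b": [10, 20]})

row_totals = df.sum(axis=1)  # one value per row: 11, 22
col_totals = df.sum(axis=0)  # one value per column: a=3, b=30

original = list(df.columns)  # remember the columns before adding sums
for n in range(5):
    df["n Sum " + str(n + 2)] = df.sum(axis=1, numeric_only=True)

# Sum only the original columns, ignoring the columns the loop added
df["total"] = df[original].sum(axis=1)
```

After the loop, the first row holds 11, 22, 44, 88 and 176 in the columns `n Sum 2` to `n Sum 6`, while `total` stays at 11.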

That's pretty much all. The Jupyter notebook is available on GitHub here: https://github.com/Vitosh/Python_personal/blob/master/JupyterNotebook/sunspots.load_pandas.ipynb