NumPy and Pandas – A small introduction with Jupyter Notebook

Last week I attended the Python and ML Summit in Berlin. One of the most interesting sessions I took part in was a workshop named “Reading all yourself was yesterday – How to turn large amounts of text into insights with machine learning”. It explored the analysis of large datasets, so I decided to write a basic article on how to use pandas and NumPy.

These two libraries have the following main features:

  • pandas
    • data analysis; the name is derived from “panel data”
    • provides the DataFrame structure
    • somewhat similar to a spreadsheet
  • numpy
    • numeric functions for Python
    • mainly for calculation purposes
    • has its own tricks with arrays
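
As a small illustration of what "tricks with arrays" can mean, here is a minimal sketch of element-wise arithmetic and boolean masking in NumPy. The array values and the variable names `a` and `doubled` are only illustrative assumptions.

```python
import numpy as np

# Illustrative values only.
a = np.array([2, 6, 3, 1])

# Arithmetic is applied element by element, without an explicit loop.
doubled = a * 2
print(doubled.tolist())  # [4, 12, 6, 2]

# A boolean mask selects the elements that satisfy a condition.
print(a[a > 2].tolist())  # [6, 3]

# Aggregations are built in.
print(a.sum())  # 12
```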

The examples below are available in a Jupyter notebook here.

So, let’s start with pandas. This is what the initial sample data looks like:

import numpy as np
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1], 
    'codedaily.vitoshacademy.com': [2, 6, 3, 1]
}

Then, once the data is converted to a DataFrame, it looks like this:
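
A minimal sketch of this step, assuming the dictionary is passed straight to `pd.DataFrame`. The variable name `df` is an assumption and does not come from the original notebook.

```python
import pandas as pd

# Same sample data as above.
data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1],
}

# `df` is an illustrative name. Without an explicit index,
# pandas labels the rows 0, 1, 2, 3.
df = pd.DataFrame(data)
print(df)
print(df.shape)  # (4, 2): four rows, two columns
```

Each dictionary key becomes a column and each list becomes that column's values.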

It becomes even more readable once an index is added. In our case, the index labels are the week numbers:

# Use the week numbers as row labels instead of the default 0..3
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])
data_with_index

The index labels are of course accessible through the .index attribute: data_with_index.index. And if we call data_with_index.to_numpy(), a two-dimensional NumPy array (displayed much like a list of lists) shows up:
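
A minimal sketch of both calls, rebuilding the sample DataFrame so the snippet runs on its own. The names `labels` and `values` are illustrative assumptions.

```python
import numpy as np
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1],
}
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])

# .index is an attribute, so no parentheses are needed.
labels = data_with_index.index
print(list(labels))  # ['wk33', 'wk34', 'wk35', 'wk36']

# .to_numpy() is a method; it returns a 2-D array with one row per week.
values = data_with_index.to_numpy()
print(values.shape)  # (4, 2)
print(values[1])     # [0 6]
```

Note that the row and column labels are dropped in the conversion; only the raw values remain in the array.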

There are other nice one-line commands that help us get the most out of our data. For example, the data could be:

  • described, with describe()
  • averaged, with mean()
  • analyzed with a cumulative sum, e.g. 2, 8, 11, 12 is the cumulative sum of codedaily, because 2+6=8; 8+3=11; 11+1=12
  • reduced with a lambda expression, so that the difference between max and min is easily taken
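
The four operations above can be sketched as follows. This is only one possible way to write them, assuming `cumsum()` for the cumulative sum and `apply()` for the lambda; the result names (`summary`, `means`, `running`, `spread`) are illustrative.

```python
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1],
}
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = data_with_index.describe()

# Column-wise mean.
means = data_with_index.mean()
print(means['codedaily.vitoshacademy.com'])  # 3.0

# Running total down each column.
running = data_with_index.cumsum()
print(running['codedaily.vitoshacademy.com'].tolist())  # [2, 8, 11, 12]

# The lambda receives one column (a Series) at a time.
spread = data_with_index.apply(lambda col: col.max() - col.min())
print(spread.tolist())  # [2, 5]
```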

pandas.DataFrame.to_excel

Writing data from the DataFrame to Excel and reading it back each take a single line:

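# Write the DataFrame to a sheet named 'Pandas' (an Excel engine such as openpyxl must be installed)
data_with_index.to_excel('myExcel.xlsx', sheet_name='Pandas')
# Read the same sheet back; with index_col=None the week labels come back as a regular column
pd.read_excel('myExcel.xlsx', sheet_name='Pandas', index_col=None, na_values=['NA'])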

Indeed, pandas is a game changer even there; the details are in the pandas.DataFrame.to_excel documentation. As mentioned, all examples are available in a Jupyter notebook here.

Cheers!