NumPy and pandas – A short introduction with Jupyter Notebook

Last week I attended the Python and ML Summit in Berlin. One of the most interesting sessions I took part in was a workshop named “Reading all yourself was yesterday – How to turn large amounts of text into insights with machine learning”. It explored the analysis of large datasets, which is why I decided to write a basic article on how to use pandas and NumPy.

These two libraries have the following main features:

  • pandas
    • data analysis library; the name is derived from “panel data”
    • provides the DataFrame structure
    • somewhat similar to a spreadsheet
  • numpy
    • numeric functions for Python
    • mainly for calculation purposes
    • has its own tricks with arrays, such as element-wise operations and boolean masks
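
To give a rough idea of what the array tricks in the list above can look like, here is a minimal sketch. The values are just illustrative numbers of my choosing, not data from the workshop.

```python
import numpy as np

# Illustrative values only (an assumption for this sketch).
values = np.array([2, 6, 3, 1])

# Arithmetic is applied element by element, without an explicit loop.
doubled = values * 2

# Running total over the array.
running_total = np.cumsum(values)

# A boolean mask keeps only the elements that satisfy the condition.
above_two = values[values > 2]

print(doubled)
print(running_total)
print(above_two)
```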

The examples below are available in a Jupyter notebook here.

So, let’s start with pandas. This is what the initial sample data looks like:
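
Since the original screenshot is not reproduced here, the following is only a stand-in for the sample data. The codedaily values (2, 6, 3, 1) are the ones used later in the cumulative sum example; the week numbers and the other_site column are hypothetical names and numbers that I made up for illustration.

```python
# Stand-in sample data: number of articles per week.
# "codedaily" values come from the article; "weeks" and "other_site"
# are hypothetical and exist only to make the example two-dimensional.
weeks = [1, 2, 3, 4]
articles = {
    "codedaily": [2, 6, 3, 1],
    "other_site": [1, 0, 2, 1],
}

# Print one line per week with the article counts of each column.
for position, week in enumerate(weeks):
    counts = {name: column[position] for name, column in articles.items()}
    print(week, counts)
```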

Then, once the articles are loaded into a DataFrame, they look like this:
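
A minimal sketch of that step, assuming the data lives in a plain dictionary of lists. The other_site column is a hypothetical addition of mine; only the codedaily values are taken from the article.

```python
import pandas as pd

# Stand-in data; "other_site" is hypothetical.
articles = {
    "codedaily": [2, 6, 3, 1],
    "other_site": [1, 0, 2, 1],
}

# Each dictionary key becomes a column; rows get a default 0..n-1 index.
data = pd.DataFrame(articles)

print(data)
```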

This gets even better once an index is added. In our case, the index holds the week numbers:
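
One way to attach such an index is to pass it when the DataFrame is built. This is a sketch under assumptions: the week numbers 1 to 4 and the other_site column are hypothetical, and the notebook may well construct the index differently.

```python
import pandas as pd

# Stand-in data; week numbers and "other_site" are hypothetical.
weeks = [1, 2, 3, 4]
articles = {
    "codedaily": [2, 6, 3, 1],
    "other_site": [1, 0, 2, 1],
}

# Use the week numbers as row labels instead of the default 0..n-1.
data_with_index = pd.DataFrame(articles, index=pd.Index(weeks, name="week"))

print(data_with_index)

# Rows can now be looked up by week number.
print(data_with_index.loc[2])
```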

The index is of course accessible through the .index attribute: data_with_index.index. And if we call data_with_index.to_numpy(), we get a two-dimensional NumPy array, which prints like a list of lists:
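
Both calls can be tried out roughly like this. The DataFrame is rebuilt from stand-in data so that the snippet runs on its own; the week numbers and the other_site column are hypothetical.

```python
import pandas as pd

# Stand-in data; week numbers and "other_site" are hypothetical.
data_with_index = pd.DataFrame(
    {"codedaily": [2, 6, 3, 1], "other_site": [1, 0, 2, 1]},
    index=pd.Index([1, 2, 3, 4], name="week"),
)

# .index is an attribute, not a method, so there are no parentheses.
index = data_with_index.index
print(index.tolist())

# to_numpy() is a method; it returns the values without row or column labels.
array = data_with_index.to_numpy()
print(array)
print(array.shape)
```

Note that the labels are dropped in the conversion, so the array only carries the numbers, one inner row per week.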

There are other nice one-line commands that help us get the most out of our data. For example, the data can be:

  • described, with describe()

  • averaged, with mean()

  • analyzed with a cumulative sum, cumsum(): for example, 2, 8, 11, 12 is the cumulative sum of codedaily, because 2+6=8; 8+3=11; 11+1=12

  • reduced with a lambda expression, so that the difference between max and min is easily taken
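
The four one-liners from the list above could look roughly like this. I am assuming that the lambda is passed to DataFrame.apply, which is a common way to do it but not necessarily what the notebook uses; the other_site column and the week numbers are again hypothetical.

```python
import pandas as pd

# Stand-in data; week numbers and "other_site" are hypothetical.
data_with_index = pd.DataFrame(
    {"codedaily": [2, 6, 3, 1], "other_site": [1, 0, 2, 1]},
    index=pd.Index([1, 2, 3, 4], name="week"),
)

# Summary statistics (count, mean, std, min, quartiles, max) per column.
summary = data_with_index.describe()

# Mean of every column.
means = data_with_index.mean()

# Running total down each column: 2, 8, 11, 12 for codedaily.
running = data_with_index.cumsum()

# The lambda receives one column at a time and returns max minus min.
spread = data_with_index.apply(lambda column: column.max() - column.min())

print(summary)
print(means)
print(running)
print(spread)
```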

pandas.DataFrame.to_excel

Writing data from the DataFrame to Excel and reading it back is really a one-liner each:
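
A hedged sketch of such a round trip follows. pandas hands .xlsx files over to an optional engine, and I am assuming openpyxl here, so the snippet checks for it first. The file name, the sheet name and the stand-in data are hypothetical.

```python
import importlib.util
import os
import tempfile

import pandas as pd

# Stand-in data; week numbers and "other_site" are hypothetical.
data_with_index = pd.DataFrame(
    {"codedaily": [2, 6, 3, 1], "other_site": [1, 0, 2, 1]},
    index=pd.Index([1, 2, 3, 4], name="week"),
)

# Assumption: openpyxl is the Excel engine; skip the round trip if it is missing.
has_engine = importlib.util.find_spec("openpyxl") is not None

restored = None
if has_engine:
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, "articles.xlsx")
        # Writing is one line.
        data_with_index.to_excel(path, sheet_name="weeks")
        # Reading is one line; index_col=0 turns the first column back into the index.
        restored = pd.read_excel(path, sheet_name="weeks", index_col=0)
    print(restored)
else:
    print("openpyxl is not installed, so the Excel round trip was skipped.")
```

The temporary directory only keeps the example from leaving files behind; in a notebook a plain file name works just as well.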

Indeed, pandas is a game changer even there; see the pandas.DataFrame.to_excel documentation. As mentioned, all examples are available in a Jupyter notebook here.

Cheers!
