# Numpy and Pandas – Small introduction with Jupyter Notebook

Last week I attended the Python and ML Summit in Berlin. One of the most interesting sessions I took part in was a workshop named “Reading all yourself was yesterday – How to turn large amounts of text into insights with machine learning”. It explored the analysis of large datasets, and I decided to write a basic article on how to use pandas and NumPy.

These two libraries have the following main features:

• pandas
  • data analysis library; the name is derived from “panel data”
  • provides the DataFrame
  • somewhat close to a spreadsheet
• numpy
  • numeric functions for Python
  • mainly for calculation purposes
  • has its own tricks with arrays
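
The list does not say which array “tricks” are meant, so the following is only a sketch of three common ones: element-wise arithmetic, boolean masks, and built-in aggregation. The numbers are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical numbers, used only to illustrate the array behaviour.
articles = np.array([2, 6, 3, 1])

# Arithmetic is applied element by element, without an explicit loop.
doubled = articles * 2

# A boolean mask keeps only the elements that satisfy a condition.
busy_weeks = articles[articles > 2]

# Aggregations are available as methods on the array itself.
total = articles.sum()

print(doubled.tolist())     # [4, 12, 6, 2]
print(busy_weeks.tolist())  # [6, 3]
print(total)                # 12
```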

The examples below are available in a Jupyter notebook here.

So, let’s start with pandas. This is what the initial sample data looks like:
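
As a minimal sketch, the sample data could be a dictionary of article counts per site, which is then turned into a DataFrame with and without week-number indices. Only the codedaily values (2, 6, 3, 1) follow from the cumulative-sum example in this article. The second column, othersite, and the concrete week numbers are assumptions made up for illustration.

```python
import pandas as pd

# Articles published per week for each site.
# Only the "codedaily" values follow from the article's text;
# "othersite" and the week numbers are hypothetical.
data = {
    "codedaily": [2, 6, 3, 1],
    "othersite": [4, 0, 5, 2],
}
weeks = [14, 15, 16, 17]

# Plain DataFrame with the default 0..n-1 index.
df = pd.DataFrame(data)

# The same data, indexed by week number.
data_with_index = pd.DataFrame(data, index=weeks)

print(df)
print(data_with_index)
print(data_with_index.index)       # the week numbers
print(data_with_index.to_numpy())  # 2-D array, one row per week
```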

Then the articles are loaded into a DataFrame, which displays them as a table. This becomes even better once indices are added; in our case, these are the week numbers.

The indices are accessible through the .index attribute: data_with_index.index. Calling data_with_index.to_numpy() returns the values as a two-dimensional array, which prints like a list of lists.

There are other nice one-line commands that can help us get the best out of our data. For example, the data can be:

• described, with describe()
• averaged, with mean()
• analyzed with a cumulative sum, cumsum(): e.g. 2, 8, 11, 12 is the cumulative sum of codedaily, because 2+6=8; 8+3=11; 11+1=12
• reduced with a lambda expression, so that the difference between max and min is easily taken

## pandas.DataFrame.to_excel
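
Before the Excel round trip, the one-line commands listed above can be sketched as follows. The codedaily values come from the cumulative-sum example; the othersite column and the week numbers are hypothetical. Using apply with a lambda is one plausible way to take the max-min difference, not necessarily the one used in the notebook.

```python
import pandas as pd

# "othersite" and the week numbers are hypothetical illustration data.
data_with_index = pd.DataFrame(
    {"codedaily": [2, 6, 3, 1], "othersite": [4, 0, 5, 2]},
    index=[14, 15, 16, 17],
)

summary = data_with_index.describe()  # count, mean, std, min, quartiles, max
means = data_with_index.mean()        # mean of each column
running = data_with_index.cumsum()    # cumulative sum down each column

# Difference between the largest and smallest value of each column.
spread = data_with_index.apply(lambda col: col.max() - col.min())

print(running["codedaily"].tolist())  # [2, 8, 11, 12]
```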

Writing data from the DataFrame to Excel and reading it back is really a one-liner:
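
A minimal sketch of that round trip, assuming an Excel engine such as openpyxl is installed for .xlsx files. The helper name round_trip and the sheet name "articles" are hypothetical and chosen for illustration only.

```python
import pandas as pd


def round_trip(df, path):
    """Write df to an Excel file and read it straight back."""
    # One line to write; the index is stored as the first column.
    df.to_excel(path, sheet_name="articles")
    # One line to read; the first column becomes the index again.
    return pd.read_excel(path, sheet_name="articles", index_col=0)
```

Calling round_trip(data_with_index, "articles.xlsx") should return a DataFrame with the same values and week-number index as the one written.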

Indeed, pandas is a game changer even there; see the documentation for pandas.DataFrame.to_excel. As mentioned, all examples are available in a Jupyter notebook here.

Cheers!
