Last week I attended the Python and ML Summit in Berlin. One of the most interesting sessions I took part in was a workshop named “Reading all yourself was yesterday – How to turn large amounts of text into insights with machine learning”. It explored the analysis of large datasets, so I decided to write a basic article on how to use pandas and NumPy.

These two libraries have the following main features:

**pandas**
- data analysis, name derived from “**pan**el **da**ta”
- provides DataFrame
- somewhat close to a spreadsheet

**numpy**
- numeric functions for Python
- mainly for calculation purposes
- has its own tricks with arrays

The examples below are available in a Jupyter notebook here.

So, let’s start with pandas. This is what the initial sample data looks like:

```python
import numpy as np
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1]
}
```

Then, once the data is loaded into a DataFrame, it looks like this:
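A minimal sketch of that step, assuming the frame is built directly with `pd.DataFrame(data)`. The variable name `df` is just my illustrative choice here and is not used elsewhere in the article:

```python
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1]
}

# Each dictionary key becomes a column; the rows get a default 0..3 index
df = pd.DataFrame(data)
print(df)
```

Without an explicit index, pandas simply numbers the rows starting from zero.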

This becomes even better once an index is added. In our case, the index holds the week numbers:

```python
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])
data_with_index
```

The index is of course accessible through the .index attribute: data_with_index.index. And if we call data_with_index.to_numpy(), a two-dimensional NumPy array (one inner list per row) shows up:
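Here is a small sketch of both calls, rebuilt from the sample data above so that it runs on its own. The names `week_labels` and `values` are only illustrative:

```python
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1]
}
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])

# The row labels live in the .index attribute
week_labels = data_with_index.index
print(list(week_labels))   # ['wk33', 'wk34', 'wk35', 'wk36']

# to_numpy() is a method, so it needs the parentheses
values = data_with_index.to_numpy()
print(values.shape)        # (4, 2)
print(values.tolist())     # [[0, 2], [0, 6], [2, 3], [1, 1]]
```

Note that the array holds only the values, one row per week. The column names and the week labels stay behind in the DataFrame.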

There are other nice one-line commands that can help us get the best out of our data. For example, the data could be:

- described, with describe()

- averaged, with mean()

- analyzed with a cumulative sum, e.g. 2, 8, 11, 12 is the cumulative sum of codedaily, because 2+6=8; 8+3=11; 11+1=12

- reduced with a lambda expression, so that the difference between max and min is easily taken
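The one-liners from the list can be sketched as follows. I am assuming the standard pandas methods `describe()`, `mean()`, `cumsum()` and `apply()` here, and the result names (`summary`, `means`, and so on) are only illustrative:

```python
import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1]
}
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])

# Summary statistics (count, mean, std, min, quartiles, max) per column
summary = data_with_index.describe()

# Average per column
means = data_with_index.mean()

# Running total down each column, week by week
running_total = data_with_index.cumsum()

# Difference between the largest and the smallest value in each column
value_range = data_with_index.apply(lambda column: column.max() - column.min())

print(summary)
print(means)
print(running_total)
print(value_range)
```

For codedaily the running total comes out as 2, 8, 11, 12, exactly as calculated by hand above, and the max-min difference is 6 - 1 = 5.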

## pandas.DataFrame.to_excel

Writing data from the DataFrame to Excel and reading it back are really one-liners:

```python
data_with_index.to_excel('myExcel.xlsx', sheet_name='Pandas')
pd.read_excel('myExcel.xlsx', 'Pandas', index_col=None, na_values=['NA'])
```
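One detail worth a sketch: with `index_col=None` the week labels come back as an ordinary column, while `index_col=0` should restore them as the index. The helper `excel_round_trip` below is hypothetical (my own name), it writes to a temporary folder instead of the working directory, and it assumes an Excel engine such as openpyxl is installed:

```python
import importlib.util
import os
import tempfile

import pandas as pd

data = {
    'vitoshacademy.com': [0, 0, 2, 1],
    'codedaily.vitoshacademy.com': [2, 6, 3, 1]
}
data_with_index = pd.DataFrame(data, index=['wk33', 'wk34', 'wk35', 'wk36'])


def excel_round_trip(df):
    """Write df to a temporary .xlsx file and read it back with its index."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        path = os.path.join(tmp_dir, 'myExcel.xlsx')
        df.to_excel(path, sheet_name='Pandas')
        # index_col=0 turns the first sheet column back into the index
        return pd.read_excel(path, sheet_name='Pandas', index_col=0)


# pandas needs an engine such as openpyxl to handle .xlsx files
if importlib.util.find_spec('openpyxl') is not None:
    restored = excel_round_trip(data_with_index)
    print(restored)
else:
    restored = None
    print('openpyxl is not installed, skipping the Excel round trip')
```

If the week labels are not needed as an index after reading, the original call with `index_col=None` is perfectly fine.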

Indeed, pandas is a game changer even there; see the pandas.DataFrame.to_excel documentation. As mentioned, all examples are available in a Jupyter notebook here.

Cheers!