Working with python, pandas and dataframes is quite challenging even for experienced Excel & VBA developer, as I love to consider myself for one. Thus, today I was fighting for quite some time to add new columns to a dataframe, based on other columns and I have decided to summarize the knowledge, so I can use it later (pretty much the purpose of 95% of the articles here).
Creating the table
Imagine starting with the following table with two columns:
1 2 3 4 5 6 |
import datetime as dt import pandas as pd my_funds = [1, 2, 5, 7, 9, 11] my_time = ['2020-01', '2019-12', '2019-11', '2019-10', '2019-09', '2019-08'] df = pd.DataFrame({'TIME': my_time, 'FUNDS':my_funds}) |
Adding computated columns
And the main purpose is initially to put the values from the FUNDS column to the power of 2, 3 and 4, just to see what would happen. As the .insert Python “magic” is to be used ( len(df.columns)), the length of the table should be defined, as the idea is that these columns should go to the right of the existing ones:
1 2 |
for x in range(2,5): df.insert(len(df.columns), f'x**{x}', df["FUNDS"]**x) |
Adding columns, based on rows in another column
The idea here is a bit more complicated. Let’s consider that we are currently in the first month of 2020. Thus, based on our data, we would like to have structured information for the previous 4 periods on the same row with the data from 2020-01. Something like this:
This can be done with a loop, concerning the fact that the data which should be taken is sorted and is present at a specific row, below ours. If there are no more rows, the fill_value = 0 parameter comes to help:
1 2 3 |
for x in range(1,5): df[f' period (-{x})'] = df["FUNDS"].shift(periods = -x, fill_value = 0) df |
Adding the columns on the left side
Ok, as far as adding to the right is done with len(df.columns), then you may need a couple of seconds to realize that for the opposite action, a is needed in the parameter:
1 2 3 |
df = pd.DataFrame({'TIME': my_time, 'FUNDS':my_funds}) for x in range(3, 0, -1): df.insert(0, f'period (-{x})', 10 * df["FUNDS"].shift(periods = -x, fill_value = 0)) |
The pandas.DataFrame documentation is here.
The examples are in Jupyter notebook in GitHub here.
🙂