Python – Building New Columns in a Dataframe, Based on Other Columns

Working with python, pandas and dataframes is quite challenging even for experienced Excel & VBA developer, as I love to consider myself for one. Thus, today I was fighting for quite some time to add new columns to a dataframe, based on other columns and I have decided to summarize the knowledge, so I can use it later (pretty much the purpose of 95% of the articles here).

Creating the table

Imagine starting with the following table with two columns:

import datetime as dt
import pandas as pd

my_funds = [1, 2, 5, 7, 9, 11]
my_time = ['2020-01', '2019-12', '2019-11', '2019-10', '2019-09', '2019-08']
df = pd.DataFrame({'TIME': my_time, 'FUNDS':my_funds})

Adding computated columns

And the main purpose is initially to put the values from the FUNDS column to the power of 2, 3 and 4, just to see what would happen. As the .insert Python “magic” is to be used (len(df.columns)), the length of the table should be defined, as the idea is that these columns should go to the right of the existing ones:

for x in range(2,5):
    df.insert(len(df.columns), f'x**{x}', df["FUNDS"]**x)

Adding columns, based on rows in another column

The idea here is a bit more complicated. Let’s consider that we are currently in the first month of 2020. Thus, based on our data, we would like to have structured information for the previous 4 periods on the same row with the data from 2020-01. Something like this:

This can be done with a loop, concerning the fact that the data which should be taken is sorted and is present at a specific row, below ours. If there are no more rows, the fill_value = 0 parameter comes to help:

for x in range(1,5):
    df[f' period (-{x})'] = df["FUNDS"].shift(periods = -x, fill_value = 0)
df

Adding the columns on the left side

Ok, as far as adding to the right is done with len(df.columns), then you may need a couple of seconds to realize that for the opposite action, a 0 is needed in the parameter:

df = pd.DataFrame({'TIME': my_time, 'FUNDS':my_funds}) 
for x in range(3, 0, -1):
    df.insert(0, f'period (-{x})', 10 * df["FUNDS"].shift(periods = -x, fill_value = 0))

The pandas.DataFrame documentation is here.

The examples are in Jupyter notebook in GitHub here.

🙂

Python – Building New Columns in a Dataframe, Based on Other Columns

Creating the table

Adding computated columns

Adding columns, based on rows in another column

Adding the columns on the left side

Related posts: