Some time ago, probably yesterday or the day before, I had to filter a pandas dataframe and somehow it did not come intuitively into my mind how to do it. As I did not have an article about it, I decided to write one, as this problems will be something I will be seeing a lot.
Anyway, let’s assume this is our input table:
These are the tasks we need to fulfill:
- Remove the null values from going_from and going_to
- Display the data with removed null values
- Display the data with removed null values only
- Remove rows where either “VitoshAcademy” or “Academy” is present
- Display the rest of the rows
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 |
import pandas as pd df = { "name": ["Tom", "Jack", "Peter", "Mick", "VitoshAcademy", "Academy", "Juli"], "number": [10, 11, 12, 13, 14, 15, 16], "going_from": ["Varna", "Sofia", None, "Sofia", None, "Rousse", "Rousse"], "going_to": ["Hambourg", "Varna", None, "Kazanlak", "Plovdiv", "Sofia", "Kazanlak"], } df = pd.DataFrame(df) filter_pod_pol_nulls = df[["going_from", "going_to"]].isna().any(axis=1) df_with_nulls = df[filter_pod_pol_nulls].reset_index(drop=True) if len(df_with_nulls) > 0: print("Removing null values...") df_without_nulls = df[~filter_pod_pol_nulls].reset_index(drop=True) print("These values are removed:") print(df_with_nulls) print("These values are left:") print(df_without_nulls) values_to_filter = ["VitoshAcademy", "Academy"] filter_vitosh_academy_present = df[df.name.isin(values_to_filter)] print(filter_vitosh_academy_present) filter_vitosh_academy_missing = df[~df.name.isin(values_to_filter)] print(filter_vitosh_academy_missing) filter_vitosh_academy_only_present = df.query(f'name == "{values_to_filter[0]}"') print(filter_vitosh_academy_only_present) |
That’s all!