Split pandas DataFrame every time a column is True
TL;DR
If the Series you want to use to split is a column in the DataFrame, continue reading this post. Otherwise, read Split pandas DataFrame every time a Series is True.
Use this utility function:
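def split_dataframe_by_column(df, column):
    """
    Split a DataFrame where a column is True. Yields a number of dataframes
    """
    # Integer positions (not index labels) of the rows where the column is True.
    # Label-based df[a:b] slicing includes the end label (e.g. on a DatetimeIndex),
    # which would put every split row into two frames; iloc slicing does not.
    split_points = df[column].to_numpy().nonzero()[0]
    if len(split_points) == 0:
        return  # There is no split point => Ignore
    previous_position = 0
    for split_point in split_points:
        yield df.iloc[previous_position:split_point]
        previous_position = split_point
    # Yield remainder of dataset (starting at the last split point)
    yield df.iloc[previous_position:]
# Usage example:
list(split_dataframe_by_column(df, "ZeroCrossing"))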
Note that one or more of those dataframes might be empty.
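To see what the generator yields, here is a minimal sketch on a tiny hypothetical DataFrame (the `Value` and `Flag` columns are made up for illustration). It carries its own copy of the utility function, written with positional `iloc` slicing, an assumption made so that every row lands in exactly one frame whatever the index type. The second call shows how an empty frame can appear: when the column is already True in the very first row, the first yielded frame is empty.

```python
import pandas as pd

def split_dataframe_by_column(df, column):
    """Split a DataFrame where a column is True. Yields a number of dataframes"""
    # Integer positions of the rows where the column is True
    split_points = df[column].to_numpy().nonzero()[0]
    if len(split_points) == 0:
        return  # There is no split point => Ignore
    previous_position = 0
    for split_point in split_points:
        # The split row itself starts the *next* frame
        yield df.iloc[previous_position:split_point]
        previous_position = split_point
    # Yield remainder of dataset, starting at the last split point
    yield df.iloc[previous_position:]

# Hypothetical toy data: "Flag" marks the rows where a new frame should start
df = pd.DataFrame({
    "Value": [10, 11, 12, 13, 14, 15],
    "Flag": [False, False, True, False, True, False],
})
frames = list(split_dataframe_by_column(df, "Flag"))
for frame in frames:
    print(frame["Value"].tolist())
# [10, 11]
# [12, 13]
# [14, 15]

# Edge case: the column is True in the very first row, so the first frame is empty
edge_df = pd.DataFrame({
    "Value": [1, 2, 3, 4, 5],
    "Flag": [True, False, True, False, False],
})
edge_frames = list(split_dataframe_by_column(edge_df, "Flag"))
print([len(frame) for frame in edge_frames])
# [0, 2, 3]
```

Each True row opens a new frame, so n True rows produce n + 1 frames, and only the frame before the first True row can be empty.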
Full example:
We’ll use the ZeroCrossing column we built in our previous post on How to detect value change in pandas string column/series, which itself builds on our post on How to create pandas time series DataFrame example dataset. Based on that example, we add the utility function shown above:
import pandas as pd
# Load pre-built time series example dataset
df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)
# Create a new column containing "Positive" or "Negative"
df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
# Create "change" column (boolean)
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
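# Set first entry to False (the first row has no predecessor to compare against).
# Assign through df.iloc rather than df["ZeroCrossing"].iloc[...] to avoid chained assignment
df.iloc[0, df.columns.get_loc("ZeroCrossing")] = False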
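def split_dataframe_by_column(df, column):
    """Split a DataFrame where a column is True. Yields a number of dataframes"""
    # Slice by integer position rather than by index label: label slicing on the
    # DatetimeIndex is inclusive at both ends and would duplicate each split row
    split_points = df[column].to_numpy().nonzero()[0]
    if len(split_points) == 0:
        return  # There is no split point => Ignore
    previous_position = 0
    for split_point in split_points:
        yield df.iloc[previous_position:split_point]
        previous_position = split_point
    # Yield remainder of dataset
    yield df.iloc[previous_position:]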
# Print result
split_frames = list(split_dataframe_by_column(df, "ZeroCrossing"))
print(f"Split DataFrame into {len(split_frames)} separate frames by zero-crossing")
# This prints "Split DataFrame into 20 separate frames by zero-crossing"
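If you want to try the full example without downloading the CSV, the sketch below swaps in a synthetic sine signal (an assumption: 40 samples one minute apart, so the frame count differs from the 20 reported above). It reuses the post's column names and zero-crossing detection, then cross-checks the result against `groupby` on the cumulative sum of `ZeroCrossing`, a common alternative idiom that is not the method used in this post.

```python
import numpy as np
import pandas as pd

def split_dataframe_by_column(df, column):
    """Split a DataFrame where a column is True. Yields a number of dataframes"""
    # Positional-slicing version of the utility function (assumption, see above)
    split_points = df[column].to_numpy().nonzero()[0]
    if len(split_points) == 0:
        return  # There is no split point => Ignore
    previous_position = 0
    for split_point in split_points:
        yield df.iloc[previous_position:split_point]
        previous_position = split_point
    yield df.iloc[previous_position:]

# Synthetic stand-in for the downloaded CSV (assumption: 40 samples, one per minute)
timestamps = pd.date_range("2024-01-01", periods=40, freq="1min", name="Timestamp")
df = pd.DataFrame({"Sine": np.sin(np.arange(40) * 0.5 + 0.25)}, index=timestamps)

# Zero-crossing detection as in the full example
df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
df.iloc[0, df.columns.get_loc("ZeroCrossing")] = False

split_frames = list(split_dataframe_by_column(df, "ZeroCrossing"))
print(f"Split DataFrame into {len(split_frames)} separate frames by zero-crossing")
# 6 sign changes in this synthetic signal, so this prints "... into 7 separate frames ..."

# Cross-check with a different idiom (not the post's method): the running count of
# crossings numbers each segment, so groupby() produces the same segments
group_sizes = [len(group) for _, group in df.groupby(df["ZeroCrossing"].cumsum())]
assert group_sizes == [len(frame) for frame in split_frames]
```

The cumulative-sum trick works because every True row increments the running count, so all rows between two crossings share the same group key.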