TL;DR
If the Series you want to use to split is a column in the DataFrame, continue reading this post. Otherwise, read Split pandas DataFrame every time a Series is True.
Use this utility function:
    def split_dataframe_by_column(df, column):
        """
        Split a DataFrame where a column is True.
        Yields a number of dataframes
        """
        previous_index = df.index[0]
        for split_point in df[df[column]].index:
            yield df[previous_index:split_point]
            previous_index = split_point
        # Yield remainder of dataset
        try:
            yield df[split_point:]
        except UnboundLocalError:
            pass  # There is no split point => Ignore

    # Usage example:
    list(split_dataframe_by_column(df, "ZeroCrossing"))
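Note that one or more of those dataframes might be empty, for example the first one if the column is True in the very first row of a DataFrame with a default integer index.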
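With a label-based index such as the Timestamp index used in this post, slicing with df[a:b] includes both endpoints, so the row at each split point shows up in two consecutive frames. If you need non-overlapping, non-empty frames, the following is a minimal sketch of a position-based variant. The function name split_dataframe_by_position and the small demo dataset are made up for illustration and are not part of the original utility:

```python
import numpy as np
import pandas as pd


def split_dataframe_by_position(df, column):
    """Yield consecutive, non-overlapping chunks of df.

    A new chunk starts at every row where df[column] is True.
    Hypothetical variant of split_dataframe_by_column() that slices by
    row position (iloc) instead of by index label.
    """
    # Row positions (not labels) where the column is True
    split_positions = np.flatnonzero(df[column].to_numpy(dtype=bool))
    start = 0
    for stop in split_positions:
        if stop > start:  # Skip the empty chunk if the first row is True
            yield df.iloc[start:stop]
        start = stop
    # Remainder (or the whole frame if there is no split point)
    if start < len(df):
        yield df.iloc[start:]


# Small made-up dataset to illustrate the behaviour
demo = pd.DataFrame(
    {"Value": [1, 2, -1, -2, 3],
     "ZeroCrossing": [False, False, True, False, True]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)
chunks = list(split_dataframe_by_position(demo, "ZeroCrossing"))
print([len(chunk) for chunk in chunks])  # [2, 2, 1]
```

Unlike the function above, this sketch yields the whole DataFrame as a single frame when there is no split point, which may or may not be what you want.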
Full example:
We’ll use the ZeroCrossing column we built in our previous post on How to detect value change in pandas string column/series, which itself builds on our post on How to create pandas time series DataFrame example dataset. Based on that example, we add the utility function shown above:
    import pandas as pd

    # Load pre-built time series example dataset
    df = pd.read_csv("https://techoverflow.net/datasets/timeseries-example.csv", parse_dates=["Timestamp"])
    df.set_index("Timestamp", inplace=True)

    # Create a new column containing "Positive" or "Negative"
    df["SinePositive"] = (df["Sine"] >= 0).map({True: "Positive", False: "Negative"})
    # Create "change" column (boolean)
    df["ZeroCrossing"] = df["SinePositive"].shift() != df["SinePositive"]
    # Set first entry to False (single indexing call, avoids chained assignment)
    df.iloc[0, df.columns.get_loc("ZeroCrossing")] = False

    def split_dataframe_by_column(df, column):
        """Split a DataFrame where a column is True. Yields a number of dataframes"""
        previous_index = df.index[0]
        for split_point in df[df[column]].index:
            yield df[previous_index:split_point]
            previous_index = split_point
        # Yield remainder of dataset
        try:
            yield df[split_point:]
        except UnboundLocalError:
            pass  # There is no split point => Ignore

    # Print result
    split_frames = list(split_dataframe_by_column(df, "ZeroCrossing"))
    print(f"Split DataFrame into {len(split_frames)} separate frames by zero-crossing")
    # This prints "Split DataFrame into 20 separate frames by zero-crossing"
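As an alternative to a hand-written generator, a common pandas idiom is to group by the cumulative sum of the boolean column. The following is only a sketch on a small made-up dataset; the names demo and segment_id are chosen for illustration. Here the row at each split point belongs to the new segment only, so the result is not identical to the output of split_dataframe_by_column():

```python
import pandas as pd

# Small made-up dataset standing in for the time series example
demo = pd.DataFrame(
    {"Sine": [0.5, 0.9, -0.3, -0.8, 0.2],
     "ZeroCrossing": [False, False, True, False, True]},
    index=pd.date_range("2024-01-01", periods=5, freq="D"),
)

# Each True increments the running sum, so all rows of one segment share an ID
segment_id = demo["ZeroCrossing"].cumsum()

# One DataFrame per segment, in order of appearance
segments = [frame for _, frame in demo.groupby(segment_id)]
print([len(frame) for frame in segments])  # [2, 2, 1]
```

This approach never produces empty frames and returns the entire DataFrame as one segment if the column is never True.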