How to replace pandas values by NaN by threshold

When processing pandas datasets, often you need to remove values above or below a given threshold from a dataset. One way to “remove” values from a dataset is to replace them by NaN (not a number) values which are typically treated as “missing” values.

For example: In order to replace values of theĀ xcolumn by NaNwhere the x column is< 0.75 in a DataFrame df, use this snippet:

import numpy as np

df["x"][df["x"] < -0.75] = np.nan

For example, we can run this on the TechOverflow pandas time series example dataset. The original dataset has two columns: Sine and Cosine and looks like this:

After running

df["Sine"][df["Sine"] < -0.75] = np.nan

you can see that all Sine values below 0.75 have been omitted from the plot, but all the values from theĀ Cosine column are left unchanged:

 

Complete example code:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
plt.style.use("ggplot")

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Plot original code
df.plot()
plt.savefig("TimeSeries-Original.svg")

and this is the code to plot the filtered dataset:

df[df < -0.75] = np.nan
df.plot()
plt.savefig("TimeSeries-NaN.svg")