How to create a pandas time series DataFrame example dataset
TL;DR: Use our pre-built example dataset like this:
import pandas as pd
# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)
How to build your own time series example dataset
In our previous post, "Easily generate sine/cosine waveform data in Python using UliEngineering", we showed how to generate sine and cosine waves using UliEngineering.
In this post, we show how to create a pandas DataFrame containing sine and cosine data to be used as a sample time series dataset.
First, we generate the sine and cosine wave data:
import pandas as pd
import numpy as np
from UliEngineering.SignalProcessing.Simulation import sine_wave, cosine_wave
# Configure the properties of the sine wave here
frequency = 10.0 # 10 Hz sine / cosine wave
samplerate = 10000 # 10 kHz
nseconds = 1 # Generate 1 second of data
sine = sine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
cosine = cosine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
nsamples = len(sine) # How many values we have in the data arrays
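If UliEngineering is not installed, comparable data can be generated with NumPy alone. The following is a minimal sketch, assuming unit amplitude and zero phase offset; the names sine_np, cosine_np and t are hypothetical, and the result is not guaranteed to match the UliEngineering output sample for sample:

```python
import numpy as np

frequency = 10.0    # 10 Hz sine / cosine wave
samplerate = 10000  # 10 kHz
nseconds = 1        # Generate 1 second of data

# Sample instants in seconds: 0, 1/samplerate, 2/samplerate, ...
nsamples = int(samplerate * nseconds)
t = np.arange(nsamples) / samplerate

# Unit-amplitude waves with zero phase offset (assumption, see text above)
sine_np = np.sin(2 * np.pi * frequency * t)
cosine_np = np.cos(2 * np.pi * frequency * t)
```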
After that, we define the timestamp where the dataset starts:
start_timestamp = pd.Timestamp('now')
Now we can create a list of Timestamp objects representing the points in time at which the signal was sampled:
# Create timestamps by offsetting
timedelta = pd.Timedelta(1/samplerate, 'seconds')
timestamps = [start_timestamp + i * timedelta for i in range(nsamples)]
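As an alternative to the list comprehension, pandas can build the evenly spaced timestamps directly using pd.date_range(). This is a sketch under a few assumptions: the fixed start date and the names period_ns and timestamps_idx are hypothetical and chosen only to make the example reproducible.

```python
import pandas as pd

samplerate = 10000  # 10 kHz
nsamples = 10000    # 1 second of data at 10 kHz

# Fixed start timestamp (hypothetical) so the result is reproducible
start_timestamp = pd.Timestamp("2024-01-01 00:00:00")

# Sample period as an integer number of nanoseconds to avoid float rounding
period_ns = round(1e9 / samplerate)

# DatetimeIndex with one entry per sample
timestamps_idx = pd.date_range(
    start=start_timestamp,
    periods=nsamples,
    freq=pd.Timedelta(period_ns, unit="ns"),
)
```

The resulting DatetimeIndex can be passed as index= to pd.DataFrame() in the same way as the list of Timestamp objects.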
Now we’re ready to create the DataFrame object:
df = pd.DataFrame(index=timestamps, data={
    "Sine": sine,
    "Cosine": cosine
})
df.index.name = 'Timestamp'
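Because the index consists of timestamps, a DataFrame like this can be sliced and resampled by time. The following sketch illustrates this on a stand-in dataset built with NumPy; the name demo_df, the fixed start date and the chosen intervals are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd

# Stand-in dataset: 10 Hz waves sampled at 10 kHz for 1 second
samplerate = 10000
t = np.arange(samplerate) / samplerate
index = pd.date_range(
    "2024-01-01", periods=samplerate, freq=pd.Timedelta(microseconds=100)
)
demo_df = pd.DataFrame(
    {"Sine": np.sin(2 * np.pi * 10 * t), "Cosine": np.cos(2 * np.pi * 10 * t)},
    index=index,
)
demo_df.index.name = "Timestamp"

# Select the first 50 ms by timestamp instead of by row number
# (label-based slicing includes the end point)
first_50ms = demo_df.loc[: index[0] + pd.Timedelta(milliseconds=50)]

# Downsample to 1 kHz by averaging each 1 ms bucket
downsampled = demo_df.resample("1ms").mean()
```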
Now we can use df.plot() to plot the dataset:
# Use nice plotting style
from matplotlib import pyplot as plt
plt.style.use("ggplot")
# Plot dataset
df.plot()
# Make figure larger
plt.gcf().set_size_inches(10, 5)
Additionally, we can export the dataset as CSV using:
df.to_csv("/ram/timeseries-example.csv")
This example file is also available online at https://techoverflow.net/datasets/timeseries-example.csv
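To check that an exported CSV loads back with a proper timestamp index, a round trip can be sketched as follows. This example writes to an in-memory buffer instead of a file; the two-row small_df and its values are hypothetical stand-ins for the real dataset.

```python
import io
import pandas as pd

# Tiny stand-in dataset with two samples spaced 100 microseconds apart
small_df = pd.DataFrame(
    {"Sine": [0.0, 1.0], "Cosine": [1.0, 0.0]},
    index=pd.to_datetime(
        ["2024-01-01 00:00:00.0000", "2024-01-01 00:00:00.0001"]
    ),
)
small_df.index.name = "Timestamp"

# Export to an in-memory text buffer instead of a file on disk
buffer = io.StringIO()
small_df.to_csv(buffer)
buffer.seek(0)

# Read the CSV back, parsing the Timestamp column and using it as the index
restored = pd.read_csv(buffer, parse_dates=["Timestamp"], index_col="Timestamp")
```

Passing index_col="Timestamp" to pd.read_csv() has the same effect as calling set_index("Timestamp") after loading.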
Full example:
#!/usr/bin/env python3
import pandas as pd
import numpy as np
from UliEngineering.SignalProcessing.Simulation import sine_wave, cosine_wave
# Configure the properties of the sine wave here
frequency = 10.0 # 10 Hz sine / cosine wave
samplerate = 10000 # 10 kHz
nseconds = 1 # Generate 1 second of data
sine = sine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
cosine = cosine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
nsamples = len(sine) # How many values we have in the data arrays
start_timestamp = pd.Timestamp('now')
# Create timestamps by offsetting
timedelta = pd.Timedelta(1/samplerate, 'seconds')
timestamps = [start_timestamp + i * timedelta for i in range(nsamples)]
df = pd.DataFrame(index=timestamps, data={
    "Sine": sine,
    "Cosine": cosine
})
df.index.name = 'Timestamp'
df.to_csv("timeseries-example.csv")