How to create a pandas time series DataFrame example dataset
TL;DR: Use our pre-built example dataset like this:
load_timeseries_example.py
import pandas as pd
# Load the pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

How to build your own time series example dataset
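In our previous post, Easily generate sine/cosine waveform data in Python using UliEngineering, we showed how to generate sine and cosine waves using UliEngineering.
In this post, we show how to create a pandas DataFrame containing sine and cosine data to use as an example time series dataset.
First, we generate the sine and cosine wave data: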
generate_timeseries.py
import pandas as pd
import numpy as np
from UliEngineering.SignalProcessing.Simulation import sine_wave, cosine_wave
# Configure the properties of the sine/cosine waves here
frequency = 10.0 # 10 Hz sine/cosine wave
samplerate = 10000 # 10 kHz
nseconds = 1 # Generate 1 second of data
sine = sine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
cosine = cosine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
nsamples = len(sine) # Number of values in the data array

Next, we define the timestamp at which the dataset starts:
start_timestamp.py
start_timestamp = pd.Timestamp('now')

Now we can create a list of Timestamp objects representing the points in time at which the signal was sampled:
create_timestamps.py
# Create the timestamps by offsetting from the start timestamp
timedelta = pd.Timedelta(1/samplerate, 'seconds')
timestamps = [start_timestamp + i * timedelta for i in range(nsamples)]

Now we are ready to create the DataFrame object:
build_dataframe.py
df = pd.DataFrame(index=timestamps, data={
"Sine": sine,
"Cosine": cosine
})
df.index.name = 'Timestamp'

Now we can plot the dataset using df.plot():
plot_timeseries.py
# Use a nice plot style
from matplotlib import pyplot as plt
plt.style.use("ggplot")
# Plot the dataset
df.plot()
# Make the figure larger
plt.gcf().set_size_inches(10, 5)

Furthermore, we can export the dataset to CSV using the following command:
export_timeseries.py
df.to_csv("/ram/timeseries-example.csv")

This example file is also available online at https://techoverflow.net/datasets/timeseries-example.csv
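
To check that an exported CSV loads back with a proper time index, a round trip through an in-memory buffer can help. The following is only a minimal sketch: the five-row stand-in dataset and the names buffer and restored are made up for illustration and are not the data generated in this post.

```python
import io

import numpy as np
import pandas as pd

# Small stand-in dataset (hypothetical values, not the data from this post)
index = pd.date_range("2024-01-01", periods=5,
                      freq=pd.Timedelta(microseconds=100), name="Timestamp")
df = pd.DataFrame({
    "Sine": np.linspace(0.0, 1.0, 5),
    "Cosine": np.linspace(1.0, 0.0, 5),
}, index=index)

# Write the CSV to an in-memory buffer instead of a file on disk
buffer = io.StringIO()
df.to_csv(buffer)
buffer.seek(0)

# Read it back, parsing the Timestamp column and using it as the index
restored = pd.read_csv(buffer, parse_dates=["Timestamp"], index_col="Timestamp")
print(restored.head())
```

Passing index_col to read_csv is an alternative to calling set_index afterwards, as done in the TL;DR snippet.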
Complete example:
timeseries_full_example.py
#!/usr/bin/env python3
import pandas as pd
import numpy as np
from UliEngineering.SignalProcessing.Simulation import sine_wave, cosine_wave
# Configure the properties of the sine/cosine waves here
frequency = 10.0 # 10 Hz sine/cosine wave
samplerate = 10000 # 10 kHz
nseconds = 1 # Generate 1 second of data
sine = sine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
cosine = cosine_wave(frequency=frequency, samplerate=samplerate, length=nseconds)
nsamples = len(sine) # Number of values in the data array
start_timestamp = pd.Timestamp('now')
# Create the timestamps by offsetting from the start timestamp
timedelta = pd.Timedelta(1/samplerate, 'seconds')
timestamps = [start_timestamp + i * timedelta for i in range(nsamples)]
df = pd.DataFrame(index=timestamps, data={
"Sine": sine,
"Cosine": cosine
})
df.index.name = 'Timestamp'
df.to_csv("timeseries-example.csv")
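
As a possible variant of the complete example, the timestamp index can be built with pd.date_range instead of a Python list, and the waveforms can be computed with plain numpy. This is a sketch under assumptions: it takes the waves to be sin(2πft) and cos(2πft) with unit amplitude and no phase shift, which may not match UliEngineering's output exactly, and it uses a fixed start time so the result is reproducible.

```python
import numpy as np
import pandas as pd

frequency = 10.0    # 10 Hz sine/cosine wave
samplerate = 10000  # 10 kHz
nseconds = 1        # Generate 1 second of data

nsamples = int(samplerate * nseconds)
# Sample times in seconds: 0, 1/samplerate, 2/samplerate, ...
t = np.arange(nsamples) / samplerate

# Fixed start time (an assumption; the post uses pd.Timestamp('now'))
start_timestamp = pd.Timestamp("2024-01-01 00:00:00")
# Sample period as a whole number of nanoseconds to avoid float rounding
period = pd.Timedelta(round(1e9 / samplerate), unit="ns")
# One timestamp per sample, spaced one sample period apart
index = pd.date_range(start=start_timestamp, periods=nsamples,
                      freq=period, name="Timestamp")

# Assumed waveform definition: unit amplitude, zero phase shift
df = pd.DataFrame({
    "Sine": np.sin(2 * np.pi * frequency * t),
    "Cosine": np.cos(2 * np.pi * frequency * t),
}, index=index)
```

pd.date_range returns a DatetimeIndex directly, so df.index.name does not need to be set in a separate step.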