pandas

Apr

2024

How to fix pandas / matplotlib error: raise ValueError(f’Date ordinal {x} converts to {dt}..

Problem:

You are trying to plot a pandas DataFrame or Series using code such as

df.plot()

but you see an error message like

Traceback (most recent call last):
  ...
  File "/usr/local/lib/python3.11/site-packages/matplotlib/figure.py", line 3390, in savefig
    self.canvas.print_figure(fname, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/backend_bases.py", line 2164, in print_figure
    self.figure.draw(renderer)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/artist.py", line 95, in draw_wrapper
    result = draw(artist, renderer, *args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matplotlib/figure.py", line 3154, in draw
    mimage._draw_list_compositing_images(
  File "/usr/local/lib/python3.11/site-packages/matplotlib/image.py", line 132, in _draw_list_compositing_images
    a.draw(renderer)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/artist.py", line 72, in draw_wrapper
    return draw(artist, renderer)
           ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/matplotlib/axes/_base.py", line 3034, in draw
    self._update_title_position(renderer)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/axes/_base.py", line 2969, in _update_title_position
    bb = ax.xaxis.get_tightbbox(renderer)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/axis.py", line 1334, in get_tightbbox
    ticks_to_draw = self._update_ticks()
  File "/usr/local/lib/python3.11/site-packages/matplotlib/axis.py", line 1276, in _update_ticks
    major_labels = self.major.formatter.format_ticks(major_locs)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/ticker.py", line 216, in format_ticks
    return [self(value, i) for i, value in enumerate(values)]
  File "/usr/local/lib/python3.11/site-packages/matplotlib/ticker.py", line 216, in <listcomp>
    return [self(value, i) for i, value in enumerate(values)]
  File "/usr/local/lib/python3.11/site-packages/matplotlib/dates.py", line 649, in __call__
    result = num2date(x, self.tz).strftime(self.fmt)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/dates.py", line 543, in num2date
    return _from_ordinalf_np_vectorized(x, tz).tolist()
  File "/usr/local/lib/python3.11/site-packages/numpy/lib/function_base.py", line 2372, in __call__
    return self._call_as_normal(*args, **kwargs)
  File "/usr/local/lib/python3.11/site-packages/numpy/lib/function_base.py", line 2365, in _call_as_normal
    return self._vectorize_call(func=func, args=vargs)
  File "/usr/local/lib/python3.11/site-packages/numpy/lib/function_base.py", line 2455, in _vectorize_call
    outputs = ufunc(*inputs)
  File "/usr/local/lib/python3.11/site-packages/matplotlib/dates.py", line 362, in _from_ordinalf
    raise ValueError(f'Date ordinal {x} converts to {dt} (using

Solution:

This issue has been discussed on the Matplotlib Github and also pandas Github. This issue occurs if you use df.plot() but then set custom formatting options for the x axis, for datetime index DataFrames.

Right now you can work around this bug by manually plotting the data:

for column in df.columns:
    plt.plot(df.index.values, df[column].values, label=column)

instead, but keep in mind that you might need to set some matplotlib options yourself (which pandas would otherwise set automatically) when plotting everything manually.

Apr

2024

How to iterate rows of a pandas DataFrame with timestamp index

You can use df.iterrows() normally:

import pandas as pd
import numpy as np

# Assuming 'df' is your DataFrame
# For demonstration, let's create a sample DataFrame
dates = pd.date_range('2023-01-01', periods=5)
data = np.random.randn(5, 2)
df = pd.DataFrame(data, index=dates, columns=['A', 'B'])

# Iterating over rows
for index, row in df.iterrows():
    print(f"Index: {index}, A: {row['A']}, B: {row['B']}")

Note that index is a pd.Timestamp.

Apr

2024

How to drop multiple rows from pandas DataFrame by index

import pandas as pd
import numpy as np

# Create a sample DataFrame with datetime index
dates = pd.date_range('2023-01-01', periods=10)
data = np.random.randn(10, 2)
df = pd.DataFrame(data, index=dates, columns=['A', 'B'])

# Timestamps to drop
timestamps_to_drop = [pd.Timestamp('2023-01-03'), pd.Timestamp('2023-01-05'), pd.Timestamp('2023-01-07'),
                      pd.Timestamp('2023-01-08'), pd.Timestamp('2023-01-10')]

# Drop rows
df_filtered = df.drop(index=timestamps_to_drop)

Jun

2023

How to fix ImportError: Missing optional dependency ‘xlrd’. Install xlrd >= 1.0.0 for Excel support

Problem:

While trying to read a XLSX file using pandas , you see an error message such as

File /usr/lib/python3.10/importlib/__init__.py:126, in import_module(name, package)
    125         level += 1
--> 126 return _bootstrap._gcd_import(name[level:], package, level)

File :1050, in _gcd_import(name, package, level)

File :1027, in _find_and_load(name, import_)

File :1004, in _find_and_load_unlocked(name, import_)

ModuleNotFoundError: No module named 'xlrd'

During handling of the above exception, another exception occurred:

ImportError                               Traceback (most recent call last)
/home/uli/myproject/Analyze.ipynb Cell 3 in 2
     18 # Determine absolute path
     19 filepath = os.path.join(directory, filename)
...
--> 144         raise ImportError(msg)
    145     else:
    146         return None

ImportError: Missing optional dependency 'xlrd'. Install xlrd >= 1.0.0 for Excel support Use pip or conda to install xlrd.

Solution:

You can install xlrd using pip:

sudo pip install xlrd

May

2023

How to read KiCAD pick&place position file using pandas in Python

If you’ve exported a KiCAD pick & place position file using the GUI or the command line:

kicad-cli pcb export pos MyPCB.kicad_pcb --units mm -o MyPCB.pos

you can read it from within your Python script using pandas.read_table(...) like this:

import pandas as pd

pos = pd.read_table('KKS-Microcontroller-Board-R2.2.pos', delim_whitespace=True, names=["Ref", "Val", "Package", "PosX", "PosY", "Rot","Side"], comment="#")

Optionally, you can also index pos by the Ref column (which contains values such as C12, D1 or U5):

pos.set_index("Ref", inplace=True)

You can also pack all that into a function:

def read_kicad_pos_file(filename):
    pos = pd.read_table(filename, delim_whitespace=True, names=["Ref", "Val", "Package", "PosX", "PosY", "Rot","Side"], comment="#")
    pos.set_index("Ref", inplace=True)
    return pos

If you’ve used .set_index(), you can access a component such as C13 using

pos.loc["C13"]

Example output:

Val                100nF_25V
Package    C_0603_1608Metric
PosX                187.1472
PosY               -101.8243
Rot                    180.0
Side                     top
Name: C1, dtype: object

May

2022

How to convert collections.Counter to pandas DataFrame

Pandas can take care of the conversion of a Counter to a DataFrameby itself but you need to add a column label:

pd.DataFrame({"YourColumnLabelGoesHere": counterObject})

Full example

import pandas as pd
from collections import Counter

ctr = Counter()
ctr["a"] += 1
ctr["b"] += 1
ctr["a"] += 1
ctr["a"] += 1
ctr["b"] += 1
ctr["a"] += 1
ctr["c"] += 1

pd.DataFrame({"ctr": ctr})

This will result in the following DataFrame:

Feb

2022

Pandas: How to apply numpy function to every column

You can use df.transform(func, axis=0) to apply a numpy function. This leverages the fact that numpy functions work with pandas Series objects.

Example based on How to create pandas time series DataFrame example dataset:

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# np.square will be called individually for each column
new_df = df.transform(np.square, axis=0)

Output

Original time series:

Squared time series:

Full example code

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# np.sqrt will be called individually for each column
new_df = df.transform(np.square, axis=0)

# Plot subsection of original DF for better visibility
df.iloc[:len(df)//2].plot()
plt.gcf().set_size_inches(10,5)
plt.savefig("Normal-Timeseries.svg")

# Plot subsection of transformed DF for better visibility
new_df.iloc[:len(df)//2].plot()
plt.gcf().set_size_inches(10,5)
plt.savefig("Square-Timeseries.svg")

Jan

2022

matplotlib: How to easily format y value as percent [%]

When plotting our time series example dataset, this is the resulting plot

This post shows how to easily plot this dataset with an y axis formatted as percent. We will assume that 1.00 maps to 100%. This post is based on our previous work on Matplotlib custom SI-prefix unit tick formatter:

Note that for pandas, you need to first call df.plot() and call set_major_formatter() after that!

import matplotlib.ticker as mtick
df.plot()
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0))

If you instead want 100.0 to map to 100%, just use xmax=100.0:

import matplotlib.ticker as mtick
df.plot()
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0))

Full example

import matplotlib.ticker as mtick

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Plot with Y axis scaled as percent
df.plot()
plt.gca().yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1.0))

Jan

2022

How to get first column name of pandas DataFrame

df.columns[0]

will return the name of the first column as str.

Jan

2022

How to plot multiple pandas DataFrames in a single graph

If you use code like

df1.plot()
df2.plot()

you will see both DataFrames being plotted each in their separate graphs/plots.

In order to plot both of them in a single plot, use

ax = df1.plot()
df2.plot(ax=ax)

Jan

2022

How to combine two pandas DataFrames with the same index

If you have two pandas DataFrames you want to join where the index in both DataFrames are and you want to obtain a DataFrame where the respective columns are set to NaN if there is no value from the respective DataFrame, this is typically the correct way to do it:

df_compare = pd.concat([df1, df2], axis=1, join='outer')

If you only want to keep values where both DataFrames have some value, use join='outer'

Jan

2022

How to rename a single column of a pandas DataFrame

In our previous post How to replace string in column names in Pandas DataFrame we already covered a generic method of renaming pandas columns by replacing strings.

If you just want to rename a specific column, say, the column is named value and you want to convert it to €, use this snippet:

new_df = df.rename(columns=lambda s: "€" if s == "value" else s)

or use inplace=True if you want to modify the original DataFrame:

df.rename(columns=lambda s: "€" if s == "value" else s, inplace=True)

Jan

2022

How to fix pandas AttributeError: module ‘pandas’ has no attribute ‘timedelta’

Problem:

While running your Python code or Jupyter notebook, you see an error message like

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
/tmp/ipykernel_99995/3617538487.py in <module>
      1 for date in df.index:
----> 2     print(date - pd.timedelta(5, 'years'))

/usr/local/lib/python3.9/dist-packages/pandas/__init__.py in __getattr__(name)
    259         return _SparseArray
    260 
--> 261     raise AttributeError(f"module 'pandas' has no attribute '{name}'")
    262 
    263 

AttributeError: module 'pandas' has no attribute 'timedelta'

Solution:

pandas Timedelta is spelled with a capital T: Timedelta not timedelta.

Jan

2022

How to parse date column in pandas

If you have a pandas column containing a date string

df

which is listed as object:

df["date"]

parse it like this:

df["date"] = pd.to_datetime(df["date"])

After which it will be listed as datetime64[ns]:

df["date"]

Sep

2021

How to center-align all columns in pandas XLSX export

In our previous post How to center-align column in pandas XLSX export we showed how to center-align one specific column in a XLSX export.

In this post, we’ll show how to center all columns in a pandas DataFrame XLSX export. Note that the variants with and without index shown here differ only in the column index computation. Pandas automatically centers the index column, so we don’t need to do that explicitly.

For export with index

# Center all columns
for column_idx in range(len(df.columns)):
    for cell in sheet[openpyxl.utils.cell.get_column_letter(column_idx + 1)]:
        cell.alignment = Alignment(horizontal="center")

For export without index

# Center all columns
for column_idx in range(len(df.columns)):
    for cell in sheet[openpyxl.utils.cell.get_column_letter(column_idx + 1)]:
        cell.alignment = Alignment(horizontal="center")

Full example:

import pandas as pd
import openpyxl.utils.cell
from openpyxl.styles.alignment import Alignment

df = pd.DataFrame([
    {"a": 1.0},
    {"a": 2.0},
    {"a": 3.0},
    {"a": 4.0},
    {"a": 5.0},
])
with pd.ExcelWriter("out.xlsx", engine="openpyxl") as writer:
    sheet_name = "Bool"
    # Export DataFrame content
    df.to_excel(writer, sheet_name=sheet_name)
    # Align to center
    sheet = writer.sheets[sheet_name]
    # Center all columns
    for column_idx in range(len(df.columns) + 1):
        for cell in sheet[openpyxl.utils.cell.get_column_letter(column_idx + 1)]:
            cell.alignment = Alignment(horizontal="center")

The table will look like this with the centering enabled:

whereas the table is right-aligned with only the title centered by Pandas with the centering code disabled: – Note that the index column is centered automatica

Sep

2021

How to center-align column in pandas XLSX export

Note: Want to align all columns in the export? See our followup post How to center-align all columns in pandas XLSX export

When exporting a DataFrame in pandas, often you want to center a column in order to make the resulting table visually more appealing.

In this post we’ll show how to do this using the openpyxl engine.

If you want to center column B, add this code within the pd.ExcelWriter with: block:

for cell in sheet["B"]:
    cell.alignment = Alignment(horizontal="center")

Full example:

import pandas as pd
from openpyxl.styles.alignment import Alignment

df = pd.DataFrame([
    {"a": 1.0},
    {"a": 2.0},
    {"a": 3.0},
    {"a": 4.0},
    {"a": 5.0},
])
with pd.ExcelWriter("out.xlsx", engine="openpyxl") as writer:
    sheet_name = "Bool"
    # Export DataFrame content
    df.to_excel(writer, sheet_name=sheet_name)
    # Align to center
    sheet = writer.sheets[sheet_name]
    # Align every cell in column "B" horizontally
    for cell in sheet["B"]:
        cell.alignment = Alignment(horizontal="center")

The table will look like this with the centering enabled:

whereas the table is right-aligned with only the title centered by Pandas with the centering code disabled:

Sep

2021

How to write Pandas bool column as True/False instead of 1/0 to XLSX

Problem:

When writing a pandas XLSX, bool-type columns are shown are 0 for False or 1 for True.

df = pd.DataFrame([
    {"a": True},
    {"a": False},
    {"a": True},
    {"a": False},
    {"a": False},
])

with pd.ExcelWriter("out.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer)

Solution:

Map the column to a string column with your desired value before exporting:

df["a"] = df["a"].map({True: "True", False: "False"})

Full example code:

df = pd.DataFrame([
    {"a": True},
    {"a": False},
    {"a": True},
    {"a": False},
    {"a": False},
])
df["a"] = df["a"].map({True: "True", False: "False"})
with pd.ExcelWriter("out.xlsx", engine="xlsxwriter") as writer:
    df.to_excel(writer)

Sep

2021

How to create pandas pd.DataFrame from list of dicts

You can create a pandas pd.DataFrame from a list of dicts (lst) using the pd.DataFrame(lst) constructor.

a = {"x": 1, "y": 2}
b = {"x": 3, "y": 2}
c = {"x": 6, "y": 9}
df = pd.DataFrame([a,b,c])

df will look like this:

Aug

2021

How to read single value from XLS(X) using pandas

pandas can be used conveniently to read a table of values from Excel. When extracting data from real-life Excel sheets, there are often metadata fields which are not structured as a table readable by pandas.

Reading the pandas docs, it is not obvious how we can extract the value of a single cell (without any associated headers) with a fixed position, for example:

In this example, we want to extract cell C3, that is we want to end up with a string of value Test value #123. Since there is no clear table structure in this excel sheet (and other cells might contain other values we are not interested in – for example, headlines or headers), we don’t want to have a pd.DataFrame but simply a string.

This is how you can do it:

def read_value_from_excel(filename, column="C", row=3):
    """Read a single cell value from an Excel file"""
    return pd.read_excel(filename, skiprows=row - 1, usecols=column, nrows=1, header=None, names=["Value"]).iloc[0]["Value"]

# Example usage
read_value_from_excel(“Test.xlsx”, “C”, 3) # Prints
Let’s explain the parameters we’re using as arguments to pd.read_excel():

Test.xlsx: The filename of the file you want to read
skiprows=2: Row number minus one, so the desired row is the first we read
usecols="C": Which columns we’re interested in – only one!
nrows=1: Read only a single row
header=None: Do not assume that the first row we read is a header row
names=["Value"]: Set the name for the single column to Value
.iloc[0]: From the resulting pd.DataFrame, get the first row ([0]) by index (iloc)
["Value"] From the resulting row, extract the "Value" column – which is the only column available.

In my opinion, using pandas is the best way of extracting for most real-world usecases (i.e. more focused on development speed than on execution speed) because not only does it provide automatic engine selection for .xls and .xlsx files, it’s also present on most Data Science setups anyway and provides a standardized API.

Aug

2021

How to fix pandas pd.read_excel() error XLRDError: Excel xlsx file; not supported

Problem:

When trying to read an .xlsx file using pandas pd.read_excel() you see this error message:

XLRDError: Excel xlsx file; not supported

Solution:

The xlrd library only supports .xls files, not .xlsx files. In order to make pandas able to read .xlsx files, install openpyxl:

sudo pip3 install openpyxl

After that, retry running your script (if you are running a Jupyter Notebook, be sure to restart the notebook to reload pandas!).

If the error still persists, you have two choices:

Choice 1 (preferred): Update pandas

Pandas 1.1.3 doesn’t automatically select the correct XLSX reader engine, but pandas 1.3.1 does:

sudo pip3 install --upgrade pandas

If you are running a Jupyter Notebook, be sure to restart the notebook to load the updated pandas version!

Choice 2: Explicitly set the engine in `pd.read_excel()`

Add engine='openpyxl' to your pd.read_excel() command, for example:

pd.read_excel('my.xlsx', engine='openpyxl')

Problem:

Solution:

Problem:

Solution:

Full example

Output

Full example code

Full example

Problem:

Solution:

For export with index

For export without index

Full example:

Full example:

Problem:

Solution:

Full example code:

Problem:

Solution:

Choice 1 (preferred): Update pandas

Choice 2: Explicitly set the engine in pd.read_excel()

Search

Categories

Choice 2: Explicitly set the engine in `pd.read_excel()`