Python

How to iterate column names in pandas

In order to iterate column names in pandas, use

for column in df.columns:
    print(columns)

For example, for the TechOverflow pandas time series example dataset, df.columnswill be

Index(['Sine', 'Cosine'], dtype='object')

so iterating over the columns using

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Iterate over columns
for column in df.columns:
    print(column)

will print

Sine
Cosine

Note that df.columns will not show the index column!

Posted by Uli Köhler in pandas, Python

How to replace pandas values by NaN by threshold

When processing pandas datasets, often you need to remove values above or below a given threshold from a dataset. One way to “remove” values from a dataset is to replace them by NaN (not a number) values which are typically treated as “missing” values.

For example: In order to replace values of the xcolumn by NaNwhere the x column is< 0.75 in a DataFrame df, use this snippet:

import numpy as np

df["x"][df["x"] < -0.75] = np.nan

For example, we can run this on the TechOverflow pandas time series example dataset. The original dataset has two columns: Sine and Cosine and looks like this:

After running

df["Sine"][df["Sine"] < -0.75] = np.nan

you can see that all Sine values below 0.75 have been omitted from the plot, but all the values from the Cosine column are left unchanged:

 

Complete example code:

import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
plt.style.use("ggplot")

# Load pre-built time series example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Plot original code
df.plot()
plt.savefig("TimeSeries-Original.svg")

and this is the code to plot the filtered dataset:

df[df < -0.75] = np.nan
df.plot()
plt.savefig("TimeSeries-NaN.svg")

 

Posted by Uli Köhler in Data science, pandas, Python

How to ignore pylint problem for a specific single line of code

If you have a Python line of code like

writer = pd.ExcelWriter(filename)

that produces a pylint problem message like

Abstract class 'ExcelWriter' with abstract methods instantiated pylint(abstract-class-instantiated)

you can ignore it by adding a comment in the format # pylint: disable=[problem-code] at the end of the line where [problem-code] is the value inside pylint(...) in the pylint message – for example, abstract-class-instantiated for the problem report listed above.

The modified line will look like this:

writer = pd.ExcelWriter(filename) # pylint: disable=abstract-class-instantiated

You can also add the comment line

# pylint: disable=abstract-class-instantiated

to the function-level (or to any other block-level) to disable that pylint rule for the entire current function or block.

Posted by Uli Köhler in Python

How to migrate from ansicolor to ansicolors

The ansicolor package in Python has not been updated in a long time and doesn’t have any documentation on PyPI. Therfore, I migrated my Python scripts to ansicolors (with an s at the end). Only a few steps are neccessary:

First, replace

from ansicolor import red

by

from colors import red

Additionally, bold=True is called style="bold" in ansicolors, hence you need to replace

red("msg", bold=True)

by

red("msg", style="bold")
Posted by Uli Köhler in Python

How to fix matplotlib .ylabel() AttributeError: ‘AxesSubplot’ object has no attribute ‘ylabel’

Problem:

You want to set the ylabel of a matplotlib plot using .ylabel("My ylabel") but you see an error messsage like

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-28a87d95bccc> in <module>
---> 11 axs[0].ylabel("My ylabel")

AttributeError: 'AxesSubplot' object has no attribute 'ylabel'

Solution:

Use .set_ylabel("My ylabel") instead of .ylabel("My ylabel") !

While

from matplotlib import pyplot as plt

plt.ylabel("My ylabel")

works fine, if you have an axes object like the one you get from plt.subplots(), you’ll have to use set_ylabel()!

from matplotlib import pyplot as plt
fig, axs = plt.subplots(2, 1)
# ...
axs[0].set_ylabel("My ylabel")

 

Posted by Uli Köhler in Python

How to fix matplotlib .xlabel() AttributeError: ‘AxesSubplot’ object has no attribute ‘xlabel’

Problem:

You want to set the xlabel of a matplotlib plot using .xlabel("My xlabel") but you see an error messsage like

---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-12-28a87d95bccc> in <module>
---> 11 axs[0].xlabel("My xlabel")

AttributeError: 'AxesSubplot' object has no attribute 'xlabel'

Solution:

Use .set_xlabel("My xlabel") instead of .xlabel("My xlabel") !

While

from matplotlib import pyplot as plt

plt.xlabel("My xlabel")

works fine, if you have an axes object like the one you get from plt.subplots(), you’ll have to use set_xlabel()!

from matplotlib import pyplot as plt
fig, axs = plt.subplots(2, 1)
# ...
axs[0].set_xlabel("My xlabel")

 

Posted by Uli Köhler in Python

How to fix matplotlib .title() TypeError: ‘Text’ object is not callable

Problem:

You want to set the title of a matplotlib plot using .title("My title") but you see an error messsage like

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-11-f5f930f00eac> in <module>
---> 10 axs[0].title("My title")

TypeError: 'Text' object is not callable

Solution:

Use .set_title("My title") instead of .title("My title") !

While

from matplotlib import pyplot as plt

plt.title("My title")

works fine, if you have an axes object like the one you get from plt.subplots(), you’ll have to use set_title()!

from matplotlib import pyplot as plt
fig, axs = plt.subplots(2, 1)
# ...
axs[0].set_title("My title")

 

Posted by Uli Köhler in Python

How to replace string in column names in Pandas DataFrame

Use this snippet in order to replace a string in column names for a pandas DataFrame:

new_df = df.rename(columns=lambda s: s.replace("A", "B")) # df will not be modified !

You can also modify the column names in-place (i.e. modify the original DataFrame):

df.rename(columns=lambda s: s.replace("A", "B"), inplace=True)

For example, if you have the columns ["ColumnA", "X", "Y"] before running .rename(), the result will have ["ColumnB", "X", "Y"] (the "A" has been replaced by "B")

Posted by Uli Köhler in pandas, Python

How to find out if BytesIO is empty in Python

In order to find out if a BytesIO instance is empty or not, get its size and then check if it’s > 0:

my_bytesio.getbuffer().nbytes > 0

The following example shows how to use it in an if clause:

if my_bytesio.getbuffer().nbytes > 0:
    print("my_bytesio is empty")
else:
    print("my_bytesio is NOT empty")

 

Posted by Uli Köhler in Python

How to get size of BytesIO in Python

If you want to find out the size of the data stored in a io.BytesIO instance in Python, use

my_bytesio.getbuffer().nbytes
Posted by Uli Köhler in Python

How to send email with BytesIO attachment via SMTP in Python

This example details how to send an email in Python, with an attachment from a io.BytesIO instance instead of reading the attachment from a file on the filesystem:

#!/usr/bin/env python3
__author__ = "Uli Köhler"
__license__ = "CC0 1.0 Universal (public domain)"
__version__ = "1.0"
import smtplib
import mimetypes
from io import BytesIO
from email.message import EmailMessage

# Create message and set text content
msg = EmailMessage()
msg['Subject'] = 'This email contains an attachment'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
# Set text content
msg.set_content('Please see attached file')

def attach_bytesio_to_email(email, buf, filename):
    """Attach a file identified by filename, to an email message"""
    # Reset read position & extract data
    buf.seek(0)
    binary_data = buf.read()
    # Guess MIME type or use 'application/octet-stream'
    maintype, _, subtype = (mimetypes.guess_type(filename)[0] or 'application/octet-stream').partition("/")
    # Add as attachment
    email.add_attachment(binary_data, maintype=maintype, subtype=subtype, filename=filename)

# Attach files
buf = BytesIO()
buf.write(b"This is a test text")
attach_bytesio_to_email(msg, buf, "test.txt")

def send_mail_smtp(mail, host, username, password):
    s = smtplib.SMTP(host)
    s.starttls()
    s.login(username, password)
    s.send_message(msg)
    s.quit()

send_mail_smtp(msg, 'smtp.my-domain.com', '[email protected]', 'sae7ooka0S')

The script from above is using the following utility functions:

def attach_bytesio_to_email(email, buf, filename):
    """Attach a file identified by filename, to an email message"""
    # Reset read position &amp; extract data
    buf.seek(0)
    binary_data = buf.read()
    # Guess MIME type or use 'application/octet-stream'
    maintype, _, subtype = (mimetypes.guess_type(filename)[0] or 'application/octet-stream').partition("/")
    # Add as attachment
    email.add_attachment(binary_data, maintype=maintype, subtype=subtype, filename=filename

def send_mail_smtp(mail, host, username, password):
    s = smtplib.SMTP(host)
    s.starttls()
    s.login(username, password)
    s.send_message(msg)
    s.quit()

which you can use in your code directly. The easiest way to initialize the email message is using

# Create message and set text content
msg = EmailMessage()
msg['Subject'] = 'This email contains an attachment'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
# Set text content
msg.set_content('Please see attached file')

and then attaching your BytesIO instance (named buf) using

attach_bytesio_to_email(msg, buf, "test.txt")

and once you’re finished with adding attachments, sending the message using

send_mail_smtp(msg, 'smtp.my-domain.com', '[email protected]', 'sae7ooka0S')

 

Posted by Uli Köhler in E-Mail, Python

How to send email with file attachment via SMTP in Python

This example shows how to send an email with an attachment in Python, with the attachment being read from a file from the filesystem:

#!/usr/bin/env python3
import smtplib
import mimetypes
from email.message import EmailMessage

# Create message and set text content
msg = EmailMessage()
msg['Subject'] = 'This email contains an attachment'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
# Set text content
msg.set_content('Please see attached file')

def attach_file_to_email(email, filename):
    """Attach a file identified by filename, to an email message"""
    with open(filename, 'rb') as fp:
        file_data = fp.read()
        maintype, _, subtype = (mimetypes.guess_type(filename)[0] or 'application/octet-stream').partition("/")
        email.add_attachment(file_data, maintype=maintype, subtype=subtype, filename=filename)

# Attach files
attach_file_to_email(msg, "myfile.pdf")

def send_mail_smtp(mail, host, username, password):
    s = smtplib.SMTP(host)
    s.starttls()
    s.login(username, password)
    s.send_message(mail)
    s.quit()

send_mail_smtp(msg, 'smtp.my-domain.com', '[email protected]', 'sae7ooka0S')

The utility functions in this code are:

import smtplib
import mimetypes

def attach_file_to_email(email, filename):
    """Attach a file identified by filename, to an email message"""
    with open(filename, 'rb') as fp:
        file_data = fp.read()
        maintype, _, subtype = (mimetypes.guess_type(filename)[0] or 'application/octet-stream').partition("/")
        email.add_attachment(file_data, maintype=maintype, subtype=subtype, filename=filename)

def send_mail_smtp(mail, host, username, password):
    s = smtplib.SMTP(host)
    s.starttls()
    s.login(username, password)
    s.send_message(mail)
    s.quit()

Initialize your email like this:

# Create message and set text content
msg = EmailMessage()
msg['Subject'] = 'This email contains an attachment'
msg['From'] = '[email protected]'
msg['To'] = '[email protected]'
# Set text content
msg.set_content('Please see attached file')

and then attach the file like this:

attach_file_to_email(msg, "myfile.pdf")

and send the email using

send_mail_smtp(msg, 'smtp.my-domain.com', '[email protected]', 'sae7ooka0S')
Posted by Uli Köhler in Python

How to download Wasabi/S3 object to string/bytes using boto3 in Python

You can use io.BytesIO to store the content of an S3 object in memory and then convert it to bytes which you can then decode to a str. The following example downloads myfile.txt into memory:

# Download to file
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

Full example

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Download to file
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

print(filecontent_str)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to upload string as Wasabi/S3 object using boto3 in Python

In order to upload a Python string like

my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

to an S3-compatible storage like Wasabi or Amazon S3, you need to encode it using .encode("utf-8") and then wrap it in an io.BytesIO object:

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")

Full example:

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Upload string to file
my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to filter for objects in a given S3 directory using boto3

Using boto3, you can filter for objects in a given bucket by directory by applying a prefix filter.

Instead of iterating all objects using

for obj in my_bucket.objects.all():
    pass # ...

(see How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python for a full example)

you can apply a prefix filter using

for obj in my_bucket.objects.filter(Prefix="MyDirectory/"):
    print(obj)

Don’t forget the trailing / for the prefix argument ! Just using filter(Prefix="MyDirectory") without a trailing slash will also match e.g. MyDirectoryFileList.txt.

This complete example prints the object description for every object in the 10k-Test-Objects directory (from our post on How to use boto3 to create a lot of test files in Wasabi / S3 in Python).

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in bucket
for obj in my_bucket.objects.filter(Prefix="MyDirectory"):
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1001.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1002.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1003.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1004.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1005.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1006.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1007.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1008.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1009.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/101.txt')
[...]

 

Posted by Uli Köhler in Python, S3

How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python

This snippet shows you how to iterate over all objects in a bucket:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in bucket
for obj in my_bucket.objects.all():
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
[...]

 

Posted by Uli Köhler in Python, S3

How to use boto3 to create a lot of test files in Wasabi / S3 in Python

The following example code creates 10000 test files on Wasabi / S3. It is based on How to use concurrent.futures map with a tqdm progress bar:

import boto3
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(64)

from tqdm import tqdm
import concurrent.futures
def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Equivalent to executor.map(fn, *iterables),
    but displays a tqdm-based progress bar.
    
    Does not support timeout or chunksize as executor.submit is used internally
    
    **kwargs is passed to tqdm.
    """
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

def create_s3_object(i, directory):
    # Create test data
    buf = io.BytesIO()
    buf.write(f"{i}".encode())
    # Reset read pointer. DOT NOT FORGET THIS, else all uploaded files will be empty!
    buf.seek(0)

    # Upload the file
    boto_test_bucket.upload_fileobj(buf, f"{directory}/{i}.txt")

for _ in tqdm_parallel_map(executor, lambda i: create_s3_object(i, directory="10k-Test-Objects"), range(1, 10001)):
    pass

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL at https://s3.eu-central-1.wasabisys.com.

Note that running this script, especially when creating lots of test files, will send a lot of requests to your S3 provider and, depending on what plan you are using, these requests might be expensive. Wasabi, for example, does not charge for requests but charges for storage (with a minimum of 1TB storage per month being charged, at the time of writing this).

Posted by Uli Köhler in Python, S3

How to use boto3 to upload BytesIO to Wasabi / S3 in Python

This snippet provides a concise example on how to upload a io.BytesIO() object to

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test BytesIO we want to upload
import io
buf = io.BytesIO()
buf.write(b"Hello S3 world!")

# Reset read pointer. DOT NOT FORGET THIS, else all uploaded files will be empty!
buf.seek(0)
    
# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_fileobj(buf, "MyDirectory/test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL at https://s3.eu-central-1.wasabisys.com.

Also don’t forget

buf.seek(0)

or your uploaded files will be empty.

 

Posted by Uli Köhler in Python, S3

How to use boto3 to upload file to Wasabi / S3 in Python

Using boto to upload data to Wasabi is pretty simple, but not well-documented.

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test file we want to upload
with open("upload-test.txt", "w") as outfile:
    outfile.write("Hello S3!")
    
# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_file("upload-test.txt", "MyDirectory/test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL at https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to fix Python ValueError: unsupported format character ‘ ‘ (0x20) at index 3

Problem:

You are trying to use Python format strings like

"Hello %.3 world" % 1.234

but you see an error message like

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-7cbe94e4525d> in <module>
----> 1 "Hello %.3 world" % 1.234

ValueError: unsupported format character ' ' (0x20) at index 9

Solution:

Your format string %.3 is incomplete ! You need to specify the type of value to format, e.g. f for floating point, d for integers, s for strings etc. So instead of %.3, write %.3f, or instead of just % write e.g. %f, %d or %s depending on the data type of the variable you want to insert there.

Posted by Uli Köhler in Python