How to download Wasabi/S3 object to string/bytes using boto3 in Python

You can use io.BytesIO to store the content of an S3 object in memory, obtain it as bytes and then decode those bytes to a str. The following example downloads myfile.txt into memory:

# Download into an in-memory buffer
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

Full example

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Download into an in-memory buffer
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

print(filecontent_str)
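
Alternatively, you can skip the buffer entirely and read the object body directly. A minimal sketch using the same s3 resource as above:

# Read the object content directly into bytes
obj = my_bucket.Object("myfile.txt")
filecontent_bytes = obj.get()["Body"].read()
filecontent_str = filecontent_bytes.decode("utf-8")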

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.
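
For example, if you use plain AWS S3 instead of Wasabi (an assumption for illustration), you can omit endpoint_url and pass the region instead:

# Assumption: plain AWS S3 - no endpoint_url needed, just the region
s3 = boto3.resource('s3',
    region_name = 'eu-central-1',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)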

Posted by Uli Köhler in Python, S3

How to download Wasabi/S3 object to file using boto3 in Python

You can use boto3’s download_fileobj() to download a file from S3 into any writable file object, for example a local file:

with open("myfile.txt", "wb") as outfile:
    my_bucket.download_fileobj("myfile.txt", outfile)

Note that the file needs to be opened in binary mode ("wb").
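
The reason is that download_fileobj() writes raw bytes into the file object, and a file opened in text mode rejects bytes. A minimal local illustration (no S3 involved):

with open("demo.txt", "w") as textfile:
    textfile.write(b"raw bytes")  # raises TypeError: write() argument must be str, not bytes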

Full example

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Download remote object "myfile.txt" to local file "test.txt".
# download_file() is a convenient shortcut that takes a local filename directly,
# so no open() call is needed.
my_bucket.download_file("myfile.txt", "test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in General

How to upload string as Wasabi/S3 object using boto3 in Python

In order to upload a Python string like

my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

to an S3-compatible storage like Wasabi or Amazon S3, you need to encode it using .encode("utf-8") and then wrap it in an io.BytesIO object:

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")

Full example:

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Upload string to file
my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")
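
Alternatively (a sketch, not part of the original example), small payloads can be uploaded with put_object(), which accepts bytes directly without a wrapper object:

my_bucket.put_object(Key="myfile.txt", Body=my_string.encode("utf-8"))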

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

Mini CSS cheat-sheet for editing websites

These are the most common CSS properties I use for editing existing websites.

General

Append !important if your style doesn’t get applied. This overrides other styles, e.g. display: none !important;

display: none, block, inline-block

Colors & font style

Text color

color: red, #FF0000, lightblue – go to https://colorpicker.me/ and copy the hex code

Color of the block behind the text:

background-color: red, #FF0000, lightblue – go to https://colorpicker.me/ and copy the hex code

Other font styling:

font-weight: normal, bold

font-size: 200%, 30px

Spacing

Space outside the element (to other elements)

margin-top / margin-left / margin-right / margin-bottom: 20px

Space inside the element, to its content:

padding-top / padding-left / padding-right / padding-bottom: 20px

Geometric properties

Make the corners of a block rounded instead of sharp:

border-radius: 5px

Width and height of the element

width: 100%, 300px, 30vw

height: 30px, 100%

Posted by Uli Köhler in CSS

How to filter for objects in a given S3 directory using boto3

Using boto3, you can restrict a listing to the objects in a given directory of a bucket by applying a prefix filter.

Instead of iterating all objects using

for obj in my_bucket.objects.all():
    pass # ...

(see How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python for a full example)

you can apply a prefix filter using

for obj in my_bucket.objects.filter(Prefix="MyDirectory/"):
    print(obj)

Don’t forget the trailing / for the Prefix argument! Just using filter(Prefix="MyDirectory") without a trailing slash would also match e.g. MyDirectoryFileList.txt.
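
The prefix is a plain string match on the object key, so you can reason about it locally using startswith() (hypothetical keys for illustration):

keys = ["MyDirectory/data.txt", "MyDirectoryFileList.txt"]
print([k for k in keys if k.startswith("MyDirectory")])   # matches both keys
print([k for k in keys if k.startswith("MyDirectory/")])  # matches only 'MyDirectory/data.txt'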

This complete example prints the object summary for every object in the 10k-Test-Objects directory (from our post on How to use boto3 to create a lot of test files in Wasabi / S3 in Python).

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in the 10k-Test-Objects directory
for obj in my_bucket.objects.filter(Prefix="10k-Test-Objects/"):
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1001.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1002.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1003.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1004.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1005.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1006.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1007.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1008.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1009.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/101.txt')
[...]

Posted by Uli Köhler in Python, S3

How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python

This snippet shows you how to iterate over all objects in a bucket:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in bucket
for obj in my_bucket.objects.all():
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
[...]
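
Each obj is an s3.ObjectSummary, so you can access its metadata attributes directly, for example:

for obj in my_bucket.objects.all():
    print(obj.key, obj.size, obj.last_modified)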

Posted by Uli Köhler in Python, S3

How to use boto3 to create a lot of test files in Wasabi / S3 in Python

The following example code creates 10000 test files on Wasabi / S3. It is based on How to use concurrent.futures map with a tqdm progress bar:

import boto3
import io
import concurrent.futures
from tqdm import tqdm

executor = concurrent.futures.ThreadPoolExecutor(64)

def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Equivalent to executor.map(fn, *iterables),
    but displays a tqdm-based progress bar.
    
    Does not support timeout or chunksize as executor.submit is used internally
    
    **kwargs is passed to tqdm.
    """
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

def create_s3_object(i, directory):
    # Create test data
    buf = io.BytesIO()
    buf.write(f"{i}".encode())
    # Reset read pointer. DO NOT FORGET THIS, else all uploaded files will be empty!
    buf.seek(0)

    # Upload the file
    boto_test_bucket.upload_fileobj(buf, f"{directory}/{i}.txt")

for _ in tqdm_parallel_map(executor, lambda i: create_s3_object(i, directory="10k-Test-Objects"), range(1, 10001)):
    pass

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Note that running this script, especially when creating lots of test files, will send a lot of requests to your S3 provider and, depending on what plan you are using, these requests might be expensive. Wasabi, for example, does not charge for requests but charges for storage (with a minimum of 1TB storage per month being charged, at the time of writing this).

Posted by Uli Köhler in Python, S3

How to use boto3 to upload BytesIO to Wasabi / S3 in Python

This snippet provides a concise example of how to upload an io.BytesIO() object to Wasabi / S3 using boto3:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test BytesIO we want to upload
import io
buf = io.BytesIO()
buf.write(b"Hello S3 world!")

# Reset read pointer. DO NOT FORGET THIS, else all uploaded files will be empty!
buf.seek(0)

# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_fileobj(buf, "MyDirectory/test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Also don’t forget

buf.seek(0)

or your uploaded files will be empty.
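
To see why, note that write() leaves the stream position at the end of the buffer, so any subsequent read returns nothing. A quick local demonstration:

import io

buf = io.BytesIO()
buf.write(b"Hello S3 world!")
print(buf.read())  # b'' - the position is at the end of the stream
buf.seek(0)        # rewind to the beginning
print(buf.read())  # b'Hello S3 world!'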

Posted by Uli Köhler in Python, S3

How to use boto3 to upload file to Wasabi / S3 in Python

Using boto3 to upload data to Wasabi is pretty simple, but not well-documented:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test file we want to upload
with open("upload-test.txt", "w") as outfile:
    outfile.write("Hello S3!")
    
# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_file("upload-test.txt", "MyDirectory/test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

A modern Kimai setup using docker-compose and nginx

This is the setup I use to run multiple production Kimai instances. In my example, I create the files in /opt/kimai-mydomain. The folder name is not critical, but it helps to distinguish multiple independent Kimai instances.

First, let’s create /opt/kimai-mydomain/docker-compose.yml. You don’t need to modify anything in this file, as all relevant configuration is loaded from .env via environment variables.

version: '3.5'
services:
  mariadb:
    image: mariadb:latest
    environment:
      - MYSQL_DATABASE=kimai
      - MYSQL_USER=kimai
      - MYSQL_PASSWORD=${MARIADB_PASSWORD}
      - MYSQL_ROOT_PASSWORD=${MARIADB_ROOT_PASSWORD}
    volumes:
      - ./mariadb_data:/var/lib/mysql
    command: --default-storage-engine innodb
    restart: unless-stopped
    healthcheck:
      test: mysqladmin -p${MARIADB_ROOT_PASSWORD} ping -h localhost
      interval: 20s
      start_period: 10s
      timeout: 10s
      retries: 3

  kimai:
    image: kimai/kimai2:apache-debian-master-prod
    environment:
      - APP_ENV=prod
      - TRUSTED_HOSTS=localhost,${HOSTNAME}
      - [email protected]
      - ADMINPASS=${KIMAI_ADMIN_PASSWORD}
      - DATABASE_URL=mysql://kimai:${MARIADB_PASSWORD}@mariadb/kimai
    volumes:
      - ./kimai_var:/opt/kimai/var
    ports:
      - '17919:8001'
    depends_on:
      - mariadb
    restart: unless-stopped

Now we’ll create the configuration in /opt/kimai-mydomain/.env:

MARIADB_ROOT_PASSWORD=eishi5Pae3chai1Aeth2wiuCh7Ahhi
MARIADB_PASSWORD=su1aesheereithubo0iedootaeRooT
KIMAI_ADMIN_PASSWORD=toiWaeShaiz5Yeifohngu6chunuo6C
[email protected]
HOSTNAME=kimai.mydomain.com

Generate random passwords for .env! Do NOT leave the default passwords in .env!

You also need to set KIMAI_ADMIN_EMAIL and HOSTNAME correctly.

We can now create the Kimai data directory and set the correct permissions:

mkdir -p kimai_var
chown -R 33:33 kimai_var

(33 is the user ID and group ID of the www-data user inside the container)

Now, we will initialize the Kimai database and the user:

docker-compose run kimai console kimai:install -n

Once you see a line like

[Sun Mar 07 23:53:35.986477 2021] [core:notice] [pid 50] AH00094: Command line: '/usr/sbin/apache2 -D FOREGROUND'

stop the process using Ctrl+C, as this line means that Kimai has finished installing.

Now we can create a systemd service that automatically starts Kimai using TechOverflow’s method from Create a systemd service for your docker-compose project in 10 seconds:

curl -fsSL https://techoverflow.net/scripts/create-docker-compose-service.sh | sudo bash /dev/stdin

Now we only need to create an nginx config to reverse proxy your Kimai domain. There is nothing special to consider for this config, hence I’ll show my config just as an example that you can copy and adapt.

server {
    server_name  kimai.mydomain.com;

    location / {
        proxy_pass http://localhost:17919/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_redirect default;
    }

    listen [::]:443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/kimai.mydomain.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/kimai.mydomain.com/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot

}
server {
    if ($host = kimai.mydomain.com) {
        return 301 https://$host$request_uri;
    } # managed by Certbot

    server_name  kimai.mydomain.com;

    listen [::]:80; # managed by Certbot
    return 404; # managed by Certbot
}

After setting up your config (I always recommend setting up TLS using Let’s Encrypt, even for test setups), open your browser and go to your Kimai domain, e.g. https://kimai.mydomain.com. You can log in to Kimai directly using the KIMAI_ADMIN_EMAIL and KIMAI_ADMIN_PASSWORD specified in .env.

Posted by Uli Köhler in Container, Docker

How to install python3 pip / pip3 in Alpine Linux

Problem:

You want to install pip3 (also called python3-pip) in Alpine Linux, but running apk add python3-pip shows you that the package doesn’t exist:

/ # apk add python3-pip
ERROR: unable to select packages:
  python3-pip (no such package):
    required by: world[python3-pip]

Solution:

You need to install py3-pip instead:

apk add py3-pip

Example output:

/ # apk add py3-pip
(1/35) Installing libbz2 (1.0.8-r1)
(2/35) Installing expat (2.2.10-r1)
(3/35) Installing libffi (3.3-r2)
[...]

Posted by Uli Köhler in Container, Docker, Linux

How to fix Python ValueError: unsupported format character ‘ ‘ (0x20) at index 3

Problem:

You are trying to use Python format strings like

"Hello %.3 world" % 1.234

but you see an error message like

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-4-7cbe94e4525d> in <module>
----> 1 "Hello %.3 world" % 1.234

ValueError: unsupported format character ' ' (0x20) at index 9

Solution:

Your format string %.3 is incomplete! You need to specify the type of value to format, e.g. f for floating point, d for integers, s for strings, etc. So instead of %.3, write %.3f; or instead of just %, write e.g. %f, %d or %s, depending on the data type of the variable you want to insert there.
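
For example, these variants all work:

print("Hello %.3f world" % 1.234)  # float, 3 decimal places -> Hello 1.234 world
print("Hello %d world" % 42)       # integer -> Hello 42 world
print("Hello %s world" % "nice")   # string -> Hello nice world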

Posted by Uli Köhler in Python

How to auto-fit Pandas pd.to_excel() XLSX column width

If you export XLSX data using df.to_excel(), the column widths in the spreadsheet are left at their defaults and are not adjusted automatically:

import pandas as pd

# Load example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Export dataset to XLSX
df.to_excel("example.xlsx")

Solution

You can use UliPlot‘s auto_adjust_xlsx_column_width to adjust the column widths automatically. First, install UliPlot:

pip install UliPlot

Then use it like this to export the XLSX:

from UliPlot.XLSX import auto_adjust_xlsx_column_width

# Load example dataset
df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

# Export dataset to XLSX
with pd.ExcelWriter("example.xlsx") as writer:
    df.to_excel(writer, sheet_name="MySheet")
    auto_adjust_xlsx_column_width(df, writer, sheet_name="MySheet", margin=0)

Note that the algorithm currently tends to oversize the columns a bit, but in most cases every type of column will fit.
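
If you prefer to avoid the extra dependency, here is a minimal sketch of the same idea using openpyxl directly; the sizing heuristic below is an assumption for illustration, not UliPlot’s exact algorithm:

import pandas as pd
from openpyxl.utils import get_column_letter

df = pd.read_csv("https://datasets.techoverflow.net/timeseries-example.csv", parse_dates=["Timestamp"])
df.set_index("Timestamp", inplace=True)

with pd.ExcelWriter("example.xlsx", engine="openpyxl") as writer:
    df.to_excel(writer, sheet_name="MySheet")
    worksheet = writer.sheets["MySheet"]
    # Width = longest string in each column (header or cell), plus a little padding
    for idx, column in enumerate(df.reset_index().columns, start=1):
        values = df.reset_index()[column].astype(str)
        max_len = max(values.map(len).max(), len(str(column)))
        worksheet.column_dimensions[get_column_letter(idx)].width = max_len + 2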

Posted by Uli Köhler in pandas, Python

How to fix Windows “echo $PATH” empty result

When you run

echo $PATH

on Windows, you will get an empty result (in PowerShell) or the literal text $PATH (in cmd), because $PATH is Unix shell syntax that Windows shells do not expand.

Instead, if you are in cmd, use

echo %PATH%

but if you are using PowerShell, you need to use

$env:PATH

Posted by Uli Köhler in PowerShell, Windows

AWS / Wasabi S3 policy to allow a single user complete access to a single bucket

Use this S3 policy in your bucket configuration to allow a single user (identified by its ARN) complete access to a single bucket:

{
  "Id": "MyBucketPolicy",
  "Statement": [
    {
      "Sid": "AllAccess",
      "Action": "s3:*",
      "Effect": "Allow",
      "Resource": [
         "arn:aws:s3:::my-bucket-name",
         "arn:aws:s3:::my-bucket-name/*"
      ],
      "Principal": {"AWS":["arn:aws:iam::100000012345:user/MyUser"]}
    }
  ]
}

Replace both instances of arn:aws:s3:::my-bucket-name with your bucket ARN and replace arn:aws:iam::100000012345:user/MyUser with your user’s ARN. You can also add multiple user ARNs to "Principal": {"AWS": ... }.

Posted by Uli Köhler in Data science

How to allow your EtherPad to be included in IFrames using nginx

In our previous post A modern Docker-Compose config for Etherpad using nginx as reverse proxy we showed how to create a simple, reliable Etherpad installation.

However, if you want to include Etherpads on external websites, you’ll see frame-blocking errors in the browser console like

Refused to display 'https://etherpad.nemeon.eu/p/Test123' in a frame because it set 'X-Frame-Options' to 'sameorigin'.

and the Etherpad iframe won’t load.

In order to fix this, we’ll add a line to the nginx config from our previous post. Using this approach, you’ll need to list all the domains that are allowed to include the Etherpad (https://gather.town in this example).

The line to add is

add_header "X-Frame-Options" "Allow-From https://gather.town";

which needs to be added inside the location / { ... } block. Full example for the location / block:

location / {
    proxy_pass http://localhost:17201/;
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "Upgrade";
    proxy_redirect default;
    add_header "X-Frame-Options" "Allow-From https://gather.town";
}

Full nginx config example:

server {
    server_name  etherpad.mydomain.de;
    access_log off;
    error_log /var/log/nginx/etherpad.mydomain.de-error.log;

    location / {
        proxy_pass http://localhost:17201/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "Upgrade";
        proxy_redirect default;
        add_header "X-Frame-Options" "Allow-From https://gather.town";
    }

    listen [::]:443 ssl; # managed by Certbot
    ssl_certificate /etc/letsencrypt/live/etherpad.mydomain.de/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/etherpad.mydomain.de/privkey.pem; # managed by Certbot
    include /etc/letsencrypt/options-ssl-nginx.conf; # managed by Certbot

}

server {
    if ($host = etherpad.mydomain.de) {
        return 301 https://$host$request_uri;
    } # managed by Certbot


    listen [::]:80; # managed by Certbot
    server_name  etherpad.mydomain.de;
    return 404; # managed by Certbot
}

How it works

The Etherpad backend, which nginx reverse-proxies, adds an X-Frame-Options: sameorigin header, effectively disallowing iframes from other domains. Using the add_header clause, nginx sets this value to Allow-From https://gather.town instead, so the browser allows iframe inclusion from the listed domain. If the backend’s original header still shows up in responses, you may additionally need proxy_hide_header X-Frame-Options; so that only the nginx-set value reaches the browser.

Posted by Uli Köhler in Networking, nginx