S3

How do S3 concepts relate to standard filesystem concepts: Access keys, objects, …

The following mapping is often useful:

  • Objects are essentially files
  • Object keys are filenames
  • An Access Key is essentially a username used for access to the S3 storage
  • A Secret Key is essentially the password for a given Access Key (= username)
  • Prefixes are folders
  • A region is conceptually a fileserver, although in practice it consists of multiple servers linked together
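
To make the analogy concrete, here is a minimal boto3 sketch annotated with the mapping above (endpoint, credentials and bucket name are placeholders, not a real deployment):

import boto3

s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.example.com',   # the "fileserver"
    aws_access_key_id = 'MY_ACCESS_KEY',       # the "username"
    aws_secret_access_key = 'MY_SECRET_KEY'    # the "password"
)
my_bucket = s3.Bucket('my-bucket')
# The prefix acts like a folder; each object key is a "filename"
for obj in my_bucket.objects.filter(Prefix="backups/"):
    print(obj.key)
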
Posted by Uli Köhler in S3

How to sort files on S3 by timestamp in filename using boto3 & Python

Let’s assume we have backup objects in an S3 directory like:

production-backup-2022-03-29_14-40-16.xz
production-backup-2022-03-29_14-50-16.xz
production-backup-2022-03-29_15-00-03.xz
production-backup-2022-03-29_15-10-04.xz
production-backup-2022-03-29_15-20-06.xz
production-backup-2022-03-29_15-30-06.xz
production-backup-2022-03-29_15-40-00.xz
production-backup-2022-03-29_15-50-07.xz
production-backup-2022-03-29_16-00-06.xz
production-backup-2022-03-29_16-10-12.xz
production-backup-2022-03-29_16-20-18.xz
production-backup-2022-03-29_16-30-18.xz
production-backup-2022-03-29_16-40-00.xz
production-backup-2022-03-29_16-50-09.xz
production-backup-2022-03-29_17-00-18.xz
production-backup-2022-03-29_17-10-13.xz
production-backup-2022-03-29_17-20-18.xz
production-backup-2022-03-29_17-30-18.xz
production-backup-2022-03-29_17-40-06.xz
production-backup-2022-03-29_17-50-21.xz
production-backup-2022-03-29_18-00-06.xz

And we want to identify the newest one. Often in these situations, you can’t really rely on modification timestamps, as these can change when syncing old files or when changing folder structures or names.

Hence the best way is to rely on the timestamp from the filename as a reference point. The timestamp format we’re using here is based on our post How to generate filename containing date & time on the command line; if you’re using a different object key format, you might need to adjust the date_regex accordingly.
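
For example, if your object keys used an ISO-like timestamp such as 2022-03-29T14:40:16 (a hypothetical format, not the one from our post), only the separators in the regex would need to change:

import re

# Hypothetical variant for keys like "production-backup-2022-03-29T14:40:16.xz"
date_regex = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})T(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})")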

The following example script iterates over all objects within a specific S3 folder, sorts them by the timestamp from the filename, chooses the latest one and downloads it from S3 to the local filesystem.

This script builds on techniques from several of our previous posts.

#!/usr/bin/env python3
import boto3
import re
import os.path
from collections import namedtuple
from datetime import datetime

# Create connection to MinIO / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'my-access-key',
    aws_secret_access_key = 'my-password'
)

# Get bucket object
backups = s3.Bucket('mfwh-backup')

date_regex = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_(?P<hour>\d{2})-(?P<minute>\d{2})-(?P<second>\d{2})")

DatedObject =  namedtuple("DatedObject", ["Date", "Object"])
entries = []
# Iterate over objects in bucket
for obj in backups.objects.filter(Prefix="production/"):
    date_match = date_regex.search(obj.key)
    # Ignore other files (without date stamp) if any
    if date_match is None:
        continue
    dt = datetime(year=int(date_match.group("year")), month=int(date_match.group("month")),
        day=int(date_match.group("day")), hour=int(date_match.group("hour")), minute=int(date_match.group("minute")),
        second=int(date_match.group("second")))
    entries.append(DatedObject(dt, obj))
# Sort entries by date
entries.sort(key=lambda entry: entry.Date)

newest_date, newest_obj = entries[-1]
#print(f"Downloading {newest_obj.key} from {newest_date.isoformat()}")
filename = os.path.basename(newest_obj.key)

with open(filename, "wb") as outfile:
    backups.download_fileobj(newest_obj.key, outfile)

# Print filename for automation purposes
print(filename)
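
By the way, since we only need the single newest object, sorting the full list is not strictly necessary; an equivalent variant of that step uses max():

# Equivalent to sorting by date and taking the last element
# (assumes entries is non-empty)
newest_date, newest_obj = max(entries, key=lambda entry: entry.Date)
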
Posted by Uli Köhler in S3

A working Traefik & docker-compose MinIO setup with console

The following config works by using two domains: minio.mydomain.com and console.minio.mydomain.com.

For the basic Traefik setup this is based on, see Simple Traefik docker-compose setup with Lets Encrypt Cloudflare DNS-01 & TLS-ALPN-01 & HTTP-01 challenges. Regarding this setup, the important parts are enabling the Docker autodiscovery and defining the certificate resolver (we’re using the ALPN resolver).

Be sure to choose a random MINIO_ROOT_PASSWORD!

version: '3.5'
services:
   minio:
       image: quay.io/minio/minio:latest
       command: server --console-address ":9001" /data
       volumes:
          - ./data:/data
          - ./config:/root/.minio
       environment:
          - MINIO_ROOT_USER=minioadmin
          - MINIO_ROOT_PASSWORD=uikui5choRith0ZieV2zohN5aish5r
          - MINIO_DOMAIN=minio.mydomain.com
          - MINIO_SERVER_URL=https://minio.mydomain.com
          - MINIO_BROWSER_REDIRECT_URL=https://console.minio.mydomain.com
       labels:
          - "traefik.enable=true"
          # Console
          - "traefik.http.routers.minio-console.rule=Host(`console.minio.mydomain.com`)"
          - "traefik.http.routers.minio-console.entrypoints=websecure"
          - "traefik.http.routers.minio-console.tls.certresolver=alpn"
          - "traefik.http.routers.minio-console.service=minio-console"
          - "traefik.http.services.minio-console.loadbalancer.server.port=9001"
          # API
          - "traefik.http.routers.minio.rule=Host(`minio.mydomain.com`)"
          - "traefik.http.routers.minio.entrypoints=websecure"
          - "traefik.http.routers.minio.tls.certresolver=alpn"
          - "traefik.http.routers.minio.service=minio"
          - "traefik.http.services.minio.loadbalancer.server.port=9000"

Posted by Uli Köhler in Container, Docker, S3, Traefik

How to view MinIO request logs for debugging

Use the MinIO client mc like this:

mc admin trace myminio

where myminio is an alias (URL + access key + secret key) which you can set up using mc alias ....
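
For example, an alias for the MinIO setup from our previous post could be created like this (URL and credentials are placeholders):

mc alias set myminio https://minio.mydomain.com my-access-key my-password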

The trace command will show output like:

2022-03-27T18:22:22:000 [403 Forbidden] s3.GetObject minio.haar.techoverflow.net/api/v1/login 95.114.116.235    5.488ms      ↑ 273 B ↓ 634 B
2022-03-27T18:22:23:000 [403 Forbidden] s3.ListObjectsV1 minio.haar.techoverflow.net/login 95.114.116.235    3.677ms      ↑ 320 B ↓ 584 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.haar.techoverflow.net/mybucket/?location=  192.168.192.2     6.089ms      ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.haar.techoverflow.net/mybucket/?location=  192.168.192.2     256µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.haar.techoverflow.net/mybucket/?location=  192.168.192.2     251µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketVersioning minio.haar.techoverflow.net/mybucket/?versioning=  192.168.192.2     407µs       ↑ 211 B ↓ 414 B
2022-03-27T18:24:19:000 [404 Not Found] s3.GetBucketObjectLockConfig minio.haar.techoverflow.net/mybucket/?object-lock=  192.168.192.2     519µs       ↑ 211 B ↓ 663 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.haar.techoverflow.net/mybucket/?location=  192.168.192.2     269µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [404 Not Found] s3.GetBucketPolicy minio.haar.techoverflow.net/mybucket/?policy=  192.168.192.2     223µs       ↑ 211 B ↓ 621 B
2022-03-27T18:24:19:000 [404 Not Found] s3.GetBucketTagging minio.haar.techoverflow.net/mybucket/?tagging=  192.168.192.2     284µs       ↑ 211 B ↓ 608 B
2022-03-27T18:24:19:000 [200 OK] s3.ListObjectsV2 minio.haar.techoverflow.net/mybucket/?delimiter=%2F&encoding-type=url&fetch-owner=true&list-type=2&prefix=  192.168.192.2     516.96ms     ↑ 211 B ↓ 1.7 KiB
2022-03-27T18:24:20:000 [200 OK] s3.GetBucketLocation minio.haar.techoverflow.net/mybucket/?location=  192.168.192.2     270µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:20:000 [200 OK] s3.ListObjectsV2 minio.haar.techoverflow.net/mybucket/?delimiter=%2F&encoding-type=url&fetch-owner=true&list-type=2&prefix=  192.168.192.2     45.061ms

If you want even more verbose output, use

mc admin trace -v myminio

This will log entire HTTP requests and responses:

minio.mydomain.com [REQUEST s3.GetBucketLocation] [2022-03-27T18:25:20:000] [Client IP: 192.168.192.2]
minio.mydomain.com GET /mybucket/?location=
minio.mydomain.com Proto: HTTP/1.1
minio.mydomain.com Host: minio.mydomain.com
minio.mydomain.com X-Forwarded-Host: minio.mydomain.com
minio.mydomain.com X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
minio.mydomain.com X-Amz-Date: 20220327T162520Z
minio.mydomain.com X-Forwarded-Port: 443
minio.mydomain.com X-Forwarded-Proto: https
minio.mydomain.com X-Forwarded-Server: MyVM
minio.mydomain.com Authorization: AWS4-HMAC-SHA256 Credential=GFAHJAODMI71TXAFCXZW/20220327/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=e1edcc3fb0d2130573f7f6633f9f9130810ee0cebcff3359312084c168f2d428
minio.mydomain.com User-Agent: MinIO (linux; amd64) minio-go/v7.0.23
minio.mydomain.com Content-Length: 0
minio.mydomain.com X-Amz-Security-Token: eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJHRkFISkFPRE1JNzFUWEFGQ1haVyIsImV4cCI6MTY0ODQwMTQ0OSwicGFyZW50IjoibWluaW9hZG1pbiJ9.ZiuFcseCRRHOmxFs6j6H6nePV6kt9qBnOJESMCIZ-XiPaQrPm5kMlYHGR2zHOfAxf5EUAX3cN8CFbw9BBAQ-2g
minio.mydomain.com Accept-Encoding: gzip
minio.mydomain.com X-Forwarded-For: 192.168.192.2
minio.mydomain.com X-Real-Ip: 192.168.192.2
minio.mydomain.com 
minio.mydomain.com [RESPONSE] [2022-03-27T18:25:20:000] [ Duration 2.771ms  ↑ 211 B  ↓ 444 B ]
minio.mydomain.com 200 OK
minio.mydomain.com X-Amz-Request-Id: 16E04989FD22A42E
minio.mydomain.com X-Xss-Protection: 1; mode=block
minio.mydomain.com Accept-Ranges: bytes
minio.mydomain.com Content-Length: 128
minio.mydomain.com Content-Security-Policy: block-all-mixed-content
minio.mydomain.com Content-Type: application/xml
minio.mydomain.com Vary: Origin,Accept-Encoding
minio.mydomain.com Server: MinIO
minio.mydomain.com Strict-Transport-Security: max-age=31536000; includeSubDomains
minio.mydomain.com X-Content-Type-Options: nosniff
minio.mydomain.com <?xml version="1.0" encoding="UTF-8"?>
<LocationConstraint xmlns="http://s3.amazonaws.com/doc/2006-03-01/"></LocationConstraint>

Posted by Uli Köhler in S3

How to download Wasabi/S3 object to string/bytes using boto3 in Python

You can use io.BytesIO to store the content of an S3 object in memory and then convert it to bytes which you can then decode to a str. The following example downloads myfile.txt into memory:

# Download object into in-memory buffer
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

Full example:

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Download object into in-memory buffer
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

print(filecontent_str)
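
As an alternative to download_fileobj(), you can read the object directly via its Object resource; a short sketch using the same placeholder bucket and key as above:

# Equivalent approach without an explicit BytesIO buffer
obj = s3.Object('boto-test', 'myfile.txt')
filecontent_bytes = obj.get()['Body'].read()
filecontent_str = filecontent_bytes.decode("utf-8")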

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to upload string as Wasabi/S3 object using boto3 in Python

In order to upload a Python string like

my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

to an S3-compatible storage like Wasabi or Amazon S3, you need to encode it using .encode("utf-8") and then wrap it in an io.BytesIO object:

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")

Full example:

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Upload string to file
my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")
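
Alternatively, you can skip the io.BytesIO wrapper entirely, since put_object() accepts bytes directly; a sketch using the same placeholder names:

# Upload the encoded string in one call
my_bucket.put_object(Key="myfile.txt", Body=my_string.encode("utf-8"))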

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to filter for objects in a given S3 directory using boto3

Using boto3, you can filter for objects in a given bucket by directory by applying a prefix filter.

Instead of iterating all objects using

for obj in my_bucket.objects.all():
    pass # ...

(see How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python for a full example)

you can apply a prefix filter using

for obj in my_bucket.objects.filter(Prefix="MyDirectory/"):
    print(obj)

Don’t forget the trailing / for the Prefix argument! Just using filter(Prefix="MyDirectory") without a trailing slash will also match e.g. MyDirectoryFileList.txt.
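
If the prefix comes from configuration or user input, it can be worth normalizing it defensively, as in this small sketch:

prefix = "MyDirectory"
# Ensure we only match keys inside the "directory",
# not keys that merely start with the same name
if not prefix.endswith("/"):
    prefix += "/"
for obj in my_bucket.objects.filter(Prefix=prefix):
    print(obj)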

This complete example prints the object description for every object in the 10k-Test-Objects directory (from our post on How to use boto3 to create a lot of test files in Wasabi / S3 in Python).

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in bucket
for obj in my_bucket.objects.filter(Prefix="10k-Test-Objects/"):
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1001.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1002.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1003.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1004.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1005.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1006.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1007.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1008.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1009.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/101.txt')
[...]

Posted by Uli Köhler in Python, S3

How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python

This snippet shows you how to iterate over all objects in a bucket:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in bucket
for obj in my_bucket.objects.all():
    print(obj)
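
When experimenting with large buckets, you might not want to iterate everything; boto3 collections also support limit(), for example:

# Print only the first 10 objects instead of the whole bucket
for obj in my_bucket.objects.limit(10):
    print(obj)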

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
[...]

Posted by Uli Köhler in Python, S3

How to use boto3 to create a lot of test files in Wasabi / S3 in Python

The following example code creates 10000 test files on Wasabi / S3. It is based on How to use concurrent.futures map with a tqdm progress bar:

import io
import boto3
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(64)

from tqdm import tqdm
def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Equivalent to executor.map(fn, *iterables),
    but displays a tqdm-based progress bar.
    
    Does not support timeout or chunksize as executor.submit is used internally
    
    **kwargs is passed to tqdm.
    """
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

def create_s3_object(i, directory):
    # Create test data
    buf = io.BytesIO()
    buf.write(f"{i}".encode())
    # Reset read pointer. DO NOT FORGET THIS, else all uploaded files will be empty!
    buf.seek(0)

    # Upload the file
    boto_test_bucket.upload_fileobj(buf, f"{directory}/{i}.txt")

for _ in tqdm_parallel_map(executor, lambda i: create_s3_object(i, directory="10k-Test-Objects"), range(1, 10001)):
    pass

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Note that running this script, especially when creating lots of test files, will send many requests to your S3 provider, and depending on your plan these requests might be expensive. Wasabi, for example, does not charge for requests but does charge for storage (with a minimum of 1 TB of storage per month being billed at the time of writing).
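
To clean up after such a test run, you can batch-delete everything under the test prefix; a sketch assuming the same bucket and directory as above:

# Batch-delete every object whose key starts with the test prefix
boto_test_bucket.objects.filter(Prefix="10k-Test-Objects/").delete()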

Posted by Uli Köhler in Python, S3

How to use boto3 to upload BytesIO to Wasabi / S3 in Python

This snippet provides a concise example of how to upload an io.BytesIO() object to Wasabi / S3:

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test BytesIO we want to upload
buf = io.BytesIO()
buf.write(b"Hello S3 world!")

# Reset read pointer. DO NOT FORGET THIS, else all uploaded files will be empty!
buf.seek(0)
    
# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_fileobj(buf, "MyDirectory/test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Also don’t forget

buf.seek(0)

or your uploaded files will be empty.

Posted by Uli Köhler in Python, S3

How to use boto3 to upload file to Wasabi / S3 in Python

Using boto3 to upload data to Wasabi is pretty simple, but not well-documented.

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test file we want to upload
with open("upload-test.txt", "w") as outfile:
    outfile.write("Hello S3!")
    
# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_file("upload-test.txt", "MyDirectory/test.txt")
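
The download counterpart works analogously using download_file(); a quick sketch that fetches the object we just uploaded (the local filename is arbitrary):

# Download the object back to a local file
boto_test_bucket.download_file("MyDirectory/test.txt", "downloaded-test.txt")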

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3