S3

C++ S3 GetObject minimal streaming example using minio-cpp

#include <client.h>
#include <iostream>

using std::cout, std::endl;

int main(int argc, char* argv[]) {
  // Create S3 base URL.
  minio::s3::BaseUrl base_url("minio.mydomain.com");

  // Create credential provider.
  minio::creds::StaticProvider provider(
      "my-access-key", "my-secret-key");

  // Create S3 client.
  minio::s3::Client client(base_url, &provider);
  std::string bucket_name = "my-bucket";

  // Build arguments object
  minio::s3::GetObjectArgs args;
  args.bucket = bucket_name;
  args.object = "my-object.txt";
  args.datafunc = [](minio::http::DataFunctionArgs args) -> bool {
    // This function will be called for every data chunk of the object
    cout << args.datachunk;
    return true;
  };

  // Perform the request (calling datafunc for every chunk)
  minio::s3::GetObjectResponse resp = client.GetObject(args);
  // Check whether the request succeeded
  if (resp) {
    cout << endl // end line after file content
         << "data of my-object is received successfully" << endl;
  } else {
    cout << "Error during GetObject(): " << resp.Error().String() << endl;
  }

  return EXIT_SUCCESS;
}
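If the object should be written to a local file instead of stdout, the same datafunc mechanism can be used with a std::ofstream. A minimal sketch, assuming the same client and GetObjectArgs setup as above (the output filename is arbitrary):

#include <fstream>

// ... same client and args setup as above ...
std::ofstream out("my-object.txt", std::ios::binary);
args.datafunc = [&out](minio::http::DataFunctionArgs args) -> bool {
  // Append every received chunk to the local file
  out << args.datachunk;
  return true;
};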

 

Posted by Uli Köhler in C/C++, S3

C++ S3 ListObjects minimal example using minio-cpp

#include <client.h>
#include <iostream>
#include <stdexcept>

int main(int argc, char* argv[]) {
  // Create S3 base URL.
  minio::s3::BaseUrl base_url("minio.mydomain.com");

  // Create credential provider.
  minio::creds::StaticProvider provider(
      "my_access_key", "my_secret_key");

  // Create S3 client.
  minio::s3::Client client(base_url, &provider);
  std::string bucket_name = "my-bucket";

  minio::s3::ListObjectsArgs args;
  args.bucket = bucket_name;
  // Optional prefix filter
  args.prefix = "folder/";

  minio::s3::ListObjectsResult result = client.ListObjects(args);
  for (; result; result++) {
      minio::s3::Item item = *result;
      if (!item) {
        throw std::runtime_error("Error during ListObjects(): " + item.Error().String());
      }
      std::cout << item.name << std::endl;
  }

  return EXIT_SUCCESS;
}

 

Posted by Uli Köhler in C/C++, S3

How to fix boto3 upload_fileobj TypeError: Strings must be encoded before hashing

Problem:

You are trying to use boto3’s upload_fileobj() to upload a file to S3 storage using code such as

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'ACCESS_KEY',
    aws_secret_access_key = 'SECRET_KEY'
)
# Get bucket object
my_bucket = s3.Bucket('my-bucket')
# Upload string to file
with open("testtext.txt", "r") as f:
    my_bucket.upload_fileobj(f, "test.txt")

But when you try to run it, you see the following stacktrace:

Traceback (most recent call last):
  File "/home/uli/dev/MyProject/put.py", line 13, in <module>
    my_bucket.upload_fileobj(f, "test.txt")
  File "/usr/local/lib/python3.10/dist-packages/boto3/s3/inject.py", line 678, in bucket_upload_fileobj
    return self.meta.client.upload_fileobj(
  File "/usr/local/lib/python3.10/dist-packages/boto3/s3/inject.py", line 636, in upload_fileobj
    return future.result()
  File "/usr/local/lib/python3.10/dist-packages/s3transfer/futures.py", line 103, in result
    return self._coordinator.result()
  File "/usr/local/lib/python3.10/dist-packages/s3transfer/futures.py", line 266, in result
    raise self._exception
  File "/usr/local/lib/python3.10/dist-packages/s3transfer/tasks.py", line 139, in __call__
    return self._execute_main(kwargs)
  File "/usr/local/lib/python3.10/dist-packages/s3transfer/tasks.py", line 162, in _execute_main
    return_value = self._main(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/s3transfer/upload.py", line 758, in _main
    client.put_object(Bucket=bucket, Key=key, Body=body, **extra_args)
  File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 530, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/usr/local/lib/python3.10/dist-packages/botocore/client.py", line 933, in _make_api_call
    handler, event_response = self.meta.events.emit_until_response(
  File "/usr/local/lib/python3.10/dist-packages/botocore/hooks.py", line 416, in emit_until_response
    return self._emitter.emit_until_response(aliased_event_name, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/botocore/hooks.py", line 271, in emit_until_response
    responses = self._emit(event_name, kwargs, stop_on_response=True)
  File "/usr/local/lib/python3.10/dist-packages/botocore/hooks.py", line 239, in _emit
    response = handler(**kwargs)
  File "/usr/local/lib/python3.10/dist-packages/botocore/utils.py", line 3088, in conditionally_calculate_md5
    md5_digest = calculate_md5(body, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/botocore/utils.py", line 3055, in calculate_md5
    binary_md5 = _calculate_md5_from_file(body)
  File "/usr/local/lib/python3.10/dist-packages/botocore/utils.py", line 3068, in _calculate_md5_from_file
    md5.update(chunk)
TypeError: Strings must be encoded before hashing

Solution:

You need to open the file in binary mode ("rb"). Instead of

with open("testtext.txt", "r") as f:

use

with open("testtext.txt", "rb") as f:

This will fix the issue.
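For reference, the corrected snippet from above looks like this (endpoint and keys are placeholders):

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'ACCESS_KEY',
    aws_secret_access_key = 'SECRET_KEY'
)
# Get bucket object
my_bucket = s3.Bucket('my-bucket')
# Upload file, opened in binary mode ("rb")
with open("testtext.txt", "rb") as f:
    my_bucket.upload_fileobj(f, "test.txt")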

Posted by Uli Köhler in Python, S3

Example of an AWS4 canonical request & string-to-sign

This canonical request has been extracted by modifying the boto3 source code.

HEAD
/my-bucket/folder/example-object.txt

host:minio.mydomain.com
x-amz-content-sha256:e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855
x-amz-date:20230608T220550Z

host;x-amz-content-sha256;x-amz-date
e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

The corresponding string-to-sign is

AWS4-HMAC-SHA256
20230608T220550Z
20230608/us-east-1/s3/aws4_request
e2d4be537009ba634ecc7a2717df2d74e612c63d31cf4bd8cb94eaf43be3665b

Note that the hash in the string-to-sign does not match the canonical request shown above, since some details of the canonical request have been modified.
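The last line of the string-to-sign is simply the lowercase hex SHA-256 hash of the canonical request. A minimal Python sketch of how it is assembled (canonical_request is a placeholder for the full canonical request shown above):

import hashlib

canonical_request = "..."  # the full canonical request shown above

string_to_sign = "\n".join([
    "AWS4-HMAC-SHA256",                    # algorithm
    "20230608T220550Z",                    # request timestamp
    "20230608/us-east-1/s3/aws4_request",  # credential scope
    hashlib.sha256(canonical_request.encode("utf-8")).hexdigest(),
])
print(string_to_sign)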

Posted by Uli Köhler in S3

How to make S3 HeadObject request using boto3

import boto3

# Create connection to S3
s3 = boto3.client('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'VO5APZH2B2KS75GWORFQ',
    aws_secret_access_key = 'IBVCAVULO2CQTOQEE6VQ'
)

s3.head_object(
    Bucket='my-bucket',
    Key='folder/example-object.txt'
)
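head_object() returns a dict of object metadata. To actually use the result, you can for example print a few of its fields; a minimal sketch, assuming the object exists:

response = s3.head_object(
    Bucket='my-bucket',
    Key='folder/example-object.txt'
)
# Print some of the returned metadata
print(response['ContentLength'], response['ContentType'], response['LastModified'])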

 

Posted by Uli Köhler in Python, S3

How to verify AWS Signature Version 4 implementations

You can use the Python botocore package, which is a dependency of the boto3 AWS client, to verify whether your implementation produces correct HMAC signatures for a given string-to-sign.

To do this, we’ll use a fixed AmzDate (i.e. timestamp) and fixed (but randomly chosen) access keys. The string to sign is also an arbitrary string. The only requirements are that none of the values are empty and that the botocore verification path uses exactly the same values as your own implementation.

After that, compare the output of the botocore implementation with the output of your own implementation. While you might want to check with several different input values, in practice an implementation that produces the correct signature for one string is almost always correct overall (except perhaps for rare corner cases).

Verifying the output

from botocore.auth import SigV4Auth
from collections import namedtuple

# Data structures for isolated testing
Credentials = namedtuple('Credentials', ['access_key', 'secret_key'])
Request = namedtuple('Request', ['context'])

amzDate = "20130524T000000Z" # Fixed date for testing
signer = SigV4Auth(Credentials(
    access_key="GBWZ45MPRGGMO2JILBXA",
    secret_key="346NO6UJCAMHLHX4SMFA"
), "s3", "global")
signature = signer.signature("ThisStringWillBeSigned", Request(
    context={"timestamp": amzDate}
))
print(signature)

With the values given in this script, the output is

3be60989db53028ca485b46a07df9287a1731df74a234ea247a99febb7c2eb31

Verifying intermediate results

If the final signature matches, you’re already finished. There is typically no need to check the intermediate results and input strings.

The SigV4Auth.signature() function doesn’t provide any way of accessing the intermediate results. However, we can copy its source code to obtain the relevant intermediate values and print them as hex (signer is the SigV4Auth instance from the previous snippet):

secret_key="346NO6UJCAMHLHX4SMFA"
datestamp = "20130524"
region_name = "global"
service_name = "s3"
string_to_sign = "ThisStringWillBeSigned"

sign_input =  (f"AWS4{secret_key}").encode(), datestamp
k_date = signer._sign(*sign_input)
k_region = signer._sign(k_date, region_name)
k_service = signer._sign(k_region, service_name)
k_signing = signer._sign(k_service, 'aws4_request')
sign_result = signer._sign(k_signing, string_to_sign, hex=True)

print("Sign input: ", sign_input)
print("k_date: ", k_date.hex(), "of length: ", len(k_date))
print("k_region: ", k_region.hex(), "of length: ", len(k_region))
print("k_service: ", k_service.hex(), "of length: ", len(k_service))
print("k_signing: ", k_signing.hex(), "of length: ", len(k_signing))
print("sign_result: ", sign_result)

This prints:

Sign input:  (b'AWS4346NO6UJCAMHLHX4SMFA', '20130524')
k_date:  a788ed61da3106091ac303738fe248c3d391e851858d9b048d3fddf0494cac61 of length:  32
k_region:  90331d205578b73aeaf4ef9082cbb704111d29364dcae4d4405ddfefc4e6a8b0 of length:  32
k_service:  a0b2fb2efe1977349c647d28e86d373aaa67ca9f452c15c7cfbdb9a4fabd685b of length:  32
k_signing:  e02df2af0ce8890816c931c8e72168921f5f481dfbcaf92a35324b65fc322865 of length:  32
sign_result:  3be60989db53028ca485b46a07df9287a1731df74a234ea247a99febb7c2eb31

 

Posted by Uli Köhler in Python, S3

How to generate random AWS-like access key / secret key in Python

Typically, AWS access keys have a length of 20 characters from the following set: ABCDEFGHIJKLMNOPQRSTUVWXYZ234567 (also called uppercase base32 encoding)

The following Python code can be used to generate random AWS-like access keys & secret keys:

import base64
import secrets

def generate_random_base32_key(length=20):
    """
    Generate a random uppercase base32 string of the given length.
    """
    bytes_length = (length * 5) // 8
    random_bytes = secrets.token_bytes(bytes_length)
    base32_string = base64.b32encode(random_bytes).decode()[:length]
    return base32_string.upper()

# Usage example
random_access_key = generate_random_base32_key(20)
print(random_access_key)
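# Note (not part of the original example): secret keys are typically longer,
# e.g. 40 characters, so the same helper could be reused for a secret-key-like string:
# random_secret_key = generate_random_base32_key(40)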

Example output:

  • IBVCAVULO2CQTOQEE6VQ
  • 5ONXKHGNV3ILTX3BGJGA
  • 2B4QYQ2RFGWK2LSON4YQ
  • VO5APZH2B2KS75GWORFQ
  • SXCBFBDXLUNCJMCFM2SQ
Posted by Uli Köhler in Python, S3

How to capture rclone S3 requests via mitproxy

First, start mitmproxy:

./mitmproxy

Now, in your rclone command, add the following environment variable

https_proxy=http://localhost:8080

and add the following command line flag:

--no-check-certificate

Full example

https_proxy=http://localhost:8080 rclone copy myfile :s3:mybucket/myfolder/ --no-check-certificate --s3-endpoint https://minio.mydomain.com --s3-access-key-id my-access-key --s3-secret-access-key my-secret-key --s3-region global

 

Posted by Uli Köhler in S3

rclone S3 access with command line flags (without config file)

rclone lsd :s3:my-bucket/ --s3-endpoint https://minio.mydomain.com --s3-access-key-id my-access-key --s3-secret-access-key my-secret-key --s3-region global --log-level ERROR

The reason for --log-level ERROR is to suppress the following NOTICE message

2023/06/07 01:47:34 NOTICE: Config file "/home/uli/.config/rclone/rclone.conf" not found - using defaults

 

Posted by Uli Köhler in S3

How to install the MinIO Client (mc) globally on Linux

sudo bash -c 'curl https://dl.min.io/client/mc/release/linux-amd64/mc > /usr/local/bin/mc'
sudo chmod a+x /usr/local/bin/mc
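Afterwards, you can verify that the binary is on the PATH, e.g. by printing its version:

mc --version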

 

Posted by Uli Köhler in Linux, S3

How S3 concepts relate to standard filesystem concepts: Access keys, objects, …

The following mapping is often useful (a short illustration follows after the list):

  • Objects are essentially files
  • Object keys are filenames
  • An Access Key is essentially a username used for access to the S3 storage
  • A Secret Key is essentially a password for a given access key = username
  • Prefixes are folders
  • A region is conceptually a fileserver, although in practice it consists of multiple servers linked together
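As a tiny illustration of this mapping (the object key below is a made-up example), an object key behaves like a full file path that splits into a prefix ("folder") and a filename:

import posixpath

key = "backups/2024/db-dump.sql.xz"   # object key ~ full path of the file
prefix, filename = posixpath.split(key)
print(prefix)    # "backups/2024" ~ the folder / prefix
print(filename)  # "db-dump.sql.xz" ~ the filename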
Posted by Uli Köhler in S3

How to sort files on S3 by timestamp in filename using boto3 & Python

Let’s assume we have backup objects in an S3 directory like:

production-backup-2022-03-29_14-40-16.xz
production-backup-2022-03-29_14-50-16.xz
production-backup-2022-03-29_15-00-03.xz
production-backup-2022-03-29_15-10-04.xz
production-backup-2022-03-29_15-20-06.xz
production-backup-2022-03-29_15-30-06.xz
production-backup-2022-03-29_15-40-00.xz
production-backup-2022-03-29_15-50-07.xz
production-backup-2022-03-29_16-00-06.xz
production-backup-2022-03-29_16-10-12.xz
production-backup-2022-03-29_16-20-18.xz
production-backup-2022-03-29_16-30-18.xz
production-backup-2022-03-29_16-40-00.xz
production-backup-2022-03-29_16-50-09.xz
production-backup-2022-03-29_17-00-18.xz
production-backup-2022-03-29_17-10-13.xz
production-backup-2022-03-29_17-20-18.xz
production-backup-2022-03-29_17-30-18.xz
production-backup-2022-03-29_17-40-06.xz
production-backup-2022-03-29_17-50-21.xz
production-backup-2022-03-29_18-00-06.xz

And we want to identify the newest one. Often in these situations, you can’t really rely on modification timestamps as these can change when syncing old files or when changing folder structures or names.

Hence the best way is to rely on the timestamp from the filename as a reference point. The timestamp format we’re using here is based on our post How to generate filename containing date & time on the command line; if you’re using a different object key format, you might need to adjust the date_regex accordingly.

The following example script iterates over all objects within a specific S3 folder, sorts them by the timestamp from the filename, chooses the latest one and downloads it from S3 to the local filesystem.

This script is based on a few of our previous posts.

#!/usr/bin/env python3
import boto3
import re
import os.path
from collections import namedtuple
from datetime import datetime

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'my-access-key',
    aws_secret_access_key = 'my-password'
)

# Get bucket object
backups = s3.Bucket('mfwh-backup')

date_regex = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_(?P<hour>\d{2})-(?P<minute>\d{2})-(?P<second>\d{2})")

DatedObject =  namedtuple("DatedObject", ["Date", "Object"])
entries = []
# Iterate over objects in bucket
for obj in backups.objects.filter(Prefix="production/"):
    date_match = date_regex.search(obj.key)
    # Ignore other files (without date stamp) if any
    if date_match is None:
        continue
    dt = datetime(year=int(date_match.group("year")), month=int(date_match.group("month")),
        day=int(date_match.group("day")), hour=int(date_match.group("hour")), minute=int(date_match.group("minute")),
        second=int(date_match.group("second")))
    entries.append(DatedObject(dt, obj))
# Sort entries by date
entries.sort(key=lambda entry: entry.Date)

newest_date, newest_obj = entries[-1]
#print(f"Downloading {newest_obj.key} from {newest_date.isoformat()}")
filename = os.path.basename(newest_obj.key)

with open(filename, "wb") as outfile:
    backups.download_fileobj(newest_obj.key, outfile)

# Print filename for automation purposes
print(filename)
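If you only need the newest object, sorting the whole list can also be replaced by a single max() call over the same DatedObject entries:

# Equivalent to sorting and taking the last element
newest_date, newest_obj = max(entries, key=lambda entry: entry.Date)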
Posted by Uli Köhler in S3

A working Traefik & docker-compose minio setup with console

Note: I have not updated this config to use the xl or xl-single storage backends, hence the version is locked at RELEASE.2022-10-24T18-35-07Z

The following config works by using two domains: minio.mydomain.com and console.minio.mydomain.com.

For the basic Traefik setup this is based on, see Simple Traefik docker-compose setup with Lets Encrypt Cloudflare DNS-01 & TLS-ALPN-01 & HTTP-01 challenges. Regarding this setup, the important parts are enabling Docker autodiscovery and defining the certificate resolver (we’re using the ALPN resolver).

Be sure to choose a random MINIO_ROOT_PASSWORD!

version: '3.5'
services:
   minio:
       image: quay.io/minio/minio:RELEASE.2022-10-24T18-35-07Z
       command: server --console-address ":9001" /data
       volumes:
          - ./data:/data
          - ./config:/root/.minio
       environment:
          - MINIO_ROOT_USER=minioadmin
          - MINIO_ROOT_PASSWORD=uikui5choRith0ZieV2zohN5aish5r
          - MINIO_DOMAIN=minio.mydomain.com
          - MINIO_SERVER_URL=https://minio.mydomain.com
          - MINIO_BROWSER_REDIRECT_URL=https://console.minio.mydomain.com
       labels:
          - "traefik.enable=true"
          # Console
          - "traefik.http.routers.minio-console.rule=Host(`console.minio.mydomain.com`)"
          - "traefik.http.routers.minio-console.entrypoints=websecure"
          - "traefik.http.routers.minio-console.tls.certresolver=alpn"
          - "traefik.http.routers.minio-console.service=minio-console"
          - "traefik.http.services.minio-console.loadbalancer.server.port=9001"
          # API
          - "traefik.http.routers.minio.rule=Host(`minio.mydomain.com`)"
          - "traefik.http.routers.minio.entrypoints=websecure"
          - "traefik.http.routers.minio.tls.certresolver=alpn"
          - "traefik.http.routers.minio.service=minio"
          - "traefik.http.services.minio.loadbalancer.server.port=9000"

 

Posted by Uli Köhler in Container, Docker, S3, Traefik

How to view MinIO request logs for debugging

Use the minio client mc like this:

mc admin trace myminio

where myminio is an alias (URL + access key + secret key) which you can set up using mc alias set.
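For example, an alias can be created like this (URL and keys are placeholders for your own values):

mc alias set myminio https://minio.mydomain.com my-access-key my-secret-key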

The trace command will show output like

2022-03-27T18:22:22:000 [403 Forbidden] s3.GetObject minio.mydomain.com/api/v1/login 95.114.116.235    5.488ms      ↑ 273 B ↓ 634 B
2022-03-27T18:22:23:000 [403 Forbidden] s3.ListObjectsV1 minio.mydomain.com/login 95.114.116.235    3.677ms      ↑ 320 B ↓ 584 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.mydomain.com/mybucket/?location=  192.168.192.2     6.089ms      ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.mydomain.com/mybucket/?location=  192.168.192.2     256µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.mydomain.com/mybucket/?location=  192.168.192.2     251µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketVersioning minio.mydomain.com/mybucket/?versioning=  192.168.192.2     407µs       ↑ 211 B ↓ 414 B
2022-03-27T18:24:19:000 [404 Not Found] s3.GetBucketObjectLockConfig minio.mydomain.com/mybucket/?object-lock=  192.168.192.2     519µs       ↑ 211 B ↓ 663 B
2022-03-27T18:24:19:000 [200 OK] s3.GetBucketLocation minio.mydomain.com/mybucket/?location=  192.168.192.2     269µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:19:000 [404 Not Found] s3.GetBucketPolicy minio.mydomain.com/mybucket/?policy=  192.168.192.2     223µs       ↑ 211 B ↓ 621 B
2022-03-27T18:24:19:000 [404 Not Found] s3.GetBucketTagging minio.mydomain.com/mybucket/?tagging=  192.168.192.2     284µs       ↑ 211 B ↓ 608 B
2022-03-27T18:24:19:000 [200 OK] s3.ListObjectsV2 minio.mydomain.com/mybucket/?delimiter=%2F&encoding-type=url&fetch-owner=true&list-type=2&prefix=  192.168.192.2     516.96ms     ↑ 211 B ↓ 1.7 KiB
2022-03-27T18:24:20:000 [200 OK] s3.GetBucketLocation minio.mydomain.com/mybucket/?location=  192.168.192.2     270µs       ↑ 211 B ↓ 444 B
2022-03-27T18:24:20:000 [200 OK] s3.ListObjectsV2 minio.mydomain.com/mybucket/?delimiter=%2F&encoding-type=url&fetch-owner=true&list-type=2&prefix=  192.168.192.2     45.061ms

If you want even more verbose output, use

mc admin trace -v myminio

This will log the entire HTTP request:

minio.mydomain.com [REQUEST s3.GetBucketLocation] [2022-03-27T18:25:20:000] [Client IP: 192.168.192.2]
minio.mydomain.com GET /mybucket/?location=
minio.mydomain.com Proto: HTTP/1.1
minio.mydomain.com Host: minio.mydomain.com
minio.mydomain.com X-Forwarded-Host: minio.mydomain.com
minio.mydomain.com X-Amz-Content-Sha256: UNSIGNED-PAYLOAD
minio.mydomain.com X-Amz-Date: 20220327T162520Z
minio.mydomain.com X-Forwarded-Port: 443
minio.mydomain.com X-Forwarded-Proto: https
minio.mydomain.com X-Forwarded-Server: MyVM
minio.mydomain.com Authorization: AWS4-HMAC-SHA256 Credential=GFAHJAODMI71TXAFCXZW/20220327/us-east-1/s3/aws4_request, SignedHeaders=host;x-amz-content-sha256;x-amz-date;x-amz-security-token, Signature=e1edcc3fb0d2130573f7f6633f9f9130810ee0cebcff3359312084c168f2d428
minio.mydomain.com User-Agent: MinIO (linux; amd64) minio-go/v7.0.23
minio.mydomain.com Content-Length: 0
minio.mydomain.com X-Amz-Security-Token: eyJhbGciOiJIUzUxMiIsInR5cCI6IkpXVCJ9.eyJhY2Nlc3NLZXkiOiJHRkFISkFPRE1JNzFUWEFGQ1haVyIsImV4cCI6MTY0ODQwMTQ0OSwicGFyZW50IjoibWluaW9hZG1pbiJ9.ZiuFcseCRRHOmxFs6j6H6nePV6kt9qBnOJESMCIZ-XiPaQrPm5kMlYHGR2zHOfAxf5EUAX3cN8CFbw9BBAQ-2g
minio.mydomain.com Accept-Encoding: gzip
minio.mydomain.com X-Forwarded-For: 192.168.192.2
minio.mydomain.com X-Real-Ip: 192.168.192.2
minio.mydomain.com 
minio.mydomain.com [RESPONSE] [2022-03-27T18:25:20:000] [ Duration 2.771ms  ↑ 211 B  ↓ 444 B ]
minio.mydomain.com 200 OK
minio.mydomain.com X-Amz-Request-Id: 16E04989FD22A42E
minio.mydomain.com X-Xss-Protection: 1; mode=block
minio.mydomain.com Accept-Ranges: bytes
minio.mydomain.com Content-Length: 128
minio.mydomain.com Content-Security-Policy: block-all-mixed-content
minio.mydomain.com Content-Type: application/xml
minio.mydomain.com Vary: Origin,Accept-Encoding
minio.mydomain.com Server: MinIO
minio.mydomain.com Strict-Transport-Security: max-age=31536000; includeSubDomains
minio.mydomain.com X-Content-Type-Options: nosniff
minio.mydomain.com <?xml version="1.0" encoding="UTF-8"?>
<LocationConstraint xmlns="http://s3.amazonaws.com/doc/2006-03-01/"></LocationConstraint>

 

Posted by Uli Köhler in S3

How to download Wasabi/S3 object to string/bytes using boto3 in Python

You can use io.BytesIO to store the content of an S3 object in memory and then convert it to bytes which you can then decode to a str. The following example downloads myfile.txt into memory:

# Download object into an in-memory buffer
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

Full example

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Download object into an in-memory buffer
buf = io.BytesIO()
my_bucket.download_fileobj("myfile.txt", buf)
# Get file content as bytes
filecontent_bytes = buf.getvalue()
# ... or convert to string
filecontent_str = buf.getvalue().decode("utf-8")

print(filecontent_str)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to upload string as Wasabi/S3 object using boto3 in Python

In order to upload a Python string like

my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

to an S3-compatible storage like Wasabi or Amazon S3, you need to encode it using .encode("utf-8") and then wrap it in an io.BytesIO object:

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")

Full example:

import boto3
import io

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Upload string to file
my_string = "This shall be the content for a file I want to create on an S3-compatible storage"

my_bucket.upload_fileobj(io.BytesIO(my_string.encode("utf-8")), "myfile.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Posted by Uli Köhler in Python, S3

How to filter for objects in a given S3 directory using boto3

Using boto3, you can filter for objects in a given bucket by directory by applying a prefix filter.

Instead of iterating all objects using

for obj in my_bucket.objects.all():
    pass # ...

(see How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python for a full example)

you can apply a prefix filter using

for obj in my_bucket.objects.filter(Prefix="MyDirectory/"):
    print(obj)

Don’t forget the trailing / for the prefix argument! Just using filter(Prefix="MyDirectory") without a trailing slash will also match e.g. MyDirectoryFileList.txt.

This complete example prints the object description for every object in the 10k-Test-Objects directory (from our post on How to use boto3 to create a lot of test files in Wasabi / S3 in Python).

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in the 10k-Test-Objects directory
for obj in my_bucket.objects.filter(Prefix="10k-Test-Objects/"):
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10000.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1001.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1002.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1003.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1004.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1005.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1006.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1007.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1008.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1009.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/101.txt')
[...]

 

Posted by Uli Köhler in Python, S3

How to use boto3 to iterate ALL objects in a Wasabi / S3 bucket in Python

This snippet shows you how to iterate over all objects in a bucket:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
my_bucket = s3.Bucket('boto-test')

# Iterate over objects in bucket
for obj in my_bucket.objects.all():
    print(obj)

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Example output:

s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/1.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/10.txt')
s3.ObjectSummary(bucket_name='boto-test', key='10k-Test-Objects/100.txt')
[...]
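Each obj is an s3.ObjectSummary, so besides printing the whole summary you can also access individual attributes such as the key and the size; a small sketch using the same bucket object:

# Print key and size (in bytes) for every object
for obj in my_bucket.objects.all():
    print(obj.key, obj.size)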

 

Posted by Uli Köhler in Python, S3

How to use boto3 to create a lot of test files in Wasabi / S3 in Python

The following example code creates 10000 test files on Wasabi / S3. It is based on How to use concurrent.futures map with a tqdm progress bar:

import io
import boto3
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(64)

from tqdm import tqdm
def tqdm_parallel_map(executor, fn, *iterables, **kwargs):
    """
    Equivalent to executor.map(fn, *iterables),
    but displays a tqdm-based progress bar.
    
    Does not support timeout or chunksize as executor.submit is used internally
    
    **kwargs is passed to tqdm.
    """
    futures_list = []
    for iterable in iterables:
        futures_list += [executor.submit(fn, i) for i in iterable]
    for f in tqdm(concurrent.futures.as_completed(futures_list), total=len(futures_list), **kwargs):
        yield f.result()

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

def create_s3_object(i, directory):
    # Create test data
    buf = io.BytesIO()
    buf.write(f"{i}".encode())
    # Reset read pointer. DO NOT FORGET THIS, else all uploaded files will be empty!
    buf.seek(0)

    # Upload the file
    boto_test_bucket.upload_fileobj(buf, f"{directory}/{i}.txt")

for _ in tqdm_parallel_map(executor, lambda i: create_s3_object(i, directory="10k-Test-Objects"), range(1, 10001)):
    pass

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Note that running this script, especially when creating lots of test files, will send a lot of requests to your S3 provider and, depending on what plan you are using, these requests might be expensive. Wasabi, for example, does not charge for requests but charges for storage (with a minimum of 1TB storage per month being charged, at the time of writing this).

Posted by Uli Köhler in Python, S3

How to use boto3 to upload BytesIO to Wasabi / S3 in Python

This snippet provides a concise example of how to upload an io.BytesIO() object to Wasabi or another S3-compatible storage using boto3:

import boto3

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://s3.eu-central-1.wasabisys.com',
    aws_access_key_id = 'MY_ACCESS_KEY',
    aws_secret_access_key = 'MY_SECRET_KEY'
)

# Get bucket object
boto_test_bucket = s3.Bucket('boto-test')

# Create a test BytesIO we want to upload
import io
buf = io.BytesIO()
buf.write(b"Hello S3 world!")

# Reset read pointer. DO NOT FORGET THIS, else all uploaded files will be empty!
buf.seek(0)
    
# Upload the file. "MyDirectory/test.txt" is the name of the object to create
boto_test_bucket.upload_fileobj(buf, "MyDirectory/test.txt")

Don’t forget to fill in MY_ACCESS_KEY and MY_SECRET_KEY. Depending on what region and what S3-compatible service you use, you might need to use another endpoint URL instead of https://s3.eu-central-1.wasabisys.com.

Also don’t forget

buf.seek(0)

or your uploaded files will be empty.
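As an alternative to wrapping the data in a BytesIO and rewinding it, small in-memory payloads can also be uploaded directly with put_object; a sketch using the same bucket object:

# Uploads the bytes directly, no file-like object or seek(0) needed
boto_test_bucket.put_object(Key="MyDirectory/test.txt", Body=b"Hello S3 world!")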

 

Posted by Uli Köhler in Python, S3