Let’s assume we have backup objects in an S3 directory like:
production-backup-2022-03-29_14-40-16.xz
production-backup-2022-03-29_14-50-16.xz
production-backup-2022-03-29_15-00-03.xz
production-backup-2022-03-29_15-10-04.xz
production-backup-2022-03-29_15-20-06.xz
production-backup-2022-03-29_15-30-06.xz
production-backup-2022-03-29_15-40-00.xz
production-backup-2022-03-29_15-50-07.xz
production-backup-2022-03-29_16-00-06.xz
production-backup-2022-03-29_16-10-12.xz
production-backup-2022-03-29_16-20-18.xz
production-backup-2022-03-29_16-30-18.xz
production-backup-2022-03-29_16-40-00.xz
production-backup-2022-03-29_16-50-09.xz
production-backup-2022-03-29_17-00-18.xz
production-backup-2022-03-29_17-10-13.xz
production-backup-2022-03-29_17-20-18.xz
production-backup-2022-03-29_17-30-18.xz
production-backup-2022-03-29_17-40-06.xz
production-backup-2022-03-29_17-50-21.xz
production-backup-2022-03-29_18-00-06.xz
And we want to identify the newest one. In situations like this, you often can't rely on the objects' modification timestamps, since these can change when old files are re-synced or when folder structures or names change.
Hence the best approach is to use the timestamp from the filename as a reference point. The timestamp format we're using here is based on our post How to generate filename containing date & time on the command line; if you're using a different object key format, you might need to adjust the date_regex accordingly.
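For clarity, here's a minimal sketch of how this parsing step works in isolation: the same date_regex used in the full script below extracts the timestamp from one of the example filenames and turns it into a datetime:

import re
from datetime import datetime

date_regex = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_(?P<hour>\d{2})-(?P<minute>\d{2})-(?P<second>\d{2})")

# Parse the timestamp from one of the example filenames above
match = date_regex.search("production-backup-2022-03-29_18-00-06.xz")
dt = datetime(*(int(match.group(g)) for g in ("year", "month", "day", "hour", "minute", "second")))
print(dt.isoformat()) # prints 2022-03-29T18:00:06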
The following example script iterates over all objects within a specific S3 folder, sorts them by the timestamp from the filename, chooses the latest one, and downloads it from S3 to the local filesystem.
This script is based on a few of our previous posts, including:
- How to download Wasabi/S3 object to file using boto3 in Python
- How to filter for objects in a given S3 directory using boto3
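In case you just need the gist of those posts, the two building blocks look roughly like this (a minimal sketch; the object key and local filename in the download example are just placeholders):

import boto3

# Create connection to Wasabi / S3 (same setup as in the full script below)
s3 = boto3.resource('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'my-access-key',
    aws_secret_access_key = 'my-password'
)
backups = s3.Bucket('mfwh-backup')

# Filter for objects in a given S3 directory (prefix)
for obj in backups.objects.filter(Prefix="production/"):
    print(obj.key)

# Download an S3 object to a local file (placeholder key & filename)
with open("example-backup.xz", "wb") as outfile:
    backups.download_fileobj("production/example-backup.xz", outfile)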
#!/usr/bin/env python3
import boto3
import re
import os.path
from collections import namedtuple
from datetime import datetime

# Create connection to Wasabi / S3
s3 = boto3.resource('s3',
    endpoint_url = 'https://minio.mydomain.com',
    aws_access_key_id = 'my-access-key',
    aws_secret_access_key = 'my-password'
)
# Get bucket object
backups = s3.Bucket('mfwh-backup')

date_regex = re.compile(r"(?P<year>\d{4})-(?P<month>\d{2})-(?P<day>\d{2})_(?P<hour>\d{2})-(?P<minute>\d{2})-(?P<second>\d{2})")
DatedObject = namedtuple("DatedObject", ["Date", "Object"])

entries = []
# Iterate over objects in bucket
for obj in backups.objects.filter(Prefix="production/"):
    date_match = date_regex.search(obj.key)
    # Ignore other files (without date stamp), if any
    if date_match is None:
        continue
    dt = datetime(year=int(date_match.group("year")),
                  month=int(date_match.group("month")),
                  day=int(date_match.group("day")),
                  hour=int(date_match.group("hour")),
                  minute=int(date_match.group("minute")),
                  second=int(date_match.group("second")))
    entries.append(DatedObject(dt, obj))

# Sort entries by date
entries.sort(key=lambda entry: entry.Date)
newest_date, newest_obj = entries[-1]
#print(f"Downloading {newest_obj.key} from {newest_date.isoformat()}")
filename = os.path.basename(newest_obj.key)
with open(filename, "wb") as outfile:
    backups.download_fileobj(newest_obj.key, outfile)
# Print filename for automation purposes
print(filename)
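Note that the script assumes there is at least one object with a date stamp under the prefix; otherwise entries[-1] will raise an IndexError. Also, since only the newest entry is needed, the full sort isn't strictly necessary. A sketch of a more defensive variant, operating on the same entries list built above:

# Fail with a clear message if no matching backups were found,
# and pick the newest entry directly instead of sorting the whole list
if not entries:
    raise SystemExit("No backup objects with a date stamp found")
newest_date, newest_obj = max(entries, key=lambda entry: entry.Date)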