How to modify a file inside a ZIP file using Python

Python provides the zipfile module to read and write ZIP files. Our previous posts "Python example: List files in ZIP archive" and "Downloading & reading a ZIP file in memory using Python" show how to list and read files inside a ZIP file.

In this example, we will show how to copy the files from one ZIP file to another and modify one of the files in the process. This is often needed when you use ZIP-based file formats like ODT or LBX as templates and replace parts of the text content of a file.

import zipfile

# srcfile and dstfile are the paths of the input and output ZIP files
with zipfile.ZipFile(srcfile) as inzip, zipfile.ZipFile(dstfile, "w") as outzip:
    # Iterate the input files
    for inzipinfo in inzip.infolist():
        # Read input file
        with inzip.open(inzipinfo) as infile:
            if inzipinfo.filename == "test.txt":
                content = infile.read()
                # Modify the content of the file by replacing a string.
                # read() returns bytes, so the arguments must be bytes as well
                content = content.replace(b"abc", b"123")
                # Write content
                outzip.writestr(inzipinfo.filename, content)
            else: # Other file, don't want to modify => just copy it
                outzip.writestr(inzipinfo.filename, infile.read())

After opening both the input file and the output ZIP using

with zipfile.ZipFile(srcfile) as inzip, zipfile.ZipFile(dstfile, "w") as outzip:

we iterate through all the files in the input ZIP file:

for inzipinfo in inzip.infolist():

If we encounter the file we want to modify, which is identified by its filename test.txt:

if inzipinfo.filename == "test.txt":

we read and modify the content …

with inzip.open(inzipinfo) as infile:
    content = infile.read().replace(b"abc", b"123")

… and write the modified content to the output ZIP:

outzip.writestr("test.txt", content)
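
In Python 3, infile.read() returns bytes, so the replacement has to be done either with bytes arguments or on decoded text. If working with strings is more convenient, something like the following sketch might help. It assumes the file is UTF-8 encoded, and the helper name replace_text is made up for this example:

```python
def replace_text(data: bytes, old: str, new: str, encoding: str = "utf-8") -> bytes:
    """Decode raw ZIP entry bytes, replace a substring, and re-encode.

    Hypothetical helper; the UTF-8 default is an assumption about the file.
    """
    text = data.decode(encoding)
    return text.replace(old, new).encode(encoding)


# Example with an in-memory value instead of a real ZIP entry
content = replace_text(b"abc def abc", "abc", "123")
```

The result is bytes again, so it can be passed to outzip.writestr() unchanged.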

Otherwise, if the current file is not the file we want to modify, we just copy it to the output ZIP using

outzip.writestr(inzipinfo.filename, infile.read())
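
One thing to be aware of: when writestr() is given only a file name, the new entry uses the compression method of the output ZipFile (uncompressed by default) and the current time as its timestamp. If you would rather keep the compression method and timestamp of each input entry, one possible variant is to pass the ZipInfo object itself to writestr(). This is only a sketch; the function name copy_zip_keep_metadata and its parameters are made up for this example:

```python
import zipfile


def copy_zip_keep_metadata(srcfile, dstfile, target, old, new):
    """Copy srcfile to dstfile, replacing bytes `old` with `new` in `target`.

    Each entry is written using the ZipInfo of the input entry, so its
    compression method and timestamp are reused in the output ZIP.
    """
    with zipfile.ZipFile(srcfile) as inzip, zipfile.ZipFile(dstfile, "w") as outzip:
        for inzipinfo in inzip.infolist():
            # Read first: writestr() updates the ZipInfo object it is given
            content = inzip.read(inzipinfo)
            if inzipinfo.filename == target:
                content = content.replace(old, new)
            outzip.writestr(inzipinfo, content)
```

It could be called as copy_zip_keep_metadata("in.zip", "out.zip", "test.txt", b"abc", b"123"), where the two ZIP file names are placeholders.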

Note that this approach always calls .read() on each file from the input ZIP, so the entire uncompressed content of that file is temporarily held in memory. It therefore doesn’t work well for files that are large when uncompressed.
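
If large files are a concern, the entries that are only copied could be streamed in chunks instead. The following is a minimal sketch of that idea, not a drop-in replacement: it assumes the file to modify is small enough to read at once, and the function name copy_zip_streaming and its parameters are made up for this example.

```python
import shutil
import zipfile


def copy_zip_streaming(srcfile, dstfile, target, old, new):
    """Copy srcfile to dstfile; only `target` is read into memory as a whole."""
    with zipfile.ZipFile(srcfile) as inzip, zipfile.ZipFile(dstfile, "w") as outzip:
        for inzipinfo in inzip.infolist():
            if inzipinfo.filename == target:
                # Assumed to be small: read, modify and write in one go
                content = inzip.read(inzipinfo).replace(old, new)
                outzip.writestr(inzipinfo.filename, content)
            else:
                # Stream all other entries chunk by chunk instead of read()
                with inzip.open(inzipinfo) as infile, \
                        outzip.open(inzipinfo.filename, "w") as outfile:
                    shutil.copyfileobj(infile, outfile)
```

Opening an entry for writing with outzip.open(..., "w") requires Python 3.6 or later. For entries that may exceed 2 GiB, the zipfile documentation says to pass force_zip64=True to open().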