Check file size before downloading it with Python

Published on Oct. 05, 2022

If you're yolo-ing on the web and downloading a lot of content, especially arbitrary media files with a crawler, it's useful to check the MIME type and file size before downloading anything. With Python's requests module, pass stream=True so that only the response headers are fetched up front; the body isn't downloaded until you access resp.content. You can then inspect the headers: 'Content-Length' gives the file size in bytes, while 'Content-Type' gives the MIME type. Neither is fully reliable, since the server sets both: 'Content-Type' can be wrong, and 'Content-Length' may be missing entirely (for example, with chunked responses). Here's a quick example.

import requests

MAX_SIZE = 2**20  # size limit in bytes (1 MiB)
url = ""
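# stream=True fetches only the headers; the body is downloaded on resp.content
resp = requests.get(url, stream=True)
# Fall back to -1 when the server omits Content-Length, so int() never gets None
size = int(resp.headers.get("Content-Length", -1))
if resp.headers.get("Content-Type", "") == "image/jpeg" and 0 <= size < MAX_SIZE:
    content = resp.content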
    with open("image.jpg", 'wb') as f: