Check file size before downloading it with Python


If you're yolo-ing on the web and downloading a lot of content, especially arbitrary media files with a crawler, it's useful to check each file's MIME type and size before you actually download it.

To do this with Python's requests module, set stream=True so that only the response headers are fetched up front, then examine the headers for size and MIME type. If they look acceptable, you can retrieve the content.

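More specifically, 'Content-Length' gives the file size in bytes, while 'Content-Type' gives the MIME type. Neither is fully reliable: a server can report the wrong MIME type, and Content-Length may be missing altogether (for example, with chunked responses). Here's a quick example.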

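import requests

MAX_SIZE = 2**20  # 1 MiB
url = "https://i.imgur.com/AD3MbBi.jpeg"

# stream=True fetches only the headers; the body is not downloaded yet
resp = requests.get(url, stream=True)

# Header lookups on resp.headers are case-insensitive
content_type = resp.headers.get("Content-Type", "")
# Content-Length may be absent, so fall back to -1 instead of crashing on None
content_length = int(resp.headers.get("Content-Length", -1))

if content_type == "image/jpeg" and 0 <= content_length < MAX_SIZE:
    # Accessing resp.content triggers the actual download
    with open("image.jpg", "wb") as f:
        f.write(resp.content)
else:
    resp.close()  # release the connection without reading the body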
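
Because the headers can be missing or wrong, it may also be worth enforcing the limit while the body is being read. The following is a minimal sketch, not part of the original example: save_capped is a hypothetical helper that takes any iterable of byte chunks and gives up once more than max_size bytes have arrived.

```python
import os
from typing import Iterable

MAX_SIZE = 2**20  # 1 MiB, same limit as in the example above


def save_capped(chunks: Iterable[bytes], path: str, max_size: int = MAX_SIZE) -> bool:
    """Write chunks to path, stopping once more than max_size bytes arrive.

    Returns True if the whole body was saved, or False if it was too
    large (in which case the partial file is deleted).
    """
    written = 0
    with open(path, "wb") as f:
        for chunk in chunks:
            written += len(chunk)
            if written > max_size:
                break  # too big: stop reading further chunks
            f.write(chunk)
        else:
            # The loop finished without hitting the limit
            return True
    os.remove(path)  # discard the partial download
    return False
```

With requests, the chunks could come from the streamed response, e.g. save_capped(resp.iter_content(chunk_size=8192), "image.jpg"). If it returns False, calling resp.close() drops the connection without reading the rest of the body.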

References