Hacking Pylons for handling large file upload
features a customized Cascade middleware which intentionally does not copy the POSTed data to a tempfile, thus allowing cgi.maxlen to raise an error. In my experiments with the Paste server, I observed that this approach does not actually close the client connection when a too-large body is received. It also relies on the internals of Cascade and on the undocumented "cgi.maxlen" variable, and it is not compatible with middleware that reads the POST body, since the body may already have been consumed by a previously cascaded app.
After some conversations on the Paste list, the recommendation was instead to simply reject requests based on the "Content-length" header. While this at first seemed odd, as an attacker could simply omit or forge this value, some perusal of the inner workings of the Paste server revealed that the LimitedLengthFile applied to the input guarantees that at most "Content-length" bytes will be read. So you can in fact rely on it to guard the size of incoming content.
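To make that guarantee concrete, here is a minimal sketch of a length-limiting wrapper around an input stream. It only illustrates the idea and is not Paste's actual LimitedLengthFile code; the class name `LengthLimitedInput` and its attributes are hypothetical.

```python
import io


class LengthLimitedInput(object):
    """Hypothetical wrapper that never reads past `length` bytes."""

    def __init__(self, stream, length):
        self.stream = stream
        self.remaining = length

    def read(self, size=-1):
        # Clamp the requested size to what the declared length still allows.
        if size is None or size < 0 or size > self.remaining:
            size = self.remaining
        data = self.stream.read(size)
        self.remaining -= len(data)
        return data


# The client declares 5 bytes but sends more; only the first 5 are ever read.
raw = io.BytesIO(b"hello, and much more!!")
body = LengthLimitedInput(raw, 5)
first = body.read()
second = body.read()
```

With a wrapper like this in front of the input, a forged small "Content-length" only truncates what the application sees, and a large one can be rejected before anything is read.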
So the solution is simple:

    from webob import Request
    from webob.exc import HTTPBadRequest


    class LimitUploadSize(object):
        """WSGI middleware that rejects POSTs whose declared size is too large."""

        def __init__(self, app, size):
            self.app = app
            self.size = size  # maximum allowed body size, in bytes

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.method == 'POST':
                # The server reads no more than the declared length,
                # so the header can be trusted as an upper bound.
                length = req.headers.get('Content-length')
                if not length:
                    return HTTPBadRequest("No content-length header specified")(environ, start_response)
                elif int(length) > self.size:
                    return HTTPBadRequest("POST body exceeds maximum limits")(environ, start_response)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

Simply place this at the very start of the middleware chain, i.e. at the bottom of make_app() in middleware.py:

    app = LimitUploadSize(app, 2000000)
    return app

In my tests with this approach, uploading a very large file errors out immediately; it's clear that as soon as the server sees the invalid header, the client is disconnected and no data is transmitted.
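The check can also be exercised without a running server. The sketch below is a dependency-free variant of the same idea using only the standard library, not the WebOb-based class above; the names `limit_upload_size`, `echo_app` and `call` are hypothetical, and the 400 status mirrors HTTPBadRequest.

```python
from wsgiref.util import setup_testing_defaults


def limit_upload_size(app, size):
    """Hypothetical plain-WSGI wrapper rejecting POSTs by Content-Length."""
    def middleware(environ, start_response):
        if environ.get("REQUEST_METHOD") == "POST":
            declared = environ.get("CONTENT_LENGTH")
            try:
                too_big = int(declared) > size
            except (TypeError, ValueError):
                too_big = True  # missing or non-numeric header
            if too_big:
                start_response("400 Bad Request",
                               [("Content-Type", "text/plain")])
                return [b"POST body has no valid length or is too large"]
        return app(environ, start_response)
    return middleware


def echo_app(environ, start_response):
    """Stand-in for the real application."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"accepted"]


def call(app, method, content_length=None):
    """Invoke a WSGI app with a synthetic environ; return the status line."""
    environ = {}
    setup_testing_defaults(environ)
    environ["REQUEST_METHOD"] = method
    if content_length is not None:
        environ["CONTENT_LENGTH"] = str(content_length)
    seen = []
    body = app(environ, lambda status, headers: seen.append(status))
    list(body)  # drain the response iterable
    return seen[0]


app = limit_upload_size(echo_app, 2000000)
small = call(app, "POST", 1024)
large = call(app, "POST", 5000000)
missing = call(app, "POST")
```

Note that in all three cases the request body is never touched; the decision is made from the header alone.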
In my own application I've customized LimitUploadSize further to work conditionally, based on the login credentials of the user. Beaker's SessionMiddleware, at least when used with cookie-based sessions (which is what you should be using), works fine if you move it all the way to the top of the middleware chain:

    app = LimitUploadSize(app, 2000000)
    app = SessionMiddleware(app, config)
    return app

LimitUploadSize can then check for authentication like:

    from webob import Request
    from webob.exc import HTTPBadRequest


    class LimitUploadSize(object):
        """Upload size limit that applies only to users who are not logged in."""

        def __init__(self, app, size):
            self.app = app
            self.size = size  # maximum allowed body size, in bytes

        def __call__(self, environ, start_response):
            req = Request(environ)
            if req.method == 'POST':
                # Placed in the environ by Beaker's SessionMiddleware.
                session = environ['beaker.session']
                if "login_token" not in session:
                    length = req.headers.get('Content-length')
                    if not length:
                        return HTTPBadRequest("No content-length header specified")(environ, start_response)
                    elif int(length) > self.size:
                        return HTTPBadRequest("POST body exceeds maximum limits")(environ, start_response)
            resp = req.get_response(self.app)
            return resp(environ, start_response)

Customizations like this are what make the WSGI-level upload size limiter a better approach than even Apache's LimitRequestBody directive.
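One way to keep such customizations tidy is to separate the policy (which limit applies to this request) from the check itself. This is only a sketch under stated assumptions: the function names `size_limit_for` and `exceeds_limit` are hypothetical, and a plain dict stands in for the Beaker session.

```python
def size_limit_for(environ, default=2000000):
    """Hypothetical policy: byte limit for this request, or None for no limit."""
    session = environ.get("beaker.session", {})
    if "login_token" in session:
        return None  # logged-in users are not limited
    return default


def exceeds_limit(environ, limit):
    """True if the declared Content-Length is missing, invalid, or over `limit`."""
    if limit is None:
        return False
    try:
        return int(environ.get("CONTENT_LENGTH")) > limit
    except (TypeError, ValueError):
        return True


# Plain dicts stand in for the WSGI environ and the Beaker session.
anonymous = {"CONTENT_LENGTH": "5000000", "beaker.session": {}}
logged_in = {"CONTENT_LENGTH": "5000000",
             "beaker.session": {"login_token": "abc"}}

anonymous_rejected = exceeds_limit(anonymous, size_limit_for(anonymous))
logged_in_rejected = exceeds_limit(logged_in, size_limit_for(logged_in))
```

A middleware's `__call__` could then consult these two helpers, so that changing who gets which limit does not require touching the request-handling code.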
Just remember that an asynchronous web server such as nginx or lighttpd, when used as a reverse proxy, gathers the entire request from the client and then sends it to your web app all in one go.
This means that if the goal of the limit is to keep oversized requests from tying up connections or consuming incoming bandwidth, it has to be enforced at the nginx/lighttpd level.