A proposal to change Pylons from a WSGI-based stack to a WebOb-based stack, and to encourage the same in other WSGI frameworks (especially our partners TurboGears, BFG, and Marco). It requires creating a WebOb stack structure, which doesn't currently exist.
Reasons
WSGI was a major catalyst for framework interoperability, but has not been able to make the Python 3 leap due to the changing nature of strings. Which parts of WSGI should be Unicode and which should be bytestrings? All socket I/O is bytestrings, but app developers prefer Unicode in many cases. Many users and middleware writers have been able to ignore the distinction because Python 2 automatically converts ASCII strings, but Python 3 does not. There is also the problem that some agents send HTTP headers with a missing or incorrect encoding. Simultaneously, some WSGI developers would like to simplify the protocol (the "WSGI 2" proposal), and some want to make it compatible with asynchronous servers. As of February 2010, these issues have remained for over a year without consensus.
The WSGI interface is also too low-level and complex for efficient component development. It was intentionally made a lowest common denominator to allow existing software to adapt to it. It was intended mainly to connect applications (web frameworks) to HTTP servers. But it also enabled middleware (double-ended filters in the middle of the stack) to become unexpectedly popular. Different kinds of middleware developed (some that the application wasn't aware of; others that the application depended on). This caused its own controversy, which needn't concern us here. It was also discovered after the fact that implementing a WSGI-compliant module was more complex and failure-prone than anticipated.
WebOb covered over many of these problems by providing a higher-level API. It has become standard in several frameworks and middlewares. But you can't pass WebOb objects themselves between WSGI components, which means each level has to break down its data to a primitive level for I/O. The proposal is to create a higher-level stack based directly on WebOb.
Spec
The spec can be summed up by this Python code:
```python
from webob import Request, Response


class WebOb(object):
    def __init__(self, environ):
        self.request = Request(environ)
        self.response = Response()


class MyMiddleware(object):
    def __init__(self, app):
        self.app = app

    def __call__(self, webob):
        """I can modify ``webob.request`` and/or ``webob.response``.

        I return the same webob object I received.
        """
        return self.app(webob)
```
webob.Request and webob.Response exist already but the WebOb container does not.
The top-level server calls WebOb with a WSGI environ dict (all bytestrings). WebOb creates .request and .response attributes. The webob instance is passed through all middlewares and the application, and returns to the server. Any level may modify the request and response. The server receives the response and transmits it to the user.
PasteHTTPServer would spawn a clone that creates the webob and calls the next middleware. For other WSGI servers, a WSGI-WebOb adapter would do the same thing.
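A minimal sketch of what such a WSGI-WebOb adapter might look like. To keep it dependency-free, ``Request`` and ``Response`` here are simplified stand-ins for the real WebOb classes, and the names ``WSGIToWebObAdapter`` and ``hello_app`` are hypothetical; nothing like this exists yet.

```python
class Request(object):
    """Simplified stand-in for webob.Request (assumption, not the real API)."""
    def __init__(self, environ):
        self.environ = environ


class Response(object):
    """Simplified stand-in for webob.Response (assumption, not the real API)."""
    def __init__(self):
        self.status = "200 OK"
        self.headers = []          # list of (name, value) tuples
        self.body = b""


class WebOb(object):
    """The proposed container from the spec above."""
    def __init__(self, environ):
        self.request = Request(environ)
        self.response = Response()


class WSGIToWebObAdapter(object):
    """Expose a WebOb-stack application as an ordinary WSGI application."""
    def __init__(self, webob_app):
        self.webob_app = webob_app

    def __call__(self, environ, start_response):
        # Create the container once, at the top of the stack.
        webob = self.webob_app(WebOb(environ))
        response = webob.response
        # Break the response down to WSGI primitives only here.
        start_response(response.status, list(response.headers))
        return [response.body]


def hello_app(webob):
    """A trivial WebOb-stack application (hypothetical)."""
    webob.response.headers.append(("Content-Type", "text/plain"))
    webob.response.body = b"Hello, world"
    return webob
```

Any WSGI server could then be handed ``WSGIToWebObAdapter(hello_app)`` as if it were a normal WSGI application.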
The Pylons middlewares would each add a class or function that implements the WebOb stack. (The existing WSGI classes would not be disturbed.) Pylons would adjust middleware.py to use these middlewares. Existing WSGI middlewares could be used with a WebOb-WSGI wrapper. (The opposite of the server's adapter, and the opposite of current practice where each middleware has to create a local Request & Response.)
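One way the WebOb-WSGI wrapper could work, sketched under the same assumptions as the spec: the downstream WebOb levels are presented to the legacy middleware as a WSGI app, and whatever the middleware emits is copied back onto ``webob.response``. The class names and the stand-in ``Request``/``Response`` are hypothetical, and a real wrapper would also need to close the result iterable and handle ``exc_info``.

```python
class Request(object):
    """Simplified stand-in for webob.Request (assumption)."""
    def __init__(self, environ):
        self.environ = environ


class Response(object):
    """Simplified stand-in for webob.Response (assumption)."""
    def __init__(self):
        self.status = "200 OK"
        self.headers = []
        self.body = b""


class WebOb(object):
    def __init__(self, environ):
        self.request = Request(environ)
        self.response = Response()


class WSGIMiddlewareWrapper(object):
    """Run an existing WSGI middleware as one level of a WebOb stack."""
    def __init__(self, webob_app, wsgi_middleware_factory):
        self.webob_app = webob_app
        self.factory = wsgi_middleware_factory

    def __call__(self, webob):
        def inner_wsgi_app(environ, start_response):
            # The rest of the WebOb stack, seen as a WSGI app.
            self.webob_app(webob)
            resp = webob.response
            start_response(resp.status, list(resp.headers))
            return [resp.body]

        captured = {}

        def start_response(status, headers, exc_info=None):
            captured["status"] = status
            captured["headers"] = headers

        chunks = self.factory(inner_wsgi_app)(webob.request.environ,
                                              start_response)
        # Consume the body first; start_response may be called lazily.
        body = b"".join(chunks)
        webob.response.status = captured["status"]
        webob.response.headers = list(captured["headers"])
        webob.response.body = body
        return webob


def add_header_middleware(app):
    """An ordinary WSGI middleware, used only to exercise the wrapper."""
    def middleware(environ, start_response):
        def custom_start(status, headers, exc_info=None):
            return start_response(status, headers + [("X-Wrapped", "yes")])
        return app(environ, custom_start)
    return middleware
```

The cost is one round trip through WSGI primitives per wrapped middleware, which is exactly the overhead the native WebOb middlewares would avoid.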
The Pylons changes would be the application template, mainly middleware.py, the INI files, and setup.py. PasteHTTPServer/WebOb would presumably have a different entry point. New applications would require WebOb-aware versions of Paste, Beaker, and Routes.
PasteDeploy has a "loadwsgi" module with "loadserver" and "loadapp" functions. This is how it instantiates the server and app from the INI file. I guess these can be used as-is because there's nothing really WSGI-specific about them; they just load the two objects indicated and assume they're compatible with each other.
Open issues
We still have the same Unicode issues the Web-Sig has. But we have a smaller group and more common requirements, so maybe we can decide for ourselves more quickly.
It's worth distinguishing ``request.environ`` from the incoming ``environ`` dict. What we use internally and what we require from the server may be different. The WSGI environ includes the CGI variables, the body input, some extra context variables, and any other variables a higher level has added. A WebOb not restrained by WSGI could jettison the environ completely in favor of request attributes, but I doubt we're ready for something so radical. Plus, the ability to add arbitrary data to the environ (including function callbacks) has been a strength that we don't want to lose lightly. So, if we keep the environ with its same variables, we have to decide which ones to decode to Unicode.
Alternatives: (A) Unicode all CGI variables except SCRIPT_NAME and PATH_INFO, with some fallback for encoding errors. (B) Keep them all bytestrings. Decoding could also be handled in the Request attributes/methods, leaving the environ as bytestrings. This would allow parallel methods to extract either Unicode or bytestrings. While most applications prefer Unicode, some middleware just passes them through or can go either way, and avoiding unnecessary conversions is more efficient. So I guess I favor leaving them as bytestrings.
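To illustrate the "leave the environ as bytestrings, decode in the Request" option, here is a rough sketch with parallel accessors. The class and its ``get_bytes``/``get_text`` methods are hypothetical and are not part of the real ``webob.Request`` API; the Latin-1 fallback is just one possible policy for encoding errors.

```python
class Request(object):
    """Hypothetical request wrapper; not the real webob.Request API."""

    def __init__(self, environ, charset="utf-8"):
        self.environ = environ      # values stay as bytestrings
        self.charset = charset

    def get_bytes(self, key, default=b""):
        """Return the raw bytestring with no conversion at all."""
        return self.environ.get(key, default)

    def get_text(self, key, default=""):
        """Return the value decoded to Unicode, with a fallback."""
        raw = self.environ.get(key)
        if raw is None:
            return default
        try:
            return raw.decode(self.charset)
        except UnicodeDecodeError:
            # Latin-1 maps every byte to a code point, so this cannot fail.
            return raw.decode("latin-1")


request = Request({"QUERY_STRING": b"q=caf\xc3\xa9"})
raw_query = request.get_bytes("QUERY_STRING")    # pass-through middleware
text_query = request.get_text("QUERY_STRING")    # application code
```

Middleware that only forwards values never pays for a decode, while applications get Unicode on demand.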
The ``environ`` received by the server does not have to be a dict. It could be a dict of CGI variables plus separate arguments for the body input and other context variables, for instance. These would be passed to the WebOb constructor and passed through to the Request constructor. On the other hand, the traditional ``environ`` dict is what people are used to.
The question then becomes, are there particular ``environ`` keys that should be added, changed, or removed? One proposal has been to get the body input out of the environ. Another is to change the type of the body input. I'm not familiar with these issues, or the other proposals to change particular keys, so I have no comment.
Should components be allowed to replace the request, response, and webob objects with new instances? They would have to copy the environ, because it may contain keys they don't recognize but which are vital to other components. I don't see a particular need to allow replacing them. In that case, perhaps ``.__call__`` doesn't need to return the webob it received because the caller already has a reference to it. On the other hand, would a multilevel stack work that way? I guess it would. But Pylons actions are allowed to return their own Response instance, and maybe other code wants to do the same.
Asynchronous support
We're not ready to implement this yet (we don't know exactly what's needed), but if we can at least avoid being incompatible with it, that would be a start. We're not sure whether async can be supported directly in these objects, or in a companion future protocol.
The async developers have specifically asked for a NOT_READY token and "post headers". Applications or middleware that have to block would emit NOT_READY in the WSGI result iterable, which higher-level middleware would have to pass through. This would tell the server to check back later for the result (in this context, more body data).
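A toy illustration of the pass-through requirement, assuming ``NOT_READY`` is a plain sentinel object (no such token is standardized, and all names here are hypothetical). The middleware transforms real chunks and forwards the token untouched; the "server" is just a loop that counts how often it was told to check back.

```python
NOT_READY = object()   # hypothetical sentinel; not defined by any spec


def upper_middleware(body_iterable):
    """Uppercase each body chunk, passing NOT_READY through untouched."""
    for chunk in body_iterable:
        if chunk is NOT_READY:
            yield NOT_READY
        else:
            yield chunk.upper()


def blocking_app():
    """An application that has to wait in the middle of its body."""
    yield b"<html>"
    yield NOT_READY            # e.g. waiting on a slow backend
    yield b"done</html>"


def drain(body_iterable):
    """Toy server loop: collect chunks and count the not-ready polls."""
    chunks, polls = [], 0
    for chunk in body_iterable:
        if chunk is NOT_READY:
            polls += 1         # a real server would return to its event loop
            continue
        chunks.append(chunk)
    return b"".join(chunks), polls


body, polls = drain(upper_middleware(blocking_app()))
```

Every middleware that touches the body needs the ``is NOT_READY`` check, which is the extra burden on regular components discussed below.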
"Post headers" are the ability to send HTTP headers after the body is finished. Sometimes header values can't be calculated until the body is known. These would be merged into/override the regular HTTP headers. Under one WSGI 2 proposal, the result would be (status, headers_iterable, body_iterable, post_headers_iterable).
However, ``webob.Response`` is not exactly a WSGI result, so I'm not sure how all this would have to be modified for a WebOb stack.
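For the plain WSGI 2 style four-tuple, the merge step might look roughly like this. It is only a sketch: ``merge_post_headers``, ``checksum_app``, and ``buffering_server`` are hypothetical names, the checksum header is an invented example, and the server buffers the whole body, which a real server would avoid.

```python
import hashlib


def merge_post_headers(headers, post_headers):
    """Merge post headers into the regular headers; post headers win."""
    post = list(post_headers)
    overridden = {name.lower() for name, _ in post}
    merged = [(n, v) for n, v in headers if n.lower() not in overridden]
    merged.extend(post)
    return merged


def checksum_app():
    """Return (status, headers, body, post_headers) per the proposal."""
    digest = hashlib.sha256()

    def body():
        for chunk in (b"hello ", b"world"):
            digest.update(chunk)
            yield chunk

    def post_headers():
        # Only computable once the body has been fully generated.
        yield ("X-Body-SHA256", digest.hexdigest())

    headers = [("Content-Type", "text/plain"), ("X-Body-SHA256", "pending")]
    return "200 OK", headers, body(), post_headers()


def buffering_server(result):
    """Toy server: exhaust the body, then apply the post headers."""
    status, headers, body, post_headers = result
    payload = b"".join(body)          # the body must be consumed first
    return status, merge_post_headers(list(headers), post_headers), payload
```

In a WebOb stack the equivalent would presumably be an extra attribute or callback on the response, but that is an open design question.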
Before committing to an asynchronous API, we'd need a reference server (something simpler than Twisted, maybe one of those Greenlet things) and demonstration stack. Then we can verify the API is actually usable before blessing it.
One issue is whether dual async/regular components are even feasible. Regular components don't want to be bothered with special tokens that don't benefit them, especially if they have to take an extra step to pass them through. (E.g., don't concat "<html>" + None + "</html>" if None is the token.) "" might make a nice token because it's unobtrusive to regular components. (Unlike file objects, the body is finished when StopIteration is raised, not when "" is emitted.) But asynchronous code has to use special database libraries and socket libraries and file libraries anyway, which regular components *really* don't want to bother with. So it looks like these would require separate async and regular implementations anyway.
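A small demonstration of why the empty string is less intrusive than ``None`` for a regular component that just concatenates its input. The ``render`` helper is hypothetical, and bytestrings are assumed for the body chunks.

```python
def render(body_iterable):
    """A regular component that concatenates whatever chunks it is given."""
    return b"".join(body_iterable)


# An empty chunk as the "not ready" token is harmless to this component.
with_empty_token = render([b"<html>", b"", b"</html>"])

# None as the token breaks the same component unless it filters explicitly.
try:
    render([b"<html>", None, b"</html>"])
    none_token_failed = False
except TypeError:
    none_token_failed = True
```

The trade-off is that a component which drops or merges empty chunks would silently swallow the token before it reaches the server.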
Discussion
Add your feedback here or in the Comments below.
Comments (1)
Feb 23, 2010
Alice Bevan-McGregor says:
Other than the large amount of backwards-compatibility cruft in WebOb (and my personal gripe; multiple accessor methods to access the same data), one thing that has been a thorn with WSGI has been middleware dependencies. Authentication needs sessions and DB, etc.
I've started working on a solution to these issues, named Pulp. It aims to be an effort to rewrite the core components of Paste, PasteScript, PasteDeploy, WebOb, WebError, Flup, and common middleware into a single collection of light-weight dependencies for WSGI application frameworks under both Py2K and Py3K. It makes extensive use of namespace packages.
The dependency graphing component is external:
http://github.com/GothAlice/Pulp-Dependancy
The unit tests prove that this can work.