Ian Bicking: the old part of his blog

WSGI and Dispatch

A few days ago Mark Baker left a little comment on WSGI on his blog (for background Mark Baker is an enthusiastic REST advocate).

While he was mostly positive about WSGI, he said this in a comment:

This is the problem [with WSGI]:

def application(environ, start_response):

Because it encourages bad practice with HTTP, such as those apps that behave the same way to GET and POST requests (as the examples in the article do).

I think this would have been superior:

def get(environ, start_response):
def post(environ, start_response):
def put(environ, start_response):

Well, the first reason this doesn't work is fairly concrete. It would actually have to look like this:

class Application:
    def get(self, environ, start_response): ...
    etc..

because you can't pass a request to a set of functions, without somehow defining what the "set" is. You could also put them in a module, or some other container object. This opens up a whole bunch of messy and dull questions about how you traverse the container to get to the request method method.

This also encodes request method dispatch into the spec, without encoding any other kinds of dispatch into the spec. The bad solution here would be to encode all kinds of dispatch into the spec. What WSGI got really right was that it has absolutely no dispatch in the specification.

At the same time implementing dispatch in WSGI is really easy. Creating a dispatcher in a style like Mark suggests is rather difficult, because you have to do something like this:

class Dispatcher:
    def dispatch(self, environ, start_response):
        return self.sub_object(environ)(environ, start_response)
    get = post = put = dispatch

And of course you have to enumerate all the possible request methods. This is just silly, but it's become a common design misfeature in frameworks influenced by the Java Servlet specification (e.g., BaseHTTPServer).

The reason I think this misdesign seems reasonable is because other kinds of dispatch are assumed to have already occurred, and are opaque to the specification. WSGI does not make this mistaken assumption.

The primary kind of dispatch that people do is based on the request path. I.e., if you do GET /blog/archive/2006/10, first you have to figure out what /blog/archive/2006/10 is. Here's the general way you do this in WSGI:

  1. Probably you start out by realizing that /blog is the blog application, and you delegate the request there. WSGI separates the path into two parts using the CGI convention that the "used" part of the path is in SCRIPT_NAME, and the "unused" part is in PATH_INFO. So the blog app gets SCRIPT_NAME="/blog" and PATH_INFO="/archive/2006/10"
  2. How does the blog application parse its part? We assume only the blog application knows the best way to do that. Very possibly it does another prefixed based search on /archive, and then that resource in turn parses /2006/10. At different stages the results of this intermediate parsing may go into the request in non-standard locations, e.g., environ['routes.url_vars']
  3. Only after all these steps have happened is it likely that GET has any meaning. Thankfully we've been allowed to totally ignore the request method until this point.

Note that this specific example is not the only way it might work. For instance, we might have started with:

  1. First, look at the Host header and dispatch on that.

This adds virtual hosting. Server environments that code step 1 directly into their environment often have to create special exceptions for virtual hosting. Or:

  1. First, look at the Host header, matching (.*)\.myblogs.org, and putting the matched value into environ['myblogs.username']

Now you can set up a wildcard DNS and get your blog app to look up the user based on that key.

If you want to pay close attention to the request method, you are quite free to do so. WSGI does not tell you how, but it does not in any way hinder you from doing so. A good example of something that pays close attention to the request method is Joe Gregorio's wsgicollection, which could very well be the terminal point for some of these examples.

Because WSGI does not include any form of dispatching it represents HTTP very accurately. HTTP does not lend special meaning to the request path. It does not say that different domains resolve to different servers. HTTP is a kind of message, and there are many ways to interpret that message. Low-level specifications overstep their bounds when they interpret those messages for you. WSGI is not an educational project, it is infrastructure, and an important feature is that it does not overstep its bounds.

I'll also note that I think WSGI represents the HTTP message very well; the parts of HTTP that it leaves out are primarily about the HTTP connection, which are best handled by the actual connected-to-the-browser server. The rest of the message is all in there if you want to use it.

Created 21 Oct '06

Comments:

"bad practice with HTTP, such as those apps that behave the same way to GET and POST requests"

Is there a real reason why this matters? No.

# Sergey

For one, it matters for the simplicity of a client application. If the server implements REST properly, then all the client needs to know is the URI for the resource it's dealing with and then it basically knows that it can do a GET or a POST or a DELETE or whatever on that URI and the appropriate action will be taken. If the server doesn't distinguish between GET and POST (or the others) the dispatching to different functionality has to be done via the URI, so the client then also needs to know what URIs to hit for the equivalent create/delete/update actions.

The other problem is that if the server doesn't distinguish between GET and POST, then some client somewhere is going to use a GET for an update because, hey, it seems to work. Then a cache somewhere in between might legitimately cache it. Or you get problems like with the Google Web Accelerator that would delete all the content on a site because there are "delete" buttons with links to URIs that delete a resource on a GET request and the GWA hits them all because it ought to be safe to make GET requests according to the specs.

# Anders

It matters only sometimes, and in those cases people will handle it on their own. It's counterproductive to force everyone to use programming model they don't need just so that someone else will not make a certain (well known and understood) mistake. That's B&D approach which thankfuly is not in Python or WSGI style. import this will give you a couple reasons what is.

# Sergey

It's not really B&D: when you recieve a request, you have to keep in mind what verb was used because each verb guarantees different kinds of behaviour. For instance, it's one thing to treat GET and POST requests identically if what you're doing to process the request fits the kind of behaviour that's acceptable in response to such a request, such as, say, displaying a page, but quite another thing if they don't, such as Anders' example of an application allowing items to be deleted with GET requests.

However, as Ian outlined, such matters are a matter for a higher-level framework than WSGI and deciding them at WSGI's level would complicate the interface between it and the application or framework sitting on top of it.

It is something that matters quite a bit, but because of HTTP itself and not some misplaced desire for B&D. It's something which ought to be in the minds of any application developer because part of developing a good application is using the underlying protocol correctly. It's nothing to do with forcing a programming model and everything to do with passing the request onto another lump of code smart enough to really know what to do with it.

# Keith Gaughan

I want to be clear that I wasn't disagreeing with Ian at all. I think he's absolutely spot on that the dispatching on method shouldn't be done at the WSGI level. It should be done by some layer that's on top of WSGI (I use wsgicollection for that, and it rocks).

The point that I was making was just that GET and POST are not interchangeable. They are completely different operations with different semantics. Adhering to standardized semantics isn't B&D, it's just interoperability and it lets you make simpler clients and servers.

# Anders

Thanks for clearing up my failed attempt at Python; I've only dabbled with it.

You're right that this was partly inspired by Servlets, but only insofar as I consider them to be a pretty good API for HTTP that I've used successfully for a number of years. You can use Servlets as you would use WSGI, by just overriding the service() method, or you can use doGet, doPost, etc.. URI based dispatch can either be handled within service(), or using any of the URI-dispatching tools the containers typically provide (e.g. web.xml). So there's certainly ample precedent for this form of API supporting the flexibility WSGI seems to require. But only doing the equivalent of service() (as WSGI currently does) places more of the burden on remaining HTTP-friendly with what's built on top. Perhaps that will work out, I don't know, but I don't see much downside to such a framework constraining the use of HTTP as I've described. But if it doesn't work out for WSGI, and you have apps which, for example, don't distingiush GET and POST, then you lose the ability to, say, support cache management at the WSGI level. Providing a Servlet-like API doesn't guarantee you that HTTP won't be abused of course, but it does reduce the likelihood that it'll be done accidentally.

# Mark Baker

Mark, not much people will wirte apps directly to the WSGI layer, but instead they'll use some framework... or at least will use some simple tool (like the 'selector' wsgi dispatch - http://lukearno.com/projects/selector/).

# Damjan