GWA and RFC 2616

So now people are going back and forth about what the spec says about the Google Web Accelerator (GWA). Here's what seems to be the relevant portion of RFC 2616 (HTTP/1.1):

9.1 Safe and Idempotent Methods

9.1.1 Safe Methods

Implementors should be aware that the software represents the user in their interactions over the Internet, and should be careful to allow the user to be aware of any actions they might take which may have an unexpected significance to themselves or others.

In particular, the convention has been established that the GET and HEAD methods SHOULD NOT have the significance of taking an action other than retrieval. These methods ought to be considered "safe". This allows user agents to represent other methods, such as POST, PUT and DELETE, in a special way, so that the user is made aware of the fact that a possibly unsafe action is being requested.

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

9.1.2 Idempotent Methods

Methods can also have the property of "idempotence" in that (aside from error or expiration issues) the side-effects of N > 0 identical requests is the same as for a single request. The methods GET, HEAD, PUT and DELETE share this property. Also, the methods OPTIONS and TRACE SHOULD NOT have side effects, and so are inherently idempotent.

However, it is possible that a sequence of several requests is non-idempotent, even if all of the methods executed in that sequence are idempotent. (A sequence is idempotent if a single execution of the entire sequence always yields a result that is not changed by a reexecution of all, or part, of that sequence.) For example, a sequence is non-idempotent if its result depends on a value that is later modified in the same sequence.

A sequence that never has side effects is idempotent, by definition (provided that no concurrent operations are being executed on the same set of resources).

Interestingly, a delete?id=10 link is idempotent. Assuming the id is unique and not reused, the first GET has the same effect as the second or third. Idempotence says nothing about whether a request has server-side effects at all, or about the difference between N=0 and N=1.
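To make the N=0 versus N>0 distinction concrete, here is a minimal sketch that models server state as a dict and each request as a function on that state. The names delete, put, append and looks_idempotent are hypothetical, and comparing final state stands in for the RFC's "side-effects"; checking a few values of N is evidence, not proof.

```python
from typing import Callable, Dict

Store = Dict[int, str]
Request = Callable[[Store], Store]


def delete(item_id: int) -> Request:
    """Like GET delete?id=ITEM_ID: remove the item if it is still there."""
    def apply(store: Store) -> Store:
        new = dict(store)
        new.pop(item_id, None)  # deleting an already-deleted id is a no-op
        return new
    return apply


def put(item_id: int, value: str) -> Request:
    """Like PUT: replace the item's value outright."""
    def apply(store: Store) -> Store:
        new = dict(store)
        new[item_id] = value
        return new
    return apply


def append(value: str) -> Request:
    """Like a typical POST: create a new item under a fresh id each time."""
    def apply(store: Store) -> Store:
        new = dict(store)
        new[max(new, default=0) + 1] = value
        return new
    return apply


def looks_idempotent(request: Request, store: Store, max_n: int = 5) -> bool:
    """Check that N = 1..max_n identical requests leave the same state as N = 1.

    N = 0 (never sending the request) is allowed to differ, which is how a
    delete can be idempotent and still destructive.
    """
    once = request(store)
    state = store
    for _ in range(max_n):
        state = request(state)
        if state != once:
            return False
    return True


store = {10: "123 W. St", 11: "9 E. Ave"}
print(looks_idempotent(delete(10), store))            # True
print(looks_idempotent(put(11, "10 E. Ave"), store))  # True
print(looks_idempotent(append("new"), store))         # False
print(delete(10)(store) == store)                     # False: N=1 differs from N=0
```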

Other links are interesting too. Consider a link that changes the order of a list. move?id=10&direction=up is not idempotent: each repeated call moves the item up further. move?id=10&to_position=3 is idempotent. As part of a sequence, though, it is not idempotent, because its result is affected by other links (which can change the order of other items in the list). move?pos=10:1&pos=13:2&pos=15:3 (i.e., item_id:position_index), which spells out the whole ordering in one request, does give idempotent sequences.
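The three kinds of move links can be checked with a small simulation. This is a sketch under assumptions: the list holds item ids, to_position is treated as a 0-based index, and for the pos=ID:IDX form any unlisted items are assumed to keep their relative order after the listed ones. The function names are hypothetical stand-ins for the URLs above.

```python
from typing import Callable, Dict, List

Order = List[int]


def move_up(order: Order, item: int) -> Order:
    """Like move?id=ITEM&direction=up: swap the item with the one above it."""
    new = list(order)
    i = new.index(item)
    if i > 0:
        new[i - 1], new[i] = new[i], new[i - 1]
    return new


def move_to(order: Order, item: int, position: int) -> Order:
    """Like move?id=ITEM&to_position=POS (0-based here): put the item at POS."""
    new = [x for x in order if x != item]
    new.insert(position, item)
    return new


def set_positions(order: Order, positions: Dict[int, int]) -> Order:
    """Like move?pos=ID:IDX&...: give every listed item an explicit position."""
    listed = sorted(positions, key=positions.get)
    rest = [x for x in order if x not in positions]  # assumption, see above
    return listed + rest


def repeat(request: Callable[[Order], Order], order: Order, n: int) -> Order:
    """Apply the same request n times in a row."""
    for _ in range(n):
        order = request(order)
    return order


START = [15, 13, 10]

# direction=up: a second identical request moves the item again.
print(repeat(lambda o: move_up(o, 10), START, 1))  # [15, 10, 13]
print(repeat(lambda o: move_up(o, 10), START, 2))  # [10, 15, 13]

# to_position: repeating the same request changes nothing further.
print(repeat(lambda o: move_to(o, 10, 0), START, 2))  # [10, 15, 13]

# A sequence of to_position requests is not idempotent: re-running
# only its first step changes the result.
after_sequence = move_to(move_to(START, 10, 0), 13, 0)
print(after_sequence)                   # [13, 10, 15]
print(move_to(after_sequence, 10, 0))   # [10, 13, 15]

# pos=ID:IDX for every item fixes the whole ordering in one request.
full = {10: 1, 13: 2, 15: 3}
print(repeat(lambda o: set_positions(o, full), START, 2))  # [10, 13, 15]
```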

But none of these are safe. "Idempotent" just sounds cooler than "safe", so it's being used a lot.

This paragraph is interesting as well:

Naturally, it is not possible to ensure that the server does not generate side-effects as a result of performing a GET request; in fact, some dynamic resources consider that a feature. The important distinction here is that the user did not request the side-effects, so therefore cannot be held accountable for them.

In the case of the Google Web Accelerator, the user most certainly did not request the side-effects. The question everyone is asking is who is responsible for the side effects: the original web application developer, or Google? I say Google, because they are breaking expectations. Other people say the web developer, because that's what the spec says. Frankly, I don't see how the spec says that, but it's blindingly obvious what convention says. The spec addresses caching, which is where idempotency comes into play, but that has little to do with this situation (though the GWA has been accused of breaking caching too, which I blame on IE for not implementing Vary properly and so rendering a useful header conventionally useless).

Remember what's required to make the GWA work. It's not just the delete links, though those are the most painful ones. A mail program that marks mail read will be broken by the GWA. A logout link will be broken. A vote-for-this link will be broken. And all the fixes involve Javascript: either you make your site inaccessible to people without Javascript, or the GWA will break your site (since it acts just like a client with Javascript disabled). Without nested forms you can't do what people are suggesting; you can't turn everything unsafe into a POST.

To me, the GWA is a kind of loophole in the spec, not something the spec allowed for. It seems like it makes sense, because it's doing what bots have always done, trolling around for content. But it's doing so while pretending to be a user, and that's why it doesn't work with the web we have. If they want a new spec about how to do that, okay. Of course an HTTP/1.2 that clarifies this stuff is unlikely, but it would be worth having for all the reasons that Google has uncovered here.

Created 10 May '05

Comments:

I have yet to see an example of something that can't be done without nested forms. People have used that as an excuse a lot, but nobody's given an example. It seems to me that a GET used in an unsafe way is equivalent to an opening form tag (with an action attribute and a method attribute with the value GET), a bunch of hidden inputs, a submit button, and a closing form tag.
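Jim's equivalence can be made mechanical. A minimal sketch, assuming a hypothetical helper link_as_form (not part of any library): each query parameter of the link becomes a hidden input and the link text becomes the submit button's label. With method="GET" the form sends the same request the link did; with method="POST" it becomes a non-safe request that a well-behaved prefetcher won't submit.

```python
from html import escape
from urllib.parse import parse_qsl, urlsplit


def link_as_form(href: str, label: str, method: str = "POST") -> str:
    """Render a link's target and query parameters as an equivalent form."""
    parts = urlsplit(href)
    # The query string moves into hidden inputs, so drop it from the action.
    action = parts._replace(query="", fragment="").geturl()
    lines = [f'<form action="{escape(action)}" method="{escape(method)}">']
    for name, value in parse_qsl(parts.query, keep_blank_values=True):
        lines.append(
            f'  <input type="hidden" name="{escape(name)}" value="{escape(value)}">'
        )
    # An unnamed submit button adds nothing to the submitted data.
    lines.append(f'  <input type="submit" value="{escape(label)}">')
    lines.append("</form>")
    return "\n".join(lines)


print(link_as_form("delete_address?id=1", "delete this address"))
```

Note that the result is itself a form element, and HTML does not allow one form inside another, so this only works where the original link is not already inside a form.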

Talking about nested forms as being necessary to replace unsafe GETs would appear to require people to be using nested links already, and they aren't.

# Jim

Here's an example:
<form action="edit_addresses" method="POST">
  Address 1: <input type="text" name="address-1" value="123 W. St">
    <a href="delete_address?id=1">delete this address</a><br>
  Address 2: ...
</form>
Happy? Yes, you could make edit_addresses process deletes as well (which is a semantic change to the form, though arguably a better user interface). Then the delete could be a button like <input type="submit" name="delete_address-1" value="delete this address">. But that's often a rather difficult change to make on a legacy application.
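A sketch of what "make edit_addresses process deletes as well" could look like on the server, assuming the form is already parsed into a dict of field names to values and the addresses live in a dict keyed by id. The handler name and storage are hypothetical; the field names follow the address-1 and delete_address-1 names above.

```python
import re
from typing import Dict

ADDRESS_FIELD = re.compile(r"^address-(\d+)$")
DELETE_BUTTON = re.compile(r"^delete_address-(\d+)$")


def handle_edit_addresses(form: Dict[str, str],
                          addresses: Dict[int, str]) -> Dict[int, str]:
    """Hypothetical POST handler for edit_addresses that also handles deletes.

    Browsers send only the submit button that was clicked, so at most one
    delete_address-N key is present; every text field comes along too.
    """
    updated = dict(addresses)
    to_delete = set()
    for name, value in form.items():
        field = ADDRESS_FIELD.match(name)
        button = DELETE_BUTTON.match(name)
        if field:
            updated[int(field.group(1))] = value
        elif button:
            to_delete.add(int(button.group(1)))
    for address_id in to_delete:
        updated.pop(address_id, None)
    return updated


addresses = {1: "123 W. St", 2: "8 E. Ave"}
submitted = {
    "address-1": "123 W. St",
    "address-2": "9 E. Ave",  # the user also edited this field
    "delete_address-1": "delete this address",
}
print(handle_edit_addresses(submitted, addresses))  # {2: '9 E. Ave'}
```

This shows the semantic change: clicking a delete button also saves any pending edits in the rest of the form.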

Personally, I plan to add rel="nofollow" to these links in the future; in many ways it seems like the best compromise, and a good indication of what I really mean (which is "no robots here", not "this item has side effects", which is not quite the same thing).
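For the nofollow approach to help, a link prefetcher has to honor it. Whether the GWA did is not established here; the following is only a sketch of a hypothetical prefetcher policy, using Python's standard html.parser, that collects candidate links and skips any whose rel attribute contains nofollow.

```python
from html.parser import HTMLParser
from typing import List


class PrefetchCandidates(HTMLParser):
    """Collect hrefs a cautious prefetcher might fetch ahead of the user.

    Hypothetical policy: skip any <a> whose rel attribute contains "nofollow".
    """

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attributes = dict(attrs)
        href = attributes.get("href")
        # rel may hold several space-separated values, in any case.
        rel_values = (attributes.get("rel") or "").lower().split()
        if href and "nofollow" not in rel_values:
            self.links.append(href)


page = (
    '<a href="addresses">my addresses</a> '
    '<a href="delete_address?id=1" rel="nofollow">delete this address</a>'
)
parser = PrefetchCandidates()
parser.feed(page)
print(parser.links)  # ['addresses']
```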

There are also other, more subtle places where requests are pretty "safe" and certainly "idempotent", but the GWA will cause problems by following links. The most obvious is a webmail application, where it would cause all mail to be marked as "read". Turning a mail read into a form submission would be really, really bad UI.
# Ian Bicking

In the example you give, why not use a two-stage delete? The anchor "delete this address" would take the user agent to a confirmation that yes, indeedy, they'd love to delete the address. Then the actual deletion would be a POST, since the form has trouble with its being a DELETE.
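The two-stage delete can be sketched as a single handler, assuming a hypothetical function that receives the HTTP method, the parsed parameters and an in-memory address book (no particular web framework). A robot that follows the link only ever gets the confirmation page; the deletion happens on POST, since an HTML form can only submit GET or POST.

```python
from html import escape
from typing import Dict, Tuple


def delete_address(method: str, params: Dict[str, str],
                   addresses: Dict[int, str]) -> Tuple[int, str]:
    """Hypothetical two-stage delete for delete_address?id=N.

    GET is safe: it only renders a confirmation form. POST does the deletion.
    Returns (status, body); for 303 the body is the redirect target.
    """
    address_id = int(params["id"])
    if method == "GET":
        label = escape(addresses.get(address_id, "(already deleted)"))
        body = (
            '<form action="delete_address" method="POST">'
            f'<input type="hidden" name="id" value="{address_id}">'
            f"Really delete {label}? "
            '<input type="submit" value="Delete">'
            "</form>"
        )
        return 200, body
    if method == "POST":
        addresses.pop(address_id, None)  # a repeated POST is harmless
        return 303, "edit_addresses"
    return 405, "Method Not Allowed"


book = {1: "123 W. St"}
print(delete_address("GET", {"id": "1"}, book)[0], book)  # 200 {1: '123 W. St'}
print(delete_address("POST", {"id": "1"}, book), book)    # (303, 'edit_addresses') {}
```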

However, I wouldn't go so far as to GET an item from my address book and have it magically appear in the trash (which could then be emptied on a POST). Misbehaving robots would then rip all the pages out of my book, and I'd need to dig around in the wastebasket looking for them.

With regard to the reading of mail, "reading" is a kind of GET. How do you tell the difference between a real person and a robot before you change the state?

# Will Cox

> In the example you give, why not use a two-stage delete? The anchor "delete this address" would take the user agent to a confirmation that yes, indeedy, they'd love to delete the address. Then the actual deletion would be a POST, since the form has trouble with its being a DELETE.

I don't want to do that because sometimes deletes aren't that big a deal. Maybe they are undoable. Maybe it's assumed that lots of deletes happen. It's a UI concern, and sometimes it's appropriate to allow quick actions. It would be totally backwards to have HTTP methods driving the UI decisions like that.

> With regard to the reading of mail, "reading" is a kind of GET. How do you tell the difference between a real person and a robot before you change the state?

You put it behind authentication where robots can't get to it. That works until Google uses its users as a Trojan horse to get its robot claws on all the data hidden behind actions it can't take (because of robots.txt, POST forms, authentication, etc.). Not that they are necessarily so sinister... but it's not impossible that they do intend to use GWA users as a way to find data they can't find on their own.

# Ian Bicking

Delete causes data loss, and thus is always a big deal. Think about how you use the trash can in your kitchen (or rm, if you prefer). Tossing something is an intentional act. Accidentally tossed the $50 Amazon gift certificate in the bin with the junk mail? You might want to retrieve that. If your trash can is an incinerator, you look twice at everything you toss in, or throw a whole bin's worth once a week. Maybe the action is undoable, and maybe it does need to be quick, but neither requires using a GET.

In the mail example, consider non-web mail user agents. They copy the messages from the server to your desk, but have you read those messages? No. Why should the caching of mail by your HTTP user agent indicate that the mail has been read?

# Will Cox

> <a href="delete_address?id=1">delete this address</a>

What is wrong with <input type="submit" name="delete-id-1" value="Delete this address"> ?

# Jim

Ian Bicking: the old part of his blog
