html5lib is cool too as an HTML parser, and important as a reference implementation, but I'll mostly be waiting for that work to filter back into something like libxml2.

I should stress that html5lib is not a reference implementation in the traditional sense of the term. It is the first public implementation of the HTML5 spec and has a pretty valuable set of tests, but it has no other status at all; any disagreements with the spec are bugs in html5lib. In particular, testing implementations against html5lib output is not recommended.

There is also some work under way on a fast C implementation of the HTML5 spec's parsing algorithm; I will be first in line to make Python bindings when that work is complete (and no doubt others will come up with other bindings too).

jgraham