Ian Bicking: the old part of his blog

Re: My first bit of elementtree comment 000

Sorry - explained myself badly.

Was referring to the process of lexing the raw text in the first place. Rather than using characters like > and < to find tokens, as is common in most HTML parsers (HTMLParser and sgmllib seem to do this), look for specific tags by name while treating everything else as uninteresting plain text, even though it may contain HTML tags we're not interested in. In this case it might amount to some fairly simple regular expressions.
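
To make that a bit more concrete, here is a rough sketch of what I have in mind. The names `make_tag_pattern` and `lex` are made up for illustration, and it assumes attribute values never contain a literal > character, which real-world HTML does not guarantee. It is only meant to show the idea, not to be a robust parser.

```python
import re

def make_tag_pattern(tag_names):
    """Build a regex that matches only opening/closing forms of the named tags.

    Hypothetical helper: every other tag is deliberately left unmatched.
    """
    names = "|".join(re.escape(name) for name in tag_names)
    # Group 1: optional "/" for a closing tag, group 2: tag name,
    # group 3: raw attribute text (assumes no ">" inside attribute values).
    return re.compile(r"<(/?)(%s)\b([^>]*)>" % names, re.IGNORECASE)

def lex(text, tag_names):
    """Yield ('text', chunk) and ('tag', name, is_closing, raw_attrs) tokens."""
    pattern = make_tag_pattern(tag_names)
    pos = 0
    for match in pattern.finditer(text):
        if match.start() > pos:
            # Everything between interesting tags is plain text,
            # including any tags we are not looking for.
            yield ("text", text[pos:match.start()])
        closing, name, attrs = match.groups()
        yield ("tag", name.lower(), bool(closing), attrs.strip())
        pos = match.end()
    if pos < len(text):
        yield ("text", text[pos:])
```

Calling `list(lex(html, ["a"]))` on a page would give back only the `a` tags as tag tokens, with any surrounding `<p>`, `<b>` and so on passed through untouched inside the text tokens.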

Comment on My first bit of elementtree comment 000
by Harry Fuecks