Ian Bicking: the old part of his blog

New feature for the day

The new feature in the wiki for today is version comparisons.

This uses htmldiff.py to do the comparisons. Unlike many diff tools, htmldiff calculates the differences between HTML documents instead of comparing the original text source line by line. Since HTML isn't (very) whitespace sensitive, comparisons based on line endings or other whitespace aren't really accurate: rewrapping a paragraph can change every line without changing the content at all. Instead, htmldiff parses the HTML into a list of tokens: one token for each start and end tag, and one token for each whitespace-delimited word in the text. It essentially ignores the nested structure of HTML and treats the document as a simple stream of tokens.
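htmldiff.py's own code isn't reproduced here. The following is a minimal sketch of the same token-stream idea, assuming Python's standard `difflib.SequenceMatcher` for the sequence comparison and a simple regular expression as the tokenizer. The names `tokenize` and `html_diff`, and the choice to mark changes with `<ins>`/`<del>`, are hypothetical illustrations, not htmldiff's actual API.

```python
import difflib
import re

# Hypothetical tokenizer: one token per start/end tag, one per
# whitespace-delimited word. Assumes no '>' inside attribute values.
TOKEN_RE = re.compile(r"<[^>]*>|[^<\s]+")


def tokenize(html):
    """Flatten HTML into a stream of tag and word tokens (nesting ignored)."""
    return TOKEN_RE.findall(html)


def is_tag(token):
    return token.startswith("<")


def _mark(tokens, wrapper):
    """Wrap runs of word tokens in <wrapper>...</wrapper>; pass tags through."""
    out, run = [], []
    for t in tokens:
        if is_tag(t):
            if run:
                out.append(f"<{wrapper}>{' '.join(run)}</{wrapper}>")
                run = []
            out.append(t)
        else:
            run.append(t)
    if run:
        out.append(f"<{wrapper}>{' '.join(run)}</{wrapper}>")
    return out


def html_diff(old, new):
    """Return the new HTML with removed words in <del> and added words in <ins>."""
    a, b = tokenize(old), tokenize(new)
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    out = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op == "equal":
            out.extend(b[j1:j2])
            continue
        if op in ("delete", "replace"):
            # Removed tags are dropped (re-emitting them could unbalance
            # the markup); only the removed words are shown, struck out.
            words = [t for t in a[i1:i2] if not is_tag(t)]
            if words:
                out.append("<del>" + " ".join(words) + "</del>")
        if op in ("insert", "replace"):
            # Tags from the new version are kept; new words are marked.
            out.extend(_mark(b[j1:j2], "ins"))
    return " ".join(out)
```

For example, `html_diff("<p>Hello old world</p>", "<p>Hello new world</p>")` gives `<p> Hello <del>old</del> <ins>new</ins> world </p>`. Joining tokens with single spaces changes the original whitespace, which is acceptable precisely because HTML mostly isn't whitespace sensitive; for the same reason, `tokenize` produces identical token lists for a paragraph and a rewrapped copy of it.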

This seems like a good compromise to me. Character-level comparisons ignore the structure of HTML completely and tend to create weird differences. Line-level comparisons aren't appropriate for HTML or narrative text. Structured comparisons like XmlDiff are too complicated to present in a visually simple way.

Created 14 Apr '04
Modified 14 Dec '04