Ian Bicking: the old part of his blog

Re: XML Processing

getElementById is tricky:

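from xml.dom import minidom

# No DTD: nothing declares id as an ID attribute, so the lookup finds nothing.
s = '<?xml version="1.0"?><foo><bar id="1" /></foo>'
doc = minidom.parseString(s)
assert doc.getElementById('1') is None

# The internal DTD subset declares bar's id attribute as type ID, so the lookup works.
s = '<?xml version="1.0"?><!DOCTYPE foo [ <!ATTLIST bar id ID #IMPLIED> ]> <foo><bar id="1" /></foo>'
doc = minidom.parseString(s)
assert doc.getElementById('1') is not None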

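minidom won't treat an attribute as an ID just because it's named id; something has to declare it as one. So you've basically got to load in the HTML DTD if you expect getElementById to work.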
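If pulling in a DTD isn't practical, one possible workaround is to flag the attribute by hand. The sketch below assumes minidom's implementation of the DOM Level 3 setIdAttribute method on Element, which is not covered in much detail in the standard library docs, so treat it as an illustration rather than the recommended approach. It reuses the first document from the example above.

```python
from xml.dom import minidom

# Same DTD-less document as before: the lookup fails at first.
doc = minidom.parseString('<?xml version="1.0"?><foo><bar id="1" /></foo>')
assert doc.getElementById('1') is None

# Mark the id attribute as an ID on every bar element that has one.
# (Assumption: you know up front which elements carry IDs.)
for elem in doc.getElementsByTagName('bar'):
    if elem.hasAttribute('id'):
        elem.setIdAttribute('id')

# Now the lookup finds the element without any DTD.
found = doc.getElementById('1')
assert found is not None and found.tagName == 'bar'
```

The catch is that you have to visit each element yourself, which is roughly the work getElementById was supposed to save you.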

Comment on XML Processing
by Stephen Thorne

Comments:

That, um, sucks. Geez... the only reason the DOM seems useful to me is that it is implemented in browsers. I'm sure it's implemented and widely used elsewhere (I guess; I don't actually hear people talking about it), but the primary implementation in my mind has always been browsers. So minidom should act like the browsers do, or raise an exception when getElementById can't return a meaningful value. At least it should indicate in the documentation how you make it act like the browser's implementation. But eh... ET is much more predictable and seems to have relatively few intricacies. And it's going to be in the standard library (w00t!), so I'll probably just choose to forget that xml.dom.minidom even exists.
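For comparison, here is a rough sketch of the same lookup with ElementTree, assuming the standard-library xml.etree.ElementTree module and its limited XPath support for attribute predicates. There id is just another attribute, so no DTD is involved either way.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring('<?xml version="1.0"?><foo><bar id="1" /></foo>')

# Match any descendant element whose id attribute equals '1'.
bar = root.find(".//*[@id='1']")
assert bar is not None and bar.tag == 'bar'

# Or build an index once if you need many lookups.
by_id = {el.get('id'): el for el in root.iter() if el.get('id') is not None}
assert by_id['1'] is bar
```

Neither form knows anything about ID semantics (uniqueness is not checked), which is arguably why the behaviour is easier to predict.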

# Ian Bicking