It occurs to me that when Ian says "The reality of web pages is that they are a visual, published medium." he is mostly (95+%) correct, and that the division between semantic HTML and "classic" HTML hinges on this well-established precedent -- the majority of web pages are indeed designed for visual consumption. The "semantic" web, in W3C terms, refers to documents whose markup expresses meaning rather than presentation. Unfortunately, web browsers center on visual presentation, as does the primary XML-based markup language for web documents, XHTML.
The key fact to consider is that 95+% of all web browsing is done using an agent that doesn't interpret meaning as such. Mozilla has the ability to apply separate presentation instructions (CSS files) to bare XML documents, but does any other user agent have this ability? Not that I am aware of, or at least not completely (feel free to edumacate me... ;).
I agree that XHTML is fundamentally broken in its current incarnation. The W3C would like us to believe that XHTML is semantic, an assertion with which I do not agree. I feel that XHTML is, still, a presentation language, and I can prove it: point any standard browser at an XHTML document that contains no style information at all, and you still get typographic differentiation of portions of the text based on nothing but the valid XHTML tag structure of the document.
The unsolved problem, however, is the remaining 5% of agents out there, screen readers et al., that do indeed interpret (X)HTML documents in ways that do not coincide with the visual presentation afforded by XHTML. To accommodate them, I write my HTML keeping in mind the W3C-intended differences between EM and I, and similar pairs. I suspect I find it as natural as any other type of writing (coding) because I do write with a tone of voice in mind, as per Jim, rather than considering the visual presentation. Also, I have pretty thoroughly internalized the difference between EM and I, and similar cohorts, so that I actually *think* in terms of emphasis rather than italics. But then, I'm a very non-visual thinker in many ways. I think also that I got into the habit a long time ago of creating content first, without any consideration for presentation, and only after I'm finished with the content do I worry about how it looks. I know for a fact that most people don't write this way, but it was trained into me when I was working telephone support for MS Word. The more advanced features of Word (master documents, in particular) cannot be used effectively while writing. They can only be implemented properly by applying them after finishing content creation.
Ian later says "if you provide a semantic markup language, they won't come. People won't just spontaneously write properly-categorized semantic markup...", with which I also agree. For the most part, people don't do something that provides no real payoff. But I think the payoff is pretty much right around the corner. Mozilla already supports styled XML. The other marginal browsers (sorry, I use a marginal browser myself -- Safari), if they don't support styled arbitrary (though valid) XML now, will do so shortly. I can't begin to guess when or if IE will support styled XML. Kimbro Staken's Syncato weblog software shows us how much added value XML brings to the table. Gentoo uses XML for its documentation, then transforms it to produce the web version. Once people are able to use a single source document for both the content and the presentation, without mixing the presentation with the content and without making the presentation technology hard to use, semantic documents will come. Look at it this way: wouldn't it be easier to write content just once in an XML vocabulary that is well-suited to express the semantics of the content? Use an easy schema language (e.g. RELAX NG) to describe the semantics, then just post the document wherever you want, and allow the user agent to do its job of presenting it to the user. Since the XML has semantic and structural meaning, processing tools can be quickly created to extract information and reuse it elsewhere, thus enabling the law of unintended consequences.
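To make the "write once, reuse anywhere" idea a bit more concrete, here is a rough sketch in Python using the standard library's `xml.etree.ElementTree`. The recipe vocabulary (`recipe`, `ingredient`, `step`, and the `quantity` and `unit` attributes) is entirely made up for illustration, as are the function names; it is not taken from Syncato, Gentoo, or any real schema. The point is only that one source document can feed both a presentation and an unrelated extraction tool.

```python
import html
import xml.etree.ElementTree as ET

# A made-up vocabulary for illustration; none of these element or
# attribute names come from a real schema.
RECIPE_XML = """\
<recipe>
  <title>Pancakes</title>
  <ingredient quantity="2" unit="cup">flour</ingredient>
  <ingredient quantity="2" unit="whole">egg</ingredient>
  <ingredient quantity="1.5" unit="cup">milk</ingredient>
  <step>Whisk everything together.</step>
  <step>Fry in a hot pan.</step>
</recipe>
"""


def shopping_list(xml_text):
    """Reuse: extract only the ingredients as (name, quantity, unit)."""
    root = ET.fromstring(xml_text)
    return [
        (item.text, item.get("quantity"), item.get("unit"))
        for item in root.iter("ingredient")
    ]


def to_html(xml_text):
    """Presentation: one possible rendering, chosen outside the source."""
    root = ET.fromstring(xml_text)
    lines = ["<h1>" + html.escape(root.findtext("title")) + "</h1>", "<ol>"]
    for step in root.iter("step"):
        lines.append("  <li>" + html.escape(step.text) + "</li>")
    lines.append("</ol>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(shopping_list(RECIPE_XML))
    print(to_html(RECIPE_XML))
```

Neither function knows or cares about the other, and the source never says what a title or a step should look like. A shopping-list tool is exactly the kind of reuse nobody planned for when the recipe was written.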
I feel that this is a compelling vision. I'd love to live in this future. I want it now. Bueller? Bueller? Bueller? :)