Ian Bicking: the old part of his blog

Syndication: boring war where only egos are hurt

the Syndication Wars -- they're like the Browser Wars, only much more boring and with far less at stake
(reference)

In other news: OMG, silent data loss, the sky is falling! If I panicked this way every time I realized I had to apply HTML quoting, I would have gray hair and ulcers by now. To summarize the silent data loss in question: Reuters has text like "Enron Corp. <ENRNQ.PK>" in their feed, and all aggregators treat the description as HTML, so it gets interpreted as HTML with a <ENRNQ.PK> tag in it.
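Here is a rough sketch of how the symbol disappears, using Python's standard library as a stand-in for an aggregator. The sample item, the `TextOnly` class, and the `visible_text` helper are made up for illustration; real aggregators use their own HTML renderers, which may differ in details.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class TextOnly(HTMLParser):
    """Collects only the text a reader would see; unknown tags are dropped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def visible_text(html_fragment):
    """Return the text left over after the fragment is parsed as HTML."""
    parser = TextOnly()
    parser.feed(html_fragment)
    parser.close()
    return "".join(parser.chunks)


# Hypothetical feed item with once-encoded angle brackets, as in the Reuters case.
item = "<item><description>Enron Corp. &lt;ENRNQ.PK&gt; filed today</description></item>"

# Step 1: the XML parser decodes the entities once.
description = ET.fromstring(item).findtext("description")
# description == "Enron Corp. <ENRNQ.PK> filed today"

# Step 2: the aggregator treats that string as HTML, so <ENRNQ.PK> becomes a tag.
shown = visible_text(description)

# The ticker symbol is gone from what the reader sees, and nothing raised an error.
print(repr(shown))
```

Nothing in either parsing step is a bug on its own, which is why the loss is silent.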

The silly part is that RSS 2.0, lame though it is, clearly has a standard in this case. Like the article says, it wasn't that some aggregators had a problem -- they all did. Which is to say, according to the functional, implemented RSS standard (embodied in the aggregators, not the spec), the Reuters feed was incorrect. It was not "invalid", it was semantically incorrect. Such bugs should be familiar to any working programmer.

Oh well. In the end it's all pretty boring. Sorry for wasting your time.

Created 01 Jun '04
Modified 25 Jan '05

Comments:

It fascinates me to watch people rally around a format that everyone now agrees causes silent data loss. It used to depress me, but now it just fascinates me.
# Mark

Atom would cause silent data loss too, if you did what that feed is (effectively) doing:

foo <MSFT> bar
# Phillip Pearson

argh, let's do it with angle brackets turned into braces and ampersands turned into hash marks:

{content type="text/html" mode="escaped"}foo #lt;MSFT#gt; bar{/content}
# Phillip Pearson

I'm not an RSS (2.0) defender -- I'd agree it's way underspecified, and if I worked with it more I'm sure it would annoy me more. I'm just utterly unimpressed with this silent data loss argument.
# Ian Bicking

Sorry, Mark, this makes your camp look a little silly.

The required unique universal ID also makes Atom depart from the simplicity that made RSS what it is.
# pb

I thought the point was that with Atom, there's no question that Phillip's example is incorrect: run that through an XML parser, and you do not get text/html, you get (not-)tag soup. But where in the RSS 2.0 spec can you point to, and say: "this prohibits once-encoded angle brackets" or "this requires that angle brackets which do not surround HTML tags must be re-escaped before being handed to an HTML renderer"? I'd never considered the re-escape idea before, but now it seems like as reasonable an interpretation of "(entity-encoded HTML is allowed)" as the existing interpretation that it means every RSS description is text/html as it comes out of the parser.
# Phil Ringnalda

Such bugs should be familiar to any working programmer.

The assumption here is that the Atom crowd consists of working programmers. I'm not so sure.
# Fredrik

At first, I thought the silent data loss argument was a storm in a teacup. But then I realized that, no matter how small the consequences of that problem are, we CAN'T BE TOLERANT of such ambiguity in a format like RSS. The simplicity of such a problem only shows how badly the format is broken.

This is serious business. We're talking about a standard that has already had an impact on the web and will take it over entirely in a matter of years. Boring as it may be, this discussion is very important. There are some things that simply can't be left unsaid.
# Jonas Galvez

When a programmer creates a bug like this it usually affects a single implementation. When a spec author creates such a bug, it affects *every* implementation.

pb - RSS 1.0 has always had URI-based identifiers, RSS 2.0 has a newly-defined kind of unique universal ID (guid).
# Danny

@Phil (the Ringnalda one, not the Pearson one ;-) ): actually, double escaping is simply the only right way to do it. If you want to put brackets or ampersands in, you need to escape them. If you put HTML in, you need to escape that. Those two rules stack: if you want to use brackets or ampersands in HTML, you need to escape them anyway, and then you escape the full HTML, so you get doubly escaped characters for your literal ampersands or brackets. I don't understand the big rumble they made about this simple fact. Any programmer with half a brain cell should know this already. But maybe it's like Fredrik guesses and there aren't that many programmers with half a brain cell in the discussion ;-)
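A small sketch of the stacked rules, using Python's `html.escape`. The helper names `literal` and `description_payload` and the sample strings are invented for illustration, and the sketch assumes the aggregator treats the decoded description as HTML.

```python
from html import escape, unescape


def literal(text):
    """Rule 1: escape plain text so it is safe inside HTML."""
    return escape(text, quote=False)


def description_payload(html_fragment):
    """Rule 2: escape the whole HTML fragment so it is safe inside XML."""
    return escape(html_fragment, quote=False)


# Real markup stays markup; the stock symbol is literal text inside the HTML.
fragment = "<b>Enron Corp.</b> " + literal("<ENRNQ.PK>")
# fragment == "<b>Enron Corp.</b> &lt;ENRNQ.PK&gt;"

payload = description_payload(fragment)
# payload == "&lt;b&gt;Enron Corp.&lt;/b&gt; &amp;lt;ENRNQ.PK&amp;gt;"

# Undoing one level gives back the HTML the aggregator should render.
assert unescape(payload) == fragment
print(payload)
```

The markup ends up escaped once and the literal brackets twice, which is exactly the difference the Reuters feed was missing.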

Sure, RSS is underspecified. But it's not as if there are no ways to find out whether what you are doing makes any sense. So if you can't figure it out on your own, just check what people will see in some aggregators. The Reuters problem would have hit anybody right in the face if anybody at Reuters had cared to test their feed. And sorry, but no spec -- however complete its definition might be -- will save you from stupidity like this.
# Georg Bauer

> Atom would cause silent data loss too, if you did what that feed is (effectively) doing

Blatantly false. Fixing this exact problem was *the* single most important driving factor behind Atom's content model. Atom can faithfully describe plain text data that includes angle brackets and ampersands.

RSS is not simple because you're better at this than we are. RSS is simple because it's broken. Even Dave has admitted this now, and proposed a solution to RSS's silent data loss problem that looks remarkably similar to Atom's content model (except not as elegant).

If you fix all the things that are wrong with RSS, and then add all the features and ideas that the Atom community has been brainstorming with vendors and publishers, then in a few more years, you might end up with a format that's half as good as Atom. I say "half as good" because it won't really have any coherence; it'll just be an even larger pile of desperate and ignorant hacks than it is already. It won't have a consistent content model like Atom has today. It won't have a consistent linking model like Atom has today. And of course it won't be "simple" like RSS is widely mis-regarded today... but it might actually not cause silent data loss, so that would definitely be a step in the right direction.

RSS isn't simple because you're better at this; RSS is simple because it's broken.
# Mark

> The assumption here is that the Atom crowd consists of working programmers. I'm not so sure.

Yay, Fredrik's back, and more clueless than ever.

Let's see, there's an Atom face-to-face meeting today, let's look at the list of names...

http://www.intertwingly.net/wiki/pie/AtomMeeting

- the primary author of XML
- developer of the Feed Validator
- developers from Blogger
- developer from Technorati
- developer from SixApart
- developer from Bloglines

Yeah, a real motley crew, that. Totally out of touch.
# Mark

Mark, you didn't accurately represent my position, nor do I think you're right about whether Atom solved the problem. Further, you're going to run into big problems with Atom down the road, and the things you're saying now are going to haunt you then, assuming you have someone like yourself taking cheap shots at you (I won't be that person).

Anyway, what would you recommend for someone using Atom if they wanted to include angle-bracketed stock symbols and HTML markup in content? How would they do that?
# Dave Winer

Mark, I'm fascinated to watch people rally around a new format, then turncoat and start bashing it 12 months later when it becomes the fastest growing XML format on the Web. By the way, I never heard back on the XML.com article. I assume you agreed w/ my feedback.
# Randy Charles Morin

Q: what would you recommend for someone using Atom if they wanted to include angle-bracketed stock symbols and HTML markup in content?

A: Here's an example:

This is
<b>bold</b>. This is a
&lt;stock&gt;
# Sam Ruby

Arg. My markup got eaten. Second attempt:

<content type="text/html">This is
&lt;b&gt;bold&lt;/b&gt;.  This is a
&amp;lt;stock&amp;gt;</content>
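
A quick way to check that this round-trips is to decode it the way a consumer would: once with an XML parser, then once as HTML. The sketch below uses Python's standard library; the `Events` class is made up for illustration, and the bare `content` element stands in for a full Atom entry (no namespace, no surrounding feed).

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class Events(HTMLParser):
    """Records which tags were seen and which text would be displayed."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.tags = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)

    def handle_data(self, data):
        self.text.append(data)


entry = ('<content type="text/html">This is '
         '&lt;b&gt;bold&lt;/b&gt;.  This is a '
         '&amp;lt;stock&amp;gt;</content>')

# XML layer: one level of escaping is removed.
html_fragment = ET.fromstring(entry).text
# html_fragment == "This is <b>bold</b>.  This is a &lt;stock&gt;"

# HTML layer: <b> is markup, &lt;stock&gt; is literal text.
parser = Events()
parser.feed(html_fragment)
parser.close()
rendered = "".join(parser.text)

print(parser.tags)  # only the real markup shows up as a tag
print(rendered)     # the stock symbol survives as visible text
```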
# Sam Ruby

I find it humorous that we are trying to talk about silent data loss, while struggling w/ an editor that loses data :)
# Randy Charles Morin

Preview is a wonderful thing. Phillip?

But I have to disagree with Georg that preview solves everything for RSS. For the most basic case, single-encoded angle brackets in description, it probably mostly does: I can find some things that read RSS but don't treat the Reuters stock symbols as HTML tags, but it will take some looking, and it would be pretty obvious that they are old and broken in other ways. But once you start looking at more elaborate things, and things like treatment of entities in title, or HTML in channel/description, you need to have a Windows machine, a Mac, and Linux, with dozens of aggregators installed on each one, just to find out what you can and can't do. And I need to, too. And so does Sam, and Georg, and Randy, and Ian, and everyone else who ever wants to do anything other than plain text, just like the way we did HTML during the browser wars.

The aggregators work just fine? When I first learned about this problem, 18 months or so ago, it took me four different aggregators to find one that would let me read one of Sam's posts.

And while we claim that everybody knows how to deal with RSS 2.0 and HTML, what about RSS 0.91? It still exists, and every aggregator claims to be able to read it. Do any aggregators at all correctly read it, treating a once-encoded less-than as something which should display as a literal less-than, not be treated as the start of an HTML tag? There is no HTML in RSS 0.91, and the spec makes it clear that if you want to say that 1 is less than 2, you encode the less-than symbol *once*. Sure, forgiving browsers will usually display that, but what about 1 is less than 5 is greater than 4?
# Phil Ringnalda

>When I first learned about this problem,
>18 months or so ago, it took me four
>different aggregators to find one that
>would let me read one of Sam's posts.

I assume you mean his rss2 feed, where he started using xhtml:body. At the time, no aggregator supported this feature and his feed didn't work anywhere, which I documented for the historians using Don Box's blog as the example.

http://www.kbcafe.com/iBLOGthere4iM/?guid=20030603070853
# Randy Charles Morin

19 months. Time flew. No, I meant *this* problem: is that single-encoded angle-bracket the start of an encoded HTML tag, or is this plain text?
# Phil Ringnalda