2005-05-08
XML considered harmful, and other things
I was reading CLiki's XML page and came across a note that said
"... though it's worth noting that Wilbur is an RDF toolkit, not an XML toolkit -- it just happens that it can read RDF/XML, an XML serialisation of RDF."
First I thought that I really ought to do more to publicize the fact that part of Wilbur is a package called NOX that really is a (simple) XML toolkit. But then I remembered why NOX is the way it is (i.e., simple): Because I wanted to do the absolute minimum to be able to say that I can parse RDF. Someone who really cares (about XML, that is) might decide to use some other XML parser with Wilbur (in fact, this has been done).
Actually, this was just a segue to my real point (related to why I wanted to do the minimum wrt. XML): I think it was a mistake to use XML as RDF's syntax. It seems to have created more confusion than good. Early on when designing RDF I advocated an s-expression-based syntax, because
it would not be as verbose as XML,
it would have been easier to write (really), and
most importantly, I didn't think there were any real benefits to using XML.
It was not meant to be, though, and eventually I got voted down; I wish I had been stronger, but I guess parentheses are a lot scarier than angle brackets. Go figure. XML syntax for RDF is a political design decision, not a technical one.
Yet the core problem remains. We still get questions like "Why cannot I just use XML instead of RDF?" which demonstrate the fundamental misunderstanding and wrong focus; people are principally focused on syntax. I think, generally, it is easy to operate in terms of something you can see and write. Perhaps that's also the reason why Semantic Web technologies, in a broader sense, are hard to adopt mentally: So much of the benefit of these technologies depends on reasoning and there, ultimately, one is dealing with something one cannot see. Let's just take RDF as an example: Applications should deal with the deductive closure of the RDF graph they process, not the (syntactic) graph itself. If all you do is process the graph that was input, you might as well use XML.
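To make "deductive closure" concrete, here is a minimal sketch in Python. It is an illustration only, not how Wilbur (or any particular toolkit) does it: triples are plain tuples, the ex: names are made up, and only two RDFS entailment rules are applied (transitivity of rdfs:subClassOf, and propagation of rdf:type up the class hierarchy). A real reasoner covers far more than this.

```python
# Triples are plain (subject, predicate, object) tuples.
# The ex: names below are hypothetical, chosen only for illustration.
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def closure(graph):
    """Return the graph plus everything derivable by two RDFS rules.

    Rule A: (a subClassOf b), (b subClassOf c) => (a subClassOf c)
    Rule B: (x type b),       (b subClassOf c) => (x type c)
    The rules are re-applied until no new triple appears (a fixpoint).
    """
    closed = set(graph)
    while True:
        new = set()
        for (s, p, o) in closed:
            for (s2, p2, o2) in closed:
                # Only pairs that chain through a subClassOf triple matter.
                if o != s2 or p2 != SUBCLASS:
                    continue
                if p == SUBCLASS:
                    new.add((s, SUBCLASS, o2))  # Rule A
                elif p == TYPE:
                    new.add((s, TYPE, o2))      # Rule B
        if new <= closed:
            return closed
        closed |= new


# The graph "as input": what a purely syntactic consumer would see.
graph = {
    ("ex:Wilbur", TYPE, "ex:RDFToolkit"),
    ("ex:RDFToolkit", SUBCLASS, "ex:Toolkit"),
    ("ex:Toolkit", SUBCLASS, "ex:Software"),
}

closed = closure(graph)

# Present only in the closure, never written down in the input:
print(("ex:Wilbur", TYPE, "ex:Software") in graph)   # False
print(("ex:Wilbur", TYPE, "ex:Software") in closed)  # True
```

An application that asks "is Wilbur software?" against the input graph gets no answer; asked against the closure, it does. That difference is the part one cannot see in the syntax.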
Even those people who (claim to) have grasped that we are really talking about logic and inference may get it wrong. As an example, I am thinking of Clay Shirky's criticism of the Semantic Web; this story is now (perhaps inadvertently) a classic, in the worst possible sense. Characterising the Semantic Web as a grand attempt at world-wide reasoning via syllogisms is either a disproportionate misunderstanding, or just plain obtuse. Interestingly, he observes that
"The Semantic Web takes for granted that many important aspects of the world can be specified in an unambiguous and universally agreed-on fashion, then spends a great deal of time talking about the ideal XML formats for those descriptions."
As it is, people tend to be very focused on syntax. As for his criticism of the logic part, the idea that with the Semantic Web we repeat or reattempt the approach where "We can make the entire world logically consistent" (as, perhaps, some knowledge representation folks were guilty of a long time ago) is downright offensive. After all, the AI community has learned quite a few things during the last 20 years.
Nevertheless, many of the critics are very focused on syntax (either by claiming that the XML-layer is enough, or that we messed that up too). The fact remains that we need something on top of the XML, otherwise we just have trees. And quite frankly, XML is a cumbersome way of building trees. A few days ago I started thinking of the s-expression syntax again. On the one hand, I must say that I am tempted, but on the other, there are already several syntaxes for RDF; the real benefit of the XML serialization is that it is a standard. That is no small thing.
As for the other benefits of using XML, one has emerged since the early RDF work: XSLT. At least we can take legacy XML data (or "future legacy data" as someone has put it), encapsulate its semantics in an XSLT script, and transform it to RDF or OWL. Now if only we could go the other way too (claiming that we can use XSLT for that is, again, a misunderstanding).
Posted by ora at 07:19
Comments
There is one technical benefit to RDF/XML: with only a tiny bit of thought, an XML document can be written using striping and end up as both useful XML and RDF. DOAP is a good example.
I think "XML fever" got to people during standardisation; there's almost no benefit to RDF/XML over something like Notation 3 or NTriples, because being able to parse it as XML using XML tools isn't a big time saver over writing an N3 parser, and it introduces a very verbose syntax. There's also the OWL syntax, which I remember as being quite elegant.
With regard to your "why use RDF over XML" point, there is a flipside: people tend to like XML because they don't have to make their ontological assumptions explicit. It's a very pragmatic way of doing things, relying on literals and implicit nesting to convey meaning; RDF tries to aim a bit higher and get a more accurate, reusable, and interoperable model of a domain. People don't like to think, so XML is easier. Your reasoning point ties in with this, as reasoning is part of reuse, and I think it's a very good observation.
Regarding XSLT: it is possible to go the other way, but only with a properly normalised RDF/XML document (the normalisation is possible with XSLT, of course). It's cumbersome, and I wouldn't do it myself, but discussion on the SIMILE list has shown it to be a viable approach for certain tasks.
Now, if there were an s-expression syntax, manipulating it, converting it to and from XML, etc. would be even more trivial than with XML...
Posted by: Richard Newman at May 8, 2005 12:57 PM
Richard: the pragmatism you are referring to we were hoping to dispense with. I agree with your assessments, except that I tend to have tremendous difficulty with going from RDF to XML with XSLT.
Posted by: Ora Lassila at May 8, 2005 01:34 PM
Yes, it's not a good kind of pragmatism -- I find it amusing that XML is touted as a cure for interoperability concerns, when it clearly facilitates the same implicit, poorly-specified semantics as any other data format.
Not that bad RDF is much better, of course! :)
Posted by: Richard Newman at May 8, 2005 07:10 PM