
2005-05-29

Querying literals in Wilbur

Richard Newman has realized that literals are kind of second-class citizens when it comes to queries in Wilbur, and has written about it. Nice little addition done - in my mind - more or less in the spirit of how to modify Wilbur (remember my mental illness about Common Lisp).

I have myself written a version of the triple-store database class which interns literals (I did this for an RDF browser I have created - dubbed "OINK"; more about that in a later blog entry). Interning puts literals more on par with graph nodes. Another mixin class allows full-text indexing of the literals so you can, say, find all literals that contain some substring (in fact, it integrates with CL-PPCRE, but I have yet to implement something that would allow the regular expression string matching to make use of the full-text index, rather than doing a brute-force scan of all the literals). I will make this code available through CVS soon.
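A minimal sketch of what interning buys, assuming nothing about Wilbur's real classes: the names `toy-db`, `interning-mixin`, `db-make-literal` and `db-find-literals` are all hypothetical, and the substring search is exactly the brute-force scan mentioned above, not a full-text index.

```lisp
;;; Hypothetical sketch: none of these names are Wilbur's actual API.
(defstruct literal
  string)

(defclass toy-db ()
  ()
  (:documentation "Stand-in for a triple-store database class."))

(defclass interning-mixin ()
  ((literals :initform (make-hash-table :test #'equal)
             :reader db-literals))
  (:documentation "Mixin that keeps one literal object per distinct string."))

(defclass interning-db (interning-mixin toy-db)
  ())

(defgeneric db-make-literal (db string)
  (:documentation "Return a literal object for STRING."))

;; Without the mixin, every call creates a fresh literal object.
(defmethod db-make-literal ((db toy-db) string)
  (make-literal :string string))

;; With the mixin, equal strings map to the same (EQ) literal object.
(defmethod db-make-literal ((db interning-mixin) string)
  (let ((table (db-literals db)))
    (or (gethash string table)
        (setf (gethash string table) (call-next-method)))))

(defun db-find-literals (db substring)
  "Brute-force scan: all interned literals whose string contains SUBSTRING."
  (loop for literal being the hash-values of (db-literals db)
        when (search substring (literal-string literal))
          collect literal))
```

Because interned literals with equal strings are the same object, they can be compared with `eq` and used as index keys, much like nodes. The behavior is added by mixing a class in, not by redefining `toy-db`.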

I am wondering what Richard meant by saying "Now I await the inevitable backlash!"... OK, here goes: Wilbur is written in CLOS, so the preferred way of changing things is to create new subclasses, not redefinitions of existing ones. Sorry, I couldn't resist. :-)

Posted by ora at 07:44 | Comments (5)

2005-05-23

Wireless Connections 2005, and thoughts

I gave a "lunch keynote" at the Wireless Connections 2005 conference in Calgary last week. I spoke (perhaps not all too surprisingly) about "Semantic Web & Mobile Information Access". The conference audience was mostly wireless industry executives, analysts, marketers, etc.

"Calgary as seen by my Nokia 6630"

In conjunction, the Canadian business paper Business Edge published an article with my interview in it. Though the article also spoke about conversational user interfaces for automated customer support and the like, I thought it was well written (by University of Calgary professor Tom Keenan). And I was quoted as characterizing XML as "woefully inadequate"... Good.

Many presenters at the conference talked about various kinds of (current and future) applications that would give mobile users specific information. It seems to me that, as a user, I would much rather decide for myself what information I like, what the sources are, etc. This, in my mind, is a compelling reason to pursue Semantic Web technologies. Power to the user, with minimal hassle on the user's part.

Posted by ora at 07:55

2005-05-13

Off-line

The database that holds this blog got corrupted, somehow, and I could not post new entries (nor could anyone post comments). Thanks to the good folks who host this web site everything has now been corrected. We should be back in business.

Posted by ora at 06:58

2005-05-08

2 more years

Semantic Web is a team sport, and W3C is the key organization when it comes to just and fair rules of that game.

Elections for the W3C Advisory Board are underway. I have decided to run one more time (if I succeed, this would be my 5th term). Take a look at the list of candidates (W3C member access is required for all links).

Posted by ora at 12:55

XML considered harmful, and other things

I was reading CLiki's XML page and came across a note that said

"... though it's worth noting that Wilbur is an RDF toolkit, not an XML toolkit -- it just happens that it can read RDF/XML, an XML serialisation of RDF."

First I thought that I really ought to publicize the fact more that part of Wilbur is a package called NOX that really is a (simple) XML toolkit. But then I remembered why NOX is the way it is (i.e., simple): Because I wanted to do the absolute minimum to be able to say that I can parse RDF. Someone who really cares (about XML, that is) might decide to use some other XML parser with Wilbur (in fact, this has been done).

Actually, this was just a segue to my real point (related to why I wanted to do the minimum wrt. XML): I think it was a mistake to use XML as RDF's syntax. It seems to have created more confusion than good. Early on when designing RDF I advocated an s-expression-based syntax, because

It was not meant to be, though, and eventually I got voted down; I wish I had been stronger, but I guess parentheses are a lot scarier than angle brackets. Go figure. XML syntax for RDF is a political design decision, not a technical one.

Yet the core problem remains. We still get questions like "Why cannot I just use XML instead of RDF?" which demonstrate the fundamental misunderstanding and wrong focus; people are principally focused on syntax. I think, generally, it is easy to operate in terms of something you can see and write. Perhaps that's also the reason why Semantic Web technologies, in a broader sense, are hard to adopt mentally: So much of the benefit of these technologies depends on reasoning and there, ultimately, one is dealing with something one cannot see. Let's just take RDF as an example: Applications should deal with the deductive closure of the RDF graph they process, not the (syntactic) graph itself. If all you do is process the graph that was input, you might as well use XML.
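To make the difference between the input graph and its deductive closure concrete, here is a small sketch. It assumes triples are plain lists and applies only two RDFS-style rules (subclass transitivity and type inheritance); the function names and the keyword vocabulary are made up for the example and are not part of any toolkit.

```lisp
;;; Hypothetical sketch: triples are lists (subject predicate object),
;;; and only two entailment rules are applied.
(defun infer-once (triples)
  "Return triples derivable from TRIPLES by one application of the two rules."
  (let ((new '()))
    (dolist (a triples new)
      (dolist (b triples)
        (destructuring-bind (s1 p1 o1) a
          (destructuring-bind (s2 p2 o2) b
            (when (and (eq p2 :sub-class-of) (equal o1 s2))
              (case p1
                ;; (x subClassOf y) and (y subClassOf z) => (x subClassOf z)
                (:sub-class-of (push (list s1 :sub-class-of o2) new))
                ;; (x type y) and (y subClassOf z) => (x type z)
                (:type (push (list s1 :type o2) new))))))))))

(defun deductive-closure (triples)
  "Apply INFER-ONCE repeatedly until no new triples appear."
  (let ((closure (remove-duplicates triples :test #'equal)))
    (loop
      (let ((added nil))
        (dolist (triple (infer-once closure))
          (unless (member triple closure :test #'equal)
            (push triple closure)
            (setf added t)))
        (unless added
          (return closure))))))
```

Given the three triples `(:fido :type :dog)`, `(:dog :sub-class-of :mammal)` and `(:mammal :sub-class-of :animal)`, the input graph does not contain `(:fido :type :animal)`, but the closure does. An application that only looks at the input graph never sees that fact.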

Even those people who (claim to) have grasped that we are really talking about logic and inference may get it wrong. As an example, I am thinking of Clay Shirky's criticism of the Semantic Web; this story is now (perhaps inadvertently) a classic, in the worst possible sense. Characterising the Semantic Web as a grand attempt in world-wide reasoning via syllogisms is either a disproportionate misunderstanding, or just plain obtuse. Interestingly, he observes that

"The Semantic Web takes for granted that many important aspects of the world can be specified in an unambiguous and universally agreed-on fashion, then spends a great deal of time talking about the ideal XML formats for those descriptions."

As it is, people tend to be very focused on syntax. As for his criticism of the logic part, the idea that with the Semantic Web we repeat or reattempt the approach where "We can make the entire world logically consistent" (as, perhaps, some knowledge representation folks were guilty of a long time ago) is downright offensive. After all, the AI community has learned quite a few things during the last 20 years.

Nevertheless, many of the critics are very focused on syntax (either by claiming that the XML-layer is enough, or that we messed that up too). The fact remains that we need something on top of the XML, otherwise we just have trees. And quite frankly, XML is a cumbersome way of building trees. A few days ago I started thinking of the s-expression syntax again. On the one hand, I must say that I am tempted, but on the other, there are already several syntaxes for RDF; the real benefit of the XML serialization is that it is a standard. That is no small thing.

As for the other benefits of using XML, one has emerged since the early RDF work: XSLT. At least we can take legacy XML data (or "future legacy data" as someone has put it), encapsulate its semantics in an XSLT script, and transform it to RDF or OWL. Now if only we could go the other way too (claiming that we can use XSLT for that is, again, a misunderstanding).

Posted by ora at 07:19 | Comments (3)

2005-05-02

Querying RDF

Lately I have been doing a lot of thinking about the Semantic Web and specifically about how to query RDF. I have this query language that I like using to build applications that have to access RDF data. It's based on paths: This seems like a good idea, since I think of the RDF data as a graph with recursive and repetitive patterns, and my query language supports those (via an operator capable of computing the transitive closure of an arbitrary subpath).
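As an illustration of the path idea (not the actual query language, whose syntax is not shown here), the following sketch evaluates a tiny, hypothetical path notation over a list of triples. A path is either a predicate, `(:seq path ...)`, `(:or path ...)`, or `(:rep* path)`, where `:rep*` computes the reflexive transitive closure of its subpath. The function name `walk` and the operator names are assumptions made for the example.

```lisp
;;; Hypothetical sketch: GRAPH is a list of (subject predicate object) triples.
(defun walk (graph nodes path)
  "Return the nodes reachable from NODES by following PATH in GRAPH."
  (if (atom path)
      ;; A bare predicate: follow matching arcs one step.
      (remove-duplicates
       (loop for (s p o) in graph
             when (and (eq p path) (member s nodes))
               collect o))
      (destructuring-bind (operator &rest subpaths) path
        (ecase operator
          ;; Sequence: feed the result of each subpath into the next.
          (:seq
           (reduce (lambda (reached subpath) (walk graph reached subpath))
                   subpaths :initial-value nodes))
          ;; Alternative: union of the results of the subpaths.
          (:or
           (remove-duplicates
            (loop for subpath in subpaths
                  append (walk graph nodes subpath))))
          ;; Closure: repeat the subpath until no new node is reached.
          (:rep*
           (let ((reached (remove-duplicates nodes)))
             (loop
               (let ((fresh (remove-duplicates
                             (set-difference
                              (walk graph reached (first subpaths))
                              reached))))
                 (when (null fresh)
                   (return reached))
                 (setf reached (append reached fresh))))))))))
```

With this, "all classes of a resource, including superclasses" is the single path `(:seq :type (:rep* :sub-class-of))`. Because `:rep*` only stops when nothing new is reached, it also terminates on cyclic graphs.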

Against this, I just cannot understand what's going on with SPARQL. Since when was it a good idea to query graphs using a query language that, essentially, is based on the relational calculus? It seems that others have noticed this too.

Neither approach (path queries or relational queries) is completely satisfactory, although I like the path approach better. More thinking is needed (I recommend this to the SPARQL folks too).

Speaking of thinking, I really like this quote (about scientific work) attributed to Werner Heisenberg that goes something like this: "...go on thinking beyond the point where thinking begins to hurt."

Now where did I put those painkillers again...?

Posted by ora at 13:10 | Comments (7)

Adding triples to a Wilbur database

Richard Newman has a good idea of how to add triples to a Wilbur database. Basically, he introduces a function that takes "Lisp-like" descriptions of RDF resources. It's funny, Wilbur used to support a function a bit like this; perhaps the whole idea should be revisited in Wilbur2 (yes, I know, I have said that it's coming, and it really is...).

Richard's function add-triple-list is, however, a bit too much like a function from, say, Java or C++ "transcribed" into Common Lisp. I couldn't help writing my own (mostly because of my mental disease). So here goes:

(defun db-add-description (db description)
  (destructuring-bind (frame &rest slot-and-values) description
    (dolist (slot-and-value slot-and-values)
      (destructuring-bind (slot &rest values) slot-and-value
        (dolist (value values)
          (db-add-triple db (triple frame slot value)))))
    frame))

(defun db-add-descriptions (db descriptions)
  (mapcar #'(lambda (description)
              (db-add-description db description))
          descriptions))

Basically, db-add-description takes two parameters: a database instance and a special description of a resource (an rdf:Description, that is; hence the name of the function). The syntax for the description is

( frame { ( slot { value } * ) } * )

It returns the node affected. The function db-add-descriptions merely takes a list of special descriptions and returns a list of the nodes affected.
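A quick usage sketch of the description syntax. To keep it runnable without Wilbur, it uses stand-ins for the database, `triple` and `db-add-triple` (these stubs, and the keyword "nodes", are assumptions for the example; the real functions operate on Wilbur's own node and database objects).

```lisp
;;; Stand-ins for Wilbur's database, TRIPLE and DB-ADD-TRIPLE
;;; (hypothetical, only so that the example runs on its own).
(defstruct toy-store
  (triples '()))

(defun triple (subject predicate object)
  (list subject predicate object))

(defun db-add-triple (db triple)
  (push triple (toy-store-triples db))
  triple)

;; Same definition as in the post, repeated for self-containment.
(defun db-add-description (db description)
  (destructuring-bind (frame &rest slot-and-values) description
    (dolist (slot-and-value slot-and-values)
      (destructuring-bind (slot &rest values) slot-and-value
        (dolist (value values)
          (db-add-triple db (triple frame slot value)))))
    frame))

(defun example-store ()
  "Build a store from one description: one frame, two slots, three values."
  (let ((db (make-toy-store)))
    (db-add-description db '(:ora (:name "Ora")
                                  (:wrote :wilbur :nox)))
    db))
```

The single description expands into three triples, one per value: `(:ora :name "Ora")`, `(:ora :wrote :wilbur)` and `(:ora :wrote :nox)`.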

Looking at the code, doesn't it make you think that it would be so nice to have a special version of dolist that, instead of a variable, would use a destructuring pattern? We could define it as follows:

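(defmacro dolist+ ((pattern list &optional (value nil)) &body body)
  (if (symbolp pattern)
    ;; A plain variable: behave exactly like dolist.
    `(dolist (,pattern ,list ,value) ,@body)
    ;; A pattern: bind each element to an uninterned temporary and destructure it.
    (let ((i (gensym)))
      `(dolist (,i ,list ,value)
         (destructuring-bind ,pattern ,i ,@body)))))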

We could now rewrite db-add-description as follows:

(defun db-add-description (db description)
  (destructuring-bind (frame &rest slot-and-values) description
    (dolist+ ((slot &rest values) slot-and-values)
      (dolist (value values)
        (db-add-triple db (triple frame slot value))))
    frame))

No big deal, I guess, but I have a bunch of other code where this type of construct abounds.
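As a small illustration of where such a construct pays off, here is a hypothetical helper, `objects-of`, that walks a list of triples represented as plain lists. The macro is repeated (with `gensym` generating the temporary) so that the example runs on its own; the helper and the list representation are assumptions made for the example.

```lisp
(defmacro dolist+ ((pattern list &optional (value nil)) &body body)
  (if (symbolp pattern)
      `(dolist (,pattern ,list ,value) ,@body)
      (let ((i (gensym)))
        `(dolist (,i ,list ,value)
           (destructuring-bind ,pattern ,i ,@body)))))

;; Hypothetical helper: TRIPLES is a list of (subject predicate object) lists.
(defun objects-of (predicate triples)
  "Collect, in order, the objects of all TRIPLES whose predicate is PREDICATE."
  (let ((result '()))
    (dolist+ ((s p o) triples (nreverse result))
      (declare (ignore s))
      (when (eq p predicate)
        (push o result)))))
```

Each element is taken apart in the loop header, so the body never has to call `first`, `second` or `third`.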

Posted by ora at 11:55 | Comments (4)