
2005-05-29

Querying literals in Wilbur

Richard Newman has realized that literals are something of a second-class citizen when it comes to queries in Wilbur, and has written about it. It is a nice little addition, done (in my mind) more or less in the spirit of how Wilbur is meant to be modified (remember my mental illness about Common Lisp).

I have myself written a version of the triple-store database class that interns literals (I did this for an RDF browser I have created, dubbed "OINK"; more about that in a later blog entry). Interning puts literals more on a par with graph nodes, since each distinct literal value is then represented by a single object. Another mixin class adds full-text indexing of the literals, so you can, say, find all literals that contain some substring. (It also integrates with CL-PPCRE, but I have yet to implement anything that would let regular-expression matching use the full-text index rather than doing a brute-force scan of all the literals.) I will make this code available through CVS soon.
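To make the interning idea concrete, here is a minimal Common Lisp sketch. It is not Wilbur's (or OINK's) code: the classes `simple-db`, `interning-db-mixin` and `interning-db`, the `rdf-literal` structure and the generic function `db-make-literal` are all hypothetical stand-ins. The mixin keeps an EQUAL hash table from strings to literal objects, so asking twice for the same string yields the very same (EQ) object.

```lisp
;;; Hypothetical sketch: none of these names come from Wilbur.

;; A literal is just a wrapper around its string value.
(defstruct (rdf-literal (:constructor %make-literal (text)))
  (text "" :type string :read-only t))

;; Stand-in for a plain triple-store class.
(defclass simple-db () ())

;; Mixin that interns literals by their string value.
(defclass interning-db-mixin ()
  ((literal-table :initform (make-hash-table :test 'equal)
                  :reader db-literal-table)))

(defgeneric db-make-literal (db text)
  (:documentation "Return a literal object for the string TEXT."))

;; Default behaviour: every call creates a fresh literal object.
(defmethod db-make-literal ((db simple-db) text)
  (%make-literal text))

;; Interning behaviour: reuse the literal already stored under an EQUAL
;; string; otherwise let the next method (here, SIMPLE-DB's) create one.
(defmethod db-make-literal ((db interning-db-mixin) text)
  (let ((table (db-literal-table db)))
    (or (gethash text table)
        (setf (gethash text table) (call-next-method)))))

;; The mixin comes first, so its method is the more specific one.
(defclass interning-db (interning-db-mixin simple-db) ())
```

With an `interning-db`, `(eq (db-make-literal db "OINK") (db-make-literal db "OINK"))` is true, whereas a plain `simple-db` hands back two distinct objects.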

I am wondering what Richard meant by saying "Now I await the inevitable backlash!"... OK, here goes: Wilbur is written using CLOS, so the preferred way of changing things is to create new subclasses, not to redefine existing ones. Sorry, I couldn't resist. :-)

Posted by ora at 07:44

Comments

Point definitely taken!

My next little bit of work was to make a literal index, parallel to the existing s/p/o indices. It is keyed with EQUAL on the string value of each triple's literal object. The index is maintained in :before and :after methods on db-add-triple and db-del-triple (though I confess to redefining the existing DB class again! I was aiming for a drop-in replacement rather than classical extension).

This was a substitute for a "proper" text-indexing library like Lucene. I have a higher-order function db-triples-literal-if which can take, e.g., a CL-PPCRE pattern. Performance is surprisingly good, even on 200,000 triples.

It sounds like mine is a hacky version of yours, so I'd love to see the proper version; OINK sounds interesting, too!

Posted by: Rich at May 29, 2005 08:55 AM
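Richard's literal index can be sketched in the subclassing style Ora prefers: a subclass maintains an EQUAL-keyed index in auxiliary methods instead of redefining the base class. This is a hypothetical, self-contained sketch, not Wilbur's API. `triple-store`, `rdf-triple` and `literal-indexing-store` are made up; `db-add-triple`, `db-del-triple` and `db-triples-literal-if` borrow the names from the comment above, but their argument lists here are assumptions. Literals are plain strings and nodes are keywords, purely for illustration.

```lisp
;;; Hypothetical sketch: a stand-in store, not Wilbur's DB class.

(defstruct (rdf-triple (:constructor make-rdf-triple (subject predicate object)))
  subject predicate object)

;; Minimal base store: a plain list of triples.
(defclass triple-store ()
  ((triples :initform '() :accessor store-triples)))

(defgeneric db-add-triple (db triple))
(defgeneric db-del-triple (db triple))

(defmethod db-add-triple ((db triple-store) triple)
  (push triple (store-triples db))
  triple)

(defmethod db-del-triple ((db triple-store) triple)
  (setf (store-triples db) (delete triple (store-triples db)))
  triple)

;; Subclass adding an index from literal strings (EQUAL) to triples.
(defclass literal-indexing-store (triple-store)
  ((literal-index :initform (make-hash-table :test 'equal)
                  :reader store-literal-index)))

;; After a triple is added, index it under its object if that is a literal.
(defmethod db-add-triple :after ((db literal-indexing-store) triple)
  (let ((object (rdf-triple-object triple)))
    (when (stringp object)
      (push triple (gethash object (store-literal-index db))))))

;; Before a triple is removed, drop it from the literal index.
(defmethod db-del-triple :before ((db literal-indexing-store) triple)
  (let* ((index (store-literal-index db))
         (object (rdf-triple-object triple)))
    (when (stringp object)
      (let ((remaining (remove triple (gethash object index))))
        (if remaining
            (setf (gethash object index) remaining)
            (remhash object index))))))

;; Higher-order query: all triples whose literal satisfies PREDICATE.
;; Result order is unspecified, since MAPHASH order is unspecified.
(defun db-triples-literal-if (predicate db)
  (let ((results '()))
    (maphash (lambda (literal triples)
               (when (funcall predicate literal)
                 (setf results (append triples results))))
             (store-literal-index db))
    results))
```

For example, after adding `(make-rdf-triple :b :label "OINK browser")`, calling `(db-triples-literal-if (lambda (s) (search "OINK" s)) db)` returns that triple. A regular-expression predicate such as `(lambda (s) (cl-ppcre:scan "^OI" s))` would plug in the same way, though it still scans every indexed literal string rather than using a full-text index.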

My goal is to extend the Wilbur Query Language to be the general mechanism for accessing the triple store (whether one is looking for triples, nodes, or literals).

Posted by: Ora Lassila at May 29, 2005 11:34 AM

The next question, therefore, is "is it in Wilbur 2?"

(I'll leave the obvious next next question to your imagination!)

Posted by: Richard Newman at May 29, 2005 11:43 AM

Uhhuh... well, ahem, "yes", and the answer to the next question is "Real Soon Now" (this is the part where work often interferes).

Posted by: Ora Lassila at May 29, 2005 01:05 PM

Good to know :D

Cheers Ora!

Posted by: Richard Newman at May 29, 2005 04:19 PM