« April 2006 | Main | June 2006 »

2006-05-22

RDF++

Yesterday I gave a talk at the W3C Advisory Committee meeting, titled "Identity Crisis and Serendipity" (unprotected version). I proposed to expand the class of applications that can be implemented using RDF (via the use of a reasoner) by "borrowing" two features from OWL. Namely, these features would be:

These features both address the issue of "identity", specifically the observation that many things one would like to describe do not have a URI but can still be uniquely identified. Now we can do, say, FoaF easily by relying on the underlying reasoning engine.

My current "development version" of Wilbur (not yet in CVS) implements this expanded "RDF++" reasoner, and OINK makes use of it. The additions still allow us to keep inference hidden and working behind the scenes, with applications accessing the deductive closure of the contents of the triple store.

I wonder if there are other features that could be seen as equally useful that we should add to RDF++?

Posted by ora at 05:23 | Comments (10)

2006-05-19

Edinburgh

Scene from the Royal Mile I have arrived in Edinburgh today to attend W3C meetings. On Sunday I will give a "lightning talk" at the W3C Advisory Committee meeting (link for W3C members only). The idea of these short lightning talks - as far as I understand - is to get up to the podium and piss people off (ahem, I meant "inspire discussion"). We'll see...

Unfortunately I cannot stay for the WWW2006 conference, but on Thursday my colleague Deepali Khushraj will give a talk on our OINK system titled "Browsing RDF Data with OINK".

Posted by ora at 14:33

2006-05-11

Measuring Wilbur's Performance

Speed Chart I have done some performance measurements, and comparing different Common Lisp platforms. First, I was mostly interested in load speed, i.e., the speed at which one can load triples into a database. This matters when you want to play with large amounts of data.

I created an instrumented database mixin class that allows me to measure transient load speed over chosen chunk size. I then created two database classes, one that indexes triples as it loads them (like Wilbur's normal database classes do) and one that does not; the built-in Common Lisp eq hash tables are used as indices. I then used one of the UniProt RDF files (with approx. 340,000 triples) to gather data, and discovered something unpleasant: Both OpenMCL and SBCL seem to load triples at a fairly constant speed, with indexing creating only a slight (but constant) overhead. Allegro, however, started really slowing down around 150,000 triples, so much so that the total time to load the file was almost 3 times that of OpenMCL and SBCL. Without indexing, Allegro was much faster than the others.

I then created my own hash-table implementation (loosely based on the genhash package from Ingvar Mattsson which I cleaned up and optimized a bit). Using this as the basis for our triple-index mechanism brought the performance on Allegro approximately on par with the others. Here is a graph that shows the transient load speed (as triples/s, taken in 3000 triple samples and shown as a sliding average over 10 samples). Note that the little hump at the tail end of the load is an artifact of the file, there are easier triples at the end...

Posted by ora at 13:08

Back from Paris

I am back from Paris where I keynoted (is that really a word?) the Finnish-French Symposium on Digital Semantic Content across Cultures. I liked the symposium. One presentation was particularly cool: Prof. Lily Díaz-Kommonen from the University of Art and Design Helsinki described a project to produce a "digital facsimile" of the oldest surviving map of México City (dating back to 1550). The project was a collaboration between the Media Lab of the University of Art and Design and the Institute of Photogrammetry and Remote Sensing of the Helsinki University of Technology.

Posted by ora at 12:25

2006-05-03

In Paris

I am in Paris to give the keynote at the Finnish-French Symposium on Digital Semantic Content across Cultures. The event is being held at the Louvre!

Posted by ora at 15:09 | Comments (1)

Fun with Lazy Evaluation

I was a bit bored and decided to play with lazy evaluation in Common Lisp. This has been done before in various publications about functional programming (including some of my own...), but I had fun doing this (read: don't flame me).

Common Lisp lists lend themselves well to lazy evaluation: we can make list construction (function cons) lazy wrt. the tail of the constructed list. This means that we only construct the list as far as the first element goes, and delay the construction of the tail. Functions that traverse lists (mainly, cdr) will force the construction of the remaining list, typically just one element at a time.

So here goes: First, we need a package for our new fun functions. Let's call it "LAZY". We need to shadow those CL functions we intend to redefine in our lazy framework.

(defpackage lazy
  (:use :cl)
  (:shadow cl:cdr cl:cons cl:mapcar cl:list-length cl:dolist)
  (:export   "CDR"  "CONS"  "MAPCAR"  "LIST-LENGTH"  "DOLIST"))

(in-package :lazy)

Then, we need a basic mechanism for delayed evaluation, as follows:

(defstruct (delay
             (:constructor make-delay (closure))
             (:predicate delayp))
  closure)

(defmethod print-object ((self delay) stream)
  (princ "..." stream))

(defmacro delay (form)
  `(make-delay #'(lambda () ,form)))

(defun force (thing)
  (if (delayp thing)
    (funcall (delay-closure thing))
    thing))

Basically, instances of delay are encapsulations of delayed evaluation, implemented using the lexical closure mechanism of CL. The macro delay will delay any form, and the function force will force the evaluation of a delayed form. If the argument passed to force is not delay, force effectively acts as the identity function.

Now, we can define our basic lazy functions (and one macro):

(defun lazy:cdr (cons)
  (setf (cl:cdr cons) (force (cl:cdr cons))))

(defmacro lazy:cons (head tail)
  `(cl:cons ,head (delay ,tail)))

(defun lazy:mapcar (function list &rest more-lists)
  (cons (apply function (car list) (cl:mapcar #'car more-lists))
        (apply #'lazy:mapcar function (cdr list) (cl:mapcar #'cdr more-lists))))

(defun lazy:list-length (list)
  (labels ((ll (list n)
             (cond ((consp list) (ll (cl:cdr list) (1+ n)))
                   ((null list) n)
                   (t nil))))
    (ll list 0)))

(defmacro lazy:dolist ((var list &optional value) &body body)
  (let ((s (gentemp)))
    `(do* ((,s ,list (cdr ,s))
           (,var (car ,s) (car ,s)))
          ((null ,s) ,value)
       ,@body)))

The explicit lazy: package prefixes are not necessary but I put them there for clarity. Notice that cdr is essentially the only function that calls force, and it will replace the forced delay with the result in the data structure; that way delays are not forced multiple times. We could have built this into the delay class of course, but I feel this is more elegant.

Next, we define some "helper" functions:

(defun first-n (list n)
  (and list
       (plusp n)
       (cl:cons (car list) (first-n (cdr list) (1- n)))))

(defun numbers (&optional (start 1) (end nil))
  (unless (and end (> start end))
    (cons start (numbers (1+ start) end))))

The function first-n will force the construction of a lazy list for the first (at most) n elements. The function numbers allows one to construct lists of consecutive integers, optionally specifying an end. Notice that the end is not necessary, we can construct infinite lists. The new definition of dolist will not do well with infinite lists, since it eagerly traverses the entire list it is given. I suppose

(dolist (i (numbers 1))
  ...)

is one way of doing an infinite loop now.

Finally, we get to the "showing off" part. The idea is borrowed from Henderson's book on functional programming: We will define a function that computes a list of all primes. Yes, all of them, since the list will be lazy. Here goes:

(defun primes ()
  (labels ((filter-multiples-of (n list)
             (if (zerop (mod (car list) n))
               (filter-multiples-of n (cdr list))
               (cons (car list) (filter-multiples-of n (cdr list)))))
           (primes-from (n)
             (cons n (filter-multiples-of n (primes-from (1+ n))))))
    (cons 1 (primes-from 2))))

This function will construct the list of primes as a cascade of filters, each filtering away the multiples of the previous prime in the list.

Posted by ora at 14:10