

Good Progress with Wilbur2

Lately I have been making nice progress with Wilbur2, and I am confident that I will get past the "pre-release" phase soon. Several people have provided bug fixes (for which I am grateful). The bug-fix-provider-of-the-month award goes to Richard Newman.

I am fine-tuning the new Wilbur2 API by structuring it into various "protocols" (expressed as collections of DEFGENERICs). So now I have things like the "Data Management Protocol", "Data Source Loading Protocol", "Parsing Protocol", etc. Documentation is progressing, too.
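To illustrate what a protocol expressed as a collection of generic functions looks like, here is a minimal sketch in Python, with `functools.singledispatch` standing in for DEFGENERIC. The operation names (`db_add_triple`, `db_del_triple`, `db_query`) and the `ListDB` backend are hypothetical, not the actual Wilbur2 API. Note also that `singledispatch` picks a method based only on the type of the first argument, whereas CLOS generic functions can dispatch on all required arguments.

```python
from functools import singledispatch

# A hypothetical "Data Management Protocol": each operation is a generic
# function (singledispatch standing in for DEFGENERIC). Backends join the
# protocol by registering methods; calling an operation on a type with no
# registered method raises NotImplementedError.


@singledispatch
def db_add_triple(db, s, p, o):
    raise NotImplementedError(f"db_add_triple: no method for {type(db).__name__}")


@singledispatch
def db_del_triple(db, s, p, o):
    raise NotImplementedError(f"db_del_triple: no method for {type(db).__name__}")


@singledispatch
def db_query(db, s=None, p=None, o=None):
    raise NotImplementedError(f"db_query: no method for {type(db).__name__}")


class ListDB:
    """A deliberately trivial backend that keeps triples in a list."""

    def __init__(self):
        self.triples = []


@db_add_triple.register
def _(db: ListDB, s, p, o):
    if (s, p, o) not in db.triples:
        db.triples.append((s, p, o))


@db_del_triple.register
def _(db: ListDB, s, p, o):
    if (s, p, o) in db.triples:
        db.triples.remove((s, p, o))


@db_query.register
def _(db: ListDB, s=None, p=None, o=None):
    # None acts as a wildcard in any position.
    return [t for t in db.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


db = ListDB()
db_add_triple(db, "ex:a", "ex:knows", "ex:b")
print(db_query(db, p="ex:knows"))  # prints [('ex:a', 'ex:knows', 'ex:b')]
```

A different storage backend would join the protocol simply by registering its own methods; code written against the protocol does not change.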

I have also done some performance measurements. On an 867 MHz PowerBook G4 running OpenMCL, I can populate the triple store at a rate of approximately 800 μs per triple (loading a file of RDF/XML). I am using an indexed main-memory database with literal interning. Performance is not terrible, considering that I have not done any code optimization yet (and there is plenty I could do), and that I could even switch to a compiler that produces faster executable code (say, SBCL). I ran the tests with data sets of approximately 200,000 to 300,000 triples. I will post accurate numbers later, with comparisons to other toolkits and libraries. Eventually, I would expect to beat at least the Java-based implementations.
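For a sense of scale, 800 μs per triple is about 1,250 triples per second, so at that rate loading 200,000 to 300,000 triples takes roughly 160 to 240 seconds. The post does not show how Wilbur2's store is built; what follows is a minimal, hypothetical sketch of the general technique it names: a main-memory triple store with one index per triple position, plus a literal intern table so that each distinct literal exists as exactly one object. All class and method names are illustrative assumptions, not Wilbur2's API.

```python
from collections import defaultdict


class Literal:
    """An RDF literal. Create instances only via LiteralTable.intern, so that
    equal literals are always the same object (default identity equality)."""

    __slots__ = ("value", "datatype", "lang")

    def __init__(self, value, datatype=None, lang=None):
        self.value = value
        self.datatype = datatype
        self.lang = lang

    def __repr__(self):
        return f"Literal({self.value!r})"


class LiteralTable:
    """Maps (value, datatype, lang) to a single shared Literal object."""

    def __init__(self):
        self._table = {}

    def intern(self, value, datatype=None, lang=None):
        key = (value, datatype, lang)
        literal = self._table.get(key)
        if literal is None:
            literal = Literal(value, datatype, lang)
            self._table[key] = literal
        return literal

    def __len__(self):
        return len(self._table)


class TripleStore:
    """Main-memory triple store with one index per position (S, P, O)."""

    def __init__(self):
        self.literals = LiteralTable()
        self.triples = set()
        self.by_subject = defaultdict(set)
        self.by_predicate = defaultdict(set)
        self.by_object = defaultdict(set)

    def add(self, s, p, o):
        """Add a triple; return False if it was already present."""
        triple = (s, p, o)
        if triple in self.triples:
            return False
        self.triples.add(triple)
        self.by_subject[s].add(triple)
        self.by_predicate[p].add(triple)
        self.by_object[o].add(triple)
        return True

    def query(self, s=None, p=None, o=None):
        """Return the set of triples matching a pattern (None = wildcard)."""
        pattern = ((s, self.by_subject), (p, self.by_predicate), (o, self.by_object))
        # Start from the smallest index bucket among the bound positions.
        buckets = [index.get(value, set()) for value, index in pattern if value is not None]
        candidates = min(buckets, key=len) if buckets else self.triples
        return {t for t in candidates
                if (s is None or t[0] == s)
                and (p is None or t[1] == p)
                and (o is None or t[2] == o)}


store = TripleStore()
name = store.literals.intern("Wilbur2")
store.add("ex:project", "ex:name", name)
store.add("ex:project", "ex:lang", "ex:CommonLisp")
print(store.query(o=store.literals.intern("Wilbur2")))
# prints {('ex:project', 'ex:name', Literal('Wilbur2'))}
```

Interning pays off twice in this sketch: a repeated literal is stored only once, and because equal literals are the same object, index lookups hash and compare them by identity rather than by value.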

After some improvements to the Wilbur query engine, I was also able to query at speeds that are quite adequate (a few seconds to produce result sets of 50,000 to 100,000 nodes using moderately simple and short path patterns). I am particularly interested in query performance.
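The post does not describe the syntax of Wilbur's path patterns. As a hypothetical illustration only, here is a minimal sketch of how two common kinds of path step can be evaluated over a subject-indexed store: a fixed sequence of predicates, evaluated one frontier of nodes at a time, and a "zero or more" repetition of a single predicate. The function names and the example data are made up for this sketch; whether Wilbur's query engine offers exactly these operators is an assumption.

```python
from collections import defaultdict


def build_index(triples):
    """Index triples as subject -> predicate -> set of objects."""
    index = defaultdict(lambda: defaultdict(set))
    for s, p, o in triples:
        index[s][p].add(o)
    return index


def follow_path(index, start_nodes, path):
    """Nodes reachable from start_nodes by following the predicates in
    `path` in order (a simple sequence path), one frontier at a time."""
    frontier = set(start_nodes)
    for predicate in path:
        next_frontier = set()
        for node in frontier:
            next_frontier |= index.get(node, {}).get(predicate, set())
        frontier = next_frontier
        if not frontier:
            break  # no node matched this step; the result is empty
    return frontier


def follow_star(index, start_nodes, predicate):
    """Nodes reachable via zero or more `predicate` edges
    (reflexive-transitive closure), using an explicit stack."""
    seen = set(start_nodes)
    stack = list(seen)
    while stack:
        node = stack.pop()
        for nxt in index.get(node, {}).get(predicate, ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen


index = build_index([
    ("ex:alice", "ex:knows", "ex:bob"),
    ("ex:bob", "ex:knows", "ex:carol"),
    ("ex:bob", "ex:name", "Bob"),
    ("ex:carol", "ex:name", "Carol"),
])
print(follow_path(index, {"ex:alice"}, ["ex:knows", "ex:name"]))  # {'Bob'}
print(sorted(follow_star(index, {"ex:alice"}, "ex:knows")))
# ['ex:alice', 'ex:bob', 'ex:carol']
```

Working frontier-at-a-time means each distinct node is expanded at most once per step, so the cost of a short path tracks the size of the intermediate result sets rather than the number of distinct paths through the graph.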

Posted by ora at 14:21


Thank you, I am very pleased by the award! :)

Did you get the nodeID patch? (I fear it's slowed down the parser, though -- one more case to check.)

I'm having a cursory look now at optimisation, and I'd also be interested to see how performant twinql was, both in general and against its 'competition', such as ARQ.

Posted by: Rich at October 1, 2005 07:21 PM

Are you still using MCL?

Posted by: Louis Theran at December 7, 2005 12:57 PM

Wilbur is (mostly) being developed on OpenMCL these days.

Posted by: Ora Lassila at December 7, 2005 04:35 PM