

Measuring Wilbur's Performance

[Figure: Speed Chart]

I have been measuring Wilbur's performance and comparing different Common Lisp platforms. First, I was mostly interested in load speed, i.e., the speed at which one can load triples into a database. This matters when you want to play with large amounts of data.

I created an instrumented database mixin class that allows me to measure transient load speed over a chosen chunk size. I then created two database classes, one that indexes triples as it loads them (like Wilbur's normal database classes do) and one that does not; the built-in Common Lisp eq hash tables are used as indices. I then used one of the UniProt RDF files (with approx. 340,000 triples) to gather data, and discovered something unpleasant. Both OpenMCL and SBCL seem to load triples at a fairly constant speed, with indexing creating only a slight (but constant) overhead. Allegro, however, started slowing down badly at around 150,000 triples, so much so that the total time to load the file was almost 3 times that of OpenMCL and SBCL. Without indexing, Allegro was much faster than the others.
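To make the measurement setup concrete, here is a minimal sketch of the same idea in Python rather than Lisp. It is not Wilbur's code: the class names, the `add_triple` method, and the single by-subject dictionary index are all assumptions made for illustration, and a Python dict hashes by value rather than by identity as an eq hash table does.

```python
import time


class InstrumentedLoadMixin:
    """Record load throughput (triples/s) once per `chunk_size` added triples."""

    def __init__(self, chunk_size=3000, clock=time.perf_counter):
        super().__init__()
        self.chunk_size = chunk_size
        self.clock = clock          # injectable so measurements can be tested
        self.samples = []           # one triples/s figure per completed chunk
        self._count = 0
        self._chunk_start = None

    def add_triple(self, triple):
        if self._chunk_start is None:
            self._chunk_start = self.clock()
        super().add_triple(triple)  # delegate to the real database class
        self._count += 1
        if self._count % self.chunk_size == 0:
            now = self.clock()
            elapsed = now - self._chunk_start
            rate = self.chunk_size / elapsed if elapsed > 0 else float("inf")
            self.samples.append(rate)
            self._chunk_start = now


class PlainDB:
    """Hypothetical database that stores triples without indexing them."""

    def __init__(self):
        self.triples = []

    def add_triple(self, triple):
        self.triples.append(triple)


class IndexedDB(PlainDB):
    """Hypothetical database that indexes each triple as it is loaded."""

    def __init__(self):
        super().__init__()
        self.by_subject = {}        # stand-in for a hash-table index

    def add_triple(self, triple):
        super().add_triple(triple)
        self.by_subject.setdefault(triple[0], []).append(triple)


class InstrumentedPlainDB(InstrumentedLoadMixin, PlainDB):
    pass


class InstrumentedIndexedDB(InstrumentedLoadMixin, IndexedDB):
    pass
```

Loading the same file into an `InstrumentedPlainDB` and an `InstrumentedIndexedDB` and comparing their `samples` lists separates the cost of indexing from the cost of parsing and storing.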

I then created my own hash-table implementation (loosely based on the genhash package from Ingvar Mattsson, which I cleaned up and optimized a bit). Using this as the basis for our triple-index mechanism brought the performance on Allegro approximately on par with the others. The graph shows the transient load speed (in triples/s, sampled every 3,000 triples and shown as a sliding average over 10 samples). Note that the little hump at the tail end of the load is an artifact of the file: there are easier triples at the end.
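The smoothing used for the graph can be sketched as follows, again in Python for illustration only. The function name is hypothetical, and averaging over fewer than 10 samples at the start of the series is an assumption; the post does not say how the first few points were handled.

```python
from collections import deque


def sliding_average(samples, window=10):
    """Return, for each position, the mean of the last `window` samples."""
    recent = deque(maxlen=window)   # oldest sample drops out automatically
    smoothed = []
    for sample in samples:
        recent.append(sample)
        smoothed.append(sum(recent) / len(recent))
    return smoothed
```

Each input value would be one triples/s figure measured over a 3,000-triple chunk, so a window of 10 smooths the curve over 30,000 triples.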

Posted by ora at 13:08