

Bigger Datasets

I have just experimented with a dataset of about 2 million triples in Wilbur's main-memory triple store (again enzyme, protein and gene data) and browsed it with OINK. I used a 1.67GHz PowerBook G4; on this machine, Wilbur loads and parses triples from an RDF/XML file at about 2200 triples/sec (the code is still largely unoptimized and runs on OpenMCL). Loading about 150MB of RDF took roughly 15 minutes.
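As a quick back-of-the-envelope check on those figures, assuming the 2200 triples/sec rate held roughly constant over the whole load and taking 1MB as 10^6 bytes:

```latex
t \approx \frac{2\times10^{6}\ \text{triples}}{2200\ \text{triples/s}} \approx 909\ \text{s} \approx 15\ \text{min},
\qquad
\frac{150\times10^{6}\ \text{bytes}}{2\times10^{6}\ \text{triples}} \approx 75\ \text{bytes of RDF/XML per triple}.
```

The estimate agrees with the observed load time of about 15 minutes.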

From the viewpoint of a human user browsing the data, performance was quite comfortable, although at times the reasoner seems to hog 80-90% of the CPU. The data has fairly deep subclass hierarchies, many classes with multiple direct superclasses, and many domain and range definitions for properties.
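To illustrate why that kind of schema keeps a reasoner busy, here is a minimal sketch of RDFS-style type inference: a class's superclasses are found by walking `rdfs:subClassOf` as a DAG (multiple inheritance means shared ancestors must be de-duplicated), and `rdfs:domain`/`rdfs:range` add further types. Wilbur itself is written in Common Lisp and its reasoner is not shown here; the schema, the class and property names, the `ex:` identifiers, and the functions `superclasses` and `inferred_types` are all hypothetical, and the cache assumes the schema does not change.

```python
from collections import deque

# Hypothetical toy schema: class -> direct superclasses (rdfs:subClassOf).
# Multiple direct superclasses make this a DAG rather than a tree.
SUBCLASS_OF = {
    "Enzyme": ["Protein", "Catalyst"],
    "Protein": ["Macromolecule"],
    "Catalyst": ["Entity"],
    "Macromolecule": ["Molecule"],
    "Molecule": ["Entity"],
}

# Hypothetical property definitions: property -> (rdfs:domain, rdfs:range).
DOMAIN_RANGE = {
    "encodedBy": ("Protein", "Gene"),
}

# Memoized closures; only valid while SUBCLASS_OF stays unchanged.
_cache = {}


def superclasses(cls):
    """Return the reflexive, transitive set of superclasses of cls (BFS)."""
    if cls in _cache:
        return _cache[cls]
    seen = {cls}
    queue = deque([cls])
    while queue:
        current = queue.popleft()
        for parent in SUBCLASS_OF.get(current, []):
            if parent not in seen:  # shared ancestors are visited only once
                seen.add(parent)
                queue.append(parent)
    _cache[cls] = frozenset(seen)
    return _cache[cls]


def inferred_types(node, triples):
    """Types of node implied by rdf:type, rdfs:domain and rdfs:range,
    closed over the subclass hierarchy."""
    direct = set()
    for s, p, o in triples:
        if p == "rdf:type" and s == node:
            direct.add(o)
        elif p in DOMAIN_RANGE:
            domain, rng = DOMAIN_RANGE[p]
            if s == node:
                direct.add(domain)  # subject gets the property's domain
            if o == node:
                direct.add(rng)  # object gets the property's range
    result = set()
    for t in direct:
        result |= superclasses(t)
    return result
```

For example, with `triples = [("ex:e1", "rdf:type", "Enzyme"), ("ex:e1", "encodedBy", "ex:g1")]`, `inferred_types("ex:g1", triples)` yields `{"Gene"}`, while `ex:e1` picks up all six classes from `Enzyme` up to `Entity`. Every extra level or extra direct superclass widens the walk, which is the kind of work that grows with deep, multiply-inherited hierarchies.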

It may be time to measure the performance of the query engine and the reasoner. I think the indexing of the triple store could be improved, but first I need to figure out which (low-level) query patterns dominate.
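One way to find the dominant query patterns is to have the store itself record which positions of each (subject, predicate, object) query are bound. The sketch below is a hypothetical toy store, not Wilbur's design: it keeps three hash indexes (SPO, POS, OSP) and counts query "shapes" such as `?PO` (predicate and object bound). The class name `TripleStore`, its methods, and the choice of indexes are assumptions made for illustration.

```python
from collections import Counter, defaultdict


class TripleStore:
    """Toy in-memory triple store with three hash indexes (hypothetical)."""

    def __init__(self):
        self.spo = defaultdict(lambda: defaultdict(set))  # s -> p -> {o}
        self.pos = defaultdict(lambda: defaultdict(set))  # p -> o -> {s}
        self.osp = defaultdict(lambda: defaultdict(set))  # o -> s -> {p}
        self.pattern_counts = Counter()  # query shape -> number of queries

    def add(self, s, p, o):
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def query(self, s=None, p=None, o=None):
        """Return matching (s, p, o) triples; None acts as a wildcard."""
        # Record the shape, e.g. "SP?" when subject and predicate are bound.
        shape = "".join(c if v is not None else "?" for c, v in zip("SPO", (s, p, o)))
        self.pattern_counts[shape] += 1
        return list(self._match(s, p, o))

    def _match(self, s, p, o):
        # .get() is used throughout so lookups never create empty entries.
        if s is not None:
            # Subject bound: use the SPO index.
            by_pred = self.spo.get(s, {})
            preds = [p] if p is not None else list(by_pred)
            for p2 in preds:
                for o2 in by_pred.get(p2, ()):
                    if o is None or o2 == o:
                        yield (s, p2, o2)
        elif p is not None:
            # Predicate bound, subject free: use the POS index.
            by_obj = self.pos.get(p, {})
            objs = [o] if o is not None else list(by_obj)
            for o2 in objs:
                for s2 in by_obj.get(o2, ()):
                    yield (s2, p, o2)
        elif o is not None:
            # Only the object bound: use the OSP index.
            for s2, preds in self.osp.get(o, {}).items():
                for p2 in preds:
                    yield (s2, p2, o)
        else:
            # Nothing bound: full scan.
            for s2, by_pred in self.spo.items():
                for p2, objs in by_pred.items():
                    for o2 in objs:
                        yield (s2, p2, o2)
```

After a browsing session, `store.pattern_counts.most_common()` shows which shapes occur most often; if, say, `?PO` dominated, that would suggest keeping the POS index fast and possibly dropping an index that is rarely used, trading memory for speed where it matters.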

Posted by ora at 05:55