RDF Parser

sodp.py parses the ODP RDF dumps (which can be downloaded from http://rdf.dmoz.org) and loads them into a SQLite database file, odp.db.

Before parsing the RDF dump, you need to create the odp.db file from one of two schemas: the regular db.sqlite schema, or the dblite.sqlite schema, which saves some space by omitting a few unique indices. For example:


sqlite3 odp.db <dblite.sqlite
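
If the sqlite3 command-line tool is not available, the same step can be approximated from Python's standard library. This is only a sketch: the helper name create_db is hypothetical and not part of sodp.py, and it assumes the schema file contains plain SQL statements (no sqlite3 shell dot-commands).

```python
import sqlite3
from pathlib import Path


def create_db(schema_path, db_path="odp.db"):
    """Create db_path (if missing) and apply the SQL schema in schema_path.

    Hypothetical helper; assumes the schema file holds plain SQL only.
    """
    schema_sql = Path(schema_path).read_text(encoding="utf-8")
    conn = sqlite3.connect(db_path)
    try:
        # executescript runs every statement in the schema file in order.
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()
```

Calling create_db("dblite.sqlite") should then have roughly the same effect as the sqlite3 command above.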

The RDF dump from rdf.dmoz.org is split into a content part and a structure part. Generally, you should process the content part first and then update the database file with the structure part, e.g.


zcat content.u8.rdf.gz | python sodp.py -v -s Computers &&
zcat structure.u8.rdf.gz | python sodp.py -v -s Computers
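
On systems without zcat, the pipeline above could be driven from Python instead. The following is a rough sketch, not part of sodp.py: the function names pipe_gzip_to_command and load_category are hypothetical, and it assumes that sodp.py reads the dump from standard input (as the zcat pipeline suggests) and that "python" on your PATH is the interpreter sodp.py expects. The -v and -s options are passed through exactly as in the commands above.

```python
import gzip
import shutil
import subprocess


def pipe_gzip_to_command(gz_path, command):
    """Decompress gz_path and stream it to command's stdin; return the exit code."""
    proc = subprocess.Popen(command, stdin=subprocess.PIPE)
    # Leaving the with-block closes stdin, which signals end of input.
    with gzip.open(gz_path, "rb") as src, proc.stdin:
        shutil.copyfileobj(src, proc.stdin)
    return proc.wait()


def load_category(category, script="sodp.py", python_cmd="python"):
    """Process the content dump, then the structure dump (hypothetical helper).

    Stops at the first failing step, like the && in the shell version.
    """
    for dump in ("content.u8.rdf.gz", "structure.u8.rdf.gz"):
        code = pipe_gzip_to_command(dump, [python_cmd, script, "-v", "-s", category])
        if code != 0:
            return code
    return 0
```

With both dump files in the current directory, load_category("Computers") would correspond to the two shell commands above.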

This will generate an odp.db file containing just the Computers category.
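
To check that the import produced data, you can count the rows in each table. The sketch below makes no assumption about the table names in db.sqlite or dblite.sqlite; it reads them from SQLite's own sqlite_master catalog. The function name table_row_counts is hypothetical.

```python
import sqlite3


def table_row_counts(db_path="odp.db"):
    """Return a {table_name: row_count} dict for every user table in db_path."""
    conn = sqlite3.connect(db_path)
    try:
        tables = [
            row[0]
            for row in conn.execute(
                "SELECT name FROM sqlite_master "
                "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' "
                "ORDER BY name"
            )
        ]
        counts = {}
        for name in tables:
            # Quote the identifier, since table names cannot be bound as parameters.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted
            ).fetchone()[0]
        return counts
    finally:
        conn.close()
```

Printing table_row_counts("odp.db") after both steps have finished should show non-zero counts if the import succeeded.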