sodp/Documentation

SODP is short for "Simple ODP" (or maybe SQLite ODP) and provides a read-only HTML interface to the Open Directory.

One of the most notable features of SODP is that it allows you to host the complete Computers sub-category of the ODP with only about 13 MB of hard-disk space for the database.

Prerequisites

In addition to Python, packages for the expat XML parser and APSW (Another Python SQLite Wrapper) are needed.

For the Web front-end, PHP with PDO and SQLite support is needed, as well as the zlib package.

RDF Parser

sodp.py parses the ODP RDF dumps (which can be downloaded from http://rdf.dmoz.org) and generates a SQLite database file odp.db.
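To give a feel for how a dump can be processed in a streaming fashion with expat, here is a minimal sketch. It is not the code from sodp.py; it only counts categories below a given prefix. It assumes that categories appear as Topic elements carrying an r:id attribute, and the function name count_topics and the inline sample data (including the catid values) are made up for illustration.

```python
import io
import xml.parsers.expat


def count_topics(stream, prefix="Top/Computers"):
    """Count <Topic> elements whose r:id equals `prefix` or lies below it."""
    count = 0

    def start_element(name, attrs):
        nonlocal count
        if name == "Topic":
            # Assumption: the category path is stored in the r:id attribute.
            topic_id = attrs.get("r:id", "")
            if topic_id == prefix or topic_id.startswith(prefix + "/"):
                count += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    # ParseFile reads the stream in chunks, so the whole dump is never
    # held in memory at once.
    parser.ParseFile(stream)
    return count


# Tiny hand-written sample in the assumed dump layout (not real ODP data).
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<RDF xmlns:r="http://www.w3.org/TR/RDF/" xmlns="http://dmoz.org/rdf/">
  <Topic r:id="Top/Computers"><catid>4</catid></Topic>
  <Topic r:id="Top/Computers/Software"><catid>5</catid></Topic>
  <Topic r:id="Top/Arts"><catid>2</catid></Topic>
</RDF>
"""

print(count_topics(io.BytesIO(SAMPLE)))
```

For a real dump, the same function should accept the object returned by gzip.open("content.rdf.u8.gz", "rb") in place of the in-memory sample.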

Before parsing the RDF dump, create the odp.db file from either the regular db.sqlite schema or the dblite.sqlite schema (which saves some space by omitting a few unique indices), e.g.

sqlite3 odp.db <dblite.sqlite
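If the sqlite3 command-line tool is not available, the same step can probably be done from Python. The sketch below uses the standard sqlite3 module and assumes the schema file contains only plain SQL statements (no sqlite3 shell dot-commands); the helper name create_database is made up for illustration.

```python
import sqlite3


def create_database(db_path, schema_path):
    """Create db_path (if needed) and run the SQL statements in schema_path."""
    with open(schema_path, encoding="utf-8") as schema_file:
        schema_sql = schema_file.read()
    conn = sqlite3.connect(db_path)
    try:
        # executescript runs every statement in the file in one go.
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()
```

Calling create_database("odp.db", "dblite.sqlite") should then be equivalent to the shell command above, under the stated assumption about the schema file.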

The RDF dump from rdf.dmoz.org is split into a content part and a structure part. Generally, you should process the content part first and then update the database file with the structure part, e.g.

zcat content.rdf.u8.gz | python sodp.py -v -s Top/Computers &&
zcat structure.rdf.u8.gz | python sodp.py -v -s Top/Computers

This will generate an odp.db file containing just the Computers category.
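To check that the import produced something sensible, you can list the tables in the generated file together with their row counts. The sketch below makes no assumptions about the actual SODP schema: it reads the table names from SQLite's own sqlite_master catalogue. The helper name table_row_counts is made up for illustration.

```python
import sqlite3


def table_row_counts(db_path):
    """Return {table_name: row_count} for every table in the database."""
    # Note: sqlite3.connect creates an empty file if db_path does not exist.
    conn = sqlite3.connect(db_path)
    try:
        names = [row[0] for row in conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
        counts = {}
        for name in names:
            # Quote the identifier so unusual table names do not break the query.
            quoted = '"' + name.replace('"', '""') + '"'
            counts[name] = conn.execute(
                "SELECT COUNT(*) FROM " + quoted).fetchone()[0]
        return counts
    finally:
        conn.close()
```

For example, print(table_row_counts("odp.db")) shows one entry per table; tables that are still empty after both passes may indicate that a dump was not processed.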

PHP Front-End

A simple read-only PHP front-end is included which can easily be integrated into an existing web site (or framework).

To get started, just take a look at sodp.php and odp_db.php.

To get nice-looking URLs you might want to add the following lines to your .htaccess:

RewriteRule ^odp/(.*) sodp.php/$1
RewriteRule ^odp$ /odp/ [R=permanent]

BTW, just have a look at odp.cmeerw.org to see how this can be integrated into a web site.