this is the code I used to process about 12 GB of plaintext books from Project Gutenberg.

it assumes you already have a local mirror of the Gutenberg books, made using their robot access endpoint.
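
for a rough idea of what reading such a mirror can look like, here is a minimal sketch. it is not the code from this repo. it assumes the mirror is a directory tree of .zip archives that each hold one or more .txt files, possibly with some loose .txt files mixed in. the function name `iter_books` and that layout are assumptions for illustration only, so adjust them to whatever your mirror actually looks like.

```python
import zipfile
from pathlib import Path


def iter_books(mirror_root):
    """Yield (source, text) for every plaintext book found under mirror_root.

    Assumption: the mirror holds .zip archives containing .txt files,
    and possibly loose .txt files. `iter_books` is a hypothetical helper.
    """
    root = Path(mirror_root)
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        suffix = path.suffix.lower()
        if suffix == ".txt":
            # loose text file: decode leniently, since encodings vary
            yield str(path), path.read_bytes().decode("utf-8", errors="replace")
        elif suffix == ".zip":
            try:
                with zipfile.ZipFile(path) as zf:
                    for name in zf.namelist():
                        if name.lower().endswith(".txt"):
                            raw = zf.read(name)
                            # "archive!member" is just a label for the source
                            yield f"{path}!{name}", raw.decode("utf-8", errors="replace")
            except zipfile.BadZipFile:
                # skip truncated or corrupt downloads instead of aborting the run
                continue
```

because it is a generator, only one book is held in memory at a time, which matters when the whole mirror is several gigabytes. usage would be something like `for source, text in iter_books("path/to/mirror"): ...`.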

so far the output of this is being used on blackout.

if you're a townie and want access to the database I made using this, lmk.