From 0fa68c18ce82167bf5ae40d2965d54ce69a3f261 Mon Sep 17 00:00:00 2001
From: vilmibm
Date: Thu, 31 Aug 2023 19:43:54 +0000
Subject: [PATCH] add blackout post

---
 posts/0008_blackout.md | 28 ++++++++++++++++++++++++++++
 1 file changed, 28 insertions(+)
 create mode 100644 posts/0008_blackout.md

diff --git a/posts/0008_blackout.md b/posts/0008_blackout.md
new file mode 100644
index 0000000..3b40d9b
--- /dev/null
+++ b/posts/0008_blackout.md
@@ -0,0 +1,28 @@
+pubdate: Fri Jul 14 04:52:12 UTC 2023
+title: blackout.tilde.town
+slug: blackout
+
+
+I made a new thing: a website for making blackout poetry with over nine million chunks of text extracted from Project Gutenberg. It's here at (LINK https://blackout.tilde.town blackout.tilde.town) .
+
+(IMG https://tilde.town/~vilmibm/blackout.png a screenshot of a blackout poem that reads: the picturesque decay remains an idea of the beautiful)
+
+Ever since (LINK https://tilde.town/~kc ~kc) posted (LINK https://tilde.town/~kc/blackout this page) , I've been inspired by blackout poetry. I wanted an interface not only for doing it, but also for giving me novel text to work with.
+
+I used Project Gutenberg's (LINK https://gutenberg.org/policy/robot_access.html robot access instructions) to get about 12 gigabytes of compressed plaintext English-language books. It translated to about 35,000 books once duplicate encodings were ignored.
+
+(LINK https://git.tilde.town/vilmibm/gutchunk This code) , gutchunk, uncompressed the books and combed through them for what I'm calling "chunks." I was looking for meaty sections of text that would make for good blackout poetry fodder. My approach is fairly naive: I store text in a buffer until I see two newlines, then check whether I have enough in the buffer; if I do, I cut a chunk, and if I don't, I discard it. (A rough sketch of the idea is below.)
+
+To my extreme pleasure I ended up with over nine million chunks. This is all sitting in a sqlite3 database on the town; if you're reading this and are also a townie, let me know if you want access to it.
+
+When I was working on (LINK https://github.com/vilmibm/prosaic prosaic) over the years, I got a lot of junk from my sloppy parsing of Gutenberg books. I was young and silly and not writing great code then. I was also afflicted with this perverse need to ingest ALL of the text into my cut-up corpora. I got a lot of cruft: chapter headings, tables of contents, captions, and similar. So far I've pulled up well over a hundred of my nine million chunks and they all look quite good. My simple heuristic avoided a lot of the noise that I get when running prosaic. Of course, I'm missing some text: short bits of dialogue, for example. This kind of thing would have haunted me in the past, but now knowing that mystery remains in these books feels good. I don't like finding (LINK https://tilde.town/~vilmibm/swamp the bottom of the swamp) .
+
+If you're interested, the code for blackout.tilde.town is also up on [our gitea](https://git.tilde.town/vilmibm/blackout) .
+
+There is no way to iterate over the chunks; you get a random one on every single page load. Given the size of the ID space, this should mean an infinitesimally small chance of repeats. I wanted an experience like (LINK https://en.wikipedia.org/wiki/The_Library_of_Babel the library of babel) : one of wandering and digging up scraps to scrawl upon.
+
+I'm hosting this decidedly personal project on tilde.town because I felt like it was a nice fit for our community. It's also my house and I can do whatever, though I try not to have that mindset too often.
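+
+For the curious, here is a rough Python sketch of that chunk-cutting loop. It's only an illustration of the heuristic described above, not the actual gutchunk source, and the minimum-length threshold is an arbitrary stand-in for "enough text in the buffer":
+
+```python
+MIN_CHUNK_LEN = 400  # arbitrary stand-in for "enough text in the buffer"
+
+def cut_chunks(lines):
+    """Yield runs of text separated by blank lines, discarding short ones."""
+    buffer = []
+    for line in lines:
+        if line.strip():
+            buffer.append(line.strip())
+            continue
+        # A blank line means we just hit two newlines in a row, so decide
+        # what to do with whatever has accumulated in the buffer.
+        candidate = " ".join(buffer)
+        buffer = []
+        if len(candidate) >= MIN_CHUNK_LEN:
+            yield candidate  # enough text: cut a chunk
+        # otherwise the buffer contents are discarded
+
+if __name__ == "__main__":
+    import sys
+    for chunk in cut_chunks(sys.stdin):
+        print(chunk, end="\n\n")
+```
+
+Fed an uncompressed Gutenberg text on stdin, that prints paragraph-sized chunks and silently drops anything too short.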
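+
+As for serving a random chunk on each page load, it works out to something like the following. Again, this is just a sketch; the table name, column names, and database path are placeholders rather than the site's real schema:
+
+```python
+import random
+import sqlite3
+
+DB_PATH = "chunks.db"  # placeholder path
+
+def random_chunk(conn):
+    """Fetch one chunk by a randomly chosen id instead of iterating."""
+    (max_id,) = conn.execute("SELECT max(id) FROM chunks").fetchone()
+    while True:
+        row = conn.execute(
+            "SELECT body FROM chunks WHERE id = ?",
+            (random.randint(1, max_id),),
+        ).fetchone()
+        if row is not None:  # ids can be sparse, so retry on a miss
+            return row[0]
+
+if __name__ == "__main__":
+    with sqlite3.connect(DB_PATH) as conn:
+        print(random_chunk(conn))
+```
+
+Grabbing a random id and retrying on misses keeps each page load cheap, since the database never has to sort or walk all nine million rows.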
+ +I may also make an SSH-hosted text-mode version. I haven't decided. + +I've already been really pleased with the experience of making poems using the new site and hope you like it, too. Please let me know on (LINK https://tiny.tilde.website/@vilmibm mastodon) or wherever if you're making stuff with it.