# apub2gmi.py

This is a script which takes an archive exported from a Mastodon account, looks for media attachments and uses them to build an archive for a Gemini server.

I use it to update the [Glossatory archives](gemini://gemini.mikelynch.org/glossatory/) from my account [@GLOSSATORY](https://weirder.earth/@GLOSSATORY).

It builds a hierarchy of folders with the structure YYYY/MM/DD and copies media attachments into the appropriate day's folder.
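
A minimal sketch of the folder logic. It assumes each post carries an ISO 8601 `published` timestamp (a standard ActivityStreams property) and that the day folder follows the date as written in that timestamp, with no conversion to local time. The names `day_folder` and `copy_to_day_folder` are hypothetical, not taken from the script.

```python
import shutil
from datetime import datetime
from pathlib import Path


def day_folder(output: Path, published: str) -> Path:
    """Return OUTPUT/YYYY/MM/DD for an ISO 8601 timestamp such as
    '2021-05-08T04:00:00Z' (assumed to be the post's 'published' field)."""
    # Before Python 3.11, fromisoformat() rejects a trailing 'Z', so spell it out as +00:00
    when = datetime.fromisoformat(published.replace("Z", "+00:00"))
    # The date is used as written in the timestamp; no conversion to local time
    return output / f"{when.year:04d}" / f"{when.month:02d}" / f"{when.day:02d}"


def copy_to_day_folder(media_file: Path, output: Path, published: str) -> Path:
    """Copy one media file into its day folder, creating the folders as needed."""
    folder = day_folder(output, published)
    folder.mkdir(parents=True, exist_ok=True)
    # copy2 copies the contents and also preserves metadata such as the modification time
    return Path(shutil.copy2(media_file, folder / media_file.name))
```

With this layout, a post published at `2021-05-08T04:00:00Z` ends up under `OUTPUT/2021/05/08/`.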

It also adds index.gmi files at each level. Index files at the top and year levels have links to the next level down. Index files at the month level have links to all of the attachments for that month.
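
A sketch of how such index files could be written as plain gemtext: a `#` heading followed by link lines of the form `=> URL label`. The helpers `gemtext_index` and `write_folder_index` are hypothetical; the choice to link each subfolder's `index.gmi` explicitly (rather than the bare folder) is also an assumption, made so the links don't depend on how a given Gemini server handles directory URLs.

```python
from pathlib import Path


def gemtext_index(title: str, links: list[tuple[str, str]]) -> str:
    """Return the text of an index.gmi: a heading, a blank line, then one link per entry.

    In gemtext, a link line is '=>' followed by a URL and an optional label.
    """
    lines = [f"# {title}", ""]
    lines += [f"=> {target} {label}" for target, label in links]
    return "\n".join(lines) + "\n"


def write_folder_index(folder: Path, title: str) -> None:
    """Write folder/index.gmi with a link to each subfolder's own index.gmi
    (the top and year levels)."""
    names = sorted(p.name for p in folder.iterdir() if p.is_dir())
    links = [(f"{name}/index.gmi", name) for name in names]
    (folder / "index.gmi").write_text(gemtext_index(title, links), encoding="utf-8")
```

For a month-level index, the links would instead point at the attachments in that month's day folders, e.g. `gemtext_index("2021/05", [("08/9e2423c3ffd70dd0.jpeg", "BILLET")])`.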

It assumes that there's only one media attachment per post. If it finds a post with more than one attachment, it will copy only the first and issue a warning.
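
The one-attachment rule might look like this. It is a sketch, not the script's code: `first_attachment` is a hypothetical name, it assumes the post is the ActivityPub object dict holding an `attachment` field, and it uses the standard `logging` module for the warning.

```python
import logging
from typing import Optional


def first_attachment(post: dict) -> Optional[dict]:
    """Return the first media attachment of an ActivityPub object, or None if it has none.

    Only the first attachment is used; if there are more, log a warning and carry on.
    """
    attachments = post.get("attachment") or []
    # ActivityStreams allows a single object here as well as a list
    if isinstance(attachments, dict):
        attachments = [attachments]
    if len(attachments) > 1:
        logging.warning(
            "%s has %d attachments; only the first will be copied",
            post.get("id", "<post without id>"),
            len(attachments),
        )
    return attachments[0] if attachments else None
```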

The alt-text is used as the title of each image in the month-level index file. If you want to use only part of the alt-text, you can provide a list of regular expressions in the config file (`title_res`, described below) which will be matched against it.

## Usage

```
python apub2gmi.py --archive PATH_OF_YOUR_ACTIVITYPUB_ARCHIVE/ --output GEMINI_OUTPUT --config CONFIG.json [--text OPTIONAL_COLOPHON_TEXT] [--debug]
```

## Config

The configuration file is JSON, as follows:

```json
{
    "url_re": "^/YOUR_SERVERS_MEDIA_ATTACHMENT_PATH/(.*)$",
    "title_res": [
        "^(.*?)\\.\\s*(.*)$"
    ]
}
```
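
A sketch of loading this file, assuming the keys behave as described in this section: `url_re` is required and `title_res` is optional. `parse_config` and `load_config` are hypothetical names, and compiling the patterns up front is a design choice of this sketch so that a malformed regexp fails immediately rather than halfway through an archive.

```python
import json
import re
from pathlib import Path


def parse_config(text: str) -> dict:
    """Parse the config JSON and compile its regular expressions.

    'url_re' is required; 'title_res' is optional and defaults to an empty list.
    Compiling up front makes a malformed pattern fail early with re.error.
    """
    raw = json.loads(text)
    return {
        "url_re": re.compile(raw["url_re"]),
        "title_res": [re.compile(p) for p in raw.get("title_res", [])],
    }


def load_config(path: Path) -> dict:
    """Read and parse the config file passed with --config."""
    return parse_config(Path(path).read_text(encoding="utf-8"))
```

Note that the backslashes in the file are doubled because the patterns are JSON strings: `json.loads` turns `"^(.*?)\\.\\s*(.*)$"` into the regexp `^(.*?)\.\s*(.*)$`.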

`url_re` should match the URLs of media attachments in the ActivityPub JSON. This will depend on your server; here's an example from the GLOSSATORY archive:

```json
"attachment": [
    {
        "type": "Document",
        "mediaType": "image/jpeg",
        "url": "/weirderearth/media_attachments/files/105/839/131/582/626/008/original/9e2423c3ffd70dd0.jpeg",
        "name": "BILLET: an unmarried working person (often used for making tying) The drawing depicts a person seated at a bench tying knots in a long cord.",
        "blurhash": "U2Ss50M{~qt7-;t7IUt7_3-;RjM{RjD%-;WB",
        "width": 1280,
        "height": 1280
    }
],
```
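
One way `url_re` might be applied, assuming its first `()` group captures the file's path relative to the unpacked archive folder. `local_media_path` is a hypothetical helper, and the pattern `^/weirderearth/(.*)$` used with it here is an illustrative guess for the GLOSSATORY example, not the real config value.

```python
import re
from pathlib import Path
from typing import Optional


def local_media_path(archive: Path, url: str, url_re: str) -> Optional[Path]:
    """Map an attachment's 'url' to the matching file in the unpacked archive.

    Assumes the first () group of url_re is the file's path relative to the
    archive folder. Returns None for URLs the pattern doesn't match.
    """
    m = re.match(url_re, url)
    return Path(archive) / m.group(1) if m else None
```

Under those assumptions, the example `url` above maps to `archive/media_attachments/files/105/839/131/582/626/008/original/9e2423c3ffd70dd0.jpeg`.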

`title_res` is optional: it's a list of Python regular expressions which will be matched against the alt-text. The text used for the index page is the `()` group or groups from the first regexp that matches. If that regexp has more than one group, the results are joined with spaces.
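
A sketch of that title rule: try each regexp in order and, for the first that matches, join its `()` groups with spaces. `image_title` is a hypothetical name. Skipping empty or unmatched groups, falling back to the whole match when a regexp has no groups, and falling back to the full alt-text when nothing matches are all assumptions of this sketch, not documented behaviour.

```python
import re


def image_title(alt_text: str, title_res: list[str]) -> str:
    """Pick the index-page title for an image from its alt-text.

    The first regexp that matches supplies the title, made of its () groups
    joined with spaces. If none match, the full alt-text is used.
    """
    for pattern in title_res:
        m = re.search(pattern, alt_text)
        if m:
            # Drop groups that didn't take part in the match (None) or matched nothing
            groups = [g for g in m.groups() if g]
            return " ".join(groups) if groups else m.group(0)
    return alt_text
```

With the example pattern `^(.*?)\.\s*(.*)$`, an alt-text like `WORD: a definition. The drawing shows X.` becomes `WORD: a definition The drawing shows X.`: the text before the first full stop, a space, then the rest.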

Comments, questions, issues? Let me know at [@mikelynch@aus.social](https://aus.social/@mikelynch).