• Skull giver@popplesburger.hilciferous.nl
    1 year ago

    Looks like the admin button called “purge user” does purge all posts by that user, but not the cached image/video content they uploaded.

    Looking through the Lemmy code, I can’t see how Lemmy is even supposed to delete images. pict-rs requires a delete token, but the delete token doesn’t seem to get stored anywhere.

    That’s a rather big oversight. Now I need to write a script to figure out how the fuck the pict-rs disk format works and how to generate delete tokens to make lemmy delete these files without breaking my instance.


    Edit: You can purge specific files using curl -X POST -H "X-Api-Token: $PICTRS__SERVER__API_KEY" "http://pict-rs:9000/internal/purge?alias=$filename". The request has to be a POST (plain curl sends a GET), and quoting the URL keeps the shell from touching the ?. However, having purged the CSAM posts from the database already, I’m now stuck unable to find these files (and I’m sure as hell not going to look through the image folder for them).
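
    For anyone who would rather script this than type curl, here is a minimal sketch of the same purge call using only the Python standard library. It assumes the endpoint takes a POST with the alias as a query parameter, which is what the script further down does as well. The function name, base URL, alias and key are placeholders, and the request is only built here, not sent.

```python
import urllib.parse
import urllib.request


def build_purge_request(pictrs_base, alias, api_key):
    """Build (but do not send) a POST request that purges one alias."""
    # urlencode escapes characters that are not safe inside a query string
    query = urllib.parse.urlencode({"alias": alias})
    return urllib.request.Request(
        f"{pictrs_base}/internal/purge?{query}",
        headers={"X-Api-Token": api_key},
        method="POST",
    )
```

    To actually send it, pass the result to urllib.request.urlopen(), for example urllib.request.urlopen(build_purge_request("http://pict-rs:9000", filename, api_key)).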


    Edit 2: I’m pretty sure the only directly addressable content that can end up on your server is the thumbnails Lemmy will proxy. That means that purging the thumbnails should be all you need to do.

    I’ve tried to make a script that operates on the sled store and image structure of pict-rs, but it’s just too weird a system. Some thumbnails are kept on external servers; those aren’t a problem for your local storage.
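
    A rough sketch of how local thumbnails can be told apart from externally hosted ones: compare the host in the thumbnail URL with your instance domain and, if it matches, take the last path segment as the pict-rs alias. The function name and the example domains are made up, and the /pictrs/image/<alias> layout is an assumption about how your instance serves images.

```python
from urllib.parse import urlsplit


def local_alias(thumbnail_url, instance_domain):
    """Return the last path segment if the URL is hosted on instance_domain, else None."""
    parts = urlsplit(thumbnail_url)
    # urlsplit lowercases the hostname, so lowercase the domain before comparing
    if parts.hostname != instance_domain.lower():
        return None
    # The query string is not part of .path, so "?format=..." is dropped here
    alias = parts.path.rsplit("/", 1)[-1]
    return alias or None


# Hypothetical URLs for illustration only
print(local_alias("https://example.social/pictrs/image/abc.png", "example.social"))
print(local_alias("https://other.example/pictrs/image/abc.png", "example.social"))
```

    Anything that comes back as None lives on someone else’s server and does not need purging locally.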

    Purged thumbnails will be re-downloaded on the fly, so the only content that comes back is federated content that hasn’t been removed by an admin. I’ve made this script to purge all thumbnails uploaded to the pict-rs server:

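    #!/usr/bin/env python3
    """Purge every locally hosted post thumbnail from pict-rs."""

    import os

    import psycopg
    import requests
    from tqdm import tqdm

    username = os.environ.get('LEMMY_DB_USERNAME')
    password = os.environ.get('LEMMY_DB_PASSWORD')
    host = os.environ.get('LEMMY_DB_HOST')
    db = os.environ.get('LEMMY_DB_DATABASE')
    pictrs_path = os.environ.get('PICTRS_PATH')  # read but not used below
    pictrs_host = os.environ.get('PICTRS_HOST')
    api_key = os.environ.get('PICTRS_API_KEY')

    # pict-rs only accepts /internal requests that carry the admin API key
    headers = {'X-Api-Token': api_key}

    # Only thumbnails served from this instance's own domain (instance id 1)
    query = "select thumbnail_url from post where thumbnail_url like concat('https://', (select domain from instance where id = 1),'/%')"

    # Keyword arguments avoid quoting problems with special characters in the password
    with psycopg.connect(dbname=db, user=username, password=password, host=host) as connection:
        with connection.cursor() as cursor:
            thumbnails = cursor.execute(query).fetchall()

    # The pict-rs alias is the last path segment of the URL; the set drops duplicates
    aliases = sorted({thumb[0].rsplit('/', 1)[1] for thumb in thumbnails})

    for alias in tqdm(aliases):
        response = requests.post(f'http://{pictrs_host}/internal/purge', params={'alias': alias}, headers=headers)
        # Report failures without breaking the progress bar
        if not response.ok:
            tqdm.write(f'{alias}: HTTP {response.status_code}')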
    
    

    This requires the following Python dependencies:

    • tqdm
    • psycopg
    • requests

    It uses the following environment variables:

    • PICTRS_API_KEY: the admin API key for pict-rs. It’s set in docker-compose on my system; read the documentation for your setup if you don’t know what this is.
    • PICTRS_HOST: the host and port on which pict-rs is running. Can be 127.0.0.1:8080 if you’re not using containers. For me, the Docker container running pict-rs is on 172.17.0.2:8080.
    • PICTRS_PATH: the path to where the lemmy picture
    • LEMMY_DB_USERNAME: the username to connect to Postgres with
    • LEMMY_DB_PASSWORD: the password to connect to Postgres with
    • LEMMY_DB_HOST: the host the database runs on. Can be 127.0.0.1, can be anything else. Check your Lemmy config and read the documentation if you don’t know yours.
    • LEMMY_DB_DATABASE: the database Lemmy uses. The default is lemmy so that’s probably right for your setup.
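
    Since a missing variable silently turns into None in the script, it can help to check the environment before touching anything. This is a small sketch of such a check; the helper name is made up, and PICTRS_PATH is left out of the required list on the assumption that the script never actually uses it.

```python
import os

# Variables the cleanup script reads and actually uses
REQUIRED_VARS = (
    "PICTRS_API_KEY",
    "PICTRS_HOST",
    "LEMMY_DB_USERNAME",
    "LEMMY_DB_PASSWORD",
    "LEMMY_DB_HOST",
    "LEMMY_DB_DATABASE",
)


def missing_vars(environ=os.environ):
    """Return the required variable names that are unset or empty."""
    return [name for name in REQUIRED_VARS if not environ.get(name)]
```

    Calling missing_vars() at the top of the script and exiting when the list is non-empty avoids sending purge requests to http://None/.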

    Call it like this:

    user@server $ PICTRS_API_KEY=abcdefghijklmn PICTRS_HOST=172.17.0.2:8080 LEMMY_DB_USERNAME=lemmy LEMMY_DB_PASSWORD=abcdefghijklmno LEMMY_DB_HOST=172.17.0.3 LEMMY_DB_DATABASE=lemmy python3 pictrs_cleanup.py
    

    Edit 3: there are still tons of files in my pict-rs directory after purging all thumbnails. I can’t tell where they’re coming from, but I’ve collected gigabytes of them.

    Guess I’m just wiping all image files like this post suggests. Screw making the pict-rs state consistent, I’m not even going to attempt to fix a disk format that combines raw UUID bytes and “.png” in a single field.

    This is the best I’m gonna get:

    sudo find volumes/pictrs/files -type f -ctime -2 -exec shred --remove -n 1 {} \;
    sudo fstrim / -v;
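
    Because shred is irreversible, a dry run that only lists what the find expression would match may be worth doing first. The sketch below roughly mirrors -type f -ctime -2 in Python (find rounds file ages to whole days, so results can differ slightly at the boundary). The function name is made up, and nothing is deleted.

```python
import os
import stat
import time


def recently_changed_files(root, max_age_days=2, now=None):
    """List regular files under root whose ctime is less than max_age_days old."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    matches = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            # lstat does not follow symlinks, so symlinks are skipped like with -type f
            info = os.lstat(path)
            if stat.S_ISREG(info.st_mode) and info.st_ctime > cutoff:
                matches.append(path)
    return sorted(matches)
```

    Printing recently_changed_files("volumes/pictrs/files") shows what would be shredded; run it as a user that can read the volume.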
    

    Wiping SSDs is hard.