
@ghuds540

NOTE: This scraper was mostly generated by Claude Code (Sonnet), and has been validated and tested by me with minimal modification.

Adds a scene and image fragment scraper that matches filenames against either an MD5 hash or the general format r34_{post_id_number}_{...}.{ext}.
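A minimal sketch of the filename matching described above. It assumes that an "MD5 filename" means the whole stem is a 32-character hex digest (e.g. `d41d8cd98f00b204e9800998ecf8427e.png`). The names `MD5_RE`, `R34_RE` and `match_filename` are hypothetical and not taken from the PR's code.

```python
import os
import re
from typing import Optional, Tuple

# Assumption: the file stem is exactly a 32-hex-digit MD5 digest.
MD5_RE = re.compile(r"^(?P<md5>[0-9a-fA-F]{32})\.[A-Za-z0-9]+$")
# r34_{post_id_number}_{...}.{ext}: numeric post ID, then any suffix, then an extension.
R34_RE = re.compile(r"^r34_(?P<post_id>\d+)_.*\.[A-Za-z0-9]+$")


def match_filename(path: str) -> Optional[Tuple[str, str]]:
    """Return ("md5", digest) or ("id", post_id), or None if neither format matches."""
    name = os.path.basename(path)
    m = MD5_RE.match(name)
    if m:
        # Normalise to lowercase, the usual way MD5 digests are written.
        return ("md5", m.group("md5").lower())
    m = R34_RE.match(name)
    if m:
        return ("id", m.group("post_id"))
    return None
```

An MD5 match would then be looked up by hash and an ID match by post number. Any other filename is skipped.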

Tag categorization:

  • Characters → Performers
  • Artists → Studio (first non-voice-actor artist)
  • Voice actors → Performers
  • Copyright → Tags
  • General → Tags
  • Meta → Tags
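The mapping above can be sketched as a small function. This is only an illustration under stated assumptions. The input is assumed to be a dict of tag lists keyed by hypothetical category names (`character`, `artist`, `copyright`, `general`, `meta`). Voice actors are assumed to appear among the artist tags, identified by a separately supplied set, because the PR does not say how they are detected. The function name `categorize` is hypothetical.

```python
from typing import Dict, List, Optional, Set


def categorize(tags_by_category: Dict[str, List[str]], voice_actors: Set[str]) -> dict:
    """Map booru tag categories to Stash fields: performers, a single studio, and tags."""
    artists = tags_by_category.get("artist", [])

    # Characters and voice actors both become performers.
    performers = list(tags_by_category.get("character", []))
    performers += [a for a in artists if a in voice_actors]

    # Only the first artist who is not a voice actor becomes the studio;
    # any further artists are dropped, since a scene has a single studio.
    studio: Optional[str] = next((a for a in artists if a not in voice_actors), None)

    # Copyright, general and meta tags are all flattened into plain tags.
    tags: List[str] = []
    for category in ("copyright", "general", "meta"):
        tags += tags_by_category.get(category, [])

    return {"performers": performers, "studio": studio, "tags": tags}
```

For example, with artists `["va_bob", "carol", "dave"]` and `voice_actors={"va_bob"}`, `va_bob` becomes a performer, `carol` becomes the studio, and `dave` is discarded. That discarded artist is the loss of multi-artist parity mentioned below.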

Open to feedback on this one. In my personal instance I have been treating Studios as a direct equivalent to Artists, but it's a bit of a bummer that I can't assign multiple studios to the same media item, which would keep parity with sites like rule34.xxx and other boorus.

@feederbox826
Collaborator

Just going to nip this in the bud immediately: Python isn't required for this.

Stash already computes MD5 checksums and supports querying via {checksum}: https://docs.stashapp.cc/in-app-manual/scraping/scraperdevelopment/#scrapexpath-and-scrapejson-use-with-scenebyfragment-and-scenebyqueryfragment

Also, this adds multiple separate scrapers instead of reusing one, even though they all use a similar format.

feederbox826 marked this pull request as draft on January 11, 2026, 02:20.
@ghuds540
Author

Thanks, I'll take a look at simplifying this when I get a chance, if possible.
