fetch-media doesn't cache article cover or inline images locally #150

@RealADemin

src/bookmark-media.ts handles tweet media and author profile images, downloading them to ~/.ft-bookmarks/media/ with the existing manifest, dedup, size-cap, and content-type guards. Article images — both the hero (article.cover_media) and inline images (article.media_entities[]) — aren't recognized by the fetch pipeline at all. Once #148 lands and preserves the URL fields on the bookmark record, those URLs will sit in the data unfetched, served remotely from pbs.twimg.com at render time.

grep of the current fetch pipeline confirms zero coverage:

"article" occurrences in bookmark-media.ts:       0
"cover_media" occurrences:                         0
"media_entities" occurrences:                      0

Why local caching matters

The same arguments that justify ft's tweet-media caching apply to article images:

  • Link rot. Twitter rotates pbs.twimg.com URLs. Remote-only article images break over time, and a bookmark archive without local copies isn't really an archive.
  • Offline use. A bookmark archive should be queryable without network access. Article images currently aren't.
  • Rate-limit exposure. Rendering article-heavy views issues one request to pbs.twimg.com per image; with ft's local cache, rendering needs no network requests at all.
  • Consistency. Tweet images on the same host (pbs.twimg.com/media/*.jpg) are already cached. Article images from the same host aren't — purely by happenstance of the data flow ft has wired up so far.

Evidence

Verified May 2026 across three X Articles fetched via TweetResultByRestId:

  • 3 articles surfaced 3 cover images + 4 inline article images = 7 URLs total that would be fetched
  • All on pbs.twimg.com — same host ft's existing tweet-media fetcher already uses
  • No new authentication, redirect handling, or hostname allowlist changes needed; the host is implicitly trusted via the existing pipeline

Suggested scope

Extend src/bookmark-media.ts to recognize article URLs as fetch candidates:

| Function | Currently handles | Needs to also handle |
| --- | --- | --- |
| hasMediaCandidate (L107) | tweet media, profile images | article cover, article inline images |
| resolveMediaTargets (L172) | builds tweet/profile targets | also build article-image targets |
| hasPendingMediaTarget (L229) | checks tweet/profile pending state | also check article-image pending state |
| fetchBookmarkMediaBatch (L241) | runs the fetch loop | unchanged; consumes the target list from resolveMediaTargets |
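
As a rough illustration of the candidate and pending checks, here is a minimal sketch. The helper names, their signatures, and the `fetchedUrls` set are all hypothetical; the real functions in src/bookmark-media.ts and the real manifest lookup may look quite different.

```typescript
// Hypothetical helpers: names and signatures are illustrative, not ft's actual API.

// True when the bookmark has at least one article image URL worth fetching.
function hasArticleImageCandidate(articleImageUrls: readonly string[]): boolean {
  return articleImageUrls.length > 0;
}

// True when at least one article image URL is not yet recorded as fetched.
// `fetchedUrls` is an assumed stand-in for whatever lookup the real manifest provides.
function hasPendingArticleImage(
  articleImageUrls: readonly string[],
  fetchedUrls: ReadonlySet<string>,
): boolean {
  return articleImageUrls.some((url) => !fetchedUrls.has(url));
}
```

In this shape the existing checks would only need to OR in the article result alongside the tweet and profile results, which keeps the fetch loop itself untouched.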

Naming, manifest entry shape, dedup key, an optional --skip-article-images flag, and filename pattern are all the maintainer's choice. ft's existing patterns for profile images (URL-deduped) and tweet media (per-bookmark) are good reference points. The same article quoted across multiple bookmarks probably wants URL-only dedup, like profile images, but that's an implementation call.
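
To make the URL-only dedup option concrete, here is a small sketch. The `ArticleImageCandidate` and `ArticleImageTarget` shapes and the `dedupeByUrl` name are assumptions for illustration only; they are not taken from ft's manifest format.

```typescript
// Hypothetical shapes; the real manifest entry shape is the maintainer's choice.
interface ArticleImageCandidate {
  bookmarkId: string;
  url: string;
}

interface ArticleImageTarget {
  url: string;
  bookmarkIds: string[]; // every bookmark that references this image
}

// Collapses candidates that share a URL into one fetch target,
// so an article quoted by several bookmarks is downloaded once.
function dedupeByUrl(candidates: readonly ArticleImageCandidate[]): ArticleImageTarget[] {
  const byUrl = new Map<string, ArticleImageTarget>();
  for (const candidate of candidates) {
    const existing = byUrl.get(candidate.url);
    if (existing === undefined) {
      byUrl.set(candidate.url, { url: candidate.url, bookmarkIds: [candidate.bookmarkId] });
    } else if (!existing.bookmarkIds.includes(candidate.bookmarkId)) {
      existing.bookmarkIds.push(candidate.bookmarkId);
    }
  }
  // Map preserves insertion order, so targets come out in first-seen order.
  return Array.from(byUrl.values());
}
```

A per-bookmark key, as used for tweet media, would instead key the map on the bookmark ID and URL together; which one fits better depends on how the manifest is meant to be queried.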

Dependency

#148 (article structure preservation) must land first. Without it, the article URL fields don't exist on the bookmark record, so there's nothing to fetch. Once #148 lands — regardless of the schema choice (column-per-field / JSON blob / normalized articles table) — the URLs will be accessible at bookmark.article.cover_media.media_info.original_img_url and bookmark.article.media_entities[].media_info.original_img_url (or wherever the maintainer chooses to put them in the final schema).
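
For illustration, reading those two paths defensively might look like the sketch below. The interface names and `collectArticleImageUrls` are hypothetical, and the field layout simply mirrors the paths quoted above; if #148 lands with a different schema, only the accessors would change.

```typescript
// Hypothetical shapes mirroring the paths quoted in this issue;
// the final schema from #148 may differ.
interface MediaInfo {
  original_img_url?: string;
}
interface ArticleMedia {
  media_info?: MediaInfo;
}
interface BookmarkArticle {
  cover_media?: ArticleMedia;
  media_entities?: ArticleMedia[];
}
interface BookmarkLike {
  article?: BookmarkArticle;
}

// Collects the cover URL first, then inline image URLs, skipping missing fields.
function collectArticleImageUrls(bookmark: BookmarkLike): string[] {
  const urls: string[] = [];
  const cover = bookmark.article?.cover_media?.media_info?.original_img_url;
  if (cover) urls.push(cover);
  for (const entity of bookmark.article?.media_entities ?? []) {
    const url = entity.media_info?.original_img_url;
    if (url) urls.push(url);
  }
  return urls;
}
```

Bookmarks without an article, or articles without a cover, yield an empty or shorter list rather than throwing, so existing non-article bookmarks would pass through unaffected.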
