Add --llm_image_context for context-aware image descriptions#1037

Open
vertgo wants to merge 1 commit into
datalab-to:masterfrom
vertgo:feature/llm-image-context
vertgo commented May 22, 2026

New opt-in processor (LLMImageContextDescriptionProcessor) that describes every Picture/Figure block with an LLM, supplying the page's extracted markdown and the page image as context. Returns a short caption (rendered as image alt text) and a long description (rendered as a hidden HTML comment), so a text-only LLM reading the markdown alone can understand the visual content without a multimodal pipeline.

  • New flag --llm_image_context, orthogonal to --use_llm; works whether or not image extraction is enabled.
  • Picture/Figure blocks gain short_caption + long_description fields.
  • HTML renderer emits alt text + a hidden <aside>; markdown renderer converts that aside into an HTML comment.
  • Gemini service activates for this flag independently of --use_llm.
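
For readers who want to picture the rendering path described in the bullets above, here is a minimal sketch. It is not the code in this PR: `PictureBlock`, `render_html`, and `html_to_markdown` are hypothetical names, and the `hidden` attribute on `<aside>` plus the regex-based conversion are assumptions made for illustration only.

```python
import html
import re
from dataclasses import dataclass


@dataclass
class PictureBlock:
    """Hypothetical stand-in for a Picture/Figure block (not marker's real class)."""
    src: str
    short_caption: str = ""
    long_description: str = ""


def render_html(block: PictureBlock) -> str:
    """Emit an <img> with alt text, plus a hidden <aside> if a long description exists."""
    img = f'<img src="{html.escape(block.src)}" alt="{html.escape(block.short_caption)}">'
    if not block.long_description:
        return img
    # Assumption: the aside is hidden via the plain HTML `hidden` attribute.
    return f"{img}<aside hidden>{html.escape(block.long_description)}</aside>"


ASIDE_RE = re.compile(r"<aside hidden>(.*?)</aside>", re.DOTALL)
IMG_RE = re.compile(r'<img src="([^"]*)" alt="([^"]*)">')


def html_to_markdown(fragment: str) -> str:
    """Convert the HTML produced above into markdown with an HTML comment."""

    def aside_to_comment(match: re.Match) -> str:
        text = html.unescape(match.group(1))
        # "--" may not appear inside an HTML comment, so split up dash runs.
        text = re.sub(r"-(?=-)", "- ", text)
        return f"\n<!-- {text} -->"

    def img_to_markdown(match: re.Match) -> str:
        src = html.unescape(match.group(1))
        alt = html.unescape(match.group(2))
        return f"![{alt}]({src})"

    without_aside = ASIDE_RE.sub(aside_to_comment, fragment)
    return IMG_RE.sub(img_to_markdown, without_aside)


if __name__ == "__main__":
    block = PictureBlock(
        src="fig1.png",
        short_caption="Quarterly revenue chart",
        long_description="Bar chart -- revenue rises each quarter.",
    )
    print(html_to_markdown(render_html(block)))
```

The point of the split is that the short caption stays visible as alt text, while the long description travels with the markdown as a comment that a text-only model can still read.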

Developed with assistance from Claude (Claude Code).

@github-actions
Contributor

CLA Assistant Lite bot:
Thank you for your submission, we really appreciate it. Like many open-source projects, we ask that you sign our Contributor License Agreement before we can accept your contribution. You can sign the CLA by posting a pull request comment in the format below.


I have read the CLA Document and I hereby sign the CLA


You can retrigger this bot by commenting recheck in this Pull Request
