
feat: extract and describe embedded images in PPTX files #6

Closed
olearydj wants to merge 2 commits into Michaelliv:main from olearydj:feat/pptx-image-extraction

Conversation

@olearydj

Summary

The PPTX converter currently extracts text from p:sp (shape) elements but ignores p:pic (picture) elements in the slide shape tree. This PR adds embedded image extraction and wires it to the existing describe callback, giving PPTX files the same AI-powered image description capability that standalone image files already have.

Changes

  • Accept options parameter in the PPTX converter's convert method
  • Load slide-level relationship files to resolve image references
  • Extract p:pic elements and resolve the embedded image data from the zip
  • Call options.describe(buffer, mimetype) when a provider is configured
  • Fall back to placeholder text with shape name and alt text when no provider is available
  • Add 12 tests covering text extraction, image placeholders, alt text, describe callback invocation, and error fallback
  • Add a PPTX test fixture with text-only slides, images with alt text, and images without alt text
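
For readers unfamiliar with how slide-level relationships work, the lookup step in the list above can be sketched roughly as follows. This is an illustration, not the PR's actual code: `relsPathFor` and `parseRelationships` are hypothetical names, and the regex-based parsing is a simplification that assumes double-quoted attributes (a real converter would more likely use an XML parser). The only part taken from the OPC packaging convention is that a part's relationships live in a sibling `_rels` folder with `.rels` appended to the file name.

```typescript
// Hypothetical helpers for illustration; names are not taken from the PR diff.

/** Maps a slide part path to the path of its relationships part. */
function relsPathFor(slidePath: string): string {
  const slash = slidePath.lastIndexOf("/");
  if (slash < 0) return `_rels/${slidePath}.rels`;
  const dir = slidePath.slice(0, slash);
  const file = slidePath.slice(slash + 1);
  return `${dir}/_rels/${file}.rels`;
}

/** Builds an rId -> Target lookup from the text of a .rels file. */
function parseRelationships(relsXml: string): Record<string, string> {
  const rels: Record<string, string> = {};
  // \b keeps the pattern from matching the enclosing <Relationships> element.
  const elementPattern = /<Relationship\b[^>]*>/g;
  let match: RegExpExecArray | null;
  while ((match = elementPattern.exec(relsXml)) !== null) {
    const element = match[0];
    // Attribute order is not guaranteed, so each attribute is read separately.
    const id = /\bId="([^"]*)"/.exec(element)?.[1];
    const target = /\bTarget="([^"]*)"/.exec(element)?.[1];
    if (id !== undefined && target !== undefined) rels[id] = target;
  }
  return rels;
}
```

A p:pic element refers to its image by relationship id (the r:embed attribute on its blip), so the converter needs a map like this one before it can find the image bytes in the zip.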

Behavior

Without API key: images produce *[Image: Shape Name — alt text]* placeholders (or just *[Image: Shape Name]* if no alt text exists).

With configured provider: each embedded image is sent for AI description using the same pipeline as standalone image files, producing **[Image: Shape Name]** followed by the description.

Error handling: if the describe callback throws, the converter falls back to placeholder text rather than failing the entire conversion.
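
The three behaviors described above fit in one small function. The sketch below is a guess at the shape of that logic, not the code from this PR: `PictureInfo`, `placeholder`, and `renderPicture` are hypothetical names, the callback is assumed to return a promise of a string, `Uint8Array` stands in for whatever buffer type the converter really passes, and the blank line between the bold heading and the description is an assumption. Only the `describe(buffer, mimetype)` call shape and the two output formats come from the PR text.

```typescript
// Assumed callback shape, modeled on options.describe(buffer, mimetype).
type Describe = (buffer: Uint8Array, mimetype: string) => Promise<string>;

// Hypothetical record for one extracted p:pic element.
interface PictureInfo {
  name: string;      // shape name
  altText?: string;  // alt text, when the author provided one
  data: Uint8Array;  // raw image bytes read from the zip
  mimetype: string;  // e.g. "image/png"
}

/** Placeholder used when no provider is configured or describe fails. */
function placeholder(pic: PictureInfo): string {
  // \u2014 is the dash used in the placeholder format quoted in the PR text.
  return pic.altText
    ? `*[Image: ${pic.name} \u2014 ${pic.altText}]*`
    : `*[Image: ${pic.name}]*`;
}

/** Describes the image when possible, otherwise falls back to a placeholder. */
async function renderPicture(pic: PictureInfo, describe?: Describe): Promise<string> {
  if (!describe) return placeholder(pic);
  try {
    const description = await describe(pic.data, pic.mimetype);
    return `**[Image: ${pic.name}]**\n\n${description}`;
  } catch {
    // A failing provider should not abort conversion of the whole deck.
    return placeholder(pic);
  }
}
```

Catching per image rather than per file means one provider error costs a single description, not the rest of the slides.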

Test plan

  • bun test — 70 tests pass (58 existing + 12 new), 0 failures
  • Manual test on a 46-slide PPTX with mixed content (text, images, tables, speaker notes)
  • Verified image descriptions generated correctly with Anthropic provider
  • Verified graceful fallback without API key configured

🤖 Generated with Claude Code

olearydj and others added 2 commits March 28, 2026 13:37
The PPTX converter now extracts p:pic elements from slide shape trees,
resolves image data via slide-level relationships, and passes images to
the describe callback for AI-powered descriptions when configured.

Without an API key, images produce placeholder text with shape name and
alt text (if available). With a configured provider, each embedded image
is sent for description using the same pipeline as standalone image files.

Adds 12 tests and a PPTX fixture covering text extraction, image
placeholders, alt text, describe callback invocation, and error fallback.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Recursively collect p:pic elements from p:grpSp groups, not just
  top-level shapes. Grouped images (common after users group objects
  in PowerPoint) were previously skipped entirely.
- Handle package-absolute relationship targets (e.g. /ppt/media/image1.png)
  in addition to relative targets. Previously these resolved to invalid
  zip paths and were silently dropped.
- Update test fixture with a grouped image (slide 4)
- Add 2 regression tests for grouped image extraction

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
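
The two fixes in this second commit can be illustrated with a simplified model. The sketch below is an assumption-laden stand-in, not the converter's code: `ShapeNode` is a hypothetical, pre-parsed version of the shape tree (the real converter works on XML), and `collectPictures` and `resolveTarget` are illustrative names. It shows the recursion into p:grpSp groups and the handling of both relative and package-absolute relationship targets.

```typescript
// Hypothetical, simplified stand-in for a parsed shape tree node.
interface ShapeNode {
  tag: string;            // e.g. "p:sp", "p:pic", "p:grpSp"
  name?: string;
  children?: ShapeNode[];
}

/** Collects p:pic nodes in document order, descending into p:grpSp groups. */
function collectPictures(nodes: ShapeNode[]): ShapeNode[] {
  const pics: ShapeNode[] = [];
  for (const node of nodes) {
    if (node.tag === "p:pic") {
      pics.push(node);
    } else if (node.tag === "p:grpSp") {
      // Groups can nest, so recurse instead of scanning only one level.
      pics.push(...collectPictures(node.children ?? []));
    }
  }
  return pics;
}

/** Resolves a relationship Target to a zip entry path (no leading slash). */
function resolveTarget(slidePath: string, target: string): string {
  // Package-absolute target: drop the leading slash to get the entry name.
  if (target.charAt(0) === "/") return target.slice(1);
  // Relative target: resolve against the directory containing the slide.
  const segments = slidePath.split("/").slice(0, -1);
  for (const part of target.split("/")) {
    if (part === "..") segments.pop();
    else if (part !== "." && part !== "") segments.push(part);
  }
  return segments.join("/");
}
```

With these definitions, `resolveTarget("ppt/slides/slide1.xml", "../media/image1.png")` and `resolveTarget("ppt/slides/slide1.xml", "/ppt/media/image1.png")` both yield `ppt/media/image1.png`.
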
@Michaelliv
Owner

thanks @olearydj! i've added pptx image extraction following the same pattern as the pdf converter. images get extracted to a temp dir by default, or to a custom path with --image-dir. skipped the AI description path since most frontier LLMs can look at images directly. if you do need a description, you can always run markit image.png on the extracted image. closing in favor of that, but appreciate the PR!

Michaelliv closed this on Mar 29, 2026