
feat: external tool PDF to Markdown import#820

Open
S1933 wants to merge 14 commits into
refactoringhq:mainfrom
S1933:feat/pdf-markdown-import

@S1933 S1933 commented Jun 6, 2026

Summary

Add PDF to Markdown import using runtime-detected external tools (Poppler + Tesseract). Users can convert any PDF in the vault into an editable Markdown note, with optional OCR for scanned pages. The source PDF is never modified.

What changed

  • Added convert_pdf_to_markdown_note Tauri command (pdf_import_cmds.rs, pdf_import_extract.rs)
  • Added PdfMarkdownImportDialog component with OCR mode/language selection (pdfImport.* locale keys in all 17 languages)
  • Added "Convert to note" action from file preview, note-list context menu, and command palette
  • Added PostHog analytics events: pdf_markdown_import_started, pdf_markdown_import_completed, pdf_markdown_import_failed
  • Added pdfMarkdownImport utility (src/utils/pdfMarkdownImport.ts) with typed request/response interfaces
  • Added ADR 0137 documenting the external-tool approach
  • Refactored PDF import modules to eliminate circular dependencies and reduce file size (CodeScene/Codacy compliance)
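The extraction code in pdf_import_extract.rs is not shown in this description. As a rough sketch of what page-by-page extraction with an OCR fallback could look like, the helpers below build argument lists for the Poppler and Tesseract CLIs. The `OcrMode` variants, the function names, the 300 DPI resolution, and the "empty text means scanned page" rule are all assumptions for illustration, not the PR's actual code.

```rust
use std::path::Path;

/// Hypothetical OCR modes; the dialog's real options are not listed in the PR text.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum OcrMode {
    Off,
    Auto,
    Always,
}

/// Decide whether a page should be sent through OCR, given the text
/// that `pdftotext` returned for that page.
pub fn needs_ocr(mode: OcrMode, extracted_text: &str) -> bool {
    match mode {
        OcrMode::Off => false,
        OcrMode::Always => true,
        // Assumption: a page with no non-whitespace text is treated as scanned.
        OcrMode::Auto => extracted_text.trim().is_empty(),
    }
}

/// Arguments for `pdftotext` to print one page to stdout (`-` as output file).
pub fn pdftotext_args(pdf: &Path, page: u32) -> Vec<String> {
    vec![
        "-f".to_string(),
        page.to_string(),
        "-l".to_string(),
        page.to_string(),
        "-layout".to_string(),
        pdf.to_string_lossy().into_owned(),
        "-".to_string(),
    ]
}

/// Arguments for `pdftoppm` to rasterize one page to `<out_prefix>.png`.
pub fn pdftoppm_args(pdf: &Path, page: u32, out_prefix: &Path) -> Vec<String> {
    vec![
        "-f".to_string(),
        page.to_string(),
        "-l".to_string(),
        page.to_string(),
        "-r".to_string(),
        "300".to_string(), // assumed resolution in DPI
        "-png".to_string(),
        "-singlefile".to_string(),
        pdf.to_string_lossy().into_owned(),
        out_prefix.to_string_lossy().into_owned(),
    ]
}

/// Arguments for `tesseract` to OCR an image and print the text to stdout.
pub fn tesseract_args(image: &Path, lang: &str) -> Vec<String> {
    vec![
        image.to_string_lossy().into_owned(),
        "stdout".to_string(),
        "-l".to_string(),
        lang.to_string(),
    ]
}
```

Each list would be passed to `std::process::Command::new("pdftotext").args(...)` and so on. Keeping the argument builders as pure functions makes them testable without the tools installed.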

Why

Users need to turn PDFs into editable Markdown notes while keeping source PDFs in the vault. Bundling a native PDF/OCR stack would increase packaging complexity before demand is proven. Runtime detection of pdfinfo, pdftotext, pdftoppm, and tesseract keeps Tolaria lean while still offering text extraction, page-by-page OCR, and a clear upgrade path.
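Runtime detection of the four tools could be done by scanning the directories on `PATH`. The sketch below uses only the standard library; `ToolAvailability`, `find_on_path`, and the grouping into "text extraction needs pdfinfo and pdftotext, OCR needs pdftoppm and tesseract" are assumptions for illustration, not the PR's actual implementation.

```rust
use std::env;
use std::ffi::OsStr;
use std::path::PathBuf;

/// Look for `tool` in each directory of a PATH-style value.
/// Note: this only checks that a regular file exists, not that it is executable.
pub fn find_on_path(tool: &str, path_var: &OsStr) -> Option<PathBuf> {
    // EXE_SUFFIX is ".exe" on Windows and empty elsewhere.
    let file_name = format!("{}{}", tool, env::consts::EXE_SUFFIX);
    env::split_paths(path_var)
        .map(|dir| dir.join(&file_name))
        .find(|candidate| candidate.is_file())
}

/// Which external tools were found (hypothetical type).
#[derive(Debug, Default, PartialEq, Eq)]
pub struct ToolAvailability {
    pub pdfinfo: bool,
    pub pdftotext: bool,
    pub pdftoppm: bool,
    pub tesseract: bool,
}

impl ToolAvailability {
    /// Detect tools in an explicit PATH value (useful for tests).
    pub fn detect_in(path_var: &OsStr) -> Self {
        let has = |tool: &str| find_on_path(tool, path_var).is_some();
        ToolAvailability {
            pdfinfo: has("pdfinfo"),
            pdftotext: has("pdftotext"),
            pdftoppm: has("pdftoppm"),
            tesseract: has("tesseract"),
        }
    }

    /// Detect tools using the process's PATH environment variable.
    pub fn detect() -> Self {
        let path = env::var_os("PATH").unwrap_or_default();
        Self::detect_in(&path)
    }

    /// Assumption: plain text extraction needs pdfinfo and pdftotext.
    pub fn can_extract_text(&self) -> bool {
        self.pdfinfo && self.pdftotext
    }

    /// Assumption: OCR needs pdftoppm (rasterize) and tesseract (recognize).
    pub fn can_ocr(&self) -> bool {
        self.pdftoppm && self.tesseract
    }
}
```

Calling `ToolAvailability::detect()` once before opening the dialog would let the UI disable the OCR option, or show the "missing tool" error, without attempting a conversion first.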

How to test

  1. Install Poppler (brew install poppler) and optionally Tesseract (brew install tesseract).
  2. Launch Tolaria with a vault that contains a PDF.
  3. Right-click a PDF in the note list → "Convert to note", or click the button in the file preview toolbar, or use the command palette on an active PDF tab.
  4. Select OCR mode and language in the dialog, then confirm.
  5. Verify a new Markdown note is created next to the PDF with frontmatter metadata, a link back to the source PDF, and extracted page content.
  6. Confirm that a success toast appears after conversion, and that an error toast appears when Poppler or Tesseract is missing.
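To make step 5 concrete, here is a minimal sketch of how the generated note might be assembled. The frontmatter keys (`source`, `pages`), the "Page N" headings, and the link syntax are hypothetical; the PR description does not specify the real note layout.

```rust
use std::path::{Path, PathBuf};

/// Path of the note created next to the PDF, e.g. `vault/report.pdf` -> `vault/report.md`.
pub fn note_path_for(pdf: &Path) -> PathBuf {
    pdf.with_extension("md")
}

/// Render a Markdown note with frontmatter, a link back to the source PDF,
/// and one section per extracted page.
/// Note: no escaping is done here, so a file name containing quotes or angle
/// brackets would need extra handling.
pub fn render_note(source_pdf: &str, pages: &[String]) -> String {
    let mut note = String::new();

    // Frontmatter metadata (hypothetical keys).
    note.push_str("---\n");
    note.push_str(&format!("source: \"{}\"\n", source_pdf));
    note.push_str(&format!("pages: {}\n", pages.len()));
    note.push_str("---\n\n");

    // Link back to the source PDF; angle brackets allow spaces in the file name.
    note.push_str(&format!("Source: [{}](<{}>)\n", source_pdf, source_pdf));

    // Extracted content, one section per page (1-based page numbers).
    for (index, text) in pages.iter().enumerate() {
        note.push_str(&format!("\n## Page {}\n\n{}\n", index + 1, text.trim()));
    }

    note
}
```

Since the source PDF is only read and the note is written to a separate `.md` path, a check in step 5 can also confirm that the PDF's contents are unchanged after conversion.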

Screenshots

[Screenshot, 2026-06-06 15:41:25] [Screenshot, 2026-06-06 15:41:01] [Screenshot, 2026-06-07 15:26:50]

Checklist

  • Code follows the project coding standards.
  • Tests added or updated.
  • Documentation updated.
  • Existing tests pass.

@S1933 S1933 force-pushed the feat/pdf-markdown-import branch 3 times, most recently from 0726289 to c9dc9b5 Compare June 6, 2026 13:02
@S1933 S1933 changed the title [DRAFT] feat: external tool PDF to Markdown import feat: external tool PDF to Markdown import Jun 6, 2026

S1933 commented Jun 6, 2026

Author

First of all, thank you for this project!

I use this tool in my daily professional work, and it has been very useful to me.

The idea for this feature comes from my own use case. I have a lot of PDF documents that I'd like to turn into a knowledge base, and I've found that Markdown files are much easier for AI agents to use than raw PDFs, especially with MCP.

This is my first feature contribution to the project, so please don't hesitate to point me in the right direction if the PR description or the implementation should be improved.

@S1933 S1933 marked this pull request as ready for review June 6, 2026 14:06
Add pdf-to-markdown external tool integration that converts PDF files to Markdown and imports them into the vault.

Includes context menu, command palette, file preview integration, localized UI copy, PostHog events, tests, and ADR 0138.
@S1933 S1933 force-pushed the feat/pdf-markdown-import branch from afc7fad to d3c3ec2 Compare June 7, 2026 13:30