[recipes] Obsidian vault import #108

Merged
justfinethanku merged 3 commits into NateBJones-Projects:main from snapsynapse:contrib/snapsynapse/obsidian-vault-import
Mar 24, 2026

@snapsynapse
Contributor

Summary

  • Parses any Obsidian vault (markdown, frontmatter, wikilinks, inline tags)
  • Hybrid chunking: heading-based splits + optional LLM distillation for long sections
  • Embeddings via OpenRouter, insert via Supabase REST API with content fingerprint dedup
  • Secret detection prevents API keys, tokens, and passwords from reaching the database
  • Preflight check validates Supabase and OpenRouter connections before any work
  • Tested on an 800+ note vault (3,743 thoughts, 10 duplicates caught by fingerprint index)
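For readers new to fingerprint-based dedup, here is a minimal sketch of the idea. It is not the recipe's actual code: the function name `content_fingerprint` and the normalization rules (collapse whitespace, lowercase) are assumptions chosen for illustration. The point is that two thoughts with the same normalized text produce the same digest, so a unique index on that digest can reject the second insert.

```python
import hashlib
import re


def content_fingerprint(text: str) -> str:
    """Return a SHA-256 hex digest of whitespace- and case-normalized text.

    Hypothetical helper: the real recipe may normalize differently.
    """
    # Collapse runs of whitespace so reflowed text hashes identically.
    normalized = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()
```

Whether to lowercase or strip punctuation is a design choice: a more aggressive normalization catches more near-duplicates but risks merging thoughts that differ in meaningful ways.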

Addresses all review feedback from #28:

  • Branch/commit naming conventions (item 1)
  • Content fingerprint dedup at DB level (item 2)
  • --no-embed documented in Options table (item 3)
  • .env parser handles quoted values (item 4)
  • Dead code HEADING_RE removed (item 5)
  • Cost and time estimate section added (item 6)
  • Rate limiting between embedding calls (item 7)
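As an illustration of item 4, a `.env` parser that tolerates quoted values might look like the sketch below. This is an assumption about the general approach, not the parser shipped in the recipe; `parse_env` is a hypothetical name, and the sketch deliberately ignores escapes, multi-line values, and inline comments.

```python
def parse_env(text: str) -> dict:
    """Parse KEY=VALUE lines, stripping one pair of matching surrounding quotes.

    Hypothetical helper for illustration only.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Remove surrounding quotes only when both ends match.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values
```

Without the quote handling, a line such as `KEY="abc"` would yield a value that includes the literal quote characters, which typically surfaces later as an authentication failure rather than a parse error.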

Test plan

  • --dry-run on a sample vault shows correct note/thought counts and secret flags
  • --limit 5 inserts exactly 5 notes' worth of thoughts with preflight passing
  • Re-run skips already-imported notes via sync log
  • Duplicate content rejected by fingerprint index (409 → counted as skip, not failure)
  • Bad Supabase URL fails at preflight, not after chunking
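The "409 → counted as skip, not failure" behavior in the test plan can be sketched as a small status classifier. The names `classify_insert` and `tally` are hypothetical and the real script may track results differently; the sketch only shows the bookkeeping, with no network calls.

```python
from collections import Counter


def classify_insert(status_code: int) -> str:
    """Map an HTTP status from an insert request to an outcome label."""
    if 200 <= status_code < 300:
        return "inserted"
    if status_code == 409:
        # Conflict: the unique fingerprint index rejected a duplicate.
        return "skipped"
    return "failed"


def tally(status_codes) -> Counter:
    """Count outcomes across a batch of insert responses."""
    return Counter(classify_insert(code) for code in status_codes)
```

Keeping duplicates out of the failure count matters for any abort-on-consecutive-failures logic, since a re-run over an already imported vault would otherwise look like a string of errors.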

🤖 Generated with Claude Code

snapsynapse and others added 3 commits March 13, 2026 19:59
Parses any Obsidian vault, chunks notes into atomic thoughts,
generates embeddings via OpenRouter, and inserts into Supabase.
Tested on 500+ note LifeHQ-pattern vault.

Closes NateBJones-Projects#13
…n vault import

Addresses all feedback from PR NateBJones-Projects#28: branch/commit naming conventions,
content fingerprint dedup, --no-embed docs, .env quote handling, dead
code removal, cost estimates, and rate limiting.

Adds preflight connection check, secret detection scanner, early abort
on consecutive failures, per-note sync log timestamps, and line-buffered
output. Tested on 800+ note vault (3,743 thoughts, zero data loss).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Collaborator

@justfinethanku justfinethanku left a comment

Code Review — Obsidian Vault Import Recipe

Thank you for this comprehensive contribution! This is an excellent example of a well-documented, production-ready recipe. I've reviewed the PR against the contribution standards and here's my assessment:

✅ What's Great

Excellent documentation:

  • README is thorough and beginner-friendly with clear prerequisites, step-by-step instructions, expected outcomes, and comprehensive troubleshooting
  • Includes helpful extras like cost estimates, time estimates, vault compatibility table, credential tracker, and rate limiting details
  • Well-structured with all required sections plus valuable additions

Strong safety features:

  • Secret scanning prevents API keys/tokens from reaching the database
  • Preflight checks validate connections before any work
  • Content fingerprint deduplication at database level
  • Proper error handling with retry logic and exponential backoff
  • Rate limiting to respect upstream APIs
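To make the secret-scanning point concrete, a regex-based scanner could be sketched as follows. The pattern set and the name `find_secrets` are illustrative assumptions, not the recipe's actual rules; a real scanner would likely cover more token formats and tune for false positives.

```python
import re

# Illustrative patterns only; the recipe's real list is not shown in this PR page.
SECRET_PATTERNS = {
    "aws_access_key": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(r"\bghp_[A-Za-z0-9]{36}\b"),
    "private_key": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
    "password_assignment": re.compile(r"(?i)\bpassword\s*[:=]\s*\S+"),
}


def find_secrets(text: str) -> list:
    """Return the names of all patterns that match somewhere in the text."""
    return [name for name, pattern in SECRET_PATTERNS.items() if pattern.search(text)]
```

A chunk with a non-empty result would be flagged and withheld from the insert step, which is the behavior the `--dry-run` secret flags in the test plan appear to report.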

Clean implementation:

  • No hardcoded credentials (uses .env)
  • No SQL safety issues (only safe INSERTs with proper error handling)
  • All files properly scoped to recipes/obsidian-vault-import/
  • No binary blobs or oversized files
  • Battle-tested parsing logic adapted from OpenBrainBeta

Proper PR format:

  • Title follows [category] Description format ✅
  • Branch name follows contrib/<username>/<description> convention ✅
  • Commits properly prefixed with [recipes]
  • PR description includes summary, test plan, and references to addressed review feedback ✅

Complete metadata.json:

  • All required fields present and valid ✅
  • Proper semantic versioning ✅
  • Accurate difficulty and time estimates ✅
  • Author properly credited ✅

📋 Minor Observations (No Action Required)

  1. Optional enhancement: The README mentions a content_fingerprint column as "recommended" but doesn't show the ALTER TABLE statement to add it. Consider adding this SQL snippet for completeness:

    ALTER TABLE thoughts ADD COLUMN IF NOT EXISTS content_fingerprint TEXT;
    CREATE UNIQUE INDEX IF NOT EXISTS thoughts_content_fingerprint_idx
      ON thoughts (content_fingerprint);

    (The CREATE INDEX is already documented, just missing the ALTER TABLE)

  2. Graceful Boundaries link: Nice touch linking to the graceful-boundaries spec in the rate limiting section. This kind of thoughtful detail makes for excellent documentation.

  3. Code quality: The Python code is clean, well-commented, and follows good practices. The hybrid chunking strategy (heading-based + optional LLM distillation) is elegant.
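For context on the heading-based half of the hybrid chunking strategy, a minimal sketch is shown here. It is an assumption about the general technique rather than the recipe's implementation: `split_by_headings` is a hypothetical name, the LLM distillation step is omitted, and `#` lines inside fenced code blocks are not treated specially.

```python
import re

# Matches ATX-style markdown headings such as "## Title".
_HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_headings(markdown: str) -> list:
    """Split a note into (heading, body) chunks, dropping empty sections.

    Text before the first heading is returned with a heading of None.
    """
    chunks = []
    title, lines = None, []
    for line in markdown.splitlines():
        match = _HEADING.match(line)
        if match:
            body = "\n".join(lines).strip()
            if body:
                chunks.append((title, body))
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    # Flush the final section.
    body = "\n".join(lines).strip()
    if body:
        chunks.append((title, body))
    return chunks
```

In a hybrid scheme, chunks whose body exceeds some length threshold would then be handed to the optional distillation step, while short sections pass through unchanged.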

✅ Review Checklist

  • Folder structure: Correctly placed in recipes/
  • Required files: README.md ✅, metadata.json ✅, code files ✅
  • Metadata valid: All required fields present, valid JSON, correct schema
  • No credentials: Uses environment variables, .env.example provided
  • SQL safety: Only safe INSERTs, no DROP/TRUNCATE/unqualified DELETE
  • Category artifacts: Python script + requirements.txt present
  • PR format: Title starts with [recipes]
  • No binary blobs: No files over 1MB, no .exe/.dmg/.zip
  • README completeness: Prerequisites ✅, Steps ✅, Expected Outcome ✅, Troubleshooting ✅
  • Scope check: All changes within contribution folder
  • No core modifications: Doesn't alter the core thoughts table structure (the content_fingerprint column is added only via the optional, recommended SQL)

🎯 Verdict: Ready to merge

This is a high-quality contribution that meets or exceeds all contribution standards. The documentation is exemplary, the code is production-ready, and the safety features demonstrate thoughtful engineering. The vault compatibility table and cost/time estimates are particularly helpful additions that go above and beyond the minimum requirements.

The only minor enhancement (adding the ALTER TABLE statement) is optional and doesn't block merging. This recipe is ready to ship.

Great work, Sam! 🚀

@justfinethanku justfinethanku merged commit 11edf12 into NateBJones-Projects:main Mar 24, 2026