bearInTheRoad/obsidian_note_converter

Obsidian Note Converter

A Python tool that converts Obsidian markdown notes containing embedded images into text-based notes and flashcards using Claude AI. This tool extracts mathematical formulas, diagrams, and other visual content from images and converts them into LaTeX and text format.

Features

  • Image-to-Text Conversion: Converts embedded images (PNG, JPEG, GIF, WebP) to text using Claude AI vision capabilities
  • Multiple Markdown Formats: Supports both standard markdown (![alt](image.png)) and Obsidian wiki-link (![[image.png]]) image formats
  • LaTeX Math Support: Automatically converts mathematical content to proper LaTeX formatting
  • Flashcard Generation: Creates spaced repetition flashcards from processed notes
  • Batch Processing: Process entire folders of markdown files at once
  • Error Handling: Gracefully handles missing images and API errors
  • Processing Reports: Generates detailed reports of successful and failed conversions

Prerequisites

  • Python 3.7 or higher
  • Anthropic Claude API key
  • Internet connection for API calls

Installation

Quick Install (Recommended)

Run the installation script:

chmod +x install_dependencies.sh
./install_dependencies.sh

Manual Installation

  1. Clone or download this repository
  2. Install required Python packages:
pip install anthropic python-dotenv pytest
  3. Set up your environment variables:
# Create a .env file in the project root
echo "CLAUDE_API_KEY=your_api_key_here" > .env

Configuration

  1. API Key Setup:

    • Get your Claude API key from Anthropic Console
    • Create a .env file in the project root directory
    • Add your API key: CLAUDE_API_KEY=your_actual_api_key
  2. Directory Structure:

    your-project/
    ├── input_notes/          # Your Obsidian markdown files
    │   ├── note1.md
    │   ├── note2.md
    │   └── Image Assets/     # Image files (or relative paths)
    │       ├── diagram1.png
    │       └── graph1.jpg
    └── processed_notes/      # Output directory (created automatically)
        ├── note1_complete.md
        ├── note2_complete.md
        ├── flashcards/
        │   ├── note1_flashcards.md
        │   └── note2_flashcards.md
        └── processing_report.json
    

Usage

Process Entire Folder

from note_converter import NoteConverter
import os
from dotenv import load_dotenv

# Load environment variables
load_dotenv()
api_key = os.getenv("CLAUDE_API_KEY")

# Initialize converter
converter = NoteConverter(api_key)

# Process all markdown files in a folder
results = converter.process_folder(
    input_folder="./input_notes",
    output_folder="./processed_notes"
)

print(f"Processed: {len(results['processed'])} files")
print(f"Failed: {len(results['failed'])} files")

Process Single Note

# Process a single markdown file
complete_note = converter.process_markdown_note("./input_notes/calculus.md")

# Generate flashcards from the complete note
flashcards = converter.markdown_to_flashcards(complete_note, "calculus")

Command Line Usage

# Run the main script
python note_converter.py

Make sure to update the folder paths in the __main__ section of note_converter.py to match your setup.
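
As an alternative to editing paths by hand, a small wrapper script can take them as command-line arguments. The sketch below is not part of this repository: `build_parser`, `main`, and the `--input`/`--output` flags are hypothetical names, and it assumes only the `NoteConverter(api_key)` constructor and `process_folder(input_folder=..., output_folder=...)` call shown in the Usage section.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical wrapper script (not shipped with this repo)."""
    parser = argparse.ArgumentParser(
        description="Convert a folder of Obsidian notes with NoteConverter"
    )
    parser.add_argument("--input", default="./input_notes",
                        help="folder containing the markdown notes")
    parser.add_argument("--output", default="./processed_notes",
                        help="folder that receives the processed notes")
    return parser


def main(argv=None):
    # Imported lazily so the parser can be used and tested without these installed.
    import os
    from dotenv import load_dotenv
    from note_converter import NoteConverter

    args = build_parser().parse_args(argv)
    load_dotenv()
    converter = NoteConverter(os.getenv("CLAUDE_API_KEY"))
    results = converter.process_folder(
        input_folder=args.input,
        output_folder=args.output,
    )
    print(f"Processed: {len(results['processed'])} files")
    print(f"Failed: {len(results['failed'])} files")
```

Saved as its own script with a call to `main()` at the bottom, this could be run with, for example, `--input ./my_vault --output ./converted`.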

Supported Image Formats

  • PNG (.png)
  • JPEG (.jpg, .jpeg)
  • GIF (.gif)
  • WebP (.webp)
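
The four formats above correspond to the image media types that Claude accepts for base64-encoded image input. A minimal sketch of mapping a file extension to its media type and encoding the file follows; `MEDIA_TYPES`, `media_type_for`, and `encode_image` are illustrative names and are not taken from `note_converter.py`.

```python
import base64
from pathlib import Path

# Media types for the four supported formats listed above.
MEDIA_TYPES = {
    ".png": "image/png",
    ".jpg": "image/jpeg",
    ".jpeg": "image/jpeg",
    ".gif": "image/gif",
    ".webp": "image/webp",
}


def media_type_for(path):
    """Return the media type for a supported image path, or None otherwise."""
    return MEDIA_TYPES.get(Path(path).suffix.lower())


def encode_image(path):
    """Read an image file and return (media_type, base64_text)."""
    media_type = media_type_for(path)
    if media_type is None:
        raise ValueError(f"Unsupported image format: {path}")
    data = Path(path).read_bytes()
    return media_type, base64.b64encode(data).decode("ascii")
```

The extension check is case-insensitive, so `Diagram.PNG` and `diagram.png` are treated the same way.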

Supported Markdown Image Formats

  1. Standard Markdown: ![alt text](image.png)
  2. Obsidian Wiki-links: ![[image.png]]
  3. Relative Paths: ![diagram](folder/image.png)
  4. Image Assets Folder: Automatically looks in Image Assets/ subdirectory
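
To illustrate how the first three forms might be recognized, here is a minimal regex-based sketch. It is an independent illustration rather than the repository's implementation; the pattern names and this `extract_image_paths` function are assumptions, and it handles neither titles inside standard links nor the `Image Assets/` lookup.

```python
import re

# ![alt](path): standard markdown image; group 1 is the path.
STANDARD_IMAGE = re.compile(r"!\[[^\]]*\]\(([^)]+)\)")
# ![[path]] or ![[path|size]]: Obsidian wiki-link embed; group 1 is the path.
WIKILINK_IMAGE = re.compile(r"!\[\[([^\]|]+)(?:\|[^\]]*)?\]\]")


def extract_image_paths(markdown):
    """Return the image paths referenced in `markdown`, in order of appearance."""
    found = []
    for pattern in (STANDARD_IMAGE, WIKILINK_IMAGE):
        for match in pattern.finditer(markdown):
            found.append((match.start(), match.group(1).strip()))
    # Sort by position so both syntaxes are interleaved in document order.
    return [path for _, path in sorted(found)]
```

For example, a note containing `![alt text](image.png)` followed by `![[image.png]]` yields two entries, one per reference.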

Output Files

For each processed note, the tool generates:

  1. Complete Note (filename_complete.md): Original note with images replaced by extracted text
  2. Flashcards (filename_flashcards.md): Generated Q&A flashcards for spaced repetition
  3. Processing Report (processing_report.json): Summary of successful and failed conversions

Example

Input Markdown

# Calculus Notes

The derivative of $x^2$ is shown in this diagram:

![derivative](derivative_graph.png)

This is fundamental to understanding calculus.

Output Complete Note

# Calculus Notes

The derivative of $x^2$ is shown in this diagram:

$$f'(x) = 2x$$

The graph shows the parabola $y = x^2$ and its derivative $y = 2x$, demonstrating how the slope of the tangent line at any point $(x, x^2)$ equals $2x$.

This is fundamental to understanding calculus.

Generated Flashcards

Q: What is the derivative of $x^2$?
A: $f'(x) = 2x$

---

Q: How does the derivative $2x$ relate to the original function $x^2$?
A: The derivative $2x$ gives the slope of the tangent line at any point on the parabola $y = x^2$.

---
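
If you want to load the generated cards into another tool, the layout above (a `Q:` line, an `A:` line, and `---` between cards) is simple to parse. The following is a sketch that assumes every question and answer fits on a single line, as in the example; `parse_flashcards` is a hypothetical helper and not part of the tool.

```python
import re


def parse_flashcards(text):
    """Parse 'Q:'/'A:' cards separated by '---' lines into (question, answer) tuples."""
    cards = []
    # Split on lines that consist only of '---'.
    for chunk in re.split(r"^---[ \t]*$", text, flags=re.MULTILINE):
        question = answer = None
        for line in chunk.strip().splitlines():
            if line.startswith("Q:"):
                question = line[2:].strip()
            elif line.startswith("A:"):
                answer = line[2:].strip()
        # Skip empty chunks and incomplete cards.
        if question and answer:
            cards.append((question, answer))
    return cards
```

Multi-line answers would need extra handling, for instance by appending continuation lines to the most recent field.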

Testing

Run the comprehensive test suite:

# Run all tests
pytest test.py -v

# Run with verbose output and shortened tracebacks
pytest test.py -v --tb=short

# Run specific test
pytest test.py::TestNoteConverter::test_extract_image_paths_standard_markdown -v

The test suite includes:

  • Image path extraction (standard and Obsidian formats)
  • Image-to-text conversion
  • Markdown processing
  • Flashcard generation
  • Folder processing
  • Error handling

Error Handling

The tool handles various error scenarios gracefully:

  • Missing Images: Replaces with [Image not found: filename]
  • API Errors: Replaces with [Image conversion failed: filename]
  • Invalid Files: Logs errors and continues processing other files
  • Network Issues: Provides detailed error messages
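
The placeholder behavior for missing images and failed conversions can be pictured with the following sketch. It is an assumption about how such a fallback could be structured, not the code in `note_converter.py`: `replace_images` and `IMAGE_PATTERN` are hypothetical names, and `convert` stands in for whatever function sends an image to Claude and returns text.

```python
import re
from pathlib import Path

# Matches ![[path]] / ![[path|size]] (group 1) or ![alt](path) (group 2).
IMAGE_PATTERN = re.compile(
    r"!\[\[([^\]|]+)(?:\|[^\]]*)?\]\]|!\[[^\]]*\]\(([^)]+)\)"
)


def replace_images(markdown, note_dir, convert):
    """Replace each image reference with converted text or a placeholder.

    `convert` is any callable that takes a Path and returns a string.
    """
    def substitute(match):
        name = (match.group(1) or match.group(2)).strip()
        # Look next to the note first, then in the Image Assets/ subdirectory.
        candidates = (Path(note_dir) / name, Path(note_dir) / "Image Assets" / name)
        path = next((c for c in candidates if c.is_file()), None)
        if path is None:
            return f"[Image not found: {name}]"
        try:
            return convert(path)
        except Exception:
            # One failed conversion should not abort the rest of the note.
            return f"[Image conversion failed: {name}]"

    return IMAGE_PATTERN.sub(substitute, markdown)
```

Because each reference is handled independently, the rest of the note is still converted when one image is missing or one API call fails.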

Limitations

  • Requires active internet connection for Claude API
  • API usage costs apply (see Anthropic pricing)
  • Large images may take longer to process
  • Complex diagrams may not convert perfectly

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

This project is open source. See Anthropic's terms of service for API usage guidelines.

Troubleshooting

Common Issues

  1. API Key Error:

    • Verify your .env file contains the correct API key
    • Check that the key has proper permissions
  2. Missing Images:

    • Ensure image paths in markdown are correct
    • Check that images exist in the specified directories
  3. Import Errors:

    • Run the installation script to ensure all dependencies are installed
    • Verify Python version compatibility
  4. Permission Errors:

    • Ensure write permissions for output directories
    • Check file permissions for input files

Getting Help

  • Check the test files for usage examples
  • Review the processing report JSON for detailed error information
  • Ensure all file paths use forward slashes or pathlib.Path objects
