This experimental project automates the workflow of exporting Google Docs to Markdown. It uses a headless browser to trigger exports and Pandoc to ensure high-fidelity conversion.
The pipeline follows a three-step automated process:
- Download: Uses `playwright` to navigate to Google Doc export URLs and save them as `.odt` files.
- Convert: Uses `pypandoc` to transform the `.odt` files into Markdown.
- Automate: A GitHub Action runs daily (or on demand) to sync changes back to the repository.
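The download step could look roughly like the sketch below. It is an illustration, not the project's actual `download.py`: the helper names `export_url` and `download_odt` are hypothetical, the sketch builds the export URL from a document ID (the real configuration stores URLs), and it assumes the document is shared so that it can be exported without signing in.

```python
from pathlib import Path


def export_url(doc_id: str, fmt: str = "odt") -> str:
    """Build a Google Docs export URL from a document ID (hypothetical helper)."""
    return f"https://docs.google.com/document/d/{doc_id}/export?format={fmt}"


async def download_odt(doc_id: str, target: Path) -> Path:
    """Fetch one document as .odt with headless Chromium and save it to `target`."""
    # Imported lazily so export_url() can be used without Playwright installed.
    from playwright.async_api import Error, async_playwright

    target.parent.mkdir(parents=True, exist_ok=True)
    async with async_playwright() as p:
        browser = await p.chromium.launch()  # headless by default
        page = await browser.new_page()
        # Register the download listener before triggering the navigation.
        async with page.expect_download() as download_info:
            try:
                await page.goto(export_url(doc_id))
            except Error:
                # A URL that starts a download aborts the navigation;
                # that is expected here, so the error is ignored.
                pass
        download = await download_info.value
        await download.save_as(target)
        await browser.close()
    return target
```

To try it, run the coroutine with `asyncio.run(download_odt("YOUR_DOC_ID", Path("downloads/example.odt")))`, where both the ID and the path are placeholders.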
- `main.py`: The entry point that orchestrates the download and conversion tasks.
- `download.py`: Handles asynchronous browser interactions to fetch files.
- `convert.py`: Manages the Pandoc conversion logic and directory handling.
- `config.py`: Centralized configuration for file URLs and output paths using `pydantic-settings`.
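As a rough idea of what the conversion and directory handling might involve, here is a minimal sketch. The function names `markdown_path` and `convert_odt` are hypothetical, the `gfm` (GitHub-Flavored Markdown) target is an assumption, and the actual `convert.py` may be organized differently.

```python
from pathlib import Path


def markdown_path(odt_file: Path, out_dir: Path = Path("converted")) -> Path:
    """Map e.g. downloads/notes.odt to converted/notes.md (hypothetical helper)."""
    return out_dir / odt_file.with_suffix(".md").name


def convert_odt(odt_file: Path, out_dir: Path = Path("converted")) -> Path:
    """Convert one .odt file to Markdown and return the output path."""
    # Imported lazily so markdown_path() can be used without pypandoc installed.
    import pypandoc

    out_file = markdown_path(odt_file, out_dir)
    out_file.parent.mkdir(parents=True, exist_ok=True)  # create converted/ if missing
    # "gfm" is an assumed target; plain "markdown" would also be accepted by Pandoc.
    pypandoc.convert_file(str(odt_file), "gfm", format="odt", outputfile=str(out_file))
    return out_file
```

Keeping the path mapping separate from the Pandoc call makes the directory logic easy to check without a Pandoc installation.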
- Python 3.13
- Pandoc (system-level)
- Playwright Chromium
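A quick preflight check for the first two items could be sketched as follows. The helper `missing_prerequisites` is hypothetical, it treats Python 3.13 as a minimum version (an assumption), and it does not verify that Playwright's Chromium build is installed.

```python
import shutil
import sys


def missing_prerequisites(version=sys.version_info, which=shutil.which) -> list[str]:
    """Return the names of prerequisites that appear to be missing."""
    missing = []
    # Assumption: 3.13 is the minimum supported version.
    if tuple(version[:2]) < (3, 13):
        missing.append("Python 3.13")
    # Pandoc must be available on PATH as a system-level binary.
    if which("pandoc") is None:
        missing.append("pandoc")
    return missing
```

Calling `missing_prerequisites()` with no arguments inspects the current interpreter and `PATH`; the parameters exist only so the check can be exercised with fake values.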
- Install dependencies:

      pip install -r requirements.txt
      playwright install chromium

- Run the pipeline:

      python main.py

The converted files will be generated in the converted/ directory as specified in the configuration.