Parallelize markdown generation with ProcessPoolExecutor #243

Merged: svekars merged 2 commits into pytorch_sphinx_theme2 from fix-markdown on Apr 8, 2026

svekars (Contributor) commented on Apr 6, 2026

  • Parallelize HTML-to-markdown conversion using ProcessPoolExecutor for significant speedup on large doc sites (69x
    faster on 2,563 PyTorch Python doc pages)
  • Refactor _generate_md_files to return a dict of {docname: md_content} instead of a count, passing in-memory content
    to _generate_llms_full_txt to avoid redundant disk reads
  • Update tests to match the new API and add parallelism-specific tests validated against real PyTorch docs
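
To illustrate the second point, here is a minimal sketch of building an aggregate text from the in-memory dict rather than re-reading `.md` files. The helper name `build_full_text`, the sorted ordering, and the `# docname` header layout are assumptions made for illustration; they are not taken from the actual `_generate_llms_full_txt` implementation.

```python
def build_full_text(md_contents: dict[str, str]) -> str:
    """Concatenate already-converted pages without touching the disk.

    Hypothetical helper: the per-page header and the blank-line separator
    are assumptions, not the theme's real llms-full.txt layout.
    """
    sections = []
    # Sort by docname so the output is deterministic regardless of the
    # order in which pages were converted.
    for docname in sorted(md_contents):
        body = md_contents[docname].rstrip("\n")
        sections.append(f"# {docname}\n\n{body}\n")
    return "\n".join(sections)
```

Because the dict already holds every page's markdown, the aggregate step needs no file I/O beyond writing its own output.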

pytorch_sphinx_theme2/llm_generation.py:

  • Extract _convert_single_doc() as a top-level function (required for ProcessPoolExecutor pickling)
  • _generate_md_files() now uses ProcessPoolExecutor to convert docs in parallel; returns dict[docname, md_content]
    instead of int
  • _generate_llms_full_txt() takes the in-memory md_contents dict instead of re-reading .md files from disk
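
The shape of this change can be sketched roughly as follows. This is not the code from the PR: `convert_single_doc`, `generate_md`, the `executor_cls` parameter, and the regex-based tag stripping are hypothetical stand-ins, used only to show why the worker lives at module level and how the results come back as a dict.

```python
import re
# ThreadPoolExecutor is imported only so callers can swap it in for quick checks.
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor

_TAG = re.compile(r"<[^>]+>")


def convert_single_doc(item):
    """Worker function, defined at module level.

    ProcessPoolExecutor pickles the callable to send it to worker
    processes, and nested functions or lambdas cannot be pickled.
    The tag stripping below is a stand-in for real HTML-to-markdown
    conversion.
    """
    docname, html = item
    return docname, _TAG.sub("", html).strip()


def generate_md(pages, max_workers=None, executor_cls=ProcessPoolExecutor):
    """Convert {docname: html} to {docname: markdown} using a worker pool.

    executor_cls is an assumption added for testability; any
    concurrent.futures executor class works here.
    """
    with executor_cls(max_workers=max_workers) as pool:
        # Each worker returns a (docname, content) pair, so the pairs
        # can be collected straight into a dict.
        return dict(pool.map(convert_single_doc, pages.items()))
```

With the default executor, call `generate_md` under an `if __name__ == "__main__":` guard on platforms that start worker processes with spawn. Passing `executor_cls=ThreadPoolExecutor` exercises the same code path without starting processes.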

tests/test_llm_markdown.py:

  • Fix 7 tests (TestGenerateMdFiles, TestGenerateLlmsFullTxt) to match new return types and signatures
  • Add TestParallelConversion class with 4 tests validating correctness and performance
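
A parallelism test of this kind might look like the sketch below. The class name, `fake_convert`, and `mismatched_docs` are hypothetical and are not the tests added in this PR; the sketch only shows the pattern of comparing pool output with a sequential run over the same input.

```python
from concurrent.futures import ProcessPoolExecutor


def fake_convert(item):
    """Stand-in converter, kept at module level so it can be pickled."""
    docname, html = item
    return docname, html.replace("<p>", "").replace("</p>", "\n")


def mismatched_docs(expected, actual):
    """Return the docnames that are missing on one side or differ in content."""
    names = expected.keys() | actual.keys()
    return sorted(n for n in names if expected.get(n) != actual.get(n))


class TestParallelMatchesSequential:
    def test_same_output(self):
        pages = {f"doc{i}": f"<p>page {i}</p>" for i in range(20)}
        sequential = dict(map(fake_convert, pages.items()))
        with ProcessPoolExecutor(max_workers=2) as pool:
            parallel = dict(pool.map(fake_convert, pages.items()))
        # An empty list means every page matched exactly.
        assert mismatched_docs(sequential, parallel) == []
```

Returning the list of mismatched names, rather than a bare boolean, makes a failure report point at the affected pages.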

Benchmarks

| Dataset | Pages | Sequential | Parallel | Speedup |
|---|---|---|---|---|
| PyTorch Python docs | 2,563 | 622.5s (~10 min) | 9.0s | 69.4x |

All parallel results match sequential output byte-for-byte.
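
As a quick sanity check on the benchmark figures, the speedup is the sequential time divided by the parallel time. The sketch below uses the rounded timings from the table, which give a ratio of about 69.2; the reported 69.4x was presumably computed from unrounded measurements, though that is an assumption. The `speedup` helper is illustrative only.

```python
def speedup(sequential_s: float, parallel_s: float) -> float:
    """Ratio of sequential to parallel wall-clock time."""
    if parallel_s <= 0:
        raise ValueError("parallel time must be positive")
    return sequential_s / parallel_s


# Rounded figures from the benchmark table.
pages = 2563
ratio = speedup(622.5, 9.0)
seq_ms_per_page = 622.5 / pages * 1000  # average sequential cost per page
par_ms_per_page = 9.0 / pages * 1000    # average parallel cost per page

print(f"speedup from rounded timings: {ratio:.1f}x")
print(f"per page: {seq_ms_per_page:.0f} ms sequential, {par_ms_per_page:.1f} ms parallel")
```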

Test plan

  • All 37 unit tests pass (pytest tests/test_llm_markdown.py)
  • Parallel output matches sequential output byte-for-byte on 66 PyTorch C++ doc pages
  • Parallel output matches sequential output byte-for-byte on 2,563 PyTorch Python doc pages
  • Verified .md files are written to disk correctly
  • 200-file synthetic stress test passes
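
The disk-write and stress-test checks could be approximated as shown below. All names here (`write_md_files`, `verify_md_files`, `stress_check`) and the `<docname>.md` path layout are assumptions for illustration, not the PR's actual test code.

```python
import tempfile
from pathlib import Path


def write_md_files(md_contents, out_dir):
    """Write each page to <out_dir>/<docname>.md, creating subdirectories."""
    out_dir = Path(out_dir)
    for docname, content in md_contents.items():
        target = out_dir / f"{docname}.md"
        target.parent.mkdir(parents=True, exist_ok=True)
        # write_bytes avoids platform newline translation.
        target.write_bytes(content.encode("utf-8"))


def verify_md_files(md_contents, out_dir):
    """Return docnames whose on-disk bytes differ from the in-memory content."""
    out_dir = Path(out_dir)
    bad = []
    for docname, content in md_contents.items():
        target = out_dir / f"{docname}.md"
        if not target.is_file() or target.read_bytes() != content.encode("utf-8"):
            bad.append(docname)
    return sorted(bad)


def stress_check(n=200):
    """Write n synthetic pages to a temp dir and verify them byte-for-byte."""
    docs = {f"section{i % 10}/page{i}": f"# Page {i}\n\nbody {i}\n" for i in range(n)}
    with tempfile.TemporaryDirectory() as tmp:
        write_md_files(docs, tmp)
        return verify_md_files(docs, tmp)
```

An empty list from `stress_check()` means every synthetic page on disk matched its in-memory content exactly.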

Tested against the PyTorch docs in pytorch/pytorch#179268.

netlify bot commented Apr 6, 2026

Deploy Preview for pytorchsphinxtheme ready!

| Name | Link |
|---|---|
| 🔨 Latest commit | c222c62 |
| 🔍 Latest deploy log | https://app.netlify.com/projects/pytorchsphinxtheme/deploys/69d42613605a2b0008a81499 |
| 😎 Deploy Preview | https://deploy-preview-243--pytorchsphinxtheme.netlify.app |

meta-cla bot added the "cla signed" label on Apr 6, 2026
svekars marked this pull request as ready for review on April 8, 2026, 18:29
@AlannaBurke

LGTM!

svekars merged commit bf39e08 into pytorch_sphinx_theme2 on Apr 8, 2026
7 checks passed
svekars mentioned this pull request on Apr 8, 2026