Parallelize markdown generation with ProcessPoolExecutor by svekars · Pull Request #243 · pytorch/pytorch_sphinx_theme

svekars · 2026-04-06T17:42:56Z

Parallelize HTML-to-markdown conversion using ProcessPoolExecutor for significant speedup on large doc sites (69x
faster on 2,563 PyTorch Python doc pages)
Refactor _generate_md_files to return a dict of {docname: md_content} instead of a count, passing in-memory content
to _generate_llms_full_txt to avoid redundant disk reads
Update tests to match the new API and add parallelism-specific tests validated against real PyTorch docs
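
To illustrate the second point, here is a minimal sketch of assembling a combined file from content that is already in memory. The function name `build_llms_full_txt` and the per-page header format are hypothetical, chosen for this example only; they are not taken from the theme's code.

```python
from pathlib import Path


def build_llms_full_txt(md_contents: dict[str, str], out_path: Path) -> int:
    """Concatenate already-converted markdown into a single file.

    Hypothetical stand-in for _generate_llms_full_txt. The "# docname"
    header written before each page is an assumption of this sketch.
    """
    parts = []
    for docname in sorted(md_contents):  # sorted for deterministic output
        body = md_contents[docname].strip()
        parts.append(f"# {docname}\n\n{body}\n")
    # No .md file is reopened here: everything comes from the dict.
    out_path.write_text("\n".join(parts), encoding="utf-8")
    return len(parts)
```

Because the dict is produced by the conversion step, the combined file can be written without a second pass over the output directory.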

pytorch_sphinx_theme2/llm_generation.py:

Extract _convert_single_doc() as a top-level function (required for ProcessPoolExecutor pickling)
_generate_md_files() now uses ProcessPoolExecutor to convert docs in parallel; returns dict[docname, md_content]
instead of int
_generate_llms_full_txt() takes the in-memory md_contents dict instead of re-reading .md files from disk
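
The pattern described above can be sketched roughly as follows. This is not the code from the PR: the tag-stripping regex is a toy stand-in for the real HTML-to-markdown converter, the function names drop the leading underscore, and the `executor_cls` parameter is a hypothetical hook added here so a test can swap in a thread pool.

```python
import re
from concurrent.futures import ProcessPoolExecutor


def convert_single_doc(item: tuple[str, str]) -> tuple[str, str]:
    """Convert one (docname, html) pair to (docname, markdown).

    Defined at module level so the process pool can pickle a reference
    to it; lambdas and nested functions cannot be pickled that way.
    """
    docname, html = item
    text = re.sub(r"<[^>]+>", "", html)  # toy conversion: strip tags only
    return docname, text.strip() + "\n"


def generate_md_files(
    pages: dict[str, str], executor_cls=ProcessPoolExecutor
) -> dict[str, str]:
    """Convert all pages in a worker pool and return {docname: md_content}."""
    if not pages:
        return {}
    with executor_cls() as pool:
        # map() yields results in input order, so the dict is deterministic.
        return dict(pool.map(convert_single_doc, pages.items()))
```

On platforms that start worker processes with the spawn method (Windows and macOS by default), a script calling `generate_md_files(pages)` should do so under an `if __name__ == "__main__":` guard.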

tests/test_llm_markdown.py:

Fix 7 tests (TestGenerateMdFiles, TestGenerateLlmsFullTxt) to match new return types and signatures
Add TestParallelConversion class with 4 tests validating correctness and performance

Benchmarks

Dataset	Pages	Sequential	Parallel	Speedup
PyTorch Python docs	2,563	622.5s (~10 min)	9.0s	69.4x
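
For readers who want to reproduce this kind of measurement, a small timing helper might look like the sketch below. The helper names are hypothetical and this is not the harness used for the table. Note that dividing the rounded values shown (622.5 / 9.0) gives about 69.2; the reported 69.4x was presumably computed from unrounded timings.

```python
import time


def time_call(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for one call, using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


def speedup(sequential_s: float, parallel_s: float) -> float:
    """Ratio of sequential to parallel wall-clock time."""
    if parallel_s <= 0:
        raise ValueError("parallel time must be positive")
    return sequential_s / parallel_s
```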

All parallel results match sequential output byte-for-byte.
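
A byte-for-byte check of this kind could be written along these lines. The function `diff_outputs` is a hypothetical helper for illustration, assuming both runs return a dict of `{docname: md_content}`.

```python
def diff_outputs(sequential: dict[str, str], parallel: dict[str, str]) -> list[str]:
    """Return docnames that differ between the two runs.

    A docname is reported if it exists on one side only or if the
    UTF-8 encoded bytes of its content are not identical.
    """
    mismatched = []
    for docname in sorted(sequential.keys() | parallel.keys()):
        a = sequential.get(docname)
        b = parallel.get(docname)
        if a is None or b is None or a.encode("utf-8") != b.encode("utf-8"):
            mismatched.append(docname)
    return mismatched
```

An empty list means the two runs agree on every page.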

Test plan

All 37 unit tests pass (pytest tests/test_llm_markdown.py)
Parallel output matches sequential output byte-for-byte on 66 PyTorch C++ doc pages
Parallel output matches sequential output byte-for-byte on 2,563 PyTorch Python doc pages
Verified .md files are written to disk correctly
200-file synthetic stress test passes
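
The last two items could be approximated with a sketch like the one below. All names and the synthetic page contents are made up for this example; the PR's actual stress test is not shown in the description.

```python
import tempfile
from pathlib import Path


def make_synthetic_pages(count: int = 200) -> dict[str, str]:
    """Build deterministic fake markdown keyed by docname."""
    return {
        f"page_{i:03d}": f"# Page {i}\n\nBody text for page {i}.\n"
        for i in range(count)
    }


def write_md_files(md_contents: dict[str, str], out_dir: Path) -> list[Path]:
    """Write each entry to <out_dir>/<docname>.md and return the paths."""
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for docname, content in md_contents.items():
        path = out_dir / f"{docname}.md"
        path.write_bytes(content.encode("utf-8"))  # bytes: no newline translation
        written.append(path)
    return written


def verify_on_disk(md_contents: dict[str, str], out_dir: Path) -> bool:
    """Check every file exists and its bytes equal the in-memory content."""
    for docname, content in md_contents.items():
        path = out_dir / f"{docname}.md"
        if not path.is_file() or path.read_bytes() != content.encode("utf-8"):
            return False
    return True


def run_stress_check(count: int = 200) -> bool:
    """Generate pages, write them to a temp directory, and verify them."""
    pages = make_synthetic_pages(count)
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp) / "md"
        write_md_files(pages, out_dir)
        return verify_on_disk(pages, out_dir)
```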

Tested against the PyTorch docs in pytorch/pytorch#179268

netlify · 2026-04-06T17:43:02Z

✅ Deploy Preview for pytorchsphinxtheme ready!

Name	Link
🔨 Latest commit	`c222c62`
🔍 Latest deploy log	https://app.netlify.com/projects/pytorchsphinxtheme/deploys/69d42613605a2b0008a81499
😎 Deploy Preview	https://deploy-preview-243--pytorchsphinxtheme.netlify.app

AlannaBurke · 2026-04-08T19:39:09Z

LGTM!

Parallelize markdown generation with ProcessPoolExecutor

57606c3

meta-cla bot added the cla signed label Apr 6, 2026

Update

c222c62

svekars marked this pull request as ready for review April 8, 2026 18:29

AlannaBurke approved these changes Apr 8, 2026


svekars merged commit bf39e08 into pytorch_sphinx_theme2 Apr 8, 2026
7 checks passed

svekars mentioned this pull request Apr 8, 2026

Bump theme to v0.4.8 #244

Open

svekars merged 2 commits into pytorch_sphinx_theme2 from fix-markdown
