Implement embed v2 data-loading pipeline + H100 benchmark harness #56

Open
clemsgrs wants to merge 3 commits into main from codex/embed-v2-h100-loader-optimization

@clemsgrs (Owner)

Summary

This PR implements the new embed v2 data-loading pipeline focused on single-node H100 throughput while keeping v1 compatibility.

Core changes

  • Add speed.embedding_pipeline (v1/v2) in embedding config.
  • Add adaptive sharding for v2:
    • rank_sharding_mode=auto selects slide sharding when pending_slides >= world_size and falls back to tile sharding otherwise.
  • Add loader tuning knobs and heuristics:
    • num_workers_embedding supports "auto"
    • prefetch_factor_embedding, persistent_workers_embedding, pin_memory_embedding, loader_batch_timeout_sec, storage_mode
  • Refactor tile loading to cache one worker-local WSI handle in TileDataset.
  • Remove the embed warmup dry-run batch; the writer is now initialized lazily from the first real batch.
  • Add optional perf timing logs per slide (speed.log_perf_embedding).
  • Keep output compatibility (features/<slide>.pt by default).
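
The auto sharding rule and the loader knobs listed above can be sketched as follows. This is an illustrative sketch, not the code in slide2vec/embed.py: the helper names `resolve_sharding_mode` and `build_loader_kwargs` are hypothetical, and mapping loader_batch_timeout_sec onto the PyTorch DataLoader `timeout` argument is an assumption. The one hard constraint it encodes comes from PyTorch itself: `prefetch_factor` and `persistent_workers=True` are only valid when `num_workers > 0`.

```python
def resolve_sharding_mode(rank_sharding_mode, pending_slides, world_size):
    """Resolve 'auto' to 'slide' or 'tile'; pass explicit modes through unchanged."""
    if rank_sharding_mode != "auto":
        return rank_sharding_mode
    # Enough pending slides to give every rank at least one: shard by slide.
    # Otherwise some ranks would idle, so fall back to splitting tiles.
    return "slide" if pending_slides >= world_size else "tile"


def build_loader_kwargs(num_workers, prefetch_factor, persistent_workers,
                        pin_memory, timeout_sec):
    """Build DataLoader keyword arguments (hypothetical helper).

    Worker-only options are dropped when num_workers == 0, because
    torch.utils.data.DataLoader rejects them in single-process mode.
    """
    kwargs = {"num_workers": num_workers, "pin_memory": pin_memory}
    if num_workers > 0:
        kwargs["prefetch_factor"] = prefetch_factor
        kwargs["persistent_workers"] = persistent_workers
        kwargs["timeout"] = timeout_sec  # assumed mapping of loader_batch_timeout_sec
    return kwargs
```

For example, with 8 ranks and 3 pending slides, `resolve_sharding_mode("auto", 3, 8)` resolves to tile sharding, so all 8 ranks still receive work.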

Benchmark harness

  • Add scripts/benchmark_embed_v1_v2.py to compare embed-only throughput for v1/v2 on GPU sets like 1,4,8.
  • Script expects an existing tiling output with process_list.csv and coordinates/.
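
Before launching a benchmark run, the expected tiling output can be checked with a few lines like the ones below. This is a minimal sketch under the layout stated above (process_list.csv and coordinates/ at the top of the tiling output directory); the function name `missing_tiling_inputs` is hypothetical and is not part of scripts/benchmark_embed_v1_v2.py.

```python
from pathlib import Path


def missing_tiling_inputs(tiling_dir):
    """Return the names of required tiling artifacts absent from tiling_dir."""
    root = Path(tiling_dir)
    missing = []
    # process_list.csv must be a regular file at the top of the tiling output.
    if not (root / "process_list.csv").is_file():
        missing.append("process_list.csv")
    # coordinates/ must be a directory.
    if not (root / "coordinates").is_dir():
        missing.append("coordinates/")
    return missing
```

An empty list means both inputs are present; anything else names what to regenerate before benchmarking.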

New/updated tests

  • test/test_dataset_worker_cache.py
  • test/test_rank_sharding_lpt.py
  • test/test_embed_pipeline_mode.py
  • test/test_regression_bugfixes.py (new check for v2 config key usage)
  • CI regression job now runs all test_*.py files.
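
The worker-local handle caching described under Core changes, and presumably exercised by test/test_dataset_worker_cache.py, follows a common pattern: open the slide lazily on first access and reuse the handle for every later tile. The sketch below illustrates that pattern only. `CachedHandleDataset`, `open_fn`, and the `read_region(x, y)` signature are hypothetical stand-ins, not the PR's TileDataset or any real WSI reader API.

```python
class CachedHandleDataset:
    """Dataset that opens its slide once per process and reuses the handle."""

    def __init__(self, wsi_path, coordinates, open_fn):
        self.wsi_path = wsi_path
        self.coordinates = coordinates
        self._open_fn = open_fn   # hypothetical: callable returning a slide handle
        self._handle = None       # opened lazily, so each worker gets its own

    def _get_handle(self):
        if self._handle is None:
            self._handle = self._open_fn(self.wsi_path)
        return self._handle

    def __getstate__(self):
        # Drop any open handle when the dataset is pickled for a worker process,
        # so the worker reopens the slide instead of inheriting a stale handle.
        state = self.__dict__.copy()
        state["_handle"] = None
        return state

    def __len__(self):
        return len(self.coordinates)

    def __getitem__(self, idx):
        x, y = self.coordinates[idx]
        return self._get_handle().read_region(x, y)


class FakeSlide:
    """Stand-in slide reader that counts how many times it was opened."""
    opens = 0

    def __init__(self, path):
        FakeSlide.opens += 1

    def read_region(self, x, y):
        return (x, y)
```

Reading every tile through `CachedHandleDataset("slide.tif", coords, FakeSlide)` opens the fake slide exactly once, which is the property a cache test would assert.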

Validation

  • python3 -m unittest discover -s test -p 'test_*.py'
  • python3 -m compileall slide2vec/embed.py slide2vec/data/dataset.py scripts/benchmark_embed_v1_v2.py
  • python3 scripts/benchmark_embed_v1_v2.py --help

Notes

  • slide2vec/hs2p submodule code was not modified in this PR.
