ci: cache HuggingFace models and pip to stop HF 429 failures #19
Closed
przemarzec wants to merge 6 commits into
chore: forward-merge dev into main (stable mirror sync)
chore: forward-merge dev into main (stable mirror sync)
…e tests (WS docs)
chore: forward-merge dev into main (stable mirror sync)
The synthetic-benchmark tests load a real sentence-transformers model (all-MiniLM-L6-v2) on purpose, so CI downloads it from the Hub. With no caching, every test-matrix job and the smoke gate re-downloaded it, and the concurrent requests tripped HF rate limiting (HTTP 429), failing the model-dependent tests intermittently.

Add caching across the workflows that install the embeddings stack:

- cache: "pip" on setup-python in ci (lint/typecheck/test), smoke-gate, release (publish) and upgrade-test, which reuses the torch/sentence-transformers download.
- actions/cache on ~/.cache/huggingface in the model-loading jobs (ci test matrix, smoke-gate, release publish), keyed on the model, so the model is fetched once and reused.
- On a cache hit, set HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE so no Hub request is made at all; gated on cache-hit so a cold first run can still download once (an empty cache + offline fails to find the files).

No test logic changes: the benchmark tests still exercise the real model, now served from cache. Also corrects the stale pyproject [dev] comment that claimed a CI cache existed before this change.

Verified locally: with the model cached, HF_HUB_OFFLINE=1 loads it with zero network, and the evaluator benchmark tests pass offline.
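The caching and offline gating described in this commit could look roughly like the following steps in a model-loading job. This is a sketch, not the actual workflow from the PR: the action versions, Python version, step names, step id `hf-cache`, cache key string, and install/test commands are all assumptions made for illustration.

```yaml
# Sketch only. Versions, names, the cache key, and the commands are assumed.
steps:
  - uses: actions/checkout@v4

  - uses: actions/setup-python@v5
    with:
      python-version: "3.12"                 # assumed version
      cache: "pip"                           # reuses downloaded packages (torch, sentence-transformers)
      cache-dependency-path: pyproject.toml  # file hashed into the pip cache key

  - name: Restore HuggingFace cache
    id: hf-cache                             # hypothetical step id
    uses: actions/cache@v4
    with:
      path: ~/.cache/huggingface
      key: hf-all-MiniLM-L6-v2               # keyed on the model (assumed key format)

  # Only go offline when the cache was restored. On a cold cache the
  # variables stay unset so the first run can still download the model.
  - name: Go offline on a cache hit
    if: steps.hf-cache.outputs.cache-hit == 'true'
    run: |
      echo "HF_HUB_OFFLINE=1" >> "$GITHUB_ENV"
      echo "TRANSFORMERS_OFFLINE=1" >> "$GITHUB_ENV"

  - name: Install and test
    run: |
      pip install -e ".[dev]"                # assumes the [dev] extra pulls in the embeddings stack
      pytest
```

Writing to `$GITHUB_ENV` makes the offline flags visible to every later step in the same job, which is why the gate sits before the test step rather than inside it.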
Address review of the HF/pip cache change:

- ci.yml: move the HuggingFace model download into a dedicated warm-hf-cache job that runs once before the test matrix (test now `needs: warm-hf-cache`). On a cold cache this guarantees a single, serialized model download for the whole workflow instead of three concurrent downloads across the matrix legs (which could still trip HF 429 and, if all legs failed, never populate the cache). The matrix jobs restore the warmed cache and run offline on a cache hit. Add actions/checkout before setup-python in the warm job so the pip cache key has the dependency files to hash.
- pyproject.toml: reword the sqlite-vec dev-dependency comment: the extension exercises KNN search over the compact vec0 table (faster brute-force KNN, not ANN at the pinned 0.1.x line). Also fix the unbalanced parenthesis.

Verified locally: actionlint clean on all workflows, ruff clean, and the benchmark module (which loads the real model) passes offline (30 passed).
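The job layout described in this commit might be sketched as below. Only the job name `warm-hf-cache`, the `needs` dependency, the cache path, and the model name come from the commit message. The runner, matrix values, action versions, cache key, step id, and the download command are assumptions and may differ from the real ci.yml.

```yaml
# Sketch only. Runner, matrix values, versions, key, and commands are assumed.
jobs:
  warm-hf-cache:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4            # gives the pip cache key its dependency files to hash
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
      - id: hf-cache                         # hypothetical step id
        uses: actions/cache@v4               # saves the path at job end after a cache miss
        with:
          path: ~/.cache/huggingface
          key: hf-all-MiniLM-L6-v2
      - name: Download the model once on a cold cache
        if: steps.hf-cache.outputs.cache-hit != 'true'
        run: |
          pip install sentence-transformers
          python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"

  test:
    needs: warm-hf-cache                     # matrix legs start only after the cache is warm
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]   # three legs, values assumed
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
      - id: hf-cache
        uses: actions/cache/restore@v4       # restore only, the warm job owns the save
        with:
          path: ~/.cache/huggingface
          key: hf-all-MiniLM-L6-v2
      - name: Go offline on a cache hit
        if: steps.hf-cache.outputs.cache-hit == 'true'
        run: echo "HF_HUB_OFFLINE=1" >> "$GITHUB_ENV"
      - run: pip install -e ".[dev]" && pytest
```

Because the download lives in one job that the matrix depends on, a cold run makes a single Hub request instead of one per leg, and a failed warm job stops the matrix before it can trigger concurrent downloads.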
Reopening from chore/cache-hf-models-and-pip: the branch-name guard only accepts feature/fix/chore/docs/refactor/test/perf/release/hotfix prefixes (not ci/); commit type stays ci:.
What
Cache HuggingFace models and pip across CI so the benchmark tests stop tripping
HF rate limiting (HTTP 429).
The synthetic-benchmark tests load a real sentence-transformers model
(all-MiniLM-L6-v2) on purpose. With no caching, every test-matrix leg and the
smoke gate re-downloaded the model, and the concurrent requests intermittently
hit HF 429 — failing the model-dependent tests for reasons unrelated to the code.
Changes
- `warm-hf-cache` job downloads the model once on a cold cache and saves
  `~/.cache/huggingface`; the test matrix `needs` it and restores a warm cache, so a cold run does one serialized download for the whole workflow
  instead of three concurrent ones.
- `cache: "pip"` on setup-python in CI (lint/typecheck/test), smoke-gate, the release publish job, and upgrade-test.
- `actions/cache` on `~/.cache/huggingface` in the model-loading jobs (CI test matrix, smoke-gate,
  release publish), with
  `HF_HUB_OFFLINE` set only on a cache hit so a cold run can still populate the cache.
- `pyproject.toml`: reworded the sqlite-vec dev-dependency comment (faster brute-force KNN,
  not ANN at the pinned 0.1.x line).
No test logic changes — the benchmark tests still exercise the real model, now
served from cache. Docs-/CI-only: no version bump.
Verification
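- With the model cached, the benchmark module (which loads the real model) passes with `HF_HUB_OFFLINE=1`,
  proving cache + offline = no Hub request.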