
ci: cache HuggingFace models and pip to stop HF 429 failures #19

Closed
przemarzec wants to merge 6 commits into dev from ci/cache-hf-models-and-pip

@przemarzec

Contributor

What

Cache HuggingFace models and pip across CI so the benchmark tests stop tripping
HF rate limiting (HTTP 429).

The synthetic-benchmark tests load a real sentence-transformers model
(all-MiniLM-L6-v2) on purpose. With no caching, every test-matrix leg and the
smoke gate re-downloaded the model, and the concurrent requests intermittently
hit HF 429 — failing the model-dependent tests for reasons unrelated to the code.

Changes

  • A single upstream warm-hf-cache job downloads the model once on a cold cache
    and saves ~/.cache/huggingface; the test matrix declares needs: warm-hf-cache
    and restores the warm cache, so a cold run does one serialized download for
    the whole workflow instead of three concurrent ones.
  • cache: "pip" on setup-python in CI (lint/typecheck/test), smoke-gate, the
    release publish job, and upgrade-test.
  • HF model cache in the jobs that load the model (CI test matrix, smoke-gate,
    release publish), with HF_HUB_OFFLINE set only on a cache hit so a cold run
    can still populate the cache.
  • Tidy the sqlite-vec dev-dependency comment (KNN over the compact vec0 table —
    not ANN at the pinned 0.1.x line).
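
The job layout described in the list above could look roughly like the sketch below. It is not the actual ci.yml from this PR. The action versions, the Python versions in the matrix, the cache key, the `hf-cache` step id, and the install and test commands are all assumptions chosen for illustration. Only the warm-hf-cache and test job names, the ~/.cache/huggingface path, the model name, and the offline variables come from the description.

```yaml
# Hypothetical sketch only: versions, cache key, and commands are assumed.
name: ci
on: [push, pull_request]

jobs:
  warm-hf-cache:
    runs-on: ubuntu-latest
    steps:
      # Checkout first so the pip cache key has dependency files to hash.
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
          cache: "pip"
          cache-dependency-path: pyproject.toml
      # Restores the model cache if present; saves it after a successful job on a miss.
      - id: hf-cache
        uses: actions/cache@v4
        with:
          path: ~/.cache/huggingface
          key: hf-all-MiniLM-L6-v2
      # Cold cache only: one serialized download for the whole workflow.
      - name: Download model on a cold cache
        if: steps.hf-cache.outputs.cache-hit != 'true'
        run: |
          pip install sentence-transformers
          python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"

  test:
    needs: warm-hf-cache
    runs-on: ubuntu-latest
    strategy:
      matrix:
        python-version: ["3.10", "3.11", "3.12"]
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
          cache: "pip"
          cache-dependency-path: pyproject.toml
      # Restore only; the warm job is the single writer of this cache.
      - id: hf-cache
        uses: actions/cache/restore@v4
        with:
          path: ~/.cache/huggingface
          key: hf-all-MiniLM-L6-v2
      # Go offline only on a hit, so a missed restore can still download.
      - name: Go offline when the cache was restored
        if: steps.hf-cache.outputs.cache-hit == 'true'
        run: |
          echo "HF_HUB_OFFLINE=1" >> "$GITHUB_ENV"
          echo "TRANSFORMERS_OFFLINE=1" >> "$GITHUB_ENV"
      - run: pip install -e ".[dev]"
      - run: pytest
```

Writing the offline variables to `$GITHUB_ENV` only after a hit means a matrix leg whose restore unexpectedly misses falls back to a normal download instead of failing on an empty cache.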

No test logic changes — the benchmark tests still exercise the real model, now
served from cache. Docs-/CI-only: no version bump.

Verification

  • actionlint clean on all four workflows.
  • The benchmark module that loads the real model passes offline (30 passed),
    showing that a warm cache plus offline mode makes no Hub request.

chore: forward-merge dev into main (stable mirror sync)
chore: forward-merge dev into main (stable mirror sync)
chore: forward-merge dev into main (stable mirror sync)

The synthetic-benchmark tests load a real sentence-transformers model
(all-MiniLM-L6-v2) on purpose, so CI downloads it from the Hub. With no caching,
every test-matrix job and the smoke gate re-downloaded it, and the concurrent
requests tripped HF rate limiting (HTTP 429), failing the model-dependent tests
intermittently.

Add caching across the workflows that install the embeddings stack:
- cache: "pip" on setup-python in ci (lint/typecheck/test), smoke-gate,
  release (publish) and upgrade-test — reuses the torch/sentence-transformers
  download.
- actions/cache on ~/.cache/huggingface in the model-loading jobs (ci test
  matrix, smoke-gate, release publish), keyed on the model — the model is fetched
  once and reused.
- On a cache hit, set HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE so no Hub request is
  made at all; gated on cache-hit so a cold first run can still download once
  (an empty cache + offline fails to find the files).

No test logic changes — the benchmark tests still exercise the real model, now
served from cache. Also corrects the stale pyproject [dev] comment that claimed a
CI cache existed before this change. Verified locally: with the model cached,
HF_HUB_OFFLINE=1 loads it with zero network, and the evaluator benchmark tests
pass offline.

Address review of the HF/pip cache change:

- ci.yml: move the HuggingFace model download into a dedicated warm-hf-cache job
  that runs once before the test matrix (test now `needs: warm-hf-cache`). On a
  cold cache this guarantees a single, serialized model download for the whole
  workflow instead of three concurrent downloads across the matrix legs (which
  could still trip HF 429 and, if all legs failed, never populate the cache). The
  matrix jobs restore the warmed cache and run offline on a cache hit. Add
  actions/checkout before setup-python in the warm job so the pip cache key has
  the dependency files to hash.
- pyproject.toml: reword the sqlite-vec dev-dependency comment — the extension
  exercises KNN search over the compact vec0 table (faster brute-force KNN, not
  ANN at the pinned 0.1.x line), and fix the unbalanced parenthesis.

Verified locally: actionlint clean on all workflows, ruff clean, and the
benchmark module (which loads the real model) passes offline — 30 passed.

@przemarzec

Contributor Author

Reopening from the chore/cache-hf-models-and-pip branch: the branch-name guard accepts only the feature/, fix/, chore/, docs/, refactor/, test/, perf/, release/, and hotfix/ prefixes (not ci/). The commit type stays ci:.

@przemarzec przemarzec closed this Jun 5, 2026
@przemarzec przemarzec deleted the ci/cache-hf-models-and-pip branch June 5, 2026 08:35