ci: cache HuggingFace models and pip to stop HF 429 failures #20

Merged
przemarzec merged 6 commits into dev from chore/cache-hf-models-and-pip on Jun 5, 2026

@przemarzec

Contributor

What

Cache HuggingFace models and pip across CI so the benchmark tests stop tripping
HF rate limiting (HTTP 429).

The synthetic-benchmark tests load a real sentence-transformers model
(all-MiniLM-L6-v2) on purpose. With no caching, every test-matrix leg and the
smoke gate re-downloaded the model, and the concurrent requests intermittently
hit HF 429 — failing the model-dependent tests for reasons unrelated to the code.

(The branch is named chore/... because the branch-name guard accepts only the
feature, fix, chore, docs, refactor, test, perf, release, and hotfix prefixes,
and ci is not among them; the commits themselves use the ci: type.)

Changes

  • A single upstream warm-hf-cache job downloads the model once on a cold cache
    and saves ~/.cache/huggingface; the test matrix depends on it (needs:
    warm-hf-cache) and restores the warm cache, so a cold run does one serialized
    download for the whole workflow instead of three concurrent ones.
  • cache: "pip" on setup-python in CI (lint/typecheck/test), smoke-gate, the
    release publish job, and upgrade-test.
  • HF model cache in the jobs that load the model, with HF_HUB_OFFLINE set only
    on a cache hit so a cold run can still populate the cache.
  • Tidy the sqlite-vec dev-dependency comment (KNN over the compact vec0 table —
    not ANN at the pinned 0.1.x line).
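
As a rough illustration of the warm-hf-cache idea (download once on a cold cache, skip on a warm one), here is a minimal Python sketch. It is not the workflow code from this PR. The helper names are hypothetical, the full repo id sentence-transformers/all-MiniLM-L6-v2 is an assumption (the PR only names all-MiniLM-L6-v2), and it assumes the default Hub cache layout under ~/.cache/huggingface/hub.

```python
from pathlib import Path

# Assumption: full Hub repo id for the all-MiniLM-L6-v2 model named in this PR.
MODEL_ID = "sentence-transformers/all-MiniLM-L6-v2"


def hub_cache_dir(hf_home: Path) -> Path:
    """Directory where huggingface_hub stores downloaded repos by default."""
    return hf_home / "hub"


def model_is_cached(hf_home: Path, repo_id: str = MODEL_ID) -> bool:
    """Return True if at least one snapshot of repo_id exists under hf_home."""
    repo_dir = hub_cache_dir(hf_home) / ("models--" + repo_id.replace("/", "--"))
    snapshots = repo_dir / "snapshots"
    return snapshots.is_dir() and any(snapshots.iterdir())


def warm_cache(hf_home: Path, repo_id: str = MODEL_ID) -> bool:
    """Download the model only on a cold cache; return True if a download ran."""
    if model_is_cached(hf_home, repo_id):
        return False
    # Imported lazily so the cache check works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    snapshot_download(repo_id=repo_id, cache_dir=hub_cache_dir(hf_home))
    return True
```

Calling warm_cache(Path.home() / ".cache" / "huggingface") from a single upstream job would perform at most one download; downstream jobs that restore the same directory would find the snapshot already present.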

No test logic changes — the benchmark tests still exercise the real model, now
served from cache. CI-only: no version bump.

Verification

  • actionlint clean on all four workflows.
  • The benchmark module that loads the real model passes offline (30 passed),
    showing that with a warm cache and offline mode no Hub request is needed.
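
The offline check can be reproduced locally along these lines. This is a sketch only: run_offline is a hypothetical helper, and the command passed to it (for example a pytest invocation on the benchmark module) is up to the caller, since the PR does not name the test path.

```python
import os
import subprocess


def run_offline(cmd: list[str]) -> subprocess.CompletedProcess:
    """Run cmd with the Hub client forced offline, capturing its output."""
    # Both variables are named in this PR; each is set to "1" for the child only.
    env = {**os.environ, "HF_HUB_OFFLINE": "1", "TRANSFORMERS_OFFLINE": "1"}
    return subprocess.run(cmd, env=env, capture_output=True, text=True)
```

With the model already cached, a zero return code from the test command suggests the tests did not need the Hub; with an empty cache the same command is expected to fail when the model is loaded.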

chore: forward-merge dev into main (stable mirror sync)
chore: forward-merge dev into main (stable mirror sync)
chore: forward-merge dev into main (stable mirror sync)
The synthetic-benchmark tests load a real sentence-transformers model
(all-MiniLM-L6-v2) on purpose, so CI downloads it from the Hub. With no caching,
every test-matrix job and the smoke gate re-downloaded it, and the concurrent
requests tripped HF rate limiting (HTTP 429), failing the model-dependent tests
intermittently.

Add caching across the workflows that install the embeddings stack:
- cache: "pip" on setup-python in ci (lint/typecheck/test), smoke-gate,
  release (publish) and upgrade-test — reuses the torch/sentence-transformers
  download.
- actions/cache on ~/.cache/huggingface in the model-loading jobs (ci test
  matrix, smoke-gate, release publish), keyed on the model — the model is fetched
  once and reused.
- On a cache hit, set HF_HUB_OFFLINE/TRANSFORMERS_OFFLINE so no Hub request is
  made at all; gated on cache-hit so a cold first run can still download once
  (with an empty cache, offline mode fails because the files cannot be found).
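
The cache-hit gating described in the last item can be sketched as a small step script. This is an illustration under assumptions, not the workflow step from this commit: export_offline_if_cached is a hypothetical name, and it relies on the actions/cache cache-hit output being the string "true" only on an exact key match and on GITHUB_ENV pointing at the file GitHub Actions reads for later steps.

```python
from pathlib import Path

OFFLINE_VARS = ("HF_HUB_OFFLINE", "TRANSFORMERS_OFFLINE")


def export_offline_if_cached(cache_hit: str, github_env: Path) -> bool:
    """Append the offline flags to the GITHUB_ENV file only on a cache hit."""
    # A miss (empty or "false") leaves the environment untouched, so the
    # cold run can still reach the Hub and populate the cache.
    if cache_hit != "true":
        return False
    with github_env.open("a", encoding="utf-8") as fh:
        for name in OFFLINE_VARS:
            fh.write(f"{name}=1\n")
    return True
```

In a workflow step this might be invoked with the cache step's cache-hit output and Path(os.environ["GITHUB_ENV"]); how the output is passed into the script is left open here.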

No test logic changes — the benchmark tests still exercise the real model, now
served from cache. Also corrects the stale pyproject [dev] comment that claimed a
CI cache existed before this change. Verified locally: with the model cached and
HF_HUB_OFFLINE=1 set, it loads with no network access, and the evaluator
benchmark tests pass offline.

Address review of the HF/pip cache change:

- ci.yml: move the HuggingFace model download into a dedicated warm-hf-cache job
  that runs once before the test matrix (test now `needs: warm-hf-cache`). On a
  cold cache this guarantees a single, serialized model download for the whole
  workflow instead of three concurrent downloads across the matrix legs (which
  could still trip HF 429 and, if all legs failed, never populate the cache). The
  matrix jobs restore the warmed cache and run offline on a cache hit. Add
  actions/checkout before setup-python in the warm job so the pip cache key has
  the dependency files to hash.
- pyproject.toml: reword the sqlite-vec dev-dependency comment — the extension
  exercises KNN search over the compact vec0 table (faster brute-force KNN, not
  ANN at the pinned 0.1.x line), and fix the unbalanced parenthesis.

Verified locally: actionlint clean on all workflows, ruff clean, and the
benchmark module (which loads the real model) passes offline — 30 passed.
@przemarzec przemarzec merged commit e43f3cc into dev Jun 5, 2026
13 checks passed
@przemarzec przemarzec deleted the chore/cache-hf-models-and-pip branch June 5, 2026 08:52
engrava-release Bot pushed a commit that referenced this pull request Jun 18, 2026
## 0.4.0 (2026-06-18)

* Merge bi-temporal documentation and 0.3 to 0.4 upgrade notes into v0.4.0 ([a04b75c](a04b75c))
* Merge bi-temporal valid-time storage into v0.4.0 ([534cb5c](534cb5c))
* Merge chore/release-trigger-to-dev: switch release-trigger to dev + align docs + CI on dev ([3ce07b2](3ce07b2))
* Merge convenience API (remember/recall) and cycle defaults into v0.4.0 ([bd6402f](bd6402f))
* Merge doc-block execution (tutorial end-to-end + search round-trip) into v0.4.0 ([c582bf0](c582bf0))
* Merge docs accuracy fix + documentation-example tests ([f4e1e28](f4e1e28))
* Merge documented-defaults verification and metadata-helper docs into v0.4.0 ([8ff4145](8ff4145))
* Merge embedding-input repair into v0.4.0 ([825a9fc](825a9fc))
* Merge embedding-provider transient-error retry into v0.4.0 ([f959844](f959844))
* Merge full-text query repair into v0.4.0 ([ba1b288](ba1b288))
* Merge MCP delete tools (delete_thought, delete_edge) into v0.4.0 ([76598bb](76598bb))
* Merge MCP guided memory prompts into v0.4.0 ([2afef80](2afef80))
* Merge MCP memory filters and pagination into v0.4.0 ([62b2b60](62b2b60))
* Merge MCP resources (thought, stats, recent) into v0.4.0 ([62c7639](62c7639))
* Merge MCP server (read tools) + MindQL execution entry point into v0.4.0 ([82c65f0](82c65f0))
* Merge MCP server guide and client-config examples into v0.4.0 ([7ccdbd8](7ccdbd8))
* Merge MCP structured tool errors into v0.4.0 ([8c6811b](8c6811b))
* Merge MCP write tools, read-only mode and safety annotations into v0.4.0 ([3723d10](3723d10))
* Merge MindQL correctness fixes into v0.4.0 ([3f3fbeb](3f3fbeb))
* Merge pull request #13 from sovantica/dev ([f2d4f8b](f2d4f8b)), closes [#13](#13)
* Merge pull request #14 from sovantica/chore/commitlint-ignore-merge-commits ([fbfe701](fbfe701)), closes [#14](#14)
* Merge pull request #15 from sovantica/dev ([f93d026](f93d026)), closes [#15](#15)
* Merge pull request #18 from sovantica/dev ([3db5530](3db5530)), closes [#18](#18)
* Merge pull request #21 from sovantica/dev ([f07fef4](f07fef4)), closes [#21](#21)
* Merge pull request #23 from sovantica/dev ([b2f1691](b2f1691)), closes [#23](#23)
* Merge pull request #27 from sovantica/release/v0.4.0 ([fcc41c1](fcc41c1)), closes [#27](#27)
* Merge pull request #28 from sovantica/fix/branch-guard-semver ([0598155](0598155)), closes [#28](#28)
* Merge pull request #29 from sovantica/ci/release-app-token ([5432de6](5432de6)), closes [#29](#29)
* Merge pull request from dev: docs accuracy fix + documentation-example tests (WS docs) ([7d1118d](7d1118d))
* Merge quickstart remember()/recall() short path into v0.4.0 ([1a89476](1a89476))
* Merge README extras list + drop no-op dreaming extra into v0.4.0 ([13c2448](13c2448))
* Merge README quickstart remember()/recall() into v0.4.0 ([22bf027](22bf027))
* Merge reflection temporal-extent inheritance into v0.4.0 ([e220f0a](e220f0a))
* Merge search functional contract suite into v0.4.0 ([30027f4](30027f4))
* Merge session-wide test thread-pinning into v0.4.0 ([114e85a](114e85a))
* Merge sqlite pragma tuning and hot-path indexes into v0.4.0 ([66afaf0](66afaf0))
* Merge temporal query predicates and invalidate primitive into v0.4.0 ([c160736](c160736))
* Merge temporal-query performance gate into v0.4.0 ([4779c9c](4779c9c))
* ci: add secret scan + dependency audit; verify wheel data on publish (#22) ([7cfaec3](7cfaec3)), closes [#22](#22)
* ci: align branch-name guard allowed types with BRANCHING.md ([a7f3e98](a7f3e98))
* ci: allow semantic-version release and hotfix branch names ([26261b4](26261b4))
* ci: cache HuggingFace models and pip to stop HF 429 (#20) ([e43f3cc](e43f3cc)), closes [#20](#20)
* ci: use GitHub App token for semantic-release push to protected dev ([1a282bd](1a282bd))
* ci(ci): ignore merge commits in commitlint via JS config ([3d54f61](3d54f61))
* ci(ci): run CI on dev branch as well as main ([d28ace2](d28ace2))
* docs: add bi-temporal model guide and 0.3 to 0.4 upgrade notes ([036e225](036e225))
* docs: bring architecture + CLI docs up to 0.4.0 (bi-temporal + MCP) ([cd5df19](cd5df19))
* docs: capitalize the Engrava brand name in README prose ([1254830](1254830))
* docs: correct mindql, extensions, extension-hooks, and configuration examples ([54149ef](54149ef))
* docs: correct README, quickstart, and api-reference examples to match shipped API ([36cf61d](36cf61d))
* docs: correct reflection_boost default to 1.0; add test verifying documented config defaults ([d42a852](d42a852))
* docs: document percept/utterance/thought metadata helpers in api-reference ([2e80d04](2e80d04))
* docs: drop reference to non-existent purity-check script in CONTRIBUTING ([59dcf5d](59dcf5d))
* docs: expand and correct the engrava documentation set (#17) ([2c3fa82](2c3fa82)), closes [#17](#17)
* docs: fix 0.4.0 drift found in the full doc audit ([bb5dff1](bb5dff1))
* docs: fix quickstart cycle note — remember() takes no created_cycle; use ThoughtRecord for write-sid ([8c00368](8c00368))
* docs: lead quickstart with remember()/recall() short path ([edfe620](edfe620))
* docs: lead README Basic Usage with remember()/recall() ([3f415e2](3f415e2))
* docs: list all installable extras in README (mcp + ollama/hf embeddings); note dreaming needs no ext ([c55f0ad](c55f0ad))
* docs: trim README Basic Usage to the core create-and-read example ([b2bf7bf](b2bf7bf))
* docs(mcp): add MCP server guide and client-config examples ([b2bc18b](b2bc18b))
* docs(release): align branching guide with dev release-trigger model ([7619127](7619127))
* docs(tests): de-reference internal principle name in doc-test rationale ([ed8b184](ed8b184))
* build: drop no-op 'dreaming' extra (empty deps; dreaming is in the base install) ([c6fa946](c6fa946))
* build(deps): raise pydantic floor to >=2.11 ([d7eaff0](d7eaff0))
* feat: add bi-temporal valid-time to thoughts and edges ([456bcb6](456bcb6))
* feat: add temporal query predicates and invalidate primitive ([86de77f](86de77f))
* feat: reflections inherit temporal extent from members ([8fba769](8fba769))
* feat(api): add remember() and recall() convenience methods on the store ([be9b110](be9b110))
* feat(mcp): add delete_thought and delete_edge tools ([84b87f6](84b87f6))
* feat(mcp): add guided memory prompts ([1f6fa36](1f6fa36))
* feat(mcp): add MCP server with read tools (engrava[mcp] extra) ([68a8085](68a8085))
* feat(mcp): add memory filters and pagination ([57357b7](57357b7))
* feat(mcp): add write tools, opt-in read-only mode, and per-tool safety annotations ([79d7604](79d7604))
* feat(mcp): expose memory as resources (thought, stats, recent) ([c54dcf7](c54dcf7))
* feat(mcp): map known failures to typed, actionable tool errors ([8b615cc](8b615cc))
* feat(mindql): add store-level execute_mindql entry point ([1c8ffb4](1c8ffb4))
* fix: assert plan-shape invariant for temporal queries, not scan-vs-index ([0e4e176](0e4e176))
* fix: embed full thought content without duplication or silent truncation ([36e08e7](36e08e7))
* fix: keep quoted MindQL values as strings and reject malformed conditions ([d88043d](d88043d))
* fix: let natural-language queries reach the full-text index ([bb6b729](bb6b729))
* fix: match exact table token in query-plan helpers ([cd4ecc2](cd4ecc2))
* fix(embeddings): retry transient errors with bounded backoff ([897c46f](897c46f))
* fix(mcp): keep query_memory parse errors FIND-only ([5f4ea20](5f4ea20))
* fix(mcp): map write-tool errors and complete the 0.4.0 documentation ([6794436](6794436))
* test: add documentation-example test suite ([75b977e](75b977e))
* test: add functional contract suite for search behavior ([862e7bc](862e7bc))
* test: bound temporal query overhead and confirm index use ([035e860](035e860))
* test: isolate subprocess examples (offline, single-thread, no stdin) ([c3cf339](c3cf339))
* test: pin native thread pools session-wide to stop full-suite hang ([3a0eaa3](3a0eaa3))
* test(docs): execute the tutorial end-to-end and a search round-trip ([730c2bb](730c2bb))
* perf: tune sqlite pragmas and add hot-path indexes ([1256303](1256303))
* chore: back-merge main into dev after v0.3.0 release ([4769d22](4769d22))
* chore: back-merge main into dev after v0.3.1 release ([40e45b6](40e45b6))
* chore(ci): switch release-trigger branch from main to dev ([cd2bb24](cd2bb24))
* chore(deps): bump actions/cache from 4 to 5 (#25) ([c84fb8f](c84fb8f)), closes [#25](#25)
* chore(deps): bump actions/setup-python from 5 to 6 (#26) ([0a69457](0a69457)), closes [#26](#26)

[skip ci]
@engrava-release

🎉 This PR is included in version 0.4.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
