Last generated: 2026-01-22T18:33:56.526Z
Provider: openai
Model: gpt-5.2
Summary
Focus automation on reliable CI for a legacy Python package (Formasaurus) while reducing repo bloat/toil: add a minimal, deterministic GitHub Actions test workflow, tighten packaging checks, and stop committing large/opaque artifacts (e.g., bfg-1.15.0.jar, *.sqlite, *.bish-*) that can destabilize automation and slow CI.
Direction (what and why)
- Establish a single source of truth CI pipeline (GitHub Actions) for lint + unit tests across a small Python matrix.
  - Why: the repo still has .travis.yml and many “auto-*” workflows, but it is unclear whether any canonical test workflow guards merges. A dedicated ci.yml ensures every PR runs the same checks deterministically.
- Make builds reproducible and packaging safer by adding packaging validation.
  - Why: the repo is structured as a publishable Python package (setup.py, setup.cfg, MANIFEST.in); packaging regressions are common and should be caught automatically.
- Reduce automation noise and repository bloat by ensuring generated/binary files are not tracked and are ignored.
  - Why: committed DB/index files (.bish.sqlite, .bish-index) and a 14MB jar are high-churn binaries; they increase checkout time and can trigger irrelevant diffs in automated PRs.
Plan (next 1–3 steps)
1) Add a canonical CI workflow (.github/workflows/ci.yml)
Create a new workflow that runs on pull_request and push to master:
- Use actions/setup-python with a small matrix: 3.9, 3.11 (and optionally 3.12 if deps allow).
- Install dependencies from requirements.txt plus test deps (pytest, pytest-cov).
- Run:
python -m pip install -U pip setuptools wheel
python -m pip install -r requirements.txt
python -m pip install pytest pytest-cov
python -m pip install -e .
python -m pytest -q --disable-warnings --maxfail=1
If you already rely on tox.ini, prefer calling tox instead:
python -m pip install tox
tox -q
Concrete file to add: .github/workflows/ci.yml
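To try the same sequence locally before the workflow exists, the commands from this step can be driven by a small script. This is only a sketch under stated assumptions: ci_steps and run_steps are hypothetical helper names, requirements.txt is assumed to sit in the repo root, and pytest/pytest-cov are assumed not to be listed in it already.

```python
import subprocess
import sys


def ci_steps(use_tox=False, python=sys.executable):
    """Return the commands from step 1 as argv lists (hypothetical helper)."""
    pip = [python, "-m", "pip", "install"]
    if use_tox:
        # tox path: tox.ini defines the environments and test commands.
        return [pip + ["tox"], ["tox", "-q"]]
    return [
        pip + ["-U", "pip", "setuptools", "wheel"],
        pip + ["-r", "requirements.txt"],
        pip + ["pytest", "pytest-cov"],  # assumption: not already in requirements.txt
        pip + ["-e", "."],
        [python, "-m", "pytest", "-q", "--disable-warnings", "--maxfail=1"],
    ]


def run_steps(steps, cwd="."):
    """Run each command in order and stop at the first failure, as a CI job would."""
    for argv in steps:
        print("+", " ".join(argv))
        # check=True raises CalledProcessError on a non-zero exit code.
        subprocess.run(argv, cwd=cwd, check=True)
```

Calling run_steps(ci_steps()) from the repo root exercises the pytest path; run_steps(ci_steps(use_tox=True)) exercises the tox path instead.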
2) Add packaging validation job (same workflow or separate packaging.yml)
Add a job that validates the distribution can be built:
python -m pip install build twine
python -m build
python -m twine check dist/*
This catches broken metadata, missing package data, and sdist/wheel issues early.
Concrete target:
- Add a second job, build-and-check, inside .github/workflows/ci.yml
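After python -m build, the job could also confirm that dist/ really contains both artifact types. The following is a minimal sketch, not part of the existing repo: missing_distributions is a hypothetical helper, and it assumes the sdist is a .tar.gz file and the wheel a .whl file, which is what python -m build produces by default.

```python
def missing_distributions(filenames):
    """Return which of "sdist" / "wheel" are absent from a dist/ listing."""
    names = list(filenames)
    present = {
        # Assumption: sdists are gzip-compressed tarballs.
        "sdist": any(name.endswith(".tar.gz") for name in names),
        "wheel": any(name.endswith(".whl") for name in names),
    }
    return [kind for kind, found in present.items() if not found]
```

In a CI step this could be fed with os.listdir("dist"), failing the job whenever the returned list is non-empty.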
3) Stop tracking generated/binary artifacts and harden ignores
- Update .gitignore to include (if not already):
*.sqlite
.bish-index
.bish.sqlite
docs/.bish-index
docs/.bish.sqlite
formasaurus/.bish-index
formasaurus/.bish.sqlite
.pytest_cache/
.coverage
dist/
build/
*.egg-info/
- Remove large/unnecessary binaries from the repo if they are not required at runtime:
  - Evaluate bfg-1.15.0.jar (14MB). If it’s only a maintenance tool, move it out of the repo:
    - Preferred: document download instructions in docs/contributing.rst and delete the jar from git.
    - Alternative: store it in GitHub Releases or an internal artifact store.
Concrete files:
- .gitignore
- Potentially stop tracking bfg-1.15.0.jar, **/.bish.sqlite, and **/.bish-index going forward (at minimum, remove them from the current tree).
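To see which currently tracked files the new ignore rules would cover, a short script can match tracked paths against the same patterns. This is a sketch under assumptions rather than a faithful gitignore implementation: NAME_PATTERNS, DIR_PATTERNS, is_artifact, and find_tracked_artifacts are hypothetical names, and the matching only approximates gitignore semantics (file-name globs plus directory names anywhere in the path).

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath

# File-name patterns from the .gitignore list above, plus the jar under review.
NAME_PATTERNS = ("*.sqlite", ".bish-index", ".coverage", "bfg-1.15.0.jar")
# Directory patterns from the same list (trailing slashes dropped).
DIR_PATTERNS = (".pytest_cache", "dist", "build", "*.egg-info")


def is_artifact(path):
    """True if the path's file name or any parent directory matches a pattern."""
    parts = PurePosixPath(path).parts
    if not parts:
        return False
    name, parents = parts[-1], parts[:-1]
    if any(fnmatchcase(name, pattern) for pattern in NAME_PATTERNS):
        return True
    return any(
        fnmatchcase(directory, pattern)
        for directory in parents
        for pattern in DIR_PATTERNS
    )


def find_tracked_artifacts(tracked_paths):
    """Return the tracked paths that the ignore rules would cover, sorted."""
    return sorted(path for path in tracked_paths if is_artifact(path))
```

The input could come from the output of git ls-files, split into lines; each returned path is a candidate for git rm --cached.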
Risks/unknowns
- Python version support is unclear: repo historically had Py2.7/Py3.5 references in commits; modern deps (scikit-learn/joblib) may not support old versions. The proposed matrix assumes modern Python; adjust after first CI run.
- Heavy HTML fixtures in formasaurus/data/html/* may make tests slow; CI timeouts could occur. If so, mark slow tests and run them nightly, or cache artifacts.
- Many existing “auto-*” workflows may already perform CI-like actions; adding ci.yml could duplicate checks. After ci.yml is stable, consider disabling redundant workflows that comment/review excessively.
Suggested tests
Run these locally and in CI:
- Unit tests:
pytest -q
- Coverage sanity (optional):
pytest --cov=formasaurus --cov-report=term-missing
- Packaging checks:
python -m build
python -m twine check dist/*
Verification checklist (short)
- CI workflow runs on PRs and reports green on supported Python versions
- python -m build produces both sdist and wheel in CI
- No .bish-* / *.sqlite files keep reappearing in diffs
- bfg-1.15.0.jar removed; docs mention how to obtain it (if still needed)