Automation: Direction #37

@github-actions

Last generated: 2026-01-22T18:33:56.526Z
Provider: openai
Model: gpt-5.2

Summary

Focus automation on reliable CI for a legacy Python package (Formasaurus) while reducing repo bloat/toil: add a minimal, deterministic GitHub Actions test workflow, tighten packaging checks, and stop committing large/opaque artifacts (e.g., bfg-1.15.0.jar, *.sqlite, *.bish-*) that can destabilize automation and slow CI.

Direction (what and why)

  1. Establish a single source of truth CI pipeline (GitHub Actions) for lint + unit tests across a small Python matrix.

    • Why: the repo still has .travis.yml and many “auto-*” workflows, but it is unclear whether any canonical test workflow guards merges. A dedicated ci.yml ensures every PR runs the same checks deterministically.
  2. Make builds reproducible and packaging safer by adding packaging validation.

    • Why: this project uses the conventional layout of a published Python package (setup.py, setup.cfg, MANIFEST.in); packaging regressions are common in such projects and should be caught automatically.
  3. Reduce automation noise and repository bloat by ensuring generated/binary files are not tracked and are ignored.

    • Why: the committed DB/index files (.bish.sqlite, .bish-index) and the 14MB jar are binary, and the DB/index files change frequently; they increase checkout time and can trigger irrelevant diffs in automated PRs.

Plan (next 1–3 steps)

1) Add a canonical CI workflow (.github/workflows/ci.yml)

Create a new workflow that runs on pull_request and push to master:

  • Use actions/setup-python with a small matrix: 3.9, 3.11 (and optionally 3.12 if deps allow).
  • Install dependencies from requirements.txt plus test deps (pytest, pytest-cov).
  • Run:
    • python -m pip install -U pip setuptools wheel
    • python -m pip install -r requirements.txt
    • python -m pip install pytest pytest-cov
    • python -m pip install -e .
    • python -m pytest -q --disable-warnings --maxfail=1

If you already rely on tox.ini, prefer calling tox instead:

  • python -m pip install tox
  • tox -q

Concrete file to add:

  • .github/workflows/ci.yml
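
A minimal sketch of what the workflow described in step 1 might look like. The matrix and commands follow the plan above; the action versions (actions/checkout@v4, actions/setup-python@v5), the ubuntu-latest runner, the job name `test`, and `fail-fast: false` are assumptions that the plan does not specify.

```yaml
# .github/workflows/ci.yml (sketch, not a tested configuration for this repo)
name: CI

on:
  pull_request:
  push:
    branches: [master]

jobs:
  test:
    runs-on: ubuntu-latest          # assumption: Linux runner is sufficient
    strategy:
      fail-fast: false              # assumption: let every matrix entry finish
      matrix:
        python-version: ["3.9", "3.11"]   # add "3.12" only if deps allow
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: ${{ matrix.python-version }}
      - name: Install dependencies
        run: |
          python -m pip install -U pip setuptools wheel
          python -m pip install -r requirements.txt
          python -m pip install pytest pytest-cov
          python -m pip install -e .
      - name: Run tests
        run: python -m pytest -q --disable-warnings --maxfail=1
```

Quoting the version strings keeps YAML from reading a value such as 3.10 as the number 3.1. With `fail-fast: false`, a single run shows which Python versions fail, which helps when adjusting the matrix after the first CI run.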

2) Add a packaging validation job (same workflow or separate packaging.yml)

Add a job that validates that the distribution can be built:

  • python -m pip install build twine
  • python -m build
  • python -m twine check dist/*

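This catches build failures in the sdist/wheel and metadata problems (such as a long description that will not render on PyPI) early. It does not by itself prove that package data is included; that requires installing the built wheel and running the tests against it.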

Concrete target:

  • Add a second job build-and-check inside .github/workflows/ci.yml
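
The packaging job could be sketched as follows, to be nested under the existing `jobs:` key of ci.yml. The job name comes from the plan; the pinned Python version, the action versions, and the final `ls` step are assumptions added for illustration.

```yaml
# Fragment for .github/workflows/ci.yml (sketch)
jobs:
  build-and-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"    # assumption: one version is enough for packaging
      - name: Build sdist and wheel
        run: |
          python -m pip install -U pip build twine
          python -m build
      - name: Validate distribution metadata
        run: python -m twine check dist/*
      - name: Confirm both artifacts exist
        # ls exits non-zero if either glob matches nothing, failing the job
        run: ls dist/*.tar.gz dist/*.whl
```

The last step is a cheap way to enforce that both an sdist and a wheel were produced, since `python -m build` builds both by default but a misconfigured build backend may fail on one of them.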

3) Stop tracking generated/binary artifacts and harden ignores

  1. Update .gitignore to include (if not already):
    • *.sqlite
    • .bish-index
    • .bish.sqlite
    • docs/.bish-index
    • docs/.bish.sqlite
    • formasaurus/.bish-index
    • formasaurus/.bish.sqlite
    • .pytest_cache/
    • .coverage
    • dist/
    • build/
    • *.egg-info/
  2. Remove large/unnecessary binaries from the repo if they are not required at runtime:
    • Evaluate bfg-1.15.0.jar (14MB). If it’s only a maintenance tool, move it out of the repo:
      • Preferred: document download instructions in docs/contributing.rst and delete the jar from git.
      • Alternative: store in GitHub Releases or an internal artifact store.

Concrete files:

  • .gitignore
  • bfg-1.15.0.jar, **/.bish.sqlite, **/.bish-index: at minimum, stop tracking them in the current tree; rewriting git history to purge them is optional.

Risks/unknowns

  • Python version support is unclear: repo historically had Py2.7/Py3.5 references in commits; modern deps (scikit-learn/joblib) may not support old versions. The proposed matrix assumes modern Python; adjust after first CI run.
  • Heavy HTML fixtures in formasaurus/data/html/* may make tests slow; CI timeouts could occur. If so, mark slow tests and run them nightly, or cache artifacts.
  • Many existing “auto-*” workflows may already perform CI-like actions; adding ci.yml could duplicate checks. After ci.yml is stable, consider disabling redundant workflows that comment/review excessively.
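
If the slow-test risk above materializes, a scheduled workflow is one way to run slow tests nightly. This is a sketch under several assumptions: the file name nightly-slow.yml is hypothetical, the cron time is arbitrary, and it presumes that slow tests have been decorated with a `slow` pytest marker (for example `@pytest.mark.slow`) registered in the project's pytest configuration. No such marker is known to exist in the repo today.

```yaml
# .github/workflows/nightly-slow.yml (hypothetical file name, sketch only)
name: Nightly slow tests

on:
  schedule:
    - cron: "0 3 * * *"     # assumption: 03:00 UTC daily
  workflow_dispatch:        # allow manual runs from the Actions tab

jobs:
  slow-tests:
    runs-on: ubuntu-latest
    timeout-minutes: 60     # assumption: generous cap for fixture-heavy tests
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - name: Install dependencies
        run: |
          python -m pip install -U pip setuptools wheel
          python -m pip install -r requirements.txt
          python -m pip install pytest
          python -m pip install -e .
      - name: Run only tests marked slow
        run: python -m pytest -q -m slow
```

The PR workflow would then pass `-m "not slow"` to pytest. Note that pytest exits with code 5 when no tests are selected, so this job fails until at least one test carries the marker.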

Suggested tests

Run these locally and in CI:

  1. Unit tests:
    • pytest -q
  2. Coverage sanity (optional):
    • pytest --cov=formasaurus --cov-report=term-missing
  3. Packaging checks:
    • python -m build
    • python -m twine check dist/*

Verification checklist (short)

  • PR triggers CI workflow and reports green on supported Python versions
  • python -m build produces both sdist and wheel in CI
  • .bish-* and *.sqlite files are no longer tracked and do not reappear in diffs
  • If bfg-1.15.0.jar is removed, the docs explain how to obtain it (if it is still needed)
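
The checklist item about tracked artifacts could be enforced automatically rather than checked by hand. The job below is a sketch, to be nested under the `jobs:` key of ci.yml; the job name `no-tracked-artifacts` is hypothetical, and the pathspec list is an assumption derived from the files named in this issue.

```yaml
# Fragment for .github/workflows/ci.yml (sketch)
jobs:
  no-tracked-artifacts:     # hypothetical job name
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Fail if generated artifacts are tracked
        run: |
          # git ls-files prints tracked paths matching the pathspecs;
          # '*' in a pathspec also matches across directory separators.
          tracked="$(git ls-files -- '*.sqlite' '*.bish-index')"
          if [ -n "$tracked" ]; then
            echo "Tracked artifacts found:"
            echo "$tracked"
            exit 1
          fi
```

If bfg-1.15.0.jar is removed from the repo, adding `'*.jar'` to the pathspec list would guard against it being committed again.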

Labels: automation (Automation-generated direction and planning)
