[CI] Add slow marker and remove un-necessary tests by hughperkins · Pull Request #711 · Genesis-Embodied-AI/quadrants

hughperkins · 2026-05-21T10:11:26Z

Issue: #

Brief Summary

copilot:summary

Walkthrough

copilot:walkthrough

Add a `slow` pytest marker, mark the worst-case tests with it, and have `tests/run_tests.py` skip those tests by default (use `--run-slow` to include them, or `pytest -m slow` to run only those). Picked from macOS CI per-file timing (QD_FILE_TIMING=1, run 26083950810): phase 1 totals 6415s across 8641 test calls; the slowest 3 files alone (test_eig, test_tile16, test_linalg) cover 55%. The cost of test_eig / test_make_spd is super-linear in matrix size n (n=12 ≈ 5x n=9). Marked slow: - Parametrize cases n in {6, 9, 12} (and 7..11 for inverse_large) across test_eig.py and test_linalg.py. - Rectangular (9, 12) / (12, 3) cases in test_frobenius_inner_rectangular. - test_matmul_chain_qipc_sizes_{f32,f64} (>130s each on macOS CI). - test_clear_all_gradients (180s/invocation). - test_reset_ndarrays::test_ndarray_doesnt_crash_on_gc (127s). - test_mpm88::{test_mpm88, test_mpm88_numpy_and_ndarray} (~30s/invocation). - test_struct::test_2d_nested (122s/invocation). run_tests.py composes `not slow` with any user-supplied `-m` expression, so existing CI invocations like `-m "not needs_torch"` become `(not needs_torch) and not slow`. Note that this also drops slow tests from GPU / Linux / macOS CI runs — a separate workflow (or `--run-slow` job) is needed if we still want to exercise the n>=6 / n=12 paths in CI.

The previous lists ([4, 5, 6, 9, 12], [2, 3, 4, 6, 9, 12], [5..12], etc.) gave the Householder/QR path a lot of redundant size coverage. For routine CI we only need to exercise a small size + the largest supported size (12, which also doubles as the slow-marked stress case): if a bug shows up only at n=7 or n=11 it almost certainly also shows up at n=12. test_eig.py sym_eig_general_{f32,f64} [4,5,6,9,12] -> [4, 12*] make_spd_{f32,f64} [4,6,9,12] -> [4, 12*] sym_eig_alpha_identity_f64 [4,6,9,12] -> [4, 12*] make_spd_idempotent_f64 [4,6,9,12] -> [4, 12*] make_spd_negative_definite_zero_f64 [4,6,9,12] -> [4, 12*] sym_eig_sort_order_{f32,f64} [2,3,4,6,9,12] -> [3, 12*] test_linalg.py frobenius_inner_{f32,f64} [2,3,6,9,12] -> [3, 12*] inverse_large_{f32,f64} [5..12] -> [5, 12*] * n=12 retains the `slow` marker, so default `run_tests.py` invocations only hit n=4 / n=3 / n=5. `--run-slow` runs both. Closed-form 2x2/3x3 paths in test_sym_eig_sort_order: dropped n=2 in favour of n=3 (per directive); the 2x2 path is still covered by test_sym_eig2x2_{f32,f64}. The 3x3 closed-form path stays covered by n=3. Other parametrize lists left untouched: - rectangular (rows, cols) tuples in test_frobenius_inner_rectangular (it's varying shape, not pure size). - test_mat_inverse_size's `range(1, 5)` (tiny sizes only). - `a00` integer parametrize in test_sym_eig3x3_{f32,f64}.

The blocked-Cholesky demo previously hard-coded N=92, N_ENVS=4096, WARMUP=50, ITERS=200 as module globals. The unit-test wrapper test_tile16_cholesky_blocked_demo runs the demo as a subprocess and only cares that it returns 0; at the hard-coded sizes that takes ~74 s on cluster CUDA, dominated by JIT-compiling 3 large unrolled kernels at N=92 and running the 4096-env x 250-iter benchmark loop. Expose all four knobs as command-line flags with the previous values as defaults, so: python misc/demos/cholesky_blocked.py # unchanged, full demo python misc/demos/cholesky_blocked.py --n 32 --n-envs 64 \ --num-warmup 1 --num-iters 1 # smoke-mode The test will switch to the smoke-mode invocation in a follow-up commit so it stops dominating the slow critical path. Flag names (--n, --n-envs, --num-warmup, --num-iters) follow the user spec; using argparse + ArgumentDefaultsHelpFormatter so --help shows the full demo defaults.

Pass small CLI overrides (--n 32 --n-envs 64 --num-warmup 1 --num-iters 1) so the demo runs end-to-end in seconds instead of ~74 s. The test contract is just "demo exits 0"; it doesn't read any of the benchmark numbers, so the smaller workload still satisfies the smoke test. The full N=92 / N_ENVS=4096 / 50+200-iter demo is still what humans running misc/demos/cholesky_blocked.py see by default (argparse defaults match the previous hard-coded values). Together with the previous commit, this drops the test_tile16_cholesky_blocked_demo wall time on cluster CUDA from ~74 s to (expected) a few seconds, removing the largest remaining single-test outlier on hp/mark-slow-tests.

Previously the test hard-coded the qipc IPC sizes (9x12) · (12x12) · (12x9). On cluster CUDA those two cases (f32 + f64) take ~92.7s and ~87.3s respectively -- the top two single-test outliers in the suite, each holding one xdist worker for ~90s of contiguous JIT-compile + unrolled-FMA work. Parametrize `_test_matmul_chain` on (rows_a, cols_a, cols_b, cols_c). Default lane runs the small (3,4,4,3) chain to exercise the same Matrix.__matmul__ codegen path; the original (9,12,12,9) qipc-sized chain is slow-marked so it still runs on --run-slow (i.e. CI's nightly / release lane, once that's wired up). Estimated saving: ~180s CPU, ~70s wall (these tests were on the critical path of the branch run). No function-level coverage lost: both f32 and f64 versions still run the same chain by default, just at a smaller size.

Previously hard-coded N=30 (900 particles), n_grid=120, steps=32 -- 26s on cluster CUDA. The test's actual contract is that the AD-validation checker raises QuadrantsAssertionError on the global-data-access violation in g2p (`v[f, p] = new_v`), which fires on the first substep regardless of grid / particle / step counts. Parametrize on (particles_side, n_grid_size, num_steps) with a small default (8, 32, 4) and slow-marked original (30, 120, 32). The default still exercises the same diff-MPM pipeline (p2g / grid_op / g2p, qd.ad.Tape with validation=True, `with pytest.raises(...)`) and still triggers the assertion error. Estimated CPU saving: ~22s; wall saving ~3s on the branch run.

…ne op-parametrized test The three reduce variants (and the three scan variants) shared an identical kernel signature, identical input shape, and differed only in (a) which qd.algorithms.device_<op> function they called and (b) overflow vs bitwise-exact verification. Collapse each triple into a single op-parametrized test: test_device_reduce(op, dtype, N) # op in {add, min, max} test_device_exclusive_scan(op, dtype, N) # op in {add, min, max} Behavior, coverage and the parametrize space are unchanged -- pytest still collects the same number of parametrize cases, just under unified test names. This is purely a code-dedup refactor (~130 LOC less) which makes the next op-axis sampling change (if/when we choose to drop A vs B vs C from the sweep) a one-line edit.

…run_tests help string Pure formatting fix from `pre-commit run -a`; no behavior change.

…ppers into 2 op-parametrized tests Lines 3608-3694 in test_simt.py were 18 ~5-line wrappers each calling ``_check_full_matches_tiled(subgroup.<op>, subgroup.<op>_tiled, ...)``. Lines 3841-3848 were 2 more, parametrized on dtype. ``_check_full_matches_tiled`` already accepts the full / tiled functions as Python arguments (closure-captured into ``@qd.kernel``), so collapsing the family is a pure dedup move: test_subgroup_full_matches_tiled(op_name, host_init) # 18 cases: {reduce, inclusive, exclusive}_{add,min,max,mul,and,or,xor} on qd.i32 test_subgroup_full_matches_tiled_float(op_name, dtype) # 4 cases: {reduce_add, inclusive_add} x {qd.f32, qd.f64} Behavior + coverage unchanged (still 22 parametrize cases, same dtype + init configurations). Pytest ids are designed to match the original test-name suffixes (e.g. ``[reduce_add]``, ``[inclusive_mul]``) so ``-k`` selectors and test reports stay readable. Drops ~50 LOC net.

…zed tests The six block-reduce tests (3 single-output + 3 broadcast) share an identical kernel skeleton, parametrize axes, and verification loop. They only differ in which `block.reduce_*` function is called (closure-captured into `@qd.kernel` via getattr), the host-side reference oracle, the init pattern (sequential for `add` so the running sum has signal; permuted hash for `min` / `max` so the result depends on lanes other than first / last), and the float tolerance regime (relative for accumulating `add`, absolute for picker `min` / `max`). Collapse the six tests into two op-parametrized tests: test_block_reduce(sg_per_block, dtype, op_name, ...) # single-output, 3 ops test_block_reduce_all(sg_per_block, dtype, op_name, ...) # broadcast, 3 ops Parametrize space is unchanged (3 sg x 5 dtype x 3 op = 45 cases per fused test, matching the original 3 tests x 15 cases each). Pytest ids use plain `[add|min|max]` suffixes so `-k` selectors remain readable. Drops ~100 LOC of boilerplate -- two new small helpers (`_init_block_reduce_src` and `_assert_block_reduce_close`) capture the per-op behavioral differences in one place each.

…zed test The three block inclusive scan tests share the same kernel skeleton and only differ in the closure-captured `block.inclusive_<op>` function, the host-side reference oracle, the init pattern (sequential for `add` -- sums grow with prefix length; permuted for `min` / `max` -- result depends on lanes other than first / last), and the float tolerance regime (relative for `add`, absolute for `min` / `max`). Collapse into one op-parametrized test: test_block_inclusive(sg_per_block, dtype, op_name, ...) Identical param count to the original three tests (3 sg x 5 dtype x 3 op = 45 cases vs original 3 x 15). Pulls a shared `_assert_block_scan_close` helper out so the int / relative-float / absolute-float regime is encoded in one place; the relative-float branch keeps the floor-on-tol-base trick needed by the original `test_block_exclusive_add` (also routed through the same helper). `test_block_exclusive_add` stays as its own function for now because the matching exclusive `min` / `max` cases need dtype-derived sentinel identities + ``isinf`` handling that's different enough that fusing them in would create more branches than it removes; can address that in a follow-up if needed.

…trized test `test_block_exclusive_min` and `test_block_exclusive_max` share the same permuted-init pattern and only differ in the dtype-derived sentinel identity (``+inf`` / ``iinfo.max`` for min, ``-inf`` / ``iinfo.min`` for max) and the inf-sign check at lane 0. Collapse into one op-parametrized test that takes ``(op_name, sentinel_fn, py_op, inf_sign)`` and dispatches via getattr + the (already module-level) `_PY_MIN` / `_PY_MAX` lambdas. Identical param count to the original pair (3 sg x 5 dtype x 2 op = 30 cases vs original 2 x 15 each = 30). `test_block_exclusive_add` remains its own function because the integer identity is `0` (not `iinfo.max/min`) and the init pattern is sequential -- different enough that fusing it in would add more branches than it removes. Drops ~30 LOC.

The PR's `Check line wrapping` CI agent flagged three comments wrapped at the AI-default ~78-90c instead of the project's 120c target. Reflow each to the full target width: - tests/python/test_tile16.py:1791 (78c -> 120c) docstring for test_tile16_cholesky_blocked_demo. - tests/python/test_ad_gdar_diffmpm.py:8 (85c -> 120c) the "defaults shrink ..." comment above the parametrize block. - tests/run_tests.py:60 (90c -> 120c) the "--run-slow opts back in" comment. Also collapse the dangling-backslash continuation in misc/demos/cholesky_blocked.py's Usage example onto one line (69c -> 109c). No behavior change; comments only. Verified via the cursor find-underwrapped skill that the remaining flagged runs in my diff are all 103-116c with save~=0 (already-tight runs the greedy heuristic still reports), comfortably in the agent's "not borderline" exemption.

Hugh requested in PR #709 review comment that the testing bullet collapse to just a pointer at unit_testing.md, since the long inline summary duplicates the dedicated doc immediately below.

@sample

Documents the test launcher, the @pytest.mark.slow marker (whole-test and parametrize-case variants), how to write a new parametrized test with the test_utils.test decorator, and the Advanced section with the per-test timeout, kernel compilation cache, and per-file timing knobs. Modeled on the structure of the equivalent doc on hp/mark-slow-tests (after Hugh's two rounds of PR review feedback there) but with all @pytest.mark.sample references stripped, since the @sample marker is not part of this branch.

chatgpt-codex-connector

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a85c6ecbcc

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

Open a pull request for review
Mark a draft as ready
Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "@codex address that feedback".

chatgpt-codex-connector · 2026-05-21T10:16:39Z

    sm70: Can only run on GPU with compute capability 7.0 or higher.
    needs_torch: mark test as requiring PyTorch.
+    slow: mark test (or parametrize case) as slow. Skipped by default by tests/run_tests.py;
+        pass --run-slow to include them, or directly `pytest -m slow` to run only the slow ones.


Keep slow marker description in one ini entry

The wrapped description line is parsed as a second marker entry, so pytest now registers an unintended marker named pass --run-slow to include them, or directly \pytest -m slow` to run only the slow ones.instead of treating it as part ofslow's description. You can see this in pytest --markers, and it can confuse marker tooling (and future strict marker validation) because tests/pytest.inino longer defines markers as onename: description` per line.

Useful? React with 👍 / 👎.

github-actions · 2026-05-21T10:46:05Z

Total: 12 file(s) changed, +296 -473 code lines.

github-actions · 2026-05-21T11:41:07Z

Diff coverage: 97% · 180 lines, 5 missing

duburcqa · 2026-05-21T12:16:22Z

ok to merge

pytest registers the `markers` ini option as a `linelist` type, so config.getini("markers") splits the value on newlines and every line becomes a separate registered marker. configparser preserves indented-continuation lines as literal newlines inside the value, so the previous two-line entry slow: mark test (or parametrize case) as slow. Skipped by default by tests/run_tests.py; pass --run-slow to include them, or directly `pytest -m slow` to run only the slow ones. was being parsed as TWO markers: `slow: ...` and a phantom marker whose "name" is the literal text `pass --run-slow to include them, ...`. The phantom marker shows up in `pytest --markers` output and would either pollute or break future `--strict-markers` validation. Collapse to a single line; trim wording slightly to keep it readable. Verified via configparser that the markers list now has 4 entries (was 5). Reported by chatgpt-codex on PR #711.

pytest registers the `markers` ini option as a `linelist` type, so config.getini("markers") splits the value on newlines and every line becomes a separate registered marker. configparser preserves indented-continuation lines as literal newlines inside the value, so the two multi-line entries here were being parsed as 9 markers (4 real + 5 phantom whose "names" were the continuation lines, e.g. a marker named `pass --run-slow to include them, ...` and another named `--no-sample, nodeid-paste).`). Phantom markers pollute `pytest --markers` output and would either mask or break future `--strict-markers` validation, since strict mode would accept the literal text `pass` or `--no-sample,` as valid marker names on any test. Collapse both descriptions to single lines (with light wording trim to keep the source readable). Verified via configparser that the markers list now has exactly 5 real entries (was 9). Reported by chatgpt-codex on the sibling PR #711.

github-actions · 2026-05-27T17:10:48Z

Total: 12 file(s) changed, +296 -473 code lines.

github-actions · 2026-05-27T18:00:23Z

Diff coverage: 97% · 180 lines, 5 missing

Hugh Perkins (deskai7) added 15 commits May 21, 2026 02:58

[Style] black: reformat test_tile16_cholesky_blocked_demo cmd list + …

8acaaec

…run_tests help string Pure formatting fix from `pre-commit run -a`; no behavior change.

[Doc] contributing.md: shorten testing bullet per PR review

7389b5f

Hugh requested in PR #709 review comment that the testing bullet collapse to just a pointer at unit_testing.md, since the long inline summary duplicates the dedicated doc immediately below.

hughperkins assigned duburcqa May 21, 2026

chatgpt-codex-connector Bot reviewed May 21, 2026

View reviewed changes

hughperkins merged commit dcb15c2 into main May 27, 2026
56 checks passed

hughperkins deleted the hp/mark-slow-tests-only branch May 27, 2026 20:53

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[CI] Add slow marker and remove un-necessary tests#711

[CI] Add slow marker and remove un-necessary tests#711
hughperkins merged 16 commits into
mainfrom
hp/mark-slow-tests-only

hughperkins commented May 21, 2026

Uh oh!

chatgpt-codex-connector Bot left a comment

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Uh oh!

hughperkins May 27, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

duburcqa commented May 21, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

hughperkins commented May 21, 2026

Brief Summary

Walkthrough

Uh oh!

chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

💡 Codex Review

Uh oh!

chatgpt-codex-connector Bot May 21, 2026

Choose a reason for hiding this comment

Uh oh!

hughperkins May 27, 2026

Choose a reason for hiding this comment

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

duburcqa commented May 21, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

github-actions Bot commented May 27, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants