
Conversation

@tturocy (Member) commented Dec 23, 2025

Trying to understand the output from the failing Nash equilibrium tests recently just about did my head in, so I decided it was high time we took a hard look at our tests. I pulled together a bunch of notes/thoughts I had been accumulating (alongside reading up on some pytest features).

This draft pull request is a proposal for discussion.

In short - I have rewritten what used to be the test for enummixed_solve with rational probabilities into a generic Nash equilibrium solver tester (!!!)

Highlights:

  • Test cases are represented as a dataclass.
  • Games are constructed using a factory function. This solves a problem we noted previously: if game construction fails, it takes down the whole test run at collection time.
  • The solver is likewise specified as a callable.
  • Tests are given an identifier of our choosing. I have been lazy and just called them test1, test2, &c. for now; they will need better names in production, of course.
  • Tolerances for regrets and probabilities are configurable per test case. No global setting is hard-coded, and we can handle both float and rational with the same test function.
  • The tests now use pytest-subtests (added as a dependency in the test requirements.txt). This lets us isolate the individual checks within a test, so even if one fails the others can still run - e.g. if somehow max_regret is correct but the probabilities change for one profile, it will still check them all.
  • Experimenting with Q() as a shorthand for rationals, just to make our intent precise, and a helper d() which visually represents a probability distribution - solely for visual layout, so we are not lost in a maze of square brackets, all alike.
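To make the discussion concrete, here is a minimal sketch of what the pattern above could look like. The field names, and the helpers make_matching_pennies and solve_enummixed, are assumptions for illustration only, not the actual implementation in this PR.

```python
# Hypothetical sketch of the dataclass + factory + Q()/d() pattern.
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Callable


def Q(num: int, den: int = 1) -> Fraction:
    """Shorthand for an exact rational, to make intent precise."""
    return Fraction(num, den)


def d(*probs) -> list:
    """Visual helper: lays out one probability distribution per call."""
    return list(probs)


@dataclass
class EquilibriumTestCase:
    id: str                          # identifier of our choosing
    game_factory: Callable[[], Any]  # deferred game construction
    solver: Callable[[Any], Any]     # solver under test, as a callable
    expected: list                   # expected equilibrium profiles
    prob_tol: Any = Q(0)             # exact comparison for rational solvers
    regret_tol: Any = Q(0)


case = EquilibriumTestCase(
    id="test1",
    game_factory=lambda: make_matching_pennies(),        # hypothetical
    solver=lambda g: solve_enummixed(g, rational=True),  # hypothetical
    expected=[
        [d(Q(1, 2), Q(1, 2)),    # player 1's mixed strategy
         d(Q(1, 2), Q(1, 2))],   # player 2's mixed strategy
    ],
)

# With pytest-subtests, each profile check is isolated, so one failing
# profile does not mask the others:
#
#   def test_equilibria(case, subtests):
#       result = case.solver(case.game_factory())
#       for i, profile in enumerate(result):
#           with subtests.test(case=case.id, profile=i):
#               assert profile == case.expected[i]
```

The lambdas in game_factory and solver are only invoked inside the test body, which is what keeps a broken constructor from aborting collection.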

This is an all-hands request for everyone to have a look. Not least because in the new year I will be asking everyone to pitch in on rationalising our test suite, using either this technique or some other one - it is clear our test suite is reaching the edge of maintainability unless we make some investments like the ones suggested here!

@edwardchalstrey1 (Member) commented

> Games are constructed using a factory function (this solves a problem we noted previously that if the game construction fails it takes down the test run at collection time)

^ do we not want this to trigger a test failure - or is the idea that only the game construction tests in test_game.py should fail if breaking changes to game construction are made?

I like the EquilibriumTestCase especially as it gives us the flexibility to test as many game/solver combos as required. Is the idea to have a test (or several) per solver?

Also just noting that when I work on the game library idea #623 then games can be removed from the test dir completely and loaded directly from the library, however that ends up looking.

@tturocy (Member, Author) commented Jan 5, 2026

> Games are constructed using a factory function (this solves a problem we noted previously that if the game construction fails it takes down the test run at collection time)
>
> ^ do we not want this to trigger the test failing - or is it the idea that only the game construction tests in test_game.py should fail if breaking changes to game construction are made?

They will still fail - but they will fail when the test is run. In our current test suite, if something causes game construction to fail, the failure happens at test collection time - so no tests run at all.
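A small stdlib-only sketch of the distinction, with invented stand-in names: building a game at module level runs during import (i.e. at collection), so an error there aborts the whole run, whereas a factory defers construction into the body of whichever test calls it.

```python
# build_game stands in for a real (possibly fragile) game constructor.
def build_game(ok: bool = True) -> dict:
    if not ok:
        raise RuntimeError("game construction failed")
    return {"players": 2}


# Fragile: executed at module import (collection) time --
#   GAME = build_game(ok=False)     # would abort the entire test run

# Robust: construction happens only when the factory is called,
# i.e. inside the body of the individual test that needs the game.
def make_factory(ok: bool):
    return lambda: build_game(ok=ok)


good, bad = make_factory(True), make_factory(False)

game = good()          # succeeds, inside the "test body"
try:
    bad()              # fails here, so only this one test fails
    construction_failed = False
except RuntimeError:
    construction_failed = True
```

Note that defining the broken factory is harmless; only calling it raises, which is exactly the behaviour we want from per-test game construction.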

@rahulsavani (Member) commented Jan 5, 2026

> Also just noting that when I work on the game library idea #623 then games can be removed from the test dir completely and loaded directly from the library, however that ends up looking.

A related point is that in the test suite we now use a mixture of .nfg and .efg files and functions in games.py that create a Gambit game object.

Ted made the good point that we don't want one of those functions in games.py failing to bring down all tests.

In addition, some of these functions do quite non-trivial things, and we could easily break them in a way that they still run but no longer produce the correct game. Presumably this would often break some tests downstream, but still I think Ted and I agree that a better way to go would likely be to:

  • Have these game creation functions as part of the game library (broadly construed).
  • Have tests that check the output of those game-creation functions, e.g. by comparing against hard-wired .efg files.
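The second bullet could look something like the following stdlib-only sketch. make_centipede and the JSON serialization are invented stand-ins: the real version would serialize a Gambit game object and compare it against a committed .efg fixture, catching a creation function that still runs but builds the wrong game.

```python
import json


def make_centipede(rounds: int = 2) -> dict:
    """Stand-in for a non-trivial game-creation function."""
    return {"title": "Centipede", "rounds": rounds,
            "payoffs": [[1, 0], [0, 2], [3, 1]]}


# Hard-wired fixture, playing the role of a committed .efg file on disk.
FIXTURE = json.dumps(
    {"title": "Centipede", "rounds": 2,
     "payoffs": [[1, 0], [0, 2], [3, 1]]},
    sort_keys=True)


def test_make_centipede():
    # A canonical serialization makes the comparison order-independent,
    # so any change to the generated game shows up as a test failure.
    assert json.dumps(make_centipede(), sort_keys=True) == FIXTURE


test_make_centipede()
```

The point of comparing serializations rather than objects is that the fixture documents the intended game exactly, independently of the code that generates it.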

One implication is that some things in the game library will be good for testing certain aspects of Gambit but of less general interest than others. That seems fine, as we will be explaining the relevance of every game in the library anyway.

@tturocy (Member, Author) commented Jan 5, 2026

Yes. Games will be "interesting" for different reasons:

  • For general users, games will tend to be interesting because of their application domain (e.g. "this is an auction", "this is a market game", "this is a pursuer-evader game").
  • For slightly more technical game theorists, games will be interesting because they illustrate particular theoretical points (e.g. the Myerson figure 4.2 example discussed in the work on clarifying agent-form computations).
  • For very technical computational game theorists, games will be interesting because they are important cases in specific methods (degeneracies, long Lemke-Howson paths, "backwards-bending" parts of quantal response equilibrium correspondences).

In terms of designing our dataset of games, we probably don't want to distinguish too much among these different types of example at the dataset level (because some will fit more than one category). Where they will be distinguished is in the user-facing documentation/tutorials - for example, the first type of game would appear in examples/tutorials for domain applications, the second in game-theory teaching, and the final group in more technical algorithmic examples.

@d-kad (Contributor) left a comment


To align with this, I will refactor the (i) reachability, (ii) perfect recall, and (iii) WIP subgame-root tests to use the dataclass and factory pattern.

