
Conversation

@tturocy (Member) commented Dec 23, 2025

Trying to understand the output from the failing Nash equilibrium tests recently just about did my head in, so I decided it was high time we took a hard look at our tests. I pulled together a bunch of notes/thoughts I had been accumulating (alongside reading up on some pytest features).

This draft pull request is a proposal for discussion.

In short - I have rewritten what used to be the test for enummixed_solve with rational probabilities into a generic Nash equilibrium solver tester (!!!)

Highlights:

  • Test cases are represented as a dataclass.
  • Games are constructed using a factory function. This solves a problem we noted previously: if game construction fails, it takes down the whole test run at collection time.
  • The solver is likewise specified as a callable.
  • Tests are given an identifier of our choosing. I have been lazy and just called them test1, test2, &c. for now; they will need better names in production, of course.
  • Tolerances for regrets and probabilities are configurable per test case. No global setting is hard-coded, and we can handle both float and rational with the same test function.
  • The tests now use pytest-subtests (added as a dependency in the test requirements.txt). This lets us isolate the individual checks within a test, so even if one fails the others can still run - e.g. if somehow max_regret is correct but the probabilities change for one profile, it will still check them all.
  • Experimenting with Q() as a shorthand for rationals, just to make our intent precise, and a helper d() which visually represents a probability distribution - solely for visual layout, so we are not lost in a maze of square brackets, all alike.
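To make the discussion concrete, here is a minimal sketch of what the pattern above could look like. The field names, and the helpers make_matching_pennies and solve_enummixed, are assumptions for illustration only, not the actual implementation in this PR.

```python
# Hypothetical sketch of the dataclass + factory + Q()/d() pattern.
from dataclasses import dataclass
from fractions import Fraction
from typing import Any, Callable


def Q(num: int, den: int = 1) -> Fraction:
    """Shorthand for an exact rational, to make intent precise."""
    return Fraction(num, den)


def d(*probs) -> list:
    """Visual helper: lays out one probability distribution per call."""
    return list(probs)


@dataclass
class EquilibriumTestCase:
    id: str                          # identifier of our choosing
    game_factory: Callable[[], Any]  # deferred game construction
    solver: Callable[[Any], Any]     # solver under test, as a callable
    expected: list                   # expected equilibrium profiles
    prob_tol: Any = Q(0)             # exact comparison for rational solvers
    regret_tol: Any = Q(0)


case = EquilibriumTestCase(
    id="test1",
    game_factory=lambda: make_matching_pennies(),        # hypothetical
    solver=lambda g: solve_enummixed(g, rational=True),  # hypothetical
    expected=[
        [d(Q(1, 2), Q(1, 2)),    # player 1's mixed strategy
         d(Q(1, 2), Q(1, 2))],   # player 2's mixed strategy
    ],
)

# With pytest-subtests, each profile check is isolated, so one failing
# profile does not mask the others:
#
#   def test_equilibria(case, subtests):
#       result = case.solver(case.game_factory())
#       for i, profile in enumerate(result):
#           with subtests.test(case=case.id, profile=i):
#               assert profile == case.expected[i]
```

The lambdas in game_factory and solver are only invoked inside the test body, which is what keeps a broken constructor from aborting collection.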

This is an all-hands request for everyone to have a look. Not least because in the new year I will be asking everyone to pitch in on rationalising our test suite, using either this technique or some other one - it is clear our test suite is reaching the edge of maintainability unless we make some investments like the ones suggested here!

@edwardchalstrey1 (Member) commented

> Games are constructed using a factory function (this solves a problem we noted previously that if the game construction fails it takes down the test run at collection time)

^ do we not want this to trigger a test failure - or is the idea that only the game construction tests in test_game.py should fail if breaking changes to game construction are made?

I like the EquilibriumTestCase especially as it gives us the flexibility to test as many game/solver combos as required. Is the idea to have a test (or several) per solver?

Also just noting that when I work on the game library idea #623 then games can be removed from the test dir completely and loaded directly from the library, however that ends up looking.

@tturocy (Member, Author) commented Jan 5, 2026

> Games are constructed using a factory function (this solves a problem we noted previously that if the game construction fails it takes down the test run at collection time)
>
> ^ do we not want this to trigger the test failing - or is it the idea that only the game construction tests in test_game.py should fail if breaking changes to game construction are made?

They will still fail - but they will fail when the test is run. In our current test suite, if something causes game construction to fail, the failure happens at test collection time - so no tests run at all.
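A small stdlib-only sketch of the distinction, with invented stand-in names: building a game at module level runs during import (i.e. at collection), so an error there aborts the whole run, whereas a factory defers construction into the body of whichever test calls it.

```python
# build_game stands in for a real (possibly fragile) game constructor.
def build_game(ok: bool = True) -> dict:
    if not ok:
        raise RuntimeError("game construction failed")
    return {"players": 2}


# Fragile: executed at module import (collection) time --
#   GAME = build_game(ok=False)     # would abort the entire test run

# Robust: construction happens only when the factory is called,
# i.e. inside the body of the individual test that needs the game.
def make_factory(ok: bool):
    return lambda: build_game(ok=ok)


good, bad = make_factory(True), make_factory(False)

game = good()          # succeeds, inside the "test body"
try:
    bad()              # fails here, so only this one test fails
    construction_failed = False
except RuntimeError:
    construction_failed = True
```

Note that defining the broken factory is harmless; only calling it raises, which is exactly the behaviour we want from per-test game construction.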

@rahulsavani (Member) commented Jan 5, 2026

> Also just noting that when I work on the game library idea #623 then games can be removed from the test dir completely and loaded directly from the library, however that ends up looking.

A related point is that in the test suite we now use a mixture of .nfg and .efg files and functions in games.py that create a Gambit game object.

Ted made the good point that we don't want one of those functions in games.py failing to bring down all tests.

In addition, some of these functions do quite non-trivial things, and we could easily break them in a way that they still run but no longer produce the correct game. Presumably this would often break some tests downstream, but still I think Ted and I agree that a better way to go would likely be to:

  • Have these game creation functions as part of the game library (broadly construed).
  • Have tests that check the output of those game-creation functions, e.g. by comparing against hard-wired .efg files.
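The second bullet could look something like the following stdlib-only sketch. make_centipede and the JSON serialization are invented stand-ins: the real version would serialize a Gambit game object and compare it against a committed .efg fixture, catching a creation function that still runs but builds the wrong game.

```python
import json


def make_centipede(rounds: int = 2) -> dict:
    """Stand-in for a non-trivial game-creation function."""
    return {"title": "Centipede", "rounds": rounds,
            "payoffs": [[1, 0], [0, 2], [3, 1]]}


# Hard-wired fixture, playing the role of a committed .efg file on disk.
FIXTURE = json.dumps(
    {"title": "Centipede", "rounds": 2,
     "payoffs": [[1, 0], [0, 2], [3, 1]]},
    sort_keys=True)


def test_make_centipede():
    # A canonical serialization makes the comparison order-independent,
    # so any change to the generated game shows up as a test failure.
    assert json.dumps(make_centipede(), sort_keys=True) == FIXTURE


test_make_centipede()
```

The point of comparing serializations rather than objects is that the fixture documents the intended game exactly, independently of the code that generates it.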

One implication is that some things in the game library will be good for testing certain aspects of Gambit but of less general interest than others. That seems fine, as we will be explaining the relevance of every game in the library anyway.

@tturocy (Member, Author) commented Jan 5, 2026

Yes. Games will be "interesting" for different reasons:

  • For general users, games will tend to be interesting because of their application domain (e.g. "this is an auction", "this is a market game", "this is a pursuer-evader game").
  • For slightly more technical game theorists, games will be interesting because they illustrate particular theoretical points (e.g. the Myerson figure 4.2 example discussed in the work on clarifying agent-form computations).
  • For very technical computational game theorists, games will be interesting because they are important cases in specific methods (degeneracies, long Lemke-Howson paths, "backwards-bending" parts of quantal response equilibrium correspondences).

In terms of designing our dataset of games, we probably don't want to distinguish too much among these different types of example at the dataset level (because some will fit more than one category). Where they will be distinguished is in the user-facing documentation/tutorials - for example, the first type of game would appear in examples/tutorials for domain applications, the second in game-theory teaching, and the final group in more technical algorithmic examples.

@d-kad (Contributor) left a comment


To align with this, I will refactor the (i) reachability, (ii) perfect recall, and (iii) WIP subgame-root tests to use the dataclass and factory pattern.

