Generate stub tests for all expression arithmetic #1229
|
Holy moly! This is an interesting approach, and it does seem to be a good one. But how do you envision things moving forward? Merging this and then doing focused PRs, or just iterating on this one giant PR? Also @Zeroto521, you might be interested in a couple of these, as you've been dealing with expressions a bunch. |
I would prefer to merge this and then improve things bit by bit.
Yes, the failures are interesting to note, for instance the errors due to |
To avoid counting shape broadcasting errors as typing errors. Since shape-typing is nearly impossible to achieve, let's assume shapes are correct and focus on typing the operation result.
|
Alright, let's try it out. But would it be possible not to merge the giant files? Those are just the output of the new stub tests, right? |
|
Do you mean just keep the output of the script and the baseline for local development? Sure it's possible, though it means it's hard to see the impact without running them locally for each change. I don't mind either way, it's more of a reviewer concern I believe, let me know what you prefer. |
|
I was thinking the opposite, actually, since the 20k file can be regenerated pretty quickly, and the diff on the baseline is useful, as you say. But it's not a big preference. Something that should happen, however, is pinning a mypy version, because there'll be some inconsistencies otherwise, I believe. |
|
Ah yes, I get it, and I agree! The baseline is the important artifact; the generated tests don't need to be committed. I've gone in that direction and removed the generated file from the repo. I also pinned Mypy and all needed dependencies. One thing to note is that the error messages change with the versions used, so the committed baseline can only match one specific version set. All dependencies are locked in a |
Joao-Dionisio
left a comment
Alright, @jonathanberthias, let's do this then :)
I'm on the final months before submitting my thesis, so I'll probably take a little bit to review the incoming PRs, but I'll do my best 💪
|
Btw, this relates to the first and third points of the checklist in #1072, right? |
Currently almost all the arithmetic operations are untyped in the stubs. This is one area where the stubs would be the most useful since there are no arguments to help understand the objects being used.
To make it easier to introduce these stubs, this PR generates typing tests to check how well the stubs match the runtime behavior. A script runs all operations and captures the result type, and then a Python file is generated with assertions for type checkers.
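For readers who want a feel for the generation step, here is a minimal sketch of the idea, not the actual script from this PR. It uses `int`, `float` and `Fraction` as stand-in operands (the real script would use the expression objects), and the names `OPERANDS`, `OPERATORS` and `generate_assertions` are hypothetical. Each operation is executed at runtime, the type of the result is recorded, and one `assert_type` line is emitted per combination.

```python
import itertools
import operator
from fractions import Fraction

# Stand-in operands as (source text, runtime value) pairs.
# Assumption: the real script would list expression objects here instead.
OPERANDS = [("1", 1), ("1.5", 1.5), ("Fraction(1, 2)", Fraction(1, 2))]
OPERATORS = {"+": operator.add, "*": operator.mul, "/": operator.truediv}


def generate_assertions() -> list[str]:
    """Run every operand/operator combination and record the runtime result type."""
    lines = []
    for (left_src, left), (right_src, right) in itertools.product(OPERANDS, repeat=2):
        for symbol, func in OPERATORS.items():
            try:
                result = func(left, right)
            except Exception:
                # Operations that fail at runtime get no assertion in this sketch.
                continue
            expected = type(result).__name__
            lines.append(f"assert_type({left_src} {symbol} {right_src}, {expected})")
    return lines


if __name__ == "__main__":
    # The generated module is what the type checker is later run against.
    print("from fractions import Fraction")
    print("from typing import assert_type")
    print()
    print("\n".join(generate_assertions()))
```

A type checker run on the generated module then reports every place where the inferred type of an operation differs from the type observed at runtime.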
Then we can run type checkers (Mypy) against these expectations and check where the current stubs do not match the runtime behavior. I believe this is a great way to track progress and ensure the stubs stay faithful to the implementation.
One issue is that the generated file is huge. The type checker errors baseline should go down as the stubs are improved, though I don't expect all errors to ever be solved, since Python's type system can't represent all the possible behaviors (yet).
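As an illustration of how a baseline can track progress, the sketch below compares a stored list of type checker errors with the errors from a fresh run. It is an assumption about the workflow, not the tooling used in this PR: the helper names are hypothetical, the sample error strings are illustrative, and stripping line numbers is one possible way to keep the comparison stable when the generated file shifts.

```python
import re
from collections import Counter

# Matches the "path:line:" or "path:line:col:" prefix of a type checker message.
LOCATION = re.compile(r"^(?P<path>[^:]+):\d+(?::\d+)?: ")


def normalize(line: str) -> str:
    """Drop line and column numbers so that moved code does not look like a new error."""
    return LOCATION.sub(r"\g<path>: ", line.strip())


def diff_against_baseline(baseline: list[str], current: list[str]) -> tuple[Counter, Counter]:
    """Return (introduced, fixed) error counts relative to the baseline."""
    old = Counter(normalize(line) for line in baseline if ": error: " in line)
    new = Counter(normalize(line) for line in current if ": error: " in line)
    return new - old, old - new


if __name__ == "__main__":
    baseline = [
        't.py:3: error: Expression is of type "Any", not "Expr"  [assert-type]',
        "t.py:9: error: Unsupported operand types  [operator]",
    ]
    current = ['t.py:5: error: Expression is of type "Any", not "Expr"  [assert-type]']
    introduced, fixed = diff_against_baseline(baseline, current)
    print(f"introduced: {sum(introduced.values())}, fixed: {sum(fixed.values())}")
```

With a check like this, a stub improvement shows up as a shrinking baseline, while a regression shows up as newly introduced errors.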