Refactor code into every_eval_ever namespace and add modal CLI#57
Refactor code into every_eval_ever namespace and add modal CLI#57nelaturuharsha merged 21 commits intoevaleval:mainfrom
Conversation
|
Also note the github diff reports that the schema files were deleted and re-added. I think this is a consequence of adding the symlinks to keep existing links alive as I did Which on my machine results in: |
every_eval_ever/converters/README.md
Outdated
|
|
||
| ```bash | ||
| uv run python3 -m eval_converters.inspect --log_path tests/data/inspect/2026-02-07T11-26-57+00-00_gaia_4V8zHbbRKpU5Yv2BMoBcjE.json | ||
| uv run --extra inspect every_eval_ever convert inspect eee --log-path tests/data/inspect/2026-02-07T11-26-57+00-00_gaia_4V8zHbbRKpU5Yv2BMoBcjE.json |
There was a problem hiding this comment.
The full commands (below examples) are outdated after these changes. We probably may remove them.
Also, I’d like to simplify this command. What is the purpose of the word eee? I suspect every-eval-ever convert inspect would be clearer.
There was a problem hiding this comment.
This is because their is a destination positional argument that indicates which format to convert to. It has a default so its not required to include it. It would probably be cleaner to simply remove it as there is only one destination format. I can do that.
Also as a note to myself, I see log_path got changed to log-path, and I should undo that.
WRT to full commands after, I will clean that up.
Also, if you are open to adding a dependency on my scriptconfig CLI library we can organize the argument parsers in a much more declarative way that keeps the entrypoint much closer to the code they belong to. The argparse code works, but I always think it looks ugly. With scriptconfig you also get several quality of life improvements including opt-in autocomplete for free as well as nice colored CLIs based on its tie in with rich-argparse and argcomplete. I'd recommend landing a baseline pure-stdlib approach first, but if you are interested I could make a follow on PR.
There was a problem hiding this comment.
Maybe a follow-up PR with dependency for scriptconfig CLI library would be best option, keep things as simple as possible in this PR.
README.md
Outdated
| Validation command: | ||
|
|
||
| ```bash | ||
| every_eval_ever validate <path-or-dir> |
There was a problem hiding this comment.
We do not run validation and duplication checks in the github. It's only valid in the HF during uploading new data. So we don't need it here. Convert option is enough
There was a problem hiding this comment.
We probably should though, at least in a CI test suite to cover more of the code with tests. We should also show how to do it in the README.
There was a problem hiding this comment.
It's harder to check duplicates now because data aren't stored in the github anymore (and it's still run when uploading data to HF, so don't see sense to do it). Validation of data will be rewritten slightly (full validation via pydantic) so it isn't needed imo.
|
@damian1996 I've resolved merge conflicts, cleaned up the diff, and addressed prior discussion points. The schemas on main changed, so I reverified that the merge updated them correctly: Note: resolving the merge conflicts got a little messy, so I would recommend squashing this PR if it is accepted instead of creating a merge commit. That will keep the history of the repo cleaner. If desired, I could do a rebase here, but I'm punting on that unless you want it. |
|
Hi @Erotemic, the validation workflow is now going to be via pydantic + other changes will land once #69 is merged and there are some clean-ups regarding dependencies via #71. Could you check if the incoming changes affect your PR? I'd assume the validation changing might but I've not had a chance to look through yours. Will prioritize this PR once I am done with changes to the validation pipeline (I am working on it assuming this PR will be merged - but don't want to rush this one). |
|
#71 did cause conflicts, but I merged in the latest main and addressed them. For #69 there will be merge conflicts, but the core intent of it and this PR are different and compatible, if you want to merge that one first, I can update my PR accordingly. Otherwise, if this is merged first, then the changes in #71 basically need to be moved into the correct new locations and integrated with the CLI. |
It looks good to me but maybe wait until @nelaturuharsha will merge his PR and then let him make a pass through it as well. |
|
@damian1996 ready for re-review |
|
@evijit can you trigger co-pilot review? Thanks! |
There was a problem hiding this comment.
Pull request overview
Refactors the repository into an installable every_eval_ever Python package (suitable for PyPI), bundles schemas under the package namespace, and adds a top-level CLI (plus converter CLIs) alongside CI workflows that validate data via the package entrypoint.
Changes:
- Moved library code + generated Pydantic models + JSON schemas under
every_eval_ever/and updated imports across utils/tests. - Added a unified CLI (
every_eval_everconsole script) withvalidate,check-duplicates, andconvertsubcommands, plus converter-specific CLIs. - Added GitHub Actions workflows for validating data and regenerating types based on package-local schemas.
Reviewed changes
Copilot reviewed 45 out of 59 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| utils/rewardbench/adapter.py | Switch imports to every_eval_ever.* and remove sys.path manipulation. |
| utils/hfopenllm_v2/adapter.py | Switch imports to every_eval_ever.* and remove sys.path manipulation. |
| utils/helm/adapter.py | Switch imports to every_eval_ever.* and remove sys.path manipulation. |
| utils/global-mmlu-lite/adapter.py | Switch imports to every_eval_ever.* and remove sys.path manipulation. |
| tests/test_validate.py | Update validate imports and fixture schema versions. |
| tests/test_lm_eval_adapter.py | Update imports to new package converter paths. |
| tests/test_inspect_instance_level_adapter.py | Update imports to new package converter paths. |
| tests/test_inspect_adapter.py | Update imports to new package converter paths. |
| tests/test_helm_instance_level_adapter.py | Update imports to new package converter paths. |
| tests/test_helm_adapter.py | Update imports to new package converter paths. |
| tests/test_check_duplicate_entries.py | Test against every_eval_ever.check_duplicate_entries and new main(argv) return code. |
| pyproject.toml | Add build-system + setuptools package discovery/data and CLI entrypoint. |
| post_codegen.py | Update paths/usage for package-local schemas and generated types. |
| eval.schema.json | Convert root schema file to a symlink target pointing into every_eval_ever/schemas/. |
| instance_level_eval.schema.json | Convert root schema file to a symlink target pointing into every_eval_ever/schemas/. |
| every_eval_ever/validate.py | Move validator into package and make main(argv)->int for CLI integration. |
| every_eval_ever/check_duplicate_entries.py | Move duplicate checker into package and make main(argv)->int for CLI integration. |
| every_eval_ever/schemas/eval.schema.json | Add bundled aggregate schema under package. |
| every_eval_ever/schemas/instance_level_eval.schema.json | Add bundled instance-level schema under package. |
| every_eval_ever/schemas/init.py | Mark schemas as a package for resource loading. |
| every_eval_ever/schema.py | Add helpers for loading bundled schemas via importlib.resources. |
| every_eval_ever/eval_types.py | Relocate generated aggregate Pydantic types under package. |
| every_eval_ever/instance_level_types.py | Relocate generated instance-level Pydantic types under package. |
| every_eval_ever/helpers/schema.py | Update imports and extend make_source_metadata(..., additional_details=...). |
| every_eval_ever/helpers/io.py | Update imports to package namespace. |
| every_eval_ever/helpers/fetch.py | Add HTTP fetching utilities (requests-based). |
| every_eval_ever/helpers/developer.py | Add developer/model ID derivation helpers. |
| every_eval_ever/helpers/init.py | Export helper utilities via __all__. |
| every_eval_ever/converters/init.py | Reintroduce converter SCHEMA_VERSION constant under new namespace. |
| every_eval_ever/converters/README.md | Update docs/commands to use new package CLI + extras. |
| every_eval_ever/converters/common/adapter.py | Update imports to package namespace for shared adapter base. |
| every_eval_ever/converters/common/error.py | Add shared adapter exception types. |
| every_eval_ever/converters/common/utils.py | Add shared utils (timestamps, HF lookup, hashing). |
| every_eval_ever/converters/common/init.py | Package marker for common converters module. |
| every_eval_ever/converters/inspect/adapter.py | Update imports + make inspect dependency optional with runtime gating. |
| every_eval_ever/converters/inspect/instance_level_adapter.py | Make inspect dependency optional with runtime gating. |
| every_eval_ever/converters/inspect/utils.py | Update imports to new common utils / types. |
| every_eval_ever/converters/inspect/main.py | Update entrypoint imports to new namespace. |
| every_eval_ever/converters/inspect/init.py | Package marker for inspect converter module. |
| every_eval_ever/converters/helm/adapter.py | Make helm dependency optional with runtime gating; update imports. |
| every_eval_ever/converters/helm/instance_level_adapter.py | Make helm dependency optional with runtime gating; update imports. |
| every_eval_ever/converters/helm/utils.py | Make helm dependency optional for type import. |
| every_eval_ever/converters/helm/main.py | Update entrypoint imports to new namespace. |
| every_eval_ever/converters/helm/init.py | Package marker for helm converter module. |
| every_eval_ever/converters/lm_eval/adapter.py | Update imports to new namespace. |
| every_eval_ever/converters/lm_eval/instance_level_adapter.py | Update imports to new namespace. |
| every_eval_ever/converters/lm_eval/utils.py | Add lm-eval helper utilities for parsing args, locating samples, etc. |
| every_eval_ever/converters/lm_eval/main.py | Add CLI for lm-eval conversion under package namespace. |
| every_eval_ever/converters/lm_eval/init.py | Package marker for lm-eval converter module. |
| every_eval_ever/cli.py | Add top-level CLI: validate, check-duplicates, convert (lm_eval/inspect/helm). |
| every_eval_ever/init.py | Define lightweight public API via lazy module loading. |
| every_eval_ever/main.py | Add python -m every_eval_ever entrypoint wrapper. |
| eval_converters/init.py | Remove old namespace’s SCHEMA_VERSION constant. |
| README.md | Update documentation to reflect package install, new CLI, and new workflow path. |
| .github/workflows/validate-data.yml | Add data validation workflow invoking the package CLI. |
| .github/workflows/regenerate_types.yml | Update type regeneration workflow to use package-local schemas/outputs. |
Comments suppressed due to low confidence (2)
every_eval_ever/converters/inspect/adapter.py:106
AdapterMetadatarequiressupported_library_versions, butmetadata()constructs it with onlyname,version, anddescription. AccessingInspectAIAdapter.metadatawill raiseTypeError: __init__() missing 1 required positional argument. Populatesupported_library_versions(or make the field optional/defaulted in the dataclass).
every_eval_ever/converters/helm/adapter.py:376- Several HELMAdapter method contracts are inconsistent with
BaseEvaluationAdapter:_transform_single()is annotated as returning a tuple but returns onlyEvaluationLog, andtransform_from_directory()takes an extraoutput_pathparameter compared to the abstract base signature. These mismatches make it hard to call the adapter polymorphically and confuse static analysis. Additionally,metadata()constructsAdapterMetadatawithout the requiredsupported_library_versionsfield, which will raiseTypeErrorif accessed. Consider aligning these signatures/annotations with the base class and populating all required AdapterMetadata fields.
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Here's an initial pass at a refactor to make this module easier to deploy to pypi and use as a standalone tool.
every_eval_evernamespace.every_eval_everbecomes a command line tool.I didn't do github actions stuff, but you can at least
uv pip install -e .the repo now and get a nice CLI.On slack there was mention of github actions existing, but I didn't see a .github directory and lack of CI tests for the PR makes me nervous. Are there CI tests somewhere? I do see that actions are running in the actions tab, but I've never set them up without a .github folder.
Note: the script in utils/helm/parse_helm_leaderboards.sh seems to have an error which I don't think is introduced by this PR. I've left it alone for now, but in general
uv runcommands would need to be updated withuv run --extra <extra>for whichever extra they needed.