feat!: SPRAS revision#320

Open
tristan-f-r wants to merge 51 commits intoReed-CompBio:mainfrom
tristan-f-r:hash
Collaborator

@tristan-f-r tristan-f-r commented Jul 9, 2025

This change means that, when osdf_immutable is true, output files are not reused after SPRAS is updated, furthering the immutability goal necessary to get OSDF integration working for SPRAS benchmarking. (Whether SPRAS counts as 'updated' depends on the git commit hash or the actual SPRAS release version.)

This adds the unique spras_revision to every parameter combination (before hashing) and to the dataset label, to provide OSDF support on the level of deterministic, non-seeded algorithms when datasets are immutable.
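As a rough illustration of mixing a revision into a parameter hash, here is a minimal sketch. It is not the code in this PR: the function name `hash_params`, the `_spras_rev` key, the use of JSON plus SHA-1, and the 7-character truncation are all assumptions made for the example.

```python
import hashlib
import json
from typing import Optional


def hash_params(params: dict, spras_revision: Optional[str] = None, length: int = 7) -> str:
    """Hash a parameter combination, optionally mixing in a revision string.

    Hypothetical helper for illustration only.
    """
    # Copy so the caller's dictionary is not mutated.
    to_hash = dict(params)
    if spras_revision is not None:
        # Assumed key name; adding the revision before hashing means a new
        # revision yields a new hash even when the parameters are unchanged.
        to_hash["_spras_rev"] = spras_revision
    # Sorting keys makes the hash independent of dictionary insertion order.
    encoded = json.dumps(to_hash, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()[:length]


if __name__ == "__main__":
    params = {"k": 10, "alpha": 0.5}
    print(hash_params(params, "rev-a"))
    print(hash_params(params, "rev-b"))
```

With this shape, two runs with identical parameters but different revisions land in different output directories, so nothing written under an older revision is overwritten.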

This has the added benefit of allowing SPRAS users to simply upgrade their SPRAS version without needing to clear output, which complements #380. The refactored test also partially covers #165 and #45. (This is also where the majority of the code comes from: the actual feature patch here is a 50-line change.)

See #321 implemented by #335 for handling nondeterministic algorithms / seeded algorithms.


To make this change, a significant test refactor in test/analysis was needed to remove hardcoded paths (which contained the hashes being modified per-commit in this PR). It turns out that whenever we make any change to the hash, this [original: the patch here fixes this] test breaks! That's why so many other PRs depend on this one.

This adds the unique spras_revision to every parameter combination (before hashing) and to the dataset label, to provide OSDF support on the level of deterministic algorithms.
@tristan-f-r tristan-f-r marked this pull request as ready for review July 9, 2025 20:51
@tristan-f-r tristan-f-r added enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper labels Jul 9, 2025
@tristan-f-r tristan-f-r changed the title feat: spras_revision feat: SPRAS revision Jul 9, 2025
@tristan-f-r

This comment was marked as outdated.

@tristan-f-r tristan-f-r marked this pull request as draft July 9, 2025 21:37
@tristan-f-r tristan-f-r marked this pull request as ready for review July 10, 2025 19:34
@tristan-f-r tristan-f-r changed the title feat: SPRAS revision feat!: SPRAS revision Jul 10, 2025
@tristan-f-r

This comment was marked as outdated.

@tristan-f-r tristan-f-r added the P-high This is a blocker for many PRs/issues/features label Jul 24, 2025
@tristan-f-r tristan-f-r added the tuning Workflow-spanning algorithm tuning label Jan 13, 2026
Collaborator

@agitter agitter left a comment

I finished another partial revision. I still haven't thought about the testing implications carefully.

@github-actions github-actions bot added the merge-conflict This PR has merge conflicts. label Jan 31, 2026
@github-actions github-actions bot removed the merge-conflict This PR has merge conflicts. label Jan 31, 2026
Collaborator

@agitter agitter left a comment

A few more comments. I still haven't looked through all the test code.

@tristan-f-r
Collaborator Author

tristan-f-r commented Jan 31, 2026

Since both past approaches do not scale well, I've decided to only focus on the RECORD file.

This fails specifically in the case where SPRAS is somehow run without being installed as a Python module, and I can't think of a plausible scenario where this happens.
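For readers unfamiliar with the RECORD file, the sketch below shows one way a revision string could be derived from it using the standard library's `importlib.metadata`. This is an assumption-laden illustration, not the PR's implementation: the function name `record_revision`, the choice of SHA-256, and the 8-character truncation are hypothetical. It also shows the failure mode described above, since a package that is not installed has no distribution metadata to read.

```python
import hashlib
from importlib.metadata import PackageNotFoundError, distribution
from typing import Optional


def record_revision(package: str, length: int = 8) -> Optional[str]:
    """Return a short digest of an installed package's RECORD file.

    Hypothetical helper for illustration only. Returns None when the package
    is not installed or its metadata has no RECORD file.
    """
    try:
        # read_text returns the file contents, or None if the file is absent.
        record = distribution(package).read_text("RECORD")
    except PackageNotFoundError:
        # The package is being used without being installed as a module.
        return None
    if record is None:
        return None
    # RECORD lists installed files with their hashes, so its digest changes
    # whenever the installed file set or contents change.
    return hashlib.sha256(record.encode("utf-8")).hexdigest()[:length]


if __name__ == "__main__":
    # "spras" is the assumed distribution name here.
    print(record_revision("spras"))
```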

@agitter
Collaborator

agitter commented Feb 8, 2026

As a follow-up to our meeting discussion, I'm wondering if this type of output file versioning should be optional. Then when running in CHTC and writing to OSDF (or running locally and opting in) it could be enabled. By making it opt-in, we would have simpler filenames by default and ensure the user knows they have to install and run SPRAS a specific way for this feature to work.
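A minimal sketch of what opt-in versioning could look like follows. The function name `output_label`, the `-` separator, and the error on a missing revision are assumptions for illustration; only the `osdf_immutable` flag name comes from this PR.

```python
from typing import Optional


def output_label(label: str, revision: Optional[str], osdf_immutable: bool = False) -> str:
    """Build an output label, appending the revision only when opted in.

    Hypothetical helper for illustration only.
    """
    if not osdf_immutable:
        # Default behavior: simple names with no revision suffix.
        return label
    if revision is None:
        # Opting in requires a revision, which in turn requires SPRAS to be
        # installed in a way that makes the revision discoverable.
        raise ValueError("osdf_immutable is enabled but no revision is available")
    return f"{label}-{revision}"


if __name__ == "__main__":
    print(output_label("data0", "abc123"))
    print(output_label("data0", "abc123", osdf_immutable=True))
```

Failing loudly when the flag is on but no revision can be found keeps the opt-in path from silently falling back to unversioned names.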

@tristan-f-r
Collaborator Author

That makes the most sense to me as well 👍

#
# By default, this is disabled, as it can make output file names confusing. Here, it's set to true since we use this
# configuration file for testing.
osdf_immutable: true
Collaborator Author

@tristan-f-r tristan-f-r Feb 11, 2026

This is a little annoying. We use this config for testing, so it's nice to enable this, but this is also our documentation config. I can write some extra code to enable this during testing, but that seems strange as well.

For now, I'm okay with keeping this, then writing more documentation later (especially as we start focusing more on the COMBINE25 tutorial).

Labels

enhancement New feature or request needed for benchmarking Priority PRs needed for the benchmarking paper P-high This is a blocker for many PRs/issues/features tuning Workflow-spanning algorithm tuning

2 participants