feat: automatically reproduce model from reproduce.json #326
Code Review
This pull request introduces a reproduction feature that allows loading model reproduction information from a local file or URL via a new --reproduce CLI flag. The changes include updates to the configuration settings, CLI argument handling, and a new utility function for fetching and parsing JSON data. Feedback focuses on adhering to the repository style guide by updating the default configuration file and formatting comments correctly. Additionally, it is recommended to use a context manager and timeout for network requests to ensure better resource handling and process stability.
path = path.replace("/blob/", "/raw/")  # Hugging Face, GitHub
path = path.replace("/src/branch/", "/raw/branch/")  # Codeberg

json_str = urlopen(path).read().decode("utf-8")
It is recommended to use a context manager with urlopen to ensure the connection is properly closed. Additionally, adding a timeout prevents the process from hanging indefinitely on network issues.
- json_str = urlopen(path).read().decode("utf-8")
+ with urlopen(path, timeout=10) as response:
+     json_str = response.read().decode("utf-8")
Why not use the hf_hub_download or snapshot_download methods of huggingface_hub?
I guess the reason is that a raw URL supports a broader range of download sources, though the above methods would work if we just specify a repo ID.
Yes, we also want to support files stored on GitHub and Codeberg, where a public archive of reproduce.json files will be.
HF is kind of the least useful here, because if the repo is still available there's really no need to reproduce, you can just download the model directly.
Understood, that's correct.
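Putting the thread together, a loader that accepts either a local path or a GitHub/Codeberg/Hugging Face URL could look roughly like the sketch below. It assumes the two rewrite rules from the diff and the reviewer's context-manager/timeout suggestion; the names `to_raw_url` and `load_reproduce_info`, and the `http(s)://` prefix check used to tell URLs from paths, are hypothetical and not the PR's actual code.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def to_raw_url(url: str) -> str:
    """Rewrite a web-UI file URL into the matching raw-file URL (hypothetical helper)."""
    url = url.replace("/blob/", "/raw/")  # Hugging Face, GitHub
    url = url.replace("/src/branch/", "/raw/branch/")  # Codeberg
    return url


def load_reproduce_info(path: str, timeout: float = 10.0) -> dict:
    """Load reproduction information from a local file or a URL (hypothetical helper)."""
    if path.startswith(("http://", "https://")):
        # The context manager closes the connection; the timeout keeps a
        # stalled server from hanging the process indefinitely.
        with urlopen(to_raw_url(path), timeout=timeout) as response:
            json_str = response.read().decode("utf-8")
    else:
        json_str = Path(path).read_text(encoding="utf-8")
    return json.loads(json_str)
```

Keeping the URL rewrite in a pure function means it can be tested without network access; URLs that already point at raw content (such as the `/resolve/` bucket URL in the test output) pass through unchanged.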
This improves security when running Heretic with an untrusted config file. The prompt is now always shown. This is NOT a breaking change, because we currently ignore values for unknown settings, so existing configs continue to work.
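The backward-compatibility argument rests on unknown keys in a config being ignored rather than rejected. A stdlib-only sketch of that behavior follows; the `Settings` fields and the `load_settings` helper are hypothetical and are not Heretic's configuration code.

```python
from dataclasses import dataclass, fields


@dataclass
class Settings:
    # Hypothetical fields for illustration only.
    model: str = ""
    n_trials: int = 200


def load_settings(raw: dict) -> Settings:
    """Build Settings from a parsed config, silently dropping unknown keys."""
    known = {f.name for f in fields(Settings)}
    return Settings(**{key: value for key, value in raw.items() if key in known})


# An old config that still contains a since-removed option loads without error.
settings = load_settings({"model": "some/model", "obsolete_option": True})
```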
This is ready for testing! AFAICT, it appears to work, though the situation with cloud GPUs is currently catastrophic, so I haven't been able to test comprehensively in different environments. Checksum validation is still missing.
Cloud Test
Base Model (Reproducible Compatible): heretic-org/MiniCPM-V-4.6-heretic
Test 1
Loading reproduction information from https://huggingface.co/buckets/heretic-org/Heretic-Reproducibles-Storage/resolve/heretic-reproducibles/data/huggingface.co/heretic-org/MiniCPM-V-4.6-heretic-32b28aa.json?download=true...
Your local environment doesn't perfectly match the environment used to produce
the original model. The following components differ:
System Mismatches
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Component ┃ This system ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ CUDA Version │ 12.8 │ 13.0 │ medium │
└──────────────┴─────────────┴─────────────────┴──────────┘
Package Mismatches
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Package ┃ This system ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ certifi │ 2026.5.20 │ 2026.4.22 │ low │
│ click │ 8.4.0 │ 8.3.3 │ low │
│ cuda-bindings │ 12.9.6 │ 13.2.0 │ low │
│ cuda-pathfinder │ 1.5.4 │ 1.5.3 │ low │
│ cuda-toolkit │ 12.8.1 │ 13.0.2 │ low │
│ greenlet │ 3.5.1 │ 3.4.0 │ low │
│ heretic-llm │ 1.3.0-git+https://github.com/p-e-w/heretic.git@3bcb2f6d2b5fd419af855e3d52690bc7cdc58d77 │ 1.3.0 │ critical │
│ hf-xet │ 1.5.0 │ 1.4.3 │ low │
│ huggingface-hub │ 1.16.1 │ 1.11.0 │ low │
│ idna │ 3.15 │ 3.13 │ low │
│ kernels-data │ 0.15.2 │ 0.14.1 │ low │
│ lxml │ 6.1.1 │ 6.1.0 │ low │
│ mako │ 1.3.12 │ 1.3.11 │ low │
│ markdown-it-py │ 4.2.0 │ 4.0.0 │ low │
│ numpy │ 2.4.6 │ 2.4.4 │ low │
│ nvidia-cublas │ │ 13.1.1.3 │ low │
│ nvidia-cublas-cu12 │ 12.8.4.1 │ │ low │
│ nvidia-cuda-nvrtc │ │ 13.0.88 │ low │
│ nvidia-cudnn-cu12 │ 9.19.0.56 │ │ low │
│ nvidia-cudnn-cu13 │ │ 9.20.0.48 │ low │
│ nvidia-cusparselt-cu12 │ 0.7.1 │ │ low │
│ nvidia-cusparselt-cu13 │ │ 0.8.1 │ low │
│ nvidia-nccl-cu12 │ 2.28.9 │ │ low │
│ nvidia-nccl-cu13 │ │ 2.29.7 │ low │
│ nvidia-nvshmem-cu12 │ 3.4.5 │ │ low │
│ nvidia-nvshmem-cu13 │ │ 3.4.5 │ low │
│ optuna │ 4.9.0 │ 4.8.0 │ medium │
│ packaging │ 26.2 │ 26.1 │ low │
│ pydantic-settings │ 2.14.1 │ 2.14.0 │ low │
│ torch │ 2.11.0+cu128 │ 2.12.0+cu130 │ high │
│ torchvision │ 0.26.0 │ 0.27.0 │ low │
│ transformers │ 5.10.1 │ 5.8.1 │ high │
│ triton │ 3.6.0 │ 3.7.0 │ medium │
│ typer │ 0.25.1 │ 0.24.2 │ low │
│ tzdata │ 2026.2 │ 2026.1 │ low │
│ wcwidth │ 0.7.0 │ 0.6.0 │ low │
│ xxhash │ 3.7.0 │ 3.6.0 │ low │
│ zipp │ 4.1.0 │ 3.23.1 │ low │
└────────────────────────┴────────────────────────┴─────────────────┴──────────┘
There is a critical chance that reproduction won't produce a byte-for-byte
identical model. However, the resulting model will very likely still behave
similarly to the original model.
Result: SHA256 not identical for VINAY-UMRETHE/MiniCPM-V-4.6-heretic-AUTO-REPRODUCE-Test1
Test 2
Loading reproduction information from https://huggingface.co/buckets/heretic-org/Heretic-Reproducibles-Storage/resolve/heretic-reproducibles/data/huggingface.co/heretic-org/MiniCPM-V-4.6-heretic-32b28aa.json?download=true...
Your local environment doesn't perfectly match the environment used to produce
the original model. The following components differ:
Package Mismatches
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Package ┃ This system ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ heretic-llm │ 1.3.0-git+https://github.com/p-e-w/heretic.git@3bcb2f6d2b5fd419af855e3d52690bc7cdc58d77 │ 1.3.0 │ critical │
└─────────────┴───────────────────────────────────┴─────────────────┴──────────┘
There is a critical chance that reproduction won't produce a byte-for-byte
identical model. However, the resulting model will very likely still behave
similarly to the original model.
Result: SHA256 again not identical for VINAY-UMRETHE/MiniCPM-V-4.6-heretic-AUTO-REPRODUCE-Test2
Overall
The KLD and refusal count remained the same for both tests, but the models were not byte-identical, as they should have been. I think the most likely reason for it is:
Note
Two more tests are running now, which will manually run trials using
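The mismatch tables in the test output appear to come from comparing package versions between the local and the original environment. A rough sketch of such a comparison follows. It assumes versions are stored as a name-to-version mapping, that a missing package is shown as an empty cell, and that the closing message reports the highest severity found; all of these are assumptions, and `find_mismatches`, `overall_severity`, and `PACKAGE_SEVERITY` are hypothetical names.

```python
from __future__ import annotations

SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Hypothetical per-package severities (values copied from the tables above);
# packages not listed default to "low".
PACKAGE_SEVERITY = {
    "heretic-llm": "critical",
    "torch": "high",
    "transformers": "high",
    "optuna": "medium",
    "triton": "medium",
}


def find_mismatches(local: dict[str, str], original: dict[str, str]) -> list[tuple[str, str, str, str]]:
    """Return (package, this system, original system, severity) for every differing package."""
    rows = []
    for name in sorted(local.keys() | original.keys()):
        mine, theirs = local.get(name, ""), original.get(name, "")  # "" = not installed
        if mine != theirs:
            rows.append((name, mine, theirs, PACKAGE_SEVERITY.get(name, "low")))
    return rows


def overall_severity(rows: list[tuple[str, str, str, str]]) -> str | None:
    """Highest severity among the mismatches, or None if the environments match."""
    if not rows:
        return None
    return max((row[3] for row in rows), key=SEVERITY_ORDER.index)
```

Because CUDA 12 and CUDA 13 wheels have different package names (`nvidia-nccl-cu12` vs `nvidia-nccl-cu13`), a toolkit change shows up as pairs of rows with one empty side, exactly as in the first table.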
I hope that isn't the problem, because that would seem to be very hard to fix. I did check a while ago that the
I just checked and it appears that Python guarantees that floats are preserved perfectly through a
Maybe the decimals are not complete (not full precision), or PyTorch operations or the Heretic version are the issue. The correct way would be: Create a fresh model with the
In that case it's not a problem; it has full precision.
The decimals are complete. Python guarantees this through the "Grisu3/Dragon4" algorithm, apparently.
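The full-precision claim can be checked directly for the JSON path. Since Python 3.1, `repr(float)` returns the shortest string that converts back to the identical float, and `json.dumps` formats finite floats that way. The sketch assumes the values of interest are plain finite Python floats serialized with the standard `json` module; NaN and infinity are not covered.

```python
import json
import random

rng = random.Random(0)
values = [rng.uniform(-1e3, 1e3) for _ in range(10_000)]
values += [0.1, 1 / 3, -0.0, 5e-324, 1.7976931348623157e308]  # a few edge cases

decoded = json.loads(json.dumps(values))

# Compare bit patterns via float.hex() rather than ==, so that a lost
# sign of zero (-0.0 vs 0.0) would also be caught.
assert all(a.hex() == b.hex() for a, b in zip(values, decoded))
```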
Hash verification is now implemented for models uploaded to HF.
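Hash verification boils down to hashing each model file and comparing it against the stored digest. A minimal sketch, assuming the expected digests are available as a filename-to-SHA-256-hex mapping (the actual layout inside reproduce.json isn't shown in this thread, and `sha256_of_file` and `verify_files` are hypothetical names):

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so multi-GB shards never sit in memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_files(directory: Path, expected: dict[str, str]) -> list[str]:
    """Return names of files that are missing or whose SHA-256 differs from `expected`."""
    mismatched = []
    for name, expected_hash in sorted(expected.items()):
        path = directory / name
        if not path.is_file() or sha256_of_file(path) != expected_hash:
            mismatched.append(name)
    return mismatched
```

Reading in fixed-size chunks keeps memory use flat regardless of shard size; on Python 3.11+ `hashlib.file_digest` offers the same thing in one call.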
* fix: Check if a model is gated / accessible
* fix: handle unknown gated models
* feat: Auto install requirements
* simplify
* Revert "simplify"
  This reverts commit 1028792.
* Revert "feat: Auto install requirements"
  This reverts commit f4be1ab.
* fix: Seed pytorch method
* reference, style
* simplify token
* feat: Export strategy in reproduce.json, v2
* style: Name
* simplify export strategy
* style: Rename
* enumeration
* maybe remove seed as well
* fix: don't lock settings with permanent strategy
* simplify no choice, use try/finally block
|
I've now also implemented hash verification for locally saved files, and it appears to work. As for the version, I've decided on the following approach: We indeed bump the version to
The only possible way to reproduce exactly is to run all trials like the original run for
Both options are good, though: (1) is compatible, and (2) is easier and needs less maintenance. But I don't think
On the contrary, it's the most important thing by a huge margin. Every single release so far would have broken reproduction. We have changed prompt alignment, the algorithm for computing residual means, objective computations, and many other things. And 1.4 is no exception, given that we now re-seed before
When the versions of the Heretic package differ, there's essentially no chance to reproduce exactly, and I expect that to continue to be true in the future.
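Why re-seeding matters for reproduction can be illustrated in a few lines: without it, a trial's random draws depend on every draw made by earlier trials, which is why exact reproduction would otherwise require replaying all trials of the original run. The comment above is truncated, so exactly where Heretic re-seeds isn't shown; this sketch assumes per-trial re-seeding, uses Python's `random` as a stand-in for the PyTorch RNG, and its function names and `SEED` value are hypothetical.

```python
import random

SEED = 42  # hypothetical base seed


def trial_without_reseed(rng: random.Random) -> float:
    """Draw from a generator shared across trials (its state carries over)."""
    return rng.random()


def trial_with_reseed(trial_number: int) -> float:
    """Re-seed before the trial, so the draw depends only on the trial number."""
    rng = random.Random(SEED + trial_number)
    return rng.random()


# Without re-seeding, trial 3's value depends on the draws made by trials 0-2,
# so reproducing it means replaying those trials first.
shared = random.Random(SEED)
original_run = [trial_without_reseed(shared) for _ in range(4)]
assert trial_without_reseed(random.Random(SEED)) == original_run[0]  # a fresh start yields trial 0's value

# With re-seeding, any single trial can be re-run in isolation.
reseeded_run = [trial_with_reseed(n) for n in range(4)]
assert trial_with_reseed(3) == reseeded_run[3]
```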
Yes, that's why I said "unless the config.default is changed in any way (a config option or method that was optional in previous versions but is the default in the new one)", so that applies to all future versions.
|
But that's not the main problem. It's not that we changed the config in the past; it's that we have repeatedly changed how Heretic works internally. This is purely a code issue. For example, in 1.3 we introduced computing residual means in batches. That changes the outcome numerically, but it's not configurable.
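Batching changes results because floating-point addition is not associative: regrouping a reduction changes where rounding happens. The sketch below uses plain Python floats as a stand-in for tensors and deliberately extreme values so the effect is visible; it is not Heretic's residual-mean code, and in real activations the differences usually show up only in the last bits.

```python
def sequential_sum(values: list[float]) -> float:
    """Accumulate left to right, like a single unbatched reduction."""
    # A manual loop rather than sum(): since Python 3.12, sum() uses
    # compensated summation for floats, which would hide the effect.
    total = 0.0
    for v in values:
        total += v
    return total


def batched_sum(values: list[float], batch_size: int) -> float:
    """Reduce each batch separately, then accumulate the per-batch partial sums."""
    partials = [
        sequential_sum(values[i : i + batch_size])
        for i in range(0, len(values), batch_size)
    ]
    return sequential_sum(partials)


# Extreme magnitudes make the rounding visible: 1e16 + 1.0 rounds back to 1e16.
values = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(values) / len(values))  # 0.25
print(batched_sum(values, batch_size=2) / len(values))  # 0.0
```

The same mathematical mean comes out as 0.25 or 0.0 depending only on how the additions are grouped, which is why a change like batched residual means breaks byte-for-byte reproduction without any configuration change.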
|
Ok, now I'm going to do a quick test and then this is going in.
That's true, but it would be a lot of hassle, and also require further hacks like making the re-seeding conditional depending on which reproduce version is loaded. I think it's much better to just move forward and accept that v1 models can't be reproduced exactly.
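Accepting that v1 models can't be reproduced exactly still leaves room to tell the user up front. A minimal sketch of such a check, assuming reproduce.json carries a format-version field; the key name `version`, the "missing means v1" default, and `check_format_version` are all hypothetical, and the sketch only warns rather than changing any re-seeding behavior.

```python
import warnings

# Hypothetical: first reproduce.json format version expected to allow
# byte-for-byte reproduction.
FIRST_EXACT_VERSION = 2


def check_format_version(info: dict) -> int:
    """Warn when reproduce.json predates the format that supports exact reproduction."""
    version = int(info.get("version", 1))  # hypothetical key; missing is treated as v1
    if version < FIRST_EXACT_VERSION:
        warnings.warn(
            f"reproduce.json format v{version}: the model can only be reproduced "
            "approximately, not byte-for-byte."
        )
    return version
```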
I meant the same thing: Heretic's internals change between versions. Maybe "config" was the wrong wording.
|
|
|
What update do you mean? Just the information that it's now possible to use
Test results
* https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic (original model)
* https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic-REPRODUCED (reproduced on the same system; perfect match for both the HF upload and the local save)
* https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic-REPRODUCED-2 (reproduced on a system with a different GPU; 2 out of 4 shards differ, likely those containing the layers the LoRA was applied to)
System Mismatches
So here we have definitive proof that reproduction isn't guaranteed even if the only differences are in the hardware.
|
Why check the hashes? Maybe going through all the tensors and calling "allclose" with a configurable epsilon could be a better solution?
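The difference between the two approaches is strictness: a hash changes on any single-bit difference, while an `allclose`-style check tolerates small numerical drift. The sketch below represents a tensor as a plain list of floats; `digest` and `allclose` are hypothetical helpers, the latter using the same elementwise tolerance formula documented for `numpy.allclose` and `torch.allclose` (without their NaN handling or broadcasting).

```python
import hashlib
import math
import struct


def digest(values: list[float]) -> str:
    # Hash the exact little-endian float64 bytes, as a file hash effectively does.
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()


def allclose(a: list[float], b: list[float], rtol: float = 1e-5, atol: float = 1e-8) -> bool:
    # Elementwise criterion: |a - b| <= atol + rtol * |b|
    return len(a) == len(b) and all(abs(x - y) <= atol + rtol * abs(y) for x, y in zip(a, b))


original = [0.1, 0.2, 0.3]
reproduced = [0.1, 0.2, math.nextafter(0.3, 1.0)]  # differs by one ulp (Python 3.9+)

print(digest(original) == digest(reproduced))  # False
print(allclose(original, reproduced))  # True
```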
|
For easier Hugging Face uploads (to avoid timeouts), people can shard the model into smaller partitions, and then there might not even be matching shards at all.
|
The shard size is configurable in Heretic, and the configuration is loaded during reproduction, which ensures the shards are exactly the same. |
The problem is that when uploading to HF, finding the model files on the local disk is pretty tricky, so re-downloading is the safest solution, which sucks. |
|
I do like the idea of doing a tensor check in principle though. |
|
Ah wait, the tensor comparison idea actually doesn't work at all. The whole point of the reproduction mechanism is that the
So we can't reliably compare tensors, because there might not be tensors to compare to. By contrast, the hashes are stored inside the
|
Merged! I added a note to the generated reproduction README explaining how to use this mechanism. |
Automatic model reproduction from the reproduce.json file generated during model upload, if the user chooses to add reproducibility information.
Features: