feat: automatically reproduce model from reproduce.json by p-e-w · Pull Request #326 · p-e-w/heretic

p-e-w · 2026-05-12T06:45:18Z

Adds automatic model reproduction from the reproduce.json file that is generated during model upload if the user chooses to include reproducibility information.

Features:

- Supports loading either a local file or an HTTP URL, with special handling for Hugging Face, GitHub, and Codeberg URLs.
- Checks the local environment for mismatches with the original environment, and evaluates the likelihood of reproduction failure if there are any.
- Validates the reproduced model against the original model's checksums after export:
  - For models uploaded to Hugging Face.
  - For models saved locally.
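Below is a minimal sketch of the loading step. It assumes only two things from this thread. The first is the URL rewrites quoted in the review diff: /blob/ to /raw/ for Hugging Face and GitHub, and /src/branch/ to /raw/branch/ for Codeberg. The second is the suggested context manager with a 10-second timeout. The function names `to_raw_url` and `load_reproduce_json` are hypothetical and are not Heretic's actual API.

```python
import json
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlopen


def to_raw_url(url: str) -> str:
    """Rewrite a web-view URL into a raw-download URL (hypothetical helper).

    Mirrors the substring replacements quoted in the review diff.
    """
    url = url.replace("/blob/", "/raw/")  # Hugging Face, GitHub
    url = url.replace("/src/branch/", "/raw/branch/")  # Codeberg
    return url


def load_reproduce_json(path: str, timeout: float = 10) -> dict:
    """Load reproduction information from a local file or an HTTP(S) URL."""
    if urlparse(path).scheme in ("http", "https"):
        # The context manager closes the connection, and the timeout
        # keeps the process from hanging forever on network problems.
        with urlopen(to_raw_url(path), timeout=timeout) as response:
            json_str = response.read().decode("utf-8")
    else:
        json_str = Path(path).read_text(encoding="utf-8")
    return json.loads(json_str)
```

The replacements are plain substring substitutions, so they would also rewrite an unrelated URL that happens to contain /blob/. A stricter variant could check the host before rewriting.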

gemini-code-assist

Code Review

This pull request introduces a reproduction feature that allows loading model reproduction information from a local file or URL via a new --reproduce CLI flag. The changes include updates to the configuration settings, CLI argument handling, and a new utility function for fetching and parsing JSON data. Feedback focuses on adhering to the repository style guide by updating the default configuration file and formatting comments correctly. Additionally, it is recommended to use a context manager and timeout for network requests to ensure better resource handling and process stability.

gemini-code-assist · 2026-05-12T06:46:52Z

+        path = path.replace("/blob/", "/raw/")  # Hugging Face, GitHub
+        path = path.replace("/src/branch/", "/raw/branch/")  # Codeberg
+
+        json_str = urlopen(path).read().decode("utf-8")


It is recommended to use a context manager with urlopen to ensure the connection is properly closed. Additionally, adding a timeout prevents the process from hanging indefinitely on network issues.

Suggested change

-        json_str = urlopen(path).read().decode("utf-8")
+        with urlopen(path, timeout=10) as response:
+            json_str = response.read().decode("utf-8")

Fair enough.

Why not use the hf_hub_download or snapshot_download methods of huggingface_hub?

I guess the reason is that a raw URL provides a broader range of download sources, though the above methods would work if we just specified a repo ID.

Yes, we also want to support files stored on GitHub and Codeberg, where a public archive of reproduce.json files will be.

HF is kind of the least useful here, because if the repo is still available there's really no need to reproduce, you can just download the model directly.

Understood, that's correct


p-e-w · 2026-06-04T06:08:01Z

@Vinay-Umrethe

This is ready for testing! AFAICT, it appears to work, though the situation with cloud GPUs is currently catastrophic so I haven't been able to comprehensively test in different environments.

Checksum validation is still missing.

Vinay-Umrethe · 2026-06-04T11:51:28Z

Cloud Test

Base model (reproduction-compatible):

heretic-org/MiniCPM-V-4.6-heretic

Test 1

Loading reproduction information from https://huggingface.co/buckets/heretic-org/Heretic-Reproducibles-Storage/resolve/heretic-reproducibles/data/huggingface.co/heretic-org/MiniCPM-V-4.6-heretic-32b28aa.json?download=true...

Your local environment doesn't perfectly match the environment used to produce the original model. The following components differ:

System Mismatches
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Component    ┃ This system ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ CUDA Version │ 12.8        │ 13.0            │ medium   │
└──────────────┴─────────────┴─────────────────┴──────────┘

Package Mismatches
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Package                ┃ This system            ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ certifi                │ 2026.5.20              │ 2026.4.22       │ low      │
│ click                  │ 8.4.0                  │ 8.3.3           │ low      │
│ cuda-bindings          │ 12.9.6                 │ 13.2.0          │ low      │
│ cuda-pathfinder        │ 1.5.4                  │ 1.5.3           │ low      │
│ cuda-toolkit           │ 12.8.1                 │ 13.0.2          │ low      │
│ greenlet               │ 3.5.1                  │ 3.4.0           │ low      │
│ heretic-llm            │ 1.3.0-git+https://gith │ 1.3.0           │ critical │
│                        │ ub.com/p-e-w/heretic.g │                 │          │
│                        │ it@3bcb2f6d2b5fd419af8 │                 │          │
│                        │ 55e3d52690bc7cdc58d77  │                 │          │
│ hf-xet                 │ 1.5.0                  │ 1.4.3           │ low      │
│ huggingface-hub        │ 1.16.1                 │ 1.11.0          │ low      │
│ idna                   │ 3.15                   │ 3.13            │ low      │
│ kernels-data           │ 0.15.2                 │ 0.14.1          │ low      │
│ lxml                   │ 6.1.1                  │ 6.1.0           │ low      │
│ mako                   │ 1.3.12                 │ 1.3.11          │ low      │
│ markdown-it-py         │ 4.2.0                  │ 4.0.0           │ low      │
│ numpy                  │ 2.4.6                  │ 2.4.4           │ low      │
│ nvidia-cublas          │                        │ 13.1.1.3        │ low      │
│ nvidia-cublas-cu12     │ 12.8.4.1               │                 │ low      │
│ nvidia-cuda-nvrtc      │                        │ 13.0.88         │ low      │
│ nvidia-cudnn-cu12      │ 9.19.0.56              │                 │ low      │
│ nvidia-cudnn-cu13      │                        │ 9.20.0.48       │ low      │
│ nvidia-cusparselt-cu12 │ 0.7.1                  │                 │ low      │
│ nvidia-cusparselt-cu13 │                        │ 0.8.1           │ low      │
│ nvidia-nccl-cu12       │ 2.28.9                 │                 │ low      │
│ nvidia-nccl-cu13       │                        │ 2.29.7          │ low      │
│ nvidia-nvshmem-cu12    │ 3.4.5                  │                 │ low      │
│ nvidia-nvshmem-cu13    │                        │ 3.4.5           │ low      │
│ optuna                 │ 4.9.0                  │ 4.8.0           │ medium   │
│ packaging              │ 26.2                   │ 26.1            │ low      │
│ pydantic-settings      │ 2.14.1                 │ 2.14.0          │ low      │
│ torch                  │ 2.11.0+cu128           │ 2.12.0+cu130    │ high     │
│ torchvision            │ 0.26.0                 │ 0.27.0          │ low      │
│ transformers           │ 5.10.1                 │ 5.8.1           │ high     │
│ triton                 │ 3.6.0                  │ 3.7.0           │ medium   │
│ typer                  │ 0.25.1                 │ 0.24.2          │ low      │
│ tzdata                 │ 2026.2                 │ 2026.1          │ low      │
│ wcwidth                │ 0.7.0                  │ 0.6.0           │ low      │
│ xxhash                 │ 3.7.0                  │ 3.6.0           │ low      │
│ zipp                   │ 4.1.0                  │ 3.23.1          │ low      │
└────────────────────────┴────────────────────────┴─────────────────┴──────────┘

There is a critical chance that reproduction won't produce a byte-for-byte identical model. However, the resulting model will very likely still behave similarly to the original model.

Result:

The SHA256 hash of model.safetensors didn't match. This test was conducted with the default pre-installed packages as-is, without installing requirements.txt.

VINAY-UMRETHE/MiniCPM-V-4.6-heretic-AUTO-REPRODUCE-Test1

Test 2

Loading reproduction information from https://huggingface.co/buckets/heretic-org/Heretic-Reproducibles-Storage/resolve/heretic-reproducibles/data/huggingface.co/heretic-org/MiniCPM-V-4.6-heretic-32b28aa.json?download=true...

Your local environment doesn't perfectly match the environment used to produce the original model. The following components differ:

Package Mismatches
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Package     ┃ This system                       ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ heretic-llm │ 1.3.0-git+https://github.com/p-e- │ 1.3.0           │ critical │
│             │ w/heretic.git@3bcb2f6d2b5fd419af8 │                 │          │
│             │ 55e3d52690bc7cdc58d77             │                 │          │
└─────────────┴───────────────────────────────────┴─────────────────┴──────────┘

There is a critical chance that reproduction won't produce a byte-for-byte identical model. However, the resulting model will very likely still behave similarly to the original model.

Result:

Again, the SHA256 hash of model.safetensors didn't match. This test was conducted after installing all packages from the base model's reproduce/requirements.txt.

VINAY-UMRETHE/MiniCPM-V-4.6-heretic-AUTO-REPRODUCE-Test2

Overall

The KLD and refusal count remained the same in both tests, but the models weren't byte-identical, as they should have been.

I think the most likely reason is Optuna's create_trial method, which re-creates the model from the raw "parameters" key in reproduce.json.

Note

Two more tests are running now, which will manually run trials using the config.toml file: one where the only (critical) mismatch is the heretic-llm version itself (test 3), and one with the exact Heretic version (test 4).
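The mismatch tables in the test runs above amount to a diff of two name-to-version maps, followed by taking the worst severity. This is an illustrative sketch only. The severity assignments in `PACKAGE_SEVERITY` are hypothetical and loosely mirror the tables; they are not Heretic's actual rules. The function names are also made up.

```python
from typing import Optional

# Severity levels from least to most severe, as printed in the tables.
SEVERITIES = ["low", "medium", "high", "critical"]

# Hypothetical mapping for illustration; unlisted packages default to "low".
PACKAGE_SEVERITY = {
    "heretic-llm": "critical",
    "torch": "high",
    "transformers": "high",
    "optuna": "medium",
    "triton": "medium",
}

# (package, version on this system, version on original system, severity)
Row = tuple[str, Optional[str], Optional[str], str]


def package_mismatches(local: dict[str, str], original: dict[str, str]) -> list[Row]:
    """List every package whose version differs between the two environments.

    A package installed on only one side gets None on the other side,
    like the blank cells for nvidia-cublas vs. nvidia-cublas-cu12.
    """
    rows: list[Row] = []
    for name in sorted(local.keys() | original.keys()):
        here, there = local.get(name), original.get(name)
        if here != there:
            rows.append((name, here, there, PACKAGE_SEVERITY.get(name, "low")))
    return rows


def overall_severity(rows: list[Row]) -> Optional[str]:
    """Return the worst severity among all mismatches, or None if nothing differs."""
    if not rows:
        return None
    return max((row[3] for row in rows), key=SEVERITIES.index)
```

Under this aggregation, a single critical row such as the heretic-llm version is enough to make the summary report a "critical" chance, even when every other row is low.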

p-e-w · 2026-06-04T12:55:45Z

> I think the most likely reason is Optuna's create_trial method, which re-creates the model from the raw "parameters" key in reproduce.json.

I hope that isn't the problem, because that would seem to be very hard to fix. I did check a while ago that the parameters are stored with full precision, so I don't know why they wouldn't be identical.

p-e-w · 2026-06-04T13:01:47Z

I just checked and it appears that Python guarantees that floats are preserved perfectly through a json.dumps/json.loads round trip, as long as the float is not infinity/NaN, and a modern Python version is used. So this cannot be the issue.
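The claim is easy to check in a script. The parameter names and values below are made up. The point is that json.dumps writes each finite float using its shortest round-tripping repr, and json.loads parses that back to the identical double. Comparing `float.hex()` checks bit-for-bit equality.

```python
import json
import math
import random

# Made-up parameters with full-precision floats, standing in for the
# trial parameters stored in reproduce.json (names are hypothetical).
rng = random.Random(0)
params = {f"param_{i}": rng.uniform(-10.0, 10.0) for i in range(1000)}
params["third"] = 1 / 3
params["smallest_subnormal"] = 5e-324

restored = json.loads(json.dumps(params))

# Every finite float comes back as exactly the same double, bit for bit.
assert all(restored[name].hex() == value.hex() for name, value in params.items())

# NaN is the exception. json.dumps writes the non-standard token NaN, which
# strict JSON parsers reject. And since NaN != NaN, an equality check can
# never confirm that it round-tripped.
nan_back = json.loads(json.dumps(float("nan")))
assert math.isnan(nan_back) and nan_back != nan_back
```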

Vinay-Umrethe · 2026-06-04T13:01:49Z

> I hope that isn't the problem, because that would seem to be very hard to fix. I did check a while ago that the parameters are stored with full precision, so I don't know why they wouldn't be identical.

Maybe the decimals are not complete (not full precision), or PyTorch operations or the Heretic version are the issue.

The correct way would be:

Create a fresh model with reproduce/ information using the auto-reproduce branch, and then test it, so that the last remaining mismatch (where the Heretic versions differ) is also eliminated.

Vinay-Umrethe · 2026-06-04T13:02:45Z

> I just checked and it appears that Python guarantees that floats are preserved perfectly through a json.dumps/json.loads round trip, as long as the float is not infinity/NaN, and a modern Python version is used. So this cannot be the issue.

In that case it's not a problem; it has full precision.

p-e-w · 2026-06-04T13:02:56Z

> Maybe the decimals are not complete (not full precision), or PyTorch operations or the Heretic version are the issue.

The decimals are complete. Python guarantees this through the "Grisu3/Dragon4" algorithm, apparently.

p-e-w · 2026-06-05T12:12:28Z

Hash verification is now implemented for models uploaded to HF.

* fix: Check if a model is gated / accessible
* fix: handle unknown gated models
* feat: Auto install requirements
* simplify
* Revert "simplify" This reverts commit 1028792.
* Revert "feat: Auto install requirements" This reverts commit f4be1ab.
* fix: Seed pytorch method
* reference, style
* simplify token
* feat: Export strategy in reproduce.json, v2
* style: Name
* simplify export strategy
* style: Rename
* enumeration
* maybe remove seed as well
* fix: don't lock settings with permanent strategy
* simplify no choice, use try/finally block

p-e-w · 2026-06-10T14:58:14Z

@Vinay-Umrethe

I've now also implemented hash verification for locally saved files, and it appears to work.

As for the version, I've decided on the following approach: We indeed bump the version to 2, and we only verify hashes if the version isn't 1 (because for those files there is no chance of exact reproduction anyway). At the same time, since the upcoming Heretic 1.4 will be a mismatch for all previously exported reproduce.json files, the user will be told that there is a "critical" chance of non-exact reproduction. So everything will work correctly, I think.
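The version gate described here could look like the small sketch below. The `"version"` key name and the choice to treat a file without a version as version 1 are assumptions, not Heretic's actual reproduce.json schema.

```python
# Hypothetical sketch: the "version" key and the default for files without
# one are assumptions, not Heretic's actual reproduce.json schema.
def should_verify_hashes(reproduce_info: dict) -> bool:
    """Return True if the exported weights are expected to match exactly.

    Version 1 files were written by Heretic releases whose numerics differ
    from the current one (e.g. 1.4 re-seeds before svd_lowrank), so exact
    reproduction is impossible and the hash check is skipped.
    """
    version = reproduce_info.get("version", 1)  # assume missing means oldest
    return version != 1
```

Skipping the check for v1 files does not hide anything from the user. The environment comparison still reports the differing heretic-llm version as a critical mismatch.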

Vinay-Umrethe · 2026-06-10T19:15:37Z

> We indeed bump the version to 2, and we only verify hashes if the version isn't 1 (because for those files there is no chance of exact reproduction anyway).

The only possible way to reproduce v1 exactly is to run all trials like the original run, using the settings as suggested previously, instead of simply skipping hash verification.

> At the same time, since the upcoming Heretic 1.4 will be a mismatch for all previously exported reproduce.json files, the user will be told that there is a "critical" chance of non-exact reproduction. So everything will work correctly, I think.

Then again, --reproduce will be a feature of the new release, which is a good justification for moving on to v2 of reproduce.json.

So both options are good: (1) is compatible, and (2) is easier and needs less maintenance.

But I don't think heretic-llm's version really matters that much for reproduction, unless config.default is changed in any way (e.g. a config option or method that was optional in previous versions becomes the default in a new one).

p-e-w · 2026-06-11T05:08:24Z

> But I don't think heretic-llm's version really matters that much for reproduction

On the contrary, it's the most important thing by a huge margin.

Every single release so far would have broken reproduction. We have changed prompt alignment, the algorithm for computing residual means, objective computations, and many other things. And 1.4 is no exception, given that we now re-seed before svd_lowrank, which once again breaks reproduction.

When the versions of the Heretic package differ, there's essentially no chance to reproduce exactly, and I expect that to continue to be true in the future.

Vinay-Umrethe · 2026-06-11T05:40:07Z

> We have changed prompt alignment, the algorithm for computing residual means, objective computations, and many other things. And 1.4 is no exception,

Yes, that's why I said "unless config.default is changed in any way (a config / method which was optional in previous versions but is the default in a new one)". That applies to all future versions.

p-e-w · 2026-06-11T05:51:43Z

But that's not the main problem. It's not that we changed the config in the past; we have repeatedly changed how Heretic works internally. This is purely a code issue. For example, in 1.3 we introduced computing residual means in batches. That changes the outcome numerically, but it's not configurable.
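Batching changes results because floating-point addition is not associative. Grouping the same values differently can change the sum, even with no configuration change at all. The toy illustration below uses Python doubles and made-up values. It is not Heretic's actual residual-mean code, which runs on GPU tensors, often in lower precision where the effect is larger.

```python
def naive_sum(values: list[float]) -> float:
    # Plain left-to-right accumulation. Built-in sum() is avoided because
    # it uses compensated summation for floats on Python 3.12+.
    total = 0.0
    for v in values:
        total += v
    return total


def mean_all_at_once(values: list[float]) -> float:
    return naive_sum(values) / len(values)


def mean_in_batches(values: list[float], batch_size: int) -> float:
    # Sum each batch separately, then combine the partial sums.
    partials = [
        naive_sum(values[i : i + batch_size])
        for i in range(0, len(values), batch_size)
    ]
    return naive_sum(partials) / len(values)


# Near 1e16, adjacent doubles are 2 apart, so adding 1.0 can round away.
values = [1.0, 1e16, -1e16, 1.0]
print(mean_all_at_once(values))    # 0.25
print(mean_in_batches(values, 2))  # 0.0
```

Both functions compute "the mean" of the same list, yet they disagree. This is why an internal change like batching breaks byte-for-byte reproduction even though it exposes no setting.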

p-e-w · 2026-06-11T06:14:29Z

Ok, now I'm going to do a quick test and then this is going in.

> The only possible way to reproduce v1 exactly is to run all trials like the original run, using the settings as suggested previously, instead of simply skipping hash verification.

That's true, but it would be a lot of hassle, and also require further hacks like making the re-seeding conditional depending on which reproduce version is loaded. I think it's much better to just move forward and accept that v1 models can't be reproduced exactly.

Vinay-Umrethe · 2026-06-11T06:25:06Z

> It's not that we changed the config in the past; we have repeatedly changed how Heretic works internally.

I meant the same thing: Heretic's internals change between versions. Maybe "config" was the wrong wording.

Vinay-Umrethe · 2026-06-11T06:27:42Z

The reproduce/README.md instructions need an update.

p-e-w · 2026-06-11T06:37:22Z

What update do you mean? Just the information that it's now possible to use --reproduce?

p-e-w · 2026-06-11T07:39:37Z

Test results

https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic (original model)

https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic-REPRODUCED (reproduced on the same system, perfect match both for HF upload and local save)

https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic-REPRODUCED-2 (reproduced on a system with a different GPU; 2 out of 4 shards differ, likely those containing the layers the LoRA was applied to)

System Mismatches

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Component ┃ This system                     ┃ Original system                  ┃ Severity ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ CPU       │ AMD EPYC 9554 64-Core Processor │ AMD EPYC 7713P 64-Core Processor │ low      │
│ Devices   │ NVIDIA H100 80GB HBM3           │ NVIDIA RTX PRO 4500 Blackwell    │ medium   │
└───────────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘

So here we have definitive proof that reproduction isn't guaranteed even if the only differences are in the hardware.

Verifying hashes of weight files...
model-00001-of-00004.safetensors: Hash matches
model-00002-of-00004.safetensors: Hash doesn't match
model-00003-of-00004.safetensors: Hash doesn't match
model-00004-of-00004.safetensors: Hash matches
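Here is a minimal sketch of what this verification step amounts to. It assumes reproduce.json records a mapping from weight file name to SHA256 hex digest; that schema, and the function names, are assumptions. Files are hashed in chunks so that multi-gigabyte shards don't have to fit in memory, and a missing file counts as a mismatch.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in 1 MiB chunks so large shards never have to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hashes(model_dir: Path, expected: dict[str, str]) -> dict[str, bool]:
    """Check each weight file against the SHA256 hex digest recorded for it.

    `expected` maps file names such as "model-00001-of-00004.safetensors"
    to hex digests (an assumed stand-in for the hashes in reproduce.json).
    """
    results: dict[str, bool] = {}
    for name, digest in sorted(expected.items()):
        file = model_dir / name
        ok = file.is_file() and sha256_of_file(file) == digest
        results[name] = ok
        status = "Hash matches" if ok else "Hash doesn't match"
        print(f"{name}: {status}")
    return results
```

The check is per file, so it shows which shards diverged, not only that the model as a whole differs.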

kabachuha · 2026-06-11T07:45:03Z

Why check the hashes? Maybe going through all the tensors and running "allclose" with a configurable epsilon would be a better solution?

kabachuha · 2026-06-11T07:46:49Z

For easier Hugging Face uploads (to avoid timeouts), people may shard the model into smaller partitions, and then there might not even be any matching shards at all.

p-e-w · 2026-06-11T07:47:45Z

The shard size is configurable in Heretic, and the configuration is loaded during reproduction, which ensures the shards are exactly the same.

p-e-w · 2026-06-11T07:49:31Z

> Why check the hashes? Maybe going through all the tensors and running "allclose" with a configurable epsilon would be a better solution?

The problem is that when uploading to HF, finding the model files on the local disk is pretty tricky, so re-downloading is the safest solution, which sucks.

p-e-w · 2026-06-11T07:51:20Z

I do like the idea of doing a tensor check in principle though.

p-e-w · 2026-06-11T08:05:40Z

Ah wait, the tensor comparison idea actually doesn't work at all. The whole point of the reproduction mechanism is that the reproduce.json file is self-contained. In fact, it is (among other situations) made for the case where the model itself doesn't exist anymore, so it can be restored.

So we can't reliably compare tensors, because there might not be tensors to compare to. By contrast, the hashes are stored inside the reproduce.json file, and can thus always be verified.

p-e-w · 2026-06-11T09:20:14Z

Merged! I added a note to the generated reproduction README explaining how to use this mechanism.

feat: load reproduction information

4df78d7

gemini-code-assist Bot reviewed May 12, 2026


feat: check reproduction environment against original environment

223d0f8

p-e-w mentioned this pull request May 28, 2026

feat: ARA, but it's LoRA #332

Open

p-e-w added 2 commits May 29, 2026 11:38

fix: remove trust_remote_code setting

96e4d2b

This improves security when running Heretic with an untrusted config file. The prompt is now always shown. This is NOT a breaking change, because we currently ignore values for unknown settings, so existing configs continue to work.

feat: reproduce model from JSON file

3bcb2f6

p-e-w changed the title ~~[WIP] feat: automatically reproduce model from reproduce.json~~ feat: automatically reproduce model from reproduce.json Jun 4, 2026

p-e-w marked this pull request as ready for review June 4, 2026 06:06

feat: verify hashes of uploaded weight files

bd3bf2c

p-e-w mentioned this pull request Jun 7, 2026

fix: fix issues in automatic reproduction system #352

Merged

feat: verify hashes of locally saved weight files

184b828

p-e-w added 2 commits June 11, 2026 11:32

Merge branch 'master' into auto-reproduce

10fef8f

fix: remove obsolete code from merge

eada28e

docs: add automatic reproduction instructions to reproduce README

e9bcd86

p-e-w merged commit 2fd163f into master Jun 11, 2026
4 checks passed
