
feat: automatically reproduce model from reproduce.json#326

Merged
p-e-w merged 10 commits into master from auto-reproduce
Jun 11, 2026

Conversation

@p-e-w

@p-e-w p-e-w commented May 12, 2026

Owner

Automatic model reproduction from the reproduce.json file generated during model upload if the user chooses to add reproducibility information.

Features:

  • Supports loading either a local file or an HTTP URL, with special handling for Hugging Face, GitHub, and Codeberg URLs.
  • Checks the local environment for mismatches with the original environment, and evaluates the likelihood of reproduction failure if there are any.
  • Validates the reproduced model against the original model's checksums after export.
    • For models uploaded to Hugging Face.
    • For models saved locally.
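
The URL handling in the first feature can be sketched as follows. This is a minimal illustration based on the two `replace` calls quoted in the review thread on this PR; the function name `to_raw_url` and the example repositories are hypothetical and not taken from Heretic's code.

```python
def to_raw_url(url: str) -> str:
    """Rewrite a web-view file URL into a raw-download URL.

    Hypothetical helper: it mirrors the two string replacements
    quoted in this PR's review thread.
    """
    # Hugging Face and GitHub show files under /blob/; raw content is under /raw/.
    url = url.replace("/blob/", "/raw/")
    # Codeberg shows files under /src/branch/; raw content is under /raw/branch/.
    url = url.replace("/src/branch/", "/raw/branch/")
    return url


if __name__ == "__main__":
    # Example repositories are made up for illustration.
    print(to_raw_url("https://github.com/example/archive/blob/main/reproduce.json"))
    print(to_raw_url("https://codeberg.org/example/archive/src/branch/main/reproduce.json"))
```

URLs that already point at raw content contain neither pattern and pass through unchanged.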

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request introduces a reproduction feature that allows loading model reproduction information from a local file or URL via a new --reproduce CLI flag. The changes include updates to the configuration settings, CLI argument handling, and a new utility function for fetching and parsing JSON data. Feedback focuses on adhering to the repository style guide by updating the default configuration file and formatting comments correctly. Additionally, it is recommended to use a context manager and timeout for network requests to ensure better resource handling and process stability.

Comment thread src/heretic/config.py
Comment thread src/heretic/reproduce.py
Comment thread src/heretic/reproduce.py
path = path.replace("/blob/", "/raw/") # Hugging Face, GitHub
path = path.replace("/src/branch/", "/raw/branch/") # Codeberg

json_str = urlopen(path).read().decode("utf-8")

Contributor

medium

It is recommended to use a context manager with urlopen to ensure the connection is properly closed. Additionally, adding a timeout prevents the process from hanging indefinitely on network issues.

Suggested change
- json_str = urlopen(path).read().decode("utf-8")
+ with urlopen(path, timeout=10) as response:
+     json_str = response.read().decode("utf-8")

Owner Author

Fair enough.
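
Put together, loading the file from either a local path or a URL might look roughly like this. It is a sketch under assumptions: the function name `load_reproduce_info` and the 10-second default are illustrative, and the merged implementation may differ.

```python
import json
from pathlib import Path
from urllib.request import urlopen


def load_reproduce_info(path: str, timeout: float = 10.0) -> dict:
    """Load reproduction information from a local file or an HTTP(S) URL.

    Hypothetical helper, not Heretic's actual function.
    """
    if path.startswith(("http://", "https://")):
        # The context manager closes the connection even if reading fails,
        # and the timeout keeps the process from hanging on network issues.
        with urlopen(path, timeout=timeout) as response:
            json_str = response.read().decode("utf-8")
    else:
        json_str = Path(path).read_text(encoding="utf-8")
    return json.loads(json_str)
```

If the timeout is exceeded, `urlopen` raises an exception instead of blocking, so the caller can report the failure and exit cleanly.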

Contributor

Why not use the hf_hub_download or snapshot_download methods of huggingface_hub?

I guess the reason is that a raw URL allows a broader range of download sources, though the above methods would work if we just specified a repo ID.

Owner Author

Yes, we also want to support files stored on GitHub and Codeberg, where a public archive of reproduce.json files will be.

Owner Author

HF is kind of the least useful here, because if the repo is still available there's really no need to reproduce, you can just download the model directly.

Contributor

Understood, that's correct

p-e-w added 2 commits May 29, 2026 11:38
This improves security when running Heretic with an untrusted config file. The prompt is now always shown.

This is NOT a breaking change, because we currently ignore values for unknown settings, so existing configs continue to work.
@p-e-w p-e-w changed the title from "[WIP] feat: automatically reproduce model from reproduce.json" to "feat: automatically reproduce model from reproduce.json" on Jun 4, 2026
@p-e-w p-e-w marked this pull request as ready for review June 4, 2026 06:06
@p-e-w

p-e-w commented Jun 4, 2026

Owner Author

@Vinay-Umrethe

This is ready for testing! AFAICT, it appears to work, though the situation with cloud GPUs is currently catastrophic so I haven't been able to comprehensively test in different environments.

Checksum validation is still missing.

@Vinay-Umrethe

Contributor

Cloud Test

Base Model (Reproducible Compatible):

heretic-org/MiniCPM-V-4.6-heretic

Test 1

Loading reproduction information from 
https://huggingface.co/buckets/heretic-org/Heretic-Reproducibles-Storage/resolve
/heretic-reproducibles/data/huggingface.co/heretic-org/MiniCPM-V-4.6-heretic-32b
28aa.json?download=true...

Your local environment doesn't perfectly match the environment used to produce 
the original model. The following components differ:

System Mismatches
┏━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Component    ┃ This system ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ CUDA Version │ 12.8        │ 13.0            │ medium   │
└──────────────┴─────────────┴─────────────────┴──────────┘

Package Mismatches
┏━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Package                ┃ This system            ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ certifi                │ 2026.5.20              │ 2026.4.22       │ low      │
│ click                  │ 8.4.0                  │ 8.3.3           │ low      │
│ cuda-bindings          │ 12.9.6                 │ 13.2.0          │ low      │
│ cuda-pathfinder        │ 1.5.4                  │ 1.5.3           │ low      │
│ cuda-toolkit           │ 12.8.1                 │ 13.0.2          │ low      │
│ greenlet               │ 3.5.1                  │ 3.4.0           │ low      │
│ heretic-llm            │ 1.3.0-git+https://gith │ 1.3.0           │ critical │
│                        │ ub.com/p-e-w/heretic.g │                 │          │
│                        │ it@3bcb2f6d2b5fd419af8 │                 │          │
│                        │ 55e3d52690bc7cdc58d77  │                 │          │
│ hf-xet                 │ 1.5.0                  │ 1.4.3           │ low      │
│ huggingface-hub        │ 1.16.1                 │ 1.11.0          │ low      │
│ idna                   │ 3.15                   │ 3.13            │ low      │
│ kernels-data           │ 0.15.2                 │ 0.14.1          │ low      │
│ lxml                   │ 6.1.1                  │ 6.1.0           │ low      │
│ mako                   │ 1.3.12                 │ 1.3.11          │ low      │
│ markdown-it-py         │ 4.2.0                  │ 4.0.0           │ low      │
│ numpy                  │ 2.4.6                  │ 2.4.4           │ low      │
│ nvidia-cublas          │                        │ 13.1.1.3        │ low      │
│ nvidia-cublas-cu12     │ 12.8.4.1               │                 │ low      │
│ nvidia-cuda-nvrtc      │                        │ 13.0.88         │ low      │
│ nvidia-cudnn-cu12      │ 9.19.0.56              │                 │ low      │
│ nvidia-cudnn-cu13      │                        │ 9.20.0.48       │ low      │
│ nvidia-cusparselt-cu12 │ 0.7.1                  │                 │ low      │
│ nvidia-cusparselt-cu13 │                        │ 0.8.1           │ low      │
│ nvidia-nccl-cu12       │ 2.28.9                 │                 │ low      │
│ nvidia-nccl-cu13       │                        │ 2.29.7          │ low      │
│ nvidia-nvshmem-cu12    │ 3.4.5                  │                 │ low      │
│ nvidia-nvshmem-cu13    │                        │ 3.4.5           │ low      │
│ optuna                 │ 4.9.0                  │ 4.8.0           │ medium   │
│ packaging              │ 26.2                   │ 26.1            │ low      │
│ pydantic-settings      │ 2.14.1                 │ 2.14.0          │ low      │
│ torch                  │ 2.11.0+cu128           │ 2.12.0+cu130    │ high     │
│ torchvision            │ 0.26.0                 │ 0.27.0          │ low      │
│ transformers           │ 5.10.1                 │ 5.8.1           │ high     │
│ triton                 │ 3.6.0                  │ 3.7.0           │ medium   │
│ typer                  │ 0.25.1                 │ 0.24.2          │ low      │
│ tzdata                 │ 2026.2                 │ 2026.1          │ low      │
│ wcwidth                │ 0.7.0                  │ 0.6.0           │ low      │
│ xxhash                 │ 3.7.0                  │ 3.6.0           │ low      │
│ zipp                   │ 4.1.0                  │ 3.23.1          │ low      │
└────────────────────────┴────────────────────────┴─────────────────┴──────────┘

There is a critical chance that reproduction won't produce a byte-for-byte 
identical model. However, the resulting model will very likely still behave 
similarly to the original model.
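
The report above can be read as a per-component comparison plus an overall risk level equal to the worst individual severity. The following is a rough sketch of that idea only. The severity assigned to each package is read off the table above, not taken from Heretic's code, and all names (`PACKAGE_SEVERITY`, `find_mismatches`, `overall_severity`) are hypothetical.

```python
SEVERITY_ORDER = ["low", "medium", "high", "critical"]

# Assumption: severities inferred from the table above; everything else is "low".
PACKAGE_SEVERITY = {
    "heretic-llm": "critical",
    "torch": "high",
    "transformers": "high",
    "optuna": "medium",
    "triton": "medium",
}


def find_mismatches(local: dict, original: dict) -> list:
    """Return (package, local version, original version, severity) tuples."""
    mismatches = []
    for name in sorted(set(local) | set(original)):
        here, there = local.get(name), original.get(name)
        if here != there:  # also catches packages missing on one side
            severity = PACKAGE_SEVERITY.get(name, "low")
            mismatches.append((name, here, there, severity))
    return mismatches


def overall_severity(mismatches: list):
    """The overall risk is the worst severity among all mismatches."""
    if not mismatches:
        return None
    return max((m[3] for m in mismatches), key=SEVERITY_ORDER.index)


if __name__ == "__main__":
    local = {"torch": "2.11.0+cu128", "numpy": "2.4.6", "heretic-llm": "1.3.0"}
    original = {"torch": "2.12.0+cu130", "numpy": "2.4.4", "heretic-llm": "1.3.0"}
    found = find_mismatches(local, original)
    print(found, overall_severity(found))
```

Taking the maximum means a single critical mismatch determines the message, which matches both logs in this comment: the overall level is "critical" whether one package differs or forty do.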

Result:

The SHA256 of model.safetensors does not match the original. This test was conducted with the default pre-installed packages as-is, without installing from requirements.txt.

VINAY-UMRETHE/MiniCPM-V-4.6-heretic-AUTO-REPRODUCE-Test1


Test 2

Loading reproduction information from 
https://huggingface.co/buckets/heretic-org/Heretic-Reproducibles-Storage/resolve
/heretic-reproducibles/data/huggingface.co/heretic-org/MiniCPM-V-4.6-heretic-32b
28aa.json?download=true...

Your local environment doesn't perfectly match the environment used to produce 
the original model. The following components differ:

Package Mismatches
┏━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Package     ┃ This system                       ┃ Original system ┃ Severity ┃
┡━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ heretic-llm │ 1.3.0-git+https://github.com/p-e- │ 1.3.0           │ critical │
│             │ w/heretic.git@3bcb2f6d2b5fd419af8 │                 │          │
│             │ 55e3d52690bc7cdc58d77             │                 │          │
└─────────────┴───────────────────────────────────┴─────────────────┴──────────┘

There is a critical chance that reproduction won't produce a byte-for-byte 
identical model. However, the resulting model will very likely still behave 
similarly to the original model.

Result:

The SHA256 of model.safetensors again does not match the original. This test was conducted after installing all packages from reproduce/requirements.txt of the base model.

VINAY-UMRETHE/MiniCPM-V-4.6-heretic-AUTO-REPRODUCE-Test2


Overall

The KLD and refusal count remained the same in both tests, but the models were not byte-identical as they should have been.

I think the most likely reason is the create_trial method of Optuna, which re-creates the model from the raw "parameters" key in reproduce.json.

Note

Two more tests are running now. Both run the trials manually using a config.toml file: one with only the critical package mismatch of the heretic-llm version itself (test 3), and one with the exact Heretic version (test 4).

@p-e-w

p-e-w commented Jun 4, 2026

Owner Author

> I think the most likely reason is the create_trial method of Optuna, which re-creates the model from the raw "parameters" key in reproduce.json.

I hope that isn't the problem, because that would seem to be very hard to fix. I did check a while ago that the parameters are stored with full precision, so I don't know why they wouldn't be identical.

@p-e-w

p-e-w commented Jun 4, 2026

Owner Author

I just checked and it appears that Python guarantees that floats are preserved perfectly through a json.dumps/json.loads round trip, as long as the float is not infinity/NaN, and a modern Python version is used. So this cannot be the issue.
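
The round-trip claim is easy to check locally. The sketch below compares bit patterns via `float.hex()` instead of using `==`, so that a sign change on zero would also be detected. The sample values are arbitrary.

```python
import json


def round_trips_exactly(value: float) -> bool:
    """True if the float survives json.dumps/json.loads bit for bit."""
    restored = json.loads(json.dumps(value))
    # float.hex() is an exact rendering of the underlying binary64 value.
    return isinstance(restored, float) and restored.hex() == value.hex()


if __name__ == "__main__":
    samples = [0.1, 1 / 3, 2.718281828459045, 5e-324, 1.7976931348623157e308, -0.0]
    print(all(round_trips_exactly(v) for v in samples))
```

Infinity and NaN are the exceptions mentioned above: they have no representation in standard JSON, and `json.dumps(..., allow_nan=False)` raises `ValueError` for them.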

@Vinay-Umrethe

Contributor

> I hope that isn't the problem, because that would seem to be very hard to fix. I did check a while ago that the parameters are stored with full precision, so I don't know why they wouldn't be identical.

Maybe the decimals are not complete (not full precision), or PyTorch operations or the Heretic version are the issue.

The correct way would be:

Create a fresh model, including its reproduce/ directory, with the auto-reproduce branch and then test against that, so that the last remaining mismatch (the differing Heretic versions) is also eliminated.

@Vinay-Umrethe

Vinay-Umrethe commented Jun 4, 2026

Contributor

> I just checked and it appears that Python guarantees that floats are preserved perfectly through a json.dumps/json.loads round trip, as long as the float is not infinity/NaN, and a modern Python version is used. So this cannot be the issue.

In that case it's not a problem; the parameters have full precision.

@p-e-w

p-e-w commented Jun 4, 2026

Owner Author

> Maybe the decimals are not complete (not full precision), or PyTorch operations or the Heretic version are the issue.

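The decimals are complete. Apparently, Python guarantees this because float repr emits the shortest decimal string that round-trips exactly (CPython implements this with David Gay's dtoa code rather than "Grisu3/Dragon4").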

@p-e-w

p-e-w commented Jun 5, 2026

Owner Author

Hash verification is now implemented for models uploaded to HF.

* fix: Check if a model is gated / accessible

* fix: handle unknown gated models

* feat: Auto install requirements

* simplify

* Revert "simplify"

This reverts commit 1028792.

* Revert "feat: Auto install requirements"

This reverts commit f4be1ab.

* fix: Seed pytorch method

* reference, style

* simplify token

* feat: Export strategy in reproduce.json, v2

* style: Name

* simplify export strategy

* style: Rename

* enumeration

* maybe remove seed as well

* fix: don't lock settings with permanent strategy

* simplify no choice, use try/finally block
@p-e-w

p-e-w commented Jun 10, 2026

Owner Author

@Vinay-Umrethe

I've now also implemented hash verification for locally saved files, and it appears to work.

As for the version, I've decided on the following approach: We indeed bump the version to 2, and we only verify hashes if the version isn't 1 (because for those files there is no chance of exact reproduction anyway). At the same time, since the upcoming Heretic 1.4 will be a mismatch for all previously exported reproduce.json files, the user will be told that there is a "critical" chance of non-exact reproduction. So everything will work correctly, I think.
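
The version rule described above could be sketched like this. The key names `"version"` and `"hashes"`, the function names, and the return shape are assumptions made for illustration; the thread does not show the actual layout of reproduce.json.

```python
import hashlib
from pathlib import Path
from typing import Optional


def sha256_of_file(path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large weight shards never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_hashes(reproduce_info: dict, model_dir: str) -> Optional[dict]:
    """Return {filename: matches} or None when verification is skipped.

    Assumption: the file stores a "version" number and a "hashes"
    mapping of weight file names to SHA256 hex digests.
    """
    # Version 1 files cannot reproduce exactly anyway, so hashes are not checked.
    if reproduce_info.get("version") == 1:
        return None
    return {
        filename: sha256_of_file(Path(model_dir) / filename) == expected
        for filename, expected in reproduce_info["hashes"].items()
    }
```

Returning `None` for version 1 keeps "not checked" distinct from "checked and failed", so a caller can skip the hash report entirely for old files.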

@Vinay-Umrethe

Contributor

> We indeed bump the version to 2, and we only verify hashes if the version isn't 1 (because for those files there is no chance of exact reproduction anyway).

For v1, the only way to get exact reproduction is to run all trials as in the original run, using the settings suggested previously, instead of simply skipping hash verification.

> At the same time, since the upcoming Heretic 1.4 will be a mismatch for all previously exported reproduce.json files, the user will be told that there is a "critical" chance of non-exact reproduction. So everything will work correctly, I think.

That said, --reproduce will be a feature of the new release, which justifies moving on to v2 of reproduce.json.

So both options are good: (1) is backward-compatible, (2) is easier and needs less maintenance.

But I don't think heretic-llm's version really matters that much for reproduction, unless config.default is changed in any way (a config option or method that was optional in previous versions but is the default in the new one).

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

> But I don't think heretic-llm's version really matters that much for reproduction

On the contrary, it's the most important thing by a huge margin.

Every single release so far would have broken reproduction. We have changed prompt alignment, the algorithm for computing residual means, objective computations, and many other things. And 1.4 is no exception, given that we now re-seed before svd_lowrank, which once again breaks reproduction.

When the versions of the Heretic package differ, there's essentially no chance to reproduce exactly, and I expect that to continue to be true in the future.
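
To illustrate why re-seeding before a randomized step changes results relative to older versions while making the step itself more predictable, here is a small stand-in using Python's `random` module. Heretic uses PyTorch, and its actual seeding code is not shown in this thread, so `randomized_step` and `run` are purely hypothetical.

```python
import random


def randomized_step(rng: random.Random) -> list:
    """Stand-in for a randomized algorithm such as a randomized low-rank SVD."""
    return [rng.random() for _ in range(3)]


def run(pre_draws: int, reseed: bool) -> list:
    rng = random.Random(0)
    for _ in range(pre_draws):
        rng.random()  # earlier code that consumes random numbers
    if reseed:
        rng.seed(0)  # reset the generator right before the randomized step
    return randomized_step(rng)


if __name__ == "__main__":
    # Without re-seeding, the step depends on everything that ran before it.
    print(run(0, reseed=False) == run(5, reseed=False))
    # With re-seeding, the step is independent of earlier RNG consumption.
    print(run(0, reseed=True) == run(5, reseed=True))
```

The same mechanism explains the version break: a release that inserts a re-seed produces different random draws than an earlier release that did not, even with identical settings.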

@Vinay-Umrethe

Contributor

> We have changed prompt alignment, the algorithm for computing residual means, objective computations, and many other things. And 1.4 is no exception,

Yes, that's why I said "unless config.default is changed in any way (a config option or method that was optional in previous versions but is the default in the new one)", so that applies to all future versions.

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

But that's not the main problem. It's not that we changed the config in the past, we have repeatedly changed how Heretic works internally. This is purely a code issue. For example, in 1.3 we introduced computing residual means in batches. That changes the outcome numerically, but it's not configurable.
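
The batching point follows from floating-point addition not being associative: grouping the same numbers differently can change the rounded result. The sketch below uses deliberately extreme values to make the effect obvious. It is not Heretic's residual-mean code, and real activations would differ only in the last few bits.

```python
def naive_sum(values: list) -> float:
    """Plain left-to-right accumulation (the built-in sum() may compensate errors)."""
    total = 0.0
    for v in values:
        total += v
    return total


def mean_single_pass(values: list) -> float:
    return naive_sum(values) / len(values)


def mean_in_batches(values: list, batch_size: int) -> float:
    """Mean of per-batch means; mathematically equal for equally sized batches."""
    batch_means = [
        mean_single_pass(values[i:i + batch_size])
        for i in range(0, len(values), batch_size)
    ]
    return mean_single_pass(batch_means)


if __name__ == "__main__":
    values = [1e16, 1.0, -1e16, 1.0]  # exact mean is 0.5
    print(mean_single_pass(values))
    print(mean_in_batches(values, 2))
```

On IEEE 754 doubles the single pass yields 0.25 and the batched version yields 0.0, because each `1.0` is rounded away at a different point. Both computations are valid, yet they disagree, and no configuration setting can undo a change of this kind.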

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

Ok, now I'm going to do a quick test and then this is going in.

> For v1, the only way to get exact reproduction is to run all trials as in the original run, using the settings suggested previously, instead of simply skipping hash verification.

That's true, but it would be a lot of hassle, and also require further hacks like making the re-seeding conditional depending on which reproduce version is loaded. I think it's much better to just move forward and accept that v1 models can't be reproduced exactly.

@Vinay-Umrethe

Contributor

> It's not that we changed the config in the past, we have repeatedly changed how Heretic works internally.

I meant the same thing: Heretic's internals change between versions. Maybe "config" was the wrong word.

@Vinay-Umrethe

Contributor

The instructions in reproduce/README.md need an update.

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

What update do you mean? Just the information that it's now possible to use --reproduce?

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

Test results

https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic (original model)

https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic-REPRODUCED (reproduced on the same system, perfect match both for HF upload and local save)

https://huggingface.co/p-e-w/gemma-4-E4B-it-heretic-REPRODUCED-2 (reproduced on a system with a different GPU, 2 out of 4 shards differ, likely those that contain the layers the LoRA was applied to)

System Mismatches

┏━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┓
┃ Component ┃ This system                     ┃ Original system                  ┃ Severity ┃
┡━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━┩
│ CPU       │ AMD EPYC 9554 64-Core Processor │ AMD EPYC 7713P 64-Core Processor │ low      │
│ Devices   │ NVIDIA H100 80GB HBM3           │ NVIDIA RTX PRO 4500 Blackwell    │ medium   │
└───────────┴─────────────────────────────────┴──────────────────────────────────┴──────────┘

So here we have definitive proof that reproduction isn't guaranteed even if the only differences are in the hardware.

Verifying hashes of weight files...
model-00001-of-00004.safetensors: Hash matches
model-00002-of-00004.safetensors: Hash doesn't match
model-00003-of-00004.safetensors: Hash doesn't match
model-00004-of-00004.safetensors: Hash matches

@kabachuha

Contributor

Why check the hashes? Maybe going through all the tensors and firing "all_close" with a configurable epsilon could be a better solution?

@kabachuha

Contributor

For easier Hugging Face uploads (to avoid timeouts), people can shard the model into smaller partitions, and then there might not be any matching shards at all.

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

The shard size is configurable in Heretic, and the configuration is loaded during reproduction, which ensures the shards are exactly the same.

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

> Why check the hashes? Maybe going through all the tensors and firing "all_close" with a configurable epsilon could be a better solution?

The problem is that when uploading to HF, finding the model files on the local disk is pretty tricky, so re-downloading is the safest solution, which sucks.

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

I do like the idea of doing a tensor check in principle though.

@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

Ah wait, the tensor comparison idea actually doesn't work at all. The whole point of the reproduction mechanism is that the reproduce.json file is self-contained. In fact, it is (among other situations) made for the case where the model itself doesn't exist anymore, so it can be restored.

So we can't reliably compare tensors, because there might not be tensors to compare to. By contrast, the hashes are stored inside the reproduce.json file, and can thus always be verified.
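
The trade-off between the two checks can be shown with plain Python floats standing in for tensor values (an illustration only, with made-up numbers). A tolerance comparison needs both sets of values, whereas a hash comparison needs only a stored digest, but it reports a mismatch for any single-bit difference.

```python
import hashlib
import math
import struct


def sha256_of_floats(values: list) -> str:
    """Hash the raw little-endian binary64 encoding of the values."""
    return hashlib.sha256(struct.pack(f"<{len(values)}d", *values)).hexdigest()


original = [0.25, -1.5, 3.0]
# Nudge one element by a single unit in the last place (requires Python 3.9+).
reproduced = [0.25, math.nextafter(-1.5, 0.0), 3.0]

# Tolerance check: passes, but requires access to the original values.
values_close = all(
    math.isclose(a, b, rel_tol=1e-9) for a, b in zip(original, reproduced)
)

# Hash check: fails, but only needs the digest stored alongside the recipe.
hashes_equal = sha256_of_floats(original) == sha256_of_floats(reproduced)

if __name__ == "__main__":
    print(values_close, hashes_equal)
```

This is consistent with the different-GPU test above, where two shards failed the hash check even though the only mismatches were in hardware.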

@p-e-w p-e-w merged commit 2fd163f into master Jun 11, 2026
4 checks passed
@p-e-w

p-e-w commented Jun 11, 2026

Owner Author

Merged! I added a note to the generated reproduction README explaining how to use this mechanism.


3 participants