Witness Optimizations for SHA256: Dynamic AND/XOR Bit-Width + Spread-Based Binops #284
This PR implements two major witness-count optimizations for the R1CS compiler targeting SHA256 circuits:
1. Dynamic Bit-Width for Combined AND/XOR Lookup Table
Introduces a cost model that searches over candidate atomic widths {2, 4, 8} and selects the width minimizing total witness count (table + decomposition + complementary + overhead)
Currently only applies to 8-bit operands; wider operands fall back to w=8
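A minimal sketch of how such a width search could look. Everything here is an assumption for illustration: the names `estimate_cost` and `choose_width`, the per-term formulas, and the constant overhead are hypothetical placeholders, not the cost model this PR ships. Only the candidate set {2, 4, 8}, the 8-bit operand size, and the four cost terms come from the description above.

```python
from dataclasses import dataclass

CANDIDATE_WIDTHS = (2, 4, 8)   # candidate atomic widths named in the description
OPERAND_BITS = 8               # the search currently applies to 8-bit operands only


@dataclass(frozen=True)
class CostBreakdown:
    width: int
    table: int
    decomposition: int
    complementary: int
    overhead: int

    @property
    def total(self) -> int:
        return self.table + self.decomposition + self.complementary + self.overhead


def estimate_cost(width: int, num_ops: int) -> CostBreakdown:
    """Hypothetical witness estimate for one atomic width (placeholder formulas)."""
    digits = OPERAND_BITS // width   # atomic digits per operand
    table = 1 << (2 * width)         # assumed: one witness per (a, b) table row
    # Assumed: two operands per op, one witness per digit; a single digit
    # is the operand itself and needs no decomposition witnesses.
    decomposition = 2 * digits * num_ops if digits > 1 else 0
    complementary = digits * num_ops  # assumed: one extra witness per digit pair
    overhead = 4                      # assumed fixed per-table cost
    return CostBreakdown(width, table, decomposition, complementary, overhead)


def choose_width(num_ops: int) -> CostBreakdown:
    """Pick the candidate width with the smallest estimated total."""
    return min(
        (estimate_cost(w, num_ops) for w in CANDIDATE_WIDTHS),
        key=lambda cost: cost.total,
    )
```

With these placeholder coefficients the selected width grows with the number of AND/XOR operations: the fixed table cost dominates for small circuits, the per-operation decomposition cost dominates for large ones. The real crossover points depend on the compiler's actual accounting.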
2. Spread Trick for SHA256 Bitwise Operations
Replaces the generic combined AND/XOR table approach for SHA256 with the spread-based technique from Eli Ben-Sasson et al.
Key idea: "spreading" a value by interleaving zeros between its bits (0b1011 → 0b01000101) converts bitwise XOR/AND into field addition/multiplication, eliminating per-operation lookup costs.
Why ROTR/SHR are free: SHA256 decomposes each 32-bit word into chunks aligned to rotation boundaries. For example, Sigma_0 uses chunks of [2, 11, 9, 10] bits, exactly matching ROTR2, ROTR13, ROTR22. A rotation is just reading the same chunks in a different order with adjusted coefficients: ROTR2(a) reads chunks starting from index 1 instead of index 0. No new witnesses are allocated; the spread values computed during decomposition are reused with permuted coefficients. Similarly, SHR drops the lowest chunks entirely. The cost of rotation/shift is zero witnesses, zero constraints.
SHA256 operations implemented via spread:
sigma_0, sigma_1, Sigma_0, Sigma_1, Ch, Maj, message schedule, compression rounds
u32 addition with single carry witness per addition
Stats
SHA256 (35 compression calls)
SHA256 (1 compression call)
Known Soundness Limitation (for M31)
Future Work
Dynamic spread table width: Currently fixed at w=8 (256 entries). A dynamic search over {4, 8, 16} could further reduce witnesses for circuits with fewer SHA256 calls.
Shared lookup table: The spread table implicitly range-checks values to [0, 2^w - 1]. Merging with the existing range-check system would eliminate redundant table entries and save additional witnesses.
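To illustrate why the table size scales as 2^w and why a spread lookup doubles as a range check, here is a small sketch outside any constraint system. The helper names (`spread`, `build_spread_table`, `xor_and_via_spread`) are hypothetical and plain Python integers stand in for field elements; in a real circuit the even and odd parts would be prover-supplied witnesses constrained by lookups rather than computed with masks.

```python
from typing import Dict, Tuple


def spread(x: int, width: int) -> int:
    """Interleave zeros between bits: bit i of x moves to bit 2*i."""
    out = 0
    for i in range(width):
        out |= ((x >> i) & 1) << (2 * i)
    return out


def build_spread_table(width: int) -> Dict[int, int]:
    """Map each spread form back to its dense value; the table has 2**width rows."""
    return {spread(x, width): x for x in range(1 << width)}


def xor_and_via_spread(
    a: int, b: int, width: int, table: Dict[int, int]
) -> Tuple[int, int]:
    """Recover (a XOR b, a AND b) from the sum of the two spread forms."""
    total = spread(a, width) + spread(b, width)   # each 2-bit slot holds 0, 1 or 2
    even_mask = spread((1 << width) - 1, width)   # 0b0101...01
    even = total & even_mask                      # low bit of each slot: XOR
    odd = (total >> 1) & even_mask                # high bit of each slot: AND
    # A successful lookup means the key is a well-formed spread value,
    # which also bounds the dense value to [0, 2**width - 1].
    return table[even], table[odd]
```

For w=8 the table built this way has 256 rows, matching the figure above, and any key that is not a valid spread form (for example 0b10) is simply absent. That absence is the implicit range check that a shared table could reuse.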