MOD-14010 support Homogeneous array floating point forcing (deserializa… #17

Merged
AvivDavid23 merged 33 commits into master from
MOD-13577-datatypes-json-homogeneous-fp-arrays-declare-fp-type
Mar 3, 2026

@AvivDavid23 AvivDavid23 commented Feb 9, 2026

…tion path only)

  • Add an option to enforce a specific float type for the values of a homogeneous array
  • Add binary encoding and decoding, which will be used by RedisJSON (to easily preserve the tag per path)
  • Add unit tests and fuzz tests

No production-quality Rust CBOR library implements RFC 8746 natively (the only candidate, cbor_enhanced, has been unmaintained since 2020 and lacks F16-LE and BF16 support), so a thin IValue ↔ ciborium::Value conversion layer is still needed.

Size comparison (vs JSON baseline)

Document                        | JSON     | CBOR            | CBOR+zstd
--------------------------------|----------|-----------------|----------------
FP32 array (1000 elements)      | 19 180 B | 4 005 B (−79%)  | 3 593 B (−81%)
FP64 array (1000 elements)      | 5 891 B  | 8 005 B (+36%)  | 2 359 B (−60%)
Heterogeneous array (1000 nums) | 5 891 B  | 8 005 B (+36%)  | 2 359 B (−60%)
String-heavy object (50 keys)   | 2 181 B  | 2 032 B (−7%)   | 191 B (−91%)
Mixed object                    | 94 B     | 59 B (−37%)     | 68 B (−28%)
Nested FP32 arrays + string     | 3 110 B  | 826 B (−73%)    | 338 B (−89%)
Big mixed JSON (200 records)    | 77 190 B | 66 068 B (−14%) | 17 777 B (−77%)
Repeated strings / RED-141886   | 49 276 B | 37 190 B (−25%) | 2 418 B (−95%)
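
The percentages in the size comparison are signed changes relative to the JSON baseline. A minimal sketch of that arithmetic, using three rows copied from the comparison above (the helper name `size_delta_percent` is illustrative and not part of the PR):

```rust
/// Signed size change of `encoded` relative to `baseline`, in percent.
/// Negative means the encoding is smaller than the JSON baseline.
fn size_delta_percent(baseline: u64, encoded: u64) -> f64 {
    (encoded as f64 - baseline as f64) / baseline as f64 * 100.0
}

fn main() {
    // (document, JSON bytes, CBOR bytes, CBOR+zstd bytes)
    let rows = [
        ("FP32 array (1000 elements)", 19_180u64, 4_005u64, 3_593u64),
        ("FP64 array (1000 elements)", 5_891, 8_005, 2_359),
        ("Mixed object", 94, 59, 68),
    ];
    for (name, json, cbor, zstd) in rows {
        println!(
            "{name}: CBOR {:+.0}%, CBOR+zstd {:+.0}%",
            size_delta_percent(json, cbor),
            size_delta_percent(json, zstd)
        );
    }
}
```

Note that for the 94 B mixed object the zstd variant (68 B) is larger than plain CBOR (59 B), so compression only pays off once the payload is big enough.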

Example payloads

1 — FP32 typed array (1 000 elements, stored with FPHA F32 tag)

[0.0, 0.001, 0.002, 0.003, 0.004, ...]   // 1 000 floats total

2 — FP64 typed array (1 000 elements, stored with FPHA F64 tag)

[0.0, 0.001, 0.002, ...]   // 1 000 doubles total

3 — Heterogeneous float array (same data, no FPHA hint)

Same JSON as above; without an FPHA hint the array is stored as ArrayHetero
(each element tagged individually). zstd still achieves the same ratio because
the repeated tag bytes compress well.

4 — String-heavy object (50 keys)

{
  "key_0": "value_0_some_longer_string_here",
  "key_1": "value_1_some_longer_string_here",
  "key_2": "value_2_some_longer_string_here",
  ...   // 50 keys total
}

5 — Small mixed object

{
  "name": "Alice",
  "age": 30,
  "scores": [1, 2, 3, null, true, "bonus"],
  "meta": {"active": true, "level": 42}
}

6 — Nested FP32 arrays + string

{
  "a": [0.0, 0.1, 0.2, ...],   // 100 F32 elements
  "b": [0.0, 0.1, 0.2, ...],   // 100 F32 elements
  "label": "test"
}

7 — Big mixed JSON (200 records, heterogeneous embeddings)

[
  {
    "id": 0, "name": "user_0", "active": true, "score": 0.0,
    "tags": ["alpha", "beta", "gamma"],
    "embedding": [0.0, 0.001, 0.002, ...]   // 32 floats
  },
  // ... 200 records total, repeated schema
]

The repeated key names ("id", "name", "active", "score", "tags", "embedding")
across 200 records are what zstd compresses so aggressively here.

8 — Repeated-string records (500 records, RED-141886 scenario)

[
  {"id": 0, "status": "active",   "region": "us-east-1", "tier": "free",     "owner": "team-a", "count": 0},
  {"id": 1, "status": "inactive", "region": "eu-west-1", "tier": "standard", "owner": "team-b", "count": 10},
  // ... 500 records; status/region/tier/owner values repeat from a small fixed set
]

Note

High Risk
Adds new CBOR/zstd encoding and changes core IArray fallible APIs to return a new IJsonError, which can affect downstream callers and data interchange correctness.

Overview
Adds optional floating-point homogeneous array (FPHA) enforcement during JSON deserialization via FPHAConfig/IValueDeserSeed, plus IArray::push_with_fp_type to force F16/BF16/F32/F64 storage and reject out-of-range values.
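
To illustrate what forcing a storage type and rejecting out-of-range values can look like, here is a minimal sketch. It is not the PR's implementation: `FpType`, `OutOfRange`, `TypedArray`, and `push_checked` are hypothetical names, and the rule used (reject any value that is non-finite or whose magnitude exceeds the largest finite value of the target format) is an assumption about what "out of range" means.

```rust
// Hypothetical stand-ins; the real types in the PR may differ.
#[derive(Debug, Clone, Copy, PartialEq)]
enum FpType { F16, BF16, F32, F64 }

#[derive(Debug, PartialEq)]
struct OutOfRange { value: f64, target: FpType }

/// Largest finite magnitude representable by each format.
fn max_finite(ty: FpType) -> f64 {
    match ty {
        FpType::F16 => 65504.0,                                   // (2 - 2^-10) * 2^15
        FpType::BF16 => (2.0 - 2f64.powi(-7)) * 2f64.powi(127),   // 8-bit significand, f32 exponent range
        FpType::F32 => f32::MAX as f64,
        FpType::F64 => f64::MAX,
    }
}

/// Assumed rule: a value fits if it is finite and within the target's finite range.
fn check_fits(value: f64, ty: FpType) -> Result<f64, OutOfRange> {
    if value.is_finite() && value.abs() <= max_finite(ty) {
        Ok(value)
    } else {
        Err(OutOfRange { value, target: ty })
    }
}

/// Toy homogeneous array: keeps values as f64 for brevity; a real
/// implementation would store the narrowed representation.
struct TypedArray { ty: FpType, values: Vec<f64> }

impl TypedArray {
    fn new(ty: FpType) -> Self { Self { ty, values: Vec::new() } }

    fn push_checked(&mut self, value: f64) -> Result<(), OutOfRange> {
        let v = check_fits(value, self.ty)?;
        self.values.push(v);
        Ok(())
    }
}

fn main() {
    let mut arr = TypedArray::new(FpType::F16);
    println!("{:?}", arr.push_checked(1.5));      // fits in F16
    println!("{:?}", arr.push_checked(70_000.0)); // exceeds the F16 range
    println!("stored: {:?}", arr.values);
}
```

The check deliberately ignores precision loss: a value such as 0.001 is accepted for F16 even though it is not exactly representable, because only the range is enforced in this sketch.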

Introduces a new cbor module with encode/decode and zstd-compressed variants that preserve typed array tags using RFC 8746 (with a private BF16 tag), and surfaces this API from lib.rs.
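
For readers unfamiliar with RFC 8746, the following sketch hand-encodes an F32 array as a typed array: tag 85 (IEEE 754 binary32, little-endian) wrapping a byte string of the raw element bytes. It uses only the standard library and is not the PR's `cbor` module; `write_head` and `encode_f32_typed_array` are illustrative names, and the private BF16 tag is left out because its number is not stated here.

```rust
/// RFC 8746 tag for an array of IEEE 754 binary32 values, little-endian.
const TAG_F32_LE: u64 = 85;

/// Writes a CBOR head (3-bit major type plus argument) in its shortest form.
fn write_head(out: &mut Vec<u8>, major: u8, arg: u64) {
    let mt = major << 5;
    match arg {
        0..=23 => out.push(mt | arg as u8),
        24..=0xff => {
            out.push(mt | 24);
            out.push(arg as u8);
        }
        0x100..=0xffff => {
            out.push(mt | 25);
            out.extend_from_slice(&(arg as u16).to_be_bytes());
        }
        0x1_0000..=0xffff_ffff => {
            out.push(mt | 26);
            out.extend_from_slice(&(arg as u32).to_be_bytes());
        }
        _ => {
            out.push(mt | 27);
            out.extend_from_slice(&arg.to_be_bytes());
        }
    }
}

/// Encodes `values` as tag(85) + byte string of little-endian f32 bytes.
fn encode_f32_typed_array(values: &[f32]) -> Vec<u8> {
    let payload_len = values.len() * 4;
    let mut out = Vec::with_capacity(payload_len + 11);
    write_head(&mut out, 6, TAG_F32_LE);         // major type 6: tag
    write_head(&mut out, 2, payload_len as u64); // major type 2: byte string
    for v in values {
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}

fn main() {
    let values: Vec<f32> = (0..1000).map(|i| i as f32 * 0.001).collect();
    let bytes = encode_f32_typed_array(&values);
    println!("{} bytes, header {:02x?}", bytes.len(), &bytes[..5]);
}
```

With 1000 elements this layout costs 2 bytes for the tag, 3 bytes for the byte-string head, and 4000 bytes of payload, which agrees with the 4 005 B CBOR figure reported for the FP32 array in the size comparison.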

Refactors array-related fallible APIs to return IJsonError (wrapping allocation failures and new range errors), adds extensive unit tests, and expands CI fuzzing to cover CBOR decode + round-trip (with updated fuzz runtime/memory flags).
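
The sketch below shows one common way to shape an error type that wraps both allocation failures and range errors so that callers can use `?` on either. It is only an illustration: `SketchError` stands in for `IJsonError`, and its variants, `AllocError`, and `push_f32` are assumptions rather than the PR's actual definitions.

```rust
use std::fmt;

/// Hypothetical allocation-failure marker.
#[derive(Debug, PartialEq)]
struct AllocError;

/// Stand-in for the PR's IJsonError; variant names are assumptions.
#[derive(Debug, PartialEq)]
enum SketchError {
    Alloc(AllocError),
    OutOfRange { value: f64 },
}

impl From<AllocError> for SketchError {
    fn from(e: AllocError) -> Self {
        SketchError::Alloc(e)
    }
}

impl fmt::Display for SketchError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            SketchError::Alloc(_) => write!(f, "memory allocation failed"),
            SketchError::OutOfRange { value } => {
                write!(f, "value {value} is out of range for the target float type")
            }
        }
    }
}

impl std::error::Error for SketchError {}

/// Fallible reserve that reports failure instead of aborting.
fn try_grow(buf: &mut Vec<f32>, additional: usize) -> Result<(), AllocError> {
    buf.try_reserve(additional).map_err(|_| AllocError)
}

/// Pushes `value` as f32, surfacing both failure modes through one error type.
fn push_f32(buf: &mut Vec<f32>, value: f64) -> Result<(), SketchError> {
    try_grow(buf, 1)?; // AllocError converts to SketchError via From
    if !value.is_finite() || value.abs() > f32::MAX as f64 {
        return Err(SketchError::OutOfRange { value });
    }
    buf.push(value as f32);
    Ok(())
}

fn main() {
    let mut buf = Vec::new();
    println!("{:?}", push_f32(&mut buf, 1.5));
    match push_f32(&mut buf, 1e39) {
        Ok(()) => println!("pushed"),
        Err(e) => println!("error: {e}"),
    }
}
```

A change like this is source-breaking for callers that matched on the previous error type, which is the downstream risk the note above calls out.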

Written by Cursor Bugbot for commit 9675e63.

@AvivDavid23 AvivDavid23 changed the title from "MOD-13577 support Homogeneous array floating point forcing (deserializa…" to "MOD-14010 support Homogeneous array floating point forcing (deserializa…" Feb 18, 2026
@RedisJSON RedisJSON deleted a comment from cursor bot Feb 22, 2026
@AvivDavid23 AvivDavid23 requested a review from gabsow February 22, 2026 07:17
@galcohen-redislabs

Did you look at c2pa_cbor?

@AvivDavid23
Author

Did you look at c2pa_cbor?

@galcohen-redislabs I can try that, although it doesn't have a major release and has very low usage.

@cursor cursor bot left a comment

Cursor Bugbot has reviewed your changes and found 2 potential issues.

@AvivDavid23 AvivDavid23 merged commit 259d8ac into master Mar 3, 2026
5 checks passed