Further strip offending annotations from flow://relaxed-write-schema. Maybe strip all of them? #2818

@jshearer

Problem

A customer's materialization failed schema validation. SHA-256 redaction produces 71-character values (sha256:<64 hex>), but the connector schema's maxLength: 20 (from a source varchar(20) column) leaks into the read path via flow://relaxed-write-schema. The write schema overlay correctly sets maxLength: 71 with redact: {strategy: "sha256"}, and the inferred schema has maxLength: 128. The relaxed write schema, however, preserves the connector's maxLength: 20 unchanged, and the allOf intersection takes the most restrictive constraint:

let max_length = match (lhs.max_length, rhs.max_length) {
    (Some(l), Some(r)) => Some(l.min(r)),

We have worked around this manually a couple of times now (by removing the relaxed-write-schema ref from readSchema), but any collection that combines SHA-256 redaction with a length-constrained connector field will hit it.
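To make the arithmetic concrete, here is a minimal standalone sketch of the "most restrictive wins" behavior. It is not the actual intersection code: the function names are hypothetical, and the fallback arm (a missing bound imposes no limit) is an assumption, since the snippet above shows only the both-sides-present arm.

```rust
/// Intersect two optional `maxLength` bounds the way an `allOf` must:
/// when both sides set a bound, the smaller (stricter) one wins.
fn intersect_max_length(lhs: Option<usize>, rhs: Option<usize>) -> Option<usize> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l.min(r)),
        // Assumption: a missing bound places no limit, so the other side is kept.
        (l, r) => l.or(r),
    }
}

/// Effective `maxLength` across every bound that takes part in the intersection.
fn effective_max_length(bounds: &[Option<usize>]) -> Option<usize> {
    bounds.iter().copied().fold(None, intersect_max_length)
}

/// Build a value in the `sha256:<64 hex>` shape described above.
/// The digest is a placeholder of the right length, not a real hash.
fn placeholder_redacted_value() -> String {
    format!("sha256:{}", "0".repeat(64))
}
```

With the three bounds from this report, `effective_max_length(&[Some(71), Some(20), Some(128)])` yields `Some(20)`, while `placeholder_redacted_value().len()` is 71 (7 characters of prefix plus 64 hex digits), so the redacted value can never fit.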

Context

RelaxedSchemaObj explicitly strips keywords via #[serde(skip_serializing)] and passes everything else through a catch-all:

// Other keywords are passed-through.
#[serde(flatten)]
pass_through: BTreeMap<String, serde_json::Value>,

The original design intent (Discussion #1988) was to remove "most validations (type, format, required, etc)" while preserving title, description, reduce, default, and conditional keywords, but the implementation only strips what's enumerated and passes everything else through.

The strip list has grown reactively, each addition driven by a production bug:

  • Feb 2025 (#1937): type, required, format (initial implementation)
  • Jun 2025 (#2209): const, enum (customer materialization failure)
  • Jan 2026 (#2626): additionalProperties: false (closed-object constraint leak)
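The deny-list behavior described above can be sketched without serde, using only the standard library. This is an illustration, not the RelaxedSchemaObj implementation: `relax`, `schema`, and the string-valued map are hypothetical, and the sketch strips by keyword name only (so it drops additionalProperties regardless of its value, which is a simplification).

```rust
use std::collections::BTreeMap;

/// Keywords stripped today, per the history listed above.
/// Simplification: matching is by keyword name only.
const STRIPPED: [&str; 6] = [
    "type",
    "required",
    "format",
    "const",
    "enum",
    "additionalProperties",
];

/// Deny-list relaxation: drop the enumerated keywords, pass everything else through.
fn relax(schema: &BTreeMap<String, String>) -> BTreeMap<String, String> {
    schema
        .iter()
        .filter(|(keyword, _)| !STRIPPED.iter().any(|s| *s == keyword.as_str()))
        .map(|(keyword, value)| (keyword.clone(), value.clone()))
        .collect()
}

/// Small helper to build a toy schema from keyword/value pairs.
fn schema(pairs: &[(&str, &str)]) -> BTreeMap<String, String> {
    pairs
        .iter()
        .map(|(keyword, value)| (keyword.to_string(), value.to_string()))
        .collect()
}
```

Relaxing `schema(&[("type", "string"), ("maxLength", "20"), ("title", "Name")])` removes type but keeps both title and maxLength: any keyword nobody thought to enumerate survives by default, which is how each entry in the list above ended up being added after the fact.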

When RelaxedSchemaObj was introduced (Feb 2025), no connector emitted maxLength, so the omission had no effect. Connectors started emitting maxLength for column sizing in Oct 2025 (connectors#3327, connectors#3341), and the redaction feature landed in Sep 2025 (#2383). The bug requires both to be present on the same field, which is why it took until now to surface.

Proposed fix

Strip maxLength and minLength from the relaxed schema, following the existing skip_serializing pattern. This is safe because the relaxed schema only appears in the read path (readSchema's allOf) and does not affect the write schema, connector schema, or materialization column sizing. The inferred schema still carries its own length constraints from observed data, so the read path isn't left unconstrained.
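A before/after sketch of the proposed change, under a simplifying assumption: readSchema is modeled as an allOf of just the relaxed write schema and the inferred schema, and only the two length keywords are tracked. `LengthBounds`, `relax_current`, `relax_proposed`, `all_of`, and `accepts` are hypothetical names for illustration, not identifiers from the codebase.

```rust
/// Length keywords carried by one schema in the read path (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
struct LengthBounds {
    min_length: Option<usize>,
    max_length: Option<usize>,
}

/// Current behavior: length keywords pass through the relaxed schema untouched.
fn relax_current(connector: LengthBounds) -> LengthBounds {
    connector
}

/// Proposed behavior: the relaxed schema drops `minLength` and `maxLength`.
fn relax_proposed(_connector: LengthBounds) -> LengthBounds {
    LengthBounds { min_length: None, max_length: None }
}

/// `allOf` semantics: the largest lower bound and the smallest upper bound win.
fn all_of(a: LengthBounds, b: LengthBounds) -> LengthBounds {
    LengthBounds {
        min_length: match (a.min_length, b.min_length) {
            (Some(l), Some(r)) => Some(l.max(r)),
            (l, r) => l.or(r),
        },
        max_length: match (a.max_length, b.max_length) {
            (Some(l), Some(r)) => Some(l.min(r)),
            (l, r) => l.or(r),
        },
    }
}

/// Check a string against the bounds, counting characters rather than bytes.
fn accepts(bounds: LengthBounds, value: &str) -> bool {
    let len = value.chars().count();
    bounds.min_length.map_or(true, |min| len >= min)
        && bounds.max_length.map_or(true, |max| len <= max)
}
```

With a connector bound of 20 and an inferred bound of 128, `all_of(relax_current(connector), inferred)` keeps maxLength at 20 and rejects a 71-character redacted value, whereas `all_of(relax_proposed(connector), inferred)` leaves 128 in force and accepts it. The read path stays bounded by the inferred schema in both cases.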

Phil agreed with this approach: "I think I'd probably be inclined to have the relaxed schema just strip out maxLength entirely. Seems like it was perhaps an oversight that it wasn't stripped out already."

Open question

The same logic applies to every other numeric/pattern validation keyword that passes through today: minimum, maximum, exclusiveMinimum, exclusiveMaximum, minItems, maxItems, pattern, multipleOf. Should we strip all of them now to match the original "most validations removed" design intent, or just fix maxLength/minLength and wait for the next bug?
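One way to size the decision is to list which validation keywords would still pass through under each option. The sketch below is a hypothetical audit helper, not code from the repository; the keyword lists are copied from this issue, and stripping is again modeled by keyword name only.

```rust
/// Keywords stripped today, per the history in the Context section.
const STRIPPED_TODAY: [&str; 6] = [
    "type",
    "required",
    "format",
    "const",
    "enum",
    "additionalProperties",
];

/// Keywords the proposed fix would add to the strip list.
const PROPOSED: [&str; 2] = ["maxLength", "minLength"];

/// Keywords named in the open question.
const OPEN_QUESTION: [&str; 8] = [
    "minimum",
    "maximum",
    "exclusiveMinimum",
    "exclusiveMaximum",
    "minItems",
    "maxItems",
    "pattern",
    "multipleOf",
];

/// Return the candidates that a given strip list would still let through,
/// preserving the order of `candidates`.
fn still_passing_through<'a>(candidates: &[&'a str], stripped: &[&str]) -> Vec<&'a str> {
    candidates
        .iter()
        .copied()
        .filter(|keyword| !stripped.iter().any(|s| s == keyword))
        .collect()
}
```

Run against the combined list of PROPOSED and OPEN_QUESTION, today's strip list lets all ten keywords through; adding PROPOSED still leaves the eight in OPEN_QUESTION, each of which can intersect with an overlay or inferred constraint in the same way maxLength did.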


@jwhartley FYI