Problem
A customer's materialization failed schema validation because SHA-256 redaction produces 71-character hashes (`sha256:` followed by 64 hex characters), while the connector schema's `maxLength: 20` (from the source `varchar(20)` column) leaks into the read path via `flow://relaxed-write-schema`. The write schema overlay correctly sets `maxLength: 71` with `redact: {strategy: "sha256"}`, and the inferred schema has `maxLength: 128`, but the relaxed write schema preserves the connector's `maxLength: 20` unchanged, and the `allOf` intersection takes the most restrictive constraint:
From `flow/crates/doc/src/shape/intersect.rs` (lines 34 to 35 at 008278b):

```rust
let max_length = match (lhs.max_length, rhs.max_length) {
    (Some(l), Some(r)) => Some(l.min(r)),
```
We have triaged this a couple of times now with a manual workaround (removing the `flow://relaxed-write-schema` ref from `readSchema`), but any collection that combines SHA-256 redaction with a length-constrained connector field will hit it.
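To make the arithmetic concrete, here is a minimal std-only sketch mirroring the `intersect.rs` snippet above (the function name `intersect_max_length` is hypothetical; the real code operates on full `Shape` values):

```rust
// Minimal sketch: when both sides of an allOf carry maxLength, the most
// restrictive one wins, so the leaked varchar(20) bound overrides the
// inferred schema's 128.
fn intersect_max_length(lhs: Option<u64>, rhs: Option<u64>) -> Option<u64> {
    match (lhs, rhs) {
        (Some(l), Some(r)) => Some(l.min(r)),
        (l, r) => l.or(r),
    }
}

fn main() {
    // Relaxed write schema leaks maxLength: 20; the inferred schema says 128.
    let effective = intersect_max_length(Some(20), Some(128));
    assert_eq!(effective, Some(20));

    // A redacted value is "sha256:" (7 chars) plus 64 hex digits: 71 chars,
    // which fails the effective maxLength of 20.
    let redacted_len = ("sha256:".len() + 64) as u64;
    assert_eq!(redacted_len, 71);
    assert!(redacted_len > effective.unwrap());
}
```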
Context
`RelaxedSchemaObj` explicitly strips keywords via `#[serde(skip_serializing)]` and passes everything else through a catch-all (`flow/crates/models/src/schemas.rs`, lines 215 to 217 at 008278b):

```rust
// Other keywords are passed-through.
#[serde(flatten)]
pass_through: BTreeMap<String, serde_json::Value>,
```
The original design intent (Discussion #1988) was to remove "most validations (`type`, `format`, `required`, etc)" while preserving `title`, `description`, `reduce`, `default`, and conditional keywords, but the implementation only strips what's enumerated and passes everything else through.
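To illustrate the failure mode, here is a std-only model of enumerate-and-strip with a catch-all pass-through (this is a sketch, not the actual serde implementation; `relax` and the map representation are hypothetical):

```rust
use std::collections::BTreeMap;

// Keywords the current implementation enumerates for stripping.
const STRIPPED: &[&str] = &[
    "type", "required", "format", "const", "enum", "additionalProperties",
];

// Strip only the enumerated keywords; everything else passes through,
// which is how maxLength survives into the relaxed schema today.
fn relax(schema: &BTreeMap<&str, &str>) -> BTreeMap<String, String> {
    schema
        .iter()
        .filter(|&(k, _)| !STRIPPED.contains(k))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let connector: BTreeMap<&str, &str> =
        [("type", "string"), ("maxLength", "20"), ("title", "customer_id")]
            .into_iter()
            .collect();

    let relaxed = relax(&connector);
    assert!(!relaxed.contains_key("type"));     // stripped, as designed
    assert!(relaxed.contains_key("title"));     // annotation preserved, as designed
    assert!(relaxed.contains_key("maxLength")); // leaks through: the bug
}
```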
The strip list has grown reactively, each addition driven by a production bug:
- Feb 2025 (#1937): `type`, `required`, `format` (initial implementation)
- Jun 2025 (#2209): `const`, `enum` (customer materialization failure)
- Jan 2026 (#2626): `additionalProperties: false` (closed-object constraint leak)
When `RelaxedSchemaObj` was introduced (Feb 2025), no connector emitted `maxLength`, so the omission had no effect. Connectors started emitting `maxLength` for column sizing in Oct 2025 (connectors#3327, connectors#3341), and the redaction feature landed in Sep 2025 (#2383). The bug requires both to be present on the same field, which is why it took until now to surface.
Proposed fix
Strip `maxLength` and `minLength` from the relaxed schema, following the existing `skip_serializing` pattern. This is safe because the relaxed schema only appears in the read path (`readSchema`'s `allOf`) and does not affect the write schema, connector schema, or materialization column sizing. The inferred schema still carries its own length constraints from observed data, so the read path isn't left unconstrained.
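Sketched in the same std-only model terms (again a hypothetical model, not the real serde code): with `maxLength` and `minLength` added to the enumerated strip list, the relaxed schema no longer carries the connector's length bound, and the inferred schema's own bound governs the read path.

```rust
use std::collections::BTreeMap;

// Proposed strip list: maxLength and minLength join the enumerated keywords.
const STRIPPED_FIXED: &[&str] = &[
    "type", "required", "format", "const", "enum",
    "additionalProperties", "maxLength", "minLength",
];

fn relax_fixed(schema: &BTreeMap<&str, &str>) -> BTreeMap<String, String> {
    schema
        .iter()
        .filter(|&(k, _)| !STRIPPED_FIXED.contains(k))
        .map(|(k, v)| (k.to_string(), v.to_string()))
        .collect()
}

fn main() {
    let connector: BTreeMap<&str, &str> =
        [("type", "string"), ("maxLength", "20"), ("title", "customer_id")]
            .into_iter()
            .collect();

    let relaxed = relax_fixed(&connector);
    assert!(!relaxed.contains_key("maxLength")); // no longer leaks
    assert!(relaxed.contains_key("title"));      // annotations still preserved

    // With the leak gone, the read path's effective bound comes from the
    // inferred schema (maxLength: 128), which admits a 71-character hash.
    assert!(("sha256:".len() + 64) as u64 <= 128);
}
```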
Phil agreed with this approach: "I think I'd probably be inclined to have the relaxed schema just strip out maxLength entirely. Seems like it was perhaps an oversight that it wasn't stripped out already."
Open question
The same logic applies to every other numeric/pattern validation keyword that passes through today: `minimum`, `maximum`, `exclusiveMinimum`, `exclusiveMaximum`, `minItems`, `maxItems`, `pattern`, `multipleOf`. Should we strip all of them now to match the original "most validations removed" design intent, or just fix `maxLength`/`minLength` and wait for the next bug?
@jwhartley FYI