Description
I am wondering if we need to separate "normalization" from "transformation" more strictly when we use sureberus. The idea is that "normalization" is actual normalization, aka canonicalization or standardization: conforming data to a format so that it validates against a particular schema. Normalization should be an idempotent operation, i.e. the output of normalization should always be valid as input to the same schema, and normalizing it again should return an equivalent result.
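To make the idempotence requirement concrete, here is a minimal sketch in plain Python. It does not use sureberus; `normalize_tags` and `is_idempotent` are hypothetical names invented for illustration.

```python
def normalize_tags(value):
    """Hypothetical normalizer: accept a comma-separated string or a list
    of strings, and always return a list of stripped strings."""
    if isinstance(value, str):
        return [part.strip() for part in value.split(",") if part.strip()]
    return [str(item).strip() for item in value]


def is_idempotent(normalize, value):
    """Check that normalizing an already-normalized value changes nothing."""
    once = normalize(value)
    twice = normalize(once)
    return once == twice


# The normalized output is itself valid input and maps to the same result.
print(is_idempotent(normalize_tags, "a, b ,c"))
```

A check like this could live in a test suite for any schema that claims to do only normalization.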
Then, "transformation" is more "interesting" modification of potentially deeply nested values inside of a data structure, like what we do in data-driven data services where a recipe configuration is replaced by an actual recipe object before being passed to a renderer. This is not an idempotent operation. There are lots of other things we do that aren't idempotent and which maybe should be separated out into another step -- either custom Python code or an additional Sureberus schema that is explicitly not a "validation schema". Having Recipe Shelves replace "sum" with an actual sum function would be another example.
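The "sum" example can be sketched in a few lines of plain Python to show why such a step is not idempotent. The `AGGREGATES` table and `resolve_aggregate` function are hypothetical stand-ins, not the actual Recipe Shelves code.

```python
# Hypothetical lookup table from configuration names to callables.
AGGREGATES = {"sum": sum, "max": max}


def resolve_aggregate(name):
    """Hypothetical transformation: replace an aggregate name with the
    function it refers to. The input must be a string."""
    if not isinstance(name, str):
        raise TypeError("expected an aggregate name, got %r" % (name,))
    return AGGREGATES[name]


resolved = resolve_aggregate("sum")  # a function, no longer a string

# Feeding the output back in fails, because a function is not a valid name.
try:
    resolve_aggregate(resolved)
    second_pass_ok = True
except TypeError:
    second_pass_ok = False
```

The output type differs from the input type, so the result can never be run through the same step again, which is the property that distinguishes this from normalization.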
Usually, if your schema only uses coerce and not coerce_post, it is probably doing proper normalization. But there are still cases of using coerce_post that can be considered normalization, as long as they still return an output that can be used as an input.
One idea I have for how to easily describe both of these situations while sharing as much code as possible is by taking advantage of schema registries.
It should be possible to have a schema that describes the bulk of what needs to be done, and use named schema references anywhere that "transformation" vs "normalization" needs to take place.
Then, in a "normalization" codepath we would register simple idempotent definitions for those named schemas, and in the "transformation" codepath we would register definitions that potentially replace them with completely different values.
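A rough sketch of the registry idea, written as a toy interpreter in plain Python rather than against the real sureberus API. Everything here is an assumption made for illustration: the `schema_ref` and `coerce` key names, the `apply_schema` helper, and both registries.

```python
# Hypothetical lookup table used only by the transformation codepath.
AGGREGATES = {"sum": sum, "max": max}

# One shared schema; the "aggregate" field defers to a named reference.
SHARED_SCHEMA = {
    "title": {"coerce": str.strip},
    "aggregate": {"schema_ref": "aggregate"},
}

# Normalization codepath: idempotent definition (string in, string out).
NORMALIZATION_REGISTRY = {
    "aggregate": {"coerce": str.lower},
}

# Transformation codepath: replaces the name with a callable.
TRANSFORMATION_REGISTRY = {
    "aggregate": {"coerce": lambda name: AGGREGATES[name.lower()]},
}


def apply_schema(schema, registry, document):
    """Toy interpreter: resolve named references through the registry,
    then apply each field's coerce function."""
    result = {}
    for key, rule in schema.items():
        if "schema_ref" in rule:
            rule = registry[rule["schema_ref"]]
        result[key] = rule["coerce"](document[key])
    return result


doc = {"title": "  Sales ", "aggregate": "SUM"}
normalized = apply_schema(SHARED_SCHEMA, NORMALIZATION_REGISTRY, doc)
transformed = apply_schema(SHARED_SCHEMA, TRANSFORMATION_REGISTRY, normalized)
```

In this sketch the shared schema is written once, and only the registry passed in decides whether a named field is normalized or transformed.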