
Remove catalog_schema from Python Catalog object #1626

@jacobsimionato

Design Document: Decoupled & Lossless A2UI Catalog Architecture

1. Objective

Simplify the A2UI Python SDK catalog lifecycle by eliminating redundant in-memory schema state (catalog.catalog_schema), removing the need to keep source catalog.json files on disk, and enforcing a fixed output structure. This is achieved by inlining local definitions during ingestion and reconstructing the unified catalog schema on demand for validation and roundtripping.


2. Architectural Design

   [Ingestion Phase]
   ┌───────────────────────────┐
   │ catalog_schema (Raw JSON) │
   └─────────────┬─────────────┘
                 │
                 ▼
   ┌───────────────────────────┐
   │ Recursive Local Inliner   │◄── Expands `#/$defs/...` local references
   └─────────────┬─────────────┘
                 │
                 ▼
   ┌───────────────────────────┐
   │   Catalog Instantiation   │◄── Stores fully self-contained schemas
   │ (No catalog_schema stored)│    in `ComponentApi` & `FunctionApi`
   └─────────────┬─────────────┘
                 │
   [Execution / Validation / Roundtripping Phase]
                 │
                 ▼
   ┌───────────────────────────┐
   │ Dynamic Reconstructor     │◄── Generates standard catalog schema on the fly
   └───────────────────────────┘

3. Key Changes

A. Removal of the catalog_schema / _catalog_schema Property

Currently, the Catalog class keeps a private _catalog_schema attribute that holds the exact JSON document it ingested. Under the new architecture:

  • The backing variable _catalog_schema is removed.
  • The catalog_schema getter becomes a computed property that reconstructs the schema on demand.
  • This eliminates duplicate schema trees in memory and removes any lasting dependency on the source file.
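
As a minimal sketch of what "computed property" means here, the toy class below stores only per-component schemas and rebuilds the document in the getter. The class body and the _component_schemas field are illustrative assumptions, not the SDK's actual Catalog implementation.

```python
from typing import Any, Dict


class Catalog:
    """Toy stand-in for the SDK's Catalog; field names here are assumptions."""

    def __init__(self, catalog_id: str, component_schemas: Dict[str, Dict[str, Any]]):
        self.catalog_id = catalog_id
        # Only per-component schemas are kept; there is no `_catalog_schema` copy.
        self._component_schemas = dict(component_schemas)

    @property
    def catalog_schema(self) -> Dict[str, Any]:
        # Rebuilt on every access, so no second schema tree is held in memory.
        return {
            "catalogId": self.catalog_id,
            "components": dict(self._component_schemas),
        }


catalog = Catalog("demo", {"Text": {"type": "object"}})
first = catalog.catalog_schema
second = catalog.catalog_schema
```

Each access returns an equal but freshly built dictionary, which is the trade-off this design accepts: a little recomputation in exchange for a single source of truth.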

B. In-Memory Dereferencing (Inlining $defs during Ingestion)

When instantiating a catalog via Catalog.from_json(), the schema is pre-processed using a recursive, cycle-safe local reference resolver:

  1. Target Pointers: The resolver only processes local pointers starting with #/ (e.g., "#/$defs/CatalogComponentCommon").
  2. Exclusion: References into other files, such as the shared specification (e.g., "common_types.json#/$defs/ComponentId"), are left untouched.
  3. Merging: When a $ref is expanded, its resolved dictionary is merged with any other properties defined at the reference site (with the local site properties taking precedence).
  4. Self-Containment: The fully dereferenced schemas are assigned to the ComponentApi.schema and FunctionApi.schema properties.
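
Rules 1 to 3 can be illustrated in isolation with a small sketch. The helper names is_local_ref and merge_at_ref_site are hypothetical, and the merge is shallow, which is an assumption about how conflicts at the reference site are meant to be resolved.

```python
from typing import Any, Dict


def is_local_ref(ref: str) -> bool:
    # Rules 1 and 2: only pointers into the same document start with "#/".
    return ref.startswith("#/")


def merge_at_ref_site(resolved: Dict[str, Any], site: Dict[str, Any]) -> Dict[str, Any]:
    # Rule 3: drop "$ref", then let the reference site's own keys win.
    overrides = {k: v for k, v in site.items() if k != "$ref"}
    return {**resolved, **overrides}


common = {"type": "object", "description": "shared"}
site = {"$ref": "#/$defs/CatalogComponentCommon", "description": "Text-specific"}
merged = merge_at_ref_site(common, site)
```

Because the merge is shallow, a key such as "properties" defined at the reference site would replace the resolved one entirely rather than being combined with it.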

C. On-The-Fly Schema Reconstruction (Supporting Lossless Roundtripping)

Whenever a standard, unified catalog schema document is requested (such as for CatalogSchemaValidator or external export/roundtripping), the catalog dynamically compiles a standard-compliant document:

  1. components: Reconstructed directly by gathering the schemas from all registered ComponentApi and ModelComponentApi instances (which are now fully self-contained).
  2. functions: Reconstructed from the schemas of all registered FunctionApi and FunctionImplementation instances.
  3. theme: Populated from the active theme_schema or generated from the theme_class.model_json_schema().
  4. Mechanical $defs: The reconstructor mechanically generates:
    • theme: points to the theme block.
    • anyComponent: A oneOf array containing a direct reference to every component in the components dictionary, with the "component" discriminator.
    • anyFunction: A oneOf array containing a direct reference to every function in the functions dictionary.
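
The assembly steps above could be sketched as a single function. The name build_catalog_schema and its parameters are hypothetical, and the placement of the theme under $defs follows the target format in section 5; the real reconstructor may differ.

```python
from typing import Any, Dict


def build_catalog_schema(
    catalog_id: str,
    components: Dict[str, Dict[str, Any]],
    functions: Dict[str, Dict[str, Any]],
    theme: Dict[str, Any],
) -> Dict[str, Any]:
    # Names are used verbatim in the pointers; names containing "/" or "~"
    # would need JSON Pointer escaping, which this sketch does not handle.
    return {
        "$schema": "https://json-schema.org/draft/2020-12/schema",
        "catalogId": catalog_id,
        "components": dict(components),
        "functions": dict(functions),
        "$defs": {
            "theme": theme,
            "anyComponent": {
                "oneOf": [{"$ref": f"#/components/{name}"} for name in components],
                "discriminator": {"propertyName": "component"},
            },
            "anyFunction": {
                "oneOf": [{"$ref": f"#/functions/{name}"} for name in functions],
            },
        },
    }


schema = build_catalog_schema(
    "demo",
    components={"Text": {"type": "object"}, "Image": {"type": "object"}},
    functions={"required": {"type": "object"}},
    theme={"type": "object"},
)
```

Since the oneOf lists are derived from the dictionaries themselves, they cannot drift out of sync with the registered components and functions.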

4. Conceptual Resolver Algorithm

The ingestion pre-processor traverses the schema dictionary recursively using the following rules:

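from typing import Any, Dict, Set


def query_json_pointer(root: Dict[str, Any], pointer: str) -> Any:
    # Minimal local JSON Pointer lookup (helper assumed by this sketch).
    node: Any = root
    for token in pointer[2:].split("/"):
        token = token.replace("~1", "/").replace("~0", "~")  # RFC 6901 unescaping
        node = node[int(token)] if isinstance(node, list) else node[token]
    return node


def inline_local_refs(node: Any, root_catalog: Dict[str, Any], visited: Set[str]) -> Any:
    # `visited` holds the refs currently being expanded on this branch.
    if isinstance(node, dict):
        ref_path = node.get("$ref")
        if isinstance(ref_path, str) and ref_path.startswith("#/"):
            if ref_path in visited:
                return node  # Circular reference: keep the $ref rather than recurse forever

            # 1. Resolve the JSON Pointer against root_catalog
            resolved_node = query_json_pointer(root_catalog, ref_path)

            # 2. Recursively resolve nested references. A copy of `visited` is passed
            #    so that sibling sites referencing the same $def are still expanded.
            resolved_node = inline_local_refs(resolved_node, root_catalog, visited | {ref_path})

            # 3. Merge local override properties (also inlined) over the resolved keys
            merged = {
                k: inline_local_refs(v, root_catalog, visited)
                for k, v in node.items()
                if k != "$ref"
            }
            if isinstance(resolved_node, dict):
                return {**resolved_node, **merged}
            return resolved_node

        return {k: inline_local_refs(v, root_catalog, visited) for k, v in node.items()}

    elif isinstance(node, list):
        return [inline_local_refs(item, root_catalog, visited) for item in node]

    return node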

5. Dynamic Schema Target Format

The computed catalog_schema property returns a document with the following fixed JSON structure:

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "catalogId": "<catalog.catalog_id>",
  "components": {
    "Text": { ... Text's fully self-contained, inlined schema ... },
    "Image": { ... Image's fully self-contained, inlined schema ... }
  },
  "functions": {
    "required": { ... required's fully self-contained schema ... }
  },
  "$defs": {
    "theme": { ... theme schema ... },
    "anyComponent": {
      "oneOf": [
        { "$ref": "#/components/Text" },
        { "$ref": "#/components/Image" }
      ],
      "discriminator": { "propertyName": "component" }
    },
    "anyFunction": {
      "oneOf": [
        { "$ref": "#/functions/required" }
      ]
    }
  }
}
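
One way to read this fixed structure is as a set of invariants that a test could assert. The checker below is a hypothetical sketch (check_catalog_shape is not an SDK function): it verifies the top-level keys and that anyComponent and anyFunction reference exactly the registered entries.

```python
from typing import Any, Dict, List


def check_catalog_shape(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the shape matches."""
    problems: List[str] = []
    for key in ("$schema", "catalogId", "components", "functions", "$defs"):
        if key not in doc:
            problems.append(f"missing top-level key: {key}")
    if problems:
        return problems
    defs = doc["$defs"]
    for def_name, section in (("anyComponent", "components"), ("anyFunction", "functions")):
        refs = [entry.get("$ref", "") for entry in defs.get(def_name, {}).get("oneOf", [])]
        expected = [f"#/{section}/{name}" for name in doc[section]]
        if sorted(refs) != sorted(expected):
            problems.append(f"$defs.{def_name} does not list exactly the {section} entries")
    return problems


doc = {
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "catalogId": "demo",
    "components": {"Text": {"type": "object"}},
    "functions": {"required": {"type": "object"}},
    "$defs": {
        "theme": {"type": "object"},
        "anyComponent": {
            "oneOf": [{"$ref": "#/components/Text"}],
            "discriminator": {"propertyName": "component"},
        },
        "anyFunction": {"oneOf": [{"$ref": "#/functions/required"}]},
    },
}
problems = check_catalog_shape(doc)
```

A document that passes returns an empty list; adding a component without a matching oneOf entry would be reported as a mismatch.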

6. Compatibility & Impact Analysis

  • Full v0.9 Catalog Support: Although the v0.9 basic catalog keeps helper definitions such as CatalogComponentCommon in its root $defs, the local inliner merges the weight property directly into the layout-descendant component schemas (Text, Button, Row, etc.).
  • Test Suite Alignment: All core integrity validation checks (A2uiValidator, CatalogSchemaValidator, and _extract_ref_fields_json) remain fully functional. References to common_types.json are untouched, and local JSON reference resolution behaves identically to the original flat-file design.
  • Lossless Roundtripping: Downstream system components receive standard JSON Schema syntax. Rich field descriptors, constraints (pattern, minItems), and union branching (oneOf / anyOf) are preserved inside the individual components and reassembled into a single standard document on demand.

Metadata

Labels

P2, sprint ready

Status
Todo
