Config discoverability: JSON Schema generation + falcon schema CLI #29

@cweniger

Motivation

Writing config.yaml files requires knowing the full set of available options, their types, defaults, and valid choices for each estimator/prior. Today this information lives only in Python dataclasses (SNPEConfig, GaussianConfig, TrainingLoopConfig, etc.) and isn't surfaced to users at config-writing time.

The goal: a user should be able to discover all config options without reading source code.

Proposal

1. JSON Schema generation → IDE autocompletion

Auto-generate a JSON Schema from the existing config dataclasses. With VS Code + the Red Hat YAML extension, this gives autocomplete, validation, and hover docs for free.

// .vscode/settings.json
{ "yaml.schemas": { "./schemas/falcon-config.schema.json": "config*.yaml" } }

Implementation:

  • Walk config dataclasses, emit JSON Schema properties with types, defaults, and descriptions
  • _target_ fields get enum of known estimator/prior paths
  • choices metadata (e.g., net_type) becomes enum in schema
  • Generated once at release time (or on demand via falcon schema <target> --json-schema)
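The bullets above could be sketched roughly as follows. This is a minimal illustration, not the project's implementation: it assumes fields carry "help" and "choices" metadata as described in the prerequisite section, handles only scalar fields and nested dataclasses, and leaves out the _target_ enum. The function body and the Example* classes are hypothetical stand-ins.

```python
import dataclasses
import json
from dataclasses import dataclass, field

# Map simple Python annotations to JSON Schema type names.
_JSON_TYPES = {str: "string", bool: "boolean", int: "integer", float: "number"}


def schema_from_dataclass(cls):
    """Hypothetical helper: build a JSON Schema dict from a config dataclass."""
    properties = {}
    for f in dataclasses.fields(cls):
        if dataclasses.is_dataclass(f.type):
            # Nested config dataclass: recurse into it.
            prop = schema_from_dataclass(f.type)
        else:
            prop = {}
            if f.type in _JSON_TYPES:
                prop["type"] = _JSON_TYPES[f.type]
            if f.default is not dataclasses.MISSING:
                prop["default"] = f.default
        if "help" in f.metadata:
            prop["description"] = f.metadata["help"]
        if "choices" in f.metadata:
            prop["enum"] = list(f.metadata["choices"])
        properties[f.name] = prop
    return {"type": "object", "properties": properties, "additionalProperties": False}


@dataclass
class ExampleNetworkConfig:  # stand-in; the choices list is an illustrative subset
    net_type: str = field(
        default="zuko_nice",
        metadata={"help": "Flow architecture", "choices": ["zuko_nice", "nsf", "maf"]},
    )
    theta_norm: bool = field(default=True, metadata={"help": "Normalize parameter space"})


@dataclass
class ExampleFlowConfig:  # stand-in for a top-level estimator config
    network: ExampleNetworkConfig = field(default_factory=ExampleNetworkConfig)


schema = schema_from_dataclass(ExampleFlowConfig)
print(json.dumps(schema, indent=2))
```

Setting additionalProperties to false is a design choice in this sketch: it lets the editor flag misspelled keys, at the cost of rejecting any option the dataclasses do not declare.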

2. falcon schema CLI command

Interactive introspection from the terminal:

# Show full config tree with defaults and descriptions
$ falcon schema falcon.estimators.Flow

falcon.estimators.Flow:
  loop:
    num_epochs: 100          # Max training epochs
    batch_size: 128          # Samples per training step
    early_stop_patience: 16  # Epochs without improvement before stopping
  network:
    net_type: zuko_nice      # Flow architecture [nsf, maf, zuko_gf, naf, ...]
    theta_norm: true         # Normalize parameter space
  embedding: {}              # _target_ + _input_ for observation embedding
  optimizer:
    lr: 0.01                 # Learning rate
    lr_decay_factor: 0.1     # LR multiplier on plateau
  inference:
    gamma: 0.5               # Amortization mixing (0=sequential, 1=amortized)

# Dump as YAML template
$ falcon schema falcon.estimators.Flow --yaml > config_template.yaml

Prerequisite: dataclass field metadata

Both features are driven by the same source — metadata on dataclass fields:

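from dataclasses import dataclass, field

@dataclass
class NetworkConfig:
    net_type: str = field(
        default="zuko_nice",
        # The default must itself be a listed choice, or the generated enum would reject it.
        metadata={"help": "Flow architecture", "choices": ["zuko_nice", "nsf", "maf", "zuko_gf", "naf"]}
    )
    theta_norm: bool = field(
        default=True,
        metadata={"help": "Normalize parameter space"}
    )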

Adding metadata={"help": ...} to existing config dataclasses is the single investment that pays off across both the schema and the CLI.
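Since both features depend on this metadata being complete and consistent, a small check could run in the test suite. The following is a sketch under assumptions: check_metadata and ExampleConfig are hypothetical names, and the two rules shown (every field has help text, every default is one of its choices) are only examples of what such a lint might enforce.

```python
import dataclasses
from dataclasses import dataclass, field


def check_metadata(cls):
    """Hypothetical lint: return a list of metadata problems in a config dataclass."""
    problems = []
    for f in dataclasses.fields(cls):
        if "help" not in f.metadata:
            problems.append(f"{f.name}: missing help text")
        choices = f.metadata.get("choices")
        has_default = f.default is not dataclasses.MISSING
        if choices is not None and has_default and f.default not in choices:
            problems.append(f"{f.name}: default {f.default!r} not in choices")
    return problems


@dataclass
class ExampleConfig:  # deliberately flawed stand-in to exercise both rules
    net_type: str = field(
        default="zuko_nice",
        metadata={"help": "Flow architecture", "choices": ["nsf", "maf"]},
    )
    theta_norm: bool = True  # no metadata yet


problems = check_metadata(ExampleConfig)
for problem in problems:
    print(problem)
```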

Scope

  • Add metadata={"help": ...} to all config dataclass fields (TrainingLoopConfig, OptimizerConfig, InferenceConfig, NetworkConfig, SNPEConfig, GaussianConfig, GaussianPosteriorConfig)
  • Implement schema_from_dataclass() utility that walks dataclasses → JSON Schema
  • Implement falcon schema <target> CLI subcommand (pretty-printed YAML with comments)
  • Implement falcon schema <target> --json-schema output mode
  • Ship generated schema in schemas/ and add .vscode/settings.json example
  • Document in README / docs site
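The CLI items in this list could be wired up roughly as below, using argparse and importlib from the standard library. This is a sketch with assumptions: the issue does not say that --yaml and --json-schema are mutually exclusive or how a target such as falcon.estimators.Flow maps to its config dataclasses, so the flag grouping and the resolve_target helper are hypothetical. The demo resolves a standard-library class in place of a real estimator.

```python
import argparse
import importlib


def resolve_target(path):
    """Import a dotted path such as 'package.module.ClassName' and return the object."""
    module_name, _, attr = path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {path!r}")
    return getattr(importlib.import_module(module_name), attr)


def build_parser():
    """Build a parser with a 'schema' subcommand and the two output modes."""
    parser = argparse.ArgumentParser(prog="falcon")
    sub = parser.add_subparsers(dest="command", required=True)
    schema = sub.add_parser("schema", help="Show config options for a target")
    schema.add_argument("target", help="Dotted path, e.g. falcon.estimators.Flow")
    mode = schema.add_mutually_exclusive_group()  # assumption: one output mode per call
    mode.add_argument("--yaml", action="store_true", help="Dump a YAML template")
    mode.add_argument("--json-schema", action="store_true", help="Emit JSON Schema")
    return parser


# Demo with a standard-library class standing in for an estimator.
args = build_parser().parse_args(["schema", "collections.OrderedDict", "--yaml"])
target = resolve_target(args.target)
print(args.command, target.__name__, args.yaml, args.json_schema)
```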

Labels: enhancement