This document explains the structure of the AI SBOM (Software Bill of Materials) document produced by Xelo. Every field maps directly to the Pydantic models in `src/xelo/models.py`. The canonical JSON Schema is at `src/xelo/schemas/aibom.schema.json` and can be printed with `xelo schema`.

    {
      "schema_version": "1.3.0",
      "generated_at": "2026-03-07T08:00:00Z",
      "generator": "xelo",
      "target": "./my-repo",
      "nodes": [...],
      "edges": [...],
      "deps": [...],
      "summary": {...}
    }

| Field | Type | Description |
|---|---|---|
| `schema_version` | string | SBOM schema semver; bumped when the format changes |
| `generated_at` | ISO 8601 datetime | UTC timestamp of the scan |
| `generator` | string | Always `"xelo"` |
| `target` | string | Repository URL or local path scanned |
| `nodes` | array of Node | Detected AI components; the main payload |
| `edges` | array of Edge | Directed relationships between nodes |
| `deps` | array of PackageDep | Package manifest dependencies |
| `summary` | ScanSummary | Scan-level roll-up metadata |

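As a minimal sketch of consuming the document, the Python below parses an SBOM and reports which of the eight top-level fields from the table are absent. The helper name `missing_top_level_fields` and the trimmed-down sample document are illustrative assumptions, not part of Xelo.

```python
import json

# Top-level fields as listed in the table above.
TOP_LEVEL_FIELDS = (
    "schema_version", "generated_at", "generator", "target",
    "nodes", "edges", "deps", "summary",
)


def missing_top_level_fields(doc: dict) -> list[str]:
    """Return the top-level field names absent from a parsed SBOM document."""
    return [name for name in TOP_LEVEL_FIELDS if name not in doc]


# Hypothetical, trimmed-down document used only for illustration.
sample = json.loads("""
{
  "schema_version": "1.3.0",
  "generated_at": "2026-03-07T08:00:00Z",
  "generator": "xelo",
  "target": "./my-repo",
  "nodes": [],
  "edges": [],
  "deps": [],
  "summary": {}
}
""")

print(missing_top_level_fields(sample))
```

This only checks for key presence; validating against the JSON Schema is the authoritative check.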
A node is one detected AI component. Nodes are the main thing you work with.

    {
      "id": "3f4a1c2d-...",
      "name": "ResearchAgent",
      "component_type": "AGENT",
      "confidence": 0.95,
      "metadata": { ... },
      "evidence": [ ... ]
    }

| Type | What it represents |
|---|---|
| `AGENT` | An agentic orchestrator: LangGraph graph, CrewAI crew, AutoGen agent, OpenAI Agent, etc. |
| `MODEL` | An LLM or embedding model reference, e.g. `gpt-4o`, `claude-3-5-sonnet`, `text-embedding-3-small` |
| `TOOL` | A function tool or MCP tool wired to an agent |
| `PROMPT` | A system instruction or prompt template; full content preserved in `metadata.extras.content` |
| `DATASTORE` | A vector store, database, or cache: Chroma, Pinecone, Redis, PostgreSQL, etc. |
| `GUARDRAIL` | A content filter or safety validator: Guardrails AI validators, NeMo Guardrails, etc. |
| `AUTH` | An authentication node: OAuth2, Bearer, API key, JWT, MCP auth provider |
| `PRIVILEGE` | A capability grant: `db_write`, `filesystem_write`, `code_execution`, `network_out`, etc. |
| `CONTAINER_IMAGE` | A container image reference from a Dockerfile; carries image name, tag, digest, and security posture |
| `DEPLOYMENT` | An IaC-derived deployment unit: Kubernetes workload, Terraform resource, CloudFormation stack, GitHub Actions workflow, etc. |
| `IAM` | An IAM entity detected in IaC: AWS role/policy, GCP service account, Azure managed identity, K8s role/ClusterRole |
| `FRAMEWORK` | An AI framework detected without a more specific type; used when a framework is present but no individual agents were found |
| `API_ENDPOINT` | An exposed API route or MCP endpoint |

`confidence` is a float between 0 and 1. Values above 0.85 indicate high-confidence, AST-derived detection. Values from 0.5 to 0.85 are usually regex-based. Values below 0.5 are inferred or uncertain.
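The confidence bands can be sketched as a small helper. The band names (`"high"`, `"medium"`, `"low"`) and the treatment of the exact boundaries (0.85 and 0.5 both fall into the middle band) are assumptions made for illustration; the text only describes the ranges loosely.

```python
def confidence_band(confidence: float) -> str:
    """Map a node confidence to a coarse band (hypothetical labels)."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence > 0.85:
        return "high"    # typically AST-derived detection
    if confidence >= 0.5:
        return "medium"  # usually regex-based detection
    return "low"         # inferred or uncertain


print(confidence_band(0.95))
print(confidence_band(0.6))
```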
All typed metadata fields are optional (null when not applicable). Fields relevant to each node type:
For MODEL nodes:

| Field | Description |
|---|---|
| `model_name` | LLM or embedding model identifier, e.g. `"gpt-4o-mini"` |
| `framework` | Framework the model is used through, e.g. `"openai_agents"`, `"langgraph"` |
| `extras.provider` | Cloud/API provider: `"openai"`, `"anthropic"`, `"google"`, `"bedrock"`, etc. |
| `extras.model_family` | Normalised family: `"gpt-4"`, `"claude-3"`, `"gemini"` |

For DATASTORE nodes:

| Field | Description |
|---|---|
| `datastore_type` | Technology: `"chromavector"`, `"pinecone"`, `"postgres"`, `"redis"`, etc. |
| `data_classification` | Union of classification labels, e.g. `["PHI", "PII"]` |
| `classified_tables` | SQL table or Python model names that carry classified fields |
| `classified_fields` | Per-table mapping of field names to labels: `{"patients": ["name", "dob"]}` |

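As a sketch of how these fields might be queried, the function below lists the DATASTORE nodes whose `data_classification` includes a given label. The helper name and the sample nodes are hypothetical; the code also assumes that an unset metadata field is `null` (or missing) rather than an empty list.

```python
def datastores_with_label(nodes: list[dict], label: str) -> list[str]:
    """Return names of DATASTORE nodes whose data_classification has `label`."""
    hits = []
    for node in nodes:
        if node.get("component_type") != "DATASTORE":
            continue
        metadata = node.get("metadata") or {}
        # Typed metadata fields are optional, so tolerate null or missing.
        labels = metadata.get("data_classification") or []
        if label in labels:
            hits.append(node["name"])
    return hits


# Hypothetical nodes, trimmed to the fields used here.
nodes = [
    {"name": "patients_db", "component_type": "DATASTORE",
     "metadata": {"datastore_type": "postgres",
                  "data_classification": ["PHI", "PII"],
                  "classified_tables": ["patients"],
                  "classified_fields": {"patients": ["name", "dob"]}}},
    {"name": "cache", "component_type": "DATASTORE",
     "metadata": {"datastore_type": "redis", "data_classification": None}},
    {"name": "ResearchAgent", "component_type": "AGENT", "metadata": {}},
]

print(datastores_with_label(nodes, "PHI"))
```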
For AUTH / API_ENDPOINT / MCP nodes:

| Field | Description |
|---|---|
| `auth_type` | Mechanism: `"oauth2"`, `"bearer"`, `"api_key"`, `"jwt"` |
| `auth_class` | Provider class name, e.g. `"BearerAuthProvider"` |
| `transport` | Protocol: `"sse"`, `"streamable-http"`, `"stdio"` |
| `server_name` | MCP server display name |
| `endpoint` | Address: `"0.0.0.0:8080 (sse)"`, `"/chat"` |
| `method` | HTTP method: `"GET"`, `"POST"` |

For PRIVILEGE nodes:

| Field | Description |
|---|---|
| `privilege_scope` | Capability label: `"db_write"`, `"filesystem_write"`, `"code_execution"`, `"network_out"`, `"email_out"`, `"social_media_out"`, `"admin"`, `"rbac"` |

For CONTAINER_IMAGE nodes (Dockerfile):

| Field | Description |
|---|---|
| `image_name` | e.g. `"python"` |
| `image_tag` | e.g. `"3.12-slim"` |
| `image_digest` | e.g. `"sha256:abc…"` |
| `registry` | e.g. `"docker.io"`, `"gcr.io"` |
| `base_image` | Full reference, e.g. `"python:3.12-slim"` |
| `runs_as_root` | `true` when the container runs as root (UID 0), `false` when non-root |
| `has_health_check` | `true` when a `HEALTHCHECK` instruction is present in the Dockerfile |
| `has_resource_limits` | `true` when Kubernetes resource limits are defined for the container |

For DEPLOYMENT nodes (IaC — K8s, Terraform, CloudFormation, GitHub Actions, etc.):

| Field | Description |
|---|---|
| `cloud_region` | Cloud region, e.g. `"us-east-1"`, `"eastus"`, `"us-central1"` |
| `availability_zones` | AZs configured, e.g. `["us-east-1a", "us-east-1b"]` |
| `secret_store` | Secret management service: `"aws_secrets_manager"`, `"azure_key_vault"`, `"gcp_secret_manager"`, `"hashicorp_vault"`, `"k8s_secret"`, `"github_actions_secret"` |
| `encryption_at_rest` | `true` when encryption-at-rest is explicitly configured |
| `encryption_key_ref` | KMS key ARN, Key Vault URI, or CMEK resource reference |
| `ha_mode` | HA topology: `"multi-az"`, `"replicated"`, or `"single"` |
| `runs_as_root` | `true` when the pod/container security context runs as root |
| `has_health_check` | `true` when a liveness or readiness probe is configured |
| `has_resource_limits` | `true` when Kubernetes resource limits are defined |

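The three boolean posture fields shared by CONTAINER_IMAGE and DEPLOYMENT nodes can be turned into a short list of flags. This is only a sketch: the function name is hypothetical, the flag strings are borrowed from the `security_findings` examples in the summary, and nothing here claims to reproduce how Xelo itself derives its findings. A `null` value is treated as "unknown" and raises no flag.

```python
def posture_flags(metadata: dict) -> list[str]:
    """Derive illustrative posture flags from a node's metadata."""
    flags = []
    # Compare against True/False explicitly so that null (None) stays silent.
    if metadata.get("runs_as_root") is True:
        flags.append("container_runs_as_root")
    if metadata.get("has_health_check") is False:
        flags.append("missing_health_check")
    if metadata.get("has_resource_limits") is False:
        flags.append("no_resource_limits")
    return flags


# Hypothetical DEPLOYMENT metadata: root user, no probe, limits unknown.
example = {
    "runs_as_root": True,
    "has_health_check": False,
    "has_resource_limits": None,
}
print(posture_flags(example))
```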
For IAM nodes (IaC — AWS roles/policies, GCP service accounts, Azure managed identities, K8s RBAC):

| Field | Description |
|---|---|
| `iam_type` | Entity kind: `"role"`, `"policy"`, `"service_account"`, `"managed_identity"`, `"role_binding"` |
| `principal` | ARN, email, or object ID of the IAM principal |
| `permissions` | Actions or scopes granted (up to 20 entries), e.g. `["s3:GetObject", "s3:PutObject"]` |
| `iam_scope` | Scope of the binding: `"project"`, `"subscription"`, `"cluster"`, `"namespace"`, `"resource"` |
| `trust_principals` | Principals trusted to assume this role: AWS trust policy subjects or K8s binding subjects |

For all nodes:

| Field | Description |
|---|---|
| `deployment_target` | Cloud target: `"aws"`, `"gcp"`, `"kubernetes"` |
| `extras` | Adapter-specific key/value pairs not covered by the typed fields above |

Each node carries one or more evidence items explaining why Xelo detected it:

    {
      "kind": "ast_instantiation",
      "confidence": 0.95,
      "detail": "crewai_adapter: Agent(role='researcher', ...)",
      "location": { "path": "src/agents.py", "line": 42 }
    }

| Field | Description |
|---|---|
| `kind` | Detection method: `"ast"`, `"ast_instantiation"`, `"regex"`, `"config"`, `"iac"`, `"inferred"` |
| `confidence` | Evidence-level confidence in [0, 1] |
| `detail` | Human-readable description: adapter name plus the matched code snippet (up to 500 chars; full content for PROMPT nodes) |
| `location.path` | Repo-relative file path |
| `location.line` | 1-based line number, if known |

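When a node carries several evidence items, a reader often wants the strongest one and where it was found. The sketch below picks the item with the highest evidence-level confidence and formats its location; both helper names and the sample node are hypothetical.

```python
from typing import Optional


def strongest_evidence(node: dict) -> Optional[dict]:
    """Return the evidence item with the highest confidence, or None."""
    evidence = node.get("evidence") or []
    if not evidence:
        return None
    return max(evidence, key=lambda item: item.get("confidence", 0.0))


def format_location(item: dict) -> str:
    """Render location as 'path:line', or just 'path' when line is unknown."""
    location = item.get("location") or {}
    path = location.get("path", "<unknown>")
    line = location.get("line")
    return f"{path}:{line}" if line is not None else path


# Hypothetical node with two evidence items.
node = {
    "name": "ResearchAgent",
    "evidence": [
        {"kind": "regex", "confidence": 0.6,
         "detail": "illustrative regex match",
         "location": {"path": "src/agents.py", "line": None}},
        {"kind": "ast_instantiation", "confidence": 0.95,
         "detail": "crewai_adapter: Agent(role='researcher', ...)",
         "location": {"path": "src/agents.py", "line": 42}},
    ],
}

best = strongest_evidence(node)
print(best["kind"], format_location(best))
```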
An edge represents a directed relationship between two nodes.

    {
      "source": "3f4a1c2d-...",
      "target": "a7b2e9f0-...",
      "relationship_type": "CALLS"
    }

| Type | Meaning |
|---|---|
| `USES` | Agent or framework uses a model |
| `CALLS` | Agent calls a tool |
| `ACCESSES` | Agent or model accesses a datastore |
| `PROTECTS` | Guardrail protects an agent or model |
| `DEPLOYS` | Deployment artifact deploys an agent or framework |

Explicit edges come from AST analysis. When no explicit edges are found, Xelo adds inferred fallback edges (e.g. from agents to tools in the same file).
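Because edges refer to nodes by `id`, reading them usually means resolving those ids back to node names. A minimal sketch, assuming `source` and `target` hold node `id` values as in the example above; the function name and the short ids in the sample are hypothetical.

```python
def describe_edges(nodes: list[dict], edges: list[dict]) -> list[str]:
    """Render each edge as 'source -[TYPE]-> target' using node names."""
    names = {node["id"]: node["name"] for node in nodes}
    lines = []
    for edge in edges:
        # Fall back to the raw id when an endpoint is not in the node list.
        source = names.get(edge["source"], edge["source"])
        target = names.get(edge["target"], edge["target"])
        lines.append(f"{source} -[{edge['relationship_type']}]-> {target}")
    return lines


# Hypothetical nodes and edges with shortened ids.
nodes = [
    {"id": "n1", "name": "ResearchAgent", "component_type": "AGENT"},
    {"id": "n2", "name": "web_search", "component_type": "TOOL"},
]
edges = [
    {"source": "n1", "target": "n2", "relationship_type": "CALLS"},
    {"source": "n1", "target": "n9", "relationship_type": "USES"},
]

for line in describe_edges(nodes, edges):
    print(line)
```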
Standard package manifest entries, scanned recursively at any depth.

    {
      "name": "langchain-core",
      "version_spec": ">=0.3.0",
      "purl": "pkg:pypi/langchain-core@0.3.51",
      "source_file": "pyproject.toml",
      "ecosystem": "pypi"
    }

High-level roll-up attached to every document.

| Field | Type | Description |
|---|---|---|
| `use_case` | string | Natural-language description of what the app does (deterministic rule-based; enriched by LLM when enabled) |
| `frameworks` | list[string] | Detected framework names, e.g. `["langgraph", "openai_agents"]` |
| `modalities` | list[string] | I/O modalities in upper-case, e.g. `["TEXT", "VOICE", "IMAGE"]` |
| `modality_support` | dict | Detailed flags, e.g. `{"text": true, "voice": false}` |
| `api_endpoints` | list[string] | API route paths, e.g. `["/chat", "/health"]` |
| `deployment_platforms` | list[string] | Cloud/CI platforms, e.g. `["AWS", "GCP"]` |
| `regions` | list[string] | Cloud regions, e.g. `["us-east-1"]` |
| `environments` | list[string] | Deployment envs, e.g. `["prod", "staging"]` |
| `deployment_urls` | list[string] | Canonical URLs from IaC/workflow files |
| `iac_accounts` | list[string] | Cloud account / project IDs from IaC |
| `node_counts` | dict | Count per type, e.g. `{"AGENT": 3, "MODEL": 2, "TOOL": 5}` |
| `secret_stores` | list[string] | Deduplicated secret management services across all IaC files, e.g. `["aws_secrets_manager"]` |
| `availability_zones` | list[string] | All cloud AZs referenced in IaC, e.g. `["us-east-1a", "us-east-1b"]` |
| `encryption_at_rest_coverage` | bool | `true` when at least one IaC resource has encryption-at-rest configured |
| `security_findings` | list[string] | Notable security/resilience issues detected across IaC and container config, e.g. `["container_runs_as_root", "missing_health_check", "no_resource_limits", "secrets_in_env_vars", "overly_permissive_iam"]` |
| `iam_principals` | list[string] | IAM role ARNs, GCP service account emails, and Azure managed identity names |
| `service_accounts` | list[string] | K8s ServiceAccount names and GCP/Azure service account identifiers |
| `iac_security_summary` | string or null | LLM-generated security briefing covering deployment posture, IAM, secret management, encryption, HA, and CI/CD risks. Populated only when LLM enrichment is enabled (`--llm`). |
| `data_classification` | list[string] | Union of all classification labels, e.g. `["PHI", "PII"]` |
| `classified_tables` | list[string] | All SQL tables / Python models carrying PII or PHI |

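One simple consistency check on a document is to recount nodes by `component_type` and compare the result with `summary.node_counts`. The sketch below assumes that `node_counts` lists only types that actually occur (no zero entries), which the table does not state; the helper name and sample document are hypothetical.

```python
from collections import Counter


def count_nodes_by_type(nodes: list[dict]) -> dict[str, int]:
    """Count nodes per component_type."""
    return dict(Counter(node["component_type"] for node in nodes))


# Hypothetical document, trimmed to the fields used here.
doc = {
    "nodes": [
        {"name": "ResearchAgent", "component_type": "AGENT"},
        {"name": "WriterAgent", "component_type": "AGENT"},
        {"name": "gpt-4o", "component_type": "MODEL"},
    ],
    "summary": {"node_counts": {"AGENT": 2, "MODEL": 1}},
}

recounted = count_nodes_by_type(doc["nodes"])
print(recounted == doc["summary"]["node_counts"])
```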
Xelo classifies PII and PHI by analysing SQL CREATE TABLE statements and Python model definitions (Pydantic BaseModel, SQLAlchemy ORM, @dataclass). Classification results appear in two places:

- On each DATASTORE node: `metadata.classified_tables` and `metadata.classified_fields` show which tables/fields were flagged.
- In `summary.data_classification` and `summary.classified_tables`: a project-wide roll-up.

Classification labels:

- `PII`: name, email, phone, address, date of birth, SSN, passport, financial fields, IP address, password
- `PHI`: medical record numbers, diagnosis, medication, lab results, insurance ID, vital signs, mental health, allergies

The machine-readable JSON Schema (draft 2020-12) is embedded in the package:

    xelo schema                        # print to stdout
    xelo schema --output schema.json   # write to file
    xelo validate my-sbom.json         # validate a document

Schema `$id`: `https://nuguard.ai/schemas/aibom/1.3.0/aibom.schema.json`

The schema is generated directly from the Pydantic models and is always in sync with the code.