guardspan

LLM compliance checker for regulated financial environments. Built with LangGraph, OpenTelemetry, and FastAPI.

Architecture

┌─────────────────────────────────────────────────────────────────┐
│                         FastAPI Gateway                          │
│  POST /check  ·  GET /health  ·  GET /docs                      │
│  • Generates run_id before workflow execution                    │
│  • Produces minimal audit event for blocked/failed executions    │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│                      LangGraph Workflow                          │
│                                                                  │
│  Input Classifier                                                │
│       ↓                                                          │
│  Policy Enforcer ──[block]──→ END                                │
│       ↓ [allow]                                                  │
│  Model Router                                                    │
│       ↓                                                          │
│  LLM Executor ──[error]──→ END                                   │
│       ↓ [success]                                                │
│  Output Validator                                                │
│       ↓                                                          │
│  Audit Logger                                                    │
│       ↓                                                          │
│      END                                                         │
└─────────────────────────────────────────────────────────────────┘
                         │
          ┌──────────────┴──────────────┐
          ▼                             ▼
  Audit Persistence             OTel Export
  DynamoDB (if configured)      OTLP endpoint (if set)
  Console fallback              ConsoleSpanExporter fallback

What it does

Guardspan implements a compliance-aware LLM workflow for regulated financial environments. The system enforces deterministic policy checks before LLM execution, routes requests to cost-appropriate models based on semantic complexity, validates outputs for compliance violations and PII leakage, and maintains a correlated audit trail with OpenTelemetry observability.

The workflow operates as a six-node synchronous LangGraph pipeline. Policy enforcement blocks prohibited requests before they reach the LLM, preventing wasted API costs and compliance exposure. Model routing selects between gpt-4o-mini and gpt-4o based on input complexity and financial domain keywords, optimizing cost while maintaining response quality. Output validation applies deterministic rules to detect compliance violations (guaranteed returns, insider information), PII patterns (email, phone numbers), and quality issues (truncated responses).
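The routing step described above can be sketched roughly as follows. This is a minimal illustration, not the project's implementation: the keyword list, the length threshold, and the assumption that domain keywords push a request toward gpt-4o are all hypothetical, since the source names the criteria but not the rules.

```python
from typing import Literal

Model = Literal["gpt-4o-mini", "gpt-4o"]

# Hypothetical keyword list and length threshold; the README does not specify either.
FINANCIAL_KEYWORDS = ("derivative", "hedge", "portfolio", "regulation")
LONG_INPUT_CHARS = 400


def classify_complexity(user_input: str) -> Literal["low", "high"]:
    """Label the input "high" if it is long or mentions a domain keyword."""
    text = user_input.lower()
    if len(text) > LONG_INPUT_CHARS:
        return "high"
    if any(keyword in text for keyword in FINANCIAL_KEYWORDS):
        return "high"
    return "low"


def route_model(complexity: str) -> Model:
    """Send high-complexity inputs to gpt-4o and everything else to gpt-4o-mini."""
    return "gpt-4o" if complexity == "high" else "gpt-4o-mini"


if __name__ == "__main__":
    for question in ("Como funciona um CDB?", "How should I hedge a bond portfolio?"):
        print(question, "->", route_model(classify_complexity(question)))
```

Because both functions are pure, the routing decision can be unit tested without any LLM call.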

Every execution is designed to produce a correlated audit record regardless of outcome. Blocked and failed executions emit minimal audit events at the API boundary. Successful executions persist complete audit records with redacted user input, token usage, cost attribution, and validation results. All workflow nodes emit OpenTelemetry spans with a stable graphspan.* attribute namespace, enabling distributed tracing and cost analysis across executions.

Configuration

# Required
OPENAI_API_KEY=sk-...

# Optional — ConsoleSpanExporter if absent
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317

# Optional — console logging if absent
DYNAMODB_AUDIT_TABLE=guardspan-audit
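
As a rough sketch of how these variables might be read at startup, the snippet below loads them into a small settings object and derives the two fallbacks. The `Settings` class and `load_settings` function are hypothetical names chosen for illustration; only the three environment variable names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    otlp_endpoint: Optional[str]
    dynamodb_audit_table: Optional[str]

    @property
    def span_exporter(self) -> str:
        # OTLP when an endpoint is set, otherwise the console exporter.
        return "otlp" if self.otlp_endpoint else "console"

    @property
    def audit_target(self) -> str:
        # DynamoDB when a table is configured, otherwise console logging.
        return "dynamodb" if self.dynamodb_audit_table else "console"


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read configuration from an environment mapping; fail fast without an API key."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required")
    return Settings(
        openai_api_key=api_key,
        otlp_endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT") or None,
        dynamodb_audit_table=env.get("DYNAMODB_AUDIT_TABLE") or None,
    )
```

Passing the environment as a mapping keeps the loader testable with a plain dict.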

Install and run

make install
make run

# POST http://localhost:8000/check
# GET  http://localhost:8000/health
# GET  http://localhost:8000/docs

Example request:

curl -X POST http://localhost:8000/check \
  -H "Content-Type: application/json" \
  -d '{
    "user_input": "Como funciona um CDB?",
    "advisor_verified": false
  }'

Example response:

{
  "run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
  "policy_decision": "allow",
  "policy_reason": "Policy check passed",
  "model_used": "gpt-4o-mini",
  "estimated_cost_usd": 0.000123,
  "actual_cost_usd": 0.000098,
  "validation_status": "pass",
  "validation_flags": [],
  "audit_backend": "console",
  "error": null
}

Test

make test

The test suite includes:

23 unit tests for deterministic logic (policy, validation, redaction, cost)
22 node tests with OTel fail-open verification
12 integration tests via FastAPI TestClient
9 property-based tests with Hypothesis (max_examples=30)

All tests mock OpenAI API calls. No real API requests are made during testing.

OTel span schema

All workflow nodes emit spans with the graphspan.* attribute namespace:

| Attribute | Node | Type | Description |
| --- | --- | --- | --- |
| `graphspan.node.name` | All | string | Node identifier |
| `graphspan.audit.run_id` | All | string | UUID correlating execution |
| `graphspan.complexity.level` | Input Classifier | string | "low" or "high" |
| `graphspan.policy.decision` | Policy Enforcer | string | "allow" or "block" |
| `graphspan.policy.reason` | Policy Enforcer | string | Policy rule explanation |
| `graphspan.model.name` | Model Router, LLM Executor | string | "gpt-4o-mini" or "gpt-4o" |
| `graphspan.cost.estimated_usd` | Model Router | float | Pre-execution cost estimate |
| `graphspan.cost.actual_usd` | LLM Executor | float | Post-execution actual cost |
| `graphspan.tokens.input` | LLM Executor | int | Prompt tokens consumed |
| `graphspan.tokens.output` | LLM Executor | int | Completion tokens generated |
| `graphspan.validation.status` | Output Validator | string | "pass" or "fail" |
| `graphspan.validation.flags_count` | Output Validator | int | Number of validation flags |
| `graphspan.audit.backend` | Audit Logger | string | "dynamodb", "console", or "none" |
| `graphspan.error` | LLM Executor | string | Error message if execution failed |
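
The test suite mentions fail-open behavior for telemetry. One way such a helper could look is sketched below; it is an assumption about the approach, not the project's code. It relies only on the span exposing a `set_attribute(key, value)` method, as OpenTelemetry spans do, and the `RecordingSpan` and `BrokenSpan` classes are hypothetical stand-ins so the example runs without the OpenTelemetry SDK.

```python
import logging
from typing import Any, Dict, Mapping

logger = logging.getLogger("guardspan.telemetry")


def set_attributes_fail_open(span: Any, attributes: Mapping[str, Any]) -> bool:
    """Set attributes on a span; log and continue if a telemetry call raises.

    Returns True when every attribute was set, False otherwise.
    """
    ok = True
    for key, value in attributes.items():
        try:
            span.set_attribute(key, value)
        except Exception:
            # Telemetry problems must never break the workflow node.
            logger.warning("could not set span attribute %s", key, exc_info=True)
            ok = False
    return ok


class RecordingSpan:
    """Hypothetical stand-in that stores attributes in a dict."""

    def __init__(self) -> None:
        self.attributes: Dict[str, Any] = {}

    def set_attribute(self, key: str, value: Any) -> None:
        self.attributes[key] = value


class BrokenSpan:
    """Hypothetical stand-in that simulates a failing telemetry backend."""

    def set_attribute(self, key: str, value: Any) -> None:
        raise RuntimeError("exporter unavailable")
```

A node would call the helper with attributes such as `graphspan.node.name` and `graphspan.audit.run_id` and carry on regardless of the return value.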

Cost model

| Model | Input (per 1K tokens) | Output (per 1K tokens) | Example cost (100-word request) |
| --- | --- | --- | --- |
| gpt-4o-mini | $0.00015 | $0.0006 | ~$0.0001 |
| gpt-4o | $0.0025 | $0.01 | ~$0.002 |

Cost estimation uses 1 token ≈ 4 characters and assumes output is 2× input tokens. Actual cost is calculated from provider-reported token usage.
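
The arithmetic can be sketched as follows, using the per-1K-token prices listed in this section. The function names are hypothetical, and rounding the character count up to whole tokens is an assumption the source does not state.

```python
import math

# USD per 1K tokens, taken from the cost model in this README.
PRICING = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "gpt-4o": {"input": 0.0025, "output": 0.01},
}


def actual_cost_usd(input_tokens: int, output_tokens: int, model: str) -> float:
    """Cost from token counts, e.g. the usage reported by the provider."""
    price = PRICING[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def estimate_cost_usd(user_input: str, model: str) -> float:
    """Pre-execution estimate: 1 token per 4 characters, output assumed 2x input."""
    input_tokens = math.ceil(len(user_input) / 4)  # rounding up is an assumption
    output_tokens = 2 * input_tokens
    return actual_cost_usd(input_tokens, output_tokens, model)
```

For a 400-character input this gives 100 input tokens and 200 assumed output tokens, so the gpt-4o-mini estimate is (100 × 0.00015 + 200 × 0.0006) / 1000 = $0.000135.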

Architecture decisions

Why LangGraph over a sequential chain

LangGraph provides explicit conditional edges for policy-based branching and error handling. The workflow has two conditional branches: Policy Enforcer routes to END on block, and LLM Executor routes to END on error. A sequential chain would require implicit exception handling or nested conditionals, obscuring the control flow. LangGraph makes the execution paths explicit in the graph structure, improving readability and testability.
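
To make the two branches concrete, here is a plain-Python sketch of the routing decisions. It deliberately does not import LangGraph; functions of this shape (state in, next node name out) are the kind of callable that LangGraph's `add_conditional_edges` accepts. The snake_case node names and the state keys are hypothetical, and treating a missing policy decision as a block is an assumption.

```python
from typing import Dict, List, Optional, TypedDict

END = "END"


class WorkflowState(TypedDict, total=False):
    policy_decision: str   # "allow" or "block"
    error: Optional[str]   # set by the LLM executor on failure


def route_after_policy(state: WorkflowState) -> str:
    """Continue only on an explicit allow; anything else ends the run."""
    return "model_router" if state.get("policy_decision") == "allow" else END


def route_after_llm(state: WorkflowState) -> str:
    """End the run on an LLM error, otherwise validate the output."""
    return END if state.get("error") else "output_validator"


# Unconditional edges, following the architecture diagram.
NEXT: Dict[str, str] = {
    "input_classifier": "policy_enforcer",
    "model_router": "llm_executor",
    "output_validator": "audit_logger",
    "audit_logger": END,
}
ROUTERS = {"policy_enforcer": route_after_policy, "llm_executor": route_after_llm}


def trace_path(state: WorkflowState) -> List[str]:
    """List the nodes a run would visit for a given final state."""
    path, node = [], "input_classifier"
    while node != END:
        path.append(node)
        router = ROUTERS.get(node)
        node = router(state) if router else NEXT[node]
    return path
```

Calling `trace_path({"policy_decision": "block"})` stops after the policy enforcer, which is the path the diagram labels `[block]`.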

Why policy and validation are deterministic (no LLM)

Policy enforcement and output validation are compliance-critical control points. Using an LLM for these decisions introduces non-determinism, latency, cost, and the risk of prompt injection. Deterministic regex-based rules provide predictable behavior, negligible latency, no per-request API cost, and no exposure to prompt injection, although a regex can still be evaded by rephrasing. The trade-off is reduced flexibility: adding new rules requires code changes. This is acceptable for a regulated environment where policy changes follow a formal review process.
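
A deterministic output validator of this kind might look like the sketch below. The rule categories (guaranteed returns, insider information, email, phone, truncation) come from this README; the regular expressions, flag names, and the punctuation-based truncation check are hypothetical illustrations.

```python
import re
from typing import List

# Hypothetical patterns; the README names the rule categories, not the expressions.
RULES = {
    "guaranteed_returns": re.compile(r"\bguaranteed\s+(?:returns?|profits?)\b", re.IGNORECASE),
    "insider_information": re.compile(r"\binsider\s+(?:information|tips?)\b", re.IGNORECASE),
    "pii_email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "pii_phone": re.compile(r"\(?\d{2}\)?\s?9?\d{4}-?\d{4}"),
}


def validate_output(text: str) -> List[str]:
    """Return the names of all rules the text violates, in a stable order."""
    flags = [name for name, pattern in RULES.items() if pattern.search(text)]
    # Crude truncation heuristic: a complete answer ends with terminal punctuation.
    if not text.rstrip().endswith((".", "!", "?")):
        flags.append("truncated_response")
    return flags


def validation_status(flags: List[str]) -> str:
    """Map the flag list to the "pass" / "fail" status used in responses."""
    return "fail" if flags else "pass"
```

The same input always yields the same flags, which is what makes the check reviewable and testable without mocking a model.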

Why run_id is generated at the API boundary

The run_id is the identity of an execution, not the identity of an audit record. Generating it at the API boundary before workflow invocation ensures it is present in all responses (including blocked and failed executions), all OTel spans, and all audit records. If the Audit Logger generated it, blocked and failed executions would have no run_id in their responses, breaking correlation. The API boundary is the single source of truth for execution identity.
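
The idea can be sketched without FastAPI as a plain function standing in for the request handler. The names `handle_check`, `run_workflow`, and `emit_audit` are hypothetical, and the exact fields of the minimal audit event are an assumption.

```python
import uuid
from typing import Any, Callable, Dict

State = Dict[str, Any]


def handle_check(
    user_input: str,
    run_workflow: Callable[[State], State],
    emit_audit: Callable[[State], None],
) -> State:
    """Create the run_id first, then run the workflow and audit short-circuited runs."""
    run_id = str(uuid.uuid4())  # exists before any workflow node executes
    try:
        result = run_workflow({"run_id": run_id, "user_input": user_input})
    except Exception as exc:
        result = {"error": str(exc)}

    blocked = result.get("policy_decision") == "block"
    if blocked or result.get("error"):
        # The run ended before the audit logger node, so the boundary
        # emits a minimal event carrying the same run_id.
        emit_audit({
            "run_id": run_id,
            "policy_decision": result.get("policy_decision"),
            "error": result.get("error"),
        })
    return {**result, "run_id": run_id}
```

Whatever path the workflow takes, the response and the audit event share one identifier, which is the correlation property this section argues for.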

Why audit_backend over a boolean

A boolean audit_persisted does not distinguish between "logged to console" and "not recorded anywhere", which are operationally different states for a regulated system. The three-value enum ("dynamodb", "console", "none") makes the persistence outcome explicit: durable storage succeeded, best-effort logging occurred, or no record was produced. This distinction is meaningful for troubleshooting and operational review.
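
A persistence function that reports this three-value outcome could be sketched as follows. The writers are injected as callables so the example runs without AWS; `persist_audit` and its parameters are hypothetical names, and falling back to the console when a configured table write fails is an assumption about the behavior.

```python
import json
import logging
from typing import Any, Callable, Literal, Mapping, Optional

AuditBackend = Literal["dynamodb", "console", "none"]
logger = logging.getLogger("guardspan.audit")


def persist_audit(
    record: Mapping[str, Any],
    table_writer: Optional[Callable[[Mapping[str, Any]], None]] = None,
    console_writer: Callable[[str], None] = print,
) -> AuditBackend:
    """Try durable storage, then the console, and report which one succeeded."""
    if table_writer is not None:
        try:
            table_writer(record)
            return "dynamodb"
        except Exception:
            logger.warning("durable audit write failed, falling back to console", exc_info=True)
    try:
        console_writer(json.dumps(dict(record), default=str))
        return "console"
    except Exception:
        # Nothing was recorded anywhere; the caller must surface this state.
        return "none"
```

The return value maps directly onto the `audit_backend` field in the API response.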

Why redacted storage for user_input

Guardspan operates in a regulated financial context subject to LGPD (Brazilian General Data Protection Law). Raw storage maximizes forensic utility but conflicts with the data minimization principle in Article 6. Hashed storage discards the content that reviewers need, leaving only exact-match comparison. Redacted storage preserves semantic context for operational review while reducing data subject risk.

The redaction implementation uses deterministic regex patterns for CPF (Brazilian tax ID), CNPJ (Brazilian company ID), Brazilian phone numbers, and email addresses. Patterns are replaced with tokens like [CPF_REDACTED] and [EMAIL_REDACTED].
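
A minimal sketch of this style of redaction is shown below. The `[CPF_REDACTED]` and `[EMAIL_REDACTED]` tokens come from this README; the `[CNPJ_REDACTED]` and `[PHONE_REDACTED]` tokens, the regular expressions, and the ordering are assumptions made for illustration. The patterns check format only and do not validate check digits.

```python
import re
from typing import List, Pattern, Tuple

# Order matters: the longer CNPJ pattern runs before CPF, and both run before
# the phone pattern, because unformatted digit runs can be ambiguous.
REDACTIONS: List[Tuple[Pattern[str], str]] = [
    (re.compile(r"\b\d{2}\.?\d{3}\.?\d{3}/?\d{4}-?\d{2}\b"), "[CNPJ_REDACTED]"),
    (re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b"), "[CPF_REDACTED]"),
    (re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "[EMAIL_REDACTED]"),
    (re.compile(r"(?:\+55\s?)?\(?\d{2}\)?\s?9?\d{4}-?\d{4}"), "[PHONE_REDACTED]"),
]


def redact(text: str) -> str:
    """Replace each matched identifier with its placeholder token."""
    for pattern, token in REDACTIONS:
        text = pattern.sub(token, text)
    return text


if __name__ == "__main__":
    print(redact("CPF 123.456.789-09, email ana@example.com"))
```

Redacting before the record is persisted means the raw identifier never reaches the audit store.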

Redaction scope and limitations

This redaction is illustrative and not production-grade DLP. The regex patterns are Brazilian-context only and do not cover:

US Social Security Numbers
Credit card numbers (PAN)
IBAN or other international banking identifiers
Addresses or geolocation data
Biometric data
Health information (HIPAA-regulated data)

Production deployments in regulated environments must integrate a dedicated DLP solution (e.g., AWS Macie, Google Cloud DLP API, Microsoft Purview) or a commercial PII detection library with broader pattern coverage and language support.

DynamoDB is optional. The local MVP requires no AWS resources. Audit records fall back to console logging if DYNAMODB_AUDIT_TABLE is not configured. This design supports local development and testing without cloud dependencies.

graphspan

The span attribute namespace graphspan.* used in this project is intended for future extraction as a standalone PyPI library. The namespace provides a stable telemetry contract for LangGraph workflows, independent of the application name.

Planned repository: github.com/lucianareynaud/graphspan

The graphspan library will provide:

Standardized span attribute names for LLM workflows
Fail-open span emission helpers
Cost attribution utilities
Audit correlation patterns

The tracer name guardspan.workflow identifies this application. The attribute namespace graphspan.* identifies the telemetry contract. These are deliberately separate to enable library extraction without breaking existing instrumentation.
