LLM compliance checker for regulated financial environments. Built with LangGraph, OpenTelemetry, and FastAPI.
┌─────────────────────────────────────────────────────────────────┐
│  FastAPI Gateway                                                │
│  POST /check · GET /health · GET /docs                          │
│  • Generates run_id before workflow execution                   │
│  • Produces minimal audit event for blocked/failed executions   │
└────────────────────────┬────────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────────┐
│  LangGraph Workflow                                             │
│                                                                 │
│  Input Classifier                                               │
│        ↓                                                        │
│  Policy Enforcer ──[block]──→ END                               │
│        ↓ [allow]                                                │
│  Model Router                                                   │
│        ↓                                                        │
│  LLM Executor ──[error]──→ END                                  │
│        ↓ [success]                                              │
│  Output Validator                                               │
│        ↓                                                        │
│  Audit Logger                                                   │
│        ↓                                                        │
│       END                                                       │
└─────────────────────────────────────────────────────────────────┘
                         │
          ┌──────────────┴──────────────┐
          ▼                             ▼
  Audit Persistence           OTel Export
  DynamoDB (if configured)    OTLP endpoint (if set)
  Console fallback            ConsoleSpanExporter fallback
Guardspan implements a compliance-aware LLM workflow for regulated financial environments. The system enforces deterministic policy checks before LLM execution, routes requests to cost-appropriate models based on semantic complexity, validates outputs for compliance violations and PII leakage, and maintains a correlated audit trail with OpenTelemetry observability.
The workflow operates as a six-node synchronous LangGraph pipeline. Policy enforcement blocks prohibited requests before they reach the LLM, preventing wasted API costs and compliance exposure. Model routing selects between gpt-4o-mini and gpt-4o based on input complexity and financial domain keywords, optimizing cost while maintaining response quality. Output validation applies deterministic rules to detect compliance violations (guaranteed returns, insider information), PII patterns (email, phone numbers), and quality issues (truncated responses).
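The routing step described above can be sketched in a few lines. This is a minimal illustration, not the project's classifier: the keyword set, the word-count threshold, and the function names `classify_complexity` and `route_model` are all assumptions introduced here, since the source names the signals (input complexity and financial domain keywords) but not their values.

```python
import re

# Hypothetical values: the README does not list the actual keywords or threshold.
FINANCIAL_KEYWORDS = {"hedge", "derivativos", "derivatives", "tributação", "swap"}
HIGH_COMPLEXITY_MIN_WORDS = 40


def classify_complexity(user_input: str) -> str:
    """Return "high" for long inputs or inputs with a domain keyword, else "low"."""
    words = re.findall(r"\w+", user_input.lower())
    has_keyword = any(word in FINANCIAL_KEYWORDS for word in words)
    if has_keyword or len(words) >= HIGH_COMPLEXITY_MIN_WORDS:
        return "high"
    return "low"


def route_model(complexity: str) -> str:
    """Map the complexity level to one of the two models named in the README."""
    return "gpt-4o" if complexity == "high" else "gpt-4o-mini"
```

Under these assumed rules, a short question such as "Como funciona um CDB?" is classified as low complexity and routed to gpt-4o-mini, which matches the example response shown later in this README.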
Every execution produces a correlated audit record regardless of outcome. Blocked and failed executions emit minimal audit events at the API boundary. Successful executions persist complete audit records with redacted user input, token usage, cost attribution, and validation results. All workflow nodes emit OpenTelemetry spans with a stable graphspan.* attribute namespace, enabling distributed tracing and cost analysis across executions.
# Required
OPENAI_API_KEY=sk-...
# Optional — ConsoleSpanExporter if absent
OTEL_EXPORTER_OTLP_ENDPOINT=http://localhost:4317
# Optional — console logging if absent
DYNAMODB_AUDIT_TABLE=guardspan-audit

make install
make run
# POST http://localhost:8000/check
# GET http://localhost:8000/health
# GET http://localhost:8000/docs

Example request:
curl -X POST http://localhost:8000/check \
-H "Content-Type: application/json" \
-d '{
"user_input": "Como funciona um CDB?",
"advisor_verified": false
}'

Example response:
{
"run_id": "a1b2c3d4-e5f6-7890-abcd-ef1234567890",
"policy_decision": "allow",
"policy_reason": "Policy check passed",
"model_used": "gpt-4o-mini",
"estimated_cost_usd": 0.000123,
"actual_cost_usd": 0.000098,
"validation_status": "pass",
"validation_flags": [],
"audit_backend": "console",
"error": null
}

make test

Test suite includes:
- 23 unit tests for deterministic logic (policy, validation, redaction, cost)
- 22 node tests with OTel fail-open verification
- 12 integration tests via FastAPI TestClient
- 9 property-based tests with Hypothesis (max_examples=30)
All tests mock OpenAI API calls. No real API requests are made during testing.
All workflow nodes emit spans with the graphspan.* attribute namespace:
| Attribute | Node | Type | Description |
|---|---|---|---|
| graphspan.node.name | All | string | Node identifier |
| graphspan.audit.run_id | All | string | UUID correlating execution |
| graphspan.complexity.level | Input Classifier | string | "low" or "high" |
| graphspan.policy.decision | Policy Enforcer | string | "allow" or "block" |
| graphspan.policy.reason | Policy Enforcer | string | Policy rule explanation |
| graphspan.model.name | Model Router, LLM Executor | string | "gpt-4o-mini" or "gpt-4o" |
| graphspan.cost.estimated_usd | Model Router | float | Pre-execution cost estimate |
| graphspan.cost.actual_usd | LLM Executor | float | Post-execution actual cost |
| graphspan.tokens.input | LLM Executor | int | Prompt tokens consumed |
| graphspan.tokens.output | LLM Executor | int | Completion tokens generated |
| graphspan.validation.status | Output Validator | string | "pass" or "fail" |
| graphspan.validation.flags_count | Output Validator | int | Number of validation flags |
| graphspan.audit.backend | Audit Logger | string | "dynamodb", "console", or "none" |
| graphspan.error | LLM Executor | string | Error message if execution failed |
| Model | Input (per 1K tokens) | Output (per 1K tokens) | Example cost (100-word request) |
|---|---|---|---|
| gpt-4o-mini | $0.00015 | $0.0006 | ~$0.0001 |
| gpt-4o | $0.0025 | $0.01 | ~$0.002 |
Cost estimation uses 1 token ≈ 4 characters and assumes output is 2× input tokens. Actual cost is calculated from provider-reported token usage.
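The estimation rule can be written out as a short calculation. The sketch below uses the prices from the table and the two stated heuristics; the function names and the choice to round the token estimate up are assumptions made for illustration.

```python
import math

# USD per 1K tokens, copied from the pricing table in this README.
PRICES_PER_1K = {
    "gpt-4o-mini": {"input": 0.00015, "output": 0.0006},
    "gpt-4o": {"input": 0.0025, "output": 0.01},
}


def estimate_cost_usd(user_input: str, model: str) -> float:
    """Pre-execution estimate: 1 token per 4 characters, output = 2x input."""
    input_tokens = math.ceil(len(user_input) / 4)  # rounding up is an assumption
    output_tokens = 2 * input_tokens
    price = PRICES_PER_1K[model]
    return (input_tokens * price["input"] + output_tokens * price["output"]) / 1000


def actual_cost_usd(prompt_tokens: int, completion_tokens: int, model: str) -> float:
    """Post-execution cost from provider-reported token counts."""
    price = PRICES_PER_1K[model]
    return (prompt_tokens * price["input"] + completion_tokens * price["output"]) / 1000
```

For a 400-character input this gives 100 estimated input tokens and 200 estimated output tokens, so the estimate is about $0.000135 on gpt-4o-mini and about $0.00225 on gpt-4o.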
LangGraph provides explicit conditional edges for policy-based branching and error handling. The workflow has two conditional branches: Policy Enforcer routes to END on block, and LLM Executor routes to END on error. A sequential chain would require implicit exception handling or nested conditionals, obscuring the control flow. LangGraph makes the execution paths explicit in the graph structure, improving readability and testability.
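The two branches can be expressed as small routing functions over the workflow state, which is the kind of callable LangGraph's `add_conditional_edges` accepts. The sketch below is pure Python so it runs without LangGraph installed; the state keys, the node names, and the local `END` constant are assumptions standing in for the project's actual definitions.

```python
from typing import Optional, TypedDict

END = "END"  # local stand-in for LangGraph's END sentinel in this sketch


class State(TypedDict, total=False):
    policy_decision: str
    error: Optional[str]


def after_policy(state: State) -> str:
    """Policy Enforcer exit: stop on block, continue on allow."""
    return END if state.get("policy_decision") == "block" else "model_router"


def after_llm(state: State) -> str:
    """LLM Executor exit: stop on error, continue on success."""
    return END if state.get("error") else "output_validator"


# Unconditional edges of the six-node pipeline.
NEXT = {
    "input_classifier": "policy_enforcer",
    "model_router": "llm_executor",
    "output_validator": "audit_logger",
    "audit_logger": END,
}
CONDITIONAL = {"policy_enforcer": after_policy, "llm_executor": after_llm}


def path(state: State) -> list[str]:
    """Trace which nodes a run with this outcome would visit."""
    node, visited = "input_classifier", []
    while node != END:
        visited.append(node)
        node = CONDITIONAL[node](state) if node in CONDITIONAL else NEXT[node]
    return visited
```

Because each branch is an ordinary function of the state, both exits can be unit tested without invoking an LLM or constructing the graph.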
Policy enforcement and output validation are compliance-critical control points. Using an LLM for these decisions introduces non-determinism, latency, cost, and the risk of prompt injection. Deterministic regex-based rules provide predictable behavior, negligible latency, no per-request API cost, and no exposure to prompt injection, although pattern matching can still miss phrasings the rules do not anticipate. The trade-off is reduced flexibility (adding new rules requires code changes), but this is acceptable for a regulated environment where policy changes follow a formal review process.
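A deterministic output validator of this kind might look like the sketch below. The flag names, the specific patterns, and the truncation heuristic (a response that does not end in terminal punctuation) are assumptions for illustration; only the rule categories come from this README.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; the project's real rule set is not shown here.
RULES = {
    "guaranteed_returns": re.compile(r"\b(?:guaranteed returns?|retorno garantido)\b", re.I),
    "insider_information": re.compile(r"\b(?:insider information|informação privilegiada)\b", re.I),
    "pii_email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "pii_phone": re.compile(r"(?:\+55\s?)?\(?\d{2}\)?\s?9?\d{4}-?\d{4}\b"),
}


def validate_output(text: str) -> Tuple[str, List[str]]:
    """Return ("pass" | "fail", flags) using fixed regex rules."""
    flags = [name for name, pattern in RULES.items() if pattern.search(text)]
    # Assumed truncation heuristic: no sentence-ending punctuation at the end.
    if not text.rstrip().endswith((".", "!", "?")):
        flags.append("truncated_response")
    return ("fail" if flags else "pass", flags)
```

The same input always yields the same flags, which is what makes the rules straightforward to cover with unit and property-based tests.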
The run_id is the identity of an execution, not the identity of an audit record. Generating it at the API boundary before workflow invocation ensures it is present in all responses (including blocked and failed executions), all OTel spans, and all audit records. If the Audit Logger generated it, blocked and failed executions would have no run_id in their responses, breaking correlation. The API boundary is the single source of truth for execution identity.
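As a rough sketch of that boundary behavior, the handler below creates the run_id first and attaches it to every response path. The names `handle_check`, `run_workflow`, and `emit_minimal_audit`, and the shape of the minimal audit event, are hypothetical; the real gateway is a FastAPI route whose code is not shown in this README.

```python
import uuid
from typing import Any, Callable, Dict

Payload = Dict[str, Any]


def handle_check(
    payload: Payload,
    run_workflow: Callable[[Payload], Payload],
    emit_minimal_audit: Callable[[Payload], None] = lambda event: None,
) -> Payload:
    """Generate run_id before invoking the workflow so every outcome carries it."""
    run_id = str(uuid.uuid4())
    try:
        result = run_workflow({**payload, "run_id": run_id})
    except Exception as exc:
        # Failed execution: minimal audit event and response share the same run_id.
        emit_minimal_audit({"run_id": run_id, "outcome": "failed"})
        return {"run_id": run_id, "error": str(exc)}
    if result.get("policy_decision") == "block":
        # Blocked execution never reaches the Audit Logger node.
        emit_minimal_audit({"run_id": run_id, "outcome": "blocked"})
    return {**result, "run_id": run_id}
```

The workflow receives the run_id as input rather than creating it, so spans, audit records, and the HTTP response can all be joined on one value.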
A boolean audit_persisted field would not distinguish between "logged to console" and "not recorded anywhere", which are operationally different states for a regulated system. The three-value audit_backend enum ("dynamodb", "console", "none") makes the persistence outcome explicit: durable storage succeeded, best-effort logging occurred, or no record was produced. This distinction is meaningful for troubleshooting and operational review.
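One way to produce the three outcomes is sketched below. The `persist_audit` name and the injected `put_item` callable (which in a real deployment would wrap a DynamoDB table write) are hypothetical, and falling back to console when the DynamoDB write itself fails is an assumption; this README only states the fallback for the case where the table is not configured.

```python
import json
import sys
from typing import Any, Callable, Dict, Literal, Optional, TextIO

AuditBackend = Literal["dynamodb", "console", "none"]


def persist_audit(
    record: Dict[str, Any],
    put_item: Optional[Callable[[Dict[str, Any]], None]] = None,
    stream: TextIO = sys.stdout,
) -> AuditBackend:
    """Try durable storage, then console logging, and report which one succeeded."""
    if put_item is not None:
        try:
            put_item(record)
            return "dynamodb"
        except Exception:
            pass  # assumed behavior: fall through to console logging
    try:
        print(json.dumps(record), file=stream)
        return "console"
    except Exception:
        return "none"
```

The returned value is what would populate the audit_backend response field and the graphspan.audit.backend span attribute.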
Guardspan operates in a regulated financial context subject to LGPD (Brazilian General Data Protection Law). Raw storage maximizes forensic utility but conflicts with the data minimization (necessity) principle in Article 6. Hashed storage removes the semantic content that forensic and operational review depend on. Redacted storage preserves that context while reducing data subject risk.
The redaction implementation uses deterministic regex patterns for CPF (Brazilian tax ID), CNPJ (Brazilian company ID), Brazilian phone numbers, and email addresses. Patterns are replaced with tokens like [CPF_REDACTED] and [EMAIL_REDACTED].
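A minimal sketch of this style of redaction follows. The patterns are simplified approximations written for this example, not the project's own, and the [CNPJ_REDACTED] and [PHONE_REDACTED] token names are assumed by analogy with the two tokens named above.

```python
import re

# Order matters: longer identifiers first so a CNPJ is not partially matched as a CPF,
# and CPF before phone because an unformatted CPF is also an 11-digit number.
REDACTIONS = [
    ("[CNPJ_REDACTED]", re.compile(r"\b\d{2}\.?\d{3}\.?\d{3}/?\d{4}-?\d{2}\b")),
    ("[CPF_REDACTED]", re.compile(r"\b\d{3}\.?\d{3}\.?\d{3}-?\d{2}\b")),
    ("[EMAIL_REDACTED]", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")),
    ("[PHONE_REDACTED]", re.compile(r"(?:\+55\s?)?\(?\d{2}\)?\s?9?\d{4}-?\d{4}\b")),
]


def redact(text: str) -> str:
    """Replace each matched identifier with its fixed token."""
    for token, pattern in REDACTIONS:
        text = pattern.sub(token, text)
    return text
```

Unformatted 11-digit numbers are ambiguous between a CPF and a mobile number; with this ordering they are labeled as CPF, which still removes the value from the stored record.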
This redaction is illustrative and not production-grade DLP. The regex patterns are Brazilian-context only and do not cover:
- US Social Security Numbers
- Credit card numbers (PAN)
- IBAN or other international banking identifiers
- Addresses or geolocation data
- Biometric data
- Health information (HIPAA-regulated data)
Production deployments in regulated environments must integrate a dedicated DLP solution (e.g., Amazon Macie, Google Cloud DLP API, Microsoft Purview) or a commercial PII detection library with broader pattern coverage and language support.
DynamoDB is optional. The local MVP requires no AWS resources. Audit records fall back to console logging if DYNAMODB_AUDIT_TABLE is not configured. This design supports local development and testing without cloud dependencies.
The span attribute namespace graphspan.* used in this project is intended for future extraction as a standalone PyPI library. The namespace provides a stable telemetry contract for LangGraph workflows, independent of the application name.
Planned repository: github.com/lucianareynaud/graphspan
The graphspan library will provide:
- Standardized span attribute names for LLM workflows
- Fail-open span emission helpers
- Cost attribution utilities
- Audit correlation patterns
The tracer name guardspan.workflow identifies this application. The attribute namespace graphspan.* identifies the telemetry contract. These are deliberately separate to enable library extraction without breaking existing instrumentation.
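To illustrate what a fail-open span helper could look like, the sketch below sets graphspan.* attributes on any span-like object exposing `set_attribute(key, value)`, as OpenTelemetry spans do, and swallows telemetry errors so they cannot fail the workflow. The helper name, the constants, and the logger name are hypothetical; the planned library's API is not defined in this README.

```python
import logging
from typing import Any, Mapping

logger = logging.getLogger("guardspan.telemetry")  # hypothetical logger name

# Attribute names taken from the span attribute table in this README.
ATTR_NODE_NAME = "graphspan.node.name"
ATTR_RUN_ID = "graphspan.audit.run_id"


def set_attributes_fail_open(span: Any, attributes: Mapping[str, Any]) -> bool:
    """Set attributes on a span-like object; return False instead of raising."""
    try:
        for key, value in attributes.items():
            span.set_attribute(key, value)
        return True
    except Exception:
        # Telemetry must never break the request path: log and continue.
        logger.debug("span attribute emission failed", exc_info=True)
        return False
```

A node would call the helper with its own attributes, for example `set_attributes_fail_open(span, {ATTR_NODE_NAME: "policy_enforcer", ATTR_RUN_ID: run_id})`, and proceed regardless of the return value.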