Hello Steve, I found this discussion document very thoughtful and helpful.

I think for the wire format I’d keep FHIR JSON as the default for non-database sinks, but I think it could make sense to support an optional alternate wire encoding later for high-throughput transports like Kafka or SQS. The important constraint, in my view, is that AuditEvent should remain the only canonical model. Any compact binary format should be a lossless serialization of helios_fhir::AuditEvent, not a separate schema. That preserves the RFC’s "use the FHIR model directly / no custom intermediate representation" design while still leaving room for performance optimization if JSON ever becomes a bottleneck. So I’d lean toward:

I probably wouldn’t do this in v1 without benchmark evidence, but it seems like a reasonable extension point to define now.

For the audit event integrity/signature my inclination is to leave retention and archival policy to the deployment/operations layer rather than making HFS enforce it directly. I'm assuming that audit retention requirements are highly jurisdiction- and organization-specific, and application-level purge logic creates compliance risk if misconfigured. I think HFS should make audit data easy to manage operationally, but the actual retention policy should remain operator-controlled.
Summary
This document proposes the design for audit event logging in HFS. The audit system records security, privacy, and operational events as FHIR `AuditEvent` resources conforming to the IHE Basic Audit Log Patterns (BALP) Implementation Guide. It is designed as a pluggable, async-first subsystem with multiple backend "sinks", from no-op (for development) to production-grade targets like PostgreSQL, Kafka, and AWS SQS.

The goal is twofold: provide a trustworthy, standards-compliant audit trail for healthcare data access, and expose that trail as a searchable FHIR resource via the REST API (`GET /AuditEvent`).

We use `helios_fhir::r4::AuditEvent` (and the corresponding R5/R6 types) directly as the data model. The `AuditEvent` struct derives `Serialize`, `Deserialize`, `Clone`, `Debug`, and `Default`, and implements `Send + Sync`, making it well suited to passing across async boundaries and serializing to any backend.

We are seeking community feedback on the trait design, backend strategy, and BALP conformance approach before implementation begins.
Motivation
Every FHIR server operating in a healthcare environment must maintain an audit trail. This is not optional — it is a fundamental requirement of HIPAA, GDPR (for health data), and virtually every national health data regulation. Beyond compliance, a well-structured audit log enables:
Privacy transparency. Patients and privacy officers can see who accessed what data and when. The IHE BALP Implementation Guide specifically targets this use case, enabling patient-facing audit access through the FHIR API.
Security surveillance. Establishing a baseline of normal operations and detecting anomalies — a core function of the IHE ATNA profile that has been in production healthcare systems for over two decades. As the ATNA specification describes, audit events capture all security events that are detected plus a full set of activity and transaction events describing ongoing operations. These are monitored for deviations from that baseline.
Operational accountability. Understanding system behavior during incidents, supporting forensic investigation, and satisfying regulatory reporting requirements.
Interoperability. By recording audit events as FHIR `AuditEvent` resources conforming to BALP, audit data becomes portable, queryable, and interoperable across systems, rather than locked in proprietary log formats.

HFS needs an audit system that is idiomatic to Rust's async ecosystem, zero-cost when disabled, and flexible enough to target the diverse deployment environments our users operate in, from a developer's laptop to a multi-region cloud deployment fronting Kafka-based event pipelines.
Standards Background
The audit event model in FHIR has deep roots. Understanding the lineage helps explain why the `AuditEvent` resource looks the way it does and why we use it directly rather than defining a custom intermediate struct.

The Standards Chain
ASTM E2147 established the concept of security audit logs for healthcare, including accounting of disclosures.
RFC 3881 (2004) defined the XML-based information model for healthcare audit messages. It specified event identification, active participants, audit sources, and participant objects as the core data elements. The RFC notes that audit data collection should be stateless to avoid the overhead of transactional semantics, and that data gathering should be optimized through buffering and filtering — principles that directly inform our fire-and-forget sink design.
DICOM PS3.15 Annex A.5 made the RFC 3881 information model normative, defined vocabulary, transport bindings (Syslog), and the XML schema.
IHE ATNA (Audit Trail and Node Authentication) profiled DICOM audit logging for healthcare IT. ATNA defines which events to log for each IHE transaction, requires TLS for node authentication, and specifies Syslog as transport. Critically, ATNA distinguishes between surveillance logging (what we implement here) and forensic logging (detailed, product-specific logs for incident investigation). Our audit system targets the surveillance use case.
FHIR AuditEvent is the HL7 FHIR representation of this same information model, managed collaboratively between HL7, DICOM, and IHE. It maps the RFC 3881 / DICOM audit message structure into FHIR's resource model. The `AuditEvent` resource captures events in terms of who (agents), what (entities), where (source), when (recorded/period), and why (purpose of event).

IHE BALP (Basic Audit Log Patterns) v1.1.4 is an Implementation Guide that profiles `AuditEvent` for common FHIR RESTful operations (Create, Read, Update, Delete, Search), security token use (SAML, OAuth), consent decisions, and privacy disclosures. BALP provides concrete, testable profiles with two variants: minimal (identifier-centric, assumes lookup capability) and comprehensive (self-contained, preserves context).

Why BALP?
BALP is the current standard of practice for FHIR audit logging. It provides reusable AuditEvent patterns that can be used directly or extended for domain-specific use cases. Critically, BALP does not mandate how audit events are stored — only that they conform to specific profiles when exposed via FHIR. As the IHE guidance notes: a system can record events using whatever internal mechanism it wants, as long as the data can be made available in FHIR AuditEvent format.
This is exactly the architectural seam we exploit: backends persist `AuditEvent` resources however they choose, and the read-side API serves them as BALP-conformant resources.

The BALP IG also establishes important operational patterns: both client and server should record audit events for the same interaction (enabling anomaly detection through correlation), and accessing the audit log is itself an auditable event that generates its own `AuditEvent`.

Design Principles
Zero-cost when disabled. A developer running HFS locally with `audit: none` should pay no performance penalty. The `NullSink` compiles to no-ops.

Use the FHIR model directly. We use `helios_fhir::r4::AuditEvent` (and the corresponding R5/R6 types) as the data structure passed through the entire audit pipeline. No custom intermediate representation. The struct is already `Serialize + Deserialize + Clone + Send + Sync`, which is everything we need. This eliminates an entire class of mapping bugs and ensures any BALP profile can be populated without hitting a modeling wall.

Async-first. All sink implementations are behind an async trait. File I/O, database writes, Kafka produce calls, and SQS sends are all naturally async operations. Synchronous contexts (e.g., a test harness) can use `block_on` or a blocking adapter.

Fire-and-forget semantics. Audit logging must not block or degrade the FHIR server's request processing. Sink failures are logged (via `tracing`) but do not propagate errors to callers. This is a deliberate tradeoff: we prioritize server availability over guaranteed audit delivery. For deployments requiring stronger guarantees, Kafka and SQS provide their own durability and retry mechanisms at the transport layer. This aligns with RFC 3881's guidance that data gathering should be stateless to avoid the overhead of transactional semantics.

Single active backend. The initial implementation supports exactly one configured audit sink at a time. This simplifies configuration, error reasoning, and resource management. Fan-out to multiple sinks is a natural future extension but is explicitly out of scope for v1.

Immutability of the audit trail. Regardless of backend, the FHIR REST API will not permit `UPDATE` or `DELETE` operations on `AuditEvent` resources. This is fundamental to the purpose of an audit log: it must be tamper-evident.

Architecture Overview
Data Model: Using `helios_fhir::AuditEvent` Directly

Rather than defining a custom struct, we use the existing `helios_fhir::r4::AuditEvent` directly. The struct already provides everything we need.

This struct derives `Serialize`, `Deserialize`, `Clone`, `Debug`, `Default`, `PartialEq`, and auto-implements `Send + Sync`. It is the canonical FHIR representation, which is exactly what we need to pass through the audit pipeline, serialize to any backend, and serve via the FHIR API.

Why Not a Custom Struct?
A simplified struct with BALP serialization would force us to pick which fields matter up front. Any field we omit today becomes a breaking change to add tomorrow. More importantly, it introduces a mapping layer between our internal model and the FHIR resource that must be tested and maintained, all to re-create functionality the `helios_fhir` crate already provides.

A generic key-value map offers maximum flexibility but loses all type safety and makes it impossible to validate BALP conformance at compile time.
Using the FHIR type directly means: the data structure passed to `AuditSink::record()` is the same structure stored in the database, serialized to Kafka, and served from the FHIR API. No mapping, no conversion, no impedance mismatch.

FHIR Version Handling
HFS supports R4, R4B, R5, and R6 via feature flags. The `AuditEvent` resource evolved between versions: notably, R5 replaced `type: Coding` with `category: Vec<CodeableConcept>` and `code: CodeableConcept`, and changed `outcome` from a Code to a structured backbone element.

The sink trait and interceptor use the appropriate version's type through `cfg` attributes, matching HFS's existing FHIR version selection pattern.

Since BALP v1.1.4 targets R4 (and notes compatibility with R4B), the initial implementation focuses on R4. R5/R6 support follows the same pattern: the `AuditEvent` type changes, but the `AuditSink` trait and backend implementations are identical because they operate on `serde`-serializable types.

Builder Pattern for BALP Profiles
The `AuditInterceptor` provides builder functions for constructing BALP-conformant `AuditEvent` instances. These fill in the required coded values for each BALP pattern.

This approach lets callers construct BALP-conformant events with a single function call, while retaining the ability to build arbitrary `AuditEvent` instances for non-RESTful events (login, logout, configuration changes, etc.) by constructing the struct directly.

The Sink Trait: `AuditSink`

The `AuditSink` trait is the core abstraction. Every backend implements this trait. The server holds a single `Arc<dyn AuditSink>`, which is injected at startup based on configuration.

Design Notes
`async fn record(&self, event: AuditEvent)` takes ownership. The caller constructs the `AuditEvent` and hands it off. Since `AuditEvent` implements `Clone`, callers that need to retain a copy can clone before calling.

Infallible by design. The method returns `()`, not `Result`. This is a deliberate choice matching the fire-and-forget semantics. Sink implementations handle errors internally, logging them via `tracing::warn!` and incrementing a failure counter exposed via metrics. This ensures a flaky Kafka broker cannot cascade into FHIR API failures.

`flush()` for graceful shutdown. Async sinks that buffer events (file sink with write batching, Kafka producer with linger) need a way to drain their buffers on shutdown. The server's shutdown handler calls `sink.flush().await` before exiting.

`record_batch()` for Bundles. When processing a FHIR `transaction` or `batch` Bundle, HFS generates one `AuditEvent` per entry plus one for the Bundle as a whole. `record_batch` allows backends like Kafka to use a single produce request for the batch, reducing round-trips.

Backend Implementations
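`NullSink`: No Logging

For local development. All methods are no-ops. The compiler should be able to optimize these away entirely when the sink type is monomorphized. Behind `Arc<dyn AuditSink>`, each call still goes through dynamic dispatch, so the remaining cost is a dynamic call into an empty method body.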
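To make the trait shape and the no-op sink concrete, here is a minimal sketch using only the standard library. It is an illustration under assumptions, not the HFS implementation: the `AuditEvent` struct is a reduced stand-in for `helios_fhir::r4::AuditEvent`, the `SinkFuture` alias and the `block_on` helper are hypothetical names introduced for this example, and the boxed-future return type is simply one way to keep an async-style trait usable behind `Arc<dyn AuditSink>`.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Reduced stand-in for `helios_fhir::r4::AuditEvent` (assumption for this sketch).
#[derive(Clone, Debug, Default, PartialEq)]
pub struct AuditEvent {
    pub id: Option<String>,
}

/// Hypothetical alias: a boxed future keeps the trait object-safe without extra crates.
pub type SinkFuture<'a> = Pin<Box<dyn Future<Output = ()> + Send + 'a>>;

pub trait AuditSink: Send + Sync {
    /// Takes ownership of the event and returns `()`: fire-and-forget.
    fn record(&self, event: AuditEvent) -> SinkFuture<'_>;

    /// Default: record events one by one; a backend may override this to batch.
    fn record_batch(&self, events: Vec<AuditEvent>) -> SinkFuture<'_> {
        Box::pin(async move {
            for event in events {
                self.record(event).await;
            }
        })
    }

    /// Drain any buffered events before shutdown.
    fn flush(&self) -> SinkFuture<'_>;
}

/// No-op sink: every method completes immediately and does nothing.
pub struct NullSink;

impl AuditSink for NullSink {
    fn record(&self, _event: AuditEvent) -> SinkFuture<'_> {
        Box::pin(async {})
    }

    fn flush(&self) -> SinkFuture<'_> {
        Box::pin(async {})
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny polling executor for the example only; a real server would use its async runtime.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

With this shape, a synchronous caller such as a test harness could write `block_on(sink.record(event))` against any `Arc<dyn AuditSink>`, which mirrors the blocking-adapter option mentioned in the design principles.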
`FileSink`: Append-Only NDJSON File

Writes one JSON-serialized `AuditEvent` per line to an append-only file. Uses NDJSON (newline-delimited JSON) format: each line is a valid FHIR `AuditEvent` JSON resource, directly ingestible by log aggregation tools and parseable by any FHIR-aware consumer.

Because `helios_fhir::r4::AuditEvent` implements `serde::Serialize`, the serialized output is FHIR-conformant JSON. This means file sink output and the FHIR REST API serve the same representation, with no format divergence.

File rotation is intentionally deferred. The initial implementation writes to a single file. Rotation can be added via `tracing-appender`-style rolling, or delegated to the operating system (e.g., `logrotate` on Linux). This keeps the initial implementation simple.

`DatabaseSink`: PostgreSQL / SQLite / S3

Persists `AuditEvent` resources to the same storage backends HFS already supports: PostgreSQL, SQLite, and S3. The database sink is the only backend that directly enables the read-side FHIR API (`GET /AuditEvent`), because the data is stored in a queryable format alongside other FHIR resources.

Since the `AuditEvent` struct is the same type used throughout HFS's resource storage layer, persisting it requires no conversion. It is stored exactly as any other FHIR resource would be.

Separate vs. Shared Database
The `DatabaseSink` supports two configurations:

Dedicated audit database (`audit.database_url` is set). The audit sink connects to a separate database instance from the main FHIR resource storage. This is the recommended configuration for production deployments. Isolating audit data from clinical data provides independent scaling, independent backup/retention policies, and ensures that audit write load does not contend with FHIR API read/write operations.

Shared database (`audit.database_url` is omitted). The audit sink reuses the same `StorageBackend` instance as the FHIR server. This is convenient for development and testing, but at startup the server will emit a warning. This warning is suppressed when a dedicated `audit.database_url` is configured.

`KafkaSink`: Apache Kafka

Produces `AuditEvent` resources to a Kafka topic as JSON-serialized messages. Uses `rdkafka` (Rust bindings for librdkafka) behind the `kafka` feature flag.

Kafka key strategy: The `AuditEvent.id` is used as the message key. This distributes events evenly across partitions. Deployments that need patient-level ordering could override this to use the patient reference as key; this is a future consideration.

`SqsSink`: AWS Simple Queue Service

Sends `AuditEvent` resources to an SQS queue using `aws-sdk-sqs`, behind the `sqs` feature flag. SQS is a natural fit for deployments already in the AWS ecosystem, providing managed durability and dead-letter queue support.

Feature Flags
Backend dependencies are gated behind Cargo feature flags to avoid pulling heavy dependencies for unused backends:
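On the Rust side, this kind of gating could surface roughly as follows. This is a sketch under assumptions: the feature names `kafka` and `sqs` come from this document, while the function names and the backend identifiers other than `none` and `kafka` are hypothetical and chosen only for illustration.

```rust
/// Names of the audit backends compiled into this binary.
/// Identifiers other than "none" and "kafka" are assumptions made for this sketch.
pub fn compiled_backends() -> Vec<&'static str> {
    // These sinks need no extra dependencies, so they are always present.
    let mut backends = vec!["none", "file", "database"];

    // `cfg!` expands to a compile-time boolean for the named Cargo feature.
    if cfg!(feature = "kafka") {
        backends.push("kafka");
    }
    if cfg!(feature = "sqs") {
        backends.push("sqs");
    }
    backends
}

/// Reject a configured backend that was not compiled in, with a readable error.
pub fn ensure_compiled(name: &str) -> Result<(), String> {
    if compiled_backends().contains(&name) {
        Ok(())
    } else {
        Err(format!(
            "audit backend '{name}' is not available in this build; check enabled Cargo features"
        ))
    }
}
```

A startup check along these lines would let a binary built without the `kafka` feature fail fast with a clear message instead of silently falling back to another sink.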
The `NullSink`, `FileSink`, and `DatabaseSink` have no external dependencies beyond what HFS already uses and are always available.

Patient Identity and BALP Profiles
BALP defines two variants for each RESTful audit pattern:
With Patient: `AuditEvent.entity` includes a Patient reference with `role = "Patient"`. Used when the operation involves a specific, identifiable patient.

Without Patient: used for resources such as `Practitioner` or `Organization`, or when patient identity cannot be determined.

Our approach: optional patient reference with automatic inference as fallback.

The `AuditInterceptor` populates the patient entity using a four-step waterfall:

1. Explicit patient reference. If the operation directly targets a Patient resource (e.g., `GET /Patient/123`), the patient reference is taken from the resource identity.
2. Compartment inference. If the operation targets a resource within the Patient compartment (e.g., `GET /Observation/456` where `Observation.subject` → `Patient/123`), the interceptor traverses the subject/patient reference to infer the patient identity.
3. Search parameter inference. For search operations with a `patient` or `subject` parameter (e.g., `GET /Observation?patient=Patient/123`), the patient identity is extracted from the search parameters.
4. No patient. If none of the above applies, the audit event is recorded without a patient entity, using the BALP "without Patient" profile variant.
This means the system automatically selects the appropriate BALP profile variant (with or without Patient) based on the operation context. Callers never need to manually specify which variant to use.
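The four-step waterfall can be sketched as a single function. This is a minimal illustration, not the `AuditInterceptor` API: `RequestContext`, its fields, and `infer_patient` are hypothetical names, and the sketch assumes the subject reference of the target resource has already been loaded by the caller.

```rust
/// Hypothetical view of a request as an interceptor might see it.
#[derive(Debug, Default)]
pub struct RequestContext<'a> {
    pub resource_type: &'a str,
    pub resource_id: Option<&'a str>,
    /// `subject`/`patient` reference found on the target resource, if any.
    pub subject_reference: Option<&'a str>,
    /// Decoded search parameters as (name, value) pairs.
    pub search_params: Vec<(&'a str, &'a str)>,
}

/// Returns `Some("Patient/<id>")` for the "with Patient" variant,
/// or `None` for the "without Patient" variant.
pub fn infer_patient(ctx: &RequestContext<'_>) -> Option<String> {
    // Step 1: the operation targets a Patient resource directly.
    if ctx.resource_type == "Patient" {
        if let Some(id) = ctx.resource_id {
            return Some(format!("Patient/{id}"));
        }
    }

    // Step 2: the target resource points at a patient via subject/patient.
    if let Some(reference) = ctx.subject_reference {
        if reference.starts_with("Patient/") {
            return Some(reference.to_string());
        }
    }

    // Step 3: a `patient` or `subject` search parameter names the patient.
    for &(name, value) in &ctx.search_params {
        if name == "patient" || name == "subject" {
            // Accept both `123` and `Patient/123`; skip other types such as `Group/1`.
            let id = value.strip_prefix("Patient/").unwrap_or(value);
            if !id.is_empty() && !id.contains('/') {
                return Some(format!("Patient/{id}"));
            }
        }
    }

    // Step 4: no patient could be determined.
    None
}
```

Returning an `Option` lets the caller pick the profile variant with a simple match, so the variant choice never has to be passed in explicitly.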
Read-Side: AuditEvent as a FHIR Resource
The audit system exposes `AuditEvent` as a fully searchable FHIR resource through the standard REST API. This enables privacy officers, security teams, and patient-facing applications to query the audit trail. This aligns with the ATNA [ITI-81] transaction for retrieving audit records, and with BALP's explicit use case of providing AuditEvents to authorized consumers.

Supported Interactions

read: `GET /AuditEvent/{id}`
search: `GET /AuditEvent?{params}`
create, update, delete: no endpoint listed

Search Parameters

The following FHIR search parameters will be supported, aligned with the ATNA [ITI-81] transaction: `_id`, `date`, `action`, `outcome`, `agent`, `patient`, `entity`, `type`, `subtype`.

Backend Considerations for Read-Side
Only the database sink (PostgreSQL, SQLite, S3) directly enables the read-side API, because the data is stored in HFS's existing FHIR resource storage and indexed for search.
For non-database backends (file, Kafka, SQS), the read-side API will return empty results unless a separate process ingests audit events back into a queryable store. This is an expected deployment pattern: Kafka consumers or SQS processors can write audit events into a dedicated FHIR repository for querying.
Audit-of-Audit
Consistent with BALP guidance, querying `AuditEvent` resources is itself an auditable event. A `GET /AuditEvent` search will generate its own `AuditEvent` record. This creates a recursive but convergent trail: the audit-of-audit uses the "without Patient" profile variant since the entity being accessed is an `AuditEvent`, not patient data.

Configuration
Audit configuration follows the existing HFS environment variable and configuration file patterns:
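As a rough illustration of how such settings might be resolved into a typed value, the sketch below parses the environment variable names that appear in this section's example. Everything else is an assumption made for the example: the `AuditBackend` enum, the `backend_from_env` and `lookup` functions, the `database` backend identifier, and the choice to treat a missing `HFS_AUDIT_BACKEND` as `none`.

```rust
use std::collections::HashMap;

/// Hypothetical typed view of the audit configuration (subset of backends only).
#[derive(Debug, Clone, PartialEq)]
pub enum AuditBackend {
    Disabled,
    /// `url: None` means "reuse the main FHIR storage" (shared database).
    Database { url: Option<String> },
    Kafka { brokers: Vec<String>, topic: String },
}

/// Look up a variable, treating blank values as absent.
fn lookup<'a>(vars: &'a HashMap<String, String>, key: &str) -> Option<&'a str> {
    vars.get(key).map(|v| v.trim()).filter(|v| !v.is_empty())
}

/// Resolve the backend from an environment-like map of `HFS_AUDIT_*` variables.
pub fn backend_from_env(vars: &HashMap<String, String>) -> Result<AuditBackend, String> {
    match lookup(vars, "HFS_AUDIT_BACKEND").unwrap_or("none") {
        "none" => Ok(AuditBackend::Disabled),
        "database" => Ok(AuditBackend::Database {
            url: lookup(vars, "HFS_AUDIT_DATABASE_URL").map(str::to_string),
        }),
        "kafka" => {
            let brokers = lookup(vars, "HFS_AUDIT_KAFKA_BROKERS")
                .ok_or("HFS_AUDIT_KAFKA_BROKERS is required for the kafka backend")?;
            let topic = lookup(vars, "HFS_AUDIT_KAFKA_TOPIC")
                .ok_or("HFS_AUDIT_KAFKA_TOPIC is required for the kafka backend")?;
            Ok(AuditBackend::Kafka {
                // Brokers arrive as a comma-separated list.
                brokers: brokers.split(',').map(|b| b.trim().to_string()).collect(),
                topic: topic.to_string(),
            })
        }
        other => Err(format!("unsupported audit backend: {other}")),
    }
}
```

Passing a map instead of reading `std::env` directly keeps the parsing logic easy to test; a caller could build the map with `std::env::vars().collect()`.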
Environment variable overrides follow the `HFS_` prefix convention:

    HFS_AUDIT_BACKEND=kafka
    HFS_AUDIT_KAFKA_BROKERS=broker1:9092,broker2:9092
    HFS_AUDIT_KAFKA_TOPIC=fhir-audit
    # For dedicated audit database:
    HFS_AUDIT_DATABASE_URL=postgresql://audit_user:pass@audit-db:5432/fhir_audit

Filtering and Exclusions
Not every request warrants an audit event. Health checks, capability statements, and other infrastructure endpoints generate noise without security or privacy value. RFC 3881 explicitly anticipates this, noting that policy-based methods should be employed to optimize data gathering, including selective auditing of only events defined as important.
HFS supports configurable exclusion rules:
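One way such rules might be evaluated is sketched here, using the default exclusions listed in this section as data. The `ExclusionRule` type, the `default_exclusions` and `is_excluded` functions, and the trailing-`*` wildcard semantics are assumptions made for illustration; the actual rule syntax in HFS may differ.

```rust
/// Hypothetical exclusion rule: an HTTP method plus a path pattern.
/// A trailing `*` matches any suffix; otherwise the path must match exactly.
#[derive(Debug, Clone)]
pub struct ExclusionRule {
    pub method: &'static str,
    pub path: &'static str,
}

/// The default exclusions named in this section.
pub fn default_exclusions() -> Vec<ExclusionRule> {
    vec![
        ExclusionRule { method: "GET", path: "/metadata" },
        ExclusionRule { method: "GET", path: "/_health" },
        ExclusionRule { method: "GET", path: "/.well-known/*" },
    ]
}

/// Returns true when the request should not produce an audit event.
pub fn is_excluded(rules: &[ExclusionRule], method: &str, path: &str) -> bool {
    // Match on the path only, ignoring any query string.
    let path = path.split('?').next().unwrap_or(path);

    rules.iter().any(|rule| {
        if !rule.method.eq_ignore_ascii_case(method) {
            return false;
        }
        match rule.path.strip_suffix('*') {
            Some(prefix) => path.starts_with(prefix),
            None => path == rule.path,
        }
    })
}
```

Because the rules are plain data, operator overrides could replace or extend the default list without touching the matching logic.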
Default exclusions (applied unless overridden):
`GET /metadata`: CapabilityStatement requests
`GET /_health`: Health check endpoint
`GET /.well-known/*`: SMART configuration discovery

Open Questions
We welcome feedback on the following design decisions. Please comment below or open a linked discussion for deeper dives.
1. Wire Format for Non-Database Backends
Since we use `helios_fhir::AuditEvent` directly and it implements `serde::Serialize`, the file, Kafka, and SQS backends automatically emit FHIR-conformant JSON. This means any consumer can parse the output as a FHIR resource. Is there a use case where a non-FHIR serialization format would be preferred (e.g., a compact binary encoding for high-throughput Kafka deployments)?

2. Kafka Message Key Strategy
Using `AuditEvent.id` as the Kafka key distributes messages evenly across partitions. However, some consumers may prefer patient-level ordering (all events for `Patient/123` on the same partition). Should the Kafka key be configurable, defaulting to event ID but optionally set to patient reference?

3. File Rotation Strategy
The initial `FileSink` writes to a single file. Options for rotation include: built-in daily rotation (time-based rolling), size-based rotation, or delegation to the OS via `logrotate`. Which approach fits our users best?

4. Batch Audit Events for Bundles
When processing a FHIR Bundle, we generate one `AuditEvent` per Bundle entry plus one for the Bundle as a whole (with action `E` for Execute). If a `transaction` Bundle fails and all actions are reverted, the `AuditEvent` records are still persisted (with failure outcome codes). Is this the correct behavior, or should failed transaction Bundles generate only a single failure AuditEvent?

5. AuditEvent Resource Retention
Audit logs grow indefinitely. Should we provide a built-in retention/archival mechanism (e.g., purge AuditEvents older than N days via a background task), or is this purely an operational concern left to database maintenance? Note that any retention mechanism must be carefully designed to not violate regulatory retention requirements, which vary by jurisdiction.
6. Fan-Out as Future Work
The current design supports a single active backend. A natural extension is a `FanOutSink` that wraps multiple sinks.

This is explicitly deferred but architecturally trivial given the trait design. Is single-backend sufficient for the initial release, or is fan-out a hard requirement for any early adopter?
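For discussion purposes, a `FanOutSink` could look roughly like the sketch below. It is self-contained and uses only the standard library, so it restates a reduced trait: the `AuditEvent` stand-in, the `SinkFuture` alias, and the `block_on` helper are hypothetical names, and forwarding to inner sinks sequentially is one possible choice rather than a decided design.

```rust
use std::future::Future;
use std::pin::Pin;
use std::sync::Arc;
use std::task::{Context, Poll, Wake, Waker};

/// Reduced stand-in for `helios_fhir::r4::AuditEvent` (assumption for this sketch).
#[derive(Clone, Debug, Default)]
pub struct AuditEvent {
    pub id: Option<String>,
}

/// Hypothetical alias for an object-safe async-style return type.
pub type SinkFuture<'a> = Pin<Box<dyn Future<Output = ()> + Send + 'a>>;

pub trait AuditSink: Send + Sync {
    fn record(&self, event: AuditEvent) -> SinkFuture<'_>;
    fn flush(&self) -> SinkFuture<'_>;
}

/// Wraps several sinks and forwards every event to each of them.
pub struct FanOutSink {
    sinks: Vec<Arc<dyn AuditSink>>,
}

impl FanOutSink {
    pub fn new(sinks: Vec<Arc<dyn AuditSink>>) -> Self {
        Self { sinks }
    }
}

impl AuditSink for FanOutSink {
    fn record(&self, event: AuditEvent) -> SinkFuture<'_> {
        Box::pin(async move {
            // Each inner sink receives its own clone; sinks are awaited in order.
            for sink in &self.sinks {
                sink.record(event.clone()).await;
            }
        })
    }

    fn flush(&self) -> SinkFuture<'_> {
        Box::pin(async move {
            for sink in &self.sinks {
                sink.flush().await;
            }
        })
    }
}

struct NoopWake;

impl Wake for NoopWake {
    fn wake(self: Arc<Self>) {}
}

/// Tiny polling executor for the example only.
pub fn block_on<F: Future>(fut: F) -> F::Output {
    let waker = Waker::from(Arc::new(NoopWake));
    let mut cx = Context::from_waker(&waker);
    let mut fut = Box::pin(fut);
    loop {
        if let Poll::Ready(out) = fut.as_mut().poll(&mut cx) {
            return out;
        }
        std::thread::yield_now();
    }
}
```

Because `FanOutSink` itself implements the trait, the server would still hold a single `Arc<dyn AuditSink>`. One open design point the sketch makes visible is that sequential forwarding lets a slow inner sink delay the others, so a real implementation might prefer concurrent dispatch.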
7. Audit Event Integrity / Signatures
Should we support cryptographic signatures on `AuditEvent` resources to provide tamper evidence? This would involve generating a `Provenance` resource containing a signature of the complete `AuditEvent` JSON. Is this a requirement for any initial deployment, or can it be deferred?

References
`helios_fhir::r4::AuditEvent`
`rdkafka` crate
`aws-sdk-sqs` crate