Audit Implementation #50

smunini (Maintainer) · 2026-03-24T01:17:42Z

Status: RFC — Request for Comments
Crate: hfs-audit
FHIR Versions: R4, R4B, R5, R6 (matching HFS feature flags)
Standards Alignment: IHE BALP v1.1.4, FHIR AuditEvent, IHE ATNA, RFC 3881

Summary

This document proposes the design for audit event logging in HFS. The audit system records security, privacy, and operational events as FHIR AuditEvent resources conforming to the IHE Basic Audit Log Patterns (BALP) Implementation Guide. It is designed as a pluggable, async-first subsystem with multiple backend "sinks" — from no-op (for development) to production-grade targets like PostgreSQL, Kafka, and AWS SQS.

The goal is twofold: provide a trustworthy, standards-compliant audit trail for healthcare data access, and expose that trail as a searchable FHIR resource via the REST API (GET /AuditEvent).

We use helios_fhir::r4::AuditEvent (and the corresponding R5/R6 types) directly as the data model. The AuditEvent struct derives Serialize, Deserialize, Clone, Debug, and Default, and implements Send + Sync, making it ideal for passing across async boundaries and serializing to any backend.

We are seeking community feedback on the trait design, backend strategy, and BALP conformance approach before implementation begins.

Motivation

Every FHIR server operating in a healthcare environment must maintain an audit trail. This is not optional — it is a fundamental requirement of HIPAA, GDPR (for health data), and virtually every national health data regulation. Beyond compliance, a well-structured audit log enables:

Privacy transparency. Patients and privacy officers can see who accessed what data and when. The IHE BALP Implementation Guide specifically targets this use case, enabling patient-facing audit access through the FHIR API.

Security surveillance. Establishing a baseline of normal operations and detecting anomalies — a core function of the IHE ATNA profile that has been in production healthcare systems for over two decades. As the ATNA specification describes, audit events capture all security events that are detected plus a full set of activity and transaction events describing ongoing operations. These are monitored for deviations from that baseline.

Operational accountability. Understanding system behavior during incidents, supporting forensic investigation, and satisfying regulatory reporting requirements.

Interoperability. By recording audit events as FHIR AuditEvent resources conforming to BALP, audit data becomes portable, queryable, and interoperable across systems — rather than locked in proprietary log formats.

HFS needs an audit system that is idiomatic to Rust's async ecosystem, zero-cost when disabled, and flexible enough to target the diverse deployment environments our users operate in — from a developer's laptop to a multi-region cloud deployment fronting Kafka-based event pipelines.

Standards Background

The audit event model in FHIR has deep roots. Understanding the lineage helps explain why the AuditEvent resource looks the way it does and why we use it directly rather than defining a custom intermediate struct.

The Standards Chain

ASTM E2147 established the concept of security audit logs for healthcare, including accounting of disclosures.

RFC 3881 (2004) defined the XML-based information model for healthcare audit messages. It specified event identification, active participants, audit sources, and participant objects as the core data elements. The RFC notes that audit data collection should be stateless to avoid the overhead of transactional semantics, and that data gathering should be optimized through buffering and filtering — principles that directly inform our fire-and-forget sink design.

DICOM PS3.15 Annex A.5 made the RFC 3881 information model normative, defined vocabulary, transport bindings (Syslog), and the XML schema.

IHE ATNA (Audit Trail and Node Authentication) profiled DICOM audit logging for healthcare IT. ATNA defines which events to log for each IHE transaction, requires TLS for node authentication, and specifies Syslog as transport. Critically, ATNA distinguishes between surveillance logging (what we implement here) and forensic logging (detailed, product-specific logs for incident investigation). Our audit system targets the surveillance use case.

FHIR AuditEvent is the HL7 FHIR representation of this same information model, managed collaboratively between HL7, DICOM, and IHE. It maps the RFC 3881 / DICOM audit message structure into FHIR's resource model. The AuditEvent resource captures events in terms of who (agents), what (entities), where (source), when (recorded/period), and why (purpose of event).

IHE BALP (Basic Audit Log Patterns) v1.1.4 is an Implementation Guide that profiles AuditEvent for common FHIR RESTful operations (Create, Read, Update, Delete, Search), security token use (SAML, OAuth), consent decisions, and privacy disclosures. BALP provides concrete, testable profiles with two variants: minimal (identifier-centric, assumes lookup capability) and comprehensive (self-contained, preserves context).

Why BALP?

BALP is the current standard of practice for FHIR audit logging. It provides reusable AuditEvent patterns that can be used directly or extended for domain-specific use cases. Critically, BALP does not mandate how audit events are stored — only that they conform to specific profiles when exposed via FHIR. As the IHE guidance notes: a system can record events using whatever internal mechanism it wants, as long as the data can be made available in FHIR AuditEvent format.

This is exactly the architectural seam we exploit: backends persist AuditEvent resources however they choose, and the read-side API serves them as BALP-conformant resources.

The BALP IG also establishes important operational patterns: both client and server should record audit events for the same interaction (enabling anomaly detection through correlation), and accessing the audit log is itself an auditable event that generates its own AuditEvent.

Design Principles

Zero-cost when disabled. A developer running HFS locally with audit: none should pay no performance penalty. The NullSink compiles to no-ops.
Use the FHIR model directly. We use helios_fhir::r4::AuditEvent (and the corresponding R5/R6 types) as the data structure passed through the entire audit pipeline. No custom intermediate representation. The struct is already Serialize + Deserialize + Clone + Send + Sync, which is everything we need. This eliminates an entire class of mapping bugs and ensures any BALP profile can be populated without hitting a modeling wall.
Async-first. All sink implementations are behind an async trait. File I/O, database writes, Kafka produce calls, and SQS sends are all naturally async operations. Synchronous contexts (e.g., a test harness) can use block_on or a blocking adapter.
Fire-and-forget semantics. Audit logging must not block or degrade the FHIR server's request processing. Sink failures are logged (via tracing) but do not propagate errors to callers. This is a deliberate tradeoff: we prioritize server availability over guaranteed audit delivery. For deployments requiring stronger guarantees, Kafka and SQS provide their own durability and retry mechanisms at the transport layer. This aligns with RFC 3881's guidance that data gathering should be stateless to avoid the overhead of transactional semantics.
Single active backend. The initial implementation supports exactly one configured audit sink at a time. This simplifies configuration, error reasoning, and resource management. Fan-out to multiple sinks is a natural future extension but is explicitly out of scope for v1.
Immutability of the audit trail. Regardless of backend, the FHIR REST API will not permit UPDATE or DELETE operations on AuditEvent resources. This is fundamental to the purpose of an audit log — it must be tamper-evident.
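
The immutability rule can be pictured as a small guard in the request path. The sketch below is illustrative only: `Interaction` and `is_permitted` are hypothetical names rather than existing HFS types, and treating PATCH as a form of update is an assumption.

```rust
/// FHIR RESTful interactions relevant to the guard.
/// Hypothetical enum for illustration, not an existing HFS type.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Interaction {
    Read,
    Search,
    Create,
    Update,
    Patch,
    Delete,
}

/// Returns `false` for any interaction that would modify or remove an
/// existing AuditEvent. Every other combination is allowed.
pub fn is_permitted(resource_type: &str, interaction: Interaction) -> bool {
    let mutates_existing = matches!(
        interaction,
        Interaction::Update | Interaction::Patch | Interaction::Delete
    );
    !(resource_type == "AuditEvent" && mutates_existing)
}
```

A request that fails this check would presumably be rejected before reaching storage, for example with HTTP 405 Method Not Allowed.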

Architecture Overview

┌──────────────────────────────────────────────────────┐
│                   FHIR Request                       │
│              (e.g., GET /Patient/123)                │
└──────────────────┬───────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────┐
│              AuditInterceptor                        │
│  ┌─────────────────────────────────────────────────┐ │
│  │  1. Capture request context (who, what, when)   │ │
│  │  2. Infer patient identity from resource/search │ │
│  │  3. Build helios_fhir::AuditEvent               │ │
│  │  4. Classify BALP profile variant               │ │
│  └─────────────────────────────────────────────────┘ │
└──────────────────┬───────────────────────────────────┘
                   │
                   ▼
┌──────────────────────────────────────────────────────┐
│          AuditSink (async trait object)              │
│                                                      │
│  ┌──────────┐ ┌──────────┐ ┌────────┐ ┌───────────┐  │
│  │ NullSink │ │ FileSink │ │ DbSink │ │ KafkaSink │  │
│  └──────────┘ └──────────┘ └────────┘ └───────────┘  │
│                                        ┌──────────┐  │
│                                        │ SqsSink  │  │
│                                        └──────────┘  │
└──────────────────────────────────────────────────────┘
                   │
                   ▼  (for database backends)
┌──────────────────────────────────────────────────────┐
│         Read-Side: GET /AuditEvent                   │
│         Searchable via FHIR Search API               │
│         (Immutable — no UPDATE or DELETE)            │
└──────────────────────────────────────────────────────┘

Data Model: Using `helios_fhir::AuditEvent` Directly

Rather than defining a custom struct, we use the existing helios_fhir::r4::AuditEvent directly. The struct already provides everything we need:

// From helios_fhir::r4
pub struct AuditEvent {
    pub id: Option<String>,
    pub meta: Option<Meta>,
    pub implicit_rules: Option<Uri>,
    pub language: Option<Code>,
    pub text: Option<Narrative>,
    pub contained: Option<Vec<Resource>>,
    pub extension: Option<Vec<Extension>>,
    pub modifier_extension: Option<Vec<Extension>>,
    pub r#type: Coding,                              // Event type (required)
    pub subtype: Option<Vec<Coding>>,                // RESTful interaction subtype
    pub action: Option<Code>,                        // C | R | U | D | E
    pub period: Option<Period>,                      // When the activity occurred
    pub recorded: Instant,                           // When recorded (required)
    pub outcome: Option<Code>,                       // Success/failure
    pub outcome_desc: Option<String>,                // Human-readable outcome
    pub purpose_of_event: Option<Vec<CodeableConcept>>,
    pub agent: Option<Vec<AuditEventAgent>>,         // Who was involved (1..*)
    pub source: AuditEventSource,                    // Reporter (required)
    pub entity: Option<Vec<AuditEventEntity>>,       // What resources were involved
}

This struct derives Serialize, Deserialize, Clone, Debug, Default, PartialEq, and auto-implements Send + Sync. It is the canonical FHIR representation — exactly what we need to pass through the audit pipeline, serialize to any backend, and serve via the FHIR API.

Why Not a Custom Struct?

A simplified struct with BALP serialization would force us to pick which fields matter up front. Any field we omit today becomes a breaking change to add tomorrow. More importantly, it introduces a mapping layer between our internal model and the FHIR resource that must be tested and maintained, all to re-create functionality the helios_fhir crate already provides.

A generic key-value map offers maximum flexibility but loses all type safety, so nothing about an event's structure can be checked at compile time.

Using the FHIR type directly means: the data structure passed to AuditSink::record() is the same structure stored in the database, serialized to Kafka, and served from the FHIR API. No mapping, no conversion, no impedance mismatch.

FHIR Version Handling

HFS supports R4, R4B, R5, and R6 via feature flags. The AuditEvent resource evolved between versions — notably, R5 replaced type: Coding with category: Vec<CodeableConcept> and code: CodeableConcept, and changed outcome from a Code to a structured backbone element.

The sink trait and interceptor use the appropriate version's type through cfg attributes, matching HFS's existing FHIR version selection pattern:

#[cfg(feature = "R4")]
use helios_fhir::r4::AuditEvent;

#[cfg(feature = "R5")]
use helios_fhir::r5::AuditEvent;

#[cfg(feature = "R6")]
use helios_fhir::r6::AuditEvent;

Since BALP v1.1.4 targets R4 (and notes compatibility with R4B), the initial implementation focuses on R4. R5/R6 support follows the same pattern — the AuditEvent type changes, but the AuditSink trait and backend implementations are identical because they operate on serde-serializable types.

Builder Pattern for BALP Profiles

The AuditInterceptor provides builder functions for constructing BALP-conformant AuditEvent instances. These fill in the required coded values for each BALP pattern:

use helios_fhir::r4::{
    AuditEvent, AuditEventAgent, AuditEventEntity, AuditEventSource,
    Coding, Code, Instant, Reference,
};

/// Builders for BALP-conformant AuditEvent instances.
pub struct BalpAuditEventBuilder;

impl BalpAuditEventBuilder {
    /// Build a BALP "Basic AuditEvent for a successful Read" with known Patient.
    ///
    /// Populates:
    /// - type: rest (audit-event-type)
    /// - subtype: read (restful-interaction)
    /// - action: R
    /// - outcome: 0 (Success)
    /// - agent[0]: Source Role ID (client)
    /// - agent[1]: Destination Role ID (server)
    /// - entity[0]: The resource read
    /// - entity[1]: The patient subject
    pub fn read_with_patient(
        client_addr: &str,
        server_url: &str,
        resource_ref: &str,
        patient_ref: &str,
        recorded: Instant,
    ) -> AuditEvent {
        AuditEvent {
            r#type: Coding {
                system: Some("http://terminology.hl7.org/CodeSystem/audit-event-type".into()),
                code: Some("rest".into()),
                display: Some("RESTful Operation".into()),
                ..Default::default()
            },
            subtype: Some(vec![Coding {
                system: Some("http://hl7.org/fhir/restful-interaction".into()),
                code: Some("read".into()),
                display: Some("read".into()),
                ..Default::default()
            }]),
            action: Some(Code::from("R")),
            recorded,
            outcome: Some(Code::from("0")),
            outcome_desc: Some("200 OK".into()),
            agent: Some(vec![
                AuditEventAgent {
                    r#type: Some(Self::source_role_coding()),
                    who: Some(Reference {
                        display: Some(client_addr.into()),
                        ..Default::default()
                    }),
                    requestor: Some(false),
                    network: Some(Self::network(client_addr, "2")),
                    ..Default::default()
                },
                AuditEventAgent {
                    r#type: Some(Self::destination_role_coding()),
                    who: Some(Reference {
                        display: Some(server_url.into()),
                        ..Default::default()
                    }),
                    requestor: Some(false),
                    network: Some(Self::network(server_url, "5")),
                    ..Default::default()
                },
            ]),
            source: AuditEventSource {
                site: Some(server_url.into()),
                observer: Reference {
                    display: Some(server_url.into()),
                    ..Default::default()
                },
                r#type: Some(vec![Coding {
                    system: Some(
                        "http://terminology.hl7.org/CodeSystem/security-source-type".into(),
                    ),
                    code: Some("4".into()),
                    display: Some("Application Server".into()),
                    ..Default::default()
                }]),
                ..Default::default()
            },
            entity: Some(vec![
                AuditEventEntity {
                    what: Some(Reference {
                        reference: Some(resource_ref.into()),
                        ..Default::default()
                    }),
                    r#type: Some(Coding {
                        system: Some(
                            "http://terminology.hl7.org/CodeSystem/audit-entity-type".into(),
                        ),
                        code: Some("2".into()),
                        display: Some("System Object".into()),
                        ..Default::default()
                    }),
                    ..Default::default()
                },
                AuditEventEntity {
                    what: Some(Reference {
                        reference: Some(patient_ref.into()),
                        ..Default::default()
                    }),
                    r#type: Some(Coding {
                        system: Some(
                            "http://terminology.hl7.org/CodeSystem/audit-entity-type".into(),
                        ),
                        code: Some("1".into()),
                        display: Some("Person".into()),
                        ..Default::default()
                    }),
                    role: Some(Coding {
                        system: Some(
                            "http://terminology.hl7.org/CodeSystem/object-role".into(),
                        ),
                        code: Some("1".into()),
                        display: Some("Patient".into()),
                        ..Default::default()
                    }),
                    ..Default::default()
                },
            ]),
            ..Default::default()
        }
    }

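    // Similar builders for: create, update, delete, query,
    // each with _with_patient and _without_patient variants.
    // The helpers used above (source_role_coding, destination_role_coding,
    // network) are not shown in this excerpt.
    // ...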
}

This approach lets callers construct BALP-conformant events with a single function call, while retaining the ability to build arbitrary AuditEvent instances for non-RESTful events (login, logout, configuration changes, etc.) by constructing the struct directly.
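
The builder above calls `source_role_coding()` and `destination_role_coding()` without showing them. A minimal sketch of what they might return follows, using the DICOM role codes that BALP assigns to the client and server agents. The `Coding` struct here is a local stand-in for `helios_fhir::r4::Coding`, and the `dcm` helper is hypothetical. In R4, `AuditEvent.agent.type` is a CodeableConcept, so the real helpers presumably wrap this coding; the sketch stops at the Coding level.

```rust
/// Local stand-in for `helios_fhir::r4::Coding`, reduced to the fields used here.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct Coding {
    pub system: Option<String>,
    pub code: Option<String>,
    pub display: Option<String>,
}

/// Code system URI for the DICOM Controlled Terminology.
const DCM: &str = "http://dicom.nema.org/resources/ontology/DCM";

/// Hypothetical helper: builds a Coding in the DCM code system.
fn dcm(code: &str, display: &str) -> Coding {
    Coding {
        system: Some(DCM.to_string()),
        code: Some(code.to_string()),
        display: Some(display.to_string()),
    }
}

/// Role of the agent that initiated the exchange (the client).
pub fn source_role_coding() -> Coding {
    dcm("110153", "Source Role ID")
}

/// Role of the agent that received the request (the server).
pub fn destination_role_coding() -> Coding {
    dcm("110152", "Destination Role ID")
}
```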

The Sink Trait: `AuditSink`

The AuditSink trait is the core abstraction. Every backend implements this trait. The server holds a single Arc<dyn AuditSink> which is injected at startup based on configuration.

use std::sync::Arc;
use helios_fhir::r4::AuditEvent;

/// The core audit sink trait.
///
/// All backend implementations must implement this trait.
/// Methods are async and infallible — failures are handled internally
/// (logged via `tracing`, metrics incremented) but never propagated
/// to the caller. This ensures audit logging cannot degrade FHIR
/// server request processing.
#[async_trait::async_trait]
pub trait AuditSink: Send + Sync + 'static {
    /// Record a single audit event.
    ///
    /// Implementations should be non-blocking and best-effort.
    /// If the sink cannot accept the event (buffer full, connection
    /// lost, etc.), it should log a warning and return. It must
    /// not panic.
    async fn record(&self, event: AuditEvent);

    /// Record a batch of audit events (e.g., from a Bundle).
    ///
    /// Default implementation calls `record` in sequence.
    /// Backends may override for more efficient batch operations.
    async fn record_batch(&self, events: Vec<AuditEvent>) {
        for event in events {
            self.record(event).await;
        }
    }

    /// Flush any buffered events.
    ///
    /// Called during graceful shutdown. Implementations should
    /// make a best-effort attempt to deliver buffered events
    /// before returning.
    async fn flush(&self);

    /// Human-readable name for logging/diagnostics.
    fn name(&self) -> &str;
}

/// Factory function — constructs the appropriate sink from configuration.
pub async fn create_sink(
    config: &AuditConfig,
    fhir_storage: Arc<dyn StorageBackend>,
) -> Arc<dyn AuditSink> {
    match config.backend {
        AuditBackend::None => Arc::new(NullSink),
        AuditBackend::File { ref path } => {
            Arc::new(FileSink::new(path).await
                .expect("failed to open audit log file"))
        }
        AuditBackend::Database => {
            if let Some(ref audit_db_url) = config.database_url {
                // Dedicated audit database — recommended for production.
                Arc::new(DatabaseSink::dedicated(audit_db_url).await
                    .expect("failed to connect dedicated audit database"))
            } else {
                // Shared database — warns at startup.
                Arc::new(DatabaseSink::shared(fhir_storage).await)
            }
        }
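        // The Kafka and SQS arms compile only when the matching Cargo feature
        // is enabled (assumes the `AuditBackend` variants are gated the same way).
        #[cfg(feature = "kafka")]
        AuditBackend::Kafka { ref brokers, ref topic } => {
            Arc::new(KafkaSink::new(brokers, topic).await
                .expect("failed to connect to Kafka"))
        }
        #[cfg(feature = "sqs")]
        AuditBackend::Sqs { ref queue_url } => {
            Arc::new(SqsSink::new(queue_url).await
                .expect("failed to initialize SQS client"))
        }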
    }
}

Design Notes

async fn record(&self, event: AuditEvent) takes ownership. The caller constructs the AuditEvent and hands it off. Since AuditEvent implements Clone, callers that need to retain a copy can clone before calling.

Infallible by design. The method returns (), not Result. This is a deliberate choice matching the fire-and-forget semantics. Sink implementations handle errors internally — logging them via tracing::warn! and incrementing a failure counter exposed via metrics. This ensures a flaky Kafka broker cannot cascade into FHIR API failures.

flush() for graceful shutdown. Async sinks that buffer events (file sink with write batching, Kafka producer with linger) need a way to drain their buffers on shutdown. The server's shutdown handler calls sink.flush().await before exiting.

record_batch() for Bundles. When processing a FHIR transaction or batch Bundle, HFS generates one AuditEvent per entry plus one for the Bundle as a whole. record_batch allows backends like Kafka to use a single produce request for the batch, reducing round-trips.
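
One way to picture the infallible, best-effort contract is a bounded queue between the request path and a background worker. The sketch below is an assumption about how a buffering sink could behave, not part of the proposed API: `BestEffortQueue` is a hypothetical name, `E` stands in for the AuditEvent type, and a `std` channel is used in place of an async one so the example has no dependencies.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::mpsc::{sync_channel, Receiver, SyncSender, TrySendError};

/// Hypothetical hand-off point between request handling and a sink worker.
pub struct BestEffortQueue<E> {
    tx: SyncSender<E>,
    dropped: AtomicU64,
}

impl<E> BestEffortQueue<E> {
    /// Creates the queue and the receiving end a background worker would drain.
    pub fn new(capacity: usize) -> (Self, Receiver<E>) {
        let (tx, rx) = sync_channel(capacity);
        (Self { tx, dropped: AtomicU64::new(0) }, rx)
    }

    /// Never blocks and never returns an error: when the queue is full or
    /// the worker is gone, the event is discarded and a counter is bumped.
    pub fn record(&self, event: E) {
        match self.tx.try_send(event) {
            Ok(()) => {}
            Err(TrySendError::Full(_)) | Err(TrySendError::Disconnected(_)) => {
                self.dropped.fetch_add(1, Ordering::Relaxed);
            }
        }
    }

    /// Number of events discarded so far (would feed a metrics counter).
    pub fn dropped(&self) -> u64 {
        self.dropped.load(Ordering::Relaxed)
    }
}
```

The drop counter is the piece that keeps this tradeoff observable: operators can alert on it even though callers never see an error.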

Backend Implementations

`NullSink` — No Logging

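For local development. All methods are no-ops. Where the sink type is known statically, the compiler should be able to optimize these calls away entirely. Behind Arc<dyn AuditSink> with async_trait, each call may still cost a virtual dispatch and a boxed future, so the interceptor would also need to skip building the event in the disabled case for logging to be truly zero-cost.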

pub struct NullSink;

#[async_trait::async_trait]
impl AuditSink for NullSink {
    async fn record(&self, _event: AuditEvent) {
        // Intentionally empty.
    }

    async fn flush(&self) {
        // Nothing to flush.
    }

    fn name(&self) -> &str {
        "null"
    }
}

`FileSink` — Append-Only NDJSON File

Writes one JSON-serialized AuditEvent per line to an append-only file. Uses NDJSON (newline-delimited JSON) format — each line is a valid FHIR AuditEvent JSON resource, directly ingestible by log aggregation tools and parseable by any FHIR-aware consumer.

Because helios_fhir::r4::AuditEvent implements serde::Serialize, the serialized output is FHIR-conformant JSON. This means file sink output and the FHIR REST API serve the same representation — no format divergence.

use tokio::fs::OpenOptions;
use tokio::io::{AsyncWriteExt, BufWriter};
use tokio::sync::Mutex;

pub struct FileSink {
    writer: Mutex<BufWriter<tokio::fs::File>>,
}

impl FileSink {
    pub async fn new(path: &std::path::Path) -> std::io::Result<Self> {
        let file = OpenOptions::new()
            .create(true)
            .append(true)
            .open(path)
            .await?;
        Ok(Self {
            writer: Mutex::new(BufWriter::new(file)),
        })
    }
}

#[async_trait::async_trait]
impl AuditSink for FileSink {
    async fn record(&self, event: AuditEvent) {
        let mut line = match serde_json::to_string(&event) {
            Ok(json) => json,
            Err(e) => {
                tracing::warn!(error = %e, "failed to serialize AuditEvent");
                return;
            }
        };
        // Append the NDJSON record separator so the record and its newline
        // reach the writer in a single call.
        line.push('\n');

        let mut writer = self.writer.lock().await;
        if let Err(e) = writer.write_all(line.as_bytes()).await {
            tracing::warn!(error = %e, "failed to write AuditEvent to file");
            return;
        }
        // Flush on each write so the record leaves the in-process buffer.
        // This hands the bytes to the OS but does not fsync. For higher
        // throughput, this could be changed to periodic flushing.
        if let Err(e) = writer.flush().await {
            tracing::warn!(error = %e, "failed to flush audit file");
        }
    }

    async fn flush(&self) {
        let mut writer = self.writer.lock().await;
        if let Err(e) = writer.flush().await {
            tracing::warn!(error = %e, "failed to flush audit file");
        }
    }

    fn name(&self) -> &str {
        "file"
    }
}

File rotation is intentionally deferred. The initial implementation writes to a single file. Rotation can be added via tracing-appender-style rolling, or delegated to the operating system (e.g., logrotate on Linux). This keeps the initial implementation simple.
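
The NDJSON framing used by the file sink can be shown on its own, without tokio or serde. The sketch below is a synchronous illustration with hypothetical function names. It assumes the caller passes already-serialized JSON, and it rejects raw line breaks because they would split one record across two lines (compact serde_json output escapes them, so this is a defensive check only).

```rust
use std::io::{self, Write};

/// Appends one already-serialized JSON document as a single NDJSON line.
pub fn write_ndjson_line<W: Write>(out: &mut W, json: &str) -> io::Result<()> {
    if json.contains('\n') || json.contains('\r') {
        return Err(io::Error::new(
            io::ErrorKind::InvalidInput,
            "record contains a raw line break",
        ));
    }
    // Build the full line first so the record and its terminator are
    // passed to the writer together.
    let mut line = Vec::with_capacity(json.len() + 1);
    line.extend_from_slice(json.as_bytes());
    line.push(b'\n');
    out.write_all(&line)
}

/// Splits NDJSON text back into individual records, skipping blank lines.
pub fn ndjson_records(text: &str) -> Vec<&str> {
    text.lines().filter(|l| !l.trim().is_empty()).collect()
}
```

Because every record ends with a newline, a consumer can tail the file and treat each complete line as one AuditEvent.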

`DatabaseSink` — PostgreSQL / SQLite / S3

Persists AuditEvent resources to the same storage backends HFS already supports: PostgreSQL, SQLite, and S3. The database sink is the only backend that directly enables the read-side FHIR API (GET /AuditEvent), because the data is stored in a queryable format alongside other FHIR resources.

Since the AuditEvent struct is the same type used throughout HFS's resource storage layer, persisting it requires no conversion — it is stored exactly as any other FHIR resource would be.

Separate vs. Shared Database

The DatabaseSink supports two configurations:

Dedicated audit database (audit.database_url is set) — The audit sink connects to a separate database instance from the main FHIR resource storage. This is the recommended configuration for production deployments. Isolating audit data from clinical data provides independent scaling, independent backup/retention policies, and ensures that audit write load does not contend with FHIR API read/write operations.
Shared database (audit.database_url is omitted) — The audit sink reuses the same StorageBackend instance as the FHIR server. This is convenient for development and testing, but at startup the server will emit a warning:
```
WARN hfs_audit: Audit backend is sharing the FHIR server database.
    This is not recommended for production use. Set `audit.database_url`
    to configure a dedicated audit database.
```
This warning is not emitted when a dedicated audit.database_url is configured.

use crate::storage::StorageBackend;

pub struct DatabaseSink {
    storage: Arc<dyn StorageBackend>,
}

impl DatabaseSink {
    /// Create a DatabaseSink using a dedicated audit storage backend.
    pub async fn dedicated(database_url: &str) -> Result<Self, StorageError> {
        let storage = create_storage_backend(database_url).await?;
        Ok(Self { storage })
    }

    /// Create a DatabaseSink sharing the FHIR server's storage backend.
    ///
    /// Emits a warning — not recommended for production use.
    pub async fn shared(storage: Arc<dyn StorageBackend>) -> Self {
        tracing::warn!(
            "Audit backend is sharing the FHIR server database. \
             This is not recommended for production use. Set \
             `audit.database_url` to configure a dedicated audit database."
        );
        Self { storage }
    }
}

#[async_trait::async_trait]
impl AuditSink for DatabaseSink {
    async fn record(&self, event: AuditEvent) {
        let resource_json = match serde_json::to_value(&event) {
            Ok(json) => json,
            Err(e) => {
                tracing::warn!(error = %e, "failed to serialize AuditEvent");
                return;
            }
        };

        if let Err(e) = self.storage.create("AuditEvent", resource_json).await {
            tracing::warn!(
                error = %e,
                audit_id = ?event.id,
                "failed to persist AuditEvent to database"
            );
        }
    }

    async fn flush(&self) {
        // Database writes are immediate; nothing to flush.
    }

    fn name(&self) -> &str {
        "database"
    }
}

`KafkaSink` — Apache Kafka

Produces AuditEvent resources to a Kafka topic as JSON-serialized messages. Uses rdkafka (Rust bindings for librdkafka) behind the kafka feature flag.

#[cfg(feature = "kafka")]
pub struct KafkaSink {
    producer: rdkafka::producer::FutureProducer,
    topic: String,
}

#[cfg(feature = "kafka")]
impl KafkaSink {
    pub async fn new(brokers: &str, topic: &str) -> Result<Self, rdkafka::error::KafkaError> {
        use rdkafka::ClientConfig;
        use rdkafka::producer::FutureProducer;

        let producer: FutureProducer = ClientConfig::new()
            .set("bootstrap.servers", brokers)
            .set("message.timeout.ms", "5000")
            .create()?;

        Ok(Self {
            producer,
            topic: topic.to_string(),
        })
    }
}

#[cfg(feature = "kafka")]
#[async_trait::async_trait]
impl AuditSink for KafkaSink {
    async fn record(&self, event: AuditEvent) {
        let key = event.id.clone().unwrap_or_default();
        let payload = match serde_json::to_string(&event) {
            Ok(json) => json,
            Err(e) => {
                tracing::warn!(error = %e, "failed to serialize AuditEvent for Kafka");
                return;
            }
        };

        use rdkafka::producer::FutureRecord;
        let record = FutureRecord::to(&self.topic)
            .key(&key)
            .payload(&payload);

        if let Err((e, _)) = self.producer.send(record, std::time::Duration::from_secs(1)).await {
            tracing::warn!(
                error = %e,
                audit_id = %key,
                "failed to produce AuditEvent to Kafka"
            );
        }
    }

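    async fn flush(&self) {
        // `flush` is defined on the `Producer` trait, which must be in scope.
        use rdkafka::producer::Producer;
        // Note: this call blocks the current thread for up to 5 seconds.
        if let Err(e) = self.producer.flush(std::time::Duration::from_secs(5)) {
            tracing::warn!(error = %e, "failed to flush Kafka producer");
        }
    }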

    fn name(&self) -> &str {
        "kafka"
    }
}

Kafka key strategy: The AuditEvent.id is used as the message key. Because ids are expected to be unique, key hashing spreads events roughly evenly across partitions, but it gives no ordering guarantee between events for the same patient. Deployments that need patient-level ordering could instead key on the patient reference; this is a future consideration.
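
The patient-level keying mentioned above could look roughly like the following. This is a sketch of a possible future option, not part of the v1 design: `KeyStrategy` and `message_key` are hypothetical names, and falling back to the event id when no patient is known is an assumption.

```rust
/// Hypothetical configuration switch for the Kafka message key.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum KeyStrategy {
    /// Key on AuditEvent.id (the behaviour described in this RFC).
    EventId,
    /// Key on the patient reference so one patient's events share a partition.
    PatientReference,
}

/// Picks the message key for an event. With `PatientReference`, events
/// that have no patient fall back to the event id. A missing id yields
/// an empty key, matching the `unwrap_or_default()` in `KafkaSink::record`.
pub fn message_key<'a>(
    strategy: KeyStrategy,
    event_id: Option<&'a str>,
    patient_ref: Option<&'a str>,
) -> &'a str {
    match strategy {
        KeyStrategy::EventId => event_id.unwrap_or(""),
        KeyStrategy::PatientReference => patient_ref.or(event_id).unwrap_or(""),
    }
}
```

Kafka preserves order only within a partition, so keying on the patient keeps one patient's events in sequence. The cost is a potentially uneven load when a few patients account for most of the traffic.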

`SqsSink` — AWS Simple Queue Service

Sends AuditEvent resources to an SQS queue using aws-sdk-sqs, behind the sqs feature flag. SQS is a natural fit for deployments already in the AWS ecosystem, providing managed durability and dead-letter queue support.

#[cfg(feature = "sqs")]
pub struct SqsSink {
    client: aws_sdk_sqs::Client,
    queue_url: String,
}

#[cfg(feature = "sqs")]
impl SqsSink {
    pub async fn new(queue_url: &str) -> Result<Self, aws_sdk_sqs::Error> {
        let config = aws_config::load_defaults(aws_config::BehaviorVersion::latest()).await;
        let client = aws_sdk_sqs::Client::new(&config);

        Ok(Self {
            client,
            queue_url: queue_url.to_string(),
        })
    }
}

#[cfg(feature = "sqs")]
#[async_trait::async_trait]
impl AuditSink for SqsSink {
    async fn record(&self, event: AuditEvent) {
        let body = match serde_json::to_string(&event) {
            Ok(json) => json,
            Err(e) => {
                tracing::warn!(error = %e, "failed to serialize AuditEvent for SQS");
                return;
            }
        };

        if let Err(e) = self.client
            .send_message()
            .queue_url(&self.queue_url)
            .message_body(&body)
            .send()
            .await
        {
            tracing::warn!(
                error = %e,
                audit_id = ?event.id,
                "failed to send AuditEvent to SQS"
            );
        }
    }

    async fn flush(&self) {
        // SQS sends are immediate; nothing to buffer.
    }

    fn name(&self) -> &str {
        "sqs"
    }
}

Feature Flags

Backend dependencies are gated behind Cargo feature flags to avoid pulling heavy dependencies for unused backends:

[features]
default = []
kafka = ["rdkafka"]
sqs = ["aws-sdk-sqs", "aws-config"]

The NullSink, FileSink, and DatabaseSink have no external dependencies beyond what HFS already uses and are always available.
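
One consequence of feature gating is that a deployment can configure a backend that was not compiled in. The following is a minimal sketch of how startup could surface that clearly. `AuditBackend` and `select_backend` are hypothetical names, not part of this RFC. The two booleans stand in for `cfg!(feature = "kafka")` and `cfg!(feature = "sqs")` so the logic can be tested without rebuilding.

```rust
/// Hypothetical enum mirroring the `backend` values listed under Configuration.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AuditBackend {
    None,
    File,
    Database,
    Kafka,
    Sqs,
}

/// Resolve a configured backend name, rejecting backends whose Cargo
/// feature was not compiled in. Real code would pass
/// `cfg!(feature = "kafka")` and `cfg!(feature = "sqs")` for the flags.
pub fn select_backend(
    name: &str,
    kafka_enabled: bool,
    sqs_enabled: bool,
) -> Result<AuditBackend, String> {
    match name.trim().to_ascii_lowercase().as_str() {
        "none" => Ok(AuditBackend::None),
        "file" => Ok(AuditBackend::File),
        "database" => Ok(AuditBackend::Database),
        "kafka" if kafka_enabled => Ok(AuditBackend::Kafka),
        "sqs" if sqs_enabled => Ok(AuditBackend::Sqs),
        // Known backend, but the feature guard above did not match.
        gated @ ("kafka" | "sqs") => Err(format!(
            "audit backend '{0}' requires building hfs-audit with the `{0}` feature",
            gated
        )),
        other => Err(format!("unknown audit backend '{}'", other)),
    }
}
```

Failing at startup with a message that names the missing feature is assumed here to be preferable to silently falling back to `NullSink`, since a silent fallback would leave a production server without an audit trail.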

Patient Identity and BALP Profiles

BALP defines two variants for each RESTful audit pattern:

With known Patient subject — The AuditEvent.entity includes a Patient reference with role = "Patient". Used when the operation involves a specific, identifiable patient.
Without Patient subject — Used for operations on non-patient resources (e.g., Practitioner, Organization) or when patient identity cannot be determined.

Our approach: optional patient reference with automatic inference as fallback.

The AuditInterceptor populates the patient entity using a four-step waterfall:

1. Explicit patient reference. If the operation directly targets a Patient resource (e.g., GET /Patient/123), the patient reference is taken from the resource identity.
2. Compartment inference. If the operation targets a resource within the Patient compartment (e.g., GET /Observation/456 where Observation.subject → Patient/123), the interceptor traverses the subject/patient reference to infer the patient identity.
3. Search parameter inference. For search operations with a patient or subject parameter (e.g., GET /Observation?patient=Patient/123), the patient identity is extracted from the search parameters.
4. No patient. If none of the above applies, the audit event is recorded without a patient entity, using the BALP "without Patient" profile variant.

use helios_fhir::r4::Reference;

impl AuditInterceptor {
    /// Determine the patient identity for a FHIR operation, if any.
    /// Returns a Reference suitable for inclusion in AuditEvent.entity.
    fn infer_patient(&self, context: &RequestContext) -> Option<Reference> {
        // 1. Direct Patient resource access
        if context.resource_type() == "Patient" {
            return context.resource_id().map(|id| Reference {
                reference: Some(format!("Patient/{}", id)),
                ..Default::default()
            });
        }

        // 2. Compartment inference from resource subject/patient field
        if let Some(resource) = context.response_resource() {
            if let Some(patient_ref) = self.extract_patient_reference(resource) {
                return Some(patient_ref);
            }
        }

        // 3. Search parameter inference
        if let Some(patient_param) = context.search_param("patient")
            .or_else(|| context.search_param("subject"))
        {
            return Some(Reference {
                reference: Some(patient_param.to_string()),
                ..Default::default()
            });
        }

        // 4. No patient identity determinable
        None
    }
}
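
Step 3 above copies the search parameter value into the reference unchanged. FHIR reference search values can also be a bare logical ID (`patient=123`) or an absolute URL, and `subject` may point at a non-Patient resource such as a Group. The following is a minimal sketch of a normalizer the interceptor could apply first. `normalize_patient_param` is a hypothetical helper, not part of the proposed API, and the rule of trusting a bare ID only on `patient` is an assumption.

```rust
/// Hypothetical helper: turn a `patient`/`subject` search value into a
/// relative `Patient/{id}` reference, or `None` when the value does not
/// identify a single Patient.
///
/// `param_name` is the search parameter the value came from. A bare ID is
/// only trusted for `patient`, because `subject` may target other types.
pub fn normalize_patient_param(param_name: &str, value: &str) -> Option<String> {
    let value = value.trim();
    // Comma-separated values are an OR over several targets, so no single
    // patient can be inferred from them.
    if value.is_empty() || value.contains(',') {
        return None;
    }

    // Walk the path from the right: "[base/]Type/id".
    let mut segments = value.rsplit('/');
    let id = segments.next()?;
    if id.is_empty() {
        return None;
    }
    match segments.next() {
        Some("Patient") => Some(format!("Patient/{}", id)),
        // Typed reference to some other resource (e.g. "Group/9").
        Some(_) => None,
        // Bare ID: the target type is implied only by `patient`.
        None if param_name == "patient" => Some(format!("Patient/{}", id)),
        None => None,
    }
}
```

With this in place, `patient=123`, `patient=Patient/123`, and `subject=https://example.org/fhir/Patient/123` would all yield `Patient/123`, while `subject=Group/9` would fall through to step 4. Versioned references (`Patient/123/_history/2`) are not handled by this sketch and return `None`.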

This means the system automatically selects the appropriate BALP profile variant (with or without Patient) based on the operation context. Callers never need to manually specify which variant to use.

Read-Side: AuditEvent as a FHIR Resource

The audit system exposes AuditEvent as a fully searchable FHIR resource through the standard REST API. This enables privacy officers, security teams, and patient-facing applications to query the audit trail. This aligns with the ATNA [ITI-81] transaction for retrieving audit records, and with BALP's explicit use case of providing AuditEvents to authorized consumers.

Supported Interactions

Interaction	Supported	Notes
`read`	Yes	`GET /AuditEvent/{id}`
`search`	Yes	`GET /AuditEvent?{params}`
`create`	No	Audit events are system-generated only
`update`	No	Immutable by design
`delete`	No	Immutable by design

Search Parameters

The following FHIR search parameters will be supported, aligned with the ATNA [ITI-81] transaction:

Parameter	Type	Description
`_id`	token	AuditEvent resource ID
`date`	date	When the event was recorded
`action`	token	CRUD action code
`outcome`	token	Success/failure status
`agent`	reference	Who was involved
`patient`	reference	Patient subject of the event
`entity`	reference	Resource involved in the event
`type`	token	Event type code
`subtype`	token	Event subtype code

Backend Considerations for Read-Side

Only the database sink (PostgreSQL, SQLite, S3) directly enables the read-side API, because the data is stored in HFS's existing FHIR resource storage and indexed for search.

For non-database backends (file, Kafka, SQS), the read-side API will return empty results unless a separate process ingests audit events back into a queryable store. This is an expected deployment pattern: Kafka consumers or SQS processors can write audit events into a dedicated FHIR repository for querying.

Audit-of-Audit

Consistent with BALP guidance, querying AuditEvent resources is itself an auditable event. A GET /AuditEvent search will generate its own AuditEvent record. The trail does not loop: audit records are written by the system rather than through the REST API, so each query adds exactly one new record. The audit-of-audit uses the "without Patient" profile variant since the entity being accessed is an AuditEvent, not patient data.

Configuration

Audit configuration follows the existing HFS environment variable and configuration file patterns:

# hfs.toml

[audit]
# Backend selection: "none" | "file" | "database" | "kafka" | "sqs"
backend = "database"

# --- Database backend ---
# Optional: dedicated database URL for audit storage.
# When set, audit events are written to this database instead of the
# main FHIR server database. Recommended for production deployments.
# When omitted, the FHIR server's database is reused (with a warning).
# Supports the same connection string formats as HFS_DATABASE_URL.
# database_url = "postgresql://audit_user:pass@localhost:5432/fhir_audit"

# --- File backend ---
# Path to the append-only NDJSON audit log file.
# file_path = "./audit/audit.log"

# --- Kafka backend (requires `kafka` feature) ---
# kafka_brokers = "localhost:9092"
# kafka_topic = "hfs-audit-events"

# --- SQS backend (requires `sqs` feature) ---
# Uses AWS credential provider chain (env vars, profile, IMDS).
# sqs_queue_url = "https://sqs.us-east-1.amazonaws.com/123456789/hfs-audit"

# --- Filtering ---
# Exclude specific request patterns from audit logging.
# [[audit.exclude]]
# path = "/metadata"
# method = "GET"
#
# [[audit.exclude]]
# path = "/_health"

Environment variable overrides follow the HFS_ prefix convention:

HFS_AUDIT_BACKEND=kafka
HFS_AUDIT_KAFKA_BROKERS=broker1:9092,broker2:9092
HFS_AUDIT_KAFKA_TOPIC=fhir-audit

# For dedicated audit database:
HFS_AUDIT_DATABASE_URL=postgresql://audit_user:pass@audit-db:5432/fhir_audit
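
The precedence implied above (an environment variable wins over the `hfs.toml` value) can be sketched as follows. `AuditSettings`, `apply_env_overrides`, and the lookup closure are hypothetical names for illustration. Only the `HFS_AUDIT_*` variable names come from this RFC.

```rust
/// Hypothetical subset of the `[audit]` table after parsing hfs.toml.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct AuditSettings {
    pub backend: String,
    pub database_url: Option<String>,
    pub kafka_brokers: Option<String>,
    pub kafka_topic: Option<String>,
}

/// Overlay `HFS_AUDIT_*` variables on top of file-based settings.
/// `lookup` abstracts the environment so the function can be tested;
/// production code would pass `|k| std::env::var(k).ok()`.
pub fn apply_env_overrides<F>(mut settings: AuditSettings, lookup: F) -> AuditSettings
where
    F: Fn(&str) -> Option<String>,
{
    if let Some(v) = lookup("HFS_AUDIT_BACKEND") {
        settings.backend = v;
    }
    if let Some(v) = lookup("HFS_AUDIT_DATABASE_URL") {
        settings.database_url = Some(v);
    }
    if let Some(v) = lookup("HFS_AUDIT_KAFKA_BROKERS") {
        settings.kafka_brokers = Some(v);
    }
    if let Some(v) = lookup("HFS_AUDIT_KAFKA_TOPIC") {
        settings.kafka_topic = Some(v);
    }
    // Values without a matching variable keep what the file provided.
    settings
}
```

Passing the environment in as a closure keeps the merge logic free of global state, which makes it straightforward to unit-test each override independently.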

Filtering and Exclusions

Not every request warrants an audit event. Health checks, capability statements, and other infrastructure endpoints generate noise without security or privacy value. RFC 3881 explicitly anticipates this, noting that policy-based methods should be employed to optimize data gathering, including selective auditing of only events defined as important.

HFS supports configurable exclusion rules:

/// Rule for excluding requests from audit logging.
#[derive(Debug, Clone, serde::Deserialize)]
pub struct AuditExclusionRule {
    /// URL path pattern. Supports trailing `*` wildcard.
    /// Examples: "/metadata", "/Patient/*", "/$*"
    pub path: String,

    /// HTTP method filter. None matches all methods.
    /// Supports `|` delimiter for multiple methods: "GET|HEAD"
    pub method: Option<String>,
}

impl AuditExclusionRule {
    /// Returns true if this rule matches the given request.
    pub fn matches(&self, request_path: &str, request_method: &str) -> bool {
        let path_matches = if self.path.ends_with('*') {
            request_path.starts_with(self.path.trim_end_matches('*'))
        } else {
            request_path == self.path
        };

        let method_matches = match &self.method {
            None => true,
            Some(m) if m == "*" || m.is_empty() => true,
            Some(m) => m.split('|').any(|v| v.eq_ignore_ascii_case(request_method)),
        };

        path_matches && method_matches
    }
}

Default exclusions (applied unless overridden):

GET /metadata — CapabilityStatement requests
GET /_health — Health check endpoint
GET /.well-known/* — SMART configuration discovery
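
As a sketch of how these defaults might be expressed and consulted, the rule type is repeated below in simplified form (without the serde derive) so the example compiles on its own. `rule`, `default_exclusions`, and `should_audit` are hypothetical helper names, not part of the proposed API, and `request_path` is assumed to exclude the query string.

```rust
#[derive(Debug, Clone)]
pub struct AuditExclusionRule {
    pub path: String,
    pub method: Option<String>,
}

impl AuditExclusionRule {
    /// Same matching logic as the rule definition above.
    pub fn matches(&self, request_path: &str, request_method: &str) -> bool {
        let path_matches = if self.path.ends_with('*') {
            request_path.starts_with(self.path.trim_end_matches('*'))
        } else {
            request_path == self.path
        };
        let method_matches = match &self.method {
            None => true,
            Some(m) if m == "*" || m.is_empty() => true,
            Some(m) => m.split('|').any(|v| v.eq_ignore_ascii_case(request_method)),
        };
        path_matches && method_matches
    }
}

/// Small constructor to keep the rule list readable.
fn rule(path: &str, method: Option<&str>) -> AuditExclusionRule {
    AuditExclusionRule {
        path: path.to_string(),
        method: method.map(str::to_string),
    }
}

/// The three default exclusions listed above.
pub fn default_exclusions() -> Vec<AuditExclusionRule> {
    vec![
        rule("/metadata", Some("GET")),
        rule("/_health", Some("GET")),
        rule("/.well-known/*", Some("GET")),
    ]
}

/// A request is audited unless at least one exclusion rule matches it.
pub fn should_audit(rules: &[AuditExclusionRule], path: &str, method: &str) -> bool {
    !rules.iter().any(|r| r.matches(path, method))
}
```

Under these rules `GET /metadata` is skipped, while `POST /metadata` and `GET /Patient/123` are still audited, because the method filter and the exact-path comparison must both match.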

Open Questions

We welcome feedback on the following design decisions. Please comment below or open a linked discussion for deeper dives.

1. Wire Format for Non-Database Backends

Since we use helios_fhir::AuditEvent directly and it implements serde::Serialize, the file, Kafka, and SQS backends automatically emit FHIR-conformant JSON. This means any consumer can parse the output as a FHIR resource. Is there a use case where a non-FHIR serialization format would be preferred (e.g., a compact binary encoding for high-throughput Kafka deployments)?

2. Kafka Message Key Strategy

Using AuditEvent.id as the Kafka key distributes messages evenly across partitions. However, some consumers may prefer patient-level ordering (all events for Patient/123 on the same partition). Should the Kafka key be configurable — defaulting to event ID but optionally set to patient reference?
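
To make the option concrete, here is a minimal sketch of a configurable key. `KafkaKeyStrategy` and `message_key` are hypothetical names, and falling back to the event ID when no patient is known is an assumption rather than a settled decision.

```rust
/// Hypothetical configuration value for choosing the Kafka message key.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
pub enum KafkaKeyStrategy {
    /// Key by AuditEvent.id (the behavior described in this RFC).
    #[default]
    EventId,
    /// Key by patient reference so one patient's events share a partition.
    PatientReference,
}

/// Pick the message key for one event. Events without a patient (the
/// BALP "without Patient" variant) always fall back to the event ID.
pub fn message_key(
    strategy: KafkaKeyStrategy,
    event_id: &str,
    patient_ref: Option<&str>,
) -> String {
    match (strategy, patient_ref) {
        (KafkaKeyStrategy::PatientReference, Some(patient)) => patient.to_string(),
        _ => event_id.to_string(),
    }
}
```

The trade-off is that patient keys give per-patient ordering within a partition but can skew load toward partitions that hold very active patients, whereas event-ID keys spread load without any ordering guarantee.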

3. File Rotation Strategy

The initial FileSink writes to a single file. Options for rotation include: built-in daily rotation (time-based rolling), size-based rotation, or delegation to the OS via logrotate. Which approach fits our users best?

4. Batch Audit Events for Bundles

When processing a FHIR Bundle, we generate one AuditEvent per Bundle entry plus one for the Bundle as a whole (with action E for Execute). If a transaction Bundle fails and all actions are reverted, the AuditEvent records are still persisted (with failure outcome codes). Is this the correct behavior, or should failed transaction Bundles generate only a single failure AuditEvent?
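
The behavior under discussion can be sketched as a pure function. `Outcome`, `PlannedAudit`, and `plan_transaction_audits` are hypothetical names, and emitting the Bundle-level event last is an assumption, since the ordering is not specified here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Success,
    Failure,
}

/// One audit event that would be emitted: the AuditEvent.action code
/// plus the outcome that would be recorded.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PlannedAudit {
    pub action: char,
    pub outcome: Outcome,
}

/// Plan the audit events for a transaction Bundle: one per entry, plus
/// one for the Bundle itself with action 'E' (Execute). When the
/// transaction is rolled back, every event carries a failure outcome.
pub fn plan_transaction_audits(entry_actions: &[char], committed: bool) -> Vec<PlannedAudit> {
    let outcome = if committed {
        Outcome::Success
    } else {
        Outcome::Failure
    };
    let mut planned: Vec<PlannedAudit> = entry_actions
        .iter()
        .map(|&action| PlannedAudit { action, outcome })
        .collect();
    // Bundle-level event (ordering is an assumption of this sketch).
    planned.push(PlannedAudit { action: 'E', outcome });
    planned
}
```

For a rolled-back Bundle with three entries this yields four failure events. The alternative raised in the question would collapse that to the single `E` event.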

5. AuditEvent Resource Retention

Audit logs grow indefinitely. Should we provide a built-in retention/archival mechanism (e.g., purge AuditEvents older than N days via a background task), or is this purely an operational concern left to database maintenance? Note that any retention mechanism must be carefully designed to not violate regulatory retention requirements, which vary by jurisdiction.

6. Fan-Out as Future Work

The current design supports a single active backend. A natural extension is a FanOutSink that wraps multiple sinks:

use std::sync::Arc;

pub struct FanOutSink {
    sinks: Vec<Arc<dyn AuditSink>>,
}

#[async_trait::async_trait]
impl AuditSink for FanOutSink {
    async fn record(&self, event: AuditEvent) {
        // Sinks are awaited one at a time, so a slow sink delays the rest.
        for sink in &self.sinks {
            sink.record(event.clone()).await;
        }
    }
    // ...
}

This is explicitly deferred but architecturally trivial given the trait design. Is single-backend sufficient for the initial release, or is fan-out a hard requirement for any early adopter?

7. Audit Event Integrity / Signatures

Should we support cryptographic signatures on AuditEvent resources to provide tamper evidence? This would involve generating a Provenance resource containing a signature of the complete AuditEvent JSON. Is this a requirement for any initial deployment, or can it be deferred?

References

aacruzgon · 2026-03-25T23:48:15Z

aacruzgon
Mar 25, 2026
Collaborator

Hello Steve,

I found this discussion document very thoughtful and helpful. For the wire format, I’d keep FHIR JSON as the default for non-database sinks, though it could make sense to support an optional alternate wire encoding later for high-throughput transports like Kafka or SQS.

The important constraint, in my view, is that AuditEvent should remain the only canonical model. Any compact binary format should be a lossless serialization of helios_fhir::AuditEvent, not a separate schema. That preserves the RFC’s "use the FHIR model directly / no custom intermediate representation" design while still leaving room for performance optimization if JSON ever becomes a bottleneck.

So I’d lean toward:

default = FHIR JSON
optional = standardized binary codec for transport-heavy sinks
decode back into the same AuditEvent type before read-side exposure or normalization

I probably wouldn’t do this in v1 without benchmark evidence, but it seems like a reasonable extension point to define now.

For AuditEvent resource retention, my inclination is to leave retention and archival policy to the deployment/operations layer rather than making HFS enforce it directly. I'm assuming that audit retention requirements are highly jurisdiction- and organization-specific, and application-level purge logic creates compliance risk if misconfigured. I think HFS should make audit data easy to manage operationally, but the actual retention policy should remain operator-controlled.

1 reply

smunini Mar 27, 2026
Maintainer Author

Hi @aacruzgon,

Thanks for the feedback. Fully agree on both points.

Wire format: FHIR JSON default for v1, no binary codec needed. The key is that record() takes the typed AuditEvent struct, so serialization stays a backend concern.
Retention: Agreed, this belongs in the operations layer. HFS will make audit data easy to query and partition, but won't prescribe purge policy. We'll document recommended patterns (pg_partman, S3 lifecycle, etc.) instead.

smunini · 2026-04-01T20:16:46Z

smunini
Apr 1, 2026
Maintainer Author

#55

0 replies
