If authentication and authorization decide who may access healthcare data, bulk data export decides how much of it can leave the building at one time - and at what pace, and through what door. It is the API that population health platforms, payer-provider data exchanges, registry submissions, research extracts, and AI training pipelines all converge on. CRUD and search are the bread and butter of FHIR, but the moment a workload needs every Observation for every patient in a cohort, it stops looking like a request/response problem and starts looking like a data engineering problem.
This document shares my thoughts on how to approach Bulk Data Export for the Helios FHIR Server. Like the persistence layer discussion and the authentication and authorization discussion, this is an architectural strategy document rather than a comprehensive specification. It explains the motivating direction, the key building blocks, and the Rust trait designs that will shape the export subsystem.
Who should read this? Anyone with an interest in FHIR bulk data interoperability, healthcare analytics infrastructure, or the operational realities of running long-lived, asynchronous jobs alongside a high-throughput FHIR API. Feedback is very much welcome - this is open source, developed in the open, and your perspective matters.
A note on scope: this document covers export only - the FHIR Bulk Data Access IG, specifically the $export family of operations. The companion problem of bulk submit - the inverse direction, taking large NDJSON payloads back into the server - is a separate concern with separate trade-offs. The Argonaut Project's current draft of $bulk-submit is being worked through here; we will publish a separate discussion document for ingestion once that draft stabilizes.
The Lay of the Land: What the Bulk Data Access IG Says
The Bulk Data Access IG defines an asynchronous, manifest-based, NDJSON-over-HTTPS pattern for exporting large volumes of FHIR data. The shape of every export, regardless of scope, is the same:
The client kicks off an export with a $export operation. The server responds immediately with 202 Accepted and a Content-Location header pointing to a status URL.
The client polls the status URL. While the export is in flight, the server keeps responding 202. When the export is complete, the server returns 200 OK with a JSON manifest describing every output file.
The client downloads the output files from the URLs in the manifest. Each file is application/fhir+ndjson - one resource per line, no Bundle wrapper.
Optionally, the client deletes the export (a DELETE on the status URL) when finished, signaling that the server may reclaim the output files.
Three flavors of export sit on top of this same pattern:
System-level - [base]/$export - everything the caller is permitted to see across the entire server.
Patient-level - [base]/Patient/$export - every resource in every patient's compartment that the caller is permitted to see.
Group-level - [base]/Group/[id]/$export - every resource in the compartment of every patient who is a member of the named Group.
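The three levels map directly onto three URL shapes, which a router can build or recognize with a single match. The following is a minimal sketch, not the HFS routing code: it borrows the ExportLevel name from the types shown later in this document, while kickoff_path and parse_kickoff_path are hypothetical helpers introduced here for illustration.

```rust
/// The three export levels defined by the Bulk Data Access IG.
#[derive(Debug, Clone, PartialEq, Eq)]
pub enum ExportLevel {
    System,
    Patient,
    Group { group_id: String },
}

/// Builds the kick-off path, relative to the FHIR base URL, for a level.
/// (Hypothetical helper, for illustration only.)
pub fn kickoff_path(level: &ExportLevel) -> String {
    match level {
        ExportLevel::System => "/$export".to_string(),
        ExportLevel::Patient => "/Patient/$export".to_string(),
        ExportLevel::Group { group_id } => format!("/Group/{group_id}/$export"),
    }
}

/// Recognizes a kick-off path and returns its level, or `None` if the path
/// is not one of the three `$export` shapes.
pub fn parse_kickoff_path(path: &str) -> Option<ExportLevel> {
    let segments: Vec<&str> = path.trim_matches('/').split('/').collect();
    match segments.as_slice() {
        ["$export"] => Some(ExportLevel::System),
        ["Patient", "$export"] => Some(ExportLevel::Patient),
        ["Group", id, "$export"] if !id.is_empty() => Some(ExportLevel::Group {
            group_id: id.to_string(),
        }),
        _ => None,
    }
}
```

Round-tripping a level through both helpers returns the original value, which makes the pair easy to property-test.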
The kick-off request accepts a substantial parameter surface. Some of the headline parameters that any compliant implementation must understand:
| Parameter | Meaning |
| --- | --- |
| _type | Comma-delimited list of resource types to include. Defaults to all types the scope and level permit. |
| _since | FHIR instant. Only resources whose meta.lastUpdated is at or after this point. |
| _until | FHIR instant. Only resources whose meta.lastUpdated is at or before this point. |
| _typeFilter | A FHIR REST search expression applied to a single resource type (e.g. MedicationRequest?status=active). May be repeated. |
| _outputFormat | The output media type. application/fhir+ndjson is required; abbreviated application/ndjson and ndjson must also be accepted. |
| _elements | Comma-delimited element paths to include. The server should mark subsetted resources with the SUBSETTED tag. |
| includeAssociatedData | Hints for related resources to include (e.g. LatestProvenanceResources). |
| organizeOutputBy | Reorganize output by instances of a particular resource type, using Parameters header blocks per group. |
| allowPartialManifests | Permit the server to publish a manifest with link[] pagination before all output is finished. |
| patient (POST only) | Restrict the export to a list of Patient references. |
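To make the parameter rules concrete, here is a small sketch of how three of them might be parsed: _outputFormat (three accepted spellings, one canonical value), _type (comma-delimited), and _typeFilter (resource type, a question mark, then a search query). The function names are hypothetical and the error type is a plain String for brevity; the TypeFilter shape mirrors the struct shown later in this document.

```rust
/// A parsed `_typeFilter` value: a resource type plus a search query.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct TypeFilter {
    pub resource_type: String,
    pub query: String,
}

/// Normalizes `_outputFormat`. An absent parameter and all three accepted
/// spellings resolve to the canonical media type; anything else is rejected.
pub fn normalize_output_format(value: Option<&str>) -> Result<&'static str, String> {
    match value.map(str::trim) {
        None
        | Some("application/fhir+ndjson")
        | Some("application/ndjson")
        | Some("ndjson") => Ok("application/fhir+ndjson"),
        Some(other) => Err(format!("unsupported _outputFormat: {other}")),
    }
}

/// Splits a comma-delimited `_type` value, dropping empty entries.
pub fn parse_type_list(value: &str) -> Vec<String> {
    value
        .split(',')
        .map(str::trim)
        .filter(|s| !s.is_empty())
        .map(String::from)
        .collect()
}

/// Parses one `_typeFilter` value such as `MedicationRequest?status=active`.
pub fn parse_type_filter(value: &str) -> Result<TypeFilter, String> {
    match value.split_once('?') {
        Some((resource_type, query)) if !resource_type.is_empty() && !query.is_empty() => {
            Ok(TypeFilter {
                resource_type: resource_type.to_string(),
                query: query.to_string(),
            })
        }
        _ => Err(format!("malformed _typeFilter: {value}")),
    }
}
```

Because _typeFilter may be repeated, a handler would call parse_type_filter once per occurrence and collect the results.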
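The manifest returned at completion has a fixed schema, with the top-level fields transactionTime, request, requiresAccessToken, output, deleted, and error.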
Three fields are non-obvious and worth highlighting up front:
transactionTime is the server's frozen wall-clock at the moment the export was started. Every resource in the output must reflect server state as of that instant. This is the anchor that lets clients implement incremental sync correctly using _since on the next run.
requiresAccessToken is a hint to the client about how to fetch the output files. If true, the URLs require the same Authorization: Bearer ... token that authorized the kick-off. If false, the URLs are pre-signed (or otherwise capability-based) and the client SHALL NOT send a token. The decision is the server's; both modes are valid.
error is not a status indicator. It is a list of NDJSON files containing FHIR OperationOutcome resources, one per line, describing per-resource-type failures that did not cause the entire job to fail. An export with a populated error array still finished 200 OK from a workflow perspective.
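As an illustration of the manifest's shape, the sketch below renders one by hand using only the standard library. It is not how HFS serializes manifests (the real types derive serde traits); the Manifest and ManifestFile structs and the render_manifest helper are simplified stand-ins, and the URLs in the usage note are made up.

```rust
/// One entry in the manifest's `output`, `deleted`, or `error` arrays.
pub struct ManifestFile {
    pub resource_type: String,
    pub url: String,
    pub count: Option<u64>,
}

/// Simplified stand-in for the completion manifest.
pub struct Manifest {
    pub transaction_time: String, // FHIR instant, frozen at kick-off
    pub request: String,          // the original kick-off URL
    pub requires_access_token: bool,
    pub output: Vec<ManifestFile>,
    pub deleted: Vec<ManifestFile>,
    pub error: Vec<ManifestFile>, // NDJSON files of OperationOutcome resources
}

/// Encodes a string as a JSON string literal.
fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Renders a file array; `count` is emitted only when known.
fn render_files(files: &[ManifestFile]) -> String {
    let items: Vec<String> = files
        .iter()
        .map(|f| {
            let mut item = format!(
                "{{\"type\":{},\"url\":{}",
                json_string(&f.resource_type),
                json_string(&f.url)
            );
            if let Some(count) = f.count {
                item.push_str(&format!(",\"count\":{count}"));
            }
            item.push('}');
            item
        })
        .collect();
    format!("[{}]", items.join(","))
}

/// Renders the whole manifest as compact JSON.
pub fn render_manifest(m: &Manifest) -> String {
    format!(
        "{{\"transactionTime\":{},\"request\":{},\"requiresAccessToken\":{},\"output\":{},\"deleted\":{},\"error\":{}}}",
        json_string(&m.transaction_time),
        json_string(&m.request),
        m.requires_access_token,
        render_files(&m.output),
        render_files(&m.deleted),
        render_files(&m.error),
    )
}
```

A client doing incremental sync would store transactionTime from this manifest and pass it as _since on its next kick-off.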
Authorization for bulk operations sits squarely on SMART Backend Services, with scopes of the form system/Patient.rs, system/*.rs, and similar. That story is told in detail in Discussion #45 and shipped today; we will not re-derive it here. What this document does assume from that work is that the auth layer produces a RequestContext containing a validated Principal with a ScopeSet, and that this context flows into every export handler intact.
The Essential Flow
In words and then in pictures.
Kick-off. The client sends GET /Group/cohort-1/$export?_type=Patient,Observation&_since=2026-01-01T00:00:00Z with Accept: application/fhir+json and Prefer: respond-async and a Bearer token. The server validates the token, parses parameters, opens an export job in shared state, returns 202 Accepted with Content-Location: https://fhir.example.org/export-status/abc. The handler does no actual data extraction in line; it returns within milliseconds.
Polling. The client polls GET /export-status/abc periodically (honoring Retry-After). While the job runs, the server returns 202 Accepted, optionally with an X-Progress header carrying a free-form message. When the job is finished, the server returns 200 OK with the JSON manifest above. The client now has every URL it needs to fetch the data.
Download. The client fetches each output[].url in parallel. The server streams each file as application/fhir+ndjson, optionally Content-Encoding: gzip. The number of files, their sizes, and the order are server-chosen.
Cleanup. When the client is done - or whenever it decides to abandon the export - it sends DELETE /export-status/abc. The server returns 202 Accepted. The files may now be reclaimed. The status URL begins returning 404 Not Found.
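From the client's side, the polling step reduces to a small decision table over the status code and the Retry-After header. The sketch below is one way to express it. PollOutcome, interpret_poll, and the five-second fallback are assumptions made for illustration, and only the delta-seconds form of Retry-After is handled (the HTTP-date form falls back to the default).

```rust
use std::time::Duration;

/// Fallback wait when the server sends no usable Retry-After (assumed value).
const DEFAULT_RETRY_SECS: u64 = 5;

/// What a client should do after one status poll.
#[derive(Debug, PartialEq, Eq)]
pub enum PollOutcome {
    /// 202: the job is still running; poll again after the given delay.
    InProgress { retry_after: Duration },
    /// 200: the body is the manifest.
    Complete,
    /// 404: the job was deleted or never existed.
    Gone,
    /// Anything else: treat as a failure and inspect the OperationOutcome.
    Failed { status: u16 },
}

/// Maps a status-poll response to the client's next action.
pub fn interpret_poll(status: u16, retry_after_header: Option<&str>) -> PollOutcome {
    match status {
        202 => {
            let secs = retry_after_header
                .and_then(|v| v.trim().parse::<u64>().ok())
                .unwrap_or(DEFAULT_RETRY_SECS);
            PollOutcome::InProgress {
                retry_after: Duration::from_secs(secs),
            }
        }
        200 => PollOutcome::Complete,
        404 => PollOutcome::Gone,
        other => PollOutcome::Failed { status: other },
    }
}
```

Keeping the decision pure (no I/O) means the same function can be exercised in unit tests and reused regardless of which HTTP client performs the poll.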
Notice three things in that diagram. The kick-off handler does no extraction. The polling client may land on a different HFS instance than the one that received the kick-off - the status URL must work regardless. The download path is not necessarily served by HFS at all; if the manifest's requiresAccessToken is false, the URLs may point directly at the object store.
These are not implementation details. They are the architectural premises the rest of this document is responding to.
The Architectural Tensions
Before we get to traits, it is worth naming the tensions that the design has to resolve. Bulk export is not a single piece of code; it is a system, and the system has to hold together under four pressures simultaneously.
Long-running work behind a short-lived HTTP request. The kick-off responds 202 in milliseconds. The job behind it may run for minutes, hours, or - for the largest population-level pulls - long enough to outlive the process that started it. The handler and the worker cannot be the same thing.
State that outlives a process. Job status, cursors, manifests, output files - all of it has to survive process restarts, deploys, and crashes. There is no "in-memory only" version of an export that is also production-grade. Whatever state we keep must be durable from the first call to start_export().
One server versus many. A small clinic might run HFS as a single process on a single VM. A national exchange will run HFS as a fleet of pods behind a load balancer, scaled to traffic. The kick-off, the status polls, and the file downloads will land on whatever instance the load balancer picks at the moment. Job state cannot live in one instance's memory if any other instance might field the next request.
The download endpoint is a fileserver. Once a job is done, every output URL is a sustained GET. Megabytes per file, gigabytes per export, hundreds of files in the manifest. That is a fundamentally different workload from "look up a Patient by ID and return JSON" - it is hot-path bandwidth, not request/response latency. Bolting it onto the same Tokio executor that fields Patient.read queries is workable in a single-instance deployment and a mistake at scale.
The design that follows pulls these four tensions apart cleanly, so each one is solved by a single, replaceable abstraction.
Single-Instance vs Multi-Instance: A Tale of Two Deployments
HFS has always tried to scale down as well as it scales up. The persistence layer ships with a zero-config SQLite default; the same trait surface accepts PostgreSQL, MongoDB, Elasticsearch, and S3. Bulk export follows the same philosophy: the same traits serve both the single-VM clinic install and the multi-pod cloud deployment, and the operator decides at startup which concrete implementations to wire in.
Single-instance: zero-config
The simplest possible export deployment is a single HFS process, running on a single VM, writing job state into the SQLite database it already manages, and writing output files to the local filesystem under ${HFS_DATA_DIR}/exports/{tenant_id}/{job_id}/. The worker that performs the extraction is a Tokio task spawned from the same process; the polling and download handlers serve the local SQLite row and the local file directly.
This is fine for clinics, single-tenant deployments, demos, conformance testing, and CI. It works without an external job queue, without object storage, without a network filesystem. The trade-off is that you cannot horizontally scale HFS - the moment a second pod appears behind the load balancer, status polls will start landing on the wrong instance, and the design falls apart.
Multi-instance: shared state, work pool
The horizontally scaled deployment splits responsibilities cleanly:
Job state lives in a shared transactional store. Every HFS instance reads and writes the same bulk_export_jobs table. Status polls work from any instance because every instance is looking at the same row.
Output files live in object storage. Every HFS instance can serve any file URL because they all point at the same bucket - or, in the requiresAccessToken: false case, the client downloads pre-signed URLs directly from the object store and HFS is not in the path at all.
Workers are co-located with HFS pods (the default) or run as a separate hfs-exporter binary against the same shared state (an option discussed later). Workers claim jobs out of the shared store using a leasing pattern, so adding or removing workers requires no coordination beyond the shared store itself.
The cardinal architectural rule is that the same code path serves both topologies. The BulkExportStorage trait is implemented by an embedded SQLite backend for single-instance and a PostgreSQL backend for multi-instance. The ExportOutputStore trait is implemented by a local-FS backend for single-instance and an S3 backend for multi-instance. The handler does not know which one is wired up. The worker does not know which one is wired up. Only the bootstrap code, reading environment variables, knows.
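A rough sketch of what "only the bootstrap code knows" could look like follows. Everything here is illustrative: the two traits are stripped-down stand-ins for BulkExportStorage and ExportOutputStore, the four backend structs are empty placeholders, and the environment variable names HFS_BULK_EXPORT_JOB_STORE and HFS_BULK_EXPORT_OUTPUT_STORE are hypothetical, not settled configuration keys.

```rust
/// Stand-in for `BulkExportStorage` (job state).
pub trait JobStateBackend {
    fn name(&self) -> &'static str;
}

/// Stand-in for `ExportOutputStore` (output files).
pub trait OutputBackend {
    fn name(&self) -> &'static str;
}

struct SqliteJobState;
struct PostgresJobState;
struct LocalFsOutput;
struct S3Output;

impl JobStateBackend for SqliteJobState {
    fn name(&self) -> &'static str { "sqlite" }
}
impl JobStateBackend for PostgresJobState {
    fn name(&self) -> &'static str { "postgres" }
}
impl OutputBackend for LocalFsOutput {
    fn name(&self) -> &'static str { "local-fs" }
}
impl OutputBackend for S3Output {
    fn name(&self) -> &'static str { "s3" }
}

/// The only place that knows which concrete backends are in use.
pub struct ExportWiring {
    pub job_state: Box<dyn JobStateBackend>,
    pub output: Box<dyn OutputBackend>,
}

/// Chooses backends from configuration. `get` abstracts the lookup so the
/// function can be driven by `std::env::var(..).ok()` or by a test map.
/// The variable names are hypothetical.
pub fn wire_export(get: impl Fn(&str) -> Option<String>) -> Result<ExportWiring, String> {
    let job_state: Box<dyn JobStateBackend> =
        match get("HFS_BULK_EXPORT_JOB_STORE").as_deref().unwrap_or("sqlite") {
            "sqlite" => Box::new(SqliteJobState),
            "postgres" => Box::new(PostgresJobState),
            other => return Err(format!("unknown job store: {other}")),
        };
    let output: Box<dyn OutputBackend> =
        match get("HFS_BULK_EXPORT_OUTPUT_STORE").as_deref().unwrap_or("local-fs") {
            "local-fs" => Box::new(LocalFsOutput),
            "s3" => Box::new(S3Output),
            other => return Err(format!("unknown output store: {other}")),
        };
    Ok(ExportWiring { job_state, output })
}
```

In a real binary the call site would be wire_export(|key| std::env::var(key).ok()); handlers and workers would then receive only the trait objects.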
The Recommendation: PostgreSQL for Job State, S3 for Output, In-Process Workers
There is a tension between offering a menu of options and recommending a default. Discussion #45 leaned toward recommendations - JwksBearerAuthProvider as the default token validator, with IntrospectionAuthProvider as the fallback for opaque tokens. We take the same posture here.
For multi-instance job state, the default is PostgreSQL. PostgreSQL is already a supported HFS primary store, so adopting it for export state adds no new operational dependency for the common case. SELECT ... FOR UPDATE SKIP LOCKED is the canonical pattern for transactional job queuing on PostgreSQL and is available from any Rust driver (sqlx, tokio-postgres); combined with a lease-expiry column, it handles worker fail-over, lease expiry, and at-least-once delivery without an external broker. The bulk_export_jobs table is small, rewritten on every heartbeat, and bounded in size by output retention - it does not threaten the resource store's hot path.
For multi-instance output storage, the default is S3-compatible object storage. This is the same S3 backend the persistence layer already ships, with the same AwsS3Client and the same keyspace conventions. Output keys are scoped under /{tenant}/exports/{job_id}/{resource_type}-{part}.ndjson. The manifest publishes pre-signed URLs with a configurable TTL, so the manifest's requiresAccessToken is false and the client downloads directly from S3 without HFS in the bandwidth path. For deployments that want to keep token-based access (audit-heavy environments, environments without a CDN), the same files are streamable through HFS's own download handler with requiresAccessToken: true instead - configuration, not code.
For execution, the default is an embedded worker pool. Workers run in the same process as the HFS REST API by default. This keeps the operational surface small: one binary, one deployment, one set of logs and metrics. A configurable HFS_BULK_EXPORT_WORKER_CONCURRENCY limits how many jobs each pod runs at once, and HFS_BULK_EXPORT_DISABLE_LOCAL_WORKER=true lets operators turn off in-pod workers entirely when they want to dedicate request-serving capacity. The optional hfs-exporter binary, discussed later, addresses the cases where worker isolation needs to be physical, not just configurational.
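The two worker settings named above could be read along these lines. This is a sketch under assumptions: the WorkerConfig struct, the default concurrency of 2, and the rule that only the literal value true disables the local worker are all illustrative choices, not documented HFS behavior.

```rust
/// Parsed worker settings for one HFS process.
#[derive(Debug, PartialEq, Eq)]
pub struct WorkerConfig {
    pub concurrency: usize,
    pub local_worker_enabled: bool,
}

/// Reads the worker settings. `get` abstracts the environment lookup so the
/// function is testable without touching process state.
pub fn worker_config(get: impl Fn(&str) -> Option<String>) -> Result<WorkerConfig, String> {
    let concurrency = match get("HFS_BULK_EXPORT_WORKER_CONCURRENCY") {
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map_err(|e| format!("invalid HFS_BULK_EXPORT_WORKER_CONCURRENCY: {e}"))?,
        None => 2, // assumed default, for illustration only
    };
    if concurrency == 0 {
        return Err("HFS_BULK_EXPORT_WORKER_CONCURRENCY must be at least 1".to_string());
    }
    let disabled = matches!(
        get("HFS_BULK_EXPORT_DISABLE_LOCAL_WORKER")
            .as_deref()
            .map(str::trim),
        Some("true")
    );
    Ok(WorkerConfig {
        concurrency,
        local_worker_enabled: !disabled,
    })
}

/// Number of worker tasks this process should spawn.
pub fn local_worker_slots(cfg: &WorkerConfig) -> usize {
    if cfg.local_worker_enabled {
        cfg.concurrency
    } else {
        0
    }
}
```

A pod dedicated to request serving would therefore report zero slots and never attempt to claim a job, while still answering status polls and downloads.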
Now the vendor-style walkthroughs, in the same shape as the IdP integration section of Discussion #45.
PostgreSQL (recommended default for job state)
How it connects. The same HFS_DATABASE_URL that drives the persistence layer. The export subsystem adds two tables - bulk_export_jobs and bulk_export_outputs - alongside the existing resource schema. No new connection pool, no new credentials, no new operational surface.
Trade-offs. PostgreSQL is durable, transactional, and well understood. SELECT ... FOR UPDATE SKIP LOCKED makes multi-worker claiming straightforward and safe. The cost is that every status poll is a query against PostgreSQL, which on a hot system means tuning indexes on (tenant_id, status, lease_expiry) and accepting that very-high-poll-rate workloads (more than a few thousand polls per second per tenant) will eventually want a caching layer in front.
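For readers who have not used the pattern, this is roughly what a SKIP LOCKED claim looks like when embedded as a statement constant in Rust. The schema is an assumption: the column names job_id, created_at, worker_id, and fencing_token, and the status literals accepted and inprogress, are illustrative and not the final bulk_export_jobs layout. Parameter $1 is the worker ID and $2 is the lease length in seconds.

```rust
/// Claims the oldest available job in one statement (hypothetical schema).
///
/// The inner SELECT picks a job that is either waiting or whose lease has
/// expired. `FOR UPDATE SKIP LOCKED` makes concurrent workers skip rows that
/// another transaction has already locked instead of blocking on them, so
/// two workers never claim the same job. Bumping `fencing_token` lets a
/// stale worker detect that its lease was reclaimed.
pub const CLAIM_NEXT_JOB_SQL: &str = r#"
UPDATE bulk_export_jobs
SET status = 'inprogress',
    worker_id = $1,
    lease_expiry = now() + ($2 * interval '1 second'),
    fencing_token = fencing_token + 1
WHERE job_id = (
    SELECT job_id
    FROM bulk_export_jobs
    WHERE status = 'accepted'
       OR (status = 'inprogress' AND lease_expiry < now())
    ORDER BY created_at
    LIMIT 1
    FOR UPDATE SKIP LOCKED
)
RETURNING job_id, tenant_id, lease_expiry, fencing_token
"#;

/// Heartbeat: extends the lease only if this worker still holds it.
/// Zero rows affected means the lease was lost.
pub const HEARTBEAT_SQL: &str = r#"
UPDATE bulk_export_jobs
SET lease_expiry = now() + ($3 * interval '1 second')
WHERE job_id = $1
  AND fencing_token = $2
  AND status = 'inprogress'
"#;
```

If the claim statement returns no row, there is nothing to do and the worker sleeps; if the heartbeat affects no row, the worker must stop writing output.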
Redis (alternative for job state)
How it connects. A standard Redis or Redis Cluster endpoint. Job records are hashes keyed by job ID; a list holds pending jobs in enqueue order; a claim is a BLMOVE from the pending list to an in-flight-{worker_id} list, with a TTL-backed lease.
Trade-offs. Redis makes status polls trivially fast - a single HGETALL is under a millisecond - and the claim semantics are clean. The cost is durability: a Redis crash without AOF persistence can lose in-flight job state. For exports, the worst-case impact is a job that has to be restarted; cursors live in PostgreSQL via the same BulkExportStorage trait, so re-running is mostly idempotent, but operators who run Redis as a cache rather than a primary store should think twice. Best fit: deployments that already operate Redis as a hot path and want polling latency under a millisecond.
DynamoDB / Cosmos DB / Spanner (cloud-managed equivalents)
How they connect. Each cloud's identity model. DynamoDB via the AWS SDK; Cosmos DB via the Azure SDK; Spanner via Google Cloud credentials. Each is a BulkExportStorage implementation that mirrors the PostgreSQL pattern but uses conditional writes (DynamoDB: ConditionExpression, Cosmos: ETag preconditions, Spanner: read-modify-write transactions) in place of SKIP LOCKED.
Trade-offs. Managed durability and global replication, at the cost of additional integration code per provider and per-call billing. The same caveats as the IdP discussion in #45 apply: every cloud has its own claim-name and capability quirks; the abstraction has to absorb them.
These are not first-tier targets for the initial implementation, but the trait design must not preclude them. Anyone running HFS purely on a single cloud will eventually want them.
Kafka / NATS JetStream (workers physically separate from request handlers)
How they connect. Kafka topics or JetStream streams act as the work queue; HFS publishes a job-created event on kick-off, and a separate hfs-exporter binary consumes the topic. Job state still lives in PostgreSQL (or wherever the BulkExportStorage impl points), so status polls and downloads do not touch the broker.
Trade-offs. This is the "we run our exporters on a different node pool because they're bandwidth-heavy" case. Adds a broker as an operational dependency, but lets you scale request handlers and exporters independently, and gives you explicit at-least-once delivery semantics with offsets. Best fit: fleets large enough that the export workload visibly distorts request-serving capacity.
S3-compatible object storage (recommended default for output files)
How it connects. The same AwsS3Client the persistence layer's S3 backend uses, configured via the standard AWS credential chain. Output files are uploaded as multipart objects under /{tenant}/exports/{job_id}/{resource_type}-{part}.ndjson. The manifest publishes pre-signed GET URLs with a TTL configured by HFS_BULK_EXPORT_FILE_URL_TTL.
Trade-offs. Object storage is the right tool for the job - massively parallel reads, transparent CDN integration, region-redundant durability, lifecycle policies for automatic expiry. The only meaningful cost is that pre-signed URLs reveal that something exists at this URL until this expiry, which some audit regimes treat as out of band. Those deployments switch HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN=true, the manifest reports requiresAccessToken: true, and downloads flow through HFS's own handler instead.
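The key layout above is simple enough to pin down in a few lines. The sketch below builds keys in the /{tenant}/exports/{job_id}/{resource_type}-{part}.ndjson shape and rejects segments that could escape the job's prefix. The function names and the validation rule are assumptions added for illustration, not part of the existing S3 backend.

```rust
/// Returns true if `segment` is safe to embed as one path component.
fn is_safe_segment(segment: &str) -> bool {
    !segment.is_empty() && segment != "." && segment != ".." && !segment.contains('/')
}

/// Prefix under which every output part of a job lives. Deleting a job's
/// output is then a single prefix delete.
pub fn job_prefix(tenant: &str, job_id: &str) -> Result<String, String> {
    if !is_safe_segment(tenant) || !is_safe_segment(job_id) {
        return Err("tenant and job_id must be non-empty and contain no '/'".to_string());
    }
    Ok(format!("/{tenant}/exports/{job_id}/"))
}

/// Full object key for one output part.
pub fn output_key(
    tenant: &str,
    job_id: &str,
    resource_type: &str,
    part: u32,
) -> Result<String, String> {
    if !is_safe_segment(resource_type) {
        return Err("resource_type must be non-empty and contain no '/'".to_string());
    }
    let prefix = job_prefix(tenant, job_id)?;
    Ok(format!("{prefix}{resource_type}-{part}.ndjson"))
}
```

Scoping every key under the tenant and job ID is what lets lifecycle policies and the cleanup pass reclaim a whole export without tracking individual files.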
Cloudflare R2 / Google Cloud Storage / MinIO (S3-compatible drop-ins)
R2, GCS (via interop), and MinIO all speak the S3 API. The same AwsS3Client works against each with no code change, only endpoint_url and force_path_style configuration adjustments. We will document a docker compose example with MinIO as part of the development environment so contributors can exercise the multi-instance path without an AWS account.
Local filesystem (single-instance only)
How it connects. Files are written to ${HFS_DATA_DIR}/exports/{tenant_id}/{job_id}/{resource_type}-{part}.ndjson. The download handler serves them via tokio::fs::File and Axum's streaming body.
Trade-offs. No external dependencies; perfect for development and single-VM deployments. Not safe in a multi-instance topology because the writing instance and the reading instance may differ. A shared NFS mount makes this technically work across instances, but it is brittle (lock semantics, cache coherency, fsync surprises) and we do not recommend it.
Designing the Rust Traits
The persistence crate already carries most of the building blocks. The export module at crates/persistence/src/core/bulk_export.rs defines the types and traits below; the S3 backend implements them today, and the embedded SQLite and PostgreSQL backends will follow. We present the existing surface first - so readers know what already exists - and then propose the additions that this discussion is centrally about.
The Existing Surface: Types
The vocabulary of an export job. These are stable; nothing in this document proposes changing them.
```rust
/// Unique identifier for an export job.
#[derive(Debug, Clone, PartialEq, Eq, Hash, Serialize, Deserialize)]
pub struct ExportJobId(String);

/// Status of an export job.
#[derive(Debug, Clone, Copy, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum ExportStatus {
    /// Job has been accepted but not yet started processing.
    Accepted,
    /// Job is currently processing.
    InProgress,
    /// Job has completed successfully.
    Complete,
    /// Job failed with an error.
    Error,
    /// Job was cancelled by the user.
    Cancelled,
}

/// Level at which the export is being performed.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
#[serde(rename_all = "lowercase")]
pub enum ExportLevel {
    /// System-level export (`[base]/$export`).
    System,
    /// Patient-level export (`[base]/Patient/$export`).
    Patient,
    /// Group-level export (`[base]/Group/[id]/$export`).
    Group { group_id: String },
}

/// A type filter for the export request.
///
/// Type filters allow specifying FHIR search parameters that should be applied
/// when exporting a specific resource type.
#[derive(Debug, Clone, PartialEq, Eq, Serialize, Deserialize)]
pub struct TypeFilter {
    pub resource_type: String,
    pub query: String,
}

/// Request parameters for starting an export job.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExportRequest {
    pub level: ExportLevel,
    pub resource_types: Vec<String>,
    pub since: Option<DateTime<Utc>>,
    pub until: Option<DateTime<Utc>>,
    pub type_filters: Vec<TypeFilter>,
    pub elements: Vec<String>,
    pub include_associated_data: Vec<String>,
    pub output_format: String,
    pub batch_size: u32,
    // ... builder methods (with_types, with_batch_size, with_type_filter, ...)
}
```
ExportProgress and its per-type companion TypeExportProgress round out the model. TypeExportProgress carries the cursor state that lets a worker resume mid-job after a crash - the same cursor type that ExportDataProvider::fetch_export_batch accepts and returns.
```rust
/// Progress for a single resource type within an export.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct TypeExportProgress {
    pub resource_type: String,
    pub total_count: Option<u64>,
    pub exported_count: u64,
    pub cursor: Option<String>,
    pub completed: bool,
}

/// Overall progress for an export job.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExportProgress {
    pub job_id: ExportJobId,
    pub status: ExportStatus,
    pub transaction_time: DateTime<Utc>,
    pub per_type: Vec<TypeExportProgress>,
    pub message: Option<String>,
    pub error: Option<String>,
}
```
ExportManifest and ExportOutputFile model the terminal manifest that the status endpoint serves at 200 OK. NdjsonBatch is what data providers produce - one logical batch of NDJSON lines plus a cursor and a "is this the last batch" flag.
```rust
/// A descriptor for a single output file in an export manifest.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExportOutputFile {
    pub resource_type: String,
    pub url: String,
    pub count: Option<u64>,
}

/// The terminal manifest for a completed export.
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct ExportManifest {
    pub transaction_time: DateTime<Utc>,
    pub request: String,
    pub requires_access_token: bool,
    pub output: Vec<ExportOutputFile>,
    pub deleted: Vec<ExportOutputFile>,
    pub error: Vec<ExportOutputFile>,
    pub message: Option<String>,
}

/// A batch of NDJSON lines produced by an `ExportDataProvider`.
#[derive(Debug, Clone)]
pub struct NdjsonBatch {
    pub lines: Vec<String>,
    pub next_cursor: Option<String>,
    pub is_last: bool,
}
```
The Existing Surface: Job-State Trait
BulkExportStorage is the contract that any backend providing job state implements. Single-instance backends (embedded SQLite) and multi-instance backends (PostgreSQL, Redis) both satisfy this trait. The handler does not care which is wired up.
```rust
/// Storage trait for bulk export job management.
///
/// This trait handles the lifecycle of export jobs: creating, tracking,
/// completing, and cleaning up exports.
#[async_trait]
pub trait BulkExportStorage: Send + Sync {
    /// Starts a new export job and returns its ID.
    ///
    /// On a multi-instance deployment, this is a single transactional
    /// insert into the shared store. The job is `Accepted` and waiting
    /// for a worker to claim it.
    async fn start_export(
        &self,
        tenant: &TenantContext,
        request: ExportRequest,
    ) -> StorageResult<ExportJobId>;

    /// Returns the current progress of an export job.
    ///
    /// Called by the status-polling handler. Must succeed regardless of
    /// which HFS instance the polling client lands on.
    async fn get_export_status(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
    ) -> StorageResult<ExportProgress>;

    /// Cancels an in-progress export job.
    ///
    /// Cooperative: the worker observes the cancellation on its next
    /// status check and unwinds cleanly, leaving partial output that the
    /// cleanup pass will reclaim.
    async fn cancel_export(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
    ) -> StorageResult<()>;

    /// Deletes a finished export job and reclaims its output files.
    async fn delete_export(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
    ) -> StorageResult<()>;

    /// Returns the terminal manifest for a completed export.
    async fn get_export_manifest(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
    ) -> StorageResult<ExportManifest>;

    /// Lists export jobs visible to the tenant.
    async fn list_exports(
        &self,
        tenant: &TenantContext,
        include_completed: bool,
    ) -> StorageResult<Vec<ExportProgress>>;
}
```
The Existing Surface: Data-Provider Trait Hierarchy
ExportDataProvider is what the worker calls when it needs more resources to write. It is not BulkExportStorage - one provides job lifecycle, the other provides data. In practice the same backend object often implements both (the persistence layer already does), but conceptually they are separate concerns that could live in separate processes.
```rust
/// Data provider for export operations.
#[async_trait]
pub trait ExportDataProvider: Send + Sync {
    /// Lists resource types available for export, intersected with what
    /// the request asked for.
    async fn list_export_types(
        &self,
        tenant: &TenantContext,
        request: &ExportRequest,
    ) -> StorageResult<Vec<String>>;

    /// Counts resources of a type matching the request filters.
    ///
    /// Used to publish a meaningful `total_count` on `TypeExportProgress`
    /// when the underlying store can answer the count cheaply.
    async fn count_export_resources(
        &self,
        tenant: &TenantContext,
        request: &ExportRequest,
        resource_type: &str,
    ) -> StorageResult<u64>;

    /// Fetches the next batch of resources for the given type.
    ///
    /// The cursor is opaque from the caller's perspective. The provider
    /// chooses its encoding (page tokens, last-id, last-modified-tuple)
    /// and the worker passes it back unchanged.
    async fn fetch_export_batch(
        &self,
        tenant: &TenantContext,
        request: &ExportRequest,
        resource_type: &str,
        cursor: Option<&str>,
        batch_size: u32,
    ) -> StorageResult<NdjsonBatch>;
}

/// Provider for patient compartment exports.
#[async_trait]
pub trait PatientExportProvider: ExportDataProvider {
    async fn list_patient_ids(
        &self,
        tenant: &TenantContext,
        request: &ExportRequest,
        cursor: Option<&str>,
        batch_size: u32,
    ) -> StorageResult<(Vec<String>, Option<String>)>;

    async fn fetch_patient_compartment_batch(
        &self,
        tenant: &TenantContext,
        request: &ExportRequest,
        resource_type: &str,
        patient_ids: &[String],
        cursor: Option<&str>,
        batch_size: u32,
    ) -> StorageResult<NdjsonBatch>;
}

/// Provider for group-level exports.
#[async_trait]
pub trait GroupExportProvider: PatientExportProvider {
    async fn get_group_members(
        &self,
        tenant: &TenantContext,
        group_id: &str,
    ) -> StorageResult<Vec<String>>;

    async fn resolve_group_patient_ids(
        &self,
        tenant: &TenantContext,
        group_id: &str,
    ) -> StorageResult<Vec<String>>;
}
```
The trait hierarchy reflects the spec: every server that can do a Patient export can also do a System export; every server that can do a Group export can also do a Patient export. The compiler enforces the capability ladder.
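To show how the opaque cursor lets a worker resume mid-type, here is a deliberately simplified, synchronous sketch. It is not the real trait: BatchSource drops the async signature and the tenant and request arguments, InMemorySource is a toy provider whose cursor is a decimal offset, and drain_type is a hypothetical worker loop. The persist callback stands in for writing NDJSON and saving the cursor to job state.

```rust
/// A batch of NDJSON lines plus the cursor for the next call.
#[derive(Debug, Clone)]
pub struct NdjsonBatch {
    pub lines: Vec<String>,
    pub next_cursor: Option<String>,
    pub is_last: bool,
}

/// Synchronous, pared-down stand-in for `ExportDataProvider::fetch_export_batch`.
pub trait BatchSource {
    fn fetch_export_batch(
        &self,
        resource_type: &str,
        cursor: Option<&str>,
        batch_size: u32,
    ) -> Result<NdjsonBatch, String>;
}

/// Toy provider: rows in memory, cursor encoded as a decimal offset.
pub struct InMemorySource {
    pub rows: Vec<String>,
}

impl BatchSource for InMemorySource {
    fn fetch_export_batch(
        &self,
        _resource_type: &str,
        cursor: Option<&str>,
        batch_size: u32,
    ) -> Result<NdjsonBatch, String> {
        let start = match cursor {
            Some(c) => c.parse::<usize>().map_err(|e| format!("bad cursor {c:?}: {e}"))?,
            None => 0,
        };
        let start = start.min(self.rows.len());
        let end = (start + batch_size.max(1) as usize).min(self.rows.len());
        let is_last = end == self.rows.len();
        Ok(NdjsonBatch {
            lines: self.rows[start..end].to_vec(),
            next_cursor: if is_last { None } else { Some(end.to_string()) },
            is_last,
        })
    }
}

/// Drains one resource type, starting from `resume_cursor` (None = start).
/// `persist` receives each batch and the cursor to store for crash recovery.
/// Returns the number of lines exported by this call.
pub fn drain_type(
    source: &dyn BatchSource,
    resource_type: &str,
    resume_cursor: Option<String>,
    batch_size: u32,
    mut persist: impl FnMut(&[String], Option<&str>),
) -> Result<u64, String> {
    let mut cursor = resume_cursor;
    let mut exported = 0u64;
    loop {
        let batch = source.fetch_export_batch(resource_type, cursor.as_deref(), batch_size)?;
        exported += batch.lines.len() as u64;
        persist(&batch.lines, batch.next_cursor.as_deref());
        // Stop on the last batch; also stop if the provider returned no
        // cursor, since looping with `None` would restart from the beginning.
        if batch.is_last || batch.next_cursor.is_none() {
            return Ok(exported);
        }
        cursor = batch.next_cursor;
    }
}
```

A worker that crashes after persisting cursor "4" can be replaced by another that calls drain_type with Some("4") and exports only the remaining rows; the worker never inspects the cursor's contents.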
The Proposal: Output Storage as a First-Class Trait
The traits above answer "what data do we export" and "how is the job tracked". They do not answer "where do the bytes go". Today, each backend that implements BulkExportStorage also implicitly decides where output files live - the S3 backend writes them to S3, a future SQLite backend would write them locally. This works, but it conflates two decisions that operators reasonably want to make independently. A site running PostgreSQL for job state and S3 for output is a perfectly normal configuration. A site running PostgreSQL for both is also reasonable. The current shape does not support that cleanly.
We propose a separate ExportOutputStore trait:
```rust
/// Pluggable backend for bulk export output files.
///
/// Implementations decide where NDJSON output is physically stored
/// (local filesystem, S3, R2, GCS, Azure Blob, MinIO, etc.) and how
/// download URLs are generated for the manifest. The job-state backend
/// is unaware - it stores the keys and TTL hints; the output store
/// turns keys into URLs and bytes.
#[async_trait]
pub trait ExportOutputStore: Send + Sync {
    /// Opens an async writer for a new output part.
    ///
    /// The returned key uniquely identifies this part. Implementations
    /// SHOULD use `(tenant, job_id, resource_type, part_index)` as the
    /// natural ordering of keys for ease of cleanup.
    async fn open_writer(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
        resource_type: &str,
        part_index: u32,
    ) -> StorageResult<ExportPartWriter>;

    /// Marks a part as finalized and immutable.
    ///
    /// For object stores using multipart upload, this completes the
    /// upload. For local filesystem, this fsyncs and renames from
    /// `.tmp` to the final name.
    async fn finalize_part(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
        key: &ExportPartKey,
        line_count: u64,
    ) -> StorageResult<FinalizedPart>;

    /// Produces a download URL for a finalized part.
    ///
    /// If the store supports pre-signed URLs, returns a URL that the
    /// client can fetch directly. Otherwise returns a stable HFS-served
    /// URL that the download handler will resolve back to a key.
    fn download_url(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
        key: &ExportPartKey,
        ttl: Duration,
    ) -> StorageResult<DownloadUrl>;

    /// Deletes all output parts for a job. Idempotent.
    ///
    /// Called from `BulkExportStorage::delete_export` and from the
    /// cleanup pass when the output TTL elapses.
    async fn delete_job_outputs(
        &self,
        tenant: &TenantContext,
        job_id: &ExportJobId,
    ) -> StorageResult<()>;
}

/// A finalized output file as it will appear in the manifest.
#[derive(Debug, Clone)]
pub struct FinalizedPart {
    pub key: ExportPartKey,
    pub resource_type: String,
    pub line_count: u64,
    pub size_bytes: u64,
}

/// A download URL plus the access posture for the manifest.
#[derive(Debug, Clone)]
pub struct DownloadUrl {
    pub url: String,
    /// `true` if the URL requires the same Bearer token used at kick-off
    /// (HFS-served streaming). `false` if the URL is pre-signed and the
    /// client must NOT send a token.
    pub requires_access_token: bool,
}
```
Two implementations cover the common cases. LocalFsOutputStore writes under ${HFS_DATA_DIR}/exports/, hands back HFS-served URLs, and sets requires_access_token: true. S3OutputStore writes to a bucket configured via the existing S3BackendConfig, hands back pre-signed URLs, and sets requires_access_token: false (or true if the operator wants to keep downloads on the HFS data path).
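For the local-filesystem case, the "fsync and rename from .tmp" step described in finalize_part can be illustrated with the standard library alone. This is a blocking sketch rather than the LocalFsOutputStore implementation (which would use tokio::fs); write_part_atomically is a hypothetical helper that folds writing and finalizing into one call for brevity.

```rust
use std::fs::{self, File};
use std::io::{self, BufWriter, Write};
use std::path::{Path, PathBuf};

/// Writes NDJSON lines to `<dir>/<file_name>.tmp`, flushes and fsyncs the
/// file, then renames it to its final name. Readers therefore never see a
/// partially written part under the final name.
/// Returns the final path and the number of lines written.
pub fn write_part_atomically(
    dir: &Path,
    file_name: &str,
    lines: &[String],
) -> io::Result<(PathBuf, u64)> {
    fs::create_dir_all(dir)?;
    let final_path = dir.join(file_name);
    let tmp_path = dir.join(format!("{file_name}.tmp"));

    let mut writer = BufWriter::new(File::create(&tmp_path)?);
    for line in lines {
        writer.write_all(line.as_bytes())?;
        writer.write_all(b"\n")?; // NDJSON: one resource per line
    }
    writer.flush()?; // push buffered bytes to the OS
    writer.get_ref().sync_all()?; // ask the OS to persist them to disk

    // Rename within one directory; on POSIX filesystems this is atomic.
    fs::rename(&tmp_path, &final_path)?;
    Ok((final_path, lines.len() as u64))
}
```

A crash before the rename leaves only a .tmp file behind, which the cleanup pass can delete without risk of removing a part that the manifest references.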
The Proposal: Workers, Leases, and the Claim Strategy
Today, the S3 backend's start_export does the work synchronously inside the kick-off handler. That works for the single-instance case but breaks two design goals: it pins the handler thread for the duration of the job, and it gives no way to scale workers independently of request handlers. The handler must return immediately; the work happens elsewhere.
A worker is the runtime that performs an export. Workers may run in-pod (the default) or in a separate hfs-exporter binary. Either way, they share state through BulkExportStorage and they coordinate through a leasing protocol.
```rust
/// A lease over a single export job, held by exactly one worker at a time.
///
/// Leases have an expiry; if the worker holding the lease does not
/// heartbeat before the expiry, the lease is reclaimable by another
/// worker. This is the at-least-once-delivery primitive of the export
/// subsystem.
#[derive(Debug, Clone)]
pub struct ExportJobLease {
    pub job_id: ExportJobId,
    pub tenant: TenantContext,
    pub worker_id: WorkerId,
    pub lease_expiry: DateTime<Utc>,
    pub fencing_token: u64,
}

/// The runtime that actually performs export work.
///
/// Implementations bind together a `BulkExportStorage` (for job state),
/// an `ExportDataProvider` (for resource data), and an `ExportOutputStore`
/// (for NDJSON files). The same trait is satisfied by an in-pod worker
/// and by the standalone `hfs-exporter` binary.
#[async_trait]
pub trait ExportWorker: Send + Sync {
    /// Attempts to claim the next available job for this worker.
    ///
    /// Returns `Ok(None)` if no job is available. The strategy used
    /// (FOR UPDATE SKIP LOCKED on Postgres, BLMOVE on Redis, mutex on
    /// in-memory) is encapsulated by the `ExportClaimStrategy` impl
    /// the worker was constructed with.
    async fn claim_next(
        &self,
        worker_id: &WorkerId,
    ) -> StorageResult<Option<ExportJobLease>>;

    /// Runs the export job for as long as the lease is valid.
    ///
    /// Performs `fetch_export_batch` in a loop, writes NDJSON to the
    /// output store, persists progress (cursors, counts) after each
    /// batch, and heartbeats the lease. On cancellation observed via
    /// `BulkExportStorage::get_export_status`, unwinds cleanly.
    async fn run_job(
        &self,
        lease: ExportJobLease,
    ) -> StorageResult<JobOutcome>;

    /// Renews a lease that the worker still holds.
    ///
    /// Called periodically while a job runs. Returns the new expiry,
    /// or `Err(LeaseLost)` if another worker has already reclaimed
    /// the job - in which case the current worker MUST stop writing
    /// to the output store immediately.
    async fn heartbeat(
        &self,
        lease: &ExportJobLease,
    ) -> StorageResult<DateTime<Utc>>;

    /// Releases a lease early (e.g. on graceful shutdown).
    async fn release(
        &self,
        lease: ExportJobLease,
    ) -> StorageResult<()>;
}
```
The choice of claim mechanism is itself pluggable, so the same ExportWorker runtime works against any job-state backend:
```rust
/// Strategy for atomically claiming the next available export job.
///
/// The trait surface is small on purpose: every backend has its own
/// idiomatic primitive for this, and we want each implementation to
/// reach for its native pattern rather than emulating someone else's.
#[async_trait]
pub trait ExportClaimStrategy: Send + Sync {
    /// Atomically transitions a single eligible job from `Accepted`
    /// (or expired-lease `InProgress`) to held-by-this-worker.
    async fn claim_next(
        &self,
        worker_id: &WorkerId,
        lease_duration: Duration,
    ) -> StorageResult<Option<ExportJobLease>>;
}
```
Three implementations are in scope for the initial work: PostgresSkipLocked (default for multi-instance), RedisListMove (alternative for low-poll-latency deployments), and InMemoryMutex (used by the embedded single-instance backend and by tests).
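To make the claim transition concrete, here is a minimal synchronous sketch of what a mutex-based strategy might look like. It is not the HFS implementation: the `Job`, `Status`, and `InMemoryClaims` types are hypothetical stand-ins, timestamps are plain seconds passed in by the caller, and the real trait is async and returns an ExportJobLease.

```rust
use std::sync::Mutex;

/// Hypothetical, simplified job status for this sketch.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Status {
    Accepted,
    InProgress,
    Complete,
}

/// Hypothetical job row; `lease_expiry` is seconds since the Unix epoch.
#[derive(Debug, Clone)]
struct Job {
    id: String,
    status: Status,
    lease_expiry: u64,
    worker: Option<String>,
}

struct InMemoryClaims {
    jobs: Mutex<Vec<Job>>,
}

impl InMemoryClaims {
    /// Claims the first job that is `Accepted`, or `InProgress` with an
    /// expired lease. The mutex makes the check-and-update atomic.
    fn claim_next(&self, worker_id: &str, now: u64, lease_secs: u64) -> Option<Job> {
        let mut jobs = self.jobs.lock().unwrap();
        let job = jobs.iter_mut().find(|j| {
            j.status == Status::Accepted
                || (j.status == Status::InProgress && j.lease_expiry <= now)
        })?;
        job.status = Status::InProgress;
        job.worker = Some(worker_id.to_string());
        job.lease_expiry = now + lease_secs;
        Some(job.clone())
    }
}
```

The same eligibility predicate (Accepted, or InProgress with an expired lease) is what a SQL-based strategy would express in its WHERE clause; only the atomicity primitive differs.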
Two design notes that come up in review:
Why a lease with expiry rather than an explicit ack/nack queue? Because the work is long-lived and idempotent (cursors live in TypeExportProgress). A worker that dies mid-job leaves its lease to expire; another worker picks the job up from the last persisted cursor. This is simpler to operate than a queue with explicit redelivery, and it matches how tokio-postgres job-queue libraries already work.
Why a fencing token? Because the lease-expiry pattern allows two workers to briefly believe they hold the same job if the original worker hung rather than crashed. The fencing token, written into every output-file key and checked on the output store's finalize_part, prevents the zombie worker from corrupting output the live worker is producing. Inspired directly by the fencing tokens pattern Martin Kleppmann wrote about; nothing novel.
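The fencing check can be sketched in a few lines. This is an illustration under simplifying assumptions, not the proposed implementation: the `FencedStore` type is hypothetical, it issues tokens itself (in the design above the token travels in the ExportJobLease), and it appends the token to the key as a `.t{token}` suffix purely for demonstration.

```rust
use std::collections::HashMap;

#[derive(Debug, PartialEq)]
enum FinalizeError {
    StaleFencingToken { presented: u64, current: u64 },
}

/// Hypothetical output-store guard that remembers the newest fencing
/// token issued for each job.
#[derive(Default)]
struct FencedStore {
    current: HashMap<String, u64>,
    finalized: Vec<String>,
}

impl FencedStore {
    /// Records a claim or reclaim. Every claim of the same job yields a
    /// strictly larger token than the previous one.
    fn on_claim(&mut self, job_id: &str) -> u64 {
        let token = self.current.entry(job_id.to_string()).or_insert(0);
        *token += 1;
        *token
    }

    /// Accepts a part only from the holder of the newest token.
    fn finalize_part(&mut self, job_id: &str, token: u64, key: &str) -> Result<(), FinalizeError> {
        let current = self.current.get(job_id).copied().unwrap_or(0);
        if token < current {
            return Err(FinalizeError::StaleFencingToken { presented: token, current });
        }
        self.finalized.push(format!("{key}.t{token}"));
        Ok(())
    }
}
```

A worker that hung past its lease expiry still holds the old token, so its late finalize_part call is rejected rather than overwriting the live worker's output.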
The Proposal: File-Download Authorization
The download endpoint is its own small authorization problem. The manifest can be served two ways - requiresAccessToken: true, meaning download URLs point at HFS and require the kick-off's Bearer token; or requiresAccessToken: false, meaning download URLs point at the object store and are pre-signed. The download handler in HFS needs to handle the first case; the second case bypasses HFS entirely.
```rust
/// Authorization decision for a bulk export file download.
///
/// `BearerScopeAuth` validates the incoming Bearer token has the same
/// `system/*.rs` scope that authorized the kick-off, and that the
/// token's subject matches the job's owner.
///
/// `PresignedUrlAuth` is used when the manifest publishes pre-signed
/// URLs directly; the download handler is not in the path.
#[async_trait]
pub trait ExportFileAuth: Send + Sync {
    async fn authorize_download(
        &self,
        token: Option<&str>,
        tenant: &TenantContext,
        manifest_entry: &ExportOutputFile,
    ) -> Result<(), ExportAuthError>;
}
```
The default implementation is BearerScopeAuth. It revalidates the token against the same AuthProvider discussed in Discussion #45, checks that the system/{ResourceType}.rs scope covers the file's resource type, and lets the handler stream the file. The pre-signed URL case never runs through this trait - by the time a client is downloading a pre-signed URL, the object store is doing the auth check via the URL's signature.
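The two checks BearerScopeAuth performs (subject matches the job owner, scope covers the file's resource type) might look roughly like the following. This is a sketch under stated assumptions: the `Claims` struct stands in for an already-validated token, the error variants are hypothetical, and the scope matcher only recognizes the exact `.rs` and legacy `.read` forms named in this document. A full SMART v2 parser would also treat supersets such as `.cruds` as covering.

```rust
#[derive(Debug, PartialEq)]
enum ExportAuthError {
    MissingToken,
    WrongSubject,
    ScopeNotGranted,
}

/// Hypothetical claims extracted from a token that has already been
/// validated (signature, expiry, audience) elsewhere.
struct Claims<'a> {
    subject: &'a str,
    scopes: Vec<&'a str>,
}

/// True if any granted scope covers read+search on `resource_type`.
/// Simplified: accepts `system/{Type}.rs`, `system/*.rs`, and the
/// legacy `.read` spelling only.
fn scope_covers(scopes: &[&str], resource_type: &str) -> bool {
    scopes.iter().any(|scope| {
        let Some(rest) = scope.strip_prefix("system/") else { return false };
        let Some((ty, perms)) = rest.split_once('.') else { return false };
        (ty == "*" || ty == resource_type) && (perms == "rs" || perms == "read")
    })
}

fn authorize_download(
    claims: Option<&Claims>,
    job_owner: &str,
    resource_type: &str,
) -> Result<(), ExportAuthError> {
    let claims = claims.ok_or(ExportAuthError::MissingToken)?;
    if claims.subject != job_owner {
        return Err(ExportAuthError::WrongSubject);
    }
    if !scope_covers(&claims.scopes, resource_type) {
        return Err(ExportAuthError::ScopeNotGranted);
    }
    Ok(())
}
```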
The REST Layer: How the Endpoints Wire Up
Four handlers, all in the established HFS style: generic over the storage trait, taking a TenantExtractor, returning RestResult<Response>. We sketch the kick-off here; the other three follow the same pattern.
Three observations on this handler that are easy to miss:
It does not spawn a Tokio task to run the export. The start_export call writes the job row and returns. A worker picks the job up via claim_next out of band. The handler returns within milliseconds even for jobs that will take hours.
It does not know whether the deployment is single-instance or multi-instance. state.storage() is whichever BulkExportStorage was wired in at startup; the handler is identical either way.
It runs inside the auth middleware described in Discussion #45. By the time this function runs, ctx is a fully validated RequestContext containing a Principal with a ScopeSet. The handler does not re-validate the token; it only calls authorize_kickoff on whatever AuthorizationPolicy is configured, which evaluates SMART system scopes (and any composed deployment-specific policies) against the requested export.
Content-Location URLs are constructed from HFS_BASE_URL plus the tenant prefix (when HFS_TENANT_ROUTING_MODE=url_path) plus /export-status/{job_id}. They are absolute. They survive load balancer changes because every HFS instance constructs the same URL for the same job, and every instance can answer the poll against the shared BulkExportStorage.
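As a small illustration of that construction, the sketch below assembles the status URL from its three parts. The function name `status_url` is hypothetical, and passing the tenant prefix as an `Option` is an assumption standing in for the HFS_TENANT_ROUTING_MODE=url_path check.

```rust
/// Hypothetical helper: builds the absolute status URL.
/// `tenant_prefix` is `Some` only when tenants are routed by URL path.
fn status_url(base_url: &str, tenant_prefix: Option<&str>, job_id: &str) -> String {
    // Normalize slashes so the result is the same regardless of how the
    // operator wrote the base URL or tenant prefix.
    let mut url = base_url.trim_end_matches('/').to_string();
    if let Some(tenant) = tenant_prefix {
        url.push('/');
        url.push_str(tenant.trim_matches('/'));
    }
    url.push_str("/export-status/");
    url.push_str(job_id);
    url
}
```

Because the function depends only on configuration and the job ID, every instance produces the same URL for the same job, which is the property the load-balancer argument relies on.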
Error semantics. Per-resource-type failures during the run are not catastrophic - they accumulate as OperationOutcome resources in error[] NDJSON files attached to the manifest, and the job still terminates Complete. Only conditions that prevent the export from producing any valid output (authorization failure mid-stream, total backend outage, output-store failure on every write) transition the job to ExportStatus::Error. This matches the IG's expectation that bulk jobs prefer partial success over hard failure.
Group Export: The Hard Part
Group/[id]/$export is where the spec's edges become apparent. The export returns "every resource in the patient compartment of every patient who is a member of this group" - which is the cross product of three things HFS has to compute on the fly.
First, who are the members? Groups can list members directly, list nested Groups whose members must be flattened, or (in the forthcoming Bulk Cohort profile) carry member-filter modifier extensions whose values are FHIR search expressions to evaluate against the live data. GroupExportProvider::get_group_members handles the direct case; resolve_group_patient_ids handles the rest.
Second, what is the patient compartment for this FHIR version? Compartments are defined per FHIR version by CompartmentDefinition resources; the mapping from (version, resource_type) to (search_param_names) is generated alongside the FHIR models. HFS already has this lookup at crates/rest/src/handlers/compartment.rs::get_compartment_params_for_version. The bulk export worker reuses it.
Third, how do we enumerate efficiently? For each requested resource type, the worker calls fetch_patient_compartment_batch with the resolved patient ID list and a cursor. The implementation chooses whether to issue per-patient queries, range queries, or a single chunked query depending on what the underlying store is good at; the trait does not prescribe.
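The first of these questions, flattening nested Groups into a set of patient IDs, can be sketched as an iterative walk with a visited set. This is an illustration only: the `Member` enum and the in-memory `groups` map are hypothetical stand-ins for Group resources loaded from storage, and the function is not the actual resolve_group_patient_ids.

```rust
use std::collections::{BTreeSet, HashMap, HashSet};

/// Hypothetical, simplified view of a Group.member entry.
enum Member {
    Patient(String),
    Group(String),
}

/// Flattens nested groups into a sorted, de-duplicated set of patient IDs.
/// The `visited` set guards against groups that reference each other.
fn flatten_group_members(root: &str, groups: &HashMap<String, Vec<Member>>) -> BTreeSet<String> {
    let mut patients = BTreeSet::new();
    let mut visited = HashSet::new();
    let mut stack = vec![root.to_string()];
    while let Some(group_id) = stack.pop() {
        if !visited.insert(group_id.clone()) {
            continue; // already expanded: a repeat or a cycle
        }
        if let Some(members) = groups.get(&group_id) {
            for member in members {
                match member {
                    Member::Patient(id) => {
                        patients.insert(id.clone());
                    }
                    Member::Group(id) => stack.push(id.clone()),
                }
            }
        }
    }
    patients
}
```

Returning a sorted set gives the worker a stable ordering to chunk and cursor over, whichever query shape the underlying store prefers.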
The IG's behavior on _since plus group membership has a subtle wrinkle that operators should be aware of: if a patient was added to the group after _since, the server MAY return that patient's resources from before _since (because they were not part of the group at that time, but are now). The current draft says "behavior SHOULD be documented" - we will document our choice (we plan to include them by default, matching the prevalent reading) and let operators override via HFS_BULK_EXPORT_SINCE_NEWLY_ADDED=exclude when their use case demands the alternative.
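The decision this setting controls can be written as a single predicate. The sketch below assumes the server can tell when each patient joined the group (how HFS would derive that is not specified here), and it uses plain epoch seconds instead of FHIR instants to stay dependency-free. The names `SinceNewlyAdded` and `include_resource` are hypothetical.

```rust
/// Mirrors the two values of HFS_BULK_EXPORT_SINCE_NEWLY_ADDED.
#[derive(Clone, Copy, PartialEq)]
enum SinceNewlyAdded {
    Include,
    Exclude,
}

/// Decides whether a resource belongs in a Group export.
/// All timestamps are seconds since the Unix epoch (a simplification).
fn include_resource(
    last_updated: u64,
    since: Option<u64>,
    member_added_at: u64,
    mode: SinceNewlyAdded,
) -> bool {
    // Without _since, every resource qualifies.
    let Some(since) = since else { return true };
    // Resources updated at or after _since always qualify.
    if last_updated >= since {
        return true;
    }
    // Older resources qualify only for patients who joined the group
    // after _since, and only when the operator kept the default.
    member_added_at > since && mode == SinceNewlyAdded::Include
}
```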
The Bulk Cohort member-filter profile - where a client can POST a Group whose membership is defined by FHIR search criteria evaluated server-side - is intentionally out of scope for the first cut. It is its own design problem (asynchronous Group construction, dynamic membership, refresh semantics) and deserves its own discussion document once the export plumbing is in production.
Authorization
Bulk export's authorization story is short because Discussion #45 has already done the work.
By the time an export handler runs, the auth middleware has produced a RequestContext with a validated Principal and a ScopeSet. The export kick-off handler asks the same AuthorizationPolicy trait whether the principal's scopes cover the requested export. The scopes that matter are the standard SMART Backend Services system scopes:
| Scope | Covers |
|---|---|
| system/*.rs | Read and search every resource type - the broadest bulk scope. |
| system/Patient.rs | Read and search Patient. Required at minimum for any Patient or Group export. |
| system/Observation.rs | Read and search Observation. Required to include Observation in any export. |
| system/[Type].read | The legacy v1-style alias; the policy implementation should accept it for backward compatibility. |
The composability described in #45 carries through here. A deployment that wants additional restrictions on bulk operations - say, a BulkRateLimitPolicy that throttles concurrent jobs per client, or a BulkTenantQuotaPolicy that caps total exported volume per tenant per day - implements AuthorizationPolicy and composes it via CompositeAuthorizationPolicy. The export handlers do not know these policies exist; they only know that authorize_kickoff returned Permit.
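A rough sketch of that composition follows. The trait here is a simplified, synchronous stand-in that reuses the names from this document; it is not the signature defined in Discussion #45, and the `KickoffRequest` and `Decision` types, the `ScopePolicyStub`, and the deny message are all hypothetical.

```rust
#[derive(Debug, PartialEq)]
enum Decision {
    Permit,
    Deny(String),
}

/// Hypothetical view of a kick-off request as seen by policies.
struct KickoffRequest<'a> {
    client_id: &'a str,
    in_flight_jobs: u32,
}

/// Simplified stand-in for the AuthorizationPolicy trait.
trait AuthorizationPolicy {
    fn authorize_kickoff(&self, req: &KickoffRequest) -> Decision;
}

/// Stand-in for the scope-evaluating policy; always permits here.
struct ScopePolicyStub;

impl AuthorizationPolicy for ScopePolicyStub {
    fn authorize_kickoff(&self, _req: &KickoffRequest) -> Decision {
        Decision::Permit
    }
}

/// Throttles concurrent export jobs per client.
struct BulkRateLimitPolicy {
    max_in_flight: u32,
}

impl AuthorizationPolicy for BulkRateLimitPolicy {
    fn authorize_kickoff(&self, req: &KickoffRequest) -> Decision {
        if req.in_flight_jobs >= self.max_in_flight {
            Decision::Deny(format!(
                "client {} already has {} export jobs in flight",
                req.client_id, req.in_flight_jobs
            ))
        } else {
            Decision::Permit
        }
    }
}

/// Permits only if every composed policy permits; the first Deny wins.
struct CompositeAuthorizationPolicy {
    policies: Vec<Box<dyn AuthorizationPolicy>>,
}

impl AuthorizationPolicy for CompositeAuthorizationPolicy {
    fn authorize_kickoff(&self, req: &KickoffRequest) -> Decision {
        for policy in &self.policies {
            if let Decision::Deny(reason) = policy.authorize_kickoff(req) {
                return Decision::Deny(reason);
            }
        }
        Decision::Permit
    }
}
```

The handler would hold only a `dyn AuthorizationPolicy`, so adding or removing a throttle is a wiring change at startup rather than a handler change.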
What this means in practice: there is no new auth surface for bulk export. The same JwksBearerAuthProvider you configured for the rest of HFS validates the kick-off token. The same scope syntax governs what can be exported. The same audit trail records every job.
Configuration: What Operators Will Touch
| Variable | Default | Description |
|---|---|---|
| HFS_BULK_EXPORT_ENABLED | true | Master switch. When false, the operation endpoints return 501 Not Implemented. |
| HFS_BULK_EXPORT_BACKEND | embedded | Job-state backend: embedded (SQLite + local FS), postgres-s3, redis-s3. |
| HFS_BULK_EXPORT_DATABASE_URL | (from HFS_DATABASE_URL) | Connection string for the job-state store when distinct from the resource store. |
| HFS_BULK_EXPORT_OUTPUT_BACKEND | local-fs | Output store: local-fs or s3. |
| HFS_BULK_EXPORT_OUTPUT_DIR | ${HFS_DATA_DIR}/exports | Local FS root for output files. |
| HFS_BULK_EXPORT_S3_BUCKET | (none) | Bucket for output files when OUTPUT_BACKEND=s3. |
| HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN | auto | Manifest hint: auto (pre-signed when supported), true (always token), false (always pre-signed). |
| HFS_BULK_EXPORT_FILE_URL_TTL | 3600 | Seconds. Pre-signed URL lifetime in the manifest. |
| HFS_BULK_EXPORT_OUTPUT_TTL | 86400 | Seconds. How long output files are retained after job completion. |
| HFS_BULK_EXPORT_WORKER_CONCURRENCY | 2 | Maximum jobs this pod runs concurrently. |
| HFS_BULK_EXPORT_DISABLE_LOCAL_WORKER | false | When true, this pod does not run workers (use with separate hfs-exporter). |
| HFS_BULK_EXPORT_MAX_CONCURRENT_PER_TENANT | 4 | Cap on simultaneous in-flight jobs per tenant. |
| HFS_BULK_EXPORT_BATCH_SIZE | 1000 | Resources per fetch_export_batch call. |
| HFS_BULK_EXPORT_LEASE_DURATION | 60 | Seconds. Initial lease length issued at claim. |
| HFS_BULK_EXPORT_HEARTBEAT_INTERVAL | 20 | Seconds. Worker heartbeat cadence. |
| HFS_BULK_EXPORT_SINCE_NEWLY_ADDED | include | For Group exports: include or exclude resources from before _since for patients added after _since. |
The single-instance default - HFS_BULK_EXPORT_BACKEND=embedded with HFS_BULK_EXPORT_OUTPUT_BACKEND=local-fs - requires zero additional configuration on top of the standard HFS environment. A deployment grows into the multi-instance path by changing two variables and pointing at a Postgres and an S3 bucket; no code changes, no different binary.
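As one example of how bootstrap code might interpret these variables, the sketch below resolves HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN against the output store's capabilities. The type and function names are hypothetical, and treating "false with a store that cannot pre-sign" as a startup error is an assumption of this sketch, not a documented HFS behavior.

```rust
/// The three documented values of HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN.
#[derive(Debug, PartialEq, Clone, Copy)]
enum AccessTokenMode {
    Auto,
    Always,
    Never,
}

fn parse_mode(raw: &str) -> Result<AccessTokenMode, String> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "auto" => Ok(AccessTokenMode::Auto),
        "true" => Ok(AccessTokenMode::Always),
        "false" => Ok(AccessTokenMode::Never),
        other => Err(format!("invalid HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN: {other}")),
    }
}

/// Returns the value to publish as the manifest's requiresAccessToken.
fn requires_access_token(mode: AccessTokenMode, store_can_presign: bool) -> Result<bool, String> {
    match (mode, store_can_presign) {
        // auto: pre-signed when the store supports it, token otherwise.
        (AccessTokenMode::Auto, can_presign) => Ok(!can_presign),
        (AccessTokenMode::Always, _) => Ok(true),
        (AccessTokenMode::Never, true) => Ok(false),
        // Assumption: fail fast rather than silently fall back.
        (AccessTokenMode::Never, false) => {
            Err("pre-signed URLs requested but the output store cannot produce them".to_string())
        }
    }
}
```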
Conformance Testing
The Inferno Bulk Data Test Kit is the canonical conformance harness for FHIR bulk data servers. It exercises every kick-off variant, the polling pattern, the manifest schema, the NDJSON output format, the cancellation flow, and SMART Backend Services authorization end to end. After the initial implementation lands, we will:
Spin up HFS in a Docker Compose configuration (HFS + PostgreSQL + MinIO + Keycloak for SMART) suitable for Inferno to exercise.
Wire a cargo xtask inferno-bulk-data target that runs the test kit headlessly against this configuration.
Add the run to .github/workflows/inferno.yml alongside the existing test kits.
Publish the Inferno conformance badge in crates/hfs/README.md next to the other test-kit badges.
This is intentionally separate from unit and integration tests in the workspace. Inferno tests are slow, network-bound, and authoritative - they belong in CI as a nightly job, not on every PR.
What's Not in Scope (Yet)
A handful of things are deliberately deferred. None are blockers; each is a follow-up.
$bulk-submit (the inverse direction - large NDJSON payloads into the server). The Argonaut Project's current draft is at https://hackmd.io/@argonaut/rJoqHZrPle. It will get its own discussion document once the draft stabilizes; the shared-state architecture proposed here generalizes naturally to ingestion.
The Bulk Cohort member-filter profile for dynamic Group construction. This is its own design problem (asynchronous Group creation, dynamic membership, refresh semantics) and deserves its own discussion.
Legacy $import from earlier IG drafts. Superseded by $bulk-submit; we will not implement the legacy shape.
Prefer: separate-export-status (the variant where status polling returns 200 OK with an X-Export-Status header instead of 202 Accepted). Marked as a follow-up; it is a low-effort addition once the core flow is in place.
organizeOutputBy (reorganized output with Parameters header blocks per group). Wait for broader IG adoption before committing to it.
includeAssociatedData=LatestProvenanceResources and similar Provenance hints. Implement once the audit subsystem's Provenance support lands.
Proposed Next Steps
The traits sketched above are a starting point. To move toward implementation:
Ship the embedded single-instance default first. SQLite job-state backend + local-FS output store + in-process worker. This gets every test, every demo, and every single-VM deployment unblocked, and exercises the trait surface end to end before the multi-instance work begins.
Add ExportOutputStore and ExportClaimStrategy to helios-persistence. Two new traits; refactor the existing S3 bulk-export code to satisfy them rather than implementing everything inside the S3 backend's BulkExportStorage.
Implement PostgresSkipLocked and the PostgreSQL BulkExportStorage. With the S3 ExportOutputStore, this is the multi-instance default. Cover with testcontainers integration tests against real Postgres + MinIO.
Wire the REST handlers in helios-rest. Four handlers: kick-off (with sub-routes for system, patient, group), status, cancel/delete, file-download. Plumb Content-Location, X-Progress, Retry-After, Expires, and the manifest content type correctly.
Add pre-signed URL generation to the existing S3 backend. A short addition; the existing AwsS3Client already supports it via the AWS SDK.
Bundle a docker compose configuration with HFS + PostgreSQL + MinIO + Keycloak, suitable for running the Inferno Bulk Data Test Kit locally and in CI.
Wire the Inferno Bulk Data Test Kit into .github/workflows/inferno.yml as a nightly conformance job, and publish the badge.
Document HFS_BULK_EXPORT_* envvars in CLAUDE.md and crates/hfs/README.md, including the single-instance vs multi-instance configuration recipes.
Add audit events for bulk-export lifecycle via the existing record_export_event helper already in crates/persistence/src/core/bulk_export.rs::audit, plumbed through the kick-off, completion, cancellation, and download handlers.
Open the $bulk-submit discussion once the Argonaut draft is stable, building on the shared-state architecture established here.
Closing Thoughts
Bulk export is the API that turns a FHIR server from a transaction processor into a data platform. Population health teams, research data lakes, payer-provider exchanges, AI training pipelines - none of them are reading one Patient at a time. They are reading entire compartments, entire cohorts, entire systems, and they want to do it asynchronously, resumably, and at a rate that does not require an arrangement with the FHIR server's on-call.
The architecture proposed here is built around two convictions. First, that the same trait surface should serve a single VM with SQLite and a horizontally scaled fleet with PostgreSQL and S3 - the operator chooses, the code does not change. Second, that the long-running, bandwidth-heavy parts of an export should be cleanly separable from the request-serving HFS process, so that operators can scale them independently without rewriting handlers.
The Rust trait system makes both convictions enforceable. The compiler guarantees that every export handler receives a validated RequestContext and a TenantContext. That BulkExportStorage, ExportDataProvider, and ExportOutputStore are independently replaceable. That the worker runtime is identical whether it is co-located with the REST API or running standalone. These guarantees hold regardless of how complex the deployment becomes, and they hold across the inevitable migrations from "we started on SQLite and outgrew it" to "we now run on Postgres + S3 + a separate worker tier".
After the implementation lands, the Inferno Bulk Data Test Kit becomes the daily check on whether HFS is a conformant bulk data server. The kit covers every kick-off variant, every flavor of the polling state machine, every required manifest field, the NDJSON contract, and SMART Backend Services authorization end to end. Treating Inferno conformance as a non-negotiable in CI is what turns "we shipped bulk export" into "we shipped a bulk export implementation interoperable with the rest of the ecosystem".
Thank you for reading. I look forward to the discussion.
The Lay of the Land: What the Bulk Data Access IG Says
The Bulk Data Access IG defines an asynchronous, manifest-based, NDJSON-over-HTTPS pattern for exporting large volumes of FHIR data. The shape of every export, regardless of scope, is the same:
The client kicks off the export with a $export operation. The server responds immediately with 202 Accepted and a Content-Location header pointing to a status URL.
The client polls the status URL. While the export is in progress, the server returns 202. When the export is complete, the server returns 200 OK with a JSON manifest describing every output file.
The client downloads the output files, which are served as application/fhir+ndjson - one resource per line, no Bundle wrapper.
The client cleans up (DELETE on the status URL) when finished, signaling that the server may reclaim the output files.
Three flavors of export sit on top of this same pattern:
[base]/$export - everything the caller is permitted to see across the entire server.
[base]/Patient/$export - every resource in every patient's compartment that the caller is permitted to see.
[base]/Group/[id]/$export - every resource in the compartment of every patient who is a member of the named Group.
The kick-off request accepts a substantial parameter surface. Some of the headline parameters that any compliant implementation must understand:
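| Parameter | Description |
|---|---|
| _type | Comma-delimited list of resource types to include. |
| _since | FHIR instant. Only resources whose meta.lastUpdated is at or after this point. |
| _until | FHIR instant. Only resources whose meta.lastUpdated is at or before this point. |
| _typeFilter | FHIR search queries that narrow the results per resource type (e.g. MedicationRequest?status=active). May be repeated. |
| _outputFormat | application/fhir+ndjson is required; abbreviated application/ndjson and ndjson must also be accepted. |
| _elements | Limits output to the listed elements; affected resources carry the SUBSETTED tag. |
| includeAssociatedData | Requests associated metadata alongside the export (e.g. LatestProvenanceResources). |
| organizeOutputBy | Organizes output files by instances of a resource type rather than by resource type alone. |
| allowPartialManifests | Allows the server to return manifests with link[] pagination before all output is finished. |
| patient (POST only) | Restricts the export to the listed Patient references. |

The manifest returned at completion has a fixed schema: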
```json
{
  "transactionTime": "2026-05-11T00:00:00Z",
  "request": "https://fhir.example.org/Group/cohort-1/$export?_type=Patient,Observation",
  "requiresAccessToken": true,
  "output": [
    {
      "type": "Patient",
      "url": "https://files.example.org/exports/abc/Patient-001.ndjson",
      "count": 12500
    },
    {
      "type": "Observation",
      "url": "https://files.example.org/exports/abc/Observation-001.ndjson",
      "count": 980321
    }
  ],
  "deleted": [],
  "error": [],
  "link": []
}
```

Three fields are non-obvious and worth highlighting up front:
transactionTime is the server's frozen wall-clock at the moment the export was started. Every resource in the output must reflect server state as of that instant. This is the anchor that lets clients implement incremental sync correctly using _since on the next run.
requiresAccessToken is a hint to the client about how to fetch the output files. If true, the URLs require the same Authorization: Bearer ... token that authorized the kick-off. If false, the URLs are pre-signed (or otherwise capability-based) and the client SHALL NOT send a token. The decision is the server's; both modes are valid.
error is not a status indicator. It is a list of NDJSON files containing FHIR OperationOutcome resources, one per line, describing per-resource-type failures that did not cause the entire job to fail. An export with a populated error array still finished 200 OK from a workflow perspective.
Authorization for bulk operations sits squarely on SMART Backend Services, with scopes of the form system/Patient.rs, system/*.rs, and similar. That story is told in detail in Discussion #45 and shipped today; we will not re-derive it here. What this document does assume from that work is that the auth layer produces a RequestContext containing a validated Principal with a ScopeSet, and that this context flows into every export handler intact.
The Essential Flow
In words and then in pictures.
Kick-off. The client sends GET /Group/cohort-1/$export?_type=Patient,Observation&_since=2026-01-01T00:00:00Z with Accept: application/fhir+json, Prefer: respond-async, and a Bearer token. The server validates the token, parses parameters, opens an export job in shared state, and returns 202 Accepted with Content-Location: https://fhir.example.org/export-status/abc. The handler does no actual data extraction in line; it returns within milliseconds.
Polling. The client polls GET /export-status/abc periodically (honoring Retry-After). While the job runs, the server returns 202 Accepted, optionally with an X-Progress header carrying a free-form message. When the job is finished, the server returns 200 OK with the JSON manifest above. The client now has every URL it needs to fetch the data.
Download. The client fetches each output[].url in parallel. The server streams each file as application/fhir+ndjson, optionally with Content-Encoding: gzip. The number of files, their sizes, and the order are server-chosen.
Cleanup. When the client is done - or whenever it decides to abandon the export - it sends DELETE /export-status/abc. The server returns 202 Accepted. The files may now be reclaimed. The status URL begins returning 404 Not Found.
Notice three things in that diagram. The kick-off handler does no extraction. The polling client may land on a different HFS instance than the one that received the kick-off - the status URL must work regardless. The download path is not necessarily served by HFS at all; if the manifest's requiresAccessToken is false, the URLs may point directly at the object store.
These are not implementation details. They are the architectural premises the rest of this document is responding to.
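The polling contract described in the flow maps job state to HTTP responses in a way that is easy to pin down in code. The sketch below is illustrative only: the `JobState` and `PollResponse` types are hypothetical, the error state is deliberately left out, and the Retry-After value is whatever the caller chooses.

```rust
/// Hypothetical job states as seen by the status endpoint.
#[derive(Debug, PartialEq)]
enum JobState {
    Accepted,
    InProgress { progress: String },
    Complete,
    Deleted,
}

/// The parts of the HTTP response that the polling contract pins down.
#[derive(Debug, PartialEq)]
struct PollResponse {
    status: u16,
    headers: Vec<(&'static str, String)>,
    body_is_manifest: bool,
}

fn poll_response(state: &JobState, retry_after_secs: u32) -> PollResponse {
    let retry = ("Retry-After", retry_after_secs.to_string());
    match state {
        // Not yet picked up by a worker: still 202, no progress message.
        JobState::Accepted => PollResponse {
            status: 202,
            headers: vec![retry],
            body_is_manifest: false,
        },
        // Running: 202 plus the free-form X-Progress message.
        JobState::InProgress { progress } => PollResponse {
            status: 202,
            headers: vec![retry, ("X-Progress", progress.clone())],
            body_is_manifest: false,
        },
        // Finished: 200 with the JSON manifest as the body.
        JobState::Complete => PollResponse {
            status: 200,
            headers: vec![],
            body_is_manifest: true,
        },
        // After DELETE, the status URL answers 404.
        JobState::Deleted => PollResponse {
            status: 404,
            headers: vec![],
            body_is_manifest: false,
        },
    }
}
```

Because the function takes only the persisted job state, any instance behind the load balancer can answer a poll identically.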
The Architectural Tensions
Before we get to traits, it is worth naming the tensions that the design has to resolve. Bulk export is not a single piece of code; it is a system, and the system has to hold together under four pressures simultaneously.
Long-running work behind a short-lived HTTP request. The kick-off responds 202 in milliseconds. The job behind it may run for minutes, hours, or - for the largest population-level pulls - long enough to outlive the process that started it. The handler and the worker cannot be the same thing.
State that outlives a process. Job status, cursors, manifests, output files - all of it has to survive process restarts, deploys, and crashes. There is no "in-memory only" version of an export that is also production-grade. Whatever state we keep must be durable from the first call to start_export().
One server versus many. A small clinic might run HFS as a single process on a single VM. A national exchange will run HFS as a fleet of pods behind a load balancer, scaled to traffic. The kick-off, the status polls, and the file downloads will land on whatever instance the load balancer picks at the moment. Job state cannot live in one instance's memory if any other instance might field the next request.
The download endpoint is a fileserver. Once a job is done, every output URL is a sustained GET. Megabytes per file, gigabytes per export, hundreds of files in the manifest. That is a fundamentally different workload from "look up a Patient by ID and return JSON" - it is hot-path bandwidth, not request/response latency. Bolting it onto the same Tokio executor that fields Patient.read queries is workable in a single-instance deployment and a mistake at scale.
The design that follows pulls these four tensions apart cleanly, so each one is solved by a single, replaceable abstraction.
Single-Instance vs Multi-Instance: A Tale of Two Deployments
HFS has always tried to scale down as well as it scales up. The persistence layer ships with a zero-config SQLite default; the same trait surface accepts PostgreSQL, MongoDB, Elasticsearch, and S3. Bulk export follows the same philosophy: the same traits serve both the single-VM clinic install and the multi-pod cloud deployment, and the operator decides at startup which concrete implementations to wire in.
Single-instance: zero-config
The simplest possible export deployment is a single HFS process, running on a single VM, writing job state into the SQLite database it already manages, and writing output files to the local filesystem under ${HFS_DATA_DIR}/exports/{job_id}/. The worker that performs the extraction is a Tokio task spawned from the same process; the polling and download handlers serve the local SQLite row and the local file directly.
This is fine for clinics, single-tenant deployments, demos, conformance testing, and CI. It works without an external job queue, without object storage, without a network filesystem. The trade-off is that you cannot horizontally scale HFS - the moment a second pod appears behind the load balancer, status polls will start landing on the wrong instance, and the design falls apart.
Multi-instance: shared state, work pool
The horizontally scaled deployment splits responsibilities cleanly:
Job state lives in a shared database, in the bulk_export_jobs table. Status polls work from any instance because every instance is looking at the same row.
Output files live in shared object storage. In the requiresAccessToken: false case, the client downloads pre-signed URLs directly from the object store and HFS is not in the path at all.
Workers run either inside each HFS pod or as a separate hfs-exporter binary against the same shared state (an option discussed later). Workers claim jobs out of the shared store using a leasing pattern, so adding or removing workers requires no coordination beyond the shared store itself.
The cardinal architectural rule is that the same code path serves both topologies. The BulkExportStorage trait is implemented by an embedded SQLite backend for single-instance and a PostgreSQL backend for multi-instance. The ExportOutputStore trait is implemented by a local-FS backend for single-instance and an S3 backend for multi-instance. The handler does not know which one is wired up. The worker does not know which one is wired up. Only the bootstrap code, reading environment variables, knows.
The Recommendation: PostgreSQL for Job State, S3 for Output, In-Process Workers
There is a tension between offering a menu of options and recommending a default. Discussion #45 leaned toward recommendations - JwksBearerAuthProvider as the default token validator, with IntrospectionAuthProvider as the fallback for opaque tokens. We take the same posture here.
For multi-instance job state, the default is PostgreSQL. PostgreSQL is already a supported HFS primary store, so adopting it for export state adds no new operational dependency for the common case. SELECT ... FOR UPDATE SKIP LOCKED is a well-established pattern for transactional job queuing on PostgreSQL, readily used from Rust via sqlx or tokio-postgres; it handles worker fail-over, lease expiry, and at-least-once delivery without an external broker. The bulk_export_jobs table is small, write-amplified per heartbeat, and bounded in size by output retention - it does not threaten the resource store's hot path.
For multi-instance output storage, the default is S3-compatible object storage. This is the same S3 backend the persistence layer already ships, with the same AwsS3Client and the same keyspace conventions. Output keys are scoped under /{tenant}/exports/{job_id}/{resource_type}-{part}.ndjson. The manifest publishes pre-signed URLs with a configurable TTL, so the manifest's requiresAccessToken is false and the client downloads directly from S3 without HFS in the bandwidth path. For deployments that want to keep token-based access (audit-heavy environments, environments without a CDN), the same files are streamable through HFS's own download handler with requiresAccessToken: true instead - configuration, not code.
For execution, the default is an embedded worker pool. Workers run in the same process as the HFS REST API by default. This keeps the operational surface small: one binary, one deployment, one set of logs and metrics. A configurable HFS_BULK_EXPORT_WORKER_CONCURRENCY limits how many jobs each pod runs at once, and HFS_BULK_EXPORT_DISABLE_LOCAL_WORKER=true lets operators turn off in-pod workers entirely when they want to dedicate request-serving capacity. The optional hfs-exporter binary, discussed later, addresses the cases where worker isolation needs to be physical, not just configurational.
Now the vendor-style walkthroughs, in the same shape as the IdP integration section of Discussion #45.
PostgreSQL (recommended default for job state)
How it connects. The same HFS_DATABASE_URL that drives the persistence layer. The export subsystem adds two tables - bulk_export_jobs and bulk_export_outputs - alongside the existing resource schema. No new connection pool, no new credentials, no new operational surface.

Trade-offs. PostgreSQL is durable, transactional, and well understood. SELECT ... FOR UPDATE SKIP LOCKED makes multi-worker claiming straightforward and safe. The cost is that every status poll is a query against PostgreSQL, which on a hot system means tuning indexes on (tenant_id, status, lease_expiry) and accepting that very-high-poll-rate workloads (more than a few thousand polls per second per tenant) will eventually want a caching layer in front.

Configuration sketch.
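As a rough illustration only, the Rust sketch below shows how bootstrap code might resolve the export database URL and what a SKIP LOCKED claim statement could look like. HFS_BULK_EXPORT_DATABASE_URL, HFS_DATABASE_URL, the bulk_export_jobs table, and the tenant_id, status, and lease_expiry columns are named in this document; the function name, the other column names (job_id, lease_owner, created_at), and the status values are assumptions made for the example, not the HFS schema.

```rust
/// Resolve the database URL for export job state from an environment-like
/// lookup. The export-specific variable wins; otherwise the shared
/// persistence URL is used. (Hypothetical helper, not an HFS API.)
pub fn resolve_export_database_url<F>(lookup: F) -> Option<String>
where
    F: Fn(&str) -> Option<String>,
{
    lookup("HFS_BULK_EXPORT_DATABASE_URL").or_else(|| lookup("HFS_DATABASE_URL"))
}

/// Illustrative claim statement: pick one claimable job, skip rows that
/// other workers have locked, and stamp a lease on it in the same statement.
/// $1 is the worker id, $2 the lease length in seconds (both assumed).
pub const CLAIM_SQL: &str = r#"
UPDATE bulk_export_jobs
   SET status = 'in_progress',
       lease_owner = $1,
       lease_expiry = now() + make_interval(secs => $2)
 WHERE job_id = (
       SELECT job_id
         FROM bulk_export_jobs
        WHERE status = 'accepted'
           OR (status = 'in_progress' AND lease_expiry < now())
        ORDER BY created_at
        LIMIT 1
          FOR UPDATE SKIP LOCKED)
RETURNING job_id
"#;
```

In a real process the lookup would be something like `|k| std::env::var(k).ok()`; injecting it keeps the resolution logic testable without mutating the environment.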
Redis (alternative for low-latency status polls)
How it connects. A standard Redis or Redis Cluster endpoint. Job records are hashes keyed by job ID; a pending list holds jobs in enqueue order (BLMOVE operates on lists, not sorted sets); claim is BLMOVE from pending to a per-worker in-flight-{worker_id} list, with a TTL-backed lease.

Trade-offs. Redis makes status polls trivially fast - a single HGETALL is typically under a millisecond - and the claim semantics are clean. The cost is durability: a Redis crash without AOF persistence can lose in-flight job state. For exports, the worst-case impact is a job that has to be restarted; cursors live in PostgreSQL via the same BulkExportStorage trait, so re-running is mostly idempotent, but operators who run Redis as a cache rather than a primary store should think twice. Best fit: deployments that already operate Redis as a hot path and want polling latency under a millisecond.

Configuration sketch.
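A hedged Rust sketch of what resolving this configuration might involve. The pending and in-flight-{worker_id} key names and HFS_BULK_EXPORT_LEASE_DURATION (default 60) come from this document; HFS_BULK_EXPORT_REDIS_URL, the struct, and the function names are hypothetical and exist only for illustration.

```rust
/// Hypothetical resolved settings for a Redis-backed job-state store.
#[derive(Debug, PartialEq)]
pub struct RedisJobStateConfig {
    pub url: String,
    /// Lease length, in whatever unit HFS_BULK_EXPORT_LEASE_DURATION uses.
    pub lease_duration: u64,
}

/// Key of the list that holds jobs waiting to be claimed.
pub const PENDING_KEY: &str = "pending";

/// Key of the per-worker list a claimed job is moved into.
pub fn in_flight_key(worker_id: &str) -> String {
    format!("in-flight-{worker_id}")
}

/// Resolve the settings from an environment-like lookup.
/// HFS_BULK_EXPORT_REDIS_URL is an assumed variable name.
pub fn resolve_redis_config<F>(lookup: F) -> Result<RedisJobStateConfig, String>
where
    F: Fn(&str) -> Option<String>,
{
    let url = lookup("HFS_BULK_EXPORT_REDIS_URL")
        .ok_or_else(|| "HFS_BULK_EXPORT_REDIS_URL is not set".to_string())?;
    let lease_duration = match lookup("HFS_BULK_EXPORT_LEASE_DURATION") {
        Some(raw) => raw
            .parse::<u64>()
            .map_err(|e| format!("invalid lease duration {raw:?}: {e}"))?,
        None => 60, // default listed in the configuration table
    };
    Ok(RedisJobStateConfig { url, lease_duration })
}
```

Failing fast on a malformed lease duration at startup is a design choice in this sketch; silently falling back to the default would hide a misconfiguration until a lease expired unexpectedly.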
DynamoDB / Cosmos DB / Spanner (cloud-managed equivalents)
How they connect. Each cloud's identity model. DynamoDB via the AWS SDK; Cosmos DB via the Azure SDK; Spanner via Google Cloud credentials. Each is a BulkExportStorage implementation that mirrors the PostgreSQL pattern but uses conditional writes (DynamoDB: ConditionExpression, Cosmos: ETag preconditions, Spanner: read-modify-write transactions) in place of SKIP LOCKED.

Trade-offs. Managed durability and global replication, at the cost of additional integration code per provider and per-call billing. The same caveats as the IdP discussion in #45 apply: every cloud has its own claim-name and capability quirks; the abstraction has to absorb them.
These are not first-tier targets for the initial implementation, but the trait design must not preclude them. Anyone running HFS purely on a single cloud will eventually want them.
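To make the conditional-write idea concrete, here is a small in-memory Rust model of claiming a job with a version precondition, roughly the shape a ConditionExpression or ETag check would take. Nothing here talks to a real cloud service; JobRecord, ConditionalStore, and try_claim are hypothetical names for this sketch.

```rust
use std::collections::HashMap;

/// One job row as a conditional-write store might hold it (illustrative).
#[derive(Debug, Clone, PartialEq)]
pub struct JobRecord {
    pub owner: Option<String>,
    pub lease_expiry: u64,
    pub version: u64,
}

/// In-memory stand-in for a store that supports "write only if unchanged".
pub struct ConditionalStore {
    rows: HashMap<String, JobRecord>,
}

impl ConditionalStore {
    pub fn new() -> Self {
        Self { rows: HashMap::new() }
    }

    /// Register an unclaimed job at version 0.
    pub fn insert_job(&mut self, job_id: &str) {
        let fresh = JobRecord { owner: None, lease_expiry: 0, version: 0 };
        self.rows.insert(job_id.to_string(), fresh);
    }

    pub fn get(&self, job_id: &str) -> Option<JobRecord> {
        self.rows.get(job_id).cloned()
    }

    /// Store `next` only if the row still carries `expected_version`.
    pub fn put_if_version(&mut self, job_id: &str, expected_version: u64, next: JobRecord) -> bool {
        match self.rows.get_mut(job_id) {
            Some(row) if row.version == expected_version => {
                *row = next;
                true
            }
            _ => false,
        }
    }
}

/// Claim a job if it is unowned or its lease has expired. The version
/// precondition makes a concurrent claimer lose instead of overwriting.
pub fn try_claim(store: &mut ConditionalStore, job_id: &str, worker: &str, now: u64, lease: u64) -> bool {
    let Some(current) = store.get(job_id) else { return false };
    if current.owner.is_some() && current.lease_expiry > now {
        return false; // somebody else holds a live lease
    }
    let next = JobRecord {
        owner: Some(worker.to_string()),
        lease_expiry: now + lease,
        version: current.version + 1,
    };
    store.put_if_version(job_id, current.version, next)
}
```

The read and the write are separate steps here on purpose: in a managed store they are separate network calls, and the version check is what makes the pair safe.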
Kafka / NATS JetStream (workers physically separate from request handlers)
How they connect. Kafka topics or JetStream streams act as the work queue; HFS publishes a job-created event on kick-off, and a separate hfs-exporter binary consumes the topic. Job state still lives in PostgreSQL (or wherever the BulkExportStorage impl points), so status polls and downloads do not touch the broker.

Trade-offs. This is the "we run our exporters on a different node pool because they're bandwidth-heavy" case. Adds a broker as an operational dependency, but lets you scale request handlers and exporters independently, and gives you explicit at-least-once delivery semantics with offsets. Best fit: fleets large enough that the export workload visibly distorts request-serving capacity.
S3-compatible object storage (recommended default for output files)
How it connects. The same AwsS3Client the persistence layer's S3 backend uses, configured via the standard AWS credential chain. Output files are uploaded as multipart objects under /{tenant}/exports/{job_id}/{resource_type}-{part}.ndjson. The manifest publishes pre-signed GET URLs with a TTL configured by HFS_BULK_EXPORT_FILE_URL_TTL.

Trade-offs. Object storage is the right tool for the job - massively parallel reads, transparent CDN integration, region-redundant durability, lifecycle policies for automatic expiry. The only meaningful cost is that a pre-signed URL works for whoever holds it until it expires, and the download happens outside HFS, which some audit regimes treat as out of band. Those deployments set HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN=true, the manifest reports requiresAccessToken: true, and downloads flow through HFS's own handler instead.

Cloudflare R2 / Google Cloud Storage / MinIO (S3-compatible drop-ins)
R2, GCS (via interop), and MinIO all speak the S3 API. The same AwsS3Client works against each with no code change, only endpoint_url and force_path_style configuration adjustments. We will document a docker compose example with MinIO as part of the development environment so contributors can exercise the multi-instance path without an AWS account.

Local filesystem (single-instance only)
How it connects. Files are written to ${HFS_DATA_DIR}/exports/{tenant_id}/{job_id}/{resource_type}-{part}.ndjson. The download handler serves them via tokio::fs::File and Axum's streaming body.

Trade-offs. No external dependencies; perfect for development and single-VM deployments. Not safe in a multi-instance topology because the writing instance and the reading instance may differ. A shared NFS mount makes this technically work across instances, but it is brittle (lock semantics, cache coherency, fsync surprises) and we do not recommend it.
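The path layout above is simple enough to sketch directly. The function below is a hypothetical helper, not HFS code: it assembles the documented ${HFS_DATA_DIR}/exports/{tenant_id}/{job_id}/{resource_type}-{part}.ndjson path and, as an added assumption, refuses components that could escape the export directory.

```rust
use std::path::{Path, PathBuf};

/// Build the on-disk location of one export part, or None if any
/// component is empty or could act as a path separator or traversal.
pub fn local_output_path(
    data_dir: &Path,
    tenant_id: &str,
    job_id: &str,
    resource_type: &str,
    part: u32,
) -> Option<PathBuf> {
    for component in [tenant_id, job_id, resource_type] {
        let unsafe_component = component.is_empty()
            || component == "."
            || component == ".."
            || component.contains('/')
            || component.contains('\\');
        if unsafe_component {
            return None;
        }
    }
    Some(
        data_dir
            .join("exports")
            .join(tenant_id)
            .join(job_id)
            .join(format!("{resource_type}-{part}.ndjson")),
    )
}
```

Because job IDs and tenant IDs ultimately come from request paths, validating them at the point where a filesystem path is built is cheap insurance even if they are also validated upstream.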
Designing the Rust Traits
The persistence crate already carries most of the building blocks. The export module at crates/persistence/src/core/bulk_export.rs defines the types and traits below; the S3 backend implements them today, and the embedded SQLite and PostgreSQL backends will follow. We present the existing surface first - so readers know what already exists - and then propose the additions that this discussion is centrally about.

The Existing Surface: Types
The vocabulary of an export job. These are stable; nothing in this document proposes changing them.
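As a reading aid only, here is a reduced model of two of these types and of the cursor loop they enable. The authoritative definitions live in crates/persistence/src/core/bulk_export.rs; the field names, the full set of ExportStatus variants, and the drain helper below are assumptions for illustration and may differ from the crate.

```rust
/// Simplified job status. Complete and Error appear in this document;
/// the remaining variants are assumed.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum ExportStatus {
    Accepted,
    InProgress,
    Complete,
    Error,
    Cancelled,
}

impl ExportStatus {
    /// A terminal job will never be claimed by a worker again.
    pub fn is_terminal(self) -> bool {
        matches!(self, Self::Complete | Self::Error | Self::Cancelled)
    }
}

/// One logical batch of NDJSON lines plus the cursor to resume from
/// and a flag marking the final batch (field names are illustrative).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct NdjsonBatch {
    pub lines: Vec<String>,
    pub next_cursor: Option<String>,
    pub is_last: bool,
}

/// Pull batches until the provider reports the last one, feeding each
/// returned cursor back into the next call.
pub fn drain<F>(mut fetch: F) -> Vec<String>
where
    F: FnMut(Option<&str>) -> NdjsonBatch,
{
    let mut out = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let batch = fetch(cursor.as_deref());
        out.extend(batch.lines);
        // Stop on the final batch, or if the provider gave no way to continue.
        if batch.is_last || batch.next_cursor.is_none() {
            return out;
        }
        cursor = batch.next_cursor;
    }
}
```

A real worker would persist the cursor after each batch rather than hold it in a local variable; that persisted cursor is what makes resuming after a crash possible.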
ExportProgress and its per-type companion TypeExportProgress round out the model. TypeExportProgress carries the cursor state that lets a worker resume mid-job after a crash - the same cursor type that ExportDataProvider::fetch_export_batch accepts and returns. ExportManifest and ExportOutputFile model the terminal manifest that the status endpoint serves at 200 OK. NdjsonBatch is what data providers produce - one logical batch of NDJSON lines plus a cursor and an "is this the last batch" flag.

The Existing Surface: Job-State Trait
BulkExportStorage is the contract that any backend providing job state implements. Single-instance backends (embedded SQLite) and multi-instance backends (PostgreSQL, Redis) both satisfy this trait. The handler does not care which is wired up.

The Existing Surface: Data-Provider Trait Hierarchy

ExportDataProvider is what the worker calls when it needs more resources to write. It is not BulkExportStorage - one provides job lifecycle, the other provides data. In practice the same backend object often implements both (the persistence layer already does), but conceptually they are separate concerns that could live in separate processes.

The trait hierarchy reflects the spec: every server that can do a Patient export can also do a System export; every server that can do a Group export can also do a Patient export. The compiler enforces the capability ladder.
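The capability ladder maps naturally onto Rust supertraits. The sketch below is a simplified model: ExportDataProvider and GroupExportProvider are names from this document, while PatientExportProvider, the method signatures, and DemoStore are assumptions (the real traits carry batch and cursor parameters and are presumably async).

```rust
/// System-level capability: the provider can list what it exports.
pub trait ExportDataProvider {
    fn export_types(&self) -> Vec<&'static str>;
}

/// Patient-level capability requires the system-level one.
pub trait PatientExportProvider: ExportDataProvider {
    fn patient_ids(&self) -> Vec<String>;
}

/// Group-level capability requires the patient-level one.
pub trait GroupExportProvider: PatientExportProvider {
    fn get_group_members(&self, group_id: &str) -> Vec<String>;
}

/// Accepts any provider with at least system-level capability.
pub fn count_system_types<P: ExportDataProvider>(provider: &P) -> usize {
    provider.export_types().len()
}

/// A Group-capable provider can be passed wherever a lower rung is
/// expected; the bound alone guarantees the lower rungs exist.
pub fn describe<P: GroupExportProvider>(provider: &P, group_id: &str) -> String {
    format!(
        "{} types, {} patients, {} members",
        count_system_types(provider),
        provider.patient_ids().len(),
        provider.get_group_members(group_id).len()
    )
}

/// Toy backend used only to exercise the bounds.
pub struct DemoStore;

impl ExportDataProvider for DemoStore {
    fn export_types(&self) -> Vec<&'static str> {
        vec!["Patient", "Observation"]
    }
}

impl PatientExportProvider for DemoStore {
    fn patient_ids(&self) -> Vec<String> {
        vec!["p1".to_string(), "p2".to_string()]
    }
}

impl GroupExportProvider for DemoStore {
    fn get_group_members(&self, group_id: &str) -> Vec<String> {
        if group_id == "g1" { vec!["p1".to_string()] } else { Vec::new() }
    }
}
```

Removing the `impl PatientExportProvider for DemoStore` block makes the `GroupExportProvider` impl fail to compile, which is the enforcement the paragraph above refers to.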
The Proposal: Output Storage as a First-Class Trait
The traits above answer "what data do we export" and "how is the job tracked". They do not answer "where do the bytes go". Today, each backend that implements BulkExportStorage also implicitly decides where output files live - the S3 backend writes them to S3, a future SQLite backend would write them locally. This works, but it conflates two decisions that operators reasonably want to make independently. A site running PostgreSQL for job state and S3 for output is a perfectly normal configuration. A site running PostgreSQL for both is also reasonable. The current shape does not support that cleanly.

We propose a separate ExportOutputStore trait. Two implementations cover the common cases.
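One possible shape for such a trait, sketched synchronously and against an in-memory store so it runs with the standard library alone. Only the trait name, finalize_part, and the requires_access_token flag come from this document; PartKey, OutputError, write_part, MemoryOutputStore, and the URL format are hypothetical, and the fencing-token argument anticipates the worker discussion later in this document.

```rust
use std::collections::HashMap;

/// Identifies one output file of one job (illustrative).
#[derive(Debug, Clone, PartialEq, Eq, Hash)]
pub struct PartKey {
    pub tenant: String,
    pub job_id: String,
    pub resource_type: String,
    pub part: u32,
}

#[derive(Debug, PartialEq, Eq)]
pub enum OutputError {
    UnknownPart,
    StaleFencingToken { presented: u64, current: u64 },
}

/// Where export bytes go, independent of where job state lives.
pub trait ExportOutputStore {
    fn write_part(&mut self, key: &PartKey, ndjson: &[u8]) -> Result<(), OutputError>;
    /// Seal a part and return its download URL. Rejects callers whose
    /// fencing token is older than one already seen for the job.
    fn finalize_part(&mut self, key: &PartKey, fencing_token: u64) -> Result<String, OutputError>;
    fn requires_access_token(&self) -> bool;
}

/// In-memory stand-in used here instead of a filesystem or S3.
pub struct MemoryOutputStore {
    base_url: String,
    parts: HashMap<PartKey, Vec<u8>>,
    highest_token: HashMap<String, u64>,
}

impl MemoryOutputStore {
    pub fn new(base_url: &str) -> Self {
        Self {
            base_url: base_url.to_string(),
            parts: HashMap::new(),
            highest_token: HashMap::new(),
        }
    }
}

impl ExportOutputStore for MemoryOutputStore {
    fn write_part(&mut self, key: &PartKey, ndjson: &[u8]) -> Result<(), OutputError> {
        self.parts.entry(key.clone()).or_default().extend_from_slice(ndjson);
        Ok(())
    }

    fn finalize_part(&mut self, key: &PartKey, fencing_token: u64) -> Result<String, OutputError> {
        if !self.parts.contains_key(key) {
            return Err(OutputError::UnknownPart);
        }
        let current = self.highest_token.entry(key.job_id.clone()).or_insert(0);
        if fencing_token < *current {
            return Err(OutputError::StaleFencingToken { presented: fencing_token, current: *current });
        }
        *current = fencing_token;
        Ok(format!(
            "{}/{}/exports/{}/{}-{}.ndjson",
            self.base_url, key.tenant, key.job_id, key.resource_type, key.part
        ))
    }

    fn requires_access_token(&self) -> bool {
        true // URLs point back at the server, so the bearer token is needed
    }
}
```

Keeping requires_access_token on the store rather than in the handler lets the manifest builder ask the store what kind of URLs it just handed out.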
LocalFsOutputStore writes under ${HFS_DATA_DIR}/exports/, hands back HFS-served URLs, and sets requires_access_token: true. S3OutputStore writes to a bucket configured via the existing S3BackendConfig, hands back pre-signed URLs, and sets requires_access_token: false (or true if the operator wants to keep downloads on the HFS data path).

The Proposal: Workers, Leases, and the Claim Strategy
Today, the S3 backend's start_export does the work synchronously inside the kick-off handler. That works for the single-instance case but breaks two design goals: it ties up the kick-off request for the duration of the job, and it gives no way to scale workers independently of request handlers. The handler must return immediately; the work happens elsewhere.

A worker is the runtime that performs an export. Workers may run in-pod (the default) or in a separate hfs-exporter binary. Either way, they share state through BulkExportStorage and they coordinate through a leasing protocol.

The choice of claim mechanism is itself pluggable, so the same ExportWorker runtime works against any job-state backend.

Three implementations are in scope for the initial work: PostgresSkipLocked (default for multi-instance), RedisListMove (alternative for low-poll-latency deployments), and InMemoryMutex (used by the embedded single-instance backend and by tests).

Two design notes that come up in review:
Why a lease with expiry rather than an explicit ack/nack queue? Because the work is long-lived and idempotent (cursors live in TypeExportProgress). A worker that dies mid-job leaves its lease to expire; another worker picks the job up from the last persisted cursor. This is simpler to operate than a queue with explicit redelivery, and it matches how PostgreSQL-backed job queues commonly work.

Why a fencing token? Because the lease-expiry pattern allows two workers to briefly believe they hold the same job if the original worker hung rather than crashed. The fencing token, written into every output-file key and checked on the output store's finalize_part, prevents the zombie worker from corrupting output the live worker is producing. Inspired directly by the fencing tokens pattern Martin Kleppmann wrote about; nothing novel.

The Proposal: File-Download Authorization
The download endpoint is its own small authorization problem. The manifest can be served two ways - requiresAccessToken: true, meaning download URLs point at HFS and require the kick-off's Bearer token; or requiresAccessToken: false, meaning download URLs point at the object store and are pre-signed. The download handler in HFS needs to handle the first case; the second case bypasses HFS entirely.

The default implementation is BearerScopeAuth. It revalidates the token against the same AuthProvider discussed in Discussion #45, checks that the system/{ResourceType}.rs scope covers the file's resource type, and lets the handler stream the file. The pre-signed URL case never runs through this trait - by the time a client is downloading a pre-signed URL, the object store is doing the auth check via the URL's signature.

The REST Layer: How the Endpoints Wire Up
Four handlers, all in the established HFS style: generic over the storage trait, taking a TenantExtractor, returning RestResult<Response>. We sketch the kick-off here; the other three follow the same pattern.

Three observations on this handler that are easy to miss:

It does not spawn a Tokio task to run the export. The start_export call writes the job row and returns. A worker picks the job up via claim_next out of band. The handler returns within milliseconds even for jobs that will take hours.

It does not know whether the deployment is single-instance or multi-instance. state.storage() is whichever BulkExportStorage was wired in at startup; the handler is identical either way.

It runs inside the auth middleware described in Discussion #45. By the time this function runs, ctx is a fully validated RequestContext containing a Principal with a ScopeSet. The handler does not re-validate the token; it only calls authorize_kickoff on whatever AuthorizationPolicy is configured, which evaluates SMART system scopes (and any composed deployment-specific policies) against the requested export.

The status handler is symmetric:
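A framework-free sketch of the status mapping, assuming the job has already been loaded from shared state. The status codes and the X-Progress and Retry-After headers follow the Bulk Data IG polling flow; JobState, StatusResponse, and status_response are hypothetical names, and the real handler would return an Axum response rather than this plain struct.

```rust
/// Reduced view of a job as the status endpoint sees it (illustrative).
#[derive(Debug, Clone, PartialEq)]
pub enum JobState {
    InProgress { percent: u8 },
    Complete { manifest_json: String },
    Error { outcome_json: String },
}

/// Plain-data stand-in for an HTTP response.
#[derive(Debug, PartialEq)]
pub struct StatusResponse {
    pub status: u16,
    pub headers: Vec<(String, String)>,
    pub body: Option<String>,
}

/// Map a job lookup result to the polling response.
pub fn status_response(job: Option<&JobState>, retry_after_secs: u32) -> StatusResponse {
    match job {
        // Unknown or already-deleted job.
        None => StatusResponse { status: 404, headers: Vec::new(), body: None },
        // Still running: tell the client how far along and when to poll again.
        Some(JobState::InProgress { percent }) => StatusResponse {
            status: 202,
            headers: vec![
                ("X-Progress".to_string(), format!("{percent}% complete")),
                ("Retry-After".to_string(), retry_after_secs.to_string()),
            ],
            body: None,
        },
        // Finished: the manifest is the body.
        Some(JobState::Complete { manifest_json }) => StatusResponse {
            status: 200,
            headers: vec![("Content-Type".to_string(), "application/json".to_string())],
            body: Some(manifest_json.clone()),
        },
        // Failed outright: the body is an OperationOutcome.
        Some(JobState::Error { outcome_json }) => StatusResponse {
            status: 500,
            headers: Vec::new(),
            body: Some(outcome_json.clone()),
        },
    }
}
```

Because the function takes only the loaded job, any instance that can read the shared job row can answer the poll identically.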
Content-Location URLs are constructed from HFS_BASE_URL plus the tenant prefix (when HFS_TENANT_ROUTING_MODE=url_path) plus /export-status/{job_id}. They are absolute. They survive load balancer changes because every HFS instance constructs the same URL for the same job, and every instance can answer the poll against the shared BulkExportStorage.

Error semantics. Per-resource-type failures during the run are not catastrophic - they accumulate as OperationOutcome resources in error[] NDJSON files attached to the manifest, and the job still terminates Complete. Only conditions that prevent the export from producing any valid output (authorization failure mid-stream, total backend outage, output-store failure on every write) transition the job to ExportStatus::Error. This matches the IG's expectation that bulk jobs prefer partial success over hard failure.

Group Export: The Hard Part
Group/[id]/$export is where the spec's edges become apparent. The export returns "every resource in the patient compartment of every patient who is a member of this group" - which is the cross product of three things HFS has to compute on the fly.

First, who are the members? Groups can list members directly, list nested Groups whose members must be flattened, or (in the forthcoming Bulk Cohort profile) carry member-filter modifier extensions whose values are FHIR search expressions to evaluate against the live data. GroupExportProvider::get_group_members handles the direct case; resolve_group_patient_ids handles the rest.

Second, what is the patient compartment for this FHIR version? Compartments are defined per FHIR version by CompartmentDefinition resources; the mapping from (version, resource_type) to (search_param_names) is generated alongside the FHIR models. HFS already has this lookup at crates/rest/src/handlers/compartment.rs::get_compartment_params_for_version. The bulk export worker reuses it.

Third, how do we enumerate efficiently? For each requested resource type, the worker calls fetch_patient_compartment_batch with the resolved patient ID list and a cursor. The implementation chooses whether to issue per-patient queries, range queries, or a single chunked query depending on what the underlying store is good at; the trait does not prescribe.

The IG's behavior on _since plus group membership has a subtle wrinkle that operators should be aware of: if a patient was added to the group after _since, the server MAY return that patient's resources from before _since (because they were not part of the group at that time, but are now). The current draft says "behavior SHOULD be documented" - we will document our choice (we plan to include them by default, matching the prevalent reading) and let operators override via HFS_BULK_EXPORT_SINCE_NEWLY_ADDED=exclude when their use case demands the alternative.

The Bulk Cohort member-filter profile - where a client can POST a Group whose membership is defined by FHIR search criteria evaluated server-side - is intentionally out of scope for the first cut. It is its own design problem (asynchronous Group construction, dynamic membership, refresh semantics) and deserves its own discussion document once the export plumbing is in production.

Authorization
Bulk export's authorization story is short because Discussion #45 has already done the work.
By the time an export handler runs, the auth middleware has produced a RequestContext with a validated Principal and a ScopeSet. The export kick-off handler asks the same AuthorizationPolicy trait whether the principal's scopes cover the requested export. The scopes that matter are the standard SMART Backend Services system scopes:

- system/*.rs
- system/Patient.rs
- system/Observation.rs
- system/[Type].read

The composability described in #45 carries through here. A deployment that wants additional restrictions on bulk operations - say, a BulkRateLimitPolicy that throttles concurrent jobs per client, or a BulkTenantQuotaPolicy that caps total exported volume per tenant per day - implements AuthorizationPolicy and composes it via CompositeAuthorizationPolicy. The export handlers do not know these policies exist; they only know that authorize_kickoff returned Permit.

What this means in practice: there is no new auth surface for bulk export. The same JwksBearerAuthProvider you configured for the rest of HFS validates the kick-off token. The same scope syntax governs what can be exported. The same audit trail records every job.

Configuration: What Operators Will Touch
| Variable | Default | Description |
|---|---|---|
| HFS_BULK_EXPORT_ENABLED | true | When false, the operation endpoints return 501 Not Implemented. |
| HFS_BULK_EXPORT_BACKEND | embedded | embedded (SQLite + local FS), postgres-s3, redis-s3. |
| HFS_BULK_EXPORT_DATABASE_URL | (HFS_DATABASE_URL) | |
| HFS_BULK_EXPORT_OUTPUT_BACKEND | local-fs | local-fs or s3. |
| HFS_BULK_EXPORT_OUTPUT_DIR | ${HFS_DATA_DIR}/exports | |
| HFS_BULK_EXPORT_S3_BUCKET | | [...] OUTPUT_BACKEND=s3. |
| HFS_BULK_EXPORT_REQUIRES_ACCESS_TOKEN | auto | auto (pre-signed when supported), true (always token), false (always pre-signed). |
| HFS_BULK_EXPORT_FILE_URL_TTL | 3600 | TTL of the pre-signed URLs published in the manifest. |
| HFS_BULK_EXPORT_OUTPUT_TTL | 86400 | |
| HFS_BULK_EXPORT_WORKER_CONCURRENCY | 2 | How many jobs each pod runs at once. |
| HFS_BULK_EXPORT_DISABLE_LOCAL_WORKER | false | When true, this pod does not run workers (use with separate hfs-exporter). |
| HFS_BULK_EXPORT_MAX_CONCURRENT_PER_TENANT | 4 | |
| HFS_BULK_EXPORT_BATCH_SIZE | 1000 | [...] fetch_export_batch call. |
| HFS_BULK_EXPORT_LEASE_DURATION | 60 | |
| HFS_BULK_EXPORT_HEARTBEAT_INTERVAL | 20 | |
| HFS_BULK_EXPORT_SINCE_NEWLY_ADDED | include | include or exclude resources from before _since for patients added after _since. |

The single-instance default - HFS_BULK_EXPORT_BACKEND=embedded with HFS_BULK_EXPORT_OUTPUT_BACKEND=local-fs - requires zero additional configuration on top of the standard HFS environment. A deployment grows into the multi-instance path by changing two variables and pointing at a Postgres and an S3 bucket; no code changes, no different binary.

Conformance Testing
The Inferno Bulk Data Test Kit is the canonical conformance harness for FHIR bulk data servers. It exercises every kick-off variant, the polling pattern, the manifest schema, the NDJSON output format, the cancellation flow, and SMART Backend Services authorization end to end. After the initial implementation lands, we will:
- [...] cargo xtask inferno-bulk-data target that runs the test kit headlessly against this configuration.
- [...] .github/workflows/inferno.yml alongside the existing test kits.
- [...] crates/hfs/README.md next to the other test-kit badges.

This is intentionally separate from unit and integration tests in the workspace. Inferno tests are slow, network-bound, and authoritative - they belong in CI as a nightly job, not on every PR.
What's Not in Scope (Yet)
A handful of things are deliberately deferred. None are blockers; each is a follow-up.
- $bulk-submit (the inverse direction - large NDJSON payloads into the server). The Argonaut Project's current draft is at https://hackmd.io/@argonaut/rJoqHZrPle. It will get its own discussion document once the draft stabilizes; the shared-state architecture proposed here generalizes naturally to ingestion.
- Bulk Cohort member-filter profile for dynamic Group construction. This is its own design problem (asynchronous Group creation, dynamic membership, refresh semantics) and deserves its own discussion.
- $import from earlier IG drafts. Superseded by $bulk-submit; we will not implement the legacy shape.
- Prefer: separate-export-status (the variant where status polling returns 200 OK with an X-Export-Status header instead of 202 Accepted). Marked as a follow-up; it is a low-effort addition once the core flow is in place.
- organizeOutputBy (reorganized output with Parameters header blocks per group). Wait for broader IG adoption before committing to it.
- includeAssociatedData=LatestProvenanceResources and similar Provenance hints. Implement once the audit subsystem's Provenance support lands.

Proposed Next Steps
The traits sketched above are a starting point. To move toward implementation:
- [...] ExportOutputStore and ExportClaimStrategy to helios-persistence. Two new traits; refactor the existing S3 bulk-export code to satisfy them rather than implementing everything inside the S3 backend's BulkExportStorage.
- [...] PostgresSkipLocked and the PostgreSQL BulkExportStorage. With the S3 ExportOutputStore, this is the multi-instance default. Cover with testcontainers integration tests against real Postgres + MinIO.
- [...] helios-rest. Four handlers: kick-off (with sub-routes for system, patient, group), status, cancel/delete, file-download. Plumb Content-Location, X-Progress, Retry-After, Expires, and the manifest content type correctly.
- [...] AwsS3Client already supports it via the AWS SDK.
- [...] docker compose configuration with HFS + PostgreSQL + MinIO + Keycloak, suitable for running the Inferno Bulk Data Test Kit locally and in CI.
- [...] .github/workflows/inferno.yml as a nightly conformance job, and publish the badge.
- [...] HFS_BULK_EXPORT_* envvars in CLAUDE.md and crates/hfs/README.md, including the single-instance vs multi-instance configuration recipes.
- [...] record_export_event helper already in crates/persistence/src/core/bulk_export.rs::audit, plumbed through the kick-off, completion, cancellation, and download handlers.
- [...] $bulk-submit discussion once the Argonaut draft is stable, building on the shared-state architecture established here.

Closing Thoughts
Bulk export is the API that turns a FHIR server from a transaction processor into a data platform. Population health teams, research data lakes, payer-provider exchanges, AI training pipelines - none of them are reading one Patient at a time. They are reading entire compartments, entire cohorts, entire systems, and they want to do it asynchronously, resumably, and at a rate that does not require an arrangement with the FHIR server's on-call.
The architecture proposed here is built around two convictions. First, that the same trait surface should serve a single VM with SQLite and a horizontally scaled fleet with PostgreSQL and S3 - the operator chooses, the code does not change. Second, that the long-running, bandwidth-heavy parts of an export should be cleanly separable from the request-serving HFS process, so that operators can scale them independently without rewriting handlers.
The Rust trait system makes both convictions enforceable. The compiler guarantees that every export handler receives a validated RequestContext and a TenantContext. That BulkExportStorage, ExportDataProvider, and ExportOutputStore are independently replaceable. That the worker runtime is identical whether it is co-located with the REST API or running standalone. These guarantees hold regardless of how complex the deployment becomes, and they hold across the inevitable migrations from "we started on SQLite and outgrew it" to "we now run on Postgres + S3 + a separate worker tier".

After the implementation lands, the Inferno Bulk Data Test Kit becomes the daily check on whether HFS is a conformant bulk data server. The kit covers every kick-off variant, every flavor of the polling state machine, every required manifest field, the NDJSON contract, and SMART Backend Services authorization end to end. Treating Inferno conformance as a non-negotiable in CI is what turns "we shipped bulk export" into "we shipped a bulk export implementation interoperable with the rest of the ecosystem".
Thank you for reading. I look forward to the discussion.