
Introduce a separated dataplane proxy for /data endpoint #98

@alambare


Context

stac-fastapi-eodag currently exposes a federated STAC API with a /data/{federation:backend}/{collection_id}/{item_id}/{asset_name} endpoint that proxies asset downloads through EODAG.
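To make the route shape concrete, here is a minimal sketch of how such a path could be split into its four components. The `AssetRef` type and `parse_data_path` function are hypothetical names introduced for illustration, and the sketch assumes that no path segment contains an unescaped `/`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssetRef:
    """Hypothetical container for the four path parameters."""
    backend: str        # value of the {federation:backend} segment
    collection_id: str
    item_id: str
    asset_name: str


def parse_data_path(path: str) -> AssetRef:
    """Split a /data/... request path into its four components."""
    parts = path.strip("/").split("/")
    # Assumption: segments never contain an unescaped "/".
    if len(parts) != 5 or parts[0] != "data" or not all(parts[1:]):
        raise ValueError(f"not a valid /data path: {path!r}")
    return AssetRef(*parts[1:])
```

The backend, collection, item, and asset values used to exercise this function would be placeholders; real identifiers depend on the EODAG configuration.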

The STAC specification defines how to discover assets but does not define how data should be accessed, streamed, or delivered. Data access is intentionally outside the scope of STAC.

Currently, discovery and access responsibilities are combined in the same component. This introduces limitations:

  • EODAG is a Python library optimized for provider resolution and search; it is not designed for high-throughput concurrent streaming. Using it directly in the data path constrains scalability.
  • Data access requires route-level authorization, per-user or per-group quota enforcement, configurable delivery strategies (streaming vs presigned redirects), and caching with retention policies. These concerns differ fundamentally from discovery.
  • No explicit caching layer exists to improve latency, reduce upstream load, and support lifecycle management.

There are also scenarios where a federated data proxy is required independently of the STAC API.

Decision

We will extract the /data endpoint from stac-fastapi-eodag and implement it as a dedicated Rust Federated Data Proxy, deployed independently.

Responsibilities of the proxy:

  • Authenticate requests and enforce fine-grained authorization (route- and resource-level).
  • Resolve upstream provider URLs using EODAG.
  • Stream data to the client or return presigned redirects, depending on configuration.
  • Integrate a controlled caching layer with configurable retention policies.
  • Enforce per-user or per-group quotas and track data transfer.
  • Provide a high-performance, concurrent data access path.
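As an illustration of the quota responsibility, the following sketch tracks transferred bytes per principal (user or group) against a fixed limit. It is written in Python for readability only; the proxy itself is planned in Rust. `QuotaTracker` and `try_consume` are hypothetical names, and a real implementation would likely need a shared, persistent store and time-windowed limits, neither of which is shown here.

```python
from collections import defaultdict


class QuotaTracker:
    """In-memory per-principal byte counter (illustrative only)."""

    def __init__(self, limit_bytes: int) -> None:
        self.limit_bytes = limit_bytes
        self.used = defaultdict(int)  # principal -> bytes already transferred

    def try_consume(self, principal: str, nbytes: int) -> bool:
        """Record a transfer if it fits in the remaining quota."""
        if self.used[principal] + nbytes > self.limit_bytes:
            return False  # request would exceed the quota; nothing is recorded
        self.used[principal] += nbytes
        return True
```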

The STAC API will continue to serve discovery endpoints but will point asset URLs to the new proxy. This preserves existing interfaces for clients while clearly separating discovery from access.
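A minimal sketch of that rewrite, assuming the backend name is available in the item's `properties` under `federation:backend` and that the proxy keeps the existing `/data/...` route shape. The function name `point_assets_to_proxy` and the `proxy_base` parameter are hypothetical.

```python
from copy import deepcopy
from urllib.parse import quote


def point_assets_to_proxy(item: dict, proxy_base: str) -> dict:
    """Return a copy of a STAC item whose asset hrefs target the proxy."""
    out = deepcopy(item)  # leave the caller's item untouched
    # Assumption: the backend is stored in the item properties.
    backend = out["properties"]["federation:backend"]
    segments = [quote(s, safe="") for s in (backend, out["collection"], out["id"])]
    prefix = "/".join([proxy_base.rstrip("/"), "data"] + segments)
    for name, asset in out.get("assets", {}).items():
        asset["href"] = f"{prefix}/{quote(name, safe='')}"
    return out
```

Because only the `href` values change, clients that follow asset links from the STAC item need no modification.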


Integration with EODAG

The Federated Data Proxy delegates upstream provider resolution to EODAG through a dedicated interface.

When handling a cache miss, the proxy calls the EODAG interface and provides:

  • The URL of the identified STAC item
  • The requested asset name

EODAG performs the following steps:

  1. Retrieve the STAC item metadata using the provided item URL.
  2. Identify the corresponding remote data provider based on the item metadata and EODAG configuration.
  3. Instantiate an EOProduct using the appropriate provider and download plugin.
  4. Resolve the concrete upstream access URL and generate the required authentication context (e.g., signed URL, S3 configuration, HTTP headers, credentials).

EODAG then returns the following to the proxy:

  • The resolved upstream URL
  • The associated authentication context
  • Any protocol-specific parameters required for access

The proxy then uses this information to either:

  • Stream the upstream response to the client, or
  • Generate a presigned redirect response, depending on configuration.

EODAG remains responsible for provider abstraction and download logic resolution. The Rust proxy remains responsible for authentication, authorization, quota enforcement, caching, streaming, and delivery strategy.
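The exchange described in this section can be sketched as a pair of data types plus a delivery decision. This is an illustrative Python sketch, not the actual interface: the names `ResolveRequest`, `ResolvedAsset`, and `plan_delivery`, the field layout, the `"headers"` key, and the choice of HTTP 307 for redirects are all assumptions.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ResolveRequest:
    item_url: str     # URL of the identified STAC item
    asset_name: str   # requested asset name


@dataclass(frozen=True)
class ResolvedAsset:
    upstream_url: str
    auth_context: dict = field(default_factory=dict)     # e.g. HTTP headers
    protocol_params: dict = field(default_factory=dict)  # e.g. S3 settings


def plan_delivery(resolved: ResolvedAsset, strategy: str) -> dict:
    """Describe how the proxy would answer, without performing any I/O."""
    if strategy == "redirect":
        # Assumes upstream_url is already presigned and safe to expose.
        return {"status": 307, "headers": {"Location": resolved.upstream_url}}
    if strategy == "stream":
        # Credentials stay on the proxy side; only the body reaches the client.
        headers = dict(resolved.auth_context.get("headers", {}))
        return {"status": 200, "fetch": resolved.upstream_url,
                "upstream_headers": headers}
    raise ValueError(f"unknown delivery strategy: {strategy!r}")
```

Keeping the resolution result as plain data means the delivery strategy can be chosen per route or per deployment without changing what EODAG returns.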

Deployment Model

The exact integration model between the proxy and EODAG is yet to be finalized. Two options are considered:

  1. Embedded integration: EODAG is integrated directly into the Rust proxy using Python bindings.
  2. Service-based integration: EODAG runs as an independent service exposed over HTTP or gRPC, and the proxy communicates with it as a remote dependency.

The final choice will balance latency, operational complexity, scalability, and isolation from Python runtime constraints. This decision does not affect the logical responsibility boundary between provider resolution (EODAG) and data access control (proxy).
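One way to picture why the choice does not move the responsibility boundary: both options can sit behind the same resolver interface. The sketch below is in Python for illustration only, and every name in it (`Resolver`, `EmbeddedResolver`, `ServiceResolver`, `handle_cache_miss`, the `upstream_url` key) is hypothetical. The in-process call and the remote call are injected as plain callables, so no real EODAG or HTTP API is assumed.

```python
from typing import Callable, Protocol


class Resolver(Protocol):
    """What the proxy needs from EODAG, whatever the deployment model."""
    def resolve(self, item_url: str, asset_name: str) -> dict: ...


class EmbeddedResolver:
    """Option 1: call EODAG in-process through a bound function."""

    def __init__(self, call_eodag: Callable[[str, str], dict]) -> None:
        self._call = call_eodag

    def resolve(self, item_url: str, asset_name: str) -> dict:
        return self._call(item_url, asset_name)


class ServiceResolver:
    """Option 2: send the same request to a remote EODAG service."""

    def __init__(self, post: Callable[[str, dict], dict], endpoint: str) -> None:
        self._post = post          # injected transport (HTTP or gRPC client)
        self._endpoint = endpoint

    def resolve(self, item_url: str, asset_name: str) -> dict:
        payload = {"item_url": item_url, "asset_name": asset_name}
        return self._post(self._endpoint, payload)


def handle_cache_miss(resolver: Resolver, item_url: str, asset_name: str) -> str:
    """Proxy-side code is identical for both integration models."""
    return resolver.resolve(item_url, asset_name)["upstream_url"]
```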

Cache retention policies

The cache lifecycle is handled solely by the S3 storage layer, using S3 lifecycle policies.

Cases to handle:

  • Remove a specific prefix after a defined amount of time.
  • Clean up orphaned parts left by failed multipart uploads.
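Both cases map onto standard S3 lifecycle rules. The sketch below builds such a configuration as a plain dictionary in the shape accepted by the S3 `PutBucketLifecycleConfiguration` operation. The helper name `cache_lifecycle`, the rule IDs, the prefix, and the day counts are placeholders, not decided values.

```python
import json


def cache_lifecycle(prefix: str, expire_days: int, abort_days: int) -> dict:
    """Build an S3 lifecycle configuration covering both cases."""
    return {
        "Rules": [
            {
                # Case 1: expire objects under a prefix after a fixed delay.
                "ID": "expire-cache-prefix",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": expire_days},
            },
            {
                # Case 2: drop parts of multipart uploads that never completed.
                "ID": "abort-incomplete-multipart",
                "Filter": {"Prefix": ""},  # empty prefix: whole bucket
                "Status": "Enabled",
                "AbortIncompleteMultipartUpload": {"DaysAfterInitiation": abort_days},
            },
        ]
    }


# Placeholder values for illustration only.
print(json.dumps(cache_lifecycle("cache/", 30, 7), indent=2))
```

With boto3, a dictionary of this shape can be passed as the `LifecycleConfiguration` argument of `put_bucket_lifecycle_configuration`. S3-compatible stores may support only a subset of these rule types, so this should be checked against the chosen storage backend.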

Alternatives Considered

  1. Keep /data in stac-fastapi-eodag

    • Pros: No additional deployment.
    • Cons: Conflates responsibilities, limited scalability, GIL bottleneck.
  2. Wrap EODAG in a Python microservice

    • Pros: Avoids Rust.
    • Cons: Adds operational complexity; the proxy would still need caching and quota enforcement.
  3. Rust Federated Data Proxy (Chosen)

    • Pros: High performance, independent deployment, clear separation, easier caching and quota management, no Python runtime bottleneck in the data path.
    • Cons: Requires a Rust implementation and integration with EODAG via Python bindings or a microservice call.

Consequences

  • Positive:

    • Clear separation of discovery and access responsibilities.
    • Improved scalability and concurrency for data access.
    • Maintains a backward-compatible client interface via asset URLs.
  • Negative / Risks:

    • Initial Rust implementation requires integration with Python EODAG (GIL considerations).
    • Additional deployment and operational overhead.
    • Future changes in upstream provider resolution may require coordination between Rust proxy and EODAG.
