Skip to content
This repository was archived by the owner on Apr 17, 2026. It is now read-only.
This repository was archived by the owner on Apr 17, 2026. It is now read-only.

per-tenant rate limiting and token quota enforcement #71

Description

@yai-dev

Summary

Add first-class per-tenant rate limiting and token quota enforcement for hosted Agentrail deployments.

Current State

The current host/app layer does not yet enforce tenant-level request budgets or token quotas.

Proposed Design

  1. Define a rate-limiter contract in the current host/app surface
  2. Check tenant quotas before LLM execution and reconcile actual usage after completion
  3. Support configurable enforcement policies such as reject, queue, or degrade
  4. Resolve tenant identity from explicit run context and route headers

Affected Areas

  • host/app request handling in packages/app
  • runtime usage accounting in packages/core
  • optional external persistence adapters as follow-up work

Acceptance Criteria

  • Rate-limiter contract is defined against the current app/host architecture
  • In-memory implementation exists for local/single-instance deployments
  • Enforcement modes are implemented and tested
  • Tenant identity resolution is documented and tested
  • Metrics/events are emitted for rejection, queueing, and budget state

Metadata

Metadata

Assignees

No one assigned

    Labels

    architectureDesign and structural concernsenhancementNew feature or requestsecuritySecurity vulnerabilities and sensitive data handling

    Projects

    Status
    Todo

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions