New workflow architecture #1383

@grovesy

Description

Future architecture goals and high-level design

This proposal addresses several raised issues and sets up long-term integration of GitProxy with the codified controls ecosystem. It updates how the proxy handles workflows around user actions — by moving to configurable workflows with explicit states and transitions. The goal is to add flexibility for organization-specific policies, workflows, and integrations, while still providing consistent handling for checks, scans, approvals, and routing decisions. This design can be delivered in smaller, incremental pieces rather than as an all-or-nothing rewrite; it describes the target redesign and serves as an ideal benchmark for future delivery and migration decisions.

Potentially, such a system could form its own reusable library, for use wherever codified policies, controls, and evidence of execution and results are required as part of a system flow.

Requirements (Short)

  1. Integration with common policy specifications (OSCAL), for traceability between policies and implemented controls.
  2. Compatibility with open standards (Open Policy Agent), to preserve separation of concerns between control, policy, and workflow.
  3. Flexibility to change the core workflow to meet organizational policies.
  4. Flexibility to cleanly add organization-specific workflow integrations (e.g., internal request/approval ticketing systems).
  5. Implement organization-specific controls and scanners.
  6. Clean separation of evidence handlers vs control handlers, with explicit declaration of control input requirements (evidence keys).

Non-functional

  • Support horizontal scaling, with the caveat that shared repository storage must be provided by the deploying organization (e.g., NFS or shared volume).

Issue Map (Requirements -> Issues)

| Requirement | Related issues |
| --- | --- |
| 1. OSCAL integration | (covered by this proposal) |
| 2. OPA compatibility | #99 |
| 3. Workflow flexibility | (covered by this proposal), #1350, #1365 |
| 4. Org-specific integrations | #1121 |
| 5. Org-specific controls/scanners | #26 |
| Non-functional (horizontal scaling) | #1386 |

Requirement Details and Rationale

  1. OSCAL integration: Support expressing controls and evidence mapping in OSCAL so policy intent and implementation remain traceable in audits and compliance reporting. Rationale: without a standards-based control model, the workflow becomes a one-off implementation that is hard to justify, certify, or map to external requirements.
  2. OPA compatibility: Keep policy decisions externalized to OPA with a clear input/output contract. Rationale: policy changes should not require workflow code changes, and OPA provides a mature PDP with tooling, versioning, and safe rollout patterns.
  3. Workflow flexibility: Allow states, handlers, and transitions to be altered per deployment. Rationale: organizations have different risk tolerances and gates; a fixed workflow forces forks or patching core logic.
  4. Org-specific integrations: Enable custom steps such as internal ticketing, approvals, or notifications. Rationale: real-world governance flows depend on org systems, and the proxy must plug into them without bespoke forks.
  5. Org-specific controls/scanners: Support adding or swapping scanners and controls per organization. Rationale: tooling preferences and regulatory obligations vary, so the platform must accept custom controls without re-architecting the core.
  6. Evidence/control separation: Keep evidence production distinct from control evaluation, with explicit evidence keys as inputs. Rationale: this removes implicit dependencies, enables reuse, and makes decisions explainable and auditable.
  7. Horizontal scaling: Run multiple proxy instances with shared storage and transactional state. Rationale: production deployments require concurrency and resilience, which demands a stateless runtime with shared persistence.

Notes

  • OSCAL (control framework language + implementation/evidence mapping as compliance artifacts)
  • OPA (runtime policy decision point; deployed as a sidecar, central PDP, or embedded)
  • A policy supply chain: OSCAL → compiled data/policies → OPA bundles → runtime decisions

Critique of Current Workflow (Chain) Design

Below is a brief critique of the current chain-based workflow in the codebase, based on the existing handler ordering and data dependencies.

  • The push pipeline is a single linear chain with implicit dependencies between handlers (see src/proxy/chain.ts). There is no explicit contract that declares which handler produces which data, so later handlers silently depend on earlier side effects.
  • Data dependencies are implicit and spread across mutable action fields.
  • Evidence gathering and control logic are intertwined inside the chain rather than being separated as distinct handler types. This makes it hard to reuse evidence across controls or to swap control logic without also changing evidence order.
  • Exit/hold behavior is coupled to specific handlers (checkIfWaitingAuth, blockForAuth) rather than a unified pending/approval mechanism. This makes re-entry behavior implicit and difficult to reason about (src/proxy/processors/push-action/checkIfWaitingAuth.ts, src/proxy/processors/push-action/blockForAuth.ts).
  • The chain mixes parsing, auth, repo cloning, scanning, and gating concerns in one ordered list. This makes workflow changes brittle because ordering is the only coordination mechanism (see src/proxy/chain.ts).
  • Push and pull are modeled inconsistently: push uses a dedicated chain with side-effectful steps (clone, diff, scan, write-pack), while pull paths apply a different set of checks and short-circuit logic, so there is no shared contract for evidence, control evaluation, or decision outcomes across actions.
  • Push exposes an explicit wait/approval path (via checkIfWaitingAuth/blockForAuth), but pull does not use an equivalent pending/approval mechanism, which means similar policy requirements must be implemented twice with different semantics.

How This Proposal Solves Current Issues (and Linked Issues)

  • Explicit evidence/control separation removes implicit data coupling in the chain by declaring evidence types and control inputs, which directly addresses the critique about hidden dependencies.
  • Declarative state definitions replace fragile ordering with explicit transitions, making workflow changes safer and aligning with #1350 and #1365.
  • Shared push/pull model applies the same evidence/control contract, decision statuses, and pending flow to both actions, removing the current divergence between push chains and pull short-circuit checks.
  • Re-entrant pending flow replaces ad-hoc auth blocks with a consistent pending path, which supports review workflows and notification integrations like #1121.
  • OPA-based policy control centralizes decision logic (control vs policy separation) in support of #99.
  • Plugin-based handlers allow organization-specific scanners and controls (aligned with #26).

Workflow Engine vs State Machine

The current code already behaves similarly to a very simple state machine (a mostly linear chain with early exits). Moving to a full workflow engine would likely add complexity without solving a clear problem.

The simplest and most maintainable approach is:

  • Use a state machine per action (Push, Pull, future actions).
  • Keep a declarative workflow definition (JSON/YAML) describing states, transitions, and handlers.
  • Implement a lightweight workflow runner that executes handlers, applies transitions, and emits events/audit records.

Overview

Core concepts:

  • Action: a single user request type (push | pull | …).
  • State Flow: config per action (states, handlers, transitions).
  • State: named phase of execution (VALIDATING, ANALYZING, DECIDING, …).
  • Evidence handlers: gather facts and produce evidence.
  • Control handlers: evaluate controls using evidence and return decisions.
  • Result: allow/block/error (derived from control handler results and transitions).
  • Plugin integration: plugins are externally registered handlers (and optionally guards/events) without patching core logic/config.

Basic Hierarchy

Action
  └── State
        ├── EvidenceHandler
        └── ControlHandler

Handler Execution Model

States contain ordered evidence handlers followed by ordered control handlers. The runner:

  1. Enters a state (records state-enter).
  2. Executes evidence handlers in order, storing evidence by (actionId, evidenceKey) and recording their results.
  3. Executes control handlers in order, each reading required evidence keys and returning pass/fail/pending/error.
  4. Aggregates control results into a state outcome (pass/fail/pending/error).
  5. Selects the next transition based on the outcome.
  6. Records state-exit, then transitions.
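Steps 4 and 5 above can be sketched in TypeScript. The aggregation precedence (error over fail over pending over pass) and the shape of the `on` map are assumptions of this sketch, not settled design:

```typescript
// Possible control statuses, matching the handler result statuses below.
type ControlStatus = 'pass' | 'fail' | 'pending' | 'error';

// Step 4: aggregate control results into a single state outcome.
// Precedence (error > fail > pending > pass) is an assumption.
function aggregateOutcome(results: ControlStatus[]): ControlStatus {
  if (results.includes('error')) return 'error';
  if (results.includes('fail')) return 'fail';
  if (results.includes('pending')) return 'pending';
  return 'pass';
}

// Step 5: select the next transition from the state's `on` map
// (hypothetical shape, mirroring the workflow definition sketch below).
function nextState(
  on: Partial<Record<ControlStatus, string>>,
  outcome: ControlStatus,
): string | undefined {
  return on[outcome];
}
```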

Handler Types

Handlers are split into two types with different semantics:

  • Evidence handlers: gather facts and produce named evidence (diff, commit messages, author emails, scan outputs). They return pass | error and do not directly decide pass/fail.
  • Control handlers: evaluate controls and return pass | fail | pending | error. These drive state transitions.

Evidence and Control Dependencies

Evidence and control handlers are decoupled. Evidence handlers declare the evidence types they produce. Control handlers declare the evidence types they require. The workflow runner wires them together by matching evidence types.

An evidence_gather handler is the canonical pattern for collecting a specific evidence type. Each evidence type is identified by a stable key (for example: evidence.git.diff, evidence.git.commit_messages). Control handlers depend on those stable keys, not on specific evidence handler implementations. This keeps evidence gathering interchangeable and allows multiple implementations to produce the same evidence type.

Evidence Handler Result

  • status: pass | error
  • evidenceTypes: list of evidence types produced
  • data: evidence payload (diff summary, scan report, metadata)

Control Handler Result

  • status: pass | fail | pending | error
  • requiredEvidence: list of evidence types needed
  • data: decision metadata (reasons, required approvals, references)
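As an illustration, the two result shapes above map naturally onto TypeScript interfaces, together with a hypothetical `missingEvidence` helper the runner could use before invoking a control handler:

```typescript
// Illustrative shapes for the result types described above;
// field names follow the bullet lists in this section.
interface EvidenceHandlerResult {
  status: 'pass' | 'error';
  evidenceTypes: string[];            // evidence types produced
  data: Record<string, unknown>;      // evidence payload
}

interface ControlHandlerResult {
  status: 'pass' | 'fail' | 'pending' | 'error';
  requiredEvidence: string[];         // evidence types needed
  data: Record<string, unknown>;      // decision metadata
}

// Hypothetical pre-flight check: which required evidence keys are absent?
function missingEvidence(required: string[], produced: Set<string>): string[] {
  return required.filter((key) => !produced.has(key));
}

// Example control result using the stable keys from the previous section.
const exampleControl: ControlHandlerResult = {
  status: 'pending',
  requiredEvidence: ['evidence.git.diff'],
  data: { reason: 'awaiting approval' },
};
```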

Requirements

  • Handlers may return pending to pause the workflow in the current state.
  • Re-entry resumes from persisted Action state, not in-memory runtime state.
  • All handlers must be idempotent (safe to run multiple times).

Async and Re-Entrant Handlers

Some handlers are long-running or require external input (e.g. require_approval). The workflow must support pausing and resuming; this is handled through the 'pending' status.

UI-Driven Approvals (a type of re-entrant handler)

Some decisions are manual. These decisions are recorded via the GitProxy UI, which can pass or fail the state of a specific handler.

Model

  • Re-entrant handlers (e.g. require_approval): when such a handler runs, it ensures an approval record exists (or creates one) and returns pending.
  • The UI reads pending approval records and allows reviewers to:
    • Pass → update the handler result to pass
    • Fail → update the handler result to fail
    • Cancel → update the handler result to fail with an additional reason that it was cancelled

Cancellation semantics

'Cancel' is operationally different from 'reject', but the state machine needs a finite set of statuses. The simplest option is:

  • Treat cancelled as fail with failedReason = "cancelled"
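A minimal sketch of this mapping; `fromUiDecision` is a hypothetical helper name, not an existing API:

```typescript
// UI decisions available to a reviewer, per the Model section above.
type UiDecision = 'pass' | 'fail' | 'cancel';

interface UiHandlerUpdate {
  status: 'pass' | 'fail';
  failedReason?: string;
}

// Map a reviewer's UI decision onto the persisted handler result,
// treating 'cancel' as fail with failedReason = "cancelled".
function fromUiDecision(decision: UiDecision): UiHandlerUpdate {
  if (decision === 'pass') return { status: 'pass' };
  if (decision === 'cancel') return { status: 'fail', failedReason: 'cancelled' };
  return { status: 'fail' };
}
```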

How the workflow resumes

UI updates the status of a specific handler.

  • Event trigger: the engine locates the persisted handler state and updates its status field to pass or fail. The update automatically triggers an onChange event, causing the state machine to recalculate and perform the necessary next steps.

Stateless and Idempotent Design

Multiple proxy instances may be running concurrently as part of the horizontal scaling requirement. Therefore:

  • Application code must be stateless, outside of an operation currently in flight (which should be protected with optimistic locking).
  • The authoritative source of state is a transactional store.
  • Runtime must rehydrate from storage and continue deterministically.
  • A locking mechanism should be thought through; ideally the database handles locking, holding the record from the read to the update with optimistic locking or similar.
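A minimal optimistic-locking sketch over an in-memory store; a real deployment would rely on the transactional store's own compare-and-set, and `ActionStore` here is purely illustrative:

```typescript
// Versioned Action row; version increments on every successful update.
interface VersionedAction {
  id: string;
  version: number;
  currentState: string;
}

class ActionStore {
  private rows = new Map<string, VersionedAction>();

  put(row: VersionedAction) {
    this.rows.set(row.id, { ...row });
  }

  get(id: string): VersionedAction | undefined {
    const row = this.rows.get(id);
    return row ? { ...row } : undefined;
  }

  // Optimistic update: succeeds only if the caller read the latest version,
  // otherwise the write is rejected and the caller must rehydrate and retry.
  update(row: VersionedAction, expectedVersion: number): boolean {
    const current = this.rows.get(row.id);
    if (!current || current.version !== expectedVersion) return false; // stale read
    this.rows.set(row.id, { ...row, version: expectedVersion + 1 });
    return true;
  }
}
```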

Idempotency expectations:

  • If a handler is re-run, it must detect prior completion and return the same result (or a compatible result), without duplicating side effects.
  • Side effects must be guarded with stable keys (e.g. “approval_request_id”, “scan_run_id”) stored in HandlerResult.data.
  • If a handler is in an error state, it can be re-run in its entirety.
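For example, a scan handler might guard its side effect with a stable key as described above; the handler and key names here are hypothetical:

```typescript
// Stand-in for HandlerResult.data persistence, keyed per action + handler.
const handlerData = new Map<string, Record<string, unknown>>();
let scansStarted = 0;

// The side effect we must not duplicate on re-run.
function startScan(): string {
  scansStarted += 1;
  return `scan_${scansStarted}`;
}

// Idempotent handler: detects prior completion via the stored
// scan_run_id and returns the same result instead of re-scanning.
function runScanHandler(actionId: string): string {
  const key = `${actionId}:scan`;
  const prior = handlerData.get(key);
  if (prior && typeof prior.scan_run_id === 'string') {
    return prior.scan_run_id;
  }
  const scanRunId = startScan();
  handlerData.set(key, { scan_run_id: scanRunId });
  return scanRunId;
}
```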

Workflow Definition Sketch

NOTE: I think it should be possible to remove evidence_handlers and declare only control handlers: since control handlers declare which evidence they require, the runner can infer which evidence handlers provide it.

actions:
  push:
    initial: RECEIVED
    states:
      RECEIVED:
        on:
          next: VALIDATING
      VALIDATING:
        evidence_handlers:
          - parse_request
          - read_pack
          - compute_diff
        control_handlers:
          - check_repo_allowlist
          - check_user_permissions
        on:
          pass: ANALYZING
          fail: REJECTED
        
      ANALYZING:
        evidence_handlers:          
          - plugin.security.gitleaks
        control_handlers:
          - no_leaks          
        on:
          pass: STRAIGHT_THROUGH
          fail: APPROVAL_4_EYES

      STRAIGHT_THROUGH:
        control_handlers:
          - evaluate_straight_through
        on:
          pass: APPROVAL
          fail: REJECTED

      APPROVAL:
        evidence_handlers:
          - get_2_eyes_approval
        control_handlers:
          - evaluate_eyes_policy
        on:
          pass: APPROVED
          fail: REJECTED
          
      APPROVAL_4_EYES:
        evidence_handlers:
          - get_4_eyes_approval
        control_handlers:
          - evaluate_eyes_policy          
        on:
          pass: APPROVED          
          fail: REJECTED

      APPROVED:
        evidence_handlers:
          - wait_for_last_push
        on:
          pass: COMPLETED

      REJECTED:
        terminal: true

      COMPLETED:
        terminal: true

  pull:
    initial: RECEIVED
    states:
      RECEIVED:
        on:
          next: VALIDATING

      VALIDATING:
        control_handlers:
          - check_repo_allowlist
          - check_user_permissions
        on:
          pass: COMPLETED
          fail: DECIDING
          error: ERROR

      DECIDING:
        control_handlers:
          - check_allowed_foundation
          - check_open_source_licence
        on:
          pass: COMPLETED
          fail: REJECTED
          error: ERROR

      REJECTED:
        terminal: true

      COMPLETED:
        terminal: true

      ERROR:
        terminal: true

Logical Storage Model

Action

  • id
  • type (push | pull)
  • repo, user, timestamps
  • workflowVersion
  • currentState
  • currentDecision (pass | fail | pending | error)

StateHistory

  • actionId
  • state
  • enteredAt
  • exitedAt
  • outcome (pass | fail | pending | error)
  • details (optional: aggregate handler summary)

HandlerResult

  • actionId
  • key
  • status (pass | fail | pending | error)
  • requiredEvidence (for control handlers)
  • message
  • data (structured JSON; includes idempotency tokens)

Evidence

  • actionId
  • evidenceKey
  • producedBy (handler key)
  • data (structured JSON)

Mapping Handler Names to Implementation (Registry)

Workflow config references stable handler keys. Code binds keys to concrete implementations via a registry.

Design goals:

  • Config stays stable even if code moves.
  • Plugins register handlers via namespaced keys.
  • Startup fails fast if config references missing handlers.

Handler Function Shape

export type HandlerFn = (ctx: WorkflowContext) => Promise<HandlerResult>;

Registry Interface

export interface HandlerRegistry {
  register: (key: string, fn: HandlerFn) => void;
  has: (key: string) => boolean;
  resolve: (key: string) => HandlerFn;
}

Implementation Sketch (with aliases + validation)

export class DefaultHandlerRegistry implements HandlerRegistry {
  private handlers = new Map<string, HandlerFn>();
  private aliases = new Map<string, string>();

  register(key: string, fn: HandlerFn) {
    if (this.handlers.has(key)) throw new Error(`Handler already registered: ${key}`);
    this.handlers.set(key, fn);
  }

  registerAlias(alias: string, target: string) {
    if (this.has(alias)) throw new Error(`Key already registered: ${alias}`);
    this.aliases.set(alias, target);
  }

  has(key: string) {
    return this.handlers.has(key) || this.aliases.has(key);
  }

  resolve(key: string) {
    const target = this.handlers.has(key) ? key : this.aliases.get(key);
    const fn = target !== undefined ? this.handlers.get(target) : undefined;
    if (!fn) throw new Error(`Missing handler: ${key}`);
    return fn;
  }
  
  validateWorkflowDefinition(def: WorkflowDefinition) {
    const missing: string[] = [];
    for (const action of Object.values(def.actions)) {
      for (const state of Object.values(action.states)) {
        for (const key of [...(state.evidence_handlers ?? []), ...(state.control_handlers ?? [])]) {
          if (!this.has(key)) missing.push(key);
        }
      }
    }
    if (missing.length) {
      throw new Error(`Workflow references missing handlers: ${[...new Set(missing)].join(', ')}`);
    }
  }
}

Core + Plugin Registration Example

const registry = new DefaultHandlerRegistry();

// core
registry.register('parse_request', handlers.parseRequest);
registry.register('check_repo_allowlist', handlers.checkRepoAllowlist);
registry.register('check_user_permissions', handlers.checkUserPermissions);
registry.register('read_pack', handlers.readPack);
registry.register('compute_diff', handlers.computeDiff);
registry.register('evaluate_policy', handlers.evaluatePolicy);
registry.register('require_approval', handlers.requireApproval);
registry.register('apply_rate_limits', handlers.applyRateLimits);

Plugins are registered at runtime through the same registry, using namespaced keys.

Recommended conventions:

  • Core keys: snake_case (e.g., check_repo_allowlist).
  • User defined plugin handlers: plugin.<vendor>.<handler> (namespaced, stable).
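These conventions could be enforced at registration time; the regexes below are an assumption of this sketch, not an agreed key format:

```typescript
// Core keys: snake_case, e.g. check_repo_allowlist (assumed format).
const CORE_KEY = /^[a-z][a-z0-9_]*$/;
// Plugin keys: plugin.<vendor>.<handler>, e.g. plugin.security.gitleaks (assumed format).
const PLUGIN_KEY = /^plugin\.[a-z0-9_-]+\.[a-z0-9_-]+$/;

// A registry could reject keys that match neither convention.
function isValidHandlerKey(key: string): boolean {
  return CORE_KEY.test(key) || PLUGIN_KEY.test(key);
}
```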

Extensibility: Events

Three extension approaches:

**Core structural changes and extension**

  • Add / customize states/handlers in the main YAML file.

This allows for maximum flexibility, but has the downside of being quite a core structural change: the maintainer of the deployed instance becomes responsible for the workflow of each action and for keeping mandatory handlers in sync with new updates to git-proxy.

**Register a deployment-specific handler in the deployment**

# index.ts

// register your custom handler
registry.register('plugin.security.gitleaks', myextensions.gitleaksScan);

// Apply it to a state
registry.addHandlerToState('PUSH', 'ANALYZING', 'plugin.security.gitleaks');

This allows a fully defined additional handler to be inserted into a specific state for an action. A fully fledged handler can be implemented this way, but it is still quite 'heavy', as it requires the full handler implementation, with exit states etc.

Event-based extension

  • Observe lifecycle events without modifying the workflow

Event hooks (concept):

  • state:enter, state:exit
  • handler:before, handler:after, handler:error

Example: Send email on Pull completion

workflow.on('state:exit', ({ actionId, actionType, state }) => {
  if (actionType === 'pull' && state === 'COMPLETED') {
    email.send({
      to: getUserEmail(actionId),
      subject: 'Next steps: how to push',
      body: buildPushInstructions(actionId),
    });
  }
});

This is the easiest and simplest way to extend. workflow.on allows simple code to be hooked in on lifecycle events; it does NOT modify status. It is a very simple hook for running extra non-state-changing code, great for cross-cutting concerns (audit, metrics) or small extensions (sending emails, notifications, etc.).


Appendix A: OPA Integration (Runtime Policy Evaluation)

OPA is the decision engine for “is this allowed?” based on context:

  • who (user, groups, roles)
  • what (push/pull)
  • where (repo, org, branch, env)
  • evidence (scan results, secrets findings, risk score)
  • workflow context (state, approvals, override flags)

OPA does not run workflows; it returns a decision + reasons + obligations. Your workflow runner enforces that decision by mapping to handler status pass/fail/pending (and possibly creating approvals).
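A sketch of how evaluate_policy might construct the OPA HTTP call, following OPA's standard /v1/data convention of wrapping the request in an input document (`buildOpaRequest` is a hypothetical helper):

```typescript
// Build a request against OPA's Data API: package path segments are
// joined with '/', and the payload is wrapped in an "input" document.
function buildOpaRequest(baseUrl: string, pkg: string, rule: string, input: unknown) {
  const path = pkg.split('.').join('/'); // gitproxy.authz -> gitproxy/authz
  return {
    url: `${baseUrl}/v1/data/${path}/${rule}`,
    method: 'POST' as const,
    body: JSON.stringify({ input }),
  };
}
```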

A2. OPA Decision Contract

Input (what evaluate_policy sends)

{
  "request_id": "act_123",
  "action": "push",
  "repo": {
    "full_name": "org/repo",
    "visibility": "private",
    "default_branch": "main",
    "tags": ["prod", "pci"]
  },
  "ref": {
    "type": "branch",
    "name": "main"
  },
  "actor": {
    "username": "grovesy",
    "groups": ["devs", "platform"],
    "roles": ["developer"],
    "mfa": true
  },
  "context": {
    "workflow_state": "DECIDING",
    "workflow_version": "2026-02-09",
    "ip": "203.0.113.10"
  },
  "evidence": {
    "scans": {
      "gitleaks": {
        "status": "completed",
        "findings": [
          { "rule": "aws-access-key", "severity": "high", "file": "secrets.txt" }
        ]
      }
    },
    "risk": {
      "score": 8.4,
      "reasons": ["secret_high"]
    }
  },
  "approvals": {
    "security": { "status": "none" },
    "repo_admin": { "status": "none" }
  }
}

Output (what OPA returns)

{
  "decision": "pending",
  "reasons": [
    "High severity secret detected on protected branch main",
    "Security approval required when risk.score >= 7"
  ],
  "obligations": [
    { "type": "require_approval", "key": "security", "min_reviewers": 1 }
  ],
  "policy": {
    "package": "gitproxy.authz",
    "rule": "decision",
    "version": "bundle:prod@sha256:abcd..."
  }
}

Mapping to workflow:

  • allow → handler result pass
  • deny → handler result fail
  • pending → handler result pending (+ obligations for require_approval)
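The mapping above can be sketched directly; carrying obligations through only on pending is an assumption of this sketch:

```typescript
// Decision values returned by OPA, per the output contract above.
type OpaDecision = 'allow' | 'deny' | 'pending';

// Obligation shape mirrors the example OPA output (require_approval).
interface Obligation { type: string; key?: string; min_reviewers?: number }

interface PolicyHandlerResult {
  status: 'pass' | 'fail' | 'pending';
  obligations: Obligation[];
}

// Map an OPA decision onto a control handler status, forwarding
// obligations only when the decision is pending.
function mapOpaDecision(decision: OpaDecision, obligations: Obligation[] = []): PolicyHandlerResult {
  if (decision === 'allow') return { status: 'pass', obligations: [] };
  if (decision === 'deny') return { status: 'fail', obligations: [] };
  return { status: 'pending', obligations };
}
```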

Appendix B: OSCAL Integration (Controls + Implementations + Evidence)

This section adds a coherent architecture for expressing policies as OSCAL controls (control framework language) and mapping them to implementations in GitProxy (handlers/plugins) and runtime evaluation in OPA.

B1. Policy Model: OSCAL → OPA → Workflow

OSCAL (Catalog/Profile/Component Definition)
  - controls, params, obligations, evidence expectations
        ↓
Policy Build Pipeline compiles OSCAL into:
  - data.json (control set + parameters + obligations)
  - rego modules (decision logic)
        ↓
OPA loads bundle, evaluates runtime input
        ↓
Workflow enforces allow/deny/pending and manages approvals

B2. Where OSCAL touches the state machine

The state machine remains unchanged structurally:

  • Evidence is produced by handlers in VALIDATING/ANALYZING (and stored in HandlerResult)
  • DECIDING runs:
    • evaluate_policy (calls OPA with runtime input + control_set reference)
    • require_approval (enforces obligations returned by OPA)

The only addition is that evaluate_policy includes a control_set pointer in the OPA input so decisions can vary by environment/profile.

Example addition to OPA input

{
  "control_set": {
    "oscal_profile_id": "gp-profile-prod",
    "version": "2026-02-09"
  }
}

B3. OSCAL artifact usage

Use these OSCAL models together:

  • Catalog: baseline control catalog (the “what”)
  • Profile: tailored selection for environment (the “which” + parameters)
  • Component Definition: components + how controls are implemented (the “how”)
  • (Optional) Assessment Results: periodic/audit outputs (the “evidence in OSCAL form”)

B4. Example OSCAL control + implementation intent (illustrative)

NOTE: This is an illustrative fragment. A real OSCAL document must conform to the OSCAL JSON schema structure.

{
  "control": {
    "id": "gp-secret-detection",
    "title": "Secrets must not be pushed to protected branches",
    "description": "High severity secrets require approval",
    "params": [
      { "id": "severity-threshold", "value": "high" },
      { "id": "protected-branch-mode", "value": "default_branch_only" }
    ],
    "implementation": {
      "enforced-by": ["plugin.security.gitleaks", "evaluate_policy", "require_approval"],
      "evidence": [
        {
          "type": "scan",
          "source": "gitleaks",
          "fields": ["severity", "rule", "file"]
        }
      ],
      "obligations": [
        {
          "type": "approval",
          "key": "security",
          "when": "severity >= high AND protected_branch = true"
        }
      ]
    }
  }
}

B5. Mapping controls to handlers/plugins

A control maps to enforcement and evidence producers:

| Control | Evidence Producers | Enforcement Points |
| --- | --- | --- |
| gp-secret-detection | plugin.security.gitleaks | evaluate_policy, require_approval |
| gp-mfa-protected-branch | parse_request, identity context | evaluate_policy |
| gp-risk-gate | plugin.risk.score | evaluate_policy, require_approval |

Appendix C: Reference Architecture (OSCAL → OPA Bundles) and OPA Agents

This section makes the OSCAL integration “real”: it defines the supply chain that turns OSCAL artifacts into runtime policy and data that OPA can evaluate with low latency.

C1. Reference Architecture Diagram (Policy Supply Chain)

flowchart TD

  subgraph Authoring["Policy Authoring (Git)"]
    CAT["OSCAL Catalog"]
    PROF["OSCAL Profile (Tailoring/Params)"]
    COMP["OSCAL Component Definition (Implementation Mapping)"]
    MAPT["Mapping Templates (OSCAL→Policy Model)"]
  end

  subgraph Pipeline["Policy Build Pipeline (CI)"]
    VAL["Validate OSCAL + Schemas"]
    NORM["Resolve Profile + Normalize Controls"]
    GEN["Generate Runtime Control Data"]
    REGO["Generate/Assemble Rego Modules"]
    BLD["Build OPA Bundle (rego + data.json)"]
    SIGN["Sign/Attest Bundle (optional)"]
  end

  subgraph Distribution["Distribution"]
    REG["Bundle Registry (OCI/HTTP/S3)"]
  end

  subgraph Runtime["Runtime"]
    OPAAG["OPA Agent (sidecar/central/embedded)"]
    GP["GitProxy Workflow Engine"]
  end

  CAT --> VAL
  PROF --> VAL
  COMP --> VAL
  MAPT --> GEN

  VAL --> NORM --> GEN --> REGO --> BLD --> SIGN --> REG

  REG --> OPAAG
  GP -->|evaluate_policy| OPAAG

C2. What gets generated from OSCAL

A practical split is:

  • Rego modules: stable logic (decision rules, helper functions)
  • Data documents: environment-specific control sets and parameters derived from OSCAL profile/catalog

Example data.json shape:

{
  "control_sets": {
    "gp-profile-prod": {
      "version": "2026-02-09",
      "controls": {
        "gp-secret-detection": {
          "enabled": true,
          "params": { "severity_threshold": "high" },
          "applies_to": { "protected_branches": true },
          "obligations": [
            { "type": "require_approval", "key": "security", "min_reviewers": 1 }
          ]
        }
      }
    }
  }
}

Rego then consults data.control_sets[input.control_set.oscal_profile_id]... to apply the correct control set.

C3. Unified runtime diagram (GP + workflow + OSCAL + OPA)

flowchart LR
  A["Git Request\n(push/pull)"] --> B["Workflow Runner\n(State Machine)"]

  B --> C["Handlers\nVALIDATING/ANALYZING\n(evidence)"]
  C --> D["HandlerResult Store\n(transactional)"]

  B --> E["evaluate_policy\n(OPA client)"]
  E --> F["OPA Agent\n(PDP)"]
  F --> G["OPA Bundle\n(rego + data.json)\ncompiled from OSCAL"]

  F --> H["Decision + Obligations\nallow/deny/pending"]
  H --> I["require_approval\n(UI-driven re-entrant handler)"]
  I --> J["Approval UI"]

  H --> K["Transition\nAPPROVED/REJECTED/PENDING"]
  K --> L["Persist Action + StateHistory + Audit"]
  L --> M["Respond to Git\nallow/block"]

C4. OPA Agents (deployment patterns)

Sidecar (recommended for k8s)

  • OPA runs next to GitProxy (same pod)
  • low latency, independent scaling/rollout of policy bundles

Central PDP

  • one OPA cluster serves many GitProxy instances
  • centralized governance and caching
  • needs HA + scaling + tenancy controls

Embedded WASM

  • compile policies to WASM and embed in GitProxy
  • fastest decision path
  • policy updates become a “hot-load WASM module” problem

Appendix D: OPA Policies (Rego DSL) — Examples

Package: gitproxy.authz
Rule: decision
API path: /v1/data/gitproxy/authz/decision

D1. Base skeleton

package gitproxy.authz

default decision := {
  "decision": "deny",
  "reasons": ["Default deny"],
  "obligations": []
}

is_push { input.action == "push" }
is_pull { input.action == "pull" }

protected_branch {
  input.ref.type == "branch"
  input.ref.name == input.repo.default_branch
}

D2. Deny protected-branch push without MFA

decision := out {
  is_push
  protected_branch
  not input.actor.mfa
  out := {
    "decision": "deny",
    "reasons": ["MFA required for pushes to protected branch"],
    "obligations": []
  }
}

D3. Secret finding triggers approval obligation

has_high_secret {
  some i
  f := input.evidence.scans.gitleaks.findings[i]
  f.severity == "high"
}

decision := out {
  is_push
  protected_branch
  has_high_secret
  input.approvals.security.status != "approved"

  out := {
    "decision": "pending",
    "reasons": ["High severity secret detected; security approval required"],
    "obligations": [
      {"type": "require_approval", "key": "security", "min_reviewers": 1}
    ]
  }
}

Appendix E: Operational Notes (Auditability + Versioning)

For every request/action, persist:

  • workflow version (YAML/state machine)
  • OPA bundle digest (or policy git SHA)
  • OSCAL profile/catalog/component version(s)
  • decisions + reasons + obligations
  • evidence references (scan_run_id etc.)

This enables:

  • deterministic replay
  • audit trails
  • compliance reporting (optionally generating OSCAL Assessment Results)
