Skip to content

v0.7 hardening: API auth, policy enforcement, safe evidence paths, audit integrity#2

Open
Protonmatter wants to merge 13 commits into
mainfrom
hardening-v0.7-production-rails
Open

v0.7 hardening: API auth, policy enforcement, safe evidence paths, audit integrity#2
Protonmatter wants to merge 13 commits into
mainfrom
hardening-v0.7-production-rails

Conversation

@Protonmatter
Copy link
Copy Markdown
Owner

Summary

This PR hardens the MX CyberOps Agent trust boundary and closes several high-risk gaps identified during the deep review.

Primary focus areas:

  • API authentication and rate limiting
  • Connector policy enforcement
  • Safe evidence-pack filesystem handling
  • Tamper-evident audit logging
  • Graph token expiry and retry handling
  • Regression tests for hardening controls
  • Policy semantic cleanup for draft vs. write operations

Changes

API boundary hardening

  • Added control_plane/api_security.py
  • Added explicit API-key dependency
  • Added lightweight in-process rate limiting
  • Fail-closed behavior when no API key is configured
  • Explicit insecure local-lab override (MXCYBER_API_ALLOW_INSECURE=true)
  • Added protected dependencies to sensitive API routes

Connector policy enforcement

  • /connectors/{name}/query now routes through PolicyEngine
  • Connector operations now generate audit events for:
    • denied
    • approval-required
    • success
    • failure
  • Added connector operation policy entries

Evidence-pack hardening

  • Added utils/safe_paths.py
  • Sanitized incident_id and run_id
  • Added resolved-path containment checks
  • Removed caller-controlled evidence_root
  • Added SHA-256 artifact hashes to manifest
  • Upgraded manifest version to 0.7.0

Audit integrity

  • Added hash-chained JSONL audit records
  • Added schema versioning (audit.v1)
  • Added prev_hash and event_hash

Microsoft Graph connector hardening

  • Added token expiry tracking
  • Added token refresh margin
  • Added 401 retry behavior with token invalidation

Policy semantics cleanup

  • create_ticket now requires approval
  • Added explicit draft_ticket / draft_jsm_ticket read-only operations

Tests

Added regression coverage for:

  • unsafe path segments
  • evidence traversal prevention
  • manifest hashing
  • audit hash chaining
  • API key fail-closed behavior
  • policy approval semantics

Safety posture

This PR preserves the repo’s public-safe defensive posture:

  • read-only-first behavior
  • draft-only ticketing
  • analyst approval requirements
  • evidence-oriented workflows
  • no destructive execution paths added

Follow-up work

Planned next hardening wave:

  • structured retry/backoff in HTTP client
  • mypy + bandit + semgrep CI gates
  • Dockerized lab deployment
  • request tracing middleware
  • signed evidence manifests
  • pagination helpers for Graph/Okta

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 68d156252d

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread apps/api/main.py
tool=name,
action=req.operation,
target_type="connector-record",
target_count=req.target_count,
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Derive policy target_count from query params

connector_query now treats target_count as trusted input from the request body, so callers can bypass max_targets by sending a low target_count while requesting a large result set in params (for example, top=5000). Because PolicyEngine.evaluate only checks target_count, this weakens the new policy enforcement for connector reads and allows oversized queries that should be denied.

Useful? React with 👍 / 👎.

Comment thread .github/workflows/ci.yml
Comment on lines +49 to +50
- name: Docker build
run: docker build -t mx-cyberops-agent:ci .
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Remove failing Docker build gate or add Dockerfile

The workflow now unconditionally runs docker build -t mx-cyberops-agent:ci ., but this repository does not contain a Dockerfile, so the new CI step will fail every run and block merges. This is a hard failure in the default pipeline unless a Dockerfile is added or the step is made conditional/pointed at an existing build file.

Useful? React with 👍 / 👎.

Comment on lines +75 to +76
route = request.url.path
rate_limiter.check(f"{client}:{route}")
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Key rate limiting on route template, not raw URL path

The limiter key uses request.url.path, which includes path parameters, so clients can evade limits on /connectors/{name}/query by cycling connector names (each name gets a fresh bucket). This bypasses the newly added abuse control and allows effectively unbounded request volume from one client IP.

Useful? React with 👍 / 👎.

Comment thread control_plane/audit.py
Comment on lines +45 to +48
with self.audit_path.open("r", encoding="utf-8") as f:
for line in f:
if line.strip():
last_line = line
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Cache previous audit hash instead of rescanning file

Each audit write calls _last_hash(), which linearly scans the entire JSONL file to find the last record before appending. In a long-running API process this makes logging cost grow with file size and turns total runtime into quadratic behavior, causing avoidable latency spikes as audit volume increases.

Useful? React with 👍 / 👎.

Comment thread control_plane/audit.py
Comment on lines +28 to +29
prev_hash = self._last_hash()
record = {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Serialize hash-chain updates across concurrent writers

The hash chain update is a read-then-append sequence (_last_hash() followed by file append) without any file/process lock, so concurrent writers can compute the same prev_hash and append different successor records. In multi-worker deployments this creates forked chains that break the tamper-evident guarantee and make downstream verification ambiguous.

Useful? React with 👍 / 👎.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant