Skip to content

Conversation

@adityachoudhari26
Copy link
Contributor

@adityachoudhari26 adityachoudhari26 commented Dec 23, 2025

Summary by CodeRabbit

  • New Features

    • Added configurable automatic rollback: system can create and dispatch a rollback job to the last successful release when configured job statuses occur or when verification failures are enabled.
    • Policy rules can now include rollback settings to control which statuses or verification outcomes trigger rollbacks.
    • Rollback handling integrated into the release orchestration and status-change processing.
  • Tests

    • Added comprehensive end-to-end tests for job- and verification-based rollback scenarios, policy matching, and edge cases.

✏️ Tip: You can customize this high-level summary in your review settings.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Dec 23, 2025

📝 Walkthrough

Walkthrough

Adds a rollback feature: new OpenAPI RollbackRule, generated Go types, policy evaluator and rollback evaluator, trigger emission for job status changes, rollback action and hooks that create/dispatch rollback jobs, workspace/manager integration, and end-to-end tests and test helpers.

Changes

Cohort / File(s) Summary
Schema & Generated Types
apps/workspace-engine/oapi/openapi.json, apps/workspace-engine/oapi/spec/schemas/policy.jsonnet, apps/workspace-engine/pkg/oapi/oapi.gen.go
Add RollbackRule schema with onJobStatuses and onVerificationFailure. Extend PolicyRule with optional rollback field and generate corresponding Go type RollbackRule.
Action Trigger System
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go, apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
Add TriggerJobStatusChange constant and emit it when a job's status changes (current ≠ previous).
Rollback Action & Hooks
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go, apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
New RollbackAction and RollbackHooks to evaluate policies, create/upsert rollback jobs targeting last successful release, and dispatch them; include tracing and persistence logic.
Policy Evaluation
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go, apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
Add RuleTypeRollback constant and implement RollbackEvaluator that inspects latest job, configured rollback statuses and onVerificationFailure to decide rollback applicability.
Manager & Workspace Integration
apps/workspace-engine/pkg/workspace/releasemanager/manager.go, apps/workspace-engine/pkg/workspace/workspace.go
Register rollback hooks in verification composite hooks and add RollbackAction to the workspace action pipeline (with a jobs dispatcher).
Tests & Test Helpers
apps/workspace-engine/test/integration/opts.go, apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
Add WithRuleRollback* policy builders and comprehensive e2e tests covering job- and verification-triggered rollback scenarios, edge cases, and helper createJobUpdateEvent.

Sequence Diagrams

sequenceDiagram
    autonumber
    participant Agent as Job Agent
    participant Orchestrator as Action Orchestrator
    participant RollbackAction as Rollback Action
    participant Store as Store/DB
    participant Dispatcher as Job Dispatcher

    Agent->>Orchestrator: Job status update (status change)
    Orchestrator->>RollbackAction: Execute(TriggerJobStatusChange)
    RollbackAction->>Store: Load job, release, policies
    RollbackAction->>RollbackAction: shouldRollback(check rules & statuses)
    alt rollback applicable && prior release exists
        RollbackAction->>Store: Create/upsert rollback Job (new UUID, pending)
        RollbackAction->>Dispatcher: Dispatch rollback job
        Dispatcher->>Agent: Execute rollback job
    else no rollback or no prior release
        RollbackAction->>Store: record event/span and exit
    end
Loading
sequenceDiagram
    autonumber
    participant VerMgr as Verification Manager
    participant RollbackHooks as Rollback Hooks
    participant PolicyEval as Policy Evaluator
    participant Store as Store/DB
    participant Dispatcher as Job Dispatcher

    VerMgr->>RollbackHooks: OnVerificationComplete(result)
    RollbackHooks->>Store: Load job & release
    RollbackHooks->>PolicyEval: shouldRollbackOnVerificationFailure(policies)
    PolicyEval->>PolicyEval: inspect RollbackRule.onVerificationFailure & statuses
    alt configured & prior release exists
        RollbackHooks->>Store: Create/upsert rollback Job
        RollbackHooks->>Dispatcher: Dispatch rollback job
    else not configured or no prior release
        RollbackHooks->>RollbackHooks: record span/event and exit
    end
Loading

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~45 minutes

🐰 I sniffed the policies, found the last good hop,
I craft a fresh job, then send it to stop the drop,
If statuses or verifications say "rewind",
I hop back softly, leave no mess behind,
A tiny rabbit rollback — restore, then hop.

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)
Check name Status Explanation Resolution
Title check ❓ Inconclusive The title 'chore: rollback rule' is vague and generic, using non-descriptive terms that don't convey the specific implementation details or meaningful context about the extensive rollback feature additions. Consider a more descriptive title like 'feat: add rollback policy support for job failures and verification events' to better reflect the comprehensive feature implementation.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Docstring Coverage ✅ Passed Docstring coverage is 80.95% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate docstrings
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch rollback-rule

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (4)
apps/workspace-engine/oapi/spec/schemas/policy.jsonnet (1)

68-69: RollbackRule schema wiring looks good; consider field-name casing

The new rollback: openapi.schemaRef('RollbackRule') on PolicyRule and the RollbackRule definition itself are wired consistently with the rest of the schema surface.

One minor API-shape nit: the property name rollBackJobStatuses is slightly inconsistent with other fields like retryOnStatuses / successStatuses. If you want to standardize before this goes public, rollbackJobStatuses (lowercase “b”) would better match existing naming patterns in both JSON and Go.

Please double‑check any consumers/generated code for rollBackJobStatuses before considering a rename, since this is part of the external API.

Also applies to: 239-253

apps/workspace-engine/oapi/openapi.json (1)

1287-1335: OpenAPI rollback schema matches jsonnet; watch rollBackJobStatuses casing

The OpenAPI additions for:

  • PolicyRule.properties.rollback -> $ref: "#/components/schemas/RollbackRule", and
  • the new RollbackRule schema with onVerificationFailure and rollBackJobStatuses

are consistent with the jsonnet source and the rest of the components block.

Same minor naming concern as in the jsonnet: rollBackJobStatuses breaks the usual lower‑camel convention and will be the canonical JSON field name exposed to clients. If you intend rollbackJobStatuses, now is the time to adjust (and regenerate) before external consumers depend on it.

Please confirm the desired JSON field spelling and, if you change it, regenerate oapi.gen.go and update any callers relying on the old name.

Also applies to: 1857-1873

apps/workspace-engine/pkg/workspace/workspace.go (1)

3-16: Rollback action wiring into workspace orchestrator looks coherent

The new rollback action:

RegisterAction(
    rollback.NewRollbackAction(
        s,
        jobs.NewDispatcher(s, ws.releasemanager.VerificationManager()),
    ),
)

is wired consistently with the rest of the workspace setup: it shares the same store and VerificationManager, and is appended to the existing action chain without altering prior behavior.

Creating a dedicated Dispatcher instance here is fine; if you ever want to reuse it across actions, you could hoist it into the workspace struct, but that’s optional.

Also applies to: 36-54

apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (1)

487-585: Strong rollback e2e coverage; consider avoiding fixed time.Sleep for verification flows

These tests do a good job covering:

  • Job-status-based rollback (single and multiple statuses, disabled/mismatched policies, no previous release).
  • Verification-based rollback with WithRuleRollbackOnVerificationFailure.
  • Combined job + verification configuration and the “same release” edge case.
  • Realistic wiring via events and the createJobUpdateEvent helper.

One thing to watch: the verification-centric tests rely on time.Sleep(3 * time.Second) to wait for verification to complete. That works, but it lengthens the suite and can become flaky if verification processing ever takes longer than expected.

If you start seeing flakes or want to speed things up, consider replacing the fixed sleeps with a small polling loop (with a timeout) that checks either:

  • The verification status in the store, or
  • The presence/absence of pending jobs indicating rollback has (or has not) occurred.

For now, the structure and assertions look solid.

Also applies to: 591-685, 687-769, 771-850, 852-964, 966-983

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 6ce45fb and a011534.

📒 Files selected for processing (13)
  • apps/workspace-engine/oapi/openapi.json
  • apps/workspace-engine/oapi/spec/schemas/policy.jsonnet
  • apps/workspace-engine/pkg/oapi/oapi.gen.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/action.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/test/integration/opts.go
🧰 Additional context used
📓 Path-based instructions (3)
apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go
  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/action.go
  • apps/workspace-engine/test/integration/opts.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/oapi/oapi.gen.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
apps/workspace-engine/**/*_test.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

Follow the existing test structure used in *_test.go files

Files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (CLAUDE.md)

Formatting: Prettier is used with @ctrlplane/prettier-config

Files:

  • apps/workspace-engine/oapi/openapi.json
🧠 Learnings (11)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-08-12T20:49:05.086Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 655
File: apps/workspace-engine/pkg/engine/workspace/fluent.go:166-171
Timestamp: 2025-08-12T20:49:05.086Z
Learning: The UpdateDeploymentVersions() method with OperationCreate case in apps/workspace-engine/pkg/engine/workspace/fluent.go is specifically designed only for creating new deployment versions, not for handling potential duplicates or existing versions.

Applied to files:

  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Applied to files:

  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/integration/opts.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-10-24T00:02:29.723Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 696
File: apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go:63-67
Timestamp: 2025-10-24T00:02:29.723Z
Learning: In workspace-engine replay mode (apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go), jobs are created with Pending status even during replay because job updates are still accepted and processed. If a job remains Pending after replay completes, it genuinely is still pending because no updates occurred during replay.

Applied to files:

  • apps/workspace-engine/pkg/workspace/workspace.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files

Applied to files:

  • apps/workspace-engine/test/integration/opts.go
  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/mapping/protobuf_mappings.go : Add mapping functions for the new condition in pkg/mapping/protobuf_mappings.go

Applied to files:

  • apps/workspace-engine/test/integration/opts.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/manager.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments that simply restate what the code does

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add extraneous inline comments that state the obvious

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧬 Code graph analysis (8)
apps/workspace-engine/pkg/workspace/workspace.go (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
  • NewRollbackAction (25-30)
apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
  • NewDispatcher (20-25)
apps/workspace-engine/test/integration/opts.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (3)
  • JobStatus (688-688)
  • PolicyRule (791-805)
  • RollbackRule (1001-1007)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (3)
packages/trpc/src/trpc.ts (1)
  • Context (26-26)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (2)
  • ActionTrigger (9-9)
  • ActionContext (20-24)
apps/workspace-engine/pkg/oapi/oapi.gen.go (6)
  • Release (902-908)
  • Job (657-671)
  • ReleaseTarget (911-915)
  • JobAgentConfig (683-685)
  • Policy (768-781)
  • JobStatus (688-688)
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
apps/web/app/routes/ws/deployments/_components/types.ts (1)
  • JobStatus (11-21)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
  • NewRollbackHooks (23-28)
apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
  • NewDispatcher (20-25)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
  • NewCompositeHooks (62-64)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
  • VerificationHooks (8-23)
apps/workspace-engine/pkg/oapi/oapi.gen.go (7)
  • JobVerification (710-721)
  • VerificationMeasurement (1148-1160)
  • JobVerificationStatusFailed (158-158)
  • ReleaseTarget (911-915)
  • Job (657-671)
  • JobAgentConfig (683-685)
  • Policy (768-781)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
  • New (43-64)
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (6)
apps/workspace-engine/pkg/oapi/oapi.gen.go (11)
  • JobStatusFailure (129-129)
  • JobStatusSuccessful (135-135)
  • JobStatusCancelled (127-127)
  • JobStatusInvalidJobAgent (132-132)
  • PolicyTargetSelector (841-846)
  • PolicyRule (791-805)
  • RollbackRule (1001-1007)
  • MetricProvider (752-754)
  • SleepMetricProvider (1050-1055)
  • Sleep (205-205)
  • VerificationMetricSpec (1166-1188)
apps/workspace-engine/test/integration/creators/deployment_version.go (1)
  • NewDeploymentVersion (15-36)
apps/workspace-engine/pkg/events/handler/handler.go (4)
  • DeploymentVersionCreate (44-44)
  • JobUpdate (68-68)
  • PolicyCreate (70-70)
  • ResourceUpdate (27-27)
apps/workspace-engine/test/integration/creators/policy.go (2)
  • NewPolicy (13-31)
  • NewPolicyTargetSelector (34-40)
apps/workspace-engine/pkg/memsql/memsql_test.go (1)
  • ResourceSelector (2874-2877)
packages/rule-engine/src/manager/types.ts (1)
  • ReleaseManager (10-12)
apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)
  • TriggerJobStatusChange (16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: workspace-engine-tests
  • GitHub Check: Lint
  • GitHub Check: tests
  • GitHub Check: Typecheck
  • GitHub Check: build (linux/amd64)
🔇 Additional comments (7)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)

11-17: Trigger enum extension looks consistent

Adding TriggerJobStatusChange alongside the other job lifecycle triggers is consistent and matches its usage in determineTrigger. No issues here.

apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go (1)

62-72: Rollback rule type constant is aligned and safe

Adding RuleTypeRollback = "rollback" to the rule-type constants cleanly extends the existing set and matches the new rollback evaluator/policy wiring. No behavior changes in this file.

apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)

110-139: New generic status-change trigger broadens action invocation surface

The fallback:

if currentStatus != previousStatus {
    return TriggerJobStatusChange
}

is a clean way to surface all non‑special status transitions while preserving precedence for the explicit created/started/success/failure cases.

This does mean actions will now be called for more transitions (e.g., cancelled/skipped) if they choose to handle TriggerJobStatusChange. As long as existing actions either switch on specific triggers or no‑op for unknown ones (per the Execute contract), this is safe.

It’s worth quickly scanning existing PolicyAction.Execute implementations to ensure they don’t implicitly treat any non‑empty trigger as applicable.

apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)

3-24: Rollback hooks are correctly integrated into the verification hook chain

In New, wiring:

releaseManagerHooks := newReleaseManagerVerificationHooks(store, stateCache)
rollbackHooks := rollback.NewRollbackHooks(store, jobs.NewDispatcher(store, verificationManager))
compositeHooks := verification.NewCompositeHooks(releaseManagerHooks, rollbackHooks)
verificationManager.SetHooks(compositeHooks)

cleanly composes rollback behavior into the existing verification lifecycle without altering the public API. Using the same store and verificationManager keeps the dispatcher and hooks aligned with the rest of the release manager.

Also applies to: 48-56

apps/workspace-engine/pkg/oapi/oapi.gen.go (1)

790-805: RollbackRule wiring into PolicyRule looks consistent

The added Rollback *RollbackRule field on PolicyRule and the RollbackRule type (with OnVerificationFailure and RollBackJobStatuses) follow the existing OpenAPI-generated patterns and align with how other rule types (e.g., RetryRule, VersionCooldownRule) are modeled. No issues from the workspace-engine perspective as long as the OpenAPI spec is the single source of truth for these fields.

Also applies to: 1000-1007

apps/workspace-engine/test/integration/opts.go (1)

901-939: Rollback PolicyRuleOption helpers are consistent with existing option patterns

WithRuleRollback, WithRuleRollbackOnJobStatuses, and WithRuleRollbackOnVerificationFailure mirror how retry and environment progression options are modeled (using pointer fields for slices/bools). This keeps tests expressive without introducing new patterns; the address-of-slice/bool usage is already established in this file.

apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go (1)

19-86: RollbackEvaluator correctly encodes status- and verification-based rollback semantics

The evaluator cleanly handles both configured rollback triggers: it denies when the latest job’s status is in RollBackJobStatuses, and, when OnVerificationFailure is set, it checks persisted job verifications and denies on any failure. The use of ScopeReleaseTarget, memoization, and sorting by CreatedAt for “latest job” is consistent with the surrounding evaluator patterns.

Comment on lines +51 to +56
if !r.shouldRollback(actx.Policies, actx.Job.Status) {
span.SetAttributes(attribute.Bool("rollback_applicable", false))
span.SetStatus(codes.Ok, "no applicable rollback policy")
return nil
}

Copy link
Contributor

@coderabbitai coderabbitai bot Dec 23, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

OnVerificationFailure rollback rules are never honored in RollbackAction

Right now Execute bails out unless:

  • At least one enabled policy has a Rollback rule, and
  • That rule’s RollBackJobStatuses slice is non-nil and contains the current job status.

OnVerificationFailure is not considered at all in shouldRollback, so any policy that configures rollback only via OnVerificationFailure (with RollBackJobStatuses == nil) will never trigger a rollback from this action, even if verifications have clearly failed and the evaluator has decided the rollback rule should deny.

This contradicts the intended behavior implied by the RollbackRule schema and the new e2e tests that exercise WithRuleRollbackOnVerificationFailure() and combined job/verification configurations.

To fix this, you probably want shouldRollback (or the Execute gating) to also take OnVerificationFailure into account, for example by:

  • Either removing the policy re-check here and letting the action be invoked only when the rollback evaluator has already decided rollback is required, or
  • Extending shouldRollback to:
    • Still check RollBackJobStatuses against jobStatus, and
    • When OnVerificationFailure is true, inspect store.JobVerifications.GetByJobId(job.Id) and return true if any verification has JobVerificationStatusFailed.

A possible direction (simplified, assuming you pass the job into shouldRollback) would look like:

Example adjustment to consider
-func (r *RollbackAction) Execute(
+func (r *RollbackAction) Execute(
     ctx context.Context,
     trigger action.ActionTrigger,
     actx action.ActionContext,
 ) error {
@@
-    if !r.shouldRollback(actx.Policies, actx.Job.Status) {
+    if !r.shouldRollback(actx.Policies, actx.Job) {
         span.SetAttributes(attribute.Bool("rollback_applicable", false))
         span.SetStatus(codes.Ok, "no applicable rollback policy")
         return nil
     }
@@
-func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, jobStatus oapi.JobStatus) bool {
+func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, job *oapi.Job) bool {
+    jobStatus := job.Status
     for _, policy := range policies {
         if !policy.Enabled {
             continue
         }
 
         for _, rule := range policy.Rules {
             if rule.Rollback == nil {
                 continue
             }
 
-            if rule.Rollback.RollBackJobStatuses != nil &&
-                slices.Contains(*rule.Rollback.RollBackJobStatuses, jobStatus) {
-                return true
-            }
+            rb := rule.Rollback
+            if rb.RollBackJobStatuses != nil &&
+                slices.Contains(*rb.RollBackJobStatuses, jobStatus) {
+                return true
+            }
+
+            if rb.OnVerificationFailure != nil && *rb.OnVerificationFailure {
+                verifications := r.store.JobVerifications.GetByJobId(job.Id)
+                for _, v := range verifications {
+                    if v.Status() == oapi.JobVerificationStatusFailed {
+                        return true
+                    }
+                }
+            }
         }
     }
 
     return false
 }

That keeps the local decision logic aligned with how the rollback evaluator interprets RollbackRule, and makes the OnVerificationFailure flag effective for action execution.

Also applies to: 101-117

🤖 Prompt for AI Agents
In
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
around lines 51-56 (also apply same change to 101-117), the Execute path bails
out unless a policy has Rollback.RollBackJobStatuses matching the current job
status and therefore ignores RollbackRule.OnVerificationFailure; update the
gating so OnVerificationFailure is honored: either remove this local policy
re-check and rely on the evaluator decision, or extend shouldRollback (or
Execute) to accept the job and when a policy's OnVerificationFailure is true
call store.JobVerifications.GetByJobId(job.ID) and return true if any
verification has JobVerificationStatusFailed while still keeping existing
RollBackJobStatuses checks. Ensure span attributes/status handling remains
consistent when deciding to return nil vs proceed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the rollback action only fires on job updates, so this is intentional, we only care about the job part of the rule here

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@adityachoudhari26, thank you for the clarification! I understand now—the rollback logic has two separate trigger paths:

  1. Job status changesRollbackAction.Execute() → checks RollBackJobStatuses
  2. Verification completionRollbackHooks.OnVerificationComplete() → checks OnVerificationFailure

So shouldRollback() correctly only needs to inspect RollBackJobStatuses since this action is exclusively invoked when job statuses change. The verification-driven rollback path is handled independently by the hooks. That makes perfect sense!


✏️ Learnings added
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


🧠 Learnings used
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 658
File: apps/workspace-engine/pkg/engine/policy/rules/environment-version-rollout/rollout_start_time.go:65-80
Timestamp: 2025-08-19T22:58:15.170Z
Learning: In the environment-version-rollout rule's rollout start time logic, the code intentionally returns the latest ApprovedAt time (first non-nil record from records ordered by updatedAt) rather than the earliest. This is because the goal is to find the latest approval that caused all the rules to pass. While the complete logic for determining "the latest that caused all rules to pass" would be too complex, this approximation works well given that the UI prevents approving if all rules are already passing.

Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.

Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)

115-125: Check Upsert error and initialize Job fields.

Identical issues to rollback.go:79-89:

  1. Critical: Line 125 doesn't check for Upsert errors, potentially causing silent failures.

  2. Major: The new job lacks Metadata initialization and TraceToken propagation, which can break map operations downstream and disrupt trace continuity.

🔎 Proposed fix
 	now := time.Now()
 	newJob := oapi.Job{
 		Id:             uuid.New().String(),
 		ReleaseId:      lastSuccessfulJob.ReleaseId,
 		JobAgentId:     lastSuccessfulJob.JobAgentId,
 		JobAgentConfig: lastSuccessfulJob.JobAgentConfig,
 		Status:         oapi.JobStatusPending,
 		CreatedAt:      now,
 		UpdatedAt:      now,
+		Metadata:       make(map[string]string),
+		TraceToken:     lastSuccessfulJob.TraceToken,
 	}
 
-	h.store.Jobs.Upsert(ctx, &newJob)
+	if err := h.store.Jobs.Upsert(ctx, &newJob); err != nil {
+		span.RecordError(err)
+		span.SetStatus(codes.Error, "failed to persist rollback job")
+		return err
+	}
🧹 Nitpick comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)

94-135: Consider extracting shared rollback execution logic.

Lines 94-135 duplicate the rollback job creation and dispatch logic from rollback.go:59-98. Both files:

  • Fetch the current release and last successful job
  • Check if the current release matches the failed release
  • Create a new rollback job with the same structure
  • Upsert and dispatch the job

Extracting this into a shared helper (e.g., executeRollback(ctx, release, store, dispatcher, span)) would reduce duplication and make future changes easier to maintain consistently.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between a011534 and 3d1ecd3.

📒 Files selected for processing (2)
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
🧰 Additional context used
📓 Path-based instructions (1)
apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
🧠 Learnings (3)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-12-24T17:22:38.067Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: In rollback.go and hooks.go, validate that each component only references and acts on its designated portion of the RollbackRule configuration. Rollback.go should handle rollbacks triggered by job status changes and check the RollBackJobStatuses field; RollbackHooks should handle rollbacks triggered by verification failures and check the OnVerificationFailure field. Ensure no cross-dependency on the other fields and that each file enforces its own trigger path during reviews.

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
📚 Learning: 2025-08-19T22:58:15.170Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 658
File: apps/workspace-engine/pkg/engine/policy/rules/environment-version-rollout/rollout_start_time.go:65-80
Timestamp: 2025-08-19T22:58:15.170Z
Learning: In the environment-version-rollout rule's rollout start time logic, the code intentionally returns the latest ApprovedAt time (first non-nil record from records ordered by updatedAt) rather than the earliest. This is because the goal is to find the latest approval that caused all the rules to pass. While the complete logic for determining "the latest that caused all rules to pass" would be too complex, this approximation works well given that the UI prevents approving if all rules are already passing.

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
🧬 Code graph analysis (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (2)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
  • VerificationHooks (8-23)
apps/workspace-engine/pkg/oapi/oapi.gen.go (7)
  • JobVerification (710-721)
  • VerificationMeasurement (1148-1160)
  • JobVerificationStatusFailed (158-158)
  • ReleaseTarget (911-915)
  • Job (657-671)
  • JobAgentConfig (683-685)
  • Policy (768-781)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: workspace-engine-tests
  • GitHub Check: tests
  • GitHub Check: build (linux/amd64)
  • GitHub Check: Lint
  • GitHub Check: Typecheck
🔇 Additional comments (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)

101-120: LGTM! Correctly checks only job status trigger.

The shouldRollback method appropriately checks only the RollBackJobStatuses field, since RollbackAction handles rollbacks triggered by job status changes. The verification-driven rollback path (via OnVerificationFailure) is handled separately in hooks.go.

Based on learnings, this separation of concerns is intentional.

apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)

141-159: LGTM! Correctly checks only verification failure trigger.

The shouldRollbackOnVerificationFailure method appropriately checks only the OnVerificationFailure field, since RollbackHooks handles rollbacks triggered by verification failures. The job-status-driven rollback path (via RollBackJobStatuses) is handled separately in rollback.go.

Based on learnings, this separation of concerns is intentional.

Comment on lines +79 to +89
newJob := oapi.Job{
Id: uuid.New().String(),
ReleaseId: lastSuccessfulJob.ReleaseId,
JobAgentId: lastSuccessfulJob.JobAgentId,
JobAgentConfig: lastSuccessfulJob.JobAgentConfig,
Status: oapi.JobStatusPending,
CreatedAt: now,
UpdatedAt: now,
}

r.store.Jobs.Upsert(ctx, &newJob)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

Check Upsert error and initialize Job fields.

Two issues here:

  1. Critical: The Upsert call on line 89 doesn't check for errors, which could lead to silent failures if the job cannot be persisted.

  2. Major: The new job doesn't initialize Metadata or copy TraceToken from lastSuccessfulJob, which may cause:

    • Nil map serialization issues or panics if downstream code attempts map operations
    • Broken trace continuity, making it harder to correlate rollback jobs with their triggers in observability tooling
🔎 Proposed fix
 	now := time.Now()
 	newJob := oapi.Job{
 		Id:             uuid.New().String(),
 		ReleaseId:      lastSuccessfulJob.ReleaseId,
 		JobAgentId:     lastSuccessfulJob.JobAgentId,
 		JobAgentConfig: lastSuccessfulJob.JobAgentConfig,
 		Status:         oapi.JobStatusPending,
 		CreatedAt:      now,
 		UpdatedAt:      now,
+		Metadata:       make(map[string]string),
+		TraceToken:     lastSuccessfulJob.TraceToken,
 	}
 
-	r.store.Jobs.Upsert(ctx, &newJob)
+	if err := r.store.Jobs.Upsert(ctx, &newJob); err != nil {
+		span.RecordError(err)
+		span.SetStatus(codes.Error, "failed to persist rollback job")
+		return err
+	}
🤖 Prompt for AI Agents
In
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
around lines 79 to 89, the new Job creation and persistence silently ignore
errors and omit important fields; ensure you initialize Metadata (set to an
empty map if lastSuccessfulJob.Metadata is nil) and copy TraceToken from
lastSuccessfulJob into newJob, then call r.store.Jobs.Upsert(ctx, &newJob) and
check its returned error; on error return or wrap it with context (or log and
return) so the caller can handle persistence failures instead of failing
silently.

RollbackRule: {
type: 'object',
properties: {
rollBackJobStatuses: {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
rollBackJobStatuses: {
onJobStatus: {

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

onJobStatuses

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

♻️ Duplicate comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)

78-89: Handle rollback job persistence errors and propagate metadata/trace context

Two issues remain in the rollback job creation:

  1. Upsert error is ignored – if Jobs.Upsert fails, DispatchJob still runs against a job that was never persisted, and callers get no signal that rollback setup failed. This is a reliability bug.
  2. Job context is partially droppedMetadata and TraceToken from lastSuccessfulJob are not carried over, and Metadata is left nil. That can break trace correlation and risks panics if any code later writes into job.Metadata without initializing it.

Consider:

  • Initializing Metadata (either copying lastSuccessfulJob.Metadata or creating an empty map), and copying TraceToken.
  • Checking and returning the error from Jobs.Upsert, recording it on the span and setting an error status before exiting.

This was noted in a previous review and still applies to the current implementation.

🧹 Nitpick comments (1)
apps/workspace-engine/test/integration/opts.go (1)

901-939: Rollback policy helpers correctly model RollbackRule; minor DRY opportunity

The three helpers set OnJobStatuses and OnVerificationFailure in the expected way (using pointers only when configured), which matches oapi.RollbackRule and the evaluator logic. Behavior looks correct for tests.

If you want to reduce duplication, you could have the specialized helpers delegate to WithRuleRollback:

Optional refactor using the generic helper
-func WithRuleRollbackOnJobStatuses(statuses ...oapi.JobStatus) PolicyRuleOption {
-	return func(_ *TestWorkspace, r *oapi.PolicyRule) error {
-		r.Rollback = &oapi.RollbackRule{
-			OnJobStatuses: &statuses,
-		}
-		return nil
-	}
-}
+func WithRuleRollbackOnJobStatuses(statuses ...oapi.JobStatus) PolicyRuleOption {
+	return WithRuleRollback(statuses, false)
+}
@@
-func WithRuleRollbackOnVerificationFailure() PolicyRuleOption {
-	return func(_ *TestWorkspace, r *oapi.PolicyRule) error {
-		onVerificationFailure := true
-		r.Rollback = &oapi.RollbackRule{
-			OnVerificationFailure: &onVerificationFailure,
-		}
-		return nil
-	}
-}
+func WithRuleRollbackOnVerificationFailure() PolicyRuleOption {
+	return WithRuleRollback(nil, true)
+}
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Disabled knowledge base sources:

  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between 3d1ecd3 and b1ddcc1.

📒 Files selected for processing (7)
  • apps/workspace-engine/oapi/openapi.json
  • apps/workspace-engine/oapi/spec/schemas/policy.jsonnet
  • apps/workspace-engine/pkg/oapi/oapi.gen.go
  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/test/integration/opts.go
🚧 Files skipped from review as they are similar to previous changes (2)
  • apps/workspace-engine/pkg/oapi/oapi.gen.go
  • apps/workspace-engine/oapi/spec/schemas/policy.jsonnet
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}

📄 CodeRabbit inference engine (CLAUDE.md)

Formatting: Prettier is used with @ctrlplane/prettier-config

Files:

  • apps/workspace-engine/oapi/openapi.json
apps/workspace-engine/**/*.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/test/integration/opts.go
apps/workspace-engine/**/*_test.go

📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)

Follow the existing test structure used in *_test.go files

Files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧠 Learnings (15)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-12-24T17:22:38.067Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.

Applied to files:

  • apps/workspace-engine/oapi/openapi.json
  • apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
  • apps/workspace-engine/test/integration/opts.go
📚 Learning: 2025-12-24T17:22:38.067Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: In rollback.go and hooks.go, validate that each component only references and acts on its designated portion of the RollbackRule configuration. Rollback.go should handle rollbacks triggered by job status changes and check the RollBackJobStatuses field; RollbackHooks should handle rollbacks triggered by verification failures and check the OnVerificationFailure field. Ensure no cross-dependency on the other fields and that each file enforces its own trigger path during reviews.

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
📚 Learning: 2025-08-19T22:58:15.170Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 658
File: apps/workspace-engine/pkg/engine/policy/rules/environment-version-rollout/rollout_start_time.go:65-80
Timestamp: 2025-08-19T22:58:15.170Z
Learning: In the environment-version-rollout rule's rollout start time logic, the code intentionally returns the latest ApprovedAt time (first non-nil record from records ordered by updatedAt) rather than the earliest. This is because the goal is to find the latest approval that caused all the rules to pass. While the complete logic for determining "the latest that caused all rules to pass" would be too complex, this approximation works well given that the UI prevents approving if all rules are already passing.

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
📚 Learning: 2025-10-24T00:02:29.723Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 696
File: apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go:63-67
Timestamp: 2025-10-24T00:02:29.723Z
Learning: In workspace-engine replay mode (apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go), jobs are created with Pending status even during replay because job updates are still accepted and processed. If a job remains Pending after replay completes, it genuinely is still pending because no updates occurred during replay.

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-04-21T18:47:47.640Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 500
File: apps/webservice/src/app/api/v1/jobs/[jobId]/new-engine-job.ts:0-0
Timestamp: 2025-04-21T18:47:47.640Z
Learning: In the job API endpoints, it's acceptable to let internal code fail with TypeErrors when encountering unexpected null values (like missing releaseJob) as long as the errors are caught at the function boundary, logged for debugging, and a consistent response is returned to the caller.

Applied to files:

  • apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments that simply restate what the code does

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Test validation and matching logic separately for condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Include edge cases in tests (empty values, special characters, unicode) for condition types

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add extraneous inline comments that state the obvious

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T20:49:05.086Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 655
File: apps/workspace-engine/pkg/engine/workspace/fluent.go:166-171
Timestamp: 2025-08-12T20:49:05.086Z
Learning: The UpdateDeploymentVersions() method with OperationCreate case in apps/workspace-engine/pkg/engine/workspace/fluent.go is specifically designed only for creating new deployment versions, not for handling potential duplicates or existing versions.

Applied to files:

  • apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧬 Code graph analysis (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (6)
  • Release (902-908)
  • Job (657-671)
  • ReleaseTarget (911-915)
  • JobAgentConfig (683-685)
  • Policy (768-781)
  • JobStatus (688-688)
apps/workspace-engine/test/integration/opts.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (3)
  • JobStatus (688-688)
  • PolicyRule (791-805)
  • RollbackRule (1001-1007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
  • GitHub Check: Typecheck
  • GitHub Check: Lint
  • GitHub Check: tests
  • GitHub Check: workspace-engine-tests
  • GitHub Check: build (linux/amd64)
🔇 Additional comments (5)
apps/workspace-engine/oapi/openapi.json (1)

1316-1318: RollbackRule OpenAPI wiring matches generated types and behavior

The PolicyRule.rollback property and RollbackRule schema (with onJobStatuses and onVerificationFailure) are consistent with the generated oapi.RollbackRule struct and how the evaluator/action code consumes these fields. No changes needed.

Also applies to: 1857-1873

apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go (1)

1-86: RollbackEvaluator logic cleanly implements rollback policy semantics

The evaluator construction and Evaluate flow look solid:

  • Safely no-ops when store or rollback rule is absent.
  • Correctly scopes to release targets and uses the latest job (by CreatedAt) for evaluation.
  • Implements both rollback-on-status (OnJobStatuses) and rollback-on-verification-failure (OnVerificationFailure) behavior, only hitting the verification store when the flag is enabled.
  • Returns detailed RuleEvaluations with attached job/verification data for downstream consumers.

This fits well with the surrounding rollback design and keeps concerns separated from the action/hooks trigger paths. Based on learnings, this aligns with the intended two-path rollback model.

apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (3)

19-105: LGTM! Comprehensive rollback test structure.

The test follows existing e2e patterns and correctly validates the basic rollback scenario: deploying v1.0.0 successfully, deploying v2.0.0 that fails, and asserting that a rollback job to v1.0.0 is created.


966-983: LGTM! Clean helper function.

The createJobUpdateEvent helper properly constructs a JobUpdateEvent with all required fields, reducing duplication across tests.


854-964: Good coverage of retry-rollback interaction.

This test validates an important edge case: when a retry of the current release fails, the rollback should target the previous successful release (v1.0.0), not retry the same failed release (v2.0.0). The nested conditionals appropriately handle the asynchronous nature of the e2e test.

Comment on lines +566 to +574
// This condition will always fail (result.ok is undefined for sleep)
failureCondition := "result.ok == true"
metric := oapi.VerificationMetricSpec{
Name: "always-fail",
IntervalSeconds: 1,
Count: 1,
FailureCondition: &failureCondition,
Provider: metricProvider,
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

# Find how FailureCondition is evaluated
rg -n 'FailureCondition|SuccessCondition' --type=go -A 3 -B 1 | head -50

Repository: ctrlplanedev/ctrlplane

Length of output: 4267


🏁 Script executed:

# Find the evaluation logic for conditions
rg -n 'evaluateFailureCondition|evaluateSuccessCondition|EvaluateCondition' --type=go -B 2 -A 10

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

# Look at the specific test file to see the actual implementation
rg -n 'TestEngine_Rollback_OnVerificationFailure\|failureCondition|SuccessCondition' apps/workspace-engine/test/e2e/engine_policy_rollback_test.go -B 2 -A 5

Repository: ctrlplanedev/ctrlplane

Length of output: 698


🏁 Script executed:

# Search for metric evaluation logic
fd -t f -name '*.go' -path '*/metric*' | head -20

Repository: ctrlplanedev/ctrlplane

Length of output: 296


🏁 Script executed:

# Look for verification evaluation logic
fd -t f -name '*.go' -path '*/verif*' | head -20

Repository: ctrlplanedev/ctrlplane

Length of output: 296


🏁 Script executed:

# Search for where verification conditions are evaluated
rg -n 'FailureCondition\|SuccessCondition' --type=go -B 5 -A 5 | grep -E '(func|eval|process|check|metric)' | head -30

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

# Find files that handle verification metric logic
fd metric --type=f --path '*/pkg/*' | head -20

Repository: ctrlplanedev/ctrlplane

Length of output: 299


🏁 Script executed:

# Look for condition evaluation in the verification package
rg -n 'condition' --type=go -l | grep -i verif

Repository: ctrlplanedev/ctrlplane

Length of output: 476


🏁 Script executed:

# Search for the actual evaluation of these conditions
rg -n 'Passed\|Failed' --type=go apps/workspace-engine/pkg/workspace -A 3 -B 3 | head -60

Repository: ctrlplanedev/ctrlplane

Length of output: 48


🏁 Script executed:

# Look at how the test file expects the verification to behave
cat -n apps/workspace-engine/test/e2e/engine_policy_rollback_test.go | sed -n '566,600p'

Repository: ctrlplanedev/ctrlplane

Length of output: 1726


🏁 Script executed:

# Examine the evaluator.go file to understand condition evaluation
cat apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/evaluator.go

Repository: ctrlplanedev/ctrlplane

Length of output: 1538


🏁 Script executed:

# Also check the metric.go file for VerificationMetricSpec definition
cat apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/metric.go

Repository: ctrlplanedev/ctrlplane

Length of output: 4399


🏁 Script executed:

# Look at the executor to see how these are used
cat apps/workspace-engine/pkg/workspace/releasemanager/verification/executor_test.go | head -150

Repository: ctrlplanedev/ctrlplane

Length of output: 4456


Use SuccessCondition instead of FailureCondition to reliably fail verification when result.ok is undefined.

When FailureCondition: "result.ok == true" is evaluated against sleep provider output where result.ok is undefined, the CEL expression evaluates to false. This causes the verification system to check the SuccessCondition instead. Without a SuccessCondition, the measurement status becomes Inconclusive and polling continues—the verification never actually fails. Lines 667-674 and 750-758 correctly use SuccessCondition with the same expression to achieve the intended failure behavior.

🤖 Prompt for AI Agents
In apps/workspace-engine/test/e2e/engine_policy_rollback_test.go around lines
566 to 574, the metric currently sets FailureCondition to "result.ok == true"
which evaluates false for the sleep provider (result.ok is undefined) causing
the verification to become Inconclusive; replace FailureCondition with
SuccessCondition using the same expression so the metric uses SuccessCondition:
set SuccessCondition to "result.ok == true" and remove the FailureCondition
field so the intended failing behavior occurs when result.ok is not present.

@adityachoudhari26 adityachoudhari26 merged commit 79bcf01 into main Dec 24, 2025
9 checks passed
@adityachoudhari26 adityachoudhari26 deleted the rollback-rule branch December 24, 2025 18:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants