-
Notifications
You must be signed in to change notification settings - Fork 12
chore: rollback rule #733
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
chore: rollback rule #733
Conversation
📝 WalkthroughWalkthroughAdds a rollback feature: new OpenAPI RollbackRule, generated Go types, policy evaluator and rollback evaluator, trigger emission for job status changes, rollback action and hooks that create/dispatch rollback jobs, workspace/manager integration, and end-to-end tests and test helpers. Changes
Sequence DiagramssequenceDiagram
autonumber
participant Agent as Job Agent
participant Orchestrator as Action Orchestrator
participant RollbackAction as Rollback Action
participant Store as Store/DB
participant Dispatcher as Job Dispatcher
Agent->>Orchestrator: Job status update (status change)
Orchestrator->>RollbackAction: Execute(TriggerJobStatusChange)
RollbackAction->>Store: Load job, release, policies
RollbackAction->>RollbackAction: shouldRollback(check rules & statuses)
alt rollback applicable && prior release exists
RollbackAction->>Store: Create/upsert rollback Job (new UUID, pending)
RollbackAction->>Dispatcher: Dispatch rollback job
Dispatcher->>Agent: Execute rollback job
else no rollback or no prior release
RollbackAction->>Store: record event/span and exit
end
sequenceDiagram
autonumber
participant VerMgr as Verification Manager
participant RollbackHooks as Rollback Hooks
participant PolicyEval as Policy Evaluator
participant Store as Store/DB
participant Dispatcher as Job Dispatcher
VerMgr->>RollbackHooks: OnVerificationComplete(result)
RollbackHooks->>Store: Load job & release
RollbackHooks->>PolicyEval: shouldRollbackOnVerificationFailure(policies)
PolicyEval->>PolicyEval: inspect RollbackRule.onVerificationFailure & statuses
alt configured & prior release exists
RollbackHooks->>Store: Create/upsert rollback Job
RollbackHooks->>Dispatcher: Dispatch rollback job
else not configured or no prior release
RollbackHooks->>RollbackHooks: record span/event and exit
end
Estimated Code Review Effort🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches❌ Failed checks (1 inconclusive)
✅ Passed checks (2 passed)
✨ Finishing touches
🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 2
🧹 Nitpick comments (4)
apps/workspace-engine/oapi/spec/schemas/policy.jsonnet (1)
68-69: RollbackRule schema wiring looks good; consider field-name casingThe new
rollback: openapi.schemaRef('RollbackRule')onPolicyRuleand theRollbackRuledefinition itself are wired consistently with the rest of the schema surface.One minor API-shape nit: the property name
rollBackJobStatusesis slightly inconsistent with other fields likeretryOnStatuses/successStatuses. If you want to standardize before this goes public,rollbackJobStatuses(lowercase “b”) would better match existing naming patterns in both JSON and Go.Please double‑check any consumers/generated code for
rollBackJobStatusesbefore considering a rename, since this is part of the external API.Also applies to: 239-253
apps/workspace-engine/oapi/openapi.json (1)
1287-1335: OpenAPI rollback schema matches jsonnet; watchrollBackJobStatusescasingThe OpenAPI additions for:
PolicyRule.properties.rollback -> $ref: "#/components/schemas/RollbackRule", and- the new
RollbackRuleschema withonVerificationFailureandrollBackJobStatusesare consistent with the jsonnet source and the rest of the components block.
Same minor naming concern as in the jsonnet:
rollBackJobStatusesbreaks the usual lower‑camel convention and will be the canonical JSON field name exposed to clients. If you intendrollbackJobStatuses, now is the time to adjust (and regenerate) before external consumers depend on it.Please confirm the desired JSON field spelling and, if you change it, regenerate
oapi.gen.goand update any callers relying on the old name.Also applies to: 1857-1873
apps/workspace-engine/pkg/workspace/workspace.go (1)
3-16: Rollback action wiring into workspace orchestrator looks coherentThe new rollback action:
RegisterAction( rollback.NewRollbackAction( s, jobs.NewDispatcher(s, ws.releasemanager.VerificationManager()), ), )is wired consistently with the rest of the workspace setup: it shares the same
storeandVerificationManager, and is appended to the existing action chain without altering prior behavior.Creating a dedicated
Dispatcherinstance here is fine; if you ever want to reuse it across actions, you could hoist it into the workspace struct, but that’s optional.Also applies to: 36-54
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (1)
487-585: Strong rollback e2e coverage; consider avoiding fixedtime.Sleepfor verification flowsThese tests do a good job covering:
- Job-status-based rollback (single and multiple statuses, disabled/mismatched policies, no previous release).
- Verification-based rollback with
WithRuleRollbackOnVerificationFailure.- Combined job + verification configuration and the “same release” edge case.
- Realistic wiring via events and the
createJobUpdateEventhelper.One thing to watch: the verification-centric tests rely on
time.Sleep(3 * time.Second)to wait for verification to complete. That works, but it lengthens the suite and can become flaky if verification processing ever takes longer than expected.If you start seeing flakes or want to speed things up, consider replacing the fixed sleeps with a small polling loop (with a timeout) that checks either:
- The verification status in the store, or
- The presence/absence of pending jobs indicating rollback has (or has not) occurred.
For now, the structure and assertions look solid.
Also applies to: 591-685, 687-769, 771-850, 852-964, 966-983
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (13)
apps/workspace-engine/oapi/openapi.jsonapps/workspace-engine/oapi/spec/schemas/policy.jsonnetapps/workspace-engine/pkg/oapi/oapi.gen.goapps/workspace-engine/pkg/workspace/releasemanager/action/action.goapps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/test/integration/opts.go
🧰 Additional context used
📓 Path-based instructions (3)
apps/workspace-engine/**/*.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Files:
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.goapps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/pkg/workspace/releasemanager/action/action.goapps/workspace-engine/test/integration/opts.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/oapi/oapi.gen.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go
apps/workspace-engine/**/*_test.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
Follow the existing test structure used in *_test.go files
Files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}
📄 CodeRabbit inference engine (CLAUDE.md)
Formatting: Prettier is used with
@ctrlplane/prettier-config
Files:
apps/workspace-engine/oapi/openapi.json
🧠 Learnings (11)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-08-12T20:49:05.086Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 655
File: apps/workspace-engine/pkg/engine/workspace/fluent.go:166-171
Timestamp: 2025-08-12T20:49:05.086Z
Learning: The UpdateDeploymentVersions() method with OperationCreate case in apps/workspace-engine/pkg/engine/workspace/fluent.go is specifically designed only for creating new deployment versions, not for handling potential duplicates or existing versions.
Applied to files:
apps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Applied to files:
apps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/integration/opts.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-10-24T00:02:29.723Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 696
File: apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go:63-67
Timestamp: 2025-10-24T00:02:29.723Z
Learning: In workspace-engine replay mode (apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go), jobs are created with Pending status even during replay because job updates are still accepted and processed. If a job remains Pending after replay completes, it genuinely is still pending because no updates occurred during replay.
Applied to files:
apps/workspace-engine/pkg/workspace/workspace.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files
Applied to files:
apps/workspace-engine/test/integration/opts.goapps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/mapping/protobuf_mappings.go : Add mapping functions for the new condition in pkg/mapping/protobuf_mappings.go
Applied to files:
apps/workspace-engine/test/integration/opts.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/manager.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments that simply restate what the code does
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add extraneous inline comments that state the obvious
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧬 Code graph analysis (8)
apps/workspace-engine/pkg/workspace/workspace.go (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
NewRollbackAction(25-30)apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
NewDispatcher(20-25)
apps/workspace-engine/test/integration/opts.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (3)
JobStatus(688-688)PolicyRule(791-805)RollbackRule(1001-1007)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (3)
packages/trpc/src/trpc.ts (1)
Context(26-26)apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (2)
ActionTrigger(9-9)ActionContext(20-24)apps/workspace-engine/pkg/oapi/oapi.gen.go (6)
Release(902-908)Job(657-671)ReleaseTarget(911-915)JobAgentConfig(683-685)Policy(768-781)JobStatus(688-688)
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
apps/web/app/routes/ws/deployments/_components/types.ts (1)
JobStatus(11-21)
apps/workspace-engine/pkg/workspace/releasemanager/manager.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
NewRollbackHooks(23-28)apps/workspace-engine/pkg/workspace/releasemanager/deployment/jobs/dispatcher.go (1)
NewDispatcher(20-25)apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
NewCompositeHooks(62-64)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (3)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
VerificationHooks(8-23)apps/workspace-engine/pkg/oapi/oapi.gen.go (7)
JobVerification(710-721)VerificationMeasurement(1148-1160)JobVerificationStatusFailed(158-158)ReleaseTarget(911-915)Job(657-671)JobAgentConfig(683-685)Policy(768-781)apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
New(43-64)
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (6)
apps/workspace-engine/pkg/oapi/oapi.gen.go (11)
JobStatusFailure(129-129)JobStatusSuccessful(135-135)JobStatusCancelled(127-127)JobStatusInvalidJobAgent(132-132)PolicyTargetSelector(841-846)PolicyRule(791-805)RollbackRule(1001-1007)MetricProvider(752-754)SleepMetricProvider(1050-1055)Sleep(205-205)VerificationMetricSpec(1166-1188)apps/workspace-engine/test/integration/creators/deployment_version.go (1)
NewDeploymentVersion(15-36)apps/workspace-engine/pkg/events/handler/handler.go (4)
DeploymentVersionCreate(44-44)JobUpdate(68-68)PolicyCreate(70-70)ResourceUpdate(27-27)apps/workspace-engine/test/integration/creators/policy.go (2)
NewPolicy(13-31)NewPolicyTargetSelector(34-40)apps/workspace-engine/pkg/memsql/memsql_test.go (1)
ResourceSelector(2874-2877)packages/rule-engine/src/manager/types.ts (1)
ReleaseManager(10-12)
apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)
TriggerJobStatusChange(16-16)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: workspace-engine-tests
- GitHub Check: Lint
- GitHub Check: tests
- GitHub Check: Typecheck
- GitHub Check: build (linux/amd64)
🔇 Additional comments (7)
apps/workspace-engine/pkg/workspace/releasemanager/action/action.go (1)
11-17: Trigger enum extension looks consistentAdding
TriggerJobStatusChangealongside the other job lifecycle triggers is consistent and matches its usage indetermineTrigger. No issues here.apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/evaulator.go (1)
62-72: Rollback rule type constant is aligned and safeAdding
RuleTypeRollback = "rollback"to the rule-type constants cleanly extends the existing set and matches the new rollback evaluator/policy wiring. No behavior changes in this file.apps/workspace-engine/pkg/workspace/releasemanager/action/orchestrator.go (1)
110-139: New generic status-change trigger broadens action invocation surfaceThe fallback:
if currentStatus != previousStatus { return TriggerJobStatusChange }is a clean way to surface all non‑special status transitions while preserving precedence for the explicit created/started/success/failure cases.
This does mean actions will now be called for more transitions (e.g., cancelled/skipped) if they choose to handle
TriggerJobStatusChange. As long as existing actions either switch on specific triggers or no‑op for unknown ones (per theExecutecontract), this is safe.It’s worth quickly scanning existing
PolicyAction.Executeimplementations to ensure they don’t implicitly treat any non‑empty trigger as applicable.apps/workspace-engine/pkg/workspace/releasemanager/manager.go (1)
3-24: Rollback hooks are correctly integrated into the verification hook chainIn
New, wiring:releaseManagerHooks := newReleaseManagerVerificationHooks(store, stateCache) rollbackHooks := rollback.NewRollbackHooks(store, jobs.NewDispatcher(store, verificationManager)) compositeHooks := verification.NewCompositeHooks(releaseManagerHooks, rollbackHooks) verificationManager.SetHooks(compositeHooks)cleanly composes rollback behavior into the existing verification lifecycle without altering the public API. Using the same
storeandverificationManagerkeeps the dispatcher and hooks aligned with the rest of the release manager.Also applies to: 48-56
apps/workspace-engine/pkg/oapi/oapi.gen.go (1)
790-805: RollbackRule wiring into PolicyRule looks consistentThe added
Rollback *RollbackRulefield onPolicyRuleand theRollbackRuletype (withOnVerificationFailureandRollBackJobStatuses) follow the existing OpenAPI-generated patterns and align with how other rule types (e.g.,RetryRule,VersionCooldownRule) are modeled. No issues from the workspace-engine perspective as long as the OpenAPI spec is the single source of truth for these fields.Also applies to: 1000-1007
apps/workspace-engine/test/integration/opts.go (1)
901-939: Rollback PolicyRuleOption helpers are consistent with existing option patterns
WithRuleRollback,WithRuleRollbackOnJobStatuses, andWithRuleRollbackOnVerificationFailuremirror how retry and environment progression options are modeled (using pointer fields for slices/bools). This keeps tests expressive without introducing new patterns; the address-of-slice/bool usage is already established in this file.apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go (1)
19-86: RollbackEvaluator correctly encodes status- and verification-based rollback semanticsThe evaluator cleanly handles both configured rollback triggers: it denies when the latest job’s status is in
RollBackJobStatuses, and, whenOnVerificationFailureis set, it checks persisted job verifications and denies on any failure. The use ofScopeReleaseTarget, memoization, and sorting byCreatedAtfor “latest job” is consistent with the surrounding evaluator patterns.
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
Show resolved
Hide resolved
| if !r.shouldRollback(actx.Policies, actx.Job.Status) { | ||
| span.SetAttributes(attribute.Bool("rollback_applicable", false)) | ||
| span.SetStatus(codes.Ok, "no applicable rollback policy") | ||
| return nil | ||
| } | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OnVerificationFailure rollback rules are never honored in RollbackAction
Right now Execute bails out unless:
- At least one enabled policy has a
Rollbackrule, and - That rule’s
RollBackJobStatusesslice is non-nil and contains the current job status.
OnVerificationFailure is not considered at all in shouldRollback, so any policy that configures rollback only via OnVerificationFailure (with RollBackJobStatuses == nil) will never trigger a rollback from this action, even if verifications have clearly failed and the evaluator has decided the rollback rule should deny.
This contradicts the intended behavior implied by the RollbackRule schema and the new e2e tests that exercise WithRuleRollbackOnVerificationFailure() and combined job/verification configurations.
To fix this, you probably want shouldRollback (or the Execute gating) to also take OnVerificationFailure into account, for example by:
- Either removing the policy re-check here and letting the action be invoked only when the rollback evaluator has already decided rollback is required, or
- Extending
shouldRollbackto:- Still check
RollBackJobStatusesagainstjobStatus, and - When
OnVerificationFailureis true, inspectstore.JobVerifications.GetByJobId(job.Id)and return true if any verification hasJobVerificationStatusFailed.
- Still check
A possible direction (simplified, assuming you pass the job into shouldRollback) would look like:
Example adjustment to consider
-func (r *RollbackAction) Execute(
+func (r *RollbackAction) Execute(
ctx context.Context,
trigger action.ActionTrigger,
actx action.ActionContext,
) error {
@@
- if !r.shouldRollback(actx.Policies, actx.Job.Status) {
+ if !r.shouldRollback(actx.Policies, actx.Job) {
span.SetAttributes(attribute.Bool("rollback_applicable", false))
span.SetStatus(codes.Ok, "no applicable rollback policy")
return nil
}
@@
-func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, jobStatus oapi.JobStatus) bool {
+func (r *RollbackAction) shouldRollback(policies []*oapi.Policy, job *oapi.Job) bool {
+ jobStatus := job.Status
for _, policy := range policies {
if !policy.Enabled {
continue
}
for _, rule := range policy.Rules {
if rule.Rollback == nil {
continue
}
- if rule.Rollback.RollBackJobStatuses != nil &&
- slices.Contains(*rule.Rollback.RollBackJobStatuses, jobStatus) {
- return true
- }
+ rb := rule.Rollback
+ if rb.RollBackJobStatuses != nil &&
+ slices.Contains(*rb.RollBackJobStatuses, jobStatus) {
+ return true
+ }
+
+ if rb.OnVerificationFailure != nil && *rb.OnVerificationFailure {
+ verifications := r.store.JobVerifications.GetByJobId(job.Id)
+ for _, v := range verifications {
+ if v.Status() == oapi.JobVerificationStatusFailed {
+ return true
+ }
+ }
+ }
}
}
return false
}That keeps the local decision logic aligned with how the rollback evaluator interprets RollbackRule, and makes the OnVerificationFailure flag effective for action execution.
Also applies to: 101-117
🤖 Prompt for AI Agents
In
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
around lines 51-56 (also apply same change to 101-117), the Execute path bails
out unless a policy has Rollback.RollBackJobStatuses matching the current job
status and therefore ignores RollbackRule.OnVerificationFailure; update the
gating so OnVerificationFailure is honored: either remove this local policy
re-check and rely on the evaluator decision, or extend shouldRollback (or
Execute) to accept the job and when a policy's OnVerificationFailure is true
call store.JobVerifications.GetByJobId(job.ID) and return true if any
verification has JobVerificationStatusFailed while still keeping existing
RollBackJobStatuses checks. Ensure span attributes/status handling remains
consistent when deciding to return nil vs proceed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the rollback action only fires on job updates, so this is intentional, we only care about the job part of the rule here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@adityachoudhari26, thank you for the clarification! I understand now—the rollback logic has two separate trigger paths:
- Job status changes →
RollbackAction.Execute()→ checksRollBackJobStatuses - Verification completion →
RollbackHooks.OnVerificationComplete()→ checksOnVerificationFailure
So shouldRollback() correctly only needs to inspect RollBackJobStatuses since this action is exclusively invoked when job statuses change. The verification-driven rollback path is handled independently by the hooks. That makes perfect sense!
✏️ Learnings added
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.
Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.
🧠 Learnings used
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 658
File: apps/workspace-engine/pkg/engine/policy/rules/environment-version-rollout/rollout_start_time.go:65-80
Timestamp: 2025-08-19T22:58:15.170Z
Learning: In the environment-version-rollout rule's rollout start time logic, the code intentionally returns the latest ApprovedAt time (first non-nil record from records ordered by updatedAt) rather than the earliest. This is because the goal is to find the latest approval that caused all the rules to pass. While the complete logic for determining "the latest that caused all rules to pass" would be too complex, this approximation works well given that the UI prevents approving if all rules are already passing.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
115-125: Check Upsert error and initialize Job fields.Identical issues to
rollback.go:79-89:
Critical: Line 125 doesn't check for
Upserterrors, potentially causing silent failures.Major: The new job lacks
Metadatainitialization andTraceTokenpropagation, which can break map operations downstream and disrupt trace continuity.🔎 Proposed fix
now := time.Now() newJob := oapi.Job{ Id: uuid.New().String(), ReleaseId: lastSuccessfulJob.ReleaseId, JobAgentId: lastSuccessfulJob.JobAgentId, JobAgentConfig: lastSuccessfulJob.JobAgentConfig, Status: oapi.JobStatusPending, CreatedAt: now, UpdatedAt: now, + Metadata: make(map[string]string), + TraceToken: lastSuccessfulJob.TraceToken, } - h.store.Jobs.Upsert(ctx, &newJob) + if err := h.store.Jobs.Upsert(ctx, &newJob); err != nil { + span.RecordError(err) + span.SetStatus(codes.Error, "failed to persist rollback job") + return err + }
🧹 Nitpick comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
94-135: Consider extracting shared rollback execution logic.Lines 94-135 duplicate the rollback job creation and dispatch logic from
rollback.go:59-98. Both files:
- Fetch the current release and last successful job
- Check if the current release matches the failed release
- Create a new rollback job with the same structure
- Upsert and dispatch the job
Extracting this into a shared helper (e.g.,
executeRollback(ctx, release, store, dispatcher, span)) would reduce duplication and make future changes easier to maintain consistently.
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
🧰 Additional context used
📓 Path-based instructions (1)
apps/workspace-engine/**/*.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
🧠 Learnings (3)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-12-24T17:22:38.067Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: In rollback.go and hooks.go, validate that each component only references and acts on its designated portion of the RollbackRule configuration. Rollback.go should handle rollbacks triggered by job status changes and check the RollBackJobStatuses field; RollbackHooks should handle rollbacks triggered by verification failures and check the OnVerificationFailure field. Ensure no cross-dependency on the other fields and that each file enforces its own trigger path during reviews.
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go
📚 Learning: 2025-08-19T22:58:15.170Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 658
File: apps/workspace-engine/pkg/engine/policy/rules/environment-version-rollout/rollout_start_time.go:65-80
Timestamp: 2025-08-19T22:58:15.170Z
Learning: In the environment-version-rollout rule's rollout start time logic, the code intentionally returns the latest ApprovedAt time (first non-nil record from records ordered by updatedAt) rather than the earliest. This is because the goal is to find the latest approval that caused all the rules to pass. While the complete logic for determining "the latest that caused all rules to pass" would be too complex, this approximation works well given that the UI prevents approving if all rules are already passing.
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
🧬 Code graph analysis (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (2)
apps/workspace-engine/pkg/workspace/releasemanager/verification/hooks.go (1)
VerificationHooks(8-23)apps/workspace-engine/pkg/oapi/oapi.gen.go (7)
JobVerification(710-721)VerificationMeasurement(1148-1160)JobVerificationStatusFailed(158-158)ReleaseTarget(911-915)Job(657-671)JobAgentConfig(683-685)Policy(768-781)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: workspace-engine-tests
- GitHub Check: tests
- GitHub Check: build (linux/amd64)
- GitHub Check: Lint
- GitHub Check: Typecheck
🔇 Additional comments (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
101-120: LGTM! Correctly checks only job status trigger.The
shouldRollbackmethod appropriately checks only theRollBackJobStatusesfield, sinceRollbackActionhandles rollbacks triggered by job status changes. The verification-driven rollback path (viaOnVerificationFailure) is handled separately inhooks.go.Based on learnings, this separation of concerns is intentional.
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/hooks.go (1)
141-159: LGTM! Correctly checks only verification failure trigger.The
shouldRollbackOnVerificationFailuremethod appropriately checks only theOnVerificationFailurefield, sinceRollbackHookshandles rollbacks triggered by verification failures. The job-status-driven rollback path (viaRollBackJobStatuses) is handled separately inrollback.go.Based on learnings, this separation of concerns is intentional.
| newJob := oapi.Job{ | ||
| Id: uuid.New().String(), | ||
| ReleaseId: lastSuccessfulJob.ReleaseId, | ||
| JobAgentId: lastSuccessfulJob.JobAgentId, | ||
| JobAgentConfig: lastSuccessfulJob.JobAgentConfig, | ||
| Status: oapi.JobStatusPending, | ||
| CreatedAt: now, | ||
| UpdatedAt: now, | ||
| } | ||
|
|
||
| r.store.Jobs.Upsert(ctx, &newJob) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Check Upsert error and initialize Job fields.
Two issues here:
-
Critical: The
Upsertcall on line 89 doesn't check for errors, which could lead to silent failures if the job cannot be persisted. -
Major: The new job doesn't initialize
Metadataor copyTraceTokenfromlastSuccessfulJob, which may cause:- Nil map serialization issues or panics if downstream code attempts map operations
- Broken trace continuity, making it harder to correlate rollback jobs with their triggers in observability tooling
🔎 Proposed fix
now := time.Now()
newJob := oapi.Job{
Id: uuid.New().String(),
ReleaseId: lastSuccessfulJob.ReleaseId,
JobAgentId: lastSuccessfulJob.JobAgentId,
JobAgentConfig: lastSuccessfulJob.JobAgentConfig,
Status: oapi.JobStatusPending,
CreatedAt: now,
UpdatedAt: now,
+ Metadata: make(map[string]string),
+ TraceToken: lastSuccessfulJob.TraceToken,
}
- r.store.Jobs.Upsert(ctx, &newJob)
+ if err := r.store.Jobs.Upsert(ctx, &newJob); err != nil {
+ span.RecordError(err)
+ span.SetStatus(codes.Error, "failed to persist rollback job")
+ return err
+ }🤖 Prompt for AI Agents
In
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
around lines 79 to 89, the new Job creation and persistence silently ignore
errors and omit important fields; ensure you initialize Metadata (set to an
empty map if lastSuccessfulJob.Metadata is nil) and copy TraceToken from
lastSuccessfulJob into newJob, then call r.store.Jobs.Upsert(ctx, &newJob) and
check its returned error; on error return or wrap it with context (or log and
return) so the caller can handle persistence failures instead of failing
silently.
| RollbackRule: { | ||
| type: 'object', | ||
| properties: { | ||
| rollBackJobStatuses: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| rollBackJobStatuses: { | |
| onJobStatus: { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
onJobStatuses
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
♻️ Duplicate comments (1)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
78-89: Handle rollback job persistence errors and propagate metadata/trace contextTwo issues remain in the rollback job creation:
- Upsert error is ignored – if
Jobs.Upsertfails,DispatchJobstill runs against a job that was never persisted, and callers get no signal that rollback setup failed. This is a reliability bug.- Job context is partially dropped –
MetadataandTraceTokenfromlastSuccessfulJobare not carried over, andMetadatais left nil. That can break trace correlation and risks panics if any code later writes intojob.Metadatawithout initializing it.Consider:
- Initializing
Metadata(either copyinglastSuccessfulJob.Metadataor creating an empty map), and copyingTraceToken.- Checking and returning the error from
Jobs.Upsert, recording it on the span and setting an error status before exiting.This was noted in a previous review and still applies to the current implementation.
🧹 Nitpick comments (1)
apps/workspace-engine/test/integration/opts.go (1)
901-939: Rollback policy helpers correctly model RollbackRule; minor DRY opportunityThe three helpers set
OnJobStatusesandOnVerificationFailurein the expected way (using pointers only when configured), which matchesoapi.RollbackRuleand the evaluator logic. Behavior looks correct for tests.If you want to reduce duplication, you could have the specialized helpers delegate to
WithRuleRollback:Optional refactor using the generic helper
-func WithRuleRollbackOnJobStatuses(statuses ...oapi.JobStatus) PolicyRuleOption { - return func(_ *TestWorkspace, r *oapi.PolicyRule) error { - r.Rollback = &oapi.RollbackRule{ - OnJobStatuses: &statuses, - } - return nil - } -} +func WithRuleRollbackOnJobStatuses(statuses ...oapi.JobStatus) PolicyRuleOption { + return WithRuleRollback(statuses, false) +} @@ -func WithRuleRollbackOnVerificationFailure() PolicyRuleOption { - return func(_ *TestWorkspace, r *oapi.PolicyRule) error { - onVerificationFailure := true - r.Rollback = &oapi.RollbackRule{ - OnVerificationFailure: &onVerificationFailure, - } - return nil - } -} +func WithRuleRollbackOnVerificationFailure() PolicyRuleOption { + return WithRuleRollback(nil, true) +}
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
Disabled knowledge base sources:
- Linear integration is disabled by default for public repositories
You can enable these sources in your CodeRabbit configuration.
📒 Files selected for processing (7)
apps/workspace-engine/oapi/openapi.jsonapps/workspace-engine/oapi/spec/schemas/policy.jsonnetapps/workspace-engine/pkg/oapi/oapi.gen.goapps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/test/integration/opts.go
🚧 Files skipped from review as they are similar to previous changes (2)
- apps/workspace-engine/pkg/oapi/oapi.gen.go
- apps/workspace-engine/oapi/spec/schemas/policy.jsonnet
🧰 Additional context used
📓 Path-based instructions (3)
**/*.{js,jsx,ts,tsx,json,md,yml,yaml}
📄 CodeRabbit inference engine (CLAUDE.md)
Formatting: Prettier is used with
@ctrlplane/prettier-config
Files:
apps/workspace-engine/oapi/openapi.json
apps/workspace-engine/**/*.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
apps/workspace-engine/**/*.go: Do not add extraneous inline comments that state the obvious
Do not add comments that simply restate what the code does
Do not add comments for standard Go patterns (e.g., noting WaitGroup or semaphore usage)
Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/test/integration/opts.go
apps/workspace-engine/**/*_test.go
📄 CodeRabbit inference engine (apps/workspace-engine/CLAUDE.md)
Follow the existing test structure used in *_test.go files
Files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧠 Learnings (15)
📓 Common learnings
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 637
File: packages/events/src/kafka/client.ts:10-16
Timestamp: 2025-08-01T04:41:41.345Z
Learning: User adityachoudhari26 prefers not to add null safety checks for required environment variables when they are guaranteed to be present in their deployment configuration, similar to their preference for simplicity over defensive programming in test code.
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 601
File: e2e/tests/api/policies/retry-policy.spec.ts:23-24
Timestamp: 2025-06-24T23:52:50.732Z
Learning: The user adityachoudhari26 prefers not to add null safety checks or defensive programming in test code, particularly in e2e tests, as they prioritize simplicity and focus on the main functionality being tested rather than comprehensive error handling within the test itself.
📚 Learning: 2025-12-24T17:22:38.067Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: The rollback system uses two separate trigger mechanisms: RollbackAction (in rollback.go) handles rollbacks triggered by job status changes and checks the RollBackJobStatuses field, while RollbackHooks (in hooks.go) handles rollbacks triggered by verification failures and checks the OnVerificationFailure field. Each component only needs to check its respective portion of the RollbackRule configuration.
Applied to files:
apps/workspace-engine/oapi/openapi.jsonapps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.goapps/workspace-engine/test/integration/opts.go
📚 Learning: 2025-12-24T17:22:38.067Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 733
File: apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go:51-56
Timestamp: 2025-12-24T17:22:38.067Z
Learning: In rollback.go and hooks.go, validate that each component only references and acts on its designated portion of the RollbackRule configuration. Rollback.go should handle rollbacks triggered by job status changes and check the RollBackJobStatuses field; RollbackHooks should handle rollbacks triggered by verification failures and check the OnVerificationFailure field. Ensure no cross-dependency on the other fields and that each file enforces its own trigger path during reviews.
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
📚 Learning: 2025-08-19T22:58:15.170Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 658
File: apps/workspace-engine/pkg/engine/policy/rules/environment-version-rollout/rollout_start_time.go:65-80
Timestamp: 2025-08-19T22:58:15.170Z
Learning: In the environment-version-rollout rule's rollout start time logic, the code intentionally returns the latest ApprovedAt time (first non-nil record from records ordered by updatedAt) rather than the earliest. This is because the goal is to find the latest approval that caused all the rules to pass. While the complete logic for determining "the latest that caused all rules to pass" would be too complex, this approximation works well given that the UI prevents approving if all rules are already passing.
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
📚 Learning: 2025-10-24T00:02:29.723Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 696
File: apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go:63-67
Timestamp: 2025-10-24T00:02:29.723Z
Learning: In workspace-engine replay mode (apps/workspace-engine/pkg/workspace/releasemanager/deployment/executor.go), jobs are created with Pending status even during replay because job updates are still accepted and processed. If a job remains Pending after replay completes, it genuinely is still pending because no updates occurred during replay.
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.goapps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-04-21T18:47:47.640Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 500
File: apps/webservice/src/app/api/v1/jobs/[jobId]/new-engine-job.ts:0-0
Timestamp: 2025-04-21T18:47:47.640Z
Learning: In the job API endpoints, it's acceptable to let internal code fail with TypeErrors when encountering unexpected null values (like missing releaseJob) as long as the errors are caught at the function boundary, logged for debugging, and a consistent response is returned to the caller.
Applied to files:
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*_test.go : Follow the existing test structure used in *_test.go files
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Write comments that explain why, document complex logic/algorithms, provide non-obvious context, include TODO/FIXME, and document exported functions/types/methods
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Write comprehensive, data-driven tests for new condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Use table-driven tests for all condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add comments that simply restate what the code does
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Test validation and matching logic separately for condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/pkg/model/selector/**/*_test.go : Include edge cases in tests (empty values, special characters, unicode) for condition types
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T18:13:54.630Z
Learnt from: CR
Repo: ctrlplanedev/ctrlplane PR: 0
File: apps/workspace-engine/CLAUDE.md:0-0
Timestamp: 2025-08-12T18:13:54.630Z
Learning: Applies to apps/workspace-engine/**/*.go : Do not add extraneous inline comments that state the obvious
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
📚 Learning: 2025-08-12T20:49:05.086Z
Learnt from: adityachoudhari26
Repo: ctrlplanedev/ctrlplane PR: 655
File: apps/workspace-engine/pkg/engine/workspace/fluent.go:166-171
Timestamp: 2025-08-12T20:49:05.086Z
Learning: The UpdateDeploymentVersions() method with OperationCreate case in apps/workspace-engine/pkg/engine/workspace/fluent.go is specifically designed only for creating new deployment versions, not for handling potential duplicates or existing versions.
Applied to files:
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go
🧬 Code graph analysis (2)
apps/workspace-engine/pkg/workspace/releasemanager/action/rollback/rollback.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (6)
Release(902-908)Job(657-671)ReleaseTarget(911-915)JobAgentConfig(683-685)Policy(768-781)JobStatus(688-688)
apps/workspace-engine/test/integration/opts.go (1)
apps/workspace-engine/pkg/oapi/oapi.gen.go (3)
JobStatus(688-688)PolicyRule(791-805)RollbackRule(1001-1007)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: Typecheck
- GitHub Check: Lint
- GitHub Check: tests
- GitHub Check: workspace-engine-tests
- GitHub Check: build (linux/amd64)
🔇 Additional comments (5)
apps/workspace-engine/oapi/openapi.json (1)
1316-1318: RollbackRule OpenAPI wiring matches generated types and behaviorThe
PolicyRule.rollbackproperty andRollbackRuleschema (withonJobStatusesandonVerificationFailure) are consistent with the generatedoapi.RollbackRulestruct and how the evaluator/action code consumes these fields. No changes needed.Also applies to: 1857-1873
apps/workspace-engine/pkg/workspace/releasemanager/policy/evaluator/rollback/rollback.go (1)
1-86: RollbackEvaluator logic cleanly implements rollback policy semanticsThe evaluator construction and
Evaluateflow look solid:
- Safely no-ops when store or rollback rule is absent.
- Correctly scopes to release targets and uses the latest job (by
CreatedAt) for evaluation.- Implements both rollback-on-status (
OnJobStatuses) and rollback-on-verification-failure (OnVerificationFailure) behavior, only hitting the verification store when the flag is enabled.- Returns detailed
RuleEvaluations with attached job/verification data for downstream consumers.This fits well with the surrounding rollback design and keeps concerns separated from the action/hooks trigger paths. Based on learnings, this aligns with the intended two-path rollback model.
apps/workspace-engine/test/e2e/engine_policy_rollback_test.go (3)
19-105: LGTM! Comprehensive rollback test structure.The test follows existing e2e patterns and correctly validates the basic rollback scenario: deploying v1.0.0 successfully, deploying v2.0.0 that fails, and asserting that a rollback job to v1.0.0 is created.
966-983: LGTM! Clean helper function.The
createJobUpdateEventhelper properly constructs aJobUpdateEventwith all required fields, reducing duplication across tests.
854-964: Good coverage of retry-rollback interaction.This test validates an important edge case: when a retry of the current release fails, the rollback should target the previous successful release (v1.0.0), not retry the same failed release (v2.0.0). The nested conditionals appropriately handle the asynchronous nature of the e2e test.
| // This condition will always fail (result.ok is undefined for sleep) | ||
| failureCondition := "result.ok == true" | ||
| metric := oapi.VerificationMetricSpec{ | ||
| Name: "always-fail", | ||
| IntervalSeconds: 1, | ||
| Count: 1, | ||
| FailureCondition: &failureCondition, | ||
| Provider: metricProvider, | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🧩 Analysis chain
🏁 Script executed:
# Find how FailureCondition is evaluated
rg -n 'FailureCondition|SuccessCondition' --type=go -A 3 -B 1 | head -50Repository: ctrlplanedev/ctrlplane
Length of output: 4267
🏁 Script executed:
# Find the evaluation logic for conditions
rg -n 'evaluateFailureCondition|evaluateSuccessCondition|EvaluateCondition' --type=go -B 2 -A 10Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
# Look at the specific test file to see the actual implementation
rg -n 'TestEngine_Rollback_OnVerificationFailure\|failureCondition|SuccessCondition' apps/workspace-engine/test/e2e/engine_policy_rollback_test.go -B 2 -A 5Repository: ctrlplanedev/ctrlplane
Length of output: 698
🏁 Script executed:
# Search for metric evaluation logic
fd -t f -name '*.go' -path '*/metric*' | head -20Repository: ctrlplanedev/ctrlplane
Length of output: 296
🏁 Script executed:
# Look for verification evaluation logic
fd -t f -name '*.go' -path '*/verif*' | head -20Repository: ctrlplanedev/ctrlplane
Length of output: 296
🏁 Script executed:
# Search for where verification conditions are evaluated
rg -n 'FailureCondition\|SuccessCondition' --type=go -B 5 -A 5 | grep -E '(func|eval|process|check|metric)' | head -30Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
# Find files that handle verification metric logic
fd metric --type=f --path '*/pkg/*' | head -20Repository: ctrlplanedev/ctrlplane
Length of output: 299
🏁 Script executed:
# Look for condition evaluation in the verification package
rg -n 'condition' --type=go -l | grep -i verifRepository: ctrlplanedev/ctrlplane
Length of output: 476
🏁 Script executed:
# Search for the actual evaluation of these conditions
rg -n 'Passed\|Failed' --type=go apps/workspace-engine/pkg/workspace -A 3 -B 3 | head -60Repository: ctrlplanedev/ctrlplane
Length of output: 48
🏁 Script executed:
# Look at how the test file expects the verification to behave
cat -n apps/workspace-engine/test/e2e/engine_policy_rollback_test.go | sed -n '566,600p'Repository: ctrlplanedev/ctrlplane
Length of output: 1726
🏁 Script executed:
# Examine the evaluator.go file to understand condition evaluation
cat apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/evaluator.goRepository: ctrlplanedev/ctrlplane
Length of output: 1538
🏁 Script executed:
# Also check the metric.go file for VerificationMetricSpec definition
cat apps/workspace-engine/pkg/workspace/releasemanager/verification/metrics/metric.goRepository: ctrlplanedev/ctrlplane
Length of output: 4399
🏁 Script executed:
# Look at the executor to see how these are used
cat apps/workspace-engine/pkg/workspace/releasemanager/verification/executor_test.go | head -150Repository: ctrlplanedev/ctrlplane
Length of output: 4456
Use SuccessCondition instead of FailureCondition to reliably fail verification when result.ok is undefined.
When FailureCondition: "result.ok == true" is evaluated against sleep provider output where result.ok is undefined, the CEL expression evaluates to false. This causes the verification system to check the SuccessCondition instead. Without a SuccessCondition, the measurement status becomes Inconclusive and polling continues—the verification never actually fails. Lines 667-674 and 750-758 correctly use SuccessCondition with the same expression to achieve the intended failure behavior.
🤖 Prompt for AI Agents
In apps/workspace-engine/test/e2e/engine_policy_rollback_test.go around lines
566 to 574, the metric currently sets FailureCondition to "result.ok == true"
which evaluates false for the sleep provider (result.ok is undefined) causing
the verification to become Inconclusive; replace FailureCondition with
SuccessCondition using the same expression so the metric uses SuccessCondition:
set SuccessCondition to "result.ok == true" and remove the FailureCondition
field so the intended failing behavior occurs when result.ok is not present.
Summary by CodeRabbit
New Features
Tests
✏️ Tip: You can customize this high-level summary in your review settings.