System User Password Rotation with CredentialRotation CRD#903
Draft
shunki-fujita wants to merge 16 commits into
Draft
System User Password Rotation with CredentialRotation CRD#903shunki-fujita wants to merge 16 commits into
shunki-fujita wants to merge 16 commits into
Conversation
2720d8d to
0f3bf1c
Compare
32efe48 to
8de6492
Compare
8de6492 to
420a50b
Compare
420a50b to
fa7c877
Compare
fa7c877 to
f75f60c
Compare
f75f60c to
ec27644
Compare
ec27644 to
bb28b21
Compare
bb28b21 to
423fd02
Compare
423fd02 to
b5c6f0f
Compare
Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
2332dd7 to
ea2d967
Compare
- Replace stale MySQLCluster controller ownerReference (different UID, e.g., cluster deleted and recreated) instead of requeueing forever. - Refuse to advance to Discarding phase when cluster replicas=0; emit DiscardRefused Warning Event and stay in Rotated. - Emit RotationBlocked Warning Event when Rotating is blocked by replicas=0, instead of silently logging. - Migrate CredentialRotation webhook to controller-runtime's generic admission.Validator[*CredentialRotation] API to fix the build. - Use Go 1.26 new(true) in place of ptr.To(true) to satisfy modernize/newexpr. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
531bef4 to
ef37741
Compare
- Self-heal per-namespace user/my.cnf Secrets during active rotation by re-applying pending passwords in Rotated/Discarding/Discarded phases. Fall back to current passwords during the brief Discarded→Completed window where pending keys have been promoted. - Treat CR Get errors strictly: only NotFound/NoMatch are "no active rotation"; transient errors abort reconciliation instead of falling back to current passwords. - Refuse to adopt or act on a stale CredentialRotation (ownerRef UID does not match the live cluster). Apply this consistently in the CredentialRotation reconciler, ClusterManager, MySQLClusterReconciler, and the kubectl-moco CLI. - Validate CR delete: forbid mid-rotation deletes, but always allow garbage collection (owner NotFound, owner Terminating, stale UID). - Make DISCARD OLD PASSWORD per-user idempotent via HasDualPassword, matching the existing RETAIN behavior; required because MySQL rejects DISCARD when no retained password remains. - Reject discard while replicas=0 in handleStartDiscard and emit a RotationBlocked Warning Event when handleRotatingPhase is stalled by replicas=0. - Replace spec.discardOldPassword (bool) with spec.discardGeneration (int64) to eliminate edge-trigger semantics and the webhook transition contortions it required (false->true reject, forced reset on rotation bump). discardGeneration is monotonic and must satisfy 0 <= discardGeneration <= rotationGeneration; bumping it to match rotationGeneration triggers discard. Adds status.observedDiscardGeneration to mirror the existing rotation pattern, and updates the webhook to enforce monotonicity and the upper-bound invariant (phase guards remain: rotation bump requires \"\"/Completed, discard bump requires Rotated). kubectl-moco credential discard now patches discardGeneration to match rotationGeneration instead of toggling a bool. - Update the design doc to match the implementation (MySQLClusterReconciler code, DISCARD idempotency, stale-CR handling, MySQLCluster deletion, validation webhook, discardGeneration semantics, GitOps-vs-CLI guidance). - Add unit and envtest coverage for the new paths. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
arosh
reviewed
May 15, 2026
ef37741 to
4a125b3
Compare
The dedicated ignore entry existed only to hide envtest binaries accidentally downloaded under clustering/ when tests were run directly without ENVTEST_ASSETS_DIR. The proper flow goes through `make clustering-envtest`, which places assets in repo-root bin/ (already ignored via /bin), so the per-package entry is no longer needed. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Reflect three implementation changes that had drifted from the doc: - Retained → Rotated step 2: the restart annotation is written via Server-Side Apply under a dedicated field manager with ForceOwnership, not a merge Patch. - passwordForDistribution: a partial pending state now surfaces as an error instead of silently falling back to the current password. - StatefulSet Rolling Restart section: describe the dedicated field manager (moco-credential-rotation) and ForceOwnership, including the case where a user pre-sets the same annotation key in cluster.Spec.PodTemplate.Annotations. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
arosh
requested changes
May 15, 2026
arosh
reviewed
May 15, 2026
Member
|
@shunki-fujita |
- Mark RotationGeneration as required with Minimum=1. The webhook already rejected RotationGeneration <= 0, so keeping it as an optional int64 with omitempty made "0" indistinguishable from "omitted" while neither was a valid expression of intent. - Add Minimum=0 (and default=0 for spec) on DiscardGeneration and on the observed* counters in Status, so the optional/required intent is expressed at the CRD schema layer. - Mark Spec as required and add explicit +optional markers on ObjectMeta and Status so every field is unambiguously optional or required. - Constrain RotationPhase to its enum values via +kubebuilder:validation:Enum, matching the pattern used elsewhere in moco (backuppolicy, mysqlcluster, job) and surfacing the allowed values in the generated OpenAPI schema. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
arosh
reviewed
May 20, 2026
arosh
reviewed
May 20, 2026
arosh
reviewed
May 20, 2026
arosh
reviewed
May 20, 2026
Address review feedback on api/v1beta2/credentialrotation_types.go: - Drop omitempty from int64 generation counters so the meaningful zero value is preserved across PATCH and matches the Kubernetes API convention for optional fields with significant zero values. - Add a UUID pattern validation to status.rotationID so the field shape is enforced at the API boundary. - Add explicit +required markers on the Spec field and on RotationGeneration to satisfy the "fields must be either optional or required" rule. Test rotationID literals in clustering/manager_test.go are updated to valid UUIDs to satisfy the new pattern validation. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Convert the CredentialRotation API from a single status.phase enum to three orthogonal metav1.Condition entries, aligning the new CRD with the Kubernetes API convention that discourages phase-style state machines. API changes: - Remove RotationPhase type and the Phase status field. - Add status.conditions[] with three Types: Rotating (in-flight cycle, current sub-step in Reason), OldPasswordRetained (MySQL dual-password state), Ready (latest spec generations fully reconciled). - Add ReasonXxx constants for the in-flight sub-steps (ApplyingRetain, DistributingPassword, AwaitingDiscard, WaitingForRollout, ApplyingDiscard, Finalizing, RotationBlocked, StalePending) and the idle reasons (NotStarted, Completed, RotationRefused). - Add CurrentStep, RotatingReason, IsIdle, IsDeletable, and SetRotating/SetOldPasswordRetained/SetReady helpers. - Express the spec invariant discardGeneration <= rotationGeneration as a CEL XValidation rule so the API server enforces it declaratively. - Document every Condition Type and Reason constant in godoc so the OpenAPI schema (and kubectl explain) surface workflow semantics. Behavioral changes: - Validating webhook gates rotationGeneration bumps on IsIdle() and discardGeneration bumps on CurrentStep == AwaitingDiscard. - ValidateDelete gains an escape hatch for RotationBlocked and StalePending so the documented recovery procedure (which begins with kubectl delete credentialrotation) succeeds. - CredentialRotationReconciler drives the cycle via CurrentStep() rather than a Phase enum; the rollout wait that was previously inlined in handleStartDiscard is now its own WaitingForRollout sub-step. - ClusterManager's clustering loop transitions Rotating and OldPasswordRetained together. - MySQLClusterReconciler.reconcileV1Secret chooses current vs pending passwords based on the Rotating condition's Reason. - kubectl moco credential rotate/discard use IsIdle() and CurrentStep() instead of Phase comparisons. - Repeated RotationRefused / StalePending events are suppressed so only condition transitions emit Warning events. CRD printer columns expose Step (Rotating.Reason), Retained (OldPasswordRetained.Status), and Ready (Ready.Status) instead of the single Phase column. The design doc (docs/designdoc/credential_rotation_crd.md) is rewritten end-to-end: state diagram, CRD example, Go type definitions, validation webhook, kubectl UI, rotate/discard step tables, ClusterManager and MySQLClusterReconciler code samples, crash safety table, deletion handling, and impact summary. Tests drive condition transitions via SetRotating / simulateClusterManagerAdvance helpers; assertions check the Rotating Reason via crRotatingReason. All envtest suites (api/controller/clustering/backup) pass and make lint is clean. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Add a top-level status.observedGeneration field and watches on
MySQLCluster and the controller-namespace Secret so the rotation
loop reacts immediately to external changes instead of relying on
the 15s requeue.
API:
- Add status.observedGeneration to surface the metadata.generation
that the controller has most recently reconciled. Standard
observed-generation pattern that kstatus / ArgoCD / Flux read.
- New helper StampObservedGeneration; the Reconciler and the
ClusterManager both call it immediately before every
Status().Update so the field stays in sync with each condition
transition.
Watches:
- Watch MySQLCluster with a predicate that only enqueues on
Spec.Replicas or DeletionTimestamp changes. A RotationRefused /
RotationBlocked CR now resumes the cycle the moment the cluster
scales back up, without waiting for the periodic requeue.
- Watch Secrets in the controller's system namespace and
reverse-parse the ControllerSecretName ("mysql-<ns>.<name>")
back to the CredentialRotation that owns it. Operators who
manually clean up a StalePending source Secret no longer have
to wait for the requeue tick before the controller retries.
- mapClusterToCR / mapSourceSecretToCR are pure mapping helpers;
clusterReplicasChangedPredicate avoids noise from unrelated
cluster updates.
All envtest suites (api/controller/clustering/backup) pass and
make lint is clean.
Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Surface a couple of guardrails that kube-api-linter (KAL) flagged on the new types: - Add an explicit MaxLength=36 on status.rotationID. The existing Pattern constrains the value to a canonical UUID, but MaxLength is the convention KAL expects and also limits the OpenAPI/CRD schema explicitly. - Promote the Conditions field's SSA tags (patchStrategy=merge, patchMergeKey=type) into godoc markers in addition to the struct tag. This matches the modern style upstream Kubernetes uses on metav1.Condition slices and what KAL recommends. Remaining KAL findings on the rest of the v1beta2 API (godoc should start with the json field name, +default vs +kubebuilder:default, etc.) are pre-existing project-wide and are not addressed here to avoid churn unrelated to this PR. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Contributor
|
@shunki-fujita
Proposal: replace with truly independent observationsFor example: RotationReady, DiscardReady, OldPasswordRetained, etc... |
8f6219b to
eea952b
Compare
Move RotationReady / DiscardReady from "generation tracking" (observedX == specX) to "action-availability guard" semantics: - RotationReady=True ⟺ Idle: operator may run `credential rotate`. - DiscardReady=True ⟺ AwaitingDiscard: operator may run `credential discard`. - DualPassword=True ⟺ MySQL holds a dual-password set (unchanged). The two Ready conditions are now mutually exclusive — at most one is True at a time — which removes the "both True" ambiguity that made the printer columns hard to read. Also rename OldPasswordRetained → DualPassword (noun form parallel to MemoryPressure; the past-participle "Retained" was easily misread as "retain completed"). Add a StepAwaitingRollout sub-step between DistributingPassword and AwaitingDiscard so DiscardReady=True flips only after the rolling restart has settled. The post-distribute rollout wait moves out of ClusterManager's DISCARD handler into the Reconciler, matching the "DiscardReady=True means discard is genuinely safe now" semantics. Step() trusts DualPassword as the authoritative physical-state signal in the steady-state branch so legacy CRs persisted by the previous controller (which could carry both Ready conditions True) still classify correctly after upgrade. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
eea952b to
c12cae9
Compare
Per-namespace Secrets are populated with pending passwords during DistributingPassword and must stay that way through every post- distribution step. DiscardRefused/DiscardBlocked were falling through to the default arm and reverting Secrets to current, which would leave restarted Pods with stale credentials after DISCARD eventually completes. Also tighten the ObservedRotationGeneration/ObservedDiscardGeneration godoc to state that spec/observed equality is necessary but not sufficient for the corresponding Ready condition. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Trim from 1136 to 620 lines: drop the Go type/webhook/controller code snippets and fold scattered prose into tables (Responsibility Split, MySQLClusterReconciler step→password map). Keep the step matrix, Reason values, crash-safety table, and runnable recovery shell commands. Fix two inconsistencies surfaced during the rewrite: the reconcileV1Secret table now lists every post-distribution step, and the Conditions section no longer claims RotationReady/DiscardReady are equivalent to IsIdle/IsAwaitingDiscard. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
Tighten ValidateCreate so a CR can only be created in the canonical starting state. Without this, applying a manifest with discardGeneration already set to a non-zero value would short-circuit straight to ApplyingDiscard, skipping AwaitingRollout and running DISCARD before the post-distribute rolling restart settled — leaving Pods that still held the old password without a valid credential after DISCARD. Both fields stay +required (no kubebuilder default) so the intent is visible in the manifest itself. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
…phase The boolean columns were derived from generation comparisons, not from anything actually persisted on the CR — placing them next to the three Conditions in the same table blurred the line between stored status and runtime-computed predicates. Collapse them into a single "Outstanding phase" column with values rotation / discard / —, and add a paragraph that names the column as derived and explains how it disambiguates the three (False, False, True) rows. Signed-off-by: shunki-fujita <shunki-fujita@cybozu.co.jp>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Close: #849
Propose a dedicated CredentialRotation CRD with its own controller as an alternative to embedding password rotation into MySQLCluster.