From 8d4fba8bac5327b1f75b45a1e9be24ccae8d9df4 Mon Sep 17 00:00:00 2001
From: Anastasia Alexadrova
Date: Tue, 28 Apr 2026 12:21:50 +0200
Subject: [PATCH 1/2] K8SPXC-1619 Documented the SST retry limit

---
 docs/operator.md        | 10 ++++++++++
 docs/sst-retry-limit.md | 41 +++++++++++++++++++++++++++++++++++++++++
 mkdocs-base.yml         |  1 +
 3 files changed, 52 insertions(+)
 create mode 100644 docs/sst-retry-limit.md

diff --git a/docs/operator.md b/docs/operator.md
index 1f612dc..ceec0fa 100644
--- a/docs/operator.md
+++ b/docs/operator.md
@@ -365,6 +365,16 @@ Turns [Automatic Crash Recovery](recovery.md#automatic-crash-recovery) on or off
 | ----------- | ---------- |
 | :material-toggle-switch-outline: boolean | `true` |
 
+### `pxc.sstRetryCount`
+
+Limits how many State Snapshot Transfer (SST) retries a joining node can perform before it stops retrying and remains running but unready.
+
+Use this option to avoid endless SST retry loops and make recovery behavior predictable. For details about behavior and recovery steps, see [Limit SST retries](sst-retry-limit.md).
+
+| Value type | Example |
+| ----------- | ---------- |
+| :material-numeric-1-box: int (minimum `1`) | `3` |
+
 ### `pxc.expose.enabled`
 
 Enable or disable exposing Percona XtraDB Cluster instances with dedicated IP addresses.
diff --git a/docs/sst-retry-limit.md b/docs/sst-retry-limit.md
new file mode 100644
index 0000000..806f534
--- /dev/null
+++ b/docs/sst-retry-limit.md
@@ -0,0 +1,41 @@
+# Limit SST retries
+
+When a Percona XtraDB Cluster node joins or rejoins the cluster, it receives data from an existing cluster member using the State Snapshot Transfer (SST) method. If SST fails repeatedly, the node can quickly enter an endless retry loop, using resources such network bandwidth and impacting the overall cluster performance.
+
+To prevent excessive and ineffective SST retry loops, you can set a limit on SST attempts for each joining node using the `spec.pxc.sstRetryCount` option in the Custom Resource. The Operator counts SST retries and records them in the `/var/lib/mysql/sst_retry_count` file inside the Pod.
+
+When the number of SST attempts exceeds the specified threshold, the following occurs:
+
+* The Operator creates the `/var/lib/mysql/sst_retry_limit_reached` marker file and further SST attempts are stopped.
+* Liveness checks on the Pod continue to pass
+* Readiness checks fail
+* The Pod stays running, but remains unready
+* The `SST retry limit reached` message is written in the container logs
+
+This behavior lets you inspect the Pod and decide when to resume retries.
+
+## Configure the retry limit
+
+Set `spec.pxc.sstRetryCount` in your Custom Resource:
+
+```yaml
+apiVersion: pxc.percona.com/v1
+kind: PerconaXtraDBCluster
+metadata:
+  name: cluster1
+spec:
+  pxc:
+    sstRetryCount: 3
+```
+
+The value must be an integer greater than or equal to `1`.
+
+## Resume SST retries
+
+To allow retries again, remove the marker file inside the affected Pod:
+
+```bash
+kubectl exec -it cluster1-pxc-2 -c pxc -- rm -f /var/lib/mysql/sst_retry_limit_reached
+```
+
+The retry state is cleared automatically after the node successfully reaches the `joined` or `synced` state.
diff --git a/mkdocs-base.yml b/mkdocs-base.yml
index 8bd4765..32f6223 100644
--- a/mkdocs-base.yml
+++ b/mkdocs-base.yml
@@ -209,6 +209,7 @@ nav:
     - "Application and system users": users.md
     - "Exposing the cluster": expose.md
     - "Changing MySQL Options": options.md
+    - "Limit SST retries": sst-retry-limit.md
     - "Control Pod scheduling": constraints.md
     - "Labels and annotations": annotations.md
     - "Local Storage support": storage.md

From 555709ffdbd9a58cd336b9a659e689656266ca8e Mon Sep 17 00:00:00 2001
From: Anastasia Alexandrova
Date: Tue, 28 Apr 2026 12:31:17 +0200
Subject: [PATCH 2/2] Update docs/sst-retry-limit.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
---
 docs/sst-retry-limit.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/sst-retry-limit.md b/docs/sst-retry-limit.md
index 806f534..3330b90 100644
--- a/docs/sst-retry-limit.md
+++ b/docs/sst-retry-limit.md
@@ -1,6 +1,6 @@
 # Limit SST retries
 
-When a Percona XtraDB Cluster node joins or rejoins the cluster, it receives data from an existing cluster member using the State Snapshot Transfer (SST) method. If SST fails repeatedly, the node can quickly enter an endless retry loop, using resources such network bandwidth and impacting the overall cluster performance.
+When a Percona XtraDB Cluster node joins or rejoins the cluster, it receives data from an existing cluster member using the State Snapshot Transfer (SST) method. If SST fails repeatedly, the node can quickly enter an endless retry loop, using resources such as network bandwidth and impacting the overall cluster performance.
 
 To prevent excessive and ineffective SST retry loops, you can set a limit on SST attempts for each joining node using the `spec.pxc.sstRetryCount` option in the Custom Resource. The Operator counts SST retries and records them in the `/var/lib/mysql/sst_retry_count` file inside the Pod.
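
The retry state that `docs/sst-retry-limit.md` describes can be checked before the marker file is removed. The following is a minimal shell sketch, assuming the counter file holds a plain integer (the patch does not specify its format). `sst_retry_status` is a hypothetical helper for illustration and is not part of the Operator; only the two file names come from the patch.

```shell
#!/bin/sh
# Hypothetical helper: summarize the SST retry state found in a data directory.
# Assumption: sst_retry_count contains a plain integer; the documented patch
# names the file but does not describe its format.
sst_retry_status() {
    dir="${1:-/var/lib/mysql}"
    count=0
    # Read the recorded retry count if the counter file exists.
    if [ -f "$dir/sst_retry_count" ]; then
        count=$(cat "$dir/sst_retry_count")
    fi
    # The marker file signals that further SST attempts are stopped.
    if [ -e "$dir/sst_retry_limit_reached" ]; then
        echo "limit reached after ${count} retries"
    else
        echo "retrying allowed (${count} retries recorded)"
    fi
}
```

One way to use it, assuming a shell is available in the `pxc` container, is to open a session with `kubectl exec -it cluster1-pxc-2 -c pxc -- sh`, paste the function, and call `sst_retry_status` with no argument so that it reads `/var/lib/mysql`.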