10 changes: 10 additions & 0 deletions docs/operator.md
@@ -365,6 +365,16 @@ Turns [Automatic Crash Recovery](recovery.md#automatic-crash-recovery) on or off
| ----------- | ---------- |
| :material-toggle-switch-outline: boolean | `true` |

### `pxc.sstRetryCount`

Limits how many State Snapshot Transfer (SST) retries a joining node can perform before it stops retrying and remains running but unready.

Use this option to avoid endless SST retry loops and make recovery behavior predictable. For details about behavior and recovery steps, see [Limit SST retries](sst-retry-limit.md).

| Value type | Example |
| ----------- | ---------- |
| :material-numeric-1-box: int (minimum `1`) | `3` |

### `pxc.expose.enabled`

Enable or disable exposing Percona XtraDB Cluster instances with dedicated IP addresses.
41 changes: 41 additions & 0 deletions docs/sst-retry-limit.md
@@ -0,0 +1,41 @@
# Limit SST retries

When a Percona XtraDB Cluster node joins or rejoins the cluster, it receives data from an existing cluster member using the State Snapshot Transfer (SST) method. If SST fails repeatedly, the node can enter an endless retry loop that consumes resources such as network bandwidth and degrades overall cluster performance.

To prevent excessive and ineffective SST retry loops, you can set a limit on SST attempts for each joining node using the `spec.pxc.sstRetryCount` option in the Custom Resource. The Operator counts SST retries and records them in the `/var/lib/mysql/sst_retry_count` file inside the Pod.

When the number of SST attempts exceeds the configured limit, the following happens:

* The Operator creates the `/var/lib/mysql/sst_retry_limit_reached` marker file, and further SST attempts stop.
* Liveness checks on the Pod continue to pass.
* Readiness checks fail.
* The Pod stays running but remains unready.
* The `SST retry limit reached` message is written to the container logs.

This behavior lets you inspect the Pod and decide when to resume retries.
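
As a rough aid for that inspection, the two files described above can be summarized with a small shell function. This is a minimal sketch, not part of the Operator: the function name `sst_retry_status` and its output format are hypothetical, and it prints the counter file verbatim because this page does not describe the file's exact format.

```shell
# Hypothetical helper (not shipped with the Operator): summarize SST retry state.
# The two file names come from this page; the output format is an assumption.
sst_retry_status() {
  datadir="${1:-/var/lib/mysql}"

  count="none"
  # Print the counter file verbatim; its exact format is not documented here.
  if [ -f "$datadir/sst_retry_count" ]; then
    count="$(cat "$datadir/sst_retry_count")"
  fi

  limit="no"
  # The presence of the marker file alone signals that retries have stopped.
  if [ -e "$datadir/sst_retry_limit_reached" ]; then
    limit="yes"
  fi

  echo "retry_count=$count limit_reached=$limit"
}
```

One way to use it is to open a shell in the container, for example with `kubectl exec -it cluster1-pxc-2 -c pxc -- sh`, paste the function, and call `sst_retry_status` with no arguments. The `SST retry limit reached` message can be looked up separately with `kubectl logs cluster1-pxc-2 -c pxc`.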

## Configure the retry limit

Set `spec.pxc.sstRetryCount` in your Custom Resource:

```yaml
apiVersion: pxc.percona.com/v1
kind: PerconaXtraDBCluster
metadata:
  name: cluster1
spec:
  pxc:
    sstRetryCount: 3
```

The value must be an integer greater than or equal to `1`.
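
If you prefer to change the value on a running cluster from the command line, the constraint above can be checked before the patch is sent. The following is a minimal sketch under stated assumptions: `sst_retry_patch` is a hypothetical helper name, and it only builds a JSON merge patch for `spec.pxc.sstRetryCount` after rejecting values that are not integers greater than or equal to `1`.

```shell
# Hypothetical helper: build a JSON merge patch for spec.pxc.sstRetryCount.
# It refuses anything that is not a plain integer >= 1.
sst_retry_patch() {
  case "$1" in
    # Reject empty input, non-digit characters, and values starting with 0
    # (which covers both 0 itself and zero-padded numbers such as 03).
    ''|*[!0-9]*|0*)
      echo "sstRetryCount must be an integer >= 1" >&2
      return 1
      ;;
  esac
  printf '{"spec":{"pxc":{"sstRetryCount":%s}}}\n' "$1"
}
```

The output can then be passed to `kubectl patch`, for example `kubectl patch pxc cluster1 --type=merge -p "$(sst_retry_patch 3)"`. This assumes that the `pxc` short name resolves to the `PerconaXtraDBCluster` resource in your cluster and that the cluster is named `cluster1`, as in the example above.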

## Resume SST retries

To allow retries again, remove the marker file inside the affected Pod:

```bash
kubectl exec -it cluster1-pxc-2 -c pxc -- rm -f /var/lib/mysql/sst_retry_limit_reached
```

The retry state is cleared automatically after the node successfully reaches the `joined` or `synced` state.
1 change: 1 addition & 0 deletions mkdocs-base.yml
@@ -209,6 +209,7 @@ nav:
- "Application and system users": users.md
- "Exposing the cluster": expose.md
- "Changing MySQL Options": options.md
- "Limit SST retries": sst-retry-limit.md
- "Control Pod scheduling": constraints.md
- "Labels and annotations": annotations.md
- "Local Storage support": storage.md