From 5617ebee62235b6fbbc37216f1e13f5e95946aef Mon Sep 17 00:00:00 2001
From: vitaliis
Date: Wed, 17 Jun 2026 13:30:09 -0400
Subject: [PATCH 01/24] Updated tests for GODEBUG fips140 Modes

---
 .../fips/QA_STP_ClickHouse_Backup_FIPS.md | 40 +++---
 .../requirements/fips/requirements.md | 51 ++++++--
 .../requirements/fips/requirements.py | 120 ++++++++++++++----
 3 files changed, 152 insertions(+), 59 deletions(-)

diff --git a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md
index a2e72df9..6b3b36da 100644
--- a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md
+++ b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md
@@ -43,7 +43,7 @@ Test results ensure that `clickhouse-backup`:
 
 To validate this, the following items SHALL be checked:
 
-* The FIPS-built `clickhouse-backup` binary starts with the Go FIPS 140-3 cryptographic module enabled and reports it in `--version` output under all three Go FIPS runtime modes (`GODEBUG` unset, `GODEBUG=fips140=on`, `GODEBUG=fips140=only`).
+* The FIPS-built `clickhouse-backup` binary reports the correct FIPS posture (`enabled` / `enforced`) in `clickhouse-backup-fips --fips-info` output under every Go FIPS runtime mode (`GODEBUG` unset, empty, `fips140=off`, `fips140=on`, `fips140=only`).
 * The FIPS cipher policy is enforced for inbound and outbound TLS when running in strict mode (`GODEBUG=fips140=only`).
 * The binary aborts on startup if the FIPS integrity check or any startup cryptographic self-test fails.
 * The binary stays operational against both FIPS-compatible and non-FIPS-compatible ClickHouse server versions.
@@ -85,7 +85,7 @@ For TLS policy validation, the test suite also uses OpenSSL probe tools:
 | --- | --- | --- |
 | FIPS indicator in binary version output | Run `clickhouse-backup-fips --version` (`clickhouse_backup_fips_version_output`) and run control check on non-FIPS binary (`clickhouse_backup_fips_version_output_negative_check`) | FIPS binary reports `FIPS 140-3: true`; non-FIPS binary does not report `true` |
 | Build flag | Run `go version -m clickhouse-backup-fips` (`gofips140_build_flags_present`) | Output contains `build GOFIPS140=v1.0.0` |
-| FIPS runtime behavior across Go modes | Run `godebug_fips140_modes` with `GODEBUG` unset, `fips140=on`, and `fips140=only` | For each mode, `--version` reports `FIPS 140-3: true`, and `tables` against the FIPS ClickHouse TLS endpoint succeeds (`exit 0`) |
+| FIPS runtime posture across Go modes | Run `godebug_fips140_modes`, which runs `clickhouse-backup-fips --fips-info` with `GODEBUG` unset, empty, `fips140=off`, `fips140=on`, and `fips140=only` | For each mode, `--fips-info` reports the expected `enabled` / `enforced` flags (unset/empty/on → `true`/`false`; off → `false`/`false`; only → `true`/`true`) |
 
 Direct checks of `crypto/fips140.Version()` and `crypto/fips140.Enabled()` are not called as standalone assertions in the current `clickhouse-backup` TestFlows scenarios; their behavior is validated through `--version` output and runtime connectivity checks above.
 
@@ -135,7 +135,7 @@ The following artifacts and tools will be used:
 * `openssl` CLI tool on the test host for TLS client and server probes.
 
 > [!NOTE]
-> The regression sets `GODEBUG` per command rather than at the FIPS container level. The suite covers all three modes documented in [GODEBUG fips140 Modes](#godebug-fips140-modes) (`unset`, `fips140=on`, `fips140=only`), and the forced-CAST scenario also injects `GODEBUG=failfipscast=,fips140=on`; a single container-level value would prevent the matrix and the negative-self-test path from running. The Altinity FIPS Docker image still ships with `GODEBUG=fips140=only` as documented in [FIPS Configuration](#fips-configuration); that default is honored when the image is run as-is.
+> The regression sets `GODEBUG` per command rather than at the FIPS container level. The `godebug_fips140_modes` scenario covers every mode documented in [GODEBUG fips140 Modes](#godebug-fips140-modes) (`unset`, empty, `fips140=off`, `fips140=on`, `fips140=only`), and the forced-CAST scenario also injects `GODEBUG=failfipscast=,fips140=on`; a single container-level value would prevent the matrix and the negative-self-test path from running. The Altinity FIPS Docker image still ships with `GODEBUG=fips140=only` as documented in [FIPS Configuration](#fips-configuration); that default is honored when the image is run as-is.
 
 ## Inputs and Outputs of `clickhouse-backup-fips`
 
@@ -181,32 +181,28 @@ Expected result:
 
 ## GODEBUG `fips140` Modes
 
-Check that `clickhouse-backup-fips` behaves correctly under each of the three Go FIPS runtime modes listed below.
+Check that `clickhouse-backup-fips` reports the correct FIPS posture under every Go FIPS `fips140` runtime mode.
 
-For every mode run both `--version` and a basic `tables` command against the FIPS-compatible Altinity ClickHouse server `altinity/clickhouse-server:25.3.8.30001.altinityfips`.
+The binary exposes its build/runtime posture via `clickhouse-backup-fips --fips-info`, which prints a line-oriented `key: value` dump including, under the `fips_module:` block, `enabled:` and `enforced:` booleans (these map to Go's `crypto/fips140.Enabled()` and `crypto/fips140.Enforced()`). The binary is built with `DefaultGODEBUG=fips140=on`, so leaving `GODEBUG` unset (or empty) keeps FIPS enabled but not enforced.
 
-* `GODEBUG` not set — FIPS mode is enabled by build-time default (`GOFIPS140=v1.0.0`).
+For each mode, run `clickhouse-backup-fips --fips-info` with the corresponding `GODEBUG` value and assert the reported `enabled` / `enforced` flags match the table below (scenario `godebug_fips140_modes`):
 
-  Expected result:
-  * `--version` reports `FIPS 140-3: true`.
-  * `tables` returns the list of tables.
+| `GODEBUG` runtime | `enabled` | `enforced` | Notes |
+| -------------------- | --------- | ---------- | ----- |
+| unset | true | false | Build-time default (`DefaultGODEBUG=fips140=on`). |
+| empty (`GODEBUG=`) | true | false | Same as unset. |
+| `fips140=off` | false | false | FIPS disabled. |
+| `fips140=on` | true | false | FIPS enabled, not enforced. Mode used for the forced CAST test below. |
+| `fips140=only` | true | true | Strict enforcement; any non-approved cryptographic operation triggers an error or panic. Mode used for the TLS policy tests below and the default of the FIPS Docker image. |
 
-* `GODEBUG=fips140=on` — FIPS mode is enabled explicitly without strict enforcement. This is the mode used for the forced CAST test below.
+To set each case explicitly (independent of any container-level `GODEBUG`):
 
-  Expected result:
-  * `--version` reports `FIPS 140-3: true`.
-  * `tables` returns the list of tables.
-
-* `GODEBUG=fips140=only` — FIPS mode is enabled with strict enforcement; any non-approved cryptographic operation triggers an error or panic. This is the mode used for the TLS policy tests below and the default of the FIPS Docker image.
-
-  Expected result:
-  * `--version` reports `FIPS 140-3: true`.
-  * `tables` against an approved TLS configuration returns the list of tables.
-  * Non-approved cryptographic operations cause the binary to fail.
-  * The full `clickhouse-backup` TestFlows regression suite runs in this mode without panics or strict-FIPS-only regressions.
+* unset: `env -u GODEBUG clickhouse-backup-fips --fips-info`
+* empty: `env GODEBUG= clickhouse-backup-fips --fips-info`
+* off / on / only: `env GODEBUG=fips140= clickhouse-backup-fips --fips-info`
 
 > [!NOTE]
-> No negative test exists for "the binary panics when `GODEBUG` is unset". `clickhouse-backup-fips` is built with `GOFIPS140=v1.0.0`, so the FIPS module is enabled by the build flag, not by `GODEBUG`. The "GODEBUG not set" mode above IS the production-default operation; the binary is expected to operate normally there.
+> No negative test exists for "the binary panics when `GODEBUG` is unset". `clickhouse-backup-fips` is built with `GOFIPS140=v1.0.0`, so the FIPS module is enabled by the build flag (`DefaultGODEBUG=fips140=on`), not by the runtime `GODEBUG`. The "GODEBUG unset" mode above IS the production-default operation; the binary is expected to operate normally there.
 
 ## FIPS Integrity Self-test Failure on Tampered Binary
 
diff --git a/test/testflows/clickhouse_backup/requirements/fips/requirements.md b/test/testflows/clickhouse_backup/requirements/fips/requirements.md
index c91c6213..98b6c3cc 100644
--- a/test/testflows/clickhouse_backup/requirements/fips/requirements.md
+++ b/test/testflows/clickhouse_backup/requirements/fips/requirements.md
@@ -29,8 +29,10 @@
     * 4.3.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Version.BuildSetting](#rqsrs-013clickhousebackuputilityfipsversionbuildsetting)
   * 4.4 [GODEBUG fips140 Modes](#godebug-fips140-modes)
     * 4.4.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Unset](#rqsrs-013clickhousebackuputilityfipsgodebugunset)
-    * 4.4.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On](#rqsrs-013clickhousebackuputilityfipsgodebugon)
-    * 4.4.3 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only](#rqsrs-013clickhousebackuputilityfipsgodebugonly)
+    * 4.4.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Empty](#rqsrs-013clickhousebackuputilityfipsgodebugempty)
+    * 4.4.3 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Off](#rqsrs-013clickhousebackuputilityfipsgodebugoff)
+    * 4.4.4 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On](#rqsrs-013clickhousebackuputilityfipsgodebugon)
+    * 4.4.5 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only](#rqsrs-013clickhousebackuputilityfipsgodebugonly)
   * 4.5 [Startup Integrity Self-Tests](#startup-integrity-self-tests)
     * 4.5.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.SelfTest.Integrity](#rqsrs-013clickhousebackuputilityfipsselftestintegrity)
     * 4.5.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.SelfTest.TamperedBinary](#rqsrs-013clickhousebackuputilityfipsselftesttamperedbinary)
@@ -190,29 +192,56 @@ The output of `go version -m $(which clickhouse-backup-fips)` SHALL contain the
 
 ### GODEBUG fips140 Modes
 
+The [clickhouse-backup-fips] binary SHALL expose its FIPS build and runtime posture via
+`clickhouse-backup-fips --fips-info`, which prints a line-oriented `key: value` dump including,
+under the `fips_module:` block, `enabled: ` and `enforced: `. The binary
+is built with `DefaultGODEBUG=fips140=on`, so the `fips140` runtime key SHALL produce the
+following posture:
+
+| `GODEBUG` runtime | `enabled` | `enforced` |
+| ----------------- | --------- | ---------- |
+| unset | true | false |
+| empty (`GODEBUG=`)| true | false |
+| `fips140=off` | false | false |
+| `fips140=on` | true | false |
+| `fips140=only` | true | true |
+
 #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Unset
 version: 1.0
 
-When `GODEBUG` is not set, the [clickhouse-backup-fips] binary SHALL operate with FIPS 140-3 mode
-enabled by build-time default, `--version` SHALL report `FIPS 140-3: true`, and the basic
-`clickhouse-backup-fips tables` command SHALL return the list of tables from a FIPS-configured
-ClickHouse endpoint.
+When `GODEBUG` is not set, the [clickhouse-backup-fips] binary SHALL rely on its build-time default
+(`DefaultGODEBUG=fips140=on`) and operate with FIPS 140-3 mode enabled but not enforced. The
+output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and `enforced: false`.
+
+#### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Empty
+version: 1.0
+
+When started with an empty `GODEBUG` (i.e. `GODEBUG=`), the [clickhouse-backup-fips] binary SHALL
+behave identically to the unset case, relying on its build-time default (`fips140=on`) with
+FIPS 140-3 mode enabled but not enforced. The output of `clickhouse-backup-fips --fips-info`
+SHALL report `enabled: true` and `enforced: false`.
+
+#### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Off
+version: 1.0
+
+When started with `GODEBUG=fips140=off`, the [clickhouse-backup-fips] binary SHALL disable
+FIPS 140-3 mode. The output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: false`
+and `enforced: false`.
 
 #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On
 version: 1.0
 
 When started with `GODEBUG=fips140=on`, the [clickhouse-backup-fips] binary SHALL operate with
-FIPS 140-3 mode enabled without strict enforcement, `--version` SHALL report `FIPS 140-3: true`,
-and the basic `clickhouse-backup-fips tables` command SHALL return the list of tables from a
-FIPS-configured ClickHouse endpoint.
+FIPS 140-3 mode enabled without strict enforcement. The output of
+`clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and `enforced: false`.
 
 #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only
 version: 1.0
 
 When started with `GODEBUG=fips140=only`, the [clickhouse-backup-fips] binary SHALL operate with
 strict FIPS 140-3 enforcement so that any non-approved cryptographic operation triggers an error
-or panic, `--version` SHALL report `FIPS 140-3: true`, and `clickhouse-backup-fips tables` against
-an approved [TLS] configuration SHALL return the list of tables.
+or panic. The output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and
+`enforced: true`.
 
 ### Startup Integrity Self-Tests
 
diff --git a/test/testflows/clickhouse_backup/requirements/fips/requirements.py b/test/testflows/clickhouse_backup/requirements/fips/requirements.py
index b34ba545..7a5b656c 100644
--- a/test/testflows/clickhouse_backup/requirements/fips/requirements.py
+++ b/test/testflows/clickhouse_backup/requirements/fips/requirements.py
@@ -1,6 +1,6 @@
 # These requirements were auto generated
 # from software requirements specification (SRS)
-# document by TestFlows v2.1.240306.1133530.
+# document by TestFlows v2.0.250110.1002922.
 # Do not edit by hand but re-generate instead
 # using 'tfs requirements generate' command.
 from testflows.core import Specification
@@ -201,10 +201,9 @@
     type=None,
     uid=None,
     description=(
-        'When `GODEBUG` is not set, the [clickhouse-backup-fips] binary SHALL operate with FIPS 140-3 mode\n'
-        'enabled by build-time default, `--version` SHALL report `FIPS 140-3: true`, and the basic\n'
-        '`clickhouse-backup-fips tables` command SHALL return the list of tables from a FIPS-configured\n'
-        'ClickHouse endpoint.\n'
+        'When `GODEBUG` is not set, the [clickhouse-backup-fips] binary SHALL rely on its build-time default\n'
+        '(`DefaultGODEBUG=fips140=on`) and operate with FIPS 140-3 mode enabled but not enforced. The\n'
+        'output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and `enforced: false`.\n'
         '\n'
     ),
     link=None,
@@ -212,6 +211,43 @@
     num='4.4.1'
 )
 
+RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Empty = Requirement(
+    name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Empty',
+    version='1.0',
+    priority=None,
+    group=None,
+    type=None,
+    uid=None,
+    description=(
+        'When started with an empty `GODEBUG` (i.e. `GODEBUG=`), the [clickhouse-backup-fips] binary SHALL\n'
+        'behave identically to the unset case, relying on its build-time default (`fips140=on`) with\n'
+        'FIPS 140-3 mode enabled but not enforced. The output of `clickhouse-backup-fips --fips-info`\n'
+        'SHALL report `enabled: true` and `enforced: false`.\n'
+        '\n'
+    ),
+    link=None,
+    level=3,
+    num='4.4.2'
+)
+
+RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Off = Requirement(
+    name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Off',
+    version='1.0',
+    priority=None,
+    group=None,
+    type=None,
+    uid=None,
+    description=(
+        'When started with `GODEBUG=fips140=off`, the [clickhouse-backup-fips] binary SHALL disable\n'
+        'FIPS 140-3 mode. The output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: false`\n'
+        'and `enforced: false`.\n'
+        '\n'
+    ),
+    link=None,
+    level=3,
+    num='4.4.3'
+)
+
 RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_On = Requirement(
     name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On',
     version='1.0',
@@ -221,14 +257,13 @@
     uid=None,
     description=(
         'When started with `GODEBUG=fips140=on`, the [clickhouse-backup-fips] binary SHALL operate with\n'
-        'FIPS 140-3 mode enabled without strict enforcement, `--version` SHALL report `FIPS 140-3: true`,\n'
-        'and the basic `clickhouse-backup-fips tables` command SHALL return the list of tables from a\n'
-        'FIPS-configured ClickHouse endpoint.\n'
+        'FIPS 140-3 mode enabled without strict enforcement. The output of\n'
+        '`clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and `enforced: false`.\n'
         '\n'
     ),
     link=None,
     level=3,
-    num='4.4.2'
+    num='4.4.4'
 )
 
 RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Only = Requirement(
@@ -241,13 +276,13 @@
     description=(
         'When started with `GODEBUG=fips140=only`, the [clickhouse-backup-fips] binary SHALL operate with\n'
         'strict FIPS 140-3 enforcement so that any non-approved cryptographic operation triggers an error\n'
-        'or panic, `--version` SHALL report `FIPS 140-3: true`, and `clickhouse-backup-fips tables` against\n'
-        'an approved [TLS] configuration SHALL return the list of tables.\n'
+        'or panic. The output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and\n'
+        '`enforced: true`.\n'
         '\n'
     ),
     link=None,
     level=3,
-    num='4.4.3'
+    num='4.4.5'
 )
 
 RQ_SRS_013_ClickHouse_BackupUtility_FIPS_SelfTest_Integrity = Requirement(
@@ -608,8 +643,10 @@
         Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Version.BuildSetting', level=3, num='4.3.2'),
         Heading(name='GODEBUG fips140 Modes', level=2, num='4.4'),
         Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Unset', level=3, num='4.4.1'),
-        Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On', level=3, num='4.4.2'),
-        Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only', level=3, num='4.4.3'),
+        Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Empty', level=3, num='4.4.2'),
+        Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Off', level=3, num='4.4.3'),
+        Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On', level=3, num='4.4.4'),
+        Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only', level=3, num='4.4.5'),
         Heading(name='Startup Integrity Self-Tests', level=2, num='4.5'),
         Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.SelfTest.Integrity', level=3, num='4.5.1'),
         Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.SelfTest.TamperedBinary', level=3, num='4.5.2'),
@@ -646,6 +683,8 @@
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Version_Status,
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Version_BuildSetting,
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Unset,
+        RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Empty,
+        RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Off,
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_On,
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Only,
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_SelfTest_Integrity,
@@ -663,7 +702,7 @@
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Configuration_SecureClickHouse,
         RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Server_Listener,
     ),
-    content='''
+    content=r'''
 # QA-SRS013 ClickHouse Backup Utility FIPS Compatibility
 # Software Requirements Specification
 
@@ -695,8 +734,10 @@
     * 4.3.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Version.BuildSetting](#rqsrs-013clickhousebackuputilityfipsversionbuildsetting)
   * 4.4 [GODEBUG fips140 Modes](#godebug-fips140-modes)
     * 4.4.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Unset](#rqsrs-013clickhousebackuputilityfipsgodebugunset)
-    * 4.4.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On](#rqsrs-013clickhousebackuputilityfipsgodebugon)
-    * 4.4.3 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only](#rqsrs-013clickhousebackuputilityfipsgodebugonly)
+    * 4.4.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Empty](#rqsrs-013clickhousebackuputilityfipsgodebugempty)
+    * 4.4.3 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Off](#rqsrs-013clickhousebackuputilityfipsgodebugoff)
+    * 4.4.4 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On](#rqsrs-013clickhousebackuputilityfipsgodebugon)
+    * 4.4.5 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only](#rqsrs-013clickhousebackuputilityfipsgodebugonly)
   * 4.5 [Startup Integrity Self-Tests](#startup-integrity-self-tests)
     * 4.5.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.SelfTest.Integrity](#rqsrs-013clickhousebackuputilityfipsselftestintegrity)
     * 4.5.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.SelfTest.TamperedBinary](#rqsrs-013clickhousebackuputilityfipsselftesttamperedbinary)
@@ -856,29 +897,56 @@
 
 ### GODEBUG fips140 Modes
 
+The [clickhouse-backup-fips] binary SHALL expose its FIPS build and runtime posture via
+`clickhouse-backup-fips --fips-info`, which prints a line-oriented `key: value` dump including,
+under the `fips_module:` block, `enabled: ` and `enforced: `. The binary
+is built with `DefaultGODEBUG=fips140=on`, so the `fips140` runtime key SHALL produce the
+following posture:
+
+| `GODEBUG` runtime | `enabled` | `enforced` |
+| ----------------- | --------- | ---------- |
+| unset | true | false |
+| empty (`GODEBUG=`)| true | false |
+| `fips140=off` | false | false |
+| `fips140=on` | true | false |
+| `fips140=only` | true | true |
+
 #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Unset
 version: 1.0
 
-When `GODEBUG` is not set, the [clickhouse-backup-fips] binary SHALL operate with FIPS 140-3 mode
-enabled by build-time default, `--version` SHALL report `FIPS 140-3: true`, and the basic
-`clickhouse-backup-fips tables` command SHALL return the list of tables from a FIPS-configured
-ClickHouse endpoint.
+When `GODEBUG` is not set, the [clickhouse-backup-fips] binary SHALL rely on its build-time default
+(`DefaultGODEBUG=fips140=on`) and operate with FIPS 140-3 mode enabled but not enforced. The
+output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and `enforced: false`.
+
+#### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Empty
+version: 1.0
+
+When started with an empty `GODEBUG` (i.e. `GODEBUG=`), the [clickhouse-backup-fips] binary SHALL
+behave identically to the unset case, relying on its build-time default (`fips140=on`) with
+FIPS 140-3 mode enabled but not enforced. The output of `clickhouse-backup-fips --fips-info`
+SHALL report `enabled: true` and `enforced: false`.
+
+#### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Off
+version: 1.0
+
+When started with `GODEBUG=fips140=off`, the [clickhouse-backup-fips] binary SHALL disable
+FIPS 140-3 mode. The output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: false`
+and `enforced: false`.
 
 #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.On
 version: 1.0
 
 When started with `GODEBUG=fips140=on`, the [clickhouse-backup-fips] binary SHALL operate with
-FIPS 140-3 mode enabled without strict enforcement, `--version` SHALL report `FIPS 140-3: true`,
-and the basic `clickhouse-backup-fips tables` command SHALL return the list of tables from a
-FIPS-configured ClickHouse endpoint.
+FIPS 140-3 mode enabled without strict enforcement. The output of
+`clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and `enforced: false`.
 
 #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GODEBUG.Only
 version: 1.0
 
 When started with `GODEBUG=fips140=only`, the [clickhouse-backup-fips] binary SHALL operate with
 strict FIPS 140-3 enforcement so that any non-approved cryptographic operation triggers an error
-or panic, `--version` SHALL report `FIPS 140-3: true`, and `clickhouse-backup-fips tables` against
-an approved [TLS] configuration SHALL return the list of tables.
+or panic. The output of `clickhouse-backup-fips --fips-info` SHALL report `enabled: true` and
+`enforced: true`.
 
 ### Startup Integrity Self-Tests
 

From 6c86ac3933a825572e32fc72a28bc4558ce1e131 Mon Sep 17 00:00:00 2001
From: vitaliis
Date: Wed, 17 Jun 2026 14:16:10 -0400
Subject: [PATCH 02/24] -Added tests for stress mode - Added godebug_fips140_modes scenario

---
 .../testflows/clickhouse_backup/regression.py | 8 +-
 .../clickhouse_backup/tests/fips_140_3.py | 247 +++++++++++++++---
 2 files changed, 219 insertions(+), 36 deletions(-)

diff --git a/test/testflows/clickhouse_backup/regression.py b/test/testflows/clickhouse_backup/regression.py
index 2db9447c..39311965 100755
--- a/test/testflows/clickhouse_backup/regression.py
+++ b/test/testflows/clickhouse_backup/regression.py
@@ -18,9 +18,7 @@
 
 from clickhouse_backup.requirements.requirements import *
 
-from clickhouse_backup.requirements.fips.requirements import (
-    QA_SRS013_ClickHouse_Backup_Utility_FIPS_Compatibility,
-)
+from clickhouse_backup.requirements.fips.requirements import *
 from clickhouse_backup.tests.common import simple_data_types_columns
 
 # `--fips-godebug` choices mapped to the `GODEBUG` value exported on
@@ -31,6 +29,7 @@
 # * `only` - FIPS active with strict enforcement (default).
 # * `off` - FIPS disabled at runtime.
 FIPS_GODEBUG_VALUES = {
+    "empty": "",
     "unset": None,
     "on": "fips140=on",
     "only": "fips140=only",
@@ -105,6 +104,9 @@ def regression(self, local, stress=False, fips=True, fips_godebug="only"):
         self.context.backup_config_origin = origin_path
         self.context.backup_config_file = config_path
         self.context.cluster = cluster
+        # `--stress` widens the FIPS cipher/suite coverage (see tests/fips_140_3.py).
+        # Default runs keep the documented minimum so they stay fast.
+        self.context.stress = stress
         self.context.nodes = [self.context.cluster.node(n) for n in ["clickhouse1", "clickhouse2"]]
         self.context.backup = self.context.cluster.node("clickhouse_backup")
         self.context.kafka = self.context.cluster.node("kafka")
diff --git a/test/testflows/clickhouse_backup/tests/fips_140_3.py b/test/testflows/clickhouse_backup/tests/fips_140_3.py
index af913e5d..0b544eef 100644
--- a/test/testflows/clickhouse_backup/tests/fips_140_3.py
+++ b/test/testflows/clickhouse_backup/tests/fips_140_3.py
@@ -46,27 +46,57 @@
     "TLS_AES_256_GCM_SHA384",
 )
 
-# Single non-approved TLSv1.3 suite
-NON_FIPS_TLS13 = ("TLS_CHACHA20_POLY1305_SHA256",)
-
-# Non-FIPS TLS1.2 suites for the outbound verification.
-# The set covers three different reasons a TLS1.2 cipher must be rejected
-# by the Go FIPS 140-3 outbound policy:
-#
-# 1. Non-approved bulk cipher (CHACHA20):
-# ECDHE-RSA-CHACHA20-POLY1305
-# 2. Non-approved key exchange (DHE):
-# DHE-RSA-AES256-GCM-SHA384
-# DHE-RSA-AES128-GCM-SHA256
-# 3. Plain RSA static key exchange (no forward secrecy):
-# AES256-GCM-SHA384
-# AES128-GCM-SHA256
+# Non-approved TLSv1.3 suites the FIPS policy must reject.
+# Default runs probe the single documented suite; the `_STRESS` list adds the
+# remaining non-approved TLSv1.3 suites (CCM bulk ciphers, outside the GCM-only
+# FIPS-approved set). The wider `--stress` coverage is slower by design.
+NON_FIPS_TLS13 = (
+    "TLS_CHACHA20_POLY1305_SHA256",
+)
+NON_FIPS_TLS13_STRESS = NON_FIPS_TLS13 + (
+    "TLS_AES_128_CCM_SHA256",
+    "TLS_AES_128_CCM_8_SHA256",
+)
+
+# Non-approved TLSv1.2 ciphers the FIPS outbound policy must reject, one per
+# rejection reason:
+# * non-approved bulk cipher (ChaCha20): ECDHE-RSA-CHACHA20-POLY1305
+# * non-approved key exchange (DHE): DHE-RSA-AES256-GCM-SHA384 / -AES128-
+# * plain RSA key exchange (no forward secrecy): AES256-GCM-SHA384 / AES128-GCM-SHA256
+# The `_STRESS` list adds more ciphers from the same rejection classes
+# (ECDSA/DHE ChaCha20 and CBC-mode ciphers outside the GCM-only approved set).
 NON_FIPS_TLS12_OUTBOUND = (
-"ECDHE-RSA-CHACHA20-POLY1305",
-"DHE-RSA-AES256-GCM-SHA384",
-"DHE-RSA-AES128-GCM-SHA256",
-"AES256-GCM-SHA384",
-"AES128-GCM-SHA256",
+    "ECDHE-RSA-CHACHA20-POLY1305",
+    "DHE-RSA-AES256-GCM-SHA384",
+    "DHE-RSA-AES128-GCM-SHA256",
+    "AES256-GCM-SHA384",
+    "AES128-GCM-SHA256",
+)
+NON_FIPS_TLS12_OUTBOUND_STRESS = NON_FIPS_TLS12_OUTBOUND + (
+    "ECDHE-ECDSA-CHACHA20-POLY1305",
+    "DHE-RSA-CHACHA20-POLY1305",
+    "ECDHE-RSA-AES128-SHA",
+    "ECDHE-RSA-AES256-SHA",
+    "ECDHE-RSA-AES128-SHA256",
+    "ECDHE-RSA-AES256-SHA384",
+    "AES128-SHA",
+    "AES256-SHA",
+    "AES128-SHA256",
+    "AES256-SHA256",
+)
+
+# Non-approved TLSv1.2 ciphers the inbound REST API listener must reject. The
+# base adds RC4 / 3DES on top of the shared ChaCha20 reject; the `_STRESS` list
+# reuses the outbound stress set (which already covers ChaCha20) and adds only
+# the inbound-specific RC4 / 3DES.
+NON_FIPS_TLS12_INBOUND = (
+    "ECDHE-RSA-CHACHA20-POLY1305",
+    "RC4-SHA",
+    "DES-CBC3-SHA",
+)
+NON_FIPS_TLS12_INBOUND_STRESS = NON_FIPS_TLS12_OUTBOUND_STRESS + (
+    "RC4-SHA",
+    "DES-CBC3-SHA",
 )
 
 CLI_CMD_TIMEOUT_SEC = 15 # Timeout for clickhouse-backup-fips command runs.
@@ -581,6 +611,131 @@ def connectivity_against_non_fips_clickhouse_server(self):
 
         cluster.stop_auxiliary_container(NON_FIPS_CH_SERVER_NAME)
 
+
+
+
+def _godebug_env_prefix(godebug):
+    """Return the `env ...` command prefix that applies one GODEBUG case.
+
+    `None` strips any inherited `GODEBUG` (`env -u GODEBUG`), `""` sets it
+    empty (`env GODEBUG=`), and any other value sets `GODEBUG=fips140=`.
+    Setting it explicitly per case makes the test independent of the suite-wide
+    `--fips-godebug` selection that the container otherwise exports.
+    """
+    if godebug is None:
+        return "env -u GODEBUG "
+    if godebug == "":
+        return "env GODEBUG= "
+    return f"env GODEBUG=fips140={godebug} "
+
+
+def _read_fips_info_field(output, field):
+    """Return the value of a `:` line from `--fips-info` output."""
+    for line in output.splitlines():
+        stripped = line.strip()
+        if stripped.startswith(f"{field}:"):
+            return stripped.split(":", 1)[1].strip()
+    return None
+
+
+@TestStep(Then)
+def _check_fips_info_values(self, backup_fips, *, name, godebug,
+                            expected_enabled, expected_enforced):
+    """Run `--fips-info` for one GODEBUG case and assert enabled/enforced.
+
+    Asserts the command succeeds and that the `fips_module` block reports the
+    expected `enabled` / `enforced` booleans for the given GODEBUG mode.
+    """
+    cmd = f"{_godebug_env_prefix(godebug)}{FIPS_BINARY_IN_CONTAINER} --fips-info"
+    result = backup_fips.cmd(cmd, no_checks=True)
+    output = result.output or ""
+
+    assert result.exitcode == 0, error(
+        f"`--fips-info` failed for GODEBUG mode `{name}` "
+        f"(exit={result.exitcode}).\n{output}"
+    )
+
+    enabled = _read_fips_info_field(output, "enabled")
+    enforced = _read_fips_info_field(output, "enforced")
+    want_enabled = str(expected_enabled).lower()
+    want_enforced = str(expected_enforced).lower()
+
+    assert enabled == want_enabled, error(
+        f"GODEBUG mode `{name}`: expected `enabled: {want_enabled}`, "
+        f"got `enabled: {enabled}`.\n{output}"
+    )
+    assert enforced == want_enforced, error(
+        f"GODEBUG mode `{name}`: expected `enforced: {want_enforced}`, "
+        f"got `enforced: {enforced}`.\n{output}"
+    )
+
+
+@TestScenario
+@Requirements(
+    RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Unset("1.0"),
+    RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Empty("1.0"),
+    RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Off("1.0"),
+    RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_On("1.0"),
+    RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GODEBUG_Only("1.0"),
+)
+def godebug_fips140_modes(self):
+    """Validate `--fips-info` FIPS posture for every GODEBUG fips140 mode.
+
+    For each documented GODEBUG runtime mode (`unset`, empty, `fips140=off`,
+    `fips140=on`, `fips140=only`), run `clickhouse-backup-fips --fips-info` and
+    assert the reported `enabled` / `enforced` flags match the expected truth
+    table:
+
+        GODEBUG runtime   enabled   enforced
+        unset             true      false
+        empty ("")        true      false
+        fips140=off       false     false
+        fips140=on        true      false
+        fips140=only      true      true
+
+    Each mode is set explicitly per command, so the result does not depend on
+    the suite-wide `--fips-godebug` selection.
+    """
+    # Expected `clickhouse-backup-fips --fips-info` posture for every GODEBUG
+    # fips140 runtime mode. The FIPS binary is built with `DefaultGODEBUG` set to
+    # `fips140=on`, so leaving GODEBUG unset or empty keeps FIPS *enabled* but not
+    # *enforced*; `fips140=on` is the same; `fips140=only` adds strict enforcement;
+    # `fips140=off` disables FIPS entirely.
+    #
+    #   GODEBUG runtime       enabled   enforced
+    #   -------------------   -------   --------
+    #   unset                 true      false
+    #   empty ("")            true      false
+    #   fips140=off           false     false
+    #   fips140=on            true      false
+    #   fips140=only          true      true
+    #
+    # Each tuple is (case name, GODEBUG value, expected enabled, expected enforced),
+    # where the GODEBUG value is:
+    #   None -> GODEBUG removed from the environment (the "unset" case),
+    #   ""   -> GODEBUG present but empty,
+    #   else -> GODEBUG=fips140=.
+    FIPS_GODEBUG_INFO_CASES = [
+        ("unset", None, True, False),
+        ("empty", "", True, False),
+        ("off", "off", False, False),
+        ("on", "on", True, False),
+        ("only", "only", True, True),
+    ]
+    backup_fips = _require_fips_container(self)
+
+    for name, godebug, expected_enabled, expected_enforced in FIPS_GODEBUG_INFO_CASES:
+        with Check(f"GODEBUG mode `{name}` reports "
+                   f"enabled={expected_enabled} enforced={expected_enforced}"):
+            _check_fips_info_values(
+                backup_fips=backup_fips,
+                name=name,
+                godebug=godebug,
+                expected_enabled=expected_enabled,
+                expected_enforced=expected_enforced,
+            )
+
+
 @TestScenario
 @Requirements(
     RQ_SRS_013_ClickHouse_BackupUtility_FIPS_SelfTest_Integrity("1.0"),
@@ -661,12 +816,18 @@ def inbound_tls_cipher_negotiation(self):
     (`/etc/clickhouse-server/ssl/server.{crt,key}`), inherited via
     `volumes_from_name="clickhouse1"` on the FIPS container.
     """
-    # Non-FIPS TLS1.2 suites
-    NON_FIPS_TLS12_INBOUND_REST = (
-        "ECDHE-RSA-CHACHA20-POLY1305",
-        "RC4-SHA",
-        "DES-CBC3-SHA",
-    )
+    # Non-approved profiles the REST API listener must reject. The TLSv1.2 base
+    # adds RC4 / 3DES (specific to the inbound case); the TLSv1.3 base is the
+    # shared non-approved suite. `--stress` widens both with the broader stress
+    # sets so legacy / CBC ciphers are exercised here too; default keeps the
+    # minimum. `STRESS` lists already include their base, so they are assigned,
+    # not appended, to avoid probing the same cipher twice.
+    non_fips_tls12 = NON_FIPS_TLS12_INBOUND
+    non_fips_tls13 = NON_FIPS_TLS13
+
+    if self.context.stress:
+        non_fips_tls12 = NON_FIPS_TLS12_INBOUND_STRESS
+        non_fips_tls13 = NON_FIPS_TLS13_STRESS
 
     backup_fips = _require_fips_container(self)
 
@@ -701,7 +862,7 @@ def inbound_tls_cipher_negotiation(self):
                 )
 
     with And("I try to connect using each non-FIPS TLSv1.3 cipher suite"):
-        for ciphersuite in NON_FIPS_TLS13:
+        for ciphersuite in non_fips_tls13:
             with Check(f"TLSv1.3 ciphersuite {ciphersuite} should be rejected"):
                 _check_tls_handshake(
                     node=backup_fips, target=target, tls_flag="-tls1_3",
@@ -709,7 +870,7 @@ def inbound_tls_cipher_negotiation(self):
                 )
 
     with And("I try to connect using each non-FIPS TLSv1.2 cipher"):
-        for cipher in NON_FIPS_TLS12_INBOUND_REST:
+        for cipher in non_fips_tls12:
             with Check(f"TLSv1.2 cipher {cipher} should be rejected"):
                 _check_tls_handshake(
                     node=backup_fips, target=target, tls_flag="-tls1_2",
@@ -772,6 +933,10 @@ def outbound_tls_cipher_negotiation(self):
         f"-c {FIPS_OUTBOUND_CH_CONFIG_PATH} tables 2>&1" # `2>&1` redirects stderr to stdout.
     )
 
+    # `--stress` widens the non-approved coverage; default keeps the minimum.
+ non_fips_tls13 = NON_FIPS_TLS13_STRESS if self.context.stress else NON_FIPS_TLS13 + non_fips_tls12 = NON_FIPS_TLS12_OUTBOUND_STRESS if self.context.stress else NON_FIPS_TLS12_OUTBOUND + with When("I try each FIPS-approved TLSv1.3 cipher suite on the CH endpoint"): for ciphersuite in FIPS_TLS13_APPROVED: with Check(f"TLSv1.3 ciphersuite {ciphersuite} should be accepted"): @@ -791,7 +956,7 @@ def outbound_tls_cipher_negotiation(self): ) with And("I try each non-FIPS TLSv1.3 cipher suite on the CH endpoint"): - for ciphersuite in NON_FIPS_TLS13: + for ciphersuite in non_fips_tls13: with Check(f"TLSv1.3 ciphersuite {ciphersuite} should be rejected"): _check_outbound_tls_with_cipher( cluster=cluster, backup_fips=backup_fips, @@ -800,7 +965,7 @@ def outbound_tls_cipher_negotiation(self): ) with And("I try each non-FIPS TLSv1.2 cipher on the CH endpoint"): - for cipher in NON_FIPS_TLS12_OUTBOUND: + for cipher in non_fips_tls12: with Check(f"TLSv1.2 cipher {cipher} should be rejected"): _check_outbound_tls_with_cipher( cluster=cluster, backup_fips=backup_fips, @@ -853,6 +1018,10 @@ def outbound_tls_to_s3_endpoint_with_openssl_s_server(self): f"-c {FIPS_OUTBOUND_S3_CONFIG_PATH} list remote 2>&1" # `2>&1` redirects stderr to stdout. ) + # `--stress` widens the non-approved coverage; default keeps the minimum. 
+ non_fips_tls13 = NON_FIPS_TLS13_STRESS if self.context.stress else NON_FIPS_TLS13 + non_fips_tls12 = NON_FIPS_TLS12_OUTBOUND_STRESS if self.context.stress else NON_FIPS_TLS12_OUTBOUND + with Check("I try each FIPS-approved TLSv1.3 cipher suite on the S3 endpoint"): for ciphersuite in FIPS_TLS13_APPROVED: with Check(f"TLSv1.3 ciphersuite {ciphersuite} should be accepted"): @@ -874,7 +1043,7 @@ def outbound_tls_to_s3_endpoint_with_openssl_s_server(self): ) with Check("I try each non-FIPS TLSv1.3 cipher suite on the S3 endpoint"): - for ciphersuite in NON_FIPS_TLS13: + for ciphersuite in non_fips_tls13: with Check(f"TLSv1.3 ciphersuite {ciphersuite} should be rejected"): _check_outbound_tls_with_cipher( cluster=cluster, backup_fips=backup_fips, @@ -885,7 +1054,7 @@ def outbound_tls_to_s3_endpoint_with_openssl_s_server(self): ) with Check("I try each non-FIPS TLSv1.2 cipher on the S3 endpoint"): - for cipher in NON_FIPS_TLS12_OUTBOUND: + for cipher in non_fips_tls12: with Check(f"TLSv1.2 cipher {cipher} should be rejected"): _check_outbound_tls_with_cipher( cluster=cluster, backup_fips=backup_fips, @@ -1139,6 +1308,11 @@ def outbound_tls_to_nonfips_clickhouse_with_cipher_profile(self): `clickhouse-backup-fips tables` end-to-end and asserts the command succeeds (TLS + native CH protocol). + With `--stress` the server instead offers the full documented + FIPS-approved cipher set (`listeners-fips-cipher-stress.xml`), so the + end-to-end run is exercised against the same cipher list a real + FIPS-compatible server advertises. + Cipher-policy rejection (FIPS binary refusing non-approved ciphers) is covered deterministically by `outbound_tls_cipher_negotiation` using `openssl s_server`; replaying the negative case here would be flaky @@ -1147,9 +1321,15 @@ def outbound_tls_to_nonfips_clickhouse_with_cipher_profile(self): """ backup_fips = _require_fips_container(self) cluster = self.context.cluster + + # Default: a single FIPS-approved cipher. 
`--stress`: the full documented set. + listeners_basename = ( + "listeners-fips-cipher-stress.xml" if self.context.stress + else "listeners-fips-cipher.xml" + ) listeners_xml = os.path.join( cluster.tests_dir, - "configs/clickhouse_nonfips_server/config.d/listeners-fips-cipher.xml", + f"configs/clickhouse_nonfips_server/config.d/{listeners_basename}", ) try: @@ -1260,6 +1440,7 @@ def fips_140_3(self): Scenario(run=gofips140_build_flags_present, flags=TE) Scenario(run=connectivity_against_non_fips_clickhouse_server, flags=TE) Scenario(run=connectivity_against_fips_clickhouse_server, flags=TE) + Scenario(run=godebug_fips140_modes, flags=TE) Scenario(run=fips_integrity_self_test_failure_on_tampered_binary, flags=TE) Scenario(run=inbound_tls_cipher_negotiation, flags=TE) Scenario(run=outbound_tls_cipher_negotiation, flags=TE) From d6e84d121025af734ce18dc87c2af5b362a9b461 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 18:03:15 +0400 Subject: [PATCH 03/24] add context cancel handler success unit tests, fix https://github.com/Altinity/clickhouse-backup/issues/1365, again, improve TestKill which actually test .pid remove --- pkg/server/server.go | 20 +-- pkg/storage/azblob.go | 8 +- pkg/storage/download_cancel_test.go | 154 ++++++++++++++++ pkg/storage/general.go | 51 +++++- pkg/storage/object_disk/cancel_test.go | 96 ++++++++++ pkg/storage/object_disk/object_disk.go | 26 ++- pkg/storage/upload_cancel_test.go | 105 +++++++++++ test/integration/kill_test.go | 238 ++++++++++++++++++++++++- 8 files changed, 672 insertions(+), 26 deletions(-) create mode 100644 pkg/storage/download_cancel_test.go create mode 100644 pkg/storage/object_disk/cancel_test.go create mode 100644 pkg/storage/upload_cancel_test.go diff --git a/pkg/server/server.go b/pkg/server/server.go index faf9c4e2..0b14d4b8 100644 --- a/pkg/server/server.go +++ b/pkg/server/server.go @@ -210,17 +210,17 @@ func (api *APIServer) Restart() error { } }() return nil - } else { - go func() { - if err = 
api.server.ListenAndServe(); err != nil { - if errors.Is(err, http.ErrServerClosed) { - log.Warn().Msgf("ListenAndServe get signal: %s", err.Error()) - } else { - log.Fatal().Stack().Msgf("ListenAndServe error: %s", err.Error()) - } - } - }() } + + go func() { + if err = api.server.ListenAndServe(); err != nil { + if errors.Is(err, http.ErrServerClosed) { + log.Warn().Msgf("ListenAndServe get signal: %s", err.Error()) + } else { + log.Fatal().Stack().Msgf("ListenAndServe error: %s", err.Error()) + } + } + }() return nil } diff --git a/pkg/storage/azblob.go b/pkg/storage/azblob.go index a8b15e44..06dee8d6 100644 --- a/pkg/storage/azblob.go +++ b/pkg/storage/azblob.go @@ -410,7 +410,13 @@ func (a *AzureBlob) CopyObject(ctx context.Context, srcSize int64, srcBucket, sr sleepDuration := time.Millisecond * 50 for copyStatus == blob.CopyStatusTypePending { // @TODO think how to avoid polling GetProperties in AZBLOB during CopyObject - time.Sleep(sleepDuration * time.Duration(pollCount*2)) + // honor context cancellation during the backoff so /backup/kill returns + // promptly instead of sleeping up to ~800ms before the next poll + select { + case <-ctx.Done(): + return 0, ctx.Err() + case <-time.After(sleepDuration * time.Duration(pollCount*2)): + } dstMeta, err := destinationBlob.GetProperties(ctx, &blob.GetPropertiesOptions{}) if err != nil { return 0, errors.Wrap(err, "azblob->CopyObject failed to destinationBlobURL.GetProperties operation") diff --git a/pkg/storage/download_cancel_test.go b/pkg/storage/download_cancel_test.go new file mode 100644 index 00000000..82d21917 --- /dev/null +++ b/pkg/storage/download_cancel_test.go @@ -0,0 +1,154 @@ +package storage + +import ( + "context" + "io" + "path/filepath" + "sync" + "testing" + "time" + + "github.com/eapache/go-resiliency/retrier" +) + +// blockingReadCloser blocks on every Read until Close is called, then reports +// EOF. 
It models a remote body / extract stream that has stalled (slow or +// half-open network, disk backpressure) and only unblocks when something +// closes the underlying reader. +type blockingReadCloser struct { + closed chan struct{} + closeOnce sync.Once + readStarted chan struct{} + startOnce sync.Once +} + +func (b *blockingReadCloser) Read(_ []byte) (int, error) { + b.startOnce.Do(func() { close(b.readStarted) }) + <-b.closed + return 0, io.EOF +} + +func (b *blockingReadCloser) Close() error { + b.closeOnce.Do(func() { close(b.closed) }) + return nil +} + +type fakeRemoteFile struct{ size int64 } + +func (f fakeRemoteFile) Size() int64 { return f.size } +func (f fakeRemoteFile) Name() string { return "part.tar" } +func (f fakeRemoteFile) LastModified() time.Time { return time.Time{} } + +// blockingRemote is a RemoteStorage whose download reader never returns data +// until closed. Only the methods DownloadCompressedStream needs are +// implemented; the embedded nil interface satisfies the rest (never called). +type blockingRemote struct { + RemoteStorage + r *blockingReadCloser +} + +func (m *blockingRemote) StatFile(_ context.Context, _ string) (RemoteFile, error) { + return fakeRemoteFile{size: 1024}, nil +} + +func (m *blockingRemote) GetFileReaderWithLocalPath(_ context.Context, _, _ string, _ int64) (io.ReadCloser, error) { + return m.r, nil +} + +// failClassifier never retries, so the retrier returns on the first attempt. +type failClassifier struct{} + +func (failClassifier) Classify(err error) retrier.Action { + if err == nil { + return retrier.Succeed + } + return retrier.Fail +} + +// downloadPathRemote enumerates a single file whose reader blocks until closed. 
+type downloadPathRemote struct {
+	RemoteStorage
+	r *blockingReadCloser
+}
+
+func (m *downloadPathRemote) Kind() string { return "S3" }
+
+func (m *downloadPathRemote) Walk(ctx context.Context, _ string, _ bool, fn func(context.Context, RemoteFile) error) error {
+	return fn(ctx, fakeRemoteFile{size: 1024})
+}
+
+func (m *downloadPathRemote) GetFileReader(_ context.Context, _ string) (io.ReadCloser, error) {
+	return m.r, nil
+}
+
+// TestDownloadPathCancel is the per-file (non-archive) counterpart of
+// TestDownloadCompressedStreamCancel: DownloadPath copies a remote reader into
+// a local file via copyWithBuffer and must return promptly on context cancel
+// even when that read is stalled.
+func TestDownloadPathCancel(t *testing.T) {
+	br := &blockingReadCloser{closed: make(chan struct{}), readStarted: make(chan struct{})}
+	bd := &BackupDestination{RemoteStorage: &downloadPathRemote{r: br}}
+
+	ctx, cancel := context.WithCancel(context.Background())
+	defer cancel()
+
+	done := make(chan error, 1)
+	go func() {
+		_, err := bd.DownloadPath(ctx, "remote/path", t.TempDir(), 0, time.Second, 0, failClassifier{}, 0)
+		done <- err
+	}()
+
+	select {
+	case <-br.readStarted:
+	case <-time.After(5 * time.Second):
+		t.Fatal("DownloadPath never started reading the remote stream")
+	}
+	cancel()
+
+	select {
+	case <-done:
+		// Returned after cancel — correct behavior.
+	case <-time.After(5 * time.Second):
+		t.Fatal("DownloadPath did not return within 5s after context cancel; " +
+			"a stalled read is not honoring cancellation")
+	}
+}
+
+// TestDownloadCompressedStreamCancel reproduces the production hang where
+// /backup/kill cancels the command context but a download stuck in a stalled
+// read/extract keeps running (observed: a download completing 6.32TiB hours
+// after kill). DownloadCompressedStream must return promptly once the context
+// is cancelled, even when the underlying read is blocked.
+func TestDownloadCompressedStreamCancel(t *testing.T) { + br := &blockingReadCloser{closed: make(chan struct{}), readStarted: make(chan struct{})} + bd := &BackupDestination{ + RemoteStorage: &blockingRemote{r: br}, + compressionFormat: "tar", + pipeBufferSize: 1024 * 1024, + } + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + done := make(chan error, 1) + go func() { + _, err := bd.DownloadCompressedStream(ctx, "shadow/db/tbl/part.tar", filepath.Join(t.TempDir(), "out"), 0) + done <- err + }() + + // Wait until the stream is genuinely blocked on a read, then cancel. + select { + case <-br.readStarted: + case <-time.After(5 * time.Second): + t.Fatal("download never started reading the remote stream") + } + cancel() + + select { + case <-done: + // Returned after cancel — correct behavior. + case <-time.After(5 * time.Second): + t.Fatal("DownloadCompressedStream did not return within 5s after context cancel; " + + "a stalled read/extract is not honoring cancellation (reproduces the kill-does-not-stop-download bug)") + } +} diff --git a/pkg/storage/general.go b/pkg/storage/general.go index 45231e8f..318a6b03 100644 --- a/pkg/storage/general.go +++ b/pkg/storage/general.go @@ -475,15 +475,34 @@ func (bd *BackupDestination) DownloadCompressedStream(ctx context.Context, remot } rawReader := reader reader = bwlimit.ReadCloser(ctx, reader, bd.DownloadLimiter(maxSpeed)) - defer func() { - if err := reader.Close(); err != nil { - log.Warn().Msgf("can't close GetFileReader descriptor %v", reader) + var closeReaderOnce sync.Once + closeReader := func() { + closeReaderOnce.Do(func() { + if err := reader.Close(); err != nil { + log.Warn().Msgf("can't close GetFileReader descriptor %v", reader) + } + }) + } + // A stalled read (slow/half-open network, disk backpressure) is not + // interruptible by context alone: the nio feeder and tar.Extract block in + // Read calls that never re-check ctx, so /backup/kill cancels the context + // but the 
download keeps running. Force-close the reader on cancellation so + // the blocked read returns and the download unwinds. + watchDone := make(chan struct{}) + go func() { + select { + case <-ctx.Done(): + closeReader() + case <-watchDone: } - switch rawReader.(type) { + }() + defer func() { + close(watchDone) + closeReader() + switch rawReader := rawReader.(type) { case *os.File: - fileName := rawReader.(*os.File).Name() - if err := os.Remove(fileName); err != nil { - log.Warn().Msgf("can't remove %s", fileName) + if err := os.Remove(rawReader.Name()); err != nil { + log.Warn().Msgf("can't remove %s", rawReader.Name()) } } }() @@ -638,6 +657,22 @@ func (bd *BackupDestination) DownloadPath(ctx context.Context, remotePath string return errors.Wrap(err, "DownloadPath GetFileReader") } r = bwlimit.ReadCloser(ctx, r, limiter) + var closeSrcOnce sync.Once + var srcCloseErr error + closeSrc := func() { closeSrcOnce.Do(func() { srcCloseErr = r.Close() }) } + // A stalled read is not interruptible by context alone: copyWithBuffer + // blocks in Read and never re-checks ctx, so /backup/kill cancels the + // context but the copy keeps running. Force-close the source reader on + // cancellation so the blocked read returns and the copy unwinds. 
+ watchDone := make(chan struct{}) + go func() { + select { + case <-ctx.Done(): + closeSrc() + case <-watchDone: + } + }() + defer close(watchDone) dstFilePath := path.Join(localPath, f.Name()) dstDirPath, _ := path.Split(dstFilePath) if err := os.MkdirAll(dstDirPath, 0750); err != nil { @@ -659,7 +694,7 @@ func (bd *BackupDestination) DownloadPath(ctx context.Context, remotePath string log.Error().Err(dstCloseErr).Send() return errors.Wrap(dstCloseErr, "DownloadPath dst.Close") } - if srcCloseErr := r.Close(); srcCloseErr != nil { + if closeSrc(); srcCloseErr != nil { log.Error().Err(srcCloseErr).Send() return errors.Wrap(srcCloseErr, "DownloadPath r.Close") } diff --git a/pkg/storage/object_disk/cancel_test.go b/pkg/storage/object_disk/cancel_test.go new file mode 100644 index 00000000..6ff051ca --- /dev/null +++ b/pkg/storage/object_disk/cancel_test.go @@ -0,0 +1,96 @@ +package object_disk + +import ( + "context" + "io" + "sync" + "testing" + "time" + + "github.com/Altinity/clickhouse-backup/v2/pkg/storage" +) + +// blockingReadCloser blocks on every Read until Close is called, then reports +// EOF. Models a remote body that has stalled and only unblocks when something +// closes the underlying reader. +type blockingReadCloser struct { + closed chan struct{} + closeOnce sync.Once + readStarted chan struct{} + startOnce sync.Once +} + +func (b *blockingReadCloser) Read(_ []byte) (int, error) { + b.startOnce.Do(func() { close(b.readStarted) }) + <-b.closed + return 0, io.EOF +} + +func (b *blockingReadCloser) Close() error { + b.closeOnce.Do(func() { close(b.closed) }) + return nil +} + +type fakeRemoteFile struct{ size int64 } + +func (f fakeRemoteFile) Size() int64 { return f.size } +func (f fakeRemoteFile) Name() string { return "obj" } +func (f fakeRemoteFile) LastModified() time.Time { return time.Time{} } + +// srcRemote provides a source reader that blocks until closed. 
+type srcRemote struct { + storage.RemoteStorage + r *blockingReadCloser +} + +func (s *srcRemote) StatFileAbsolute(_ context.Context, _ string) (storage.RemoteFile, error) { + return fakeRemoteFile{size: 1024}, nil +} + +func (s *srcRemote) GetFileReaderAbsolute(_ context.Context, _ string) (io.ReadCloser, error) { + return s.r, nil +} + +// dstRemote drains the body it is given, modeling an uploader that reads the +// source stream to completion. +type dstRemote struct { + storage.RemoteStorage +} + +func (d *dstRemote) PutFileAbsolute(_ context.Context, _ string, r io.ReadCloser, _ int64) error { + _, err := io.Copy(io.Discard, r) + return err +} + +// TestCopyObjectStreamingCancel reproduces the same kill-does-not-stop class of +// bug for the object_disk streaming copy used by create/restore: a stalled +// source read makes CopyObjectStreaming run forever despite context +// cancellation. It must return promptly once the context is cancelled. +func TestCopyObjectStreamingCancel(t *testing.T) { + br := &blockingReadCloser{closed: make(chan struct{}), readStarted: make(chan struct{})} + src := &srcRemote{r: br} + dst := &dstRemote{} + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + done := make(chan error, 1) + go func() { + done <- CopyObjectStreaming(ctx, src, dst, "src/key", "dst/key", nil) + }() + + select { + case <-br.readStarted: + case <-time.After(5 * time.Second): + t.Fatal("streaming copy never started reading the source") + } + cancel() + + select { + case <-done: + // Returned after cancel — correct behavior. 
+ case <-time.After(5 * time.Second): + t.Fatal("CopyObjectStreaming did not return within 5s after context cancel; " + + "a stalled read is not honoring cancellation (reproduces the kill-does-not-stop bug for create/restore object_disk)") + } +} diff --git a/pkg/storage/object_disk/object_disk.go b/pkg/storage/object_disk/object_disk.go index 32d3b209..dd40ec5c 100644 --- a/pkg/storage/object_disk/object_disk.go +++ b/pkg/storage/object_disk/object_disk.go @@ -788,11 +788,31 @@ func CopyObjectStreaming(ctx context.Context, srcStorage storage.RemoteStorage, if srcErr != nil { return errors.Wrapf(srcErr, "srcStorage.GetFileReaderAbsolute(%s) error", srcKey) } - defer func() { - if closeErr := srcReader.Close(); closeErr != nil { - log.Error().Msgf("srcReader.Close(%s) error: %v", srcKey, closeErr) + var closeSrcOnce sync.Once + closeSrc := func() { + closeSrcOnce.Do(func() { + if closeErr := srcReader.Close(); closeErr != nil { + log.Error().Msgf("srcReader.Close(%s) error: %v", srcKey, closeErr) + } + }) + } + // A stalled read (slow/half-open network, disk backpressure) is not + // interruptible by context alone: PutFile blocks reading srcReader and never + // re-checks ctx, so /backup/kill cancels the context but the copy keeps + // running. Force-close the source reader on cancellation so the blocked read + // returns and the copy unwinds. 
+ watchDone := make(chan struct{}) + go func() { + select { + case <-ctx.Done(): + closeSrc() + case <-watchDone: } }() + defer func() { + close(watchDone) + closeSrc() + }() // streaming copy moves bytes through this process (unlike server-side CopyObject), // so honor the configured upload throttle, fix https://github.com/Altinity/clickhouse-backup/issues/1377 body := bwlimit.ReadCloser(ctx, srcReader, limiter) diff --git a/pkg/storage/upload_cancel_test.go b/pkg/storage/upload_cancel_test.go new file mode 100644 index 00000000..2d133d65 --- /dev/null +++ b/pkg/storage/upload_cancel_test.go @@ -0,0 +1,105 @@ +package storage + +import ( + "context" + "io" + "os" + "path/filepath" + "sync" + "testing" + "time" +) + +// putBlockingRemote.PutFile blocks until the context is cancelled, then returns +// ctx.Err(). It models a well-behaved remote upload that aborts when its request +// context is cancelled (e.g. an S3 PutObject), and is used to verify the upload +// paths unwind promptly on /backup/kill. +type putBlockingRemote struct { + RemoteStorage + putStarted chan struct{} + startOnce sync.Once +} + +func (m *putBlockingRemote) PutFile(ctx context.Context, _ string, _ io.ReadCloser, _ int64) error { + m.startOnce.Do(func() { close(m.putStarted) }) + <-ctx.Done() + return ctx.Err() +} + +func (m *putBlockingRemote) PutFileAbsolute(ctx context.Context, key string, r io.ReadCloser, size int64) error { + return m.PutFile(ctx, key, r, size) +} + +func writeTempFile(t *testing.T) (dir, name string) { + t.Helper() + dir = t.TempDir() + name = "part.bin" + if err := os.WriteFile(filepath.Join(dir, name), []byte("some payload bytes"), 0600); err != nil { + t.Fatalf("write temp file: %v", err) + } + return dir, name +} + +// TestUploadCompressedStreamCancel verifies the archive-upload path unwinds on +// context cancel. 
Unlike download, upload reads local files and pushes to the +// remote via PutFile(ctx); cancellation propagates through the errgroup and the +// nio pipe cross-close, so this guards that wiring stays correct. +func TestUploadCompressedStreamCancel(t *testing.T) { + dir, name := writeTempFile(t) + remote := &putBlockingRemote{putStarted: make(chan struct{})} + bd := &BackupDestination{ + RemoteStorage: remote, + compressionFormat: "tar", + pipeBufferSize: 1024 * 1024, + } + + ctx, cancel := context.WithCancel(context.Background()) + defer cancel() + + done := make(chan error, 1) + go func() { + done <- bd.UploadCompressedStream(ctx, dir, []string{name}, "remote/data.tar", 0) + }() + + select { + case <-remote.putStarted: + case <-time.After(5 * time.Second): + t.Fatal("upload never started PutFile") + } + cancel() + + select { + case <-done: + case <-time.After(5 * time.Second): + t.Fatal("UploadCompressedStream did not return within 5s after context cancel") + } +} + +// TestUploadPathCancel is the per-file (non-archive) counterpart. 
+func TestUploadPathCancel(t *testing.T) {
+	dir, name := writeTempFile(t)
+	remote := &putBlockingRemote{putStarted: make(chan struct{})}
+	bd := &BackupDestination{RemoteStorage: remote}
+
+	ctx, cancel := context.WithCancel(context.Background())
+	defer cancel()
+
+	done := make(chan error, 1)
+	go func() {
+		_, err := bd.UploadPath(ctx, dir, []string{name}, "remote", 0, time.Second, 0, failClassifier{}, 0)
+		done <- err
+	}()
+
+	select {
+	case <-remote.putStarted:
+	case <-time.After(5 * time.Second):
+		t.Fatal("upload never started PutFile")
+	}
+	cancel()
+
+	select {
+	case <-done:
+	case <-time.After(5 * time.Second):
+		t.Fatal("UploadPath did not return within 5s after context cancel")
+	}
+}
diff --git a/test/integration/kill_test.go b/test/integration/kill_test.go
index 4b0558f9..6751c606 100644
--- a/test/integration/kill_test.go
+++ b/test/integration/kill_test.go
@@ -136,21 +136,251 @@ func TestKill(t *testing.T) {
 	// registered above so they run on both success and mid-test failure.
 }
 
+// TestKillDownload kills an in-progress streaming download and verifies the
+// download goroutine actually stops. Reproduces the class of bug seen in
+// production (https://github.com/Altinity/clickhouse-backup/issues/1365 follow
+// up): with allow_multipart_download=false + download_by_part=true the data is
+// streamed S3 -> nio pipe -> tar.Extract, and a worker stuck in a read that
+// ignores context cancellation made /backup/kill block the full
+// cancel_operation_timeout (default 1800s) while the download kept running.
+// A short API_CANCEL_OPERATION_TIMEOUT makes that failure mode fail fast here.
+func TestKillDownload(t *testing.T) { + env, r := NewTestEnvironment(t) + env.connectWithWait(t, r, 0*time.Second, 1*time.Second, 1*time.Minute) + r.NoError(env.DockerCP("configs/config-s3.yml", "clickhouse-backup:/etc/clickhouse-backup/config.yml")) + env.InstallDebIfNotExists(r, "clickhouse-backup", "curl", "jq") + defer env.Cleanup(t, r) + + const dbName = "kill_download_db" + const backupName = "kill_download_backup" + + killSetupTable(r, env, dbName) + + log.Debug().Msg("start clickhouse-backup server for TestKillDownload") + env.DockerExecBackgroundNoError(r, "clickhouse-backup", "bash", "-ce", + "S3_ALLOW_MULTIPART_DOWNLOAD=false DOWNLOAD_CONCURRENCY=1 API_CANCEL_OPERATION_TIMEOUT=15s "+ + "clickhouse-backup server &>>/tmp/clickhouse-backup-server.log") + defer func() { + _ = env.DockerExec("clickhouse-backup", "pkill", "-n", "-f", "clickhouse-backup") + }() + defer func() { + _ = env.DockerExec("clickhouse-backup", "clickhouse-backup", "delete", "local", backupName) + if out, err := env.DockerExecOut("clickhouse-backup", "clickhouse-backup", "delete", "remote", backupName); err != nil && !strings.Contains(out, fmt.Sprintf("'%s' is not found on remote storage", backupName)) { + t.Errorf("TestKillDownload teardown error=%+v: delete remote %s: %s", err, backupName, out) + } + _ = env.DockerExec("minio", "rm", "-rf", env.minioBackupFSPath(r, "config-s3.yml", backupName)) + if err := env.dropDatabase(dbName, true); err != nil { + t.Errorf("TestKillDownload teardown: drop database %s, error=%+v", dbName, err) + } + }() + time.Sleep(3 * time.Second) + + // 1. create local backup, push it remote, then drop local so download works. 
+ runActionWait(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName), "create", backupName, 60*time.Second) + runActionWait(r, env, "upload "+backupName, "upload", backupName, 120*time.Second) + delOut := postAction(r, env, "delete local "+backupName) + r.Contains(delOut, "\"status\":\"success\"", "delete local must succeed: %s", delOut) + + // 2. start download and kill it mid-flight. + startOut := postAction(r, env, "download "+backupName) + r.Contains(startOut, "acknowledged", "download must be acknowledged: %s", startOut) + observeInProgressAndKill(r, env, "download "+backupName, backupName, "download", 15*time.Second) + + // 3. a follow-up delete must not trip on a stale pid lock. + delOut = postAction(r, env, "delete local "+backupName) + r.NotContains(delOut, "another clickhouse-backup", "delete must not see a stale pid lock: %s", delOut) +} + +// TestKillCreate kills an in-progress create and verifies the create goroutine +// stops (pid removed, last_create_finish advances, kill returns fast). 
+func TestKillCreate(t *testing.T) { + env, r := NewTestEnvironment(t) + env.connectWithWait(t, r, 0*time.Second, 1*time.Second, 1*time.Minute) + r.NoError(env.DockerCP("configs/config-s3.yml", "clickhouse-backup:/etc/clickhouse-backup/config.yml")) + env.InstallDebIfNotExists(r, "clickhouse-backup", "curl", "jq") + defer env.Cleanup(t, r) + + const dbName = "kill_create_db" + const backupName = "kill_create_backup" + + killSetupTable(r, env, dbName) + + log.Debug().Msg("start clickhouse-backup server for TestKillCreate") + env.DockerExecBackgroundNoError(r, "clickhouse-backup", "bash", "-ce", + "API_CANCEL_OPERATION_TIMEOUT=15s clickhouse-backup server &>>/tmp/clickhouse-backup-server.log") + defer func() { + _ = env.DockerExec("clickhouse-backup", "pkill", "-n", "-f", "clickhouse-backup") + }() + defer func() { + _ = env.DockerExec("clickhouse-backup", "clickhouse-backup", "delete", "local", backupName) + if err := env.dropDatabase(dbName, true); err != nil { + t.Errorf("TestKillCreate teardown: drop database %s, error=%+v", dbName, err) + } + }() + time.Sleep(3 * time.Second) + + startOut := postAction(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName)) + r.Contains(startOut, "acknowledged", "create must be acknowledged: %s", startOut) + observeInProgressAndKill(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName), backupName, "create", 15*time.Second) + + // a follow-up delete must not trip on a stale pid lock. + delOut := postAction(r, env, "delete local "+backupName) + r.NotContains(delOut, "another clickhouse-backup", "delete must not see a stale pid lock: %s", delOut) +} + +// TestKillRestore kills an in-progress restore and verifies the restore +// goroutine stops (pid removed, last_restore_finish advances, kill returns fast). 
+func TestKillRestore(t *testing.T) { + env, r := NewTestEnvironment(t) + env.connectWithWait(t, r, 0*time.Second, 1*time.Second, 1*time.Minute) + r.NoError(env.DockerCP("configs/config-s3.yml", "clickhouse-backup:/etc/clickhouse-backup/config.yml")) + env.InstallDebIfNotExists(r, "clickhouse-backup", "curl", "jq") + defer env.Cleanup(t, r) + + const dbName = "kill_restore_db" + const backupName = "kill_restore_backup" + + killSetupTable(r, env, dbName) + + log.Debug().Msg("start clickhouse-backup server for TestKillRestore") + env.DockerExecBackgroundNoError(r, "clickhouse-backup", "bash", "-ce", + "API_CANCEL_OPERATION_TIMEOUT=15s clickhouse-backup server &>>/tmp/clickhouse-backup-server.log") + defer func() { + _ = env.DockerExec("clickhouse-backup", "pkill", "-n", "-f", "clickhouse-backup") + }() + defer func() { + _ = env.DockerExec("clickhouse-backup", "clickhouse-backup", "delete", "local", backupName) + if err := env.dropDatabase(dbName, true); err != nil { + t.Errorf("TestKillRestore teardown: drop database %s, error=%+v", dbName, err) + } + }() + time.Sleep(3 * time.Second) + + // create a local backup, drop the table so restore has to recreate+attach. + runActionWait(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName), "create", backupName, 60*time.Second) + env.queryWithNoError(r, fmt.Sprintf("DROP TABLE %s.t1 SYNC", dbName)) + + startOut := postAction(r, env, "restore "+backupName) + r.Contains(startOut, "acknowledged", "restore must be acknowledged: %s", startOut) + observeInProgressAndKill(r, env, "restore "+backupName, backupName, "restore", 15*time.Second) +} + // readUploadFinishMetric scrapes /metrics and parses the value of // clickhouse_backup_last_upload_finish (a unix-timestamp gauge updated by // metrics.ExecuteWithMetrics when the upload goroutine returns). 
 func readUploadFinishMetric(r *require.Assertions, env *TestEnvironment) int64 {
+	return readActionFinishMetric(r, env, "upload")
+}
+
+// readActionFinishMetric scrapes /metrics and parses the value of
+// clickhouse_backup_last_<command>_finish (a unix-timestamp gauge updated by
+// metrics.ExecuteWithMetrics when the command goroutine returns).
+func readActionFinishMetric(r *require.Assertions, env *TestEnvironment, command string) int64 {
+	metric := "clickhouse_backup_last_" + command + "_finish"
 	out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce",
-		"curl -sfL http://localhost:7171/metrics | grep -E '^clickhouse_backup_last_upload_finish '")
+		"curl -sfL http://localhost:7171/metrics | grep -E '^"+metric+" '")
 	r.NoError(err, "/metrics scrape failed: %s", out)
-	// Format: `clickhouse_backup_last_upload_finish `
-	matches := regexp.MustCompile(`clickhouse_backup_last_upload_finish\s+([0-9.eE+\-]+)`).FindStringSubmatch(out)
-	r.Len(matches, 2, "could not parse upload_finish metric: %q", out)
+	matches := regexp.MustCompile(metric + `\s+([0-9.eE+\-]+)`).FindStringSubmatch(out)
+	r.Len(matches, 2, "could not parse %s metric: %q", metric, out)
 	v, err := strconv.ParseFloat(strings.TrimSpace(matches[1]), 64)
 	r.NoError(err, "parse %q", matches[1])
 	return int64(v)
 }
 
+// postAction POSTs a single command to /backup/actions and returns the raw
+// response body. The command value is JSON-encoded via %q; since no command
+// used by these tests contains a single quote, the JSON is safely wrapped in
+// shell single quotes.
+func postAction(r *require.Assertions, env *TestEnvironment, command string) string { + body := fmt.Sprintf(`{"command":%q}`, command) + out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", + "curl -sfL -XPOST 'http://localhost:7171/backup/actions' -d '"+body+"'") + r.NoError(err, "%s\nPOST /backup/actions %q error: %v", out, command, err) + return out +} + +// runActionWait starts an async action and blocks until it reports success. +func runActionWait(r *require.Assertions, env *TestEnvironment, command, cmdPrefix, nameNeedle string, timeout time.Duration) { + out := postAction(r, env, command) + r.Contains(out, "acknowledged", "%q expected acknowledged: %s", command, out) + waitForActionStatus(r, env, cmdPrefix, nameNeedle, "success", timeout) +} + +// killSetupTable (re)creates a table of 100 partitions, ~100KB each. The data +// is incompressible (randomPrintableASCII) so the tar archive stays large +// enough that create/upload/download/restore remain observably in-progress +// long enough to be killed mid-flight. +func killSetupTable(r *require.Assertions, env *TestEnvironment, dbName string) { + r.NoError(env.dropDatabase(dbName, true)) + env.queryWithNoError(r, "CREATE DATABASE "+dbName) + env.queryWithNoError(r, fmt.Sprintf( + "CREATE TABLE %s.t1 (id UInt64, s String) ENGINE=MergeTree() PARTITION BY (id %% 100) ORDER BY id", + dbName)) + // 10000 rows / 100 partitions = 100 rows per partition * 1KiB ≈ 100KiB each. + env.queryWithNoError(r, fmt.Sprintf( + "INSERT INTO %s.t1 SELECT number, randomPrintableASCII(1024) FROM numbers(10000)", dbName)) +} + +// actionInProgress reports whether /backup/actions output has a row whose +// command equals exactly `command` and is still in progress. 
+func actionInProgress(actionsOut, command string) bool { + for _, line := range strings.Split(actionsOut, "\n") { + if strings.Contains(line, `"command":"`+command+`"`) && strings.Contains(line, `"status":"in progress"`) { + return true + } + } + return false +} + +// observeInProgressAndKill waits until `command` is observably in-progress with +// its pid file present, kills it via /backup/actions, and asserts that the kill +// behaved correctly. metricCommand is the bare command name ("create", +// "download", "restore") whose clickhouse_backup_last__finish gauge is +// used as the proof that the worker goroutine actually returned. +// +// Two assertions discriminate a real cancellation from a hung worker: +// 1. kill returns well under cancel_operation_timeout — a worker that ignores +// context cancellation makes status.waitDone block the whole timeout. +// 2. the *_finish gauge advances — it is only updated by ExecuteWithMetrics +// after cliApp.Run returns; if waitDone merely timed out it stays put. 
+func observeInProgressAndKill(r *require.Assertions, env *TestEnvironment, command, backupName, metricCommand string, cancelTimeout time.Duration) { + pidPath := fmt.Sprintf("/tmp/clickhouse-backup.%s.pid", backupName) + deadline := time.Now().Add(30 * time.Second) + observed := false + for time.Now().Before(deadline) { + statusOut, _ := env.DockerExecOut("clickhouse-backup", "bash", "-ce", + "curl -sfL 'http://localhost:7171/backup/actions'") + lsOut, lsErr := env.DockerExecOut("clickhouse-backup", "bash", "-ce", "ls "+pidPath+" 2>/dev/null || true") + if actionInProgress(statusOut, command) && lsErr == nil && strings.Contains(lsOut, backupName) { + observed = true + break + } + time.Sleep(50 * time.Millisecond) + } + r.True(observed, "expected to observe %q in-progress with pid file %s present", command, pidPath) + + finishBefore := readActionFinishMetric(r, env, metricCommand) + + killStart := time.Now() + killOut := postAction(r, env, fmt.Sprintf("kill %q", command)) + killElapsed := time.Since(killStart) + r.Contains(killOut, "\"status\":\"success\"", "kill should succeed: %s", killOut) + log.Info().Msgf("kill %q returned in %s", command, killElapsed) + + r.Less(killElapsed, cancelTimeout-2*time.Second, + "kill %q returned in %s; a worker that ignored context cancellation makes "+ + "status.waitDone block until cancel_operation_timeout=%s", command, killElapsed, cancelTimeout) + + finishAfter := readActionFinishMetric(r, env, metricCommand) + r.Greater(finishAfter, finishBefore, + "clickhouse_backup_last_%s_finish must advance during kill (before=%d after=%d); "+ + "the %s goroutine did not return", metricCommand, finishBefore, finishAfter, metricCommand) + + checkOut, _ := env.DockerExecOut("clickhouse-backup", "bash", "-ce", + "if [ -f "+pidPath+" ]; then echo EXISTS; cat "+pidPath+"; else echo GONE; fi") + r.Contains(checkOut, "GONE", "pid file %s must be removed by kill, got: %s", pidPath, checkOut) +} + // waitForActionStatus polls /backup/actions and 
returns once a row whose // command starts with cmdPrefix and contains nameNeedle is observed with // the expected status. From c505c6266284daedd670af659ede947f26b95839 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 8 Jun 2026 17:54:22 +0000 Subject: [PATCH 04/24] Bump github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager Bumps [github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager](https://github.com/aws/aws-sdk-go-v2) from 0.2.6 to 0.2.8. - [Release notes](https://github.com/aws/aws-sdk-go-v2/releases) - [Commits](https://github.com/aws/aws-sdk-go-v2/compare/feature/s3/transfermanager/v0.2.6...feature/s3/transfermanager/v0.2.8) --- updated-dependencies: - dependency-name: github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager dependency-version: 0.2.8 dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] --- go.mod | 2 +- go.sum | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/go.mod b/go.mod index 3453c02b..042d1a43 100644 --- a/go.mod +++ b/go.mod @@ -10,7 +10,7 @@ require ( github.com/aws/aws-sdk-go-v2 v1.41.12 github.com/aws/aws-sdk-go-v2/config v1.32.23 github.com/aws/aws-sdk-go-v2/credentials v1.19.22 - github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager v0.2.6 + github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager v0.2.8 github.com/aws/aws-sdk-go-v2/service/s3 v1.103.2 github.com/aws/aws-sdk-go-v2/service/sts v1.43.2 github.com/aws/smithy-go v1.27.1 diff --git a/go.sum b/go.sum index c6cb7e7b..f18a9738 100644 --- a/go.sum +++ b/go.sum @@ -71,8 +71,8 @@ github.com/aws/aws-sdk-go-v2/credentials v1.19.22 h1:SHfH6wyPsEgG7fVsi5rQxWEt7tu github.com/aws/aws-sdk-go-v2/credentials v1.19.22/go.mod h1:54nO8lKD4aQPOntM/VTWjnR+DYzTwx0YkSMZMhAgewQ= github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.28 h1:b+kcDejJrXc30zU/w8Tc9klISwaO5wh+6T0sMBdDoHM= github.com/aws/aws-sdk-go-v2/feature/ec2/imds v1.18.28/go.mod 
h1:LnI62O9GnSv6GcuLXxOYqlq0C8EmxMcgnF6m7LdYuOY= -github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager v0.2.6 h1:bxJRjtHdQwjTQ9Qz37G6Knse7zGNqO4plC2r7TmM++4= -github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager v0.2.6/go.mod h1:9A4usyBencYSi5/18mRjSDe0LHFarrOmyWifz4Om4bY= +github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager v0.2.8 h1:vooR0jc+VLHDkM97Q82ml82WAOl1aA3jX/Dn6Yb19bc= +github.com/aws/aws-sdk-go-v2/feature/s3/transfermanager v0.2.8/go.mod h1:9A4usyBencYSi5/18mRjSDe0LHFarrOmyWifz4Om4bY= github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.28 h1:Xf2j7NdVcUKomlZ4iihOP4AZ3Fzlr8h4yKpXeP+OFPg= github.com/aws/aws-sdk-go-v2/internal/configsources v1.4.28/go.mod h1:O8cDo1dW63jU7ki//kRe1z+tLGcpnD1jrouitsQddDw= github.com/aws/aws-sdk-go-v2/internal/endpoints/v2 v2.7.28 h1:KqIfN9kpkKkcBqBbNpNGTIrXO6ExTUvFKvXkC+YAzVo= From b767c4f9d9ef16b2eb890c38d3f99046731c34c0 Mon Sep 17 00:00:00 2001 From: "dependabot[bot]" <49699333+dependabot[bot]@users.noreply.github.com> Date: Mon, 8 Jun 2026 17:54:25 +0000 Subject: [PATCH 05/24] Bump golang.org/x/sync from 0.20.0 to 0.21.0 Bumps [golang.org/x/sync](https://github.com/golang/sync) from 0.20.0 to 0.21.0. - [Commits](https://github.com/golang/sync/compare/v0.20.0...v0.21.0) --- updated-dependencies: - dependency-name: golang.org/x/sync dependency-version: 0.21.0 dependency-type: direct:production update-type: version-update:semver-minor ... 
Signed-off-by: dependabot[bot] --- go.mod | 2 +- go.sum | 4 ++-- 2 files changed, 3 insertions(+), 3 deletions(-) diff --git a/go.mod b/go.mod index 042d1a43..59b59c6e 100644 --- a/go.mod +++ b/go.mod @@ -49,7 +49,7 @@ require ( go.etcd.io/bbolt v1.4.3 golang.org/x/crypto v0.52.0 golang.org/x/mod v0.36.0 - golang.org/x/sync v0.20.0 + golang.org/x/sync v0.21.0 golang.org/x/text v0.37.0 google.golang.org/api v0.283.0 gopkg.in/yaml.v3 v3.0.1 diff --git a/go.sum b/go.sum index f18a9738..b34a8b40 100644 --- a/go.sum +++ b/go.sum @@ -441,8 +441,8 @@ golang.org/x/sync v0.3.0/go.mod h1:FU7BRWz2tNW+3quACPkgCx/L+uEAv1htQ0V83Z9Rj+Y= golang.org/x/sync v0.6.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= golang.org/x/sync v0.7.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= golang.org/x/sync v0.10.0/go.mod h1:Czt+wKu1gCyEFDUtn0jG5QVvpJ6rzVqr5aXyt9drQfk= -golang.org/x/sync v0.20.0 h1:e0PTpb7pjO8GAtTs2dQ6jYa5BWYlMuX047Dco/pItO4= -golang.org/x/sync v0.20.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0= +golang.org/x/sync v0.21.0 h1:HLII4xRRTtCRkxYp4HNFF0Js/Og6q2i++KXbg0gHCwM= +golang.org/x/sync v0.21.0/go.mod h1:9xrNwdLfx4jkKbNva9FpL6vEN7evnE43NNNJQ2LF3+0= golang.org/x/sys v0.0.0-20190215142949-d0b11bdaac8a/go.mod h1:STP8DvDyc/dI5b8T5hshtkjS+E42TnysNCUPdjciGhY= golang.org/x/sys v0.0.0-20190916202348-b4ddaad3f8a3/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs= From c62ddbe46f9d33bc09c9351f47c8d06130773d83 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 19:07:44 +0400 Subject: [PATCH 06/24] remove randomPrintableASCII from tests, fix https://github.com/Altinity/clickhouse-backup/issues/1365, again --- test/integration/kill_test.go | 12 +++++++----- 1 file changed, 7 insertions(+), 5 deletions(-) diff --git a/test/integration/kill_test.go b/test/integration/kill_test.go index 6751c606..bcca12d8 100644 --- 
a/test/integration/kill_test.go +++ b/test/integration/kill_test.go @@ -306,10 +306,11 @@ func runActionWait(r *require.Assertions, env *TestEnvironment, command, cmdPref waitForActionStatus(r, env, cmdPrefix, nameNeedle, "success", timeout) } -// killSetupTable (re)creates a table of 100 partitions, ~100KB each. The data -// is incompressible (randomPrintableASCII) so the tar archive stays large -// enough that create/upload/download/restore remain observably in-progress -// long enough to be killed mid-flight. +// killSetupTable (re)creates a table of 100 partitions, ~100KB each. The +// archive is stored uncompressed (compression_format: tar), so a fixed 1KiB +// payload generated on the Go side keeps the tar archive large enough that +// create/upload/download/restore remain observably in-progress long enough to +// be killed mid-flight. func killSetupTable(r *require.Assertions, env *TestEnvironment, dbName string) { r.NoError(env.dropDatabase(dbName, true)) env.queryWithNoError(r, "CREATE DATABASE "+dbName) @@ -317,8 +318,9 @@ func killSetupTable(r *require.Assertions, env *TestEnvironment, dbName string) "CREATE TABLE %s.t1 (id UInt64, s String) ENGINE=MergeTree() PARTITION BY (id %% 100) ORDER BY id", dbName)) // 10000 rows / 100 partitions = 100 rows per partition * 1KiB ≈ 100KiB each. 
+ payload := strings.Repeat("x", 1024) env.queryWithNoError(r, fmt.Sprintf( - "INSERT INTO %s.t1 SELECT number, randomPrintableASCII(1024) FROM numbers(10000)", dbName)) + "INSERT INTO %s.t1 SELECT number, '%s' FROM numbers(10000)", dbName, payload)) } // actionInProgress reports whether /backup/actions output has a row whose From 28c949b307d0fc6ade970e03ea06402d8b116267 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 20:16:38 +0400 Subject: [PATCH 07/24] fix skipTablesByEngine, fix https://github.com/Altinity/clickhouse-backup/issues/1416 --- pkg/backup/table_pattern.go | 21 ++++++------ pkg/backup/table_pattern_test.go | 55 ++++++++++++++++++++++++++++++++ 2 files changed, 67 insertions(+), 9 deletions(-) create mode 100644 pkg/backup/table_pattern_test.go diff --git a/pkg/backup/table_pattern.go b/pkg/backup/table_pattern.go index 416c4c3e..44aa6491 100644 --- a/pkg/backup/table_pattern.go +++ b/pkg/backup/table_pattern.go @@ -132,17 +132,22 @@ func (b *Backuper) getTableListByPatternLocal(ctx context.Context, metadataPath return nil, nil, err } result.Sort(dropTable) - for i := 0; i < len(result); i++ { + result = b.skipTablesByEngine(result, resultPartitionNames) + return result, resultPartitionNames, nil +} + +// skipTablesByEngine removes tables matched by ClickHouse.SkipTableEngines from result +// and drops their partition names from resultPartitionNames. +func (b *Backuper) skipTablesByEngine(result ListOfTables, resultPartitionNames map[metadata.TableTitle][]string) ListOfTables { + // iterate in reverse so removing an element never shifts an unvisited one past the cursor + for i := len(result) - 1; i >= 0; i-- { if b.shouldSkipByTableEngine(*result[i]) { t := result[i] delete(resultPartitionNames, metadata.TableTitle{Database: t.Database, Table: t.Table}) result = append(result[:i], result[i+1:]...) 
- if i > 0 { - i = i - 1 - } } } - return result, resultPartitionNames, nil + return result } func (b *Backuper) shouldSkipByTableName(tableFullName string) bool { @@ -635,18 +640,16 @@ func getOrderByEngine(query string, dropTable bool) int64 { strings.HasPrefix(query, "ATTACH MATERIALIZED VIEW") { if dropTable { return 1 - } else { - return 2 } + return 2 } if strings.HasPrefix(query, "CREATE TABLE") && (strings.Contains(query, ".inner_id.") || strings.Contains(query, ".inner.")) { if dropTable { return 2 - } else { - return 1 } + return 1 } return 0 } diff --git a/pkg/backup/table_pattern_test.go b/pkg/backup/table_pattern_test.go new file mode 100644 index 00000000..4ea128c8 --- /dev/null +++ b/pkg/backup/table_pattern_test.go @@ -0,0 +1,55 @@ +package backup + +import ( + "testing" + + "github.com/Altinity/clickhouse-backup/v2/pkg/config" + "github.com/Altinity/clickhouse-backup/v2/pkg/metadata" + + "github.com/stretchr/testify/assert" +) + +// TestSkipTablesByEngine reproduces https://github.com/Altinity/clickhouse-backup +// where exactly one table escaped SkipTableEngines filtering when two skippable +// tables ended up adjacent at the head of the list after Sort: removing result[0] +// shifted result[1] into index 0, but the `if i > 0` guard prevented re-checking it. 
+func TestSkipTablesByEngine(t *testing.T) { + iceberg := func(db, table string) *metadata.TableMetadata { + return &metadata.TableMetadata{ + Database: db, + Table: table, + Query: "CREATE TABLE " + db + ".`" + table + "` (`id` Nullable(Int32)) ENGINE = Iceberg('s3://bucket/path')", + } + } + mergeTree := func(db, table string) *metadata.TableMetadata { + return &metadata.TableMetadata{ + Database: db, + Table: table, + Query: "CREATE TABLE " + db + ".`" + table + "` (`id` Int32) ENGINE = MergeTree ORDER BY id", + } + } + + b := &Backuper{cfg: &config.Config{}} + b.cfg.ClickHouse.SkipTableEngines = []string{"IceBerg"} + + // Two Iceberg tables adjacent at the start triggered the off-by-one: the second survived. + result := ListOfTables{ + iceberg("provisioning_iceberg", "provisioning_dbo.dataeventtype"), + iceberg("provisioning_iceberg", "provisioning_rptsched.reportschedulingeventtype"), + mergeTree("default", "events"), + iceberg("provisioning_iceberg", "provisioning_dbo.trackingtagtype"), + } + partitionNames := map[metadata.TableTitle][]string{ + {Database: "provisioning_iceberg", Table: "provisioning_dbo.dataeventtype"}: {"p1"}, + {Database: "provisioning_iceberg", Table: "provisioning_rptsched.reportschedulingeventtype"}: {"p2"}, + } + + result = b.skipTablesByEngine(result, partitionNames) + + for _, tbl := range result { + assert.NotContains(t, tbl.Query, "ENGINE = Iceberg", "Iceberg table `%s`.`%s` was not skipped", tbl.Database, tbl.Table) + } + assert.Len(t, result, 1, "only the MergeTree table must remain") + assert.Equal(t, "events", result[0].Table) + assert.Empty(t, partitionNames, "partition names of skipped tables must be removed") +} From fea1c33acff976a93dcdc7561cfafd7098da10b7 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 21:16:13 +0400 Subject: [PATCH 08/24] improve error messages for broken object_disks data when keys not present on remote storage --- pkg/backup/create.go | 4 ++-- pkg/backup/restore.go | 4 ++-- 2 files changed, 4 
insertions(+), 4 deletions(-) diff --git a/pkg/backup/create.go b/pkg/backup/create.go index b888a7eb..4d80369d 100644 --- a/pkg/backup/create.go +++ b/pkg/backup/create.go @@ -1220,7 +1220,7 @@ func (b *Backuper) uploadObjectDiskParts(ctx context.Context, backupName string, return nil }) if copyObjectErr != nil { - return errors.Wrapf(copyObjectErr, "b.dst.CopyObject in %s error", backupShadowPath) + return errors.Wrapf(copyObjectErr, "b.dst.CopyObject in %s for srcKey=%s error", fPath, srcKey) } } else { if !isCopyFailed.Load() { @@ -1236,7 +1236,7 @@ func (b *Backuper) uploadObjectDiskParts(ctx context.Context, backupName string, return object_disk.CopyObjectStreaming(uploadCtx, srcDiskConnection.GetRemoteStorage(), b.dst, srcKey, path.Join(objectDiskPath, dstKey), b.dst.UploadLimiter(b.cfg.General.UploadMaxBytesPerSecond)) }) if copyObjectErr != nil { - return errors.Wrapf(copyObjectErr, "object_disk.CopyObjectStreaming in %s error", backupShadowPath) + return errors.Wrapf(copyObjectErr, "object_disk.CopyObjectStreaming in %s for srcKey=%s error", fPath, srcKey) } } objSize = storageObject.ObjectSize diff --git a/pkg/backup/restore.go b/pkg/backup/restore.go index 0c57b037..ed1bd008 100644 --- a/pkg/backup/restore.go +++ b/pkg/backup/restore.go @@ -2744,7 +2744,7 @@ func (b *Backuper) downloadObjectDiskParts(ctx context.Context, backupName strin return retryErr }) if copyObjectErr != nil { - return errors.Wrapf(copyObjectErr, "object_disk.CopyObject `%s`.`%s` error", backupTable.Database, backupTable.Table) + return errors.Wrapf(copyObjectErr, "object_disk.CopyObject `%s`.`%s` in %s for srcKey=%s error", backupTable.Database, backupTable.Table, capturedFPath, srcKey) } } else { copyObjectErr = nil @@ -2769,7 +2769,7 @@ func (b *Backuper) downloadObjectDiskParts(ctx context.Context, backupName strin return object_disk.CopyObjectStreaming(downloadCtx, srcStorage, dstStorage, srcKey, dstKey, b.dst.DownloadLimiter(b.cfg.General.DownloadMaxBytesPerSecond)) }) if 
copyObjectErr != nil { - return errors.Wrap(copyObjectErr, "object_disk.CopyObjectStreaming error") + return errors.Wrapf(copyObjectErr, "object_disk.CopyObjectStreaming in %s for srcKey=%s error", capturedFPath, srcKey) } copiedSize = storageObject.ObjectSize } From d108416b21279bdf1e9b1bde56cb7d8bd02845a2 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 21:40:12 +0400 Subject: [PATCH 09/24] fix TestKill for old clickhouse-server version --- test/integration/kill_test.go | 8 +++++++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/test/integration/kill_test.go b/test/integration/kill_test.go index bcca12d8..6b2858c3 100644 --- a/test/integration/kill_test.go +++ b/test/integration/kill_test.go @@ -4,6 +4,7 @@ package main import ( "fmt" + "os" "regexp" "strconv" "strings" @@ -258,7 +259,12 @@ func TestKillRestore(t *testing.T) { // create a local backup, drop the table so restore has to recreate+attach. runActionWait(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName), "create", backupName, 60*time.Second) - env.queryWithNoError(r, fmt.Sprintf("DROP TABLE %s.t1 SYNC", dbName)) + // SYNC keyword not supported before 21.x + dropSQL := fmt.Sprintf("DROP TABLE %s.t1", dbName) + if compareVersion(os.Getenv("CLICKHOUSE_VERSION"), "21.1") >= 0 { + dropSQL += " SYNC" + } + env.queryWithNoError(r, dropSQL) startOut := postAction(r, env, "restore "+backupName) r.Contains(startOut, "acknowledged", "restore must be acknowledged: %s", startOut) From 4878e9cef79f43cefa3b7ab7f037eb7941666bd4 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 21:55:08 +0400 Subject: [PATCH 10/24] add example for GCS workload identity --- Examples.md | 162 ++++++++++++++++++++++++++++++++++++++++++++++++++++ ReadMe.md | 1 + 2 files changed, 163 insertions(+) diff --git a/Examples.md b/Examples.md index b797f104..44a76cdb 100644 --- a/Examples.md +++ b/Examples.md @@ -906,6 +906,168 @@ spec: name: clickhouse-backup-config ``` +## How to use GCP Workload 
Identity to allow GCS backup without Explicit credentials + +This is the Google Cloud equivalent of AaWS IRSA. On GKE with +[Workload Identity Federation for GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) +enabled, a Kubernetes `ServiceAccount` (KSA) is bound to a Google Cloud IAM service account (GSA), +so `clickhouse-backup` authenticates to Google Cloud Storage without a `credentials_file` or +`credentials_json`. + +`clickhouse-backup` supports two GCS auth modes under Workload Identity: + +- **Direct binding** — annotate the KSA with the target GSA. The pod's Application Default + Credentials (ADC) already *are* that GSA, so leave the whole `gcs.credentials_*`/`gcs.sa_email` + block empty and `clickhouse-backup` uses ADC automatically. +- **Impersonation via `gcs.sa_email`** — the pod runs as one identity (a "source" GSA bound to the + KSA, or even the cluster default) and `clickhouse-backup` mints a short-lived token for a separate + "target" GSA that owns the bucket permissions. Set `gcs.sa_email` to the target GSA email; the + source identity needs `roles/iam.serviceAccountTokenCreator` on the target. This is the GCS + `sa_email` flow and is what the steps below configure. 
+ +First set the variables used throughout (look the values up if you don't know them): + +```bash +# project that owns the GCS bucket and the service accounts +gcloud projects list +PROJECT_ID= +PROJECT_NUMBER=$(gcloud projects describe "${PROJECT_ID}" --format='value(projectNumber)') + +# GKE cluster that runs clickhouse — Workload Identity must be enabled on it: +gcloud container clusters list +CLUSTER_NAME= +CLUSTER_LOCATION= +# enable Workload Identity if it is not already (no-op if already enabled): +gcloud container clusters update "${CLUSTER_NAME}" --location "${CLUSTER_LOCATION}" \ + --workload-pool="${PROJECT_ID}.svc.id.goog" + +# GCS bucket that will hold the backups — pick an existing one: +gcloud storage buckets list --format='value(name)' +GCS_BUCKET= +# or create it (bucket names are globally unique): +gcloud storage buckets create gs://your-bucket-name --project "${PROJECT_ID}" --location +GCS_BUCKET=your-bucket-name + +# kubernetes namespace and service account name (created later): +NAMESPACE=your-kubernetes-namespace +SERVICE_ACCOUNT_NAME=your-kubernetes-service-account +``` + +Create the **target** Google Cloud service account (its email becomes `gcs.sa_email`) and grant it +access to the bucket (`roles/storage.objectAdmin` scoped to the bucket is enough for +backup/restore; use `roles/storage.admin` if `clickhouse-backup` must also create the bucket): + +```bash +TARGET_GSA_NAME=clickhouse-backup-gcs-sa-name +gcloud iam service-accounts create "${TARGET_GSA_NAME}" --project "${PROJECT_ID}" \ + --display-name "clickhouse-backup GCS access" +TARGET_GSA_EMAIL="${TARGET_GSA_NAME}@${PROJECT_ID}.iam.gserviceaccount.com" + +# grant bucket access to the target GSA (scoped to the single bucket): +gcloud storage buckets add-iam-policy-binding "gs://${GCS_BUCKET}" \ + --member "serviceAccount:${TARGET_GSA_EMAIL}" \ + --role "roles/storage.objectAdmin" +``` + +Bind the Kubernetes `ServiceAccount` to a **source** identity via Workload Identity, and allow 
that +source identity to impersonate the target GSA. The simplest source identity is the target GSA +itself bound directly to the KSA — then the source impersonates itself, which keeps a single GSA in +play while still exercising the `gcs.sa_email` flow: + +```bash +# allow the KSA to act as the source GSA (here: the same target GSA): +gcloud iam service-accounts add-iam-policy-binding "${TARGET_GSA_EMAIL}" \ + --project "${PROJECT_ID}" \ + --role "roles/iam.workloadIdentityUser" \ + --member "serviceAccount:${PROJECT_ID}.svc.id.goog[${NAMESPACE}/${SERVICE_ACCOUNT_NAME}]" + +# allow the source identity to mint impersonated tokens for the target GSA +# (required because gcs.sa_email goes through impersonate.CredentialsTokenSource): +gcloud iam service-accounts add-iam-policy-binding "${TARGET_GSA_EMAIL}" \ + --project "${PROJECT_ID}" \ + --role "roles/iam.serviceAccountTokenCreator" \ + --member "serviceAccount:${TARGET_GSA_EMAIL}" +``` + +Create the namespace and a service account annotated with the source GSA so the GKE webhook injects +the Workload Identity credentials into the pod: + +```bash +kubectl create ns "${NAMESPACE}" +``` + +```yaml +apiVersion: v1 +kind: ServiceAccount +metadata: + name: ${SERVICE_ACCOUNT_NAME} + namespace: ${NAMESPACE} + annotations: + # the source GSA the pod runs as; clickhouse-backup then impersonates gcs.sa_email + iam.gke.io/gcp-service-account: ${TARGET_GSA_EMAIL} +``` + +Put the `clickhouse-backup` config into a `ConfigMap` (no `credentials_file`/`credentials_json` +needed). With `gcs.sa_email` set, `clickhouse-backup` uses the pod's ambient Workload Identity +credentials to impersonate the target service account. 
Mount this `ConfigMap` into +`/etc/clickhouse-backup/` and link the service account to the podTemplate: + +```yaml +--- +apiVersion: v1 +kind: ConfigMap +metadata: + name: clickhouse-backup-config + namespace: ${NAMESPACE} +data: + config.yml: | + general: + remote_storage: gcs + gcs: + # ${TARGET_GSA_EMAIL} — the target service account that owns the bucket permissions; + # the pod's Workload Identity credentials impersonate it via impersonate.CredentialsTokenSource + sa_email: ${TARGET_GSA_EMAIL} + # ${GCS_BUCKET} — the bucket granted roles/storage.objectAdmin above + bucket: ${GCS_BUCKET} + path: backup +--- +apiVersion: "clickhouse.altinity.com/v1" +kind: "ClickHouseInstallation" +metadata: + name: + namespace: ${NAMESPACE} +spec: + defaults: + templates: + podTemplate: + templates: + podTemplates: + - name: + spec: + serviceAccountName: ${SERVICE_ACCOUNT_NAME} + containers: + - name: clickhouse + image: clickhouse/clickhouse-server:latest + - name: clickhouse-backup + image: altinity/clickhouse-backup:latest + command: + - bash + - -xc + - "/bin/clickhouse-backup server" + volumeMounts: + - name: clickhouse-backup-config + mountPath: /etc/clickhouse-backup/ + volumes: + - name: clickhouse-backup-config + configMap: + name: clickhouse-backup-config +``` + +> If you prefer the **direct binding** mode instead, omit `gcs.sa_email` from the `ConfigMap`, +> keep the `iam.gke.io/gcp-service-account` annotation pointing at the GSA that owns the bucket, +> and skip the `roles/iam.serviceAccountTokenCreator` self-binding — `clickhouse-backup` will use +> Application Default Credentials directly. + ### How to use clickhouse-backup + clickhouse-operator in FIPS compatible mode in Kubernetes for S3 Use the image `altinity/clickhouse-backup:X.X.X-fips` (where X.X.X is the version number). 
diff --git a/ReadMe.md b/ReadMe.md index 0825473b..683d5807 100644 --- a/ReadMe.md +++ b/ReadMe.md @@ -690,6 +690,7 @@ Display a list of all operations from start of API server: `curl -s localhost:71 - [How to restore object disks to s3 with s3:CopyObject](Examples.md#how-to-restore-object-disks-to-s3-with-s3copyobject) - [How to use AWS IRSA and IAM to allow S3 backup without Explicit credentials](Examples.md#how-to-use-aws-irsa-and-iam-to-allow-s3-backup-without-explicit-credentials) - [How to use Azure AD Workload Identity to allow AZBLOB backup without Explicit credentials](Examples.md#how-to-use-azure-ad-workload-identity-to-allow-azblob-backup-without-explicit-credentials) +- [How to use GCP Workload Identity to allow GCS backup without Explicit credentials](Examples.md#how-to-use-gcp-workload-identity-to-allow-gcs-backup-without-explicit-credentials) - [How incremental backups work with remote storage](Examples.md#how-incremental-backups-work-with-remote-storage) - [How to watch backups work](Examples.md#how-to-watch-backups-work) - [How to track operation status with operation_id](Examples.md#How-to-track-operation-status-with-operation_id) From 2d12b3bf8ad7a32f33b68a173af20379054dab99 Mon Sep 17 00:00:00 2001 From: slach Date: Tue, 9 Jun 2026 23:23:27 +0400 Subject: [PATCH 11/24] fix TestKill after CI/CD failures --- Examples.md | 2 +- test/integration/kill_test.go | 203 ++++++++++++++++++++++++---------- 2 files changed, 148 insertions(+), 57 deletions(-) diff --git a/Examples.md b/Examples.md index 44a76cdb..764d2cd0 100644 --- a/Examples.md +++ b/Examples.md @@ -908,7 +908,7 @@ spec: ## How to use GCP Workload Identity to allow GCS backup without Explicit credentials -This is the Google Cloud equivalent of AaWS IRSA. On GKE with +This is the Google Cloud equivalent of AWS IRSA. 
On GKE with [Workload Identity Federation for GKE](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) enabled, a Kubernetes `ServiceAccount` (KSA) is bound to a Google Cloud IAM service account (GSA), so `clickhouse-backup` authenticates to Google Cloud Storage without a `credentials_file` or diff --git a/test/integration/kill_test.go b/test/integration/kill_test.go index 6b2858c3..29805816 100644 --- a/test/integration/kill_test.go +++ b/test/integration/kill_test.go @@ -68,14 +68,14 @@ func TestKill(t *testing.T) { // 1. Create the local backup, wait for completion. createOut, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - fmt.Sprintf("curl -sfL -XPOST 'http://localhost:7171/backup/create?table=%s.*&name=%s'", dbName, backupName)) + execCurlWithFailBody(fmt.Sprintf("-XPOST 'http://127.0.0.1:7171/backup/create?table=%s.*&name=%s'", dbName, backupName))) r.NoError(err, "%s\nunexpected POST /backup/create error: %v", createOut, err) r.NotContains(createOut, "\"status\":\"error\"") waitForActionStatus(r, env, "create", backupName, "success", 60*time.Second) // 2. Kick off upload (async). 
uploadOut, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - fmt.Sprintf("curl -sfL -XPOST 'http://localhost:7171/backup/upload/%s'", backupName)) + execCurlWithFailBody(fmt.Sprintf("-XPOST 'http://127.0.0.1:7171/backup/upload/%s'", backupName))) r.NoError(err, "%s\nunexpected POST /backup/upload error: %v", uploadOut, err) r.Contains(uploadOut, "acknowledged") @@ -84,7 +84,7 @@ func TestKill(t *testing.T) { pidSeen := false for time.Now().Before(deadline) { statusOut, _ := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - "curl -sfL 'http://localhost:7171/backup/actions?filter=upload'") + execCurlWithFailBody("'http://127.0.0.1:7171/backup/actions?filter=upload'")) lsOut, lsErr := env.DockerExecOut("clickhouse-backup", "bash", "-ce", "ls "+pidPath+" 2>/dev/null || true") if strings.Contains(statusOut, `"status":"in progress"`) && lsErr == nil && strings.Contains(lsOut, backupName) { pidSeen = true @@ -102,7 +102,7 @@ func TestKill(t *testing.T) { // observably longer than a trivial round-trip. killStart := time.Now() killOut, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - fmt.Sprintf("curl -sfL 'http://localhost:7171/backup/kill?command=upload+%s'", backupName)) + execCurlWithFailBody(fmt.Sprintf("'http://127.0.0.1:7171/backup/kill?command=upload+%s'", backupName))) killElapsed := time.Since(killStart) r.NoError(err, "%s\nunexpected GET /backup/kill error: %v", killOut, err) r.Contains(killOut, "\"status\":\"success\"", "kill should succeed: %s", killOut) @@ -127,7 +127,7 @@ func TestKill(t *testing.T) { // 6. Delete must NOT trip on a stale pid file. 
deleteOut, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - fmt.Sprintf("curl -sfL -XPOST 'http://localhost:7171/backup/delete/local/%s'", backupName)) + execCurlWithFailBody(fmt.Sprintf("-XPOST 'http://127.0.0.1:7171/backup/delete/local/%s'", backupName))) r.NoError(err, "%s\nunexpected POST /backup/delete error: %v", deleteOut, err) r.NotContains(deleteOut, "another clickhouse-backup", "delete must not see a stale pid lock: %s", deleteOut) @@ -182,9 +182,7 @@ func TestKillDownload(t *testing.T) { delOut := postAction(r, env, "delete local "+backupName) r.Contains(delOut, "\"status\":\"success\"", "delete local must succeed: %s", delOut) - // 2. start download and kill it mid-flight. - startOut := postAction(r, env, "download "+backupName) - r.Contains(startOut, "acknowledged", "download must be acknowledged: %s", startOut) + // 2. start download and kill it mid-flight (start happens inside observeInProgressAndKill). observeInProgressAndKill(r, env, "download "+backupName, backupName, "download", 15*time.Second) // 3. a follow-up delete must not trip on a stale pid lock. @@ -192,7 +190,7 @@ func TestKillDownload(t *testing.T) { r.NotContains(delOut, "another clickhouse-backup", "delete must not see a stale pid lock: %s", delOut) } -// TestKillCreate kills an in-progress create and verifies the create goroutine +// TestKillCreate kills an in-progress create and verifies the creation goroutine // stops (pid removed, last_create_finish advances, kill returns fast). func TestKillCreate(t *testing.T) { env, r := NewTestEnvironment(t) @@ -220,8 +218,8 @@ func TestKillCreate(t *testing.T) { }() time.Sleep(3 * time.Second) - startOut := postAction(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName)) - r.Contains(startOut, "acknowledged", "create must be acknowledged: %s", startOut) + // start happens inside observeInProgressAndKill so a fast create cannot + // finish before the kill is issued. 
observeInProgressAndKill(r, env, fmt.Sprintf("create --tables=%s.* %s", dbName, backupName), backupName, "create", 15*time.Second) // a follow-up delete must not trip on a stale pid lock. @@ -266,8 +264,8 @@ func TestKillRestore(t *testing.T) { } env.queryWithNoError(r, dropSQL) - startOut := postAction(r, env, "restore "+backupName) - r.Contains(startOut, "acknowledged", "restore must be acknowledged: %s", startOut) + // start happens inside observeInProgressAndKill so a fast restore cannot + // finish before the kill is issued. observeInProgressAndKill(r, env, "restore "+backupName, backupName, "restore", 15*time.Second) } @@ -284,7 +282,7 @@ func readUploadFinishMetric(r *require.Assertions, env *TestEnvironment) int64 { func readActionFinishMetric(r *require.Assertions, env *TestEnvironment, command string) int64 { metric := "clickhouse_backup_last_" + command + "_finish" out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - "curl -sfL http://localhost:7171/metrics | grep -E '^"+metric+" '") + "curl -sSL http://127.0.0.1:7171/metrics | grep -E '^"+metric+" '") r.NoError(err, "/metrics scrape failed: %s", out) matches := regexp.MustCompile(metric + `\s+([0-9.eE+\-]+)`).FindStringSubmatch(out) r.Len(matches, 2, "could not parse %s metric: %q", metric, out) @@ -293,6 +291,30 @@ func readActionFinishMetric(r *require.Assertions, env *TestEnvironment, command return int64(v) } +// execCurlWithFailBody builds a shell command that runs curl, prints the +// response body, and exits non-zero on failure — so callers' r.NoError(err) +// trips on errors while the body stays visible for diagnosis instead of just an +// exit code. curl runs inside the ClickHouse server container, whose curl +// version tracks CLICKHOUSE_VERSION: +// +// - CLICKHOUSE_VERSION >= 25.3 (image base moved to Ubuntu 22.04, curl 7.81) +// ships `curl --fail-with-body`, so use it directly. 
+// - older images (Ubuntu <= 20.04, curl <= 7.68) lack --fail-with-body, so +// emulate it: capture %{http_code} and exit 22 (curl's +// CURLE_HTTP_RETURNED_ERROR) on HTTP >= 400. curl's own exit code (e.g. 7 +// connection-refused, 28 timeout) is preserved for transport errors. +// +// args is everything after `curl` (flags, -d data, and the quoted URL). +func execCurlWithFailBody(args string) string { + if compareVersion(os.Getenv("CLICKHOUSE_VERSION"), "25.3") >= 0 { + return "curl -sSL --fail-with-body " + args + } + return `rc=0; resp=$(curl -sSL -w '\n%{http_code}' ` + args + `) || rc=$?; ` + + `code="${resp##*$'\n'}"; printf '%s' "${resp%$'\n'*}"; ` + + `if [ "$rc" -ne 0 ]; then exit "$rc"; fi; ` + + `if [ "${code:-000}" -ge 400 ]; then exit 22; fi` +} + // postAction POSTs a single command to /backup/actions and returns the raw // response body. The command value is JSON-encoded via %q; since no command // used by these tests contains a single quote, the JSON is safely wrapped in @@ -300,7 +322,7 @@ func readActionFinishMetric(r *require.Assertions, env *TestEnvironment, command func postAction(r *require.Assertions, env *TestEnvironment, command string) string { body := fmt.Sprintf(`{"command":%q}`, command) out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - "curl -sfL -XPOST 'http://localhost:7171/backup/actions' -d '"+body+"'") + execCurlWithFailBody("-XPOST 'http://127.0.0.1:7171/backup/actions' -d '"+body+"'")) r.NoError(err, "%s\nPOST /backup/actions %q error: %v", out, command, err) return out } @@ -329,22 +351,63 @@ func killSetupTable(r *require.Assertions, env *TestEnvironment, dbName string) "INSERT INTO %s.t1 SELECT number, '%s' FROM numbers(10000)", dbName, payload)) } -// actionInProgress reports whether /backup/actions output has a row whose -// command equals exactly `command` and is still in progress. 
-func actionInProgress(actionsOut, command string) bool { - for _, line := range strings.Split(actionsOut, "\n") { - if strings.Contains(line, `"command":"`+command+`"`) && strings.Contains(line, `"status":"in progress"`) { - return true - } - } - return false -} - -// observeInProgressAndKill waits until `command` is observably in-progress with -// its pid file present, kills it via /backup/actions, and asserts that the kill -// behaved correctly. metricCommand is the bare command name ("create", +// observeInProgressKillScript is a bash program run inside the clickhouse-backup +// container that STARTS `command`, waits until it is in-progress (its pid file +// exists), then fires the kill and captures the proof of cancellation — all +// without leaving the container. +// +// Keeping curl off the critical path is essential. A fast action (~230ms create +// on old ClickHouse) is comparable to how long a single curl takes to even spawn +// when the old amd64-only ClickHouse image runs under QEMU emulation (process +// startup ~100ms+, measured). Polling /backup/actions with curl to detect +// "in progress" would therefore consume the whole window before the kill is sent +// (the kill then reports "command not found"). Instead: +// - the before-metric is read BEFORE the start, off the critical path; +// - "in progress" is detected with a pure-bash wait on the pid file (no +// process spawn) — pidlock creates the file for the whole duration of +// create/download/restore and removes it when the worker returns, so its +// presence is an exact in-progress proxy; +// - only the start and the kill spawn curl, so the kill is issued ~one curl +// spawn after the pid appears, well inside the in-progress window. +// +// This is reliable on every ClickHouse version and architecture regardless of +// how quickly the worker runs. +// +// Placeholders are substituted in Go (none contain a single quote, so +// single-quote shell wrapping is safe). 
awk extracts the metric so a missing +// line yields "" instead of a non-zero exit that bash -e would abort on. SECONDS +// is a bash builtin (no spawn), so the wait loop adds no latency to detection. +const observeInProgressKillScript = ` +pid="__PID__" +metric="__METRIC__" +before=$(curl -sSL 'http://127.0.0.1:7171/metrics' 2>/dev/null | awk -v m="$metric" '$1==m {print $2}') +echo "FINISH_BEFORE=$before" +start_resp=$(curl -sSL -XPOST 'http://127.0.0.1:7171/backup/actions' -d '__STARTBODY__' 2>/dev/null || true) +printf 'START_RESP=%s\n' "$(printf '%s' "$start_resp" | tr -d '\n')" +SECONDS=0 +observed=0 +while [ "$SECONDS" -lt 30 ]; do + if [ -f "$pid" ]; then observed=1; break; fi +done +echo "OBSERVED=$observed" +[ "$observed" -eq 1 ] || exit 0 +start=$(date +%s%N) +kill_resp=$(curl -sSL -XPOST 'http://127.0.0.1:7171/backup/actions' -d '__KILLBODY__' 2>/dev/null || true) +end=$(date +%s%N) +echo "KILL_ELAPSED_MS=$(( (end - start) / 1000000 ))" +after=$(curl -sSL 'http://127.0.0.1:7171/metrics' 2>/dev/null | awk -v m="$metric" '$1==m {print $2}') +echo "FINISH_AFTER=$after" +if [ -f "$pid" ]; then echo "PID=EXISTS"; else echo "PID=GONE"; fi +printf 'KILL_RESP=%s\n' "$(printf '%s' "$kill_resp" | tr -d '\n')" +` + +// observeInProgressAndKill starts `command` via /backup/actions, waits until it +// is observably in-progress with its pid file present, kills it, and asserts the +// kill behaved correctly. metricCommand is the bare command name ("create", // "download", "restore") whose clickhouse_backup_last__finish gauge is -// used as the proof that the worker goroutine actually returned. +// used as the proof that the worker goroutine actually returned. The start and +// kill run in one in-container script (see observeInProgressKillScript) so a +// fast worker cannot finish before the kill is issued. // // Two assertions discriminate a real cancellation from a hung worker: // 1. 
kill returns well under cancel_operation_timeout — a worker that ignores @@ -353,40 +416,69 @@ func actionInProgress(actionsOut, command string) bool { // after cliApp.Run returns; if waitDone merely timed out it stays put. func observeInProgressAndKill(r *require.Assertions, env *TestEnvironment, command, backupName, metricCommand string, cancelTimeout time.Duration) { pidPath := fmt.Sprintf("/tmp/clickhouse-backup.%s.pid", backupName) - deadline := time.Now().Add(30 * time.Second) - observed := false - for time.Now().Before(deadline) { - statusOut, _ := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - "curl -sfL 'http://localhost:7171/backup/actions'") - lsOut, lsErr := env.DockerExecOut("clickhouse-backup", "bash", "-ce", "ls "+pidPath+" 2>/dev/null || true") - if actionInProgress(statusOut, command) && lsErr == nil && strings.Contains(lsOut, backupName) { - observed = true - break - } - time.Sleep(50 * time.Millisecond) - } - r.True(observed, "expected to observe %q in-progress with pid file %s present", command, pidPath) - - finishBefore := readActionFinishMetric(r, env, metricCommand) - - killStart := time.Now() - killOut := postAction(r, env, fmt.Sprintf("kill %q", command)) - killElapsed := time.Since(killStart) - r.Contains(killOut, "\"status\":\"success\"", "kill should succeed: %s", killOut) + metric := "clickhouse_backup_last_" + metricCommand + "_finish" + startBody := fmt.Sprintf(`{"command":%q}`, command) + killBody := fmt.Sprintf(`{"command":%q}`, fmt.Sprintf("kill %q", command)) + script := strings.NewReplacer( + "__PID__", pidPath, + "__METRIC__", metric, + "__STARTBODY__", startBody, + "__KILLBODY__", killBody, + ).Replace(observeInProgressKillScript) + + out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", script) + r.NoError(err, "observe+kill script failed:\n%s", out) + + r.Contains(scriptField(out, "START_RESP="), "acknowledged", + "%q must be acknowledged:\n%s", command, out) + r.Contains(out, "OBSERVED=1", + "expected to 
observe %q in-progress with pid file %s present:\n%s", command, pidPath, out) + + finishBefore := parseFinishField(r, out, "FINISH_BEFORE=") + finishAfter := parseFinishField(r, out, "FINISH_AFTER=") + killElapsed := time.Duration(parseIntField(r, out, "KILL_ELAPSED_MS=")) * time.Millisecond + killResp := scriptField(out, "KILL_RESP=") log.Info().Msgf("kill %q returned in %s", command, killElapsed) + r.Contains(killResp, "\"status\":\"success\"", "kill should succeed: %s", killResp) + r.Less(killElapsed, cancelTimeout-2*time.Second, "kill %q returned in %s; a worker that ignored context cancellation makes "+ "status.waitDone block until cancel_operation_timeout=%s", command, killElapsed, cancelTimeout) - finishAfter := readActionFinishMetric(r, env, metricCommand) r.Greater(finishAfter, finishBefore, "clickhouse_backup_last_%s_finish must advance during kill (before=%d after=%d); "+ "the %s goroutine did not return", metricCommand, finishBefore, finishAfter, metricCommand) - checkOut, _ := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - "if [ -f "+pidPath+" ]; then echo EXISTS; cat "+pidPath+"; else echo GONE; fi") - r.Contains(checkOut, "GONE", "pid file %s must be removed by kill, got: %s", pidPath, checkOut) + r.Contains(out, "PID=GONE", "pid file %s must be removed by kill:\n%s", pidPath, out) +} + +// scriptField returns the value after the first line of observe+kill script +// output that starts with prefix (e.g. "FINISH_BEFORE="), or "" if absent. +func scriptField(out, prefix string) string { + for _, line := range strings.Split(out, "\n") { + if strings.HasPrefix(line, prefix) { + return strings.TrimSpace(strings.TrimPrefix(line, prefix)) + } + } + return "" +} + +// parseFinishField parses a clickhouse_backup_last_*_finish unix-timestamp gauge +// (a float such as 1.749e+09) emitted by the observe+kill script into seconds. 
+func parseFinishField(r *require.Assertions, out, prefix string) int64 { + field := scriptField(out, prefix) + v, err := strconv.ParseFloat(field, 64) + r.NoError(err, "parse %s%q from:\n%s", prefix, field, out) + return int64(v) +} + +// parseIntField parses an integer field emitted by the observe+kill script. +func parseIntField(r *require.Assertions, out, prefix string) int64 { + field := scriptField(out, prefix) + v, err := strconv.ParseInt(field, 10, 64) + r.NoError(err, "parse %s%q from:\n%s", prefix, field, out) + return v } // waitForActionStatus polls /backup/actions and returns once a row whose @@ -398,8 +490,7 @@ func waitForActionStatus(r *require.Assertions, env *TestEnvironment, cmdPrefix, if time.Now().After(deadline) { r.FailNow(fmt.Sprintf("timeout waiting for %s ... %s to reach status %q", cmdPrefix, nameNeedle, expected)) } - out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", - "curl -sfL 'http://localhost:7171/backup/actions'") + out, err := env.DockerExecOut("clickhouse-backup", "bash", "-ce", execCurlWithFailBody("'http://127.0.0.1:7171/backup/actions'")) r.NoError(err) for _, line := range strings.Split(out, "\n") { if strings.Contains(line, `"command":"`+cmdPrefix) && From ed1c0509a415f101c1491930759ae2dc2bae27ec Mon Sep 17 00:00:00 2001 From: slach Date: Wed, 10 Jun 2026 06:49:51 +0400 Subject: [PATCH 12/24] fix TestCustomRestic for restic >= 0.17 exit-code and stats output changes restic 0.19.0 (test installs releases/latest) crossed the 0.17.0 boundary where exit codes were formalized, breaking two custom-storage scripts that had not changed in years: - upload.sh: restic now returns exit code 3 ("backed up some source files, but not all") whenever a command-line path does not exist. Most disks have no backup directory, so the skipped paths flip the exit code even though the snapshot is saved. set -euo pipefail aborted before restic forget, so clickhouse-backup saw a failure. Accept exit code 3, fail on anything else. 
- list.sh: restic stats --json now writes a progress line to stdout before the JSON (even with --quiet), poisoning the jq parse and failing list_custom. Keep only the JSON object line via grep. Verified live in the running container and with the full TestCustomRestic run. --- test/integration/restic/list.sh | 4 +++- test/integration/restic/upload.sh | 11 +++++++++++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/test/integration/restic/list.sh b/test/integration/restic/list.sh index 98f7f0d2..f5cf716c 100755 --- a/test/integration/restic/list.sh +++ b/test/integration/restic/list.sh @@ -6,7 +6,9 @@ source "${CUR_DIR}/init.sh" rm -rf /tmp/restic_list_full.json restic snapshots --insecure-tls --json | jq -c -M '.[] | {"snapshot_id": .short_id, "backup_name": .tags[0], "creation_date": .time, "upload_date": .time }' > /tmp/restic_list.json jq -c -r -M --slurp '.[].snapshot_id' /tmp/restic_list.json | while IFS= read -r snapshot_id ; do - jq -c -M -s 'add' <(grep ${snapshot_id} /tmp/restic_list.json) <(restic stats --insecure-tls --json ${snapshot_id}) >> /tmp/restic_list_full.json + # restic >= 0.17 prints a progress line to stdout before the JSON document + # (even with --quiet), so keep only the JSON object line for jq. 
+ jq -c -M -s 'add' <(grep ${snapshot_id} /tmp/restic_list.json) <(restic stats --insecure-tls --json ${snapshot_id} | grep -E '^\{') >> /tmp/restic_list_full.json done cat /tmp/restic_list_full.json | jq -c -M --slurp '.[] | .data_size = .total_size | .metadata_size = 0' set -x diff --git a/test/integration/restic/upload.sh b/test/integration/restic/upload.sh index 9fc3866c..61c4bd9a 100755 --- a/test/integration/restic/upload.sh +++ b/test/integration/restic/upload.sh @@ -10,5 +10,16 @@ if [[ "" != "${DIFF_FROM_REMOTE}" ]]; then DIFF_FROM_REMOTE=$(${CUR_DIR}/list.sh | grep "${DIFF_FROM_REMOTE}" | jq -r -c '.snapshot_id') DIFF_FROM_REMOTE_CMD="--parent ${DIFF_FROM_REMOTE}" fi +# restic >= 0.17 returns exit code 3 ("command was able to back up some of the +# source files, but not all of them") when any source path passed on the command +# line does not exist. Not every disk has a backup directory, so the skipped +# paths trigger exit 3 even though the snapshot is saved correctly. Treat exit +# code 3 as success and fail on anything else. +set +e restic backup --insecure-tls $DIFF_FROM_REMOTE_CMD --tag "${BACKUP_NAME}" $LOCAL_PATHS +backup_rc=$? 
+set -e +if [ "${backup_rc}" -ne 0 ] && [ "${backup_rc}" -ne 3 ]; then + exit "${backup_rc}" +fi restic forget --insecure-tls --keep-last ${RESTIC_KEEP_LAST} --prune \ No newline at end of file From 221581d663a166cb61d66d47a1d7cc45b42af06c Mon Sep 17 00:00:00 2001 From: slach Date: Wed, 10 Jun 2026 08:30:20 +0400 Subject: [PATCH 13/24] v2.7.2 ChangeLog.md --- ChangeLog.md | 10 ++++++++++ 1 file changed, 10 insertions(+) diff --git a/ChangeLog.md b/ChangeLog.md index eaa581ef..788fce70 100644 --- a/ChangeLog.md +++ b/ChangeLog.md @@ -1,3 +1,13 @@ +# v2.7.2 + +IMPROVEMENTS +- include the data part file path and source object key (`fPath`, `srcKey`) in `object_disk` `CopyObject`/`CopyObjectStreaming` error messages during `create` and `restore`, so broken `object_disk` data (keys missing on remote storage) points to the exact failing file instead of just the shadow/table name +- document GCS Workload Identity authentication for the `gcs` remote storage in `Examples.md` + +BUG FIXES +- fix `clickhouse.skip_table_engines` (env `CLICKHOUSE_SKIP_TABLE_ENGINES`) silently keeping some matching tables: the in-place slice removal advanced the cursor past the next element, so adjacent tables sharing a skipped engine were not all skipped; iterate in reverse so every match is dropped, fix [#1416](https://github.com/Altinity/clickhouse-backup/issues/1416) +- ensure `/backup/kill` (and context cancellation in general) promptly aborts an in-flight `download`/`restore` even when a read is stalled on a slow/half-open network or disk backpressure — the source reader is now force-closed on cancellation so blocked `Read` calls in `DownloadCompressedStream`/`DownloadPath`, the Azure `CopyObject` poll backoff, and `object_disk.CopyObjectStreaming` return instead of running to completion, and the resumable `.pid` file is removed, fix [#1365](https://github.com/Altinity/clickhouse-backup/issues/1365) + # v2.7.1 NEW FEATURES From d2f12600abfafd372c0c77ff9a8d12cb76cae4bb Mon Sep 17 00:00:00 
2001 From: slach Date: Sun, 14 Jun 2026 17:23:18 +0400 Subject: [PATCH 14/24] switch to awsV2Config.WithRetryMode(aws.RetryModeAdaptive), `general.compression_use_multi_thread` to `true` fix https://github.com/Altinity/clickhouse-backup/issues/1378 --- ChangeLog.md | 8 +++++ Examples.md | 7 +++-- ReadMe.md | 2 +- pkg/config/config.go | 29 ++++++++++--------- pkg/config/config_test.go | 11 +++++-- pkg/storage/s3.go | 4 +-- .../tests/snapshots/cli.py.cli.snapshot | 2 +- 7 files changed, 41 insertions(+), 22 deletions(-) diff --git a/ChangeLog.md b/ChangeLog.md index 788fce70..0d3d1c3e 100644 --- a/ChangeLog.md +++ b/ChangeLog.md @@ -1,3 +1,11 @@ +# v2.7.3 + +IMPROVEMENTS +- switch the S3 client retry mode from `aws.RetryModeStandard` to `aws.RetryModeAdaptive`, so upload/download retries back off with a client-side rate limiter under throttling (`SlowDown`/503) instead of a fixed token bucket + +BUG FIXES +- restore the default `general.compression_use_multi_thread` to `true`: [#1378](https://github.com/Altinity/clickhouse-backup/issues/1378) defaulted it to `false`, which silently dropped gzip/zstd compression from multi-threaded (pre-1378 gzip always used `pgzip`) to single-threaded and caused ~30% slower upload throughput for backups dominated by one large table (where `upload_concurrency` provides no per-stream parallelism); the option is now silently ignored instead of failing config validation for formats other than gzip/zstd + # v2.7.2 IMPROVEMENTS diff --git a/Examples.md b/Examples.md index 764d2cd0..3bd1ecc8 100644 --- a/Examples.md +++ b/Examples.md @@ -202,9 +202,10 @@ Notes for the other backends: ## Multi-threaded zstd/gzip compression -By default `compression_use_multi_thread: false`, each compression stream is single-threaded. 
clickhouse-backup already -parallelizes compression across tables via `upload_concurrency`/`download_concurrency`, so per-stream multi-threading -mainly over-subscribes the CPU and reduces total throughput, see https://github.com/Altinity/clickhouse-backup/issues/1378. +`compression_use_multi_thread` defaults to `true`, so each zstd/gzip stream is compressed with multiple threads (gzip via +pgzip), matching the pre-1378 behavior. When a single large table dominates a backup, throughput is limited by per-stream compression speed +and gains nothing from `upload_concurrency`/`download_concurrency`. Set `compression_use_multi_thread: false` to save CPU +when many tables upload in parallel, see https://github.com/Altinity/clickhouse-backup/issues/1378. ```yaml general: diff --git a/ReadMe.md b/ReadMe.md index 683d5807..3bec4d0d 100644 --- a/ReadMe.md +++ b/ReadMe.md @@ -126,7 +126,7 @@ general: download_copy_buffer_size: 0 # DOWNLOAD_COPY_BUFFER_SIZE, explicit buffer size in bytes for io.CopyBuffer during download/extract, 0 means use the Go default (32KB); raise (e.g. 1048576 = 1MB) to reduce syscalls per file on fast networks # zstd/gzip compression tuning, see https://github.com/Altinity/clickhouse-backup/issues/1378 and Examples.md#multi-threaded-zstdgzip-compression - compression_use_multi_thread: false # COMPRESSION_USE_MULTI_THREAD, enable per-stream multi-threaded zstd/gzip compression and decompression; default false because upload_concurrency/download_concurrency already parallelize across tables, so per-stream threading mainly over-subscribes CPU. Enable when backing up a single large table with low concurrency. Only affects compression_format: zstd and gzip + compression_use_multi_thread: true # COMPRESSION_USE_MULTI_THREAD, enable per-stream multi-threaded zstd/gzip compression and decompression; default true to match pre-1378 behavior (gzip always used pgzip).
A single large table dominating a backup is gated by per-stream compression speed and gets no benefit from upload_concurrency/download_concurrency. Set false to save CPU when many tables upload in parallel. Only affects compression_format: zstd and gzip (silently ignored for other formats) compression_threads: 0 # COMPRESSION_THREADS, number of per-stream compression threads when compression_use_multi_thread is enabled (zstd concurrency / pgzip block workers), 0 means auto (GOMAXPROCS); must be unset/0 when compression_use_multi_thread is false compression_buffer_size: 0 # COMPRESSION_BUFFER_SIZE, compression buffer size in bytes, 0 keeps library defaults. Meaning and valid range depend on compression_format and compression_use_multi_thread: zstd = encoder window (power of two, 1024..536870912, e.g. 4194304 = 4MB); single-threaded gzip = DEFLATE window (32..32768); multi-threaded gzip = pgzip block size (>16384). Other formats reject it diff --git a/pkg/config/config.go b/pkg/config/config.go index e773c19f..68e92212 100644 --- a/pkg/config/config.go +++ b/pkg/config/config.go @@ -67,7 +67,7 @@ type GeneralConfig struct { PipeBufferSize int64 `yaml:"pipe_buffer_size" envconfig:"PIPE_BUFFER_SIZE"` // DownloadCopyBufferSize - explicit buffer size for io.CopyBuffer during download/extract, 0 means use the Go default io.Copy buffer (32KB), see https://github.com/Altinity/clickhouse-backup/issues/1376 DownloadCopyBufferSize int64 `yaml:"download_copy_buffer_size" envconfig:"DOWNLOAD_COPY_BUFFER_SIZE"` - // CompressionUseMultiThread - enable per-stream multi-threaded zstd/gzip compression and decompression (zstd encoder/decoder concurrency, gzip via pgzip). 
Default false because clickhouse-backup already parallelizes at table level via upload_concurrency/download_concurrency, so per-stream threading mainly over-subscribes CPU; enable it when backing up a single large table with low concurrency, see https://github.com/Altinity/clickhouse-backup/issues/1378 + // CompressionUseMultiThread - enable per-stream multi-threaded zstd/gzip compression and decompression (zstd encoder/decoder concurrency, gzip via pgzip). Default true to match pre-1378 behavior where gzip always used pgzip; a single large table dominating a backup is gated by per-stream compression speed and gets no benefit from table-level upload_concurrency/download_concurrency. Set false to save CPU when many tables upload in parallel. Only applies to compression_format zstd and gzip; silently ignored for other formats, see https://github.com/Altinity/clickhouse-backup/issues/1378 CompressionUseMultiThread bool `yaml:"compression_use_multi_thread" envconfig:"COMPRESSION_USE_MULTI_THREAD"` // CompressionThreads - number of per-stream compression threads when compression_use_multi_thread is enabled (zstd concurrency / pgzip block workers); 0 means auto (GOMAXPROCS). 
Ignored when compression_use_multi_thread is false, see https://github.com/Altinity/clickhouse-backup/issues/1378 CompressionThreads int `yaml:"compression_threads" envconfig:"COMPRESSION_THREADS"` @@ -417,15 +417,19 @@ func (cfg *Config) GetCompressionFormat() string { // buffer size only apply to zstd and gzip; the buffer size additionally has format- and mode-specific // ranges, see https://github.com/Altinity/clickhouse-backup/issues/1378 func validateCompressionTuning(cfg *Config) error { - format := cfg.GetCompressionFormat() - multiThreadSupported := format == "zstd" || format == "gzip" || format == "gz" - - if cfg.General.CompressionUseMultiThread && !multiThreadSupported { - return errors.Errorf("compression_use_multi_thread is only supported for 'zstd' and 'gzip' compression_format, not '%s'", format) - } if cfg.General.CompressionThreads < 0 { return errors.Errorf("compression_threads=%d is invalid, it must be >= 0 (0 means auto/GOMAXPROCS)", cfg.General.CompressionThreads) } + format := cfg.GetCompressionFormat() + multiThreadSupported := format == "zstd" || format == "gzip" || format == "gz" + if !multiThreadSupported { + // compression_use_multi_thread / compression_threads / compression_buffer_size only apply to + // zstd and gzip and are no-ops for other formats. 
Relax validation instead of failing config + // load: the default compression_use_multi_thread=true must not break tar/bzip2/xz/brotli/sz/none + // configs, so silently disable the knob here, see https://github.com/Altinity/clickhouse-backup/issues/1378 + cfg.General.CompressionUseMultiThread = false + return nil + } if cfg.General.CompressionThreads > 0 && !cfg.General.CompressionUseMultiThread { return errors.New("compression_threads is set but compression_use_multi_thread is false; enable compression_use_multi_thread or unset compression_threads") } @@ -433,13 +437,13 @@ func validateCompressionTuning(cfg *Config) error { if size == 0 { return nil } - switch format { - case "zstd": - // zstd encoder window size must be a power of two between 1KB and 512MB + // zstd encoder window size must be a power of two between 1KB and 512MB + if format == "zstd" { if size < 1024 || size > 512*1024*1024 || (size&(size-1)) != 0 { return errors.Errorf("compression_buffer_size=%d is invalid for zstd, it must be a power of two between 1024 and 536870912", size) } - case "gzip", "gz": + } + if format == "gzip" || format == "gz" { if cfg.General.CompressionUseMultiThread { // pgzip block size must be greater than its 16KB tail size if size <= 16384 { @@ -451,8 +455,6 @@ func validateCompressionTuning(cfg *Config) error { return errors.Errorf("compression_buffer_size=%d is invalid for single-threaded gzip, it must be between 32 and 32768", size) } } - default: - return errors.Errorf("compression_buffer_size is only supported for 'zstd' and 'gzip' compression_format, not '%s'", format) } return nil } @@ -733,6 +735,7 @@ func DefaultConfig() *Config { DeleteBatchSize: 1000, PipeBufferSize: 128 * 1024, DownloadCopyBufferSize: 0, + CompressionUseMultiThread: true, }, ClickHouse: ClickHouseConfig{ Username: "default", diff --git a/pkg/config/config_test.go b/pkg/config/config_test.go index 2027a466..80002427 100644 --- a/pkg/config/config_test.go +++ b/pkg/config/config_test.go @@ -28,8 
+28,10 @@ func TestValidateConfigCompressionTuning(t *testing.T) { {"gzip multi-thread block too small", "gzip", true, 0, 16384, true}, {"gzip single-thread 32KB window", "gzip", false, 0, 32768, false}, {"gzip single-thread window too large", "gzip", false, 0, 65536, true}, - {"multi_thread on unsupported format", "brotli", true, 0, 0, true}, - {"buffer_size on unsupported format", "brotli", false, 0, 1024, true}, + // unsupported formats relax instead of failing: the default compression_use_multi_thread=true + // must not break them, and the knobs are no-ops there, see https://github.com/Altinity/clickhouse-backup/issues/1378 + {"multi_thread on unsupported format", "brotli", true, 0, 0, false}, + {"buffer_size on unsupported format", "brotli", false, 0, 1024, false}, {"negative threads", "zstd", true, -1, 0, true}, {"threads set without multi_thread", "zstd", false, 4, 0, true}, } @@ -46,6 +48,11 @@ func TestValidateConfigCompressionTuning(t *testing.T) { if !tc.wantErr && err != nil { t.Fatalf("unexpected error for %s: %v", tc.name, err) } + // on formats that don't support multi-thread the knob must be silently disabled, not honored + multiThreadSupported := tc.format == "zstd" || tc.format == "gzip" || tc.format == "gz" + if !multiThreadSupported && cfg.General.CompressionUseMultiThread { + t.Fatalf("expected compression_use_multi_thread to be disabled for unsupported format %s", tc.format) + } }) } } diff --git a/pkg/storage/s3.go b/pkg/storage/s3.go index 11a30d9f..0c80a02b 100644 --- a/pkg/storage/s3.go +++ b/pkg/storage/s3.go @@ -124,7 +124,7 @@ func (s *S3) Connect(ctx context.Context) error { var awsConfig aws.Config awsConfig, err = awsV2Config.LoadDefaultConfig( ctx, - awsV2Config.WithRetryMode(aws.RetryModeStandard), + awsV2Config.WithRetryMode(aws.RetryModeAdaptive), ) if err != nil { return errors.Wrap(err, "S3 Connect LoadDefaultConfig") @@ -140,7 +140,7 @@ func (s *S3) Connect(ctx context.Context) error { awsConfig.Credentials = 
stscreds.NewWebIdentityRoleProvider( stsClient, awsRoleARN, stscreds.IdentityTokenFile(awsWebIdentityTokenFile), ) - // inherit IRSA and try assume role https://github.com/Altinity/clickhouse-backup/issues/1191 + // inherit IRSA and try to assume role https://github.com/Altinity/clickhouse-backup/issues/1191 if s.Config.AssumeRoleARN != "" && s.Config.AssumeRoleARN != awsRoleARN { awsConfig.Credentials = aws.NewCredentialsCache(awsConfig.Credentials) stsClient = sts.NewFromConfig(awsConfig) diff --git a/test/testflows/clickhouse_backup/tests/snapshots/cli.py.cli.snapshot b/test/testflows/clickhouse_backup/tests/snapshots/cli.py.cli.snapshot index e5a3b413..d48133fe 100644 --- a/test/testflows/clickhouse_backup/tests/snapshots/cli.py.cli.snapshot +++ b/test/testflows/clickhouse_backup/tests/snapshots/cli.py.cli.snapshot @@ -1,4 +1,4 @@ -default_config = r"""'[\'general:\', \' remote_storage: none\', \' backups_to_keep_local: 0\', \' backups_to_keep_remote: 0\', \' log_level: info\', \' allow_empty_backups: false\', \' pipe_buffer_size: 131072\', \' download_copy_buffer_size: 0\', \' compression_use_multi_thread: false\', \' compression_threads: 0\', \' compression_buffer_size: 0\', \' allow_object_disk_streaming: false\', \' use_resumable_state: true\', \' restore_schema_on_cluster: ""\', \' upload_by_part: true\', \' download_by_part: true\', \' restore_database_mapping: {}\', \' restore_table_mapping: {}\', \' retries_on_failure: 3\', \' retries_pause: 5s\', \' retries_jitter: 0\', \' watch_interval: 1h\', \' full_interval: 24h\', \' watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}\', \' sharded_operation_mode: ""\', \' cpu_nice_priority: 15\', \' io_nice_priority: idle\', \' rbac_backup_always: true\', \' rbac_conflict_resolution: recreate\', \' config_backup_always: false\', \' named_collections_backup_always: false\', \' delete_batch_size: 1000\', \' retriesduration: 5s\', \' watchduration: 1h0m0s\', \' fullduration: 24h0m0s\', 
\'clickhouse:\', \' username: default\', \' password: ""\', \' host: localhost\', \' port: 9000\', \' disk_mapping: {}\', \' skip_tables:\', \' - system.*\', \' - INFORMATION_SCHEMA.*\', \' - information_schema.*\', \' - _temporary_and_external_tables.*\', \' skip_table_engines: []\', \' skip_disks: []\', \' skip_disk_types: []\', \' timeout: 30m\', \' freeze_by_part: false\', \' freeze_by_part_where: ""\', \' use_embedded_backup_restore: false\', \' use_embedded_backup_restore_cluster: ""\', \' embedded_backup_disk: ""\', \' backup_mutations: true\', \' restore_as_attach: false\', \' restore_distributed_cluster: ""\', \' check_parts_columns: true\', \' parts_columns_batch_size: 25\', \' secure: false\', \' skip_verify: false\', \' sync_replicated_tables: false\', \' log_sql_queries: true\', \' config_dir: /etc/clickhouse-server/\', \' restart_command: exec:systemctl restart clickhouse-server\', \' ignore_not_exists_error_during_freeze: true\', \' check_replicas_before_attach: true\', \' default_replica_path: /clickhouse/tables/{cluster}/{shard}/{database}/{table}\', " default_replica_name: \'{replica}\'", \' tls_key: ""\', \' tls_cert: ""\', \' tls_ca: ""\', \' debug: false\', \' force_rebalance: false\', \'s3:\', \' access_key: ""\', \' secret_key: ""\', \' bucket: ""\', \' endpoint: ""\', \' region: us-east-1\', \' acl: private\', \' assume_role_arn: ""\', \' force_path_style: false\', \' path: ""\', \' object_disk_path: ""\', \' disable_ssl: false\', \' compression_level: 1\', \' compression_format: tar\', \' sse: ""\', \' sse_kms_key_id: ""\', \' sse_customer_algorithm: ""\', \' sse_customer_key: ""\', \' sse_customer_key_md5: ""\', \' sse_kms_encryption_context: ""\', \' disable_cert_verification: false\', \' use_custom_storage_class: false\', \' storage_class: STANDARD\', \' custom_storage_class_map: {}\', \' allow_multipart_download: false\', \' object_labels: {}\', \' request_payer: ""\', \' check_sum_algorithm: ""\', \' request_content_md5: false\', \' 
retry_mode: standard\', \' chunk_size: 5242880\', \' debug: false\', \' http_write_buffer_size: 0\', \' http_read_buffer_size: 0\', \' http_idle_conn_timeout: ""\', \'gcs:\', \' credentials_file: ""\', \' credentials_json: ""\', \' credentials_json_encoded: ""\', \' sa_email: ""\', \' embedded_access_key: ""\', \' embedded_secret_key: ""\', \' skip_credentials: false\', \' bucket: ""\', \' path: ""\', \' object_disk_path: ""\', \' compression_level: 1\', \' compression_format: tar\', \' debug: false\', \' force_http: false\', \' endpoint: ""\', \' storage_class: STANDARD\', \' object_labels: {}\', \' custom_storage_class_map: {}\', \' chunk_size: 16777216\', \' encryption_key: ""\', \' upload_buffer_size: 131072\', \'cos:\', \' url: ""\', \' timeout: 2m\', \' secret_id: ""\', \' secret_key: ""\', \' path: ""\', \' object_disk_path: ""\', \' compression_format: tar\', \' compression_level: 1\', \' allow_multipart_download: false\', \' debug: false\', \'api:\', \' listen: localhost:7171\', \' enable_metrics: true\', \' enable_pprof: false\', \' username: ""\', \' password: ""\', \' secure: false\', \' certificate_file: ""\', \' private_key_file: ""\', \' ca_cert_file: ""\', \' ca_key_file: ""\', \' create_integration_tables: false\', \' integration_tables_host: ""\', \' allow_parallel: false\', \' complete_resumable_after_restart: true\', \' complete_resumable_after_restart_commands:\', \' - upload\', \' - download\', \' watch_is_main_process: false\', \' backup_actions_skip_commands: []\', \' cancel_operation_timeout: 1800s\', \'ftp:\', \' address: ""\', \' timeout: 2m\', \' username: ""\', \' password: ""\', \' tls: false\', \' skip_tls_verify: false\', \' path: ""\', \' object_disk_path: ""\', \' compression_format: tar\', \' compression_level: 1\', \' debug: false\', \'sftp:\', \' address: ""\', \' port: 22\', \' username: ""\', \' password: ""\', \' key: ""\', \' path: ""\', \' object_disk_path: ""\', \' compression_format: tar\', \' compression_level: 1\', \' 
debug: false\', \'azblob:\', \' endpoint_schema: https\', \' endpoint_suffix: core.windows.net\', \' account_name: ""\', \' account_key: ""\', \' sas: ""\', \' use_managed_identity: false\', \' container: ""\', \' assume_container_exists: false\', \' path: ""\', \' object_disk_path: ""\', \' compression_level: 1\', \' compression_format: tar\', \' sse_key: ""\', \' buffer_count: 3\', \' timeout: 4h\', \' debug: false\', \'custom:\', \' upload_command: ""\', \' download_command: ""\', \' list_command: ""\', \' delete_command: ""\', \' command_timeout: 4h\', \' commandtimeoutduration: 4h0m0s\']'""" +default_config = r"""'[\'general:\', \' remote_storage: none\', \' backups_to_keep_local: 0\', \' backups_to_keep_remote: 0\', \' log_level: info\', \' allow_empty_backups: false\', \' pipe_buffer_size: 131072\', \' download_copy_buffer_size: 0\', \' compression_use_multi_thread: true\', \' compression_threads: 0\', \' compression_buffer_size: 0\', \' allow_object_disk_streaming: false\', \' use_resumable_state: true\', \' restore_schema_on_cluster: ""\', \' upload_by_part: true\', \' download_by_part: true\', \' restore_database_mapping: {}\', \' restore_table_mapping: {}\', \' retries_on_failure: 3\', \' retries_pause: 5s\', \' retries_jitter: 0\', \' watch_interval: 1h\', \' full_interval: 24h\', \' watch_backup_name_template: shard{shard}-{type}-{time:20060102150405}\', \' sharded_operation_mode: ""\', \' cpu_nice_priority: 15\', \' io_nice_priority: idle\', \' rbac_backup_always: true\', \' rbac_conflict_resolution: recreate\', \' config_backup_always: false\', \' named_collections_backup_always: false\', \' delete_batch_size: 1000\', \' retriesduration: 5s\', \' watchduration: 1h0m0s\', \' fullduration: 24h0m0s\', \'clickhouse:\', \' username: default\', \' password: ""\', \' host: localhost\', \' port: 9000\', \' disk_mapping: {}\', \' skip_tables:\', \' - system.*\', \' - INFORMATION_SCHEMA.*\', \' - information_schema.*\', \' - _temporary_and_external_tables.*\', 
\' skip_table_engines: []\', \' skip_disks: []\', \' skip_disk_types: []\', \' timeout: 30m\', \' freeze_by_part: false\', \' freeze_by_part_where: ""\', \' use_embedded_backup_restore: false\', \' use_embedded_backup_restore_cluster: ""\', \' embedded_backup_disk: ""\', \' backup_mutations: true\', \' restore_as_attach: false\', \' restore_distributed_cluster: ""\', \' check_parts_columns: true\', \' parts_columns_batch_size: 25\', \' secure: false\', \' skip_verify: false\', \' sync_replicated_tables: false\', \' log_sql_queries: true\', \' config_dir: /etc/clickhouse-server/\', \' restart_command: exec:systemctl restart clickhouse-server\', \' ignore_not_exists_error_during_freeze: true\', \' check_replicas_before_attach: true\', \' default_replica_path: /clickhouse/tables/{cluster}/{shard}/{database}/{table}\', " default_replica_name: \'{replica}\'", \' tls_key: ""\', \' tls_cert: ""\', \' tls_ca: ""\', \' debug: false\', \' force_rebalance: false\', \'s3:\', \' access_key: ""\', \' secret_key: ""\', \' bucket: ""\', \' endpoint: ""\', \' region: us-east-1\', \' acl: private\', \' assume_role_arn: ""\', \' force_path_style: false\', \' path: ""\', \' object_disk_path: ""\', \' disable_ssl: false\', \' compression_level: 1\', \' compression_format: tar\', \' sse: ""\', \' sse_kms_key_id: ""\', \' sse_customer_algorithm: ""\', \' sse_customer_key: ""\', \' sse_customer_key_md5: ""\', \' sse_kms_encryption_context: ""\', \' disable_cert_verification: false\', \' use_custom_storage_class: false\', \' storage_class: STANDARD\', \' custom_storage_class_map: {}\', \' allow_multipart_download: false\', \' object_labels: {}\', \' request_payer: ""\', \' check_sum_algorithm: ""\', \' request_content_md5: false\', \' retry_mode: standard\', \' chunk_size: 5242880\', \' debug: false\', \' http_write_buffer_size: 0\', \' http_read_buffer_size: 0\', \' http_idle_conn_timeout: ""\', \'gcs:\', \' credentials_file: ""\', \' credentials_json: ""\', \' credentials_json_encoded: 
""\', \' sa_email: ""\', \' embedded_access_key: ""\', \' embedded_secret_key: ""\', \' skip_credentials: false\', \' bucket: ""\', \' path: ""\', \' object_disk_path: ""\', \' compression_level: 1\', \' compression_format: tar\', \' debug: false\', \' force_http: false\', \' endpoint: ""\', \' storage_class: STANDARD\', \' object_labels: {}\', \' custom_storage_class_map: {}\', \' chunk_size: 16777216\', \' encryption_key: ""\', \' upload_buffer_size: 131072\', \'cos:\', \' url: ""\', \' timeout: 2m\', \' secret_id: ""\', \' secret_key: ""\', \' path: ""\', \' object_disk_path: ""\', \' compression_format: tar\', \' compression_level: 1\', \' allow_multipart_download: false\', \' debug: false\', \'api:\', \' listen: localhost:7171\', \' enable_metrics: true\', \' enable_pprof: false\', \' username: ""\', \' password: ""\', \' secure: false\', \' certificate_file: ""\', \' private_key_file: ""\', \' ca_cert_file: ""\', \' ca_key_file: ""\', \' create_integration_tables: false\', \' integration_tables_host: ""\', \' allow_parallel: false\', \' complete_resumable_after_restart: true\', \' complete_resumable_after_restart_commands:\', \' - upload\', \' - download\', \' watch_is_main_process: false\', \' backup_actions_skip_commands: []\', \' cancel_operation_timeout: 1800s\', \'ftp:\', \' address: ""\', \' timeout: 2m\', \' username: ""\', \' password: ""\', \' tls: false\', \' skip_tls_verify: false\', \' path: ""\', \' object_disk_path: ""\', \' compression_format: tar\', \' compression_level: 1\', \' debug: false\', \'sftp:\', \' address: ""\', \' port: 22\', \' username: ""\', \' password: ""\', \' key: ""\', \' path: ""\', \' object_disk_path: ""\', \' compression_format: tar\', \' compression_level: 1\', \' debug: false\', \'azblob:\', \' endpoint_schema: https\', \' endpoint_suffix: core.windows.net\', \' account_name: ""\', \' account_key: ""\', \' sas: ""\', \' use_managed_identity: false\', \' container: ""\', \' assume_container_exists: false\', \' path: 
""\', \' object_disk_path: ""\', \' compression_level: 1\', \' compression_format: tar\', \' sse_key: ""\', \' buffer_count: 3\', \' timeout: 4h\', \' debug: false\', \'custom:\', \' upload_command: ""\', \' download_command: ""\', \' list_command: ""\', \' delete_command: ""\', \' command_timeout: 4h\', \' commandtimeoutduration: 4h0m0s\']'""" help_flag = r"""'NAME:\n clickhouse-backup - Tool for easy backup of ClickHouse with cloud supportUSAGE:\n clickhouse-backup [-t, --tables=.] DESCRIPTION:\n Run as \'root\' or \'clickhouse\' userCOMMANDS:\n tables List of tables, exclude skip_tables\n create Create new backup\n create_remote Create and upload new backup\n upload Upload backup to remote storage\n list List of backups\n download Download backup from remote storage\n restore Create schema and restore data from backup\n restore_remote Download and restore\n delete Delete specific backup\n default-config Print default config\n print-config Print current config merged with environment variables\n clean Remove data in \'shadow\' folder from all \'path\' folders available from \'system.disks\'\n clean_remote_broken Remove all broken remote backups\n clean_local_broken Remove all broken local backups\n clean_broken_retention Remove orphan entries under remote `path` and `object_disks_path` that are not in the live backup list\n watch Run infinite loop which create full + incremental backup sequence to allow efficient backup sequences\n acvp Run ACVP wrapper protocol over stdin/stdout\n server Run API server\n help, h Shows a list of commands or help for one commandGLOBAL OPTIONS:\n --config value, -c value Config \'FILE\' name. 
(default: "/etc/clickhouse-backup/config.yml") [$CLICKHOUSE_BACKUP_CONFIG]\n --environment-override value, --env value override any environment variable via CLI parameter\n --fips-info Display FIPS build/runtime info and exit (no Go toolchain required).\n --help, -h show help\n --version, -v print the version'""" From c6363da50ecae61c73bb6b0131a6df48849a95ff Mon Sep 17 00:00:00 2001 From: vitaliis Date: Wed, 17 Jun 2026 15:57:45 -0400 Subject: [PATCH 15/24] Add FIPS configuration files for ClickHouse backup tests - `fips.xml` to disable insecure listeners and enforce secure ports with a specified cipher list - Added `listeners-fips-cipher-stress.xml` --- .../configs/clickhouse/config.d/fips.xml | 20 ++++++++++++ .../config.d/listeners-fips-cipher-stress.xml | 32 +++++++++++++++++++ 2 files changed, 52 insertions(+) create mode 100644 test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml create mode 100644 test/testflows/clickhouse_backup/configs/clickhouse_nonfips_server/config.d/listeners-fips-cipher-stress.xml diff --git a/test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml b/test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml new file mode 100644 index 00000000..d1403ce6 --- /dev/null +++ b/test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml @@ -0,0 +1,20 @@ + + + + + + + + + 8443 + 9440 + + + ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-GCM-SHA384 + TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384 + true + sslv2,sslv3,tlsv1,tlsv1_1 + relaxed + + + diff --git a/test/testflows/clickhouse_backup/configs/clickhouse_nonfips_server/config.d/listeners-fips-cipher-stress.xml b/test/testflows/clickhouse_backup/configs/clickhouse_nonfips_server/config.d/listeners-fips-cipher-stress.xml new file mode 100644 index 00000000..4fa24d7b --- /dev/null +++ 
b/test/testflows/clickhouse_backup/configs/clickhouse_nonfips_server/config.d/listeners-fips-cipher-stress.xml @@ -0,0 +1,32 @@ + + + 9440 + + + /etc/clickhouse-server/ssl/server.crt + /etc/clickhouse-server/ssl/server.key + ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-GCM-SHA384 + TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384 + true + sslv2,sslv3,tlsv1,tlsv1_1 + relaxed + + + From 256890fcc13aa275fabbb7681b41ce3c37e46da2 Mon Sep 17 00:00:00 2001 From: vitaliis Date: Wed, 17 Jun 2026 17:49:28 -0400 Subject: [PATCH 16/24] Added Pre-Publish Image Verification to the FIPS test plan --- .../fips/QA_STP_ClickHouse_Backup_FIPS.md | 15 ++++++++++++++- 1 file changed, 14 insertions(+), 1 deletion(-) diff --git a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md index 6b3b36da..7e63ea07 100644 --- a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md +++ b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md @@ -13,6 +13,7 @@ * 2 [Timeline](#timeline) * 3 [Configuration Requirements](#configuration-requirements) * 4 [Build Verification](#build-verification) + * 4.1 [Pre-Publish Image Verification](#pre-publish-image-verification) * 5 [Human Resources And Assignments](#human-resources-and-assignments) * 6 [Release Notes](#release-notes) * 7 [FIPS-Compatible `clickhouse-backup-fips` Configuration](#fips-compatible-clickhouse-backup-fips-configuration) @@ -87,7 +88,19 @@ For TLS policy validation, the test suite also uses OpenSSL probe tools: | Build flag | Run `go version -m clickhouse-backup-fips` (`gofips140_build_flags_present`) | Output contains `build GOFIPS140=v1.0.0` | | FIPS runtime posture across Go modes | Run `godebug_fips140_modes`, which runs `clickhouse-backup-fips --fips-info` with `GODEBUG` unset, empty, `fips140=off`, `fips140=on`, and 
`fips140=only` | For each mode, `--fips-info` reports the expected `enabled` / `enforced` flags (unset/empty/on → `true`/`false`; off → `false`/`false`; only → `true`/`true`) | -Direct checks of `crypto/fips140.Version()` and `crypto/fips140.Enabled()` are not called as standalone assertions in the current `clickhouse-backup` TestFlows scenarios; their behavior is validated through `--version` output and runtime connectivity checks above. +Direct checks of `crypto/fips140.Version()` and `crypto/fips140.Enabled()` are not called as standalone assertions in the current `clickhouse-backup` TestFlows scenarios; their behavior is validated through `--version` and `--fips-info` outputs and runtime connectivity checks above. + +### Pre-Publish Image Verification + +In addition to the TestFlows scenarios above, the same FIPS build posture is enforced automatically in CI before any FIPS Docker image is published to the registry (Docker Hub). The `Verify FIPS 140-3 compatibility before push` step in both `.github/workflows/build.yaml` and `.github/workflows/release.yaml` builds the exact `image_fips` target locally and runs `.github/scripts/verify_fips_image.sh` against that image and the `clickhouse-backup-fips` binary. If any check fails, the workflow stops, so a non-FIPS or mis-built image can never be pushed. + +This is an automation/release gate: it reuses the expectations already covered by the assertions in this section (no new requirement is introduced). 
The script verifies: + +| Check | Performed against | Expected Result | +| --- | --- | --- | +| Baked-in environment | `docker image inspect` of the `image_fips` image | `Config.Env` contains `GODEBUG=fips140=only` | +| Version FIPS indicator | `clickhouse-backup --version` run inside the image | Output contains `FIPS 140-3: true` | +| Build metadata | `go version -m clickhouse-backup-fips` (binary copied out of the image when `--binary` is omitted) | Reports `GOFIPS140=v1.0.0` (or a `v1.0.0-` snapshot), the `fips140v1.0` build tag, `DefaultGODEBUG=fips140=on` and `CGO_ENABLED=0` | ## Human Resources And Assignments From 395271dda5a9d2323171f00a99c5781b8ee8d295 Mon Sep 17 00:00:00 2001 From: vitaliis Date: Wed, 17 Jun 2026 17:56:17 -0400 Subject: [PATCH 17/24] Added coverage reporting for FIPS related tests --- .github/workflows/build.yaml | 10 ++++++++++ test/testflows/run.sh | 18 ++++++++++++++++++ 2 files changed, 28 insertions(+) diff --git a/.github/workflows/build.yaml b/.github/workflows/build.yaml index 8dcaf052..d377a215 100644 --- a/.github/workflows/build.yaml +++ b/.github/workflows/build.yaml @@ -261,6 +261,15 @@ jobs: source ~/venv/qa/bin/activate export CLICKHOUSE_TESTS_DIR=$(pwd)/test/testflows/clickhouse_backup ./test/testflows/run.sh + - name: Report FIPS testflows coverage + uses: coverallsapp/github-action@v2 + with: + fail-on-error: false + base-path: ./ + file: test/testflows/_coverage_/coverage.out + parallel: true + format: golang + flag-name: testflows-${{ matrix.clickhouse }} - name: Fix FIPS log permissions for artifact upload if: always() @@ -414,6 +423,7 @@ jobs: needs: - test - testflows + - testflows_fips name: coverage runs-on: ubuntu-24.04 steps: diff --git a/test/testflows/run.sh b/test/testflows/run.sh index c7b373e4..804cc67b 100755 --- a/test/testflows/run.sh +++ b/test/testflows/run.sh @@ -141,6 +141,24 @@ if command -v tfs &>/dev/null && [[ -f "${RAW_LOG}" ]]; then tfs ${TFS_FLAGS} transform compact "${RAW_LOG}" 
"${CUR_DIR}/compact.log" || true tfs ${TFS_FLAGS} transform nice "${RAW_LOG}" "${CUR_DIR}/nice.log.txt" || true tfs ${TFS_FLAGS} transform short "${RAW_LOG}" "${CUR_DIR}/short.log.txt" || true + + # FIPS requirements coverage report (HTML artifact). The FIPS suite has its + # own specification backed by requirements/fips/requirements.py, distinct from + # the main suite. The requirements source is passed explicitly (instead of + # '-' = read specs from the log, which would merge both specifications that + # regression.py declares into one report) so the report is scoped to the FIPS + # specification only. Generated only for the FIPS run (its CI job sets + # RUN_TESTS to the FIPS feature); the main run is unaffected. This is the + # requirements-coverage HTML produced manually before via `tfs report + # coverage`; Go source-code coverage is reported separately to Coveralls. + FIPS_REQUIREMENTS="${CUR_DIR}/clickhouse_backup/requirements/fips/requirements.py" + if [[ "${RUN_TESTS}" == *FIPS* && -f "${FIPS_REQUIREMENTS}" ]]; then + tfs ${TFS_FLAGS} report coverage "${FIPS_REQUIREMENTS}" "${RAW_LOG}" \ + --confidential --copyright "Altinity LTD" --logo "${CUR_DIR}/altinity.png" \ + --title "ClickHouse Backup FIPS Requirements Coverage" \ + | tfs ${TFS_FLAGS} document convert > "${CUR_DIR}/fips_coverage.html" || true + fi + if [[ -n "${GITHUB_SERVER_URL}" && -n "${GITHUB_REPOSITORY}" && -n "${GITHUB_RUN_ID}" ]]; then tfs ${TFS_FLAGS} report results \ -a "${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/actions/runs/${GITHUB_RUN_ID}/" \ From 78bda1f0f229731d3af61a260bcab0cc2db842c3 Mon Sep 17 00:00:00 2001 From: vitaliis Date: Wed, 17 Jun 2026 19:04:08 -0400 Subject: [PATCH 18/24] Removed unused fips.xml --- .../configs/clickhouse/config.d/fips.xml | 20 ------------------- 1 file changed, 20 deletions(-) delete mode 100644 test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml diff --git a/test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml 
b/test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml deleted file mode 100644 index d1403ce6..00000000 --- a/test/testflows/clickhouse_backup/configs/clickhouse/config.d/fips.xml +++ /dev/null @@ -1,20 +0,0 @@ - - - - - - - - - 8443 - 9440 - - - ECDHE-RSA-AES128-GCM-SHA256:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:AES128-GCM-SHA256:AES256-GCM-SHA384 - TLS_AES_128_GCM_SHA256:TLS_AES_256_GCM_SHA384 - true - sslv2,sslv3,tlsv1,tlsv1_1 - relaxed - - - From ae0ad3508d34c023648aa1cc16ef7a496998ae1c Mon Sep 17 00:00:00 2001 From: vitaliis Date: Thu, 18 Jun 2026 11:22:09 -0400 Subject: [PATCH 19/24] ACVP tests are automatically executed in --stress mode --- test/testflows/clickhouse_backup/tests/fips_140_3.py | 11 +++++++---- 1 file changed, 7 insertions(+), 4 deletions(-) diff --git a/test/testflows/clickhouse_backup/tests/fips_140_3.py b/test/testflows/clickhouse_backup/tests/fips_140_3.py index 0b544eef..7830a500 100644 --- a/test/testflows/clickhouse_backup/tests/fips_140_3.py +++ b/test/testflows/clickhouse_backup/tests/fips_140_3.py @@ -1375,7 +1375,8 @@ def acvp_tests(self): and asserts it exits 0 and prints the expected line tracked in `FIPS_ACVP_EXPECTED_OUTPUT`. - Opt-in: skipped unless `RUN_ACVP_TESTS=1` is set. + Opt-in: skipped unless `RUN_ACVP_TESTS=1` is set or the suite runs + with `--stress` option. """ # ACVP wrapper scenario opt-in. # Set `RUN_ACVP_TESTS=1` locally or in the CI workflow to enable it. @@ -1393,10 +1394,12 @@ def acvp_tests(self): # a boringssl clone, an acvptool build, and then the ACVP run itself. FIPS_ACVP_TIMEOUT_SEC = 30 * 60 flag = os.environ.get(FIPS_ACVP_ENV_FLAG, "").strip().lower() - if flag not in FIPS_ACVP_ENV_FLAG_VALUES: + # `--stress` runs the full FIPS coverage, so enable ACVP tests automatically + # there even when `RUN_ACVP_TESTS` is unset. 
+ if flag not in FIPS_ACVP_ENV_FLAG_VALUES and not self.context.stress: skip( - f"set {FIPS_ACVP_ENV_FLAG}=1 to enable; the wrapper pulls " - f"Docker images and clones upstream repos." + f"set {FIPS_ACVP_ENV_FLAG}=1 (or run with `--stress`) to enable; " + f"the wrapper pulls Docker images and clones upstream repos." ) cluster = self.context.cluster From 2753fc14068504473c61a33bcb6bc6d7d90c7d4c Mon Sep 17 00:00:00 2001 From: vitaliis Date: Thu, 18 Jun 2026 17:27:08 -0400 Subject: [PATCH 20/24] Small improvement to the test plan --- .../requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md index 7e63ea07..22998f87 100644 --- a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md +++ b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md @@ -359,7 +359,7 @@ Invoke `pkg/acvpwrapper/run.sh` against the FIPS-built binary. Expected result: * The wrapper runs the algorithm test vectors and exits successfully with no failures across the run. -* This check is optional in automation and runs only when `RUN_ACVP_TESTS=1` is set. +* This check is optional in automation and runs only when `RUN_ACVP_TESTS=1` is set or when the tests are run in `--stress` mode. 
## Server Listening-Port Assertion From 90f58e7ce899765c7259494437d490ac50fa41e2 Mon Sep 17 00:00:00 2001 From: vitaliis Date: Thu, 18 Jun 2026 17:58:52 -0400 Subject: [PATCH 21/24] Updated ClickHouse Backup FIPS test plan --- .../fips/QA_STP_ClickHouse_Backup_FIPS.md | 118 +++++++++++++----- 1 file changed, 88 insertions(+), 30 deletions(-) diff --git a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md index 22998f87..0bc55524 100644 --- a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md +++ b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md @@ -12,6 +12,7 @@ * 1 [Introduction](#introduction) * 2 [Timeline](#timeline) * 3 [Configuration Requirements](#configuration-requirements) + * 3.1 [Supported TLS Protocol Versions and Cipher Suites](#supported-tls-protocol-versions-and-cipher-suites) * 4 [Build Verification](#build-verification) * 4.1 [Pre-Publish Image Verification](#pre-publish-image-verification) * 5 [Human Resources And Assignments](#human-resources-and-assignments) @@ -73,9 +74,35 @@ For TLS policy validation, the test suite also uses OpenSSL probe tools: - `openssl s_client` (acts as a TLS client to test inbound API listener policy) - `openssl s_server` (acts as a TLS server to test outbound client policy) +### Supported TLS Protocol Versions and Cipher Suites + +All TLS endpoints of `clickhouse-backup-fips` (the inbound REST API listener and the outbound clients to ClickHouse and S3) follow a single FIPS profile. This profile is what every TLS scenario in this plan asserts. 
+ +**Protocol versions:** + +| Protocol | Status | +| -------- | ------ | +| TLSv1.3 | Supported | +| TLSv1.2 | Supported | +| TLSv1.1 | Rejected (below FIPS minimum) | +| TLSv1.0 | Rejected (below FIPS minimum) | +| SSLv2 / SSLv3 | Rejected | + +**FIPS-approved cipher suites (the only ones that may negotiate):** + +| TLS version | Approved suites | +| ----------- | --------------- | +| TLSv1.3 | `TLS_AES_128_GCM_SHA256`, `TLS_AES_256_GCM_SHA384` | +| TLSv1.2 | `ECDHE-RSA-AES128-GCM-SHA256`, `ECDHE-RSA-AES256-GCM-SHA384` | + +Everything else MUST be rejected — including ChaCha20-Poly1305, CBC-mode, RC4, 3DES, CCM, DHE / plain-RSA key exchange, and the plain RSA-kx AES-GCM suites (`AES128-GCM-SHA256`, `AES256-GCM-SHA384`). + +> [!NOTE] +> The client cipher set above is narrower than the ClickHouse server `` in the [Altinity FIPS documentation](https://docs.altinity.com/altinitystablebuilds/fips-compatible-altinity-builds/#configuration-of-altinity-stable-builds-for-fips-compatible-operation): that server config also permits the non-ECDHE `AES128-GCM-SHA256` / `AES256-GCM-SHA384` suites, but `clickhouse-backup-fips` rejects them because the Go FIPS module approves only the forward-secret ECDHE AES-GCM suites listed above. + ## Build Verification -**Objective:** Verify binaries are FIPS builds and linked to Go Cryptographic Module v1.0.0. +**Objective:** Verify that `clickhouse-backup-fips` is a FIPS build linked to the Go Cryptographic Module v1.0.0 (reports `FIPS 140-3: true`, built with `GOFIPS140=v1.0.0`), and that the regular `clickhouse-backup` binary is not (reports `FIPS 140-3: false`). 
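Reviewer sketch (not part of the patch): the profile in the two tables above can be read as a simple allowlist keyed by protocol version. The names `APPROVED_SUITES` and `handshake_expected` are hypothetical and do not come from `tests/fips_140_3.py`; the suite names are copied from the tables. This only illustrates the expected accept/reject decision, under the assumption that anything absent from the tables must be rejected.

```python
# Hypothetical illustration of the FIPS TLS profile described above.
# Any protocol or cipher not listed here is expected to be rejected.
APPROVED_SUITES = {
    "TLSv1.3": {"TLS_AES_128_GCM_SHA256", "TLS_AES_256_GCM_SHA384"},
    "TLSv1.2": {"ECDHE-RSA-AES128-GCM-SHA256", "ECDHE-RSA-AES256-GCM-SHA384"},
}


def handshake_expected(protocol: str, cipher: str) -> bool:
    """Return True when the profile expects the handshake to succeed."""
    # Unknown or rejected protocol versions map to an empty allowlist.
    return cipher in APPROVED_SUITES.get(protocol, set())
```

For example, `handshake_expected("TLSv1.2", "AES128-GCM-SHA256")` is false because the plain RSA key-exchange suites are outside the client profile even though the server-side cipher list permits them.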
**Certificates:** - [CMVP #5247](https://csrc.nist.gov/projects/cryptographic-module-validation-program/certificate/5247) @@ -84,7 +111,8 @@ For TLS policy validation, the test suite also uses OpenSSL probe tools: | Test Assertion | Description | Expected Result | | --- | --- | --- | -| FIPS indicator in binary version output | Run `clickhouse-backup-fips --version` (`clickhouse_backup_fips_version_output`) and run control check on non-FIPS binary (`clickhouse_backup_fips_version_output_negative_check`) | FIPS binary reports `FIPS 140-3: true`; non-FIPS binary does not report `true` | +| FIPS indicator in FIPS binary version output | Run `clickhouse-backup-fips --version` (`clickhouse_backup_fips_version_output`) | FIPS binary reports `FIPS 140-3: true` | +| FIPS indicator absent in non-FIPS binary (control check) | Run `clickhouse-backup --version` on the regular binary (`clickhouse_backup_fips_version_output_negative_check`) | Non-FIPS binary does not report `FIPS 140-3: true` (reports `false`) | | Build flag | Run `go version -m clickhouse-backup-fips` (`gofips140_build_flags_present`) | Output contains `build GOFIPS140=v1.0.0` | | FIPS runtime posture across Go modes | Run `godebug_fips140_modes`, which runs `clickhouse-backup-fips --fips-info` with `GODEBUG` unset, empty, `fips140=off`, `fips140=on`, and `fips140=only` | For each mode, `--fips-info` reports the expected `enabled` / `enforced` flags (unset/empty/on → `true`/`false`; off → `false`/`false`; only → `true`/`true`) | @@ -148,7 +176,12 @@ The following artifacts and tools will be used: * `openssl` CLI tool on the test host for TLS client and server probes. > [!NOTE] -> The regression sets `GODEBUG` per command rather than at the FIPS container level. 
The `godebug_fips140_modes` scenario covers every mode documented in [GODEBUG fips140 Modes](#godebug-fips140-modes) (`unset`, empty, `fips140=off`, `fips140=on`, `fips140=only`), and the forced-CAST scenario also injects `GODEBUG=failfipscast=,fips140=on`; a single container-level value would prevent the matrix and the negative-self-test path from running. The Altinity FIPS Docker image still ships with `GODEBUG=fips140=only` as documented in [FIPS Configuration](#fips-configuration); that default is honored when the image is run as-is. +> Each scenario sets `GODEBUG` explicitly per command rather than once at the container level: +> +> * [GODEBUG `fips140` Modes](#godebug-fips140-modes) (`godebug_fips140_modes`) runs `--fips-info` under `GODEBUG` unset, empty, `fips140=off`, `fips140=on`, and `fips140=only`. +> * [Forced CAST Failures](#forced-cast-failures) (`forced_cast_failures`) runs `--version` under `GODEBUG=failfipscast=,fips140=only`. +> +> The Altinity FIPS Docker image ships with `GODEBUG=fips140=only` baked in (see [FIPS-Compatible `clickhouse-backup-fips` Configuration](#fips-compatible-clickhouse-backup-fips-configuration)); that default applies when the image is run as-is. ## Inputs and Outputs of `clickhouse-backup-fips` @@ -158,7 +191,7 @@ The following artifacts and tools will be used: * Outbound to ClickHouse: secure native TCP port `9440` (`clickhouse.secure: true`, `clickhouse.port: 9440`). Plain native TCP `9000` and plain HTTP `8123` MUST NOT be used by `clickhouse-backup-fips`. * Outbound to S3-compatible storage: HTTPS to the AWS FIPS hostname `s3-fips..amazonaws.com:443` when `s3.endpoint` is empty and `s3.region` is set. -The [Server Listening-Port Assertion](#server-listening-port-assertion) subsection below describes how the inbound surface is verified. +The [Server Listening-Port Assertion](#server-listening-port-assertion) section describes how the inbound surface is verified. 
## Connectivity Against ClickHouse FIPS and Non-FIPS Servers @@ -222,66 +255,73 @@ To set each case explicitly (independent of any container-level `GODEBUG`): Check that the FIPS startup integrity self-test stops the binary if the FIPS module bytes have been modified. -Take a copy of `clickhouse-backup-fips`, corrupt its `.go.fipsinfo` checksum section, and try to run the tampered copy. +Take a copy of `clickhouse-backup-fips`, corrupt its `.go.fipsinfo` checksum section, and try to run the tampered copy. The scenario (`fips_integrity_self_test_failure_on_tampered_binary`) runs `scripts/tamper_go_fips_checksum.sh`, which operates only on a temporary copy of the read-only original binary, so other scenarios are unaffected. Expected result: * The tampered binary panics on startup with `panic: fips140: verification mismatch` and exits with a non-zero exit code. -* The unmodified original binary continues to work normally. +* The tamper script prints its explicit success marker `== OK: FIPS integrity check failed as expected ==` and exits `0` (its success contract). +* The unmodified original binary continues to work normally (the script tampers only the copy). ## Forced CAST Failures +Check that the FIPS module aborts when a Cryptographic Algorithm Self-Test (CAST) fails. The Go FIPS module exposes a `GODEBUG=failfipscast=` hook that simulates a CAST failure for one named self-test. -Check that the FIPS module refuses to start if any startup self-test fails. 
- -Run the FIPS binary with the `GODEBUG=failfipscast` hook, substituting one self-test name at a time, for example: +The scenario `forced_cast_failures` forces one CAST at a time, for example: ``` -GODEBUG=failfipscast=SHA2-256,fips140=on clickhouse-backup-fips --version +env 'GODEBUG=failfipscast=SHA2-256,fips140=only' clickhouse-backup-fips --version ``` -`SHA2-256` in the command above can be replaced with any effective CAST name from the list below: +CASTs fall into two groups, and the behavior differs by group — the scenario asserts each accordingly: + +* **Startup CASTs** — always exercised during `clickhouse-backup-fips --version`. Forcing one MUST abort startup. +* **First-use CASTs** — run lazily, only the first time their algorithm is used. `--version` does not necessarily reach them, so forcing one is only expected to abort if and when it is actually exercised; otherwise startup stays clean. + +Startup CASTs (forcing any one MUST abort `--version`): ``` AES-CBC CTR_DRBG CounterKDF +HKDF-SHA2-256 +HMAC-SHA2-256 +PBKDF2 +SHA2-256 +SHA2-512 +TLSv1.2-SHA2-256 +TLSv1.3-SHA2-256 +cSHAKE128 +``` + +First-use CASTs (forcing one aborts only if `--version` happens to exercise that algorithm; otherwise startup stays clean): + +``` DetECDSA P-256 SHA2-512 sign ECDH PCT ECDSA P-256 SHA2-512 sign and verify ECDSA PCT Ed25519 sign and verify Ed25519 sign and verify PCT -HKDF-SHA2-256 -HMAC-SHA2-256 KAS-ECC-SSC P-256 ML-DSA sign and verify PCT ML-DSA-44 ML-KEM PCT -ML-KEM PCT ML-KEM-768 -PBKDF2 RSA sign and verify PCT RSASSA-PKCS-v1.5 2048-bit sign and verify -SHA2-256 -SHA2-512 -TLSv1.2-SHA2-256 -TLSv1.3-SHA2-256 -cSHAKE128 ``` -The list is taken directly from the Go FIPS test suite (file `crypto/internal/fips140test/cast_test.go` of the Go release in use). - -Expected result for every name in the list: +The names are taken directly from the Go FIPS test suite (the `allCASTs` slice in `crypto/internal/fips140test/cast_test.go` of the Go release in use). 
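Reviewer sketch (not part of the patch): the startup versus first-use distinction above amounts to two different pass criteria for the same probe. The helper name `cast_outcome_ok` and its parameters are hypothetical; the marker strings are the ones this plan quotes from the Go runtime output. It assumes the probe captures the exit code and combined output of `clickhouse-backup-fips --version`.

```python
# Hypothetical helper: decides whether one forced-CAST probe met the plan's
# expectation. `group` is "startup" or "first-use".
def cast_outcome_ok(group: str, cast_name: str, exit_code: int, output: str) -> bool:
    # Abort path: non-zero exit plus both failure markers in the output.
    aborted = (
        exit_code != 0
        and f"self-test failed: {cast_name}" in output
        and "simulated CAST failure" in output
    )
    if group == "startup":
        # Startup CASTs always run during --version, so they must abort.
        return aborted
    if group == "first-use":
        # First-use CASTs may never be reached; a clean start is also fine.
        clean = exit_code == 0 and "simulated CAST failure" not in output
        return aborted or clean
    raise ValueError(f"unknown CAST group: {group}")
```

Keeping the two criteria in one function makes it harder for a first-use CAST to pass by accident, for instance when the process exits non-zero for an unrelated reason.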
-* Baseline run with `GODEBUG=fips140=on clickhouse-backup-fips --version` succeeds. -* The process exits with a non-zero code. -* The output contains `fatal error: FIPS 140-3 self-test failed: : simulated CAST failure`. +Expected result: -How to obtain and refresh this list: +* **For every startup CAST:** the process exits with a non-zero code and the output contains both `self-test failed: ` and `simulated CAST failure` (Go emits the full line `fatal error: FIPS 140-3 self-test failed: : simulated CAST failure`). +* **For every first-use CAST:** either the same abort markers appear (if the algorithm was exercised), or — when the algorithm is not reached by `--version` — startup stays clean: the process exits `0` and the `simulated CAST failure` marker is absent. -* Open `$(go env GOROOT)/src/crypto/internal/fips140test/cast_test.go` and copy the entries from the `allCASTs` slice. +How to obtain and refresh this list: +* Open `$(go env GOROOT)/src/crypto/internal/fips140test/cast_test.go` and copy the entries from the `allCASTs` slice. Each entry's group (startup vs first-use) follows how Go registers it — keep `FIPS_FAILFIPSCAST_STARTUP_CASTS` and `FIPS_FAILFIPSCAST_FIRST_USE_CASTS` in `tests/fips_140_3.py` in sync. * The list should be refreshed when the Go version used to build `clickhouse-backup-fips` is upgraded, because new algorithms may add/rename/remove entries. ## Inbound TLS — REST API With `openssl s_client` @@ -308,6 +348,8 @@ Non-FIPS profiles (handshake MUST be rejected): * `openssl s_client -tls1` — expected result: handshake is rejected (TLSv1.0 is below the FIPS minimum protocol version). * `openssl s_client -tls1_1` — expected result: handshake is rejected (TLSv1.1 is below the FIPS minimum protocol version). 
+Additional non-FIPS TLSv1.2 ciphers exercised only under `--stress` (all MUST be rejected): `ECDHE-ECDSA-CHACHA20-POLY1305`, `DHE-RSA-CHACHA20-POLY1305`, `ECDHE-RSA-AES128-SHA`, `ECDHE-RSA-AES256-SHA`, `ECDHE-RSA-AES128-SHA256`, `ECDHE-RSA-AES256-SHA384`, `AES128-SHA`, `AES256-SHA`, `AES128-SHA256`, `AES256-SHA256` (RC4-SHA / DES-CBC3-SHA above are also kept). Additional non-FIPS TLSv1.3 suites under `--stress`: `TLS_AES_128_CCM_SHA256`, `TLS_AES_128_CCM_8_SHA256`. The default run probes the minimum set above; the wider `--stress` coverage is slower by design. + ## Outbound TLS to ClickHouse Server With `openssl s_server` @@ -327,10 +369,15 @@ Non-FIPS profiles (handshake MUST be rejected): * `openssl s_server -tls1_2 -cipher ECDHE-RSA-CHACHA20-POLY1305` — expected result: `clickhouse-backup-fips` fails with `remote error: tls: handshake failure` and `openssl s_server` reports `no shared cipher`. * `openssl s_server -tls1_2 -cipher DHE-RSA-AES256-GCM-SHA384` — expected result: handshake is rejected as above. * `openssl s_server -tls1_2 -cipher DHE-RSA-AES128-GCM-SHA256` — expected result: handshake is rejected as above. -* `openssl s_server -tls1_2 -cipher AES256-GCM-SHA384` — expected result: handshake is rejected as above. -* `openssl s_server -tls1_2 -cipher AES128-GCM-SHA256` — expected result: handshake is rejected as above. +* `openssl s_server -tls1_2 -cipher AES256-GCM-SHA384` — expected result: handshake is rejected as above (plain RSA key exchange, no forward secrecy). +* `openssl s_server -tls1_2 -cipher AES128-GCM-SHA256` — expected result: handshake is rejected as above (plain RSA key exchange, no forward secrecy). * `openssl s_server -tls1_3 -ciphersuites TLS_CHACHA20_POLY1305_SHA256` — expected result: handshake is rejected as above. 
+Additional non-FIPS TLSv1.2 ciphers exercised only under `--stress` (all MUST be rejected): `ECDHE-ECDSA-CHACHA20-POLY1305`, `DHE-RSA-CHACHA20-POLY1305`, `ECDHE-RSA-AES128-SHA`, `ECDHE-RSA-AES256-SHA`, `ECDHE-RSA-AES128-SHA256`, `ECDHE-RSA-AES256-SHA384`, `AES128-SHA`, `AES256-SHA`, `AES128-SHA256`, `AES256-SHA256`. Additional non-FIPS TLSv1.3 suites under `--stress`: `TLS_AES_128_CCM_SHA256`, `TLS_AES_128_CCM_8_SHA256`. The default run probes the minimum set above; the wider `--stress` coverage is slower by design. + +> [!NOTE] +> `AES128-GCM-SHA256` and `AES256-GCM-SHA384` appear in the Altinity ClickHouse server's `` (server side) but are rejected here by the `clickhouse-backup-fips` **client**. The Go FIPS cryptographic module approves a narrower TLSv1.2 set than the ClickHouse OpenSSL configuration: only ECDHE forward-secret AES-GCM suites (`ECDHE-RSA-AES128-GCM-SHA256`, `ECDHE-RSA-AES256-GCM-SHA384`) are approved on the client; plain RSA key-exchange suites are not. See [Supported TLS Protocol Versions and Cipher Suites](#supported-tls-protocol-versions-and-cipher-suites). + ## Outbound TLS to S3 Endpoint With `openssl s_server` @@ -350,6 +397,17 @@ FIPS-approved profiles (handshake MUST be accepted by `clickhouse-backup-fips` p * `openssl s_server -tls1_3 -ciphersuites TLS_AES_128_GCM_SHA256` — expected result: same as above. * `openssl s_server -tls1_3 -ciphersuites TLS_AES_256_GCM_SHA384` — expected result: same as above. +Non-FIPS profiles (handshake MUST be rejected by `clickhouse-backup-fips` policy): + +* `openssl s_server -tls1_3 -ciphersuites TLS_CHACHA20_POLY1305_SHA256` — expected result: the FIPS client refuses the handshake (`remote error: tls: handshake failure` / `no shared cipher`). +* `openssl s_server -tls1_2 -cipher ECDHE-RSA-CHACHA20-POLY1305` — expected result: rejected as above. +* `openssl s_server -tls1_2 -cipher DHE-RSA-AES256-GCM-SHA384` — expected result: rejected as above. 
+* `openssl s_server -tls1_2 -cipher DHE-RSA-AES128-GCM-SHA256` — expected result: rejected as above. +* `openssl s_server -tls1_2 -cipher AES256-GCM-SHA384` — expected result: rejected as above. +* `openssl s_server -tls1_2 -cipher AES128-GCM-SHA256` — expected result: rejected as above. + +The same `--stress` extension as the ClickHouse outbound section applies (the wider TLSv1.2 CBC / ChaCha20 and TLSv1.3 CCM sets). For the S3 probes only, if the AWS FIPS hostname resolves to public AWS instead of the local `openssl s_server` sidecar, a remote AWS auth error (e.g. `InvalidAccessKeyId`) with no TLS-rejection marker skips the negative check rather than failing it. + ## ACVP Tests Run the ACVP (Automated Cryptographic Validation Protocol) wrapper bundled with `clickhouse-backup`. This part is required for FIPS compatibility, but tests can be executed optionally. From 73f303c14b43296c95ca943e3ac91f61a2156d9e Mon Sep 17 00:00:00 2001 From: vitaliis Date: Thu, 18 Jun 2026 23:32:40 -0400 Subject: [PATCH 22/24] Improvements to ClickHouse Backup FIPS Compatibility test plan. --- .../requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md index 0bc55524..7b2347a3 100644 --- a/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md +++ b/test/testflows/clickhouse_backup/requirements/fips/QA_STP_ClickHouse_Backup_FIPS.md @@ -176,7 +176,7 @@ The following artifacts and tools will be used: * `openssl` CLI tool on the test host for TLS client and server probes. 
> [!NOTE] -> Each scenario sets `GODEBUG` explicitly per command rather than once at the container level: +> The FIPS backup container exports `GODEBUG` at the container level (the regression `--fips-godebug` option, default `fips140=only`), so it applies to every command by default. Two scenarios override `GODEBUG` per command because they require other values: > > * [GODEBUG `fips140` Modes](#godebug-fips140-modes) (`godebug_fips140_modes`) runs `--fips-info` under `GODEBUG` unset, empty, `fips140=off`, `fips140=on`, and `fips140=only`. > * [Forced CAST Failures](#forced-cast-failures) (`forced_cast_failures`) runs `--version` under `GODEBUG=failfipscast=,fips140=only`. From 0dd3ded2b9a3c32f1baae70d4ea39fb52a923df6 Mon Sep 17 00:00:00 2001 From: vitaliis Date: Mon, 22 Jun 2026 16:11:59 -0400 Subject: [PATCH 23/24] Increase timeout for container health check during restart to 12min --- test/integration/containers.go | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/test/integration/containers.go b/test/integration/containers.go index 112063ac..d6d86913 100644 --- a/test/integration/containers.go +++ b/test/integration/containers.go @@ -324,7 +324,8 @@ func (tc *TestContainers) RestartContainer(t *testing.T, name string) error { if err := tc.client.ContainerRestart(ctx, info.ID, container.StopOptions{Timeout: &timeout}); err != nil { return err } - return tc.waitHealthy(ctx, name, 10*time.Minute, t.Name()) + // 12min restart headroom. + return tc.waitHealthy(ctx, name, 12*time.Minute, t.Name()) } func (tc *TestContainers) waitHealthy(ctx context.Context, name string, timeout time.Duration, testName string) error { From 27d32898414212def02e469e1f8b8896650753fe Mon Sep 17 00:00:00 2001 From: vitaliis Date: Tue, 23 Jun 2026 14:45:59 -0400 Subject: [PATCH 24/24] - Removed unused comment. 
- Added a new FIPS requirement --- test/integration/containers.go | 1 - .../requirements/fips/requirements.md | 13 ++++-- .../requirements/fips/requirements.py | 44 +++++++++++++++---- .../clickhouse_backup/tests/fips_140_3.py | 3 +- 4 files changed, 46 insertions(+), 15 deletions(-) diff --git a/test/integration/containers.go b/test/integration/containers.go index d6d86913..210a1743 100644 --- a/test/integration/containers.go +++ b/test/integration/containers.go @@ -324,7 +324,6 @@ func (tc *TestContainers) RestartContainer(t *testing.T, name string) error { if err := tc.client.ContainerRestart(ctx, info.ID, container.StopOptions{Timeout: &timeout}); err != nil { return err } - // 12min restart headroom. return tc.waitHealthy(ctx, name, 12*time.Minute, t.Name()) } diff --git a/test/testflows/clickhouse_backup/requirements/fips/requirements.md b/test/testflows/clickhouse_backup/requirements/fips/requirements.md index 98b6c3cc..c558dc11 100644 --- a/test/testflows/clickhouse_backup/requirements/fips/requirements.md +++ b/test/testflows/clickhouse_backup/requirements/fips/requirements.md @@ -18,9 +18,10 @@ * 4.1.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GoCryptographicModule](#rqsrs-013clickhousebackuputilityfipsgocryptographicmodule) * 4.1.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Build.GOFIPS140](#rqsrs-013clickhousebackuputilityfipsbuildgofips140) * 4.1.3 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Binary](#rqsrs-013clickhousebackuputilityfipsbinary) - * 4.1.4 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions](#rqsrs-013clickhousebackuputilityfipsapprovedtlsprotocolversions) - * 4.1.5 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv12.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv12approved) - * 4.1.6 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv13.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv13approved) + * 4.1.4 
[RQ.SRS-013.ClickHouse.BackupUtility.non-FIPS.Binary](#rqsrs-013clickhousebackuputilitynon-fipsbinary) + * 4.1.5 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions](#rqsrs-013clickhousebackuputilityfipsapprovedtlsprotocolversions) + * 4.1.6 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv12.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv12approved) + * 4.1.7 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv13.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv13approved) * 4.2 [Connectivity](#connectivity) * 4.2.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Connectivity.FIPSEndpoint](#rqsrs-013clickhousebackuputilityfipsconnectivityfipsendpoint) * 4.2.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Connectivity.NonFIPSEndpoint](#rqsrs-013clickhousebackuputilityfipsconnectivitynonfipsendpoint) @@ -127,6 +128,12 @@ version: 1.0 The FIPS-compatible build of the [clickhouse-backup] utility SHALL be distributed as a separate binary named `clickhouse-backup-fips`, distinct from the standard `clickhouse-backup` binary. 
+#### RQ.SRS-013.ClickHouse.BackupUtility.non-FIPS.Binary +version: 1.0 + +The regular build of the [clickhouse-backup] utility SHALL be distributed as a +binary named `clickhouse-backup`, that SHALL report ``FIPS 140-3: false`` + #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions version: 1.0 diff --git a/test/testflows/clickhouse_backup/requirements/fips/requirements.py b/test/testflows/clickhouse_backup/requirements/fips/requirements.py index 7a5b656c..d89b8033 100644 --- a/test/testflows/clickhouse_backup/requirements/fips/requirements.py +++ b/test/testflows/clickhouse_backup/requirements/fips/requirements.py @@ -57,6 +57,23 @@ num='4.1.3' ) +RQ_SRS_013_ClickHouse_BackupUtility_non_FIPS_Binary = Requirement( + name='RQ.SRS-013.ClickHouse.BackupUtility.non-FIPS.Binary', + version='1.0', + priority=None, + group=None, + type=None, + uid=None, + description=( + 'The regular build of the [clickhouse-backup] utility SHALL be distributed as a\n' + 'binary named `clickhouse-backup`, that SHALL report ``FIPS 140-3:\tfalse``\n' + '\n' + ), + link=None, + level=3, + num='4.1.4' +) + RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Approved_TLSProtocolVersions = Requirement( name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions', version='1.0', @@ -72,7 +89,7 @@ ), link=None, level=3, - num='4.1.4' + num='4.1.5' ) RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Approved_CipherSuites_TLSv12_Approved = Requirement( @@ -96,7 +113,7 @@ ), link=None, level=3, - num='4.1.5' + num='4.1.6' ) RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Approved_CipherSuites_TLSv13_Approved = Requirement( @@ -116,7 +133,7 @@ ), link=None, level=3, - num='4.1.6' + num='4.1.7' ) RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Connectivity_FIPSEndpoint = Requirement( @@ -632,9 +649,10 @@ Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GoCryptographicModule', level=3, num='4.1.1'), Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Build.GOFIPS140', level=3, 
num='4.1.2'), Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Binary', level=3, num='4.1.3'), - Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions', level=3, num='4.1.4'), - Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv12.Approved', level=3, num='4.1.5'), - Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv13.Approved', level=3, num='4.1.6'), + Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.non-FIPS.Binary', level=3, num='4.1.4'), + Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions', level=3, num='4.1.5'), + Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv12.Approved', level=3, num='4.1.6'), + Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv13.Approved', level=3, num='4.1.7'), Heading(name='Connectivity', level=2, num='4.2'), Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Connectivity.FIPSEndpoint', level=3, num='4.2.1'), Heading(name='RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Connectivity.NonFIPSEndpoint', level=3, num='4.2.2'), @@ -675,6 +693,7 @@ RQ_SRS_013_ClickHouse_BackupUtility_FIPS_GoCryptographicModule, RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Build_GOFIPS140, RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Binary, + RQ_SRS_013_ClickHouse_BackupUtility_non_FIPS_Binary, RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Approved_TLSProtocolVersions, RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Approved_CipherSuites_TLSv12_Approved, RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Approved_CipherSuites_TLSv13_Approved, @@ -723,9 +742,10 @@ * 4.1.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.GoCryptographicModule](#rqsrs-013clickhousebackuputilityfipsgocryptographicmodule) * 4.1.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Build.GOFIPS140](#rqsrs-013clickhousebackuputilityfipsbuildgofips140) * 4.1.3 
[RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Binary](#rqsrs-013clickhousebackuputilityfipsbinary) - * 4.1.4 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions](#rqsrs-013clickhousebackuputilityfipsapprovedtlsprotocolversions) - * 4.1.5 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv12.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv12approved) - * 4.1.6 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv13.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv13approved) + * 4.1.4 [RQ.SRS-013.ClickHouse.BackupUtility.non-FIPS.Binary](#rqsrs-013clickhousebackuputilitynon-fipsbinary) + * 4.1.5 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions](#rqsrs-013clickhousebackuputilityfipsapprovedtlsprotocolversions) + * 4.1.6 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv12.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv12approved) + * 4.1.7 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.CipherSuites.TLSv13.Approved](#rqsrs-013clickhousebackuputilityfipsapprovedciphersuitestlsv13approved) * 4.2 [Connectivity](#connectivity) * 4.2.1 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Connectivity.FIPSEndpoint](#rqsrs-013clickhousebackuputilityfipsconnectivityfipsendpoint) * 4.2.2 [RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Connectivity.NonFIPSEndpoint](#rqsrs-013clickhousebackuputilityfipsconnectivitynonfipsendpoint) @@ -832,6 +852,12 @@ The FIPS-compatible build of the [clickhouse-backup] utility SHALL be distributed as a separate binary named `clickhouse-backup-fips`, distinct from the standard `clickhouse-backup` binary. 
+#### RQ.SRS-013.ClickHouse.BackupUtility.non-FIPS.Binary +version: 1.0 + +The regular build of the [clickhouse-backup] utility SHALL be distributed as a +binary named `clickhouse-backup`, that SHALL report ``FIPS 140-3: false`` + #### RQ.SRS-013.ClickHouse.BackupUtility.FIPS.Approved.TLSProtocolVersions version: 1.0 diff --git a/test/testflows/clickhouse_backup/tests/fips_140_3.py b/test/testflows/clickhouse_backup/tests/fips_140_3.py index 7830a500..716ae453 100644 --- a/test/testflows/clickhouse_backup/tests/fips_140_3.py +++ b/test/testflows/clickhouse_backup/tests/fips_140_3.py @@ -412,8 +412,7 @@ def clickhouse_backup_fips_version_output(self): @TestScenario @Requirements( - RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Binary("1.0"), - RQ_SRS_013_ClickHouse_BackupUtility_FIPS_Version_Status("1.0"), + RQ_SRS_013_ClickHouse_BackupUtility_non_FIPS_Binary("1.0"), ) def clickhouse_backup_fips_version_output_negative_check(self): """Self-check for `clickhouse_backup_fips_version_output`.