Skip to content

feat: sanitize ECS snapshot tag keys and add per-BSL/VSL credential support#66

Open
nomanuddin wants to merge 1 commit into
AliyunContainerService:masterfrom
nomanuddin:feat/sanitize-ecs-snapshot-tag-keys
Open

feat: sanitize ECS snapshot tag keys and add per-BSL/VSL credential support#66
nomanuddin wants to merge 1 commit into
AliyunContainerService:masterfrom
nomanuddin:feat/sanitize-ecs-snapshot-tag-keys

Conversation

@nomanuddin

@nomanuddin nomanuddin commented Jun 7, 2026

Copy link
Copy Markdown

Problem

1. InvalidTagKey.Malformed on volume backups

Velero passes tag keys such as velero.io/backup and velero.io/pv when
calling CreateSnapshot. ECS disks may also carry CSI-assigned tags like
kubernetes.io/created-for/pvc/name. All of these contain / which is
rejected by the Alibaba Cloud ECS tag key validation, causing every volume
backup to fail with InvalidTagKey.Malformed.

2. No per-location credential support

BSL and VSL share the same process-wide env vars, making it impossible to
use different credentials for object store (BSL) and volume snapshots (VSL)
on the same Velero deployment — e.g. a static access key for BSL
alongside an ECS instance RAM role for VSL.

Solution

1. Tag key sanitization (volume_snapshotter.go)

Add sanitizeTagKey() which:

  • Replaces any character not in [a-zA-Z0-9_\-\.] with _
  • Truncates to 128 characters
  • Skips keys starting with forbidden prefixes (aliyun, acs:, http://, https://)

Applied in getTags() for both Velero-supplied and volume-copied tags.
Also fixes originalVolumeAZTagKey constant which itself contained a /.

2. Per-BSL/VSL credential support (common.go)

Add three optional config keys for BackupStorageLocation and VolumeSnapshotLocation:

Key Description
accessKeyId Alibaba Cloud Access Key ID
accessKeySecret Alibaba Cloud Access Key Secret
stsToken STS token (optional)

Two credential patterns are now supported:

  1. Shared credential (existing): a single Kubernetes Secret used for both BSL and VSL,
    passed via --secret-file or spec.credential — works for most cases.

  2. Per-location credential (new): set spec.credential on each BSL/VSL to reference
    a separate Kubernetes Secret (same approach as the AWS plugin). Velero v1.10+ mounts
    each secret and injects credentialsFile into the plugin config per location.

Alternatively, set accessKeyId + accessKeySecret directly in the BSL/VSL config
when a Kubernetes Secret is not available.

All existing auth methods (env vars, credentialsFile, RAM role) are fully preserved
as fallback: fully backward compatible.

Testing

  • 11 unit tests for sanitizeTagKey covering all edge cases
  • 5 unit tests for per-location credential priority and fallback behavior

Local kind cluster validation (Velero 1.11.0, linux/amd64)

  • Configured separate Kubernetes secrets per BSL and VSL using spec.credential on each location
  • BSL validated as Available with OSS credentials ✅
  • VSL configured with ECS snapshot credentials ✅
  • Volume backup completed and uploaded to Alibaba OSS bucket successfully ✅
  • Full cluster object backup: credentials authenticated correctly, upload timed out due to local network constraints (not a plugin issue)

Note on tag sanitization validation: The sanitizeTagKey() logic is covered by 11 unit
tests. End-to-end validation on real ECS snapshots is pending deployment to a live cluster
— the fix addresses the InvalidTagKey.Malformed errors observed in production logs where
328+ errors were recorded per backup run.


I noticed PR #65 addresses a similar issue by filtering out tags with forbidden prefixes (acs:, aliyun, http://, https://). This PR takes a different and more complete approach:

  • PR [Patch] 跳过非法标签 #65 drops any tag whose key starts with a forbidden prefix — tags from Velero itself (e.g. velero.io/backup, velero.io/pv) would also be silently discarded since they contain / which is invalid.
  • This PR sanitizes the key by replacing invalid characters with _ rather than dropping the tag entirely. This means Velero's own metadata tags are preserved as velero.io_backup, velero.io_pv etc., keeping the snapshot traceable back to the originating backup.

The two fixes are complementary: happy to coordinate or consolidate if the maintainers prefer a single PR.

@nomanuddin nomanuddin changed the title fix: sanitize ECS snapshot tag keys to avoid InvalidTagKey.Malformed fix: sanitize ECS snapshot tag keys and add per-BSL/VSL inline credential support Jun 7, 2026
@nomanuddin

Copy link
Copy Markdown
Author

@AlbeeSo, mowangdk could you please review this PR when you get a chance? It fixes InvalidTagKey.Malformed errors on volume backups (active issue on clusters using ACK) and adds optional per-BSL/VSL inline credentials

Happy to adjust anything based on your feedback.

…upport

  ## Tag sanitization (fixes InvalidTagKey.Malformed)

  Alibaba Cloud ECS rejects tag keys containing '/' or other characters
  outside [a-zA-Z0-9_\-.]. Velero passes standard keys like
  'velero.io/backup' and 'kubernetes.io/cluster/<name>' that trigger this
  error on every snapshot backup.

  - Add sanitizeTagKey() that replaces invalid chars with '_', skips keys
    with forbidden prefixes (aliyun, acs:, http://, https://), and
    truncates to 128 chars
  - Apply sanitization in getTags() and getTagsForCluster()
  - Fix dedup in getTags() to track emitted sanitized keys (not raw keys)
    to avoid collision when two raw keys map to the same sanitized key
  - Add legacyVolumeAZTagKey constant and check both old (slash) and new
    (underscore) key in determineVolumeAZ() so existing snapshots restore
    to the correct AZ after upgrade

  ## Per-location credential support

  Add three optional BSL/VSL config keys — accessKeyId, accessKeySecret,
  and stsToken — that allow per-location credentials to be specified
  directly in the location config. When accessKeyId and accessKeySecret
  are both present they take priority over all other credential sources
  for that location.

  Velero v1.10+ also supports spec.credential on BSL/VSL objects, which
  references a Kubernetes Secret. Velero mounts the secret and injects
  credentialsFile into the plugin config. The plugin reads this via the
  existing credentialsFile config key path — no extra code needed. This
  enables independent credentials for BSL and VSL on the same deployment.

  Both approaches are fully backward compatible: the existing auth methods
  (env vars, credentialsFile, RAM role, ECS metadata) are preserved and
  used when neither credential source is present.

  Add unit tests covering all new code paths.

Signed-off-by: Noman Uddin <noman.uddin@live.com>
@nomanuddin nomanuddin force-pushed the feat/sanitize-ecs-snapshot-tag-keys branch from 52606a8 to 939a0f7 Compare June 7, 2026 12:34
@nomanuddin nomanuddin changed the title fix: sanitize ECS snapshot tag keys and add per-BSL/VSL inline credential support fix: sanitize ECS snapshot tag keys and add per-BSL/VSL credential support Jun 7, 2026
@nomanuddin nomanuddin changed the title fix: sanitize ECS snapshot tag keys and add per-BSL/VSL credential support feat: sanitize ECS snapshot tag keys and add per-BSL/VSL credential support Jun 8, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant