feat: sanitize ECS snapshot tag keys and add per-BSL/VSL credential support #66
Open
nomanuddin wants to merge 1 commit into
## Tag sanitization (fixes InvalidTagKey.Malformed)
Alibaba Cloud ECS rejects tag keys containing '/' or other characters
outside [a-zA-Z0-9_\-.]. Velero passes standard keys like
'velero.io/backup' and 'kubernetes.io/cluster/<name>' that trigger this
error on every snapshot backup.
- Add sanitizeTagKey() that replaces invalid chars with '_', skips keys
with forbidden prefixes (aliyun, acs:, http://, https://), and
truncates to 128 chars
- Apply sanitization in getTags() and getTagsForCluster()
- Fix dedup in getTags() to track emitted sanitized keys (not raw keys)
to avoid collision when two raw keys map to the same sanitized key
- Add legacyVolumeAZTagKey constant and check both old (slash) and new
(underscore) key in determineVolumeAZ() so existing snapshots restore
to the correct AZ after upgrade
## Per-location credential support
Add three optional BSL/VSL config keys — accessKeyId, accessKeySecret,
and stsToken — that allow per-location credentials to be specified
directly in the location config. When accessKeyId and accessKeySecret
are both present they take priority over all other credential sources
for that location.
Velero v1.10+ also supports spec.credential on BSL/VSL objects, which
references a Kubernetes Secret. Velero mounts the secret and injects
credentialsFile into the plugin config. The plugin reads this via the
existing credentialsFile config key path — no extra code needed. This
enables independent credentials for BSL and VSL on the same deployment.
Both approaches are fully backward compatible: the existing auth methods
(env vars, credentialsFile, RAM role, ECS metadata) are preserved and
used when neither credential source is present.
Add unit tests covering all new code paths.
Signed-off-by: Noman Uddin <noman.uddin@live.com>
52606a8 to
939a0f7
Problem
1. InvalidTagKey.Malformed on volume backups
Velero passes tag keys such as `velero.io/backup` and `velero.io/pv` when calling
`CreateSnapshot`. ECS disks may also carry CSI-assigned tags like
`kubernetes.io/created-for/pvc/name`. All of these contain `/`, which is rejected by
Alibaba Cloud ECS tag key validation, causing every volume backup to fail with
`InvalidTagKey.Malformed`.
2. No per-location credential support
BSL and VSL share the same process-wide env vars, making it impossible to
use different credentials for object store (BSL) and volume snapshots (VSL)
on the same Velero deployment — e.g. a static access key for BSL
alongside an ECS instance RAM role for VSL.
Solution
1. Tag key sanitization (`volume_snapshotter.go`)
Add `sanitizeTagKey()` which:
- Replaces any character outside `[a-zA-Z0-9_\-\.]` with `_`
- Skips keys with forbidden prefixes (`aliyun`, `acs:`, `http://`, `https://`)
- Truncates keys to 128 characters
Applied in `getTags()` for both Velero-supplied and volume-copied tags.
Also fixes the `originalVolumeAZTagKey` constant, which itself contained a `/`.
2. Per-BSL/VSL credential support (`common.go`)
Add three optional config keys for BackupStorageLocation and VolumeSnapshotLocation:
- `accessKeyId`
- `accessKeySecret`
- `stsToken`
Two credential patterns are now supported:
- Shared credential (existing): a single Kubernetes Secret used for both BSL and VSL, passed via `--secret-file` or `spec.credential`. This works for most cases.
- Per-location credential (new): set `spec.credential` on each BSL/VSL to reference a separate Kubernetes Secret (same approach as the AWS plugin). Velero v1.10+ mounts each secret and injects `credentialsFile` into the plugin config per location.
Alternatively, set `accessKeyId` + `accessKeySecret` directly in the BSL/VSL config when a Kubernetes Secret is not available.
All existing auth methods (env vars, `credentialsFile`, RAM role) are fully preserved as fallback, so the change is fully backward compatible.
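
The priority order described above can be sketched in Go roughly as follows. This is a hedged illustration, not the code in `common.go`: the `resolveCredentials` function, the `resolved` struct, and the source labels are hypothetical names. The relative order of `credentialsFile` versus the remaining fallbacks (env vars, RAM role) is also an assumption. The description only states that inline keys win when both `accessKeyId` and `accessKeySecret` are set.

```go
package main

import "fmt"

// credentialSource labels where credentials came from (hypothetical names).
type credentialSource string

const (
	sourceInlineKeys      credentialSource = "inline-config"
	sourceCredentialsFile credentialSource = "credentials-file"
	sourceDefaultChain    credentialSource = "default-chain" // env vars, RAM role, ...
)

// resolved is a hypothetical result type for this sketch.
type resolved struct {
	Source          credentialSource
	AccessKeyID     string
	AccessKeySecret string
	STSToken        string
	CredentialsFile string
}

// resolveCredentials picks a credential source from a BSL/VSL config map.
// Inline keys are used only when BOTH accessKeyId and accessKeySecret are
// present; stsToken is optional. Everything else falls through unchanged.
func resolveCredentials(config map[string]string) resolved {
	id, secret := config["accessKeyId"], config["accessKeySecret"]
	if id != "" && secret != "" {
		return resolved{
			Source:          sourceInlineKeys,
			AccessKeyID:     id,
			AccessKeySecret: secret,
			STSToken:        config["stsToken"],
		}
	}
	// Assumption: credentialsFile (injected by Velero from spec.credential)
	// is consulted before the process-wide fallbacks.
	if file := config["credentialsFile"]; file != "" {
		return resolved{Source: sourceCredentialsFile, CredentialsFile: file}
	}
	return resolved{Source: sourceDefaultChain}
}

func main() {
	bsl := map[string]string{"accessKeyId": "example-id", "accessKeySecret": "example-secret"}
	vsl := map[string]string{} // no per-location config: existing methods apply
	fmt.Println(resolveCredentials(bsl).Source)
	fmt.Println(resolveCredentials(vsl).Source)
}
```

With a shape like this, a BSL can carry a static access key while a VSL with no credential keys keeps using the existing methods, which is the mixed setup the Problem section describes.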
Testing
- Unit tests for `sanitizeTagKey` covering all edge cases
Local kind cluster validation (Velero 1.11.0, linux/amd64)
- `spec.credential` on each location
Note on tag sanitization validation: the `sanitizeTagKey()` logic is covered by 11 unit tests. End-to-end validation on real ECS snapshots is pending deployment to a live cluster. The fix addresses the `InvalidTagKey.Malformed` errors observed in production logs, where 328+ errors were recorded per backup run.
I noticed PR #65 addresses a similar issue by filtering out tags with forbidden prefixes (`acs:`, `aliyun`, `http://`, `https://`). This PR takes a different and more complete approach:
- With a filter-only approach, Velero's own tags (`velero.io/backup`, `velero.io/pv`) would also be silently discarded since they contain `/`, which is invalid.
- This PR replaces invalid characters with `_` rather than dropping the tag entirely. This means Velero's own metadata tags are preserved as `velero.io_backup`, `velero.io_pv`, etc., keeping the snapshot traceable back to the originating backup.
The two fixes are complementary; happy to coordinate or consolidate if the maintainers prefer a single PR.