Skip to content

Add per-pool TSIG key creation gated by AZAwareMode#486

Open
abhiramnarayana wants to merge 1 commit into
openstack-k8s-operators:mainfrom
abhiramnarayana:osprh-30752-per-pool-tsig
Open

Add per-pool TSIG key creation gated by AZAwareMode#486
abhiramnarayana wants to merge 1 commit into
openstack-k8s-operators:mainfrom
abhiramnarayana:osprh-30752-per-pool-tsig

Conversation

@abhiramnarayana

Copy link
Copy Markdown
Contributor

Replace the shared TSIG key model with per-pool keys when AZAwareMode=Enabled. Each pool (including default) gets its own TSIG key with scope=POOL and resource_id set to the actual pool UUID from Designate.

Implement the two-pass creation flow: pools are created without tsigkey_id, pool UUIDs are retrieved, TSIG keys are created per pool, then pools.yaml is regenerated with tsigkey_id populated.

Add lifecycle management: cleanup orphaned TSIG keys when pools are removed, clean transition between per-pool and shared modes. The shared TSIG model remains unchanged when AZAwareMode is not enabled.

Add unit tests for pool generation with TSIG key IDs.

@openshift-ci openshift-ci Bot requested review from abays and omersch381 June 25, 2026 06:55
@abhiramnarayana abhiramnarayana force-pushed the osprh-30752-per-pool-tsig branch from 7d5f435 to c179353 Compare June 25, 2026 07:13

@omersch381 omersch381 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few comments inline

Comment thread internal/designate/tsigkeys.go Outdated
Comment on lines 160 to 163
ID: result["id"].(string),
Name: result["name"].(string),
Algorithm: result["algorithm"].(string),
Secret: result["secret"].(string),

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I know you only added the ID field, but I wonder, if any field is missing, or nil for any reason, the operator process panics. Should we add a comma, ok here?

Comment on lines +752 to +754
// If we can't get the parent CR, fall back to shared mode
Log.Info(fmt.Sprintf("Could not get Designate CR for AZ mode check, using shared TSIG model: %v", err))
return r.reconcileSharedTSIGSecret(ctx, instance, helper, multipoolConfig)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we fall back to shared mode if we can't get the parent CR?

}
}
}
// If secret doesn't exist yet, tsigKeyIDs stays nil — pools.yaml is generated without tsigkey_ids (first pass)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and if we have another error? (say, except for secret not found)

tsigKeys, err := r.ensurePerPoolTSIGKeys(ctx, helper, instance, multipoolConfig)
if err != nil {
if strings.Contains(err.Error(), "not ready") || strings.Contains(err.Error(), "no running") ||
strings.Contains(err.Error(), "not yet registered") {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the only error we could actually get here is an error that would contain "not ready".

So we might also want to change the existing pattern at line ~974. Maybe we should switch to errors.Is()

}

// getPerPoolConfigHash generates a hash including ALL pools (for per-pool TSIG model).
func (r *DesignateBackendbind9Reconciler) getPerPoolConfigHash(multipoolConfig *designate.MultipoolConfig) (string, error) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add mDNS IPs as change signal - their IPs are stored as "server" blocks in the bind config files. I.e. if their IPs are modified due to any change, the binds config secret won't be re-written.

Comment on lines +1199 to +1206
// Add key definitions for each pool
for _, poolName := range poolNames {
key := tsigKeys[poolName]
fmt.Fprintf(&config, "key \"%s\" {\n", key.Name)
fmt.Fprintf(&config, " algorithm %s;\n", key.Algorithm)
fmt.Fprintf(&config, " secret \"%s\";\n", key.Secret)
config.WriteString("};\n\n")
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: we might want to validate the key names before we inject them directly to the bind config files.


// reconcilePerPoolTSIGSecrets creates per-pool TSIG keys and stores them in a Secret.
// Every pool (including default) gets its own TSIG key with scope=POOL and resource_id=pool-UUID.
func (r *DesignateBackendbind9Reconciler) reconcilePerPoolTSIGSecrets(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we extract sub-functions from this function? It is very long IMO

@openshift-ci

openshift-ci Bot commented Jun 30, 2026

Copy link
Copy Markdown
Contributor

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: abhiramnarayana
Once this PR has been reviewed and has the lgtm label, please ask for approval from omersch381. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@abhiramnarayana abhiramnarayana force-pushed the osprh-30752-per-pool-tsig branch from 01d4ddd to aa5e523 Compare June 30, 2026 13:21
@omersch381

Copy link
Copy Markdown
Contributor

There are some parts in the code that allow / hint that transition from AZ aware env to non AZ aware (say, removing existing views) is allowed. In such a transition / removal, we'd lose replications, hence we lose data and therefore we shouldn't allow it. I will open another ticket that will add a webhook that catches flipping the AZ aware flag and does nothing when it happens.

Replace the shared TSIG key model with per-pool keys when
AZAwareMode=Enabled. Each pool (including default) gets its own
TSIG key with scope=POOL and resource_id set to the actual pool
UUID from Designate.

Implement the two-pass creation flow: pools are created without
tsigkey_id, pool UUIDs are retrieved, TSIG keys are created per
pool, then pools.yaml is regenerated with tsigkey_id populated.

Add lifecycle management: cleanup orphaned TSIG keys when pools
are removed, clean transition between per-pool and shared modes.
The shared TSIG model remains unchanged when AZAwareMode is not
enabled.

Add unit tests for pool generation with TSIG key IDs.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Signed-off-by: Abhiram R N <abhiramrn@gmail.com>
@omersch381 omersch381 force-pushed the osprh-30752-per-pool-tsig branch from aa5e523 to 935c279 Compare July 2, 2026 14:20

@beagles beagles left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The get?PoolConfigHash functions are almost entirely duplicate code. I would like to see the internal logic collapsed into a single helper as suggested and both functions implemented in terms of that.

As mentioned in the file comment, I feel we need to revisit designatebackendbind9_multipool.go and refactor the bits that have nothing to do with BIND9 management or are heavily intertwined with other operator managed resources. The introduction of the TSIG key management and mDNS dependencies have made this pretty unweildy. We might also rename the file to bind9_multipoolextensions.go as it is not a complete controller but additional implementationadded to the controller implemented in designate_backendbind9.go. This is bigger change and can be done after this. I am okay with merging this after the small refactor above is implemented, with the provision we set a task to start refactoring this asap.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of (maybe most?) of these changes are not directly related to the configuration and management of the BIND9 backend. We should revisit and refactor out of here. Note also that while the file is a "multipool" specific file, these are all bound to the DesignateBackendbind9Reconciler type which might lead to some confusing mistakes in the future.

for _, pool := range multipoolConfig.Pools {
keyName := instance.Name + "-" + pool.Name + designate.PerPoolTSIGKeySuffix

poolUUID, exists := poolNameToID[pool.Name]

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might want to look at how this works... IIUC, we create a have a deployment change around a pool, and we get this far without having designate "knowing about it". Is there a way to structure things that TSIG key and related items get managed with a view to the pool as it will be created and then do the following changes - the presumption being that the pool manage will eventually create the pools in designate. The question is I suppose what the harm is in pushing configuration in to BIND9 that isn't used. Maybe this will be more clear if we move this out of the BIND9 backend files and work it into the logic closer to where the pools yaml is generated?

for _, poolName := range poolNames {
key := tsigKeys[poolName]
fmt.Fprintf(&config, " %s;\n", key.Name)
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bind configs are getting complicated :) we should consider staging configuration like create a "proposed-bind9-config" and run a named-checkconf job on it and if successful, move the contents from proposed to whatever the runtime secret is called.


sort.Strings(poolNames)

hashInput := "per-pool:" + strings.Join(poolNames, ",") + "|mdns:" + strings.Join(mdnsIPs, ",")

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A lot of duplication between getPerPoolConfigHash and getPoolConfigHash. Why not add a helper with a prefix of "" or "per-pool:"?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants