Add per-pool TSIG key creation gated by AZAwareMode#486
Conversation
7d5f435 to
c179353
Compare
omersch381
left a comment
There was a problem hiding this comment.
A few comments inline
| ID: result["id"].(string), | ||
| Name: result["name"].(string), | ||
| Algorithm: result["algorithm"].(string), | ||
| Secret: result["secret"].(string), |
There was a problem hiding this comment.
I know you only added the ID field, but I wonder, if any field is missing, or nil for any reason, the operator process panics. Should we add a comma, ok here?
| // If we can't get the parent CR, fall back to shared mode | ||
| Log.Info(fmt.Sprintf("Could not get Designate CR for AZ mode check, using shared TSIG model: %v", err)) | ||
| return r.reconcileSharedTSIGSecret(ctx, instance, helper, multipoolConfig) |
There was a problem hiding this comment.
should we fall back to shared mode if we can't get the parent CR?
| } | ||
| } | ||
| } | ||
| // If secret doesn't exist yet, tsigKeyIDs stays nil — pools.yaml is generated without tsigkey_ids (first pass) |
There was a problem hiding this comment.
and if we have another error? (say, except for secret not found)
| tsigKeys, err := r.ensurePerPoolTSIGKeys(ctx, helper, instance, multipoolConfig) | ||
| if err != nil { | ||
| if strings.Contains(err.Error(), "not ready") || strings.Contains(err.Error(), "no running") || | ||
| strings.Contains(err.Error(), "not yet registered") { |
There was a problem hiding this comment.
I think the only error we could actually get here is an error that would contain "not ready".
So we might also want to change the existing pattern at line ~974. Maybe we should switch to errors.Is()
| } | ||
|
|
||
| // getPerPoolConfigHash generates a hash including ALL pools (for per-pool TSIG model). | ||
| func (r *DesignateBackendbind9Reconciler) getPerPoolConfigHash(multipoolConfig *designate.MultipoolConfig) (string, error) { |
There was a problem hiding this comment.
We should also add mDNS IPs as change signal - their IPs are stored as "server" blocks in the bind config files. I.e. if their IPs are modified due to any change, the binds config secret won't be re-written.
| // Add key definitions for each pool | ||
| for _, poolName := range poolNames { | ||
| key := tsigKeys[poolName] | ||
| fmt.Fprintf(&config, "key \"%s\" {\n", key.Name) | ||
| fmt.Fprintf(&config, " algorithm %s;\n", key.Algorithm) | ||
| fmt.Fprintf(&config, " secret \"%s\";\n", key.Secret) | ||
| config.WriteString("};\n\n") | ||
| } |
There was a problem hiding this comment.
nit: we might want to validate the key names before we inject them directly to the bind config files.
|
|
||
| // reconcilePerPoolTSIGSecrets creates per-pool TSIG keys and stores them in a Secret. | ||
| // Every pool (including default) gets its own TSIG key with scope=POOL and resource_id=pool-UUID. | ||
| func (r *DesignateBackendbind9Reconciler) reconcilePerPoolTSIGSecrets( |
There was a problem hiding this comment.
Can we extract sub-functions from this function? It is very long IMO
c179353 to
01d4ddd
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: abhiramnarayana The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |
01d4ddd to
aa5e523
Compare
|
There are some parts in the code that allow / hint that transition from AZ aware env to non AZ aware (say, removing existing views) is allowed. In such a transition / removal, we'd lose replications, hence we lose data and therefore we shouldn't allow it. I will open another ticket that will add a webhook that catches flipping the AZ aware flag and does nothing when it happens. |
Replace the shared TSIG key model with per-pool keys when AZAwareMode=Enabled. Each pool (including default) gets its own TSIG key with scope=POOL and resource_id set to the actual pool UUID from Designate. Implement the two-pass creation flow: pools are created without tsigkey_id, pool UUIDs are retrieved, TSIG keys are created per pool, then pools.yaml is regenerated with tsigkey_id populated. Add lifecycle management: cleanup orphaned TSIG keys when pools are removed, clean transition between per-pool and shared modes. The shared TSIG model remains unchanged when AZAwareMode is not enabled. Add unit tests for pool generation with TSIG key IDs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> Signed-off-by: Abhiram R N <abhiramrn@gmail.com>
aa5e523 to
935c279
Compare
beagles
left a comment
There was a problem hiding this comment.
The get?PoolConfigHash functions are almost entirely duplicate code. I would like to see the internal logic collapsed into a single helper as suggested and both functions implemented in terms of that.
As mentioned in the file comment, I feel we need to revisit designatebackendbind9_multipool.go and refactor the bits that have nothing to do with BIND9 management or are heavily intertwined with other operator managed resources. The introduction of the TSIG key management and mDNS dependencies have made this pretty unweildy. We might also rename the file to bind9_multipoolextensions.go as it is not a complete controller but additional implementationadded to the controller implemented in designate_backendbind9.go. This is bigger change and can be done after this. I am okay with merging this after the small refactor above is implemented, with the provision we set a task to start refactoring this asap.
There was a problem hiding this comment.
Some of (maybe most?) of these changes are not directly related to the configuration and management of the BIND9 backend. We should revisit and refactor out of here. Note also that while the file is a "multipool" specific file, these are all bound to the DesignateBackendbind9Reconciler type which might lead to some confusing mistakes in the future.
| for _, pool := range multipoolConfig.Pools { | ||
| keyName := instance.Name + "-" + pool.Name + designate.PerPoolTSIGKeySuffix | ||
|
|
||
| poolUUID, exists := poolNameToID[pool.Name] |
There was a problem hiding this comment.
We might want to look at how this works... IIUC, we create a have a deployment change around a pool, and we get this far without having designate "knowing about it". Is there a way to structure things that TSIG key and related items get managed with a view to the pool as it will be created and then do the following changes - the presumption being that the pool manage will eventually create the pools in designate. The question is I suppose what the harm is in pushing configuration in to BIND9 that isn't used. Maybe this will be more clear if we move this out of the BIND9 backend files and work it into the logic closer to where the pools yaml is generated?
| for _, poolName := range poolNames { | ||
| key := tsigKeys[poolName] | ||
| fmt.Fprintf(&config, " %s;\n", key.Name) | ||
| } |
There was a problem hiding this comment.
bind configs are getting complicated :) we should consider staging configuration like create a "proposed-bind9-config" and run a named-checkconf job on it and if successful, move the contents from proposed to whatever the runtime secret is called.
|
|
||
| sort.Strings(poolNames) | ||
|
|
||
| hashInput := "per-pool:" + strings.Join(poolNames, ",") + "|mdns:" + strings.Join(mdnsIPs, ",") |
There was a problem hiding this comment.
A lot of duplication between getPerPoolConfigHash and getPoolConfigHash. Why not add a helper with a prefix of "" or "per-pool:"?
Replace the shared TSIG key model with per-pool keys when AZAwareMode=Enabled. Each pool (including default) gets its own TSIG key with scope=POOL and resource_id set to the actual pool UUID from Designate.
Implement the two-pass creation flow: pools are created without tsigkey_id, pool UUIDs are retrieved, TSIG keys are created per pool, then pools.yaml is regenerated with tsigkey_id populated.
Add lifecycle management: cleanup orphaned TSIG keys when pools are removed, clean transition between per-pool and shared modes. The shared TSIG model remains unchanged when AZAwareMode is not enabled.
Add unit tests for pool generation with TSIG key IDs.