From a6b38acf661c644f7ad855f4ba9adfd09265a6ba Mon Sep 17 00:00:00 2001
From: Luigi Toscano
Date: Sun, 31 May 2026 22:32:26 +0200
Subject: [PATCH] [ci_dcn_site] Fix Nova aggregate creation idempotency

The Nova aggregate create API call can return HTTP 500
(MessageDeliveryFailure) when RabbitMQ restarts during a DCN deployment
(a rolling restart triggered by a queue rebalance). However, the
aggregate is written to the Nova DB before the scheduler fanout fails,
so the aggregate actually exists despite the error response. Retrying
the create then fails permanently with HTTP 409 (ConflictException:
Aggregate already exists), exhausting all retry attempts without ever
succeeding.

Fix this by following the established check-then-create pattern used
across this role and in roles/federation/tasks/run_openstack_setup.yml:

- Mark the create task with ignore_errors: true (consistent with the
  surrounding tasks in this file: aggregate show at line 19, add host at
  line 45), so that a transient 500 does not abort the play.
- Add a dedicated verification task that uses the existing retry pattern
  (retries/delay/until: rc == 0) to confirm that the aggregate exists,
  polling until the RabbitMQ-induced transient failure has passed. The
  task is gated on the same when condition, so it only runs when a
  creation was attempted. If the aggregate really was not created, the
  task still fails once its retries are exhausted, so genuine errors are
  not masked.

Root cause: DataPlaneDeployment applies the Glance az0 configuration,
which triggers a RabbitMQ queue rebalance and a rolling restart. Nova
aggregate creation is attempted during this window, and the scheduler
fanout fails.
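Why a plain retry of "create" can never succeed, while "create, ignore
errors, then verify" does, can be modelled in a few lines. This is a
hypothetical simulation and not part of the change: FakeNova, the
exception classes and both helper functions are illustrative stand-ins
for the Nova API and the two Ansible task layouts, and the single
injected 500 is an assumption that mimics one MessageDeliveryFailure.

```python
class ServerError(Exception):
    """Hypothetical stand-in for HTTP 500 (MessageDeliveryFailure)."""


class Conflict(Exception):
    """Hypothetical stand-in for HTTP 409 (Aggregate already exists)."""


class NotFound(Exception):
    """Hypothetical stand-in for 'aggregate show' on a missing aggregate."""


class FakeNova:
    """Toy model: the DB write happens before the fanout can fail."""

    def __init__(self, fanout_failures: int) -> None:
        self.aggregates: set = set()
        self.fanout_failures = fanout_failures  # remaining injected 500s

    def create(self, name: str) -> None:
        if name in self.aggregates:
            raise Conflict(name)
        self.aggregates.add(name)  # persisted first...
        if self.fanout_failures > 0:
            self.fanout_failures -= 1
            raise ServerError(name)  # ...then the scheduler fanout fails

    def show(self, name: str) -> str:
        if name not in self.aggregates:
            raise NotFound(name)
        return name


def naive_retry_create(nova: FakeNova, name: str, retries: int) -> bool:
    """Old layout: retry 'create' until it reports success."""
    for _ in range(retries):
        try:
            nova.create(name)
            return True
        except (ServerError, Conflict):
            continue  # a 409 after a 500 repeats on every attempt
    return False


def create_then_verify(nova: FakeNova, name: str, retries: int) -> bool:
    """New layout: ignore create errors, then poll 'show'."""
    try:
        nova.create(name)  # mirrors ignore_errors: true
    except (ServerError, Conflict):
        pass
    for _ in range(retries):  # mirrors retries/until: rc == 0
        try:
            return nova.show(name) == name
        except NotFound:
            continue
    return False


if __name__ == "__main__":
    print(naive_retry_create(FakeNova(fanout_failures=1), "az1", retries=5))  # False
    print(create_then_verify(FakeNova(fanout_failures=1), "az1", retries=5))  # True
```

The naive loop fails because the first attempt both persists the
aggregate and returns 500, so every later attempt hits the 409 path.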
Co-Authored-By: Claude Sonnet 4.5
Related-Issue: DCN deployment failure with MessageDeliveryFailure
Signed-off-by: Luigi Toscano
---
 roles/ci_dcn_site/tasks/az.yml | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/roles/ci_dcn_site/tasks/az.yml b/roles/ci_dcn_site/tasks/az.yml
index 0909f1dcc..cf5e067c0 100644
--- a/roles/ci_dcn_site/tasks/az.yml
+++ b/roles/ci_dcn_site/tasks/az.yml
@@ -34,6 +34,7 @@
 - name: Create AZ if it does not exist
   when:
     - az_hosts.rc == 1
+  ignore_errors: true
   kubernetes.core.k8s_exec:
     api_key: "{{ _auth_results.openshift_auth.api_key }}"
     namespace: "{{ cifmw_openstack_namespace }}"
@@ -41,6 +42,20 @@
     command: >-
       openstack aggregate create {{ _az }} --zone {{ _az }}
 
+- name: Verify AZ aggregate exists
+  when:
+    - az_hosts.rc == 1
+  register: az_verify
+  retries: 10
+  delay: 30
+  until: az_verify.rc == 0
+  kubernetes.core.k8s_exec:
+    api_key: "{{ _auth_results.openshift_auth.api_key }}"
+    namespace: "{{ cifmw_openstack_namespace }}"
+    pod: openstackclient
+    command: >-
+      openstack aggregate show {{ _az }} -c name -f value
+
 - name: Add only the missing edpm hosts to AZ
   ignore_errors: true
   register: ignore_errors_register