fix(api): update InternalRBACRules SPIFFE identifiers to nico-* prefix #1907

Open
shayan1995 wants to merge 1 commit into NVIDIA:main from shayan1995:fix/internal-rbac-spiffe-identifiers

@shayan1995
Contributor

Description

After the carbide → NICo rename, deployed services present nico-* SPIFFE identifiers but InternalRBACRules in crates/api/src/auth/internal_rbac_rules.rs still matched against hardcoded carbide-* strings. Every internal service-to-api gRPC call failed mTLS authorization with HTTP 403.

Updates all 21 hardcoded carbide-* strings (production + test fixtures) to nico-* so RulePrincipal::{Dns, Dhcp, Ssh, SshRs, Pxe, BmcProxy, Health, Flow, MaintenanceJobs, DsxExchangeConsumer} match the SPIFFE identifiers presented by deployed nico-* services.

Type of Change

  • Add
  • Change
  • Fix
  • Remove
  • Internal

Related Issues (Optional)

Fixes #1891

Breaking Changes

  • This PR contains breaking changes

Testing

  • Unit tests added/updated
  • Integration tests added/updated
  • Manual testing performed
  • No testing required

Verified deployed serviceNames in helm/charts/nico-*/values.yaml match the updated rule strings (nico-dns, nico-dhcp, nico-pxe, nico-bmc-proxy, nico-hardware-health, nico-ssh-console-rs, nico-dsx-exchange-consumer, nico-flow).

Additional Notes

These identifiers are stringly-typed with no compile-time link to the actual deployed service names. A follow-up should either derive them from a shared constant or add an integration test that asserts each RulePrincipal resolves to a SPIFFE identifier matching the cert subject of the corresponding deployed service.

After the carbide → nico platform rename, all deployed services present
SPIFFE identifiers with the nico-* prefix, but InternalRBACRules in
crates/api/src/auth/internal_rbac_rules.rs still matched against hardcoded
carbide-* strings. Every internal service-to-api gRPC call failed mTLS
authorization with HTTP 403, silently breaking all service-to-service
communication.

Update each RulePrincipal → Principal::SpiffeServiceIdentifier mapping
(plus the corresponding test fixtures in the same file) to use the
nico-* prefix:

  carbide-dns                    -> nico-dns
  carbide-dhcp                   -> nico-dhcp
  carbide-ssh-console            -> nico-ssh-console
  carbide-ssh-console-rs         -> nico-ssh-console-rs
  carbide-pxe                    -> nico-pxe
  carbide-bmc-proxy              -> nico-bmc-proxy
  carbide-hardware-health        -> nico-hardware-health
  carbide-flow                   -> nico-flow
  carbide-maintenance-jobs       -> nico-maintenance-jobs
  carbide-dsx-exchange-consumer  -> nico-dsx-exchange-consumer
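
The mapping above can be sketched in Rust as follows. This is a minimal illustration, not the code in internal_rbac_rules.rs: the type shapes, the `ALL_RULE_PRINCIPALS` constant, and the `service_identifier` and `to_principal` helpers are hypothetical, and the pairing of each `RulePrincipal` variant with a service name is inferred from the names listed in this PR.

```rust
// Hypothetical stand-ins for the types in internal_rbac_rules.rs.
// The real definitions may have more variants and a different shape.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum RulePrincipal {
    Dns,
    Dhcp,
    Ssh,
    SshRs,
    Pxe,
    BmcProxy,
    Health,
    Flow,
    MaintenanceJobs,
    DsxExchangeConsumer,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum Principal {
    SpiffeServiceIdentifier(String),
}

// Every internal principal touched by this PR (assumed list).
pub const ALL_RULE_PRINCIPALS: [RulePrincipal; 10] = [
    RulePrincipal::Dns,
    RulePrincipal::Dhcp,
    RulePrincipal::Ssh,
    RulePrincipal::SshRs,
    RulePrincipal::Pxe,
    RulePrincipal::BmcProxy,
    RulePrincipal::Health,
    RulePrincipal::Flow,
    RulePrincipal::MaintenanceJobs,
    RulePrincipal::DsxExchangeConsumer,
];

// Post-rename service identifier for each principal (variant-to-name
// pairing is an assumption based on the table in this PR).
pub fn service_identifier(principal: RulePrincipal) -> &'static str {
    match principal {
        RulePrincipal::Dns => "nico-dns",
        RulePrincipal::Dhcp => "nico-dhcp",
        RulePrincipal::Ssh => "nico-ssh-console",
        RulePrincipal::SshRs => "nico-ssh-console-rs",
        RulePrincipal::Pxe => "nico-pxe",
        RulePrincipal::BmcProxy => "nico-bmc-proxy",
        RulePrincipal::Health => "nico-hardware-health",
        RulePrincipal::Flow => "nico-flow",
        RulePrincipal::MaintenanceJobs => "nico-maintenance-jobs",
        RulePrincipal::DsxExchangeConsumer => "nico-dsx-exchange-consumer",
    }
}

// Wrap the identifier in the principal type used for RBAC matching.
pub fn to_principal(principal: RulePrincipal) -> Principal {
    Principal::SpiffeServiceIdentifier(service_identifier(principal).to_string())
}
```

Keeping the strings in one exhaustive `match` means adding a new `RulePrincipal` variant fails to compile until it is given an identifier, which is one way to narrow the gap the follow-up note describes.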

Failure mode before this fix: inbound gRPC from e.g. nico-dns to nico-api
surfaced as

  WARN auth::internal_rbac_rules — principal SpiffeServiceIdentifier("nico-dns")
       not authorized for method LookupRecordLegacy — no matching rule

with no TLS-level error, masking the root cause. Impact spanned DNS
resolution, DHCP lease lookups, PXE GetCloudInitInstructions, SSH console
access, hardware health reporting, and maintenance job scheduling — every
internal principal that authenticates via SpiffeServiceIdentifier.
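
To make the failure mode concrete, here is a simplified sketch of why a stale rule table denies a correctly authenticated caller. The `Rule` struct and `is_authorized` function are hypothetical and much simpler than the real rule evaluation; only the method name and service identifiers come from this PR.

```rust
// Hypothetical, simplified rule: a gRPC method plus the service
// identifiers allowed to call it.
pub struct Rule {
    pub method: &'static str,
    pub allowed: &'static [&'static str],
}

// Returns true only if some rule for `method` lists the presented
// identifier verbatim. The mTLS handshake has already succeeded by the
// time this runs, so a mismatch here is an authorization denial, not a
// TLS error.
pub fn is_authorized(rules: &[Rule], presented: &str, method: &str) -> bool {
    rules
        .iter()
        .any(|rule| rule.method == method && rule.allowed.iter().any(|id| *id == presented))
}

// Rule table as it looked before the fix (illustrative).
pub const STALE_RULES: [Rule; 1] = [Rule {
    method: "LookupRecordLegacy",
    allowed: &["carbide-dns"],
}];

// Rule table after the fix (illustrative).
pub const UPDATED_RULES: [Rule; 1] = [Rule {
    method: "LookupRecordLegacy",
    allowed: &["nico-dns"],
}];
```

With the stale table, `is_authorized(&STALE_RULES, "nico-dns", "LookupRecordLegacy")` is false even though the caller's certificate is valid, which matches the "no matching rule" warning above.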

Follow-up (not in this PR): these identifiers are stringly-typed with no
compile-time link to the actual deployed service names. Worth deriving
them from a shared constant or asserting consistency in an integration
test that round-trips each principal through cert subject + RBAC lookup.
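
One possible shape for that follow-up, sketched under assumptions: a single shared constant holds the expected service names, and a check reports any expected name that is absent from the set of deployed names. `INTERNAL_SERVICE_NAMES` and `unmatched_identifiers` are hypothetical, and how the deployed names are obtained (Helm values, certificate inspection, or something else) is left open.

```rust
// Hypothetical shared constant: the single source of truth for the
// service identifiers that the RBAC rules expect.
pub const INTERNAL_SERVICE_NAMES: [&str; 10] = [
    "nico-dns",
    "nico-dhcp",
    "nico-ssh-console",
    "nico-ssh-console-rs",
    "nico-pxe",
    "nico-bmc-proxy",
    "nico-hardware-health",
    "nico-flow",
    "nico-maintenance-jobs",
    "nico-dsx-exchange-consumer",
];

// Returns every expected identifier that no deployed service presents.
// An empty result means the rules and the deployment agree.
pub fn unmatched_identifiers<'a>(expected: &[&'a str], deployed: &[&str]) -> Vec<&'a str> {
    expected
        .iter()
        .copied()
        .filter(|name| !deployed.iter().any(|d| d == name))
        .collect()
}
```

An integration test could call `unmatched_identifiers` with the names gathered from the deployment and fail on a non-empty result, so a future rename breaks CI instead of production traffic.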

Fixes NVIDIA#1891
@shayan1995 shayan1995 requested a review from a team as a code owner May 22, 2026 21:49
Development

Successfully merging this pull request may close these issues.

bug: SPIFFE service identifiers in InternalRBACRules not updated after carbide→NICo rename — all internal gRPC calls return 403