recipe: add gatekeeper as optional component by JaydipGabani · Pull Request #821 · NVIDIA/aicr

JaydipGabani · 2026-05-08T17:29:10Z

Summary

Adds Gatekeeper as an optional AICR recipe component for clusters that need admission-controller support for AI workload guardrails.

Motivation / Context

The direct Slack discussion in #aicr started from the Gatekeeper maintainers asking whether defining AI workload Kubernetes policies belongs in AICR. The thread identified three possible integration shapes: policy-as-component, validation integration through gator policy and --no-cluster, and conformance evidence.

Reviewers in the thread asked for concrete AI-specific policy examples and for the distinction from AICR/Chainsaw checks. The discussion converged on workload-shape guardrails such as nvidia.com/gpu request/limit parity, memory request/limit parity for GPU pods, CPU request requirements, training shared-memory requirements, and optional max-GPU caps for shared clusters. The resulting split was that AICR and Chainsaw cover platform correctness and component health, while Gatekeeper and gator cover reusable workload guardrails through offline policy evaluation or admission-time enforcement.

This PR lands only the deployable Gatekeeper component slice. It does not ship policy bundles, run gator, connect Gatekeeper to AICR validation, or capture policy results as conformance evidence. The broader Slack-derived rationale and follow-up integration paths are tracked in #860.

Fixes: N/A
Related: #860
Related discussion: https://kubernetes.slack.com/archives/C0AQMPP1BK7/p1775874775990329

Type of Change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to change)
Documentation update
Refactoring (no functional changes)
Build/CI/tooling

Component(s) Affected

CLI (cmd/aicr, pkg/cli)
API server (cmd/aicrd, pkg/api, pkg/server)
Recipe engine / data (pkg/recipe)
Bundlers (pkg/bundler, pkg/component/*)
Collectors / snapshotter (pkg/collector, pkg/snapshotter)
Validator (pkg/validator)
Core libraries (pkg/errors, pkg/k8s)
Docs/examples (docs/, examples/)
Other: New optional Gatekeeper component

Implementation Notes

Registers Gatekeeper as an optional Helm component in recipes/registry.yaml.
Pins the upstream Gatekeeper chart to 3.22.2 and installs it into gatekeeper-system when recipes opt in.
Wires only system-tier node selector and toleration paths because Gatekeeper is an admission controller, not a GPU workload.
Adds base Helm values for controller-manager and audit resources, disables mutation by default, and leaves the component out of base overlays.
Adds a Chainsaw health check that asserts both Gatekeeper deployments exist with available replicas and that no Gatekeeper pods are in unhealthy phases.
Defers policy bundle components, gator validation, AICR validator integration, and conformance evidence integration to follow-up work tracked in Epic: Track Gatekeeper integration for AI workload guardrails #860.

Testing

make qualify

Passed locally. This covers race-enabled unit tests, coverage enforcement, linting, Chainsaw CLI e2e tests, vulnerability scan, license checks, and repository-specific documentation checks.

Risk Assessment

Low - Isolated optional component, not referenced by base overlays, and easy to revert
Medium - Touches multiple components or has broader impact
High - Breaking change, affects critical paths, or complex rollout

Rollout notes: Gatekeeper remains opt-in. Existing recipes are unchanged unless they explicitly reference the component, and this PR does not install or enforce any workload policies.

Checklist

Tests pass locally (make test with -race)
Linter passes (make lint)
I did not skip/disable tests to make CI green
I added/updated tests for new functionality
I updated docs if user-facing behavior changed
Changes follow existing patterns in the codebase
Commits are cryptographically signed (git commit -S) - GPG signing info

copy-pr-bot · 2026-05-08T17:29:14Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2026-05-08T17:29:27Z

Welcome to AICR, @JaydipGabani! Thanks for your first pull request.

Before review, please ensure:

All commits are signed off per the DCO
CI checks pass (tests, lint, security scan)
The PR description explains the why behind your changes

A maintainer will review this soon.

coderabbitai · 2026-05-08T17:31:23Z

📝 Walkthrough

Walkthrough

This PR adds Gatekeeper as an optional system-tier component: a registry entry (Helm chart defaults, defaultVersion 3.22.2, defaultNamespace gatekeeper-system, valueOverride key, and node-scheduling paths), a Helm values file with conservative defaults (replicas, resources, disableMutation: true, auditInterval: 60, constraintViolationsLimit: 20), and a Kyverno Chainsaw health-check that asserts both controller-manager and audit Deployments have available replicas and no Pods are in Pending/Failed/Unknown.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~12 minutes

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title clearly and concisely summarizes the main change: adding Gatekeeper as an optional component to the project.
Linked Issues check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check	✅ Passed	Check skipped because no linked issues were found for this pull request.
Description check	✅ Passed	The PR description comprehensively explains the addition of Gatekeeper as an optional component, detailing motivation, implementation, scope, and deferral of follow-up work.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@recipes/components/gatekeeper/values.yaml`:
- Around line 52-55: Update the misleading comment: instead of referencing the
unrelated flag "gatekeeper:disableMutation=false", instruct users to re-enable
the PDB via the actual Helm value for this key (podDisruptionBudget.enabled) —
e.g., tell them to use "--set gatekeeper.podDisruptionBudget.enabled=true" (or
the equivalent overlay) so the comment correctly points to the
podDisruptionBudget.enabled setting.

🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

Push a commit to this branch (recommended)
Create a new PR with the fixes

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 15dfa5e4-9d35-4c3a-856d-2fe6258734e2

📥 Commits

Reviewing files that changed from the base of the PR and between 2f22e2b and 9e2361a.

📒 Files selected for processing (3)

recipes/checks/gatekeeper/health-check.yaml
recipes/components/gatekeeper/values.yaml
recipes/registry.yaml

coderabbitai

♻️ Duplicate comments (1)

recipes/components/gatekeeper/values.yaml (1)

53-53: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Fix the Helm override example syntax for PDB re-enable.

Line 53 uses --set gatekeeper:podDisruptionBudget.enabled=true; Helm --set expects dot-separated nested paths. This example can fail to apply the intended value.

Suggested fix

-# Production overlays can re-enable via --set gatekeeper:podDisruptionBudget.enabled=true.
+# Production overlays can re-enable via --set gatekeeper.podDisruptionBudget.enabled=true.

What is the correct Helm --set syntax for nested values, and is "gatekeeper:podDisruptionBudget.enabled=true" valid compared to "gatekeeper.podDisruptionBudget.enabled=true"?

🤖 Prompt for AI Agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@recipes/components/gatekeeper/values.yaml` at line 53, The Helm override
example uses invalid colon notation; change the example to use Helm's
dot-separated nested path syntax so the dashboard PDB value is set correctly
(replace "gatekeeper:podDisruptionBudget.enabled=true" with
"gatekeeper.podDisruptionBudget.enabled=true"), and update the comment in
values.yaml accordingly so users run the correct --set flag.

🤖 Prompt for all review comments with AI agents

Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@recipes/components/gatekeeper/values.yaml`:
- Line 53: The Helm override example uses invalid colon notation; change the
example to use Helm's dot-separated nested path syntax so the dashboard PDB
value is set correctly (replace "gatekeeper:podDisruptionBudget.enabled=true"
with "gatekeeper.podDisruptionBudget.enabled=true"), and update the comment in
values.yaml accordingly so users run the correct --set flag.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Enterprise

Run ID: 22d83453-7fc3-4806-a9fd-adf8d04b420b

📥 Commits

Reviewing files that changed from the base of the PR and between 9e2361a and ea24241.

📒 Files selected for processing (1)

recipes/components/gatekeeper/values.yaml

mchmarny · 2026-05-09T11:30:29Z

Hi @JaydipGabani — thanks for kicking this off and for the Slack discussion behind it.

Before we dig into the YAML, can we get the PR description filled in to match the PR template? Right now the Summary, Motivation / Context, and Implementation Notes sections are empty, the Type of Change boxes aren't checked, and there's no Fixes: issue reference — the only context is a Slack link, which isn't durable for future readers (search, audit, anyone outside the workspace).

Concretely, what would help reviewers most is an issue (filed in this repo) that captures the parts of the Slack thread that justify landing Gatekeeper in AICR — specifically:

What problem does this solve, and for whom? @yuanchen8911 and @lockwobr both asked for concrete AI-workload policy examples; your nvidia.com/gpu request==limit / memory==limit / training-shm examples from the thread are exactly the right framing and should live in the issue.
Why AICR? The thread converged on a useful split — AICR + Chainsaw cover platform correctness; Gatekeeper covers workload guardrails — and on three integration shapes (policy-as-component, validation integration, conformance evidence). Pinning down which shape this PR is delivering, and which are deferred, would give reviewers a frame for what's in scope here.
Why now / why this slice? @yuanchen8911 also flagged that gator (CLI tooling) isn't a deployable, only the policy bundle and Gatekeeper itself are — so this PR appears to be the "Gatekeeper as optional component" piece. Stating that explicitly avoids reviewers re-litigating the larger design in this PR.

Once an issue exists, please link it via Fixes: #N (or Related: #N if it's part of a larger initiative), fill in the rest of the template (Implementation Notes, Testing — make qualify, Component(s) Affected = Recipes), and we'll take a proper review pass. Happy to help shape the issue if useful.

Also worth noting that the branch is mergeable_state: behind — a rebase onto current main would be good to do alongside the description fill-in.

mchmarny

Needs context

JaydipGabani · 2026-05-13T00:23:50Z

Addressed the durable-context and template gaps:

Filed Epic: Track Gatekeeper integration for AI workload guardrails #860 with the Slack-derived problem statement, AI workload policy examples, the AICR/Chainsaw vs. Gatekeeper scope split, and the deferred integration shapes.
Updated this PR description to the project template with Summary, Motivation / Context, Type of Change, Component(s) Affected, Implementation Notes, Testing, Risk Assessment, and Checklist.
Linked the issue as Related: #860 because it tracks the broader Gatekeeper integration work while this PR stays scoped to the optional deployable component.

Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>

JaydipGabani · 2026-05-13T03:26:03Z

Added contex

mchmarny

Let's hold on on this PR for now to align on the scope. See: #860 (comment)

JaydipGabani requested a review from a team as a code owner May 8, 2026 17:29

github-actions Bot added area/recipes size/M labels May 8, 2026

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

Comment thread recipes/components/gatekeeper/values.yaml Outdated

mchmarny assigned JaydipGabani May 8, 2026

coderabbitai Bot reviewed May 8, 2026

View reviewed changes

mchmarny requested changes May 9, 2026

View reviewed changes

JaydipGabani added 3 commits May 13, 2026 00:48

add gatekeeper as optional component

8392122

Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>

update comment to accurately describe variable

fe30672

Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>

remove pdb variable

c45a9e1

Signed-off-by: Jaydip Gabani <gabanijaydip@gmail.com>

JaydipGabani force-pushed the feat/gatekeeper-component branch from 2e1b03a to c45a9e1 Compare May 13, 2026 00:48

JaydipGabani requested a review from mchmarny May 13, 2026 03:25

mchmarny requested changes May 13, 2026

View reviewed changes

JaydipGabani mentioned this pull request May 22, 2026

Add Gatekeeper as an optional deployable component #1012

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

recipe: add gatekeeper as optional component#821

recipe: add gatekeeper as optional component#821
JaydipGabani wants to merge 3 commits into
NVIDIA:mainfrom
JaydipGabani:feat/gatekeeper-component

JaydipGabani commented May 8, 2026 •

edited

Loading

Uh oh!

copy-pr-bot Bot commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026 •

edited

Loading

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Uh oh!

mchmarny commented May 9, 2026 •

edited

Loading

Uh oh!

mchmarny left a comment

Uh oh!

JaydipGabani commented May 13, 2026 •

edited

Loading

Uh oh!

JaydipGabani commented May 13, 2026

Uh oh!

mchmarny left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

JaydipGabani commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Motivation / Context

Type of Change

Component(s) Affected

Implementation Notes

Testing

Risk Assessment

Checklist

Uh oh!

copy-pr-bot Bot commented May 8, 2026

Uh oh!

github-actions Bot commented May 8, 2026

Uh oh!

coderabbitai Bot commented May 8, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Walkthrough

Estimated code review effort

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

mchmarny commented May 9, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

mchmarny left a comment

Choose a reason for hiding this comment

Uh oh!

JaydipGabani commented May 13, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

JaydipGabani commented May 13, 2026

Uh oh!

mchmarny left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

JaydipGabani commented May 8, 2026 •

edited

Loading

coderabbitai Bot commented May 8, 2026 •

edited

Loading

mchmarny commented May 9, 2026 •

edited

Loading

JaydipGabani commented May 13, 2026 •

edited

Loading