Skip to content

[DO NOT MERGE] Phase 2.1: rest-only path-trigger test (#1756)#1857

Closed
lachen-nv wants to merge 20 commits into
NVIDIA:mainfrom
lachen-nv:test/phase2-rest-only
Closed

[DO NOT MERGE] Phase 2.1: rest-only path-trigger test (#1756)#1857
lachen-nv wants to merge 20 commits into
NVIDIA:mainfrom
lachen-nv:test/phase2-rest-only

Conversation

@lachen-nv
Copy link
Copy Markdown
Contributor

Phase 2 validation — scenario 2.1: rest-only path trigger

Throwaway PR for #1756 / #1800 Phase 2 validation. Will be closed without merging.

Setup: branch off chore/rest-ci-test (PR #1800 HEAD = 590d19e4) + one commit adding rest-api/PHASE2_TEST.md.

Expected behavior:

  • Core CI (Carbide CI) — changes gate evaluates run_core_ci=falseprepare skips → all downstream core jobs SKIPPED
  • carbide-ci-pass aggregator — runs (if: always()), sees all needs as SKIPPED, reports SUCCESS
  • REST CI (NICo REST CI) — changes gate evaluates run_rest_ci=true → runs end-to-end
  • rest-ci-pass aggregator — reports SUCCESS

Verifies that job-level path gating correctly isolates rest-only changes from core pipeline.

lachen-nv added 20 commits May 20, 2026 11:12
@copy-pr-bot
Copy link
Copy Markdown

copy-pr-bot Bot commented May 21, 2026

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test 853c7775e2d50a3a16f1d2c19b40c91d4f1e22a3

@lachen-nv
Copy link
Copy Markdown
Contributor Author

/ok to test 853c777 (prior comment had a typo'd SHA)

@github-actions
Copy link
Copy Markdown

🔐 TruffleHog Secret Scan

No secrets or credentials found!

Your code has been scanned for 700+ types of secrets and credentials. All clear! 🎉

🔗 View scan details

🕐 Last updated: 2026-05-21 05:39:11 UTC | Commit: 853c777

@github-actions
Copy link
Copy Markdown

🔍 Container Scan Summary

Service Total Critical High Medium Low Other
nico-flow 66 4 34 18 2 8
nico-nsm 82 2 28 43 9 0
nico-psm 67 4 35 18 2 8
nico-rest-api 100 6 53 30 3 8
nico-rest-cert-manager 65 4 34 18 1 8
nico-rest-db 66 4 34 18 2 8
nico-rest-site-agent 65 4 34 18 1 8
nico-rest-site-manager 65 4 34 18 1 8
nico-rest-workflow 67 4 35 18 2 8
TOTAL 643 36 321 199 23 64

Per-CVE detail lives in the per-service grype-* artifacts (JSON + SARIF). Severity counts only — no CVE IDs published here.

@lachen-nv lachen-nv closed this May 25, 2026
lachen-nv added a commit that referenced this pull request May 25, 2026
## Description

Migrates the REST API GitHub Actions from the standalone `nico-rest`
repo into `infra-controller-core`, and refactors the CI infrastructure
so Core CI (`Carbide CI`) and REST CI (`NICo REST CI`) can coexist as
two independent workflows on the same repo with stable required checks.

Issue: #1756.

## What changed

### REST CI workflows copied over (new files)
- `.github/workflows/rest-ci.yml` — top-level REST CI orchestrator
- `.github/workflows/rest-prepare-build-info.yml` — version + build
metadata
- `.github/workflows/rest-lint-and-test.yml` — Go lint + unit tests
- `.github/workflows/rest-build-binaries.yml` — Go cross-compile
(linux-amd64, linux-arm64, darwin-arm64)
- `.github/workflows/rest-build-push-docker.yml` — 9 `nico-*` container
images
- `.github/workflows/rest-build-push-service.yml` — service image
plumbing
- `.github/workflows/rest-helm-workflows.yml` — Helm chart validate +
push

### Dual-pipeline gating
- Added `changes` job to both `ci.yaml` and `rest-ci.yml` using
`dorny/paths-filter@v3` with:
- `base: main` + `fetch-depth: 0` + PR-refs-only `if:` (dorny only runs
on `pull-request/N` mirror refs from copy-pr-bot; main/release/tag
short-circuit to `run_*_ci=true`)
- `predicate-quantifier: every` on the core side with negative filters
to catch any non-rest path
- Gate output gates `prepare`; downstream jobs skip via
`needs.prepare.result == 'success'` cascade
- Commit-message escape hatch `ci-run-complete-pipeline` forces both
pipelines

### Aggregator pattern (stable required check per workflow)
- New `carbide-ci-pass` (in `ci.yaml`) — `needs: [changes, prepare,
build-release-container-x86_64, build-release-container-aarch64,
security-secret-scan, lint-police]`
- New `rest-ci-pass` (in `rest-ci.yml`) — `needs: [changes, prepare,
lint-and-test, build-binaries, security-secret-scan, build-and-push,
helm]`
- Both use `if: always()` + jq logic: `skipped` counts as pass,
`failure`/`cancelled` fail the aggregator
- Designed so the branch protection ruleset can require ONE context per
workflow

### REST helm aligned to core's pattern
- Dropped `detect-changes` and `validate-versions` jobs (no more manual
`Chart.yaml` bump check)
- `validate-charts` now runs unconditionally on every PR
- `push-charts` gate: `!cancelled() && event != schedule && event !=
workflow_dispatch && validate.success && !pull-request`
- Chart version source: `prepare.outputs.helm_version`
(git-describe-derived, SemVer-strict `1.6.0-3.gabc1234`) — replaces
reading `version:` from `Chart.yaml`
- `helm-package-push` action SHA pinned to match core (`7de61972...`)
- Side benefit: fixes pre-existing bug where `helm_version` only read
`nico-rest/Chart.yaml`, ignoring `nico-rest-site-agent`

### Misc
- Renamed REST TruffleHog job to `REST Secret Scan with TruffleHog` to
avoid collision with core's same-named job in the PR checks UI
- Added `name: Detect Carbide CI Gate` to core `changes` job (UI
alignment with REST's `Detect REST CI Gate`)
- Reverted root `VERSION` file; both core and REST now use `git
describe`
- Switched REST test execution to `make rest-api/test-<module>` (Kyle's
Makefile)

## Why this design

GitHub Actions has no cross-workflow `needs:` and **workflow-level path
filters leave the other side's required checks stuck at "Expected —
Waiting for status"** when only one pipeline fires. We considered four
options ([analysis](#analysis-link-or-just-mention)):

1. **Aggregator job per workflow + both required** ← **chosen**: each
workflow always wakes up; job-level gates handle "do nothing"; one
stable required check name per workflow.
2. Reusable + central router workflow — cleanest in theory but `ci.yaml`
is 1467 lines and refactor cost is too high before Computex; also
incompatible with copy-pr-bot's `pull-request/N` push-event model.
3. GitHub merge queue — needs Enterprise + `on: merge_group`.
4. External Status API aggregator — more moving parts than needed.

`base: main` on the dorny filter is critical for copy-pr-bot: each `/ok
to test` force-pushes the PR head to `pull-request/N`; without an
explicit base, `dorny/paths-filter` would diff against the previous
force-push tip (i.e., only the delta between PR head versions), which
would lose earlier changes after a fixup commit.

REST helm aligned to core because the original `detect-changes` + manual
`Chart.yaml` bump pattern would block main/release/tag publishes
whenever someone forgot to bump the chart version. Git-describe-derived
version guarantees uniqueness per commit and unblocks `ngc-duplicate:
fail`.

## Test plan

### Tested 

| Scenario | Result |
|---|---|
| Mixed PR run (this PR's own pipeline) | Both pipelines completed
end-to-end; both aggregators green |
| Aggregator mechanism (`if: always()` + jq) | `skipped` correctly
counts as pass; verified on 4 separate runs |
| `dorny/paths-filter` with `base: main` on copy-pr-bot mirror | Gate
log shows correct file list and `Filter X = true/false` decision |
| REST TruffleHog rename | Both `Secret Scan with TruffleHog` (core) and
`REST Secret Scan with TruffleHog` (rest) appear separately in PR checks
UI; no collision |
| Helm `validate-charts` unconditional | Validates both `nico-rest` and
`nico-rest-site-agent` charts |
| Helm `push-charts` skipped on PR | Confirmed `!contains(github.ref,
'pull-request/')` evaluates false → SKIPPED |
| REST end-to-end on monorepo | All 9 `nico-*` container builds, all
binary cross-compile, all lint/test pass |

### Not yet tested 

| Scenario | Reason | Mitigation |
|---|---|---|
| **True path-filter SKIP behavior on isolated PRs** | Throwaway test
PRs (#1857, #1858, #1859) branched from this PR, so dorny `base: main`
saw the full PR diff (113 files) and both gates evaluated true; no PR
yet has isolated paths against main | Will validate organically after
merge with the next real REST-only / core-only PR; gate logic verified
by code review + dry-run `git diff` against `chore/rest-ci-test` base
showed correct file-list isolation |
| **`release/v0.X.0` push** | Not exercised due to publish side-effects
| Workflow trigger pattern matches existing
`release/v0.1.0`–`release/v0.9.0` branches; gate logic forces
`run_*_ci=true` for non-PR refs |
| **Tag push** | Not exercised (would trigger real NGC publish) | Same
as above — gate forces both pipelines true; publish gating already
validated on PR side as inverted condition |

### Open gaps (from #1756 spec, not blocking this PR)
- **3 REST modules untested in CI**: `flow`, `powershelf-manager`,
`nvswitch-manager` have no `test-X` target in `rest-api/Makefile`;
matrix has a TODO comment. Blocked on Makefile updates.

## Next steps 

###  required-checks ruleset 
Update ruleset `10088763` (`main` branch) `required_status_checks` from:
- `build-release-container-x86_64 / build`
- `build-release-container-aarch64 / build`
- `Secret Scan with TruffleHog`
- `lint-police`

To:
- `carbide-ci-pass`
- `rest-ci-pass`


### Versioning 
- Manually tag `v1.6.0` post-merge so `git describe` produces clean
output (avoids REST going backward from 1.x to 0.x).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant