[codex] Add deploy fallback runbook preflight #667

Merged: jaeyunha merged 13 commits into staging from issue-645-deploy-fallback-runbook on Jun 19, 2026

Conversation

@jaeyunha

Member

Summary

Refs #645.

Adds the low-risk deploy fallback slice for the Mac mini runner single point of failure (SPOF):

  • Documents the break-glass manual deploy path in `agent_docs/runbooks/deploy-fallback.md`.
  • Adds `bun run deploy:fallback:preflight`, a non-mutating preflight that checks Docker/buildx, AWS identity, ECR/ECS reachability, and the required Secrets Manager secret name without reading secret values or changing production.
  • Extends `tests/deploy.test.ts` with static checks proving the runbook/preflight wiring and the remaining closure gate.
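
As a rough illustration of what a non-mutating preflight can look like, the sketch below runs a list of read-only checks and reports an exit code. The `Check` and `CheckResult` shapes and the `runChecks` name are hypothetical; the real `tools/deploy-fallback-preflight.ts` is not shown in this thread and may be structured differently.

```typescript
// Hypothetical shapes for illustration only.
type Check = { name: string; run: () => string };
type CheckResult = { name: string; ok: boolean; detail: string };

// Run every check and turn thrown errors into failed results, so one
// failure does not hide the checks that follow it.
function runChecks(checks: Check[]): { results: CheckResult[]; exitCode: number } {
  const results = checks.map((check): CheckResult => {
    try {
      return { name: check.name, ok: true, detail: check.run() };
    } catch (error) {
      const detail = error instanceof Error ? error.message : String(error);
      return { name: check.name, ok: false, detail };
    }
  });
  // Exit code 0 only when every check passed.
  return { results, exitCode: results.every((r) => r.ok) ? 0 : 1 };
}
```

Collecting all results before deciding the exit code lets an operator see every missing prerequisite in a single run.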

Important gate

Final issue closure remains gated on one real no-op fallback deploy from a non-Mac-mini machine. This PR deliberately does not run or claim that production exercise; it only provides the documented fallback path and preflight.

Validation

  • `bun test tests/deploy.test.ts`
  • `bun run deploy:fallback:preflight`
  • `make check`
  • `git diff --check`

@jaeyunha jaeyunha force-pushed the issue-645-deploy-fallback-runbook branch from 17c2023 to c7fa7ea on June 18, 2026 19:58
@jaeyunha jaeyunha marked this pull request as ready for review June 18, 2026 19:58

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c7fa7eac15

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread tools/deploy-fallback-preflight.ts Outdated
Comment on lines +186 to +192
```typescript
awsCheck(
  "Secrets Manager secret name",
  [
    "secretsmanager",
    "describe-secret",
    "--secret-id",
    webhookSecretId,
```

P2: Check all required deploy secrets in preflight

In the fallback `all` path, `scripts/deploy.sh` also resolves `TRACKING_SECRET_SECRET_ID` and `INGESTER_JOB_TOKEN_SECRET_ID` before registering the app/ingester/scheduler task definitions, but this preflight only describes `webhookSecretId`. A machine with the other required secret names missing or misconfigured will still print “Preflight passed” and then fail after `deploy_app` has already built and pushed images. Include all deploy-required secret IDs in the read-only checks.
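
A minimal sketch of the suggested fix, assuming each secret ID arrives through an environment variable. The variable names come from this thread; `planSecretChecks` is a hypothetical helper, and treating an unset variable as a failure is an assumption, since `scripts/deploy.sh` may derive defaults instead.

```typescript
type Env = Record<string, string | undefined>;

// Variable names as quoted in this review thread.
const REQUIRED_SECRET_ID_VARS = [
  "WEBHOOK_SECRET_ENCRYPTION_KEY_SECRET_ID",
  "TRACKING_SECRET_SECRET_ID",
  "INGESTER_JOB_TOKEN_SECRET_ID",
];

// Builds one read-only `describe-secret` command per configured secret ID
// and lists the variables that are unset (assumed here to be a failure).
function planSecretChecks(env: Env, region: string): { missing: string[]; commands: string[][] } {
  const missing: string[] = [];
  const commands: string[][] = [];
  for (const name of REQUIRED_SECRET_ID_VARS) {
    const secretId = env[name];
    if (!secretId) {
      missing.push(name);
      continue;
    }
    // describe-secret returns metadata only, never the secret value.
    commands.push(["aws", "secretsmanager", "describe-secret", "--secret-id", secretId, "--region", region]);
  }
  return { missing, commands };
}
```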

Comment thread tools/deploy-fallback-preflight.ts Outdated
Comment on lines +23 to +26
```typescript
const appRepo = env.APP_REPO || `${product}-app`;
const ingesterRepo = env.ING_REPO || `${product}-ingester`;
const appService = env.APP_SERVICE || `${product}-app`;
const ingesterService = env.ING_SERVICE || `${product}-ingester`;
```

P2: Avoid validating overrides deploy.sh ignores

These `APP_REPO`/`ING_REPO`/`APP_SERVICE`/`ING_SERVICE` overrides are only consumed by the new preflight/runbook; `scripts/deploy.sh` still assigns those names from `PRODUCT` unconditionally. If an operator follows the fallback contract and sets one of these for a nonstandard production target, the preflight validates the override but `bash scripts/deploy.sh all` deploys the default `${PRODUCT}-*` resources instead. Either remove these overrides here and in the docs, or make the deploy script honor them.
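
One way to keep the preflight aligned with the deploy script is to derive every target from `PRODUCT` alone and flag overrides that would be silently ignored. The sketch below assumes the `${PRODUCT}-app` / `${PRODUCT}-ingester` naming described in this thread; `deriveTargets` and `findIgnoredOverrides` are hypothetical names.

```typescript
type Env = Record<string, string | undefined>;

// Targets follow the `${PRODUCT}-*` pattern described in the review.
function deriveTargets(product: string) {
  return {
    appRepo: `${product}-app`,
    ingesterRepo: `${product}-ingester`,
    appService: `${product}-app`,
    ingesterService: `${product}-ingester`,
  };
}

// Overrides that, per the review, the deploy script does not read.
const IGNORED_OVERRIDES = ["APP_REPO", "ING_REPO", "APP_SERVICE", "ING_SERVICE"];

// Returns the ignored override names that are set in the environment.
function findIgnoredOverrides(env: Env): string[] {
  return IGNORED_OVERRIDES.filter((name) => Boolean(env[name]));
}
```

A preflight could fail, or at least warn, whenever `findIgnoredOverrides` returns a non-empty list.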

@jaeyunha jaeyunha left a comment

Member Author

Fixed both Codex P2 review comments in 6c745b8.

What changed:

  • Preflight now validates all deploy-required Secrets Manager metadata resolved by scripts/deploy.sh all: WEBHOOK_SECRET_ENCRYPTION_KEY_SECRET_ID/ARN, TRACKING_SECRET_SECRET_ID/ARN, and INGESTER_JOB_TOKEN_SECRET_ID/ARN.
  • Optional inbound token metadata is checked only when explicitly configured, matching deploy.sh's non-fatal inbound-token behavior.
  • Removed the unsafe APP_REPO/ING_REPO/APP_SERVICE/ING_SERVICE fallback contract from the preflight/runbook; repo/service targets now derive from PRODUCT, matching deploy.sh.

Validation:

  • bun test tests/deploy.test.ts
  • bun run deploy:fallback:preflight ✅ (non-mutating metadata checks only; no secret values fetched/printed)
  • make check
  • git diff --check
  • Push guardrails during git push

@github-actions github-actions Bot added size/XL and removed size/L labels Jun 18, 2026

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 6c745b813e

Comment thread tools/deploy-fallback-preflight.ts Outdated
```typescript
}

const env = process.env;
const region = env.AWS_REGION || env.AWS_DEFAULT_REGION || "us-east-1";
```

P2: Use only AWS_REGION for the deploy region

When a workstation has `AWS_DEFAULT_REGION` set but not `AWS_REGION`, this preflight validates all AWS resources in `AWS_DEFAULT_REGION` and passes only that region to its child `aws` commands. The documented follow-up `bash scripts/deploy.sh all` does not read `AWS_DEFAULT_REGION` and defaults `AWS_REGION` to `us-east-1`, so a passing preflight can be followed by a deploy against a different region. Match `scripts/deploy.sh` here, or fail until `AWS_REGION` is explicitly exported.
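
A small sketch of the first option (matching the deploy script), assuming the `us-east-1` default stated in the comment above. `resolveDeployRegion` is a hypothetical helper; it also surfaces a warning when `AWS_DEFAULT_REGION` points somewhere the deploy would not use.

```typescript
type Env = Record<string, string | undefined>;
type RegionResolution = { region: string; warning: string | null };

// Reads only AWS_REGION, mirroring the behavior the review attributes to
// scripts/deploy.sh, and warns when AWS_DEFAULT_REGION would be ignored.
function resolveDeployRegion(env: Env): RegionResolution {
  const region = env.AWS_REGION || "us-east-1";
  const other = env.AWS_DEFAULT_REGION;
  const warning =
    !env.AWS_REGION && other && other !== region
      ? `AWS_DEFAULT_REGION=${other} is ignored; the deploy would use ${region}`
      : null;
  return { region, warning };
}
```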

Comment on lines +21 to +24
```bash
docker --version
docker buildx version
aws --version
```

P2: Require python3 before fallback deploy

On a fallback workstation that has Docker/AWS/Bun but no `python3`, this checklist and the new preflight can pass, but `scripts/deploy.sh all` calls `python3` while generating ECS task definition JSON after `deploy_app` has already pushed the image, so the break-glass deploy fails mid-flight. Add `python3 --version` to the required tools/preflight before directing operators to run the deploy.

Member Author

Addressed the P2 deploy fallback review comments in commit 0cd9afe.

Fixes:

  • tools/deploy-fallback-preflight.ts now preflights all deploy-required Secrets Manager metadata used before safe task registration/deploy: webhook secret encryption key, tracking secret, ingester job token, and ingester inbound token. It still only calls describe-secret; no secret values are fetched or printed.
  • Removed the optional inbound-token preflight path from the runbook/preflight contract so the fallback check fails before deploy if that production metadata is missing.
  • Kept the override-drift correction: runbook/preflight validate the same ${PRODUCT}-app / ${PRODUCT}-ingester ECR repo and ECS service names that scripts/deploy.sh derives, and tests guard against APP_REPO / ING_REPO / APP_SERVICE / ING_SERVICE override claims.

Validation:

  • bun test tests/deploy.test.ts
  • make check
  • git diff --check
  • bun run deploy:fallback:preflight failed safely/non-mutating because aws secretsmanager describe-secret --secret-id opensend/ingester-inbound-token returned ResourceNotFoundException. Earlier metadata checks passed; no secret values were requested or printed.

Leaving the PR open for controller review/merge.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 0cd9afe7c8

```typescript
awsGlobalCheck("AWS CLI", ["--version"], (stdout) => stdout),
identityCheck(),
awsNameListCheck(
  "ECR repositories",
```

P2: Validate Docker can push to ECR

On a fresh fallback workstation that has AWS CLI permissions but no Docker ECR credential helper/login, this ECR check still passes because it only calls `aws ecr describe-repositories`; the subsequent `scripts/deploy.sh` `build_and_push` path runs `docker buildx build ... --push`, so the failure occurs after the image build starts instead of during preflight. Add a non-printing ECR Docker login/credential check, or document that login as a required preflight step, before reporting success.

@jaeyunha

Member Author

Correction to my previous comment: the branch is pushed and the intended update is below (posting with shell-safe quoting so inline code is preserved).

Addressed the Codex P2 Docker-to-ECR readiness comment in commit a1d52c0.

What changed:

  • tools/deploy-fallback-preflight.ts now resolves the AWS account, then verifies Docker can authenticate to the matching ECR registry via aws ecr get-login-password piped into docker login --password-stdin. The password is only passed via stdin and is not printed.
  • The check is skipped if caller identity/account guard fails, so it does not refresh Docker credentials for the wrong account.
  • agent_docs/runbooks/deploy-fallback.md now documents that preflight refreshes local Docker ECR login, but still does not push images, update ECS, run tasks, register task definitions, or fetch secret values.
  • tests/deploy.test.ts covers the new Docker ECR auth contract and no-push/no-secret expectations.
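
The login flow described above can be sketched as two argument lists, one producing the password on stdout and one consuming it on stdin. This assumes the standard `aws` partition registry host format; `ecrRegistryHost` and `ecrLoginPlan` are hypothetical names, and the actual process wiring in the preflight is not shown in this thread.

```typescript
// Builds the private ECR registry host for an account and region
// (standard partition only; other partitions use a different suffix).
function ecrRegistryHost(accountId: string, region: string): string {
  if (!/^\d{12}$/.test(accountId)) {
    throw new Error("AWS account IDs are 12 digits");
  }
  return `${accountId}.dkr.ecr.${region}.amazonaws.com`;
}

// Returns the two commands to pipe together. The password travels only
// through the pipe and is never placed in an argument list.
function ecrLoginPlan(accountId: string, region: string) {
  return {
    passwordCommand: ["aws", "ecr", "get-login-password", "--region", region],
    loginCommand: [
      "docker", "login", "--username", "AWS", "--password-stdin",
      ecrRegistryHost(accountId, region),
    ],
  };
}
```

Keeping the account ID as an explicit input fits the guard described above: the plan is only built after the caller identity check has returned the expected account.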

Validation:

  • bun test tests/deploy.test.ts — 22 pass
  • git diff --check — pass
  • make check — typecheck + Biome pass

No images were pushed and no production deploy/mutation commands were run.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a1d52c0265

Comment thread tools/deploy-fallback-preflight.ts Outdated
Comment on lines +300 to +301
```typescript
"ecs",
"describe-services",
```

P2: Validate task definitions before the first image push

In a fallback account/role that can `ecs:DescribeServices` but not `ecs:DescribeTaskDefinition` (or when the current service points at a task definition missing the expected container), this preflight still passes here because it only checks service names. The checked `scripts/deploy.sh` path then runs `deploy_app` first and pushes the app image; only afterward do `register_app_task_definition`/`write_app_task_definition` call `aws ecs describe-task-definition`, so the failure happens mid-deploy after ECR has already been mutated. Add read-only task-definition lookups for the app and ingester base task definitions before reporting success.

@jaeyunha

Member Author

Addressed the latest Codex P2 task-definition preflight comment.

Changed:

  • Added read-only ECS task-definition metadata checks to tools/deploy-fallback-preflight.ts for the app and ingester services before preflight success.
  • Each check now reads the service current task definition, calls aws ecs describe-task-definition, validates taskDefinition.family, and verifies the expected app/ingester container name is present.
  • Updated the deploy fallback runbook and targeted static deploy test coverage for the new task-definition checks.
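
The family and container checks listed above might look like the following when applied to parsed `aws ecs describe-task-definition` JSON. Only the fields used here are typed; `taskDefinitionProblems` is a hypothetical name, and what the real preflight compares `family` against is not stated in this thread, so the expected value is passed in as a parameter.

```typescript
// Minimal subset of the describe-task-definition response.
type ContainerDefinition = { name: string };
type DescribeTaskDefinitionOutput = {
  taskDefinition: { family: string; containerDefinitions: ContainerDefinition[] };
};

// Returns a list of human-readable problems; an empty list means the
// task definition has the expected family and container.
function taskDefinitionProblems(
  output: DescribeTaskDefinitionOutput,
  expectedFamily: string,
  expectedContainer: string,
): string[] {
  const problems: string[] = [];
  const { family, containerDefinitions } = output.taskDefinition;
  if (family !== expectedFamily) {
    problems.push(`family is ${family}, expected ${expectedFamily}`);
  }
  if (!containerDefinitions.some((c) => c.name === expectedContainer)) {
    problems.push(`container ${expectedContainer} not found`);
  }
  return problems;
}
```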

Validation:

  • bun test tests/deploy.test.ts — 22 pass
  • git diff --check — pass
  • make check — typecheck + Biome pass
  • Push guardrails during git push — typecheck + changed-file Biome pass

No live deploy was run; no ECR/ECS mutations were performed.

Member Author

Fixed the remaining deploy-fallback review blockers.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • tests/deploy.test.ts
  • agent_docs/runbooks/deploy-fallback.md

Validation:

  • bun run test tests/deploy.test.ts ✅ (22/22)
  • git diff --check
  • make check
  • Push guardrails also passed typecheck + changed-file Biome ✅

Head SHA: 057d85f575a0321d2ee9bdf3cba39a5e82fb3ae1

No live deploy was run; no ECR/ECS mutations were performed; no secret values were fetched or printed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 057d85f575

Comment thread tools/deploy-fallback-preflight.ts Outdated
```typescript
  [appService, ingesterService],
),
ecsTaskDefinitionMetadataCheck(appService, appContainerName),
ecsTaskDefinitionMetadataCheck(ingesterService, ingesterContainerName),
```

P2: Validate scheduler base-task secrets before passing preflight

In the `all` fallback path I checked `scripts/deploy.sh`: after this preflight allows image pushes, `deploy_scheduler` calls `write_scheduler_task_definition`, whose Python exits unless the current ingester task definition contains `DATABASE_URL` and `BETTER_AUTH_SECRET` secrets. This ingester task-definition check only verifies the container name, so a base task missing either secret still reports “Preflight passed” and then fails mid-deploy after ECR has been mutated. Include these scheduler base-secret checks in the read-only task-definition metadata validation.

Member Author

Addressed the latest Codex P2 scheduler base-secret preflight blocker.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • tests/deploy.test.ts
  • agent_docs/runbooks/deploy-fallback.md

Validation:

  • bun run test tests/deploy.test.ts ✅ (22/22)
  • git diff --check
  • make check
  • Push guardrails passed typecheck + changed-file Biome ✅

Head SHA: 5e80c15038913c9141cb9021fe8ae35e2e1befdb

No live deploy was run; no ECR/ECS mutations were performed; no secret values were fetched or printed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 5e80c15038

Comment thread tools/deploy-fallback-preflight.ts Outdated
```typescript
  ],
  [appService, ingesterService],
),
ecsTaskDefinitionMetadataCheck(appService, appContainerName),
```

P2: Require DATABASE_URL on the app base task

In a fallback account where the current app task definition is missing or has renamed `DATABASE_URL`, this app metadata check still passes because it only verifies the container name. `bash scripts/deploy.sh all` then pushes/registers images before `run_migrations` clones the app task into the migrator, and `src/lib/db/migrate.ts` exits immediately when `DATABASE_URL` is absent, so the preflight can report success and still fail mid-deploy after ECR/ECS have been mutated. Check that the app container exposes `DATABASE_URL` (secret or environment name) before passing preflight.
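
A name-only check of this kind can be sketched as below, assuming the container definition comes from parsed `describe-task-definition` output. It reads the `name` field of `secrets` and `environment` entries and never touches values; `missingMetadataNames` is a hypothetical helper.

```typescript
// Only the `name` field of each entry is typed, so values are never read.
type NamedEntry = { name: string };
type ContainerDefinition = {
  name: string;
  secrets?: NamedEntry[];
  environment?: NamedEntry[];
};

// Returns the required names that appear in neither `secrets` nor
// `environment` on the given container definition.
function missingMetadataNames(container: ContainerDefinition, required: string[]): string[] {
  const present = new Set<string>();
  for (const entry of [...(container.secrets ?? []), ...(container.environment ?? [])]) {
    present.add(entry.name);
  }
  return required.filter((name) => !present.has(name));
}
```

The same helper could take a longer `required` list when more startup names need to be present.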

Member Author

Fixed the latest Codex P2 app DATABASE_URL base-task metadata blocker.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • tests/deploy.test.ts
  • agent_docs/runbooks/deploy-fallback.md

Validation:

  • bun run test tests/deploy.test.ts ✅ (22/22)
  • git diff --check
  • make check
  • Push guardrails passed typecheck + changed-file Biome ✅

Head SHA: ede1bdc4355505c0f408ede22b54a44136c7fcf9

No live deploy was run; no ECR/ECS mutations were performed; no secret values were fetched or printed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ede1bdc435

Comment on lines +23 to +27
```bash
docker --version
docker buildx version
python3 --version
aws --version
```

P2: Include Bun in fallback prerequisites

When this runbook is followed on a fresh authorized workstation that has every tool listed here but not Bun, step 4 fails immediately because the documented preflight command is `bun run deploy:fallback:preflight`; `scripts/deploy.sh` itself does not install Bun or provide a non-Bun entrypoint. Add `bun --version`/installation guidance to the required tools list or document an invocation that works without Bun; otherwise the break-glass path can block during the runner outage it is meant to handle.
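
For reference, the full prerequisite list raised across this thread can be expressed as a table of probe commands. In this sketch `missingTools` takes the probing function as a parameter so the list can be exercised without spawning processes; both names are hypothetical. Note that a TypeScript preflight cannot itself report a missing Bun, so that entry mainly documents what the runbook should list.

```typescript
type ToolProbe = { label: string; argv: string[] };

// Version commands quoted in the runbook excerpts and review comments.
const REQUIRED_TOOLS: ToolProbe[] = [
  { label: "docker", argv: ["docker", "--version"] },
  { label: "docker buildx", argv: ["docker", "buildx", "version"] },
  { label: "python3", argv: ["python3", "--version"] },
  { label: "aws", argv: ["aws", "--version"] },
  { label: "bun", argv: ["bun", "--version"] },
];

// `succeeds` is supplied by the caller, for example a wrapper that runs
// the command and returns true on exit code 0.
function missingTools(succeeds: (argv: string[]) => boolean): string[] {
  return REQUIRED_TOOLS.filter((tool) => !succeeds(tool.argv)).map((tool) => tool.label);
}
```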

@jaeyunha

Member Author

Fixed the Bun prerequisite blocker for the fallback deploy runbook/preflight.

Files changed:

  • agent_docs/runbooks/deploy-fallback.md
  • tools/deploy-fallback-preflight.ts
  • tests/deploy.test.ts

Validation:

  • bun run test tests/deploy.test.ts
  • git diff --check
  • make check
  • push guardrails: changed-file check, full-project typecheck, changed-file Biome lint ✅

Head SHA: 7997f2244ebe548aad52e7723107a728a6d71ecf

Safety confirmation: no live deploy run; no ECR/ECS mutation; no Secrets Manager value fetch or secret values printed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 7997f2244e

Comment thread tools/deploy-fallback-preflight.ts Outdated
Comment on lines +481 to +484
```typescript
ecsTaskDefinitionMetadataCheck(ingesterService, ingesterContainerName, [
  "DATABASE_URL",
  "BETTER_AUTH_SECRET",
]),
```

P2: Check ingester boot secrets before passing preflight

In fallback accounts where the current ingester task definition is missing production-required boot secrets such as `WEBHOOK_SECRET_ENCRYPTION_KEY` or `UNSUBSCRIBE_SECRET`, this still passes because the ingester metadata check only requires `DATABASE_URL`/`BETTER_AUTH_SECRET`. `write_ingester_task_definition` does not add those names, while `packages/ingester/src/startup-checks.ts` validates the production ingester environment and refuses to boot without them, so `bash scripts/deploy.sh all` can push/register images and only fail when the ingester rolls. Include the ingester startup-required metadata here before reporting success.

@jaeyunha

Member Author

Fixed the latest Codex P2 ingester startup metadata preflight blocker.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • agent_docs/runbooks/deploy-fallback.md
  • tests/deploy.test.ts

Validation:

  • bun run test tests/deploy.test.ts ✅ (22/22)
  • git diff --check
  • make check
  • push guardrails passed typecheck + changed-file Biome ✅

Head SHA: 42f99d72772383a18c23e5a67b487bafc5cb74f5

Safety confirmation: no live deploy run; no ECR/ECS mutation; no Secrets Manager value fetch or secret values printed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 42f99d7277

Comment thread tools/deploy-fallback-preflight.ts Outdated
```typescript
appService,
appContainerName,
[],
["DATABASE_URL"],
```

P2: Check app startup env metadata before passing preflight

In a fallback account where the current app task is missing production app settings such as `BETTER_AUTH_SECRET`, `BETTER_AUTH_URL`, `NEXT_PUBLIC_APP_URL`, `BETTER_AUTH_TRUSTED_ORIGINS`, `UNSUBSCRIBE_SECRET`, or `DKIM_ENCRYPTION_KEY`, this app-task check still passes because it only requires `DATABASE_URL`. Fresh evidence is `src/lib/startup-checks.ts` invoking `validateOpenSendEnv(..., { service: "app" })`, while `scripts/deploy.sh` clones the app task and only injects webhook/tracking secrets, so the deploy can push/register images and then roll an app task that refuses to boot. Include the production app metadata names here, just like the ingester startup metadata check.

@jaeyunha

Member Author

Fixed the latest Codex P2 blocker: the fallback preflight now checks the current app base task-definition metadata for all production app startup-required env/secret names before success.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • agent_docs/runbooks/deploy-fallback.md
  • tests/deploy.test.ts

Validation:

  • bun run test tests/deploy.test.ts
  • git diff --check
  • make check
  • push guardrails typecheck + changed-file Biome ✅

Head SHA: bac8aa3633b41ce40335cfbd5d53f67659643f3f

Confirmed: no live deploy run, no ECR/ECS/Secrets Manager mutation, and no secret values fetched or printed; this was metadata-name validation only.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: bac8aa3633

Comment thread tools/deploy-fallback-preflight.ts Outdated
```typescript
appService,
ingesterService,
"--query",
"services[].serviceName",
```

P2: Require ECS services to be active before passing preflight

For a fallback account where `${PRODUCT}-app` or `${PRODUCT}-ingester` was deleted recently, this query still passes because it only checks `serviceName`; AWS docs note that `DRAINING`/`INACTIVE` services remain visible to `DescribeServices`, while `UpdateService` rejects inactive services. `bash scripts/deploy.sh all` then pushes/registers images before the later redeploy call fails on that inactive service. Include `status` in this read-only check and fail unless each service is `ACTIVE`.
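
A sketch of the status check, assuming the input is parsed `aws ecs describe-services` JSON reduced to the two fields used here. `inactiveServices` is a hypothetical name; services that are not returned at all are reported as `MISSING`.

```typescript
// Minimal subset of the describe-services response.
type EcsService = { serviceName: string; status: string };
type DescribeServicesOutput = { services: EcsService[] };

// Returns one "name: status" entry for every expected service that is
// not ACTIVE. An empty list means all expected services are ACTIVE.
function inactiveServices(output: DescribeServicesOutput, expected: string[]): string[] {
  const statusByName = new Map<string, string>();
  for (const service of output.services) {
    statusByName.set(service.serviceName, service.status);
  }
  return expected
    .filter((name) => statusByName.get(name) !== "ACTIVE")
    .map((name) => `${name}: ${statusByName.get(name) ?? "MISSING"}`);
}
```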

Member Author

Fixed the latest P2 ECS service-status blocker.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • tests/deploy.test.ts
  • agent_docs/runbooks/deploy-fallback.md

Validation:

  • bun run test tests/deploy.test.ts
  • git diff --check
  • make check
  • Push guardrails during git push also passed typecheck + changed-file Biome ✅

Head SHA: 2168545a40b51d9a8ec2eed1b6e0eeab6ca43fb5

Confirmed: no live deploy run, no ECR/ECS/Secrets Manager mutation, and no secret values fetched or printed.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 2168545a40

Comment thread agent_docs/runbooks/deploy-fallback.md Outdated
| `PRODUCT` | `opensend` | Prefix for repositories and services. |
| `ECS_CLUSTER` | `namuh` | ECS cluster name. |
| `IMAGE_TAG` | `latest` | Image tag pushed to ECR. |
| `PLATFORM` | `linux/amd64` | Docker build target platform. |

P2: Validate PLATFORM before approving fallback deploy

When a fallback workstation has `PLATFORM` inherited as `linux/arm64` (common on Apple Silicon shells), this runbook allows that override and the preflight never validates or even prints it, but `scripts/deploy.sh` passes `${PLATFORM}` directly to `docker buildx build --platform`. That can produce and push images incompatible with the production `linux/amd64` Fargate task definitions after preflight has reported success. Require `linux/amd64` in the preflight or remove this override from the fallback contract.
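
The first option could be as small as the following sketch, which assumes the `linux/amd64` default shown in the runbook table above. `checkPlatform` is a hypothetical helper; it returns the resolved value so the preflight can print it alongside the verdict.

```typescript
type Env = Record<string, string | undefined>;

const REQUIRED_PLATFORM = "linux/amd64";

// Resolves PLATFORM with the runbook's default and reports whether it
// matches the platform the production task definitions expect.
function checkPlatform(env: Env): { ok: boolean; platform: string } {
  const platform = env.PLATFORM || REQUIRED_PLATFORM;
  return { ok: platform === REQUIRED_PLATFORM, platform };
}
```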

```typescript
  ],
  [appRepo, ingesterRepo],
),
ecsServicesActiveCheck([appService, ingesterService]),
```

P2: Check scheduler service state before preflight passes

Checked the `scripts/deploy.sh all` path: after image pushes and task registration, it calls `create_or_redeploy_scheduler_service`, and `scheduler_service_exists` treats a `DRAINING` scheduler service as present before calling redeploy. This preflight only checks `${PRODUCT}-app` and `${PRODUCT}-ingester`, so a recently deleted or draining `${PRODUCT}-scheduler` can still pass preflight and then fail late during the scheduler update/create step. Include the scheduler service as active-or-absent metadata before reporting success.
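
An active-or-absent rule differs from the app/ingester rule in that a missing service is acceptable. The sketch below assumes the input is the `services` array from parsed `describe-services` output and that only services absent from that array count as absent; treating any returned non-`ACTIVE` status (including `INACTIVE`) as a failure is an assumption, not something this thread confirms. `schedulerState` is a hypothetical name.

```typescript
type EcsService = { serviceName: string; status: string };

// Passes when the scheduler service is ACTIVE or not returned at all;
// any other returned status (for example DRAINING) fails the check.
function schedulerState(
  services: EcsService[],
  schedulerName: string,
): { ok: boolean; state: string } {
  const found = services.find((service) => service.serviceName === schedulerName);
  if (!found) {
    return { ok: true, state: "ABSENT" };
  }
  return { ok: found.status === "ACTIVE", state: found.status };
}
```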

@jaeyunha

Member Author

Fixed the latest current-head P2 blockers.

Files changed:

  • tools/deploy-fallback-preflight.ts
  • tests/deploy.test.ts
  • agent_docs/runbooks/deploy-fallback.md

Validation:

  • bun run test tests/deploy.test.ts
  • git diff --check
  • make check

Head SHA: af2eaad2980feb4f28bd44b37a954114538f7217

Confirmation: no live deploy run; no ECR/ECS/Secrets Manager mutation commands run; no secret values fetched or printed.

@jaeyunha jaeyunha merged commit 6cfbb81 into staging on Jun 19, 2026
10 checks passed