chore: simplify AMI deployment and fix overwriting config #1173
Walkthrough
Removes the DB_OWNER line from the AMI
Changes
Sequence Diagram(s)
sequenceDiagram
autonumber
actor User
participant Compose as docker-compose
participant PG as tn-postgres
participant Node as tn-node
participant MCP as tn-mcp
User->>Compose: docker-compose up
Compose->>PG: start tn-postgres
PG-->>Compose: healthy
Compose->>Node: start tn-node (depends_on: tn-postgres healthy)
Node->>Node: check /root/.kwild/config.toml
alt config exists
Node->>Node: load existing identity/config
else no config
Node->>Node: ensure curl/wget, fetch public IP, generate identity
Node->>Node: copy genesis/configs, convert private key to nodekey.json
Node->>PG: connect/init using host=tn-postgres
end
Node->>Node: start TRUF.NETWORK node
Compose->>MCP: start tn-mcp (depends_on: tn-postgres healthy)
MCP->>PG: connect via DATABASE_URI (host=tn-postgres)
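The `alt config exists` branch in the diagram is what keeps an existing config from being overwritten on restart. A minimal sketch of that guard, assuming a hypothetical `init_node` helper and a temporary `KWILD_ROOT` standing in for `/root/.kwild` (neither name comes from the template):

```shell
#!/bin/sh
# Sketch of the "initialize only once" guard from the sequence diagram.
# KWILD_ROOT and init_node are hypothetical; the real template uses /root/.kwild.
set -eu

KWILD_ROOT="${KWILD_ROOT:-$(mktemp -d)}"
CONFIG="$KWILD_ROOT/config.toml"

init_node() {
    # Placeholder for identity generation, genesis copy, and nodekey conversion.
    mkdir -p "$KWILD_ROOT"
    echo "# generated on first boot" > "$CONFIG"
    echo "initialized"
}

ensure_config() {
    if [ -f "$CONFIG" ]; then
        # Config already present: leave identity and settings untouched.
        echo "existing"
    else
        init_node
    fi
}

first_run="$(ensure_config)"
second_run="$(ensure_config)"
echo "first run: $first_run"
echo "second run: $second_run"
```

The second call takes the `existing` branch, so a container restart would reuse the identity created on first boot rather than regenerate it.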
Estimated code review effort: 🎯 4 (Complex) | ⏱️ ~45 minutes
Pre-merge checks and finishing touches
❌ Failed checks (2 warnings)
✅ Passed checks (3 passed)
📜 Recent review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
After the snapshot download, the block height got stuck. I am debugging this.
Actionable comments posted: 2
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (1)
deployments/infra/stacks/docker-compose.template.yml (1)
64-74: Harden nodekey.json permissions.
Write the key with restricted permissions to avoid world- or group-readable secrets.
Apply this diff:
- echo '{\"key\":\"'$$CLEAN_KEY'\",\"type\":\"secp256k1\"}' > /root/.kwild/nodekey.json
- echo 'Nodekey created successfully'
+ echo '{\"key\":\"'$$CLEAN_KEY'\",\"type\":\"secp256k1\"}' > /root/.kwild/nodekey.json
+ chmod 600 /root/.kwild/nodekey.json || true
+ echo 'Nodekey created successfully'
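A `chmod` after the write leaves a short window in which the file exists with default permissions. One way to close that window, sketched here as an alternative rather than as what the template does, is to set a restrictive `umask` around the write. `KEY_DIR` and the dummy `CLEAN_KEY` value are placeholders for illustration:

```shell
#!/bin/sh
# Sketch: create the key file with owner-only permissions from the start.
# KEY_DIR and the CLEAN_KEY value are placeholders; the template writes to /root/.kwild.
set -eu

KEY_DIR="$(mktemp -d)"
KEY_FILE="$KEY_DIR/nodekey.json"
CLEAN_KEY="0123abcd"   # dummy value, not a real key

# The subshell confines the umask change to this one write.
(
    umask 077
    printf '{"key":"%s","type":"secp256k1"}\n' "$CLEAN_KEY" > "$KEY_FILE"
)

# chmod still matters if the file already existed with looser permissions.
chmod 600 "$KEY_FILE"

# Read back the permission string in a portable way.
perms="$(ls -l "$KEY_FILE" | cut -c1-10)"
echo "$perms"
```

Keeping the `chmod 600` as well covers the case where an older, more permissive file is being overwritten, since redirection does not change the mode of an existing file.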
🧹 Nitpick comments (3)
deployments/infra/stacks/docker-compose.template.yml (3)
76-78: Use exec for proper signal handling and PID 1 reaping.
Replace the shell with the process so Compose stops cleanly.
Apply this diff:
- /app/kwild start
+ exec /app/kwild start
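The effect of `exec` can be observed outside a container by comparing process IDs. The sketch below uses three throwaway scripts (hypothetical names, created in a temp directory): without `exec` the wrapper shell stays alive as the parent of the child, while with `exec` the child takes over the wrapper's PID, which is why signals sent to that PID reach the final process directly.

```shell
#!/bin/sh
# Sketch: show that exec replaces the wrapper shell instead of forking a child.
set -eu

WORK="$(mktemp -d)"

# Child: prints its own PID.
cat > "$WORK/child.sh" <<'EOF'
#!/bin/sh
echo "$$"
EOF

# Wrapper without exec: prints its PID, then runs the child as a subprocess.
cat > "$WORK/plain.sh" <<'EOF'
#!/bin/sh
echo "$$"
sh "$1"
: # trailing command so the shell cannot silently exec the child itself
EOF

# Wrapper with exec: prints its PID, then is replaced by the child.
cat > "$WORK/with_exec.sh" <<'EOF'
#!/bin/sh
echo "$$"
exec sh "$1"
EOF

plain="$(sh "$WORK/plain.sh" "$WORK/child.sh")"
replaced="$(sh "$WORK/with_exec.sh" "$WORK/child.sh")"

plain_wrapper="$(printf '%s\n' "$plain" | sed -n 1p)"
plain_child="$(printf '%s\n' "$plain" | sed -n 2p)"
exec_wrapper="$(printf '%s\n' "$replaced" | sed -n 1p)"
exec_child="$(printf '%s\n' "$replaced" | sed -n 2p)"

echo "without exec: wrapper=$plain_wrapper child=$plain_child"
echo "with exec:    wrapper=$exec_wrapper child=$exec_child"
```

In the first line of output the two PIDs differ; in the second they are the same.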
3-3: Pin Postgres image to a version for reproducibility.
Avoid a floating latest tag to prevent surprise upgrades.
Example:
image: kwildb/postgres:16
22-22: Verify tn-node tag strategy.
sha-eb8d9f0 looks like a moving tag; consider immutable digests or semver tags for AMI stability.
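Floating tags of the kind flagged in the two comments above can be caught mechanically. This is a rough sketch, not a full Compose parser: it scans a sample file for `image:` lines that have no tag or use `:latest`, and skips digest-pinned references. The sample file contents are illustrative, and `example/mcp` is a made-up image name.

```shell
#!/bin/sh
# Sketch: list image references that are untagged or use :latest.
# The compose file below is a sample for illustration, not the real template.
set -eu

WORK="$(mktemp -d)"
cat > "$WORK/compose.yml" <<'EOF'
services:
  tn-postgres:
    image: kwildb/postgres:latest
  tn-node:
    image: ghcr.io/trufnetwork/node:sha-eb8d9f0
  tn-mcp:
    image: example/mcp
EOF

floating="$(awk '
  $1 == "image:" {
    ref = $2
    if (ref ~ /@sha256:/) next            # digest-pinned: always immutable
    n = split(ref, parts, "/")
    last = parts[n]                       # ignore any registry host:port prefix
    if (last !~ /:/ || last ~ /:latest$/) print ref
  }' "$WORK/compose.yml")"

printf '%s\n' "$floating"
```

A tag such as `sha-eb8d9f0` passes this check because it is neither missing nor `latest`; whether it is actually immutable depends on how the registry tag is managed, which the script cannot tell.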
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
deployments/infra/stacks/ami_pipeline_stack.go (1 hunks)
deployments/infra/stacks/docker-compose.template.yml (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
- deployments/infra/stacks/ami_pipeline_stack.go
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: outerlook
PR: trufnetwork/node#1168
File: .github/workflows/publish-node-image.yaml:19-19
Timestamp: 2025-09-19T18:14:01.428Z
Learning: The ghcr.io/trufnetwork/tn-db image was never released, so no backwards compatibility or transitional measures are needed when renaming to ghcr.io/trufnetwork/node.
📚 Learning: 2025-09-19T18:59:51.942Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: tests/extensions/tn_digest/test_tn_digest.sh:15-15
Timestamp: 2025-09-19T18:59:51.942Z
Learning: In Docker Compose files, service names (like "tn-db:") can remain unchanged even when the underlying image is updated (e.g., from tn-db:local to ghcr.io/trufnetwork/node:local). Service names are internal network identifiers and don't need to match the actual image names.
Applied to files:
deployments/infra/stacks/docker-compose.template.yml
📚 Learning: 2025-09-19T18:14:01.428Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: .github/workflows/publish-node-image.yaml:19-19
Timestamp: 2025-09-19T18:14:01.428Z
Learning: The ghcr.io/trufnetwork/tn-db image was never released, so no backwards compatibility or transitional measures are needed when renaming to ghcr.io/trufnetwork/node.
Applied to files:
deployments/infra/stacks/docker-compose.template.yml
📚 Learning: 2025-09-23T18:50:13.233Z
Learnt from: MicBun
PR: trufnetwork/node#1172
File: deployments/infra/stacks/docker-compose.template.yml:3-7
Timestamp: 2025-09-23T18:50:13.233Z
Learning: The Kwil system expects POSTGRES_HOST_AUTH_METHOD=trust for PostgreSQL database configuration in Docker deployments. This is an intentional configuration requirement for Kwil nodes, not a security oversight.
Applied to files:
deployments/infra/stacks/docker-compose.template.yml
🔇 Additional comments (4)
deployments/infra/stacks/docker-compose.template.yml (4)
2-4: Service rename to tn-postgres is consistent and aligns references.
Matches updates in depends_on and DB URIs; internal DNS will resolve correctly.
34-35: Good use of health-gated dependency for Postgres.
Ensures tn-node waits for a ready DB.
84-84: DB URI host change looks correct.
Matches tn-postgres and trust auth model per Kwil requirement.
90-93: Confirm Compose supports depends_on.condition: service_started in your environment.
Some older Compose versions don’t honor conditions unless using the v2+ spec. If unsupported, tn-mcp may start too early.
If needed, switch to “service_healthy” with a tn-node healthcheck. Example (add under tn-node; adjust to what the image provides):
healthcheck:
  test: ["CMD", "/app/kwild", "status"]
  interval: 15s
  timeout: 5s
  retries: 10
Then:
depends_on:
  tn-node:
    condition: service_healthy
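If the Compose version in use turns out not to honor `condition`, a dependent service could instead wait in its own entrypoint. The sketch below shows that fallback with a hypothetical `wait_for` helper that retries a probe a fixed number of times, loosely mirroring the `retries` and `interval` fields above. The marker-file probe is a stand-in; a real probe would be whatever readiness command the image actually provides.

```shell
#!/bin/sh
# Sketch: retry a readiness probe before starting a dependent process.
# wait_for and probe are hypothetical helpers, not part of the template.
set -eu

# wait_for RETRIES INTERVAL CMD...
# Returns 0 as soon as CMD succeeds, 1 after RETRIES failed attempts.
wait_for() {
    retries="$1"
    interval="$2"
    shift 2
    attempt=1
    while [ "$attempt" -le "$retries" ]; do
        if "$@"; then
            return 0
        fi
        attempt=$((attempt + 1))
        sleep "$interval"
    done
    return 1
}

# Stand-in probe: succeeds once a marker file exists.
MARKER="$(mktemp -d)/ready"
probe() { [ -f "$MARKER" ]; }

if wait_for 2 0 probe; then before="ready"; else before="timeout"; fi
touch "$MARKER"
if wait_for 2 0 probe; then after="ready"; else after="timeout"; fi

echo "before: $before, after: $after"
```

The interval is set to 0 here only so the example finishes immediately; a real entrypoint would use a few seconds between attempts.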
resolves: #1132