feat: setup node inside AWS AMI Instance #1172
Implement automated AMI building pipeline to reduce developer onboarding from 3+ hours to under 10 minutes.

- **AWS CDK Infrastructure**: EC2 Image Builder pipeline with isolated deployment
- **Docker-in-AMI**: Pre-configured containers for node, database, and MCP server
- **Always-Latest Strategy**: Containers pull latest images on startup
- **Simple Configuration**: 3-command setup with `tn-node-configure` script
- **Multi-Region Distribution**: AMI available across AWS regions

- `deployments/infra/ami-cdk.go` - Isolated AMI deployment application
- `deployments/infra/stacks/ami_pipeline_stack.go` - Main CDK stack

✅ Successfully deployed in us-east-2
✅ Docker containers and MCP server operational

Additional issue will be handled separately to limit scope.

Removes major adoption barrier by providing one-click deployment alternative to complex manual setup.

resolves: #1131
Walkthrough
This PR updates the AMI pipeline to embed a base64-encoded docker-compose template, switches to new config paths (/opt/tn/.../v2), adds a network configs download phase, and revises private key handling. The docker-compose template gains a scripted tn-node init/start flow and updated images/ports. Tests are aligned to the fixed network and stricter key validation.
Sequence Diagram(s)
sequenceDiagram
autonumber
participant Dev as AMI Build (CDK)
participant EC2 as AMI Instance
participant S3 as Network Configs Source
participant FS as /opt/tn FS
rect rgba(200,220,255,0.25)
note over Dev: Build-time
Dev->>Dev: Embed docker-compose.template.yml (base64)
Dev->>EC2: Provision AMI phases
end
rect rgba(220,255,220,0.25)
note over EC2,FS: First boot
EC2->>S3: Download genesis.json
S3-->>EC2: genesis.json
EC2->>FS: Create /opt/tn/{configs/network/v2,data}
EC2->>FS: Decode docker-compose.yml to /opt/tn
EC2->>FS: Write .env (TN_PRIVATE_KEY, etc.)
end
sequenceDiagram
autonumber
participant DC as docker-compose
participant Node as tn-node (container)
participant PG as kwil-postgres
participant MCP as postgres-mcp
participant FS as /root/.kwild
rect rgba(255,245,200,0.35)
note over DC: Runtime bring-up
DC->>PG: Start (host trust auth, shm=1gb)
DC->>Node: Start (entrypoint via command)
end
rect rgba(235,235,255,0.35)
note over Node,FS: Init sequence
Node->>Node: Generate temp node config
Node->>FS: Copy to /root/.kwild if valid
Node->>FS: Edit config.toml DB host=kwil-postgres
alt TN_PRIVATE_KEY present
Node->>FS: Validate/convert -> nodekey.json
else No key
Node-->>Node: Skip key conversion
end
Node->>Node: Start /app/kwild -r /root/.kwild
end
PG-->>DC: healthy
DC->>MCP: Start (depends_on: PG healthy, Node started)
Estimated code review effort
🎯 4 (Complex) | ⏱️ ~60 minutes
Pre-merge checks and finishing touches
❌ Failed checks (1 inconclusive)
✅ Passed checks (4 passed)
Successfully implemented and validated AWS AMI infrastructure for TRUF.NETWORK mainnet deployment.

- **Rapid Deployment**: 3-command setup reduces deployment time from hours to minutes
- **Production Ready**: Mainnet-configured nodes with state-sync for fast bootstrap
- **User Experience**: Simplified private key management and automated service configuration

- ✅ **AMI Pipeline**: Automated image building with EC2 Image Builder
- ✅ **Mainnet Configuration**: Proper chain ID, genesis, and peer connections
- ✅ **Database Integration**: Containerized PostgreSQL with optimized settings
- ✅ **Service Management**: Systemd integration for reliable node operation

resolves: #1132
05d96e9 to 9d8247f
Actionable comments posted: 5
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (3)
deployments/infra/stacks/ami_pipeline_stack.go (3)
167-173: Fix Docker Compose detection: `docker-compose --version` will fail on Ubuntu 24.04
You install the v2 plugin, so only `docker compose version` is guaranteed. This breaks AMI builds during verification.
    -      - docker --version
    -      - docker-compose --version
    -      - systemctl is-active docker
    +      - docker --version
    +      - docker compose version
    +      - systemctl is-active docker
289-298: Harden .env handling (ownership and permissions)
`.env` contains secrets (TN_PRIVATE_KEY). Set strict perms and ownership so only user `tn` can read it.
      cat > .env << ENVEOF
      CHAIN_ID=$CHAIN_ID
      DB_OWNER=postgres://kwild:kwild@kwil-postgres:5432/kwild
      ENVEOF
    + sudo chown tn:tn .env
    + sudo chmod 600 .env
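As a complement to the suggestion above, here is a minimal sketch of writing the file with restrictive permissions from the start, so it is never readable by other users even for a moment. The function name `write_env_file` and its arguments are hypothetical and not part of the PR; the `chown tn:tn` step is left out because it requires root.

```shell
#!/bin/sh
# write_env_file PATH CHAIN_ID  (hypothetical helper, for illustration only)
# Creates the .env file under a restrictive umask, then forces mode 600
# in case the file already existed with looser permissions.
write_env_file() {
  env_path=$1
  chain_id=$2
  (
    # The umask change is confined to this subshell.
    umask 077
    cat > "$env_path" <<ENVEOF
CHAIN_ID=$chain_id
DB_OWNER=postgres://kwild:kwild@kwil-postgres:5432/kwild
ENVEOF
  )
  chmod 600 "$env_path"
}
```

In the AMI script this would be followed by `sudo chown tn:tn .env`, as proposed in the diff.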
333-338: Replace legacy `docker-compose` with `/usr/bin/docker compose` in AMI startup
AMI install script only adds the Compose CLI plugin (no legacy `docker-compose` binary), so the current calls will fail on fresh instances.
- Change in deployments/infra/stacks/ami_pipeline_stack.go (around lines 333–338):
    - sudo -u tn docker-compose pull
    + sudo -u tn /usr/bin/docker compose pull
    @@
    - sudo -u tn docker-compose up -d
    + sudo -u tn /usr/bin/docker compose up -d
- rg shows other legacy usages (scripts/test-ami.sh, deployments/infra/README.md, Taskfile.yml, deployments/dev-gateway/README.md, etc.) — update docs/scripts or add runtime detection/fallbacks.
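The runtime detection mentioned above could look roughly like the following sketch. The function name `detect_compose` is hypothetical; it only relies on `docker compose version` (the v2 plugin) and on a `docker-compose` binary being on `PATH` (the legacy tool).

```shell
#!/bin/sh
# detect_compose (hypothetical helper): print the Compose invocation
# available on this host. Prefers the v2 CLI plugin and falls back to
# the legacy standalone binary when only that is installed.
detect_compose() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"
  else
    echo "no Docker Compose installation found" >&2
    return 1
  fi
}
```

A script could then run `COMPOSE=$(detect_compose) || exit 1` and call `$COMPOSE pull` (unquoted, so that `docker compose` splits into two words). Pinning `/usr/bin/docker compose` as in the diff is simpler when the AMI is the only target.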
🧹 Nitpick comments (4)
deployments/infra/stacks/ami_pipeline_stack.go (3)
210-213: Use printf or heredoc for base64 decode to avoid echo pitfalls
`echo` can mangle long/base64 strings. Use `printf` or a heredoc to ensure exact bytes.
    - echo "` + getEncodedDockerCompose() + `" | base64 -d > /opt/tn/docker-compose.yml
    + printf "%s" "` + getEncodedDockerCompose() + `" | base64 -d > /opt/tn/docker-compose.yml
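A small sketch of the `printf`-based decode, extended so that a failed decode does not leave a partial file behind. The helper name `decode_b64_to_file` and the temporary-file suffix are assumptions for illustration; `base64 -d` is the GNU coreutils flag, which is what an Ubuntu-based AMI would provide.

```shell
#!/bin/sh
# decode_b64_to_file ENCODED DEST  (hypothetical helper)
# Decodes a base64 string into DEST. The output is written to a temporary
# file first and only moved into place if base64 exits successfully.
decode_b64_to_file() {
  encoded=$1
  dest=$2
  tmp_out="$dest.tmp"
  # printf '%s' emits the string byte-for-byte, with no escape
  # interpretation and no trailing newline.
  if printf '%s' "$encoded" | base64 -d > "$tmp_out"; then
    mv "$tmp_out" "$dest"
  else
    rm -f "$tmp_out"
    return 1
  fi
}
```

In the pipeline, the first argument would be the value produced by `getEncodedDockerCompose()` and the second `/opt/tn/docker-compose.yml`.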
226-237: Consider switching systemd unit Type to simple (or add health checks)
`Type=oneshot` only orchestrates start/stop, not health. If compose fails, systemd shows “active (exited)”. Prefer `Type=simple` with a wrapper that blocks until containers are healthy, or keep oneshot but add periodic checks elsewhere.
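One possible shape for such a blocking wrapper is a bounded retry loop around a health probe. This is only a sketch: `retry_until` and `is_healthy` are hypothetical names, and the probe assumes the container defines a Docker healthcheck so that `docker inspect` reports `.State.Health.Status`.

```shell
#!/bin/sh
# retry_until ATTEMPTS DELAY CMD...  (hypothetical helper)
# Runs CMD up to ATTEMPTS times, sleeping DELAY seconds between tries.
# Returns 0 as soon as CMD succeeds, 1 if every attempt fails.
retry_until() {
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# is_healthy CONTAINER: succeeds only when Docker reports the container healthy.
is_healthy() {
  [ "$(docker inspect --format '{{.State.Health.Status}}' "$1" 2>/dev/null)" = "healthy" ]
}
```

A unit's start command could call something like `retry_until 24 10 is_healthy kwil-postgres` after `docker compose up -d` and exit non-zero on failure, so that systemd marks the unit as failed instead of "active (exited)".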
299-305: Avoid writing secrets to world-readable files without opt-in
Appending `TN_PRIVATE_KEY` to `.env` is fine if perms are strict. Ensure this runs after the chmod above; also consider supporting `--env-file` path to avoid mixing multiple settings.
deployments/infra/stacks/docker-compose.template.yml (1)
39-76: Add a healthcheck for tn-node to improve dependency ordering
`postgres-mcp` depends on `service_started` for tn-node, which might not imply readiness. Add a tn-node healthcheck and depend on `service_healthy`.
Example:
      tn-node:
    @@
        restart: unless-stopped
    +   healthcheck:
    +     test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:8484/health || exit 1"]
    +     interval: 10s
    +     timeout: 5s
    +     retries: 24
Then in postgres-mcp:
        depends_on:
    @@
    -     tn-node:
    -       condition: service_started
    +     tn-node:
    +       condition: service_healthy
Adjust the health endpoint as appropriate for kwild.
📜 Review details
Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (3)
deployments/infra/stacks/ami_pipeline_stack.go (5 hunks)
deployments/infra/stacks/docker-compose.template.yml (2 hunks)
scripts/test-ami.sh (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-23T02:13:19.979Z
Learnt from: MicBun
PR: trufnetwork/node#1170
File: deployments/infra/stacks/ami_pipeline_stack.go:297-384
Timestamp: 2025-09-23T02:13:19.979Z
Learning: In the trufnetwork/node repository, the chain ID should always be "tn-v2.1" regardless of network configuration (mainnet/testnet). There are no other chains currently supported.
Applied to files:
deployments/infra/stacks/ami_pipeline_stack.go
📚 Learning: 2025-09-19T18:59:51.942Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: tests/extensions/tn_digest/test_tn_digest.sh:15-15
Timestamp: 2025-09-19T18:59:51.942Z
Learning: In Docker Compose files, service names (like "tn-db:") can remain unchanged even when the underlying image is updated (e.g., from tn-db:local to ghcr.io/trufnetwork/node:local). Service names are internal network identifiers and don't need to match the actual image names.
Applied to files:
deployments/infra/stacks/docker-compose.template.yml
🪛 Gitleaks (8.28.0)
scripts/test-ami.sh
[high] 165-165: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.
(generic-api-key)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
- GitHub Check: lint
- GitHub Check: acceptance-test
🔇 Additional comments (1)
deployments/infra/stacks/docker-compose.template.yml (1)
29-33: Verify public exposure of RPC port 26657
You map 26657 to all interfaces. If RPC should be private, bind to localhost.
    -      - "26657:26657"
    +      - "127.0.0.1:26657:26657"
If public RPC is intended, ensure rate-limits and auth as needed.
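To make the difference between the two mappings concrete, the sketch below classifies a Compose short-syntax `ports` entry by the host interface it binds. `classify_port_mapping` is a hypothetical helper, handles IPv4 host addresses only, and is meant as an illustration rather than a complete parser.

```shell
#!/bin/sh
# classify_port_mapping ENTRY  (hypothetical helper)
# ENTRY is a short-syntax mapping such as "26657:26657" or
# "127.0.0.1:26657:26657". Entries without a host IP are published
# on all interfaces.
classify_port_mapping() {
  case "$1" in
    127.0.0.1:*:*) echo "loopback" ;;
    0.0.0.0:*:*)   echo "all-interfaces" ;;
    *:*:*)         echo "specific-address" ;;
    *)             echo "all-interfaces" ;;
  esac
}
```

With this helper, `classify_port_mapping "26657:26657"` reports `all-interfaces`, while the localhost-bound form from the diff reports `loopback`.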
Successfully implemented and validated AWS AMI infrastructure for TRUF.NETWORK mainnet deployment.
Tested by deploying the AMI and launching an instance using the Amazon Console
resolves: #1132