feat: setup node inside AWS AMI Instance #1172

Merged: MicBun merged 7 commits into main from AwsAmiConfigure on Sep 23, 2025

MicBun (Contributor) commented Sep 23, 2025

Successfully implemented and validated AWS AMI infrastructure for TRUF.NETWORK mainnet deployment.

Key Business Impact

  • Rapid Deployment: 3-command setup reduces deployment time from hours to minutes
  • Production Ready: Mainnet-configured nodes with state-sync for fast bootstrap
  • User Experience: Simplified private key management and automated service configuration

Technical Achievements

  • AMI Pipeline: Automated image building with EC2 Image Builder
  • Mainnet Configuration: Proper chain ID, genesis, and peer connections
  • Database Integration: Containerized PostgreSQL with optimized settings
  • Service Management: Systemd integration for reliable node operation

Tested by deploying the AMI and launching an instance from the AWS console:

# Deploy the AMI pipeline stack
cdk --app 'go run ami-cdk.go' deploy \
  --context stage=dev \
  --context "devPrefix=test-$(whoami)" \
  --require-approval never

# Print the pipeline ARN from the stack outputs
aws cloudformation describe-stacks \
  --stack-name AMI-Pipeline-default-Stack \
  --region us-east-2 \
  --query 'Stacks[0].Outputs[?OutputKey==`AmiPipelineArnOutput`].OutputValue' \
  --output text

# Capture the same ARN for the next step
PIPELINE_ARN=$(aws cloudformation describe-stacks \
  --stack-name AMI-Pipeline-default-Stack \
  --region us-east-2 \
  --query 'Stacks[0].Outputs[?OutputKey==`AmiPipelineArnOutput`].OutputValue' \
  --output text)

# Start the AMI build
BUILD_RESULT=$(aws imagebuilder start-image-pipeline-execution \
  --image-pipeline-arn "$PIPELINE_ARN" \
  --region us-east-2)

echo "Build started: $BUILD_RESULT"

# Extract the image build version ARN from the result
IMAGE_BUILD_ARN=$(echo "$BUILD_RESULT" | jq -r '.imageBuildVersionArn')
echo "Image build ARN: $IMAGE_BUILD_ARN"

# Check build status using the image build ARN
aws imagebuilder get-image \
  --image-build-version-arn "$IMAGE_BUILD_ARN" \
  --region us-east-2 \
  --query 'image.{State:state.status,Reason:state.reason}'
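
The one-off status check above could be wrapped in a polling loop when the build needs to be awaited from a script. This is a minimal sketch and not part of the PR: `wait_for_image` and `image_status` are hypothetical names, and it assumes `AVAILABLE`, `FAILED` and `CANCELLED` are the only states worth stopping on.

```shell
# Hypothetical helper: run a status command repeatedly until the build
# reaches a terminal state or the attempt budget is used up.
# Usage: wait_for_image <max_attempts> <sleep_seconds> <status command...>
wait_for_image() {
  max_attempts=$1
  sleep_seconds=$2
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    status=$("$@")   # the injected command must print the current state
    case "$status" in
      AVAILABLE)
        echo "build finished: $status"
        return 0 ;;
      FAILED|CANCELLED)
        echo "build ended: $status" >&2
        return 1 ;;
    esac
    echo "attempt $attempt/$max_attempts: $status"
    sleep "$sleep_seconds"
    attempt=$((attempt + 1))
  done
  echo "gave up after $max_attempts attempts" >&2
  return 2
}

# Status command built from the get-image call above (defined only, not run here).
image_status() {
  aws imagebuilder get-image \
    --image-build-version-arn "$IMAGE_BUILD_ARN" \
    --region us-east-2 \
    --query 'image.state.status' \
    --output text
}
```

With `IMAGE_BUILD_ARN` set as above, `wait_for_image 90 60 image_status` would check once a minute for up to 90 minutes. Passing the status command as an argument keeps the loop testable without AWS access.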

resolves: #1132

Summary by CodeRabbit

  • New Features

    • Automatic download of network configs into a new v2 path.
    • Embedded docker-compose delivered and decoded at deploy time.
    • Streamlined node startup with auto-configuration and key conversion.
    • Simplified setup: no network flag; fixed to “mainnet (tn-v2.1)”.
  • Chores

    • Updated container images; reduced exposed ports.
    • Postgres bound to localhost with increased shared memory.
    • Improved health checks and clearer runtime logs.
  • Tests

    • Expanded private key validation coverage.
    • Updated instructions and expectations to match new setup.

Implement automated AMI building pipeline to reduce developer onboarding from 3+ hours to under 10 minutes.

- **AWS CDK Infrastructure**: EC2 Image Builder pipeline with isolated deployment
- **Docker-in-AMI**: Pre-configured containers for node, database, and MCP server
- **Always-Latest Strategy**: Containers pull latest images on startup
- **Simple Configuration**: 3-command setup with `tn-node-configure` script
- **Multi-Region Distribution**: AMI available across AWS regions

- `deployments/infra/ami-cdk.go` - Isolated AMI deployment application
- `deployments/infra/stacks/ami_pipeline_stack.go` - Main CDK stack

✅ Successfully deployed in us-east-2
✅ Docker containers and MCP server operational
Additional issues will be handled separately to limit scope.

Removes a major adoption barrier by providing a one-click deployment alternative to the complex manual setup.

resolves: #1131
MicBun self-assigned this on Sep 23, 2025
MicBun added the "type: feat" (New feature or request) label on Sep 23, 2025

coderabbitai Bot commented Sep 23, 2025

Walkthrough

This PR updates the AMI pipeline to embed a base64-encoded docker-compose template, switches to new config paths (/opt/tn/.../v2), adds a network configs download phase, and revises private key handling. The docker-compose template gains a scripted tn-node init/start flow and updated images/ports. Tests are aligned to the fixed network and stricter key validation.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| AMI pipeline stack updates (`deployments/infra/stacks/ami_pipeline_stack.go`) | Adds base64 helper for docker-compose template; introduces DownloadNetworkConfigs phase; updates config dirs to /opt/tn/configs/network/v2 and /opt/tn/{configs/network/v2,data}; removes dynamic NETWORK; writes .env with TN_PRIVATE_KEY; decodes compose to /opt/tn/docker-compose.yml. |
| Docker Compose template and runtime flow (`deployments/infra/stacks/docker-compose.template.yml`) | Updates images (kwildb/postgres:latest, tn-node pinned sha); tightens Postgres bind/auth; adds shm_size; reduces exposed ports; replaces env-based defaults with explicit init/start command that generates config, edits DB host, validates TN_PRIVATE_KEY -> nodekey.json, then starts kwild; updates dependencies/healthchecks. |
| AMI test script alignment (`scripts/test-ami.sh`) | Removes --network parsing; fixes network display to “mainnet (tn-v2.1)”; switches test key to 64-hex; adds negative key validation cases; expands .env generation tests; updates usage hint to tn-node-configure --enable-mcp. |
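
The test-script changes above mention a 64-hex test key and negative validation cases. A check of that shape could be sketched as follows, assuming the whole rule is "exactly 64 hexadecimal characters, no prefix". `is_valid_private_key` is a hypothetical name, and the validation actually shipped in the PR may differ.

```shell
# Hypothetical validator: succeeds only for a string of exactly 64 hex characters.
is_valid_private_key() {
  key=$1
  # Reject anything that is not exactly 64 characters long.
  [ "${#key}" -eq 64 ] || return 1
  # Reject any character outside 0-9, a-f, A-F.
  case "$key" in
    *[!0-9a-fA-F]*) return 1 ;;
  esac
  return 0
}
```

A caller might use it as `is_valid_private_key "$TN_PRIVATE_KEY" || echo "invalid key" >&2`. Whether a `0x` prefix should be accepted is left open here; this sketch rejects it.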

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Dev as AMI Build (CDK)
  participant EC2 as AMI Instance
  participant S3 as Network Configs Source
  participant FS as /opt/tn FS

  rect rgba(200,220,255,0.25)
  note over Dev: Build-time
  Dev->>Dev: Embed docker-compose.template.yml (base64)
  Dev->>EC2: Provision AMI phases
  end

  rect rgba(220,255,220,0.25)
  note over EC2,FS: First boot
  EC2->>S3: Download genesis.json
  S3-->>EC2: genesis.json
  EC2->>FS: Create /opt/tn/{configs/network/v2,data}
  EC2->>FS: Decode docker-compose.yml to /opt/tn
  EC2->>FS: Write .env (TN_PRIVATE_KEY, etc.)
  end

sequenceDiagram
  autonumber
  participant DC as docker-compose
  participant Node as tn-node (container)
  participant PG as kwil-postgres
  participant MCP as postgres-mcp
  participant FS as /root/.kwild

  rect rgba(255,245,200,0.35)
  note over DC: Runtime bring-up
  DC->>PG: Start (host trust auth, shm=1gb)
  DC->>Node: Start (entrypoint via command)
  end

  rect rgba(235,235,255,0.35)
  note over Node,FS: Init sequence
  Node->>Node: Generate temp node config
  Node->>FS: Copy to /root/.kwild if valid
  Node->>FS: Edit config.toml DB host=kwil-postgres
  alt TN_PRIVATE_KEY present
    Node->>FS: Validate/convert -> nodekey.json
  else No key
    Node-->>Node: Skip key conversion
  end
  Node->>Node: Start /app/kwild -r /root/.kwild
  end

  PG-->>DC: healthy
  DC->>MCP: Start (depends_on: PG healthy, Node started)

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Suggested reviewers

  • outerlook

Poem

I nibbled on configs, crisp and new,
Base64 bundles for a smoother brew.
Genesis fetched, keys tucked tight,
Compose whispers, “Node—ignite!”
Postgres hums, the chains align—
Thump-thump, my paws: deploy time! 🐇✨

Pre-merge checks and finishing touches

❌ Failed checks (1 inconclusive)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Linked Issues Check | ❓ Inconclusive | The changes implement several coding-related acceptance criteria from [#1132], including an AMI pipeline, mainnet configuration paths (/opt/tn/configs/network/v2), containerized PostgreSQL in the compose template, automatic network config download, and in-container startup logic that handles config generation and TN_PRIVATE_KEY handling which partially satisfies the "configuration wrapper" objective; however, the provided summaries do not show explicit systemd unit files or a distinct top-level wrapper script and therefore do not fully demonstrate that all system services are pre-configured and that the AMI reduces user setup to the promised small number of commands. | Please include or point to the explicit systemd unit/service installation and enablement steps and either add a standalone wrapper script or document the exact entrypoint commands that constitute the claimed one/three-command user flow, and add a short verification test or manifest entry in the AMI pipeline showing those services are installed and enabled. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title Check | ✅ Passed | The title "feat: setup node inside AWS AMI Instance" is concise, single-sentence, and accurately reflects the primary change described in the PR (AMI-based node setup, pipeline, and configuration changes), so it communicates the main author intent clearly to reviewers. |
| Out of Scope Changes Check | ✅ Passed | All modifications summarized (AMI pipeline updates, base64-encoded docker-compose template, new config paths, docker-compose service and startup logic, and updated AMI test scripts) are directly related to provisioning and configuring the AMI for node deployment and align with the linked issue objectives, with no unrelated feature additions or broad refactors evident in the provided summaries. |
| Docstring Coverage | ✅ Passed | No functions found in the changes. Docstring coverage check skipped. |

Base automatically changed from featAwsAmi to main September 23, 2025 12:53
Comment thread deployments/infra/stacks/docker-compose.template.yml Outdated
holdex Bot commented Sep 23, 2025

Time Submission Status

| Member | Status | Time | Action | Last Update |
| --- | --- | --- | --- | --- |
| @outerlook | ❌ Missing | - | ⚠️ Submit time | - |
| MicBun | ✅ Submitted | 12h | Update time | Sep 23, 2025, 6:50 PM |

MicBun marked this pull request as ready for review on September 23, 2025 18:39
MicBun requested a review from outerlook on September 23, 2025 18:41

coderabbitai Bot left a comment

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (3)
deployments/infra/stacks/ami_pipeline_stack.go (3)

167-173: Fix Docker Compose detection: docker-compose --version will fail on Ubuntu 24.04

You install the v2 plugin, so only docker compose version is guaranteed. This breaks AMI builds during verification.

-            - docker --version
-            - docker-compose --version
-            - systemctl is-active docker
+            - docker --version
+            - docker compose version
+            - systemctl is-active docker

289-298: Harden .env handling (ownership and permissions)

.env contains secrets (TN_PRIVATE_KEY). Set strict perms and ownership so only user tn can read it.

               cat > .env << ENVEOF
               CHAIN_ID=$CHAIN_ID
               DB_OWNER=postgres://kwild:kwild@kwil-postgres:5432/kwild
               ENVEOF
+              sudo chown tn:tn .env
+              sudo chmod 600 .env
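
A variation on the suggestion above is to create the file under a restrictive umask, so it is never readable by other users, even briefly between the write and the chmod. The sketch below assumes the caller still handles ownership (the `chown tn:tn` step); `write_env_file` is a hypothetical helper and does not exist in the PR.

```shell
# Hypothetical helper: write stdin to an env file that is private from creation.
# Usage (as in the AMI script, not run here):
#   write_env_file /opt/tn/.env << ENVEOF
#   CHAIN_ID=$CHAIN_ID
#   ENVEOF
write_env_file() {
  target=$1
  (
    umask 077          # applies only inside this subshell
    cat > "$target"    # a newly created file gets mode 600
  )
  chmod 600 "$target"  # also tightens a file that already existed
}
```

The explicit `chmod` is kept because a umask only affects files at creation time; an existing `.env` with looser permissions would otherwise keep them.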

333-338: Replace legacy docker-compose with /usr/bin/docker compose in AMI startup

AMI install script only adds the Compose CLI plugin (no legacy docker-compose binary), so the current calls will fail on fresh instances.

  • Change in deployments/infra/stacks/ami_pipeline_stack.go (around lines 333–338):
-              sudo -u tn docker-compose pull
+              sudo -u tn /usr/bin/docker compose pull
@@
-              sudo -u tn docker-compose up -d
+              sudo -u tn /usr/bin/docker compose up -d
  • rg shows other legacy usages (scripts/test-ami.sh, deployments/infra/README.md, Taskfile.yml, deployments/dev-gateway/README.md, etc.) — update docs/scripts or add runtime detection/fallbacks.
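
The runtime detection mentioned in the last bullet could look roughly like this. It is a sketch under the assumption that the v2 plugin should be preferred and the legacy binary used only as a fallback; `detect_compose` is a hypothetical name.

```shell
# Hypothetical helper: print the Compose invocation available on this host.
detect_compose() {
  if docker compose version >/dev/null 2>&1; then
    echo "docker compose"       # Compose v2 CLI plugin
  elif command -v docker-compose >/dev/null 2>&1; then
    echo "docker-compose"       # legacy standalone binary
  else
    echo "no Docker Compose installation found" >&2
    return 1
  fi
}
```

A script would then run `COMPOSE=$(detect_compose) || exit 1` followed by `sudo -u tn $COMPOSE pull`, leaving `$COMPOSE` unquoted on purpose so that "docker compose" splits into two words.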
🧹 Nitpick comments (4)
deployments/infra/stacks/ami_pipeline_stack.go (3)

210-213: Use printf or heredoc for base64 decode to avoid echo pitfalls

echo can mangle long/base64 strings. Use printf or a heredoc to ensure exact bytes.

-              echo "` + getEncodedDockerCompose() + `" | base64 -d > /opt/tn/docker-compose.yml
+              printf "%s" "` + getEncodedDockerCompose() + `" | base64 -d > /opt/tn/docker-compose.yml
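
The encode/decode round trip behind this suggestion can be exercised in isolation. In the PR the encoding is done in Go by `getEncodedDockerCompose()`; the `encode_file` function below only stands in for it, and both function names are hypothetical. The sketch assumes a `base64` that accepts `-d`, as on the Ubuntu base image.

```shell
# Hypothetical stand-in for the build-time encoder: one-line base64 of a file.
encode_file() {
  # Strip the line wrapping so the payload embeds as a single string.
  base64 < "$1" | tr -d '\n'
}

# Decode the payload with printf '%s', which emits the argument as-is:
# no trailing newline is added and no escape sequences are interpreted.
decode_to_file() {
  printf '%s' "$1" | base64 -d > "$2"
}
```

Comparing the decoded file with the original, for example `cmp docker-compose.template.yml /opt/tn/docker-compose.yml`, is a cheap verification step that the embedded template survived intact.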

226-237: Consider switching systemd unit Type to simple (or add health checks)

Type=oneshot only orchestrates start/stop, not health. If compose fails, systemd shows “active (exited)”. Prefer Type=simple with a wrapper that blocks until containers are healthy, or keep oneshot but add periodic checks elsewhere.


299-305: Avoid writing secrets to world-readable files without opt-in

Appending TN_PRIVATE_KEY to .env is fine if perms are strict. Ensure this runs after the chmod above; also consider supporting --env-file path to avoid mixing multiple settings.

deployments/infra/stacks/docker-compose.template.yml (1)

39-76: Add a healthcheck for tn-node to improve dependency ordering

postgres-mcp depends on service_started for tn-node, which might not imply readiness. Add a tn-node healthcheck and depend on service_healthy.

Example:

   tn-node:
@@
     restart: unless-stopped
+    healthcheck:
+      test: ["CMD-SHELL", "curl -fsS http://127.0.0.1:8484/health || exit 1"]
+      interval: 10s
+      timeout: 5s
+      retries: 24

Then in postgres-mcp:

   depends_on:
@@
-      tn-node:
-        condition: service_started
+      tn-node:
+        condition: service_healthy

Adjust the health endpoint as appropriate for kwild.

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 17d4ce4 and eb1163b.

📒 Files selected for processing (3)
  • deployments/infra/stacks/ami_pipeline_stack.go (5 hunks)
  • deployments/infra/stacks/docker-compose.template.yml (2 hunks)
  • scripts/test-ami.sh (3 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-23T02:13:19.979Z
Learnt from: MicBun
PR: trufnetwork/node#1170
File: deployments/infra/stacks/ami_pipeline_stack.go:297-384
Timestamp: 2025-09-23T02:13:19.979Z
Learning: In the trufnetwork/node repository, the chain ID should always be "tn-v2.1" regardless of network configuration (mainnet/testnet). There are no other chains currently supported.

Applied to files:

  • deployments/infra/stacks/ami_pipeline_stack.go
📚 Learning: 2025-09-19T18:59:51.942Z
Learnt from: outerlook
PR: trufnetwork/node#1168
File: tests/extensions/tn_digest/test_tn_digest.sh:15-15
Timestamp: 2025-09-19T18:59:51.942Z
Learning: In Docker Compose files, service names (like "tn-db:") can remain unchanged even when the underlying image is updated (e.g., from tn-db:local to ghcr.io/trufnetwork/node:local). Service names are internal network identifiers and don't need to match the actual image names.

Applied to files:

  • deployments/infra/stacks/docker-compose.template.yml
🪛 Gitleaks (8.28.0)
scripts/test-ami.sh

[high] 165-165: Detected a Generic API Key, potentially exposing access to various services and sensitive operations.

(generic-api-key)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (2)
  • GitHub Check: lint
  • GitHub Check: acceptance-test
🔇 Additional comments (1)
deployments/infra/stacks/docker-compose.template.yml (1)

29-33: Verify public exposure of RPC port 26657

You map 26657 to all interfaces. If RPC should be private, bind to localhost.

-      - "26657:26657"
+      - "127.0.0.1:26657:26657"

If public RPC is intended, ensure rate-limits and auth as needed.

Comment thread deployments/infra/stacks/ami_pipeline_stack.go
Comment thread deployments/infra/stacks/docker-compose.template.yml
Comment thread deployments/infra/stacks/docker-compose.template.yml
Comment thread deployments/infra/stacks/docker-compose.template.yml
Comment thread scripts/test-ami.sh

Labels

type: feat New feature or request

Development

Successfully merging this pull request may close these issues.

Problem(AWS AMI): Complex node setup prevents developer adoption

2 participants