CloudForge is a production-grade, self-service Internal Developer Platform (IDP) designed to bridge the gap between developer agility and secure, standardized cloud engineering. It allows developers to provision, manage, and tear down curated, highly secure AWS infrastructure resources using pre-configured Terraform blueprints directly from an intuitive, modern dashboard.
Featuring real-time WebSocket logs, automated health checks, credential retrieval from AWS Secrets Manager, and automatic rollback of failed deployments, CloudForge makes cloud resources accessible, transparent, and safe.
CloudForge is engineered with a zero-trust and zero-leaked-state design:
- Self-Service with Guardrails: Developers select resources from a structured Service Catalog (EC2, S3, RDS Aurora, ElastiCache Redis, Lambda) and tailor capacity (e.g., node type, memory, database capacity). Behind the scenes, CloudForge keeps every deployment within compliance boundaries (e.g., auto-managed security groups, generated SSH key pairs, default VPC bindings).
- Secure Credential Harvesting: The backend never handles or stores plaintext static AWS passwords or database master keys on disk. Instead, Terraform creates the infrastructure and registers credentials directly in AWS Secrets Manager under names with random suffixes. The CloudForge backend then retrieves these secrets on demand via the AWS SDK, using the Secret ARN that Terraform outputs, and serves them to the frontend client without ever persisting secrets in its database or local logs.
- Live Infrastructure Feedback Loop: Standard OS process pipes are mapped to Socket.io WebSockets, so developers get streaming console logs for terraform init and terraform apply commands directly in a terminal interface, similar to the consoles of public cloud providers (e.g., AWS CloudFormation or HashiCorp Terraform Cloud).
- Self-Healing State & Rollback Engine: In the event of a provisioning error (e.g., KMS key constraints, Secrets Manager scheduling conflicts, AWS permission limits), the orchestration engine rolls back immediately: it prunes failed state, cleans up local records, and keeps the control plane in sync with the physical cloud resources.
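
The log streaming described above can be sketched as follows. This is a minimal illustration, not CloudForge's actual terraformRunner.js: createLineSplitter and streamCommand are hypothetical helper names, and emit stands in for whatever function forwards events to the Socket.io client. Only the terraform_log event name and its normal/error types come from this document.

```javascript
'use strict';
const { spawn } = require('child_process');

// Buffers raw stream chunks and calls onLine once per complete line.
function createLineSplitter(onLine) {
  let pending = '';
  return {
    push(chunk) {
      pending += chunk;
      const parts = pending.split(/\r?\n/);
      pending = parts.pop(); // last element is an unfinished line (or '')
      parts.forEach((line) => onLine(line));
    },
    flush() {
      if (pending !== '') onLine(pending);
      pending = '';
    },
  };
}

// Hypothetical helper: runs a command in `cwd` and forwards every output
// line through `emit`. Resolves with the exit code (null if killed by a signal).
function streamCommand(command, args, cwd, emit) {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args, { cwd });
    const out = createLineSplitter((line) => emit('terraform_log', { type: 'normal', line }));
    const err = createLineSplitter((line) => emit('terraform_log', { type: 'error', line }));
    child.stdout.setEncoding('utf8'); // decode multi-byte characters safely across chunks
    child.stderr.setEncoding('utf8');
    child.stdout.on('data', (chunk) => out.push(chunk));
    child.stderr.on('data', (chunk) => err.push(chunk));
    child.on('error', reject); // e.g. the binary is not installed
    child.on('close', (code) => {
      out.flush();
      err.flush();
      resolve(code);
    });
  });
}
```

Splitting on complete lines matters because a pipe delivers arbitrary chunks, so a single Terraform log line can arrive in several pieces. A call such as streamCommand('terraform', ['init'], templateDir, emit) would then stream one event per line.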
CloudForge is structured as a decoupled, multi-container system that orchestrates local process execution, databases, and remote clouds.
graph TB
subgraph Client Layer [Client Interface]
FE[React + Vite SPA]
SocketClient[Socket.io-Client]
end
subgraph Internal Developer Platform Portal [CloudForge Platform]
NGINX[Nginx Container: Port 8080]
BE[Node.js + Express API: Port 3000]
SocketServer[Socket.io Server]
SQLite[(SQLite Database: cloudforge.db)]
TFE[Terraform CLI Engine]
AWSCLI[AWS CLI & SDK]
end
subgraph AWS Cloud [Target Cloud Infrastructure]
SM[AWS Secrets Manager]
RDS[RDS Aurora PostgreSQL]
EC2[EC2 Compute Nodes]
S3[S3 Storage Buckets]
Redis[ElastiCache Redis]
Lambda[AWS Lambda Functions]
end
%% Client Interactions
FE -->|HTTP REST APIs| NGINX
NGINX -->|Reverse Proxy| BE
SocketClient <-->|Live Bi-directional WebSockets| SocketServer
%% Backend Orchestration
BE -->|Tracks State, Catalog, Stats & Activity| SQLite
SocketServer -->|Spawns Async Processes| TFE
TFE -->|State Outputs| BE
AWSCLI -->|sts:GetCallerIdentity validation| AWS Cloud
AWSCLI -->|secretsmanager:GetSecretValue| SM
TFE -->|Provisions Resources| AWS Cloud
%% Dynamic Secret Extraction
SM -.->|Delivers Encryption Keys & DB Master Passwords| AWSCLI
The real-time orchestration system of CloudForge utilizes an asynchronous, event-driven pattern designed to handle long-running infrastructure actions safely.
This sequence diagram tracks the entire lifecycle of a provisioning event—from the user pushing the "Deploy" button to the rendering of retrieved cloud credentials.
sequenceDiagram
autonumber
actor Dev as Developer (UI)
participant API as Express API
participant DB as SQLite DB
participant WS as Socket.io Server
participant TF as Terraform Engine
participant AWS as AWS Cloud / Secrets Manager
Dev->>API: POST /api/provision (engine, capacity, env)
activate API
API->>API: Validate AWS STS caller identity
API->>DB: INSERT INTO resources (status='Provisioning', environment, engine)
API->>DB: INSERT INTO activity (status='PENDING', action='provisioned')
API->>DB: Recalculate Stats & cost estimates
API-->>Dev: HTTP 202 Accepted (provisionId)
deactivate API
Dev->>WS: Establish Connection (Socket.io)
Dev->>WS: Emit 'start_provisioning' { provisionId, engine, capacity }
activate WS
WS->>WS: Map engine -> /terraform/templates/aws-[service]
WS->>TF: Spawn Child Process: terraform init
activate TF
loop Stream Stdout & Stderr
TF-->>WS: Stream line-by-line output
WS-->>Dev: Emit 'terraform_log' (normal/error type)
end
TF-->>WS: Process closed (Exit Code 0)
deactivate TF
WS->>TF: Spawn Child Process: terraform apply -auto-approve -var=capacity=...
activate TF
loop Stream Stdout & Stderr
TF-->>WS: Stream line-by-line output
WS-->>Dev: Emit 'terraform_log' (normal/error type)
end
TF-->>WS: Process closed (Exit Code 0)
deactivate TF
Note over WS,TF: Provisioning completed. Extracting outputs.
WS->>TF: Spawn Process: terraform output -json
activate TF
TF-->>WS: Return JSON output containing secret_arn
deactivate TF
WS->>AWS: Spawn AWS CLI: aws secretsmanager get-secret-value --secret-id [secretArn]
activate AWS
AWS-->>WS: Return SecretString (JSON string with Username, Password, IP, etc.)
deactivate AWS
WS->>DB: UPDATE resources SET status='Active', ip=[endpoint], credentials=[JSON] WHERE id=[provisionId]
WS-->>Dev: Emit 'terraform_complete' { success: true, credentials }
deactivate WS
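
The output-extraction steps in the sequence above can be sketched as two small parsers. This assumes the Terraform output is named secret_arn (as in the diagram) and that the secret was stored as a JSON document. The function names and the username/password fields in the usage note are illustrative assumptions, not taken from the CloudForge source.

```javascript
'use strict';

// `terraform output -json` wraps each output as { sensitive, type, value }.
function extractSecretArn(terraformOutputJson) {
  const outputs = JSON.parse(terraformOutputJson);
  const entry = outputs.secret_arn;
  if (!entry || typeof entry.value !== 'string' || entry.value === '') {
    throw new Error('Terraform output "secret_arn" is missing or empty');
  }
  return entry.value;
}

// `aws secretsmanager get-secret-value` prints a JSON document whose
// SecretString field holds the secret itself (assumed here to be JSON too).
function parseSecretPayload(cliResponseJson) {
  const response = JSON.parse(cliResponseJson);
  if (typeof response.SecretString !== 'string') {
    throw new Error('Response does not contain a SecretString');
  }
  return JSON.parse(response.SecretString);
}
```

For example, parseSecretPayload applied to a response whose SecretString is '{"username":"admin","password":"..."}' returns an object with those two fields. Neither function logs its input, which keeps secret values out of console output.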
flowchart TD
subgraph Destroy Workflow
D1[User clicks 'Destroy' in UI] --> D2[DELETE /api/resources/:id]
D2 --> D3[DB: Update status to 'Terminating']
D3 --> D4[Asynchronously spawn 'terraform destroy -auto-approve']
D4 --> D5{Exit Code?}
D5 -->|0: Success| D6[DB: Delete resource row & add activity]
D5 -->|Non-zero: Error| D7[DB: Set status to 'Error' & report log]
end
subgraph Fail & Rollback Workflow
R1[Terraform Apply Fails] --> R2[DB: Set resource status to 'Error']
R2 --> R3[Trigger 'handleRollback' script]
R3 --> R4[DB: Remove the failed resource row]
R4 --> R5[DB: Add a failed/rollback activity event]
R5 --> R6[WebSocket: Send 'terraform_complete' with success=false]
end
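
The two workflows above reduce to a small amount of branching on the Terraform exit code. The sketch below is one way to express that, under stated assumptions: db is a hypothetical data-access object whose method names (setStatus, deleteResource, addActivity) are invented for illustration, and emit stands in for the Socket.io emitter. Only the 'Error' status and the terraform_complete event come from the diagrams.

```javascript
'use strict';

// Destroy workflow: called when `terraform destroy` closes.
function finishDestroy(db, resourceId, exitCode) {
  if (exitCode === 0) {
    db.deleteResource(resourceId);
    db.addActivity(resourceId, 'destroyed');
    return { success: true };
  }
  // Any non-zero code, or null when the process was killed by a signal,
  // leaves the row in place and marks it as failed.
  db.setStatus(resourceId, 'Error');
  return { success: false };
}

// Fail & rollback workflow: called when `terraform apply` fails.
function rollbackFailedApply(db, emit, resourceId) {
  db.setStatus(resourceId, 'Error');
  db.deleteResource(resourceId);
  db.addActivity(resourceId, 'rollback');
  emit('terraform_complete', { success: false });
}
```

Passing db and emit in as arguments keeps the branching logic testable without a real SQLite file or socket connection.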
cloudForge/
├── .github/
│ └── workflows/
│ └── build.yml # CI/CD Pipeline (Build, Test, & Docker compliance checks)
├── backend/
│ ├── data/
│ │ └── cloudforge.db # Local SQLite database (gitignored, created on startup)
│ ├── database/ # Seed files & SQLite connection configurations
│ ├── routes/
│ │ └── api.js # Main Express Router (REST endpoints for resources, catalog, & credentials)
│ ├── services/
│ │ ├── awsValidator.js # Pre-flight AWS connectivity checks & sts validation
│ │ ├── database.js # Data Access Objects (DAO), SQLite schemas, seeding, & stats tracking
│ │ └── terraformRunner.js # Spawn-based orchestration engine & AWS Secrets Manager harvester
│ ├── __tests__/
│ │ └── sanity.test.js # Jest testing suite for the backend controllers
│ ├── .env # Local backend environment variables (gitignored)
│ ├── Dockerfile # Multi-stage Docker deployment build for the Node backend
│ ├── package.json
│ └── server.js # Node Express & Socket.io server entrypoint
├── frontend/
│ ├── public/
│ ├── src/
│ │ ├── components/ # Reusable UI containers (Terminal, Sidebar, Header, Status)
│ │ ├── hooks/
│ │ │ └── useSocket.js # Socket.io connection, log buffer, & state hooks
│ │ ├── pages/ # Routed pages (Dashboard, Catalog, Provisioning, ActiveResources, Docs)
│ │ ├── App.css
│ │ ├── App.jsx # App routing & viewport shell
│ │ ├── index.css # Design system tokens & typography
│ │ └── main.jsx # Application root mounting
│ ├── __tests__/
│ │ └── sanity.test.cjs # Frontend integration tests (Jest)
│ ├── Dockerfile # Nginx static server builder for React production bundles
│ ├── nginx.conf # Nginx redirection and reverse proxy server configuration
│ ├── package.json
│ ├── postcss.config.js
│ ├── tailwind.config.js # Design system theme palette (surfaces, highlights, borders)
│ └── vite.config.js # Vite bundling configs
├── terraform/
│ └── templates/ # Terraform AWS blueprints
│ ├── aws-aurora-postgres/ # Multi-AZ Aurora Serverless/Provisioned PG clusters
│ ├── aws-ec2/ # Standard VPC subnets, elastic IPs, key pairs, & Security Groups
│ ├── aws-elasticache-redis/ # High-performance in-memory ElastiCache clusters
│ ├── aws-lambda/ # Serverless microservices, zip bundles, & API Gateways
│ └── aws-s3/ # Standard S3 object buckets with versioning & KMS encryption
└── docker-compose.yml # Local multi-service orchestrator definition
Each folder in terraform/templates contains isolated, self-contained Terraform code modeled around modern Cloud Architecture security principles.
- EC2 Instance (aws-ec2): Generates a 24-character random password (random_password) and an RSA 4096-bit SSH key pair (tls_private_key). It deploys the instance within the default VPC, provisions an Elastic IP (EIP), registers an auto-expiring AWS Secrets Manager secret, and uses standard user_data to automatically configure host authentication with the random password inside Amazon Linux.
- Aurora PostgreSQL (aws-aurora-postgres): Deploys a managed PostgreSQL database instance. Avoids hardcoded admin credentials by generating credentials dynamically, storing them in AWS Secrets Manager, and outputting endpoints.
- ElastiCache Redis (aws-elasticache-redis): Configures Redis memory caching layers. Supports node tailoring and secure connectivity within subnet groups.
- AWS Lambda (aws-lambda): Packages serverless functions on the fly, hooks them up to API Gateway v2 (HTTP API) endpoints, and outputs live execution HTTP endpoints.
- S3 Bucket (aws-s3): Deploys fully versioned, private S3 buckets secured with KMS-based server-side encryption (SSE-KMS).
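
Because each blueprint lives in its own folder, selecting one comes down to a lookup from the requested engine to a directory. The sketch below assumes the /terraform/templates mount path used elsewhere in this document. The engine keys ('ec2', 'aurora-postgres', and so on) are hypothetical identifiers; only the directory names come from the repository layout.

```javascript
'use strict';
const path = require('path');

const TEMPLATE_ROOT = '/terraform/templates';

// Keys are hypothetical engine identifiers; values are the blueprint folders.
const TEMPLATE_DIRS = new Map([
  ['ec2', 'aws-ec2'],
  ['aurora-postgres', 'aws-aurora-postgres'],
  ['elasticache-redis', 'aws-elasticache-redis'],
  ['lambda', 'aws-lambda'],
  ['s3', 'aws-s3'],
]);

// Resolves an engine name to its template directory, rejecting anything
// that is not explicitly listed above.
function resolveTemplateDir(engine) {
  const dir = TEMPLATE_DIRS.get(engine);
  if (!dir) {
    throw new Error(`Unknown engine: ${String(engine)}`);
  }
  return path.posix.join(TEMPLATE_ROOT, dir);
}
```

An allow-list is used instead of building the path from user input, so a request such as '../../etc' fails the lookup rather than escaping the templates folder.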
Deploying CloudForge locally takes less than five minutes using Docker Compose.
Ensure you have the following installed on your host system:
- Docker & Docker Compose
- AWS CLI with configured credentials (run aws configure to set them up, or aws sts get-caller-identity to verify them).
- Access to an active AWS account (e.g., a sandbox account or IAM user with the required resource privileges).
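
A pre-flight credential check along these lines can confirm that the AWS CLI is installed and authenticated before any provisioning starts. This is a sketch, not the contents of awsValidator.js: parseCallerIdentity and checkAwsCredentials are hypothetical names, and the check shells out to the AWS CLI rather than using the SDK.

```javascript
'use strict';
const { execFile } = require('child_process');

// Parses the JSON printed by `aws sts get-caller-identity`.
function parseCallerIdentity(stdout) {
  const identity = JSON.parse(stdout);
  if (!identity.Account || !identity.Arn) {
    throw new Error('Unexpected get-caller-identity response');
  }
  return { account: identity.Account, arn: identity.Arn };
}

// Rejects if the CLI is missing, credentials are invalid, or the
// response cannot be parsed.
function checkAwsCredentials() {
  return new Promise((resolve, reject) => {
    const args = ['sts', 'get-caller-identity', '--output', 'json'];
    execFile('aws', args, (error, stdout) => {
      if (error) return reject(error);
      try {
        resolve(parseCallerIdentity(stdout));
      } catch (parseError) {
        reject(parseError);
      }
    });
  });
}
```

Calling checkAwsCredentials() resolves with the account ID and caller ARN when the configured credentials are valid.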
Create a .env file in the backend/ directory:
# cloudForge/backend/.env
PORT=3000
NODE_ENV=production
From the root of the project, run:
docker-compose up --build -d
This builds and starts:
- idp-backend: Exposed on http://localhost:3000 (Socket.io endpoint & REST API).
- idp-frontend: Exposed on http://localhost:8080 (React application running in Nginx).
To execute real Terraform actions on AWS, the container must access valid credentials. Instead of hardcoding credentials, docker-compose.yml mounts your host's local ~/.aws folder directly to the backend's runtime profile:
# Extract from docker-compose.yml
idp-backend:
  volumes:
    - ./terraform:/terraform # Mounts Terraform templates
    - ${HOME}/.aws:/home/node/.aws:ro # Mounts host AWS credentials securely (read-only)
  environment:
    - AWS_SDK_LOAD_CONFIG=1 # Directs AWS SDK to load config profiles
For quick development changes or debugging, you can run the backend and frontend locally without Docker.
Install dependencies, verify your local terraform and aws installations, and start the node service:
cd backend
npm install
node server.js
The backend automatically initializes its SQLite database at backend/data/cloudforge.db and listens on port 3000.
Install frontend packages and launch the Vite development server:
cd ../frontend
npm install
npm run dev
Open http://localhost:5173 to interact with the developer portal.
CloudForge includes Jest test suites for both the frontend and the backend, plus a syntax check of the backend entrypoint.
graph LR
subgraph Testing Framework [Jest Test Suites]
FE_Test[Frontend Integration Tests]
BE_Test[Backend Sanity Tests]
Syntax[Server.js Syntax Compliance]
end
FE_Test -->|npm test| Jest_FE[Jest + ESM Config]
BE_Test -->|npm test| Jest_BE[Jest Backend]
Syntax -->|node --check| Engine[V8 Engine Check]
Run the frontend tests:
cd frontend
npm run test
Run the backend tests:
cd backend
npm run test
Check the backend entrypoint for syntax errors:
cd backend
node --check server.js
CloudForge uses a GitHub Actions pipeline to build and test the code on every push to main or dev and on every pull request to main. The configuration is defined in build.yml.
graph TD
Trigger[GitHub Event: Push to main/dev or PR to main] --> Checkout[actions/checkout@v4]
subgraph Test Phase [Parallel Checks]
Checkout --> FE_Job[Frontend Job]
Checkout --> BE_Job[Backend Job]
end
subgraph Frontend Verification
FE_Job --> FE_Node[Setup Node.js 20 & Cache]
FE_Node --> FE_Install[npm ci]
FE_Install --> FE_RunTest[npm test]
FE_RunTest --> FE_Build[npm run build]
end
subgraph Backend Verification
BE_Job --> BE_Node[Setup Node.js 20 & Cache]
BE_Node --> BE_Install[npm ci]
BE_Install --> BE_RunTest[npm test]
BE_RunTest --> BE_Check[node --check server.js]
end
FE_Build & BE_Check --> Docker_Job[Docker Job: Build Docker Images]
subgraph Docker Verification
Docker_Job --> Docker_Set[Set up Docker Buildx]
Docker_Set --> Docker_Build_BE[Build Backend Docker Image]
Docker_Set --> Docker_Build_FE[Build Frontend Docker Image]
end
Docker_Build_BE & Docker_Build_FE --> Complete[CI Checks Passed 🎉]
- Build Matrix & Caching: Caches dependencies across builds via actions/setup-node caching keyed to the package-lock.json paths.
- Docker Validation: Uses docker/setup-buildx-action and docker/build-push-action with the GitHub Actions cache backend (type=gha) to confirm that both the backend and frontend images build, without publishing them.
CloudForge is licensed under the ISC License.
For additions, security fixes, or cloud blueprint modifications:
- Ensure all custom AWS resources define tags: ManagedBy = "IDP-Orchestrator".
- Secrets generated during Terraform provisioning should be created dynamically with random_password resources rather than hardcoded, with connection inputs referencing the resulting Secret values.
- Ensure both frontend and backend test suites pass (npm run test) before opening pull requests.