Nova Forge — Devpost Submission Draft

Project Name

Nova Forge

Tagline

Describe it. Nova builds it. Open-source agent orchestration that turns natural language into deployed applications.

Inspiration

We've built 89+ production services using multi-agent orchestration patterns — parallel task execution, quality gates, artifact handoffs, autonomy systems. But those patterns were locked inside a proprietary system tied to a single LLM provider.

When Amazon announced the Nova AI Hackathon, we saw the opportunity to extract those battle-tested patterns into an open-source framework powered by Amazon Nova. The question wasn't "can Nova build software?" — it was "can we build an orchestration layer that makes Nova as productive as possible?"

The answer is Nova Forge: 30,000 lines of Python, 1,670 tests, and all 3 Nova models achieving S-tier benchmark scores.

What it does

Nova Forge takes a natural language description of what you want to build and orchestrates Amazon Nova models to plan, build, review, and deploy it automatically.

Example: Type "Build an expense tracker with categories, monthly charts, and CSV export" — Nova Forge:

Interviews you with a 3-phase deep planning session (scope, technical decisions, risk assessment)
Plans the project as a dependency graph of tasks with topological sort
Builds tasks in parallel waves using multiple AI agents with pre-seeded upstream context
Reviews the output with an adversarial gate reviewer (PASS/FAIL/CONDITIONAL)
Previews the result via Cloudflare Tunnel (shareable URL, no account needed)
Deploys to production with Docker + nginx + SSL in one command

Result: A working Flask application with 5 files in 144 seconds. Benchmark grade: S (100%).

Key capabilities:

3 Amazon Nova models — Lite (32K, fast), Pro (300K, balanced), Premier (1M, deep reasoning)
11 agent formations — Pre-built team layouts from single-file edits to full-stack multi-agent builds
13 agent tools — read, write, edit, bash, glob, grep, remember, and more
6-level autonomy system — A0 (ask for everything) through A5 (fully autonomous)
14-stack preview detection — Auto-detects Flask, FastAPI, Node, React, and 10 more
Circuit breaker — Auto-disables failing tools to prevent infinite retry loops
Agent self-correction — Agents verify their own output with read-back checks

How we built it

Pure Python, no dependencies on proprietary agent frameworks. Nova Forge is a standalone tool-use loop that calls Amazon Nova via AWS Bedrock's Converse API.

Architecture:

ForgeAgent — The core loop: send prompt → Nova responds with tool calls → execute tools → loop until done. 13 tools available (read_file, write_file, bash, remember, etc.)
ModelRouter — Adapter for AWS Bedrock with custom botocore.Config (300s read timeout for Premier's ~100s inference time)
WaveExecutor — Runs independent tasks in parallel using asyncio.gather() with semaphore throttling
GateReviewer — Spawns a read-only reviewer agent that adversarially checks the build output
PromptBuilder — 3-tier system prompts: Slim (~600 chars for 32K Lite), Focused (~1,500 chars for 300K+ Pro/Premier), Full (~5K chars for 1M+)

The 3-tier prompt system was the single biggest breakthrough. Nova Lite's 32K context window means every token matters. We created a slim prompt that strips tools to essentials and adds output coaching ("respond with JSON, not markdown"). This took Lite from C-grade to S-grade.

Pre-seeded upstream context was the second breakthrough. Instead of letting dependent tasks discover files through tool calls (2-3 turns wasted), we inject the file content from completed upstream tasks directly into the prompt. Premier went from A to S with this change.

19 sprints in 7 days. We tracked every change through a sprint system, enabling rapid iteration on model-specific optimizations.

Challenges we ran into

Nova Lite's 32K context window — Our initial system prompts used ~5,000 characters. At 32K, that's a significant fraction of the budget. We had to create an entirely different prompt tier (Slim) with minimal tools and coaching to make Lite work.
Nova Premier's inference time — Premier takes ~100 seconds per inference call. The default boto3 timeout of 60 seconds caused every Premier request to fail. We had to discover and configure botocore.Config(read_timeout=300) — this isn't well-documented.
Malformed JSON from LLM output — All Nova models occasionally produce trailing commas, truncated JSON, or markdown fences around JSON. We built _recover_json() with progressive fallback: strip fences → fix trailing commas → extract JSON substring → parse partial.
Agent tool failure spirals — When a tool fails (e.g., edit_file with wrong old_string), agents would retry the exact same call repeatedly. We implemented a circuit breaker that disables a tool after 3 consecutive failures, forcing the agent to try alternative approaches.
Context window exhaustion — On long builds (30+ turns), the conversation exceeds the model's context. We implemented budget-based compaction (60% threshold for 32K, 65% for 200K+) that preserves toolUse/toolResult pairs to avoid breaking the conversation structure.

Accomplishments that we're proud of

All 3 Nova models at S-tier on the Expense Tracker benchmark (100%, 100%, 100%)
A-tier on the Hard scenario (Kanban Board: auth + 3 tables + 7 files) across all 3 models
1,670 tests passing across 50 test files — comprehensive coverage of the entire framework
19 sprints in 7 days — from zero to competition-ready with disciplined iteration
Grade progression C → S through systematic model-specific optimization (not prompt hacking)
Pure Python — no dependency on Claude Code, Cursor, or any proprietary agent framework. The orchestration patterns are the innovation, not the model.

What we learned

Model-specific prompts matter enormously. A one-size-fits-all system prompt works poorly across 32K-1M context windows. Tailoring the prompt to the model's capacity was the #1 performance lever.
Pre-seeding context is better than letting agents discover it. Injecting upstream file content saves 2-3 turns per task — and with Premier's ~100s/turn, that's 3-5 minutes saved per task.
Adversarial review catches real bugs. The gate reviewer (a separate Nova agent with read-only tools) found issues that the building agents missed in ~30% of builds.
Amazon Nova is genuinely capable of complex multi-file builds. With the right orchestration, even Nova Lite (the smallest model) can build a full-stack app with auth, database, and frontend — and score S-tier.

Enterprise & Community Impact

For developers and teams:

Democratizes AI-powered development — Any developer with AWS credentials can go from idea to deployed app in minutes, not days. No proprietary frameworks, no vendor lock-in.
Open-source with community infrastructure — MIT license, CONTRIBUTING.md, issue templates, CI pipeline, and a comprehensive User Guide. Ready for community contributions from day one.
Model-agnostic architecture — While built on Amazon Nova, the provider adapter pattern supports any LLM with function calling. Teams can swap in their preferred model without rewriting orchestration logic.
Production patterns, not demos — The formations, autonomy system, gate review, and circuit breaker patterns come from managing 89+ production services. These are battle-tested patterns, not hackathon prototypes.

For the Amazon Nova ecosystem:

Proves Nova can build real software — All 3 Nova models score S-tier on our benchmark. This is evidence that Nova is production-capable for agentic coding, not just chat.
Showcases Bedrock Converse API — Pure boto3 integration with documented solutions for undocumented challenges (300s timeout, stop_reason detection, max_tokens scaling).
Reference implementation — 30,000 lines of well-tested Python showing how to build a serious agent framework on Amazon Nova. Forkable, extensible, and ready for enterprise adoption.

For education:

Comprehensive user guide at forge.herakles.dev/guide.html — 12 sections covering models, formations, autonomy, and advanced usage.
7 interactive demo apps built entirely by Nova agents — browse the source to learn how AI builds software.
1,670 tests serve as executable documentation of every framework behavior.

What's next

More benchmark scenarios — Expand beyond Expense Tracker and Kanban Board to test against more diverse application types
Real-time streaming UI — Replace the CLI spinner with a web-based build monitor showing live agent activity
Multi-model orchestration — Use Lite for simple tasks and Premier for complex ones within the same build, automatically routing by task complexity
Community templates — Let users contribute app templates and formation patterns
Plugin system — Allow custom tools beyond the built-in 13

Built with

Python 3.11
Amazon Nova (Lite, Pro, Premier) via AWS Bedrock
AWS Bedrock Converse API
boto3 + botocore
Click (CLI framework)
Rich (terminal UI)
pytest (testing)
Flask (website + Ask Nova chat)
Docker + nginx (deployment)
Cloudflare Tunnels (preview)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Nova Forge — Devpost Submission Draft

Project Name

Tagline

Inspiration

What it does

Key capabilities:

How we built it

Challenges we ran into

Accomplishments that we're proud of

What we learned

Enterprise & Community Impact

What's next

Built with

Category

Links

FilesExpand file tree

DEVPOST.md

Latest commit

History

DEVPOST.md

File metadata and controls

Nova Forge — Devpost Submission Draft

Project Name

Tagline

Inspiration

What it does

Key capabilities:

How we built it

Challenges we ran into

Accomplishments that we're proud of

What we learned

Enterprise & Community Impact

What's next

Built with

Category

Links