An AI-powered tool that transforms a feature idea into a structured, developer-ready PRD in under 30 seconds — complete with competitive research, typed user stories, acceptance criteria, success metrics, and engineering-ready edge cases.
🟢 Live demo: pm-spec-prd-generator.vercel.app
Built for product managers who spend 4–8 hours writing specs that could be written in minutes with the right AI pipeline behind them.
Writing a PRD from a raw idea is one of the highest-friction tasks in a PM's week. The process is sequential and labor-intensive: research competitors, translate vague requirements into testable user stories, anticipate edge cases engineering will surface anyway, define metrics with baselines, and surface the open questions that will block the build.
Most teams short-circuit this process and ship specs that are thin on competitive context, weak on acceptance criteria, and missing the edge cases that come back as scope creep. The result is misaligned engineering work, rework cycles, and delayed launches.
This tool compresses that process into a single AI pipeline that does the research, structures the output, and lets the PM iterate through natural language — so the spec that reaches engineering is grounded, specific, and ready for review.
Given a feature description (2–4 sentences), the tool outputs a complete PRD structured as:
| Section | Description |
|---|---|
| Problem statement | Clear articulation of user pain and current gap in the market |
| Goal | Measurable success definition — for the product and for the user |
| Target personas | 2–3 specific user types with role and behavioral context |
| Competitive context | How existing products solve this today, and where the opportunity lies — sourced from live web research |
| User stories | 3 stories in As a / I want / So that format with priority classification |
| Acceptance criteria | Given/When/Then criteria per story — specific, testable, implementation-ready |
| Success metrics | 3 metrics with current baseline, target, and timeframe |
| Out of scope | Explicit boundaries to prevent scope creep |
| Edge cases | 4 engineering-relevant scenarios the build must handle |
| Risks | Delivery and adoption risks with mitigations |
| Open questions | Unresolved decisions that must be answered before build starts |
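
The section list maps onto nested types. The sketch below mirrors that shape with standard-library dataclasses so it runs without dependencies. The project itself uses a Pydantic v2 `PRDSpec` in `backend/models/schemas.py`, and every class and field name here is a hypothetical stand-in, not the repo's actual schema. The count checks reflect the output caps described under the known limitations (3 stories, 2 criteria per story, 3 metrics, 4 edge cases).

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class AcceptanceCriterion:
    given: str
    when: str
    then: str


@dataclass
class UserStory:
    persona: str   # "As a ..."
    want: str      # "I want ..."
    benefit: str   # "So that ..."
    priority: str  # e.g. "P0" / "P1" / "P2" (hypothetical labels)
    acceptance_criteria: List[AcceptanceCriterion] = field(default_factory=list)


@dataclass
class SuccessMetric:
    name: str
    baseline: str
    target: str
    timeframe: str


@dataclass
class PRDSpec:
    problem_statement: str
    goal: str
    personas: List[str]
    competitive_context: str
    user_stories: List[UserStory]
    success_metrics: List[SuccessMetric]
    out_of_scope: List[str]
    edge_cases: List[str]
    risks: List[str]
    open_questions: List[str]

    def __post_init__(self) -> None:
        # Enforce the section counts the generator targets; raise instead of
        # letting a malformed spec reach the UI.
        if not 2 <= len(self.personas) <= 3:
            raise ValueError("expected 2-3 personas")
        if len(self.user_stories) != 3:
            raise ValueError("expected exactly 3 user stories")
        for story in self.user_stories:
            if len(story.acceptance_criteria) != 2:
                raise ValueError("expected 2 acceptance criteria per story")
        if len(self.success_metrics) != 3:
            raise ValueError("expected exactly 3 success metrics")
        if len(self.edge_cases) != 4:
            raise ValueError("expected exactly 4 edge cases")
```

In Pydantic v2 the same constraints would typically be written as `Field(min_length=..., max_length=...)` on the list fields; a validation failure is what gives Instructor the signal to re-ask the model.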
Export formats: Word (.docx) · PDF · Markdown · Google Docs
The system runs three stages in sequence, each using a different AI capability:

```
    Feature idea (plain language input)
                     │
                     ▼
┌─────────────────────────────────────────┐
│ Stage 1 — Competitive Research          │
│                                         │
│ Claude runs live web search to find     │
│ how existing products handle this       │
│ feature today. Sources: product pages,  │
│ G2 reviews, Reddit, changelog posts.    │
│                                         │
│ Output: grounded competitive summary    │
└────────────────────┬────────────────────┘
                     │ injected as context
                     ▼
┌─────────────────────────────────────────┐
│ Stage 2 — Spec Generation               │
│                                         │
│ Claude + Instructor generates a full    │
│ PRD against a strict Pydantic schema.   │
│ Every field — including nested          │
│ acceptance criteria — is typed and      │
│ validated before leaving the server.    │
│                                         │
│ Output: PRDSpec (structured object)     │
└────────────────────┬────────────────────┘
                     │ streamed to UI via SSE
                     ▼
┌─────────────────────────────────────────┐
│ Stage 3 — Refinement Loop               │
│                                         │
│ PM iterates through natural language.   │
│ Each instruction sends the full PRD     │
│ as context — Claude edits targeted      │
│ sections while preserving the rest.     │
│                                         │
│ Output: updated PRD, same schema        │
└─────────────────────────────────────────┘
```

Research before writing. A PRD written without competitive context produces user stories that already exist in three competing products. Running the research agent first grounds the spec in what the market does today — and where the actual gap is.
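
The ordering can be sketched as two plain functions, with the research summary placed into the spec-writing prompt. This is a minimal sketch assuming the model calls are injected as callables (`search` and `write_spec` are hypothetical), so it runs with stubs and says nothing about the actual agent code in `backend/agents/`.

```python
from typing import Callable, Dict

# Hypothetical injected model calls: a web-search research step and a
# structured spec writer (both would wrap Claude in the real pipeline).
Search = Callable[[str], str]
Writer = Callable[[str], Dict]


def build_spec_prompt(idea: str, research: str) -> str:
    # Stage 1 output is placed in the Stage 2 prompt as grounding context.
    return (
        "Write a PRD for the feature below.\n\n"
        f"Feature idea:\n{idea}\n\n"
        f"Competitive research (ground the spec in this):\n{research}\n"
    )


def run_pipeline(idea: str, search: Search, write_spec: Writer) -> Dict:
    research = search(idea)                     # Stage 1: competitive research
    prompt = build_spec_prompt(idea, research)  # research injected as context
    return write_spec(prompt)                   # Stage 2: structured spec
```

Because the research result is computed before the prompt is built, the writer never runs without competitive context, and passing stubs for `search` and `write_spec` is enough to exercise that ordering in tests.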
Typed output over free text. Unstructured LLM output cannot drive a product UI reliably. Every API call uses Instructor to enforce the PRDSpec Pydantic schema — including nested acceptance criteria — so the frontend always receives a predictable, validated object.
Streaming over polling. A 30-second request with no feedback degrades trust in the tool. SSE lets the UI surface exactly what is happening at each stage — research fired, research complete with a snippet of findings, writing in progress, done. The live elapsed timer makes the cost of each step visible.
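
Server-Sent Events are plain text over a long-lived HTTP response: each message is an `event:` line and a `data:` line terminated by a blank line. The sketch below shows that framing for stage events like the ones described above. The event names (`research_start`, `research_done`, `writing`, `done`) and payload fields are hypothetical; in FastAPI, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`.

```python
import json
from typing import Dict, Iterator


def sse_event(event: str, data: Dict) -> str:
    """Frame one Server-Sent Event: event name, one JSON data line, blank line."""
    payload = json.dumps(data)  # escapes newlines, so the data stays on one line
    return f"event: {event}\ndata: {payload}\n\n"


def pipeline_events(research_snippet: str, spec: Dict) -> Iterator[str]:
    # One event per visible pipeline stage, in order.
    yield sse_event("research_start", {"stage": 1})
    yield sse_event("research_done", {"stage": 1, "snippet": research_snippet[:200]})
    yield sse_event("writing", {"stage": 2})
    yield sse_event("done", {"stage": 2, "spec": spec})
```

On the browser side, a `ReadableStream` reader would split the incoming text on blank lines and parse each `data:` line as JSON.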
Full context in refinement. Sending only a summary to the refinement agent loses the detail needed for targeted edits. The full PRD serialized as JSON is injected into each refinement prompt so Claude can make precise changes without reconstructing the spec from scratch.
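
A refinement prompt along those lines might look like the sketch below: the entire current PRD is serialized with `json.dumps` and placed ahead of the PM's instruction, with a request to return the complete spec in the same shape. The prompt wording and the `build_refine_prompt` name are illustrative assumptions, not the prompt used in `spec_writer.py`.

```python
import json
from typing import Any, Dict


def build_refine_prompt(prd: Dict[str, Any], instruction: str) -> str:
    # Serialize the whole spec, not a summary, so targeted edits can leave
    # every untouched section exactly as it was.
    prd_json = json.dumps(prd, indent=2, ensure_ascii=False)
    return (
        "You are editing an existing PRD. Apply the instruction to the relevant "
        "sections only and return the complete PRD in the same JSON structure.\n\n"
        f"Current PRD:\n{prd_json}\n\n"
        f"Instruction:\n{instruction}\n"
    )
```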
| Component | Technology | Implementation detail |
|---|---|---|
| LLM | claude-haiku-4-5-20251001 | Research agent, spec writer, and refinement loop |
| Web search | web_search_20250305 tool (Anthropic) | Live competitor research injected into spec prompt |
| Structured output | Instructor + Pydantic v2 | PRDSpec schema enforced on every LLM call |
| Streaming | FastAPI SSE + ReadableStream | Pipeline stage events pushed to browser in real time |
| Async | asyncio.run_in_executor | Blocking LLM calls offloaded from event loop |
| Backend | FastAPI + Uvicorn | /generate (SSE), /refine, /export/docx, /export/pdf |
| Frontend | React 18 + Vite | Input form, live pipeline view, tabbed PRD viewer |
| Word export | python-docx | Formatted .docx with headings, metrics table, bullets |
| PDF export | fpdf2 | Multi-page PDF with headers, page numbers, clean layout |
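
The `run_in_executor` row describes a standard asyncio pattern: a synchronous SDK call is handed to a thread pool so the event loop can keep serving other requests (including open SSE streams) while it waits. A minimal sketch, with `blocking_llm_call` as a hypothetical stand-in that sleeps instead of calling the API:

```python
import asyncio
import time


def blocking_llm_call(prompt: str) -> str:
    # Hypothetical stand-in for a synchronous SDK request; sleeps instead of
    # calling the API.
    time.sleep(0.2)
    return f"response to: {prompt}"


async def generate(prompt: str) -> str:
    loop = asyncio.get_running_loop()
    # None selects the loop's default ThreadPoolExecutor, so the event loop
    # stays free while the call blocks a worker thread.
    return await loop.run_in_executor(None, blocking_llm_call, prompt)


async def main() -> list:
    # The two blocking calls overlap because each runs in its own worker thread.
    return await asyncio.gather(generate("a"), generate("b"))
```

Run with `asyncio.run(main())`: the two 0.2 s calls finish in roughly 0.2 s in total rather than 0.4 s. On Python 3.9+, `asyncio.to_thread(blocking_llm_call, prompt)` is a shorter way to express the same pattern.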

```
PM-Spec-PRD-Generator/
├── backend/
│   ├── agents/
│   │   ├── research.py             Live web search via Claude tool use
│   │   └── spec_writer.py          PRD generation and refinement via Instructor
│   ├── api/
│   │   ├── main.py                 FastAPI — SSE streaming, endpoints, async patterns
│   │   └── exporters.py            DOCX and PDF generation from PRDSpec
│   ├── models/
│   │   └── schemas.py              Full PRDSpec Pydantic schema with nested types
│   ├── requirements.txt
│   └── .env.example
└── frontend/
    ├── src/
    │   ├── App.jsx                 SSE stream reader, state management, refine loop
    │   └── components/
    │       ├── InputForm.jsx       Feature input with built-in examples
    │       ├── GeneratingView.jsx  Live timer, pipeline stages, agent event log
    │       └── PRDView.jsx         Tabbed PRD viewer, refinement chat, export panel
    └── vite.config.js
```

- Python 3.11+
- Node 18+
- Anthropic API key — console.anthropic.com

```bash
git clone https://github.com/skyplon/PM-Spec-PRD-Generator.git
cd PM-Spec-PRD-Generator/backend
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env
```

```bash
python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
```

```bash
# Must run from project root for relative imports to resolve
cd ..   # back from backend/ to PM-Spec-PRD-Generator/
uvicorn backend.api.main:app --reload --port 8001
```

```bash
# In a second terminal, from the project root
cd frontend
npm install && npm run dev
```

```bash
# After initial setup, use the included start script
./start.sh
```

| Instruction | Effect |
|---|---|
| Make acceptance criteria more specific with exact thresholds and numbers | Rewrites all Given/When/Then criteria with quantified conditions |
| Add 2 more edge cases around API rate limiting and third-party failures | Expands the edge cases section with infrastructure-specific scenarios |
| Rewrite the success metrics with tighter targets and shorter timeframes | Updates baselines, targets, and measurement windows |
| Make this more mobile-focused | Updates personas, stories, and constraints for mobile context |
| Add risks around GDPR compliance for EU users | Expands the risk section with regulatory and data handling considerations |

These are deliberate constraints in the current implementation, not oversights. Each one represents a tradeoff made to ship a working product, with a known path to resolution.
Output volume is intentionally capped.
The spec writer currently generates exactly 3 user stories, 2 acceptance criteria per story, 3 success metrics, and 4 edge cases. This is a workaround, not a design choice. The underlying model (claude-haiku-4-5) has an 8,192 output token ceiling. A fully detailed PRD for a complex feature — with 5–7 stories and 3 AC each — routinely exceeds that limit, causing the Instructor schema validation to fail mid-generation and return nothing. The correct fix is to upgrade to claude-sonnet (64k output tokens) or split generation across two sequential calls: one for stories and metrics, one for risks and edge cases. The current cap ensures consistent, complete output at the cost of depth.
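
The two-call alternative could look roughly like this: each call requests only part of the sections, so each response is smaller, and the halves are merged and checked for completeness before anything is returned. `call_model` and the section groupings are hypothetical; in practice each call would go through Instructor with its own partial schema.

```python
import json
from typing import Callable, Dict, List

# Hypothetical split of the PRD sections across two smaller generation calls.
PART_A: List[str] = ["problem_statement", "goal", "personas",
                     "competitive_context", "user_stories", "success_metrics"]
PART_B: List[str] = ["out_of_scope", "edge_cases", "risks", "open_questions"]

# call_model(prompt, sections) -> dict containing those sections.
ModelCall = Callable[[str, List[str]], Dict]


def generate_in_two_calls(idea: str, research: str, call_model: ModelCall) -> Dict:
    first = call_model(f"Feature idea:\n{idea}\n\nResearch:\n{research}", PART_A)
    # The second call sees the first half, so edge cases and risks can refer
    # to the stories and metrics that were already generated.
    second = call_model(
        f"Feature idea:\n{idea}\n\nSpec so far:\n{json.dumps(first)}", PART_B
    )
    merged = {**first, **second}
    missing = [name for name in PART_A + PART_B if name not in merged]
    if missing:
        raise ValueError(f"missing sections: {missing}")
    return merged
```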
Competitive research quality depends on the feature description. The research agent runs a single web search pass. For well-known feature categories (e.g., "expense approvals," "email tone detection") it surfaces relevant competitor data reliably. For internal tooling, niche verticals, or highly specific B2B workflows, the search results are generic and add limited value to the spec. A multi-query research strategy — breaking the idea into 3–4 targeted searches — would improve coverage significantly.
Refinement does not have memory across sessions. Each refinement call is stateless. If you refine a spec, close the browser, and reopen it, the history is gone. The PRD lives in React state only — there is no persistence layer. This means the tool is suited for single-session spec drafting, not async collaborative workflows where a PM and engineering lead iterate over multiple days.
The generated PRD does not adapt to company-specific context. All specs are generated from general product knowledge. There is no mechanism to inject company terminology, existing design patterns, an established tech stack, or prior PRDs as context. A RAG layer over internal documentation would allow the tool to write specs that sound like they belong to a specific product organization.
Export formatting is functional, not polished. The Word and PDF exports produce clean, readable documents but do not match the visual standards of a professional template (no branded headers, no logo, no custom typography). For internal use this is acceptable. For client-facing or exec-review specs, a template-based export layer would be required.
The following represent the highest-value improvements, ordered by expected impact on output quality and user adoption.
Upgrade to claude-sonnet for unconstrained output depth. Removing the artificial token cap and letting the model generate as many stories, metrics, and edge cases as the feature warrants is the single highest-impact change. The PRD quality in the current version is limited not by the model's reasoning but by its output budget.
Multi-query competitive research. Instead of one broad web search, decompose the feature idea into 3–4 targeted queries: competitor feature pages, user review sites (G2, Reddit, App Store), recent product changelog posts, and job postings (which signal where competitors are investing). This would produce materially richer competitive context sections.
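
The decomposition step can be expressed as a small template expansion. The sketch below turns one topic into the four query types named above; it assumes the feature idea has already been reduced to a short topic phrase, and the query templates are assumptions rather than tested search strings.

```python
from typing import Dict

# Hypothetical query templates for the four research angles listed above.
QUERY_TEMPLATES: Dict[str, str] = {
    "feature_pages": "{topic} feature product page",
    "reviews": "{topic} user reviews G2 Reddit App Store",
    "changelogs": "{topic} changelog release notes",
    "job_postings": "{topic} job postings product engineering",
}


def decompose_research(topic: str) -> Dict[str, str]:
    # Collapse whitespace so multi-line input yields clean search strings.
    clean = " ".join(topic.split())
    return {angle: tpl.format(topic=clean) for angle, tpl in QUERY_TEMPLATES.items()}
```

Each query would then go out as a separate web search, with the results merged and de-duplicated before being summarized into the competitive context section.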
RAG over internal documentation. Allow teams to upload existing PRDs, design principles, a product glossary, and engineering constraints. The spec writer would retrieve relevant context before generating, producing output that uses the company's terminology, respects its established patterns, and doesn't contradict decisions already made.
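
The retrieval step can be illustrated without external services. This sketch ranks uploaded documents against the feature idea by bag-of-words cosine similarity and concatenates the top matches into a context block. A production version would more likely use embeddings and a vector store, and every name here is hypothetical.

```python
import math
import re
from collections import Counter
from typing import Dict, List


def _vector(text: str) -> Counter:
    # Lowercased word counts as a crude document vector.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(idea: str, docs: Dict[str, str], k: int = 2) -> List[str]:
    # Names of the k documents most similar to the feature idea.
    query = _vector(idea)
    ranked = sorted(docs, key=lambda name: _cosine(query, _vector(docs[name])), reverse=True)
    return ranked[:k]


def build_context(idea: str, docs: Dict[str, str], k: int = 2) -> str:
    # Context block to prepend to the spec-writing prompt.
    return "\n\n".join(f"[{name}]\n{docs[name]}" for name in retrieve(idea, docs, k))
```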
Session persistence and async collaboration. Store generated specs in a database with a shareable link. Allow a PM to generate a spec, share it with an engineering lead for async review, and refine it collaboratively over multiple sessions — with a version history showing what changed between iterations.
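
A minimal version of that persistence layer needs a stable share ID per spec and an append-only table of versions. The sketch below uses the standard-library `sqlite3` module with an in-memory database; the table layout, the function names, and the use of `uuid4` for share IDs are assumptions for illustration.

```python
import json
import sqlite3
import uuid
from typing import Dict, List, Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS spec_versions (
               spec_id     TEXT    NOT NULL,
               version     INTEGER NOT NULL,
               instruction TEXT,            -- refinement that produced it
               prd_json    TEXT    NOT NULL,
               PRIMARY KEY (spec_id, version)
           )"""
    )
    return conn


def save_version(conn: sqlite3.Connection, prd: Dict, spec_id: Optional[str] = None,
                 instruction: Optional[str] = None) -> str:
    spec_id = spec_id or uuid.uuid4().hex  # new specs get a shareable ID
    (latest,) = conn.execute(
        "SELECT COALESCE(MAX(version), 0) FROM spec_versions WHERE spec_id = ?",
        (spec_id,),
    ).fetchone()
    conn.execute(
        "INSERT INTO spec_versions VALUES (?, ?, ?, ?)",
        (spec_id, latest + 1, instruction, json.dumps(prd)),
    )
    conn.commit()
    return spec_id


def history(conn: sqlite3.Connection, spec_id: str) -> List[Dict]:
    rows = conn.execute(
        "SELECT version, instruction, prd_json FROM spec_versions "
        "WHERE spec_id = ? ORDER BY version",
        (spec_id,),
    ).fetchall()
    return [{"version": v, "instruction": i, "prd": json.loads(p)} for v, i, p in rows]
```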
PRD scoring and completeness check. After generation, run a second LLM pass that scores the spec against a PM quality rubric: Are the acceptance criteria actually testable? Do the success metrics have realistic baselines? Are the edge cases specific enough for an engineer to act on? Surface a completeness score with specific improvement suggestions before the PM exports.
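
Some of those rubric questions can also be checked mechanically. The sketch below is a deterministic pre-check, not the LLM scoring pass proposed above: it flags acceptance criteria with an empty Given/When/Then part or no number in the "then" clause, and metrics missing a baseline or timeframe. The dict shape and the heuristics are assumptions.

```python
import re
from typing import Dict, List, Tuple

HAS_NUMBER = re.compile(r"\d")


def precheck(prd: Dict) -> Tuple[float, List[str]]:
    """Return (score in [0, 1], list of issues); each check adds at most one issue."""
    issues: List[str] = []
    checks = 0
    for s_idx, story in enumerate(prd.get("user_stories", []), start=1):
        for c_idx, ac in enumerate(story.get("acceptance_criteria", []), start=1):
            checks += 1
            given, when, then = (ac.get(k, "").strip() for k in ("given", "when", "then"))
            if not (given and when and then):
                issues.append(f"story {s_idx} AC {c_idx}: incomplete Given/When/Then")
            elif not HAS_NUMBER.search(then):
                issues.append(f"story {s_idx} AC {c_idx}: 'then' has no measurable threshold")
    for m_idx, metric in enumerate(prd.get("success_metrics", []), start=1):
        checks += 1
        if not metric.get("baseline") or not metric.get("timeframe"):
            issues.append(f"metric {m_idx}: missing baseline or timeframe")
    score = 1.0 - len(issues) / checks if checks else 0.0
    return score, issues
```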
Template library for common feature patterns. Pre-built context packs for recurring feature types — authentication flows, notification systems, onboarding checklists, billing and pricing changes — that inject domain-specific edge cases and risks the model might otherwise miss. A notification system spec should always address delivery failures, opt-out flows, and rate limiting; a billing change spec should always address prorated charges, failed payment retries, and tax handling.
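
A context pack can be as simple as a mapping from feature type to must-cover topics, turned into extra prompt instructions when the idea matches. The sketch below uses the two examples from the paragraph above; the `PACKS` structure and the keyword matching are hypothetical.

```python
from typing import Dict, List

PACKS: Dict[str, Dict[str, List[str]]] = {
    "notifications": {
        "keywords": ["notification", "alert", "push", "digest"],
        "must_cover": ["delivery failures", "opt-out flows", "rate limiting"],
    },
    "billing": {
        "keywords": ["billing", "pricing", "subscription", "invoice"],
        "must_cover": ["prorated charges", "failed payment retries", "tax handling"],
    },
}


def matching_packs(idea: str) -> List[str]:
    # Naive substring match on lowercased text; names returned in PACKS order.
    text = idea.lower()
    return [name for name, pack in PACKS.items()
            if any(word in text for word in pack["keywords"])]


def pack_instructions(idea: str) -> str:
    lines = []
    for name in matching_packs(idea):
        topics = ", ".join(PACKS[name]["must_cover"])
        lines.append(f"This is a {name} feature: edge cases and risks must address {topics}.")
    return "\n".join(lines)
```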
Spec-to-ticket generation. One-click export of each user story as a Jira epic or Linear project, with acceptance criteria mapped to sub-tasks and the open questions filed as blockers. This closes the loop between spec creation and execution tracking.
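
The mapping itself is a data transformation. The sketch below produces tracker-agnostic ticket dicts (one epic per story, one sub-task per acceptance criterion, one blocker per open question). It deliberately does not call the Jira or Linear APIs; the keys and field names are hypothetical placeholders that a real exporter would translate into each tool's own payload format.

```python
from typing import Dict, List


def spec_to_tickets(prd: Dict) -> List[Dict]:
    tickets: List[Dict] = []
    for i, story in enumerate(prd.get("user_stories", []), start=1):
        epic_key = f"EPIC-{i}"
        tickets.append({
            "key": epic_key,
            "type": "epic",
            "title": f"As a {story['persona']}, I want {story['want']}",
            "description": f"So that {story['benefit']}",
        })
        # Each acceptance criterion becomes a sub-task under its story's epic.
        for j, ac in enumerate(story.get("acceptance_criteria", []), start=1):
            tickets.append({
                "key": f"{epic_key}-{j}",
                "type": "subtask",
                "parent": epic_key,
                "title": f"Given {ac['given']}, when {ac['when']}, then {ac['then']}",
            })
    # Open questions are filed as blockers.
    for k, question in enumerate(prd.get("open_questions", []), start=1):
        tickets.append({"key": f"BLOCKER-{k}", "type": "blocker", "title": question})
    return tickets
```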
- AI Meeting Co-pilot — LangGraph agent pipeline that converts meeting transcripts into action items, routed tasks, and follow-up emails
- Prospect-IQ — AI sales intelligence tool for Enterprise SDRs
- GTM-Ops-Agent — AI-powered GTM planning automation for Sales Operations
Juan Manuel Navarrete Solano · Senior Product Manager, Agentic AI & Generative AI