skyplon/PM-Spec-PRD-Generator

PM Spec Generator

An AI-powered tool that transforms a feature idea into a structured, developer-ready PRD in under 30 seconds — complete with competitive research, typed user stories, acceptance criteria, success metrics, and engineering-ready edge cases.

🟢 Live demo: pm-spec-prd-generator.vercel.app

Built for product managers who spend 4–8 hours writing specs that could be written in minutes with the right AI pipeline behind them.


The Problem

Writing a PRD from a raw idea is one of the highest-friction tasks in a PM's week. The process is sequential and labor-intensive: research competitors, translate vague requirements into testable user stories, anticipate edge cases engineering will surface anyway, define metrics with baselines, and surface the open questions that will block the build.

Most teams short-circuit this process and ship specs that are thin on competitive context, weak on acceptance criteria, and missing the edge cases that come back as scope creep. The result is misaligned engineering work, rework cycles, and delayed launches.

This tool compresses that process into a single AI pipeline that does the research, structures the output, and lets the PM iterate through natural language — so the spec that reaches engineering is grounded, specific, and ready for review.


What It Produces

Given a feature description (2–4 sentences), the tool outputs a complete PRD structured as:

| Section | Description |
|---|---|
| Problem statement | Clear articulation of user pain and current gap in the market |
| Goal | Measurable success definition, for the product and for the user |
| Target personas | 2–3 specific user types with role and behavioral context |
| Competitive context | How existing products solve this today, and where the opportunity lies (sourced from live web research) |
| User stories | 3 stories in As a / I want / So that format with priority classification |
| Acceptance criteria | Given/When/Then criteria per story: specific, testable, implementation-ready |
| Success metrics | 3 metrics with current baseline, target, and timeframe |
| Out of scope | Explicit boundaries to prevent scope creep |
| Edge cases | 4 engineering-relevant scenarios the build must handle |
| Risks | Delivery and adoption risks with mitigations |
| Open questions | Unresolved decisions that must be answered before build starts |
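
To make the story and criteria formats concrete, here is a minimal sketch of how one story could be rendered for the Markdown export. The function name `render_story`, the dictionary keys, and the sample data are all hypothetical and not taken from the repository.

```python
def render_story(story: dict) -> str:
    """Render one user story and its Given/When/Then criteria as Markdown."""
    lines = [
        f"### [{story['priority']}] As a {story['as_a']}, "
        f"I want {story['i_want']}, so that {story['so_that']}.",
        "",
    ]
    for ac in story["acceptance_criteria"]:
        lines.append(f"- Given {ac['given']}, when {ac['when']}, then {ac['then']}.")
    return "\n".join(lines)


# Hypothetical sample data, for illustration only.
example_story = {
    "priority": "P0",
    "as_a": "finance manager",
    "i_want": "to approve expenses from my phone",
    "so_that": "reports are not blocked while I travel",
    "acceptance_criteria": [
        {
            "given": "a pending expense report",
            "when": "I tap Approve",
            "then": "the report status changes to Approved",
        },
        {
            "given": "no network connection",
            "when": "I tap Approve",
            "then": "the action is queued and retried once I am back online",
        },
    ],
}

print(render_story(example_story))
```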

Export formats: Word (.docx) · PDF · Markdown · Google Docs


Screenshots

Input — describe the feature in plain language

Live pipeline — web research fires first, spec writing second

Overview — problem statement, goal, and target users

User stories — prioritized with Given/When/Then acceptance criteria

Success metrics — baseline, target, and timeframe

Competitive context — sourced from live web search at generation time

Risks and edge cases

Iterative refinement — update the spec through natural language

Export — Word, PDF, Markdown, or directly to Google Docs


How It Works

The system runs three stages in sequence, each using a different AI capability:

Feature idea (plain language input)
              │
              ▼
┌─────────────────────────────────────────┐
│  Stage 1 — Competitive Research         │
│                                         │
│  Claude runs live web search to find    │
│  how existing products handle this      │
│  feature today. Sources: product pages, │
│  G2 reviews, Reddit, changelog posts.   │
│                                         │
│  Output: grounded competitive summary   │
└──────────────────┬──────────────────────┘
                   │  injected as context
                   ▼
┌─────────────────────────────────────────┐
│  Stage 2 — Spec Generation              │
│                                         │
│  Claude + Instructor generates a full   │
│  PRD against a strict Pydantic schema.  │
│  Every field — including nested         │
│  acceptance criteria — is typed and     │
│  validated before leaving the server.   │
│                                         │
│  Output: PRDSpec (structured object)    │
└──────────────────┬──────────────────────┘
                   │  streamed to UI via SSE
                   ▼
┌─────────────────────────────────────────┐
│  Stage 3 — Refinement Loop              │
│                                         │
│  PM iterates through natural language.  │
│  Each instruction sends the full PRD    │
│  as context — Claude edits targeted     │
│  sections while preserving the rest.    │
│                                         │
│  Output: updated PRD, same schema       │
└─────────────────────────────────────────┘
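
The control flow in the diagram can be sketched as plain function composition. This is a minimal illustration, not the repository's code: the function names, type aliases, and stub implementations below are assumptions, and the stubs stand in for the real Claude calls.

```python
from typing import Callable

# Hypothetical signatures for the three stages.
Research = Callable[[str], str]
SpecWriter = Callable[[str, str], dict]
Refiner = Callable[[dict, str], dict]


def generate(idea: str, research: Research, write_spec: SpecWriter) -> dict:
    """Stages 1 and 2: research first, then write the spec with it as context."""
    competitive_summary = research(idea)
    return write_spec(idea, competitive_summary)


def refine(prd: dict, instruction: str, refiner: Refiner) -> dict:
    """Stage 3: each instruction is applied against the full current PRD."""
    return refiner(prd, instruction)


# Stubs standing in for the real model calls.
def fake_research(idea: str) -> str:
    return f"Competitors already offering: {idea}"


def fake_write_spec(idea: str, context: str) -> dict:
    return {"problem_statement": idea, "competitive_context": context}


def fake_refiner(prd: dict, instruction: str) -> dict:
    return {**prd, "last_instruction": instruction}


prd = generate("expense approvals", fake_research, fake_write_spec)
prd = refine(prd, "Make this more mobile-focused", fake_refiner)
print(prd)
```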

Design Decisions

Research before writing. A PRD written without competitive context produces user stories that already exist in three competing products. Running the research agent first grounds the spec in what the market does today — and where the actual gap is.

Typed output over free text. Unstructured LLM output cannot drive a product UI reliably. Every spec generation and refinement call uses Instructor to enforce the PRDSpec Pydantic schema, nested acceptance criteria included, so the frontend always receives a predictable, validated object.
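
A rough sketch of what "typed and validated" means for the nested structure. To stay dependency-free, this uses standard-library dataclasses with manual checks as a stand-in for Pydantic and Instructor, and the class and field names are assumptions rather than the project's actual PRDSpec definition.

```python
from dataclasses import dataclass


@dataclass
class AcceptanceCriterion:
    given: str
    when: str
    then: str


@dataclass
class UserStory:
    as_a: str
    i_want: str
    so_that: str
    priority: str
    acceptance_criteria: list[AcceptanceCriterion]


def parse_story(raw: dict) -> UserStory:
    """Build a typed story from raw model output; raise on malformed input."""
    # A criterion with missing or unexpected keys raises TypeError here.
    criteria = [AcceptanceCriterion(**ac) for ac in raw["acceptance_criteria"]]
    if not criteria:
        raise ValueError("each story needs at least one acceptance criterion")
    fields = {key: raw[key] for key in ("as_a", "i_want", "so_that", "priority")}
    for name, value in fields.items():
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"{name} must be a non-empty string")
    return UserStory(acceptance_criteria=criteria, **fields)


# Hypothetical raw output, for illustration only.
raw_story = {
    "as_a": "finance manager",
    "i_want": "to approve expenses from my phone",
    "so_that": "reports are not blocked while I travel",
    "priority": "P0",
    "acceptance_criteria": [
        {"given": "a pending report", "when": "I tap Approve", "then": "it is approved"},
    ],
}

story = parse_story(raw_story)
print(story.acceptance_criteria[0].then)
```

With Pydantic the same shape would be declared as nested models, and validation failures would surface as a single error type instead of the mixed exceptions used here.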

Streaming over polling. A 30-second request with no feedback degrades trust in the tool. SSE lets the UI surface exactly what is happening at each stage — research fired, research complete with a snippet of findings, writing in progress, done. The live elapsed timer makes the cost of each step visible.
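
For readers unfamiliar with SSE, the wire format is simple: each event is a few `field: value` lines followed by a blank line. The sketch below shows stage events framed that way. The event names and payload keys are assumptions, not the ones the backend actually emits.

```python
import json
from typing import Iterator


def sse_event(event: str, data: dict) -> str:
    """Format one Server-Sent Events frame: event name, JSON data, blank line."""
    return f"event: {event}\ndata: {json.dumps(data)}\n\n"


def pipeline_events() -> Iterator[str]:
    """Yield one frame per pipeline milestone (hypothetical names and payloads)."""
    yield sse_event("stage", {"name": "research", "status": "started"})
    yield sse_event("stage", {"name": "research", "status": "complete"})
    yield sse_event("stage", {"name": "writing", "status": "started"})
    yield sse_event("done", {"status": "complete"})


for frame in pipeline_events():
    print(frame, end="")
```

In a FastAPI app, a generator like this would typically be wrapped in a `StreamingResponse` with `media_type="text/event-stream"`, and the browser would read it incrementally from the response body.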

Full context in refinement. Sending only a summary to the refinement agent loses the detail needed for targeted edits. The full PRD serialized as JSON is injected into each refinement prompt so Claude can make precise changes without reconstructing the spec from scratch.
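
A minimal sketch of that prompt assembly, assuming the PRD is available as a plain dictionary. The function name and the prompt wording are hypothetical. The point is only that the whole spec is serialized rather than a summary.

```python
import json


def build_refinement_prompt(prd: dict, instruction: str) -> str:
    """Embed the complete PRD as JSON so the model can make targeted edits."""
    prd_json = json.dumps(prd, indent=2, ensure_ascii=False)
    return (
        "Here is the current PRD as JSON:\n"
        f"{prd_json}\n\n"
        f"Instruction: {instruction}\n"
        "Edit only the sections the instruction targets and return the full PRD "
        "in the same schema."
    )


# Hypothetical, heavily shortened PRD for illustration.
current_prd = {
    "goal": "Cut expense approval time",
    "edge_cases": ["Approver is on leave"],
}
prompt = build_refinement_prompt(current_prd, "Add 2 more edge cases around API rate limiting")
print(prompt)
```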


Technical Implementation

| Component | Technology | Implementation detail |
|---|---|---|
| LLM | claude-haiku-4-5-20251001 | Research agent, spec writer, and refinement loop |
| Web search | web_search_20250305 tool (Anthropic) | Live competitor research injected into spec prompt |
| Structured output | Instructor + Pydantic v2 | PRDSpec schema enforced on every spec generation and refinement call |
| Streaming | FastAPI SSE + ReadableStream | Pipeline stage events pushed to browser in real time |
| Async | asyncio event loop, loop.run_in_executor | Blocking LLM calls offloaded from event loop |
| Backend | FastAPI + Uvicorn | /generate (SSE), /refine, /export/docx, /export/pdf |
| Frontend | React 18 + Vite | Input form, live pipeline view, tabbed PRD viewer |
| Word export | python-docx | Formatted .docx with headings, metrics table, bullets |
| PDF export | fpdf2 | Multi-page PDF with headers, page numbers, clean layout |
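
The Async row is worth a short illustration: a synchronous LLM client call would block the event loop and stall the SSE stream, so it is handed to a worker thread instead. This sketch uses a sleeping stub in place of the real call, and all function names are assumptions.

```python
import asyncio
import time


def blocking_llm_call(prompt: str) -> str:
    """Stand-in for a synchronous SDK call that takes a while to return."""
    time.sleep(0.2)
    return f"response to: {prompt}"


async def generate_async(prompt: str) -> str:
    loop = asyncio.get_running_loop()
    # None selects the loop's default thread pool executor.
    return await loop.run_in_executor(None, blocking_llm_call, prompt)


async def main() -> tuple:
    ticks = []

    async def heartbeat() -> None:
        # Keeps running while the blocking call is in flight,
        # which shows that the event loop is not stalled.
        for _ in range(3):
            ticks.append(time.monotonic())
            await asyncio.sleep(0.05)

    result, _ = await asyncio.gather(generate_async("draft the PRD"), heartbeat())
    return result, len(ticks)


result, tick_count = asyncio.run(main())
print(result, tick_count)
```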

Project Structure

PM-Spec-PRD-Generator/
├── backend/
│   ├── agents/
│   │   ├── research.py        Live web search via Claude tool use
│   │   └── spec_writer.py     PRD generation and refinement via Instructor
│   ├── api/
│   │   ├── main.py            FastAPI — SSE streaming, endpoints, async patterns
│   │   └── exporters.py       DOCX and PDF generation from PRDSpec
│   ├── models/
│   │   └── schemas.py         Full PRDSpec Pydantic schema with nested types
│   ├── requirements.txt
│   └── .env.example
└── frontend/
    ├── src/
    │   ├── App.jsx             SSE stream reader, state management, refine loop
    │   └── components/
    │       ├── InputForm.jsx       Feature input with built-in examples
    │       ├── GeneratingView.jsx  Live timer, pipeline stages, agent event log
    │       └── PRDView.jsx         Tabbed PRD viewer, refinement chat, export panel
    └── vite.config.js

Setup

Requirements: Python 3.11, Node.js with npm, and an Anthropic API key.

1. Clone and configure

git clone https://github.com/skyplon/PM-Spec-PRD-Generator.git
cd PM-Spec-PRD-Generator/backend
cp .env.example .env
# Add ANTHROPIC_API_KEY to .env

2. Install backend

python3.11 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

3. Start backend

# Run from the project root (one level up from backend/) so the backend.api.main module path resolves
cd ..
uvicorn backend.api.main:app --reload --port 8001

4. Start frontend

# In a second terminal, from the project root
cd frontend
npm install && npm run dev

Open http://localhost:5174

Convenience script

# After initial setup, use the included start script
./start.sh

Refinement Examples

| Instruction | Effect |
|---|---|
| Make acceptance criteria more specific with exact thresholds and numbers | Rewrites all Given/When/Then criteria with quantified conditions |
| Add 2 more edge cases around API rate limiting and third-party failures | Expands edge cases section with infrastructure-specific scenarios |
| Rewrite the success metrics with tighter targets and shorter timeframes | Updates baselines, targets, and measurement windows |
| Make this more mobile-focused | Updates personas, stories, and constraints for mobile context |
| Add risks around GDPR compliance for EU users | Expands risk section with regulatory and data handling considerations |

Known Limitations

These are deliberate constraints in the current implementation, not oversights. Each one represents a tradeoff made to ship a working product, with a known path to resolution.

Output volume is intentionally capped. The spec writer currently generates exactly 3 user stories, 2 acceptance criteria per story, 3 success metrics, and 4 edge cases. The cap is a workaround for a token limit, not a judgment about how long a PRD should be. The underlying model (claude-haiku-4-5) has an 8,192 output token ceiling. A fully detailed PRD for a complex feature, with 5–7 stories and 3 AC each, routinely exceeds that limit, which truncates the response, fails Instructor schema validation mid-generation, and returns nothing. The correct fix is to upgrade to claude-sonnet (64k output tokens) or split generation across two sequential calls: one for stories and metrics, one for risks and edge cases. The current cap ensures consistent, complete output at the cost of depth.
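
The two-call split could look roughly like the sketch below. It is not implemented in the repository. The field groupings follow the paragraph above, while `call_model`, the function names, and the stub are assumptions.

```python
from typing import Callable

FIRST_CALL_FIELDS = ("user_stories", "success_metrics")
SECOND_CALL_FIELDS = ("risks", "edge_cases")

# Hypothetical signature: (feature idea, fields to generate) -> partial PRD.
ModelCall = Callable[[str, tuple], dict]


def generate_in_two_calls(idea: str, call_model: ModelCall) -> dict:
    """Generate the PRD in two smaller responses and merge them."""
    merged: dict = {}
    for fields in (FIRST_CALL_FIELDS, SECOND_CALL_FIELDS):
        part = call_model(idea, fields)
        missing = [name for name in fields if name not in part]
        if missing:
            raise ValueError(f"model response is missing fields: {missing}")
        # Keep only the requested fields so the two parts cannot collide.
        merged.update({name: part[name] for name in fields})
    return merged


def fake_call_model(idea: str, fields: tuple) -> dict:
    """Stub standing in for a schema-validated model call."""
    return {name: [f"{name} for {idea}"] for name in fields}


prd_parts = generate_in_two_calls("expense approvals", fake_call_model)
print(sorted(prd_parts))
```

Each call then only has to fit its own half of the spec inside the output limit, at the cost of one extra round trip.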

Competitive research quality depends on the feature description. The research agent runs a single web search pass. For well-known feature categories (e.g., "expense approvals," "email tone detection") it surfaces relevant competitor data reliably. For internal tooling, niche verticals, or highly specific B2B workflows, the search results are generic and add limited value to the spec. A multi-query research strategy — breaking the idea into 3–4 targeted searches — would improve coverage significantly.

Refinement does not have memory across sessions. Each refinement call is stateless. If you refine a spec, close the browser, and reopen it, the history is gone. The PRD lives in React state only — there is no persistence layer. This means the tool is suited for single-session spec drafting, not async collaborative workflows where a PM and engineering lead iterate over multiple days.

The generated PRD does not adapt to company-specific context. All specs are generated from general product knowledge. There is no mechanism to inject company terminology, existing design patterns, an established tech stack, or prior PRDs as context. A RAG layer over internal documentation would allow the tool to write specs that sound like they belong to a specific product organization.

Export formatting is functional, not polished. The Word and PDF exports produce clean, readable documents but do not match the visual standards of a professional template (no branded headers, no logo, no custom typography). For internal use this is acceptable. For client-facing or exec-review specs, a template-based export layer would be required.


Future Enhancements

The following represent the highest-value improvements, ordered by expected impact on output quality and user adoption.

Upgrade to claude-sonnet for unconstrained output depth. Removing the artificial token cap and letting the model generate as many stories, metrics, and edge cases as the feature warrants is the single highest-impact change. The PRD quality in the current version is limited not by the model's reasoning but by its output budget.

Multi-query competitive research. Instead of one broad web search, decompose the feature idea into 3–4 targeted queries: competitor feature pages, user review sites (G2, Reddit, App Store), recent product changelog posts, and job postings (which signal where competitors are investing). This would produce materially richer competitive context sections.
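
One possible shape for that decomposition, using the four source types named above. The query templates and the function name are assumptions, and real queries would likely need tuning per feature category.

```python
# Hypothetical templates, one per source type.
QUERY_TEMPLATES = {
    "feature_pages": "{idea} product feature page",
    "user_reviews": "{idea} reviews G2 Reddit App Store",
    "changelogs": "{idea} product changelog",
    "job_postings": "{idea} job posting",
}


def build_queries(idea: str) -> dict[str, str]:
    """Turn one feature idea into one targeted search query per source type."""
    topic = " ".join(idea.split())  # collapse stray whitespace and newlines
    return {source: template.format(idea=topic) for source, template in QUERY_TEMPLATES.items()}


queries = build_queries("expense  approvals")
for source, query in queries.items():
    print(f"{source}: {query}")
```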

RAG over internal documentation. Allow teams to upload existing PRDs, design principles, a product glossary, and engineering constraints. The spec writer would retrieve relevant context before generating, producing output that uses the company's terminology, respects its established patterns, and doesn't contradict decisions already made.

Session persistence and async collaboration. Store generated specs in a database with a shareable link. Allow a PM to generate a spec, share it with an engineering lead for async review, and refine it collaboratively over multiple sessions — with a version history showing what changed between iterations.

PRD scoring and completeness check. After generation, run a second LLM pass that scores the spec against a PM quality rubric: Are the acceptance criteria actually testable? Do the success metrics have realistic baselines? Are the edge cases specific enough for an engineer to act on? Surface a completeness score with specific improvement suggestions before the PM exports.
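
Before spending a second LLM pass, some gaps can be caught with plain rules. The sketch below is a rule-based pre-check, not the rubric scoring described above, and the dictionary keys are assumptions about the PRD shape.

```python
def completeness_issues(prd: dict) -> list[str]:
    """Flag structurally incomplete criteria and metrics in a PRD dictionary."""
    issues = []
    for i, story in enumerate(prd.get("user_stories", []), start=1):
        for j, ac in enumerate(story.get("acceptance_criteria", []), start=1):
            missing = [key for key in ("given", "when", "then") if not ac.get(key)]
            if missing:
                issues.append(f"story {i}, criterion {j}: missing {', '.join(missing)}")
    for metric in prd.get("success_metrics", []):
        for key in ("baseline", "target", "timeframe"):
            if not metric.get(key):
                issues.append(f"metric '{metric.get('name', '?')}': missing {key}")
    return issues


# Hypothetical draft with two gaps, for illustration only.
draft = {
    "user_stories": [
        {"acceptance_criteria": [{"given": "a pending report", "when": "I tap Approve", "then": ""}]}
    ],
    "success_metrics": [
        {"name": "approval time", "baseline": "", "target": "under 1 day", "timeframe": "90 days"}
    ],
}

for issue in completeness_issues(draft):
    print(issue)
```

Whether a criterion is actually testable or a baseline is realistic still needs judgment, which is where the LLM pass would come in.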

Template library for common feature patterns. Pre-built context packs for recurring feature types — authentication flows, notification systems, onboarding checklists, billing and pricing changes — that inject domain-specific edge cases and risks the model might otherwise miss. A notification system spec should always address delivery failures, opt-out flows, and rate limiting; a billing change spec should always address prorated charges, failed payment retries, and tax handling.
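
A context pack could be as small as a lookup table whose contents are appended to the spec prompt. The two packs below reuse the topics named above, while the pack keys, function name, and prompt wording are assumptions.

```python
# Required topics per feature type, taken from the examples above.
CONTEXT_PACKS = {
    "notification_system": ["delivery failures", "opt-out flows", "rate limiting"],
    "billing_change": ["prorated charges", "failed payment retries", "tax handling"],
}


def pack_prompt(feature_type: str) -> str:
    """Return a prompt fragment listing the topics a spec of this type must cover."""
    topics = CONTEXT_PACKS.get(feature_type)
    if not topics:
        return ""  # unknown feature type: inject nothing
    return "The edge cases and risks must address: " + "; ".join(topics) + "."


print(pack_prompt("billing_change"))
```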

Spec-to-ticket generation. One-click export of each user story as a Jira epic or Linear project, with acceptance criteria mapped to sub-tasks and the open questions filed as blockers. This closes the loop between spec creation and execution tracking.


Related Work

  • AI Meeting Co-pilot — LangGraph agent pipeline that converts meeting transcripts into action items, routed tasks, and follow-up emails
  • Prospect-IQ — AI sales intelligence tool for Enterprise SDRs
  • GTM-Ops-Agent — AI-powered GTM planning automation for Sales Operations

Author

Juan Manuel Navarrete Solano
Senior Product Manager, Agentic AI & Generative AI

LinkedIn · GitHub
