What we've built, what's next, and the phases to make it world-class.
| Area | Status | Quality |
|---|---|---|
| Backend API (5 endpoints) | Complete | Production-grade |
| Compliance Engine (PII, bias, safety) | Complete | Strong — smart exclusions, scoring |
| RAG Pipeline (FAISS + embeddings) | Complete | Functional |
| React Dashboard (3 pages) | Complete | Functional but incomplete |
| Docker + CI/CD | Complete | Solid |
| Tests (203 total) | Complete | Excellent coverage |
Overall: Strong engineering, weak storytelling. The hardest parts are built. What's missing is the layer that makes someone say "that's cool" in 10 seconds.
Status: COMPLETE — Interactive `/playground` page where users type a prompt, submit it, and watch the compliance engine analyze the LLM response in real time. Deployed at safe-gen-dev.vercel.app/playground.
- Prompt input — textarea with character count and example prompts
- Submit button — calls `POST /api/validate` and streams the result
- Response panel — shows the LLM response with inline flag highlights
- Compliance sidebar — score gauge, pass/fail badge, flag list with severity
- Example prompts — pre-loaded buttons that demonstrate each validator:
  - "Clean" prompt (passes all checks)
  - PII prompt (triggers email/phone/SSN detection)
  - Bias prompt (triggers gendered language detection)
  - Mixed prompt (triggers multiple validators)
- Category toggles — let users enable/disable PII, bias, safety layers to see the difference
This is the page you demo in interviews. This is the page in the README GIF. This is the page that turns "I built a compliance pipeline" into "let me show you what it does."
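One way the Response panel could turn flags into inline highlights is to split the response into plain and flagged segments before rendering. The sketch below assumes each flag carries character offsets and a severity; the `Flag` shape and `split_into_segments` name are hypothetical, not the actual API response format, and the real component would do the equivalent in TypeScript.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Flag:
    # Hypothetical flag shape: offsets into the LLM response text.
    start: int      # inclusive character offset
    end: int        # exclusive character offset
    severity: str   # e.g. "critical", "warning", "info"


def split_into_segments(text: str, flags: list[Flag]) -> list[tuple[str, Optional[str]]]:
    """Split text into (substring, severity-or-None) pieces for rendering."""
    segments: list[tuple[str, Optional[str]]] = []
    cursor = 0
    for flag in sorted(flags, key=lambda f: f.start):
        start = max(flag.start, cursor)   # clip overlap with an earlier flag
        end = min(flag.end, len(text))
        if start >= end:
            continue  # fully covered by an earlier flag, or out of range
        if start > cursor:
            segments.append((text[cursor:start], None))
        segments.append((text[start:end], flag.severity))
        cursor = end
    if cursor < len(text):
        segments.append((text[cursor:], None))
    return segments
```

Each flagged segment can then be wrapped in a highlighted span colored by severity, while unflagged segments render as plain text.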
| File | Purpose |
|---|---|
| `src/pages/PlaygroundPage.tsx` | Main page with prompt input + results |
| `src/components/playground/PromptInput.tsx` | Textarea with example buttons |
| `src/components/playground/ComplianceResult.tsx` | Score, flags, pass/fail display |
| `src/components/playground/ResponseViewer.tsx` | LLM response with highlighted flags |
| `src/components/playground/CategoryToggle.tsx` | PII/bias/safety layer toggles |
- Renders prompt input and submit button
- Displays loading state during API call
- Renders compliance results with flags
- Example prompt buttons populate textarea
- Category toggles update request payload
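The last test case, category toggles changing the request payload, comes down to a small pure function that is easy to check in isolation. A minimal sketch of that logic, assuming the validate endpoint accepts a `categories` list next to the prompt (both field names are assumptions, not the confirmed request schema):

```python
import json

# Hypothetical category identifiers for the three validator layers.
ALL_CATEGORIES = ("pii", "bias", "safety")


def build_validate_payload(prompt: str, enabled: dict[str, bool]) -> str:
    """Build the JSON body for a validate request from the toggle state.

    Categories missing from `enabled` default to on, so an untouched UI
    runs every validator.
    """
    if not prompt.strip():
        raise ValueError("prompt must not be empty")
    categories = [c for c in ALL_CATEGORIES if enabled.get(c, True)]
    return json.dumps({"prompt": prompt, "categories": categories})
```

Toggling bias off, for example, would yield a body whose `categories` list contains only `pii` and `safety`.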
Priority: HIGH — A dashboard full of zeros tells no story.
- Seed data script (`scripts/seed_demo.py`) — sends 15-20 curated validation requests that produce a mix of pass/fail/flagged results across different days
- Demo mode flag — `DEMO_MODE=true` env var that auto-seeds on first startup
- Better empty states:
  - Dashboard: "No validation data yet. Try the Playground to get started." with link
  - Audit Log: "No audit records. Send your first prompt from the Playground."
  - Rules: "No rules ingested yet. Upload a compliance document to get started."
- Score gauge fix — show "No data" instead of "0.0% Critical" when empty
- Grammar fix — "1 chunk" not "1 chunks"
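The gauge and grammar fixes are both small display rules. The sketch below shows the logic in Python for consistency with the backend; the dashboard itself would need the same rules in its React code. The function names and the one-decimal percentage format are illustrative assumptions.

```python
from typing import Optional


def pluralize(count: int, singular: str, plural: Optional[str] = None) -> str:
    """Return '1 chunk', '3 chunks', '0 chunks', and so on."""
    word = singular if count == 1 else (plural or singular + "s")
    return f"{count} {word}"


def gauge_label(score: Optional[float], total_requests: int) -> str:
    """Show 'No data' when nothing has been validated yet.

    Without this guard an empty store renders as a 0.0% score, which
    reads as a critical failure rather than an empty state.
    """
    if total_requests == 0 or score is None:
        return "No data"
    return f"{score:.1f}%"
```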
A hiring manager will run `docker-compose up`, open `localhost:5173`, and see either a compelling dashboard or a wall of zeros. You get one chance.
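A rough outline of what `scripts/seed_demo.py` might contain. The prompts, the `timestamp` backdating field, and the injected `post` callable are all assumptions made for illustration; the real script would send each request to the validate endpoint, and spreading results across days only works if the backend accepts a timestamp override.

```python
import itertools
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional

# Hypothetical curated prompts, one per validator outcome.
SEED_PROMPTS = [
    ("clean", "Summarize the benefits of unit testing."),
    ("pii", "Draft an email to jane.doe@example.com about her account."),
    ("bias", "Write a job ad for a salesman who can man the booth."),
    ("mixed", "Tell the chairman to call 555-0100 about his team."),
]


def build_seed_requests(count: int = 16, days: int = 7,
                        now: Optional[datetime] = None) -> list[dict]:
    """Cycle through the curated prompts and spread them over recent days."""
    now = now or datetime.now(timezone.utc)
    prompts = itertools.cycle(SEED_PROMPTS)
    requests = []
    for i in range(count):
        label, prompt = next(prompts)
        created = now - timedelta(days=i % days, hours=i)
        requests.append({
            "label": label,
            "prompt": prompt,
            "timestamp": created.isoformat(),  # assumed backdating field
        })
    return requests


def seed(post: Callable[[dict], None], count: int = 16) -> int:
    """Send every seed request through `post`; return how many were sent."""
    sent = 0
    for request in build_seed_requests(count):
        post(request)
        sent += 1
    return sent
```

Passing the sender in as a callable keeps the script testable without a running backend: a list's `append` method works as a stand-in for the HTTP call.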
Priority: MEDIUM — Currently the weakest page. Three cards with "1 chunks" is not impressive.
- Rule detail modal — click a card to see the full extracted text, chunk count, and when it was ingested
- Rule content preview — show first 2-3 lines of each rule on the card itself
- Delete rule — button to remove an ingested rule (needs new `DELETE /api/rules/:id` endpoint)
- Chunk visualization — show how the document was split into chunks (demonstrates RAG understanding)
- Search rules — text search across ingested rule content (demonstrates semantic search)
- Better cards — add icon per document type (PDF, MD, DOCX, TXT), file size, ingestion timestamp
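For the chunk visualization, the UI needs each chunk's position in the source document, not just its text. The sketch below uses fixed-size character windows with overlap, which is only an assumption: the actual splitting strategy in the RAG pipeline is not described here, and `chunk_text` is a hypothetical name.

```python
def chunk_text(text: str, size: int = 200, overlap: int = 40) -> list[dict]:
    """Split text into overlapping windows, keeping offsets for display."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    step = size - overlap
    chunks = []
    for index, start in enumerate(range(0, len(text), step)):
        end = min(start + size, len(text))
        chunks.append({
            "index": index,
            "start": start,   # inclusive offset into the document
            "end": end,       # exclusive offset
            "text": text[start:end],
        })
        if end == len(text):
            break  # final window reached the end of the document
    return chunks
```

With the offsets available, the rule detail modal can shade each chunk's range in the original text and make the overlapping regions visible.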
| Endpoint | Method | Purpose |
|---|---|---|
| `/api/rules/:id` | GET | Get single rule with full text + chunks |
| `/api/rules/:id` | DELETE | Remove rule and re-index FAISS |
| `/api/rules/search` | POST | Semantic search across rule chunks |
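The delete-then-re-index and search behavior behind these endpoints can be prototyped without FAISS. The sketch below is a pure-Python stand-in: it swaps in a toy bag-of-words vector and brute-force cosine similarity where the real pipeline would use learned embeddings and a FAISS index. `RuleIndex` and its methods are hypothetical names.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy bag-of-words vector; a stand-in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


class RuleIndex:
    """In-memory rule store whose search index is rebuilt on every change."""

    def __init__(self) -> None:
        self.rules: dict[str, list[str]] = {}            # rule_id -> chunks
        self._entries: list[tuple[str, int, Counter]] = []

    def _rebuild(self) -> None:
        # Simplest correct approach: re-embed everything after a change.
        self._entries = [
            (rule_id, i, embed(chunk))
            for rule_id, chunks in self.rules.items()
            for i, chunk in enumerate(chunks)
        ]

    def add(self, rule_id: str, chunks: list[str]) -> None:
        self.rules[rule_id] = list(chunks)
        self._rebuild()

    def delete(self, rule_id: str) -> bool:
        """Remove a rule and re-index; False maps to a 404 response."""
        if rule_id not in self.rules:
            return False
        del self.rules[rule_id]
        self._rebuild()
        return True

    def search(self, query: str, top_k: int = 3) -> list[dict]:
        q = embed(query)
        scored = [(cosine(q, vec), rid, i) for rid, i, vec in self._entries]
        scored = [s for s in scored if s[0] > 0]
        scored.sort(key=lambda s: s[0], reverse=True)
        return [{"rule_id": rid, "chunk_index": i, "score": round(score, 3)}
                for score, rid, i in scored[:top_k]]
```

Rebuilding the whole index on delete is wasteful for large corpora but keeps the index and the rule store trivially consistent, which matters more at demo scale.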
Priority: MEDIUM — The UI is functional but generic. Small changes have outsized impact on perception.
- Flag Breakdown chart — replace the purple with a proper color scale (red for critical, amber for warning, blue for info) that matches the severity system
- Request Trend chart — add passed (green area) vs failed (red area) stacked visualization instead of a single line
- Score gauge — add animation on load (count up from 0 to actual value)
- Sidebar active state — add a left border accent on the active nav item
- Loading skeletons — replace any flash of empty content with skeleton loaders (already have the shadcn component, just need to wire it up consistently)
- Responsive mobile — test and fix the hamburger menu, ensure tables scroll horizontally on small screens
- Favicon — add a proper SafeGen icon instead of the default Vite logo
- Page transitions — subtle fade between route changes
| Element | HSL value | Color |
|---|---|---|
| Critical flags | `hsl(0, 84%, 60%)` | Red |
| Warning flags | `hsl(38, 92%, 50%)` | Amber |
| Info flags | `hsl(217, 91%, 60%)` | Blue |
| Passed | `hsl(142, 71%, 45%)` | Green |
| Failed | `hsl(0, 84%, 60%)` | Red |
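The stacked Request Trend chart needs per-day passed and failed counts rather than a single total. A minimal aggregation sketch, assuming each audit record has an ISO 8601 `timestamp` string and a boolean `passed` field (both field names are assumptions about the audit store):

```python
from collections import defaultdict


def daily_pass_fail(records: list[dict]) -> list[dict]:
    """Group audit records by calendar day into passed/failed counts."""
    buckets: dict = defaultdict(lambda: {"passed": 0, "failed": 0})
    for record in records:
        day = record["timestamp"][:10]   # 'YYYY-MM-DD' prefix of ISO 8601
        key = "passed" if record["passed"] else "failed"
        buckets[day][key] += 1
    # Sorted by date so the chart's x-axis is chronological.
    return [{"date": day, **buckets[day]} for day in sorted(buckets)]
```

Each output row maps directly onto one x-axis point with a green (passed) and a red (failed) series.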
Priority: HIGH — The README is the landing page of your portfolio piece. Most people never look past it.
- Hero GIF — 15-second recording of the Playground page: type a prompt with PII, submit, watch flags appear. This single asset will get more attention than anything else.
- Architecture diagram — Mermaid or Excalidraw diagram showing the full data flow: Client -> Azure Functions -> OpenAI -> Compliance Engine -> Audit Store -> Dashboard
- Quick start that actually works — `docker-compose up` with seed data, link to `localhost:5173/playground`
- Screenshots — Dashboard (dark mode), Playground with flags, Audit detail modal
- Feature comparison table — what SafeGen validates vs what raw LLM output gives you
- Tech decision rationale — brief "Why FAISS over Pinecone?" and "Why Azure Functions over FastAPI?" sections that show architectural thinking
| File | Purpose |
|---|---|
| `README.md` | Rewrite with GIF, screenshots, quick start |
| `docs/screenshots/` | Dashboard, Playground, Audit screenshots |
| `docs/architecture.excalidraw` | Visual architecture diagram |
Priority: LOW (for portfolio) / HIGH (for real deployment)
- Rate limiting — prevent abuse of the validate endpoint
- Request validation — max prompt length, content-type checks
- Error boundaries — React error boundary component so one component crash doesn't take down the whole app
- API retry logic — exponential backoff in the frontend API client
- Health check endpoint — `GET /api/health` returning status + dependency health
- Structured logging — JSON logs with correlation IDs for request tracing
- OpenAPI spec — auto-generated Swagger docs from the Azure Functions endpoints
- Environment config — proper config management for dev/staging/prod
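The first two items, rate limiting and request validation, both guard the validate endpoint and can be sketched together. The token bucket below is one common rate-limiting technique, not necessarily the one SafeGen would adopt; `MAX_PROMPT_CHARS`, the class, and the function names are illustrative, and a real deployment would keep one bucket per client in shared storage rather than in process memory.

```python
import time
from typing import Callable

MAX_PROMPT_CHARS = 4000  # hypothetical limit, tune to the model's context


class TokenBucket:
    """Allow bursts up to `capacity`, refilled at `refill_per_sec`."""

    def __init__(self, capacity: int, refill_per_sec: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.clock = clock
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        elapsed = now - self.updated
        self.tokens = min(self.capacity,
                          self.tokens + elapsed * self.refill_per_sec)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False   # caller should respond with HTTP 429


def validate_request(content_type: str, body: dict) -> list[str]:
    """Return a list of problems; an empty list means the request is OK."""
    errors = []
    if content_type.split(";")[0].strip().lower() != "application/json":
        errors.append("Content-Type must be application/json")
    prompt = body.get("prompt")
    if not isinstance(prompt, str) or not prompt.strip():
        errors.append("prompt must be a non-empty string")
    elif len(prompt) > MAX_PROMPT_CHARS:
        errors.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    return errors
```

Injecting the clock makes the limiter deterministic under test, since time can be advanced manually instead of sleeping.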
Priority: NICE-TO-HAVE — These push SafeGen from "good portfolio project" to "this person thinks about AI safety deeply."
- Custom rule authoring — let users write their own compliance rules in the UI (not just upload documents)
- Rule effectiveness dashboard — which rules trigger most often? Which are never triggered?
- Prompt rewriting — when a flag is detected, suggest a rewritten version that passes compliance
- Batch validation — upload a CSV of prompts and get a compliance report
- Webhook notifications — alert on critical compliance failures
- Multi-model comparison — validate the same prompt against GPT-4o and GPT-3.5, then compare compliance scores
- Export audit report — PDF/CSV export of audit logs for compliance documentation
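Batch validation is the easiest of these to prototype, since it only wraps the existing validator in a CSV loop. The sketch below assumes the upload has a `prompt` column and uses a toy email-only check in place of the real compliance engine; `toy_validator`, `batch_validate`, and the report columns are hypothetical.

```python
import csv
import io
import re
from typing import Callable

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def toy_validator(prompt: str) -> list[str]:
    """Stand-in for the compliance engine: flags email addresses only."""
    return ["pii:email"] if EMAIL_RE.search(prompt) else []


def batch_validate(csv_text: str,
                   validate: Callable[[str], list[str]] = toy_validator) -> str:
    """Validate every row's `prompt` column and return a CSV report."""
    reader = csv.DictReader(io.StringIO(csv_text))
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["row", "passed", "flags"])
    for row_number, row in enumerate(reader, start=1):
        flags = validate(row["prompt"])   # KeyError if the column is missing
        writer.writerow([row_number, not flags, ";".join(flags)])
    return out.getvalue()
```

Because the validator is passed in, the same loop could later call the real engine or the validate endpoint without changing the report format.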
| Phase | Impact | Effort | Do It |
|---|---|---|---|
| Phase 7: Playground | Transformative | 1 session | DONE |
| Phase 8: Seed Data + Empty States | High | 2 hours | Second |
| Phase 11: README + GIF | High | 1 hour | Third |
| Phase 10: Visual Polish | Medium | 3 hours | Fourth |
| Phase 9: Rules Page | Medium | 1 session | Fifth |
| Phase 12: Production Hardening | Low (portfolio) | 1 session | Optional |
| Phase 13: Advanced Features | Low (portfolio) | Multiple sessions | Stretch |
If you only do three things:
- Build the Playground page — this is the demo
- Record a README GIF — this is the hook
- Add seed data — this is the first impression
Everything else is polish. These three turn SafeGen from "strong backend project with a monitoring UI" into "full-stack AI safety platform you can demo live."