jxnl · jxnl-oai · Apr 13, 2026 · Apr 13, 2026
diff --git a/agents/skills/README.md b/agents/skills/README.md
@@ -3,6 +3,7 @@
 Codex skills stored for installation to `~/.codex/skills` via `./install.sh --skills` or `./install.sh --agents`.
 
 ## Included
+- `audit-ai-code` - Audit AI-shaped backend/general code and apply safe cleanup fixes.
 - `audit-ai-frontend` - Audit AI-looking frontend UI and apply targeted fixes.
 - `audit-ai-writing` - Audit AI-writing residue, citation failures, and cleanup rewrites.
 - `gh-address-comments` - Address GitHub PR review comments on the current branch.

diff --git a/agents/skills/audit-ai-code/SKILL.md b/agents/skills/audit-ai-code/SKILL.md
@@ -0,0 +1,75 @@
+---
+name: audit-ai-code
+description: Audit AI-generated or AI-shaped backend/general code diffs for duplicate helpers, over-defensive control flow, broad exception wrappers, speculative scaffolding, comment/docstring boilerplate, local style drift, hallucinated APIs/dependencies, fixture-shaped test hacks, and obvious safety/performance gaps. Use when reviewing or safely cleaning up Python, TypeScript, or other implementation code after a feature, bugfix, or prototype pass.
+---
+
+# AI Code Audit
+
+## Use
+
+Audit or repair implementation code that reads generically AI-generated, while preserving behavior, public APIs, and tests unless the user explicitly asks for a refactor.
+
+Review in this order:
+
+1. Find the target scope.
+   - Start with `git diff --stat` to size the change and `git diff --check` to catch whitespace errors and leftover conflict markers.
+   - Inspect the current diff for touched files; if there is no git diff, fall back to recently modified files.
+   - Ask one narrow question if scope is genuinely ambiguous.
+
+2. Collapse duplicate helpers and shadow APIs.
+   - Find helpers or wrappers that do the same job with slightly different names, signatures, or one-off branches.
+   - Prefer one canonical helper with a narrow, clear API.
+   - Replace hand-rolled parsing, path, date/time, and string logic with existing project utilities when they already exist.
+   - Verify that any newly introduced helper, method, import, or package actually exists and is the canonical choice in this repo or its dependency graph.
+
+3. Flatten defensive control flow and exception boundaries.
+   - Replace nested condition ladders with guard clauses and early exits.
+   - Consolidate checks that lead to the same result and hoist duplicate branch bodies.
+   - Remove stateful control flags when direct control flow is clearer.
+   - Delete broad exception wrappers that hide uncertainty, keep one clear handler around real boundary failures, and replace expected non-exceptional cases with explicit precondition checks.
+
+4. Remove generated-code residue.
+   - Delete speculative abstractions, factories, generic hooks, pass-through wrappers, broad options objects, dead branches, and placeholder fallbacks that have no concrete caller or product need.
+   - Remove comments/docstrings that restate obvious code; keep only non-obvious intent, invariants, and tradeoffs.
+   - Normalize naming, module boundaries, error style, and helper shape to match adjacent hand-written code.
+   - Treat fixture-shaped branches, magic constants, and deleted or weakened tests as a smell; encode the actual invariant instead.
+
+5. Check safety and runtime basics.
+   - Look for secrets in code/config, string-built queries or shell commands, path traversal, unsafe deserialization, SSRF-shaped fetches, missing server-side authorization, sensitive data in logs, swallowed exceptions, unchecked return values, missing outbound timeouts, and check-then-act races.
+   - Patch only high-confidence local fixes; report broader security or architecture changes as follow-up.
+
+6. Verify.
+   - Run the narrowest relevant tests, typechecks, and linters for touched files.
+   - Re-open the diff and confirm cleanup did not change intended behavior.
+
+For larger diffs, parallelize read-only review into up to four passes: reuse/shadow APIs, control-flow/exception boundaries, generated-code residue, and quality/safety/performance. Prefer a stronger model for ambiguous tradeoffs and a smaller model for narrow, easy-to-verify scans.
+
+## Output
+
+For each finding, include:
+
+- `Issue`
+- `Evidence`
+- `Class` (`P0`, `P1`, `P2`)
+- `Why it matters / why it reads as generated`
+- `Possible non-AI explanation`
+- `Smallest fix`
+- `Acceptance check`
+- `Confidence` (`High`, `Medium`, `Low`)
+- `File/line`
+
+Return only the top 5-8 findings for review-only asks and merge repeated symptoms under one root cause.
+
+For implementation asks, patch the code directly, then summarize what was simplified, what was intentionally left alone, what validation ran, and any follow-up risks.
+
+## Guardrails
+
+- Treat "AI-looking" as a quality smell, not a provenance claim.
+- Prefer objective maintainability, correctness, and safety defects over style-only opinions.
+- Do not widen APIs into mega-helpers, config bags, or boolean-flag mode switches just to reduce line count.
+- Do not add speculative abstraction layers, broad framework wrappers, or one-off utility namespaces.
+- Do not reformat unrelated files or chase broad style churn.
+
+## Resource
+
+- `references/sources.md`: source basis for code-smell, AI-generated-code, and security-review checks.
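Step 3 of the audit-ai-code checklist is easiest to see side by side. The following is a minimal sketch, not code from this repo: `load_port_before`, `load_port_after`, and `DEFAULT_PORT` are hypothetical names for a helper that reads a port number from a settings mapping. The first version shows the nested ladder, state flag, and blanket `except Exception` the checklist flags. The second uses guard clauses for the expected "missing" cases and keeps one narrow handler at the real conversion boundary.

```python
from typing import Any, Mapping, Optional

DEFAULT_PORT = 8080  # hypothetical default, for illustration only


def load_port_before(settings: Optional[Mapping[str, Any]]) -> int:
    # AI-shaped: nested condition ladder, a control flag, and a broad handler
    # that silently turns every failure (including real bugs) into the default.
    port = DEFAULT_PORT
    found = False
    try:
        if settings is not None:
            if "port" in settings:
                if settings["port"] is not None:
                    port = int(settings["port"])
                    found = True
        if not found:
            port = DEFAULT_PORT
    except Exception:
        port = DEFAULT_PORT
    return port


def load_port_after(settings: Optional[Mapping[str, Any]]) -> int:
    # Guard clause: missing settings or a missing/None port are expected cases,
    # so they are handled as preconditions rather than exceptions.
    if not settings or settings.get("port") is None:
        return DEFAULT_PORT
    raw = settings["port"]
    try:
        return int(raw)
    except (TypeError, ValueError) as exc:
        # One narrow handler at the real boundary; the caller sees the bad value.
        raise ValueError(f"invalid port setting: {raw!r}") from exc
```

Note that the rewrite changes behavior for malformed values (it now raises instead of returning the default). Under the skill's "preserve behavior" rule, that is a change to report or confirm with the user, not a silent cleanup.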
diff --git a/agents/skills/audit-ai-code/agents/openai.yaml b/agents/skills/audit-ai-code/agents/openai.yaml
@@ -0,0 +1,4 @@
+interface:
+  display_name: "AI Code Audit"
+  short_description: "Audit AI-shaped backend/code smells and apply safe fixes."
+  default_prompt: "Use $audit-ai-code to review my current diff for duplicate helpers, defensive branches, broad exception handlers, speculative wrappers, and fixture-shaped hacks."
diff --git a/agents/skills/audit-ai-code/references/sources.md b/agents/skills/audit-ai-code/references/sources.md
@@ -0,0 +1,10 @@
+# Source basis
+
+This skill is a quality and safety audit checklist, not proof of AI authorship. Use these references to ground the smell taxonomy and reviewer language.
+
+- Wikipedia on code smells: `https://en.wikipedia.org/wiki/Code_smell`
+- Refactoring.Guru smell catalog and refactorings, especially Duplicate Code, Long Method, Long Parameter List, Comments, and Speculative Generality: `https://refactoring.guru/refactoring/smells`
+- GitHub Copilot responsible-use guidance for code review, including security checks and AI-generated suggestion risk: `https://docs.github.com/en/copilot/responsible-use/code-review`
+- LLM-generated code smell study: `https://arxiv.org/abs/2510.03029`
+- Package hallucination risk in AI-generated code: `https://www.usenix.org/publications/loginonline/we-have-package-you-comprehensive-analysis-package-hallucinations-code`
+- Hallucinated API behavior in code-generation systems: `https://arxiv.org/abs/2401.01701`
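The checklist asks reviewers to verify that newly introduced imports and packages are real, which is the failure mode the package-hallucination sources describe. One cheap first pass, sketched here under the assumption that the code under review is Python and is checked inside the project's own environment, is to ask `importlib.util.find_spec` whether each top-level imported module resolves. `unresolved_imports` is a hypothetical helper, not part of this skill, and a module that resolves can still be the wrong package, so this only catches outright inventions.

```python
import ast
import importlib.util


def unresolved_imports(source: str) -> list[str]:
    """Return top-level module names imported by `source` that do not resolve."""
    names = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names.update(alias.name.split(".")[0] for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Relative imports (level > 0) point into the project itself; skip them.
            names.add(node.module.split(".")[0])
    # find_spec returns None when no finder can locate a top-level module.
    return sorted(name for name in names if importlib.util.find_spec(name) is None)
```

For example, `unresolved_imports("import json\nimport not_a_real_pkg\n")` would flag only `not_a_real_pkg` in a typical environment.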
diff --git a/agents/skills/audit-ai-frontend/SKILL.md b/agents/skills/audit-ai-frontend/SKILL.md
@@ -5,63 +5,50 @@ description: Audit AI-generated or AI-looking frontend implementations, UI scree
 
 # AI Frontend Audit
 
-## Use
+Use this one-page scale to audit or repair frontend UI that looks generically AI-generated. Treat model/tool clues as weak priors; judge the shipped experience.
 
-Audit or repair frontend UI that looks generically AI-generated, while preserving existing structure unless the user asks for a redesign.
+## Scale
 
-Review in this order:
+`S0` Blockers: broken keyboard access, missing labels or focus indicators, insufficient contrast, undersized touch targets, broken mobile layout, or missing loading/empty/error states.
 
-1. Inspect code and UI together.
-   - Read components, CSS/theme tokens, and existing primitives first.
-   - If runnable, invoke `$playwright` or `$playwright-interactive` and follow that skill's `SKILL.md`; this skill decides what to inspect, not browser mechanics.
-   - If screenshot-only, review visuals but label implementation risks as `Inferred`.
+`S1` Product truth: fake metrics, demo data, missing source/date labels, UI/API/schema drift, auth or tenancy assumptions, no retry or recovery.
 
-2. Load only the reference you need.
-   - `references/patterns.md` for concrete AI-tell and code-smell fixes.
-   - `references/rubric.md` for broad UX/a11y/design audits.
-   - `references/workflows.md` for Playwright QA, reference-packet, and brief-lock loops.
+`S2` Local fit: ignores the repo's component library, tokens, typography, density, adjacent screens, or established interaction states.
 
-3. Preserve local system intent while removing accidental defaults.
-   - Keep copy/order/IA and known product tokens unless the user asks for a redesign.
-   - Keep a common-looking font/card/palette only if adjacent screens or documented tokens already use it; replace it when the style exists only in the generated screen.
-   - If references are missing, derive one explicit design contract from product domain + user job + existing primitives; do not fabricate named reference sites.
+`S3` Task hierarchy: generic dashboard or landing-page structure, all panels equal weight, unclear primary action, weak IA.
 
-4. Fix in this order.
-   - `P0`: keyboard, labels, contrast, touch targets, mobile overflow, missing loading/empty/error states.
-   - `P1`: generic SaaS layout, card overuse, icon-pill repetition, Inter/Roboto/system defaults, purple/indigo/cyan gradient/glass tropes, vague CTA/copy.
-   - `P2`: spacing rhythm, token consistency, one memorable visual rule, reduced-motion and state polish.
+`S4` AI aesthetic defaults: Inter/system-only personality, purple/indigo/cyan gradients, glass/glow layers, rounded-card grids, Lucide icon pills, vague CTA/copy, overlong explanatory prose, repeated section shells.
 
-5. Re-verify in browser after edits whenever possible.
+`S5` Tool fingerprints: v0/shadcn registry shells, Claude artifact polish, Codex minimal-diff conservatism, Gemini explainer layouts, Lovable/Supabase app shells, Bolt/Replit fallback scaffolds, Figma layer residue.
 
-## Output
+`S6` Creative polish: fun styles, algorithmic art, theme packs, image assets, stickers, motion, or novelty that does not affect the core task.
 
-For each finding, include:
+## Do
 
-- `Issue`
-- `Evidence`
-- `Class` (`P0`, `P1`, `P2`)
-- `Why it matters / why it reads as generic`
-- `Possible non-AI explanation`
-- `Smallest fix`
-- `Acceptance check`
-- `Confidence` (`High`, `Medium`, `Low`)
-- `File/line` when code is available
+- Inspect code and UI together before proposing changes.
+- Rank findings by the scale above; fix lower-numbered issues before style.
+- Preserve local copy, IA, tokens, and component primitives unless they are the problem.
+- Use installed icon packs or existing icon components by default; custom SVGs are only for bespoke product marks, diagrams, or assets the icon set cannot express.
+- Use source/tool clues only to expand searches, for example `CardHeader`, `text-muted-foreground`, `lucide-react`, `supabase`, `VITE_`, fixed `left/top/width/height`, `features.map`, `bg-clip-text`.
+- Replace generic polish with product-specific hierarchy: one primary action, one dominant data surface, concrete object/action copy, and realistic states.
+- When a UI still feels AI-generated after visual cleanup, cut copy and change the information structure before changing colors or adding decoration.
+- Verify runnable UIs in browser at desktop and mobile sizes, including keyboard/focus, long text, empty data, loading, errors, disabled states, and reduced motion.
 
-Return only the top 5-8 findings and merge repeated symptoms under one root cause. End with one line: `If I had to change only one thing: ...`
+## Don't
 
-For implementation asks, patch the code directly, then summarize only the meaningful design changes and any remaining risk.
+- Don't claim AI authorship from style, model fingerprints, or component choices.
+- Don't prioritize fun skills, stickers, algorithmic art, theme packs, or dramatic motion above operability and product truth.
+- Don't overcorrect generic UI with random ornaments, novelty fonts, noisy textures, or one-off visual chaos.
+- Don't keep repeated card grids, bordered panels, or long prose blocks just because they are already implemented; collapse them into rows, matrices, labels, or plain text when the content is simple.
+- Don't hand-roll inline SVG icons when the repo already has `lucide-react`, Heroicons, Font Awesome, Radix icons, Material icons, or another installed icon system.
+- Don't replace a documented design system just because it uses common fonts, cards, or neutral tokens.
+- Don't report inferred accessibility or code defects from screenshots as fact; mark them `Inferred`.
+- Don't leave pretty demo states in place of real authorization, validation, empty, error, setup, or API contract behavior.
 
-## Guardrails
+## Output
 
-- Treat "AI-looking" as a quality smell, not a provenance claim.
-- Prefer objective defects over taste opinions.
-- When auditing shadcn/ui projects, preserve semantic component usage and tokens. Use the `shadcn` skill if component APIs, registry install/update, or shadcn-specific composition rules are part of the fix.
-- Avoid anti-slop overcorrection: no random ornaments, novelty fonts, or one-off visual chaos.
-- Anchor each finding in code, screenshots, DOM/a11y snapshots, or browser behavior, and separate fact from inference.
+For review-only asks, return the top 5-8 findings with `Issue`, `Evidence`, `Scale` (`S0` through `S6`), `Class` (`P0`, `P1`, `P2`), `Smallest fix`, `Acceptance check`, `Confidence` (`High`, `Medium`, `Low`), and `File/line`. Merge repeated symptoms under one root cause and end with `If I had to change only one thing: ...`
 
-## Resource
+For implementation asks, patch directly, then summarize the meaningful design changes and remaining risk.
 
-- `references/patterns.md`: checklist of AI-frontend tells, code smells, and repair patterns.
-- `references/rubric.md`: compact UX/a11y/design-quality rubric for broader audits.
-- `references/workflows.md`: Playwright QA, reference-packet, and brief-lock loops; delegates browser mechanics to `$playwright` and `$playwright-interactive`.
-- Use `$playwright` and `$playwright-interactive` directly for browser execution workflows.
+Use `references/patterns.md`, `references/rubric.md`, `references/workflows.md`, and `references/sources.md` only when the one-page scale is not enough.
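The frontend skill asks reviewers to rank findings on the `S0` through `S6` scale, fix lower-numbered issues first, merge repeated symptoms under one root cause, and return only the top 5-8. A minimal Python sketch of that triage, assuming each finding already carries a numeric scale and a root-cause key (the `Finding` shape and `triage` function are hypothetical, not defined by the skill):

```python
from dataclasses import dataclass, field


@dataclass
class Finding:
    scale: int       # 0 for S0 (blockers) through 6 for S6 (creative polish)
    root_cause: str  # key used to merge repeated symptoms
    evidence: list[str] = field(default_factory=list)


def triage(findings: list[Finding], limit: int = 8) -> list[Finding]:
    """Merge findings that share a root cause, then keep the most severe ones."""
    merged = {}
    for f in findings:
        if f.root_cause not in merged:
            # Copy evidence so the caller's findings are not mutated.
            merged[f.root_cause] = Finding(f.scale, f.root_cause, list(f.evidence))
            continue
        kept = merged[f.root_cause]
        # A merged finding is as severe as its worst (lowest-numbered) symptom.
        kept.scale = min(kept.scale, f.scale)
        kept.evidence.extend(f.evidence)
    # sorted() is stable, so findings with equal scale keep first-seen order.
    return sorted(merged.values(), key=lambda f: f.scale)[:limit]
```

Sorting by scale alone encodes the rule that an `S4` gradient never outranks an `S0` focus bug, however many times the gradient repeats.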
diff --git a/agents/skills/audit-ai-frontend/references/patterns.md b/agents/skills/audit-ai-frontend/references/patterns.md
@@ -216,6 +216,9 @@ Search for these framework-agnostic patterns before patching:
 - repeated hover scale transforms on every card or tile
 - repeated `Card` maps with the same `Icon + title + description` structure
 - repeated outline-icon imports used only as section decoration
+- component-library demo shells such as `CardHeader`, `CardDescription`, `Badge`, `Tabs`, `DropdownMenu`, `text-muted-foreground`, or `lucide-react`
+- full-stack builder scaffolding such as `Dashboard`, `Overview`, `Recent Activity`, `Settings`, `ProtectedRoute`, `useAuth`, `supabase`, `VITE_`, or fallback demo arrays
+- design-to-code residue such as fixed `left/top/width/height`, layer-like asset names, or repeated exact pixel values
 - `outline: none`, `tabIndex={-1}`, clickable `div`/`span`
 - missing `aria-label`, `aria-describedby`, `alt`, or dialog titles
 - fixed desktop widths, fixed card grids, sticky sidebars, or wide tables with no mobile fallback
@@ -226,6 +229,7 @@ If the project uses Tailwind, also search for these optional utility-class equiv
 
 - `font-sans`, `from-purple-*`, `to-indigo-*`, `bg-indigo-*`, `text-transparent bg-clip-text`
 - `rounded-2xl`, `rounded-3xl`, `shadow-xl`, `backdrop-blur-*`, `hover:scale-105`
+- `text-muted-foreground`, `bg-card`, `border-border`, `bg-background`, `--radius`, `oklch`
 
 ## Minimal Repair Playbook
 

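The patterns.md search list is meant to be run before patching. A rough Python sketch of that pass, assuming a local checkout and a hand-picked subset of the listed patterns (`PATTERNS`, `SUFFIXES`, and `scan` are illustrative names, not part of the skill), reports path and line for each hit so findings can cite `File/line`. Per the skill's guardrails, a hit only widens the search; it is not evidence of AI authorship.

```python
import re
from pathlib import Path

# A small subset of the search patterns listed in references/patterns.md.
PATTERNS = (
    r"bg-clip-text",
    r"text-muted-foreground",
    r"lucide-react",
    r"hover:scale-105",
    r"backdrop-blur-",
    r"outline:\s*none",
    r"tabIndex=\{-1\}",
    r"VITE_",
)
COMPILED = re.compile("|".join(PATTERNS))
SUFFIXES = {".ts", ".tsx", ".js", ".jsx", ".css"}


def scan(root: Path) -> list[tuple[str, int, str]]:
    """Return (relative path, line number, matched text) for each hit under root."""
    hits = []
    for path in sorted(root.rglob("*")):
        # Skip vendored code and anything that is not a frontend source file.
        if "node_modules" in path.parts or path.suffix not in SUFFIXES:
            continue
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            for match in COMPILED.finditer(line):
                hits.append((str(path.relative_to(root)), lineno, match.group(0)))
    return hits
```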
diff --git a/agents/skills/audit-ai-frontend/references/sources.md b/agents/skills/audit-ai-frontend/references/sources.md
@@ -0,0 +1,15 @@
+# Source basis
+
+This skill treats "AI-looking" UI as a quality pattern, not a provenance claim. Use these references to ground the visual-smell and UX checks.
+
+- Wikipedia on AI slop: `https://en.wikipedia.org/wiki/AI_slop`
+- CrowdGenUI study on generated UI converging to generic solutions and missing task/user context: `https://arxiv.org/abs/2411.03477`
+- Jidong Lab essay on why AI-generated UIs converge visually: `https://www.jidonglab.com/blog/why-every-ai-generated-ui-looks-the-same-and-how-to-escape-the-digital-sea-of-sameness`
+- WCAG 2.2 for objective accessibility failures such as contrast, focus indicators, and touch target size: `https://www.w3.org/TR/WCAG22/`
+- OpenAI model/release notes for current GPT/Codex coding behavior: `https://openai.com/research/index/release/`
+- Anthropic Claude Sonnet 4.5 release notes for current coding/agentic/computer-use positioning: `https://www.anthropic.com/news/claude-sonnet-4-5`
+- Google Gemini 3 in Search / AI Mode notes for generative UI, dynamic visual layouts, interactive tools, and simulations: `https://blog.google/products/search/gemini-3-search-ai-mode`
+- Vercel AI Elements notes for shadcn/ui-based AI interface primitives: `https://vercel.com/changelog/introducing-ai-elements`
+- shadcn/ui docs for Tailwind/CSS-variable component defaults used by many generated UIs: `https://ui.shadcn.com/docs`
+- Lovable docs for Supabase-centered app-builder workflows: `https://docs.lovable.dev/`
+- Bolt docs for browser full-stack app-builder workflows: `https://support.bolt.new/`
diff --git a/agents/skills/audit-ai-frontend/references/workflows.md b/agents/skills/audit-ai-frontend/references/workflows.md
@@ -10,7 +10,7 @@ Delegate browser operation details to the existing Playwright skills:
 
 - Use `$playwright` for one-shot CLI browser inspection, snapshots, screenshots, and trace capture.
 - Use `$playwright-interactive` for persistent browser sessions, repeated edit/reload loops, and deeper visual QA with a shared QA inventory.
-- Open `${CODEX_HOME:-$HOME/.codex}/skills/playwright/SKILL.md` or `${CODEX_HOME:-$HOME/.codex}/skills/playwright-interactive/SKILL.md` before running browser commands.
+- Open `/Users/jasonliu/.codex/skills/playwright/SKILL.md` or `/Users/jasonliu/.codex/skills/playwright-interactive/SKILL.md` before running browser commands.
 - Keep this workflow focused on what to inspect for AI-frontend quality, not how to operate Playwright primitives.
 
 1. Open the page in a real browser using the `playwright` skill.

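The removed line located skills with the shell expansion `${CODEX_HOME:-$HOME/.codex}`, which substitutes `$HOME/.codex` when `CODEX_HOME` is unset or empty; the added line instead hardcodes one user's home directory, which will not exist on other machines. A Python equivalent of the portable lookup, sketched with hypothetical helper names `codex_home` and `skill_doc`:

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def codex_home(env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror the shell expansion ${CODEX_HOME:-$HOME/.codex}."""
    env = os.environ if env is None else env
    # ":-" falls back when CODEX_HOME is unset *or* empty, so an empty string
    # must fall through too; a plain env.get("CODEX_HOME", default) would not.
    override = env.get("CODEX_HOME")
    if override:
        return Path(override)
    return Path(f"{env.get('HOME', '')}/.codex")


def skill_doc(skill: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Path to an installed skill's SKILL.md, e.g. skill_doc("playwright")."""
    return codex_home(env) / "skills" / skill / "SKILL.md"
```

Passing `env` explicitly keeps the lookup testable without touching the real process environment.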
diff --git a/agents/skills/audit-ai-writing/SKILL.md b/agents/skills/audit-ai-writing/SKILL.md
@@ -7,6 +7,8 @@ description: Reference-only checklist for AI-writing artifacts, citation failure
 
 ## Use
 
+Audit or repair Markdown, docs, and pasted prose that may contain AI-writing residue, broken citations, or house-style drift.
+
 Open `patterns.md` and review in this order:
 
 1. Machine residue, broken markup, and broken citations.
@@ -21,13 +23,14 @@ For each finding, include:
 - `Issue`
 - `Evidence` (exact snippet or line location)
 - `Class` (`P0`, `P1`, `P2`)
-- `Why it matters / why it reads as generic`
+- `Why it matters / why it reads as generated`
 - `Possible non-AI explanation`
 - `Smallest fix`
+- `Acceptance check`
 - `Confidence` (`High`, `Medium`, `Low`)
-- `File/line` when available
+- `File/line` when a file is available
 
-Return only the top 5-8 findings and merge repeated symptoms under one root cause.
+Return only the top 5-8 findings and merge repeated symptoms under one root cause. For rewrite asks, patch the text directly and summarize only the meaningful cleanup.
 
 ## Guardrails
 
@@ -38,4 +41,4 @@ Return only the top 5-8 findings and merge repeated symptoms under one root caus
 
 ## Resource
 
-- `patterns.md`: compact artifact taxonomy, verification checks, and rewrite guidance.
+- `patterns.md`: compact artifact taxonomy, verification checks, rewrite guidance, and source basis.
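The per-finding fields in the audit-ai-writing output section map naturally onto a small record. A hedged Python sketch, assuming findings are rendered as Markdown bullets (the skill does not prescribe a data format; `WritingFinding`, `LABELS`, and `render` are hypothetical), that also rejects values outside the `P0`/`P1`/`P2` and `High`/`Medium`/`Low` vocabularies and omits `File/line` when there is no file:

```python
from dataclasses import dataclass, fields
from typing import Optional

CLASSES = {"P0", "P1", "P2"}
CONFIDENCE = {"High", "Medium", "Low"}
LABELS = {
    "issue": "Issue",
    "evidence": "Evidence",
    "cls": "Class",
    "why": "Why it matters / why it reads as generated",
    "non_ai_explanation": "Possible non-AI explanation",
    "smallest_fix": "Smallest fix",
    "acceptance_check": "Acceptance check",
    "confidence": "Confidence",
    "file_line": "File/line",
}


@dataclass
class WritingFinding:
    issue: str
    evidence: str
    cls: str  # "Class" in the skill; `class` is a reserved word in Python
    why: str
    non_ai_explanation: str
    smallest_fix: str
    acceptance_check: str
    confidence: str
    file_line: Optional[str] = None  # only when a file is available

    def __post_init__(self) -> None:
        if self.cls not in CLASSES:
            raise ValueError(f"Class must be one of {sorted(CLASSES)}: {self.cls!r}")
        if self.confidence not in CONFIDENCE:
            raise ValueError(f"Confidence must be one of {sorted(CONFIDENCE)}: {self.confidence!r}")

    def render(self) -> str:
        lines = []
        for fld in fields(self):
            value = getattr(self, fld.name)
            if value is None:
                continue  # pasted prose has no file, so File/line is omitted
            lines.append(f"- `{LABELS[fld.name]}`: {value}")
        return "\n".join(lines)
```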