1 change: 1 addition & 0 deletions agents/skills/README.md
@@ -3,6 +3,7 @@
Codex skills stored for installation to `~/.codex/skills` via `./install.sh --skills` or `./install.sh --agents`.

## Included
- `audit-ai-code` - Audit AI-shaped backend/general code and apply safe cleanup fixes.
- `audit-ai-frontend` - Audit AI-looking frontend UI and apply targeted fixes.
- `audit-ai-writing` - Audit AI-writing residue, citation failures, and cleanup rewrites.
- `gh-address-comments` - Address GitHub PR review comments on the current branch.
75 changes: 75 additions & 0 deletions agents/skills/audit-ai-code/SKILL.md
@@ -0,0 +1,75 @@
---
name: audit-ai-code
description: Audit AI-generated or AI-shaped backend/general code diffs for duplicate helpers, over-defensive control flow, broad exception wrappers, speculative scaffolding, comment/docstring boilerplate, local style drift, hallucinated APIs/dependencies, fixture-shaped test hacks, and obvious safety/performance gaps. Use when reviewing or safely cleaning up Python, TypeScript, or other implementation code after a feature, bugfix, or prototype pass.
---

# AI Code Audit

## Use

Audit or repair implementation code that reads generically AI-generated, while preserving behavior, public APIs, and tests unless the user explicitly asks for a refactor.

Review in this order:

1. Find the target scope.
- Start with `git diff --stat` to list touched files and `git diff --check` to flag whitespace errors and leftover conflict markers.
- Inspect the current diff for touched files; if there is no git diff, fall back to recently modified files.
- Ask one narrow question if scope is genuinely ambiguous.

2. Collapse duplicate helpers and shadow APIs.
- Find helpers or wrappers that do the same job with slightly different names, signatures, or one-off branches.
- Prefer one canonical helper with a narrow, clear API.
- Replace hand-rolled parsing, path, date/time, and string logic with existing project utilities when they already exist.
- Verify any newly introduced helper, method, import, or package is real and canonical in this repo or dependency graph.

3. Flatten defensive control flow and exception boundaries.
- Replace nested condition ladders with guard clauses and early exits.
- Consolidate checks that lead to the same result and hoist duplicate branch bodies.
- Remove stateful control flags when direct control flow is clearer.
- Delete broad exception wrappers that hide uncertainty, keep one clear handler around real boundary failures, and replace expected non-exceptional cases with explicit precondition checks.

4. Remove generated-code residue.
- Delete speculative abstractions, factories, generic hooks, pass-through wrappers, broad options objects, dead branches, and placeholder fallbacks that have no concrete caller or product need.
- Remove comments/docstrings that restate obvious code; keep only non-obvious intent, invariants, and tradeoffs.
- Normalize naming, module boundaries, error style, and helper shape to match adjacent hand-written code.
- Treat fixture-shaped branches, magic constants, and deleted or weakened tests as a smell; encode the actual invariant instead.

5. Check safety and runtime basics.
- Look for secrets in code/config, string-built queries or shell commands, path traversal, unsafe deserialization, SSRF-shaped fetches, missing server-side authorization, sensitive data in logs, swallowed exceptions, unchecked return values, missing outbound timeouts, and check-then-act races.
- Patch only high-confidence local fixes; report broader security or architecture changes as follow-up.

6. Verify.
- Run the narrowest relevant tests, typechecks, and linters for touched files.
- Re-open the diff and confirm cleanup did not change intended behavior.
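
Step 3 can be illustrated with a minimal Python sketch. The function names (`load_port_nested`, `load_port_flat`), the `DEFAULT_PORT` constant, and the config shape are hypothetical and not taken from any audited repository. The sketch only contrasts a nested ladder wrapped in a broad handler with a flattened version that uses a guard clause and one narrow handler.

```python
from typing import Optional

DEFAULT_PORT = 8080  # hypothetical default, for illustration only


def load_port_nested(config: Optional[dict]) -> int:
    """Before: nested ladder plus a broad handler that can hide real bugs."""
    try:
        if config is not None:
            if "port" in config:
                if config["port"] is not None:
                    return int(config["port"])
                else:
                    return DEFAULT_PORT
            else:
                return DEFAULT_PORT
        else:
            return DEFAULT_PORT
    except Exception:  # catches everything, including programming errors
        return DEFAULT_PORT


def load_port_flat(config: Optional[dict]) -> int:
    """After: one guard clause for the expected 'missing' case, one narrow handler."""
    raw = (config or {}).get("port")
    if raw is None:
        # Expected, non-exceptional case: handled as an explicit precondition check.
        return DEFAULT_PORT
    try:
        return int(raw)
    except (TypeError, ValueError):
        # Boundary failure: the value is present but cannot be parsed as an integer.
        return DEFAULT_PORT
```

For `None` or `dict` inputs both versions return the same values, so the cleanup preserves behavior. They differ only for inputs outside the annotated type, where the flat version lets the error surface instead of swallowing it.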

For larger diffs, parallelize read-only review into up to four passes: reuse/shadow APIs, control-flow/exception boundaries, generated-code residue, and quality/safety/performance. Prefer a stronger model for ambiguous tradeoffs and a smaller model for narrow, easy-to-verify scans.

## Output

For each finding, include:

- `Issue`
- `Evidence`
- `Class` (`P0`, `P1`, `P2`)
- `Why it matters / why it reads as generated`
- `Possible non-AI explanation`
- `Smallest fix`
- `Acceptance check`
- `Confidence` (`High`, `Medium`, `Low`)
- `File/line`

Return only the top 5-8 findings for review-only asks and merge repeated symptoms under one root cause.

For implementation asks, patch the code directly, then summarize what was simplified, what was intentionally left alone, what validation ran, and any follow-up risks.
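
The finding fields above map naturally onto a small record type. The following is a hedged sketch, not part of the skill: the `Finding` class, the `top_findings` helper, and the rank tables are hypothetical names, and the ordering (`P0` before `P2`, `High` before `Low`) is an assumption about how the report should be sorted.

```python
from dataclasses import dataclass

# Assumed ordering: lower rank is reported first.
CLASS_RANK = {"P0": 0, "P1": 1, "P2": 2}
CONFIDENCE_RANK = {"High": 0, "Medium": 1, "Low": 2}


@dataclass(frozen=True)
class Finding:
    issue: str
    evidence: str
    finding_class: str        # "P0", "P1", or "P2"
    why_it_matters: str
    non_ai_explanation: str
    smallest_fix: str
    acceptance_check: str
    confidence: str           # "High", "Medium", or "Low"
    file_line: str


def top_findings(findings: list[Finding], limit: int = 8) -> list[Finding]:
    """Sort by class, then confidence, and cap the report at `limit` entries."""
    ranked = sorted(
        findings,
        key=lambda f: (CLASS_RANK[f.finding_class], CONFIDENCE_RANK[f.confidence]),
    )
    return ranked[:limit]
```

Merging repeated symptoms under one root cause is left out here because it needs judgment about what counts as the same cause; the sketch only covers the ranking and the cap.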

## Guardrails

- Treat "AI-looking" as a quality smell, not a provenance claim.
- Prefer objective maintainability, correctness, and safety defects over style-only opinions.
- Do not widen APIs into mega-helpers, config bags, or boolean-flag mode switches just to reduce line count.
- Do not add speculative abstraction layers, broad framework wrappers, or one-off utility namespaces.
- Do not reformat unrelated files or chase broad style churn.

## Resource

- `references/sources.md`: source basis for code-smell, AI-generated-code, and security-review checks.
4 changes: 4 additions & 0 deletions agents/skills/audit-ai-code/agents/openai.yaml
@@ -0,0 +1,4 @@
interface:
display_name: "AI Code Audit"
short_description: "Audit AI-shaped backend/code smells and apply safe fixes."
default_prompt: "Use $audit-ai-code to review my current diff for duplicate helpers, defensive branches, broad exception handlers, speculative wrappers, and fixture-shaped hacks."
10 changes: 10 additions & 0 deletions agents/skills/audit-ai-code/references/sources.md
@@ -0,0 +1,10 @@
# Source basis

This skill is a quality and safety audit checklist, not proof of AI authorship. Use these references to ground the smell taxonomy and reviewer language.

- Wikipedia on code smells: `https://en.wikipedia.org/wiki/Code_smell`
- Refactoring.Guru smell catalog and refactorings, especially Duplicate Code, Long Method, Long Parameter List, Comments, and Speculative Generality: `https://refactoring.guru/refactoring/smells`
- GitHub Copilot responsible-use guidance for code review, including security checks and AI-generated suggestion risk: `https://docs.github.com/en/copilot/responsible-use/code-review`
- LLM-generated code smell study: `https://arxiv.org/abs/2510.03029`
- Package hallucination risk in AI-generated code: `https://www.usenix.org/publications/loginonline/we-have-package-you-comprehensive-analysis-package-hallucinations-code`
- Hallucinated API behavior in code-generation systems: `https://arxiv.org/abs/2401.01701`
75 changes: 31 additions & 44 deletions agents/skills/audit-ai-frontend/SKILL.md
@@ -5,63 +5,50 @@ description: Audit AI-generated or AI-looking frontend implementations, UI scree

# AI Frontend Audit

## Use
Use this one-page scale to audit or repair frontend UI that looks generically AI-generated. Treat model/tool clues as weak priors; judge the shipped experience.

Audit or repair frontend UI that looks generically AI-generated, while preserving existing structure unless the user asks for a redesign.
## Scale

Review in this order:
`S0` Blockers: broken keyboard, labels, focus, contrast, touch targets, mobile layout, or missing loading/empty/error states.

1. Inspect code and UI together.
- Read components, CSS/theme tokens, and existing primitives first.
- If runnable, invoke `$playwright` or `$playwright-interactive` and follow that skill's `SKILL.md`; this skill decides what to inspect, not browser mechanics.
- If screenshot-only, review visuals but label implementation risks as `Inferred`.
`S1` Product truth: fake metrics, demo data, missing source/date labels, UI/API/schema drift, auth or tenancy assumptions, no retry or recovery.

2. Load only the reference you need.
- `references/patterns.md` for concrete AI-tell and code-smell fixes.
- `references/rubric.md` for broad UX/a11y/design audits.
- `references/workflows.md` for Playwright QA, reference-packet, and brief-lock loops.
`S2` Local fit: ignores the repo's component library, tokens, typography, density, adjacent screens, or established interaction states.

3. Preserve local system intent while removing accidental defaults.
- Keep copy/order/IA and known product tokens unless the user asks for a redesign.
- Keep a common-looking font/card/palette only if adjacent screens or documented tokens already use it; replace it when the style exists only in the generated screen.
- If references are missing, derive one explicit design contract from product domain + user job + existing primitives; do not fabricate named reference sites.
`S3` Task hierarchy: generic dashboard or landing-page structure, all panels equal weight, unclear primary action, weak IA.

4. Fix in this order.
- `P0`: keyboard, labels, contrast, touch targets, mobile overflow, missing loading/empty/error states.
- `P1`: generic SaaS layout, card overuse, icon-pill repetition, Inter/Roboto/system defaults, purple/indigo/cyan gradient/glass tropes, vague CTA/copy.
- `P2`: spacing rhythm, token consistency, one memorable visual rule, reduced-motion and state polish.
`S4` AI aesthetic defaults: Inter/system-only personality, purple/indigo/cyan gradients, glass/glow layers, rounded-card grids, Lucide icon pills, vague CTA/copy, overlong explanatory prose, repeated section shells.

5. Re-verify in browser after edits whenever possible.
`S5` Tool fingerprints: v0/shadcn registry shells, Claude artifact polish, Codex minimal-diff conservatism, Gemini explainer layouts, Lovable/Supabase app shells, Bolt/Replit fallback scaffolds, Figma layer residue.

## Output
`S6` Creative polish: fun styles, algorithmic art, theme packs, image assets, stickers, motion, or novelty that does not affect the core task.

For each finding, include:
## Do

- `Issue`
- `Evidence`
- `Class` (`P0`, `P1`, `P2`)
- `Why it matters / why it reads as generic`
- `Possible non-AI explanation`
- `Smallest fix`
- `Acceptance check`
- `Confidence` (`High`, `Medium`, `Low`)
- `File/line` when code is available
- Inspect code and UI together before proposing changes.
- Rank findings by the scale above; fix lower-numbered issues before style.
- Preserve local copy, IA, tokens, and component primitives unless they are the problem.
- Use installed icon packs or existing icon components by default; custom SVGs are only for bespoke product marks, diagrams, or assets the icon set cannot express.
- Use source/tool clues only to expand searches, for example `CardHeader`, `text-muted-foreground`, `lucide-react`, `supabase`, `VITE_`, fixed `left/top/width/height`, `features.map`, `bg-clip-text`.
- Replace generic polish with product-specific hierarchy: one primary action, one dominant data surface, concrete object/action copy, and realistic states.
- When a UI still feels AI-generated after visual cleanup, cut copy and change the information structure before changing colors or adding decoration.
- Verify runnable UIs in browser at desktop and mobile sizes, including keyboard/focus, long text, empty data, loading, errors, disabled states, and reduced motion.

Return only the top 5-8 findings and merge repeated symptoms under one root cause. End with one line: `If I had to change only one thing: ...`
## Don't

For implementation asks, patch the code directly, then summarize only the meaningful design changes and any remaining risk.
- Don't claim AI authorship from style, model fingerprints, or component choices.
- Don't prioritize fun skills, stickers, algorithmic art, theme packs, or dramatic motion above operability and product truth.
- Don't overcorrect generic UI with random ornaments, novelty fonts, noisy textures, or one-off visual chaos.
- Don't keep repeated card grids, bordered panels, or long prose blocks just because they are already implemented; collapse them into rows, matrices, labels, or plain text when the content is simple.
- Don't hand-roll inline SVG icons when the repo already has `lucide-react`, Heroicons, Font Awesome, Radix icons, Material icons, or another installed icon system.
- Don't replace a documented design system just because it uses common fonts, cards, or neutral tokens.
- Don't report inferred accessibility or code defects from screenshots as fact; mark them `Inferred`.
- Don't leave pretty demo states in place of real authorization, validation, empty, error, setup, or API contract behavior.

## Guardrails
## Output

- Treat "AI-looking" as a quality smell, not a provenance claim.
- Prefer objective defects over taste opinions.
- When auditing shadcn/ui projects, preserve semantic component usage and tokens. Use the `shadcn` skill if component APIs, registry install/update, or shadcn-specific composition rules are part of the fix.
- Avoid anti-slop overcorrection: no random ornaments, novelty fonts, or one-off visual chaos.
- Anchor each finding in code, screenshots, DOM/a11y snapshots, or browser behavior, and separate fact from inference.
For review-only asks, return the top 5-8 findings with `Issue`, `Evidence`, `Scale`, `Class`, `Smallest fix`, `Acceptance check`, `Confidence`, and `File/line`. Merge repeated symptoms under one root cause and end with `If I had to change only one thing: ...`

## Resource
For implementation asks, patch directly, then summarize the meaningful design changes and remaining risk.

- `references/patterns.md`: checklist of AI-frontend tells, code smells, and repair patterns.
- `references/rubric.md`: compact UX/a11y/design-quality rubric for broader audits.
- `references/workflows.md`: Playwright QA, reference-packet, and brief-lock loops; delegates browser mechanics to `$playwright` and `$playwright-interactive`.
- Use `$playwright` and `$playwright-interactive` directly for browser execution workflows.
Use `references/patterns.md`, `references/rubric.md`, `references/workflows.md`, and `references/sources.md` only when the one-page scale is not enough.
4 changes: 4 additions & 0 deletions agents/skills/audit-ai-frontend/references/patterns.md
@@ -216,6 +216,9 @@ Search for these framework-agnostic patterns before patching:
- repeated hover scale transforms on every card or tile
- repeated `Card` maps with the same `Icon + title + description` structure
- repeated outline-icon imports used only as section decoration
- component-library demo shells such as `CardHeader`, `CardDescription`, `Badge`, `Tabs`, `DropdownMenu`, `text-muted-foreground`, or `lucide-react`
- full-stack builder scaffolding such as `Dashboard`, `Overview`, `Recent Activity`, `Settings`, `ProtectedRoute`, `useAuth`, `supabase`, `VITE_`, or fallback demo arrays
- design-to-code residue such as fixed `left/top/width/height`, layer-like asset names, or repeated exact pixel values
- `outline: none`, `tabIndex={-1}`, clickable `div`/`span`
- missing `aria-label`, `aria-describedby`, `alt`, or dialog titles
- fixed desktop widths, fixed card grids, sticky sidebars, or wide tables with no mobile fallback
@@ -226,6 +229,7 @@ If the project uses Tailwind, also search for these optional utility-class equiv

- `font-sans`, `from-purple-*`, `to-indigo-*`, `bg-indigo-*`, `text-transparent bg-clip-text`
- `rounded-2xl`, `rounded-3xl`, `shadow-xl`, `backdrop-blur-*`, `hover:scale-105`
- `text-muted-foreground`, `bg-card`, `border-border`, `bg-background`, `--radius`, `oklch`
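
A search over these tokens can be sketched in a few lines of Python. The token subset below is copied from the lists above; the `scan_source` function and the sample snippet are hypothetical. Matching is plain substring search, so a hit is only a prompt to look closer, not a finding in itself.

```python
SEARCH_TOKENS = (
    "text-muted-foreground",
    "lucide-react",
    "bg-clip-text",
    "backdrop-blur-",
    "hover:scale-105",
    "outline: none",
    "tabIndex={-1}",
)


def scan_source(text: str, tokens: tuple[str, ...] = SEARCH_TOKENS) -> list[tuple[int, str]]:
    """Return (1-based line number, token) pairs for plain substring hits."""
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for token in tokens:
            if token in line:
                hits.append((line_no, token))
    return hits


# Hypothetical component source used only to exercise the scanner.
sample = (
    'import { Star } from "lucide-react";\n'
    '<div className="hover:scale-105 backdrop-blur-md" tabIndex={-1} />'
)
print(scan_source(sample))
```

Hits are reported in file order, and within a line in the order the tokens are listed, which keeps the output stable across runs.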

## Minimal Repair Playbook

15 changes: 15 additions & 0 deletions agents/skills/audit-ai-frontend/references/sources.md
@@ -0,0 +1,15 @@
# Source basis

This skill treats "AI-looking" UI as a quality pattern, not a provenance claim. Use these references to ground the visual-smell and UX checks.

- Wikipedia on AI slop: `https://en.wikipedia.org/wiki/AI_slop`
- CrowdGenUI study on generated UI converging to generic solutions and missing task/user context: `https://arxiv.org/abs/2411.03477`
- Jidong Lab essay on why AI-generated UIs converge visually: `https://www.jidonglab.com/blog/why-every-ai-generated-ui-looks-the-same-and-how-to-escape-the-digital-sea-of-sameness`
- WCAG 2.2 for objective accessibility failures such as contrast, focus indicators, and touch target size: `https://www.w3.org/TR/WCAG22/`
- OpenAI model/release notes for current GPT/Codex coding behavior: `https://openai.com/research/index/release/`
- Anthropic Claude Sonnet 4.5 release notes for current coding/agentic/computer-use positioning: `https://www.anthropic.com/news/claude-sonnet-4-5`
- Google Gemini 3 in Search / AI Mode notes for generative UI, dynamic visual layouts, interactive tools, and simulations: `https://blog.google/products/search/gemini-3-search-ai-mode`
- Vercel AI Elements notes for shadcn/ui-based AI interface primitives: `https://vercel.com/changelog/introducing-ai-elements`
- shadcn/ui docs for Tailwind/CSS-variable component defaults used by many generated UIs: `https://ui.shadcn.com/docs`
- Lovable docs for Supabase-centered app-builder workflows: `https://docs.lovable.dev/`
- Bolt docs for browser full-stack app-builder workflows: `https://support.bolt.new/`
2 changes: 1 addition & 1 deletion agents/skills/audit-ai-frontend/references/workflows.md
@@ -10,7 +10,7 @@ Delegate browser operation details to the existing Playwright skills:

- Use `$playwright` for one-shot CLI browser inspection, snapshots, screenshots, and trace capture.
- Use `$playwright-interactive` for persistent browser sessions, repeated edit/reload loops, and deeper visual QA with a shared QA inventory.
- Open `${CODEX_HOME:-$HOME/.codex}/skills/playwright/SKILL.md` or `${CODEX_HOME:-$HOME/.codex}/skills/playwright-interactive/SKILL.md` before running browser commands.
- Open `/Users/jasonliu/.codex/skills/playwright/SKILL.md` or `/Users/jasonliu/.codex/skills/playwright-interactive/SKILL.md` before running browser commands.
- Keep this workflow focused on what to inspect for AI-frontend quality, not how to operate Playwright primitives.
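
The `${CODEX_HOME:-$HOME/.codex}` expansion resolves to a different directory on each machine. A minimal Python sketch of the same lookup follows; the `skill_doc_path` helper is hypothetical, and the directory layout (`skills/<name>/SKILL.md`) is taken from the paths quoted above.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def skill_doc_path(skill: str, env: Optional[Mapping[str, str]] = None) -> Path:
    """Mirror `${CODEX_HOME:-$HOME/.codex}/skills/<skill>/SKILL.md`."""
    env = os.environ if env is None else env
    home = env.get("HOME") or str(Path.home())
    # Like the shell's `:-`, fall back when CODEX_HOME is unset or empty.
    codex_home = env.get("CODEX_HOME") or str(Path(home) / ".codex")
    return Path(codex_home) / "skills" / skill / "SKILL.md"
```

Passing `env` explicitly keeps the helper testable without touching the real environment, for example `skill_doc_path("playwright", {"HOME": "/home/dev"})`.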

1. Open the page in a real browser using the `playwright` skill.
11 changes: 7 additions & 4 deletions agents/skills/audit-ai-writing/SKILL.md
@@ -7,6 +7,8 @@ description: Reference-only checklist for AI-writing artifacts, citation failure

## Use

Audit or repair Markdown, docs, and pasted prose that may contain AI-writing residue, broken citations, or house-style drift.

Open `patterns.md` and review in this order:

1. Machine residue, broken markup, and broken citations.
@@ -21,13 +23,14 @@ For each finding, include:
- `Issue`
- `Evidence` (exact snippet or line location)
- `Class` (`P0`, `P1`, `P2`)
- `Why it matters / why it reads as generic`
- `Why it matters / why it reads as generated`
- `Possible non-AI explanation`
- `Smallest fix`
- `Acceptance check`
- `Confidence` (`High`, `Medium`, `Low`)
- `File/line` when available
- `File/line` when a file is available

Return only the top 5-8 findings and merge repeated symptoms under one root cause.
Return only the top 5-8 findings and merge repeated symptoms under one root cause. For rewrite asks, patch the text directly and summarize only the meaningful cleanup.

## Guardrails

@@ -38,4 +41,4 @@ Return only the top 5-8 findings and merge repeated symptoms under one root caus

## Resource

- `patterns.md`: compact artifact taxonomy, verification checks, and rewrite guidance.
- `patterns.md`: compact artifact taxonomy, verification checks, rewrite guidance, and source basis.