feat: gstack-inspired AI PR review pipeline with HuggingFace triage, Claude review, and auto-merge #305
🤖 gstack as a GitHub Action — your skills, running on every PR, automatically

This PR turns gstack's review skills into a fully automated GitHub Actions pipeline. It reads the actual skill files from the repo itself.

The Pipeline

Total: ~$0.10–0.33 per PR. Under 2 minutes.

Why this matters

gstack's superpower is its opinionated review skills — the paranoid staff engineer persona, the architecture heuristics, the severity classification framework. But right now those only fire when someone manually runs `/review`. This pipeline makes them fire on every PR, automatically, without changing a single line of the skill files. Step 2 reads the real SKILL.md files from `main`.

The scoring and routing

Every PR gets a structured JSON review with 5-dimension scores:
The key differentiator isn't just cost — it's that the review knowledge comes from this repo's own skill files. Not a generic prompt. Not Anthropic's internal review framework. The same opinionated, battle-tested staff engineer persona from `review/SKILL.md`.

Claude Code Review is a great product for enterprises that want zero-config depth. This is for people who already have gstack and want it running continuously.

What's in the PR

Five files. The skill files are untouched — read at runtime from `main`.

Setup: Two secrets (`ANTHROPIC_API_KEY`, `HF_TOKEN`).

tl;dr

gstack already has the best review skills. This makes them run on every PR without anyone lifting a finger — 50–100x cheaper and 10x faster than the managed alternative. The skills stay in your control, the triage is transparent, and the automation is configurable from "comment only" all the way to "auto-merge". Every review produces a downloadable JSON artifact — full audit trail of what was scored, what was found, and what action was taken.
Summary
Adds a 4-step AI-powered PR review pipeline to GitHub Actions, inspired by
gstack's `/review` and `/plan-eng-review` skill files. The pipeline classifies, reviews, scores, and optionally merges PRs
— fully automated, with structured JSON output and a complete audit trail.
Motivation
AI-assisted code generation is accelerating faster than teams can review it.
Anthropic's Claude Code Review solves this at the enterprise tier ($15–25/review,
20 min, Teams/Enterprise only). This pipeline brings comparable structured review to any GitHub repo at a fraction of the cost ($0.10–0.30/review)
by combining a lightweight open-source triage model with Claude's reasoning
capabilities and gstack's battle-tested review principles.
Architecture
Step 1 — Triage (HuggingFace, ~2-5s, ~free)
Gathers full PR context via GitHub API: diff, inline review comments,
conversation thread, and linked issues. Classifies the PR by type, risk,
size, and review depth using Qwen2.5-3B-Instruct. Falls back to
rule-based heuristics if the HF API is unavailable.
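The rule-based fallback itself isn't shown in the PR body; a rough sketch of what such heuristics could look like (function name, field names, and thresholds here are illustrative assumptions, not the PR's actual values):

```python
def triage_fallback(changed_files, additions, deletions):
    """Classify a PR by type, risk, size, and review depth without an LLM."""
    total = additions + deletions
    size = "small" if total < 50 else "medium" if total < 400 else "large"

    # Path-based type/risk heuristics (illustrative)
    if any(f.startswith((".github/", "ci/")) for f in changed_files):
        pr_type, risk = "ci", "medium"
    elif all(f.endswith((".md", ".rst", ".txt")) for f in changed_files):
        pr_type, risk = "docs", "low"
    elif any("auth" in f or "security" in f for f in changed_files):
        pr_type, risk = "feature", "high"
    else:
        pr_type, risk = "feature", "medium" if size != "small" else "low"

    depth = {"low": "light", "medium": "standard", "high": "deep"}[risk]
    return {"type": pr_type, "risk": risk, "size": size, "review_depth": depth}
```

Because this path is deterministic, a HuggingFace outage degrades the triage quality but never blocks the pipeline.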
Step 2 — Prompt Compilation (Claude Sonnet, ~10-15s, ~$0.01-0.03)
Reads the actual gstack skill files (`review/SKILL.md` and `plan-eng-review/SKILL.md`) from the `main` branch, plus the triage output. Following the `compile-instructions.md` meta-prompt, Claude strips interactive patterns, extracts the review principles, and
compiles a tailored single-pass review prompt optimized for this
specific PR's type, risk level, and review context.
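The compilation idea can be sketched roughly like this — the interactive-pattern list and the prompt wording are assumptions, not the actual `compile-instructions.md` contents:

```python
import re

# Lines a conversational skill uses to pause for a human; a headless CI
# prompt must drop these. Pattern list is illustrative.
INTERACTIVE = re.compile(r"^\s*(ask the user|wait for|prompt the user)", re.I)

def compile_review_prompt(skill_md: str, triage: dict, diff: str) -> str:
    """Strip interactive patterns and assemble a single-pass review prompt."""
    principles = "\n".join(
        line for line in skill_md.splitlines() if not INTERACTIVE.match(line)
    )
    return (
        f"Review this {triage['type']} PR (risk: {triage['risk']}).\n"
        f"Apply these principles:\n{principles}\n"
        f"Respond with JSON only.\n\nDIFF:\n{diff}"
    )
```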
Step 3 — Deep Review (Claude Sonnet, ~30-90s, ~$0.05-0.30)
Executes the compiled prompt against the actual PR diff. Produces a
structured JSON result with 5-dimension scores (design, security,
performance, test coverage, completeness), severity-classified findings
with file/line references, and a verdict.
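For illustration, a minimal sketch of that structured result — the five dimension names come from the PR, while the exact field layout and the simple-mean aggregation are assumptions:

```python
# Hypothetical example of a Step 3 review result (field layout assumed)
SAMPLE_REVIEW = {
    "scores": {"design": 8, "security": 9, "performance": 7,
               "test_coverage": 6, "completeness": 8},
    "findings": [
        {"severity": "minor", "file": "src/app.py", "line": 42,
         "message": "Missing input validation on user-supplied path"},
    ],
    "verdict": "approve_with_comments",
}

def overall_score(review: dict) -> float:
    """Aggregate the five dimension scores (simple mean, assumed)."""
    scores = review["scores"].values()
    return round(sum(scores) / len(scores), 1)
```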
Step 4 — Action Routing (bash, ~1-2s, free)
Pure deterministic logic — reads the review JSON and triage output,
then calls the GitHub API to apply labels (`ai-approved`, `needs-work`, `security-review-needed`, etc.) and optionally auto-merge (gated by the `AUTO_MERGE_ENABLED` repo variable).

Decision Matrix
Each PR is routed to one of: `ai-review-passed`, `needs-human-review`, `needs-work`, `security-review-needed`.

Key Design Decisions
Two-model split — a cheap, open model (Qwen2.5-3B-Instruct) handles classification and an expensive, capable model (Claude) handles the reasoning. This keeps costs ~50-100x lower than Claude Code Review.
Live skill files — the pipeline reads the actual `review/SKILL.md` and `plan-eng-review/SKILL.md` on the `main` branch, not static copies. A `compile-instructions.md` meta-prompt tells Claude how to strip interactive patterns and compile them into a headless CI review prompt. When gstack updates its skills, the pipeline automatically picks up the changes.
Structured output — a JSON schema (`review-schema.json`) is the formal interface between the LLM and the automation layer. Every review decision is a downloadable artifact for audit.
Safety gating — the `AUTO_MERGE_ENABLED` repo variable (default: `false`) provides a global kill switch. Even when enabled, auto-merge requires:
score ≥ 9, no critical/major findings, triage approval, AND all other
CI checks passing.
Graceful degradation — if the HuggingFace Inference API is unavailable, triage falls back to rule-based heuristics. The pipeline never fails silently at classification.
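The PR implements Step 4 in bash; a Python sketch of the same deterministic routing and auto-merge gating (the label names and the score ≥ 9 / no-critical-or-major gates follow the text above, everything else is an assumption):

```python
def route(overall, findings, triage_risk):
    """Pick one of the four routing labels from the Decision Matrix."""
    severities = {f["severity"] for f in findings}
    categories = {f.get("category", "") for f in findings}
    if "security" in categories:
        return "security-review-needed"
    if "critical" in severities or "major" in severities:
        return "needs-work"
    if overall >= 8 and triage_risk != "high":
        return "ai-review-passed"
    return "needs-human-review"

def can_auto_merge(enabled, overall, findings, triage_approved, ci_green):
    """Global kill switch plus the four gates named above."""
    if not enabled:  # AUTO_MERGE_ENABLED repo variable, default "false"
        return False
    blocking = any(f["severity"] in ("critical", "major") for f in findings)
    return overall >= 9 and not blocking and triage_approved and ci_green
```

Keeping this step free of any LLM call is what makes the audit trail meaningful: the same review JSON always produces the same action.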
Files
Note: The pipeline reads `review/SKILL.md` and `plan-eng-review/SKILL.md` from the `main` branch at runtime. These are the actual gstack skill files, not copies. When the skills are updated, the pipeline automatically picks up the changes.
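A minimal sketch of what reading the live files at runtime could look like — the `raw.githubusercontent.com` URL scheme is standard GitHub, but the repo slug and the fetch approach are assumptions:

```python
from urllib.request import urlopen

SKILLS = ("review/SKILL.md", "plan-eng-review/SKILL.md")

def raw_url(repo: str, path: str, ref: str = "main") -> str:
    """Raw-content URL for a file at a given ref."""
    return f"https://raw.githubusercontent.com/{repo}/{ref}/{path}"

def fetch_skills(repo: str) -> dict:
    # Always reads from main, so skill updates are picked up automatically
    return {p: urlopen(raw_url(repo, p)).read().decode() for p in SKILLS}
```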
Setup Required
Secrets
- `ANTHROPIC_API_KEY` (required) — Claude API access
- `HF_TOKEN` (recommended) — HuggingFace Inference API

Variables
- `AUTO_MERGE_ENABLED` — "true" to enable auto-merge, default "false"

Optional (for merge/approve authority)
- `APP_ID` (variable) + `APP_PRIVATE_KEY` (secret) — GitHub App credentials

Triggers
- `pull_request`: [opened, synchronize, ready_for_review]
- `issue_comment` containing `@gstack-review` (manual re-trigger)

Cost per Review
Compare: Anthropic Claude Code Review = $15-25/review (Teams/Enterprise only).