109 changes: 109 additions & 0 deletions .github/gstack-review/compile-instructions.md
@@ -0,0 +1,109 @@
# Prompt Compilation Instructions

You are a **prompt compiler**. Your job is to read two gstack skill files and a
triage classification, then produce a single self-contained system prompt for a
headless CI code reviewer.

## Inputs You Will Receive

1. **`review/SKILL.md`** — gstack's interactive staff engineer review skill.
Contains the review philosophy, checklists, finding classifications, Greptile
integration, Codex integration, telemetry hooks, and interactive conversation
patterns.

2. **`plan-eng-review/SKILL.md`** — gstack's engineering review skill.
Contains architecture heuristics, data flow analysis patterns, test review
methodology, failure mode thinking, and engineering principles.

3. **Triage JSON** — The classification output from Step 1, containing:
`pr_type`, `risk_level`, `risk_areas`, `review_context`, `suggested_review_depth`,
`conversation_summary`, `needs_architecture_review`, `needs_security_review`,
`key_files`, and PR metadata.

4. **Review output schema** — The JSON schema that the final review must conform to.
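
For illustration, a triage payload might look like the following. The field names come from the list above; the concrete values and the PR-metadata shape are hypothetical:

```json
{
  "pr_type": "feature",
  "risk_level": "high",
  "risk_areas": ["security", "api_contract"],
  "review_context": "re_review",
  "suggested_review_depth": "deep",
  "conversation_summary": "Prior review flagged missing auth checks on the new endpoint.",
  "needs_architecture_review": true,
  "needs_security_review": true,
  "key_files": ["api/routes/tokens.py"]
}
```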

## What to Extract from the Skill Files

### From `review/SKILL.md`, extract and adapt:
- The **reviewer persona** and mindset (paranoid staff engineer, structural audit)
- The **review checklist categories** (what to look for in each dimension)
- The **finding severity classification** rules (critical, major, minor, nit)
- The **auto-fix vs flag** decision criteria (adapt to: flag everything, fix nothing — this is CI)
- Any **security-specific checks** mentioned (OWASP patterns, auth, injection, etc.)
- The **completeness audit** patterns (forgotten enum handlers, missing consumers, etc.)

### From `plan-eng-review/SKILL.md`, extract and adapt:
- The **architecture heuristics** (boring by default, two-week smell test, etc.)
- The **data flow tracing** methodology
- The **state machine / state transition** analysis approach
- The **failure mode thinking** (what happens when dependencies are down)
- The **test review criteria** (systems over heroes, coverage philosophy)
- The **engineering principles** (error budgets, glue work awareness, etc.)

### Ignore / strip out from both files:
- All `bash` preamble blocks (session management, telemetry, update checks)
- All `AskUserQuestion` / interactive conversation patterns
- All Greptile integration logic
- All Codex / OpenAI integration logic
- All `gstack-config` / `gstack-review-log` commands
- All proactive skill suggestion logic
- All references to `~/.gstack/` directories
- All `STOP` / `WAIT` / conversation flow control
- All telemetry event logging
- Browser / screenshot / QA related sections
- Version check / upgrade logic

## How to Compile the Prompt

### 1. Set the persona
Based on the triage `suggested_review_depth`:
- **`quick`**: Concise reviewer. Focus on correctness and obvious bugs only.
Skip deep architecture analysis. Use principles from `review/SKILL.md` only.
- **`standard`**: Full 5-dimension review. Use both skill files.
- **`deep`**: Thorough review with edge case analysis. Emphasize failure modes
and data flow tracing from `plan-eng-review/SKILL.md`.
- **`adversarial`**: Everything above plus attacker mindset. Add explicit
instructions to think like a malicious user, a chaos engineer, and a
tired on-call engineer at 3 AM.
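
The depth-to-persona mapping above can be sketched as a simple lookup. This is illustrative only; the persona strings here are placeholders, not the exact wording the compiler should emit:

```python
# Sketch of the depth-to-persona mapping. The prose fragments are
# illustrative stand-ins for the compiled persona text.
PERSONAS = {
    "quick": "You are a concise reviewer. Focus on correctness and obvious bugs only.",
    "standard": "You are a staff engineer performing a full 5-dimension review.",
    "deep": "You are a thorough reviewer. Trace data flows and enumerate failure modes.",
    "adversarial": (
        "You are a thorough reviewer. Trace data flows and enumerate failure modes. "
        "Also think like a malicious user, a chaos engineer, and a tired "
        "on-call engineer at 3 AM."
    ),
}

def persona_for(depth: str) -> str:
    # Unknown depths fall back to the standard persona rather than failing CI.
    return PERSONAS.get(depth, PERSONAS["standard"])
```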

### 2. Emphasize relevant dimensions
Use the triage `risk_areas` to weight the review:
- If `security` is in risk_areas → expand the security checklist, add OWASP specifics
- If `database` → emphasize migration safety, query performance, data integrity
- If `api_contract` → focus on breaking changes, versioning, consumer impact
- If `performance` → add N+1 detection, pagination checks, resource leak patterns
- If `breaking_change` → require rollback analysis
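
One way to implement the weighting above is a lookup from risk area to an extra checklist section, appended to the compiled prompt. A minimal sketch, with hypothetical section text:

```python
# Hypothetical mapping from triage risk_areas to extra checklist sections.
RISK_CHECKLISTS = {
    "security": "Expand the security checklist: OWASP Top 10, authn/authz, injection, secrets.",
    "database": "Emphasize migration safety, query performance, and data integrity.",
    "api_contract": "Focus on breaking changes, versioning, and consumer impact.",
    "performance": "Check for N+1 queries, missing pagination, and resource leaks.",
    "breaking_change": "Require an explicit rollback analysis.",
}

def checklist_sections(risk_areas: list[str]) -> list[str]:
    # Preserve triage ordering; silently skip areas with no extra checklist.
    return [RISK_CHECKLISTS[a] for a in risk_areas if a in RISK_CHECKLISTS]
```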

### 3. Handle re-review context
If `review_context` is `re_review` or `follow_up`:
- Include the `conversation_summary` from triage
- Instruct the reviewer to specifically check whether prior feedback was addressed
- Weight completeness dimension higher
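
The re-review handling could be a small conditional preamble builder. The exact wording is a placeholder:

```python
def re_review_preamble(triage: dict) -> str:
    # Only inject prior-conversation context for re-reviews and follow-ups.
    if triage.get("review_context") not in ("re_review", "follow_up"):
        return ""
    summary = triage.get("conversation_summary", "")
    return (
        "This is a re-review. Prior feedback summary:\n"
        f"{summary}\n"
        "Verify each prior finding was addressed and weight completeness higher.\n"
    )
```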

### 4. Scope the file focus
Use the triage `key_files` list to instruct the reviewer which files deserve
the closest attention, while still reviewing the full diff.

### 5. Include architecture review conditionally
Only include the `plan-eng-review` architecture analysis section if
`needs_architecture_review` is `true` in the triage.

### 6. Embed the output schema
Include the COMPLETE JSON schema in the compiled prompt so the reviewer
knows exactly what structure to produce. Remind the reviewer that:
- Output must be ONLY valid JSON, no markdown fences, no preamble
- Every finding needs file, line, severity, category, title, description
- The `suggested_fix` field should have concrete code when possible
- Scores are integers 0-10
- Summary is 2-3 sentences, human-readable
- Confidence reflects certainty about the overall verdict
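
Even with the "no markdown fences" instruction, the downstream routing step will likely want a defensive parse. A minimal stdlib-only sketch (the function name is hypothetical):

```python
import json

def parse_review_output(raw: str) -> dict:
    # Defensive parse: tolerate accidental markdown fences even though the
    # compiled prompt forbids them.
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line and everything after the closing fence.
        text = text[text.index("\n") + 1:]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```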

## Output Format

Your output must be ONLY the compiled system prompt text. No markdown fences
around it. No explanation. No preamble like "Here is the compiled prompt:".
Just the raw prompt text that will be fed directly to the reviewer model.

The compiled prompt should be self-contained — it must not reference any
external files, URLs, or tools. Everything the reviewer needs must be
inline in the prompt.
141 changes: 141 additions & 0 deletions .github/gstack-review/review-schema.json
@@ -0,0 +1,141 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "GStack PR Review Result",
"description": "Structured output from the AI code review step. This is the contract between Step 3 (Claude review) and Step 4 (action routing).",
"type": "object",
"required": ["verdict", "confidence", "scores", "overall_score", "findings", "summary", "review_metadata"],
"properties": {
"verdict": {
"type": "string",
"enum": ["approve", "request_changes", "comment_only"],
"description": "The review decision. 'approve' means the PR is ready to merge. 'request_changes' means issues must be addressed. 'comment_only' means feedback is provided but merge is not blocked."
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence in the verdict. Below 0.7 triggers human-reviewer escalation."
},
"scores": {
"type": "object",
"required": ["design", "security", "performance", "test_coverage", "completeness"],
"properties": {
"design": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "Architecture fit, abstraction quality, readability"
},
"security": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "OWASP alignment, input validation, auth, secrets"
},
"performance": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "Query efficiency, resource management, scalability"
},
"test_coverage": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "New code paths tested, edge cases, regression tests"
},
"completeness": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "Does the diff match the PR description? Missing pieces?"
}
},
"additionalProperties": false
},
"overall_score": {
"type": "number",
"minimum": 0,
"maximum": 10,
"description": "Weighted average. Security and completeness weigh more for high-risk PRs."
},
"findings": {
"type": "array",
"items": {
"type": "object",
"required": ["severity", "category", "file", "line", "title", "description"],
"properties": {
"severity": {
"type": "string",
"enum": ["critical", "major", "minor", "nit"]
},
"category": {
"type": "string",
"enum": ["design", "security", "performance", "test_coverage", "completeness", "correctness", "reliability"]
},
"file": {
"type": "string",
"description": "Relative file path from repo root"
},
"line": {
"type": "integer",
"minimum": 1,
"description": "Line number in the file (use the new file line numbers from the diff)"
},
"title": {
"type": "string",
"maxLength": 120,
"description": "One-line summary of the finding"
},
"description": {
"type": "string",
"maxLength": 1000,
"description": "Detailed explanation of the issue and its impact"
},
"suggested_fix": {
"type": "string",
"description": "Concrete code suggestion or fix approach. Optional but strongly encouraged."
}
},
"additionalProperties": false
}
},
"summary": {
"type": "string",
"maxLength": 500,
"description": "2-3 sentence human-readable summary of the review outcome"
},
"review_metadata": {
"type": "object",
"required": ["pr_type", "review_depth", "files_reviewed", "model_used", "prompt_version"],
"properties": {
"pr_type": {
"type": "string",
"description": "PR type from triage step"
},
"review_depth": {
"type": "string",
"enum": ["quick", "standard", "deep", "adversarial"],
"description": "Review depth from triage step"
},
"files_reviewed": {
"type": "integer",
"description": "Number of files included in the review"
},
"model_used": {
"type": "string",
"description": "Claude model used for the review"
},
"prompt_version": {
"type": "string",
"description": "Version of the prompt template used"
},
"triage_model": {
"type": "string",
"description": "Model used for the triage classification step"
},
"triage_source": {
"type": "string",
"enum": ["model", "heuristic", "heuristic_fallback"],
"description": "Whether triage used the HF model or fell back to heuristics"
},
"duration_seconds": {
"type": "number",
"description": "Total review duration in seconds"
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
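
The routing step could enforce the contract above with a JSON Schema validator; as an assumption-light alternative, a stdlib-only check of the key constraints might look like this:

```python
# Minimal stdlib-only check of the review contract; a real pipeline would
# likely use a proper JSON Schema validator instead.
REQUIRED_TOP = {"verdict", "confidence", "scores", "overall_score",
                "findings", "summary", "review_metadata"}
SCORE_KEYS = {"design", "security", "performance", "test_coverage", "completeness"}
VERDICTS = {"approve", "request_changes", "comment_only"}

def contract_errors(review: dict) -> list[str]:
    errors = [f"missing field: {k}" for k in sorted(REQUIRED_TOP - review.keys())]
    if review.get("verdict") not in VERDICTS:
        errors.append("invalid verdict")
    conf = review.get("confidence")
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        errors.append("confidence must be in [0, 1]")
    scores = review.get("scores", {})
    for k in sorted(SCORE_KEYS):
        v = scores.get(k)
        if not isinstance(v, int) or not 0 <= v <= 10:
            errors.append(f"score '{k}' must be an integer 0-10")
    return errors

def needs_escalation(review: dict) -> bool:
    # Per the schema description: below 0.7 triggers human-reviewer escalation.
    return review["confidence"] < 0.7
```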