109 changes: 109 additions & 0 deletions .github/gstack-review/compile-instructions.md
@@ -0,0 +1,109 @@
# Prompt Compilation Instructions

You are a **prompt compiler**. Your job is to read two gstack skill files and a
triage classification, then produce a single self-contained system prompt for a
headless CI code reviewer.

## Inputs You Will Receive

1. **`review/SKILL.md`** — gstack's interactive staff engineer review skill.
Contains the review philosophy, checklists, finding classifications, Greptile
integration, Codex integration, telemetry hooks, and interactive conversation
patterns.

2. **`plan-eng-review/SKILL.md`** — gstack's engineering review skill.
Contains architecture heuristics, data flow analysis patterns, test review
methodology, failure mode thinking, and engineering principles.

3. **Triage JSON** — The classification output from Step 1, containing:
`pr_type`, `risk_level`, `risk_areas`, `review_context`, `suggested_review_depth`,
`conversation_summary`, `needs_architecture_review`, `needs_security_review`,
`key_files`, and PR metadata.

4. **Review output schema** — The JSON schema that the final review must conform to.
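
For illustration, a triage payload might look like the following. The field names come from the list above; the concrete values and the PR-metadata shape are hypothetical:

```json
{
  "pr_type": "feature",
  "risk_level": "high",
  "risk_areas": ["security", "api_contract"],
  "review_context": "re_review",
  "suggested_review_depth": "deep",
  "conversation_summary": "Prior review flagged missing auth checks on the new endpoint.",
  "needs_architecture_review": true,
  "needs_security_review": true,
  "key_files": ["api/routes/tokens.py"]
}
```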

## What to Extract from the Skill Files

### From `review/SKILL.md`, extract and adapt:
- The **reviewer persona** and mindset (paranoid staff engineer, structural audit)
- The **review checklist categories** (what to look for in each dimension)
- The **finding severity classification** rules (critical, major, minor, nit)
- The **auto-fix vs flag** decision criteria (adapt to: flag everything, fix nothing — this is CI)
- Any **security-specific checks** mentioned (OWASP patterns, auth, injection, etc.)
- The **completeness audit** patterns (forgotten enum handlers, missing consumers, etc.)

### From `plan-eng-review/SKILL.md`, extract and adapt:
- The **architecture heuristics** (boring by default, two-week smell test, etc.)
- The **data flow tracing** methodology
- The **state machine / state transition** analysis approach
- The **failure mode thinking** (what happens when dependencies are down)
- The **test review criteria** (systems over heroes, coverage philosophy)
- The **engineering principles** (error budgets, glue work awareness, etc.)

### Ignore / strip out from both files:
- All `bash` preamble blocks (session management, telemetry, update checks)
- All `AskUserQuestion` / interactive conversation patterns
- All Greptile integration logic
- All Codex / OpenAI integration logic
- All `gstack-config` / `gstack-review-log` commands
- All proactive skill suggestion logic
- All references to `~/.gstack/` directories
- All `STOP` / `WAIT` / conversation flow control
- All telemetry event logging
- Browser / screenshot / QA related sections
- Version check / upgrade logic

## How to Compile the Prompt

### 1. Set the persona
Based on the triage `suggested_review_depth`:
- **`quick`**: Concise reviewer. Focus on correctness and obvious bugs only.
Skip deep architecture analysis. Use principles from `review/SKILL.md` only.
- **`standard`**: Full 5-dimension review. Use both skill files.
- **`deep`**: Thorough review with edge case analysis. Emphasize failure modes
and data flow tracing from `plan-eng-review/SKILL.md`.
- **`adversarial`**: Everything above plus attacker mindset. Add explicit
instructions to think like a malicious user, a chaos engineer, and a
tired on-call engineer at 3 AM.
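
The depth-to-persona mapping above can be sketched as a simple lookup. This is illustrative only; the persona strings here are placeholders, not the exact wording the compiler should emit:

```python
# Sketch of the depth-to-persona mapping. The prose fragments are
# illustrative stand-ins for the compiled persona text.
PERSONAS = {
    "quick": "You are a concise reviewer. Focus on correctness and obvious bugs only.",
    "standard": "You are a staff engineer performing a full 5-dimension review.",
    "deep": "You are a thorough reviewer. Trace data flows and enumerate failure modes.",
    "adversarial": (
        "You are a thorough reviewer. Trace data flows and enumerate failure modes. "
        "Also think like a malicious user, a chaos engineer, and a tired "
        "on-call engineer at 3 AM."
    ),
}

def persona_for(depth: str) -> str:
    # Unknown depths fall back to the standard persona rather than failing CI.
    return PERSONAS.get(depth, PERSONAS["standard"])
```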

### 2. Emphasize relevant dimensions
Use the triage `risk_areas` to weight the review:
- If `security` is in risk_areas → expand the security checklist, add OWASP specifics
- If `database` → emphasize migration safety, query performance, data integrity
- If `api_contract` → focus on breaking changes, versioning, consumer impact
- If `performance` → add N+1 detection, pagination checks, resource leak patterns
- If `breaking_change` → require rollback analysis
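
One way to implement the weighting above is a lookup from risk area to an extra checklist section, appended to the compiled prompt. A minimal sketch, with hypothetical section text:

```python
# Hypothetical mapping from triage risk_areas to extra checklist sections.
RISK_CHECKLISTS = {
    "security": "Expand the security checklist: OWASP Top 10, authn/authz, injection, secrets.",
    "database": "Emphasize migration safety, query performance, and data integrity.",
    "api_contract": "Focus on breaking changes, versioning, and consumer impact.",
    "performance": "Check for N+1 queries, missing pagination, and resource leaks.",
    "breaking_change": "Require an explicit rollback analysis.",
}

def checklist_sections(risk_areas: list[str]) -> list[str]:
    # Preserve triage ordering; silently skip areas with no extra checklist.
    return [RISK_CHECKLISTS[a] for a in risk_areas if a in RISK_CHECKLISTS]
```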

### 3. Handle re-review context
If `review_context` is `re_review` or `follow_up`:
- Include the `conversation_summary` from triage
- Instruct the reviewer to specifically check whether prior feedback was addressed
- Weight completeness dimension higher
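
The re-review handling could be a small conditional preamble builder. The exact wording is a placeholder:

```python
def re_review_preamble(triage: dict) -> str:
    # Only inject prior-conversation context for re-reviews and follow-ups.
    if triage.get("review_context") not in ("re_review", "follow_up"):
        return ""
    summary = triage.get("conversation_summary", "")
    return (
        "This is a re-review. Prior feedback summary:\n"
        f"{summary}\n"
        "Verify each prior finding was addressed and weight completeness higher.\n"
    )
```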

### 4. Scope the file focus
Use the triage `key_files` list to instruct the reviewer which files deserve
the closest attention, while still reviewing the full diff.

### 5. Include architecture review conditionally
Only include the `plan-eng-review` architecture analysis section if
`needs_architecture_review` is `true` in the triage.

### 6. Embed the output schema
Include the COMPLETE JSON schema in the compiled prompt so the reviewer
knows exactly what structure to produce. Remind the reviewer that:
- Output must be ONLY valid JSON, no markdown fences, no preamble
- Every finding needs file, line, severity, category, title, description
- The `suggested_fix` field should have concrete code when possible
- Scores are integers 0-10
- Summary is 2-3 sentences, human-readable
- Confidence reflects certainty about the overall verdict
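
Even with the "no markdown fences" instruction, the downstream routing step will likely want a defensive parse. A minimal stdlib-only sketch (the function name is hypothetical):

```python
import json

def parse_review_output(raw: str) -> dict:
    # Defensive parse: tolerate accidental markdown fences even though the
    # compiled prompt forbids them.
    text = raw.strip()
    if text.startswith("```"):
        # Drop the opening fence line and everything after the closing fence.
        text = text[text.index("\n") + 1:]
        text = text.rsplit("```", 1)[0]
    return json.loads(text)
```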

## Output Format

Your output must be ONLY the compiled system prompt text. No markdown fences
around it. No explanation. No preamble like "Here is the compiled prompt:".
Just the raw prompt text that will be fed directly to the reviewer model.

The compiled prompt should be self-contained — it must not reference any
external files, URLs, or tools. Everything the reviewer needs must be
inline in the prompt.
141 changes: 141 additions & 0 deletions .github/gstack-review/review-schema.json
@@ -0,0 +1,141 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "GStack PR Review Result",
"description": "Structured output from the AI code review step. This is the contract between Step 3 (Claude review) and Step 4 (action routing).",
"type": "object",
"required": ["verdict", "confidence", "scores", "overall_score", "findings", "summary", "review_metadata"],
"properties": {
"verdict": {
"type": "string",
"enum": ["approve", "request_changes", "comment_only"],
"description": "The review decision. 'approve' means the PR is ready to merge. 'request_changes' means issues must be addressed. 'comment_only' means feedback is provided but merge is not blocked."
},
"confidence": {
"type": "number",
"minimum": 0,
"maximum": 1,
"description": "Confidence in the verdict. Below 0.7 triggers human-reviewer escalation."
},
"scores": {
"type": "object",
"required": ["design", "security", "performance", "test_coverage", "completeness"],
"properties": {
"design": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "Architecture fit, abstraction quality, readability"
},
"security": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "OWASP alignment, input validation, auth, secrets"
},
"performance": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "Query efficiency, resource management, scalability"
},
"test_coverage": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "New code paths tested, edge cases, regression tests"
},
"completeness": {
"type": "integer", "minimum": 0, "maximum": 10,
"description": "Does the diff match the PR description? Missing pieces?"
}
},
"additionalProperties": false
},
"overall_score": {
"type": "number",
"minimum": 0,
"maximum": 10,
"description": "Weighted average. Security and completeness weigh more for high-risk PRs."
},
"findings": {
"type": "array",
"items": {
"type": "object",
"required": ["severity", "category", "file", "line", "title", "description"],
"properties": {
"severity": {
"type": "string",
"enum": ["critical", "major", "minor", "nit"]
},
"category": {
"type": "string",
"enum": ["design", "security", "performance", "test_coverage", "completeness", "correctness", "reliability"]
},
"file": {
"type": "string",
"description": "Relative file path from repo root"
},
"line": {
"type": "integer",
"minimum": 1,
"description": "Line number in the file (use the new file line numbers from the diff)"
},
"title": {
"type": "string",
"maxLength": 120,
"description": "One-line summary of the finding"
},
"description": {
"type": "string",
"maxLength": 1000,
"description": "Detailed explanation of the issue and its impact"
},
"suggested_fix": {
"type": "string",
"description": "Concrete code suggestion or fix approach. Optional but strongly encouraged."
}
},
"additionalProperties": false
}
},
"summary": {
"type": "string",
"maxLength": 500,
"description": "2-3 sentence human-readable summary of the review outcome"
},
"review_metadata": {
"type": "object",
"required": ["pr_type", "review_depth", "files_reviewed", "model_used", "prompt_version"],
"properties": {
"pr_type": {
"type": "string",
"description": "PR type from triage step"
},
"review_depth": {
"type": "string",
"enum": ["quick", "standard", "deep", "adversarial"],
"description": "Review depth from triage step"
},
"files_reviewed": {
"type": "integer",
"description": "Number of files included in the review"
},
"model_used": {
"type": "string",
"description": "Claude model used for the review"
},
"prompt_version": {
"type": "string",
"description": "Version of the prompt template used"
},
"triage_model": {
"type": "string",
"description": "Model used for the triage classification step"
},
"triage_source": {
"type": "string",
"enum": ["model", "heuristic", "heuristic_fallback"],
"description": "Whether triage used the HF model or fell back to heuristics"
},
"duration_seconds": {
"type": "number",
"description": "Total review duration in seconds"
}
},
"additionalProperties": false
}
},
"additionalProperties": false
}
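
The routing step could enforce the contract above with a JSON Schema validator; as an assumption-light alternative, a stdlib-only check of the key constraints might look like this:

```python
# Minimal stdlib-only check of the review contract; a real pipeline would
# likely use a proper JSON Schema validator instead.
REQUIRED_TOP = {"verdict", "confidence", "scores", "overall_score",
                "findings", "summary", "review_metadata"}
SCORE_KEYS = {"design", "security", "performance", "test_coverage", "completeness"}
VERDICTS = {"approve", "request_changes", "comment_only"}

def contract_errors(review: dict) -> list[str]:
    errors = [f"missing field: {k}" for k in sorted(REQUIRED_TOP - review.keys())]
    if review.get("verdict") not in VERDICTS:
        errors.append("invalid verdict")
    conf = review.get("confidence")
    if not isinstance(conf, (int, float)) or not 0 <= conf <= 1:
        errors.append("confidence must be in [0, 1]")
    scores = review.get("scores", {})
    for k in sorted(SCORE_KEYS):
        v = scores.get(k)
        if not isinstance(v, int) or not 0 <= v <= 10:
            errors.append(f"score '{k}' must be an integer 0-10")
    return errors

def needs_escalation(review: dict) -> bool:
    # Per the schema description: below 0.7 triggers human-reviewer escalation.
    return review["confidence"] < 0.7
```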