Bug Reproduction Replayer #31

**Current file:** `app/src/lib/tools/bug-replayer.ts`  
**Current model:** `deepseek-r1-0528`  
**Current approach:** Single prompt asking for reproduction steps, API calls, test script, expected vs actual, debugging checklist, and possible root causes. No structured bug analysis, no code validation, no environment-specific handling.

**Problems with current approach:**
- Generated cURL commands and test scripts may contain syntax errors.
- Reproduction steps are generic and may not match the described environment.
- Test scripts reference libraries or APIs without verifying correctness.
- Root cause ranking is not informed by the error pattern; it is a generic list.
- No validation that the generated scripts are runnable.

**Upgrade plan:**

| Step | Agent | Action |
|------|-------|--------|
| 1 | Bug Classifier | Analyze the bug description and error output. Classify: bug type (API error, UI glitch, data corruption, race condition, auth failure), affected layer (frontend, backend, database, network), severity. |
| 2 | Environment Analyzer | Parse the environment string to identify: language, framework, runtime version, hosting platform. Use this to generate environment-specific reproduction code. |
| 3 | Reproduction Generator | Generate: numbered reproduction steps, cURL/fetch commands tailored to the environment, and a test script using the correct testing framework for the detected language. |
| 4 | Script Validator | Programmatic: Syntax-check the generated test script (e.g., parse Python with `ast.parse`, parse JS/TS with a lightweight parser). Validate cURL command syntax. |
| 5 | Root Cause Ranker | Using the classified bug type and error patterns, generate a ranked list of root causes with specific evidence from the error output. |
| 6 | Refinement Agent | If syntax validation fails on the test script, feed errors back and regenerate. Max 2 retries. |
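
Steps 1 and 2 both turn free-form input into structured data that later steps can rely on. The sketch below is one possible shape for that data, not the existing implementation: the identifier spellings, the JSON reply format, and the lookup tables are all assumptions made for illustration, and the regex heuristics cover only two runtimes.

```typescript
// Categories copied from step 1 of the plan; the identifier spellings are illustrative.
const BUG_TYPES = ["api_error", "ui_glitch", "data_corruption", "race_condition", "auth_failure"] as const;
const LAYERS = ["frontend", "backend", "database", "network"] as const;

type BugType = (typeof BUG_TYPES)[number];
type Layer = (typeof LAYERS)[number];

interface BugClassification {
  bugType: BugType;
  layer: Layer;
  severity: string; // the plan does not define a severity scale, so it stays free-form here
}

// Parse the classifier's JSON reply (assumed format) and reject values outside the allowed categories.
function parseClassification(raw: string): BugClassification | null {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return null;
  }
  if (typeof data !== "object" || data === null) return null;
  const fields = data as Record<string, unknown>;
  const bugType = BUG_TYPES.find((t) => t === fields["bugType"]);
  const layer = LAYERS.find((l) => l === fields["layer"]);
  const severity = fields["severity"];
  if (!bugType || !layer || typeof severity !== "string") return null;
  return { bugType, layer, severity };
}

interface EnvironmentInfo {
  language: string | null;
  runtimeVersion: string | null;
  framework: string | null;
  hosting: string | null;
  testFramework: string | null;
}

// Hypothetical lookup tables; a real analyzer would cover far more runtimes and platforms.
const RUNTIMES = [
  { pattern: /\bnode(?:\.js)?(?![a-z])\s*v?(\d+(?:\.\d+)*)?/i, language: "javascript", testFramework: "jest" },
  { pattern: /\bpython(?![a-z])\s*v?(\d+(?:\.\d+)*)?/i, language: "python", testFramework: "pytest" },
];
const FRAMEWORKS = ["express", "next.js", "fastapi", "django", "flask"];
const HOSTS = ["aws lambda", "vercel", "heroku", "kubernetes"];

// Extract what can be recognized from the environment string; unknown parts stay null.
function analyzeEnvironment(env: string): EnvironmentInfo {
  const lower = env.toLowerCase();
  let language: string | null = null;
  let runtimeVersion: string | null = null;
  let testFramework: string | null = null;
  for (const runtime of RUNTIMES) {
    const match = runtime.pattern.exec(env);
    if (match) {
      language = runtime.language;
      runtimeVersion = match[1] ?? null; // version directly after the runtime name, if any
      testFramework = runtime.testFramework;
      break;
    }
  }
  const framework = FRAMEWORKS.find((name) => lower.includes(name)) ?? null;
  const hosting = HOSTS.find((name) => lower.includes(name)) ?? null;
  return { language, runtimeVersion, framework, hosting, testFramework };
}
```

Returning `null` for anything unrecognized, rather than guessing, lets the Reproduction Generator fall back to generic steps when the environment cannot be identified.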

- The plan above is a reference layout. You are free to extend or restructure the agent stack as needed.
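
Steps 4 and 6 in the table form a validate-and-retry loop: generate a script, syntax-check it, and feed any error back for at most two regenerations. A minimal sketch of that loop follows. `GenerateScript` is a hypothetical stand-in for the model call, and the validator uses the JavaScript `Function` constructor only as a dependency-free placeholder: it parses plain scripts without running them, but it rejects ES module `import`/`export` and TypeScript-only syntax, so a real validator would likely swap in a proper parser.

```typescript
interface ValidationResult {
  ok: boolean;
  error?: string;
}

// Step 4 (sketch): syntax-check plain JavaScript without executing it.
// The Function constructor parses its body immediately and throws SyntaxError on invalid code.
function validateJsSyntax(source: string): ValidationResult {
  try {
    new Function(source);
    return { ok: true };
  } catch (err) {
    return { ok: false, error: err instanceof Error ? err.message : String(err) };
  }
}

// Hypothetical stand-in for the model call in steps 3 and 6. It receives the
// previous validation error (null on the first attempt) and returns a new script.
type GenerateScript = (previousError: string | null) => Promise<string>;

const MAX_RETRIES = 2; // from step 6 of the plan

interface RefinementOutcome {
  script: string;
  attempts: number;
  valid: boolean;
}

async function generateWithRefinement(
  generate: GenerateScript,
  validate: (source: string) => ValidationResult = validateJsSyntax,
): Promise<RefinementOutcome> {
  let previousError: string | null = null;
  let script = "";
  // One initial attempt plus up to MAX_RETRIES regenerations.
  for (let attempt = 1; attempt <= MAX_RETRIES + 1; attempt++) {
    script = await generate(previousError);
    const result = validate(script);
    if (result.ok) return { script, attempts: attempt, valid: true };
    previousError = result.error ?? "unknown syntax error";
  }
  // Retries exhausted: return the last script, flagged as invalid.
  return { script, attempts: MAX_RETRIES + 1, valid: false };
}
```

Returning the last script with `valid: false` instead of throwing keeps the failure visible to the caller, which can then decide whether to show the script with a warning or drop it.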

**Model suggestions to start with:**
- Step 1: Try `deepseek-v3.2` or `llama-4-maverick-17b` for classification (lightweight task).
- Steps 3 and 6: Try `qwen-3-coder-30b` for code generation. Also try `kimi-k2.6` (agentic coding, can generate environment-aware scripts).
- Step 5: Try `deepseek-r1-0528` for root cause reasoning. Also try `kimi-k2-thinking` for deep analysis.

**Model Selection Guidance**
- **You are free to pick any model from the Oxlo catalog** based on your own testing and evaluation.
- The model suggestions above are starting points, not mandates. Try them first, and if they do not meet the accuracy target, experiment with alternatives.
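
Because the suggestions are starting points, it may help to keep the step-to-model mapping in one place so candidates can be swapped during evaluation. The sketch below assumes a simple lookup with per-run overrides; the step names and the `pickModel` helper are hypothetical, while the model identifiers are the ones listed above.

```typescript
// Only the LLM-backed steps with a suggestion appear here; step 4 is programmatic.
type PipelineStep = "classifier" | "reproductionGenerator" | "refinement" | "rootCauseRanker";

// First entry is the default; later entries are alternatives to evaluate.
const MODEL_CANDIDATES: Record<PipelineStep, [string, ...string[]]> = {
  classifier: ["deepseek-v3.2", "llama-4-maverick-17b"],
  reproductionGenerator: ["qwen-3-coder-30b", "kimi-k2.6"],
  refinement: ["qwen-3-coder-30b", "kimi-k2.6"],
  rootCauseRanker: ["deepseek-r1-0528", "kimi-k2-thinking"],
};

// Resolve the model for a step, letting an evaluation run override the default.
function pickModel(step: PipelineStep, overrides: Partial<Record<PipelineStep, string>> = {}): string {
  return overrides[step] ?? MODEL_CANDIDATES[step][0];
}
```

For example, `pickModel("rootCauseRanker", { rootCauseRanker: "kimi-k2-thinking" })` selects the alternative for a single run without editing the table.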

**Compare against:** GPT 5.3 Thinking & Claude Sonnet 4.6 Thinking.

**Acceptance criteria:**
- Generated test scripts pass syntax validation in 90%+ of cases.
- cURL commands are syntactically valid and use the correct HTTP method and headers for the described scenario.
- Environment-specific details (import paths, framework syntax) match the declared environment.
- Root cause ranking references specific evidence from the error output, not generic guesses.
- Overall quality matches or exceeds GPT 5.3 Thinking & Claude Sonnet 4.6 Thinking on test cases.
- Overall accuracy at 80%+.
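
Two of the criteria above, cURL validity and evidence-backed root causes, can be checked programmatically. The sketch below shows one way to do that, under stated assumptions: `tokenize`, `checkCurl`, and `isGrounded` are hypothetical helpers, the tokenizer handles quotes but not backslash escapes, and only the `-X`/`--request`, `-H`/`--header`, and `-d`/`--data` options of curl are recognized.

```typescript
// Split a command line into tokens, honoring single and double quotes.
// Returns null on an unterminated quote. Backslash escapes are not handled.
function tokenize(command: string): string[] | null {
  const text = command.replace(/\\\r?\n/g, " ").trim(); // join backslash-continued lines
  const tokens: string[] = [];
  let current = "";
  let inToken = false;
  let quote: string | null = null;
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (quote !== null) {
      if (ch === quote) quote = null;
      else current += ch;
    } else if (ch === "'" || ch === '"') {
      quote = ch;
      inToken = true;
    } else if (/\s/.test(ch)) {
      if (inToken) tokens.push(current);
      current = "";
      inToken = false;
    } else {
      current += ch;
      inToken = true;
    }
  }
  if (quote !== null) return null;
  if (inToken) tokens.push(current);
  return tokens;
}

interface CurlCheck {
  problems: string[];
  method: string | null;
  url: string | null;
  headers: string[];
}

// Check that a command is a curl call with the expected method and required headers.
function checkCurl(command: string, expectedMethod: string, requiredHeaders: string[] = []): CurlCheck {
  const tokens = tokenize(command);
  if (tokens === null) return { problems: ["unterminated quote"], method: null, url: null, headers: [] };
  const problems: string[] = [];
  if (tokens[0] !== "curl") problems.push("command does not start with curl");
  let explicitMethod: string | null = null;
  let url: string | null = null;
  let hasBody = false;
  const headers: string[] = [];
  for (let i = 1; i < tokens.length; i++) {
    const token = tokens[i] ?? "";
    const next = tokens[i + 1];
    if (token === "-X" || token === "--request") { explicitMethod = next ?? null; i++; }
    else if (token === "-H" || token === "--header") { if (next !== undefined) headers.push(next); i++; }
    else if (token === "-d" || token === "--data") { hasBody = true; i++; }
    else if (/^https?:\/\//i.test(token)) url = token;
  }
  // curl sends GET by default and switches to POST when -d/--data is present.
  const method = (explicitMethod ?? (hasBody ? "POST" : "GET")).toUpperCase();
  if (url === null) problems.push("no http(s) URL found");
  if (method !== expectedMethod.toUpperCase()) problems.push(`expected ${expectedMethod}, got ${method}`);
  const names = headers.map((h) => (h.split(":")[0] ?? "").trim().toLowerCase());
  for (const required of requiredHeaders) {
    if (names.indexOf(required.toLowerCase()) === -1) problems.push(`missing header ${required}`);
  }
  return { problems, method, url, headers };
}

// A root cause counts as grounded only if its quoted evidence appears verbatim in the error output.
function isGrounded(evidenceQuote: string, errorOutput: string): boolean {
  const quote = evidenceQuote.trim();
  return quote.length > 0 && errorOutput.indexOf(quote) !== -1;
}
```

An empty `problems` array means the command passed every check. Requiring a verbatim quote in `isGrounded` is deliberately strict: it will reject paraphrased evidence, which may be acceptable if the ranker is prompted to quote the error output directly.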
