[prompt-clustering] Copilot Agent Prompt Clustering Analysis – 2026-02-23 #17915
Closed
Replies: 2 comments
🤖 Beep boop! The smoke test agent was here! 👋 Just dropping by to say that automated testing is alive and well. The Copilot smoke test agent swung through, kicked the tires, and everything looks shipshape! 🚀
This discussion was automatically closed because it expired on 2026-02-24T12:51:32.940Z.
Daily NLP-based clustering analysis of Copilot agent task prompts from the last 30 days (1,697 PRs analysed).
Summary
Cluster Analysis
Cluster Breakdown Table
C1 — review / md / ci (624 tasks, 72% merge rate)
The largest cluster — general implementation tasks touching CI workflows, markdown files, and code review scenarios. High volume and solid merge rate suggest well-understood scope.
C2 — discussion / aw / 2026 (356 tasks, 58% merge rate) ⚠️ lowest success
Tasks that reference analysis reports, discussions, and AI-generated content. Lowest merge rate at 58% and 18 open PRs. Many of these are auto-generated improvements derived from daily analysis workflows (quality reports, schema consistency findings, etc.), which may have higher rejection/closure rates due to lower specificity or conflicting changes.
C3 — safe / safe outputs / outputs (195 tasks, 79% merge rate) ✅ highest success
Safe-outputs feature work — the highest merge rate of any cluster. Despite being the most commit-intensive (avg 4.4), tasks are well-scoped and clearly defined, leading to excellent outcomes.
C4 — docs / custom / documentation (130 tasks, 75% merge rate)
Documentation and custom instruction tasks. Fewest files changed (avg 8.4) — small, focused PRs with good outcomes.
C5 — mcp / server / mcp server (102 tasks, 60% merge rate)
MCP server integration tasks. Higher file-change count (avg 22.8) and below-average merge rate suggest these tasks are more complex and prone to scope/compatibility issues.
C6 — debug / create / custom (92 tasks, 64% merge rate)
Debugging and creation tasks with custom AI configurations. Mid-range outcomes with moderate complexity.
C7 — version / v0 / release (90 tasks, 67% merge rate)
Version bump and release tasks. Highest average files changed (76.3) due to package lock file regeneration and bulk dependency updates.
C8 — failing / implement / id (76 tasks, 76% merge rate)
Bug fix tasks initiated from CI failures or root-cause analysis. High merge rate and low file count (avg 11.2) suggest targeted, precise fixes.
C9 — triggering command / block (32 tasks, 72% merge rate)
Smallest cluster — workflow trigger and command-blocking tasks. Focused, low-complexity work with good outcomes.
Key Findings
Dominant task type — C1 ("review / md / ci") accounts for 37% of all tasks (624/1,697). General implementation work touching CI and markdown infrastructure is by far the most common category.
Success rate spread — C3 ("safe / safe outputs / outputs") achieves 79% merge rate vs C2 ("discussion / aw / 2026") at 58% — a 21-point gap. Analysis-driven tasks from automated workflows show the lowest success rate.
Task complexity by commits — C3 ("safe / safe outputs / outputs") requires the most commits on average (4.4), yet has the highest merge rate, suggesting that well-scoped complex work outperforms loosely-scoped simpler work.
Release tasks are file-heavy — C7 ("version / v0 / release") averages 76.3 files changed per PR — driven by package-lock regeneration. This inflates the file-change metric and should be interpreted separately.
In-progress work — 44 PRs are currently open; C2 has the most open PRs (18), further confirming that analysis-driven tasks have a longer completion cycle.
CI-failure bug fixes are efficient — C8 ("failing / implement / id") combines a high merge rate (76%) with the fewest commits (3.1 avg) and targeted scope (11.2 files avg), making it the most efficient cluster.
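The per-cluster statistics quoted in these findings (merge rate, open PR counts) are straightforward aggregations over PR records. A stdlib-only sketch, with hypothetical field names and sample data:

```python
# Sketch of per-cluster merge-rate aggregation. The "cluster" and "state"
# fields and the sample records are assumptions for illustration.
from collections import defaultdict

prs = [
    {"cluster": "C1", "state": "merged"},
    {"cluster": "C1", "state": "closed"},
    {"cluster": "C2", "state": "merged"},
    {"cluster": "C2", "state": "open"},
]

totals = defaultdict(int)
merged = defaultdict(int)
open_prs = defaultdict(int)
for pr in prs:
    totals[pr["cluster"]] += 1
    if pr["state"] == "merged":
        merged[pr["cluster"]] += 1
    elif pr["state"] == "open":
        open_prs[pr["cluster"]] += 1

for c in sorted(totals):
    rate = 100 * merged[c] / totals[c]
    print(f"{c}: {totals[c]} tasks, {rate:.0f}% merge rate, {open_prs[c]} open")
```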
Recommendations
Study C3 as a prompt template: "safe / safe outputs / outputs" tasks achieve the highest merge rate (79%) despite high complexity. Examine what makes those prompts effective — likely: clear acceptance criteria, specific file references, and well-bounded scope.
Improve C2 prompt specificity: Analysis-driven tasks (C2, 58% merge rate) should include more precise acceptance criteria and explicit test/validation steps to reduce rejections. Consider adding structured "Definition of Done" sections to auto-generated prompts.
Address MCP server task friction (C5, 60%): The below-average merge rate for MCP tasks combined with high file counts (22.8) suggests scope creep. Add explicit constraints on which MCP servers/files should be modified.
Separate release automation (C7): Package dependency updates generate extremely high file counts (76 avg) that obscure metrics. Consider tagging these PRs for separate tracking.
Add task-type labels at creation: Tagging PRs with a category at creation time would make future clustering more accurate and enable trend tracking over time.
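The last recommendation could start from something as simple as keyword matching on the prompt text at PR creation time. A hedged sketch; the category names and keywords below mirror the cluster labels above but are illustrative, not the analysis pipeline's actual rules:

```python
# Toy task-type labeler: first matching keyword set wins, falling back to
# "general". Categories and keywords are hypothetical examples.
CATEGORY_KEYWORDS = {
    "release": ("version", "bump", "release"),
    "bug-fix": ("failing", "fix", "root cause"),
    "docs": ("documentation", "docs", "readme"),
    "mcp": ("mcp server", "mcp"),
}

def classify(prompt: str) -> str:
    text = prompt.lower()
    for label, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return label
    return "general"

print(classify("Bump version to v0.10 and cut a release"))  # release
print(classify("Fix failing CI job"))                       # bug-fix
```

Even a coarse labeler like this would give future clustering runs a ground-truth signal to validate against.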
Representative PR Samples (3 per cluster)
- activation-comments to disable activation
- imports: dependencies locally
- base64 executable not found on Windows
- MCPServerID semantic type for MCP server ID