[prompt-clustering] Copilot Agent Prompt Clustering Analysis – 2026-02-23 #17915
Closed
Replies: 2 comments
🤖 Beep boop! The smoke test agent was here! 👋 Just dropping by to say that automated testing is alive and well. The Copilot smoke test agent swung through, kicked the tires, and everything looks shipshape! 🚀
This discussion was automatically closed because it expired on 2026-02-24T12:51:32.940Z.
Daily NLP-based clustering analysis of Copilot agent task prompts from the last 30 days (1,697 PRs analysed).
Summary
Cluster Analysis
Cluster Breakdown Table
C1 — review / md / ci (624 tasks, 72% merge rate)
The largest cluster — general implementation tasks touching CI workflows, markdown files, and code review scenarios. High volume and solid merge rate suggest well-understood scope.
C2 — discussion / aw / 2026 (356 tasks, 58% merge rate) ⚠️ lowest success
Tasks that reference analysis reports, discussions, and AI-generated content. Lowest merge rate at 58% and 18 open PRs. Many of these are auto-generated improvements derived from daily analysis workflows (quality reports, schema consistency findings, etc.), which may have higher rejection/closure rates due to lower specificity or conflicting changes.
C3 — safe / safe outputs / outputs (195 tasks, 79% merge rate) ✅ highest success
Safe-outputs feature work — the highest merge rate of any cluster. Despite being the most commit-intensive (avg 4.4), tasks are well-scoped and clearly defined, leading to excellent outcomes.
C4 — docs / custom / documentation (130 tasks, 75% merge rate)
Documentation and custom instruction tasks. Fewest files changed (avg 8.4) — small, focused PRs with good outcomes.
C5 — mcp / server / mcp server (102 tasks, 60% merge rate)
MCP server integration tasks. Higher file-change count (avg 22.8) and below-average merge rate suggest these tasks are more complex and prone to scope/compatibility issues.
C6 — debug / create / custom (92 tasks, 64% merge rate)
Debugging and creation tasks with custom AI configurations. Mid-range outcomes with moderate complexity.
C7 — version / v0 / release (90 tasks, 67% merge rate)
Version bump and release tasks. Highest average files changed (76.3) due to package lock file regeneration and bulk dependency updates.
C8 — failing / implement / id (76 tasks, 76% merge rate)
Bug fix tasks initiated from CI failures or root-cause analysis. High merge rate and low file count (avg 11.2) suggest targeted, precise fixes.
C9 — triggering command / block (32 tasks, 72% merge rate)
Smallest cluster — workflow trigger and command-blocking tasks. Focused, low-complexity work with good outcomes.
Key Findings
Dominant task type — C1 ("review / md / ci") accounts for 37% of all tasks (624/1,697). General implementation work touching CI and markdown infrastructure is by far the most common category.
Success rate spread — C3 ("safe / safe outputs / outputs") achieves 79% merge rate vs C2 ("discussion / aw / 2026") at 58% — a 21-point gap. Analysis-driven tasks from automated workflows show the lowest success rate.
Task complexity by commits — C3 ("safe / safe outputs / outputs") requires the most commits on average (4.4), yet has the highest merge rate, suggesting that well-scoped complex work outperforms loosely-scoped simpler work.
Release tasks are file-heavy — C7 ("version / v0 / release") averages 76.3 files changed per PR — driven by package-lock regeneration. This inflates the file-change metric and should be interpreted separately.
In-progress work — 44 PRs are currently open; C2 has the most open PRs (18), further confirming that analysis-driven tasks have a longer completion cycle.
CI-failure bug fixes are efficient — C8 ("failing / implement / id") combines a high merge rate (76%) with the fewest commits (3.1 avg) and targeted scope (11.2 files avg), making it the most efficient cluster.
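The per-cluster statistics quoted in these findings (merge rate, open PR counts) are straightforward aggregations over PR records. A stdlib-only sketch, with hypothetical field names and sample data:

```python
# Sketch of per-cluster merge-rate aggregation. The "cluster" and "state"
# fields and the sample records are assumptions for illustration.
from collections import defaultdict

prs = [
    {"cluster": "C1", "state": "merged"},
    {"cluster": "C1", "state": "closed"},
    {"cluster": "C2", "state": "merged"},
    {"cluster": "C2", "state": "open"},
]

totals = defaultdict(int)
merged = defaultdict(int)
open_prs = defaultdict(int)
for pr in prs:
    totals[pr["cluster"]] += 1
    if pr["state"] == "merged":
        merged[pr["cluster"]] += 1
    elif pr["state"] == "open":
        open_prs[pr["cluster"]] += 1

for c in sorted(totals):
    rate = 100 * merged[c] / totals[c]
    print(f"{c}: {totals[c]} tasks, {rate:.0f}% merge rate, {open_prs[c]} open")
```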
Recommendations
Study C3 as a prompt template: "safe / safe outputs / outputs" tasks achieve the highest merge rate (79%) despite high complexity. Examine what makes those prompts effective — likely: clear acceptance criteria, specific file references, and well-bounded scope.
Improve C2 prompt specificity: Analysis-driven tasks (C2, 58% merge rate) should include more precise acceptance criteria and explicit test/validation steps to reduce rejections. Consider adding structured "Definition of Done" sections to auto-generated prompts.
Address MCP server task friction (C5, 60%): The below-average merge rate for MCP tasks combined with high file counts (22.8) suggests scope creep. Add explicit constraints on which MCP servers/files should be modified.
Separate release automation (C7): Package dependency updates generate extremely high file counts (76 avg) that obscure metrics. Consider tagging these PRs for separate tracking.
Add task-type labels at creation: Tagging PRs with a category at creation time would make future clustering more accurate and enable trend tracking over time.
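The last recommendation could start from something as simple as keyword matching on the prompt text at PR creation time. A hedged sketch; the category names and keywords below mirror the cluster labels above but are illustrative, not the analysis pipeline's actual rules:

```python
# Toy task-type labeler: first matching keyword set wins, falling back to
# "general". Categories and keywords are hypothetical examples.
CATEGORY_KEYWORDS = {
    "release": ("version", "bump", "release"),
    "bug-fix": ("failing", "fix", "root cause"),
    "docs": ("documentation", "docs", "readme"),
    "mcp": ("mcp server", "mcp"),
}

def classify(prompt: str) -> str:
    text = prompt.lower()
    for label, keywords in CATEGORY_KEYWORDS.items():
        if any(k in text for k in keywords):
            return label
    return "general"

print(classify("Bump version to v0.10 and cut a release"))  # release
print(classify("Fix failing CI job"))                       # bug-fix
```

Even a coarse labeler like this would give future clustering runs a ground-truth signal to validate against.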
Representative PR Samples (3 per cluster)
- activation-comments to disable activation
- imports: dependencies locally
- base64 executable not found on Windows
- MCPServerID semantic type for MCP server ID