Added instructions on working with teamcity.toml and the link command to the agent skill #352
Pull request overview
This PR updates the TeamCity CLI agent skill documentation to formalize how agents should use teamcity.toml and the teamcity link command, and extends the eval harness to enforce the new repository-binding requirements (including adding support for pre-seeded workspace files in eval runs).
Changes:
- Added "Mandatory rules" to the TeamCity CLI skill, emphasizing `teamcity.toml` binding checks and when to use (or not use) `teamcity link`.
- Expanded the command/workflow references with a new `teamcity link` section and repository-binding workflow examples.
- Updated eval scaffolding and checks to validate binding behaviors, and added `setup_files` support to seed files (e.g., `teamcity.toml`) into eval workspaces.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| skills/teamcity-cli/SKILL.md | Adds mandatory agent rules for teamcity.toml + teamcity link usage. |
| skills/teamcity-cli/references/workflows.md | Adds a repository-binding workflow section. |
| skills/teamcity-cli/references/commands.md | Adds teamcity link to the command reference (docs). |
| evals/tests/test_tasks.py | Wires setup_files from task config into the runner. |
| evals/tasks.json | Updates task checklists and adds a new task that seeds teamcity.toml. |
| evals/scaffold/tasks.py | Extends TaskConfig to include setup_files. |
| evals/scaffold/claude.py | Implements workspace file seeding + env-var templating for evals. |
| evals/checks.py | Adds repository-binding checks and registers them; updates valid subcommand list. |
- `--project <id>` - Specifies the project for the binding
- `--job <id>` - Specifies the job for the binding
- `--jobs <id1,id2>` - Specifies jobs of interest stored separately from the main binding job
- `--server <url>` - When multiple servers are authenticated, can be used to specify the server for which the binding is upserted
- `--scope <path>` - Upserts a binding for a specific directory. If the value is an empty string, upserts the binding for the repository root.
I intentionally omitted the auto mode, as we plan to have a standardised guide for both the CLI and the MCP for project/job lookup, and auto would become redundant and make the instructions more complicated.
Viktor (@tiulpin), Viktoria Petrenko (@vbedrosova), what do you think: should we have the agents use the auto mode? Or should we document the auto mode right now and remove the mention once we add the lookup guide to the CLI?
    def added_repository_link_with_project_only(runner: EvalRunner) -> None:
        for cmd in runner.commands:
            c = cmd.lower()
            if "teamcity link" in c and ("--project " in c or " -p " in c) and ("--job " not in c and " -j " not in c):
                runner.passed("Linked the repository using only the project argument")
                return
        runner.failed("Did not link the repository using only the project argument")
This is fine. The check validates that no job id is passed, as there is none to pass in the corresponding evals. Setting jobs of interest is acceptable and might even be desirable.
The check name may be confusing, though; we could rename it to `added_repository_link_with_project_and_without_job`.
    def used_project_from_repository_link(runner: EvalRunner) -> None:
        missing_linked_project = "No project found by name or internal/external id 'Project_uipBGpvQua'"
        for result in runner.events.tool_results.values():
            content = result.get("content", "")
            if isinstance(content, str) and missing_linked_project in content:
                runner.passed("Used project from teamcity.toml")
                return
            if isinstance(content, list):
                for block in content:
                    if isinstance(block, dict) and missing_linked_project in block.get("text", ""):
                        runner.passed("Used project from teamcity.toml")
                        return
        runner.failed("Did not use project from teamcity.toml")
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: ab43a827ad
ℹ️ About Codex in GitHub
Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "Codex (@codex) review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".
    def added_repository_link_with_project_only(runner: EvalRunner) -> None:
        for cmd in runner.commands:
            c = cmd.lower()
            if "teamcity link" in c and ("--project " in c or " -p " in c) and ("--job " not in c and " -j " not in c):
Parse equals-form link flags in the project-only check
This check can both miss valid project-only links and falsely pass job-specific links when agents use the standard Cobra/pflag equals syntax. For example, `teamcity link --project=JBR --no-input` fails because there is no `"--project "`, while `teamcity link --project JBR --job=Foo` passes because there is no `"--job "`. That corrupts the new evals that rely on this check to distinguish project-wide bindings from job-specific bindings; parse tokens or match `--project=`/`--job=` and `-p=`/`-j=` as well.
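A minimal sketch of the token-based approach suggested above, assuming commands arrive as plain shell strings. The helper names `link_flags` and `is_project_only_link` are hypothetical and are not part of `evals/checks.py`; only the flag names `--project`/`-p` and `--job`/`-j` come from the existing check.

```python
import shlex

# Tokens that end the `teamcity link` invocation when commands are chained.
# Only whitespace-separated operators are recognized in this sketch.
SHELL_SEPARATORS = {"&&", "||", ";", "|"}


def link_flags(cmd: str) -> set[str]:
    """Return the flag names passed to the first `teamcity link` call in cmd."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:
        return set()  # unbalanced quotes: treat the command as unparseable
    flags: set[str] = set()
    for i in range(len(tokens) - 1):
        if tokens[i] == "teamcity" and tokens[i + 1] == "link":
            for tok in tokens[i + 2:]:
                if tok in SHELL_SEPARATORS:
                    break
                if tok.startswith("-"):
                    # "--project=JBR" and "--project" both normalize to "--project".
                    flags.add(tok.split("=", 1)[0])
            break
    return flags


def is_project_only_link(cmd: str) -> bool:
    """True when a link call passes a project flag and no job flag."""
    flags = link_flags(cmd)
    has_project = bool(flags & {"--project", "-p"})
    has_job = bool(flags & {"--job", "-j"})
    return has_project and not has_job
```

With this shape, `teamcity link --project=JBR --no-input` counts as project-only and `teamcity link --project JBR --job=Foo` does not. Concatenated short forms such as `-pJBR` are not handled here and would need an extra rule.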
… to the agent skill
ab43a82 to
a57b459
💡 Codex Review
Reviewed commit: a57b4596a0
    def added_repository_link(runner: EvalRunner) -> None:
        if runner.has_command("teamcity", "link"):
            runner.passed("Linked the repository to a TeamCity job / project")
Require an actual successful repository link
This passes as soon as any Bash command contains `teamcity` and `link`, so an eval run that only does `teamcity link --help`, `teamcity link` with no flags, or another failed/usage-only invocation is graded as having created the binding. Because the new tasks use this check to validate that `teamcity.toml` was completed, this can falsely pass runs that never write or update the binding; the check should verify that a non-help/non-validation link invocation succeeded or that `teamcity.toml` was created/updated by the command.
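One way the stricter check could be approached, as a rough sketch only: filter out help-only and flagless invocations, then compare a digest of `teamcity.toml` taken before and after the run. The helpers `is_binding_attempt`, `file_digest`, and `binding_was_written` are hypothetical, and treating `-h` as a help alias is an assumption based on Cobra's defaults rather than something verified against the binary.

```python
import hashlib
import shlex
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> Optional[str]:
    """Return a SHA-256 hex digest of the file, or None if it does not exist."""
    if not path.is_file():
        return None
    return hashlib.sha256(path.read_bytes()).hexdigest()


def is_binding_attempt(cmd: str) -> bool:
    """True for a `teamcity link` call that passes arguments and no help flag."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:
        return False
    for i in range(len(tokens) - 1):
        if tokens[i] == "teamcity" and tokens[i + 1] == "link":
            args = tokens[i + 2:]
            if not args:
                return False  # bare `teamcity link` is treated as usage-only
            return not any(a in ("--help", "-h") for a in args)
    return False


def binding_was_written(before: Optional[str], workspace: Path) -> bool:
    """True when teamcity.toml exists after the run and differs from `before`."""
    after = file_digest(workspace / "teamcity.toml")
    return after is not None and after != before
```

A check built on these would record `file_digest(workspace / "teamcity.toml")` before the agent runs and pass only if some command satisfies `is_binding_attempt` and `binding_was_written` is true afterwards.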
Viktor (tiulpin) left a comment
Thanks! The link docs are good (verified flags against the binary; add --auto), and setup_files we need anyway. But can we maybe split this? TL;DR: a pull request should change the skill or the checks, never both.
It races #350 (unfortunately, merged only this morning, but I'm happy to help you with resolving conflicts): the allowlist is gone – cli_schema.json is generated from the cobra tree, link/run tree already covered.
Bigger issue: added_repository_link, mentioned_teamcity_toml, etc. on existing tasks assert behavior only SKILL.md mandates. CONTROL run can't know the rule and doesn't need it for the task, so baseline can't pass; lift inflates without the skill getting better.
Our invented rule for this: a scored check must be passable by someone who never read SKILL.md; skill-prescribed behavior becomes an unscored tag. used_project_from_repository_link keys on an exact error string the CLI emits automatically in both arms – measures the binary, not the agent.
Here's my split suggestion:
- setup_files, rebased, + .. guard + unit test. Measurement-neutral, merges now.
- the docs; SKILL.md cut to: respect existing teamcity.toml, never hand-edit, link only when asked. Mandatory link on read-only tasks = agents writing into user repos unprompted. CI measures the new text automatically.
- one dedicated use-repository-link task: real project binding, underspecified prompt, score whether the bound scope was used. CONTROL can discover the file, so it's fair. Re-baselines main.
P.S. Maybe not all checks follow this rule at the moment, but we need to update old checks if they don't and some work on that is in progress.
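For the `..` guard on `setup_files` mentioned in the first item of the split, something along these lines might work. This is a sketch under assumptions: it treats `setup_files` as a mapping of relative path to file content, which may not match the actual `TaskConfig` shape, and `seed_files` is a hypothetical name rather than the function in `evals/scaffold/claude.py`.

```python
from pathlib import Path


def seed_files(workspace: Path, setup_files: dict[str, str]) -> list[Path]:
    """Write each relative path -> content pair under workspace.

    Raises ValueError for any path that would land outside the workspace,
    which covers both ".." segments and absolute paths.
    """
    root = workspace.resolve()
    written: list[Path] = []
    for rel, content in setup_files.items():
        target = (root / rel).resolve()
        if not target.is_relative_to(root):  # Path.is_relative_to needs Python 3.9+
            raise ValueError(f"setup file escapes workspace: {rel!r}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(target)
    return written
```

Resolving before comparing means a path like `sub/../../evil.txt` is rejected as well, and the unit test can assert on the `ValueError` without touching anything outside a temporary directory.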
Thought about this more... Before we invest more in skill rules for linking: the original problem statement from the issue was already solved. `teamcity run list --revision @head` finds the build for the agent's exact commit, and the agent skill already documents this pattern (workflows reference, "wait for my commit" flow). What link adds on top is convenience: a persisted choice of which job is "the" job for the repo, default scoping, monorepo paths. Useful – but the ticket's premise ("agent struggles to find the exact build") isn't blocked on it. Proposal: measure before building. We have an A/B eval harness for the CLI skill. Happy to help with that this week – it's one task definition in the existing pipeline.
💡 Codex Review
Reviewed commit: 244c9516ec
    def used_project_from_repository_link(runner: EvalRunner) -> None:
        linked_project = "No project found by name or internal/external id 'Project_uipBGpvQua'"
Accept the linked job as repository-link usage
In the new `use-repository-link` eval, `teamcity.toml` is seeded with both `project = "Project_uipBGpvQua"` and `job = "FooJob"`, and `run list` resolves the linked default job when no explicit filter is supplied. An agent that correctly honors the more specific linked job can therefore query `FooJob` and receive a job/build-type error instead of this hard-coded project-not-found text, causing a false failure even though it used the repository binding. Consider accepting either the linked project or linked job marker for this task.
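Accepting either marker could look roughly like the sketch below, which flattens each tool result to text and matches any of several markers. The project marker is the string from the existing check; the job marker is a placeholder, since the CLI's actual error text for a missing job does not appear in this thread. `result_text` and `used_repository_binding` are hypothetical helper names.

```python
from typing import Any, Iterable

PROJECT_MARKER = "No project found by name or internal/external id 'Project_uipBGpvQua'"
JOB_MARKER = "FooJob"  # placeholder: the real job-not-found message is an assumption


def result_text(content: Any) -> str:
    """Flatten a tool result's content (str or list of text blocks) into one string."""
    if isinstance(content, str):
        return content
    if isinstance(content, list):
        return "\n".join(
            str(block.get("text", "")) for block in content if isinstance(block, dict)
        )
    return ""


def used_repository_binding(results: Iterable[dict], markers: Iterable[str]) -> bool:
    """True when any tool result mentions at least one of the binding markers."""
    markers = tuple(markers)
    return any(
        marker in result_text(result.get("content", ""))
        for result in results
        for marker in markers
    )
```

Matching on the bare job id is looser than matching a full error message, so the placeholder would ideally be replaced with the exact text the binary emits once that is confirmed.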
Summary
Instructions on working with `teamcity.toml` and the `link` command are added to the agent skill. Evals are updated and expanded to check the new requirements.
Changes
- `SKILL.md`, `commands.md`, `workflows.md` – instructions on working with `teamcity.toml` and the `link` command.
- `evals/tasks.json`, `evals/checks.py` – updated evals, added one new eval to check if the repository binding is being used.
- `evals/` – changes that make it possible to set up evals with files.
Design Decisions
Added a "Mandatory rules" section at the top of `SKILL.md` to force agents to adhere to the new rule.
Example
N/A - not user-visible.
Test Plan
- (`just unit`)
- (`just lint`)
- (`just acceptance`)
- `.txtar` test in `acceptance/testdata/`. N/A - no applicable changes.
- `--json` support. N/A - no applicable changes.
- `--json` output: no field removals/renames (additive only). N/A - no applicable changes.
- `docs/`, `skills/`, and `README.md`. N/A - no applicable changes.
- `status:finalized` issue (or trivial/docs/deps change). N/A - no applicable changes.