Added instructions on working with teamcity.toml and the link command to the agent skill #352

Open
Boris Yakhno (boris-yakhno) wants to merge 2 commits into JetBrains:main from boris-yakhno:reporitory-binding-skill-update

Conversation

@boris-yakhno

Contributor

Summary

Instructions on working with teamcity.toml and the link command are added to the agent skill. Evals are updated and expanded to check the new requirements.

Changes

  • SKILL.md, commands.md, workflows.md – instructions on working with teamcity.toml and the link command.
  • evals/task.json, evals/checks.py – updated evals, added one new eval to check if the repository binding is being used.
  • Other files under evals/ – changes that make it possible to set up evals with pre-seeded workspace files.

Design Decisions

Added a "Mandatory rules" section at the top of SKILL.md to force agents to adhere to the new rule.

Example

N/A — not user-visible.

Test Plan

  • Unit tests pass (just unit)
  • Linter passes (just lint)
  • Acceptance tests pass (just acceptance)
  • If adding a new command/flag: added .txtar test in acceptance/testdata/. N/A — no applicable changes.
  • If adding a data-producing command: includes --json support. N/A — no applicable changes.
  • If modifying --json output: no field removals/renames (additive only). N/A — no applicable changes.
  • If changing docs-visible behavior: updated docs/, skills/, and README.md. N/A — no applicable changes.
  • External contributors: links a status:finalized issue (or trivial/docs/deps change). N/A — no applicable changes.

Copilot AI left a comment

Contributor

Pull request overview

This PR updates the TeamCity CLI agent skill documentation to formalize how agents should use teamcity.toml and the teamcity link command, and extends the eval harness to enforce the new repository-binding requirements (including adding support for pre-seeded workspace files in eval runs).

Changes:

  • Added “Mandatory rules” to the TeamCity CLI skill, emphasizing teamcity.toml binding checks and when to use (or not use) teamcity link.
  • Expanded the command/workflow references with a new teamcity link section and repository-binding workflow examples.
  • Updated eval scaffolding and checks to validate binding behaviors, and added setup_files support to seed files (e.g., teamcity.toml) into eval workspaces.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 6 comments.

File | Description
skills/teamcity-cli/SKILL.md | Adds mandatory agent rules for teamcity.toml + teamcity link usage.
skills/teamcity-cli/references/workflows.md | Adds a repository-binding workflow section.
skills/teamcity-cli/references/commands.md | Adds teamcity link to the command reference (docs).
evals/tests/test_tasks.py | Wires setup_files from task config into the runner.
evals/tasks.json | Updates task checklists and adds a new task that seeds teamcity.toml.
evals/scaffold/tasks.py | Extends TaskConfig to include setup_files.
evals/scaffold/claude.py | Implements workspace file seeding + env-var templating for evals.
evals/checks.py | Adds repository-binding checks and registers them; updates valid subcommand list.

Comment thread evals/scaffold/claude.py
Comment on lines +549 to +553
- `--project <id>` - Specifies the project for the binding
- `--job <id>` - Specifies the job for the binding
- `--jobs <id1,id2>` - Specifies jobs of interest stored separately from the main binding job
- `--server <url>` - When multiple servers are authenticated, can be used to specify the server for which the binding is upserted
- `--scope <path>` - Upserts a binding for a specific directory. If the value is an empty string, upserts the binding for the repository root.

Contributor Author

I intentionally omitted the auto mode, as we plan to have a standardised guide for both the CLI and the MCP for project/job lookup, and auto would become redundant and make the instructions more complicated.

Viktor (@tiulpin) Viktoria Petrenko (@vbedrosova) What do you think, should we have the agents use the auto mode? Or maybe we should document the auto mode right now, but remove the mention once we add the lookup guide to the CLI?

Comment thread evals/checks.py
Comment on lines +330 to +336
def added_repository_link_with_project_only(runner: EvalRunner) -> None:
    for cmd in runner.commands:
        c = cmd.lower()
        if "teamcity link" in c and ("--project " in c or " -p " in c) and ("--job " not in c and " -j " not in c):
            runner.passed("Linked the repository using only the project argument")
            return
    runner.failed("Did not link the repository using only the project argument")

Contributor Author

This is fine. The check validates that no job id is passed, as there is none to pass in the corresponding evals. Setting jobs of interest is acceptable and might even be desirable.
The check name may be confusing, though; we could rename it to added_repository_link_with_project_and_without_job.

Comment thread evals/checks.py
Comment on lines +384 to +396
def used_project_from_repository_link(runner: EvalRunner) -> None:
    missing_linked_project = "No project found by name or internal/external id 'Project_uipBGpvQua'"
    for result in runner.events.tool_results.values():
        content = result.get("content", "")
        if isinstance(content, str) and missing_linked_project in content:
            runner.passed("Used project from teamcity.toml")
            return
        if isinstance(content, list):
            for block in content:
                if isinstance(block, dict) and missing_linked_project in block.get("text", ""):
                    runner.passed("Used project from teamcity.toml")
                    return
    runner.failed("Did not use project from teamcity.toml")
Comment thread evals/tasks.json
Comment thread evals/tasks.json

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: ab43a827ad

ℹ️ About Codex in GitHub

Codex has been enabled to automatically review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "Codex (@codex) review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

When you sign up for Codex through ChatGPT, Codex can also answer questions or update the PR, like "Codex (@codex) address that feedback".

Comment thread evals/checks.py Outdated
def added_repository_link_with_project_only(runner: EvalRunner) -> None:
    for cmd in runner.commands:
        c = cmd.lower()
        if "teamcity link" in c and ("--project " in c or " -p " in c) and ("--job " not in c and " -j " not in c):

P2 Badge Parse equals-form link flags in the project-only check

This check can both miss valid project-only links and falsely pass job-specific links when agents use the standard Cobra/pflag equals syntax, e.g. teamcity link --project=JBR --no-input fails because there is no "--project ", while teamcity link --project JBR --job=Foo passes because there is no "--job ". That corrupts the new evals that rely on this check to distinguish project-wide bindings from job-specific bindings; parse tokens or match --project=/--job= and -p=/-j= as well.
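A minimal sketch of the token-based approach suggested here, under stated assumptions: the helper names `link_flag_names` and `is_project_only_link` are hypothetical and not part of evals/checks.py, and commands are assumed to be plain shell strings as in `runner.commands`. Flag names are compared after stripping any `=value` suffix, so the space and equals forms are treated alike.

```python
import shlex
from typing import Optional, Set


def link_flag_names(cmd: str) -> Optional[Set[str]]:
    """Return the flag names passed to `teamcity link`, or None if cmd has no such call."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:
        # Unbalanced quotes: treat as "not a link invocation" rather than crash.
        return None
    for i in range(len(tokens) - 1):
        if tokens[i:i + 2] == ["teamcity", "link"]:
            args = tokens[i + 2:]
            # "--project=JBR" and "--project JBR" both yield the name "--project".
            return {a.split("=", 1)[0] for a in args if a.startswith("-")}
    return None


def is_project_only_link(cmd: str) -> bool:
    """True when a project flag is present and no single-job flag is."""
    names = link_flag_names(cmd)
    if names is None:
        return False
    has_project = bool(names & {"--project", "-p"})
    has_job = bool(names & {"--job", "-j"})
    return has_project and not has_job
```

Because names are compared exactly, `--jobs` does not count as `--job`, which matches the intent that setting jobs of interest is acceptable. A value that itself starts with `-` would be misread as a flag; this sketch does not handle that case.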


@boris-yakhno Boris Yakhno (boris-yakhno) force-pushed the reporitory-binding-skill-update branch from ab43a82 to a57b459 on June 11, 2026 10:00

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: a57b4596a0

Comment thread evals/checks.py
Comment on lines +323 to +325
def added_repository_link(runner: EvalRunner) -> None:
    if runner.has_command("teamcity", "link"):
        runner.passed("Linked the repository to a TeamCity job / project")

P2 Badge Require an actual successful repository link

This passes as soon as any Bash command contains teamcity and link, so an eval run that only does teamcity link --help, teamcity link with no flags, or another failed/usage-only invocation is graded as having created the binding. Because the new tasks use this check to validate that teamcity.toml was completed, this can falsely pass runs that never write or update the binding; the check should verify a non-help/non-validation link invocation succeeded or that teamcity.toml was created/updated by the command.
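One possible shape for a stricter check, as a rough sketch only: it assumes the harness can pair each command string with its exit code, which the code shown in this thread does not confirm. The function name `is_successful_link` and the `exit_code` parameter are hypothetical. It rejects help-only calls, calls with no flags, and calls that exited non-zero.

```python
import shlex

# Cobra registers --help and -h by default.
HELP_FLAGS = {"--help", "-h"}


def is_successful_link(cmd: str, exit_code: int) -> bool:
    """True for a `teamcity link` call that had real flags and exited with 0."""
    try:
        tokens = shlex.split(cmd)
    except ValueError:
        return False
    for i in range(len(tokens) - 1):
        if tokens[i:i + 2] == ["teamcity", "link"]:
            args = tokens[i + 2:]
            break
    else:
        # No `teamcity link` pair found anywhere in the command.
        return False
    flags = {a.split("=", 1)[0] for a in args if a.startswith("-")}
    if not flags or flags & HELP_FLAGS:
        # Usage-only or help invocations never write a binding.
        return False
    return exit_code == 0
```

Comparing the contents of teamcity.toml before and after the run would be a stronger signal than the exit code; that variant is not sketched here.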


@tiulpin Viktor (tiulpin) left a comment

Member

Thanks! The link docs are good (verified flags against the binary; add --auto), and setup_files we need anyway. But can we maybe split this? TL;DR: a pull request should change the skill or the checks, never both.

It races #350 (unfortunately, merged only this morning, but I'm happy to help you with resolving conflicts): the allowlist is gone – cli_schema.json is generated from the cobra tree, link/run tree already covered.

Bigger issue: added_repository_link, mentioned_teamcity_toml, etc. on existing tasks assert behavior only SKILL.md mandates. CONTROL run can't know the rule and doesn't need it for the task, so baseline can't pass; lift inflates without the skill getting better.

Our invented rule for this: a scored check must be passable by someone who never read SKILL.md; skill-prescribed behavior becomes an unscored tag. used_project_from_repository_link keys on an exact error string the CLI emits automatically in both arms – measures the binary, not the agent.

Here's my split suggestion:

  1. setup_files, rebased, + .. guard + unit test. Measurement-neutral, merges now.
  2. the docs; SKILL.md cut to: respect existing teamcity.toml, never hand-edit, link only when asked. Mandatory link on read-only tasks = agents writing into user repos unprompted. CI measures the new text automatically.
  3. one dedicated use-repository-link task: real project binding, underspecified prompt, score whether the bound scope was used. CONTROL can discover the file, so it's fair. Re-baselines main.
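The `..` guard from item 1 could look roughly like the sketch below. It assumes `setup_files` is a mapping from workspace-relative path to file content; the actual shape in evals/scaffold is not shown in this thread, and `seed_files` is a hypothetical name. All paths are validated before anything is written, so a bad entry cannot leave a half-seeded workspace.

```python
from pathlib import Path


def seed_files(workspace: Path, setup_files: dict[str, str]) -> list[Path]:
    """Write setup files into workspace, refusing paths that escape it."""
    root = workspace.resolve()
    targets = []
    for rel, content in setup_files.items():
        target = (root / rel).resolve()
        # Rejects "../x", absolute paths, and anything else resolving outside root.
        # Path.is_relative_to requires Python 3.9+.
        if not target.is_relative_to(root):
            raise ValueError(f"setup file escapes workspace: {rel!r}")
        targets.append((target, content))
    written = []
    for target, content in targets:
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)
        written.append(target)
    return written
```

Resolving both sides before comparing also catches traversal hidden behind symlinks that already exist inside the workspace.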

P.S. Maybe not all checks follow this rule at the moment, but we need to update old checks if they don't and some work on that is in progress.

@tiulpin

Viktor (tiulpin) commented Jun 11, 2026

Member

Thought about this more...

Before we invest more in skill rules for linking: the original problem statement from the issue was already solved.

teamcity run list --revision @head

finds the build for the agent's exact commit, and the agent skill already documents this pattern (workflows reference, "wait for my commit" flow).

What link adds on top is convenience: a persisted choice of which job is "the" job for the repo, default scoping, monorepo paths. Useful – but the ticket's premise ("agent struggles to find the exact build") isn't blocked on it.

Proposal: measure before building. We have an A/B eval harness for the CLI skill.
One task that models this exact flow – repo workspace, "I pushed a fix, is CI green?" – tells us whether agents succeed via --revision @head without a binding, and whether a binding (or skill text about it) actually moves the success rate. If it does, we ship the skill rules with evidence; if it doesn't, this could re-scope to the setup-MVP flow (TW-99789) where the agent creates the pipeline and linking is natural.

Happy to help with that this week – it's one task definition in the existing pipeline.

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 244c9516ec

Comment thread evals/checks.py


def used_project_from_repository_link(runner: EvalRunner) -> None:
    linked_project = "No project found by name or internal/external id 'Project_uipBGpvQua'"

P2 Badge Accept the linked job as repository-link usage

In the new use-repository-link eval, teamcity.toml is seeded with both project = "Project_uipBGpvQua" and job = "FooJob", and run list resolves the linked default job when no explicit filter is supplied. An agent that correctly honors the more specific linked job can therefore query FooJob and receive a job/build-type error instead of this hard-coded project-not-found text, causing a false failure even though it used the repository binding. Consider accepting either the linked project or linked job marker for this task.
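A sketch of how the check could accept more than one marker, assuming tool-result content is either a string or a list of dict blocks with a "text" key, as the existing check implies. `mentions_any` is a hypothetical helper, and `JOB_MARKER` is a placeholder: the real job/build-type error text is not shown in this thread and would need to be copied from actual CLI output.

```python
from typing import Iterable

PROJECT_MARKER = "No project found by name or internal/external id 'Project_uipBGpvQua'"
# Placeholder only: substitute the CLI's real error text for the linked job.
JOB_MARKER = "FooJob"


def mentions_any(content: object, markers: Iterable[str]) -> bool:
    """True if any marker occurs in a string or in any text block of a list."""
    if isinstance(content, str):
        texts = [content]
    elif isinstance(content, list):
        texts = [b.get("text", "") for b in content if isinstance(b, dict)]
    else:
        texts = []
    marker_list = list(markers)
    return any(
        isinstance(t, str) and m in t
        for t in texts
        for m in marker_list
    )
```

Matching on a bare id such as "FooJob" would also pass if the agent merely printed teamcity.toml, so a full error string is the safer marker once it is known.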

