Add multi-repo support + torchtitan #9

Open
xmfan wants to merge 6 commits into drisspg:main from xmfan:xmfan/torchtitan

@xmfan xmfan commented Apr 6, 2026

Summary

  • Refactor hardcoded pytorch profile into a config-driven repo registry ([repos.*] in config.toml)
  • Add torchtitan as an add-on repo with dedicated prompts, lightweight venv setup (inherits base torch build), and per-job worktrees
  • Wire --repo through CLI, web UI, services, job IDs, and issue lookup
  • Both repos are always cloned in every workspace; --repo selects which repo an issue is filed in

Test plan

  • python -m pytest tests/ passes
  • ptq --help / ptq run --help / ptq worktree --help show updated descriptions
  • Web UI dropdown says "Issue from"
  • ptq setup gpu-dev — workspace clones both repos
  • ptq run --issue 2818 --repo torchtitan --machine gpu-dev — torchtitan job runs, agent can cross-reference pytorch source
  • ptq run --issue 179597 --machine gpu-dev — pytorch job runs
  • Web UI: job list shows repo column, new job form works for both repos

@xmfan xmfan force-pushed the xmfan/torchtitan branch from 419db27 to 5c79fc6 on April 8, 2026 08:20
## Debugging Tools

**Distributed training debugging**:
- Run with single process first: `CUDA_VISIBLE_DEVICES=0 {workspace}/jobs/{job_id}/.venv/bin/python <script.py>`

this only works if the script uses fake process group. let's remove this instruction and always use torchrun
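
As a rough illustration of the suggestion, the prompt could point at a launcher-based command even for a single process. The sketch below builds such a command with `python -m torch.distributed.run` (the module behind `torchrun`); the helper name and its parameters are hypothetical, and it assumes torch is importable from the job's venv:

```python
def torchrun_command(
    venv_dir: str,
    script: str,
    nproc_per_node: int = 1,
    script_args: tuple[str, ...] = (),
) -> list[str]:
    """Build an argv list that launches `script` through torchrun's module entry point."""
    return [
        f"{venv_dir}/bin/python",
        "-m",
        "torch.distributed.run",  # same entry point as the `torchrun` console script
        "--standalone",  # single-node run, no external rendezvous endpoint needed
        f"--nproc_per_node={nproc_per_node}",
        script,
        *script_args,
    ]


# Hypothetical paths, mirroring the {workspace}/jobs/{job_id}/.venv layout above.
cmd = torchrun_command("/workspace/jobs/example-job/.venv", "train.py", nproc_per_node=1)
print(" ".join(cmd))
```

Launching with `--nproc_per_node=1` keeps the single-process debugging workflow while still setting the environment variables that a real process group expects.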

job_dir = f"{backend.workspace}/jobs/{job_id}"
worktree = f"{job_dir}/pytorch"

from ptq.repo_profiles import get_profile

imports at top of file

job_dir = f"{workspace}/jobs/{job_id}"
worktree = f"{job_dir}/pytorch"

from ptq.repo_profiles import get_profile

imports at top of file

ptq/cli.py Outdated
] = "pytorch",
) -> None:
- """Launch an AI agent to investigate a PyTorch issue or run an adhoc task.
+ """Launch an AI agent to investigate a PyTorch/TorchTitan issue or run an adhoc task.

let's just remove repo names from prompts

@xmfan xmfan force-pushed the xmfan/torchtitan branch 3 times, most recently from 50a894b to 643ebe2 on April 9, 2026 09:39
@xmfan xmfan changed the title from "Add torchtitan support" to "Add multi-repo support and torchtitan profile" on Apr 9, 2026
xmfan added 5 commits April 9, 2026 11:59
Move hardcoded pytorch profile into a config-driven RepoProfile
registry loaded from [repos.*] sections in config.toml. Prompt
templates are discovered by naming convention. Built-in defaults
used as fallback when config has no [repos] section.
- Add torchtitan profile to config.toml and _DEFAULT_PROFILES
- Add investigate/adhoc prompt templates for torchtitan
- Add repo field to JobRecord and RunRequest
- Include repo name in job IDs to avoid cross-repo collisions
- Filter find_by_issue by repo for correct re-run matching
- Update agent.py and issue.py to use repo profiles
- run_service / worktree_service: repo-aware worktree and venv setup;
  move _setup_lightweight_venv to worktree_service
- job_service / pr_service / rebase_service: top-level profile imports
- cli.py: generic --repo flag, auto-reload via create_debug_app factory
- workspace.py: generic _clone_repo driven by repo profiles
- app.py: add create_debug_app() factory for uvicorn auto-reload
- routes.py: pass profile objects to template for dynamic repo dropdown,
  repo column in job list, merge-base diff, dynamic issue links
- templates: iterate repos from config, repo column, dynamic issue links
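
The job-ID and `find_by_issue` changes described in these commits can be sketched as follows. The ID format, the field names other than `repo`, and both function signatures are assumptions for illustration; the PR's real `make_job_id` and `find_by_issue` are not shown in this conversation:

```python
from dataclasses import dataclass


@dataclass
class JobRecord:
    job_id: str
    issue: int
    repo: str = "pytorch"  # the field these commits add; the default is assumed


def make_job_id(repo: str, issue: int, run: int) -> str:
    """Hypothetical format: prefixing the repo keeps equal issue numbers apart."""
    return f"{repo}-{issue}-{run}"


def find_by_issue(jobs: list[JobRecord], issue: int, repo: str) -> list[JobRecord]:
    """Match on both issue number and repo so a re-run picks the right job."""
    return [job for job in jobs if job.issue == issue and job.repo == repo]


jobs = [
    JobRecord(make_job_id("pytorch", 2818, 1), 2818, "pytorch"),
    JobRecord(make_job_id("torchtitan", 2818, 1), 2818, "torchtitan"),
]
matches = find_by_issue(jobs, 2818, "torchtitan")
print([job.job_id for job in matches])
```

Without the repo in the ID and in the filter, issue 2818 in one repo would be indistinguishable from issue 2818 in the other.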
@xmfan xmfan force-pushed the xmfan/torchtitan branch from 643ebe2 to edf160f on April 9, 2026 10:00
from ptq.application.worktree_service import provision_worktree, validate_workspace
from ptq.domain.policies import make_job_id
from ptq.infrastructure.backends import create_backend
from ptq.repo_profiles import get_profile

apparently there's some circular imports

@xmfan xmfan marked this pull request as ready for review April 9, 2026 10:01
@xmfan xmfan force-pushed the xmfan/torchtitan branch from 8cf51f7 to c2311b1 on April 9, 2026 13:04
@xmfan xmfan changed the title from "Add multi-repo support and torchtitan profile" to "Add multi-repo support + torchtitan" on Apr 9, 2026