Skip to content

refactor: replace dummy_nodes with real xpyd-sim#135

Merged
hlin99 merged 6 commits into
mainfrom
replace-dummy-final
Apr 5, 2026
Merged

refactor: replace dummy_nodes with real xpyd-sim#135
hlin99 merged 6 commits into
mainfrom
replace-dummy-final

Conversation

@hlin99

@hlin99 hlin99 commented Apr 5, 2026

Copy link
Copy Markdown
Member

Remove dummy_nodes/ entirely. All tests use real xpyd-sim via sim_adapter.py.

CI Status: ✅ ALL GREEN

  • lint ✅ | build ✅ | benchmark ✅
  • test 3.10 ✅ | test 3.11 ✅ | test 3.12 ✅
  • 596 tests (same as main, zero coverage loss)

Changes

  • sim_adapter.py: make_sim_app(model_name, mode) creates real xpyd-sim instances
  • All test imports: dummy_nodessim_adapter
  • Proxy test assertions aligned with sim behavior (finish_reason, streaming)
  • Subprocess tests use uvicorn sim_adapter:prefill_app with correct model name
  • pyproject.toml: xpyd-sim>=0.2.0 + pytest-timeout in dev deps
  • CI: no PYTHONPATH hack, timeout 15min/job + 120s/test

Deleted

  • dummy_nodes/ (4 files, -1095 lines)
  • test_prefill_node.py + test_decode_node.py → replaced by test_sim_nodes.py

Supersedes #132, #133, #134.

Remove dummy_nodes/ entirely. All tests now use xpyd-sim via sim_adapter.py
which calls xpyd_sim.server.create_app for accurate simulation behavior.

Changes:
- sim_adapter.py: make_sim_app() creates real xpyd-sim instances
- All test imports: dummy_nodes -> sim_adapter
- _make_dummy_app: uses make_sim_app (real xpyd-sim per model)
- _spawn_node: uvicorn sim_adapter:prefill_app with PYTHONPATH
- Proxy test assertions aligned with sim behavior:
  - finish_reason: accept 'stop' or 'length'
  - streaming chunks: relaxed exact counts
  - max_tokens_very_large: capped to 1000
- pyproject.toml: xpyd-sim>=0.2.0 + pytest-timeout in dev deps
- CI: timeout 15min per job, 120s per test, no PYTHONPATH hack
- Deleted: dummy_nodes/, test_prefill_node.py, test_decode_node.py
@hlin99 hlin99 marked this pull request as draft April 5, 2026 08:51
hlin99 added 3 commits April 5, 2026 16:56
Root cause: proxy validates that backend nodes serve the same model
as configured. sim_adapter defaults to 'dummy' but proxy expects
the tokenizer path. Fixed by:
- sim_adapter reads SIM_MODEL_NAME env var (default: 'dummy')
- conftest.py sets SIM_MODEL_NAME to tokenizer path before import
- subprocess tests pass SIM_MODEL_NAME=TOKENIZER_DIR in env
- sim_adapter defaults to tokenizers/DeepSeek-R1 (from repo root)
- No env vars — model_name resolved from __file__ path
- All subprocess tests use same tokenizer as conftest
- All imports use make_sim_app() with defaults or explicit model_name
@hlin99 hlin99 requested a review from Copilot April 5, 2026 09:22
@hlin99 hlin99 marked this pull request as ready for review April 5, 2026 09:23
Replaces deleted test_prefill_node.py + test_decode_node.py (10 tests)
with equivalent tests against sim_adapter's xpyd-sim apps.

@hlin99-Review-Bot hlin99-Review-Bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Clean refactor — removes 1095 lines of dummy_nodes and replaces with a thin sim_adapter.py wrapping real xpyd-sim.

Reviewed:

  • sim_adapter.py: correct ServerConfig with zero delays, eos_min_ratio=1.0, proper default model path
  • CI workflow: PYTHONPATH hack removed, xpyd-sim installed as proper dep
  • All 16 test files updated consistently (dummy_nodes imports → sim_adapter)
  • Test assertions relaxed where needed (finish_reason accepts "length", streaming chunk counts use >= instead of ==) — appropriate for real sim behavior
  • pyproject.toml: xpyd-sim and pytest-timeout added to dev deps, dummy_nodes removed from packages
  • conftest.py: healthcheck-based readiness instead of sleep(2) — nice improvement
  • No xpyd/ business logic changes
  • CI fully green (all Python versions, lint, build, benchmark)

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR removes the in-repo dummy_nodes/ test backend and updates the test suite + CI to use the real xpyd-sim implementation via a new sim_adapter.py shim.

Changes:

  • Delete dummy_nodes/ and the dummy-node unit tests; introduce sim_adapter.py to create xpyd-sim FastAPI apps for prefill/decode.
  • Update integration/stress tests to import or spawn sim_adapter apps instead of dummy_nodes, and relax a few assertions to match simulator behavior.
  • Update CI/dev tooling: add xpyd-sim + pytest-timeout, drop PYTHONPATH hacks, and adjust packaging discovery to exclude dummy_nodes.

Reviewed changes

Copilot reviewed 24 out of 25 changed files in this pull request and generated 9 comments.

Show a summary per file
File Description
sim_adapter.py New adapter that creates xpyd-sim apps and exposes prefill_app/decode_app for uvicorn.
pyproject.toml Add dev deps (xpyd-sim, pytest-timeout), remove dummy_nodes from package discovery, set pytest pythonpath.
.github/workflows/ci.yml Remove PYTHONPATH hacks, install dev deps, add pytest timeout, simplify lint/build steps.
.isort.cfg Mark sim_adapter as first-party for import sorting.
tests/conftest.py Swap dummy-node apps for sim_adapter apps; add HTTP readiness polling for started uvicorn threads.
tests/integration/test_proxy.py Use sim_adapter; relax finish_reason + streaming assertions.
tests/integration/test_completions_endpoint.py Relax finish_reason assertion to allow simulator behavior.
tests/integration/test_yaml_integration.py Replace dummy-node imports with make_sim_app(...) app creation.
tests/integration/test_streaming_edge.py Replace dummy-node imports with make_sim_app(...) app creation.
tests/integration/test_resilience_integration.py Replace dummy-node imports with make_sim_app(...) app creation.
tests/integration/test_concurrent_requests.py Replace dummy-node imports with make_sim_app(...) app creation.
tests/integration/test_large_payload.py Replace dummy-node imports with make_sim_app(...); adjust a max_tokens edge-case value.
tests/integration/test_xpyd_start_proxy_integration.py Spawn sim_adapter:prefill_app / sim_adapter:decode_app via uvicorn; switch tokenizer path.
tests/integration/test_proxy_matrix.py Spawn sim_adapter apps via uvicorn; switch tokenizer path; drop explicit request model field.
tests/integration/test_multi_model_routing.py Replace custom dummy FastAPI app with make_sim_app(...) for per-model nodes.
tests/integration/test_dual_routing.py Replace custom dummy FastAPI app with make_sim_app(...) for per-model nodes.
tests/stress/test_benchmark_integration.py Bench cluster spawns sim_adapter uvicorn apps; update manual run instructions.
tests/stress/test_benchmark_e2e.py Bench cluster spawns sim_adapter uvicorn apps; update manual run instructions.
tests/integration/test_prefill_node.py Deleted (dummy-node-specific tests removed).
tests/integration/test_decode_node.py Deleted (dummy-node-specific tests removed).
dummy_nodes/common.py Deleted (dummy backend removed).
dummy_nodes/prefill_node.py Deleted (dummy backend removed).
dummy_nodes/decode_node.py Deleted (dummy backend removed).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.


This test file is excluded from CI via --ignore in the workflow.
Run manually: PYTHONPATH=core:dummy_nodes pytest tests/test_benchmark_integration.py -v
Run manually: pytest tests/test_benchmark_integration.py -v

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual run command points to tests/test_benchmark_integration.py, but this file lives under tests/stress/. As written, the command won’t run this test module unless the path is corrected (e.g., pytest tests/stress/test_benchmark_integration.py -v).

Suggested change
Run manually: pytest tests/test_benchmark_integration.py -v
Run manually: pytest tests/stress/test_benchmark_integration.py -v

Copilot uses AI. Check for mistakes.
Comment on lines 3 to 7
Topology: 2 prefill + 16 decode + 1 proxy (same as test_benchmark_integration).
Excluded from CI via --ignore. Run manually:

PYTHONPATH=core:dummy_nodes pytest tests/test_benchmark_e2e.py -v -s
pytest tests/test_benchmark_e2e.py -v -s

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The manual run command points to tests/test_benchmark_e2e.py, but this module is under tests/stress/. Update the path so the instructions actually run this test file.

Copilot uses AI. Check for mistakes.
stderr=subprocess.PIPE,
text=True,
[PYTHON, "-m", "uvicorn", app_ref, "--host", "127.0.0.1", "--port", str(port), "--log-level", "warning"],
cwd=REPO_ROOT, env={**os.environ, "PYTHONPATH": str(REPO_ROOT)},

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENV_BASE is constructed (including model_path, delay knobs, and no_proxy settings) but _spawn_node() ignores it and spawns uvicorn with a fresh env containing only PYTHONPATH. This makes the test harder to reason about and can diverge from the environment the proxy script is launched with; pass ENV_BASE (or {**ENV_BASE, ...}) into the node subprocess as well, or remove ENV_BASE fields that are no longer relevant.

Suggested change
cwd=REPO_ROOT, env={**os.environ, "PYTHONPATH": str(REPO_ROOT)},
cwd=REPO_ROOT, env=ENV_BASE,

Copilot uses AI. Check for mistakes.
stderr=subprocess.PIPE,
text=True,
[PYTHON, "-m", "uvicorn", app_ref, "--host", "127.0.0.1", "--port", str(port), "--log-level", "warning"],
cwd=REPO_ROOT, env={**os.environ, "PYTHONPATH": str(REPO_ROOT)},

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ENV (including the delay env vars) is defined at module scope but _spawn_node() doesn’t use it; instead it spawns nodes with a minimal env containing only PYTHONPATH. To keep the test configuration consistent (and avoid unused config), pass ENV into _spawn_node()/Popen or remove the unused ENV entries.

Suggested change
cwd=REPO_ROOT, env={**os.environ, "PYTHONPATH": str(REPO_ROOT)},
cwd=REPO_ROOT, env=ENV,

Copilot uses AI. Check for mistakes.
Comment on lines 166 to 173
lines = resp.text.strip().split("\n")
data_lines = [line for line in lines if line.startswith("data: ")]

# 1 role + 5 content + 1 finish + 1 [DONE] = 8
assert len(data_lines) == 8
# role + content + finish + [DONE]
assert len(data_lines) >= 4

assert data_lines[-1] == "data: [DONE]"

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The streaming assertion was relaxed to only len(data_lines) >= 4, which can miss regressions (e.g., missing role chunk, missing finish chunk, or no content chunks). Prefer asserting the presence of each required chunk type (role delta, at least one content delta, a chunk with finish_reason, and the final [DONE]) rather than only a minimum count.

Copilot uses AI. Check for mistakes.
Comment on lines 193 to +210
async def test_streaming_token_count(client: AsyncClient):
"""Verify the number of content tokens in streaming matches max_tokens."""
payload = {**CHAT_PAYLOAD, "max_tokens": 7, "stream": True}
resp = await client.post("/v1/chat/completions", json=payload)

lines = resp.text.strip().split("\n")
data_lines = [
line for line in lines if line.startswith("data: ") and line != "data: [DONE]"
]

content_chunks = 0
for line in data_lines:
chunk = json.loads(line.removeprefix("data: "))
delta = chunk["choices"][0]["delta"]
if delta.get("content") is not None:
content_chunks += 1

assert content_chunks == 7
assert content_chunks >= 1

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

test_streaming_token_count now only asserts content_chunks >= 1, which no longer validates the behavior the test name/docstring suggests. If xpyd-sim can aggregate tokens into fewer chunks, consider asserting something stable such as: (1) there is at least one content delta, (2) the response ends with a non-[DONE] chunk that has finish_reason, and (3) the concatenated content is non-empty / matches the usage.completion_tokens (if present).

Copilot uses AI. Check for mistakes.
Comment on lines 131 to 139
@pytest.mark.anyio
async def test_max_tokens_very_large(client: AsyncClient):
"""Very large max_tokens should succeed (dummy backend caps output)."""
payload = {
"model": "dummy",
"messages": [{"role": "user", "content": "Hi"}],
"max_tokens": 999999999,
"max_tokens": 1000,
"stream": False,
}

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test docstring says “Very large max_tokens”, but the payload now uses max_tokens: 1000, which is no longer particularly large and weakens the edge-case being exercised. Either update the docstring to match the new intent, or keep a truly large value if the proxy/backend is expected to handle it (even if output is capped).

Copilot uses AI. Check for mistakes.
Comment on lines 35 to +37
def _make_dummy_app(model_id: str):
"""Create a minimal dummy node app serving a given model_id."""
from dummy_nodes.common import (
ChatCompletionRequest,
ChatCompletionResponse,
Choice,
ChoiceMessage,
CompletionChoice,
CompletionRequest,
CompletionResponse,
ModelCard,
ModelListResponse,
UsageInfo,
count_prompt_tokens_from_messages,
count_prompt_tokens_from_prompt,
generate_id,
get_effective_max_tokens,
now_ts,
render_dummy_text,
)

app = FastAPI(title=f"Dummy Node ({model_id})")

@app.get("/v1/models")
async def models():
return ModelListResponse(
data=[ModelCard(id=model_id, created=now_ts(), max_model_len=131072)]
)

@app.get("/health")
async def health():
return "OK"

@app.post("/v1/chat/completions")
async def chat(request: ChatCompletionRequest):
prompt_tokens = count_prompt_tokens_from_messages(request.messages)
max_tokens = get_effective_max_tokens(
request.max_completion_tokens,
request.max_tokens,
)
text = render_dummy_text(max_tokens)
return ChatCompletionResponse(
id=generate_id(),
created=now_ts(),
model=model_id,
choices=[Choice(message=ChoiceMessage(content=text))],
usage=UsageInfo(
prompt_tokens=prompt_tokens,
completion_tokens=len(text),
total_tokens=prompt_tokens + len(text),
),
)

@app.post("/v1/completions")
async def completions(request: CompletionRequest):
prompt_tokens = count_prompt_tokens_from_prompt(request.prompt)
max_tokens = get_effective_max_tokens(request.max_tokens)
text = render_dummy_text(max_tokens)
return CompletionResponse(
id=generate_id("cmpl"),
created=now_ts(),
model=model_id,
choices=[CompletionChoice(text=text)],
usage=UsageInfo(
prompt_tokens=prompt_tokens,
completion_tokens=len(text),
total_tokens=prompt_tokens + len(text),
),
)
from sim_adapter import make_sim_app
return make_sim_app(model_id)

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

make_sim_app() is documented/used elsewhere with a local tokenizer directory path (e.g., tokenizers/DeepSeek-R1), but here it’s called with logical model ids like llama-3 / qwen-2 / deepseek-r1, none of which exist under tokenizers/. If xpyd-sim treats model_name as a tokenizer path or tries to resolve remote models, this can introduce network/filesystem dependencies in CI. Consider updating sim_adapter.make_sim_app to accept a separate display model_id (used in responses) while always using a local tokenizer path for tokenization, or pass a local tokenizer directory here and adjust the tests’ expectations accordingly.

Copilot uses AI. Check for mistakes.
Comment on lines 33 to +35
def _make_dummy_app(model_id: str):
"""Create a minimal dummy node serving a given model_id."""
from dummy_nodes.common import (
ChatCompletionRequest,
ChatCompletionResponse,
Choice,
ChoiceMessage,
CompletionChoice,
CompletionRequest,
CompletionResponse,
ModelCard,
ModelListResponse,
UsageInfo,
count_prompt_tokens_from_messages,
count_prompt_tokens_from_prompt,
generate_id,
get_effective_max_tokens,
now_ts,
render_dummy_text,
)

app = FastAPI(title=f"Dummy Node ({model_id})")

@app.get("/v1/models")
async def models():
return ModelListResponse(
data=[ModelCard(id=model_id, created=now_ts(), max_model_len=131072)]
)

@app.get("/health")
async def health():
return "OK"

@app.post("/v1/chat/completions")
async def chat(request: ChatCompletionRequest):
prompt_tokens = count_prompt_tokens_from_messages(request.messages)
max_tokens = get_effective_max_tokens(
request.max_completion_tokens,
request.max_tokens,
)
text = render_dummy_text(max_tokens)
return ChatCompletionResponse(
id=generate_id(),
created=now_ts(),
model=model_id,
choices=[Choice(message=ChoiceMessage(content=text))],
usage=UsageInfo(
prompt_tokens=prompt_tokens,
completion_tokens=len(text),
total_tokens=prompt_tokens + len(text),
),
)

@app.post("/v1/completions")
async def completions(request: CompletionRequest):
prompt_tokens = count_prompt_tokens_from_prompt(request.prompt)
max_tokens = get_effective_max_tokens(request.max_tokens)
text = render_dummy_text(max_tokens)
return CompletionResponse(
id=generate_id("cmpl"),
created=now_ts(),
model=model_id,
choices=[CompletionChoice(text=text)],
usage=UsageInfo(
prompt_tokens=prompt_tokens,
completion_tokens=len(text),
total_tokens=prompt_tokens + len(text),
),
)
from sim_adapter import make_sim_app
return make_sim_app(model_id)

Copilot AI Apr 5, 2026

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same concern as in multi-model routing: _make_dummy_app() passes values like qwen-2/llama-3/deepseek-r1 into make_sim_app() even though the repo’s bundled tokenizers are under tokenizers/ and don’t include these names. If xpyd-sim resolves model_name as a path or remote model id, this can create flaky/network-dependent tests. Prefer a local tokenizer path for model_name and (if needed) a separate response model label.

Copilot uses AI. Check for mistakes.

@hlin99-Review-BotX hlin99-Review-BotX left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Review by hlin99-Review-BotX (commit 00322ad)

Code review: the refactor looks clean and correct. sim_adapter.py is a thin, well-structured wrapper around xpyd-sim. All 16 test files are consistently updated, new test_sim_nodes.py properly replaces deleted dummy-node tests, and no xpyd/ business logic was touched.

However, CI lint job has FAILED on this commit. Cannot approve until all CI checks are green. Benchmark is also still in progress.

Will re-review once CI is fully green.

@hlin99-Review-Bot hlin99-Review-Bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review complete on 7df3124. Two new commits since my last approval:

  • 00322ad — adds test_sim_nodes.py with comprehensive prefill/decode node tests (health, streaming, non-streaming, max_tokens). Clean fixtures, good assertions.
  • 7df3124 — renames lline to fix ruff E741 (ambiguous variable name). Straightforward.

No issues found. Waiting for CI (tests 3.10/3.11/3.12 + benchmark still in progress) before submitting APPROVE.

@hlin99-Review-Bot hlin99-Review-Bot left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CI is fully green (lint, build, benchmark, tests 3.10/3.11/3.12 all pass). Code reviewed on 7df3124 — two commits since prior approval: 00322ad adds test_sim_nodes.py with solid coverage of prefill/decode nodes, 7df3124 fixes ruff E741. No issues found. LGTM.

@hlin99-Review-BotX hlin99-Review-BotX left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code review complete. All 25 changed files examined.

sim_adapter.py: Clean drop-in replacement using real xpyd-sim with deterministic config (zero delays, eos_min_ratio=1.0). Default model path correctly resolves to repo-local tokenizer.

CI: PYTHONPATH hacks removed, per-job timeout (15min) and per-test timeout (120s) added, build job explicitly installs xpyd-sim for installed-mode tests. All 6 checks green.

conftest.py: Health-check polling loop replaces fragile time.sleep(2) — solid improvement.

Test migrations: All dummy_nodes imports updated to sim_adapter. Assertions appropriately relaxed where real sim behavior differs from dummy (finish_reason accepts 'length', streaming chunk counts use >= instead of ==).

pyproject.toml: xpyd-sim>=0.2.0 and pytest-timeout deps added; dummy_nodes removed from setuptools packages; pythonpath=['.'] configured for pytest.

No xpyd/ business logic changes. No security, concurrency, or type safety issues found. Mergeable and CI green. LGTM.

@hlin99 hlin99 merged commit 984b727 into main Apr 5, 2026
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants