From 02aca59736abe870094997e46b450c0dced7a339 Mon Sep 17 00:00:00 2001 From: hlin99 Date: Mon, 6 Apr 2026 21:49:03 +0800 Subject: [PATCH] =?UTF-8?q?docs:=20unified=20structure=20=E2=80=94=20LICEN?= =?UTF-8?q?SE,=20README,=20CONTRIBUTING,=20bot/?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit --- CONTRIBUTING.md | 31 +++++ LICENSE | 202 ++++++++++++++++++++++++++++ README.md | 170 +---------------------- REVIEW_POLICY.md | 62 --------- bot/AUTHOR_POLICY.md | 23 ++++ bot/BOT_POLICY.md | 34 +++++ bot/DESIGN_PRINCIPLES.md | 41 ++++++ bot/DEV_LOOP.md | 28 ++++ bot/ENTRY.md | 14 ++ bot/REVIEW_POLICY.md | 38 ++++++ {docs => bot}/iterations/current.md | 0 docs/DESIGN_PRINCIPLES.md | 43 ------ docs/DEV_LOOP.md | 85 ------------ 13 files changed, 416 insertions(+), 355 deletions(-) create mode 100644 CONTRIBUTING.md create mode 100644 LICENSE delete mode 100644 REVIEW_POLICY.md create mode 100644 bot/AUTHOR_POLICY.md create mode 100644 bot/BOT_POLICY.md create mode 100644 bot/DESIGN_PRINCIPLES.md create mode 100644 bot/DEV_LOOP.md create mode 100644 bot/ENTRY.md create mode 100644 bot/REVIEW_POLICY.md rename {docs => bot}/iterations/current.md (100%) delete mode 100644 docs/DESIGN_PRINCIPLES.md delete mode 100644 docs/DEV_LOOP.md diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md new file mode 100644 index 0000000..eee3b77 --- /dev/null +++ b/CONTRIBUTING.md @@ -0,0 +1,31 @@ +# Contributing + +## Setup + +```bash +pip install -e ".[dev]" +pre-commit install +``` + +## Workflow + +1. Fork & branch from `main` +2. Write code + tests +3. Lint: `ruff check xpyd_plan tests` +4. 
Open a PR — CI must pass, two reviewer bots must approve + +## Commit Messages + +Use [Conventional Commits](https://www.conventionalcommits.org/): + +``` +feat: add new analyzer +fix: correct SLA threshold check +docs: update README +``` + +## Code Style + +- Formatter/linter: `ruff` +- Type hints required for public APIs +- Tests required for new features diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..d645695 --- /dev/null +++ b/LICENSE @@ -0,0 +1,202 @@ + + Apache License + Version 2.0, January 2004 + http://www.apache.org/licenses/ + + TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION + + 1. Definitions. + + "License" shall mean the terms and conditions for use, reproduction, + and distribution as defined by Sections 1 through 9 of this document. + + "Licensor" shall mean the copyright owner or entity authorized by + the copyright owner that is granting the License. + + "Legal Entity" shall mean the union of the acting entity and all + other entities that control, are controlled by, or are under common + control with that entity. For the purposes of this definition, + "control" means (i) the power, direct or indirect, to cause the + direction or management of such entity, whether by contract or + otherwise, or (ii) ownership of fifty percent (50%) or more of the + outstanding shares, or (iii) beneficial ownership of such entity. + + "You" (or "Your") shall mean an individual or Legal Entity + exercising permissions granted by this License. + + "Source" form shall mean the preferred form for making modifications, + including but not limited to software source code, documentation + source, and configuration files. + + "Object" form shall mean any form resulting from mechanical + transformation or translation of a Source form, including but + not limited to compiled object code, generated documentation, + and conversions to other media types. 
+ + "Work" shall mean the work of authorship, whether in Source or + Object form, made available under the License, as indicated by a + copyright notice that is included in or attached to the work + (an example is provided in the Appendix below). + + "Derivative Works" shall mean any work, whether in Source or Object + form, that is based on (or derived from) the Work and for which the + editorial revisions, annotations, elaborations, or other modifications + represent, as a whole, an original work of authorship. For the purposes + of this License, Derivative Works shall not include works that remain + separable from, or merely link (or bind by name) to the interfaces of, + the Work and Derivative Works thereof. + + "Contribution" shall mean any work of authorship, including + the original version of the Work and any modifications or additions + to that Work or Derivative Works thereof, that is intentionally + submitted to Licensor for inclusion in the Work by the copyright owner + or by an individual or Legal Entity authorized to submit on behalf of + the copyright owner. For the purposes of this definition, "submitted" + means any form of electronic, verbal, or written communication sent + to the Licensor or its representatives, including but not limited to + communication on electronic mailing lists, source code control systems, + and issue tracking systems that are managed by, or on behalf of, the + Licensor for the purpose of discussing and improving the Work, but + excluding communication that is conspicuously marked or otherwise + designated in writing by the copyright owner as "Not a Contribution." + + "Contributor" shall mean Licensor and any individual or Legal Entity + on behalf of whom a Contribution has been received by Licensor and + subsequently incorporated within the Work. + + 2. Grant of Copyright License. 
Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + copyright license to reproduce, prepare Derivative Works of, + publicly display, publicly perform, sublicense, and distribute the + Work and such Derivative Works in Source or Object form. + + 3. Grant of Patent License. Subject to the terms and conditions of + this License, each Contributor hereby grants to You a perpetual, + worldwide, non-exclusive, no-charge, royalty-free, irrevocable + (except as stated in this section) patent license to make, have made, + use, offer to sell, sell, import, and otherwise transfer the Work, + where such license applies only to those patent claims licensable + by such Contributor that are necessarily infringed by their + Contribution(s) alone or by combination of their Contribution(s) + with the Work to which such Contribution(s) was submitted. If You + institute patent litigation against any entity (including a + cross-claim or counterclaim in a lawsuit) alleging that the Work + or a Contribution incorporated within the Work constitutes direct + or contributory patent infringement, then any patent licenses + granted to You under this License for that Work shall terminate + as of the date such litigation is filed. + + 4. Redistribution. 
You may reproduce and distribute copies of the + Work or Derivative Works thereof in any medium, with or without + modifications, and in Source or Object form, provided that You + meet the following conditions: + + (a) You must give any other recipients of the Work or + Derivative Works a copy of this License; and + + (b) You must cause any modified files to carry prominent notices + stating that You changed the files; and + + (c) You must retain, in the Source form of any Derivative Works + that You distribute, all copyright, patent, trademark, and + attribution notices from the Source form of the Work, + excluding those notices that do not pertain to any part of + the Derivative Works; and + + (d) If the Work includes a "NOTICE" text file as part of its + distribution, then any Derivative Works that You distribute must + include a readable copy of the attribution notices contained + within such NOTICE file, excluding those notices that do not + pertain to any part of the Derivative Works, in at least one + of the following places: within a NOTICE text file distributed + as part of the Derivative Works; within the Source form or + documentation, if provided along with the Derivative Works; or, + within a display generated by the Derivative Works, if and + wherever such third-party notices normally appear. The contents + of the NOTICE file are for informational purposes only and + do not modify the License. You may add Your own attribution + notices within Derivative Works that You distribute, alongside + or as an addendum to the NOTICE text from the Work, provided + that such additional attribution notices cannot be construed + as modifying the License. 
+ + You may add Your own copyright statement to Your modifications and + may provide additional or different license terms and conditions + for use, reproduction, or distribution of Your modifications, or + for any such Derivative Works as a whole, provided Your use, + reproduction, and distribution of the Work otherwise complies with + the conditions stated in this License. + + 5. Submission of Contributions. Unless You explicitly state otherwise, + any Contribution intentionally submitted for inclusion in the Work + by You to the Licensor shall be under the terms and conditions of + this License, without any additional terms or conditions. + Notwithstanding the above, nothing herein shall supersede or modify + the terms of any separate license agreement you may have executed + with Licensor regarding such Contributions. + + 6. Trademarks. This License does not grant permission to use the trade + names, trademarks, service marks, or product names of the Licensor, + except as required for reasonable and customary use in describing the + origin of the Work and reproducing the content of the NOTICE file. + + 7. Disclaimer of Warranty. Unless required by applicable law or + agreed to in writing, Licensor provides the Work (and each + Contributor provides its Contributions) on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or + implied, including, without limitation, any warranties or conditions + of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A + PARTICULAR PURPOSE. You are solely responsible for determining the + appropriateness of using or redistributing the Work and assume any + risks associated with Your exercise of permissions under this License. + + 8. Limitation of Liability. 
In no event and under no legal theory, + whether in tort (including negligence), contract, or otherwise, + unless required by applicable law (such as deliberate and grossly + negligent acts) or agreed to in writing, shall any Contributor be + liable to You for damages, including any direct, indirect, special, + incidental, or consequential damages of any character arising as a + result of this License or out of the use or inability to use the + Work (including but not limited to damages for loss of goodwill, + work stoppage, computer failure or malfunction, or any and all + other commercial damages or losses), even if such Contributor + has been advised of the possibility of such damages. + + 9. Accepting Warranty or Additional Liability. While redistributing + the Work or Derivative Works thereof, You may choose to offer, + and charge a fee for, acceptance of support, warranty, indemnity, + or other liability obligations and/or rights consistent with this + License. However, in accepting such obligations, You may act only + on Your own behalf and on Your sole responsibility, not on behalf + of any other Contributor, and only if You agree to indemnify, + defend, and hold each Contributor harmless for any liability + incurred by, or claims asserted against, such Contributor by reason + of your accepting any such warranty or additional liability. + + END OF TERMS AND CONDITIONS + + APPENDIX: How to apply the Apache License to your work. + + To apply the Apache License to your work, attach the following + boilerplate notice, with the fields enclosed by brackets "[]" + replaced with your own identifying information. (Don't include + the brackets!) The text should be enclosed in the appropriate + comment syntax for the file format. We also recommend that a + file or class name and description of purpose be included on the + same "printed page" as the copyright notice for easier + identification within third-party archives. 
+ + Copyright [yyyy] [name of copyright owner] + + Licensed under the Apache License, Version 2.0 (the "License"); + you may not use this file except in compliance with the License. + You may obtain a copy of the License at + + http://www.apache.org/licenses/LICENSE-2.0 + + Unless required by applicable law or agreed to in writing, software + distributed under the License is distributed on an "AS IS" BASIS, + WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + See the License for the specific language governing permissions and + limitations under the License. diff --git a/README.md b/README.md index 74ef64e..a6d6693 100644 --- a/README.md +++ b/README.md @@ -1,18 +1,13 @@ # xPyD-plan -Benchmark-data analysis toolkit for [xPyD-proxy](https://github.com/xPyD-hub/xPyD-proxy) — find the optimal **Prefill:Decode instance ratio** from real benchmark results. - -> **Core principle:** No guessing, no modeling, no simulation — everything is based on actual benchmark data. +PD ratio planner — recommend optimal **Prefill:Decode** allocation from real benchmark data. 
## Install ```bash -# Base install pip install xpyd-plan - -# With HTML report generation +# With HTML reports pip install "xpyd-plan[report]" - # Development pip install "xpyd-plan[dev]" ``` @@ -20,171 +15,16 @@ pip install "xpyd-plan[dev]" ## Quick Start ```bash -# Analyze a benchmark file — find the optimal P:D ratio +# Find optimal P:D ratio from benchmark results xpyd-plan analyze --benchmark results.json --sla-ttft 200 --sla-tpot 50 # Compare two benchmark runs xpyd-plan compare --baseline baseline.json --current current.json -# Generate a Markdown report +# Generate report xpyd-plan report --format markdown --benchmark results.json --output report.md - -# Export results as JSON for automation -xpyd-plan analyze --benchmark results.json --output-format json ``` -## Features - -### Core Analysis - -| Command | Description | -|---------|-------------| -| `analyze` | SLA compliance check, utilization analysis, optimal P:D ratio finder | -| `export` | Batch export analysis results as JSON/CSV/table | -| `confidence` | Bootstrap confidence intervals for latency percentiles | -| `sensitivity` | P:D ratio vs SLA satisfaction curves with cliff detection | -| `decompose` | Per-request latency decomposition into prefill/decode/overhead phases | -| `tail` | Extended percentile analysis (P99.9, P99.99) with tail classification | - -### Comparison & Testing - -| Command | Description | -|---------|-------------| -| `compare` | Benchmark comparison with regression detection | -| `ab-test` | Statistical A/B test analysis (Welch's t-test, Mann-Whitney U, effect size) | -| `drift` | Distribution drift detection via Kolmogorov-Smirnov test | -| `model-compare` | Multi-model side-by-side latency and cost-efficiency comparison | - -### Planning & Optimization - -| Command | Description | -|---------|-------------| -| `plan-capacity` | Capacity planning with linear scaling model and confidence levels | -| `plan-benchmarks` | Generate prioritized list of P:D ratios to benchmark | 
-| `what-if` | Scenario simulation — scale QPS or instances and compare | -| `fleet` | Multi-GPU-type fleet sizing with budget constraints | -| `pareto` | Pareto frontier analysis across latency, cost, and waste | -| `recommend` | Ranked recommendations combining SLA, cost, Pareto, and trend data | -| `interpolate` | Performance interpolation/extrapolation for untested P:D ratios | -| `threshold-advisor` | SLA threshold tuning — find thresholds for target pass rates | -| `forecast` | Capacity forecasting from historical trend data | - -### Cost & Budgeting - -| Command | Description | -|---------|-------------| -| `budget` | SLA budget allocation across TTFT/TPOT stages | -| `scorecard` | Composite efficiency score (SLA compliance + utilization + waste) | -| `sla-tier` | Multi-SLA tier analysis for different service levels | - -### Data Management - -| Command | Description | -|---------|-------------| -| `validate` | Data quality scoring, outlier detection (IQR/Z-score) | -| `filter` | Token/latency/time-window filtering and random sampling | -| `merge` | Merge multiple benchmark files (union/intersection strategies) | -| `annotate` | Tag benchmark files with metadata (non-destructive sidecar) | -| `discover` | Recursive directory scanning for benchmark files | -| `generate` | Generate synthetic benchmark data for testing | - -### Monitoring & Alerting - -| Command | Description | -|---------|-------------| -| `alert` | YAML-defined alert rules with CI/CD-friendly exit codes | -| `trend` | Historical trend tracking with SQLite storage | -| `saturation` | Saturation point detection across QPS levels | -| `metrics` | Prometheus/OpenMetrics text format export | -| `dashboard` | Real-time Rich TUI dashboard | -| `timeline` | Time-window analysis with warmup detection and trend regression | - -### Workload Characterization - -| Command | Description | -|---------|-------------| -| `workload` | Request clustering into categories (prefill-heavy, decode-heavy, etc.) 
| -| `correlation` | Pearson correlation between request characteristics and latency | -| `heatmap` | 2D latency heatmap (prompt × output tokens) with hotspot detection | -| `root-cause` | Anomaly root cause analysis — SLA-failing vs passing request comparison | -| `scaling` | Throughput scaling efficiency with knee-point detection | - -### Reporting & Configuration - -| Command | Description | -|---------|-------------| -| `report` | HTML report with inline SVG visualizations | -| `report --format markdown` | Markdown report for GitHub PRs/wikis | -| `config init` | Generate starter YAML config file | -| `config show` | Display resolved configuration | -| `pipeline` | YAML-defined multi-step batch pipeline runner | - -## Benchmark Data Format - -xPyD-plan consumes JSON benchmark files (native or [xpyd-bench](https://github.com/xPyD-hub/xPyD-bench) format) containing per-request measurements: - -```json -{ - "metadata": { - "num_prefill_instances": 2, - "num_decode_instances": 6, - "total_instances": 8, - "measured_qps": 10.5 - }, - "requests": [ - { - "request_id": "req-001", - "prompt_tokens": 512, - "output_tokens": 128, - "ttft_ms": 45.2, - "tpot_ms": 12.1, - "total_latency_ms": 1593.0, - "timestamp": 1700000000.0 - } - ] -} -``` - -## Programmatic API - -Every feature is available as a Python function: - -```python -from xpyd_plan import analyze, compare_benchmarks, analyze_ab_test - -# Analyze benchmark data -result = analyze("benchmark.json", sla_ttft_ms=200, sla_tpot_ms=50) - -# Compare two runs -comparison = compare_benchmarks("baseline.json", "current.json") -``` - -## Configuration - -Create `xpyd-plan.yaml` (or `~/.config/xpyd-plan/config.yaml`) for persistent defaults: - -```bash -xpyd-plan config init # generates a commented starter file -xpyd-plan config show # shows resolved config -``` - -CLI flags always override config file values. - -## How It Works - -1. **Input**: Benchmark results from `xpyd-bench` (real measured data) -2. 
**Analyze**: Compute latency distributions, SLA compliance, utilization per P:D ratio -3. **Optimize**: Find the ratio with minimum resource waste while meeting SLA constraints -4. **Report**: Output tables, JSON/CSV, Markdown/HTML reports, or Prometheus metrics - -## Documentation - -- **[User Guide](docs/guide.md)** — complete usage guide: installation, subcommand reference, typical workflows, interpreting results -- [Design Principles](docs/DESIGN_PRINCIPLES.md) — architecture and design decisions -- [Development Loop](docs/DEV_LOOP.md) — development workflow -- [Roadmap](ROADMAP.md) — full milestone list -- [Current Iteration Status](docs/iterations/current.md) — M104 milestone summary - ## License -TBD +[Apache 2.0](LICENSE) diff --git a/REVIEW_POLICY.md b/REVIEW_POLICY.md deleted file mode 100644 index 4e3ee8a..0000000 --- a/REVIEW_POLICY.md +++ /dev/null @@ -1,62 +0,0 @@ -# Review Policy - -## Roles - -| Role | GitHub Account | Action | -|------|---------------|--------| -| Implementer | `hlin99` | Write code, submit PRs, fix issues | -| Reviewer 1 | `hlin99-Review-Bot` | Review PRs: approve / request changes / close | -| Reviewer 2 | `hlin99-Review-BotX` | Review PRs: approve / request changes / close | - -## Timing - -| Parameter | Value | -|-----------|-------| -| Iteration interval | 10 minutes | -| PR wait for review | max 15 minutes | -| Fix after request changes | max 10 minutes | -| Reviewer check frequency | every 5 minutes | -| Reviewer response deadline | 15 minutes after assign | -| Reviewer timeout action | close PR (iteration failed) | -| Total round timeout | 1 hour from PR creation | -| Round timeout action | close PR (iteration failed) | - -## Review Criteria - -Reviewers evaluate each PR on two dimensions: - -### 1. Idea Value -- Is the direction/approach valuable for the project? -- Does it align with the project goals? -- **If NO → close PR immediately** (one close = PR rejected) - -### 2. Code Quality -- Is the code correct? -- Are tests included/passing? -- Is `docs/iterations/current.md` updated with clear description? -- Does `docs/guide.md` reflect changes (if applicable)? 
-- **If idea is good but code has issues → request changes** - -## Decision Rules - -| Scenario | Action | -|----------|--------| -| Both reviewers approve | Auto-merge | -| One approves, one requests changes | Implementer fixes, reviewers re-review | -| Either reviewer closes | PR closed, iteration failed | -| Both approve after fixes | Auto-merge | -| Timeout (15min no review) | PR closed, iteration failed | -| Total timeout (1 hour) | PR closed, iteration failed | - -## Iteration Record - -Every PR MUST update `docs/iterations/current.md` with: -- What was done this iteration -- Result: merged / closed (with reason) -- Reviewer scores/comments summary - -## Auto-Merge Requirements - -- 2 approvals from designated reviewers -- CI passes (all checks green) -- No unresolved review comments diff --git a/bot/AUTHOR_POLICY.md b/bot/AUTHOR_POLICY.md new file mode 100644 index 0000000..54c6d36 --- /dev/null +++ b/bot/AUTHOR_POLICY.md @@ -0,0 +1,23 @@ + + +# Author Policy — xPyD-plan + +## Code Standards + +- **Language:** Python 3.10+ +- **Lint:** `ruff check xpyd_plan tests` +- **Type hints:** Required for all public APIs +- **Tests:** Required for every new feature or bug fix + +## Git + +- **Committer:** `hlin99 ` +- **Branch naming:** `feat/`, `fix/`, `docs/`, `refactor/` +- **Commit format:** Conventional Commits + +## PR Checklist + +- [ ] Code + tests +- [ ] `ruff check xpyd_plan tests` passes +- [ ] `bot/iterations/current.md` updated +- [ ] PR body contains `Closes #N` (if applicable) diff --git a/bot/BOT_POLICY.md b/bot/BOT_POLICY.md new file mode 100644 index 0000000..48cd598 --- /dev/null +++ b/bot/BOT_POLICY.md @@ -0,0 +1,34 @@ + + +# Bot Policy — xPyD-plan + +## Identity + +- **Project:** xPyD-plan +- **Repo:** `xPyD-hub/xPyD-plan` +- **Architecture:** Offline planning tool — analyze vLLM/SGLang/TRT-LLM benchmark data to recommend optimal P:D instance ratios. 
+ +## Accounts + +| Role | GitHub Account | +|------|---------------| +| Implementer | `hlin99` | +| Reviewer 1 | `hlin99-Review-Bot` | +| Reviewer 2 | `hlin99-Review-BotX` | + +## Rules + +1. **Never push directly to `main`.** All changes go through PRs. +2. **Rebase onto latest `main`** before every push. +3. **Run pre-commit** before every commit. +4. **Never self-merge.** Wait for both reviewer bots to approve. +5. **All code, docs, issues, PRs in English.** +6. **Conventional Commits** format for all commit messages. +7. **CI must be green** before merge. + +## References + +- `bot/DESIGN_PRINCIPLES.md` — what to build and why +- `bot/DEV_LOOP.md` — how to iterate +- `bot/REVIEW_POLICY.md` — review process +- `ROADMAP.md` — milestone tracker diff --git a/bot/DESIGN_PRINCIPLES.md b/bot/DESIGN_PRINCIPLES.md new file mode 100644 index 0000000..9076b86 --- /dev/null +++ b/bot/DESIGN_PRINCIPLES.md @@ -0,0 +1,41 @@ + + +# xPyD-plan Design Principles + +## Core Positioning + +Offline planning tool: analyze benchmark data from real hardware to recommend the optimal P:D configuration. + +## Methodology + +1. Users run benchmarks on real hardware with specified P:D configurations +2. Collect measured data points (TTFT, TPOT/ITL, throughput, etc.) +3. Analyze data, fit/interpolate, build performance model +4. Extrapolate predicted performance for all possible P:D ratios +5. 
Find the P:D combination that meets SLA with minimum waste + +## Principles + +### Data-Driven +- Everything based on real hardware benchmark data +- No pure theoretical simulation, no guessing +- Supports vLLM, SGLang, TensorRT-LLM formats + +### Independent Thinking +- Reference industry solutions for ideas, never copy blindly +- Every technical decision must have its own analysis + +### User-Friendly +- Minimize required benchmark runs +- Clearly specify which configurations to benchmark +- Include confidence/reliability assessment +- Be honest about uncertainty + +### Rigorous Definitions +- "Optimal" = minimum idle waste while meeting SLA +- SLA checks use measured percentiles (P95/P99), not averages +- Granularity: instances, not individual GPU cards + +## References +- ai-dynamo planner: reference its ideas on profiling data interpolation +- dynamo is online autoscaler; we are offline planner — different scenarios diff --git a/bot/DEV_LOOP.md b/bot/DEV_LOOP.md new file mode 100644 index 0000000..adad447 --- /dev/null +++ b/bot/DEV_LOOP.md @@ -0,0 +1,28 @@ + + +# Development Loop + +Autonomous infinite loop. Runs until explicitly stopped. + +## Each Iteration + +1. Pull latest `main`, rebase branch +2. Read `ROADMAP.md` — find next incomplete milestone +3. Read `bot/DESIGN_PRINCIPLES.md` — follow the rules +4. Check open issues/PRs — handle unmerged PRs first +5. Create GitHub Issue: problem, solution, acceptance criteria, tests +6. Create branch, implement code + tests +7. Lint: `ruff check xpyd_plan tests` +8. Update `bot/iterations/current.md` +9. Create PR (body contains `Closes #N`) +10. Wait for CI green. Fix failures. +11. Wait for reviewer bots (see `bot/REVIEW_POLICY.md`) +12. 
Handle review result: + - **2 approvals** → auto-merge → update ROADMAP → step 1 + - **request changes** → fix, push → wait for re-review + - **closed** → record failure in `bot/iterations/current.md` → step 1 + +## Deliverables (every iteration) + +- Code changes + tests +- Updated `bot/iterations/current.md` diff --git a/bot/ENTRY.md b/bot/ENTRY.md new file mode 100644 index 0000000..473c2db --- /dev/null +++ b/bot/ENTRY.md @@ -0,0 +1,14 @@ + + +# Bot Entry Point — xPyD-plan + +Read these files **in order** before every iteration: + +1. `bot/BOT_POLICY.md` — project rules and constraints +2. `bot/AUTHOR_POLICY.md` — coding standards +3. `bot/DESIGN_PRINCIPLES.md` — architectural guidelines +4. `bot/DEV_LOOP.md` — iteration steps +5. `bot/REVIEW_POLICY.md` — PR review process +6. `bot/iterations/current.md` — current state + +**Do not skip any file. Do not summarize from memory.** diff --git a/bot/REVIEW_POLICY.md b/bot/REVIEW_POLICY.md new file mode 100644 index 0000000..cfd721f --- /dev/null +++ b/bot/REVIEW_POLICY.md @@ -0,0 +1,38 @@ + + +# Review Policy + +## Roles + +| Role | GitHub Account | +|------|---------------| +| Implementer | `hlin99` | +| Reviewer 1 | `hlin99-Review-Bot` | +| Reviewer 2 | `hlin99-Review-BotX` | + +## Review Criteria + +### 1. Idea Value +- Does the direction align with project goals? +- **If NO → close PR immediately** (one close = PR rejected) + +### 2. 
Code Quality +- Correct code, tests included/passing +- `bot/iterations/current.md` updated +- **If idea good but code has issues → request changes** + +## Decision Rules + +| Scenario | Action | +|----------|--------| +| Both approve | Auto-merge | +| One approves, one requests changes | Fix, re-review | +| Either closes | PR closed, iteration failed | +| Timeout (15 min no review) | PR closed | +| Total timeout (1 hour) | PR closed | + +## Auto-Merge Requirements + +- 2 approvals from designated reviewers +- CI green +- No unresolved review comments diff --git a/docs/iterations/current.md b/bot/iterations/current.md similarity index 100% rename from docs/iterations/current.md rename to bot/iterations/current.md diff --git a/docs/DESIGN_PRINCIPLES.md b/docs/DESIGN_PRINCIPLES.md deleted file mode 100644 index 17627c0..0000000 --- a/docs/DESIGN_PRINCIPLES.md +++ /dev/null @@ -1,43 +0,0 @@ -# xPyD-plan Design Principles - -## Core Positioning -Offline planning tool: analyze vLLM benchmark data from real hardware to recommend the optimal P:D configuration. - -## Methodology -1. Users run vLLM benchmarks on real hardware with several P:D configurations we specify -2. Collect multiple measured data points (TTFT, TPOT/ITL, throughput, etc.) -3. Analyze data, fit/interpolate, build performance model -4. Extrapolate predicted performance for all possible P:D ratios -5. 
Find the P:D combination that meets SLA with minimum waste (best cost-efficiency) - -## Principles - -### Data-Driven -- Everything is based on real hardware benchmark data -- No pure theoretical simulation, no guessing hardware performance -- Data format: vLLM benchmark standard output - -### Independent Thinking -- Reference industry solutions (e.g., dynamo planner) for ideas, but never copy -- Every technical decision must have its own analysis and reasoning -- Algorithm choices based on actual data characteristics, not "because others do it" - -### User-Friendly -- Minimize the number of benchmark runs users need to perform -- Clearly tell users which configurations to benchmark -- Recommendations must include confidence/reliability assessment -- Be honest about uncertainty when sample points are few - -### Rigorous Definitions -- "Optimal" = minimum idle waste on P and D instances while meeting SLA -- "Waste" must have a strict mathematical definition, no ambiguity -- SLA checks use measured percentiles (P95/P99), not averages - -### Granularity -- Measured in instances, not individual GPU cards -- QPS is not optimized — it is a given from the benchmark - -## References -- ai-dynamo/dynamo planner module: reference its ideas on profiling data interpolation and SLA checking -- However, dynamo is an online autoscaler; we are an offline planning tool — different scenarios -- Do not copy dynamo's code or data formats diff --git a/docs/DEV_LOOP.md b/docs/DEV_LOOP.md deleted file mode 100644 index a889be4..0000000 --- a/docs/DEV_LOOP.md +++ /dev/null @@ -1,85 +0,0 @@ -# Development Loop - -Autonomous infinite loop. Runs until explicitly stopped. - -## Setup (every iteration) -``` -git config user.email "tony.lin@intel.com" -git config user.name "hlin99" -``` - -## Each Iteration - -1. Pull latest code -2. Read `ROADMAP.md` — find the next incomplete milestone -3. Read `DESIGN_PRINCIPLES.md` — follow the rules -4. 
Check open issues/PRs — handle unmerged PRs first (fix CI failures, address review comments) -5. If no milestone left, create new ones (see Phase 2 below) -6. Create GitHub Issue: problem, solution, acceptance criteria, tests -7. Create branch, implement code + tests -8. Pass lint: `ruff check src tests && isort --check src tests` -9. Update `docs/iterations/current.md` with what you did this iteration -10. Create PR (body contains `Closes #N`) -11. Wait for CI green. Fix failures. Never merge red CI. -12. **Wait for reviewer bots** — do NOT self-merge. Two reviewer bots (`hlin99-Review-Bot` and `hlin99-Review-BotX`) will be auto-assigned. -13. Handle review result: - - **2 approvals** → auto-merge → update ROADMAP.md → go to step 1 - - **request changes** → fix code, push to same PR → wait for re-review (max 10 min to fix) - - **closed by reviewer** → iteration failed → push update to `docs/iterations/current.md` on main recording the failure (what was attempted, why rejected, reviewer comments) → go to step 1 with a different task -14. 
Go to step 1 - -## Review Rules (see REVIEW_POLICY.md) - -- 2 reviewer bots are auto-assigned on PR creation -- Either reviewer can close the PR (idea rejected) — one close = PR dead -- Both must approve for merge -- Reviewer timeout: 15 minutes → PR auto-closed -- Total round timeout: 1 hour → PR auto-closed -- Implementer (hlin99) must NEVER approve or merge their own PR - -## Timing - -| Parameter | Value | -|-----------|-------| -| Iteration interval | 10 minutes | -| PR wait for review | max 15 minutes | -| Fix after request changes | max 10 minutes | -| Total round timeout | 1 hour | - -## Deliverables (every iteration) - -Every PR MUST include: -- Code changes (if any) -- Tests for new code -- Updated `docs/iterations/current.md` describing what was done - -## Rules -- Committer must be `hlin99 ` — always set git config before any commit -- All code, docs, issues, PRs in English -- Commit messages: conventional commits format -- Never self-merge — wait for reviewer bots - -## Phase 1: Roadmap-Driven -Follow ROADMAP.md milestones in order. - -## Phase 2: Continuous Evolution -When all milestones are done: -1. Review the project — find limitations, improvements, new scenarios -2. Create new milestones in ROADMAP.md -3. Return to Phase 1 - -## Iteration Tracking - -`docs/iterations/current.md` must maintain a running log at the bottom: - -```markdown -## Iteration History - -| # | Date | Task | Result | Reviewer Comments | -|---|------|------|--------|-------------------| -| 1 | 2026-04-06 | Added X feature | ✅ merged | Both approved | -| 2 | 2026-04-06 | Refactored Y | ❌ closed | BotX: idea not valuable | -| 3 | 2026-04-06 | Fixed Z bug | ✅ merged | Bot requested changes, fixed | -``` - -This table is the source of truth for iteration success/failure rate.
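Reviewer note: the removed `docs/DEV_LOOP.md` describes the iteration-history table as the source of truth for the success/failure rate. As a rough illustration of how such a rate could be derived from that table, here is a minimal Python sketch. The function name `success_rate` and the parsing approach are assumptions made for this example; nothing in the patch indicates that the repository ships such a helper.

```python
def success_rate(markdown: str) -> float:
    """Return the fraction of logged iterations whose Result cell says 'merged'.

    Hypothetical helper: assumes the five-column table layout shown in the
    removed DEV_LOOP.md (#, Date, Task, Result, Reviewer Comments).
    """
    merged = 0
    total = 0
    for line in markdown.splitlines():
        # Drop the outer pipes, then split the row into its cells.
        cells = [cell.strip() for cell in line.strip().strip("|").split("|")]
        # Data rows have five cells and a numeric iteration index in the first;
        # this skips the header row, the separator row, and any other text.
        if len(cells) != 5 or not cells[0].isdigit():
            continue
        total += 1
        if "merged" in cells[3]:
            merged += 1
    if total == 0:
        raise ValueError("no iteration rows found")
    return merged / total


SAMPLE = """\
| # | Date | Task | Result | Reviewer Comments |
|---|------|------|--------|-------------------|
| 1 | 2026-04-06 | Added X feature | ✅ merged | Both approved |
| 2 | 2026-04-06 | Refactored Y | ❌ closed | BotX: idea not valuable |
| 3 | 2026-04-06 | Fixed Z bug | ✅ merged | Bot requested changes, fixed |
"""

print(f"{success_rate(SAMPLE):.2f}")
```

The sketch deliberately ignores rows that do not have exactly five cells, so a reviewer comment containing a literal `|` would be skipped rather than miscounted; a stricter parser would be needed if that case matters.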