Goals: 1) no arbitrary flags or environment variables; 2) `vllm serve` just works: the flags should be pushed into the core engine codebase.
Action items
`--compilation_config.pass_config.fuse_allreduce_rms true`: Can we make this the default for MoE models on Hopper hardware? -> Yes, targeted for v0.17; the latest vllm no longer needs this flag. (Hanjie Qiu, Wei Zhao)
`--mm-encoder-tp-mode data`: Can we push this into the vllm codebase and auto-detect it?
`--tool-call-parser` / `--reasoning-parser` / `--enable-auto-tool-choice`: Can we infer the parser names from the model? -> Not for now (Ben Browning), but this can be added in the future.
`--trust-remote-code`: Can we eliminate this?
Split into Min Latency and Max Throughput configs. -> Found. (Ankur Singh)
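To make the goal concrete, a serve command combining the flags discussed above might look like the sketch below (the model name and the `<parser-name>` placeholder are illustrative, not a tested recipe):

```shell
# Today: recipe-specific flags that the action items aim to fold into the engine.
vllm serve moonshotai/Kimi-K2.5 \
  --compilation_config.pass_config.fuse_allreduce_rms true \
  --mm-encoder-tp-mode data \
  --tool-call-parser <parser-name> \
  --enable-auto-tool-choice \
  --trust-remote-code

# Goal state: the engine picks sensible defaults from the model and hardware.
vllm serve moonshotai/Kimi-K2.5
```

The second invocation is the end state the action items describe: each flag either becomes a hardware/model-conditional default or is auto-detected from the model's metadata.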
Recipe file: moonshotai/Kimi-K2.5.md