
Improve out-of-the-box recipe for Kimi-K2.5 #324

@faradawn


Recipe file: moonshotai/Kimi-K2.5.md

Goal: (1) no arbitrary flags or environment variables; (2) vllm serve just works, with the flags pushed into the core engine codebase.

Action items

  • --compilation_config.pass_config.fuse_allreduce_rms true: Can we make this the default for MoE models on Hopper hardware? -> Yes. That is for v0.17; the latest vLLM does not need these flags. (Hanjie Qiu, Wei Zhao)
  • --mm-encoder-tp-mode data: Can we push this into the vLLM codebase so it is auto-detected?
  • --tool-call-parser / --reasoning-parser / --enable-auto-tool-choice: Can we infer the parser names from the model? -> Not for now (Ben Browning), but this can be added in the future.
  • --trust-remote-code: Can we eliminate this?
  • Split into Min Latency and Max Throughput configs. -> Found. (Ankur Singh)
  • Add validated results (throughput, TTFT, TPOT, ITL, etc.). -> Added. (Ankur Singh)
  • Fix the Eagle3 command so its flags consistently match the base serve flags.
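The first action item asks for fuse_allreduce_rms to be on by default for MoE models on Hopper instead of being passed by hand. A minimal sketch of such a default rule, in Python, might look like the following. It is hypothetical and is not vLLM's actual code: ModelInfo and resolve_fuse_allreduce_rms are made-up names, and the tensor_parallel_size > 1 gate is an added assumption (with a single GPU there is no all-reduce to fuse).

```python
from dataclasses import dataclass
from typing import Optional, Tuple

# Hopper GPUs (e.g. H100, H200) report CUDA compute capability 9.0.
HOPPER_CAPABILITY: Tuple[int, int] = (9, 0)


@dataclass
class ModelInfo:
    """Hypothetical summary of the model fields the rule needs."""
    name: str
    is_moe: bool


def resolve_fuse_allreduce_rms(
    model: ModelInfo,
    device_capability: Tuple[int, int],
    tensor_parallel_size: int,
    user_value: Optional[bool] = None,
) -> bool:
    """Choose a value for pass_config.fuse_allreduce_rms (hypothetical rule).

    An explicit user setting always wins. Otherwise the fusion is enabled
    only for MoE models on Hopper, and only when tensor parallelism is
    actually in use (assumption: TP > 1 is required for an all-reduce).
    """
    if user_value is not None:
        return user_value
    return (
        model.is_moe
        and device_capability == HOPPER_CAPABILITY
        and tensor_parallel_size > 1
    )


if __name__ == "__main__":
    kimi = ModelInfo(name="moonshotai/Kimi-K2.5", is_moe=True)
    # MoE model, Hopper GPU, TP=8: the fusion is enabled without any flag.
    print(resolve_fuse_allreduce_rms(kimi, (9, 0), tensor_parallel_size=8))
```

Keeping the user override as the first check means a default like this never silently contradicts a flag someone still passes explicitly.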

Labels: documentation (Improvements or additions to documentation)
