Goals: 1) no arbitrary flags or environment variables; 2) `vllm serve` just works: the flags should be pushed into the core engine codebase.
Action items
`--compilation_config.pass_config.fuse_allreduce_rms true`: Can we make this the default for MoE models on Hopper hardware? -> Yes, targeted for v0.17; the latest vllm no longer needs this flag. (Hanjie Qiu, Wei Zhao)
`--mm-encoder-tp-mode data`: Can we push this into the vllm codebase and auto-detect it?
`--tool-call-parser` / `--reasoning-parser` / `--enable-auto-tool-choice`: Can we infer the parser names from the model? -> Not for now (Ben Browning), but this can be added in the future.
`--trust-remote-code`: Can we eliminate this?
Split into Min Latency and Max Throughput configs. -> Found. (Ankur Singh)
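To make the goal concrete, a serve command combining the flags discussed above might look like the sketch below (the model name and the `<parser-name>` placeholder are illustrative, not a tested recipe):

```shell
# Today: recipe-specific flags that the action items aim to fold into the engine.
vllm serve moonshotai/Kimi-K2.5 \
  --compilation_config.pass_config.fuse_allreduce_rms true \
  --mm-encoder-tp-mode data \
  --tool-call-parser <parser-name> \
  --enable-auto-tool-choice \
  --trust-remote-code

# Goal state: the engine picks sensible defaults from the model and hardware.
vllm serve moonshotai/Kimi-K2.5
```

The second invocation is the end state the action items describe: each flag either becomes a hardware/model-conditional default or is auto-detected from the model's metadata.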
Recipe file: moonshotai/Kimi-K2.5.md