Fix SM70 docker build, CUDA 13.0 compat conflict, and README docker args #43
titidatiti wants to merge 39 commits into
Signed-off-by: Pan-Shuhan-YMZX <263558224+Pan-Shuhan-YMZX@users.noreply.github.com> (cherry picked from commit f8e4c58adad5561ab4cd006fdab6c9b1903eec1c)
Signed-off-by: Pan-Shuhan-YMZX <263558224+Pan-Shuhan-YMZX@users.noreply.github.com> (cherry picked from commit 2fc562b8cfae2bb255baf097e0c71b498860c327)
Signed-off-by: yangzhuxinyzx <153831768+yangzhuxinyzx@users.noreply.github.com>
Updated the WeChat group QR code image in the README.
Fixed an incorrect name.
Add FLASH_ATTN_V100 runtime path, Qwen3.5/Qwen3.6 launch profiles, SM70 AWQ updates, vendored build dependencies, and public regression charts.
Signed-off-by: yangzhuxinyzx <153831768+yangzhuxinyzx@users.noreply.github.com>
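What this PR does / why we need it
When deploying the image on a Tesla V100 (SM70) with a newer host driver (e.g. CUDA 13.0), I ran into build errors, a runtime Error 803, and vLLM arguments that did not take effect. None of these problems occur with a local deployment; they appear only when deploying with Docker. This PR aims to fix all three:
[Bugfix] Fix the SM70 AWQ kernel build failure:
The csrc-build stage of docker/Dockerfile did not copy the lmdeploy sources, so CMake decided to skip building the SM70 kernels (issue: https://github.com/1CatAI/1Cat-vLLM/issues/42#issuecomment-4467796333). This commit adds COPY lmdeploy lmdeploy/.
[Core] Remove the hard-coded cuda-compat library configuration:
Commented out the two commands in docker/Dockerfile that force-write /etc/ld.so.conf.d/cuda-compat.conf. When the host GPU driver is newer (e.g. the 580 series), forcibly loading the old compat library overrides the dynamic selection made by nvidia-container-toolkit, and the container fails at startup with Error 803 (system has unsupported display driver / cuda driver combination). With these commands disabled, the toolkit itself handles the compatibility mounts for both old and new drivers, which is more robust.
[Doc] Correct how docker launch arguments are passed in the README:
Fixed an error in the Docker launch configuration in README.md. The original documentation passed core parameters such as model and tensor_parallel_size as environment variables (-e VLLM_MODEL=...), but the vllm serve entrypoint natively recognizes only CLI arguments. As a result, the container silently ignored the user's settings and loaded the source default, Qwen3-0.6B, instead. The environment variables have been replaced with the correct CLI argument syntax.
Test Record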