Feature: Support downloading model weights on-the-fly from HuggingFace (#166) #167
rohan-uiuc wants to merge 18 commits into VectorInstitute:main
Conversation
Codecov Report ❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   90.83%   90.86%   +0.03%
==========================================
  Files          14       14
  Lines        1342     1369      +27
==========================================
+ Hits         1219     1244      +25
- Misses        123      125       +2
```
XkunW left a comment:
Hi @rohan-uiuc, thanks for opening this, I left a few comments. Another thing worth considering is adding a check in the API to see if a model needs to be downloaded and, if so, only allowing the download when the HF cache directory env var is set, so that users don't accidentally download a model to their home directory and use up all their quota.
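For illustration, such a guard might look like the sketch below. The helper name is hypothetical, and `HF_HOME` / `HF_HUB_CACHE` are the standard HuggingFace cache variables it could consult; this is a sketch of the suggestion, not the repo's API:

```python
import os
from pathlib import Path


def ensure_hf_cache_configured(weights_dir: Path) -> None:
    """Refuse to download weights unless an HF cache location is set.

    Hypothetical helper: if the local weights directory is missing, a
    download is needed, and we only proceed when the user has pointed the
    HuggingFace cache somewhere explicit (instead of the default under
    ~/.cache, which could silently eat home-directory quota).
    """
    if weights_dir.exists():
        return  # local weights present, no download needed
    if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
        raise RuntimeError(
            "Model weights not found locally and no HuggingFace cache "
            "directory is configured; set HF_HOME or HF_HUB_CACHE to a "
            "path with sufficient quota before launching."
        )
```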
| """ | ||
| launcher_script = ["\n"] | ||
|
|
||
| vllm_args_copy = self.params["vllm_args"].copy() |
There was a problem hiding this comment.
Not sure if this is necessary, as the model name should be parsed from the launch command, not passed as part of --vllm-args.
rohan-uiuc replied:
Currently, `model_name` is the short name used for config lookup, log directories, and job naming (e.g., `llama-3`). The `--model` flag in `vllm_args` would allow users to specify the full HF path when downloading from HuggingFace.
I'm open to alternative approaches if you have a preference, such as:
- Dedicated CLI option (e.g., `--hf-model`): keeps `model_name` as the short identifier and adds an explicit option for the full HF path
- Reuse the existing `model_name`: allow full HF paths directly, but adjust config lookups, log directory structure, etc. to handle paths containing `/`
XkunW replied:
Ah, thanks for the clarification. I think having a dedicated CLI option keeps things clean and means minimal changes. However, the code base has changed significantly since this PR was opened, and there are quite a few conflicts to resolve before it can be merged; if you give me access to your branch I can help resolve them.
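For reference, the dedicated option could look roughly like this sketch, assuming a click-based launch command; only the `--hf-model` name comes from this thread, and the surrounding command is illustrative:

```python
import click


@click.command("launch")
@click.argument("model_name")
@click.option(
    "--hf-model",
    default=None,
    help="Full HuggingFace model id (e.g. Qwen/Qwen2.5-7B-Instruct) to "
    "download at runtime when no local weights exist.",
)
def launch(model_name: str, hf_model: str | None) -> None:
    # model_name stays the short identifier used for config lookup, log
    # directories, and job naming; hf_model carries the full HF path.
    click.echo(f"Launching {model_name} (hf_model={hf_model})")
```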
rohan-uiuc replied:
I have updated the PR to address your suggestion. Since my fork now lives under an org, I didn't have a simple way to grant access to the branch, so I have also invited you to the repository; please feel free to make changes.
Adds an `hf_model` field to `ModelConfig` and `LaunchOptions` to specify a HuggingFace model id for vLLM to download at runtime.
Updates `SlurmScriptGenerator` and `BatchSlurmScriptGenerator` to use `hf_model` for `vllm serve` when local weights don't exist. Priority: local weights > `hf_model` > model name.
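A minimal sketch of that priority order follows; `resolve_served_model` is a hypothetical name, not the generator's actual method:

```python
from pathlib import Path


def resolve_served_model(
    weights_dir: Path, hf_model: str | None, model_name: str
) -> str:
    """Pick the value passed to `vllm serve`: local weights > hf_model > model name."""
    if weights_dir.exists():
        return str(weights_dir)  # existing local weights always win
    if hf_model:
        return hf_model  # full HF id, downloaded by vLLM at runtime
    return model_name  # fall back to the short name
```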
PR Type
Feature
Short Description
Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This allows users to launch models without manually downloading and mounting weight directories.
The code now checks whether the model weights directory exists before attempting to bind mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from `--model` in `vllm_args` (or falls back to `model_name`). Users must pass the full HuggingFace model identifier (e.g., `Qwen/Qwen2.5-7B-Instruct`) via `--model` in `vllm_args` for automatic downloads to work.
Fixes #166
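The bind-mount decision could be sketched as follows; the helper name and the read-only bind option are illustrative, and only the exists-check-then-skip behavior comes from the description above:

```python
from pathlib import Path


def singularity_bind_args(weights_dir: Path) -> list[str]:
    """Bind local weights into the container only if they exist.

    Hypothetical helper: when the directory is missing, return no bind
    flags so vLLM downloads the model from HuggingFace at startup.
    """
    if weights_dir.exists():
        return ["--bind", f"{weights_dir}:{weights_dir}:ro"]
    return []
```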
Tests Added
- `test_generate_server_setup_singularity_no_weights`: verifies server setup doesn't include the model weights path when the directory doesn't exist
- `test_generate_launch_cmd_singularity_no_local_weights`: verifies the launch command uses the HF model identifier when local weights are missing
- `test_generate_model_launch_script_singularity_no_weights`: verifies batch mode correctly handles missing model weights when `--model` is specified in `vllm_args`
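As a pytest-style illustration of the fallback the second test covers, built on the hypothetical `resolve_served_model` sketch above (the repo's real tests exercise the Slurm script generators directly):

```python
from pathlib import Path

# Assumes resolve_served_model from the earlier sketch is importable;
# the module path here is hypothetical.
from mymodule import resolve_served_model


def test_launch_cmd_uses_hf_model_when_weights_missing(tmp_path: Path) -> None:
    weights_dir = tmp_path / "model-weights" / "llama-3"  # never created
    served = resolve_served_model(
        weights_dir, hf_model="Qwen/Qwen2.5-7B-Instruct", model_name="llama-3"
    )
    assert served == "Qwen/Qwen2.5-7B-Instruct"
```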