Feature: Support downloading model weights on-the-fly from HuggingFace (#166) #167
rohan-uiuc wants to merge 18 commits into VectorInstitute:main
Conversation
Codecov Report ❌ Patch coverage is

```
@@            Coverage Diff             @@
##             main     #167      +/-   ##
==========================================
+ Coverage   90.83%   90.86%   +0.03%
==========================================
  Files          14       14
  Lines        1342     1369      +27
==========================================
+ Hits         1219     1244      +25
- Misses        123      125       +2
```
XkunW left a comment:
Hi @rohan-uiuc, thanks for opening this, I left a few comments. Another thing worth considering is adding a check in the API to see if a model needs to be downloaded and, if so, only allowing the download when the HF cache directory env var is set, so that users don't accidentally download a model to their home directory and use up all their quota.
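For illustration, such a guard might look like the sketch below. The helper name is hypothetical, and `HF_HOME` / `HF_HUB_CACHE` are the standard HuggingFace cache variables it could consult; this is a sketch of the suggestion, not the repo's API:

```python
import os
from pathlib import Path


def ensure_hf_cache_configured(weights_dir: Path) -> None:
    """Refuse to download weights unless an HF cache location is set.

    Hypothetical helper: if the local weights directory is missing, a
    download is needed, and we only proceed when the user has pointed the
    HuggingFace cache somewhere explicit (instead of the default under
    ~/.cache, which could silently eat home-directory quota).
    """
    if weights_dir.exists():
        return  # local weights present, no download needed
    if not (os.environ.get("HF_HOME") or os.environ.get("HF_HUB_CACHE")):
        raise RuntimeError(
            "Model weights not found locally and no HuggingFace cache "
            "directory is configured; set HF_HOME or HF_HUB_CACHE to a "
            "path with sufficient quota before launching."
        )
```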
| """ | ||
| launcher_script = ["\n"] | ||
|
|
||
| vllm_args_copy = self.params["vllm_args"].copy() |
There was a problem hiding this comment.
Not sure if this is necessary, as the model name should be parsed from the launch command, not passed as part of --vllm-args.
rohan-uiuc replied:
Currently, `model_name` is the short name used for config lookup, log directories, and job naming (e.g., `llama-3`). The `--model` flag in `vllm_args` would allow users to specify the full HF path when downloading from HuggingFace.
I'm open to alternative approaches if you have a preference, such as:
- Dedicated CLI option (e.g., `--hf-model`): keeps `model_name` as the short identifier and adds an explicit option for the full HF path
- Reuse the existing `model_name`: allow full HF paths directly, but adjust config lookups, log directory structure, etc. to handle paths containing `/`
XkunW replied:
Ah, thanks for the clarification. I think having a dedicated CLI option keeps things clean and means minimal changes. However, the code base has changed significantly since this PR was opened, and there are quite a few conflicts to resolve before it can be merged; if you give me access to your branch I can help resolve them.
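For reference, the dedicated option could look roughly like this sketch, assuming a click-based launch command; only the `--hf-model` name comes from this thread, and the surrounding command is illustrative:

```python
import click


@click.command("launch")
@click.argument("model_name")
@click.option(
    "--hf-model",
    default=None,
    help="Full HuggingFace model id (e.g. Qwen/Qwen2.5-7B-Instruct) to "
    "download at runtime when no local weights exist.",
)
def launch(model_name: str, hf_model: str | None) -> None:
    # model_name stays the short identifier used for config lookup, log
    # directories, and job naming; hf_model carries the full HF path.
    click.echo(f"Launching {model_name} (hf_model={hf_model})")
```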
rohan-uiuc replied:
I have updated the PR to address your suggestion. Since my fork now lives under an org, I didn't have a simple way to grant access to the branch, so I have also invited you to the repository; please feel free to make changes.
Adds an `hf_model` field to `ModelConfig` and `LaunchOptions` to specify a HuggingFace model id for vLLM to download at runtime.
Updates `SlurmScriptGenerator` and `BatchSlurmScriptGenerator` to use `hf_model` for `vllm serve` when local weights don't exist. Priority: local weights > `hf_model` > model name.
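A minimal sketch of that priority order follows; `resolve_served_model` is a hypothetical name, not the generator's actual method:

```python
from pathlib import Path


def resolve_served_model(
    weights_dir: Path, hf_model: str | None, model_name: str
) -> str:
    """Pick the value passed to `vllm serve`: local weights > hf_model > model name."""
    if weights_dir.exists():
        return str(weights_dir)  # existing local weights always win
    if hf_model:
        return hf_model  # full HF id, downloaded by vLLM at runtime
    return model_name  # fall back to the short name
```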
PR Type
Feature
Short Description
Implements support for on-the-fly model weight downloads from HuggingFace when the local model weights directory doesn't exist. This allows users to launch models without manually downloading and mounting weight directories.
The code now checks whether the model weights directory exists before attempting to bind mount it. If the directory doesn't exist, it skips the bind mount and uses the model identifier from `--model` in `vllm_args` (or falls back to `model_name`). Users must pass the full HuggingFace model identifier (e.g., `Qwen/Qwen2.5-7B-Instruct`) via `--model` in `vllm_args` for automatic downloads to work.
Fixes #166
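The bind-mount decision could be sketched as follows; the helper name and the read-only bind option are illustrative, and only the exists-check-then-skip behavior comes from the description above:

```python
from pathlib import Path


def singularity_bind_args(weights_dir: Path) -> list[str]:
    """Bind local weights into the container only if they exist.

    Hypothetical helper: when the directory is missing, return no bind
    flags so vLLM downloads the model from HuggingFace at startup.
    """
    if weights_dir.exists():
        return ["--bind", f"{weights_dir}:{weights_dir}:ro"]
    return []
```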
Tests Added
- `test_generate_server_setup_singularity_no_weights`: verifies server setup doesn't include the model weights path when the directory doesn't exist
- `test_generate_launch_cmd_singularity_no_local_weights`: verifies the launch command uses the HF model identifier when local weights are missing
- `test_generate_model_launch_script_singularity_no_weights`: verifies batch mode correctly handles missing model weights when `--model` is specified in `vllm_args`
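As a pytest-style illustration of the fallback the second test covers, built on the hypothetical `resolve_served_model` sketch above (the repo's real tests exercise the Slurm script generators directly):

```python
from pathlib import Path

# Assumes resolve_served_model from the earlier sketch is importable;
# the module path here is hypothetical.
from mymodule import resolve_served_model


def test_launch_cmd_uses_hf_model_when_weights_missing(tmp_path: Path) -> None:
    weights_dir = tmp_path / "model-weights" / "llama-3"  # never created
    served = resolve_served_model(
        weights_dir, hf_model="Qwen/Qwen2.5-7B-Instruct", model_name="llama-3"
    )
    assert served == "Qwen/Qwen2.5-7B-Instruct"
```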