Add model folder pre-validation for inference sessions in Manager scheduler

Currently, when an INFERENCE session (vLLM, TGI, NIM, SGLang, etc.) is created without a model virtual folder (VFolderUsageMode.MODEL), the error is caught only on the Agent side (ModelFolderNotSpecifiedError at agent.py:3303), after the RPC has already been dispatched.

This causes unnecessary RPC traffic and repeated failures. In Dogbowl, this resulted in ~900 failed RPC calls per hour sustained over 2 days.

The Manager's SessionValidator (sokovan/scheduling_controller/validators/) has rules for container limits, service ports, resource limits, and mount names, but no rule to validate that inference sessions include at least one model-type vfolder.

Implementation:
- Add a new SessionValidatorRule (e.g. InferenceModelFolderRule) in validators/inference.py
- When session_type == INFERENCE and runtime_variant != CUSTOM, require at least one mount with usage_mode MODEL
- SessionCreationSpec already has session_type (line 86) and creation_spec with mounts/runtime_variant (lines 172-176)
- Register the new rule in scheduling_controller.py alongside the existing rules (lines 117-122)
- Export from validators/__init__.py
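
A minimal sketch of what the rule could look like, under stated assumptions. The enums, `Mount`, `SessionSpec`, and `ModelFolderRequiredError` below are hypothetical stand-ins so the example runs on its own; the real rule would use the existing SessionCreationSpec, VFolderUsageMode, and SessionValidatorRule types, whose exact field names and interface are not shown in this issue.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional


# Hypothetical stand-ins for the Manager's real enums.
class SessionTypes(Enum):
    INTERACTIVE = "interactive"
    INFERENCE = "inference"


class RuntimeVariant(Enum):
    CUSTOM = "custom"
    VLLM = "vllm"


class VFolderUsageMode(Enum):
    GENERAL = "general"
    MODEL = "model"


@dataclass
class Mount:
    """Hypothetical mount record; the real mount structure may differ."""
    name: str
    usage_mode: VFolderUsageMode


@dataclass
class SessionSpec:
    """Hypothetical, flattened stand-in for SessionCreationSpec."""
    session_type: SessionTypes
    runtime_variant: Optional[RuntimeVariant] = None
    mounts: List[Mount] = field(default_factory=list)


class ModelFolderRequiredError(ValueError):
    """Hypothetical error raised in the Manager, before any RPC is sent."""


class InferenceModelFolderRule:
    """Require a MODEL-mode vfolder for non-CUSTOM inference sessions."""

    def name(self) -> str:
        return "inference_model_folder"

    def validate(self, spec: SessionSpec) -> None:
        # Only inference sessions are subject to this rule.
        if spec.session_type is not SessionTypes.INFERENCE:
            return
        # CUSTOM runtimes are exempt, as stated in the issue.
        if spec.runtime_variant is RuntimeVariant.CUSTOM:
            return
        # Any other variant (including an unset one) needs a model mount.
        if not any(m.usage_mode is VFolderUsageMode.MODEL for m in spec.mounts):
            raise ModelFolderRequiredError(
                "Inference sessions require at least one model-type vfolder"
            )


# Usage: a vLLM session with only a general-purpose mount is rejected early.
rule = InferenceModelFolderRule()
bad_spec = SessionSpec(
    session_type=SessionTypes.INFERENCE,
    runtime_variant=RuntimeVariant.VLLM,
    mounts=[Mount("datasets", VFolderUsageMode.GENERAL)],
)
try:
    rule.validate(bad_spec)
except ModelFolderRequiredError as exc:
    print(f"rejected in Manager: {exc}")
```

Failing in the validator keeps the check next to the other creation-time rules, so the request is rejected before the Agent is ever contacted.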

Files:
- NEW: sokovan/scheduling_controller/validators/inference.py
- MOD: sokovan/scheduling_controller/validators/__init__.py
- MOD: sokovan/scheduling_controller/scheduling_controller.py


JIRA Issue: BA-4816


Add model folder pre-validation for inference sessions in Manager scheduler #9556
