Local Model Deployment

SmartResume can run without external APIs in two ways: against a local OpenAI-compatible vLLM server, or by loading the model directly from disk.

Option 1: OpenAI-compatible vLLM server

Install vLLM and download the resume model:

pip install vllm
python scripts/download_models.py

Launch the server (port 8001 is used by default in the config):

python -m vllm.entrypoints.openai.api_server \
  --model ./models/Qwen3-0.6B \
  --port 8001 \
  --host 0.0.0.0 \
  --tensor-parallel-size 1
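
Before pointing SmartResume at the server, it can help to confirm which model name vLLM is actually serving. The sketch below queries the OpenAI-compatible `/v1/models` endpoint using only the standard library. The helper names `served_model_ids` and `fetch_served_model_ids` are illustrative and are not part of SmartResume or vLLM; the base URL assumes the port shown above.

```python
import json
from urllib.request import urlopen


def served_model_ids(payload: dict) -> list[str]:
    """Extract model ids from an OpenAI-style /v1/models response body."""
    return [entry["id"] for entry in payload.get("data", [])]


def fetch_served_model_ids(
    base_url: str = "http://localhost:8001/v1", timeout: float = 5.0
) -> list[str]:
    """Ask the running server which model names it accepts."""
    with urlopen(f"{base_url}/models", timeout=timeout) as response:
        return served_model_ids(json.load(response))


# With the server running, call:
#     fetch_served_model_ids()
# and compare the returned ids with the `name` field of the channel config.
```

If the reported id differs from the channel's `name` (for example `./models/Qwen3-0.6B` versus `models/Qwen3-0.6B`), requests may be rejected. Aligning the two strings, or starting vLLM with `--served-model-name`, is the usual remedy.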

Update configs/config.yaml so that the extraction channels point to the local endpoint:

channels:
  local_qwen:
    name: "models/Qwen3-0.6B"
    api_url: "http://localhost:8001/v1"
    api_key: "local"

extract_channels:
  basic_info: "local_qwen"
  work_experience: "local_qwen"
  education: "local_qwen"
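
A typo in a channel name is easy to miss in this fragment. As a minimal sketch, assuming the parsed config is a plain dictionary with the same keys as the YAML above, the following check reports any extraction type that references a channel that is not defined. The function `undefined_channels` is hypothetical and not part of SmartResume.

```python
def undefined_channels(config: dict) -> dict[str, str]:
    """Return {extract_type: channel} for entries whose channel is not defined."""
    defined = set(config.get("channels", {}))
    return {
        extract_type: channel
        for extract_type, channel in config.get("extract_channels", {}).items()
        if channel not in defined
    }


# Mirrors the YAML fragment above; in practice the dict would come from
# parsing configs/config.yaml with a YAML library.
config = {
    "channels": {
        "local_qwen": {
            "name": "models/Qwen3-0.6B",
            "api_url": "http://localhost:8001/v1",
            "api_key": "local",
        }
    },
    "extract_channels": {
        "basic_info": "local_qwen",
        "work_experience": "local_qwen",
        "education": "local_qwen",
    },
}

print(undefined_channels(config))  # prints {} because every channel is defined
```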

Run the parser as usual:

python scripts/start.py --file resume.pdf

Option 2: Direct model loading (offline)

If you prefer to load the Transformers model directly, enable direct mode in configs/config.yaml:

use_direct_models: true
direct_model_name: "models/Qwen3-0.6B"

When use_direct_models is true, SmartResume first attempts to load the model from disk. If that attempt does not succeed, it falls back to the configured channels or the remote API.
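
The load-then-fall-back order can be illustrated with a small decision function. This is a sketch under stated assumptions, not SmartResume's actual implementation: `choose_backend` is a hypothetical name, and it treats "the model directory exists" as a stand-in for "the model loads successfully".

```python
from pathlib import Path


def choose_backend(config: dict) -> str:
    """Return "direct" if direct mode is enabled and the model directory
    exists on disk; otherwise return "channel" to signal the fallback."""
    model_dir = config.get("direct_model_name")
    if config.get("use_direct_models") and model_dir and Path(model_dir).is_dir():
        return "direct"
    return "channel"
```

For example, `choose_backend({"use_direct_models": True, "direct_model_name": "models/Qwen3-0.6B"})` yields `"direct"` only when that directory is present relative to the working directory.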

Python API example

from smartresume import ResumeAnalyzer

analyzer = ResumeAnalyzer(init_ocr=True, init_llm=True, config_path="configs/config.yaml")
result = analyzer.pipeline(
    cv_path="resume.pdf",
    resume_id="resume_001",
    extract_types=["basic_info", "work_experience", "education"],
)

No extra arguments are required; the behavior is driven entirely by the YAML configuration.
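
To process several files with one analyzer instance, the call above can be wrapped in a loop. The helper below is a hypothetical convenience, not part of the SmartResume API. It relies only on the `pipeline` keyword arguments shown in the example and derives each `resume_id` from the file name, which is an assumption about how ids might be chosen.

```python
from pathlib import Path


def parse_many(analyzer, paths, extract_types):
    """Run analyzer.pipeline once per file, keyed by the file's stem."""
    results = {}
    for path in paths:
        resume_id = Path(path).stem  # "resume.pdf" -> "resume"
        results[resume_id] = analyzer.pipeline(
            cv_path=str(path),
            resume_id=resume_id,
            extract_types=extract_types,
        )
    return results
```

Passing the `analyzer` object from the example together with a list of PDF paths returns a dictionary of results. Files that share a stem would overwrite each other, so a different id scheme may be preferable for large batches.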
