Add Fara-7B recipes #384
Conversation
Pull request overview
This PR adds a new microsoft-Fara-7B/builtin recipe bundle to export/optimize the Fara-7B vision-language model to ONNX (via Olive) and run it with ONNX Runtime GenAI, including CPU and CUDA pipelines.
Changes:
- Added Olive pipeline JSONs for embedding / vision / text sub-model export + optimization for `cpu_and_mobile/` and `cuda/`.
- Added Python orchestration and runtime scripts (`optimize.py`, `inference.py`, `user_script.py`) plus model code under `codes/`.
- Added supporting metadata/docs (`README.md`, `info.yml`) and local ignore rules (`.gitignore`).
Reviewed changes
Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| microsoft-Fara-7B/builtin/user_script.py | Olive callbacks for loading the custom VL model + IO/dummy input definitions for export. |
| microsoft-Fara-7B/builtin/requirements.txt | Python dependencies to run Olive export/optimization and scripts. |
| microsoft-Fara-7B/builtin/optimize.py | Runs the three Olive configs and patches/writes GenAI runtime config files (see the sketch after this table). |
| microsoft-Fara-7B/builtin/info.yml | Minimal metadata for the builtin recipe directory. |
| microsoft-Fara-7B/builtin/inference.py | Example ONNX Runtime GenAI inference script for text-only and image+text prompts. |
| microsoft-Fara-7B/builtin/cuda/embedding.json | CUDA Olive pipeline for exporting/optimizing embedding sub-model. |
| microsoft-Fara-7B/builtin/cuda/text.json | CUDA Olive pipeline for producing INT4 text decoder via ModelBuilder. |
| microsoft-Fara-7B/builtin/cuda/vision.json | CUDA Olive pipeline for exporting/optimizing vision encoder sub-model. |
| microsoft-Fara-7B/builtin/cpu_and_mobile/embedding.json | CPU/mobile Olive pipeline for exporting/quantizing embedding sub-model. |
| microsoft-Fara-7B/builtin/cpu_and_mobile/text.json | CPU/mobile Olive pipeline for producing INT4 text decoder via ModelBuilder. |
| microsoft-Fara-7B/builtin/cpu_and_mobile/vision.json | CPU/mobile Olive pipeline for exporting/quantizing vision encoder sub-model. |
| microsoft-Fara-7B/builtin/codes/modeling_qwen2_5_vl.py | Custom Qwen2.5-VL-derived PyTorch modeling code to enable ONNX export. |
| microsoft-Fara-7B/builtin/codes/__init__.py | Package marker for codes/. |
| microsoft-Fara-7B/builtin/README.md | End-to-end instructions and usage examples for export + inference. |
| microsoft-Fara-7B/builtin/.gitignore | Ignores generated artifacts and caches for the builtin workflow. |
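For orientation, here is a minimal sketch of the pattern optimize.py is described as following: driving Olive from Python over the three sub-model configs. The loop and file layout are assumptions based on the table above, not the recipe's actual code, and the real script additionally patches the GenAI runtime config files.

```python
# Hypothetical driver loop, assuming the cuda/ layout from this PR.
# The real optimize.py also patches/writes the GenAI runtime configs.
from pathlib import Path

from olive.workflows import run as olive_run  # Olive's Python entry point

for cfg in ("embedding.json", "vision.json", "text.json"):
    olive_run(str(Path("cuda") / cfg))  # swap in cpu_and_mobile/ for the CPU flow
```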
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
Please add a LICENSE file.
| "int4": { | ||
| "type": "OnnxBlockWiseRtnQuantization", | ||
| "block_size": 128, | ||
| "is_symmetric": true, |
If the recipe is using the default value of a parameter, there is no need to add it to the config.
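For instance, if `block_size: 128` and `is_symmetric: true` are already the pass defaults (an assumption worth verifying against the Olive pass documentation), the snippet above could shrink to:

```json
"int4": { "type": "OnnxBlockWiseRtnQuantization" }
```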
```
@@ -0,0 +1,11 @@
{
```
Just FYI: for standard scenarios like this, you may be able to use the olive.capture_onnx_graph(...) API, or the corresponding CLI, directly instead of writing .json configs.
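A rough sketch of that suggestion follows; the import path and argument names are assumptions, not the verified Olive API — check `olive capture-onnx-graph --help` or the Olive docs for the real signature.

```python
# Hypothetical usage of the API named above; import path and argument
# names are assumptions, not the verified Olive API surface.
import olive

olive.capture_onnx_graph(
    model_name_or_path="microsoft/Fara-7B",  # assumed HF model id
    output_path="models/fara-7b-onnx",       # assumed output directory
)
```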
```
@@ -0,0 +1,420 @@
"""Evaluate Fara-7B ONNX model on ScreenSpot-v2 (GUI grounding benchmark).
```
Olive has support for (A) evaluating models using GenAI and (B) using HF datasets for evaluation. Let's use these features in a follow-up PR.
For example, we previously had many recipes that used the ImageNet dataset, and it is now simple to evaluate a model against it.
If needed, please update Olive to support the ScreenSpot-v2 dataset in the follow-up PR. Thanks!
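For reference, here is a rough sketch of what an HF-dataset data config looks like in recent Olive versions. The field names should be verified against the Olive docs, the dataset id is a placeholder, and ScreenSpot-v2 would still need the custom pre/post-processing mentioned above.

```json
"data_configs": [
    {
        "name": "screenspot_v2_eval",
        "type": "HuggingfaceContainer",
        "load_dataset_config": { "data_name": "<screenspot-v2 dataset id>", "split": "test" }
    }
]
```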
This pull request introduces a complete ONNX Runtime GenAI example for the Fara-7B vision-language model, including documentation, configuration files for model export and optimization (for both CPU and CUDA), a Python inference script, and supporting metadata. The changes enable users to export, optimize, quantize, and run inference with Fara-7B using ONNX Runtime GenAI, supporting both text and image inputs.
Key changes:
1. Documentation and Metadata
   - `README.md` describing the Fara-7B ONNX Runtime GenAI pipeline, setup instructions, usage examples, and directory structure.
   - `info.yml` with metadata such as supported execution providers, devices, and keywords for discoverability.
2. Model Export and Optimization Pipelines
   - Olive configuration files for CPU (`cpu_and_mobile/embedding.json`, `cpu_and_mobile/vision.json`, `cpu_and_mobile/text.json`) and CUDA (`cuda/embedding.json`, `cuda/vision.json`, `cuda/text.json`) pipelines, specifying model export, graph surgeries, optimizations, and quantization/precision steps for each sub-model (vision encoder, embedding, text decoder).
3. Inference Script
   - `inference.py`, a Python script for running text or multimodal (image+text) inference with ONNX Runtime GenAI, supporting both batch and interactive modes (a minimal sketch of the underlying GenAI loop follows below).
4. Project Structure and Ignore Rules
   - `.gitignore` to exclude generated models, cache, Python bytecode, and log files.

These changes together provide an end-to-end workflow for exporting, optimizing, quantizing, and running inference on the Fara-7B model with ONNX Runtime GenAI, making it easy to deploy and test the model on both CPU and GPU platforms.
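As a reference for the inference script in item 3, here is a minimal text-only sketch of the ONNX Runtime GenAI generation loop it presumably wraps. The model path, prompt, and search options are assumptions, and the image+text flow is omitted.

```python
# Minimal text-only ONNX Runtime GenAI loop; the model path, prompt, and
# search options are assumptions, not taken from the recipe's inference.py.
import onnxruntime_genai as og

model = og.Model("models/fara-7b-onnx")    # hypothetical exported-model path
tokenizer = og.Tokenizer(model)

params = og.GeneratorParams(model)
params.set_search_options(max_length=256)  # assumed option for the example

generator = og.Generator(model, params)
generator.append_tokens(tokenizer.encode("Describe the screenshot."))
while not generator.is_done():
    generator.generate_next_token()

print(tokenizer.decode(generator.get_sequence(0)))
```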