
Add Fara-7B recipes #384

Open
apsonawane wants to merge 7 commits into main from asonawane/fara-recipes

Conversation

@apsonawane
Contributor

This pull request introduces a complete ONNX Runtime GenAI example for the Fara-7B vision-language model, including documentation, configuration files for model export and optimization (for both CPU and CUDA), a Python inference script, and supporting metadata. The changes enable users to export, optimize, quantize, and run inference with Fara-7B using ONNX Runtime GenAI, supporting both text and image inputs.

Key changes:

1. Documentation and Metadata

  • Added a comprehensive README.md describing the Fara-7B ONNX Runtime GenAI pipeline, setup instructions, usage examples, and directory structure.
  • Introduced info.yml with metadata such as supported execution providers, devices, and keywords for discoverability.

2. Model Export and Optimization Pipelines

  • Added Olive configuration JSONs for CPU/mobile (cpu_and_mobile/embedding.json, cpu_and_mobile/vision.json, cpu_and_mobile/text.json) and CUDA (cuda/embedding.json, cuda/vision.json, cuda/text.json) pipelines, specifying model export, graph surgeries, optimizations, and quantization/precision steps for each sub-model (vision encoder, embedding, text decoder).

3. Inference Script

  • Added inference.py, a Python script for running text or multimodal (image+text) inference with ONNX Runtime GenAI, supporting both batch and interactive modes.

4. Project Structure and Ignore Rules

  • Updated .gitignore to exclude generated models, cache, Python bytecode, and log files.

These changes together provide an end-to-end workflow for exporting, optimizing, quantizing, and running inference on the Fara-7B model with ONNX Runtime GenAI, making it easy for users to deploy and test the model on both CPU and GPU platforms.
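The end-to-end workflow described above can be sketched roughly as follows. This is an illustrative outline only: `olive run --config` is the standard way to drive an Olive pipeline JSON, but the recipe's actual entry points (optimize.py, inference.py) and their flags are defined in this PR, not here.

```shell
# Export, optimize, and quantize the three sub-models for CPU
# (one Olive pipeline per JSON; file names match this PR's layout).
olive run --config cpu_and_mobile/embedding.json
olive run --config cpu_and_mobile/vision.json
olive run --config cpu_and_mobile/text.json

# For GPU, the cuda/ configs would be used instead, e.g.:
# olive run --config cuda/text.json

# Then run the ONNX Runtime GenAI inference script against the
# exported model; see the recipe's README.md for the exact flags.
python inference.py
```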

Copilot AI review requested due to automatic review settings April 24, 2026 23:17
Comment thread microsoft-Fara-7B/builtin/codes/modeling_qwen2_5_vl.py Dismissed
Contributor

Copilot AI left a comment


Pull request overview

This PR adds a new microsoft-Fara-7B/builtin recipe bundle to export/optimize the Fara-7B vision-language model to ONNX (via Olive) and run it with ONNX Runtime GenAI, including CPU and CUDA pipelines.

Changes:

  • Added Olive pipeline JSONs for embedding / vision / text sub-model export + optimization for cpu_and_mobile/ and cuda/.
  • Added Python orchestration and runtime scripts (optimize.py, inference.py, user_script.py) plus model code under codes/.
  • Added supporting metadata/docs (README.md, info.yml) and local ignore rules (.gitignore).

Reviewed changes

Copilot reviewed 14 out of 15 changed files in this pull request and generated 4 comments.

Show a summary per file
File Description
microsoft-Fara-7B/builtin/user_script.py Olive callbacks for loading the custom VL model + IO/dummy input definitions for export.
microsoft-Fara-7B/builtin/requirements.txt Python dependencies to run Olive export/optimization and scripts.
microsoft-Fara-7B/builtin/optimize.py Runs the three Olive configs and patches/writes GenAI runtime config files.
microsoft-Fara-7B/builtin/info.yml Minimal metadata for the builtin recipe directory.
microsoft-Fara-7B/builtin/inference.py Example ONNX Runtime GenAI inference script for text-only and image+text prompts.
microsoft-Fara-7B/builtin/cuda/embedding.json CUDA Olive pipeline for exporting/optimizing embedding sub-model.
microsoft-Fara-7B/builtin/cuda/text.json CUDA Olive pipeline for producing INT4 text decoder via ModelBuilder.
microsoft-Fara-7B/builtin/cuda/vision.json CUDA Olive pipeline for exporting/optimizing vision encoder sub-model.
microsoft-Fara-7B/builtin/cpu_and_mobile/embedding.json CPU/mobile Olive pipeline for exporting/quantizing embedding sub-model.
microsoft-Fara-7B/builtin/cpu_and_mobile/text.json CPU/mobile Olive pipeline for producing INT4 text decoder via ModelBuilder.
microsoft-Fara-7B/builtin/cpu_and_mobile/vision.json CPU/mobile Olive pipeline for exporting/quantizing vision encoder sub-model.
microsoft-Fara-7B/builtin/codes/modeling_qwen2_5_vl.py Custom Qwen2.5-VL-derived PyTorch modeling code to enable ONNX export.
microsoft-Fara-7B/builtin/codes/__init__.py Package marker for codes/.
microsoft-Fara-7B/builtin/README.md End-to-end instructions and usage examples for export + inference.
microsoft-Fara-7B/builtin/.gitignore Ignores generated artifacts and caches for the builtin workflow.


Comment thread microsoft-Fara-7B/builtin/codes/modeling_qwen2_5_vl.py
Comment thread microsoft-Fara-7B/builtin/requirements.txt Outdated
Comment thread microsoft-Fara-7B/builtin/README.md Outdated
apsonawane enabled auto-merge (squash) April 24, 2026 23:52
Comment thread microsoft-Fara-7B/builtin/eval.py Fixed
apsonawane and others added 3 commits April 28, 2026 17:38
Co-authored-by: Copilot Autofix powered by AI <223894421+github-code-quality[bot]@users.noreply.github.com>
@devang-ml
Contributor

Please add a LICENSE file.

"int4": {
"type": "OnnxBlockWiseRtnQuantization",
"block_size": 128,
"is_symmetric": true,
Contributor


If the recipe uses a parameter's default value, there is no need to include that parameter in the config.
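To illustrate the suggestion concretely: assuming `block_size: 128` and `is_symmetric: true` are indeed the defaults for the `OnnxBlockWiseRtnQuantization` pass (as the comment implies; check the Olive pass documentation for the installed version), the entry would reduce to just the pass type:

```json
"int4": {
    "type": "OnnxBlockWiseRtnQuantization"
}
```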

@@ -0,0 +1,11 @@
{
Contributor


Just FYI: for standard scenarios like this, you may be able to use the olive.capture_onnx_graph(...) API, or the corresponding CLI, directly instead of writing .json configs.
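The CLI route the reviewer mentions would look roughly like this. Flag names and the model identifier are assumptions here; consult `olive capture-onnx-graph --help` for the installed Olive version, since the exact options vary across releases.

```shell
# Sketch of capturing an ONNX graph via the Olive CLI instead of a
# hand-written pipeline JSON (flags and paths are illustrative).
olive capture-onnx-graph \
    -m microsoft/Fara-7B \
    -o models/fara-7b-onnx
```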

@@ -0,0 +1,420 @@
"""Evaluate Fara-7B ONNX model on ScreenSpot-v2 (GUI grounding benchmark).
Contributor


Olive has support for (A) evaluating models using GenAI and (B) using HF datasets for evaluation. Let's use these features in a follow-up PR.

For example, we previously had many recipes using the ImageNet dataset, and it is now simple to evaluate a model against it.

If needed, please update Olive to support the ScreenSpot-V2 dataset in that follow-up PR. Thanks!

