
Translate gemma 4B recipes#404

Open
tanzeel-amd wants to merge 7 commits into microsoft:main from tanzeel-amd:translate-gemma-4b

Conversation

@tanzeel-amd

Translate gemma 4B recipes

VishalX and others added 7 commits April 21, 2026 08:27
- Introduced benchmark_wmt24pp.py for evaluating translation quality on the WMT24++ dataset using COMET.
- Added inference.py for text and image translation capabilities.
- Created README.md with setup instructions, usage examples, and model architecture details.
- Implemented optimization pipeline in optimize.py for exporting sub-models (text decoder, vision encoder, embedding) with INT4 quantization (a sketch of driving such an export appears after this list).
- Developed user script in user_script.py for model integration.
- Configured export settings in JSON files for different model variants (INT4, AWQ, FP32).
- Included test images for demonstration purposes.
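For orientation, here is a minimal sketch of how a three-config export like this can be driven through Olive's documented Python entry point. The config file names match the recipe layout above; the directory constant and the loop itself are assumptions for illustration, not code from this PR (the actual optimize.py also patches GenAI/processor/tokenizer configs afterwards, per the review below).

```python
from pathlib import Path

# Olive's programmatic entry point (equivalent to `olive run --config ...`).
from olive.workflows import run as olive_run

# Assumed layout: the INT4 configs live next to this script; swap in
# cpu_and_mobile_fp32 for the FP32 variants.
CONFIG_DIR = Path(__file__).parent / "cpu_and_mobile"

# Export each ONNX sub-model in turn: text decoder, embedding/scatter, vision encoder.
for config_name in ("text.json", "embedding.json", "vision.json"):
    olive_run(str(CONFIG_DIR / config_name))
```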
Copilot AI review requested due to automatic review settings May 8, 2026 10:32

Copilot AI left a comment

Pull request overview

Adds a new Olive recipe for exporting and running the gated google/translategemma-4b-it multimodal translation model (text and image translation) via ONNX Runtime GenAI, including an optional script that runs WMT24++ translations and scores them with COMET.

Changes:

  • Added end-to-end export pipeline (builtin/optimize.py) plus Olive JSON configs to export text/vision/embedding ONNX sub-models.
  • Added runtime scripts for inference (inference.py) and WMT24++ benchmarking (benchmark_wmt24pp.py); a rough sketch of the GenAI inference loop follows this list.
  • Added recipe documentation and included the Gemma Terms of Use.
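
To make the runtime side concrete, here is a minimal sketch of image translation with onnxruntime-genai, loosely following that library's public multimodal examples. The model directory, image path, prompt template, and image placeholder token are all illustrative assumptions, not code taken from inference.py.

```python
import onnxruntime_genai as og

model = og.Model("models/translategemma-4b-it")  # exported ONNX model dir (assumed path)
processor = model.create_multimodal_processor()
stream = processor.create_stream()

images = og.Images.open("data/sample.jpg")  # hypothetical test image

# Gemma-style chat template with an image placeholder; the exact template
# the recipe uses may differ.
prompt = (
    "<start_of_turn>user\n"
    "Translate the text in this image to German.\n<start_of_image><end_of_turn>\n"
    "<start_of_turn>model\n"
)

inputs = processor(prompt, images=images)
params = og.GeneratorParams(model)
params.set_inputs(inputs)
params.set_search_options(max_length=512)

# Stream tokens as they are generated.
generator = og.Generator(model, params)
while not generator.is_done():
    generator.generate_next_token()
    print(stream.decode(generator.get_next_tokens()[0]), end="", flush=True)
print()
```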

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 7 comments.

Summary per file:

| File | Description |
| --- | --- |
| google-translategemma-4b-it/README.md | Documents export/inference/benchmark usage and model architecture. |
| google-translategemma-4b-it/LICENSE | Adds the Gemma Terms of Use text to the recipe directory. |
| google-translategemma-4b-it/inference.py | CLI for text/image translation using the ORT GenAI multimodal pipeline. |
| google-translategemma-4b-it/benchmark_wmt24pp.py | CLI to run WMT24++ translations and score them with COMET, with resume support. |
| google-translategemma-4b-it/builtin/optimize.py | Orchestrates Olive exports and patches GenAI/processor/tokenizer configs. |
| google-translategemma-4b-it/builtin/user_script.py | Provides PyTorch wrapper modules plus IO/dummy-input helpers for Olive export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile/text.json | Olive config for INT4 RTN text decoder export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile/embedding.json | Olive config for embedding/scatter ONNX export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile/vision.json | Olive config for vision encoder ONNX export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile_fp32/text.json | Olive config for FP32 text decoder export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile_fp32/embedding.json | Olive config for FP32 embedding/scatter ONNX export. |
| google-translategemma-4b-it/builtin/cpu_and_mobile_fp32/vision.json | Olive config for FP32 vision encoder ONNX export. |
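
Since benchmark_wmt24pp.py scores translations with COMET, here is a minimal sketch of the scoring half using Unbabel's comet package. The checkpoint name, record fields, and batch settings follow standard COMET usage and are assumptions; the script may pin a different model or configuration.

```python
# Minimal COMET scoring sketch (standard Unbabel COMET API; the checkpoint
# below is an assumption, not necessarily what benchmark_wmt24pp.py uses).
from comet import download_model, load_from_checkpoint

ckpt_path = download_model("Unbabel/wmt22-comet-da")  # downloads from the HF Hub
scorer = load_from_checkpoint(ckpt_path)

# One record per segment: source text, model hypothesis, human reference.
data = [
    {"src": "Der Hund bellt.", "mt": "The dog barks.", "ref": "The dog is barking."},
]

result = scorer.predict(data, batch_size=8, gpus=0)  # gpus=0 runs on CPU
print(result.scores)        # per-segment scores
print(result.system_score)  # corpus-level average
```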


Comment on lines +33 to +35

```bash
# AWQ INT4 text decoder + FP32 vision/embedding (~6.4 GB total)
python optimize.py --config-dir cpu_and_mobile_awq
```

Comment on lines +135 to +136

```text
translategemma-4b-it/
    data/        # Test images
```

Comment on lines +173 to +174

```python
parser.add_argument("--hf-model-dir", default=str(SCRIPT_DIR / "model"),
                    help="HF model path for tokenizer/chat template")
```

Comment on lines +117 to +120

```python
def translate_batch(model, processor, stream, hf_tok, sources: list[str], target_lang: str, max_length: int) -> list[str]:
    """Translate a list of source texts one at a time."""
    translations = []
    for src in sources:
```

```python
all_hypotheses.extend(hypotheses)
all_references.extend(references)

new_pairs = len(all_sources) // (args.max_segments or 1) if all_sources else 0
```

```python
import io
from pathlib import Path

import numpy as np
```
Comment on lines +7 to +10

```python
import math
import os
import glob
```

@tanzeel-amd
Author

@microsoft-github-policy-service agree company="AMD"

@VishalX

VishalX commented May 11, 2026

@xieofxie / @devang-ml pls review
