Merged
6 changes: 5 additions & 1 deletion src/content/docs/docs/evaluation.mdx
@@ -1382,6 +1382,10 @@ Genkit includes a number of built-in evaluators, inspired by [RAGAS](https://doc
- **Deep Equal** -- Checks if the generated output is deep-equal to the reference output
- **JSONata** -- Checks if the generated output matches a JSONata expression provided in the reference field
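The **Deep Equal** check above amounts to a recursive structural comparison. A framework-free sketch of the idea (hypothetical, not Genkit's actual implementation):

```python
# Hypothetical sketch of a deep-equality check like the built-in evaluator's;
# not Genkit's actual implementation.
def deep_equal(a, b) -> bool:
    if isinstance(a, dict) and isinstance(b, dict):
        # Same keys, and every value deep-equal.
        return a.keys() == b.keys() and all(deep_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        # Same length, and elements deep-equal pairwise.
        return len(a) == len(b) and all(deep_equal(x, y) for x, y in zip(a, b))
    return a == b
```

Unlike `==` on JSON-ish values, a check written this way makes the recursion explicit, which is useful when you want to extend it (for example, tolerating float rounding at the leaves).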

#### Python

Use **`ai.define_evaluator()`** (see [Custom evaluators](#custom-evaluators)) for project-specific metrics, or install plugins that register evaluators with your `Genkit` instance—for example [Vertex AI evaluation metrics](/docs/integrations/vertex-ai#evaluation-metrics) via the Google GenAI plugin. Third-party packages may ship additional evaluators; follow each package’s install and registration instructions.

### Evaluator plugins

Genkit supports additional evaluators through plugins, like the Vertex Rapid Evaluators, which you can access via the [VertexAI Plugin](/docs/integrations/vertex-ai#evaluation-metrics).
@@ -1433,7 +1437,7 @@ async def food_evaluator(
test_case_id=datapoint.test_case_id or '',
evaluation=Score(
score=response.text,
status=EvalStatusEnum.PASS_,
status=EvalStatusEnum.PASS,
details={'reasoning': f'LLM judged: {response.text}'},
),
)
27 changes: 25 additions & 2 deletions src/content/docs/docs/frameworks/fastapi.mdx
@@ -25,13 +25,36 @@ cd my-genkit-app
Install FastAPI and Genkit dependencies:

```bash
uv add fastapi uvicorn genkit genkit-plugin-google-genai
uv add fastapi uvicorn genkit genkit-plugin-google-genai genkit-plugin-fastapi
```

## Define Genkit flows and FastAPI routes

To expose your Genkit flows as FastAPI endpoints, use standard FastAPI routes. For simplicity, this example keeps everything in a single file (for example, `main.py`) that initializes Genkit, defines the flows, and exposes them.

### FastAPI handler plugin

To serve a flow with Genkit’s HTTP protocol (streaming chunks, structured errors, and compatibility with Genkit clients), decorate the flow with `genkit_fastapi_handler` and set `response_model=None` on the route so FastAPI does not try to validate or coerce Genkit’s JSON payloads:

```python
from fastapi import FastAPI
from genkit import Genkit
from genkit.plugins.fastapi import genkit_fastapi_handler

app = FastAPI()
ai = Genkit(...)

@app.post('/chat', response_model=None)
@genkit_fastapi_handler(ai)
@ai.flow()
async def chat(prompt: str) -> str:
return 'Hello world'
```

The handlers stack bottom-up: FastAPI route, then `genkit_fastapi_handler`, then `@ai.flow()`.

The examples below show other patterns when you need custom request or response shapes.

### 1. Genkit Client Compatible API (Recommended)

If you want your FastAPI server to work with the official Genkit Client SDKs for Web or Dart (see [Accessing flows from the client](/docs/client)), your endpoint must consume and produce messages that match Genkit's network protocol.
@@ -102,7 +125,7 @@ async def menu_suggestion_stream(payload: GenkitEnvelope):
async def sse_generator():
flow_stream = streaming_menu_flow.stream(theme)

# In current PyPI Genkit (0.5.1), stream() returns a tuple (stream, future).
# In current PyPI Genkit (0.5.2), stream() returns a tuple (stream, future).
# We use flow_stream[0] if it is a tuple, otherwise we assume it is the stream itself.
fs = flow_stream[0] if isinstance(flow_stream, tuple) else flow_stream

2 changes: 2 additions & 0 deletions src/content/docs/docs/frameworks/flask.mdx
@@ -274,6 +274,8 @@ async def admin_flow(action: str, ctx: ActionRunContext):

## Error Handling

When you use `@genkit_flask_handler`, Genkit serializes error details in a form Flask can turn into HTTP responses (the plugin stack avoids returning raw Pydantic models where frameworks expect JSON-serializable payloads).
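The key constraint is that whatever reaches Flask must survive `json.dumps`. A minimal sketch of that payload shape (hypothetical, not the plugin's actual code):

```python
# Hypothetical sketch: the kind of JSON-serializable error payload a handler
# can hand back to Flask, instead of a raw Pydantic model that json.dumps
# would reject.
import json

def error_payload(exc: Exception, status: int = 500) -> dict:
    payload = {
        'error': {
            'status': status,
            'type': type(exc).__name__,
            'message': str(exc),
        }
    }
    json.dumps(payload)  # sanity check: plain dicts and strings round-trip
    return payload
```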

### Custom Error Responses

```python
4 changes: 2 additions & 2 deletions src/content/docs/docs/integrations/anthropic.mdx
@@ -1044,11 +1044,11 @@ print(response.text)
## Configuration Options

```python
from genkit.types import GenerationCommonConfig
from genkit import ModelConfig

response = await ai.generate(
prompt='Your prompt here',
config=GenerationCommonConfig(
config=ModelConfig(
temperature=0.7,
max_output_tokens=1000,
),
65 changes: 62 additions & 3 deletions src/content/docs/docs/integrations/google-genai.mdx
@@ -616,7 +616,7 @@ const response = await ai.generate({

### Available Models

- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions, customizable)
- `gemini-embedding-001` — Default **3072** dimensions; set **`output_dimensionality`** in embed params (for example **768**, **1536**, or **3072**) when you want a shorter vector.

### Usage

@@ -627,6 +627,13 @@ const embeddings = await ai.embed({
});

console.log(embeddings);

// Optional: request a shorter embedding (size indexes to match)
const compact = await ai.embed({
embedder: googleAI.embedder('gemini-embedding-001'),
content: 'Machine learning models process data to make predictions.',
options: { outputDimensionality: 768 },
});
```

## Image Models
@@ -1439,6 +1446,20 @@ ai = Genkit(
)
```

3. **Per-request**: Override the API key (or pass other provider-specific options) in the `config` passed to `generate()`:

```python
response = await ai.generate(
model='googleai/gemini-2.5-flash',
prompt='Your prompt here',
config={
'api_key': 'different-api-key',
},
)
```

This is useful for multi-tenant apps or routing requests to different keys. Model `config` also accepts additional provider-specific fields without strict schema errors.

## Language Models

### Available Models
@@ -1469,6 +1490,15 @@ response = await ai.generate(
prompt='Explain how neural networks learn in simple terms.',
)
print(response.text)

# Non-text parts (images, audio, etc.)
if response.message:
for media in response.message.media:
print(f'Media type: {media.content_type}')

# Usage may include thinking and context-cache token counts on supported models
print(response.usage.thoughts_tokens)
print(response.usage.cached_content_tokens)
```

### Structured Output
@@ -1579,8 +1609,8 @@ response = await ai.generate(

### Available Models

- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
- `text-embedding-004` - Text embedding model (768 dimensions)
- `gemini-embedding-001` — Default **3072** dimensions; pass **`output_dimensionality`** in **`options`** on **`embed`** / **`embed_many`** (for example **768**, **1536**, or **3072**) when you want a shorter vector.
- `text-embedding-004` — **768** dimensions in typical use.

### Usage

@@ -1590,6 +1620,14 @@ embeddings = await ai.embed(
content='Machine learning models process data to make predictions.',
)
print(embeddings)

# gemini-embedding-001: default 3072, or request a shorter embedding (size indexes to match)
gemini_embeddings = await ai.embed(
embedder='googleai/gemini-embedding-001',
content='Machine learning models process data to make predictions.',
options={'output_dimensionality': 768},
)
print(gemini_embeddings)
```

## Image Models
@@ -1715,6 +1753,27 @@ if response.message and response.message.content:
- `pitch`: Voice pitch (-20.0 to 20.0)
- `volume_gain_db`: Volume (-96.0 to 16.0)

## Context caching

Gemini 2.5 and newer models automatically cache common content prefixes (minimum 1024 tokens for Flash, 2048 for Pro), and cached input tokens are billed at a significant discount.

```python
# Structure prompts with consistent content at the beginning
base_context = 'You are a helpful cook... (large context) ...' * 50

# First request — prefix may be cached by Gemini
await ai.generate(
model='googleai/gemini-2.5-flash',
prompt=f'{base_context}\n\nTask 1...',
)

# Second request with the same prefix — eligible for a cache hit
await ai.generate(
model='googleai/gemini-2.5-flash',
prompt=f'{base_context}\n\nTask 2...',
)
```

## Next Steps

- Learn about [generating content](/docs/models) to understand how to use these models effectively
76 changes: 47 additions & 29 deletions src/content/docs/docs/integrations/vertex-ai.mdx
@@ -329,7 +329,8 @@ Use Vertex AI Vector Search for enterprise-grade vector operations:

1. Create a Vector Search index in the [Google Cloud Console](https://console.cloud.google.com/vertex-ai/matching-engine/indexes)
2. Configure dimensions based on your embedding model:
- `gemini-embedding-001`: 768 dimensions
   - `gemini-embedding-001` / `gemini-embedding-2-preview`: **default 3072** dimensions; you can set **`output_dimensionality`** on embed calls (Google documents common choices such as **768**, **1536**, or **3072**). Size the index to the length you actually use.
- `text-embedding-005`: 768 dimensions
- `text-multilingual-embedding-002`: 768 dimensions
- `multimodalEmbedding001`: 128, 256, 512, or 1408 dimensions
3. Deploy the index to a standard endpoint
@@ -566,13 +567,21 @@ ai = Genkit(
- **Vertex AI Express Mode:** A streamlined way to try out many Vertex AI features using just an API key, without needing to set up billing or full project configurations. This is ideal for quick experimentation and has generous free tier quotas. [Learn More about Express Mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview).

```python
# Using Vertex AI Express Mode (Easy to start, some limitations)
# Using Vertex AI Express Mode (easy to start; some limitations).
# Get an API key from the Vertex AI Studio Express Mode setup.
import os
VertexAI(api_key=os.environ.get('VERTEX_EXPRESS_API_KEY'))

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
plugins=[
VertexAI(api_key=os.environ['VERTEX_EXPRESS_API_KEY']),
],
)
```

_Note: When using Express Mode, you do not provide `project` and `location` in the plugin config._
_Note: When using Express Mode, you typically omit `project` and `location` on `VertexAI` (see the [Express Mode docs](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview))._

### Basic Usage

@@ -601,6 +610,20 @@ embeddings = await ai.embed(
)
```

:::note[Embedding vector sizes]
Size Vector Search indexes (and any application-side buffers) to the **length of vectors your app actually produces**. **`gemini-embedding-001`** and **`gemini-embedding-2-preview`** default to **3072** dimensions; pass **`output_dimensionality`** in **`options`** on **`embed`** / **`embed_many`** to use a shorter vector (Google documents common choices such as **768**, **1536**, or **3072**). Example:

```python
embeddings = await ai.embed(
embedder='vertexai/gemini-embedding-001',
content='Your text here.',
options={'output_dimensionality': 768},
)
```

**`vertexai/text-embedding-005`** and **`vertexai/text-multilingual-embedding-002`** typically use **768** dimensions. See [Embedding models](/docs/integrations/google-genai#embedding-models) and the [Gemini embedding documentation](https://ai.google.dev/gemini-api/docs/embeddings).
:::

### Image Generation (Imagen)

```python
@@ -799,71 +822,66 @@ llm_response = await ai.generate(

### Model Garden Integration

Access third-party models through Vertex AI Model Garden using the separate `vertex_ai` plugin:
Access third-party models through Vertex AI Model Garden using the `genkit-plugin-vertex-ai` package (`ModelGardenPlugin`). The plugin requires a Google Cloud project ID: pass `project_id`, or set `GCLOUD_PROJECT` / `GOOGLE_CLOUD_PROJECT`. Model IDs must use the publisher-qualified names shown in the Google Cloud console (for example `meta/...` for Llama, `anthropic/...` for Claude on Vertex). Pass them to `model_garden_name()` so Genkit resolves the action as `modelgarden/<model-id>`.

**Installation:**

```bash
uv add genkit-plugin-vertex-ai
```
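The `modelgarden/<model-id>` resolution described above is just a namespace prefix on the publisher-qualified ID. A one-line sketch of the convention (in real code, import the actual helper from `genkit.plugins.vertex_ai.model_garden` rather than defining it):

```python
# Sketch of the naming convention only; the real helper ships in
# genkit.plugins.vertex_ai.model_garden.
def model_garden_name(model_id: str) -> str:
    return f'modelgarden/{model_id}'

print(model_garden_name('meta/llama-3.1-405b-instruct-maas'))
# modelgarden/meta/llama-3.1-405b-instruct-maas
```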

#### Claude 3 Models
#### Llama (Meta) models

```python
from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin
from genkit.plugins.vertex_ai.model_garden import model_garden_name

ai = Genkit(
plugins=[
ModelGardenPlugin(
project_id='my-gcp-project',
location='us-central1',
models=['claude-3-haiku', 'claude-3-sonnet', 'claude-3-opus'],
),
],
)

response = await ai.generate(
model='claude-3-sonnet',
prompt='What should I do when I visit Melbourne?',
model=model_garden_name('meta/llama-3.1-405b-instruct-maas'),
prompt='Write a function that adds two numbers together',
)
```

#### Llama 3.1 405b
Another identifier shipped in the Python SDK registry is `meta/llama-3.2-90b-vision-instruct-maas`. Always confirm the exact model resource name for your project in the [Vertex AI Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models) console.

```python
ai = Genkit(
plugins=[
ModelGardenPlugin(
location='us-central1',
models=['llama3-405b-instruct-maas'],
),
],
)

response = await ai.generate(
model='llama3-405b-instruct-maas',
prompt='Write a function that adds two numbers together',
)
```
#### Anthropic (Claude) models on Vertex

#### Mistral Models
Claude on Vertex uses `anthropic/...` model IDs. Version strings often include dates or `@` — use the exact ID from the console:

```python
from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin
from genkit.plugins.vertex_ai.model_garden import model_garden_name

ai = Genkit(
plugins=[
ModelGardenPlugin(
project_id='my-gcp-project',
location='us-central1',
models=['mistral-large', 'mistral-small'],
),
],
)

response = await ai.generate(
model='mistral-large',
prompt='Explain quantum computing',
model=model_garden_name('anthropic/claude-3-5-haiku-20241022'),
prompt='What should I do when I visit Melbourne?',
)
```

#### Other OpenAI-compatible Model Garden endpoints

For additional publishers (for example Mistral), use the same `model_garden_name()` pattern with the full Model Garden model ID. Models not in the built-in registry still resolve via the generic OpenAI-compatible Model Garden path.

Vertex AI provides access to various third-party models through Model Garden. Consult the [Vertex AI Model Garden documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models) for the full list of supported models and their capabilities.

### Evaluation Metrics