Merged
6 changes: 5 additions & 1 deletion src/content/docs/docs/evaluation.mdx
@@ -1382,6 +1382,10 @@ Genkit includes a number of built-in evaluators, inspired by [RAGAS](https://doc
- **Deep Equal** -- Checks if the generated output is deep-equal to the reference output
- **JSONata** -- Checks if the generated output matches a JSONata expression provided in the reference field
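The **Deep Equal** check above amounts to a recursive structural comparison. A framework-free sketch of the idea (hypothetical, not Genkit's actual implementation):

```python
# Hypothetical sketch of a deep-equality check like the built-in evaluator's;
# not Genkit's actual implementation.
def deep_equal(a, b) -> bool:
    if isinstance(a, dict) and isinstance(b, dict):
        # Same keys, and every value deep-equal.
        return a.keys() == b.keys() and all(deep_equal(a[k], b[k]) for k in a)
    if isinstance(a, list) and isinstance(b, list):
        # Same length, and elements deep-equal pairwise.
        return len(a) == len(b) and all(deep_equal(x, y) for x, y in zip(a, b))
    return a == b
```

Unlike `==` on JSON-ish values, a check written this way makes the recursion explicit, which is useful when you want to extend it (for example, tolerating float rounding at the leaves).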

#### Python

Use **`ai.define_evaluator()`** (see [Custom evaluators](#custom-evaluators)) for project-specific metrics, or install plugins that register evaluators with your `Genkit` instance—for example [Vertex AI evaluation metrics](/docs/integrations/vertex-ai#evaluation-metrics) via the Google GenAI plugin. Third-party packages may ship additional evaluators; follow each package’s install and registration instructions.

### Evaluator plugins

Genkit supports additional evaluators through plugins, like the Vertex Rapid Evaluators, which you can access via the [VertexAI Plugin](/docs/integrations/vertex-ai#evaluation-metrics).
@@ -1433,7 +1437,7 @@ async def food_evaluator(
test_case_id=datapoint.test_case_id or '',
evaluation=Score(
score=response.text,
status=EvalStatusEnum.PASS_,
status=EvalStatusEnum.PASS,
details={'reasoning': f'LLM judged: {response.text}'},
),
)
27 changes: 25 additions & 2 deletions src/content/docs/docs/frameworks/fastapi.mdx
@@ -25,13 +25,36 @@ cd my-genkit-app
Install FastAPI and Genkit dependencies:

```bash
uv add fastapi uvicorn genkit genkit-plugin-google-genai
uv add fastapi uvicorn genkit genkit-plugin-google-genai genkit-plugin-fastapi
```

## Define Genkit flows and FastAPI routes

To expose your Genkit flows as FastAPI endpoints, use standard FastAPI routes. For simplicity, this example keeps everything in a single file (for example, `main.py`) that initializes Genkit, defines the flows, and exposes them.

### FastAPI handler plugin

To serve a flow with Genkit’s HTTP protocol (streaming chunks, structured errors, and compatibility with Genkit clients), decorate the flow with `genkit_fastapi_handler` and set `response_model=None` on the route so FastAPI does not try to validate or coerce Genkit’s JSON payloads:

```python
from fastapi import FastAPI
from genkit import Genkit
from genkit.plugins.fastapi import genkit_fastapi_handler

app = FastAPI()
ai = Genkit(...)

@app.post('/chat', response_model=None)
@genkit_fastapi_handler(ai)
@ai.flow()
async def chat(prompt: str) -> str:
return 'Hello world'
```

The handlers stack bottom-up: FastAPI route, then `genkit_fastapi_handler`, then `@ai.flow()`.

The examples below show other patterns when you need custom request or response shapes.

### 1. Genkit Client Compatible API (Recommended)

If you want your FastAPI server to work with the official Genkit Client SDKs for Web or Dart (see [Accessing flows from the client](/docs/client)), your endpoint must consume and produce messages that match Genkit's network protocol.
@@ -102,7 +125,7 @@ async def menu_suggestion_stream(payload: GenkitEnvelope):
async def sse_generator():
flow_stream = streaming_menu_flow.stream(theme)

# In current PyPI Genkit (0.5.1), stream() returns a tuple (stream, future).
# In current PyPI Genkit (0.5.2), stream() returns a tuple (stream, future).
# We use flow_stream[0] if it is a tuple, otherwise we assume it is the stream itself.
fs = flow_stream[0] if isinstance(flow_stream, tuple) else flow_stream

2 changes: 2 additions & 0 deletions src/content/docs/docs/frameworks/flask.mdx
@@ -274,6 +274,8 @@ async def admin_flow(action: str, ctx: ActionRunContext):

## Error Handling

When you use `@genkit_flask_handler`, Genkit serializes error details in a form Flask can turn into HTTP responses (the plugin stack avoids returning raw Pydantic models where frameworks expect JSON-serializable payloads).
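The key constraint is that whatever reaches Flask must survive `json.dumps`. A minimal sketch of that payload shape (hypothetical, not the plugin's actual code):

```python
# Hypothetical sketch: the kind of JSON-serializable error payload a handler
# can hand back to Flask, instead of a raw Pydantic model that json.dumps
# would reject.
import json

def error_payload(exc: Exception, status: int = 500) -> dict:
    payload = {
        'error': {
            'status': status,
            'type': type(exc).__name__,
            'message': str(exc),
        }
    }
    json.dumps(payload)  # sanity check: plain dicts and strings round-trip
    return payload
```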

### Custom Error Responses

```python
4 changes: 2 additions & 2 deletions src/content/docs/docs/integrations/anthropic.mdx
@@ -1044,11 +1044,11 @@ print(response.text)
## Configuration Options

```python
from genkit.types import GenerationCommonConfig
from genkit import ModelConfig

response = await ai.generate(
prompt='Your prompt here',
config=GenerationCommonConfig(
config=ModelConfig(
temperature=0.7,
max_output_tokens=1000,
),
65 changes: 62 additions & 3 deletions src/content/docs/docs/integrations/google-genai.mdx
@@ -616,7 +616,7 @@ const response = await ai.generate({

### Available Models

- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions, customizable)
- `gemini-embedding-001` — Default **3072** dimensions; set **`output_dimensionality`** in embed params (for example **768**, **1536**, or **3072**) when you want a shorter vector.

### Usage

@@ -627,6 +627,13 @@ const embeddings = await ai.embed({
});

console.log(embeddings);

// Optional: request a shorter embedding (size indexes to match)
const compact = await ai.embed({
embedder: googleAI.embedder('gemini-embedding-001'),
content: 'Machine learning models process data to make predictions.',
options: { outputDimensionality: 768 },
});
```

## Image Models
@@ -1439,6 +1446,20 @@ ai = Genkit(
)
```

3. **Per-request**: Override the API key (or pass other provider-specific options) in the `config` passed to `generate()`:

```python
response = await ai.generate(
model='googleai/gemini-2.5-flash',
prompt='Your prompt here',
config={
'api_key': 'different-api-key',
},
)
```

This is useful for multi-tenant apps or routing requests to different keys. Model `config` also accepts additional provider-specific fields without strict schema errors.

## Language Models

### Available Models
@@ -1469,6 +1490,15 @@ response = await ai.generate(
prompt='Explain how neural networks learn in simple terms.',
)
print(response.text)

# Non-text parts (images, audio, etc.)
if response.message:
for media in response.message.media:
print(f'Media type: {media.content_type}')

# Usage may include thinking and context-cache token counts on supported models
print(response.usage.thoughts_tokens)
print(response.usage.cached_content_tokens)
```

### Structured Output
@@ -1579,8 +1609,8 @@ response = await ai.generate(

### Available Models

- `gemini-embedding-001` - Latest Gemini embedding model (3072 dimensions)
- `text-embedding-004` - Text embedding model (768 dimensions)
- `gemini-embedding-001` — Default **3072** dimensions; pass **`output_dimensionality`** in **`options`** on **`embed`** / **`embed_many`** (for example **768**, **1536**, or **3072**) when you want a shorter vector.
- `text-embedding-004` — **768** dimensions in typical use.

### Usage

@@ -1590,6 +1620,14 @@ embeddings = await ai.embed(
content='Machine learning models process data to make predictions.',
)
print(embeddings)

# gemini-embedding-001: default 3072, or request a shorter embedding (size indexes to match)
gemini_embeddings = await ai.embed(
embedder='googleai/gemini-embedding-001',
content='Machine learning models process data to make predictions.',
options={'output_dimensionality': 768},
)
print(gemini_embeddings)
```

## Image Models
@@ -1715,6 +1753,27 @@ if response.message and response.message.content:
- `pitch`: Voice pitch (-20.0 to 20.0)
- `volume_gain_db`: Volume (-96.0 to 16.0)

## Context caching

Gemini 2.5 and newer models automatically cache common content prefixes (minimum 1024 tokens for Flash, 2048 for Pro), and cached input tokens are billed at a significant discount.

```python
# Structure prompts with consistent content at the beginning
base_context = 'You are a helpful cook... (large context) ...' * 50

# First request — prefix may be cached by Gemini
await ai.generate(
model='googleai/gemini-2.5-flash',
prompt=f'{base_context}\n\nTask 1...',
)

# Second request with the same prefix — eligible for a cache hit
await ai.generate(
model='googleai/gemini-2.5-flash',
prompt=f'{base_context}\n\nTask 2...',
)
```

## Next Steps

- Learn about [generating content](/docs/models) to understand how to use these models effectively
76 changes: 47 additions & 29 deletions src/content/docs/docs/integrations/vertex-ai.mdx
@@ -329,7 +329,8 @@ Use Vertex AI Vector Search for enterprise-grade vector operations:

1. Create a Vector Search index in the [Google Cloud Console](https://console.cloud.google.com/vertex-ai/matching-engine/indexes)
2. Configure dimensions based on your embedding model:
- `gemini-embedding-001`: 768 dimensions
   - `gemini-embedding-001` / `gemini-embedding-2-preview`: **default 3072** dimensions; you can set **`output_dimensionality`** on embed calls (Google documents common choices such as **768**, **1536**, or **3072**). Size the index to the length you actually use.
- `text-embedding-005`: 768 dimensions
- `text-multilingual-embedding-002`: 768 dimensions
- `multimodalEmbedding001`: 128, 256, 512, or 1408 dimensions
3. Deploy the index to a standard endpoint
@@ -566,13 +567,21 @@ ai = Genkit(
- **Vertex AI Express Mode:** A streamlined way to try out many Vertex AI features using just an API key, without needing to set up billing or full project configurations. This is ideal for quick experimentation and has generous free tier quotas. [Learn More about Express Mode](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview).

```python
# Using Vertex AI Express Mode (Easy to start, some limitations)
# Using Vertex AI Express Mode (easy to start; some limitations).
# Get an API key from the Vertex AI Studio Express Mode setup.
import os
VertexAI(api_key=os.environ.get('VERTEX_EXPRESS_API_KEY'))

from genkit import Genkit
from genkit.plugins.google_genai import VertexAI

ai = Genkit(
plugins=[
VertexAI(api_key=os.environ['VERTEX_EXPRESS_API_KEY']),
],
)
```

_Note: When using Express Mode, you do not provide `project` and `location` in the plugin config._
_Note: When using Express Mode, you typically omit `project` and `location` on `VertexAI` (see the [Express Mode docs](https://cloud.google.com/vertex-ai/generative-ai/docs/start/express-mode/overview))._

### Basic Usage

@@ -601,6 +610,20 @@ embeddings = await ai.embed(
)
```

:::note[Embedding vector sizes]
Size Vector Search indexes (and any application-side buffers) to the **length of vectors your app actually produces**. **`gemini-embedding-001`** and **`gemini-embedding-2-preview`** default to **3072** dimensions; pass **`output_dimensionality`** in **`options`** on **`embed`** / **`embed_many`** to use a shorter vector (Google documents common choices such as **768**, **1536**, or **3072**). Example:

```python
embeddings = await ai.embed(
embedder='vertexai/gemini-embedding-001',
content='Your text here.',
options={'output_dimensionality': 768},
)
```

**`vertexai/text-embedding-005`** and **`vertexai/text-multilingual-embedding-002`** typically use **768** dimensions. See [Embedding models](/docs/integrations/google-genai#embedding-models) and the [Gemini embedding documentation](https://ai.google.dev/gemini-api/docs/embeddings).
:::

### Image Generation (Imagen)

```python
@@ -799,71 +822,66 @@ llm_response = await ai.generate(

### Model Garden Integration

Access third-party models through Vertex AI Model Garden using the separate `vertex_ai` plugin:
Access third-party models through Vertex AI Model Garden using the `genkit-plugin-vertex-ai` package (`ModelGardenPlugin`). The plugin requires a Google Cloud project ID: pass `project_id`, or set `GCLOUD_PROJECT` / `GOOGLE_CLOUD_PROJECT`. Model IDs must use the publisher-qualified names shown in the Google Cloud console (for example `meta/...` for Llama, `anthropic/...` for Claude on Vertex). Pass them to `model_garden_name()` so Genkit resolves the action as `modelgarden/<model-id>`.

**Installation:**

```bash
uv add genkit-plugin-vertex-ai
```
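The `modelgarden/<model-id>` resolution described above is just a namespace prefix on the publisher-qualified ID. A one-line sketch of the convention (in real code, import the actual helper from `genkit.plugins.vertex_ai.model_garden` rather than defining it):

```python
# Sketch of the naming convention only; the real helper ships in
# genkit.plugins.vertex_ai.model_garden.
def model_garden_name(model_id: str) -> str:
    return f'modelgarden/{model_id}'

print(model_garden_name('meta/llama-3.1-405b-instruct-maas'))
# modelgarden/meta/llama-3.1-405b-instruct-maas
```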

#### Claude 3 Models
#### Llama (Meta) models

```python
from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin
from genkit.plugins.vertex_ai.model_garden import model_garden_name

ai = Genkit(
plugins=[
ModelGardenPlugin(
project_id='my-gcp-project',
location='us-central1',
models=['claude-3-haiku', 'claude-3-sonnet', 'claude-3-opus'],
),
],
)

response = await ai.generate(
model='claude-3-sonnet',
prompt='What should I do when I visit Melbourne?',
model=model_garden_name('meta/llama-3.1-405b-instruct-maas'),
prompt='Write a function that adds two numbers together',
)
```

#### Llama 3.1 405b
Another identifier shipped in the Python SDK registry is `meta/llama-3.2-90b-vision-instruct-maas`. Always confirm the exact model resource name for your project in the [Vertex AI Model Garden](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models) console.

```python
ai = Genkit(
plugins=[
ModelGardenPlugin(
location='us-central1',
models=['llama3-405b-instruct-maas'],
),
],
)

response = await ai.generate(
model='llama3-405b-instruct-maas',
prompt='Write a function that adds two numbers together',
)
```
#### Anthropic (Claude) models on Vertex

#### Mistral Models
Claude on Vertex uses `anthropic/...` model IDs. Version strings often include dates or `@` — use the exact ID from the console:

```python
from genkit import Genkit
from genkit.plugins.vertex_ai import ModelGardenPlugin
from genkit.plugins.vertex_ai.model_garden import model_garden_name

ai = Genkit(
plugins=[
ModelGardenPlugin(
project_id='my-gcp-project',
location='us-central1',
models=['mistral-large', 'mistral-small'],
),
],
)

response = await ai.generate(
model='mistral-large',
prompt='Explain quantum computing',
model=model_garden_name('anthropic/claude-3-5-haiku-20241022'),
prompt='What should I do when I visit Melbourne?',
)
```

#### Other OpenAI-compatible Model Garden endpoints

For additional publishers (for example Mistral), use the same `model_garden_name()` pattern with the full Model Garden model ID. Models not in the built-in registry still resolve via the generic OpenAI-compatible Model Garden path.

Vertex AI provides access to various third-party models through Model Garden. Consult the [Vertex AI Model Garden documentation](https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/explore-models) for the full list of supported models and their capabilities.

### Evaluation Metrics