feat: Support Gemini realtime text interruptions #998

tillkolter · 2026-01-25T02:07:40Z

Description

This PR refines Gemini Realtime interrupt handling by sending realtime_input text when an interrupt is active, while always keeping full user turns in content.

The goal is to preserve context fidelity while still triggering low‑latency interruption behavior.

In contrast to OpenAI's well-defined message/event schema for interruption, truncation, and request cancellation, Gemini falls short of providing an explicit interface to manage the state of a request. This solution tries to simulate the internal behavior of the OpenAI plugin to achieve similar behavior for text interruptions.

Changes Made

Gate realtime_input text on an explicit interrupt flag instead of generation state, so interrupts remain responsive even after _done is set.
Always include complete turns in content (no stripping), so the model’s context stays consistent.
Introduce a lightweight interrupt flag to drive the “send realtime_input text” decision and clear it after use.
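
The three changes above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the `InterruptGate` class and the `Turn` and `ClientEvent` types are hypothetical stand-ins, and only the flag name `pendingInterruptText` is taken from this PR.

```typescript
// Hypothetical, simplified types; the real plugin works with the Gemini SDK's types.
type Turn = { role: 'user' | 'model'; text: string };
type ClientEvent =
  | { type: 'content'; turns: Turn[] }
  | { type: 'realtime_input'; text: string };

class InterruptGate {
  private pendingInterruptText = false;
  // Records what would be sent to the server, in order.
  readonly sent: ClientEvent[] = [];

  // Arms the flag on interrupt, independent of any generation state.
  interrupt(): void {
    this.pendingInterruptText = true;
  }

  // Always sends the complete turns as content; additionally sends the
  // user text as realtime_input while an interrupt is pending.
  updateChatCtx(turns: Turn[]): void {
    if (turns.length === 0) return;
    this.sent.push({ type: 'content', turns });

    if (!this.pendingInterruptText) return;
    const text = turns
      .filter((t) => t.role === 'user')
      .map((t) => t.text)
      .join('\n');
    if (text) {
      this.sent.push({ type: 'realtime_input', text });
      // Clear the flag only after text was actually emitted.
      this.pendingInterruptText = false;
    }
  }
}
```

Keeping the flag separate from generation state is what lets an interrupt take effect even when the current generation has already been marked done.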

Pre-Review Checklist

Build passes: All builds (lint, typecheck, tests) pass locally
AI-generated code reviewed: Removed unnecessary comments and ensured code quality
Changes explained: All changes are properly documented and justified above
Scope appropriate: All changes relate to the PR title, or explanations provided for why they're included

Additional Notes

This keeps Gemini’s interruption responsiveness while avoiding the context drift caused by stripping user turns from content.

Summary by CodeRabbit

New Features
- Real-time sessions now forward user text (alongside media) so live text input is processed during sessions.
Bug Fixes
- Improved interrupt/preemption so in-progress outputs are finalized or discarded reliably to avoid spillover.
- Ensured accumulated user turns are sent during interrupts.
- Prevented spurious new generations while output is being discarded and reset session state cleanly on close.


changeset-bot · 2026-01-25T02:07:44Z

⚠️ No Changeset found

Latest commit: 266cbd6

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


coderabbitai · 2026-01-25T02:07:59Z

📝 Walkthrough

Walkthrough

Adds interrupt-driven realtime input and preemption handling to RealtimeSession: introduces private flags to defer or early-complete generations, forwards accumulated user text as realtime_input, and discards/gates server outputs during early-completion to avoid spurious new generations.

Changes

Cohort / File(s)	Summary
Realtime session interrupt & preemption `plugins/google/src/beta/realtime/realtime_api.ts`	Added private flags `pendingInterruptText` and `earlyCompletionPending`, plus `generationHasOutput` helper. `interrupt` sets flags and may finalize an active generation for preemption. `updateChatCtx` forwards accumulated user text as `realtime_input` when pending. Server `modelTurn`/`outputTranscription` handling is gated/discarded while early-completion is pending; `isNewGeneration`/`onReceiveMessage` short-circuit new generation logic when preemption is active.

Sequence Diagram

sequenceDiagram
    participant User as User
    participant Session as RealtimeSession
    participant Gen as Generation
    participant Server as Server

    User->>Session: send interrupt (with text)
    activate Session
    Session->>Session: set pendingInterruptText or earlyCompletionPending
    alt active generation exists and generationHasOutput(gen)
        Session->>Gen: finalize current generation (early completion / preempt)
    end
    Session->>Server: emit realtime_input (accumulated user text) if pending
    deactivate Session

    Server->>Session: incoming realtime_input (text/media)
    activate Session
    Session->>Session: treat text as realtime input / forward to model
    deactivate Session

    Server->>Session: modelTurn / outputTranscription
    activate Session
    alt earlyCompletionPending == true
        Session->>Session: discard server output
    else
        Session->>Session: process and append server output
    end
    deactivate Session

    Server->>Session: turnComplete / generationComplete
    Session->>Session: reset earlyCompletionPending / discard state
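
The discard branch of the diagram can be sketched as a small state machine. This is an illustration under stated assumptions: the `PreemptionGate` class and the `ServerEvent` shape are hypothetical, and only the flag name `earlyCompletionPending` and the event names come from the walkthrough.

```typescript
// Hypothetical event shape; real server messages carry more fields.
type ServerEvent =
  | { kind: 'modelTurn'; text: string }
  | { kind: 'outputTranscription'; text: string }
  | { kind: 'inputTranscription'; text: string }
  | { kind: 'turnComplete' };

class PreemptionGate {
  private earlyCompletionPending = false;
  readonly output: string[] = [];
  readonly userInput: string[] = [];

  // Interrupt after output was produced: start discarding server output.
  preempt(): void {
    this.earlyCompletionPending = true;
  }

  onServerEvent(ev: ServerEvent): void {
    switch (ev.kind) {
      case 'modelTurn':
      case 'outputTranscription':
        // Stale output from the interrupted generation is dropped.
        if (!this.earlyCompletionPending) this.output.push(ev.text);
        break;
      case 'inputTranscription':
        // User input is still captured while discarding.
        this.userInput.push(ev.text);
        break;
      case 'turnComplete':
        // The completion signal ends the discard window.
        this.earlyCompletionPending = false;
        break;
    }
  }
}
```

The sketch relies on the same ordering assumption the review discusses: the completion signal for the interrupted generation arrives before any content of the next one.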

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

fix: Fix Google realtime generation lifecycle to fix "generateReply timed out waiting for generation_created" event (parity with Python plugin) #989: Modifies the same plugins/google/src/beta/realtime/realtime_api.ts and touches generation lifecycle, interrupt/preemption gating and helpers.

Suggested reviewers

toubatbrian
theomonnom
lukasIO

Poem

🐇
I tuck your phrases in my paw,
Pause the stream, then push and draw,
When interrupts hop into sight,
I buffer, nudge, and send them right,
Hopping code, swift and light.

🚥 Pre-merge checks | ✅ 3

✅ Passed checks (3 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title 'feat: Support Gemini realtime text interruptions' clearly and specifically summarizes the main change: adding text interruption support for Gemini Realtime API.
Description check	✅ Passed	The PR description is mostly complete with clear Description and Changes Made sections, but the Pre-Review Checklist is only partially marked and Testing section is missing entirely.
Docstring Coverage	✅ Passed	No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.


✨ Finishing touches

📝 Generate docstrings

📜 Recent review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4c52aab and 266cbd6.

📒 Files selected for processing (1)

plugins/google/src/beta/realtime/realtime_api.ts

🧰 Additional context used

📓 Path-based instructions (3)

**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Add SPDX-FileCopyrightText and SPDX-License-Identifier headers to all newly added files with '// SPDX-FileCopyrightText: 2025 LiveKit, Inc.' and '// SPDX-License-Identifier: Apache-2.0'

Files:

plugins/google/src/beta/realtime/realtime_api.ts

**/*.{ts,tsx}?(test|example|spec)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

When testing inference LLM, always use full model names from agents/src/inference/models.ts (e.g., 'openai/gpt-4o-mini' instead of 'gpt-4o-mini')

Files:

plugins/google/src/beta/realtime/realtime_api.ts

**/*.{ts,tsx}?(test|example)

📄 CodeRabbit inference engine (.cursor/rules/agent-core.mdc)

Initialize logger before using any LLM functionality with initializeLogger({ pretty: true }) from '@livekit/agents'

Files:

plugins/google/src/beta/realtime/realtime_api.ts

🧬 Code graph analysis (1)

plugins/google/src/beta/realtime/realtime_api.ts (2)

plugins/openai/src/realtime/api_proto.ts (1)

Content (181-186)

plugins/openai/src/realtime/realtime_model.ts (1)

session (305-307)

🪛 Biome (2.1.2)

plugins/google/src/beta/realtime/realtime_api.ts

[error] 944-944: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (8)

plugins/google/src/beta/realtime/realtime_api.ts (8)

412-413: LGTM!

The new interrupt coordination flags are well-named and appropriately initialized. Good to see them properly scoped as private class members.

460-476: LGTM!

Good placement of flag resets after session close. This ensures both interrupt-related flags are cleared when the session lifecycle resets, preventing stuck states across restarts.

574-593: LGTM!

The flag is correctly cleared only after successfully sending realtime text (line 590 inside the if (text) block), addressing the previous review concern about premature flag clearing when no user text exists.

744-746: LGTM!

The helper correctly identifies whether a generation has produced any output by checking both text content and the first token timestamp (which is set when audio data arrives).
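
A helper along these lines might look like the sketch below. The `GenerationState` interface and its field names are assumptions made for illustration; the plugin's real generation object is not shown in this thread.

```typescript
// Hypothetical shape; field names are illustrative, not the plugin's.
interface GenerationState {
  outputText: string;
  // Assumed to be set when the first audio chunk arrives.
  firstTokenTimestamp?: number;
}

// True if the generation has produced any text or any audio so far.
function generationHasOutput(gen: GenerationState): boolean {
  return gen.outputText.length > 0 || gen.firstTokenTimestamp !== undefined;
}
```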

748-764: LGTM!

The interrupt flow is well-designed:

Respects NO_INTERRUPTION configuration

Sets pendingInterruptText to queue realtime text emission

Triggers early completion only when generation has produced output

Always invokes startUserActivity() to signal the activity to Gemini

The condition checks and early returns in markCurrentGenerationDone provide safe guards against race conditions.

1338-1410: LGTM!

The discard logic correctly:

Gates modelTurn and outputTranscription processing behind discardOutput

Preserves inputTranscription processing (user speech capture continues during interrupts)

This ensures stale model output from interrupted generations is dropped while user input is still captured.

1420-1431: LGTM!

The logic correctly:

Guards against double-finalization by checking !earlyCompletionPending before calling markCurrentGenerationDone

Clears earlyCompletionPending only after receiving the completion signal

Documents the protocol ordering assumption in the comment

This aligns well with the protocol behavior where Gemini sends turnComplete/generationComplete before new generation content.

1586-1603: LGTM!

The early return when earlyCompletionPending is true is essential to prevent spurious new generations during the interrupt window. This ensures we wait for the interrupted generation's completion signal before allowing new generation creation.


coderabbitai

Actionable comments posted: 3

🤖 Fix all issues with AI agents

In `@plugins/google/src/beta/realtime/realtime_api.ts`:
- Around line 573-592: The code clears this.pendingInterruptText unconditionally
even when no user text is found; change the logic in the block that iterates
over turns (types.Content) so you only clear this.pendingInterruptText after you
have actually sent at least one realtime_input via this.sendClientEvent.
Concretely, add a local boolean (e.g., sentRealtime) before the loop, set it to
true whenever you call this.sendClientEvent({ type: 'realtime_input', ... }),
and after the loop set this.pendingInterruptText = false only if sentRealtime is
true (leave it true otherwise) so future user text can still trigger
realtime_input.
- Around line 934-943: The switch case handling realtime_input declares const {
mediaChunks, activityStart, activityEnd, text } without a block which triggers
noSwitchDeclarations; wrap the case body in a block (add { ... } around the
existing statements) so the const lives inside a lexical block and keep the
existing awaits (session.sendRealtimeInput calls for mediaChunks, text, and
activityStart) unchanged; locate the case labeled realtime_input and enclose the
current lines starting with "const { mediaChunks, activityStart, activityEnd,
text } = msg.value;" through the activityStart await in a new { } block.
- Around line 1576-1578: The discardServerOutput flag can cause new-generation
content to be dropped if protocol ordering allows content for the next
generation to arrive before generationComplete/turnComplete; update the logic to
be robust by either (1) adding a clear explanatory comment near
discardServerOutput and the isNewGeneration() call documenting the protocol
ordering guarantee expected, or (2) proactively clearing discardServerOutput
when serverContent.interrupted is observed (in the same handler that checks
serverContent), or (3) modifying isNewGeneration() to treat a differing
generationId as a new generation even if discardServerOutput is true so
new-generation messages are accepted; locate and change behavior around
isNewGeneration(), discardServerOutput, and the serverContent handling to
implement one of these options.
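
The second item (the `noSwitchDeclarations` fix) is a general pattern that can be shown in isolation. The `Msg` type and `describeMsg` function below are hypothetical and unrelated to the plugin's message types; the point is only the braces around each clause.

```typescript
// Hypothetical message union used only to demonstrate the lint fix.
type Msg =
  | { type: 'realtime_input'; value: { text?: string } }
  | { type: 'content'; value: { text: string } };

function describeMsg(msg: Msg): string {
  switch (msg.type) {
    case 'realtime_input': {
      // The braces give `text` its own lexical scope.
      const { text } = msg.value;
      return text ? `realtime:${text}` : 'realtime:<empty>';
    }
    case 'content': {
      // The same name can be reused without clashing with the clause above.
      const { text } = msg.value;
      return `content:${text}`;
    }
  }
}
```

Without the braces, a `const` declared in one clause is in scope for every other clause of the switch while remaining uninitialized there, which is what the lint rule guards against.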

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 3544bfa and 627ebb0.

📒 Files selected for processing (1)

plugins/google/src/beta/realtime/realtime_api.ts

🪛 Biome (2.1.2)

plugins/google/src/beta/realtime/realtime_api.ts

[error] 934-934: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (3)

plugins/google/src/beta/realtime/realtime_api.ts (3)

413-414: Clean state flags for interrupt handling.
The added flags make the interrupt/discard state explicit and easy to reason about.

748-752: Interrupt flow sequencing looks good.
Marking the current generation done before starting activity keeps preemption consistent.

1329-1420: Output-discard gating is solid.
Suppressing modelTurn/outputTranscription while discarding avoids stale playout and text leakage.


plugins/google/src/beta/realtime/realtime_api.ts

tillkolter · 2026-01-25T12:46:01Z

found some issues with the state machine. will re-open once fixed.

chore: Keep pendingInterruptText true until a realtime text is actually emitted. chore: add comment to explain protocol ordering assumption fix(google): guard realtime interrupt completion

coderabbitai

Actionable comments posted: 1

🤖 Fix all issues with AI agents

In `@plugins/google/src/beta/realtime/realtime_api.ts`:
- Around line 1418-1429: The earlyCompletionPending flag only gets cleared in
the generation-handling path (around markCurrentGenerationDone) and can remain
true if the session closes or restarts, causing new server content to be
misrouted; update the session lifecycle handlers (e.g., closeActiveSession,
onclose, and markRestartNeeded) to explicitly reset it. Locate the class field
earlyCompletionPending and add a reset (earlyCompletionPending = false) in those
lifecycle methods so the flag is cleared whenever a session is closed or
restarted, ensuring new generations are processed normally. Ensure any
unit/cleanup paths that abort a session also reset earlyCompletionPending to
avoid a stuck state.

📜 Review details

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 135305a and 0d6259c.

📒 Files selected for processing (1)

plugins/google/src/beta/realtime/realtime_api.ts

🪛 Biome (2.1.2)

plugins/google/src/beta/realtime/realtime_api.ts

[error] 942-942: Other switch clauses can erroneously access this declaration.
Wrap the declaration in a block to restrict its access to the switch clause.

The declaration is defined in this switch clause:

Safe fix: Wrap the declaration in a block.

(lint/correctness/noSwitchDeclarations)

🔇 Additional comments (2)

plugins/google/src/beta/realtime/realtime_api.ts (2)

572-599: Interrupt-driven realtime text emission looks good.

The gating on pendingInterruptText while always sending full turns preserves context and keeps interruption behavior responsive.

742-760: Interrupt flow + output detection are consistent.

The helper and interrupt branching read cleanly and align with the preemption intent.


plugins/google/src/beta/realtime/realtime_api.ts

toubatbrian · 2026-01-27T20:35:35Z

Hey @tillkolter, let me know when the PR is ready to review!

tillkolter changed the title ~~Support Gemini realtime text interruptions~~ feat: Support Gemini realtime text interruptions Jan 25, 2026

coderabbitai bot reviewed Jan 25, 2026


tillkolter marked this pull request as draft January 25, 2026 12:07

tillkolter closed this Jan 25, 2026

feat(google): support realtime text interruptions

0d6259c

chore: Keep pendingInterruptText true until a realtime text is actually emitted. chore: add comment to explain protocol ordering assumption fix(google): guard realtime interrupt completion

tillkolter reopened this Jan 26, 2026

tillkolter marked this pull request as ready for review January 26, 2026 18:36

tillkolter force-pushed the google-interrupt-close-streams branch from 3cf6ffe to 0d6259c Compare January 26, 2026 18:36

coderabbitai bot reviewed Jan 26, 2026


tillkolter added 2 commits January 26, 2026 20:13

fix(google): reset early completion on session close

4c52aab

fix(google): reset interrupt flags on session close

266cbd6
