feat: generate endpoint with SSE streaming #399

Open · ryanontheinside wants to merge 4 commits into main from ryanontheinside/feat/generate-endpoint

Conversation

@ryanontheinside (Collaborator)
# Add batch video generation endpoint with SSE streaming

## Summary

Adds `/api/v1/generate` endpoint for batch video generation with server-side chunking and SSE progress streaming. Supports text-to-video, video-to-video, VACE conditioning, and comprehensive per-chunk parameter scheduling.

This is important for the ComfyUI node wrapper for Scope. It could also replace `test.py`/`test_vace.py`, or at least their boilerplate code.

## Changes

- **`schema.py`**: Add `GenerateRequest`/`GenerateResponse` models with `EncodedArray` for binary data
- **`generate.py`**: New module handling chunked generation with SSE progress events
- **`app.py`**: Wire up the endpoint
- **`test_generate_endpoint.py`**: Integration tests for v2v, depth, inpainting, LoRA ramps
- **ComfyUI nodes**: Update `ScopeSampler` to use new schema

## Features

### Generation modes
- **Text-to-video**: Generate from prompt alone
- **Video-to-video**: Transform input video with configurable noise scale

### VACE conditioning
- **Reference images**: Style/identity conditioning via image paths
- **Depth/structure guidance**: Pass conditioning frames for structural control
- **Inpainting**: Binary masks specify regions to regenerate vs preserve

### Per-chunk parameter scheduling

All scheduling parameters accept either a single value (applied to all chunks) or a list (applied per-chunk, last value repeats if list is shorter than chunk count).

| Parameter | Type | Description |
|-----------|------|-------------|
| `seed` | `int \| list[int]` | Random seed per chunk |
| `noise_scale` | `float \| list[float]` | V2V noise injection strength |
| `vace_context_scale` | `float \| list[float]` | VACE conditioning influence |
| `lora_scales` | `dict[str, float \| list[float]]` | Per-LoRA strength scheduling |
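
The scalar-or-list convention above can be resolved with a small helper. This is a sketch of the semantics described in this section, not the PR's actual implementation; the function name `resolve_schedule` is hypothetical:

```python
def resolve_schedule(value, num_chunks):
    """Expand a scalar-or-list schedule parameter to one value per chunk.

    A scalar applies to every chunk; a list applies per-chunk, with the
    last value repeated if the list is shorter than the chunk count.
    """
    if not isinstance(value, list):
        return [value] * num_chunks
    if not value:
        raise ValueError("schedule list must not be empty")
    return [value[i] if i < len(value) else value[-1] for i in range(num_chunks)]

# A LoRA ramp shorter than the chunk count: the last strength is held.
print(resolve_schedule([0.0, 0.5, 1.0], 5))  # [0.0, 0.5, 1.0, 1.0, 1.0]
print(resolve_schedule(42, 3))               # [42, 42, 42]
```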

### Sparse keyframe updates

These parameters use a chunk-indexed specification, only sending updates when values change (sticky behavior).

| Parameter | Type | Description |
|-----------|------|-------------|
| `chunk_prompts` | `list[{chunk, text}]` | Prompt changes at specific chunks |
| `first_frames` | `list[{chunk, image}]` | First frame anchors for extension mode |
| `last_frames` | `list[{chunk, image}]` | Last frame anchors for extension mode |
| `vace_ref_images` | `list[{chunk, images}]` | Reference images at specific chunks |
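
The sticky behavior can be illustrated with a small resolver (a sketch of the semantics, not the server's code; `resolve_keyframes` is a hypothetical name). Chunks before the first keyframe fall back to the base value:

```python
def resolve_keyframes(keyframes, num_chunks, key="text"):
    """Resolve a sparse chunk-indexed spec to one value per chunk.

    Values are "sticky": a keyframe set at chunk i stays active until the
    next keyframe. Chunks before the first keyframe get None (i.e. the
    base value, such as the top-level prompt, applies).
    """
    by_chunk = {kf["chunk"]: kf[key] for kf in keyframes}
    out, current = [], None
    for i in range(num_chunks):
        if i in by_chunk:
            current = by_chunk[i]
        out.append(current)
    return out

prompts = resolve_keyframes(
    [{"chunk": 3, "text": "a cat jumping"}, {"chunk": 6, "text": "a cat landing"}],
    num_chunks=8,
)
# chunks 0-2: None (base prompt), 3-5: "a cat jumping", 6-7: "a cat landing"
print(prompts)
```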

## Design decisions

Some features were left out of this PR for simplicity (e.g., prompt spatial/temporal blending). They can be added in a follow-up.

### SSE streaming

Clients such as test scripts and ComfyUI nodes need progress and performance feedback. SSE provides per-chunk progress updates without requiring WebSocket infrastructure:

```
event: progress
data: {"chunk": 1, "total_chunks": 8, "fps": 4.2, "latency": 2.85}

event: progress
data: {"chunk": 2, "total_chunks": 8, "fps": 4.5, "latency": 2.67}

event: complete
data: {"video_base64": "...", "video_shape": [96, 320, 576, 3], ...}
```
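
On the client side, this stream can be consumed with a minimal `event:`/`data:` line parser. This is a hedged sketch (the helper name `iter_sse_events` is hypothetical); `lines` can be any iterable of text lines, e.g. `response.iter_lines(decode_unicode=True)` from `requests` when POSTing to the endpoint:

```python
import json

def iter_sse_events(lines):
    """Parse "event:"/"data:" lines from an SSE stream into (event, payload) pairs.

    A blank line terminates each event; multi-line data fields are joined.
    """
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            yield event, json.loads("\n".join(data))
            event, data = None, []

# The sample stream from above, as a list of lines.
stream = [
    "event: progress",
    'data: {"chunk": 1, "total_chunks": 8, "fps": 4.2, "latency": 2.85}',
    "",
    "event: complete",
    'data: {"video_base64": "...", "video_shape": [96, 320, 576, 3]}',
    "",
]
for name, payload in iter_sse_events(stream):
    print(name, payload.get("chunk"))
```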

### Server-side chunking

The server determines chunk size from the pipeline, handles frame padding, and manages KV cache initialization. Callers specify total frames and per-chunk parameters—the server handles the rest.
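
The arithmetic the server performs might look like the following (a sketch under the assumption that the final chunk is padded to full size and the padding trimmed afterwards; `plan_chunks` is a hypothetical name):

```python
import math

def plan_chunks(num_frames, chunk_size):
    """Derive chunk count and trailing padding for a requested frame count.

    The pipeline dictates chunk_size; padding fills the final chunk so
    every chunk is full, and is trimmed from the output.
    """
    num_chunks = math.ceil(num_frames / chunk_size)
    padding = num_chunks * chunk_size - num_frames
    return num_chunks, padding

print(plan_chunks(96, 12))   # (8, 0) -- matches the "8 chunks x 12 frames" example
print(plan_chunks(100, 12))  # (9, 8)
```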

## Example usage

### LoRA strength ramp (dissolve effect)

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a woman dissolving into particles",
    num_frames=96,  # 8 chunks × 12 frames
    lora_scales={
        "path/to/dissolve.safetensors": [0.0, 0.15, 0.3, 0.5, 0.7, 0.85, 1.0, 1.0]
    },
)
```

### Video-to-video with prompt changes

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a cat sitting calmly",
    chunk_prompts=[
        {"chunk": 3, "text": "a cat jumping"},
        {"chunk": 6, "text": "a cat landing gracefully"},
    ],
    input_video=EncodedArray(base64="...", shape=[96, 512, 512, 3]),
    noise_scale=0.6,
)
```

### Depth-guided generation

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a robot walking through a forest",
    vace_frames=EncodedArray(base64="...", shape=[1, 3, 48, 320, 576]),
    vace_context_scale=1.5,
)
```
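
For the `EncodedArray` fields in the examples above, the payload could be built roughly as follows. This assumes `EncodedArray` wraps base64-encoded raw array bytes plus a shape; the exact dtype/layout contract lives in `schema.py`, and `encode_array` is a hypothetical helper:

```python
import base64

def encode_array(raw_bytes, shape):
    """Build the payload fields for an EncodedArray from raw array bytes.

    In practice raw_bytes would come from e.g. a numpy array's .tobytes().
    """
    return {"base64": base64.b64encode(raw_bytes).decode("ascii"), "shape": shape}

# A tiny 1x2x2x3 uint8 "video": 12 bytes of zeros.
payload = encode_array(bytes(12), [1, 2, 2, 3])
print(payload["shape"])        # [1, 2, 2, 3]
print(len(payload["base64"]))  # 16 (12 bytes -> 16 base64 chars)
```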

## Test plan

- [x] `uv run daydream-scope` starts without errors
- [x] V2V generation produces correct output
- [x] VACE depth conditioning works
- [x] VACE inpainting with masks works
- [x] LoRA scale ramping works across chunks
- [x] Per-chunk noise scale scheduling works
- [x] Prompt keyframing updates at correct chunks
- [ ] ComfyUI `ScopeSampler` node works (WIP)
- [x] Test with Longlive
- [x] Same test with StreamDiffusionv2

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
@ryanontheinside ryanontheinside force-pushed the ryanontheinside/feat/generate-endpoint branch from c2b5afb to 50e33a1 Compare February 4, 2026 19:31
enables rife
