feat: generate endpoint with SSE streaming #399

Open · ryanontheinside wants to merge 4 commits into main from ryanontheinside/feat/generate-endpoint

Conversation

@ryanontheinside (Collaborator)
# Add batch video generation endpoint with SSE streaming

## Summary

Adds `/api/v1/generate` endpoint for batch video generation with server-side chunking and SSE progress streaming. Supports text-to-video, video-to-video, VACE conditioning, and comprehensive per-chunk parameter scheduling.

This is important for the ComfyUI node wrapper for Scope. It could also replace `test.py`/`test_vace.py`, or at least their boilerplate code.

## Changes

- **`schema.py`**: Add `GenerateRequest`/`GenerateResponse` models with `EncodedArray` for binary data
- **`generate.py`**: New module handling chunked generation with SSE progress events
- **`app.py`**: Wire up the endpoint
- **`test_generate_endpoint.py`**: Integration tests for v2v, depth, inpainting, LoRA ramps
- **ComfyUI nodes**: Update `ScopeSampler` to use new schema

## Features

### Generation modes
- **Text-to-video**: Generate from prompt alone
- **Video-to-video**: Transform input video with configurable noise scale

### VACE conditioning
- **Reference images**: Style/identity conditioning via image paths
- **Depth/structure guidance**: Pass conditioning frames for structural control
- **Inpainting**: Binary masks specify regions to regenerate vs preserve

### Per-chunk parameter scheduling

All scheduling parameters accept either a single value (applied to all chunks) or a list (applied per-chunk, last value repeats if list is shorter than chunk count).

| Parameter | Type | Description |
|-----------|------|-------------|
| `seed` | `int \| list[int]` | Random seed per chunk |
| `noise_scale` | `float \| list[float]` | V2V noise injection strength |
| `vace_context_scale` | `float \| list[float]` | VACE conditioning influence |
| `lora_scales` | `dict[str, float \| list[float]]` | Per-LoRA strength scheduling |
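
The scalar-or-list convention above can be resolved with a small helper. This is a sketch of the semantics described in this section, not the PR's actual implementation; the function name `resolve_schedule` is hypothetical:

```python
def resolve_schedule(value, num_chunks):
    """Expand a scalar-or-list schedule parameter to one value per chunk.

    A scalar applies to every chunk; a list applies per-chunk, with the
    last value repeated if the list is shorter than the chunk count.
    """
    if not isinstance(value, list):
        return [value] * num_chunks
    if not value:
        raise ValueError("schedule list must not be empty")
    return [value[i] if i < len(value) else value[-1] for i in range(num_chunks)]

# A LoRA ramp shorter than the chunk count: the last strength is held.
print(resolve_schedule([0.0, 0.5, 1.0], 5))  # [0.0, 0.5, 1.0, 1.0, 1.0]
print(resolve_schedule(42, 3))               # [42, 42, 42]
```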

### Sparse keyframe updates

These parameters use a chunk-indexed specification, only sending updates when values change (sticky behavior).

| Parameter | Type | Description |
|-----------|------|-------------|
| `chunk_prompts` | `list[{chunk, text}]` | Prompt changes at specific chunks |
| `first_frames` | `list[{chunk, image}]` | First frame anchors for extension mode |
| `last_frames` | `list[{chunk, image}]` | Last frame anchors for extension mode |
| `vace_ref_images` | `list[{chunk, images}]` | Reference images at specific chunks |
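
The sticky behavior can be illustrated with a small resolver (a sketch of the semantics, not the server's code; `resolve_keyframes` is a hypothetical name). Chunks before the first keyframe fall back to the base value:

```python
def resolve_keyframes(keyframes, num_chunks, key="text"):
    """Resolve a sparse chunk-indexed spec to one value per chunk.

    Values are "sticky": a keyframe set at chunk i stays active until the
    next keyframe. Chunks before the first keyframe get None (i.e. the
    base value, such as the top-level prompt, applies).
    """
    by_chunk = {kf["chunk"]: kf[key] for kf in keyframes}
    out, current = [], None
    for i in range(num_chunks):
        if i in by_chunk:
            current = by_chunk[i]
        out.append(current)
    return out

prompts = resolve_keyframes(
    [{"chunk": 3, "text": "a cat jumping"}, {"chunk": 6, "text": "a cat landing"}],
    num_chunks=8,
)
# chunks 0-2: None (base prompt), 3-5: "a cat jumping", 6-7: "a cat landing"
print(prompts)
```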

## Design decisions

Some features were left out of this PR for simplicity (e.g., prompt spatial/temporal blending). They can be added in a follow-up.

### SSE streaming

Clients such as test scripts and ComfyUI nodes need progress and performance feedback. SSE provides per-chunk progress updates without requiring WebSocket infrastructure:

```
event: progress
data: {"chunk": 1, "total_chunks": 8, "fps": 4.2, "latency": 2.85}

event: progress
data: {"chunk": 2, "total_chunks": 8, "fps": 4.5, "latency": 2.67}

event: complete
data: {"video_base64": "...", "video_shape": [96, 320, 576, 3], ...}
```
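
On the client side, this stream can be consumed with a minimal `event:`/`data:` line parser. This is a hedged sketch (the helper name `iter_sse_events` is hypothetical); `lines` can be any iterable of text lines, e.g. `response.iter_lines(decode_unicode=True)` from `requests` when POSTing to the endpoint:

```python
import json

def iter_sse_events(lines):
    """Parse "event:"/"data:" lines from an SSE stream into (event, payload) pairs.

    A blank line terminates each event; multi-line data fields are joined.
    """
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            data.append(line[len("data:"):].strip())
        elif line == "" and event is not None:
            yield event, json.loads("\n".join(data))
            event, data = None, []

# The sample stream from above, as a list of lines.
stream = [
    "event: progress",
    'data: {"chunk": 1, "total_chunks": 8, "fps": 4.2, "latency": 2.85}',
    "",
    "event: complete",
    'data: {"video_base64": "...", "video_shape": [96, 320, 576, 3]}',
    "",
]
for name, payload in iter_sse_events(stream):
    print(name, payload.get("chunk"))
```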

### Server-side chunking

The server determines chunk size from the pipeline, handles frame padding, and manages KV cache initialization. Callers specify total frames and per-chunk parameters—the server handles the rest.
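
The arithmetic the server performs might look like the following (a sketch under the assumption that the final chunk is padded to full size and the padding trimmed afterwards; `plan_chunks` is a hypothetical name):

```python
import math

def plan_chunks(num_frames, chunk_size):
    """Derive chunk count and trailing padding for a requested frame count.

    The pipeline dictates chunk_size; padding fills the final chunk so
    every chunk is full, and is trimmed from the output.
    """
    num_chunks = math.ceil(num_frames / chunk_size)
    padding = num_chunks * chunk_size - num_frames
    return num_chunks, padding

print(plan_chunks(96, 12))   # (8, 0) -- matches the "8 chunks x 12 frames" example
print(plan_chunks(100, 12))  # (9, 8)
```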

## Example usage

### LoRA strength ramp (dissolve effect)

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a woman dissolving into particles",
    num_frames=96,  # 8 chunks × 12 frames
    lora_scales={
        "path/to/dissolve.safetensors": [0.0, 0.15, 0.3, 0.5, 0.7, 0.85, 1.0, 1.0]
    },
)
```

### Video-to-video with prompt changes

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a cat sitting calmly",
    chunk_prompts=[
        {"chunk": 3, "text": "a cat jumping"},
        {"chunk": 6, "text": "a cat landing gracefully"},
    ],
    input_video=EncodedArray(base64="...", shape=[96, 512, 512, 3]),
    noise_scale=0.6,
)
```

### Depth-guided generation

```python
request = GenerateRequest(
    pipeline_id="longlive",
    prompt="a robot walking through a forest",
    vace_frames=EncodedArray(base64="...", shape=[1, 3, 48, 320, 576]),
    vace_context_scale=1.5,
)
```
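
For the `EncodedArray` fields in the examples above, the payload could be built roughly as follows. This assumes `EncodedArray` wraps base64-encoded raw array bytes plus a shape; the exact dtype/layout contract lives in `schema.py`, and `encode_array` is a hypothetical helper:

```python
import base64

def encode_array(raw_bytes, shape):
    """Build the payload fields for an EncodedArray from raw array bytes.

    In practice raw_bytes would come from e.g. a numpy array's .tobytes().
    """
    return {"base64": base64.b64encode(raw_bytes).decode("ascii"), "shape": shape}

# A tiny 1x2x2x3 uint8 "video": 12 bytes of zeros.
payload = encode_array(bytes(12), [1, 2, 2, 3])
print(payload["shape"])        # [1, 2, 2, 3]
print(len(payload["base64"]))  # 16 (12 bytes -> 16 base64 chars)
```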

## Test plan

- [x] `uv run daydream-scope` starts without errors
- [x] V2V generation produces correct output
- [x] VACE depth conditioning works
- [x] VACE inpainting with masks works
- [x] LoRA scale ramping works across chunks
- [x] Per-chunk noise scale scheduling works
- [x] Prompt keyframing updates at correct chunks
- [ ] ComfyUI `ScopeSampler` node works (WIP)
- [x] Test with Longlive
- [x] Same test with StreamDiffusionv2

Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>
@ryanontheinside ryanontheinside force-pushed the ryanontheinside/feat/generate-endpoint branch from c2b5afb to 50e33a1 Compare February 4, 2026 19:31
enables rife
