feat: generate endpoint with SSE streaming (#399)
ryanontheinside wants to merge 4 commits into main
# Add batch video generation endpoint with SSE streaming
## Summary
Adds `/api/v1/generate` endpoint for batch video generation with server-side chunking and SSE progress streaming. Supports text-to-video, video-to-video, VACE conditioning, and comprehensive per-chunk parameter scheduling.
This is important for the ComfyUI node wrapper for Scope. It could also conceivably replace `test.py`/`test_vace.py`, or at least their boilerplate code.
## Changes
- **`schema.py`**: Add `GenerateRequest`/`GenerateResponse` models with `EncodedArray` for binary data
- **`generate.py`**: New module handling chunked generation with SSE progress events
- **`app.py`**: Wire up the endpoint
- **`test_generate_endpoint.py`**: Integration tests for v2v, depth, inpainting, LoRA ramps
- **ComfyUI nodes**: Update `ScopeSampler` to use new schema
## Features
### Generation modes
- **Text-to-video**: Generate from prompt alone
- **Video-to-video**: Transform input video with configurable noise scale
### VACE conditioning
- **Reference images**: Style/identity conditioning via image paths
- **Depth/structure guidance**: Pass conditioning frames for structural control
- **Inpainting**: Binary masks specify regions to regenerate vs preserve
### Per-chunk parameter scheduling
All scheduling parameters accept either a single value (applied to all chunks) or a list (applied per-chunk, last value repeats if list is shorter than chunk count).
| Parameter | Type | Description |
|-----------|------|-------------|
| `seed` | `int \| list[int]` | Random seed per chunk |
| `noise_scale` | `float \| list[float]` | V2V noise injection strength |
| `vace_context_scale` | `float \| list[float]` | VACE conditioning influence |
| `lora_scales` | `dict[str, float \| list[float]]` | Per-LoRA strength scheduling |
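The scalar-or-list rule described above can be sketched as follows. This is a hypothetical helper for illustration, not code from this PR:

```python
def resolve_schedule(value, num_chunks):
    """Expand a scalar-or-list schedule to one value per chunk.

    A single value applies to every chunk; a list applies per chunk,
    with the last entry repeating when the list is shorter than the
    chunk count.
    """
    if not isinstance(value, list):
        return [value] * num_chunks
    if not value:
        raise ValueError("schedule list must not be empty")
    # Clamp the index so the final entry covers any remaining chunks.
    return [value[min(i, len(value) - 1)] for i in range(num_chunks)]
```

For example, `resolve_schedule(0.5, 3)` yields `[0.5, 0.5, 0.5]`, while `resolve_schedule([1, 2], 4)` yields `[1, 2, 2, 2]`.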
### Sparse keyframe updates
These parameters use a chunk-indexed specification, only sending updates when values change (sticky behavior).
| Parameter | Type | Description |
|-----------|------|-------------|
| `chunk_prompts` | `list[{chunk, text}]` | Prompt changes at specific chunks |
| `first_frames` | `list[{chunk, image}]` | First frame anchors for extension mode |
| `last_frames` | `list[{chunk, image}]` | Last frame anchors for extension mode |
| `vace_ref_images` | `list[{chunk, images}]` | Reference images at specific chunks |
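The sticky chunk-indexed semantics can be illustrated with a small sketch (hypothetical helper, not the actual server code): a value set at chunk *k* stays in effect until a later keyframe overrides it.

```python
def resolve_keyframes(keyframes, num_chunks, key="text"):
    """Resolve sparse chunk-indexed updates with sticky behavior."""
    by_chunk = {kf["chunk"]: kf[key] for kf in keyframes}
    resolved, current = [], None
    for i in range(num_chunks):
        if i in by_chunk:
            # A keyframe at this chunk replaces the current value...
            current = by_chunk[i]
        # ...and the current value carries forward otherwise.
        resolved.append(current)
    return resolved
```

With `[{"chunk": 0, "text": "a"}, {"chunk": 2, "text": "b"}]` and 4 chunks, this resolves to `["a", "a", "b", "b"]`.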
## Design decisions
Some features were left out of this PR for simplicity (e.g., prompt spatial/temporal blending). They can be added in a follow-up.
### SSE streaming
Clients such as test scripts or ComfyUI nodes need progress and performance updates. SSE provides per-chunk progress events without requiring WebSocket infrastructure:
```
event: progress
data: {"chunk": 1, "total_chunks": 8, "fps": 4.2, "latency": 2.85}
event: progress
data: {"chunk": 2, "total_chunks": 8, "fps": 4.5, "latency": 2.67}
event: complete
data: {"video_base64": "...", "video_shape": [96, 320, 576, 3], ...}
```
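A client might consume this stream with a minimal stdlib-only parser along these lines (`parse_sse` is a hypothetical sketch, not part of this PR; it handles only the `event:`/`data:` fields shown above):

```python
import json

def parse_sse(lines):
    """Yield (event, data) pairs from an iterable of SSE text lines,
    e.g. a streaming HTTP response body. A blank line terminates each
    event, per the SSE wire format."""
    event, data = None, []
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            # Multi-line data fields are joined before JSON decoding.
            data.append(line[len("data:"):].strip())
        elif line == "" and (event or data):
            yield event, json.loads("\n".join(data))
            event, data = None, []
```

A caller would then dispatch on the event name, e.g. update a progress bar on `progress` and decode the payload on `complete`.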
### Server-side chunking
The server determines chunk size from the pipeline, handles frame padding, and manages KV cache initialization. Callers specify total frames and per-chunk parameters—the server handles the rest.
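The chunk-planning arithmetic implied here can be sketched as follows (an assumed illustration, not the actual server implementation):

```python
import math

def plan_chunks(total_frames, chunk_size):
    """Compute chunk count and frame padding so the requested total
    frames fill whole pipeline-sized chunks."""
    num_chunks = math.ceil(total_frames / chunk_size)
    padding = num_chunks * chunk_size - total_frames
    return num_chunks, padding
```

For a 12-frame pipeline chunk, 96 requested frames plan to 8 chunks with no padding, while 100 frames plan to 9 chunks with 8 padded frames.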
## Example usage
### LoRA strength ramp (dissolve effect)
```python
request = GenerateRequest(
pipeline_id="longlive",
prompt="a woman dissolving into particles",
num_frames=96, # 8 chunks × 12 frames
lora_scales={
"path/to/dissolve.safetensors": [0.0, 0.15, 0.3, 0.5, 0.7, 0.85, 1.0, 1.0]
},
)
```
### Video-to-video with prompt changes
```python
request = GenerateRequest(
pipeline_id="longlive",
prompt="a cat sitting calmly",
chunk_prompts=[
{"chunk": 3, "text": "a cat jumping"},
{"chunk": 6, "text": "a cat landing gracefully"},
],
input_video=EncodedArray(base64="...", shape=[96, 512, 512, 3]),
noise_scale=0.6,
)
```
### Depth-guided generation
```python
request = GenerateRequest(
pipeline_id="longlive",
prompt="a robot walking through a forest",
vace_frames=EncodedArray(base64="...", shape=[1, 3, 48, 320, 576]),
vace_context_scale=1.5,
)
```
## Test plan
- [x] `uv run daydream-scope` starts without errors
- [x] V2V generation produces correct output
- [x] VACE depth conditioning works
- [x] VACE inpainting with masks works
- [x] LoRA scale ramping works across chunks
- [x] Per-chunk noise scale scheduling works
- [x] Prompt keyframing updates at correct chunks
- [x] ComfyUI ScopeSampler node works (WIP)
- [x] Test with Longlive
- [x] Same test with StreamDiffusionv2
Signed-off-by: RyanOnTheInside <7623207+ryanontheinside@users.noreply.github.com>