29 changes: 29 additions & 0 deletions HACKATHON-REFERENCE.md
@@ -0,0 +1,29 @@
# Hackathon Reference (Read-Me-First)

This branch is a **readability-first snapshot** of work added on top of Scope. It is intended for code review / judging and is **not guaranteed to be runnable as-is**.

## Where to Look

- **Realtime control plane (new module)**: [`src/scope/realtime/`](./src/scope/realtime/)
- Event semantics + deterministic chunk-boundary application: [`src/scope/realtime/control_bus.py`](./src/scope/realtime/control_bus.py)
- Prompt sequencing: [`src/scope/realtime/prompt_playlist.py`](./src/scope/realtime/prompt_playlist.py)
- Driver glue: [`src/scope/realtime/generator_driver.py`](./src/scope/realtime/generator_driver.py), [`src/scope/realtime/pipeline_adapter.py`](./src/scope/realtime/pipeline_adapter.py)

- **CLI tools**: [`src/scope/cli/`](./src/scope/cli/)
- Main CLI entry: [`src/scope/cli/video_cli.py`](./src/scope/cli/video_cli.py)
- Stream Deck integration: [`src/scope/cli/streamdeck_control.py`](./src/scope/cli/streamdeck_control.py)

- **Server-side recording**: [`src/scope/server/session_recorder.py`](./src/scope/server/session_recorder.py)

- **Input + control-map generation** (depth/edges/composite conditioning): [`src/scope/server/frame_processor.py`](./src/scope/server/frame_processor.py)
- Vendored depth model used by the control-map pipeline: [`src/scope/vendored/video_depth_anything/`](./src/scope/vendored/video_depth_anything/)

- **VACE integration + chunk-stability work**: [`src/scope/core/pipelines/wan2_1/vace/`](./src/scope/core/pipelines/wan2_1/vace/)

- **NDI input support**: [`src/scope/server/ndi/`](./src/scope/server/ndi/)

## What’s Intentionally Not Included

This branch is intentionally scoped to **feature work + readability**. Hardware-specific performance codepaths and low-level optimization infrastructure are out of scope for this public snapshot.

See [`PERF-NOTES.md`](./PERF-NOTES.md) for a high-level description of performance work (without code).
71 changes: 71 additions & 0 deletions PERF-NOTES.md
@@ -0,0 +1,71 @@
# Perf Notes (High Level)

This is a **high-level summary + journey log** of performance work done while building a realtime video pipeline. It is intentionally written without low-level implementation details.

Code map / entrypoints: [HACKATHON-REFERENCE.md](./HACKATHON-REFERENCE.md)

## Goals

- Reduce end-to-end chunk latency and stabilize throughput (avoid periodic stalls).
- Keep output temporally stable across chunk boundaries (cache correctness is as important as raw speed).
- Make performance/debuggability observable (what backend ran, what shapes ran, when caches reset).

## Starting Point → Current

- Starting point: ~11 FPS (early end-to-end baseline with stable output).
- Best observed baseline throughput after core optimizations: ~33 FPS (settings-dependent; after warmup).
- Current “performable” mode: ~23 FPS at 448×448 (B200/B300-class GPUs; includes realtime control/conditioning overhead).

## How We Measured (Practical)

- Measured the system as three rates: **input FPS** (camera/NDI/WebRTC ingest), **pipeline FPS** (generation), and **output pacing FPS** (what viewers actually see).
- Used chunk boundaries as the primary unit of “state commits” (cache resets, parameter application, replay determinism).
- Avoided benchmarking under GPU contention (server still running, another job holding the device), because it makes results noisy and misleading.
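
A minimal sketch of that measurement split, with illustrative names that are not from this branch: each stage gets its own rolling rate, so an input-limited or pacing-limited run is obvious at a glance.

```python
import time
from collections import deque


class RateMeter:
    """Rolling frames-per-second over the last `window` events."""

    def __init__(self, window: int = 120):
        self.stamps = deque(maxlen=window)

    def tick(self) -> None:
        self.stamps.append(time.perf_counter())

    def fps(self) -> float:
        if len(self.stamps) < 2:
            return 0.0
        span = self.stamps[-1] - self.stamps[0]
        return (len(self.stamps) - 1) / span if span > 0 else 0.0


input_fps = RateMeter()     # tick() per ingested camera/NDI/WebRTC frame
pipeline_fps = RateMeter()  # tick() per generated frame
output_fps = RateMeter()    # tick() per frame actually paced out to viewers
```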

## Performance Journey (What Moved the Needle)

### 1) Remove Hidden Caps (Pacing, Contention, Fallbacks)

- Used the measurement split above (input vs pipeline vs pacing) to quickly detect input-limited and output-limited runs.
- Routinely checked for GPU contention (a background server or another job can cut throughput dramatically).
- Made backend selection observable so “silent fallbacks” don’t masquerade as model regressions.

### 2) Make The Hot Path GPU-Efficient

- Integrated a fused attention backend (e.g., FlashAttention 4) where available, with safe fallbacks.
- Focused on the end-to-end critical path: attention + MLP + decode, not just one microkernel.
- Prioritized reducing synchronization points and avoiding accidental host/device round trips.
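
As a hedged illustration only (the branch's actual backend wiring is not shown here), this is the standard PyTorch 2.3+ pattern for requesting a fused SDPA kernel while keeping the fallback visible rather than silent:

```python
import torch
import torch.nn.functional as F
from torch.nn.attention import SDPBackend, sdpa_kernel


def fused_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    try:
        # Restrict SDPA to the fused flash backend; unsupported shapes/dtypes/devices
        # raise instead of silently running a slower kernel.
        with sdpa_kernel(SDPBackend.FLASH_ATTENTION):
            out = F.scaled_dot_product_attention(q, k, v)
        backend = "flash"
    except RuntimeError:
        out = F.scaled_dot_product_attention(q, k, v)  # safe fallback (math/efficient)
        backend = "fallback"
    print(f"attention backend: {backend}")  # observable, never silent
    return out
```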

### 3) Fix Data Movement Before Micro-Optimizing Kernels

- Hunted down implicit copies / contiguity fixes / view-to-contiguous transitions in hot paths (especially decode/resize/resample style code).
- Preferred stable shapes and stable layouts across chunks so caches and compiled graphs can actually be reused.
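
A tiny illustrative guard (not this branch's code) that makes hidden copies visible instead of letting them hide inside resize/decode helpers:

```python
import torch


def ensure_contiguous(x: torch.Tensor, tag: str) -> torch.Tensor:
    """Return a contiguous tensor, logging when this is a real device-side copy."""
    if x.is_contiguous():
        return x
    # Typically caused by permute/slice views feeding decode or resample code.
    print(f"[copy] {tag}: materializing {tuple(x.shape)} as contiguous")
    return x.contiguous()
```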

### 4) Selective Compilation (When It Helps, When It Hurts)

- Used `torch.compile` selectively on stable subgraphs and avoided compile on paths that are shape-volatile or stateful across invocations.
- Accepted that compilation has warmup cost; measured steady-state after warmup.
- Watched for cudagraph / reuse interactions that can surface as “reused output” failures when state persists between calls.
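
A minimal pattern, not this branch's exact setup: compile only the stable, shape-static subgraph, keep shape-volatile glue in eager mode, and measure only after warmup.

```python
import torch


class Block(torch.nn.Module):
    def __init__(self, dim: int = 64):
        super().__init__()
        self.proj = torch.nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return torch.nn.functional.gelu(self.proj(x))


block = Block()
# dynamic=False: shapes are expected to stay fixed across chunks, so recompiles
# (and their warmup cost) should not recur in steady state.
compiled = torch.compile(block, dynamic=False)

x = torch.randn(1, 16, 64)
for _ in range(3):        # warmup iterations, excluded from any measurement
    compiled(x)
# ...measure steady-state throughput only after this point...
```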

### 5) Cache Hygiene + Transition Semantics (Correctness + Perf)

- Treated chunk boundaries as the primary “state commit” point: cache resets, parameter application, and replay all happen there.
- Made transitions explicit:
- **Hard cut** = intentional cache reset.
- **Soft cut** = controlled transition over multiple chunk boundaries.
- Avoided mixing independent encode/decode streams through a shared temporal cache (a common source of boundary artifacts).
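
A sketch of these transition semantics with illustrative names (the real event handling lives in `control_bus.py`): every state change commits at a chunk boundary.

```python
from dataclasses import dataclass, field


@dataclass
class ChunkState:
    temporal_cache: list = field(default_factory=list)
    blend: float = 1.0  # 1.0 = fully on the new prompt/style


def hard_cut(state: ChunkState) -> None:
    # Intentional cache reset at the boundary; the next chunk starts clean.
    state.temporal_cache.clear()
    state.blend = 1.0


def soft_cut_step(state: ChunkState, boundary_index: int, num_boundaries: int) -> None:
    # Keep the cache and ramp the new conditioning in over several boundaries
    # instead of switching instantly.
    state.blend = min(1.0, (boundary_index + 1) / num_boundaries)
```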

### 6) Keep Preprocessing Off The Critical Path

- Depth/control-map generation needs to be fast and predictable, or it becomes the bottleneck (even if generation is fast).
- Prefer asynchronous/pre-buffered preprocessing so occasional slow frames don’t stall the whole pipeline.
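
A minimal sketch of pre-buffered preprocessing (thread plus bounded queue, names illustrative), so one slow depth/control-map frame is absorbed by the buffer rather than stalling the chunk in flight:

```python
import queue
import threading


def make_control_map(frame):
    ...  # depth / edges / composite conditioning for one frame


def preprocess_worker(frames_in: queue.Queue, maps_out: queue.Queue) -> None:
    while True:
        frame = frames_in.get()
        if frame is None:          # sentinel: shut down cleanly
            break
        maps_out.put(make_control_map(frame))


frames_in: queue.Queue = queue.Queue(maxsize=8)
maps_out: queue.Queue = queue.Queue(maxsize=8)   # the pre-buffer
threading.Thread(
    target=preprocess_worker, args=(frames_in, maps_out), daemon=True
).start()
# The generation loop reads from maps_out; as long as the buffer stays non-empty,
# an occasional slow frame never blocks generation.
```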

### 7) Precision / Quantization Tradeoffs

- Explored mixed precision and (where appropriate) FP8-style quantization to reduce memory bandwidth pressure.
- Kept correctness guardrails so visual quality regressions are obvious and attributable.
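
Illustrative only: reduced precision on the hot path with a full-precision reference pass kept alongside, so a visible quality change can be attributed to the precision switch rather than to something else.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
layer = torch.nn.Linear(64, 64).to(device)
x = torch.randn(8, 64, device=device)

with torch.autocast(device_type=device, dtype=torch.bfloat16):
    out_low = layer(x)            # reduced-precision hot path

out_ref = layer(x)                # full-precision reference pass
max_dev = (out_low.float() - out_ref).abs().max().item()
print(f"max abs deviation vs fp32 reference: {max_dev:.3e}")
```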

## Takeaways

- Most “FPS regressions” weren’t one kernel getting slower — they were fallbacks, extra copies, contention, or a cache/compile mode mismatch.
- Optimizations only stick if they’re observable (backend reporting) and repeatable (benchmark hygiene).
8 changes: 8 additions & 0 deletions README.md
@@ -8,6 +8,14 @@ Scope is a tool for running and customizing real-time, interactive generative AI

🚧 This project is currently in **beta**. 🚧

## Hackathon Snapshot (`competition-vace`)

This fork/branch is a **hackathon submission snapshot** of additional work on top of Scope, optimized for readability and review.

- Start here: [HACKATHON-REFERENCE.md](./HACKATHON-REFERENCE.md)
- High-level performance notes (no code): [PERF-NOTES.md](./PERF-NOTES.md)
- Note: this branch is not guaranteed to be runnable as-is.

## Table of Contents

- [Table of Contents](#table-of-contents)
259 changes: 259 additions & 0 deletions src/scope/cli/streamdeck_control.py
@@ -0,0 +1,259 @@
#!/usr/bin/env python3
"""Stream Deck controller for Scope - sends style commands to remote server.

Usage:
VIDEO_API_URL=http://your-gpu-server:8000 uv run python -m scope.cli.streamdeck_control

Or:
uv run python -m scope.cli.streamdeck_control --url http://your-gpu-server:8000
"""

from __future__ import annotations

import argparse
import os
import sys
import time
from io import BytesIO

import httpx
from PIL import Image, ImageDraw, ImageFont

# Button layout (15-key Stream Deck, 3 rows x 5 cols)
# Key indices go left-to-right, top-to-bottom:
# [0] [1] [2] [3] [4] Row 0: HIDARI YETI TMNT RAT KAIJU
# [5] [6] [7] [8] [9] Row 1: [empty row]
# [10] [11] [12] [13] [14] Row 2: STEP HARD SOFT PLAY [empty]

STYLES = ["hidari", "yeti", "tmnt", "rat", "kaiju"]

# Key index mapping (0-14, left-to-right, top-to-bottom)
STYLE_KEYS = {0: "hidari", 1: "yeti", 2: "tmnt", 3: "rat", 4: "kaiju"}
ACTION_KEYS = {
10: "step", # Bottom row, first
11: "hard_cut", # Bottom row, second
12: "soft_cut", # Bottom row, third
13: "play_pause", # Bottom row, fourth
}


def create_button_image(
deck, text: str, bg_color: str = "#1a1a2e", text_color: str = "#ffffff", active: bool = False
) -> bytes:
"""Create a button image with text."""
# Get the button size for this deck
image_format = deck.key_image_format()
size = (image_format["size"][0], image_format["size"][1])

# Create image
if active:
bg_color = "#4a9eff" # Highlight active style
img = Image.new("RGB", size, bg_color)
draw = ImageDraw.Draw(img)

# Try to use a nice font, fall back to default
font_size = size[0] // 5
try:
font = ImageFont.truetype("/System/Library/Fonts/Helvetica.ttc", font_size)
except OSError:
font = ImageFont.load_default()

# Center the text
bbox = draw.textbbox((0, 0), text, font=font)
text_width = bbox[2] - bbox[0]
text_height = bbox[3] - bbox[1]
x = (size[0] - text_width) // 2
y = (size[1] - text_height) // 2

draw.text((x, y), text, font=font, fill=text_color)

# Rotate 180° - Stream Deck Original has flipped orientation
img = img.rotate(180)

# Convert to the format the deck expects
img_bytes = BytesIO()
img.save(img_bytes, format="JPEG")
return img_bytes.getvalue()


class StreamDeckController:
"""Controls Scope via Stream Deck button presses."""

def __init__(self, api_url: str):
self.api_url = api_url.rstrip("/")
self.client = httpx.Client(timeout=5.0)
self.deck = None
self.current_style: str | None = None
self.is_paused: bool = False

def connect(self) -> bool:
"""Connect to Stream Deck."""
from StreamDeck.DeviceManager import DeviceManager

decks = DeviceManager().enumerate()
if not decks:
print("No Stream Deck found!")
return False

self.deck = decks[0]
self.deck.open()
try:
self.deck.reset()
except Exception as e:
print(f"Warning: Could not reset deck ({e}), continuing anyway...")
print(f"Connected: {self.deck.deck_type()} ({self.deck.key_count()} keys)")
return True

def update_buttons(self):
"""Update all button images."""
if not self.deck:
return

        # Style buttons (keys 0-4)
for key, style in STYLE_KEYS.items():
active = style == self.current_style
img = create_button_image(self.deck, style[:6].upper(), active=active)
self.deck.set_key_image(key, img)

# Action buttons (bottom row: 10, 11, 12, 13)
self.deck.set_key_image(10, create_button_image(self.deck, "STEP", bg_color="#2d3436"))
self.deck.set_key_image(11, create_button_image(self.deck, "HARD", bg_color="#d63031"))
self.deck.set_key_image(12, create_button_image(self.deck, "SOFT", bg_color="#fdcb6e", text_color="#000000"))
self.deck.set_key_image(13, create_button_image(self.deck, "PLAY" if self.is_paused else "PAUSE", bg_color="#2d3436"))

# Clear unused keys
for key in range(15):
            if key not in STYLE_KEYS and key not in ACTION_KEYS:
self.deck.set_key_image(key, create_button_image(self.deck, "", bg_color="#0d0d0d"))

def fetch_state(self):
"""Fetch current state from server."""
try:
r = self.client.get(f"{self.api_url}/api/v1/realtime/state")
if r.status_code == 200:
state = r.json()
self.current_style = state.get("active_style")
self.is_paused = state.get("paused", False)
return True
except httpx.RequestError as e:
print(f"Failed to fetch state: {e}")
return False

def set_style(self, style: str):
"""Set the active style."""
try:
r = self.client.put(f"{self.api_url}/api/v1/realtime/style", json={"name": style})
if r.status_code == 200:
print(f"Style: {style}")
self.current_style = style
self.update_buttons()
else:
print(f"Failed to set style: {r.status_code}")
except httpx.RequestError as e:
print(f"Error: {e}")

def toggle_pause(self):
"""Toggle pause/play."""
try:
endpoint = "/api/v1/realtime/run" if self.is_paused else "/api/v1/realtime/pause"
r = self.client.post(f"{self.api_url}{endpoint}")
if r.status_code == 200:
self.is_paused = not self.is_paused
print("Paused" if self.is_paused else "Running")
self.update_buttons()
except httpx.RequestError as e:
print(f"Error: {e}")

def step(self):
"""Step one frame."""
try:
r = self.client.post(f"{self.api_url}/api/v1/realtime/step")
if r.status_code == 200:
print("Stepped")
except httpx.RequestError as e:
print(f"Error: {e}")

def hard_cut(self):
"""Trigger hard cut (reset cache)."""
try:
r = self.client.post(f"{self.api_url}/api/v1/realtime/hard-cut")
if r.status_code == 200:
print("Hard cut!")
except httpx.RequestError as e:
print(f"Error: {e}")

def soft_cut(self):
"""Trigger soft cut."""
try:
r = self.client.post(f"{self.api_url}/api/v1/realtime/soft-cut")
if r.status_code == 200:
print("Soft cut")
except httpx.RequestError as e:
print(f"Error: {e}")

def on_key(self, deck, key: int, pressed: bool):
"""Handle key press."""
if not pressed: # Only act on press, not release
return

        if key in STYLE_KEYS:
            self.set_style(STYLE_KEYS[key])
            return

        # Dispatch action buttons through ACTION_KEYS so the mapping lives in one place
        action = ACTION_KEYS.get(key)
        if action == "step":
            self.step()
        elif action == "hard_cut":
            self.hard_cut()
        elif action == "soft_cut":
            self.soft_cut()
        elif action == "play_pause":
            self.toggle_pause()

def run(self):
"""Main loop."""
if not self.connect():
return 1

# Fetch initial state
if self.fetch_state():
print(f"Current style: {self.current_style}, Paused: {self.is_paused}")
else:
print("Warning: Could not fetch initial state (server may be offline)")

self.update_buttons()
self.deck.set_key_callback(self.on_key)

print("\nStream Deck ready! Press Ctrl+C to exit.")
print(" Row 1: HIDARI | YETI | TMNT | RAT | KAIJU")
print(" Row 3: STEP | HARD | SOFT | PLAY/PAUSE")

try:
while True:
time.sleep(1)
except KeyboardInterrupt:
print("\nShutting down...")
finally:
if self.deck:
try:
self.deck.reset()
except Exception:
pass # Ignore reset errors during cleanup
self.deck.close()

return 0


def main():
parser = argparse.ArgumentParser(description="Stream Deck controller for Scope")
parser.add_argument(
"--url",
default=os.environ.get("VIDEO_API_URL", "http://localhost:8000"),
help="Scope server URL (default: VIDEO_API_URL env or http://localhost:8000)",
)
args = parser.parse_args()

print(f"Connecting to: {args.url}")
controller = StreamDeckController(args.url)
sys.exit(controller.run())


if __name__ == "__main__":
main()