
[StableSteering banner]

StableSteering

Interactive steering for diffusion image generation, from a user text prompt to preference-guided refinement.

Docs Site · Quick Start · Configuration Manual · Student Tutorial · User Guide · Developer Guide

What It Is

StableSteering is a research-oriented system for interactive image generation with diffusion models.

Instead of relying on one-shot prompt rewriting, the system starts from a user text prompt, proposes multiple candidate directions, records user preferences, updates an internal steering state, and generates the next round from that evolving state.

The current repository includes:

  • the original specification and research documents
  • a runnable FastAPI-based MVP with a real GPU-backed Diffusers backend
  • Gemini-generated visual assets used to make the Markdown and HTML docs easier to scan

Why It Matters

Text-to-image generation is powerful, but creative control is still awkward in practice. Users often know which result is better before they know how to rewrite the prompt that would produce it.

StableSteering is built around that gap. It turns generation into a feedback loop:

  1. start from a text prompt
  2. generate candidate images
  3. capture user preference
  4. update steering state
  5. generate a stronger next round
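
A minimal sketch of that loop in Python, not the project's actual API; the names SteeringState, generate, and pick are invented for illustration:

from dataclasses import dataclass, field

@dataclass
class SteeringState:
    # Evolving state that biases the next round of generation.
    prompt: str
    preferences: list = field(default_factory=list)

def run_round(state, generate, pick):
    candidates = generate(state)      # propose several candidate images
    choice = pick(candidates)         # capture the user's preference
    state.preferences.append(choice)  # update the steering state
    return state                      # the next round starts from here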

That makes the project useful both as:

  • a research platform for studying human-in-the-loop steering
  • a concrete prototype for interactive generative workflows

Current MVP

The current system includes:

  • a FastAPI backend for experiments, sessions, async jobs (see the polling sketch below), replay, diagnostics, and trace reporting
  • a real Diffusers-backed runtime on GPU by default
  • a mock generator reserved strictly for tests
  • SQLite-backed local persistence
  • backend and frontend tracing with per-session HTML reports
  • browser and backend test coverage
  • a real GPU-backed example-run generator with standalone HTML output
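
Because updates run as async jobs behind the FastAPI backend, a client typically polls until a job finishes. A hedged sketch of that pattern, assuming a hypothetical /jobs/{job_id} endpoint and status values that may differ from the real API:

import time
import requests

BASE = "http://127.0.0.1:8000"

def wait_for_job(job_id, timeout=60.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = requests.get(f"{BASE}/jobs/{job_id}").json()  # hypothetical path
        if job.get("status") in ("succeeded", "failed"):    # hypothetical fields
            return job
        time.sleep(1.0)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")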

Example artifacts checked into the repo:

User Flow

The main workflow is prompt-first:

  1. the user opens /setup
  2. enters a text prompt
  3. optionally edits the per-session YAML configuration (see the sketch after this list)
  4. creates a session
  5. generates a round of candidate images
  6. submits explicit feedback for the active mode
  7. waits for the async update job to finish
  8. inspects replay and the saved trace report
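
For step 3, the per-session configuration is plain YAML. The real schema is documented in the Configuration Manual; the snippet below uses made-up field names purely to show the shape of an override:

import yaml  # PyYAML

override_text = """
num_candidates: 4     # images generated per round (illustrative field)
guidance_scale: 7.5   # diffusion guidance strength (illustrative field)
seed: 1234            # fix for reproducible rounds (illustrative field)
"""

override = yaml.safe_load(override_text)
print(override["num_candidates"])  # -> 4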

The normal runtime is GPU-only and uses the real Diffusers backend. If CUDA is unavailable, the app refuses to start instead of silently falling back.
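
The spirit of that fail-fast check, as a sketch rather than the project's actual startup code:

import torch

if not torch.cuda.is_available():
    raise RuntimeError(
        "CUDA is required: refusing to start rather than "
        "silently falling back to CPU."
    )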

[Runtime architecture diagram]

Getting Started

Install the project:

python -m pip install -e ".[dev,inference]"

Prepare model assets:

python scripts/setup_huggingface.py

Run the app:

python scripts/run_dev.py

Open:

http://127.0.0.1:8000

Helpful pages:

  • http://127.0.0.1:8000/setup
  • http://127.0.0.1:8000/diagnostics/view
  • http://127.0.0.1:8000/sessions/{session_id}/trace-report
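
A quick way to confirm the server is responding (a 200 status means the app came up):

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000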

Read Next

Recommended reading order:

  1. Motivation
  2. Student Tutorial
  3. Theoretical Background
  4. System Specification
  5. System Test Specification
  6. Pre-Implementation Blueprint
  7. Quick Start
  8. Configuration Manual
  9. System Improvement Roadmap
  10. Research Improvement Roadmap

Additional docs:

Run Tests

Backend tests:

python -m pytest
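
The mock generator noted earlier is reserved for tests like these. A sketch of what a backend test might look like, with a hypothetical import path (app.main is an assumption, not the repo's real module layout):

from fastapi.testclient import TestClient

from app.main import app  # hypothetical import path

def test_setup_page_renders():
    client = TestClient(app)
    response = client.get("/setup")  # page from the User Flow above
    assert response.status_code == 200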

Browser tests:

npm install
npm run test:e2e:chrome

Headed browser debug:

npm run test:e2e:debug

Real model smoke:

python scripts/smoke_real_diffusers.py

Real end-to-end example bundle:

python scripts/create_real_e2e_example.py

Checked-in sample bundle:

Repo Guides

Per-folder documentation is available in:

Banner Asset

The README banner is stored at docs/assets/readme_banner.png.

It can be regenerated with:

python scripts/generate_readme_banner.py

The generation script expects GEMINI_API_KEY in the environment and uses the official Gemini image-generation API.
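
For example, in a POSIX shell:

export GEMINI_API_KEY="your-api-key"
python scripts/generate_readme_banner.py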

Diagrams And Illustrations

The documentation layer can include Gemini-generated illustrations to make the Markdown and published HTML easier to scan.

Current visual assets include:

They can be regenerated with:

python scripts/generate_readme_banner.py
python scripts/generate_doc_illustrations.py

The Pages builder copies these assets into the generated HTML site automatically.

Legacy Source

The original combined specification is preserved as:
