
[StableSteering banner]

StableSteering

Interactive steering for diffusion image generation, from a user text prompt to preference-guided refinement.

Docs Site · Quick Start · Configuration Manual · Student Tutorial · User Guide · Developer Guide

What It Is

StableSteering is a research-oriented system for interactive image generation with diffusion models.

Instead of relying on one-shot prompt rewriting, the system starts from a user text prompt, proposes multiple candidate directions, records user preferences, updates an internal steering state, and generates the next round from that evolving state.

The current repository includes:

  • the original specification and research documents
  • a runnable FastAPI-based MVP with a real GPU-backed Diffusers backend
  • Gemini-generated visual assets used to make the Markdown and HTML docs easier to scan

Why It Matters

Text-to-image generation is powerful, but creative control is still awkward in practice. Users often know which result is better before they know how to rewrite the prompt that would produce it.

StableSteering is built around that gap. It turns generation into a feedback loop:

  1. start from a text prompt
  2. generate candidate images
  3. capture user preference
  4. update steering state
  5. generate a stronger next round
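
A minimal sketch of that loop in Python, not the project's actual API; the names SteeringState, generate, and pick are invented for illustration:

from dataclasses import dataclass, field

@dataclass
class SteeringState:
    # Evolving state that biases the next round of generation.
    prompt: str
    preferences: list = field(default_factory=list)

def run_round(state, generate, pick):
    candidates = generate(state)      # propose several candidate images
    choice = pick(candidates)         # capture the user's preference
    state.preferences.append(choice)  # update the steering state
    return state                      # the next round starts from here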

That makes the project useful both as:

  • a research platform for studying human-in-the-loop steering
  • a concrete prototype for interactive generative workflows

Current MVP

The current system includes:

  • a FastAPI backend for experiments, sessions, async jobs (see the polling sketch below), replay, diagnostics, and trace reporting
  • a real Diffusers-backed runtime on GPU by default
  • a mock generator reserved strictly for tests
  • SQLite-backed local persistence
  • backend and frontend tracing with per-session HTML reports
  • browser and backend test coverage
  • a real GPU-backed example-run generator with standalone HTML output
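
Because updates run as async jobs behind the FastAPI backend, a client typically polls until a job finishes. A hedged sketch of that pattern, assuming a hypothetical /jobs/{job_id} endpoint and status values that may differ from the real API:

import time
import requests

BASE = "http://127.0.0.1:8000"

def wait_for_job(job_id, timeout=60.0):
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        job = requests.get(f"{BASE}/jobs/{job_id}").json()  # hypothetical path
        if job.get("status") in ("succeeded", "failed"):    # hypothetical fields
            return job
        time.sleep(1.0)
    raise TimeoutError(f"job {job_id} did not finish within {timeout}s")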

Example artifacts checked into the repo:

User Flow

The main workflow is prompt-first:

  1. the user opens /setup
  2. enters a text prompt
  3. optionally edits the per-session YAML configuration (see the sketch after this list)
  4. creates a session
  5. generates a round of candidate images
  6. submits explicit feedback for the active mode
  7. waits for the async update job to finish
  8. inspects replay and the saved trace report
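
For step 3, the per-session configuration is plain YAML. The real schema is documented in the Configuration Manual; the snippet below uses made-up field names purely to show the shape of an override:

import yaml  # PyYAML

override_text = """
num_candidates: 4     # images generated per round (illustrative field)
guidance_scale: 7.5   # diffusion guidance strength (illustrative field)
seed: 1234            # fix for reproducible rounds (illustrative field)
"""

override = yaml.safe_load(override_text)
print(override["num_candidates"])  # -> 4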

The normal runtime is GPU-only and uses the real Diffusers backend. If CUDA is unavailable, the app refuses to start instead of silently falling back.
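
The spirit of that fail-fast check, as a sketch rather than the project's actual startup code:

import torch

if not torch.cuda.is_available():
    raise RuntimeError(
        "CUDA is required: refusing to start rather than "
        "silently falling back to CPU."
    )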

[Runtime architecture diagram]

Getting Started

Install the project:

python -m pip install -e ".[dev,inference]"

Prepare model assets:

python scripts/setup_huggingface.py

Run the app:

python scripts/run_dev.py

Open:

http://127.0.0.1:8000

Helpful pages:

  • http://127.0.0.1:8000/setup
  • http://127.0.0.1:8000/diagnostics/view
  • http://127.0.0.1:8000/sessions/{session_id}/trace-report
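
A quick way to confirm the server is responding (a 200 status means the app came up):

curl -s -o /dev/null -w "%{http_code}\n" http://127.0.0.1:8000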

Read Next

Recommended reading order:

  1. Motivation
  2. Student Tutorial
  3. Theoretical Background
  4. System Specification
  5. System Test Specification
  6. Pre-Implementation Blueprint
  7. Quick Start
  8. Configuration Manual
  9. System Improvement Roadmap
  10. Research Improvement Roadmap

Additional docs:

Run Tests

Backend tests:

python -m pytest
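
The mock generator noted earlier is reserved for tests like these. A sketch of what a backend test might look like, with a hypothetical import path (app.main is an assumption, not the repo's real module layout):

from fastapi.testclient import TestClient

from app.main import app  # hypothetical import path

def test_setup_page_renders():
    client = TestClient(app)
    response = client.get("/setup")  # page from the User Flow above
    assert response.status_code == 200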

Browser tests:

npm install
npm run test:e2e:chrome

Headed browser debug:

npm run test:e2e:debug

Real model smoke:

python scripts/smoke_real_diffusers.py

Real end-to-end example bundle:

python scripts/create_real_e2e_example.py

Checked-in sample bundle:

Repo Guides

Per-folder documentation is available in:

Banner Asset

The README banner is stored at docs/assets/readme_banner.png.

It can be regenerated with:

python scripts/generate_readme_banner.py

The generation script expects GEMINI_API_KEY in the environment and uses the official Gemini image-generation API.
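
For example, in a POSIX shell:

export GEMINI_API_KEY="your-api-key"
python scripts/generate_readme_banner.py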

Diagrams And Illustrations

The documentation layer can include Gemini-generated illustrations to make the Markdown and published HTML easier to scan.

Current visual assets include:

They can be regenerated with:

python scripts/generate_readme_banner.py
python scripts/generate_doc_illustrations.py

The Pages builder copies these assets into the generated HTML site automatically.

Legacy Source

The original combined specification is preserved as:
