Commit 352fad0 ("autogen docs"), 1 parent 8ae94fe

4 files changed: 532 additions & 0 deletions

File tree

.github/copilot-instructions.md (140 additions, 0 deletions)
# GitHub Copilot Instructions for Stable Diffusion Server

## Project Context

This is a production-ready Stable Diffusion server supporting multiple AI models (SDXL, Flux Schnell) with a FastAPI backend and a Gradio UI. The server handles text-to-image generation, inpainting, style transfer, and cloud storage integration.

## Code Style & Patterns

### Python Standards

- Use type hints for all function parameters and return values
- Follow the existing error-handling patterns with try/except and retry logic
- Use the `torch.inference_mode()` context for all model inference operations
- Implement proper memory management with CPU offloading and cache clearing
### Model Pipeline Patterns

```python
# Always use this pattern for inference
with torch.inference_mode():
    image = pipe(
        prompt=prompt,
        num_inference_steps=n_steps,
        guidance_scale=guidance_scale,
        generator=generator,
    ).images[0]
```
### Image Processing Standards

- Convert all input images to RGB: `image.convert("RGB")`
- Use 64-pixel alignment for dimensions: `width - (width % 64)`
- Save images as WebP with `quality=85` and `optimize=True`
- Always process images through `process_image_for_stable_diffusion()` before inference
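The 64-pixel alignment rule above can be sketched as a small helper (the name `align_to_64` is illustrative, and the `max(64, ...)` floor is an added safety assumption, not from the repo):

```python
def align_to_64(width: int, height: int) -> tuple[int, int]:
    # Round each dimension down to a multiple of 64, never below 64;
    # SDXL-family models expect 64-pixel-aligned inputs.
    return max(64, width - (width % 64)), max(64, height - (height % 64))
```

Call something like this right after `image.convert("RGB")`, before resizing for inference.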
### Error Handling Requirements

- Implement retry logic that modifies the prompt on failure
- Use `shorten_prompt_for_retry()` and `remove_stopwords()` for retries
- Log warnings with attempt counts: `logger.warning(f"Failed on attempt {attempt + 1}/{retries}: {err}")`
- Detect "too bumpy" images and regenerate them
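A minimal sketch of the retry-with-prompt-modification pattern described above. The stub `shorten_prompt_for_retry` here just drops the last word; the real helpers live in this repo and do more:

```python
import logging

logger = logging.getLogger(__name__)

def shorten_prompt_for_retry(prompt: str) -> str:
    # Stand-in for the repo helper: drop the last word so each retry differs.
    return " ".join(prompt.split()[:-1]) or prompt

def generate_with_retries(generate, prompt: str, retries: int = 3):
    # `generate` is any callable that raises on failure and returns an image.
    for attempt in range(retries + 1):
        try:
            return generate(prompt)
        except Exception as err:
            if attempt >= retries:
                raise
            logger.warning(f"Failed on attempt {attempt + 1}/{retries}: {err}")
            prompt = shorten_prompt_for_retry(prompt)
```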
### API Endpoint Patterns

```python
@app.get("/endpoint_name")
async def endpoint_function(
    prompt: str,
    save_path: str = "",
    # other params with defaults
):
    # URL-encode the final path component, keeping directory separators intact
    path_components = save_path.split("/")[0:-1]
    final_name = save_path.split("/")[-1]
    save_path = "/".join(path_components + [quote_plus(final_name)])

    # Check cache first
    if check_if_blob_exists(save_path):
        return JSONResponse({"path": f"https://{BUCKET_NAME}/{BUCKET_PATH}/{save_path}"})
```
## Model Architecture Guidelines

### Pipeline Initialization

- Enable CPU offloading: `pipe.enable_model_cpu_offload()`
- Enable memory optimizations: `pipe.enable_attention_slicing()`, `pipe.enable_vae_slicing()`
- Share components between pipelines to reduce memory usage
- Set `pipe.watermark = None` to disable watermarking
### Memory Management

- Use sequential CPU offloading for production: `pipe.enable_sequential_cpu_offload()`
- Implement component sharing: `img2img.unet = pipe.unet`
- Consider Optimum Quanto quantization for memory-constrained environments
- Use `torch.Generator` with fixed seeds for reproducible results
## Cloud Storage Integration

### Upload Pattern

```python
# Always check existence before generation
if check_if_blob_exists(save_path):
    return f"https://{BUCKET_NAME}/{BUCKET_PATH}/{save_path}"

# Generate and upload
bio = create_image_from_prompt(prompt, width, height)
link = upload_to_bucket(save_path, bio, is_bytesio=True)
return link
```
### Environment Variables

- Use `STORAGE_PROVIDER` to switch between `r2` and `gcs`
- Support both `R2_ENDPOINT_URL` and `GOOGLE_APPLICATION_CREDENTIALS`
- Respect the `BUCKET_NAME` and `BUCKET_PATH` configuration
## Gradio UI Patterns

### Interface Structure

```python
with gr.Blocks() as demo:
    with gr.Row():
        with gr.Column():
            ...  # input controls
        with gr.Column():
            ...  # output displays

    # Event handlers
    button.click(function, inputs=[...], outputs=[...])
```
### Image Handling

- Use `gr.Image(tool="editor", type="pil")` for inpainting interfaces
- Save intermediate results to the `outputs/` directory with descriptive names
- Yield intermediate results for progressive display: `yield images`
## Security & Performance

### Input Validation

- Always call `shorten_too_long_text()` on prompts and save paths
- Validate image dimensions and adjust them to model requirements
- Use UUID prefixes for file names to avoid conflicts
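One way to implement the UUID-prefix rule, as a sketch (the helper name `unique_filename` is illustrative, not from the repo):

```python
import uuid

def unique_filename(name: str) -> str:
    # Prefix with a random UUID4 hex so concurrent requests writing the
    # same logical name never collide on a path.
    return f"{uuid.uuid4().hex}_{name}"
```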
### Production Considerations

- Implement progress tracking with `progress.txt` updates
- Handle CUDA memory issues with automatic server-restart logic
- Use proper timeout settings: `--timeout-keep-alive 600`
- Limit concurrency for memory management: `--limit-concurrency 4`
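A minimal heartbeat sketch for the `progress.txt` convention above (the file name matches the doc; writing a raw timestamp is an assumption about what the supervisor checks):

```python
import time
from pathlib import Path

def touch_progress(path: str = "progress.txt") -> None:
    # Write a fresh timestamp so a supervisor watching the file can tell
    # the worker is still alive during long generations.
    Path(path).write_text(str(time.time()))
```

Call this periodically inside long-running generation loops.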
## Testing & Development

### Local Development

- Use `python gradio_ui.py` for quick UI testing
- Test individual models with `python flux_schnell.py`
- Run the server locally: `uvicorn main:app --port 8000 --reload`
### Production Testing

- Test with limited concurrency settings
- Verify that cloud storage uploads work correctly
- Monitor memory usage and restart behavior
- Test error handling with malformed inputs

## Common Pitfalls to Avoid

- Never run models without CPU offloading in production
- Don't forget to convert masks to RGB for inpainting
- Always check for `None` returns from image-generation functions
- Don't skip the "too bumpy" detection; it prevents bad outputs
- Remember to update `progress.txt` to prevent supervisor timeouts

AGENTS.md (213 additions, 0 deletions)
# AGENTS.md - AI Assistant Guidelines

This file provides comprehensive guidance for AI assistants (Claude, ChatGPT, etc.) working with the Stable Diffusion Server codebase.

## Project Overview

A production-ready AI image generation server supporting multiple diffusion models, with cloud storage integration, a Gradio UI, and a FastAPI backend.

## Core Capabilities

- **Text-to-Image**: Flux Schnell and SDXL model support
- **Style Transfer**: ControlNet-guided image transformation
- **Inpainting**: Mask-based image editing with refinement
- **Cloud Storage**: R2/GCS integration with automatic caching
- **UI Components**: Gradio interfaces for local development
## Quick Start Commands

### Development Setup

```bash
# Environment setup
pip install uv && uv venv && source .venv/bin/activate
uv pip install -r requirements.txt -r dev-requirements.txt
python -c "import nltk; nltk.download('stopwords')"

# Local testing
python flux_schnell.py        # Test the Flux model
python gradio_ui.py           # Launch the UI
uvicorn main:app --port 8000  # Run the API server
```
### Production Deployment

```bash
# With environment variables
GOOGLE_APPLICATION_CREDENTIALS=secrets/google-credentials.json \
PYTHONPATH=. uvicorn --port 8000 --timeout-keep-alive 600 --workers 1 --limit-concurrency 4 main:app
```
## Key Architecture Components

### Model Pipelines (main.py:74-444)

1. **Primary SDXL Pipeline** (`pipe`) - ProteusV0.2 with LCM scheduler
2. **Flux Schnell Pipeline** (`flux_pipe`) - Fast text-to-image generation
3. **Image2Image Pipeline** (`img2img`) - Style transfer operations
4. **Inpainting Pipelines** (`inpaintpipe`, `inpaint_refiner`) - Mask-based editing
5. **ControlNet Pipelines** - Canny edge and line-guided generation

### Memory Management Strategy

- CPU offloading for all pipelines to manage GPU memory
- Component sharing between pipelines (shared UNet, VAE, encoders)
- Attention slicing and VAE slicing for efficiency
- Optional Optimum Quanto quantization support

### API Endpoints

- `/create_and_upload_image` - Text-to-image with cloud upload
- `/inpaint_and_upload_image` - Inpainting with cloud upload
- `/style_transfer_and_upload_image` - Style transfer with cloud upload
- `/style_transfer_bytes_and_upload_image` - Style transfer accepting an uploaded file
## Development Guidelines

### When Adding New Features

1. **Follow existing patterns**: Use the same error handling, retry logic, and memory management
2. **Maintain compatibility**: Ensure new features work with the existing pipeline architecture
3. **Test thoroughly**: Use both the Gradio UI and the API endpoints for validation
4. **Document changes**: Update the relevant sections in CLAUDE.md and this file

### Code Quality Standards

```python
# Always use type hints, and use inference mode for all model operations
def generate_image(prompt: str, width: int = 1024) -> Image.Image:
    with torch.inference_mode():
        return pipe(prompt=prompt).images[0]

# Implement proper error handling with retries
for attempt in range(retries + 1):
    try:
        image = generate_image(prompt)
        break
    except Exception as err:
        if attempt >= retries:
            raise
        logger.warning(f"Failed attempt {attempt + 1}/{retries}: {err}")
```
### Common Tasks and Solutions

#### Adding a New Model

1. Load the model in the main.py initialization section
2. Enable CPU offloading and memory optimizations
3. Share components with existing pipelines where possible
4. Add a corresponding API endpoint following the existing patterns
5. Test with the Gradio UI integration

#### Modifying Image Processing

1. Update `stable_diffusion_server/image_processing.py`
2. Ensure compatibility with the existing dimension requirements (64-pixel alignment)
3. Test with various input formats and sizes
4. Update error handling for edge cases

#### Cloud Storage Integration

1. Check the existing `stable_diffusion_server/bucket_api.py` implementation
2. Follow the check-exists-before-generate pattern
3. Handle both the R2 and GCS storage backends
4. Test upload/download functionality thoroughly
107+
## Troubleshooting Common Issues
108+
109+
### Memory Problems
110+
- **Black images**: Usually indicates CUDA memory issues, server auto-restarts
111+
- **OOM errors**: Reduce concurrency, enable more aggressive CPU offloading
112+
- **Slow inference**: Check if models are properly using CPU offloading
113+
114+
### Image Quality Issues
115+
- **"Too bumpy" images**: Automatic detection triggers regeneration with modified prompts
116+
- **Poor style transfer**: Ensure canny edge detection is working correctly
117+
- **Blurry outputs**: Check if proper refinement passes are enabled
118+
119+
### API/Server Issues
120+
- **Timeouts**: Update progress.txt file during long operations
121+
- **Upload failures**: Verify cloud storage credentials and bucket permissions
122+
- **Rate limiting**: Adjust `--limit-concurrency` and `--backlog` settings
123+
124+
## Environment Configuration

### Required Environment Variables

```bash
# Storage (choose one provider)
STORAGE_PROVIDER=r2|gcs
BUCKET_NAME=your-bucket-name
BUCKET_PATH=static/uploads
R2_ENDPOINT_URL=https://account.r2.cloudflarestorage.com
PUBLIC_BASE_URL=your-domain.com
GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json

# Model paths (optional)
DF11_MODEL_PATH=DFloat11/FLUX.1-schnell-DF11
CONTROLNET_LORA=black-forest-labs/flux-controlnet-line-lora
LOAD_LCM_LORA=1
```
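A sketch of how this configuration might be read at startup. The variable names match the list above, but the function name, defaults, and fail-fast check are assumptions, not the repo's actual loader:

```python
import os
from typing import Optional

def load_storage_config(env: Optional[dict] = None) -> dict:
    # Read the storage settings listed above; `env` defaults to os.environ.
    env = dict(os.environ) if env is None else env
    cfg = {
        "provider": env.get("STORAGE_PROVIDER", "gcs").lower(),
        "bucket_name": env.get("BUCKET_NAME", ""),
        "bucket_path": env.get("BUCKET_PATH", "static/uploads"),
    }
    # Fail fast when the R2 backend is selected without its endpoint.
    if cfg["provider"] == "r2" and not env.get("R2_ENDPOINT_URL"):
        raise RuntimeError("STORAGE_PROVIDER=r2 requires R2_ENDPOINT_URL")
    return cfg
```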
### Model Directory Structure

```
models/
├── ProteusV0.2/                     # Primary SDXL model
├── stable-diffusion-xl-base-1.0/    # Base SDXL model
├── lcm-lora-sdxl/                   # LCM LoRA weights
└── diffusers/
    └── controlnet-canny-sdxl-1.0/   # ControlNet model
```
## Testing and Validation

### Before Submitting Changes

1. **Run the existing tests**: `pytest -q`
2. **Check code style**: `flake8`
3. **Test UI functionality**: Launch `python gradio_ui.py` and verify all features
4. **Test API endpoints**: Send requests to the key endpoints and verify the responses
5. **Check memory usage**: Monitor GPU/CPU usage during generation

### Integration Testing

```bash
# Test image generation
curl "http://localhost:8000/create_and_upload_image?prompt=test&save_path=test.webp"

# Test style transfer
curl -X POST "http://localhost:8000/style_transfer_bytes_and_upload_image" \
  -F "prompt=anime style" -F "image_file=@test.jpg" -F "save_path=output.webp"
```
## Performance Optimization

### Memory Optimization

- Use `enable_sequential_cpu_offload()` for the lowest memory usage
- Share model components between pipelines
- Consider quantization for memory-constrained environments
- Monitor and tune batch sizes for optimal throughput

### Speed Optimization

- Use Flux Schnell for the fastest generation (4-8 steps)
- Enable the LCM LoRA for SDXL speed improvements
- Implement proper caching with `check_if_blob_exists()`
- Use appropriate guidance scales (0.0 for Flux, 7+ for SDXL)
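The guidance-scale rule of thumb above could be captured in a small helper. This is a sketch; the values come from the bullet, and the function and model keys are illustrative, not from the repo:

```python
def default_guidance_scale(model: str) -> float:
    # Flux Schnell is guidance-distilled, so classifier-free guidance is
    # effectively disabled (0.0); SDXL-style models need a real CFG scale.
    defaults = {"flux-schnell": 0.0, "sdxl": 7.0}
    if model not in defaults:
        raise ValueError(f"unknown model: {model}")
    return defaults[model]
```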
## Security Considerations

### Input Validation

- Always validate and sanitize prompts using `shorten_too_long_text()`
- Validate image dimensions and file formats
- Use UUID prefixes for generated filenames to prevent conflicts

### Production Security

- Never expose cloud storage credentials in code
- Use proper environment variable management
- Implement rate limiting and request validation
- Monitor for suspicious usage patterns
## Contributing Guidelines

### Pull Request Checklist

- [ ] Code follows the existing patterns and style
- [ ] New features include appropriate error handling
- [ ] Memory management is properly implemented
- [ ] Tests pass and new functionality is tested
- [ ] Documentation is updated (CLAUDE.md, this file, docstrings)
- [ ] No sensitive information is committed

### Code Review Focus Areas

1. **Memory safety**: Proper pipeline management and GPU memory usage
2. **Error handling**: Robust retry logic and graceful degradation
3. **API consistency**: Following the established endpoint patterns
4. **Performance impact**: Changes don't negatively affect generation speed
5. **Security**: Input validation and credential management
