# AGENTS.md - AI Assistant Guidelines

This file provides comprehensive guidance for AI assistants (Claude, ChatGPT, etc.) working with the Stable Diffusion Server codebase.

## Project Overview
A production-ready AI image generation server supporting multiple diffusion models, with cloud storage integration, a Gradio UI, and a FastAPI backend.

## Core Capabilities
- **Text-to-Image**: Flux Schnell and SDXL model support
- **Style Transfer**: ControlNet-guided image transformation
- **Inpainting**: Mask-based image editing with refinement
- **Cloud Storage**: R2/GCS integration with automatic caching
- **UI Components**: Gradio interfaces for local development

## Quick Start Commands

### Development Setup
```bash
# Environment setup
pip install uv && uv venv && source .venv/bin/activate
uv pip install -r requirements.txt -r dev-requirements.txt
python -c "import nltk; nltk.download('stopwords')"

# Local testing
python flux_schnell.py          # Test Flux model
python gradio_ui.py             # Launch UI
uvicorn main:app --port 8000    # Run API server
```

### Production Deployment
```bash
# With environment variables
GOOGLE_APPLICATION_CREDENTIALS=secrets/google-credentials.json \
PYTHONPATH=. uvicorn --port 8000 --timeout-keep-alive 600 --workers 1 --limit-concurrency 4 main:app
```

## Key Architecture Components

### Model Pipelines (main.py:74-444)
1. **Primary SDXL Pipeline** (`pipe`) - ProteusV0.2 with LCM scheduler
2. **Flux Schnell Pipeline** (`flux_pipe`) - Fast text-to-image generation
3. **Image2Image Pipeline** (`img2img`) - Style transfer operations
4. **Inpainting Pipelines** (`inpaintpipe`, `inpaint_refiner`) - Mask-based editing
5. **ControlNet Pipelines** - Canny edge and line-guided generation

### Memory Management Strategy
- CPU offloading for all pipelines to manage GPU memory
- Component sharing between pipelines (shared UNet, VAE, encoders)
- Attention slicing and VAE slicing for efficiency
- Optional Optimum Quanto quantization support
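The optimizations above map onto standard `diffusers` pipeline methods. A minimal sketch, not the server's actual loader — the model path and pipeline class are assumptions, and the import is deferred inside the function so the snippet stands alone:

```python
def build_pipeline(model_dir: str = "models/ProteusV0.2"):
    """Load an SDXL pipeline with the memory optimizations listed above.

    Sketch only: model_dir and the pipeline class are assumptions.
    """
    # Deferred import so the sketch can be read without diffusers installed.
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(model_dir)
    pipe.enable_model_cpu_offload()  # move components to GPU only while in use
    pipe.enable_attention_slicing()  # lower peak memory during attention
    pipe.enable_vae_slicing()        # decode latents in slices
    return pipe
```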

### API Endpoints
- `/create_and_upload_image` - Text-to-image with cloud upload
- `/inpaint_and_upload_image` - Inpainting with cloud upload
- `/style_transfer_and_upload_image` - Style transfer with cloud upload
- `/style_transfer_bytes_and_upload_image` - File upload support

## Development Guidelines

### When Adding New Features
1. **Follow existing patterns**: Use the same error handling, retry logic, and memory management
2. **Maintain compatibility**: Ensure new features work with the existing pipeline architecture
3. **Test thoroughly**: Use both the Gradio UI and the API endpoints for validation
4. **Document changes**: Update relevant sections in CLAUDE.md and this file

### Code Quality Standards
```python
# Always use type hints
def generate_image(prompt: str, width: int = 1024) -> Image.Image:
    ...

# Use inference mode for all model operations
with torch.inference_mode():
    image = pipe(prompt=prompt).images[0]

# Implement proper error handling with retries
for attempt in range(retries + 1):
    try:
        image = pipe(prompt=prompt).images[0]  # generation logic
        break
    except Exception as err:
        if attempt >= retries:
            raise
        logger.warning(f"Attempt {attempt + 1}/{retries + 1} failed: {err}")
```

### Common Tasks and Solutions

#### Adding a New Model
1. Load the model in the main.py initialization section
2. Enable CPU offloading and memory optimizations
3. Share components with existing pipelines where possible
4. Add a corresponding API endpoint following existing patterns
5. Test with Gradio UI integration
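Steps 1-3 can be sketched with `diffusers` component sharing. A hypothetical example, not the server's actual code — `pipe` is assumed to be the already-loaded SDXL pipeline, and the import is deferred so the sketch stands alone:

```python
def add_img2img_pipeline(pipe):
    """Derive an image-to-image pipeline reusing already-loaded components.

    Sketch only: `pipe` is assumed to be a loaded SDXL text-to-image pipeline.
    """
    from diffusers import StableDiffusionXLImg2ImgPipeline

    # Reuse the UNet, VAE, and text encoders instead of loading a second copy.
    img2img = StableDiffusionXLImg2ImgPipeline(**pipe.components)
    img2img.enable_model_cpu_offload()
    return img2img
```

Step 4 then wraps the new pipeline in an endpoint shaped like the existing `*_and_upload_image` routes.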

#### Modifying Image Processing
1. Update `stable_diffusion_server/image_processing.py`
2. Ensure compatibility with existing dimension requirements (64-pixel alignment)
3. Test with various input formats and sizes
4. Update error handling for edge cases
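The 64-pixel alignment rule from step 2 can be expressed as a small helper (hypothetical, not the actual `image_processing.py` implementation):

```python
def align_dimensions(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round dimensions down to the nearest multiple of 64, with a floor of 64.

    SDXL-family models expect width and height divisible by 64.
    """
    aligned_w = max(multiple, (width // multiple) * multiple)
    aligned_h = max(multiple, (height // multiple) * multiple)
    return aligned_w, aligned_h
```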

#### Cloud Storage Integration
1. Check the existing `stable_diffusion_server/bucket_api.py` implementation
2. Follow the check-exists-before-generate pattern
3. Handle both R2 and GCS storage backends
4. Test upload/download functionality thoroughly
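The check-exists-before-generate pattern in step 2 can be sketched with injected callables standing in for the real `bucket_api` functions (all names here are hypothetical):

```python
def get_or_generate(save_path: str, generate, exists, upload) -> str:
    """Return the stored object path, generating only on a cache miss.

    `generate`, `exists`, and `upload` stand in for the real pipeline and
    bucket functions, so the caching logic is shown in isolation.
    """
    if exists(save_path):  # check-exists-before-generate: skip costly work
        return save_path
    image_bytes = generate()
    upload(save_path, image_bytes)
    return save_path
```

The benefit: repeated requests for the same `save_path` hit storage instead of re-running the model.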

## Troubleshooting Common Issues

### Memory Problems
- **Black images**: Usually indicate CUDA memory issues; the server auto-restarts when this is detected
- **OOM errors**: Reduce concurrency and enable more aggressive CPU offloading
- **Slow inference**: Check that models are actually using CPU offloading

### Image Quality Issues
- **"Too bumpy" images**: Automatic detection triggers regeneration with modified prompts
- **Poor style transfer**: Ensure canny edge detection is working correctly
- **Blurry outputs**: Check that the refinement passes are enabled

### API/Server Issues
- **Timeouts**: Keep the `progress.txt` file updated during long operations
- **Upload failures**: Verify cloud storage credentials and bucket permissions
- **Rate limiting**: Adjust the `--limit-concurrency` and `--backlog` settings
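The timeout mitigation above amounts to a heartbeat written to `progress.txt`. A sketch of the idea; the real server's file format may differ:

```python
import time
from pathlib import Path

def touch_progress(progress_file: str = "progress.txt", message: str = "working") -> None:
    """Record a heartbeat so a long-running generation is not treated as hung.

    Call this periodically from inside long operations; a watchdog can then
    check the file's contents or mtime. The format here is an assumption.
    """
    Path(progress_file).write_text(f"{message} {time.time()}\n")
```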

## Environment Configuration

### Required Environment Variables
```bash
# Storage (choose one)
STORAGE_PROVIDER=r2|gcs
BUCKET_NAME=your-bucket-name
BUCKET_PATH=static/uploads
R2_ENDPOINT_URL=https://account.r2.cloudflarestorage.com
PUBLIC_BASE_URL=your-domain.com
GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json

# Model paths (optional)
DF11_MODEL_PATH=DFloat11/FLUX.1-schnell-DF11
CONTROLNET_LORA=black-forest-labs/flux-controlnet-line-lora
LOAD_LCM_LORA=1
```

### Model Directory Structure
```
models/
├── ProteusV0.2/                      # Primary SDXL model
├── stable-diffusion-xl-base-1.0/     # Base SDXL model
├── lcm-lora-sdxl/                    # LCM LoRA weights
└── diffusers/
    └── controlnet-canny-sdxl-1.0/    # ControlNet model
```

## Testing and Validation

### Before Submitting Changes
1. **Run existing tests**: `pytest -q`
2. **Check code style**: `flake8`
3. **Test UI functionality**: Launch `python gradio_ui.py` and verify all features
4. **Test API endpoints**: Send requests to key endpoints and verify responses
5. **Memory usage**: Monitor GPU/CPU usage during generation

### Integration Testing
```bash
# Test image generation
curl "http://localhost:8000/create_and_upload_image?prompt=test&save_path=test.webp"

# Test style transfer
curl -X POST "http://localhost:8000/style_transfer_bytes_and_upload_image" \
  -F "prompt=anime style" -F "image_file=@test.jpg" -F "save_path=output.webp"
```

## Performance Optimization

### Memory Optimization
- Use `enable_sequential_cpu_offload()` for the lowest memory usage
- Share model components between pipelines
- Consider quantization for memory-constrained environments
- Monitor and tune batch sizes for optimal throughput

### Speed Optimization
- Use Flux Schnell for the fastest generation (4-8 steps)
- Enable LCM LoRA for SDXL speed improvements
- Implement proper caching with `check_if_blob_exists()`
- Use appropriate guidance scales (0.0 for Flux, 7+ for SDXL)
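The per-model settings above can be captured in a small dispatch helper. A sketch: the guidance scales come from the notes above, while the step counts for SDXL and the model names are assumptions:

```python
def sampling_settings(model: str) -> dict:
    """Pick inference steps and guidance scale per model family.

    Guidance scales follow the guidelines above; the SDXL step count and
    the model name strings are illustrative assumptions.
    """
    if model == "flux-schnell":
        # Flux Schnell is distilled: few steps, no classifier-free guidance.
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    if model == "sdxl":
        return {"num_inference_steps": 30, "guidance_scale": 7.0}
    raise ValueError(f"unknown model: {model}")
```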

## Security Considerations

### Input Validation
- Always validate and sanitize prompts using `shorten_too_long_text()`
- Validate image dimensions and file formats
- Use UUID prefixes for generated filenames to prevent conflicts
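The UUID-prefix convention might look like this (a hypothetical helper, not the server's actual naming code):

```python
import uuid
from pathlib import Path

def unique_save_path(requested: str) -> str:
    """Prefix the requested filename with a UUID to prevent collisions.

    Illustrative only; the real server's naming scheme may differ.
    """
    p = Path(requested)
    return str(p.with_name(f"{uuid.uuid4().hex}-{p.name}"))
```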

### Production Security
- Never expose cloud storage credentials in code
- Use proper environment variable management
- Implement rate limiting and request validation
- Monitor for suspicious usage patterns

## Contributing Guidelines

### Pull Request Checklist
- [ ] Code follows existing patterns and style
- [ ] New features include appropriate error handling
- [ ] Memory management is properly implemented
- [ ] Tests pass and new functionality is tested
- [ ] Documentation is updated (CLAUDE.md, this file, docstrings)
- [ ] No sensitive information is committed

### Code Review Focus Areas
1. **Memory safety**: Proper pipeline management and GPU memory usage
2. **Error handling**: Robust retry logic and graceful degradation
3. **API consistency**: Following established endpoint patterns
4. **Performance impact**: Changes don't negatively affect generation speed
5. **Security**: Input validation and credential management