# AGENTS.md - AI Assistant Guidelines

This file provides comprehensive guidance for AI assistants (Claude, ChatGPT, etc.) working with the Stable Diffusion Server codebase.

## Project Overview
A production-ready AI image generation server supporting multiple diffusion models, with cloud storage integration, a Gradio UI, and a FastAPI backend.

## Core Capabilities
- **Text-to-Image**: Flux Schnell and SDXL model support
- **Style Transfer**: ControlNet-guided image transformation
- **Inpainting**: Mask-based image editing with refinement
- **Cloud Storage**: R2/GCS integration with automatic caching
- **UI Components**: Gradio interfaces for local development

## Quick Start Commands

### Development Setup
```bash
# Environment setup
pip install uv && uv venv && source .venv/bin/activate
uv pip install -r requirements.txt -r dev-requirements.txt
python -c "import nltk; nltk.download('stopwords')"

# Local testing
python flux_schnell.py          # Test Flux model
python gradio_ui.py             # Launch UI
uvicorn main:app --port 8000    # Run API server
```

### Production Deployment
```bash
# With environment variables
GOOGLE_APPLICATION_CREDENTIALS=secrets/google-credentials.json \
PYTHONPATH=. uvicorn --port 8000 --timeout-keep-alive 600 --workers 1 --limit-concurrency 4 main:app
```

## Key Architecture Components

### Model Pipelines (main.py:74-444)
1. **Primary SDXL Pipeline** (`pipe`) - ProteusV0.2 with LCM scheduler
2. **Flux Schnell Pipeline** (`flux_pipe`) - Fast text-to-image generation
3. **Image2Image Pipeline** (`img2img`) - Style transfer operations
4. **Inpainting Pipelines** (`inpaintpipe`, `inpaint_refiner`) - Mask-based editing
5. **ControlNet Pipelines** - Canny edge and line-guided generation

### Memory Management Strategy
- CPU offloading for all pipelines to manage GPU memory
- Component sharing between pipelines (shared UNet, VAE, encoders)
- Attention slicing and VAE slicing for efficiency
- Optional Optimum Quanto quantization support
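The optimizations above map onto standard `diffusers` pipeline methods. A minimal sketch, not the server's actual loader — the model path and pipeline class are assumptions, and the import is deferred inside the function so the snippet stands alone:

```python
def build_pipeline(model_dir: str = "models/ProteusV0.2"):
    """Load an SDXL pipeline with the memory optimizations listed above.

    Sketch only: model_dir and the pipeline class are assumptions.
    """
    # Deferred import so the sketch can be read without diffusers installed.
    from diffusers import StableDiffusionXLPipeline

    pipe = StableDiffusionXLPipeline.from_pretrained(model_dir)
    pipe.enable_model_cpu_offload()  # move components to GPU only while in use
    pipe.enable_attention_slicing()  # lower peak memory during attention
    pipe.enable_vae_slicing()        # decode latents in slices
    return pipe
```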

### API Endpoints
- `/create_and_upload_image` - Text-to-image with cloud upload
- `/inpaint_and_upload_image` - Inpainting with cloud upload
- `/style_transfer_and_upload_image` - Style transfer with cloud upload
- `/style_transfer_bytes_and_upload_image` - File upload support

## Development Guidelines

### When Adding New Features
1. **Follow existing patterns**: Use the same error handling, retry logic, and memory management
2. **Maintain compatibility**: Ensure new features work with the existing pipeline architecture
3. **Test thoroughly**: Use both the Gradio UI and the API endpoints for validation
4. **Document changes**: Update relevant sections in CLAUDE.md and this file

### Code Quality Standards
```python
# Always use type hints
def generate_image(prompt: str, width: int = 1024) -> Image.Image:
    ...

# Use inference mode for all model operations
with torch.inference_mode():
    image = pipe(prompt=prompt).images[0]

# Implement proper error handling with retries
for attempt in range(retries + 1):
    try:
        image = pipe(prompt=prompt).images[0]  # generation logic
        break
    except Exception as err:
        if attempt >= retries:
            raise
        logger.warning(f"Attempt {attempt + 1}/{retries + 1} failed: {err}")
```

### Common Tasks and Solutions

#### Adding a New Model
1. Load the model in the main.py initialization section
2. Enable CPU offloading and memory optimizations
3. Share components with existing pipelines where possible
4. Add a corresponding API endpoint following existing patterns
5. Test with Gradio UI integration
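Steps 1-3 can be sketched with `diffusers` component sharing. A hypothetical example, not the server's actual code — `pipe` is assumed to be the already-loaded SDXL pipeline, and the import is deferred so the sketch stands alone:

```python
def add_img2img_pipeline(pipe):
    """Derive an image-to-image pipeline reusing already-loaded components.

    Sketch only: `pipe` is assumed to be a loaded SDXL text-to-image pipeline.
    """
    from diffusers import StableDiffusionXLImg2ImgPipeline

    # Reuse the UNet, VAE, and text encoders instead of loading a second copy.
    img2img = StableDiffusionXLImg2ImgPipeline(**pipe.components)
    img2img.enable_model_cpu_offload()
    return img2img
```

Step 4 then wraps the new pipeline in an endpoint shaped like the existing `*_and_upload_image` routes.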

#### Modifying Image Processing
1. Update `stable_diffusion_server/image_processing.py`
2. Ensure compatibility with existing dimension requirements (64-pixel alignment)
3. Test with various input formats and sizes
4. Update error handling for edge cases
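The 64-pixel alignment rule from step 2 can be expressed as a small helper (hypothetical, not the actual `image_processing.py` implementation):

```python
def align_dimensions(width: int, height: int, multiple: int = 64) -> tuple[int, int]:
    """Round dimensions down to the nearest multiple of 64, with a floor of 64.

    SDXL-family models expect width and height divisible by 64.
    """
    aligned_w = max(multiple, (width // multiple) * multiple)
    aligned_h = max(multiple, (height // multiple) * multiple)
    return aligned_w, aligned_h
```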

#### Cloud Storage Integration
1. Check the existing `stable_diffusion_server/bucket_api.py` implementation
2. Follow the check-exists-before-generate pattern
3. Handle both R2 and GCS storage backends
4. Test upload/download functionality thoroughly
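The check-exists-before-generate pattern in step 2 can be sketched with injected callables standing in for the real `bucket_api` functions (all names here are hypothetical):

```python
def get_or_generate(save_path: str, generate, exists, upload) -> str:
    """Return the stored object path, generating only on a cache miss.

    `generate`, `exists`, and `upload` stand in for the real pipeline and
    bucket functions, so the caching logic is shown in isolation.
    """
    if exists(save_path):  # check-exists-before-generate: skip costly work
        return save_path
    image_bytes = generate()
    upload(save_path, image_bytes)
    return save_path
```

The benefit: repeated requests for the same `save_path` hit storage instead of re-running the model.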

## Troubleshooting Common Issues

### Memory Problems
- **Black images**: Usually indicate CUDA memory issues; the server auto-restarts when this is detected
- **OOM errors**: Reduce concurrency and enable more aggressive CPU offloading
- **Slow inference**: Check that models are actually using CPU offloading

### Image Quality Issues
- **"Too bumpy" images**: Automatic detection triggers regeneration with modified prompts
- **Poor style transfer**: Ensure canny edge detection is working correctly
- **Blurry outputs**: Check that the refinement passes are enabled

### API/Server Issues
- **Timeouts**: Keep the `progress.txt` file updated during long operations
- **Upload failures**: Verify cloud storage credentials and bucket permissions
- **Rate limiting**: Adjust the `--limit-concurrency` and `--backlog` settings
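The timeout mitigation above amounts to a heartbeat written to `progress.txt`. A sketch of the idea; the real server's file format may differ:

```python
import time
from pathlib import Path

def touch_progress(progress_file: str = "progress.txt", message: str = "working") -> None:
    """Record a heartbeat so a long-running generation is not treated as hung.

    Call this periodically from inside long operations; a watchdog can then
    check the file's contents or mtime. The format here is an assumption.
    """
    Path(progress_file).write_text(f"{message} {time.time()}\n")
```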

## Environment Configuration

### Required Environment Variables
```bash
# Storage (choose one)
STORAGE_PROVIDER=r2|gcs
BUCKET_NAME=your-bucket-name
BUCKET_PATH=static/uploads
R2_ENDPOINT_URL=https://account.r2.cloudflarestorage.com
PUBLIC_BASE_URL=your-domain.com
GOOGLE_APPLICATION_CREDENTIALS=path/to/credentials.json

# Model paths (optional)
DF11_MODEL_PATH=DFloat11/FLUX.1-schnell-DF11
CONTROLNET_LORA=black-forest-labs/flux-controlnet-line-lora
LOAD_LCM_LORA=1
```

### Model Directory Structure
```
models/
├── ProteusV0.2/                      # Primary SDXL model
├── stable-diffusion-xl-base-1.0/     # Base SDXL model
├── lcm-lora-sdxl/                    # LCM LoRA weights
└── diffusers/
    └── controlnet-canny-sdxl-1.0/    # ControlNet model
```

## Testing and Validation

### Before Submitting Changes
1. **Run existing tests**: `pytest -q`
2. **Check code style**: `flake8`
3. **Test UI functionality**: Launch `python gradio_ui.py` and verify all features
4. **Test API endpoints**: Send requests to key endpoints and verify responses
5. **Memory usage**: Monitor GPU/CPU usage during generation

### Integration Testing
```bash
# Test image generation
curl "http://localhost:8000/create_and_upload_image?prompt=test&save_path=test.webp"

# Test style transfer
curl -X POST "http://localhost:8000/style_transfer_bytes_and_upload_image" \
  -F "prompt=anime style" -F "image_file=@test.jpg" -F "save_path=output.webp"
```

## Performance Optimization

### Memory Optimization
- Use `enable_sequential_cpu_offload()` for the lowest memory usage
- Share model components between pipelines
- Consider quantization for memory-constrained environments
- Monitor and tune batch sizes for optimal throughput

### Speed Optimization
- Use Flux Schnell for the fastest generation (4-8 steps)
- Enable LCM LoRA for SDXL speed improvements
- Implement proper caching with `check_if_blob_exists()`
- Use appropriate guidance scales (0.0 for Flux, 7+ for SDXL)
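The per-model settings above can be captured in a small dispatch helper. A sketch: the guidance scales come from the notes above, while the step counts for SDXL and the model names are assumptions:

```python
def sampling_settings(model: str) -> dict:
    """Pick inference steps and guidance scale per model family.

    Guidance scales follow the guidelines above; the SDXL step count and
    the model name strings are illustrative assumptions.
    """
    if model == "flux-schnell":
        # Flux Schnell is distilled: few steps, no classifier-free guidance.
        return {"num_inference_steps": 4, "guidance_scale": 0.0}
    if model == "sdxl":
        return {"num_inference_steps": 30, "guidance_scale": 7.0}
    raise ValueError(f"unknown model: {model}")
```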

## Security Considerations

### Input Validation
- Always validate and sanitize prompts using `shorten_too_long_text()`
- Validate image dimensions and file formats
- Use UUID prefixes for generated filenames to prevent conflicts
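The UUID-prefix convention might look like this (a hypothetical helper, not the server's actual naming code):

```python
import uuid
from pathlib import Path

def unique_save_path(requested: str) -> str:
    """Prefix the requested filename with a UUID to prevent collisions.

    Illustrative only; the real server's naming scheme may differ.
    """
    p = Path(requested)
    return str(p.with_name(f"{uuid.uuid4().hex}-{p.name}"))
```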

### Production Security
- Never expose cloud storage credentials in code
- Use proper environment variable management
- Implement rate limiting and request validation
- Monitor for suspicious usage patterns

## Contributing Guidelines

### Pull Request Checklist
- [ ] Code follows existing patterns and style
- [ ] New features include appropriate error handling
- [ ] Memory management is properly implemented
- [ ] Tests pass and new functionality is tested
- [ ] Documentation is updated (CLAUDE.md, this file, docstrings)
- [ ] No sensitive information is committed

### Code Review Focus Areas
1. **Memory safety**: Proper pipeline management and GPU memory usage
2. **Error handling**: Robust retry logic and graceful degradation
3. **API consistency**: Following established endpoint patterns
4. **Performance impact**: Changes don't negatively affect generation speed
5. **Security**: Input validation and credential management