| 1 | +# MultiGpuHelper |
| 2 | + |
| 3 | +A production-ready C# library for scheduling compute jobs across multiple GPUs. Designed for hobby projects involving AI inference, rendering, or other GPU-accelerated workloads. |
| 4 | + |
| 5 | +## Features |
| 6 | + |
| 7 | +- **Multi-GPU Support**: Distribute work across NVIDIA GPUs (extensible to other vendors through `IGpuProbeProvider`) |
| 8 | +- **Device Discovery**: Automatic GPU detection via `nvidia-smi` with graceful fallback |
| 9 | +- **Selection Policies**: Round-robin, most-available-VRAM, or explicit device targeting |
| 10 | +- **VRAM Budgeting**: Soft-reservation system with per-device limits |
| 11 | +- **Concurrency Control**: Per-GPU semaphores to limit parallel job execution |
| 12 | +- **Async-First**: Built on async/await for responsive applications |
| 13 | +- **Cross-Platform**: Targets .NET Standard 2.0 for .NET Framework and .NET Core/.NET compatibility |
| 14 | + |
| 15 | +## Installation |
| 16 | + |
| 17 | +### Via NuGet |
| 18 | + |
| 19 | +```bash |
| 20 | +dotnet add package MultiGpuHelper |
| 21 | +``` |
| 22 | + |
| 23 | +### From Source |
| 24 | + |
| 25 | +```bash |
| 26 | +git clone https://github.com/Vanderhell/MultiGpuHelper.git |
| 27 | +cd MultiGpuHelper |
| 28 | +dotnet build |
| 29 | +``` |
| 30 | + |
| 31 | +## Quick Start |
| 32 | + |
| 33 | +### Auto-Detect GPUs |
| 34 | + |
| 35 | +```csharp |
| 36 | +using MultiGpuHelper.Management; |
| 37 | +using MultiGpuHelper.Dispatching; |
| 38 | +using MultiGpuHelper.Models; |
| 39 | + |
| 40 | +var manager = new GpuManager(); |
| 41 | +await manager.InitializeFromProbeAsync(); // Detects GPUs via nvidia-smi |
| 42 | + |
| 43 | +var dispatcher = new GpuDispatcher(manager); |
| 44 | + |
| 45 | +// Run work on any available GPU |
| 46 | +var result = await dispatcher.RunAsync( |
| 47 | + deviceId => { |
| 48 | + Console.WriteLine($"Running on GPU {deviceId}"); |
| 49 | + return Task.FromResult(42); |
| 50 | + }, |
| 51 | + GpuPolicy.RoundRobin |
| 52 | +); |
| 53 | +``` |
| 54 | + |
| 55 | +### Manual Registration |
| 56 | + |
| 57 | +```csharp |
| 58 | +using MultiGpuHelper.Management; |
| 59 | +using MultiGpuHelper.Utilities; |
| 60 | + |
| 61 | +var builder = new GpuRegistrationBuilder(); |
| 62 | +builder |
| 63 | + .AddDevice(0, "NVIDIA RTX 4090", Size.GiB(24)) |
| 64 | + .ConfigureDevice(0, budgetBytes: Size.GiB(20), maxConcurrentJobs: 2) |
| 65 | + .AddDevice(1, "NVIDIA RTX 4080", Size.GiB(16)) |
| 66 | + .ConfigureDevice(1, budgetBytes: Size.GiB(14), maxConcurrentJobs: 1); |
| 67 | + |
| 68 | +var manager = builder.Build(); |
| 69 | +var dispatcher = new GpuDispatcher(manager); |
| 70 | + |
| 71 | +// Work items will be distributed according to policy |
| 72 | +``` |
| 73 | + |
| 74 | +## Selection Policies |
| 75 | + |
| 76 | +- **RoundRobin**: Distributes work evenly across GPUs in sequence |
| 77 | +- **MostFreeVram**: Always selects the GPU with the most available memory |
| 78 | +- **SpecificDevice**: Routes work to a specific GPU by ID |
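The three policies above can be illustrated with a small, self-contained sketch. `DeviceInfo` and `PolicySketch` are hypothetical names introduced only for this example; they are not the library's `GpuDevice` or `GpuManager`, whose internals may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;

// Hypothetical stand-in for a device record; not the library's GpuDevice type.
public sealed class DeviceInfo
{
    public int Id { get; }
    public long FreeVramBytes { get; }

    public DeviceInfo(int id, long freeVramBytes)
    {
        Id = id;
        FreeVramBytes = freeVramBytes;
    }
}

// Illustrative selection logic only; the library's implementation may differ.
public sealed class PolicySketch
{
    private readonly IReadOnlyList<DeviceInfo> _devices;
    private int _next = -1; // advanced atomically so concurrent callers get distinct tickets

    public PolicySketch(IReadOnlyList<DeviceInfo> devices)
    {
        if (devices == null || devices.Count == 0)
            throw new ArgumentException("At least one device is required.", nameof(devices));
        _devices = devices;
    }

    // Round-robin: each call returns the next device in sequence, wrapping around.
    public DeviceInfo NextRoundRobin()
    {
        // Mask off the sign bit so the index stays non-negative after int overflow.
        int ticket = Interlocked.Increment(ref _next) & int.MaxValue;
        return _devices[ticket % _devices.Count];
    }

    // Most-free-VRAM: pick the device reporting the largest free memory.
    public DeviceInfo MostFreeVram() =>
        _devices.OrderByDescending(d => d.FreeVramBytes).First();

    // Specific device: look the device up by ID and fail loudly if it is unknown.
    public DeviceInfo Specific(int id) =>
        _devices.FirstOrDefault(d => d.Id == id)
        ?? throw new InvalidOperationException($"No device with ID {id}.");
}

public static class Program
{
    public static void Main()
    {
        const long GiB = 1024L * 1024L * 1024L;
        var policy = new PolicySketch(new[]
        {
            new DeviceInfo(0, 8 * GiB),
            new DeviceInfo(1, 12 * GiB)
        });

        Console.WriteLine(policy.NextRoundRobin().Id); // first device in sequence
        Console.WriteLine(policy.NextRoundRobin().Id); // second device in sequence
        Console.WriteLine(policy.MostFreeVram().Id);   // device with the most free memory
        Console.WriteLine(policy.Specific(0).Id);      // explicit targeting
    }
}
```

Round-robin ignores memory pressure, so it suits uniform jobs; the most-free-VRAM rule is only as accurate as the last reading of free memory.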
| 79 | + |
| 80 | +## VRAM Budgeting |
| 81 | + |
| 82 | +Set per-device VRAM limits: |
| 83 | + |
| 84 | +```csharp |
| 85 | +device.VramBudget.LimitBytes = Size.GiB(20); |
| 86 | + |
| 87 | +// Try to reserve VRAM |
| 88 | +if (!device.VramBudget.TryReserve(Size.MiB(500))) |
| 89 | +{ |
| 90 | + throw new GpuBudgetExceededException("Insufficient VRAM budget"); |
| 91 | +} |
| 92 | + |
| 93 | +// Automatically released when work completes |
| 94 | +``` |
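To make the idea of a thread-safe soft reservation concrete, here is a minimal sketch built on `Interlocked.CompareExchange`. The `SoftBudget` class is a hypothetical illustration, not the library's `VramBudget`; it assumes a reservation is only bookkeeping (no GPU memory is actually allocated) and that the caller releases it in a `finally` block.

```csharp
using System;
using System.Threading;

// Illustrative only; the library's VramBudget may be implemented differently.
public sealed class SoftBudget
{
    private readonly long _limitBytes;
    private long _reservedBytes;

    public SoftBudget(long limitBytes)
    {
        if (limitBytes < 0) throw new ArgumentOutOfRangeException(nameof(limitBytes));
        _limitBytes = limitBytes;
    }

    public long ReservedBytes => Interlocked.Read(ref _reservedBytes);

    // Atomically reserves the requested bytes, or returns false if the limit would be exceeded.
    public bool TryReserve(long bytes)
    {
        if (bytes < 0) throw new ArgumentOutOfRangeException(nameof(bytes));
        while (true)
        {
            long current = Interlocked.Read(ref _reservedBytes);
            if (bytes > _limitBytes - current) return false; // not enough budget left

            // Commit only if no other thread changed the counter in the meantime.
            if (Interlocked.CompareExchange(ref _reservedBytes, current + bytes, current) == current)
                return true;
            // Another thread won the race; re-read and try again.
        }
    }

    // Returns previously reserved bytes to the budget.
    public void Release(long bytes)
    {
        if (bytes < 0) throw new ArgumentOutOfRangeException(nameof(bytes));
        Interlocked.Add(ref _reservedBytes, -bytes);
    }
}

public static class Program
{
    public static void Main()
    {
        var budget = new SoftBudget(limitBytes: 1000);
        const long request = 600;

        if (budget.TryReserve(request))
        {
            try
            {
                // A second request of the same size does not fit while the first is held.
                Console.WriteLine(budget.TryReserve(request));
            }
            finally
            {
                budget.Release(request); // always give the reservation back
            }
        }

        Console.WriteLine(budget.ReservedBytes);
    }
}
```

Because the reservation is soft, it guards against over-committing by cooperating callers; it cannot stop other processes from using the same GPU memory.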
| 95 | + |
| 96 | +## Advanced Usage |
| 97 | + |
| 98 | +### Timeouts and Cancellation |
| 99 | + |
| 100 | +```csharp |
| 101 | +var workItem = new GpuWorkItem |
| 102 | +{ |
| 103 | + TimeoutMs = 30000, // 30-second timeout |
| 104 | + RequestedVramBytes = Size.MiB(512), |
| 105 | + Tag = "ProcessingTask" |
| 106 | +}; |
| 107 | + |
| 108 | +await dispatcher.RunAsync( |
| 109 | + async deviceId => { /* work */ }, |
| 110 | + GpuPolicy.MostFreeVram, |
| 111 | + workItem, |
| 112 | + cancellationToken |
| 113 | +); |
| 114 | +``` |
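One common way to combine a per-item timeout with a caller-supplied token in .NET is a linked `CancellationTokenSource`. The sketch below shows that pattern in isolation; `TimeoutSketch.RunWithTimeoutAsync` is a hypothetical helper, and whether the dispatcher uses this approach internally is an assumption.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Illustrative helper; not part of the library's API.
public static class TimeoutSketch
{
    public static async Task<T> RunWithTimeoutAsync<T>(
        Func<CancellationToken, Task<T>> work,
        int timeoutMs,
        CancellationToken callerToken)
    {
        // The linked source cancels when either the caller cancels or the timer fires.
        using (var linked = CancellationTokenSource.CreateLinkedTokenSource(callerToken))
        {
            linked.CancelAfter(timeoutMs);
            try
            {
                return await work(linked.Token).ConfigureAwait(false);
            }
            catch (OperationCanceledException) when (!callerToken.IsCancellationRequested)
            {
                // The caller did not cancel, so the cancellation came from the timer.
                throw new TimeoutException($"Work did not finish within {timeoutMs} ms.");
            }
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        // Fast work completes normally.
        int value = await TimeoutSketch.RunWithTimeoutAsync(
            ct => Task.FromResult(42), 1000, CancellationToken.None);
        Console.WriteLine(value);

        // Slow work observes the token and is cut off by the timeout.
        try
        {
            await TimeoutSketch.RunWithTimeoutAsync(
                async ct => { await Task.Delay(5000, ct); return 0; },
                50,
                CancellationToken.None);
        }
        catch (TimeoutException ex)
        {
            Console.WriteLine(ex.Message);
        }
    }
}
```

The pattern only works if the work delegate actually observes the token; a kernel launch or blocking native call that ignores it keeps running after the timeout.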
| 115 | + |
| 116 | +### Custom Logging |
| 117 | + |
| 118 | +```csharp |
| 119 | +public class ConsoleLogger : IGpuLogger |
| 120 | +{ |
| 121 | + public void Debug(string message) => Console.WriteLine($"[DEBUG] {message}"); |
| 122 | + public void Info(string message) => Console.WriteLine($"[INFO] {message}"); |
| 123 | + public void Warn(string message) => Console.WriteLine($"[WARN] {message}"); |
| 124 | + public void Error(string message) => Console.WriteLine($"[ERROR] {message}"); |
| 125 | +} |
| 126 | + |
| 127 | +var manager = new GpuManager(logger: new ConsoleLogger()); |
| 128 | +``` |
| 129 | + |
| 130 | +## API Overview |
| 131 | + |
| 132 | +### Core Types |
| 133 | + |
| 134 | +- `GpuDevice`: Represents a GPU device with VRAM and concurrency settings |
| 135 | +- `VramBudget`: Thread-safe VRAM reservation system |
| 136 | +- `GpuWorkItem`: Describes a unit of work with memory requirements and timeouts |
| 137 | +- `GpuPolicy`: Enum for device selection strategies |
| 138 | +- `GpuManager`: Manages device registry and selection logic |
| 139 | +- `GpuDispatcher`: Main interface for scheduling work on GPUs |
| 140 | + |
| 141 | +### Exceptions |
| 142 | + |
| 143 | +- `GpuSelectionException`: Thrown when device selection fails |
| 144 | +- `GpuProbeException`: Thrown when GPU detection fails |
| 145 | +- `GpuBudgetExceededException`: Thrown when VRAM budget is exceeded |
| 146 | + |
| 147 | +## Project Structure |
| 148 | + |
| 149 | +``` |
| 150 | +MultiGpuHelper.sln |
| 151 | +├── src/ |
| 152 | +│ └── MultiGpuHelper/ # Main library (netstandard2.0) |
| 153 | +│ ├── Models/ # GpuDevice, VramBudget, etc. |
| 154 | +│ ├── Management/ # GpuManager, GpuRegistrationBuilder |
| 155 | +│ ├── Dispatching/ # GpuDispatcher |
| 156 | +│ ├── Probing/ # GPU detection providers |
| 157 | +│ ├── Logging/ # Logging abstraction |
| 158 | +│ └── Utilities/ # Helper functions (Size, etc.) |
| 159 | +├── samples/ |
| 160 | +│ ├── SampleConsole/ # .NET 8 sample with async examples |
| 161 | +│ └── SampleNetFramework/ # .NET Framework 4.7.2 sample |
| 162 | +├── tests/ |
| 163 | +│ └── MultiGpuHelper.Tests/ # Unit tests (xUnit) |
| 164 | +└── README.md |
| 165 | +``` |
| 166 | + |
| 167 | +## Supported Platforms |
| 168 | + |
| 169 | +- **.NET Framework**: 4.6.1+ |
| 170 | +- **.NET Core**: 2.1+ |
| 171 | +- **.NET**: 6.0, 8.0+ |
| 172 | + |
| 173 | +## GPU Support |
| 174 | + |
| 175 | +- **NVIDIA**: Full support via `nvidia-smi` |
| 176 | +- **AMD ROCm**: Future extensibility via `IGpuProbeProvider` |
| 177 | +- **Intel oneAPI**: Future extensibility via `IGpuProbeProvider` |
| 178 | + |
| 179 | +The library gracefully handles missing or non-functional GPU probes, returning an empty device list rather than crashing. |
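As a rough illustration of probing with graceful fallback, the sketch below runs `nvidia-smi` with its CSV query flags, parses the output, and returns an empty list if the tool is missing or fails. `ProbedGpu` and `NvidiaSmiSketch` are hypothetical names, the sample rows in `Main` are made-up data, and the exact query the library issues is not shown in this README.

```csharp
using System;
using System.Collections.Generic;
using System.ComponentModel;
using System.Diagnostics;
using System.Globalization;

// Hypothetical record type for this sketch; not the library's GpuDevice.
public sealed class ProbedGpu
{
    public int Index { get; }
    public string Name { get; }
    public long TotalMiB { get; }
    public long FreeMiB { get; }

    public ProbedGpu(int index, string name, long totalMiB, long freeMiB)
    {
        Index = index; Name = name; TotalMiB = totalMiB; FreeMiB = freeMiB;
    }
}

public static class NvidiaSmiSketch
{
    // With "nounits", memory values are reported in MiB.
    private const string Args =
        "--query-gpu=index,name,memory.total,memory.free --format=csv,noheader,nounits";

    // Parses "index, name, total, free" rows and skips anything malformed.
    public static List<ProbedGpu> Parse(string csv)
    {
        var result = new List<ProbedGpu>();
        var lines = csv.Split(new[] { '\r', '\n' }, StringSplitOptions.RemoveEmptyEntries);
        foreach (var line in lines)
        {
            var f = line.Split(',');
            if (f.Length != 4) continue;
            if (int.TryParse(f[0].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out int index)
                && long.TryParse(f[2].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out long total)
                && long.TryParse(f[3].Trim(), NumberStyles.Integer, CultureInfo.InvariantCulture, out long free))
            {
                result.Add(new ProbedGpu(index, f[1].Trim(), total, free));
            }
        }
        return result;
    }

    // Runs nvidia-smi; returns an empty list instead of throwing when it is unavailable.
    public static List<ProbedGpu> Probe()
    {
        var psi = new ProcessStartInfo("nvidia-smi", Args)
        {
            RedirectStandardOutput = true,
            UseShellExecute = false,
            CreateNoWindow = true
        };
        try
        {
            using (var process = Process.Start(psi))
            {
                if (process == null) return new List<ProbedGpu>();
                string output = process.StandardOutput.ReadToEnd();
                process.WaitForExit();
                return process.ExitCode == 0 ? Parse(output) : new List<ProbedGpu>();
            }
        }
        catch (Win32Exception)
        {
            // nvidia-smi is not installed or not on PATH: report no devices.
            return new List<ProbedGpu>();
        }
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample rows in the shape nvidia-smi emits for the query above.
        string sample = "0, NVIDIA GeForce RTX 4090, 24564, 23800\n1, NVIDIA GeForce RTX 4080, 16376, 15900\n";
        foreach (var gpu in NvidiaSmiSketch.Parse(sample))
            Console.WriteLine($"{gpu.Index}: {gpu.Name} ({gpu.FreeMiB}/{gpu.TotalMiB} MiB free)");

        Console.WriteLine($"Live probe found {NvidiaSmiSketch.Probe().Count} device(s).");
    }
}
```

Keeping `Parse` separate from process handling makes the parsing testable on machines without an NVIDIA driver.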
| 180 | + |
| 181 | +## Error Handling |
| 182 | + |
| 183 | +The public API reports failures through custom exception types that carry context: |
| 184 | + |
| 185 | +```csharp |
| 186 | +try |
| 187 | +{ |
| 188 | + await dispatcher.RunAsync(work, policy); |
| 189 | +} |
| 190 | +catch (GpuSelectionException ex) |
| 191 | +{ |
| 192 | + // No suitable device found |
| 193 | +} |
| 194 | +catch (GpuBudgetExceededException ex) |
| 195 | +{ |
| 196 | + // VRAM budget exceeded |
| 197 | +} |
| 198 | +catch (GpuProbeException ex) |
| 199 | +{ |
| 200 | + // GPU detection failed |
| 201 | +} |
| 202 | +``` |
| 203 | + |
| 204 | +## Thread Safety |
| 205 | + |
| 206 | +- `GpuManager`: Fully thread-safe |
| 207 | +- `VramBudget`: Thread-safe atomic operations |
| 208 | +- `GpuDispatcher`: Safe for concurrent work dispatching |
| 209 | +- Device semaphores prevent over-subscription |
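The last point, semaphores preventing over-subscription, can be sketched with `SemaphoreSlim`. `DeviceGate` is a hypothetical class written for this example; it assumes one gate per device, sized to that device's `maxConcurrentJobs`.

```csharp
using System;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Illustrative per-device gate; not the library's implementation.
public sealed class DeviceGate
{
    private readonly SemaphoreSlim _slots;

    public DeviceGate(int maxConcurrentJobs)
    {
        if (maxConcurrentJobs < 1) throw new ArgumentOutOfRangeException(nameof(maxConcurrentJobs));
        _slots = new SemaphoreSlim(maxConcurrentJobs, maxConcurrentJobs);
    }

    // Waits for a free slot, runs the job, and always returns the slot afterwards.
    public async Task<T> RunAsync<T>(Func<Task<T>> job, CancellationToken ct = default(CancellationToken))
    {
        await _slots.WaitAsync(ct).ConfigureAwait(false);
        try
        {
            return await job().ConfigureAwait(false);
        }
        finally
        {
            _slots.Release();
        }
    }
}

public static class Program
{
    public static async Task Main()
    {
        var gate = new DeviceGate(maxConcurrentJobs: 2);
        var sync = new object();
        int running = 0, peak = 0;

        // Start six jobs at once; the gate lets at most two run at a time.
        var jobs = Enumerable.Range(0, 6).Select(i => gate.RunAsync(async () =>
        {
            lock (sync) { running++; if (running > peak) peak = running; }
            await Task.Delay(20); // stand-in for GPU work
            lock (sync) { running--; }
            return i;
        })).ToArray();

        await Task.WhenAll(jobs);
        Console.WriteLine($"Peak concurrency: {peak}"); // never exceeds the gate size
    }
}
```

Releasing in `finally` matters: a job that throws would otherwise leak a slot and eventually stall the device.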
| 210 | + |
| 211 | +## Performance Considerations |
| 212 | + |
| 213 | +1. **Concurrency Limits**: Set `MaxConcurrentJobs` conservatively to avoid GPU saturation |
| 214 | +2. **VRAM Budgets**: Reserve headroom (typically 10-20% of total VRAM) |
| 215 | +3. **Device Refresh**: Call `manager.RefreshAsync()` periodically for accurate VRAM info |
| 216 | +4. **Selection Policy**: Use `MostFreeVram` for workloads with variable memory requirements |
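The headroom guideline in item 2 is simple arithmetic: the budget is the total VRAM minus the chosen percentage. For reference, the Manual Registration example (20 GiB of 24 GiB, 14 GiB of 16 GiB) works out to roughly 16.7% and 12.5% headroom. The `HeadroomSketch` helper below is hypothetical and only illustrates the calculation.

```csharp
using System;

// Illustrative arithmetic only; not part of the library's API.
public static class HeadroomSketch
{
    private const long GiB = 1024L * 1024L * 1024L;

    // Returns the budget left after setting aside headroomPercent of totalBytes.
    public static long BudgetFor(long totalBytes, int headroomPercent)
    {
        if (totalBytes < 0) throw new ArgumentOutOfRangeException(nameof(totalBytes));
        if (headroomPercent < 0 || headroomPercent > 100)
            throw new ArgumentOutOfRangeException(nameof(headroomPercent));

        long headroom = checked(totalBytes * headroomPercent) / 100;
        return totalBytes - headroom;
    }

    // Returns the headroom, in percent, implied by a given budget.
    public static double HeadroomPercent(long totalBytes, long budgetBytes) =>
        100.0 * (totalBytes - budgetBytes) / totalBytes;

    public static void Main()
    {
        // Budget for a 24 GiB card with 15% headroom, printed in GiB.
        Console.WriteLine(BudgetFor(24 * GiB, 15) / (double)GiB);

        // Headroom implied by the 20-of-24 GiB and 14-of-16 GiB examples.
        Console.WriteLine(HeadroomPercent(24 * GiB, 20 * GiB));
        Console.WriteLine(HeadroomPercent(16 * GiB, 14 * GiB));
    }
}
```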
| 217 | + |
| 218 | +## Testing |
| 219 | + |
| 220 | +Run the unit tests: |
| 221 | + |
| 222 | +```bash |
| 223 | +dotnet test tests/MultiGpuHelper.Tests/ |
| 224 | +``` |
| 225 | + |
| 226 | +Run the samples: |
| 227 | + |
| 228 | +```bash |
| 229 | +dotnet run --project samples/SampleConsole/ |
| 230 | +dotnet run --project samples/SampleNetFramework/ # Requires Windows with the .NET Framework 4.7.2 Developer Pack |
| 231 | +``` |
| 232 | + |
| 233 | +Run the hardware verification test against real GPUs: |
| 234 | + |
| 235 | +```bash |
| 236 | +dotnet run --project samples/HardwareTest/ |
| 237 | +``` |
| 238 | + |
| 239 | +## Packaging & CI |
| 240 | + |
| 241 | +### Building Locally |
| 242 | + |
| 243 | +Build the solution: |
| 244 | + |
| 245 | +```bash |
| 246 | +dotnet build -c Release |
| 247 | +``` |
| 248 | + |
| 249 | +### NuGet Package |
| 250 | + |
| 251 | +The library is configured for automatic NuGet package generation. |
| 252 | + |
| 253 | +**Build and pack**: |
| 254 | + |
| 255 | +```bash |
| 256 | +dotnet pack src/MultiGpuHelper/MultiGpuHelper.csproj -c Release -o ./artifacts |
| 257 | +``` |
| 258 | + |
| 259 | +**Package location**: |
| 260 | + |
| 261 | +- `.nupkg` (main package) → `artifacts/MultiGpuHelper.{version}.nupkg` |
| 262 | +- `.snupkg` (symbol package) → `artifacts/MultiGpuHelper.{version}.snupkg` |
| 263 | + |
| 264 | +**Local NuGet push** (for testing): |
| 265 | + |
| 266 | +```bash |
| 267 | +dotnet nuget push ./artifacts/MultiGpuHelper.1.0.0.nupkg -s <local-nuget-source> |
| 268 | +``` |
| 269 | + |
| 270 | +### Strong-Name Signing |
| 271 | + |
| 272 | +The assembly is **strong-name signed** with a 2048-bit RSA key: |
| 273 | + |
| 274 | +- Key file: `MultiGpuHelper.snk` (solution root) |
| 275 | +- Configured in: `src/MultiGpuHelper/MultiGpuHelper.csproj` |
| 276 | +- Property: `<SignAssembly>true</SignAssembly>` |
| 277 | + |
| 278 | +Strong naming lets the library be referenced from other strong-named .NET Framework assemblies and installed in the Global Assembly Cache. |
| 279 | + |
| 280 | +### CI/CD Pipeline |
| 281 | + |
| 282 | +Automated builds run on GitHub Actions (`.github/workflows/ci.yml`): |
| 283 | + |
| 284 | +**Triggers**: |
| 285 | +- Every push to `main` or `develop` |
| 286 | +- Every pull request to `main` or `develop` |
| 287 | + |
| 288 | +**Pipeline steps**: |
| 289 | +1. Checkout code |
| 290 | +2. Setup .NET 8.x SDK |
| 291 | +3. Restore dependencies |
| 292 | +4. Build (Release configuration) |
| 293 | +5. Run tests (if `tests/` exists) |
| 294 | +6. Pack NuGet package |
| 295 | +7. Upload artifacts (.nupkg + .snupkg) |
| 296 | + |
| 297 | +**Artifacts**: Available on the GitHub Actions run page under the `nuget-packages` artifact. |
| 298 | + |
| 299 | +### Versioning |
| 300 | + |
| 301 | +This project follows **Semantic Versioning (SemVer)**. |
| 302 | + |
| 303 | +See [VERSIONING.md](VERSIONING.md) for detailed versioning policy and release workflow. |
| 304 | + |
| 305 | +Current version: **1.0.0** |
| 306 | + |
| 307 | +## License |
| 308 | + |
| 309 | +This library is released under the **MIT License**. See [LICENSE](LICENSE) for details. |
| 310 | + |
| 311 | +## Contributing |
| 312 | + |
| 313 | +Contributions are welcome! Please ensure: |
| 314 | + |
| 315 | +1. Code follows existing style conventions (English comments throughout) |
| 316 | +2. New features include unit tests |
| 317 | +3. Breaking changes are avoided or clearly documented |
| 318 | +4. Documentation is updated |
| 319 | + |
| 320 | +## Future Enhancements |
| 321 | + |
| 322 | +- [ ] AMD ROCm probe provider |
| 323 | +- [ ] Intel oneAPI probe provider |
| 324 | +- [ ] OpenCL support |
| 325 | +- [ ] GPU memory profiling hooks |
| 326 | +- [ ] Work queue persistence |
| 327 | +- [ ] Multi-machine GPU clustering |
| 328 | + |
| 329 | +## Support |
| 330 | + |
| 331 | +For issues, questions, or suggestions, please open an issue on GitHub. The library is provided as-is, with no SLA. |
| 332 | + |
| 333 | +--- |
| 334 | + |
| 335 | +**MultiGpuHelper** — Making multi-GPU scheduling simple and robust. |