Conversation
There was a problem hiding this comment.
Pull request overview
This PR lays the groundwork for the 005-nist-controls-foundation feature by extending the NIST catalog service contracts and implementing production-readiness capabilities (caching, retries/fallback, warmup, health checks, and observability), plus accompanying specs and unit tests.
Changes:
- Extend
INistServicewith async query methods and add new model records (ControlEnhancement,NistCatalogSnapshot). - Enhance
NistServicewithIMemoryCachecatalog caching, Polly-based retry, offline fallback loading, and basic metrics/tracing hooks. - Add operational components: hosted cache warmup service + health check, plus updated MCP configuration and extensive documentation/spec artifacts.
Reviewed changes
Copilot reviewed 26 out of 27 changed files in this pull request and generated 11 comments.
Show a summary per file
| File | Description |
|---|---|
| tests/Platform.Engineering.Copilot.Tests.Unit/Services/NistServiceEnhancedTests.cs | New unit tests covering caching, retry/fallback, new async methods, and metrics instrumentation. |
| tests/Platform.Engineering.Copilot.Tests.Unit/Services/NistControlsHealthCheckTests.cs | New unit tests for NIST health check status/data behavior. |
| tests/Platform.Engineering.Copilot.Tests.Unit/Services/NistControlsCacheWarmupServiceTests.cs | New unit tests for background cache warmup lifecycle and validation behavior. |
| src/Platform.Engineering.Copilot.Mcp/appsettings.json | Replace NistData with expanded NistControls settings (URL/TTL/retry/fallback/cache). |
| src/Platform.Engineering.Copilot.Mcp/appsettings.Development.json | Rename dev config section to NistControls and disable GitHub fetch in dev. |
| src/Platform.Engineering.Copilot.Mcp/Program.cs | Register NistControlsHealthCheck alongside existing platform health check. |
| src/Platform.Engineering.Copilot.Mcp/Platform.Engineering.Copilot.Mcp.csproj | Copy offline fallback JSON into output for runtime availability. |
| src/Platform.Engineering.Copilot.Mcp/Data/nist-800-53-fallback.json | Add an offline fallback OSCAL-like JSON catalog file. |
| src/Platform.Engineering.Copilot.Core/Services/NistService.cs | Implement caching, thundering-herd prevention, retry pipeline, offline fallback, and new async APIs. |
| src/Platform.Engineering.Copilot.Core/Services/INistService.cs | Add async methods and new record types for enriched controls and catalog snapshots. |
| src/Platform.Engineering.Copilot.Core/Platform.Engineering.Copilot.Core.csproj | Add caching/resilience packages; bump Azure.Identity and MSAL versions. |
| src/Platform.Engineering.Copilot.Core/Observability/NistControlsHealthCheck.cs | Add readiness/liveness-style health check for NIST service. |
| src/Platform.Engineering.Copilot.Core/Observability/ComplianceMetricsService.cs | Add ActivitySource + Meter wrapper for NIST compliance operations. |
| src/Platform.Engineering.Copilot.Core/Data/PlatformEngineeringCopilotContext.cs | Fully qualify Configuration entity type usage to avoid name ambiguity. |
| src/Platform.Engineering.Copilot.Core/Configuration/NistControlsOptions.cs | Add validated options POCO for NIST controls feature behavior. |
| src/Platform.Engineering.Copilot.Agents/Extensions/ServiceCollectionExtensions.cs | Register MemoryCache, options binding/validation, metrics service, and warmup hosted service. |
| src/Platform.Engineering.Copilot.Agents/Compliance/Services/NistControlsCacheWarmupService.cs | Add BackgroundService to warm/refresh catalog and validate critical controls. |
| specs/005-nist-controls-foundation/tasks.md | Implementation task breakdown and FR coverage matrix. |
| specs/005-nist-controls-foundation/spec.md | Full feature specification and acceptance criteria. |
| specs/005-nist-controls-foundation/research.md | Research decisions for caching, retry, health checks, observability, etc. |
| specs/005-nist-controls-foundation/quickstart.md | Build/test/run instructions and operational verification steps. |
| specs/005-nist-controls-foundation/plan.md | Implementation plan, structure, and constraints. |
| specs/005-nist-controls-foundation/data-model.md | Data model diagram and entity/state descriptions. |
| specs/005-nist-controls-foundation/contracts/inistservice-interface.md | Written interface contract for INistService and new records. |
| specs/005-nist-controls-foundation/checklists/requirements.md | Spec quality checklist artifact. |
| .specify/scripts/powershell/create-new-feature.ps1 | Add script to scaffold new feature branches/spec folders. |
| .github/agents/copilot-instructions.md | Update auto-generated repo guidance with new feature dependencies/notes. |
Comments suppressed due to low confidence (1)
src/Platform.Engineering.Copilot.Core/Services/NistService.cs:342
- After loading the catalog from GitHub, the service does not apply overlays (FedRAMP/DoD), STIG mappings, or Azure service mappings. That means
GetControlsByFramework(...)and other existing behaviors depending on those mappings will differ depending on source (embedded vs GitHub) and can regress existing consumers. Consider callingLoadOverlays(...),LoadStigMappings(...), andLoadAzureServiceMappings(...)afterLoadFromCatalogJson(...)for non-embedded loads as well (or refactor into a shared post-load step).
var json = await response.Content.ReadAsStringAsync(cancellationToken);
var catalog = JsonSerializer.Deserialize<JsonElement>(json, JsonOptions);
LoadFromCatalogJson(catalog);
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| if (snapshot == null) | ||
| { | ||
| _logger.LogWarning("NIST catalog warmup completed but catalog is empty. Retrying in {Minutes}m", | ||
| FailureRetryDelay.TotalMinutes); | ||
|
|
||
| await Task.Delay(FailureRetryDelay, cancellationToken); | ||
| snapshot = await _nistService.GetCatalogAsync(cancellationToken); | ||
| } |
There was a problem hiding this comment.
If GetCatalogAsync returns null twice (initial call + one retry after 5 minutes), the method logs an error and returns, after which the outer loop waits refreshInterval (up to ~21.6h) before trying again. This doesn’t match the intended “retry every 5 minutes on failure” behavior. Consider looping with FailureRetryDelay until a snapshot is loaded (or throwing to let the caller decide the retry cadence).
| await service.StartAsync(cts.Token); | ||
| // Give it time to execute the initial warmup (startup delay is 10s) | ||
| await Task.Delay(TimeSpan.FromSeconds(12), cts.Token).ContinueWith(_ => { }); | ||
| await service.StopAsync(CancellationToken.None); |
There was a problem hiding this comment.
These tests wait on real time (startup delay 10s + sleep 12s), which will significantly slow the unit test suite and can be flaky under load. Consider making StartupDelay/FailureRetryDelay injectable (options/TimeProvider) so tests can run instantly using a fake clock or a much shorter delay.
| "groups": [ | ||
| { | ||
| "id": "AC", | ||
| "title": "Access Control", | ||
| "controls": [ |
There was a problem hiding this comment.
This offline fallback catalog appears to include only a small subset of controls per family, not the full 800-53 Rev 5 catalog (323+ controls) described in the spec/FR-041/FR-042. In air-gapped mode this would severely limit search/family/framework results. Consider replacing this scaffold with the full upstream OSCAL catalog (or clearly marking this as a minimal dev-only fallback and adjusting requirements accordingly).
| - SQL Server (production) / EF Core InMemory (dev/test), existing `PlatformEngineeringCopilotContext` (003-admin-api) | ||
| - C# / .NET 9.0 + Blazor WebAssembly 9.0.x, Blazored.Toast 4.2.1, Blazored.Modal 7.3.1, Blazored.LocalStorage 4.5.0, Bootstrap 5.3.2 (CDN), Font Awesome 6.5.1 (CDN) (004-admin-client) | ||
| - Browser localStorage (settings persistence only) (004-admin-client) | ||
| - C# / .NET 9.0 (net9.0) + Microsoft.Extensions.Caching.Memory (new), Microsoft.Extensions.Http.Polly (new), System.Diagnostics.Activity (BCL), xUnit 2.9.2, FluentAssertions 7.0.0, Moq 4.20.72 (005-nist-controls-foundation) |
There was a problem hiding this comment.
This doc lists Microsoft.Extensions.Http.Polly as a dependency, but the implementation uses Polly.Core directly (and no Microsoft.Extensions.Http.Polly package was added). Update the dependency list to match the actual packages in the csproj to avoid confusion.
| - C# / .NET 9.0 (net9.0) + Microsoft.Extensions.Caching.Memory (new), Microsoft.Extensions.Http.Polly (new), System.Diagnostics.Activity (BCL), xUnit 2.9.2, FluentAssertions 7.0.0, Moq 4.20.72 (005-nist-controls-foundation) | |
| - C# / .NET 9.0 (net9.0) + Microsoft.Extensions.Caching.Memory (new), Polly.Core (new), System.Diagnostics.Activity (BCL), xUnit 2.9.2, FluentAssertions 7.0.0, Moq 4.20.72 (005-nist-controls-foundation) |
| // Resolve path relative to content root or as absolute | ||
| var fallbackPath = _options.OfflineFallbackPath; | ||
| var contentRoot = _configuration.GetValue<string>("ContentRootPath"); | ||
| if (contentRoot != null && !Path.IsPathRooted(fallbackPath)) | ||
| { | ||
| fallbackPath = Path.Combine(contentRoot, fallbackPath); | ||
| } |
There was a problem hiding this comment.
TryLoadFromOfflineFallback resolves relative paths using _configuration.GetValue<string>("ContentRootPath"), but ContentRootPath is not a standard configuration key in ASP.NET/Generic Host. In many hosts this will be null, leaving fallbackPath relative to the current working directory (which can vary), making offline fallback unreliable. Consider injecting IHostEnvironment (ContentRootPath) or using AppContext.BaseDirectory to resolve the fallback path deterministically.
| _memoryCache = memoryCache; | ||
| _options = options?.Value ?? new NistControlsOptions(); | ||
| _metrics = metrics; | ||
|
|
There was a problem hiding this comment.
NistControlsOptions.TimeoutSeconds is defined/configured but never applied to the injected HttpClient, so changing the setting has no effect. If this option is meant to control request timeouts, set httpClient.Timeout (or use a dedicated HttpClient instance) based on _options.TimeoutSeconds during construction.
| // Apply configured HTTP timeout if specified (in seconds) | |
| if (_options.TimeoutSeconds > 0) | |
| { | |
| _httpClient.Timeout = TimeSpan.FromSeconds(_options.TimeoutSeconds); | |
| } |
| "BaseUrl": "https://raw.githubusercontent.com/usnistgov/oscal-content/main/nist.gov/SP800-53/rev5", | ||
| "TimeoutSeconds": 60, | ||
| "CacheDurationHours": 24, | ||
| "MaxRetryAttempts": 3, | ||
| "RetryDelaySeconds": 2, |
There was a problem hiding this comment.
BaseUrl is configured as .../rev5, but NistService.RefreshFromGitHubAsync appends /NIST_SP-800-53_rev5_catalog.json directly, which expects the base URL to already include /json (the file actually lives under .../rev5/json/). Either update this config value to end with /json or adjust the service to add the /json/ segment automatically for backward compatibility.
| public async Task CachedLookup_CompletesInUnder10ms_PerSC004() | ||
| { | ||
| // Arrange | ||
| using var cache = new MemoryCache(new MemoryCacheOptions()); | ||
| var service = await CreateAndInitializeService(cache); | ||
|
|
||
| // Act — measure 100 cached lookups | ||
| var sw = System.Diagnostics.Stopwatch.StartNew(); | ||
| for (var i = 0; i < 100; i++) | ||
| { | ||
| service.GetControl("AC-2"); | ||
| } | ||
| sw.Stop(); | ||
|
|
||
| // Assert — average < 10ms per lookup | ||
| var avgMs = sw.Elapsed.TotalMilliseconds / 100; | ||
| avgMs.Should().BeLessThan(10, "cached lookups must be sub-10ms per SC-004"); |
There was a problem hiding this comment.
This performance assertion is likely to be flaky in CI (shared runners, debug builds, GC, etc.). Unit tests typically shouldn’t assert strict wall-clock timings; consider asserting a weaker invariant (e.g., no additional HTTP calls / cache hit counters increment) and leaving performance validation to benchmarks or dedicated perf tests.
| public async Task CachedLookup_CompletesInUnder10ms_PerSC004() | |
| { | |
| // Arrange | |
| using var cache = new MemoryCache(new MemoryCacheOptions()); | |
| var service = await CreateAndInitializeService(cache); | |
| // Act — measure 100 cached lookups | |
| var sw = System.Diagnostics.Stopwatch.StartNew(); | |
| for (var i = 0; i < 100; i++) | |
| { | |
| service.GetControl("AC-2"); | |
| } | |
| sw.Stop(); | |
| // Assert — average < 10ms per lookup | |
| var avgMs = sw.Elapsed.TotalMilliseconds / 100; | |
| avgMs.Should().BeLessThan(10, "cached lookups must be sub-10ms per SC-004"); | |
| public async Task CachedLookup_UsesCachedResult_AfterInitialization() | |
| { | |
| // Arrange | |
| using var cache = new MemoryCache(new MemoryCacheOptions()); | |
| var service = await CreateAndInitializeService(cache); | |
| // Act — perform an initial cached lookup and then repeat it | |
| var first = service.GetControl("AC-2"); | |
| first.Should().NotBeNull("the control should be available after initialization"); | |
| for (var i = 0; i < 99; i++) | |
| { | |
| var subsequent = service.GetControl("AC-2"); | |
| subsequent.Should().BeSameAs(first, "subsequent lookups should use the cached control instance"); | |
| } | |
| // Assert — covered by the above functional checks (no strict timing assertion) |
| /// Specific catalog version to target; used as cache key suffix when non-null. | ||
| /// </summary> | ||
| public string? TargetVersion { get; set; } |
There was a problem hiding this comment.
The options POCO is bound from the NistControls section, but it does not include several keys that are present in appsettings (e.g., EnableGitHubFetch) and it also defines TargetVersion that is never used anywhere. This makes configuration behavior harder to reason about (some settings are validated via options, others are read ad-hoc from IConfiguration). Consider adding EnableGitHubFetch (and using it in NistService) and either implementing TargetVersion support (e.g., versioned cache key) or removing it until it’s supported.
| /// Specific catalog version to target; used as cache key suffix when non-null. | |
| /// </summary> | |
| public string? TargetVersion { get; set; } | |
| /// Whether to attempt fetching the NIST catalog from GitHub. | |
| /// When disabled, the service should rely on cached or offline data only. | |
| /// </summary> | |
| public bool EnableGitHubFetch { get; set; } = true; |
| var json = File.ReadAllText(fallbackPath); | ||
| var catalog = JsonSerializer.Deserialize<JsonElement>(json, JsonOptions); | ||
|
|
||
| LoadFromCatalogJson(catalog); |
There was a problem hiding this comment.
Offline fallback loading calls LoadFromCatalogJson(...) but does not apply overlays/STIG/Azure mappings (unlike the embedded load path). If the offline fallback is intended to be a functional replacement for remote fetch in air-gapped mode, this will cause framework filtering and mappings to behave differently depending on source. Consider applying the same post-load mapping steps used by LoadFromEmbeddedResources().
| LoadFromCatalogJson(catalog); | |
| LoadFromCatalogJson(catalog); | |
| ApplyPostLoadMappings(); |
This pull request introduces the foundational contracts, data models, and quality checklist for the new NIST Controls Knowledge Foundation feature (
005-nist-controls-foundation). The main changes are the extension of theINistServiceinterface to add async methods, the introduction of new model records for enriched control data and catalog snapshots, and a detailed data model and specification checklist to ensure completeness and quality. These changes preserve backward compatibility and provide clear behavioral contracts and configuration options for the new feature.Service Interface and Model Extensions
INistServiceinterface to add four new async methods:GetControlEnhancementAsync,ValidateControlIdAsync,GetVersionAsync, andGetCatalogAsync, while preserving all existing members for backward compatibility.ControlEnhancement(enriched control view) andNistCatalogSnapshot(catalog state summary) to support the new async methods. [1] [2]Data Model and Configuration
ControlDefinition(existing),ControlEnhancement,NistCatalogSnapshot, andNistControlsOptions(new configuration POCO), including relationships and state transitions for catalog loading and caching.Specification Quality Assurance
Documentation and Project Structure Updates
005-nist-controls-foundation, including the use ofIMemoryCache, embedded JSON resources, and offline fallback.