Reduce custom app permissions and improve setup reliability #409
Merged
Removes DelegatedPermissionGrant.ReadWrite.All and AgentIdentity.Create.All from the required CLI app permission set. Agent identity creation now uses Blueprint app-only credentials (AgentIdentity.CreateAsManager auto-granted to Blueprint apps). Principal-scoped oauth2 grants use AgentIdentityBlueprint.UpdateAuthProperties.All. EnsureServicePrincipalForAppIdAsync eliminated for agent identity SPs (id == appId for ServiceIdentity type), removing the Application.ReadWrite.All dependency.
Adds exponential back-off retry loops for AADSTS700016 and Authorization_IdentityNotFound propagation errors on fresh blueprint setups. All propagation-lag retry logs downgraded to Debug (not user-actionable).
Additional fixes:
- --authmode obo with --aiteammate warns instead of hard-erroring
- Messaging endpoint summary shows not-configured vs failed correctly
- Explicit null guard on AgentBlueprintClientSecret before UnprotectSecret
- Stale error message referencing removed permissions corrected
- Retry loop convention aligned (maxAttempts / < throughout)
- ConfigService omits null values from ExtractDynamicProperties to prevent null-overwrite cycle on re-run (issue 408 fix)
Validated end-to-end across base, --aiteammate, --m365, and --authmode both paths as Agent ID Developer role with no Application.ReadWrite.All, DelegatedPermissionGrant.ReadWrite.All, AgentIdentity.ReadWrite.All, or AgentIdentity.Create.All on the custom app.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Dependency Review: ✅ No vulnerabilities, license issues, or OpenSSF Scorecard issues found. Scanned files: none.
Pull request overview
Updates a365 setup flows to reduce required permissions on the custom CLI app and improve reliability on fresh blueprint setups (replication-lag retries, clearer messaging), while aligning generated config behavior to avoid null-overwrite cycles.
Changes:
- Switch agent identity creation to use Blueprint app-only credentials and remove `DelegatedPermissionGrant.ReadWrite.All` from required permission lists.
- Add exponential backoff retries for transient Entra propagation errors and downgrade propagation-lag logs to `Debug`.
- Improve setup validation/UX: `--authmode obo --aiteammate` warns (continues), messaging endpoint summary distinguishes “not configured”, and generated config omits null dynamic properties.
Reviewed changes
Copilot reviewed 9 out of 10 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Services/Agent365ConfigServiceTests.cs | Adds regression tests ensuring null dynamic properties are omitted and non-null secrets persist. |
| src/Tests/Microsoft.Agents.A365.DevTools.Cli.Tests/Commands/SetupCommandTests.cs | Updates tests for --authmode + --aiteammate behavior (warning vs error) and validates incompatible modes still fail. |
| src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs | Adds retry/backoff for blueprint token acquisition and agent identity creation; adjusts propagation-lag logging levels. |
| src/Microsoft.Agents.A365.DevTools.Cli/Services/ConfigService.cs | Filters out null values when extracting dynamic properties to avoid null-overwrite on reruns. |
| src/Microsoft.Agents.A365.DevTools.Cli/Constants/AuthenticationConstants.cs | Removes DelegatedPermissionGrant.ReadWrite.All from required permissions/scopes. |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/SetupHelpers.cs | Improves summary output/action-required messaging for “messaging endpoint not configured”. |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/NonDwBlueprintSetupOrchestrator.cs | Uses blueprint client secret for agent identity creation; removes agent identity SP “ensure” step. |
| src/Microsoft.Agents.A365.DevTools.Cli/Commands/SetupSubcommands/AllSubcommand.cs | Changes --authmode obo --aiteammate from hard error to warning; keeps other modes incompatible. |
| CHANGELOG.md | Documents permission reductions, retry behavior, and setup UX fixes. |
| .gitignore | Ignores docs/min-permissions/. |
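
The null-overwrite fix noted for ConfigService.cs can be pictured with a minimal sketch. This is not the repository's code: `DynamicPropertyFilter`, `WithoutNulls`, and `Merge` are hypothetical names, and the real `ExtractDynamicProperties` may have a different shape. The sketch only illustrates the idea that dropping null entries before merging lets a previously saved value survive a re-run.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not the CLI's actual implementation.
public static class DynamicPropertyFilter
{
    // Returns a copy of the input without null-valued entries.
    public static Dictionary<string, object?> WithoutNulls(
        IReadOnlyDictionary<string, object?> properties)
    {
        return properties
            .Where(kvp => kvp.Value is not null)
            .ToDictionary(kvp => kvp.Key, kvp => kvp.Value);
    }

    // Merges freshly extracted values over previously saved state.
    // Null extracted values are skipped, so they cannot erase saved ones.
    public static Dictionary<string, object?> Merge(
        IReadOnlyDictionary<string, object?> saved,
        IReadOnlyDictionary<string, object?> extracted)
    {
        var result = new Dictionary<string, object?>(saved);
        foreach (var kvp in WithoutNulls(extracted))
        {
            result[kvp.Key] = kvp.Value;
        }
        return result;
    }
}
```

With this shape, a re-run that extracts a null for a key such as an (illustrative) `agentBlueprintClientSecret` entry leaves the saved secret in place, while non-null values still overwrite older ones.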
- Reduce required delegated scopes for a365 CLI client app:
  - Use AgentIdentityBlueprint.ReadWrite.All as umbrella for blueprint ops
  - Require AgentIdentityBlueprintPrincipal.Create for SP creation
  - Replace Directory.Read.All with Application.Read.All
  - Remove User.ReadWrite.All, broad blueprint sub-scopes, and AppRoleAssignment.ReadWrite.All
- Update all code, logging, and user guidance to reference new scopes
- Role checks now decode wids claim from MSAL token (no Graph call)
- Improve token acquisition retry logic for blueprint creation
- Update tests and documentation to match new permission model
- Endpoint registration guidance now points to Teams Developer Portal
- Reduces privilege footprint; 7-permission set validated across admin and developer roles
gwharris7
previously approved these changes
May 8, 2026
- Fix off-by-one in retry log {Max} argument: pass maxRetries/maxAttempts
instead of maxRetries-1/maxAttempts-1 in three retry loops
- Assert exit code is 0 (not just non-1) in WarnsAndContinues test
- Replace brittle JSON string assertions with JsonNode parsing in
SaveStateAsync_NonNullStringProperty_IsWrittenToJson
- Remove misleading 'id == appId' comment in NonDwBlueprintSetupOrchestrator
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Pull request overview
Copilot reviewed 18 out of 19 changed files in this pull request and generated 4 comments.
Comments suppressed due to low confidence (1)
src/Microsoft.Agents.A365.DevTools.Cli/Services/GraphApiService.cs:1526
- GetBlueprintAccessTokenAsync increased maxRetries to 12 with exponential backoff capped at 60s. With baseDelaySeconds=5 this can sleep for ~8+ minutes total (5+10+20+40+60*7), which is a large behavioral/operational change and doesn’t match the PR description’s “~60s total”. Consider reducing attempts (or lowering baseDelaySeconds) and rename maxRetries to maxAttempts for clarity since the loop condition is attempt < maxRetries (total attempts).
const int maxRetries = 12;
const int baseDelaySeconds = 5;
for (int attempt = 0; attempt < maxRetries; attempt++)
{
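The reviewer's arithmetic (5+10+20+40+60*7) can be reproduced with a small sketch. It assumes a doubling delay capped at 60 seconds and no sleep after the final attempt, which is what that sum implies; `BackoffSchedule` and its members are hypothetical names, and the actual loop in GraphApiService.cs may differ (for example, it could add jitter or place the sleep differently).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for reasoning about worst-case retry time.
public static class BackoffSchedule
{
    // Delay before the next attempt: base * 2^attempt, capped at capSeconds.
    public static int DelaySeconds(int attempt, int baseDelaySeconds, int capSeconds)
    {
        // Clamp the shift so large attempt numbers cannot overflow.
        long uncapped = (long)baseDelaySeconds << Math.Min(attempt, 30);
        return (int)Math.Min(uncapped, capSeconds);
    }

    // Assumption: no sleep follows the last attempt, so there are
    // maxAttempts - 1 delays in total.
    public static IEnumerable<int> Delays(int maxAttempts, int baseDelaySeconds, int capSeconds)
    {
        return Enumerable.Range(0, Math.Max(maxAttempts - 1, 0))
            .Select(attempt => DelaySeconds(attempt, baseDelaySeconds, capSeconds));
    }

    // Total time spent sleeping if every attempt fails.
    public static int WorstCaseSeconds(int maxAttempts, int baseDelaySeconds, int capSeconds)
    {
        return Delays(maxAttempts, baseDelaySeconds, capSeconds).Sum();
    }
}
```

Under these assumptions, `BackoffSchedule.WorstCaseSeconds(12, 5, 60)` returns 495 seconds (about 8 minutes), which agrees with the "~8+ minutes" estimate in the comment and not with a "~60s total" claim.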
- Fix XML comment in InteractiveGraphAuthServiceTests: AgentIdentityBlueprintPrincipal.Create is a separate required scope, not covered by the ReadWrite.All umbrella
- Improve Contains() guard comment in AgentBlueprintService: explicitly states agent user cleanup is disabled (intentional) until create-instance is re-enabled
- Document RequiredPermissionGrantScopes = [] intent: empty routes to standard AuthenticationService token path which already carries all required scopes via RequiredClientAppPermissions (PR #409)
- Document RequiredS2SGrantScopes = [] intent: AppRoleAssignment.ReadWrite.All removed; admins have bypass, developers fall back to PowerShell instructions (PR #409)
- Add detection rules E/F/G to pr-code-reviewer.md to catch these patterns in future reviews
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The "Run tests" step on Ubuntu has hung twice on this branch (4h 50m and 1h+
respectively) with no log surfaced because GitHub publishes job logs only on
completion. Add two narrowly-scoped diagnostic guardrails so the next hang
fails fast and tells us which test is stuck:
- job-level `timeout-minutes: 20` — bounds the run to ~2x the Windows-local
suite time instead of GitHub's 6-hour default.
- `--blame-hang --blame-hang-timeout 5min` — produces a Sequence_*.xml hang
report naming the stuck test method (and the test before it) when any
single test exceeds 5 minutes.
Also demote the MsalBrowserCredential "Failed to register persistent token
cache" warning to Debug. The same exception was already logged at Debug on
the line above; the warning text ("auth prompts may be repeated") was not
actionable by the user (common cause on headless Linux is no D-Bus/Keychain)
and produced noise in CI test output.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Six Copilot AI comments addressed:
- GraphApiService: rewrite XML doc for CheckDirectoryRoleAsync and its two
wrappers to describe the wids-claim implementation (no Graph call, no
scope dependency) and document the group-assignment / PIM-eligible
limitations. The previous doc still described the old transitiveMemberOf
query path.
- SetupHelpers: replace the misleading "uses AgentIdentityBlueprint.ReadWrite.All
as umbrella" comment with an explicit note that permissionGrantScopes is
intentionally empty and that empty arrays fall through to the standard
token path.
- AuthenticationConstants: delete unused AgentIdentityBlueprintDeleteRestoreAllScope
and AgentIdentityBlueprintAddRemoveCredsAllScope constants — they
contradicted the code that uses the ReadWrite.All umbrella for those
operations, and grep confirmed no callers in src/.
- CHANGELOG: correct the retry-timing claim from "~60s total" to several
minutes worst case (12 attempts × 60s cap ≈ 8 min for the blueprint
token retry).
- GraphApiServiceTests: rename IsCurrentUserAdminAsync_GraphFails_ReturnsUnknown
and IsCurrentUserAgentIdAdminAsync_GraphReturnsNull_ReturnsUnknown to
*_TokenAcquisitionFails_ReturnsUnknown so the names match the now-token-based
failure mode.
- MessagingEndpointFailureReasons: extract the four string literals
("NotOwner", "BlueprintMissing", "NotConfigured", "Other") into a shared
constant class in Constants/, replacing 11 string-literal usages across
AllSubcommand, SetupHelpers, TeamsGraphBackendConfigurator, and
AllSubcommandTests.
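The wids-claim approach mentioned in the first item can be sketched as follows. This is an illustration under assumptions, not the code in GraphApiService.cs: `WidsClaimReader`, `RoleCheck`, and `Check` are hypothetical names, and the mapping of "claim present but role absent" to `DoesNotHaveRole` is a guess. The sketch only decodes the JWT payload and does not validate the signature. The Global Administrator role template ID is the well-known Entra value.

```csharp
using System;
using System.Text;
using System.Text.Json;

// Hypothetical sketch of a role check based on the wids claim.
public static class WidsClaimReader
{
    // Well-known Entra role template ID for Global Administrator.
    public const string GlobalAdministratorTemplateId =
        "62e90394-69f5-4237-9190-012177145e10";

    public enum RoleCheck { HasRole, DoesNotHaveRole, Unknown }

    // Decodes the payload only; no signature validation is performed.
    public static RoleCheck Check(string? jwt, string roleTemplateId)
    {
        if (string.IsNullOrEmpty(jwt)) return RoleCheck.Unknown;

        string[] parts = jwt.Split('.');
        if (parts.Length != 3) return RoleCheck.Unknown;

        try
        {
            string json = Encoding.UTF8.GetString(DecodeBase64Url(parts[1]));
            using JsonDocument doc = JsonDocument.Parse(json);

            if (doc.RootElement.ValueKind != JsonValueKind.Object ||
                !doc.RootElement.TryGetProperty("wids", out JsonElement wids) ||
                wids.ValueKind != JsonValueKind.Array)
            {
                // No wids claim in this token: the role cannot be determined.
                return RoleCheck.Unknown;
            }

            foreach (JsonElement entry in wids.EnumerateArray())
            {
                if (entry.ValueKind == JsonValueKind.String &&
                    string.Equals(entry.GetString(), roleTemplateId,
                        StringComparison.OrdinalIgnoreCase))
                {
                    return RoleCheck.HasRole;
                }
            }
            return RoleCheck.DoesNotHaveRole;
        }
        catch (Exception ex) when (ex is FormatException || ex is JsonException)
        {
            return RoleCheck.Unknown;
        }
    }

    // Converts a base64url segment to bytes, restoring padding.
    private static byte[] DecodeBase64Url(string segment)
    {
        string s = segment.Replace('-', '+').Replace('_', '/');
        switch (s.Length % 4)
        {
            case 2: s += "=="; break;
            case 3: s += "="; break;
        }
        return Convert.FromBase64String(s);
    }
}
```

A check like this needs no Graph call, but it can only see what the token carries. A token issued by an app without the wids optional claim yields `Unknown`, and roles held through group assignment or PIM eligibility may not appear in the claim.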
CI fix:
- MockToolingServerSubcommandTests: remove HandleStartServer_WithValidPort_LogsStartingMessage
and HandleStartServer_WithNullPort_UsesDefaultPort. Both started a real
Kestrel server via Server.Start() on a fire-and-forget LongRunning task
that the test never tore down. On Linux CI this caused two failures:
(a) the Theory port 1 case requires root and never binds, and (b) parallel
tests collided on the leaked port 5309 binding. --blame-hang-timeout caught
the deadlock on the previous run. Remaining tests still cover handler logic
(dry-run, background, invalid port, verbose) without binding any port; a
comment documents the decision to keep the regression from coming back.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous CI run hung in PermissionsSubcommandTests.ConfigureMcpPermissionsAsync_V1AndMetadataScopes_AreKnownAndProceed for 5+ minutes until --blame-hang-timeout aborted the run.
Root cause: BatchPermissionsOrchestrator's pre-warm call, graph.GraphGetAsync(tenantId, "/v1.0/me?$select=id", ct, scopes: prewarmScopes), now receives scopes: [] because RequiredPermissionGrantScopes was emptied earlier on this branch. Empty scopes route EnsureGraphHeadersAsync to the standard token path (GetGraphAccessTokenAsync), and on a partial mock that falls through to the real MSAL AuthenticationService. On Linux CI with no cached credentials, that blocks waiting for browser/device-code auth. Windows masked it with cached tokens (2s test runtime).
Fix: pre-stub three virtual GraphApiService methods (GraphGetAsync, IsCurrentUserAdminAsync, IsCurrentUserAgentIdAdminAsync) in the test class constructor so the orchestrator gets a null pre-warm response and short-circuits out of Phase 1/2/3 deterministically. Inline comment documents why so a future reader hitting the same pattern in another test class has the reasoning.
Targeted test now runs in 178 ms (was 2 s on Windows, 5+ min hang on Linux). Full suite drops from 12.58 s to 5.18 s for 1392 passing tests.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two workstreams combined in one commit per request.
A. Copilot review-comment fixes on PR #409:
- GraphApiService.CheckDirectoryRoleAsync: previously acquired the role-check token via AuthenticationService → PowerShell well-known clientId, which does NOT have the wids optional claim configured. The method always returned Unknown, causing BatchPermissionsOrchestrator to treat real Global Admins as non-admins (admin URL printed even when the signed-in user could grant inline). Now routes through _tokenProvider with CustomClientAppId and User.Read, so the JWT comes from the app that actually carries wids.
- GraphApiService.EnsureGraphHeadersAsync: empty IEnumerable<string> previously fell through to the same PowerShell-clientId path. Routing changed to use _tokenProvider whenever (hasScopes || hasCustomApp). Bootstrap escape hatch preserved: no scopes AND no CustomClientAppId still uses legacy AuthenticationService so the initial app lookup doesn't hang on a null clientId.
- GraphApiServiceTests: helper mocks now return the wids JWT via the token provider (matching the new production path). Production methods called by 8 existing tests still pass.
- pr-code-reviewer.md: added Rule H: "JWT claim decoded → verify the token was issued by the app registration that has the claim configured." Cites PR #409 as the concrete example so reviewers ground future analysis.
B. Remove a365 deploy references from CLI code:
- PermissionsSubcommand.cs: help text and runtime "Next step" log no longer reference the long-removed 'a365 deploy'. Both now point at 'a365 publish', the actual next a365 command in the workflow.
- PermissionsSubcommandTests.cs: assertion updated to pin "a365 publish".
- NodeBuildFailedException.cs and NodeDependencyInstallException.cs deleted: dead code since a365 deploy was removed (no throw sites, no test refs).
- ErrorCodes.cs: removed NodeBuildFailed and NodeDependencyInstallFailed (only callers were the deleted exception classes).
- design.md: removed DeployCommand.cs row, removed the five deploy-era service rows from the Services folder tree, replaced the entire "Multiplatform Deployment Architecture" section (IPlatformBuilder interface + Deployment Pipeline mermaid + Restart Mode) with a tight "Multiplatform Project Detection" section that accurately describes what PlatformDetector does today (used by publish, not deploy). Fixed Program.cs sketch.
- CHANGELOG.md: one bullet under [Unreleased] Fixed documenting the user-visible help/log change.
Validation:
- Unit suite: 1392/1392 pass, 7.2s total, no slow tests.
- End-to-end Run 2-retest2 Minimum (cached cache, 8s): all role-check tokens from clientId 716ae110- (test custom app).
- End-to-end Run 2-retest2 Medium (cleared cache, 1m 50s): bootstrap escape hatch correctly used legacy AuthenticationService, no Connect-MgGraph fallback; steady-state used custom app.
Doc-side a365 deploy references in docs/ai-workflows/, docs/agent365-guided-setup/, CLAUDE.md, DEVELOPER.md, and two folder READMEs deferred to a follow-on PR (per user's plan scope).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
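The routing rule described for EnsureGraphHeadersAsync reduces to a small decision function. The sketch below only restates that rule; `TokenPath` and `TokenPathSelector` are hypothetical names, and the real method acquires tokens and sets headers instead of returning an enum.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical labels for the two token acquisition paths.
public enum TokenPath
{
    CustomAppTokenProvider,      // stands in for the _tokenProvider path
    LegacyAuthenticationService  // stands in for the bootstrap escape hatch
}

public static class TokenPathSelector
{
    // Uses the token provider whenever (hasScopes || hasCustomApp).
    // Only "no scopes AND no custom app ID" falls back to the legacy path.
    public static TokenPath Select(IEnumerable<string>? scopes, string? customClientAppId)
    {
        bool hasScopes = scopes is not null && scopes.Any();
        bool hasCustomApp = !string.IsNullOrWhiteSpace(customClientAppId);

        return (hasScopes || hasCustomApp)
            ? TokenPath.CustomAppTokenProvider
            : TokenPath.LegacyAuthenticationService;
    }
}
```

The case that mattered in this commit is an empty scope list with a configured custom app ID: under the rule above it selects the token provider, where it previously fell through to the PowerShell-clientId path.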
biswapm
approved these changes
May 11, 2026
gwharris7
approved these changes
May 11, 2026