feat: implement updated design for regional access boundary#1955
feat: implement updated design for regional access boundary#1955nbayati wants to merge 14 commits intogoogleapis:mainfrom
Conversation
Make the fetching async and non blocking. Provide a method for manual override. Implement proactive refresh every 6 hours. Implement automatic recovery if api request fails due to stale regional boundary. Remove no-op signal and checks. Refactor to Regional Access Boundary name.
Make the fetching async and non blocking. Provide a method for manual override. Implement proactive refresh when RAB TTL has expired. Implement automatic recovery in http if api request fails due to stale regional boundary. Remove no-op signal and checks. Refactor to Regional Access Boundary name.
Summary of ChangesHello @nbayati, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a significant architectural update to how regional access boundaries (RAB) are managed within the authentication library. The previous synchronous 'trust boundary' mechanism has been replaced with a more resilient and performant asynchronous system. This new design ensures that RAB information is fetched in the background, proactively refreshed, and automatically recovered in case of failures, thereby improving the overall reliability and user experience without blocking application execution. Highlights
Changelog
Activity
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a significant and well-executed refactoring of the trust boundary mechanism, renaming it to Regional Access Boundary (RAB) and overhauling its implementation. The changes make the fetching of RAB information asynchronous and non-blocking, which is a great improvement for performance and responsiveness. The introduction of automatic recovery for stale boundaries, proactive refreshing, and a manual override method makes the feature more robust and flexible. The code is well-structured, particularly with the new _regional_access_boundary_utils.py file encapsulating the async logic. I've found one critical issue that needs to be addressed, but otherwise, the changes are excellent.
…ry-python into rab-update-feb
daniel-sanche
left a comment
There was a problem hiding this comment.
I'll try to do another pass on this soon, but my main feedback so far is the we need to keep the public API consistent, even if we're not using parts of it anymore. We can add comments to the docstrings mentioning deprecated arguments and their replacements
| scopes=None, | ||
| default_scopes=None, | ||
| universe_domain=None, | ||
| trust_boundary=None, |
There was a problem hiding this comment.
We can't remove an argument for a public class
We could add a comment to mark it as deprecated though, if it's no longer needed. But what's the context here?
| ) | ||
|
|
||
| @_helpers.copy_docstring(credentials.CredentialsWithTrustBoundary) | ||
| def with_trust_boundary(self, trust_boundary): |
There was a problem hiding this comment.
this is also part of the public API surface
| ) | ||
|
|
||
|
|
||
| class CredentialsWithTrustBoundary(Credentials): |
There was a problem hiding this comment.
We have to keep the old class around too
| _WORKLOAD_IDENTITY_POOL_TRUST_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.{universe_domain}/v1/projects/{project_number}/locations/global/workloadIdentityPools/{pool_id}/allowedLocations" | ||
| _SERVICE_ACCOUNT_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{service_account_email}/allowedLocations" | ||
| _WORKFORCE_POOL_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.googleapis.com/v1/locations/global/workforcePools/{pool_id}/allowedLocations" | ||
| _WORKLOAD_IDENTITY_POOL_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.googleapis.com/v1/projects/{project_number}/locations/global/workloadIdentityPools/{pool_id}/allowedLocations" |
There was a problem hiding this comment.
do we not need to worry about universe domains now?
| variable_name | ||
| ) | ||
| ) | ||
| return False |
There was a problem hiding this comment.
nit: this could be return value.lower() in ("true", "1")
| self._lock = threading.Lock() | ||
| self._worker = None | ||
|
|
||
| def start_refresh(self, credentials, request): |
There was a problem hiding this comment.
I'm trying to get my head around the ownership here. It looks like each credentials object has a _RegionalAccessBoundaryRefreshManager to manage refresh? But we also pass in a credentials object at refresh time? Would start_refresh ever be called with a different credential object?
Does it make sense to return early from the manager when a worker is active, if the worker is working on a different credentials object? That seems like it could lead to misunderstandings
I just want to make sure I understand this, since there are locks in play.
| request (google.auth.transport.Request): The object used to make | ||
| HTTP requests. | ||
| """ | ||
| with self._lock: |
There was a problem hiding this comment.
is this lock needed? Is the manager ever entered from a background threads?
|
|
||
| DEFAULT_UNIVERSE_DOMAIN = "googleapis.com" | ||
| NO_OP_TRUST_BOUNDARY_LOCATIONS: List[str] = [] | ||
| NO_OP_TRUST_BOUNDARY_ENCODED_LOCATIONS = "0x0" |
There was a problem hiding this comment.
These might be considered part of the public API too
| @@ -157,24 +154,26 @@ def _build_trust_boundary_lookup_url(self): | |||
| try: | |||
| info = _metadata.get_service_account_info(request, "default") | |||
There was a problem hiding this comment.
There's no request in this scope, how does this work?
| creds._universe_domain_cached = True | ||
| url = creds._build_regional_access_boundary_lookup_url() | ||
|
|
||
| mock_get_service_account_info.assert_called_once_with(mock.ANY, "default") |
There was a problem hiding this comment.
Probably shouldn't use mock.ANY, as I think this masked a bug. This should pass a mock request object to the call above, and then validate that it was passed here.
| return {"x-allowed-locations": ""} | ||
| else: | ||
| return {"x-allowed-locations": self._trust_boundary["encodedLocations"]} | ||
| def _get_regional_access_boundary_header(self): |
There was a problem hiding this comment.
With the current implementation, if the background refresh fails repeatedly and the 6-hour TTL passes, we will continue to blindly attach the expired header?
| - "true", "1" are considered true. | ||
| - "false", "0" are considered false. | ||
| Any other values will raise an exception. | ||
| - Any other value (or unset) is considered false. |
There was a problem hiding this comment.
Documentation says this is case insensitive, implementation doesn't match.
| @_helpers.copy_docstring(credentials_async.Credentials) | ||
| async def before_request(self, request, method, url, headers): | ||
| # Explicit override to bypass synchronous CredentialsWithRegionalAccessBoundary. | ||
| await credentials_async.Credentials.before_request( | ||
| self, request, method, url, headers | ||
| ) | ||
|
|
| # If all checks pass, start the background refresh. | ||
| self._regional_access_boundary_refresh_manager.start_refresh(self, request) | ||
|
|
||
| def _is_regional_access_boundary_lookup_required(self): |
There was a problem hiding this comment.
os.environ is polled on every request. Should we cache it?
| self._lock = threading.Lock() | ||
| self._worker = None | ||
|
|
||
| def start_refresh(self, credentials, request): |
| ) | ||
| target._current_rab_cooldown_duration = self._current_rab_cooldown_duration | ||
| # Create a new lock for the target instance to ensure independent thread-safety. | ||
| target._stale_boundary_lock = threading.Lock() |
There was a problem hiding this comment.
We should not be creating a new threading.Lock() here, nor should we let the copied credential use a new _RegionalAccessBoundaryRefreshManager.
When credentials are copied via with_quota_project() or with_scopes(), they share the exact same expiry time. If a user has 5 scoped copies of a credential and the soft expiry threshold is reached, all 5 copies will try to refresh concurrently. Because they have different locks and managers, they will spawn 5 identical background threads and to call the lookup endpoint simultaneously.
To prevent these redundant network requests, cloned credentials should share the same _stale_boundary_lock and _regional_access_boundary_refresh_manager by reference so the manager can successfully deduplicate the refresh threads.
|
Great work with the PR! Adding some of Gemini's thoughts: Test Coverage GapsA. Masking the missing This is dangerous because it is actively masking a B. Missing coverage for the background thread's "Happy Path" C. Missing coverage for Reactive Refresh (Stale RAB Error) |
Make the fetching async and non blocking.
Provide a method for manual override.
Implement proactive refresh every 6 hours.
Implement automatic recovery if api request fails due to stale regional boundary. Remove no-op signal and checks.
Refactor to Regional Access Boundary name.