feat: implement updated design for regional access boundary #1955

Open: nbayati wants to merge 14 commits into googleapis:main from nbayati:rab-update-feb

@nbayati (Contributor) commented Feb 7, 2026

  • Make the fetching async and non-blocking.
  • Provide a method for manual override.
  • Implement proactive refresh when the RAB TTL (6 hours by default) has expired.
  • Implement automatic recovery in the HTTP transport if an API request fails due to a stale regional access boundary.
  • Remove the no-op signal and checks.
  • Refactor to the Regional Access Boundary name.

@gemini-code-assist (Contributor)

Summary of Changes

Hello @nbayati, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a significant architectural update to how regional access boundaries (RAB) are managed within the authentication library. The previous synchronous 'trust boundary' mechanism has been replaced with a more resilient and performant asynchronous system. This new design ensures that RAB information is fetched in the background, proactively refreshed, and automatically recovered in case of failures, thereby improving the overall reliability and user experience without blocking application execution.

Highlights

  • Refactoring and Renaming: The concept of 'Trust Boundary' has been comprehensively refactored and renamed to 'Regional Access Boundary' (RAB) across the codebase for improved clarity and consistency.
  • Asynchronous and Non-Blocking Fetching: Implemented asynchronous, non-blocking fetching of RAB information using a dedicated background refresh thread, so the lookup no longer blocks credential refreshes or the requests that depend on them.
  • Manual Override Capability: A new with_regional_access_boundary method has been introduced, allowing developers to manually provide and cache RAB data, bypassing the initial asynchronous lookup.
  • Proactive Refresh Mechanism: RAB information is now proactively refreshed every 6 hours in the background to ensure its freshness and prevent the use of stale data.
  • Automatic Recovery from Stale RAB Errors: An automatic recovery mechanism has been implemented to detect stale RAB errors (HTTP 406 status code) and trigger an immediate background refresh with exponential backoff, followed by a retry of the original request.
  • Simplified Logic: The previous 'no-op' trust boundary signals and their associated checks have been removed, streamlining the RAB management logic.
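The TTL and exponential-cooldown behaviour summarized above can be sketched as a small state object. This is a hypothetical illustration, not the PR's implementation: RabCacheState, INITIAL_COOLDOWN and MAX_COOLDOWN (and their values) are assumptions; only the 6-hour TTL comes from the description.

```python
import datetime
from dataclasses import dataclass
from typing import Optional

# Only the 6-hour TTL comes from the PR description; the cooldown values
# below are hypothetical.
DEFAULT_TTL = datetime.timedelta(hours=6)
INITIAL_COOLDOWN = datetime.timedelta(minutes=15)
MAX_COOLDOWN = datetime.timedelta(hours=6)


@dataclass
class RabCacheState:
    """Hypothetical per-credential cache of regional access boundary data."""

    encoded_locations: Optional[str] = None
    expiry: Optional[datetime.datetime] = None
    cooldown_until: Optional[datetime.datetime] = None
    current_cooldown: datetime.timedelta = INITIAL_COOLDOWN

    def needs_refresh(self, now: datetime.datetime) -> bool:
        # After a failed lookup, wait out the cooldown before trying again.
        if self.cooldown_until is not None and now < self.cooldown_until:
            return False
        # Otherwise refresh when nothing is cached or the TTL has passed.
        return self.expiry is None or now >= self.expiry

    def record_success(self, encoded_locations: str, now: datetime.datetime) -> None:
        self.encoded_locations = encoded_locations
        self.expiry = now + DEFAULT_TTL
        self.cooldown_until = None
        self.current_cooldown = INITIAL_COOLDOWN  # reset the backoff

    def record_failure(self, now: datetime.datetime) -> None:
        # Keep any previously cached value; block retries for the current
        # cooldown, then double it (capped) for the next failure.
        self.cooldown_until = now + self.current_cooldown
        self.current_cooldown = min(self.current_cooldown * 2, MAX_COOLDOWN)
```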
Changelog
  • google/auth/_constants.py
    • Renamed endpoint constants from _TRUST_BOUNDARY_LOOKUP_ENDPOINT to _REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT.
    • Removed universe_domain from the endpoint format string, hardcoding it to googleapis.com.
  • google/auth/_helpers.py
    • Modified get_bool_from_env to simplify boolean parsing; any non-'true'/'1' value (or unset) is now considered false, and the InvalidValue exception for invalid inputs has been removed.
  • google/auth/_regional_access_boundary_utils.py
    • Added a new utility file containing _RegionalAccessBoundaryRefreshThread for background RAB lookups and _RegionalAccessBoundaryRefreshManager to manage refresh threads, defining default TTL and exponential cooldown periods.
  • google/auth/compute_engine/credentials.py
    • Updated the base class to CredentialsWithRegionalAccessBoundary.
    • Removed trust_boundary parameters from the constructor and copy methods.
    • Refactored the RAB lookup URL building to log errors and return None on failure instead of raising exceptions.
    • Introduced a _make_copy method to encapsulate credential copying logic and ensure RAB state is copied.
  • google/auth/credentials.py
    • Renamed CredentialsWithTrustBoundary to CredentialsWithRegionalAccessBoundary.
    • Introduced core RAB management logic, including with_regional_access_boundary for manual override, handle_stale_regional_access_boundary for automatic recovery, and _maybe_start_regional_access_boundary_refresh for proactive background refreshes.
    • Updated apply, before_request, and refresh methods to integrate with the new asynchronous RAB system.
    • Removed NO_OP_TRUST_BOUNDARY_LOCATIONS and NO_OP_TRUST_BOUNDARY_ENCODED_LOCATIONS constants, simplifying RAB state handling.
  • google/auth/external_account.py
    • Adapted to the new CredentialsWithRegionalAccessBoundary base class.
    • Removed trust_boundary parameters from constructors and copy methods.
    • Updated from_info to support regional_access_boundary configuration.
    • Removed explicit _handle_trust_boundary logic from the refresh method.
  • google/auth/external_account_authorized_user.py
    • Applied similar updates as external_account.py for authorized user credentials, including base class change and regional_access_boundary support in from_info.
  • google/auth/identity_pool.py
    • Removed the explicit call to self._handle_trust_boundary(request) from the refresh method, as RAB management is now handled by the CredentialsWithRegionalAccessBoundary base class.
  • google/auth/impersonated_credentials.py
    • Updated to use CredentialsWithRegionalAccessBoundary.
    • Renamed RAB endpoint constants.
    • Modified RAB URL building to log errors and return None instead of raising ValueError.
    • Adapted from_impersonated_service_account_info for regional_access_boundary configuration.
  • google/auth/transport/requests.py
    • Implemented _is_stale_regional_access_boundary_error to detect HTTP 406 responses indicating a stale RAB.
    • Added retry logic to the request method that automatically clears the cached boundary and re-attempts the request once if a stale RAB error is detected.
  • google/oauth2/_client.py
    • Renamed trust boundary lookup functions to regional access boundary (e.g., _lookup_trust_boundary to _lookup_regional_access_boundary).
    • Enhanced RAB lookup robustness by logging errors and returning None on failure instead of raising exceptions.
    • Increased exponential backoff retry attempts to 6 for RAB lookups.
    • Added HTTP 502 (Bad Gateway) as a retryable error for RAB lookups.
  • google/oauth2/service_account.py
    • Updated to use CredentialsWithRegionalAccessBoundary.
    • Removed trust_boundary parameters from constructors.
    • Integrated regional_access_boundary support into _from_signer_and_info.
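The lookup behaviour described in the google/oauth2/_client.py entry (six attempts with exponential backoff, 502 treated as retryable, errors logged and None returned instead of raising) can be sketched as follows. The send callable, the base delay, counting the first try as one of the six, and the 500/503 entries in the retryable set are all assumptions; the real function's signature is not shown in this thread.

```python
import logging
import time
from typing import Callable, Optional, Tuple

_LOGGER = logging.getLogger(__name__)

# 502 is the code this PR adds; 500 and 503 are assumed for illustration.
RETRYABLE_STATUS_CODES = frozenset({500, 502, 503})
MAX_ATTEMPTS = 6  # assumption: counts the initial try as one of the six


def lookup_regional_access_boundary(
    send: Callable[[], Tuple[int, dict]],
    sleep: Callable[[float], None] = time.sleep,
    base_delay: float = 1.0,
) -> Optional[dict]:
    """Hypothetical lookup: returns the response body, or None on failure.

    `send` stands in for the real HTTP call and returns (status, body).
    """
    for attempt in range(MAX_ATTEMPTS):
        status, body = send()
        if status == 200:
            return body
        if status not in RETRYABLE_STATUS_CODES:
            _LOGGER.error("RAB lookup failed with status %s; not retrying.", status)
            return None
        if attempt < MAX_ATTEMPTS - 1:
            # Exponential backoff: base_delay, 2x, 4x, ... between attempts.
            sleep(base_delay * 2 ** attempt)
    _LOGGER.error("RAB lookup failed after %d attempts.", MAX_ATTEMPTS)
    return None
```

Injecting sleep keeps the backoff testable without real waits.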
Activity
  • The pull request implements a comprehensive set of design changes for Regional Access Boundary (RAB) handling.
  • It introduces asynchronous fetching, proactive refreshing, and automatic recovery mechanisms for RAB.
  • The changes involve significant refactoring of existing trust boundary logic and the addition of new utility components.

@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a significant and well-executed refactoring of the trust boundary mechanism, renaming it to Regional Access Boundary (RAB) and overhauling its implementation. The changes make the fetching of RAB information asynchronous and non-blocking, which is a great improvement for performance and responsiveness. The introduction of automatic recovery for stale boundaries, proactive refreshing, and a manual override method makes the feature more robust and flexible. The code is well-structured, particularly with the new _regional_access_boundary_utils.py file encapsulating the async logic. I've found one critical issue that needs to be addressed, but otherwise, the changes are excellent.

@nbayati changed the title from "Implement updated design changes for RAB" to "feat: implement updated design for regional access boundary" on Feb 11, 2026
@nbayati marked this pull request as ready for review on February 13, 2026 21:44
@nbayati requested review from a team as code owners on February 13, 2026 21:44
@nbayati changed the title from "feat: implement updated design for regional access boundary" to "feat: implement updated design for regional access boundary" on Feb 13, 2026
@daniel-sanche added the kokoro:force-run label (forces Kokoro to re-run the tests) on Feb 14, 2026
@yoshi-kokoro removed the kokoro:force-run label on Feb 14, 2026
@daniel-sanche requested a review from a team as a code owner on February 18, 2026 19:29
@nbayati requested a review from lsirac on February 18, 2026 19:59

@daniel-sanche (Collaborator) left a comment

I'll try to do another pass on this soon, but my main feedback so far is that we need to keep the public API consistent, even if we're not using parts of it anymore. We can add comments to the docstrings mentioning deprecated arguments and their replacements.

scopes=None,
default_scopes=None,
universe_domain=None,
trust_boundary=None,

Collaborator:

We can't remove an argument for a public class

We could add a comment to mark it as deprecated though, if it's no longer needed. But what's the context here?
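Following this suggestion, the constructor can keep accepting trust_boundary, document it as deprecated, and warn when it is passed. A minimal sketch on a hypothetical stand-in class (not the real compute_engine.Credentials); the warning category and message are assumptions, and this sketch simply ignores the value rather than mapping it onto the new manual override.

```python
import warnings


class Credentials:
    """Hypothetical stand-in for a public credentials class."""

    def __init__(self, scopes=None, default_scopes=None, universe_domain=None,
                 trust_boundary=None):
        """
        Args:
            trust_boundary: Deprecated and ignored. Regional access boundary
                data is now fetched in the background; use
                with_regional_access_boundary() to supply it manually.
        """
        if trust_boundary is not None:
            warnings.warn(
                "trust_boundary is deprecated and ignored; use "
                "with_regional_access_boundary() instead.",
                DeprecationWarning,
                stacklevel=2,
            )
        self._scopes = scopes
        self._default_scopes = default_scopes
        self._universe_domain = universe_domain
```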

)

@_helpers.copy_docstring(credentials.CredentialsWithTrustBoundary)
def with_trust_boundary(self, trust_boundary):

Collaborator:

This is also part of the public API surface.

)


class CredentialsWithTrustBoundary(Credentials):

Collaborator:

We have to keep the old class around too
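A sketch of keeping the old public names importable: alias the old class to the new one and keep with_trust_boundary as a deprecated wrapper. The class bodies are hypothetical stand-ins, and forwarding the old trust_boundary value unchanged assumes the old and new payloads share a shape, which this thread does not confirm. An alias (rather than a subclass) keeps isinstance() checks against either name working for every credential type.

```python
import copy
import warnings


class Credentials:
    """Hypothetical stand-in for google.auth.credentials.Credentials."""


class CredentialsWithRegionalAccessBoundary(Credentials):
    """Hypothetical stand-in for the new base class."""

    _regional_access_boundary = None

    def with_regional_access_boundary(self, regional_access_boundary):
        # Placeholder: the real method also seeds the RAB cache and its expiry.
        new = copy.copy(self)
        new._regional_access_boundary = regional_access_boundary
        return new

    def with_trust_boundary(self, trust_boundary):
        """Deprecated: use with_regional_access_boundary() instead."""
        warnings.warn(
            "with_trust_boundary() is deprecated; use "
            "with_regional_access_boundary() instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Assumes the old and new boundary payloads have the same shape.
        return self.with_regional_access_boundary(trust_boundary)


# Deprecated alias: old imports, subclasses and isinstance() checks keep working.
CredentialsWithTrustBoundary = CredentialsWithRegionalAccessBoundary
```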

_WORKLOAD_IDENTITY_POOL_TRUST_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.{universe_domain}/v1/projects/{project_number}/locations/global/workloadIdentityPools/{pool_id}/allowedLocations"
_SERVICE_ACCOUNT_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.googleapis.com/v1/projects/-/serviceAccounts/{service_account_email}/allowedLocations"
_WORKFORCE_POOL_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.googleapis.com/v1/locations/global/workforcePools/{pool_id}/allowedLocations"
_WORKLOAD_IDENTITY_POOL_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT = "https://iamcredentials.googleapis.com/v1/projects/{project_number}/locations/global/workloadIdentityPools/{pool_id}/allowedLocations"

Collaborator:

Do we not need to worry about universe domains now?
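If universe domains do still matter, one option is to keep the {universe_domain} placeholder and fall back to googleapis.com instead of hardcoding the domain. A hypothetical sketch: the helper name is illustrative, the template reuses the service-account endpoint path above, and whether the allowedLocations endpoint is served outside googleapis.com is not stated in this thread.

```python
import logging
from typing import Optional

_LOGGER = logging.getLogger(__name__)

DEFAULT_UNIVERSE_DOMAIN = "googleapis.com"
# Same path as _SERVICE_ACCOUNT_REGIONAL_ACCESS_BOUNDARY_LOOKUP_ENDPOINT, but
# with the universe domain left as a placeholder instead of hardcoded.
_SERVICE_ACCOUNT_LOOKUP_TEMPLATE = (
    "https://iamcredentials.{universe_domain}/v1/projects/-/"
    "serviceAccounts/{service_account_email}/allowedLocations"
)


def build_lookup_url(service_account_email: Optional[str],
                     universe_domain: Optional[str] = None) -> Optional[str]:
    """Hypothetical URL builder: logs and returns None instead of raising."""
    if not service_account_email:
        _LOGGER.error("Cannot build RAB lookup URL: no service account email.")
        return None
    return _SERVICE_ACCOUNT_LOOKUP_TEMPLATE.format(
        universe_domain=universe_domain or DEFAULT_UNIVERSE_DOMAIN,
        service_account_email=service_account_email,
    )
```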

variable_name
)
)
return False

Collaborator:

nit: this could be return value.lower() in ("true", "1")

self._lock = threading.Lock()
self._worker = None

def start_refresh(self, credentials, request):

@daniel-sanche (Collaborator), Feb 18, 2026:

I'm trying to get my head around the ownership here. It looks like each credentials object has a _RegionalAccessBoundaryRefreshManager to manage refresh? But we also pass in a credentials object at refresh time? Would start_refresh ever be called with a different credential object?

Does it make sense to return early from the manager when a worker is active, if that worker is refreshing a different credentials object? That seems like it could lead to misunderstandings.

I just want to make sure I understand this, since there are locks in play.

Contributor:

+1

request (google.auth.transport.Request): The object used to make
HTTP requests.
"""
with self._lock:

Collaborator:

Is this lock needed? Is the manager ever entered from background threads?
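One reason the lock may still matter even if the background thread never re-enters the manager: if start_refresh is reached from before_request (as the summary suggests), several application threads can call it at once, and "check for a live worker, then start one" must be atomic. A minimal sketch with a hypothetical RefreshManager that takes a refresh callable instead of a credentials object:

```python
import threading


class RefreshManager:
    """Hypothetical manager that runs at most one background refresh at a time."""

    def __init__(self):
        self._lock = threading.Lock()
        self._worker = None

    def start_refresh(self, refresh_fn):
        """Start refresh_fn on a daemon thread unless one is already running.

        Returns True if a new worker was started.
        """
        # "Is a worker alive? If not, start one" must be atomic. Without the
        # lock, two request threads could both see no live worker and each
        # start a refresh.
        with self._lock:
            if self._worker is not None and self._worker.is_alive():
                return False
            self._worker = threading.Thread(target=refresh_fn, daemon=True)
            self._worker.start()
            return True
```

Whether the early return is correct across different credential objects depends on ownership: it only deduplicates meaningfully if every caller sharing a manager wants the same lookup.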


DEFAULT_UNIVERSE_DOMAIN = "googleapis.com"
NO_OP_TRUST_BOUNDARY_LOCATIONS: List[str] = []
NO_OP_TRUST_BOUNDARY_ENCODED_LOCATIONS = "0x0"

Collaborator:

These might be considered part of the public API, too.

@@ -157,24 +154,26 @@ def _build_trust_boundary_lookup_url(self):
try:
info = _metadata.get_service_account_info(request, "default")

Contributor:

There's no request variable in this scope; how does this work?

creds._universe_domain_cached = True
url = creds._build_regional_access_boundary_lookup_url()

mock_get_service_account_info.assert_called_once_with(mock.ANY, "default")

Contributor:

Probably shouldn't use mock.ANY, as I think this masked a bug. This should pass a mock request object to the call above, and then validate that it was passed here.
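A sketch of the suggested test shape: pass an explicit mock request and assert on that exact object. Everything here is a self-contained stand-in (_metadata is a SimpleNamespace, FakeComputeCredentials is hypothetical); the point is that once the URL builder takes request as a parameter, a missing argument fails loudly instead of being hidden by mock.ANY.

```python
import types
from unittest import mock

# Self-contained stand-in for google.auth.compute_engine._metadata.
_metadata = types.SimpleNamespace(
    get_service_account_info=lambda request, service_account="default": {}
)


class FakeComputeCredentials:
    """Hypothetical stand-in: the URL builder takes request explicitly."""

    def _build_regional_access_boundary_lookup_url(self, request):
        info = _metadata.get_service_account_info(request, "default")
        return (
            "https://iamcredentials.googleapis.com/v1/projects/-/"
            "serviceAccounts/{}/allowedLocations".format(info["email"])
        )


def test_lookup_url_uses_the_given_request():
    mock_request = mock.Mock(name="request")
    with mock.patch.object(
        _metadata, "get_service_account_info",
        return_value={"email": "sa@example.com"},
    ) as mock_info:
        url = FakeComputeCredentials()._build_regional_access_boundary_lookup_url(
            mock_request
        )
    # Assert on the exact object passed in, not mock.ANY.
    mock_info.assert_called_once_with(mock_request, "default")
    assert url.endswith("/serviceAccounts/sa@example.com/allowedLocations")
```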

return {"x-allowed-locations": ""}
else:
return {"x-allowed-locations": self._trust_boundary["encodedLocations"]}
def _get_regional_access_boundary_header(self):

Contributor:

With the current implementation, if the background refresh fails repeatedly and the 6-hour TTL passes, we will continue to blindly attach the expired header?
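One possible answer is to stop attaching the header once the TTL has passed. A hypothetical sketch (class and attribute names are assumptions); whether omitting the header is preferable to sending a stale value depends on how the backend treats a missing x-allowed-locations header, which the thread does not say.

```python
import datetime
from typing import Dict, Optional


class RabHeaderState:
    """Hypothetical sketch: attach x-allowed-locations only while fresh."""

    def __init__(self):
        self._regional_access_boundary: Optional[dict] = None
        self._regional_access_boundary_expiry: Optional[datetime.datetime] = None

    def _get_regional_access_boundary_header(
        self, now: Optional[datetime.datetime] = None
    ) -> Dict[str, str]:
        now = now or datetime.datetime.now(datetime.timezone.utc)
        rab = self._regional_access_boundary
        expiry = self._regional_access_boundary_expiry
        if not rab or expiry is None or now >= expiry:
            # Nothing cached, or the TTL has passed: send no header at all
            # rather than a value that may be stale.
            return {}
        return {"x-allowed-locations": rab["encodedLocations"]}
```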

- "true", "1" are considered true.
- "false", "0" are considered false.
Any other values will raise an exception.
- Any other value (or unset) is considered false.

Contributor:

Documentation says this is case insensitive, implementation doesn't match.
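A sketch of get_bool_from_env that matches the documented, case-insensitive contract, using the one-liner suggested in the earlier nit. The single-argument signature is an assumption.

```python
import os


def get_bool_from_env(variable_name):
    """Sketch of the documented contract, case-insensitively.

    - "true" / "1" (any case, e.g. "TRUE" or "True") are considered true.
    - Any other value (or unset) is considered false.
    """
    value = os.environ.get(variable_name, "")
    return value.lower() in ("true", "1")
```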

Comment on lines +78 to +84
@_helpers.copy_docstring(credentials_async.Credentials)
async def before_request(self, request, method, url, headers):
# Explicit override to bypass synchronous CredentialsWithRegionalAccessBoundary.
await credentials_async.Credentials.before_request(
self, request, method, url, headers
)

Contributor:

Can you explain this bit?

# If all checks pass, start the background refresh.
self._regional_access_boundary_refresh_manager.start_refresh(self, request)

def _is_regional_access_boundary_lookup_required(self):

Contributor:

os.environ is polled on every request. Should we cache it?

self._lock = threading.Lock()
self._worker = None

def start_refresh(self, credentials, request):

Contributor:

+1

)
target._current_rab_cooldown_duration = self._current_rab_cooldown_duration
# Create a new lock for the target instance to ensure independent thread-safety.
target._stale_boundary_lock = threading.Lock()

Contributor:

We should not be creating a new threading.Lock() here, nor should we let the copied credential use a new _RegionalAccessBoundaryRefreshManager.

When credentials are copied via with_quota_project() or with_scopes(), they share the exact same expiry time. If a user has 5 scoped copies of a credential and the soft expiry threshold is reached, all 5 copies will try to refresh concurrently. Because they have different locks and managers, they will spawn 5 identical background threads that call the lookup endpoint simultaneously.

To prevent these redundant network requests, cloned credentials should share the same _stale_boundary_lock and _regional_access_boundary_refresh_manager by reference so the manager can successfully deduplicate the refresh threads.
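A sketch of the suggested copy behaviour: a shallow copy shares the lock and manager by reference. ScopedCredentialsSketch and RefreshManagerStub are hypothetical stand-ins. Note that for copies to also see the refreshed value, the cached boundary itself would have to live in a shared object (for example on the manager); this sketch only shows the sharing the comment asks for.

```python
import copy
import threading


class RefreshManagerStub:
    """Hypothetical stand-in for _RegionalAccessBoundaryRefreshManager."""

    def __init__(self):
        self._lock = threading.Lock()
        self._worker = None


class ScopedCredentialsSketch:
    """Hypothetical credentials whose copies share RAB refresh machinery."""

    def __init__(self, scopes=None):
        self._scopes = scopes
        self._stale_boundary_lock = threading.Lock()
        self._regional_access_boundary_refresh_manager = RefreshManagerStub()

    def _make_copy(self):
        # copy.copy is shallow: the lock and manager are shared by reference,
        # so every copy deduplicates refreshes through the same manager.
        return copy.copy(self)

    def with_scopes(self, scopes):
        new = self._make_copy()
        new._scopes = scopes
        return new
```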

@lsirac (Contributor) commented Feb 20, 2026

Great work with the PR! Adding some of Gemini's thoughts:

Test Coverage Gaps

A. Masking the missing request bug with mock.ANY
In tests/compute_engine/test_credentials.py, we are using mock.ANY to validate the request object being passed:
mock_get_service_account_info.assert_called_once_with(mock.ANY, "default")

This is dangerous because it is actively masking a NameError bug. The variable request isn't actually being passed down to _build_regional_access_boundary_lookup_url(), but mock.ANY blindly accepts the garbage/leaked variable from the test environment. We should instantiate a mock_req = mock.Mock(), pass it explicitly, and assert against that exact mock object rather than using mock.ANY. This will force us to fix the missing request arguments in the URL builder methods.

B. Missing coverage for the background thread's "Happy Path"
It looks like many of the old synchronous refresh() tests for the boundary lookups were deleted because the architecture shifted to a background thread (e.g., the massive -271 line deletions in Compute Engine tests). However, we currently have no tests verifying that the new background thread successfully executes, updates the credential state, and properly handles TTL/cooldowns. We need to add tests that explicitly validate _RegionalAccessBoundaryRefreshThread.run() to ensure we aren't silently failing in the background.
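A sketch of a happy-path test for the background thread: start it, join it, and check that the lookup was called with the request and that the credential's boundary and expiry were updated. RefreshThread is a hypothetical stand-in for _RegionalAccessBoundaryRefreshThread, with the lookup and clock injected so the test is deterministic; the attribute names are assumptions.

```python
import datetime
import threading
import types
from unittest import mock

DEFAULT_TTL = datetime.timedelta(hours=6)  # TTL stated in the PR description


class RefreshThread(threading.Thread):
    """Hypothetical stand-in for _RegionalAccessBoundaryRefreshThread."""

    def __init__(self, credentials, request, lookup, clock):
        super().__init__(daemon=True)
        self._credentials = credentials
        self._request = request
        self._lookup = lookup  # injected so tests need no network
        self._clock = clock    # injected so the expiry is deterministic

    def run(self):
        result = self._lookup(self._request)
        if result is None:
            return  # cooldown handling on failure is omitted in this sketch
        self._credentials._regional_access_boundary = result
        self._credentials._regional_access_boundary_expiry = self._clock() + DEFAULT_TTL


def test_refresh_thread_updates_credentials():
    now = datetime.datetime(2026, 2, 20, tzinfo=datetime.timezone.utc)
    creds = types.SimpleNamespace(
        _regional_access_boundary=None, _regional_access_boundary_expiry=None
    )
    request = mock.Mock(name="request")
    lookup = mock.Mock(return_value={"encodedLocations": "0x1"})

    thread = RefreshThread(creds, request, lookup, clock=lambda: now)
    thread.start()
    thread.join(timeout=5)

    assert not thread.is_alive()
    lookup.assert_called_once_with(request)
    assert creds._regional_access_boundary == {"encodedLocations": "0x1"}
    assert creds._regional_access_boundary_expiry == now + DEFAULT_TTL
```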

C. Missing coverage for Reactive Refresh (Stale RAB Error)
The design document mentions that if a developer provides a manual override that goes stale, the library should catch the "stale regional access boundary" error (406 Not Acceptable), clear the cache, retry, and trigger a background refresh. I noticed we don't have any transport-layer tests covering this reactive refresh logic. We need both the implementation and the tests to verify the cache clears and the retry succeeds.
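A sketch of the reactive path and a test for it: send once, and on a 406 clear the cached boundary, start a refresh, and retry exactly once without the stale header. FakeCredentials and request_with_rab_recovery are hypothetical; this sketch treats any 406 as a stale boundary, whereas the real _is_stale_regional_access_boundary_error may inspect the response more closely.

```python
def is_stale_regional_access_boundary_error(status):
    # Assumption: any 406 is treated as a stale boundary in this sketch.
    return status == 406


class FakeCredentials:
    """Hypothetical credentials with a cached x-allowed-locations value."""

    def __init__(self, encoded_locations):
        self.encoded_locations = encoded_locations
        self.refreshes_started = 0

    def apply_headers(self, headers):
        if self.encoded_locations:
            headers["x-allowed-locations"] = self.encoded_locations

    def handle_stale_regional_access_boundary(self):
        # Clear the cache so the retry goes out without the stale header,
        # and record that a background refresh would be started here.
        self.encoded_locations = None
        self.refreshes_started += 1


def request_with_rab_recovery(credentials, send):
    """Send once; on a stale-RAB error, clear the cache and retry exactly once."""
    headers = {}
    credentials.apply_headers(headers)
    status = send(headers)
    if is_stale_regional_access_boundary_error(status):
        credentials.handle_stale_regional_access_boundary()
        headers = {}
        credentials.apply_headers(headers)
        status = send(headers)
    return status
```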

Labels: None yet
Projects: None yet
4 participants