Race condition when using Application Inference Profile with Bedrock plugin #164

The `get_inference_profile_info` function has a race condition in its caching logic: the cache check and the cache write are not synchronized. When multiple concurrent requests arrive for an uncached profile, they can all pass the cache check before the first `GetInferenceProfile` response populates the cache. The result is a thundering herd of duplicate `GetInferenceProfile` calls to Bedrock.

This can be reproduced as follows:
- Add a custom model using an application inference profile ID to the Bedrock plugin
- In Studio, create a new chatbot app and set the newly imported model as the default
- Publish the app
- Send 100+ concurrent requests to the app's API endpoint (e.g., `http://localhost/v1/chat-messages`)
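The last step can be scripted with the standard library alone. This is a sketch that assumes the app exposes a Dify-style `chat-messages` API (JSON body with `inputs`, `query`, `response_mode`, and `user`, authenticated with a Bearer app API key); the helper names `send_one` and `fire_concurrent` and the placeholder key are hypothetical, so adjust them to your deployment.

```python
import concurrent.futures
import json
import urllib.error
import urllib.request
from collections import Counter

# Assumed endpoint and placeholder key; replace with your app's values.
URL = "http://localhost/v1/chat-messages"
API_KEY = "YOUR_APP_API_KEY"


def send_one(url: str, api_key: str, i: int):
    """POST one blocking chat request; return the HTTP status or an error name."""
    body = json.dumps({
        "inputs": {},
        "query": "Hello",
        "response_mode": "blocking",
        "user": f"load-test-{i}",  # distinct user per request
    }).encode()
    req = urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=60) as resp:
            resp.read()  # drain the body so the request fully completes
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 500 when the backend hits ThrottlingException
    except OSError as e:
        return type(e).__name__  # connection refused, timeout, etc.


def fire_concurrent(url: str, api_key: str, n: int = 100) -> Counter:
    """Send n requests at once and tally the resulting status codes."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n) as pool:
        return Counter(pool.map(lambda i: send_one(url, api_key, i), range(n)))
```

Calling `fire_concurrent(URL, API_KEY, n=100)` returns a `Counter` of status codes. While the race is present, the tally may include 500s alongside the 200s, and CloudTrail should show the matching burst of `GetInferenceProfile` events.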

Expected behavior: `GetInferenceProfile` is called once, and all other requests use the cached data. Actual behavior: CloudTrail shows 50-100 invocations of `GetInferenceProfile`.

`GetInferenceProfile` is a control-plane API with a low TPS limit (double digits, based on my testing) that cannot be increased. Because of the race condition, batch operations or enough concurrent users can trigger `ThrottlingException` errors, which surface to clients as HTTP 500 responses.
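Fixing the race removes the burst, but a single call may still be throttled if other callers in the same account and region use the API at the same time. As a complementary, hypothetical mitigation, the call could be wrapped in exponential backoff with full jitter. Here `ThrottlingError` and `call_with_backoff` are illustrative names; `ThrottlingError` stands in for a botocore `ClientError` whose error code is `ThrottlingException`. botocore's built-in retry modes (`standard`, `adaptive`), configured via `botocore.config.Config(retries=...)`, are an alternative worth checking first.

```python
import random
import time


class ThrottlingError(Exception):
    """Hypothetical stand-in for a ClientError with code ThrottlingException."""


def call_with_backoff(fn, max_attempts: int = 5,
                      base_delay: float = 0.2, max_delay: float = 5.0):
    """Call fn(), retrying on ThrottlingError with exponential backoff + full jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except ThrottlingError:
            if attempt == max_attempts:
                raise  # give up and let the caller see the throttling error
            # Full jitter: sleep a random time up to the exponential cap.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            time.sleep(random.uniform(0, cap))
```

Jitter spreads retries out so that requests throttled together do not all retry in the same instant and trip the limit again.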
