Race condition when using Application Inference Profile with Bedrock plugin #164

@ericfzhu

The get_inference_profile_info function has a race condition in its caching logic. When multiple concurrent requests arrive, they can all pass the cache check before any of them populates the cache, causing a thundering herd of duplicate GetInferenceProfile API calls to Bedrock.
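To illustrate the pattern, here is a minimal, self-contained sketch of an unsynchronized check-then-fetch cache. It is not the plugin's actual code: `fake_get_inference_profile`, `get_profile_unsafe`, and the `_cache` dict are hypothetical stand-ins, and the 50 ms sleep simulates the latency of the real GetInferenceProfile call.

```python
import threading
import time

_cache = {}                      # hypothetical cache: profile_id -> profile info
api_calls = 0                    # counts simulated GetInferenceProfile calls
_count_lock = threading.Lock()   # protects only the counter, not the cache


def fake_get_inference_profile(profile_id):
    """Stand-in for the Bedrock GetInferenceProfile call (assumption)."""
    global api_calls
    with _count_lock:
        api_calls += 1
    time.sleep(0.05)  # simulated network latency
    return {"inferenceProfileId": profile_id}


def get_profile_unsafe(profile_id):
    # Check-then-act without synchronization: every thread that arrives
    # before the first fetch completes sees a miss and fetches again.
    if profile_id not in _cache:
        _cache[profile_id] = fake_get_inference_profile(profile_id)
    return _cache[profile_id]


def run_concurrently(n=20):
    """Start n threads at the same moment and return the number of fetches."""
    barrier = threading.Barrier(n)

    def worker():
        barrier.wait()  # release all threads at once
        get_profile_unsafe("my-profile")

    threads = [threading.Thread(target=worker) for _ in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return api_calls
```

Calling `run_concurrently(20)` in this sketch typically reports close to 20 fetches rather than 1, because the cache is only populated after the simulated call returns.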

Steps to reproduce:

  • Add a custom model using an application inference profile ID to the Bedrock plugin
  • In Studio, create a new chatbot app and set the newly imported model as the default
  • Publish the app
  • Send 100+ concurrent requests to the app's endpoint (e.g., http://localhost/v1/chat-messages)
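The last step can be scripted. The following is a rough sketch using only the standard library; the request payload fields, the Bearer authorization header, and the `API_KEY` placeholder are assumptions about the app's chat API and need to be adapted to the actual deployment.

```python
import json
import urllib.error
import urllib.request
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost/v1/chat-messages"  # endpoint from the steps above
API_KEY = "app-REPLACE_ME"                 # placeholder: the published app's API key


def send_one(i):
    """POST one chat message and return the HTTP status code."""
    body = json.dumps({
        # Payload shape is an assumption; adjust to the app's API.
        "inputs": {},
        "query": "ping",
        "response_mode": "blocking",
        "user": f"load-test-{i}",
    }).encode()
    req = urllib.request.Request(
        URL,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return resp.status
    except urllib.error.HTTPError as e:
        return e.code  # e.g. 500 when the backend is throttled


def fire(n=100, send=send_one):
    """Send n requests concurrently and tally the status codes."""
    with ThreadPoolExecutor(max_workers=n) as pool:
        return Counter(pool.map(send, range(n)))
```

Running `print(fire(100))` against a live instance prints a tally of status codes; the number of GetInferenceProfile invocations can then be checked in CloudTrail.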

The expected behavior is that GetInferenceProfile is called only once and the remaining requests use the cached data. Instead, CloudTrail shows 50-100 invocations of GetInferenceProfile.

GetInferenceProfile is a control plane API with a fairly low TPS limit (double digits, based on my testing) that cannot be increased. During batch operations or with enough concurrent users, the race condition can cause ThrottlingException errors, which result in 500 errors.
