Skip to content

aws-samples/sample-amazon-bedrock-global-cris

Repository files navigation

Amazon Bedrock Global Cross-Region Inference

Amazon Bedrock Global CRIS examples using Claude, Cohere, Amazon Nova, and TwelveLabs Pegasus models.

Blog: Access Anthropic Claude models in India on Amazon Bedrock with Global cross-Region inference

Global Cross-Region Inference

Global cross-Region inference extends cross-Region inference beyond geographic boundaries, enabling the routing of inference requests to supported commercial AWS Regions worldwide, optimizing available resources and enabling higher model throughput.

Examples

Foundation Models - Simple Examples

Model Model ID Converse Converse Stream Invoke Model Invoke Model Stream
Claude Haiku global.anthropic.claude-haiku converse stream invoke stream
Claude Opus global.anthropic.claude-opus converse stream invoke stream
Claude Opus 4.6 global.anthropic.claude-opus-4-6-v1 converse stream invoke stream
Claude Opus 4.7 global.anthropic.claude-opus-4-7 converse stream invoke stream
Claude Opus 4.8 global.anthropic.claude-opus-4-8 converse stream invoke stream
Claude Sonnet global.anthropic.claude-sonnet converse stream invoke stream
Claude Sonnet 4.6 global.anthropic.claude-sonnet-4-6 converse stream invoke stream
Claude Sonnet 5 global.anthropic.claude-sonnet-5 converse stream invoke stream
Amazon Nova Lite global.amazon.nova-lite converse stream invoke stream
TwelveLabs Pegasus invoke stream

Foundation Models - Advanced Examples

Advanced examples demonstrate features like adaptive thinking with effort levels, compaction for long conversations, custom summarization, and pause after compaction. These features require the InvokeModel or InvokeModelWithResponseStream APIs (not Converse).

Model Feature Invoke Model Invoke Model Stream
Claude Opus 4.6 All features (monolithic) invoke stream
Claude Opus 4.7 Adaptive Thinking + Effort invoke stream
Claude Opus 4.7 Compaction invoke stream
Claude Opus 4.7 Custom Summarization invoke stream
Claude Opus 4.7 Pause After Compaction invoke stream
Claude Opus 4.8 Adaptive Thinking + Effort invoke stream
Claude Opus 4.8 Compaction invoke stream
Claude Opus 4.8 Custom Summarization invoke stream
Claude Opus 4.8 Pause After Compaction invoke stream
Claude Sonnet 5 Adaptive Thinking (always on) invoke stream
Claude Sonnet 5 Compaction invoke stream
Claude Sonnet 5 Custom Summarization invoke stream
Claude Sonnet 5 Pause After Compaction invoke stream

Embeddings Models

Model Model ID Example
Cohere Embed global.cohere.embed invoke

Application Inference Profiles

Use Case Example
Multi-tenant workloads with isolated throughput and cost tracking invoke

Model Notes

Model Context Max Output Reasoning Sampling Params Key Difference
Claude Opus 4.6 1M 128K Adaptive thinking temperature, top_p, top_k supported Monolithic advanced example
Claude Opus 4.7 1M 128K Adaptive thinking Not supported Adds xhigh effort level
Claude Opus 4.8 1M 128K Adaptive thinking Not supported Deepest reasoning, long autonomous tasks
Claude Sonnet 4.6 1M 128K temperature, top_p, top_k supported Balanced speed and intelligence
Claude Sonnet 5 1M 128K Always ON (cannot disable) Not supported Near-Opus intelligence at Sonnet pricing

Setup

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt
pip install -e global-cris/foundation_models/   # Shared utilities for advanced examples

Environment Configuration (for Pegasus examples)

The TwelveLabs Pegasus video model examples require S3 bucket configuration:

cp .env.example .env
# Edit .env with your S3 bucket name and AWS region

Benefits of Global Cross-Region Inference

  • Enhanced throughput during peak demand -- Automatically routes requests to AWS Regions with available capacity, handling traffic spikes without manual intervention.
  • Cost-efficiency -- Approximately 10% savings on input and output token pricing compared to geographic cross-Region inference.
  • Streamlined monitoring -- CloudWatch and CloudTrail record log entries in your source Region, maintaining centralized observability.
  • On-demand quota flexibility -- Workloads dynamically route across the AWS global infrastructure, accessing a much larger pool of resources.

References

Security

See CONTRIBUTING for more information.

Responsible AI

Implement safeguards customized to your application requirements and responsible AI policies using Amazon Bedrock Guardrails

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. You should not use this AWS Content in your production accounts, or on production or other critical data. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

License

This library is licensed under the MIT-0 License. See the LICENSE file.