12 changes: 12 additions & 0 deletions braintrust/README.md
@@ -158,8 +158,20 @@ This Helm chart includes comprehensive automated unit tests.

## Breaking Changes

### Version 2

With version 2 of this Helm chart, the Brainstore pods are split into readers and writers, which improves performance and lets you scale read and write capacity independently. If you are an existing customer who deployed on Kubernetes with our Helm chart or by other means, update your override values file or deployment to match this change. The upgrade causes no data loss, but expect a brief downtime while the existing Brainstore pods are removed and the new reader and writer pods are launched.
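
As a rough illustration of the split layout, an override values file might declare the two roles separately. This is a minimal sketch based on the example values files in `examples/`; the key nesting shown here is an assumption and should be checked against the chart's `values.yaml` before use.

```yaml
# Sketch only: nesting assumed from the example values files.
brainstore:
  # Readers serve read operations and can be scaled on their own.
  reader:
    name: "brainstore-reader"
    replicas: 2
    service:
      type: ClusterIP
      port: 4000
  # Writers handle write operations and are scaled separately.
  writer:
    name: "brainstore-writer"
    replicas: 1
    service:
      type: ClusterIP
      port: 4000
```

Any settings previously applied to a single Brainstore block would likely need to be repeated under both `reader` and `writer`.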

### Version 3

This version is a breaking change only for Azure customers: it introduces the Azure Container Storage CSI driver.

### Version 4

This version of the Helm chart prepares for version 2.0.0 of the Braintrust self-hosted data plane. Starting with 1.1.32, Brainstore needs to reach the API; previously, Brainstore did not talk to the API. In the Helm chart, this traffic goes over the internal Kubernetes endpoint. If you have additional security restrictions or limit traffic between services, you must allow this traffic before upgrading to 2.0.0 of the data plane.
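
For clusters that enforce Kubernetes NetworkPolicies, one way to allow this traffic is sketched below. The policy name, the `app` label on the API pods, and the assumption that the API container listens on the same port as the service (8000) are all hypothetical; adjust them to match the labels and ports in your deployment.

```yaml
# Hypothetical NetworkPolicy; names and labels are assumptions, not chart output.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-brainstore-to-api   # hypothetical name
  namespace: braintrust
spec:
  # Selects the API pods that Brainstore needs to reach.
  podSelector:
    matchLabels:
      app: braintrust-api         # assumed label on the API pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        # Allows traffic from both Brainstore roles.
        - podSelector:
            matchExpressions:
              - key: app
                operator: In
                values:
                  - brainstore-reader
                  - brainstore-writer
      ports:
        - protocol: TCP
          port: 8000              # assumed to match the API container port
```

This sketch is meant to sit alongside existing ingress rules for the API. If the API pods are not already selected by an ingress policy, applying it alone would restrict API ingress to Brainstore only.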

We are also increasing the default sizing of our deployments. Please ensure your node pools have the capacity for these increased defaults.
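
For capacity planning, the increased defaults in this release can be summarized as the values fragment below. The numbers come from the chart's `values.yaml` changes; the nesting under `brainstore.reader` and `brainstore.writer`, and the mapping of the two resource blocks to reader and writer, are assumed from the example values files.

```yaml
# Sketch of the new defaults; key nesting is an assumption.
api:
  replicas: 4                           # previously 2
brainstore:
  reader:
    resources:
      requests:
        cpu: "16"                       # previously "4"
        memory: "32Gi"                  # previously "8Gi"
      limits:
        cpu: "16"                       # previously "8"
        memory: "32Gi"                  # previously "16Gi"
    objectStoreCacheFileSize: "1000Gi"  # previously "50Gi"
  writer:
    resources:
      requests:
        cpu: "32"                       # previously "8"
        memory: "64Gi"                  # previously "16Gi"
      limits:
        cpu: "32"                       # previously "16"
        memory: "64Gi"                  # previously "32Gi"
    objectStoreCacheFileSize: "1000Gi"  # previously "50Gi"
```

If your node pools cannot fit these requests, setting smaller values for the same keys in your override file should keep the previous sizing.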

## Example Values Files

Example values files for different cloud providers and configurations are located in the `examples/` folder.
14 changes: 7 additions & 7 deletions braintrust/examples/google-autopilot/values.yaml
@@ -24,7 +24,7 @@ api:
service:
networking.gke.io/load-balancer-type: "Internal"
replicas: 4
# Uncomment the following section to use usee a different image or tag from the version in the Helm release
# Uncomment the following section to use a different image or tag from the version in the Helm release
#image:
#repository: public.ecr.aws/braintrust/standalone-api
#tag: "<your image tag>"
@@ -75,11 +75,11 @@ brainstore:
cpu: "16"
memory: "32Gi"
limits:
cpu: "20"
memory: "40Gi"
cpu: "16"
memory: "32Gi"
cacheDir: "/mnt/tmp/brainstore"
objectStoreCacheMemoryLimit: "1Gi"
objectStoreCacheFileSize: "100Gi"
objectStoreCacheFileSize: "1000Gi"
verbose: true
volume:
size: "200Gi"
@@ -99,11 +99,11 @@ brainstore:
cpu: "32"
memory: "64Gi"
limits:
cpu: "40"
memory: "80Gi"
cpu: "32"
memory: "64Gi"
cacheDir: "/mnt/tmp/brainstore"
objectStoreCacheMemoryLimit: "1Gi"
objectStoreCacheFileSize: "100Gi"
objectStoreCacheFileSize: "1000Gi"
verbose: true
volume:
size: "200Gi"
144 changes: 144 additions & 0 deletions braintrust/examples/google-standard/values.yaml
@@ -0,0 +1,144 @@
# Sample values for GKE Standard deployment
#
# GKE Standard requires manual node pool configuration:
# - Create a dedicated node pool with local NVMe SSDs for Brainstore workloads
# - Recommended machine types: c4-standard-32 or higher with local SSDs
# - Configure local SSDs: Use 4x375GB local SSDs (1500GB total) or more
# - Total local SSD capacity should exceed the volume.size configured below

# Global configs
global:
orgName: "<your Braintrust org name>"
namespace: "braintrust"

# Cloud provider configuration
cloud: "google"

# Google Cloud specific configuration for Standard
google:
mode: "standard"

objectStorage:
google:
brainstoreBucket: "<your brainstore bucket name>"
apiBucket: "<your api bucket name>"

api:
name: "braintrust-api"
annotations:
service:
networking.gke.io/load-balancer-type: "Internal"
replicas: 4
# Uncomment the following section to use a different image or tag from the version in the Helm release
#image:
#repository: public.ecr.aws/braintrust/standalone-api
#tag: "<your image tag>"
service:
type: LoadBalancer
port: 8000
portName: http
serviceAccount:
name: "braintrust-api"
googleServiceAccount: "<your Braintrust API Google service account>"
# Enables native GCS authentication via workload identity. Defaults to false (S3-compatible access). Setting this to true requires v1.1.31 or later of the data plane.
enableGcsAuth: false
nodeSelector:
cloud.google.com/gke-nodepool: "api"
resources:
requests:
cpu: "4"
memory: "4Gi"
limits:
cpu: "4"
memory: "8Gi"
extraEnvVars:
# For S3-compatible GCS Storage, set the AWS_REGION environment variable to the region of your GCS bucket
- name: AWS_REGION
value: "us-central1"

# Brainstore configuration (split into reader and writer)
brainstore:
serviceAccount:
name: "brainstore"
googleServiceAccount: "<your Braintrust Brainstore Google service account>"
# Uncomment the following section to use a different image or tag from the version in the Helm release
#image:
#repository: public.ecr.aws/braintrust/brainstore
#tag: "<your image tag>"
# New deployments should use objectStorage as the locks backend. Existing deployments should remain on redis at this time.
locksBackend: "objectStorage"

# Brainstore Reader configuration
reader:
name: "brainstore-reader"
replicas: 2
service:
name: ""
type: ClusterIP
port: 4000
portName: http
nodeSelector:
cloud.google.com/gke-nodepool: "brainstore" # Target your node pool
resources:
requests:
cpu: "16"
memory: "32Gi"
limits:
cpu: "16"
memory: "32Gi"
affinity: # Prevent readers and writers from sharing nodes
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- brainstore-reader
- brainstore-writer
topologyKey: kubernetes.io/hostname
cacheDir: "/mnt/tmp/brainstore"
objectStoreCacheMemoryLimit: "1Gi"
objectStoreCacheFileSize: "1000Gi"
verbose: true
volume:
size: "200Gi"
extraEnvVars:

# Brainstore Writer configuration
writer:
name: "brainstore-writer"
replicas: 1
service:
name: ""
type: ClusterIP
port: 4000
portName: http
nodeSelector:
cloud.google.com/gke-nodepool: "brainstore"
resources:
requests:
cpu: "32"
memory: "64Gi"
limits:
cpu: "32"
memory: "64Gi"
affinity: # Prevent readers and writers from sharing nodes
podAntiAffinity:
requiredDuringSchedulingIgnoredDuringExecution:
- labelSelector:
matchExpressions:
- key: app
operator: In
values:
- brainstore-reader
- brainstore-writer
topologyKey: kubernetes.io/hostname
cacheDir: "/mnt/tmp/brainstore"
objectStoreCacheMemoryLimit: "1Gi"
objectStoreCacheFileSize: "1000Gi"
verbose: true
volume:
size: "200Gi"
extraEnvVars:

7 changes: 7 additions & 0 deletions braintrust/templates/brainstore-reader-deployment.yaml
@@ -107,6 +107,13 @@ spec:
secretKeyRef:
name: braintrust-secrets
key: REDIS_URL
- name: BRAINSTORE_XACT_MANAGER_URI
valueFrom:
secretKeyRef:
name: braintrust-secrets
key: REDIS_URL
- name: BRAINSTORE_AI_PROXY_URL
value: "http://{{ .Values.api.service.name | default .Values.api.name }}:{{ .Values.api.service.port }}"
{{- if eq .Values.brainstore.locksBackend "redis" }}
- name: BRAINSTORE_LOCKS_URI
valueFrom:
7 changes: 7 additions & 0 deletions braintrust/templates/brainstore-writer-deployment.yaml
@@ -107,6 +107,13 @@ spec:
secretKeyRef:
name: braintrust-secrets
key: REDIS_URL
- name: BRAINSTORE_XACT_MANAGER_URI
valueFrom:
secretKeyRef:
name: braintrust-secrets
key: REDIS_URL
- name: BRAINSTORE_AI_PROXY_URL
value: "http://{{ .Values.api.service.name | default .Values.api.name }}:{{ .Values.api.service.port }}"
{{- if eq .Values.brainstore.locksBackend "redis" }}
- name: BRAINSTORE_LOCKS_URI
valueFrom:
27 changes: 27 additions & 0 deletions braintrust/tests/brainstore-reader_test.yaml
@@ -203,3 +203,30 @@ tests:
path: spec.template.spec.containers[0].livenessProbe
- isNull:
path: spec.template.spec.containers[0].readinessProbe

- it: should include BRAINSTORE_XACT_MANAGER_URI environment variable
values:
- __fixtures__/base-values.yaml
release:
namespace: "braintrust"
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: BRAINSTORE_XACT_MANAGER_URI
valueFrom:
secretKeyRef:
name: braintrust-secrets
key: REDIS_URL

- it: should include BRAINSTORE_AI_PROXY_URL environment variable
values:
- __fixtures__/base-values.yaml
release:
namespace: "braintrust"
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: BRAINSTORE_AI_PROXY_URL
value: "http://braintrust-api:8000"
27 changes: 27 additions & 0 deletions braintrust/tests/brainstore-writer_test.yaml
@@ -203,3 +203,30 @@ tests:
path: spec.template.spec.containers[0].livenessProbe
- isNull:
path: spec.template.spec.containers[0].readinessProbe

- it: should include BRAINSTORE_XACT_MANAGER_URI environment variable
values:
- __fixtures__/base-values.yaml
release:
namespace: "braintrust"
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: BRAINSTORE_XACT_MANAGER_URI
valueFrom:
secretKeyRef:
name: braintrust-secrets
key: REDIS_URL

- it: should include BRAINSTORE_AI_PROXY_URL environment variable
values:
- __fixtures__/base-values.yaml
release:
namespace: "braintrust"
asserts:
- contains:
path: spec.template.spec.containers[0].env
content:
name: BRAINSTORE_AI_PROXY_URL
value: "http://braintrust-api:8000"
22 changes: 11 additions & 11 deletions braintrust/values.yaml
@@ -78,7 +78,7 @@ api:
service: {}
pod: {}
serviceaccount: {}
replicas: 2
replicas: 4
image:
repository: public.ecr.aws/braintrust/standalone-api
tag: v1.1.31
@@ -204,14 +204,14 @@ brainstore:
portName: http
resources:
requests:
cpu: "4"
memory: "8Gi"
cpu: "16"
memory: "32Gi"
limits:
cpu: "8"
memory: "16Gi"
cpu: "16"
memory: "32Gi"
cacheDir: "/mnt/tmp/brainstore"
objectStoreCacheMemoryLimit: "1Gi"
objectStoreCacheFileSize: "50Gi"
objectStoreCacheFileSize: "1000Gi"
verbose: true
# Optional: Volume configuration for cache storage
# When not set, uses default emptyDir: {} (backward compatible)
@@ -241,14 +241,14 @@ brainstore:
portName: http
resources:
requests:
cpu: "8"
memory: "16Gi"
cpu: "32"
memory: "64Gi"
limits:
cpu: "16"
memory: "32Gi"
cpu: "32"
memory: "64Gi"
cacheDir: "/mnt/tmp/brainstore"
objectStoreCacheMemoryLimit: "1Gi"
objectStoreCacheFileSize: "50Gi"
objectStoreCacheFileSize: "1000Gi"
verbose: true
# Optional: Volume configuration for cache storage
# When not set, uses default emptyDir: {} (backward compatible)